# PMLB Regression Datasets

## Loading regression datasets

First, load a trained agent and get the list of PMLB regression dataset names. PMLB provides more than a hundred regression datasets, so we sample 10% of the list to demonstrate the agent's capabilities.

In [1]:
import warnings
warnings.filterwarnings('ignore')

In [2]:
import random
import pmlb
from IPython.display import Markdown
from ostatslib.agents import PPOAgent

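# Analyze a random 10% of the available regression datasets.
# No seed is set, so each run draws a different sample.
SAMPLE_FRACTION = 0.1
sample_size = int(len(pmlb.regression_dataset_names) * SAMPLE_FRACTION)
sampled_dataset_names = random.sample(pmlb.regression_dataset_names, sample_size)

# Load the previously trained PPO agent from disk.
AGENT_FILE = '../trained_ppo_model.zip'
agent = PPOAgent(AGENT_FILE)

Markdown(f'Sampled {sample_size} regression datasets: {", ".join(sampled_dataset_names)}.')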

Sampled 12 regression datasets: 622_fri_c2_1000_50, 617_fri_c3_500_5, 207_autoPrice, 626_fri_c2_500_50, 648_fri_c1_250_50, 523_analcatdata_neavote, 1089_USCrime, 618_fri_c3_1000_50, 604_fri_c4_500_10, 344_mv, 537_houses, 645_fri_c3_500_50.
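Because the sample above is drawn without a seed, rerunning the notebook selects different datasets. The following is a minimal sketch of a reproducible draw, not part of the original notebook. The helper `sample_names`, the `SEED` value, and the placeholder list of 122 names are all assumptions made so the example runs offline; in the notebook the list would be `pmlb.regression_dataset_names`.

```python
import random

# Placeholder for pmlb.regression_dataset_names (assumed length of 122,
# chosen so that 10% rounds down to 12, as in the output above).
dataset_names = [f"dataset_{i:03d}" for i in range(122)]

SAMPLE_FRACTION = 0.1
SEED = 42  # hypothetical value; any fixed integer works


def sample_names(names, fraction, seed):
    """Return a reproducible random sample of `names`."""
    rng = random.Random(seed)  # private generator, leaves global state alone
    sample_size = int(len(names) * fraction)
    return rng.sample(names, sample_size)


first = sample_names(dataset_names, SAMPLE_FRACTION, SEED)
second = sample_names(dataset_names, SAMPLE_FRACTION, SEED)
print(len(first), first == second)  # prints: 12 True
```

Using a dedicated `random.Random` instance keeps the draw independent of any other code that touches the global random state.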

## Analyses

The next step is to fetch each sampled dataset and analyze it. PMLB provides a function, `pmlb.fetch_data`, that downloads a dataset from its repository. The initial state must also record which variable is the response; in PMLB datasets that column is named `target`.

In [3]:
%%capture
from ostatslib.states import State

results = []

for name in sampled_dataset_names:
    # Download the dataset, or reuse the copy cached in .pmlb_cache/.
    data = pmlb.fetch_data(name, local_cache_dir='.pmlb_cache/')
    # Tell the agent which column holds the response variable.
    initial_state = State()
    initial_state.set('response_variable_label', 'target')
    analysis = agent.analyze(data, initial_state)
    results.append({"name": name, "analysis": analysis})
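In the loop above, a single failed download or analysis stops the whole batch. One possible variation, sketched here under stated assumptions, records failures per dataset and continues. The function `run_analyses` and the stand-ins `fake_fetch` and `fake_analyze` are hypothetical; in the notebook, `fetch` would wrap `pmlb.fetch_data` and `analyze` would wrap `agent.analyze` with the initial state.

```python
def run_analyses(names, fetch, analyze):
    """Analyze each dataset, collecting failures instead of stopping."""
    results, failures = [], []
    for name in names:
        try:
            data = fetch(name)
            results.append({"name": name, "analysis": analyze(data)})
        except Exception as error:  # broad on purpose: keep the batch going
            failures.append({"name": name, "error": repr(error)})
    return results, failures


# Stand-ins so the sketch runs without network access or a trained agent.
def fake_fetch(name):
    if name == "missing":
        raise ValueError(f"unknown dataset: {name}")
    return {"target": [1.0, 2.0, 3.0]}


def fake_analyze(data):
    return f"analyzed {len(data['target'])} rows"


demo_results, demo_failures = run_analyses(
    ["a", "missing", "b"], fake_fetch, fake_analyze
)
print([entry["name"] for entry in demo_results])   # datasets that succeeded
print([entry["name"] for entry in demo_failures])  # datasets that failed
```

Passing the fetch and analyze steps in as callables keeps the loop testable without the real agent.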

## Results

In [4]:
from IPython.display import display

for result in results:
    display(Markdown(f"### {result['name']}"))
    print(result['analysis'].summary())

### 622_fri_c2_1000_50


Analysis executed at 2023-07-25 19:01:46.844361
Final status is Complete
Initial State known features:
response_variable_label  target
Steps:
  Order  Step                                       Reward  State Change
-------  --------------------------------------  ---------  -------------------------------------------------
      1  Infer Response DType                     0.1       response_inferred_dtype  floating
      2  Is Response Quantitative Check           0.1       is_response_quantitative  1
      3  Is Response Dichotomous Check            0.1       is_response_dichotomous  -1
      4  Get Log Rows Count                       0.1       log_rows_count  0.579598
      5  Is Response Balanced Check               0.1       is_response_balanced  1
      6  Is Response Positive Values Only Check   0.1       is_response_positive_values_only  -1
      7  Support Vector Regression               -0.869918  score                                    0.230082
                            

### 617_fri_c3_500_5


Analysis executed at 2023-07-25 19:01:48.749026
Final status is Complete
Initial State known features:
response_variable_label  target
Steps:
  Order  Step                                      Reward  State Change
-------  --------------------------------------  --------  ------------------------------------------------
      1  Infer Response DType                    0.1       response_inferred_dtype  floating
      2  Is Response Quantitative Check          0.1       is_response_quantitative  1
      3  Is Response Dichotomous Check           0.1       is_response_dichotomous  -1
      4  Get Log Rows Count                      0.1       log_rows_count  0.521439
      5  Is Response Balanced Check              0.1       is_response_balanced  1
      6  Is Response Positive Values Only Check  0.1       is_response_positive_values_only  -1
      7  Support Vector Regression               0.828845  score                                   0.928845
                                       

### 207_autoPrice


Analysis executed at 2023-07-25 19:01:48.797859
Final status is Complete
Initial State known features:
response_variable_label  target
Steps:
  Order  Step                                      Reward  State Change
-------  --------------------------------------  --------  -----------------------------------------------------------
      1  Infer Response DType                         0.1  response_inferred_dtype  floating
      2  Is Response Quantitative Check               0.1  is_response_quantitative  1
      3  Is Response Dichotomous Check                0.1  is_response_dichotomous  -1
      4  Get Log Rows Count                           0.1  log_rows_count  0.425309
      5  Is Response Discrete Check                   0.1  is_response_discrete  1
      6  Get Standarized Variables Ratio              0.1  standarized_variables_ratio  -1
      7  Is Response Positive Values Only Check       0.1  is_response_positive_values_only  1
      8  Poisson Regression                   

### 626_fri_c2_500_50


Analysis executed at 2023-07-25 19:01:49.639149
Final status is Complete
Initial State known features:
response_variable_label  target
Steps:
  Order  Step                                       Reward  State Change
-------  --------------------------------------  ---------  -------------------------------------------------
      1  Infer Response DType                     0.1       response_inferred_dtype  floating
      2  Is Response Quantitative Check           0.1       is_response_quantitative  1
      3  Is Response Dichotomous Check            0.1       is_response_dichotomous  -1
      4  Get Log Rows Count                       0.1       log_rows_count  0.521439
      5  Is Response Balanced Check               0.1       is_response_balanced  1
      6  Is Response Positive Values Only Check   0.1       is_response_positive_values_only  -1
      7  Support Vector Regression               -0.895552  score                                    0.204448
                            

### 648_fri_c1_250_50


Analysis executed at 2023-07-25 19:01:50.227731
Final status is Complete
Initial State known features:
response_variable_label  target
Steps:
  Order  Step                                       Reward  State Change
-------  --------------------------------------  ---------  -------------------------------------------------
      1  Infer Response DType                     0.1       response_inferred_dtype  floating
      2  Is Response Quantitative Check           0.1       is_response_quantitative  1
      3  Is Response Dichotomous Check            0.1       is_response_dichotomous  -1
      4  Get Log Rows Count                       0.1       log_rows_count  0.463281
      5  Is Response Balanced Check               0.1       is_response_balanced  1
      6  Is Response Positive Values Only Check   0.1       is_response_positive_values_only  -1
      7  Support Vector Regression               -0.852892  score                                    0.247108
                            

### 523_analcatdata_neavote


Analysis executed at 2023-07-25 19:01:50.257894
Final status is Complete
Initial State known features:
response_variable_label  target
Steps:
  Order  Step                                      Reward  State Change
-------  --------------------------------------  --------  ------------------------------------------------------------------
      1  Infer Response DType                         0.1  response_inferred_dtype  floating
      2  Is Response Quantitative Check               0.1  is_response_quantitative  1
      3  Is Response Dichotomous Check                0.1  is_response_dichotomous  -1
      4  Get Log Rows Count                           0.1  log_rows_count  0.386399
      5  Is Response Discrete Check                   0.1  is_response_discrete  1
      6  Get Standarized Variables Ratio              0.1  standarized_variables_ratio  -1
      7  Is Response Positive Values Only Check       0.1  is_response_positive_values_only  1
      8  Poisson Regression            

### 1089_USCrime


Analysis executed at 2023-07-25 19:01:53.055650
Final status is Complete
Initial State known features:
response_variable_label  target
Steps:
  Order  Step                                      Reward  State Change
-------  --------------------------------------  --------  -----------------------------------
      1  Infer Response DType                         0.1  response_inferred_dtype  floating
      2  Is Response Quantitative Check               0.1  is_response_quantitative  1
      3  Is Response Dichotomous Check                0.1  is_response_dichotomous  -1
      4  Get Log Rows Count                           0.1  log_rows_count  0.323048
      5  Is Response Discrete Check                   0.1  is_response_discrete  1
      6  Get Standarized Variables Ratio              0.1  standarized_variables_ratio  -1
      7  Is Response Positive Values Only Check       0.1  is_response_positive_values_only  1
      8  Get Log Columns Count                        0.1  log_columns

### 618_fri_c3_1000_50


Analysis executed at 2023-07-25 19:01:54.938633
Final status is Complete
Initial State known features:
response_variable_label  target
Steps:
  Order  Step                                       Reward  State Change
-------  --------------------------------------  ---------  -------------------------------------------------
      1  Infer Response DType                     0.1       response_inferred_dtype  floating
      2  Is Response Quantitative Check           0.1       is_response_quantitative  1
      3  Is Response Dichotomous Check            0.1       is_response_dichotomous  -1
      4  Get Log Rows Count                       0.1       log_rows_count  0.579598
      5  Is Response Balanced Check               0.1       is_response_balanced  1
      6  Is Response Positive Values Only Check   0.1       is_response_positive_values_only  -1
      7  Support Vector Regression               -0.803216  score                                    0.296784
                            

### 604_fri_c4_500_10


Analysis executed at 2023-07-25 19:01:56.041802
Final status is Complete
Initial State known features:
response_variable_label  target
Steps:
  Order  Step                                      Reward  State Change
-------  --------------------------------------  --------  ------------------------------------------------
      1  Infer Response DType                    0.1       response_inferred_dtype  floating
      2  Is Response Quantitative Check          0.1       is_response_quantitative  1
      3  Is Response Dichotomous Check           0.1       is_response_dichotomous  -1
      4  Get Log Rows Count                      0.1       log_rows_count  0.521439
      5  Is Response Balanced Check              0.1       is_response_balanced  1
      6  Is Response Positive Values Only Check  0.1       is_response_positive_values_only  -1
      7  Support Vector Regression               0.636773  score                                   0.736773
                                       

### 344_mv


Analysis executed at 2023-07-25 19:01:56.762477
Final status is Complete
Initial State known features:
response_variable_label  target
Steps:
  Order  Step                                      Reward  State Change
-------  --------------------------------------  --------  ------------------------------------
      1  Infer Response DType                         0.1  response_inferred_dtype  floating
      2  Is Response Quantitative Check               0.1  is_response_quantitative  1
      3  Is Response Dichotomous Check                0.1  is_response_dichotomous  -1
      4  Get Log Rows Count                           0.1  log_rows_count  0.890711
      5  Is Response Balanced Check                   0.1  is_response_balanced  -1
      6  Is Response Discrete Check                   0.1  is_response_discrete  -1
      7  Is Response Positive Values Only Check       0.1  is_response_positive_values_only  -1
      8  Support Vector Regression                   -1
      9  Support V

### 537_houses


Analysis executed at 2023-07-25 19:01:56.877034
Final status is Complete
Initial State known features:
response_variable_label  target
Steps:
  Order  Step                                      Reward  State Change
-------  --------------------------------------  --------  -----------------------------------
      1  Infer Response DType                         0.1  response_inferred_dtype  floating
      2  Is Response Quantitative Check               0.1  is_response_quantitative  1
      3  Is Response Dichotomous Check                0.1  is_response_dichotomous  -1
      4  Get Log Rows Count                           0.1  log_rows_count  0.833599
      5  Is Response Balanced Check                   0.1  is_response_balanced  -1
      6  Is Response Discrete Check                   0.1  is_response_discrete  1
      7  Is Response Positive Values Only Check       0.1  is_response_positive_values_only  1
      8  Linear Support Vector Regression            -1
      9  Linear Suppo

### 645_fri_c3_500_50


Analysis executed at 2023-07-25 19:01:57.963662
Final status is Complete
Initial State known features:
response_variable_label  target
Steps:
  Order  Step                                       Reward  State Change
-------  --------------------------------------  ---------  -------------------------------------------------
      1  Infer Response DType                     0.1       response_inferred_dtype  floating
      2  Is Response Quantitative Check           0.1       is_response_quantitative  1
      3  Is Response Dichotomous Check            0.1       is_response_dichotomous  -1
      4  Get Log Rows Count                       0.1       log_rows_count  0.521439
      5  Is Response Balanced Check               0.1       is_response_balanced  1
      6  Is Response Positive Values Only Check   0.1       is_response_positive_values_only  -1
      7  Support Vector Regression               -0.875858  score                                    0.224142
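
To compare datasets at a glance, the final model score can be pulled out of each printed summary. The sketch below is an illustration only: it assumes `summary()` returns text laid out like the tables above and parses that text, which is fragile. The helper `extract_last_score` is hypothetical, and the analysis object may well expose the score directly.

```python
import re

# Matches the "score  <number>" pair in the State Change column.
SCORE_PATTERN = re.compile(r"\bscore\s+(-?\d+(?:\.\d+)?)")


def extract_last_score(summary_text):
    """Return the last score found in a summary, or None if absent."""
    matches = SCORE_PATTERN.findall(summary_text)
    return float(matches[-1]) if matches else None


# Two rows copied from the 645_fri_c3_500_50 summary above.
sample_summary = """\
      6  Is Response Positive Values Only Check   0.1       is_response_positive_values_only  -1
      7  Support Vector Regression               -0.875858  score                                    0.224142
"""

print(extract_last_score(sample_summary))  # prints: 0.224142
```

Summaries that contain no score line make the helper return `None` rather than raise.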
                            