# PMLB Regression Datasets

## Loading regression datasets

First, load a trained agent and get the list of PMLB regression dataset names. PMLB offers more than a hundred regression datasets, so we sample 10% of the list to demonstrate the agent's capabilities.

In [4]:
import warnings
warnings.filterwarnings('ignore')

In [5]:
import random
import pmlb
from IPython.display import Markdown
from ostatslib.agents import PPOAgent

SAMPLE_FRACTION = 0.1
sample_size = int(len(pmlb.regression_dataset_names) * SAMPLE_FRACTION)
sampled_dataset_names = random.sample(pmlb.regression_dataset_names, sample_size)

AGENT_FILE = '../trained_ppo_model.zip'
agent = PPOAgent(AGENT_FILE)

Markdown(f'Sampled {sample_size} regression datasets: {", ".join(sampled_dataset_names)}.')

Sampled 12 regression datasets: 649_fri_c0_500_5, 656_fri_c1_100_5, 623_fri_c4_1000_10, 584_fri_c4_500_25, 615_fri_c4_250_10, 654_fri_c0_500_10, 556_analcatdata_apnea2, 665_sleuth_case2002, 201_pol, 1199_BNG_echoMonths, 608_fri_c3_1000_10, 1096_FacultySalaries.
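Because the sample is drawn with the global random generator, each run of the notebook selects different datasets. A minimal sketch of a reproducible variant follows. It assumes a fixed seed is acceptable for the demonstration; `sample_names` and `example_names` are hypothetical names introduced here, and the list stands in for `pmlb.regression_dataset_names`.

```python
import random


def sample_names(names, fraction, seed=None):
    """Return a random sample holding `fraction` of `names`."""
    # A private generator keeps the global random state untouched.
    rng = random.Random(seed)
    size = int(len(names) * fraction)
    return rng.sample(list(names), size)


# Hypothetical stand-in for pmlb.regression_dataset_names.
example_names = [f"dataset_{i}" for i in range(120)]

first = sample_names(example_names, 0.1, seed=42)
second = sample_names(example_names, 0.1, seed=42)
print(len(first), first == second)
```

Passing the same seed returns the same names in the same order, so the results section can be compared across runs.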

## Analyses

The next step is to fetch the data and analyze each sampled dataset. PMLB provides a function that fetches a dataset from its repository. The initial state must also record which variable is the target; in PMLB datasets that column is named target.

In [9]:
%%capture
from ostatslib.states import State

results = []

for name in sampled_dataset_names:
    # Download the dataset, or reuse the copy in the local cache
    data = pmlb.fetch_data(name, local_cache_dir='.pmlb_cache/')
    # Tell the agent which column is the response variable
    initial_state = State()
    initial_state.set('response_variable_label', 'target')
    analysis = agent.analyze(data, initial_state)
    results.append({"name": name, "analysis": analysis})
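
A single failing download or analysis stops the loop above and discards the work done so far. The sketch below is one way to keep going, assuming it is acceptable to record the failure and continue. `run_analyses`, `fetch` and `analyze` are hypothetical names introduced here; the two callables would wrap `pmlb.fetch_data` and `agent.analyze` from the cell above.

```python
def run_analyses(names, fetch, analyze):
    """Analyze each dataset, collecting failures instead of stopping."""
    results, failures = [], []
    for name in names:
        try:
            data = fetch(name)
            results.append({"name": name, "analysis": analyze(data)})
        except Exception as error:  # broad on purpose: any failure is recorded
            failures.append({"name": name, "error": repr(error)})
    return results, failures


# Possible wiring with the objects defined in this notebook (not executed here):
#
# def fetch(name):
#     return pmlb.fetch_data(name, local_cache_dir='.pmlb_cache/')
#
# def analyze(data):
#     state = State()
#     state.set('response_variable_label', 'target')
#     return agent.analyze(data, state)
#
# results, failures = run_analyses(sampled_dataset_names, fetch, analyze)
```

The `failures` list can then be inspected separately from the successful results.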


## Results

In [8]:
from IPython.display import display

for result in results:
    display(Markdown(f"### {result['name']}"))
    print(result['analysis'].summary())

### 649_fri_c0_500_5


Analysis executed at 2023-10-15 18:18:27.510980
Final status is Complete
Initial State known features:
response_variable_label                       target
score                                         0.7852042156037796
response_unique_values_ratio                  1.0
is_response_positive_values_only              -1
adaboost_square_loss_regression_score_reward  0.7852042156037796
Steps:
  Order  Step                                    Reward  State Change
-------  ------------------------------------  --------  ------------------------------------------------------
      1  Is Response Positive Values Only      0.1
      2  Response Unique Values Ratio          0.1       response_unique_values_ratio  1
      3  AdaBoost Regression with Square Loss  0.785204  score                                         0.785204
                                                         adaboost_square_loss_regression_score_reward  0.785204


### 656_fri_c1_100_5


Analysis executed at 2023-10-15 18:18:28.032696
Final status is Complete
Initial State known features:
response_variable_label                       target
score                                         0.7549163944562183
response_unique_values_ratio                  1.0
is_response_positive_values_only              -1
adaboost_square_loss_regression_score_reward  0.7549163944562183
Steps:
  Order  Step                                    Reward  State Change
-------  ------------------------------------  --------  ------------------------------------------------------
      1  Is Response Positive Values Only      0.1
      2  Response Unique Values Ratio          0.1       response_unique_values_ratio  1
      3  AdaBoost Regression with Square Loss  0.754916  score                                         0.754916
                                                         adaboost_square_loss_regression_score_reward  0.754916


### 623_fri_c4_1000_10


Analysis executed at 2023-10-15 18:18:28.972196
Final status is Complete
Initial State known features:
response_variable_label                       target
score                                         0.789200856110073
response_unique_values_ratio                  1.0
is_response_positive_values_only              -1
adaboost_square_loss_regression_score_reward  0.789200856110073
Steps:
  Order  Step                                    Reward  State Change
-------  ------------------------------------  --------  ------------------------------------------------------
      1  Is Response Positive Values Only      0.1
      2  Response Unique Values Ratio          0.1       response_unique_values_ratio  1
      3  AdaBoost Regression with Square Loss  0.789201  score                                         0.789201
                                                         adaboost_square_loss_regression_score_reward  0.789201


### 584_fri_c4_500_25


Analysis executed at 2023-10-15 18:18:31.657107
Final status is Complete
Initial State known features:
response_variable_label                       target
score                                         0.7779126179981797
response_unique_values_ratio                  1.0
is_response_positive_values_only              -1
adaboost_square_loss_regression_score_reward  0.7779126179981797
Steps:
  Order  Step                                    Reward  State Change
-------  ------------------------------------  --------  ------------------------------------------------------
      1  Is Response Positive Values Only      0.1
      2  Response Unique Values Ratio          0.1       response_unique_values_ratio  1
      3  AdaBoost Regression with Square Loss  0.777913  score                                         0.777913
                                                         adaboost_square_loss_regression_score_reward  0.777913


### 615_fri_c4_250_10


Analysis executed at 2023-10-15 18:18:33.793695
Final status is Complete
Initial State known features:
response_variable_label                       target
score                                         0.704890474685613
response_unique_values_ratio                  1.0
is_response_positive_values_only              -1
adaboost_square_loss_regression_score_reward  0.704890474685613
Steps:
  Order  Step                                     Reward  State Change
-------  ------------------------------------  ---------  -------------------------------------------------------
      1  Is Response Positive Values Only       0.1
      2  Response Unique Values Ratio           0.1       response_unique_values_ratio  1
      3  AdaBoost Regression with Square Loss  -0.315074  score                                          0.684926
                                                          adaboost_square_loss_regression_score_reward  -0.315074
      4  AdaBoost Regression with Square Loss  -0.3233

### 654_fri_c0_500_10


Analysis executed at 2023-10-15 18:18:34.692126
Final status is Complete
Initial State known features:
response_variable_label                       target
score                                         0.7751243054421953
response_unique_values_ratio                  1.0
is_response_positive_values_only              -1
adaboost_square_loss_regression_score_reward  0.7751243054421953
Steps:
  Order  Step                                    Reward  State Change
-------  ------------------------------------  --------  ------------------------------------------------------
      1  Is Response Positive Values Only      0.1
      2  Response Unique Values Ratio          0.1       response_unique_values_ratio  1
      3  AdaBoost Regression with Square Loss  0.775124  score                                         0.775124
                                                         adaboost_square_loss_regression_score_reward  0.775124


### 556_analcatdata_apnea2


Analysis executed at 2023-10-15 18:18:34.891779
Final status is Complete
Initial State known features:
response_variable_label                                     target
score                                                       0.711739995276534
time_convertible_variable
response_unique_values_ratio                                0.37473684210526315
response_inferred_dtype                                     floating
is_response_positive_values_only                            1
standardized_variables_ratio                                -1
n_100_estimators_gradient_boosting_regression_score_reward  0.711739995276534
Steps:
  Order  Step                                           Reward  State Change
-------  -------------------------------------------  --------  -------------------------------------------------------------------
      1  Is Response Positive Values Only              0.1
      2  Time Convertible Variable Search              0.1      time_convertible_variable
      3 

### 665_sleuth_case2002


Analysis executed at 2023-10-15 18:18:35.090745
Final status is Not Complete
Initial State known features:
response_variable_label           target
time_convertible_variable
response_unique_values_ratio      0.1292517006802721
response_inferred_dtype           floating
is_response_positive_values_only  1
standardized_variables_ratio      -1
Steps:
  Order  Step                                Reward  State Change
-------  --------------------------------  --------  --------------------------------------
      1  Is Response Positive Values Only       0.1
      2  Time Convertible Variable Search       0.1  time_convertible_variable
      3  Infer Response DType                   0.1  response_inferred_dtype  floating
      4  Standardized Variables Ratio           0.1  standardized_variables_ratio  -1
      5  Response Unique Values Ratio           0.1  response_unique_values_ratio  0.129252
      6  Response Unique Values Ratio          -1
      7  Response Unique Values Ratio        

### 201_pol


Analysis executed at 2023-10-15 18:18:35.386553
Final status is Not Complete
Initial State known features:
response_variable_label           target
time_convertible_variable
response_unique_values_ratio      0.0007333333333333333
response_inferred_dtype           floating
is_response_positive_values_only  1
standardized_variables_ratio      -1
Steps:
  Order  Step                                Reward  State Change
-------  --------------------------------  --------  -----------------------------------------
      1  Is Response Positive Values Only       0.1
      2  Time Convertible Variable Search       0.1  time_convertible_variable
      3  Infer Response DType                   0.1  response_inferred_dtype  floating
      4  Standardized Variables Ratio           0.1  standardized_variables_ratio  -1
      5  Response Unique Values Ratio           0.1  response_unique_values_ratio  0.000733333
      6  Response Unique Values Ratio          -1
      7  Response Unique Values Rati

### 1199_BNG_echoMonths


Analysis executed at 2023-10-15 18:19:50.823089
Final status is Not Complete
Initial State known features:
response_variable_label                       target
score                                         0.2633249659679566
response_unique_values_ratio                  0.9995999085505258
is_response_positive_values_only              -1
adaboost_square_loss_regression_score_reward  -0.7366750340320434
Steps:
  Order  Step                                     Reward  State Change
-------  ------------------------------------  ---------  -------------------------------------------------------
      1  Is Response Positive Values Only       0.1
      2  Response Unique Values Ratio           0.1       response_unique_values_ratio  0.9996
      3  AdaBoost Regression with Square Loss  -0.738311  score                                          0.261689
                                                          adaboost_square_loss_regression_score_reward  -0.738311
      4  AdaBoost Regressio

### 608_fri_c3_1000_10


Analysis executed at 2023-10-15 18:19:51.748633
Final status is Complete
Initial State known features:
response_variable_label                       target
score                                         0.787700374415138
response_unique_values_ratio                  1.0
is_response_positive_values_only              -1
adaboost_square_loss_regression_score_reward  0.787700374415138
Steps:
  Order  Step                                    Reward  State Change
-------  ------------------------------------  --------  ----------------------------------------------------
      1  Is Response Positive Values Only        0.1
      2  Response Unique Values Ratio            0.1     response_unique_values_ratio  1
      3  AdaBoost Regression with Square Loss    0.7877  score                                         0.7877
                                                         adaboost_square_loss_regression_score_reward  0.7877


### 1096_FacultySalaries

ValueError: Cannot write State delta, step 6 State is None
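
The display loop stopped at the last dataset, and the `ValueError` appears to come from the `summary()` call. The sketch below guards that call and, as a side benefit, condenses each summary to one row. It assumes the text layout printed above (a `Final status is ...` line and a `score` line at the start of a line) and parses it with regular expressions, since no other accessor is shown in this notebook. `overview_row` is a hypothetical helper, not part of ostatslib.

```python
import re

# Patterns match the summary layout printed in this notebook.
STATUS_PATTERN = re.compile(r"^Final status is (.+?)[ \t]*$", re.MULTILINE)
SCORE_PATTERN = re.compile(r"^score[ \t]+(-?\d+(?:\.\d+)?)[ \t]*$", re.MULTILINE)


def overview_row(name, analysis):
    """Condense one analysis into a dict, tolerating a failing summary()."""
    try:
        text = analysis.summary()
    except ValueError as error:
        return {"name": name, "status": "Summary failed",
                "score": None, "error": str(error)}
    status = STATUS_PATTERN.search(text)
    score = SCORE_PATTERN.search(text)
    return {
        "name": name,
        "status": status.group(1) if status else None,
        # Analyses that never fitted a model have no score line.
        "score": float(score.group(1)) if score else None,
        "error": None,
    }


# Possible usage with the results list built earlier (not executed here):
# rows = [overview_row(r["name"], r["analysis"]) for r in results]
```

Each row keeps the dataset name, the final status and the score when one was reported, so complete and incomplete analyses can be compared at a glance.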