# PMLB Classification Datasets

## Loading classification datasets

First, load a trained agent and get the list of PMLB classification dataset names. PMLB offers well over a hundred classification datasets, so let's sample 10% of the list to demonstrate the agent's capabilities.

In [6]:
import warnings
warnings.filterwarnings('ignore')

In [7]:
import random
import pmlb
from IPython.display import Markdown
from ostatslib.agents import PPOAgent

# Sample 10% of the available classification dataset names
SAMPLE_FRACTION = 0.1
sample_size = int(len(pmlb.classification_dataset_names) * SAMPLE_FRACTION)
sampled_dataset_names = random.sample(pmlb.classification_dataset_names, sample_size)

# Load the previously trained PPO agent from disk
AGENT_FILE = '../trained_ppo_model.zip'
agent = PPOAgent(AGENT_FILE)

Markdown(f'Sampled {sample_size} classification datasets: {", ".join(sampled_dataset_names)}.')

Sampled 16 classification datasets: fars, ann_thyroid, page_blocks, analcatdata_happiness, xd6, monk1, pima, mfeat_fourier, clean2, GAMETES_Epistasis_2_Way_20atts_0.1H_EDM_1_1, breast_cancer_wisconsin, sleep, analcatdata_boxing1, monk2, car_evaluation, house_votes_84.
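
Because `random.sample` draws from the global random state, rerunning the cell above produces a different set of datasets each time. A minimal sketch of a reproducible variant follows. The seed value, the helper name `sample_names`, and the placeholder name list are assumptions made for illustration and are not part of the original notebook.

```python
import random

SAMPLE_FRACTION = 0.1
SEED = 42  # assumption: any fixed integer works; the original notebook sets no seed


def sample_names(names, fraction, seed):
    """Return a reproducible random subset with int(len(names) * fraction) entries."""
    rng = random.Random(seed)  # private generator; the global random state is untouched
    sample_size = int(len(names) * fraction)
    return rng.sample(list(names), sample_size)


# Placeholder list standing in for pmlb.classification_dataset_names
placeholder_names = [f"dataset_{i}" for i in range(160)]

first_draw = sample_names(placeholder_names, SAMPLE_FRACTION, SEED)
second_draw = sample_names(placeholder_names, SAMPLE_FRACTION, SEED)
print(first_draw == second_draw)
```

In the notebook, `placeholder_names` would be replaced by `pmlb.classification_dataset_names`.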

## Analyses

The next step is to fetch the data and analyze each sampled dataset. PMLB provides a function that fetches a dataset from its repository. The initial state must also specify which variable is the target.

In [8]:
%%capture
from ostatslib.states import State

results = []

for name in sampled_dataset_names:
    # Download the dataset, or read it from the local cache if already present
    data = pmlb.fetch_data(name, local_cache_dir='.pmlb_cache/')
    # Tell the agent which column is the response variable
    initial_state = State()
    initial_state.set('response_variable_label', 'target')
    analysis = agent.analyze(data, initial_state)
    results.append({"name": name, "analysis": analysis})
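
A single failed download or analysis would stop the loop above before the remaining datasets are processed. The sketch below shows one possible defensive variant that records failures and keeps going. The helper `analyze_all` and the two stand-in functions are hypothetical; they only mimic the roles of `pmlb.fetch_data` and `agent.analyze` so that the example runs offline.

```python
def analyze_all(names, fetch, analyze):
    """Run analyze(fetch(name)) for each name, recording failures instead of raising."""
    succeeded, failed = [], []
    for name in names:
        try:
            data = fetch(name)
            succeeded.append({"name": name, "analysis": analyze(data)})
        except Exception as error:  # deliberately broad: keep the batch running
            failed.append({"name": name, "error": repr(error)})
    return succeeded, failed


# Hypothetical stand-ins for pmlb.fetch_data and agent.analyze
def fake_fetch(name):
    if name == "missing":
        raise ValueError(f"dataset {name!r} not found")
    return {"name": name}


def fake_analyze(data):
    return f"analysis of {data['name']}"


demo_results, demo_failures = analyze_all(
    ["pima", "missing", "monk1"], fake_fetch, fake_analyze
)
print([item["name"] for item in demo_results])
print([item["name"] for item in demo_failures])
```

In the notebook, `fetch` could wrap `pmlb.fetch_data(name, local_cache_dir='.pmlb_cache/')` and `analyze` could build the `State` and call `agent.analyze`, exactly as in the cell above.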

## Results

In [9]:
from IPython.display import display

for result in results:
    display(Markdown(f"### {result['name']}"))
    print(result['analysis'].summary())


### fars


Analysis executed at 2023-07-25 19:00:27.546841
Final status is Complete
Initial State known features:
response_variable_label  target
Steps:
  Order  Step                          Reward  State Change
-------  --------------------------  --------  ------------------------------------
      1  Infer Response DType        0.1       response_inferred_dtype  integer
      2  Get Log Rows Count          0.1       log_rows_count  0.966805
      3  Is Response Discrete Check  0.1       is_response_discrete  1
      4  Decision Tree               0.771191  score                       0.696191
                                               decision_tree_score_reward  0.696191


### ann_thyroid


Analysis executed at 2023-07-25 19:00:28.407073
Final status is Complete
Initial State known features:
response_variable_label  target
Steps:
  Order  Step                          Reward  State Change
-------  --------------------------  --------  -----------------------------------
      1  Infer Response DType             0.1  response_inferred_dtype  integer
      2  Get Log Rows Count               0.1  log_rows_count  0.745234
      3  Is Response Discrete Check       0.1  is_response_discrete  1
      4  Decision Tree                    1    score                       0.99625
                                               decision_tree_score_reward  0.99625


### page_blocks


Analysis executed at 2023-07-25 19:00:29.576404
Final status is Complete
Initial State known features:
response_variable_label  target
Steps:
  Order  Step                          Reward  State Change
-------  --------------------------  --------  ------------------------------------
      1  Infer Response DType             0.1  response_inferred_dtype  integer
      2  Get Log Rows Count               0.1  log_rows_count  0.722223
      3  Is Response Discrete Check       0.1  is_response_discrete  1
      4  Decision Tree                    1    score                       0.952263
                                               decision_tree_score_reward  0.952263


### analcatdata_happiness


Analysis executed at 2023-07-25 19:00:30.053794
Final status is Complete
Initial State known features:
response_variable_label  target
Steps:
  Order  Step                                      Reward  State Change
-------  --------------------------------------  --------  ----------------------------------------------------
      1  Infer Response DType                    0.1       response_inferred_dtype  integer
      2  Get Log Rows Count                      0.1       log_rows_count  0.343538
      3  Is Response Balanced Check              0.1       is_response_balanced  1
      4  Is Response Discrete Check              0.1       is_response_discrete  1
      5  Is Response Dichotomous Check           0.1       is_response_dichotomous  -1
      6  Is Response Positive Values Only Check  0.1       is_response_positive_values_only  1
      7  Support Vector Classification           0.533333  score                                       0.633333
                                     

### xd6


Analysis executed at 2023-07-25 19:00:30.224590
Final status is Complete
Initial State known features:
response_variable_label  target
Steps:
  Order  Step                                      Reward  State Change
-------  --------------------------------------  --------  ------------------------------------------
      1  Infer Response DType                    0.1       response_inferred_dtype  integer
      2  Get Log Rows Count                      0.1       log_rows_count  0.577302
      3  Is Response Discrete Check              0.1       is_response_discrete  1
      4  Is Response Positive Values Only Check  0.1       is_response_positive_values_only  1
      5  Is Response Dichotomous Check           0.1       is_response_dichotomous  1
      6  Logistic Regression                     0.913977  score                             0.813977
                                                           logistic_regression_score_reward  0.813977


### monk1


Analysis executed at 2023-07-25 19:00:30.339794
Final status is Complete
Initial State known features:
response_variable_label  target
Steps:
  Order  Step                                      Reward  State Change
-------  --------------------------------------  --------  ------------------------------------------
      1  Infer Response DType                    0.1       response_inferred_dtype  integer
      2  Get Log Rows Count                      0.1       log_rows_count  0.530347
      3  Is Response Discrete Check              0.1       is_response_discrete  1
      4  Is Response Positive Values Only Check  0.1       is_response_positive_values_only  1
      5  Is Response Dichotomous Check           0.1       is_response_dichotomous  1
      6  Logistic Regression                     0.765468  score                             0.665468
                                                           logistic_regression_score_reward  0.665468


### pima


Analysis executed at 2023-07-25 19:00:30.686781
Final status is Complete
Initial State known features:
response_variable_label  target
Steps:
  Order  Step                                      Reward  State Change
-------  --------------------------------------  --------  ------------------------------------------
      1  Infer Response DType                    0.1       response_inferred_dtype  integer
      2  Get Log Rows Count                      0.1       log_rows_count  0.55745
      3  Is Response Discrete Check              0.1       is_response_discrete  1
      4  Is Response Positive Values Only Check  0.1       is_response_positive_values_only  1
      5  Is Response Dichotomous Check           0.1       is_response_dichotomous  1
      6  Logistic Regression                     0.883854  score                             0.783854
                                                           logistic_regression_score_reward  0.783854


### mfeat_fourier


Analysis executed at 2023-07-25 19:00:32.042367
Final status is Complete
Initial State known features:
response_variable_label  target
Steps:
  Order  Step                                      Reward  State Change
-------  --------------------------------------  --------  ------------------------------------------------------------------
      1  Infer Response DType                    0.1       response_inferred_dtype  integer
      2  Get Log Rows Count                      0.1       log_rows_count  0.637757
      3  Is Response Discrete Check              0.1       is_response_discrete  1
      4  Is Response Positive Values Only Check  0.1       is_response_positive_values_only  1
      5  Is Response Dichotomous Check           0.1       is_response_dichotomous  -1
      6  Poisson Regression                      0.837421  score                                                     0.737421
                                                           does_poisson_regression_raises_pe

### clean2


Analysis executed at 2023-07-25 19:00:34.459835
Final status is Complete
Initial State known features:
response_variable_label  target
Steps:
  Order  Step                          Reward  State Change
-------  --------------------------  --------  ------------------------------------
      1  Infer Response DType        0.1       response_inferred_dtype  integer
      2  Get Log Rows Count          0.1       log_rows_count  0.737908
      3  Is Response Discrete Check  0.1       is_response_discrete  1
      4  Decision Tree               0.956172  score                       0.881172
                                               decision_tree_score_reward  0.881172


### GAMETES_Epistasis_2_Way_20atts_0.1H_EDM_1_1


Analysis executed at 2023-07-25 19:00:42.263926
Final status is Complete
Initial State known features:
response_variable_label  target
Steps:
  Order  Step                                       Reward  State Change
-------  --------------------------------------  ---------  -----------------------------------------------------
      1  Infer Response DType                     0.1       response_inferred_dtype  integer
      2  Get Log Rows Count                       0.1       log_rows_count  0.619034
      3  Is Response Discrete Check               0.1       is_response_discrete  1
      4  Is Response Positive Values Only Check   0.1       is_response_positive_values_only  1
      5  Is Response Dichotomous Check            0.1       is_response_dichotomous  1
      6  Logistic Regression                     -0.350625  score                              0.549375
                                                            logistic_regression_score_reward  -0.450625
      7  Support 

### breast_cancer_wisconsin


Analysis executed at 2023-07-25 19:00:45.121895
Final status is Complete
Initial State known features:
response_variable_label  target
Steps:
  Order  Step                                      Reward  State Change
-------  --------------------------------------  --------  ------------------------------------------
      1  Infer Response DType                         0.1  response_inferred_dtype  integer
      2  Get Log Rows Count                           0.1  log_rows_count  0.532286
      3  Is Response Discrete Check                   0.1  is_response_discrete  1
      4  Is Response Positive Values Only Check       0.1  is_response_positive_values_only  1
      5  Is Response Dichotomous Check                0.1  is_response_dichotomous  1
      6  Logistic Regression                          1    score                             0.970123
                                                           logistic_regression_score_reward  0.970123


### sleep


Analysis executed at 2023-07-25 19:00:59.614236
Final status is Complete
Initial State known features:
response_variable_label  target
Steps:
  Order  Step                          Reward  State Change
-------  --------------------------  --------  ------------------------------------
      1  Infer Response DType        0.1       response_inferred_dtype  integer
      2  Get Log Rows Count          0.1       log_rows_count  0.970813
      3  Is Response Discrete Check  0.1       is_response_discrete  1
      4  Decision Tree               0.743952  score                       0.668952
                                               decision_tree_score_reward  0.668952


### analcatdata_boxing1


Analysis executed at 2023-07-25 19:00:59.762954
Final status is Complete
Initial State known features:
response_variable_label  target
Steps:
  Order  Step                             Reward  State Change
-------  -----------------------------  --------  --------------------------------------
      1  Infer Response DType               0.1   response_inferred_dtype  integer
      2  Get Log Rows Count                 0.1   log_rows_count  0.401697
      3  Is Response Balanced Check         0.1   is_response_balanced  0.5
      4  Is Response Discrete Check         0.1   is_response_discrete  1
      5  Is Response Dichotomous Check      0.1   is_response_dichotomous  1
      6  Logistic Regression                0.75  score                             0.65
                                                  logistic_regression_score_reward  0.65


### monk2


Analysis executed at 2023-07-25 19:00:59.922892
Final status is Complete
Initial State known features:
response_variable_label  target
Steps:
  Order  Step                                      Reward  State Change
-------  --------------------------------------  --------  ------------------------------------------
      1  Infer Response DType                    0.1       response_inferred_dtype  integer
      2  Get Log Rows Count                      0.1       log_rows_count  0.536877
      3  Is Response Discrete Check              0.1       is_response_discrete  1
      4  Is Response Positive Values Only Check  0.1       is_response_positive_values_only  1
      5  Is Response Dichotomous Check           0.1       is_response_dichotomous  1
      6  Logistic Regression                     0.757238  score                             0.657238
                                                           logistic_regression_score_reward  0.657238


### car_evaluation


Analysis executed at 2023-07-25 19:01:01.805494
Final status is Complete
Initial State known features:
response_variable_label  target
Steps:
  Order  Step                                      Reward  State Change
-------  --------------------------------------  --------  ------------------------------------
      1  Infer Response DType                         0.1  response_inferred_dtype  integer
      2  Get Log Rows Count                           0.1  log_rows_count  0.625491
      3  Is Response Discrete Check                   0.1  is_response_discrete  1
      4  Is Response Positive Values Only Check       0.1  is_response_positive_values_only  1
      5  Is Response Dichotomous Check                0.1  is_response_dichotomous  -1
      6  Poisson Regression                          -0.9  poisson_regression_score_reward  -1
      7  Get Log Columns Count                        0.1  log_columns_count  0.447474
      8  Get Standarized Variables Ratio              0.1  standar

### house_votes_84


Analysis executed at 2023-07-25 19:01:04.887150
Final status is Complete
Initial State known features:
response_variable_label  target
Steps:
  Order  Step                             Reward  State Change
-------  -----------------------------  --------  -----------------------------------------
      1  Infer Response DType                0.1  response_inferred_dtype  integer
      2  Get Log Rows Count                  0.1  log_rows_count  0.509754
      3  Is Response Balanced Check          0.1  is_response_balanced  0.5
      4  Is Response Discrete Check          0.1  is_response_discrete  1
      5  Is Response Dichotomous Check       0.1  is_response_dichotomous  1
      6  Logistic Regression                 1    score                             0.96092
                                                  logistic_regression_score_reward  0.96092
