# PMLB Classification Datasets

## Loading classification datasets

First, load a trained agent and get the list of PMLB classification dataset names. Although more than a hundred datasets are available, we sample 10% of the list to demonstrate the agent's capabilities.

In [1]:
import warnings
warnings.filterwarnings('ignore')

In [2]:
import random
import pmlb
from IPython.display import Markdown
from ostatslib.agents import PPOAgent

SAMPLE_FRACTION = 0.1
sample_size = int(len(pmlb.classification_dataset_names) * SAMPLE_FRACTION)
sampled_dataset_names = random.sample(pmlb.classification_dataset_names, sample_size)

AGENT_FILE = '../trained_ppo_model.zip'
agent = PPOAgent(AGENT_FILE)

Markdown(f'Sampled {sample_size} classification datasets: {", ".join(sampled_dataset_names)}.')

Sampled 16 classification datasets: xd6, car, adult, soybean, ring, labor, hayes_roth, spectf, led7, analcatdata_happiness, german, breast_cancer, dermatology, hepatitis, heart_c, GAMETES_Epistasis_2_Way_20atts_0.1H_EDM_1_1.
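
Because `random.sample` draws from the global random state, the sampled list changes on every run. The sketch below shows one way to make the sample reproducible with a seeded generator. The name list is a hypothetical stand-in for `pmlb.classification_dataset_names`, and `SEED` and `sample_names` are illustrative names that do not appear in the original notebook.

```python
import random

# Hypothetical stand-in for pmlb.classification_dataset_names.
dataset_names = [f"dataset_{i}" for i in range(160)]

SAMPLE_FRACTION = 0.1
SEED = 42  # arbitrary choice; any fixed integer gives a repeatable sample


def sample_names(names, fraction, seed):
    """Return a reproducible random sample covering `fraction` of `names`."""
    rng = random.Random(seed)  # private generator; the global state is untouched
    sample_size = int(len(names) * fraction)
    return rng.sample(names, sample_size)


first = sample_names(dataset_names, SAMPLE_FRACTION, SEED)
second = sample_names(dataset_names, SAMPLE_FRACTION, SEED)
print(len(first), first == second)
```

Using a dedicated `random.Random` instance rather than `random.seed` keeps the seeding local, so other code that relies on the global generator is unaffected.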

## Analyses

The next step is to fetch the data and analyze each sampled dataset. PMLB provides a function, `fetch_data`, that downloads a dataset from its repository. The initial state must also specify which variable is the target.

In [7]:
%%capture
from ostatslib.states import State

results = []

for name in sampled_dataset_names:
    # Download the dataset, or reuse the copy in the local cache
    data = pmlb.fetch_data(name, local_cache_dir='.pmlb_cache/')
    # Tell the agent which column holds the response variable
    initial_state = State()
    initial_state.set('response_variable_label', 'target')
    analysis = agent.analyze(data, initial_state)
    results.append({"name": name, "analysis": analysis})

## Results

In [4]:
from IPython.display import display

for result in results:
    display(Markdown(f"### {result['name']}"))
    print(result['analysis'].summary())


### xd6


Analysis executed at 2023-10-15 18:22:26.205159
Final status is Not Complete
Initial State known features:
response_variable_label           target
time_convertible_variable
response_unique_values_ratio      0.0020554984583761563
response_inferred_dtype           integer
is_response_discrete              1
is_response_positive_values_only  1
Steps:
  Order  Step                                Reward  State Change
-------  --------------------------------  --------  ---------------------------------------
      1  Is Response Positive Values Only       0.1
      2  Time Convertible Variable Search       0.1  time_convertible_variable
      3  Infer Response DType                   0.1  response_inferred_dtype  integer
      4  Is Response Discrete                   0.1  is_response_discrete  1
      5  Response Unique Values Ratio           0.1  response_unique_values_ratio  0.0020555
      6  Response Unique Values Ratio          -1
      7  Response Unique Values Ratio          -1
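
The `response_unique_values_ratio` reported for xd6 above matches the number of distinct target values divided by the number of rows (2 / 973 ≈ 0.0020555, assuming the dataset has 973 rows and two classes). The sketch below illustrates that reading. It is an assumption about how the feature is derived, not the library's actual implementation, and `unique_values_ratio` and the example target are hypothetical.

```python
def unique_values_ratio(values):
    """Ratio of distinct values to total values (assumed definition)."""
    values = list(values)
    if not values:
        raise ValueError("values must not be empty")
    return len(set(values)) / len(values)


# Hypothetical binary target with 973 rows; the 500/473 class split is made up
# and does not affect the ratio, which depends only on the counts.
target = [0] * 500 + [1] * 473
ratio = unique_values_ratio(target)
print(f"{ratio:.7f}")
```

A low ratio like this one suggests a response with few repeated categories, which fits the `is_response_discrete  1` entry in the same summary.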
  

### car


Analysis executed at 2023-10-15 18:22:26.278753
Final status is Not Complete
Initial State known features:
response_variable_label           target
time_convertible_variable
response_unique_values_ratio      0.0023148148148148147
response_inferred_dtype           integer
is_response_discrete              1
is_response_positive_values_only  1
Steps:
  Order  Step                                Reward  State Change
-------  --------------------------------  --------  ----------------------------------------
      1  Is Response Positive Values Only       0.1
      2  Time Convertible Variable Search       0.1  time_convertible_variable
      3  Infer Response DType                   0.1  response_inferred_dtype  integer
      4  Is Response Discrete                   0.1  is_response_discrete  1
      5  Response Unique Values Ratio           0.1  response_unique_values_ratio  0.00231481
      6  Response Unique Values Ratio          -1
      7  Response Unique Values Ratio          -1


### adult


Analysis executed at 2023-10-15 18:22:26.405806
Final status is Not Complete
Initial State known features:
response_variable_label           target
time_convertible_variable
response_unique_values_ratio      4.0948364112853693e-05
response_inferred_dtype           integer
is_response_discrete              1
is_response_positive_values_only  1
Steps:
  Order  Step                                Reward  State Change
-------  --------------------------------  --------  -----------------------------------------
      1  Is Response Positive Values Only       0.1
      2  Time Convertible Variable Search       0.1  time_convertible_variable
      3  Infer Response DType                   0.1  response_inferred_dtype  integer
      4  Is Response Discrete                   0.1  is_response_discrete  1
      5  Response Unique Values Ratio           0.1  response_unique_values_ratio  4.09484e-05
      6  Response Unique Values Ratio          -1
      7  Response Unique Values Ratio          

### soybean


Analysis executed at 2023-10-15 18:22:26.459962
Final status is Not Complete
Initial State known features:
response_variable_label           target
time_convertible_variable
response_unique_values_ratio      0.02666666666666667
response_inferred_dtype           integer
is_response_discrete              1
is_response_positive_values_only  1
Steps:
  Order  Step                                Reward  State Change
-------  --------------------------------  --------  ---------------------------------------
      1  Is Response Positive Values Only       0.1
      2  Time Convertible Variable Search       0.1  time_convertible_variable
      3  Infer Response DType                   0.1  response_inferred_dtype  integer
      4  Is Response Discrete                   0.1  is_response_discrete  1
      5  Response Unique Values Ratio           0.1  response_unique_values_ratio  0.0266667
      6  Response Unique Values Ratio          -1
      7  Response Unique Values Ratio          -1
    

### ring


Analysis executed at 2023-10-15 18:22:26.528154
Final status is Not Complete
Initial State known features:
response_variable_label           target
time_convertible_variable
response_unique_values_ratio      0.0002702702702702703
response_inferred_dtype           integer
is_response_discrete              1
is_response_positive_values_only  1
Steps:
  Order  Step                                Reward  State Change
-------  --------------------------------  --------  ----------------------------------------
      1  Is Response Positive Values Only       0.1
      2  Time Convertible Variable Search       0.1  time_convertible_variable
      3  Infer Response DType                   0.1  response_inferred_dtype  integer
      4  Is Response Discrete                   0.1  is_response_discrete  1
      5  Response Unique Values Ratio           0.1  response_unique_values_ratio  0.00027027
      6  Response Unique Values Ratio          -1
      7  Response Unique Values Ratio          -1


### labor


Analysis executed at 2023-10-15 18:22:26.604123
Final status is Not Complete
Initial State known features:
response_variable_label           target
time_convertible_variable
response_unique_values_ratio      0.03508771929824561
response_inferred_dtype           integer
is_response_discrete              1
is_response_positive_values_only  1
Steps:
  Order  Step                                Reward  State Change
-------  --------------------------------  --------  ---------------------------------------
      1  Is Response Positive Values Only       0.1
      2  Time Convertible Variable Search       0.1  time_convertible_variable
      3  Infer Response DType                   0.1  response_inferred_dtype  integer
      4  Is Response Discrete                   0.1  is_response_discrete  1
      5  Response Unique Values Ratio           0.1  response_unique_values_ratio  0.0350877
      6  Response Unique Values Ratio          -1
      7  Response Unique Values Ratio          -1
    

### hayes_roth


Analysis executed at 2023-10-15 18:22:26.656492
Final status is Not Complete
Initial State known features:
response_variable_label           target
time_convertible_variable
response_unique_values_ratio      0.01875
response_inferred_dtype           integer
is_response_discrete              1
is_response_positive_values_only  1
Steps:
  Order  Step                                Reward  State Change
-------  --------------------------------  --------  -------------------------------------
      1  Is Response Positive Values Only       0.1
      2  Time Convertible Variable Search       0.1  time_convertible_variable
      3  Infer Response DType                   0.1  response_inferred_dtype  integer
      4  Is Response Discrete                   0.1  is_response_discrete  1
      5  Response Unique Values Ratio           0.1  response_unique_values_ratio  0.01875
      6  Response Unique Values Ratio          -1
      7  Response Unique Values Ratio          -1
      8  Response Un

### spectf


Analysis executed at 2023-10-15 18:22:26.702374
Final status is Not Complete
Initial State known features:
response_variable_label           target
time_convertible_variable
response_unique_values_ratio      0.0057306590257879654
response_inferred_dtype           integer
is_response_discrete              1
is_response_positive_values_only  1
Steps:
  Order  Step                                Reward  State Change
-------  --------------------------------  --------  ----------------------------------------
      1  Is Response Positive Values Only       0.1
      2  Time Convertible Variable Search       0.1  time_convertible_variable
      3  Infer Response DType                   0.1  response_inferred_dtype  integer
      4  Is Response Discrete                   0.1  is_response_discrete  1
      5  Response Unique Values Ratio           0.1  response_unique_values_ratio  0.00573066
      6  Response Unique Values Ratio          -1
      7  Response Unique Values Ratio          -1


### led7


Analysis executed at 2023-10-15 18:22:26.806781
Final status is Not Complete
Initial State known features:
response_variable_label           target
time_convertible_variable
response_unique_values_ratio      0.003125
response_inferred_dtype           integer
is_response_discrete              1
is_response_positive_values_only  1
Steps:
  Order  Step                                Reward  State Change
-------  --------------------------------  --------  --------------------------------------
      1  Is Response Positive Values Only       0.1
      2  Time Convertible Variable Search       0.1  time_convertible_variable
      3  Infer Response DType                   0.1  response_inferred_dtype  integer
      4  Is Response Discrete                   0.1  is_response_discrete  1
      5  Response Unique Values Ratio           0.1  response_unique_values_ratio  0.003125
      6  Response Unique Values Ratio          -1
      7  Response Unique Values Ratio          -1
      8  Response

### analcatdata_happiness


Analysis executed at 2023-10-15 18:22:26.980224
Final status is Not Complete
Initial State known features:
response_variable_label           target
time_convertible_variable
response_unique_values_ratio      0.05
response_inferred_dtype           integer
is_response_discrete              1
is_response_positive_values_only  1
Steps:
  Order  Step                                Reward  State Change
-------  --------------------------------  --------  ----------------------------------
      1  Is Response Positive Values Only       0.1
      2  Time Convertible Variable Search       0.1  time_convertible_variable
      3  Infer Response DType                   0.1  response_inferred_dtype  integer
      4  Is Response Discrete                   0.1  is_response_discrete  1
      5  Response Unique Values Ratio           0.1  response_unique_values_ratio  0.05
      6  Response Unique Values Ratio          -1
      7  Response Unique Values Ratio          -1
      8  Response Unique Valu

### german


Analysis executed at 2023-10-15 18:22:27.030585
Final status is Not Complete
Initial State known features:
response_variable_label           target
time_convertible_variable
response_unique_values_ratio      0.002
response_inferred_dtype           integer
is_response_discrete              1
is_response_positive_values_only  1
Steps:
  Order  Step                                Reward  State Change
-------  --------------------------------  --------  -----------------------------------
      1  Is Response Positive Values Only       0.1
      2  Time Convertible Variable Search       0.1  time_convertible_variable
      3  Infer Response DType                   0.1  response_inferred_dtype  integer
      4  Is Response Discrete                   0.1  is_response_discrete  1
      5  Response Unique Values Ratio           0.1  response_unique_values_ratio  0.002
      6  Response Unique Values Ratio          -1
      7  Response Unique Values Ratio          -1
      8  Response Unique V

### breast_cancer


Analysis executed at 2023-10-15 18:22:27.083031
Final status is Not Complete
Initial State known features:
response_variable_label           target
time_convertible_variable
response_unique_values_ratio      0.006993006993006993
response_inferred_dtype           integer
is_response_discrete              1
is_response_positive_values_only  1
Steps:
  Order  Step                                Reward  State Change
-------  --------------------------------  --------  ----------------------------------------
      1  Is Response Positive Values Only       0.1
      2  Time Convertible Variable Search       0.1  time_convertible_variable
      3  Infer Response DType                   0.1  response_inferred_dtype  integer
      4  Is Response Discrete                   0.1  is_response_discrete  1
      5  Response Unique Values Ratio           0.1  response_unique_values_ratio  0.00699301
      6  Response Unique Values Ratio          -1
      7  Response Unique Values Ratio          -1
 

### dermatology


Analysis executed at 2023-10-15 18:22:27.132819
Final status is Not Complete
Initial State known features:
response_variable_label           target
time_convertible_variable
response_unique_values_ratio      0.01639344262295082
response_inferred_dtype           integer
is_response_discrete              1
is_response_positive_values_only  1
Steps:
  Order  Step                                Reward  State Change
-------  --------------------------------  --------  ---------------------------------------
      1  Is Response Positive Values Only       0.1
      2  Time Convertible Variable Search       0.1  time_convertible_variable
      3  Infer Response DType                   0.1  response_inferred_dtype  integer
      4  Is Response Discrete                   0.1  is_response_discrete  1
      5  Response Unique Values Ratio           0.1  response_unique_values_ratio  0.0163934
      6  Response Unique Values Ratio          -1
      7  Response Unique Values Ratio          -1
    

### hepatitis


Analysis executed at 2023-10-15 18:22:29.101959
Final status is Not Complete
Initial State known features:
response_variable_label           target
time_convertible_variable
response_unique_values_ratio      0.012903225806451613
response_inferred_dtype           integer
is_response_discrete              1
is_response_positive_values_only  1
Steps:
  Order  Step                                Reward  State Change
-------  --------------------------------  --------  ---------------------------------------
      1  Is Response Positive Values Only       0.1
      2  Time Convertible Variable Search       0.1  time_convertible_variable
      3  Infer Response DType                   0.1  response_inferred_dtype  integer
      4  Is Response Discrete                   0.1  is_response_discrete  1
      5  Response Unique Values Ratio           0.1  response_unique_values_ratio  0.0129032
      6  Response Unique Values Ratio          -1
      7  Response Unique Values Ratio          -1
   

### heart_c


Analysis executed at 2023-10-15 18:22:29.211604
Final status is Not Complete
Initial State known features:
response_variable_label           target
time_convertible_variable
response_unique_values_ratio      0.006600660066006601
response_inferred_dtype           integer
is_response_discrete              1
is_response_positive_values_only  1
Steps:
  Order  Step                                Reward  State Change
-------  --------------------------------  --------  ----------------------------------------
      1  Is Response Positive Values Only       0.1
      2  Time Convertible Variable Search       0.1  time_convertible_variable
      3  Infer Response DType                   0.1  response_inferred_dtype  integer
      4  Is Response Discrete                   0.1  is_response_discrete  1
      5  Response Unique Values Ratio           0.1  response_unique_values_ratio  0.00660066
      6  Response Unique Values Ratio          -1
      7  Response Unique Values Ratio          -1
 

### GAMETES_Epistasis_2_Way_20atts_0.1H_EDM_1_1


Analysis executed at 2023-10-15 18:22:29.322284
Final status is Not Complete
Initial State known features:
response_variable_label           target
time_convertible_variable
response_unique_values_ratio      0.00125
response_inferred_dtype           integer
is_response_discrete              1
is_response_positive_values_only  1
Steps:
  Order  Step                                Reward  State Change
-------  --------------------------------  --------  -------------------------------------
      1  Is Response Positive Values Only       0.1
      2  Time Convertible Variable Search       0.1  time_convertible_variable
      3  Infer Response DType                   0.1  response_inferred_dtype  integer
      4  Is Response Discrete                   0.1  is_response_discrete  1
      5  Response Unique Values Ratio           0.1  response_unique_values_ratio  0.00125
      6  Response Unique Values Ratio          -1
      7  Response Unique Values Ratio          -1
      8  Response Un