**Goals**

1. Set up a pipeline to incorporate the imputation
2. Do a random forest regressor to identify important features
3. Do a test run with one model (linear, most likely) that computes:
    - MSE for predicting PCIAT-Total
    - MSE for predicting sii when computed from predicted PCIAT-Total
    - MSE for predicting sii directly
    - kappa for predicting sii when computed from predicted PCIAT-Total
    - kappa for predicting sii directly
4. After getting the model working, measure these things for out-of-the-box:
    - multiple linear regression
    - knn regression
    - random forest
    - support vector
    - gradient boost
    - adaboost
    - xgboost
5. After identifying a promising out-of-the-box model, try tuning it
6. Try implementing a sequential predictor (either logistic regression or random forest) that:
    - Starts by predicting 3's vs. non-threes
    - Predicts 2's vs. non-twos from the remaining cases
    - etc.
7. Try using different models for doing this sequential prediction

In [2]:
import pandas as pd
import numpy as np

from CustomImputers import *

**Loading the Data**

For the purpose of developing our model(s), we'll work with data that include the imputed outcome (PCIAT_Total and/or sii) scores AND have cleaned predictors.

In the final version of our code, we'll work with data with cleaned predictors but won't have any access to the outcome scores.

In [3]:
#Load the cleaned & outcome-imputed data
train_cleaned=pd.read_csv('train_cleaned_outcome_imputed.csv')

In [4]:
#Create an initial list of predictor and outcome columns

predictors = train_cleaned.columns.tolist()
if 'id' in predictors:
    predictors.remove('id')
if 'sii' in predictors:
    predictors.remove('sii')
predictors = [x for x in predictors if 'PCIAT' not in x]
predictors = [x for x in predictors if 'Season' not in x]

**Constructing a Random Forest for Feature Identification**

In [37]:
from sklearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestRegressor
from sklearn.preprocessing import FunctionTransformer


pipe_mice = Pipeline([('mice_impute', Custom_MICE_Imputer()),
                    ('add_zones', FunctionTransformer(zone_encoder)),
                    ('rf', RandomForestRegressor(n_estimators = 300, max_features = 'sqrt', max_depth = 5, random_state = 216))])

pipe_mice.fit(train_cleaned[predictors],train_cleaned['PCIAT-PCIAT_Total'])

train_pred_mice = pipe_mice.predict(train_cleaned[predictors])

#Get feature importance from the rf inside pipe
score_mice_df = pd.DataFrame({'feature':train_cleaned[predictors].columns,
                            'importance_score': pipe_mice.named_steps['rf'].feature_importances_})

score_mice_df.sort_values('importance_score',ascending=False)


Unnamed: 0,feature,importance_score
0,Basic_Demos-Age,0.145
4,Physical-Height,0.133568
24,PreInt_EduHx-computerinternet_hoursday,0.090154
18,BIA-BIA_FFM,0.078891
23,SDS-SDS_Total_Raw,0.07749
26,ENMO_Avg_Active_Days_MVPA110,0.076964
5,Physical-Weight,0.074631
11,FGC-FGC_CU,0.037543
19,BIA-BIA_FFMI,0.035215
21,BIA-BIA_Fat,0.032714


In [5]:
keyfeatures = ['Basic_Demos-Age',
 'Physical-Height',
 'PreInt_EduHx-computerinternet_hoursday',
 'BIA-BIA_FFM',
 'SDS-SDS_Total_Raw',
 'Physical-Weight',
 'ENMO_Avg_Active_Days_MVPA110',
 'FGC-FGC_CU']

**Trying a Linear Model**

In this section, I'll fit a simple linear model with a single predictor (hours spent on the internet).

Note: Column selector documented here: https://stackoverflow.com/questions/62416223/how-to-select-only-few-columns-in-scikit-learn-column-selector-pipeline
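
As a minimal sketch of the column-selector idea, the toy example below uses a `ColumnTransformer` with `'passthrough'` to keep one named column and drop the rest before a linear regression. The data frame and its column names (`hours`, `noise`, `target`) are made up for illustration. Selecting by name assumes the previous pipeline step hands over a pandas DataFrame, which is an assumption about our custom imputer and `zone_encoder` rather than something verified here.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline

# Toy frame; the column names are illustrative, not from the real data set
toy = pd.DataFrame({
    "hours": [0.0, 1.0, 2.0, 3.0],
    "noise": [5.0, -2.0, 7.0, 1.0],
    "target": [1.0, 3.0, 5.0, 7.0],
})

# 'passthrough' keeps the listed columns untouched; remainder='drop' discards the rest
selector = ColumnTransformer([("keep", "passthrough", ["hours"])], remainder="drop")
toy_pipe = Pipeline([("selector", selector), ("linear", LinearRegression())])
toy_pipe.fit(toy[["hours", "noise"]], toy["target"])

# Only the 'hours' column survives the selector
selected = selector.fit_transform(toy[["hours", "noise"]])
slope = toy_pipe.named_steps["linear"].coef_[0]
```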

Note: custom loss functions for linear models are documented here: https://alexmiller.phd/posts/linear-model-custom-loss-function-regularization-python/

In [7]:
# First I'll see if I can get a pipe set up to do prediction on a split
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import FunctionTransformer


train_tt, train_ho = train_test_split(train_cleaned, test_size=0.2)

slr = Pipeline([('mice_impute', Custom_MICE_Imputer()),
                ('add_zones', FunctionTransformer(zone_encoder)),
                ('selector', ColumnTransformer([('selector', 'passthrough', ['PreInt_EduHx-computerinternet_hoursday'])], remainder="drop")),
                ('linear', LinearRegression())])

slr.fit(train_tt[predictors], train_tt['PCIAT-PCIAT_Total'])
# Predict on the same predictor columns that were used for fitting
mean_squared_error(train_ho['PCIAT-PCIAT_Total'], slr.predict(train_ho[predictors]))

np.float64(347.8401450953909)

**An Ordinal (Sequential Binary) Classifier**

It looks like our attempts so far have under-predicted sii values of 2 and 3. 

We'll create a class that first predicts whether or not the sii value is 0, then continues upward...
This isn't quite the same as creating four separate binary predictors for 0, 1, 2, and 3 outcomes. We'll need to think about the code more to really know what it's doing.

I came up with this idea myself, but I wasn't the first one to do it. It is described on Medium, https://towardsdatascience.com/simple-trick-to-train-an-ordinal-regression-with-any-classifier-6911183d2a3c, drawing on an article by Frank and Hall.

Also described on stackoverflow: https://stackoverflow.com/questions/57561189/multi-class-multi-label-ordinal-classification-with-sklearn

Some discussion of the proposed code that highlights some of its issues is on stackoverflow: https://stackoverflow.com/questions/66486947/how-to-use-ordinal-classifier
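
To make the arithmetic concrete, here is a small sketch of how the binary "greater than" probabilities turn into per-class probabilities. The function name `cumulative_to_class_probs` and the toy numbers are ours, for illustration only. Because the binary classifiers are fit independently, nothing forces Pr(y > 1) to be smaller than Pr(y > 0), so some of the differences can come out negative, as the second row shows.

```python
import numpy as np

def cumulative_to_class_probs(p_greater):
    """Convert columns Pr(y > c_0), ..., Pr(y > c_{k-2}) into k class probabilities."""
    p_greater = np.asarray(p_greater, dtype=float)
    n = p_greater.shape[0]
    # Pad with 1 on the left (every y exceeds "below the lowest class")
    # and 0 on the right (no y exceeds the highest class)
    padded = np.hstack([np.ones((n, 1)), p_greater, np.zeros((n, 1))])
    # Adjacent differences give Pr(y = c_i) = Pr(y > c_{i-1}) - Pr(y > c_i)
    return padded[:, :-1] - padded[:, 1:]

# Two toy observations with Pr(sii > 0), Pr(sii > 1), Pr(sii > 2)
p_greater = np.array([[0.9, 0.4, 0.1],
                      [0.2, 0.3, 0.05]])
class_probs = cumulative_to_class_probs(p_greater)
predicted = class_probs.argmax(axis=1)
```

The first row gives class probabilities of roughly 0.1, 0.5, 0.3, 0.1, so class 1 is predicted. The second row contains a negative entry for class 1, which is one of the issues raised in the discussion linked above.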

In [8]:
from sklearn.base import clone
from sklearn.metrics import accuracy_score

class OrdinalClassifier():

    def __init__(self, clf):
        self.clf = clf
        self.clfs = {}

    def fit(self, X, y):
        self.unique_class = np.sort(np.unique(y))
        if self.unique_class.shape[0] > 2:
            for i in range(self.unique_class.shape[0] - 1):
                # for each k - 1 ordinal value we fit a binary classification problem
                binary_y = (y > self.unique_class[i]).astype(np.uint8)
                clf = clone(self.clf)
                clf.fit(X, binary_y)
                #print('binary_y has been fit for', self.unique_class[i])
                self.clfs[i] = clf
        # Return self, following the usual scikit-learn estimator convention
        return self

    def predict_proba(self, X):
        clfs_predict = {k: v.predict_proba(X) for k, v in self.clfs.items()}
        predicted = []
        for i, y in enumerate(self.unique_class):
            #print('encoding for i=', i)
            if i == 0:
                # V1 = 1 - Pr(y > V1)
                predicted.append(1 - clfs_predict[i][:, 1])
            elif i in clfs_predict:  # compare the index, not the class label, against the keys
                # Vi = Pr(y > Vi-1) - Pr(y > Vi)
                predicted.append(clfs_predict[i - 1][:, 1] - clfs_predict[i][:, 1])
            else:
                # Vk = Pr(y > Vk-1)
                predicted.append(clfs_predict[i - 1][:, 1])
        return np.vstack(predicted).T

    def predict(self, X):
        return self.unique_class[np.argmax(self.predict_proba(X), axis=1)]
    
    def score(self, X, y, sample_weight=None):
        return accuracy_score(y, self.predict(X), sample_weight=sample_weight)

**Comparing Models for Out-of-the-Box Performance**

**Part 1: Setting Up Models**

In the sections below, we'll set up a collection of un-tuned models and use them to predict sii scores. 

This first section instantiates the various models inside a dictionary that we'll use in the loop.

In [9]:
from sklearn.preprocessing import StandardScaler, FunctionTransformer
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.neighbors import KNeighborsRegressor, KNeighborsClassifier
from sklearn.svm import SVR, SVC
from sklearn.ensemble import RandomForestRegressor, AdaBoostRegressor, GradientBoostingRegressor, RandomForestClassifier, AdaBoostClassifier, GradientBoostingClassifier
from xgboost import XGBRegressor, XGBClassifier

# Create classifiers to use as inputs to ordinal classifiers

# Note that the logistic regression is failing to converge.
# This can be addressed - see https://stackoverflow.com/questions/62658215/convergencewarning-lbfgs-failed-to-converge-status-1-stop-total-no-of-iter
# Here we're increasing max_iter from its default and also adding a standard scaler into its pipeline
# We could also adjust the solver, as described here: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
logisticc = LogisticRegression(max_iter=1000)

knnc=KNeighborsClassifier(10)
svc = SVC()
rfc = RandomForestClassifier()
# Note that the default for adaboost is the SAMME.R algorithm, but this will be deprecated in future releases. Switching to SAMME
adac = AdaBoostClassifier(algorithm='SAMME')
gradc = GradientBoostingClassifier()
xgbc = XGBClassifier()


# List the various models we'll try to identify their "out of the box" performance
models = {
'slr_pipe' : Pipeline([('mice_impute', Custom_MICE_Imputer()),
                ('add_zones', FunctionTransformer(zone_encoder)),
                ('selector', ColumnTransformer([('selector', 'passthrough', ['PreInt_EduHx-computerinternet_hoursday'])], remainder="drop")),
                ('linear', LinearRegression())]),

'mlr_key_pipe' : Pipeline([('mice_impute', Custom_MICE_Imputer()),
                ('add_zones', FunctionTransformer(zone_encoder)),
                ('selector', ColumnTransformer([('selector', 'passthrough', keyfeatures)], remainder="drop")),
                ('linear', LinearRegression())]),

'mlr_all_pipe' : Pipeline([('mice_impute', Custom_MICE_Imputer()),
                ('add_zones', FunctionTransformer(zone_encoder)),
                ('linear', LinearRegression())]),

'knn_pipe' : Pipeline([('mice_impute', Custom_MICE_Imputer()),
                ('add_zones', FunctionTransformer(zone_encoder)),
                ('knn', KNeighborsRegressor(10))]),

'svr_pipe' : Pipeline([('mice_impute', Custom_MICE_Imputer()),
                ('add_zones', FunctionTransformer(zone_encoder)),
                ('svr', SVR())]),

'rf_pipe' : Pipeline([('mice_impute', Custom_MICE_Imputer()),
                ('add_zones', FunctionTransformer(zone_encoder)),
                ('rf', RandomForestRegressor())]),

'ada_pipe' : Pipeline([('mice_impute', Custom_MICE_Imputer()),
                ('add_zones', FunctionTransformer(zone_encoder)),
                ('ada', AdaBoostRegressor())]),

'grad_pipe' : Pipeline([('mice_impute', Custom_MICE_Imputer()),
                ('add_zones', FunctionTransformer(zone_encoder)),
                ('grad', GradientBoostingRegressor())]),

'xgb_pipe' : Pipeline([('mice_impute', Custom_MICE_Imputer()),
                ('add_zones', FunctionTransformer(zone_encoder)),
                ('xgb', XGBRegressor())]),

'ordinal_logistic_pipe' : Pipeline([('mice_impute', Custom_MICE_Imputer()),
                ('add_zones', FunctionTransformer(zone_encoder)),
                ('scale', StandardScaler()),
                ('logistic_oc', OrdinalClassifier(logisticc))]),

'ordinal_knn_pipe' : Pipeline([('mice_impute', Custom_MICE_Imputer()),
                ('add_zones', FunctionTransformer(zone_encoder)),
                ('knn_ordinal', OrdinalClassifier(knnc))]),

# Note that SVC only provides predict_proba when it is constructed with probability=True. This can be enabled later; removing for now to facilitate completion...
#'ordinal_svc_pipe' : Pipeline([('mice_impute', Custom_MICE_Imputer()),
#                ('add_zones', FunctionTransformer(zone_encoder)),
#                ('svc_ordinal', OrdinalClassifier(svc))]),

'ordinal_rf_pipe' : Pipeline([('mice_impute', Custom_MICE_Imputer()),
                ('add_zones', FunctionTransformer(zone_encoder)),
                ('rf_ordinal', OrdinalClassifier(rfc))]),

'ordinal_ada_pipe' : Pipeline([('mice_impute', Custom_MICE_Imputer()),
                ('add_zones', FunctionTransformer(zone_encoder)),
                ('ada_ordinal', OrdinalClassifier(adac))]),

'ordinal_grad_pipe' : Pipeline([('mice_impute', Custom_MICE_Imputer()),
                ('add_zones', FunctionTransformer(zone_encoder)),
                ('grad_ordinal', OrdinalClassifier(gradc))]),

'ordinal_xgb_pipe' : Pipeline([('mice_impute', Custom_MICE_Imputer()),
                ('add_zones', FunctionTransformer(zone_encoder)),
                ('xgb_ordinal', OrdinalClassifier(xgbc))]),
}



**Comparing Models for Out-of-the-Box Performance**

**Part 2: Running the Models**

The second section runs the models through a 5-fold split to compute kappa values.
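
For reference, here is a minimal NumPy sketch of the quadratic weighted kappa we use to compare models. It assumes the labels are the integers 0 through `n_classes - 1`; the function name is ours. In the notebook itself we rely on `cohen_kappa_score(..., weights='quadratic')` from scikit-learn rather than this sketch.

```python
import numpy as np

def quadratic_weighted_kappa(y_true, y_pred, n_classes=4):
    """Quadratic weighted kappa for integer labels in 0..n_classes-1."""
    y_true = np.asarray(y_true, dtype=int)
    y_pred = np.asarray(y_pred, dtype=int)
    # Observed confusion matrix: rows are true labels, columns are predictions
    observed = np.zeros((n_classes, n_classes))
    for t, p in zip(y_true, y_pred):
        observed[t, p] += 1
    # Expected counts if true and predicted labels were independent
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / len(y_true)
    # Disagreement weights grow with the squared distance between classes
    idx = np.arange(n_classes)
    weights = (idx[:, None] - idx[None, :]) ** 2 / (n_classes - 1) ** 2
    return 1.0 - (weights * observed).sum() / (weights * expected).sum()

kappa_perfect = quadratic_weighted_kappa([0, 1, 2, 3], [0, 1, 2, 3])
kappa_one_miss = quadratic_weighted_kappa([0, 1, 2, 3], [0, 1, 2, 2])
```

Predicting a 2 when the truth is 3 costs much less than predicting a 0, which is why this metric suits an ordinal outcome like sii.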

We can predict sii scores in two ways:
1. Predict the PCIAT_Total score and then compute sii values
2. Predict the sii score directly

We can also modify some of these computations by adjusting the bins for computing sii values and by adjusting the computed sii scores manually.
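
The first approach can be sketched as a small helper that bins predicted PCIAT totals into sii classes. This is an illustration under our own assumptions, not the exact code used in the loop: the function name `pciat_to_sii` is ours, and it treats each cutpoint as upper-inclusive (a total of exactly 30 maps to sii 0). The loop in the next cell instead uses `np.digitize(pred, bins)-1`, which handles values on a cutpoint differently and can return -1 for negative predictions or 4 for predictions of 100 or more.

```python
import numpy as np

def pciat_to_sii(pciat_pred, cutpoints=(30, 49, 79)):
    """Map predicted PCIAT totals to sii classes 0-3 using upper-inclusive cutpoints."""
    pciat_pred = np.asarray(pciat_pred, dtype=float)
    # right=True means a value equal to a cutpoint stays in the lower class;
    # anything at or below the first cutpoint is 0, anything above the last is 3
    return np.digitize(pciat_pred, cutpoints, right=True)

preds = [-5, 0, 30, 30.5, 49, 50, 79, 80, 120]
sii_standard = pciat_to_sii(preds)
# Lowering the cutpoints pushes borderline predictions into higher classes
sii_modified = pciat_to_sii([28, 44, 72], cutpoints=(27, 43, 71))
```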

One issue here is the extremely small number of sii=3 values in our data set. These are precisely the values we're most interested in predicting, since they most strongly indicate problematic internet use.

A suggestion from stackoverflow is to over-sample the sii=3 instances; we've implemented this over-sampling inside the k-fold split to try to avoid data leakage:
https://stackoverflow.com/questions/39512140/how-to-deal-with-this-unbalanced-class-skewed-data-set
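
As a sketch of the idea, the helper below duplicates the rows of one class in the training part of a split only, leaving the held-out rows untouched so that copies of the same observation never appear on both sides. The helper name `oversample_class`, the toy data, and the hand-picked split indices are illustrative assumptions.

```python
import numpy as np
import pandas as pd

def oversample_class(df, label_col, value, extra_copies):
    """Return df plus `extra_copies` additional copies of rows where label_col == value."""
    minority = df[df[label_col] == value]
    return pd.concat([df] + [minority] * extra_copies, ignore_index=True)

toy = pd.DataFrame({"x": np.arange(10), "sii": [0, 0, 0, 0, 1, 1, 1, 2, 2, 3]})
train_part = toy.iloc[[0, 1, 2, 4, 5, 7, 9]]   # rows a fold might assign to training
held_out = toy.iloc[[3, 6, 8]]                  # held-out rows are never duplicated
boosted = oversample_class(train_part, "sii", 3, extra_copies=4)
```

Oversampling before splitting would let duplicates of a single sii=3 row land in both the training and held-out parts, which would inflate the kappa estimates.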

Note: Column selector documented here: https://stackoverflow.com/questions/62416223/how-to-select-only-few-columns-in-scikit-learn-column-selector-pipeline

In [None]:
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import cohen_kappa_score

# Set up a list of the models and methods to organize the computation of means in the kfold split
modellist = []
for pipeline_name, pipeline_obj in models.items():
    modellist.append(pipeline_name)

methodlist = ['Compute SII from PCIAT (Standard Bins) (kappa)', 
              'Compute SII from PCIAT (Modified Bins) (kappa)',
                'Predict SII (rounded) (kappa)',
                'Predict SII (+0.1 rounded) (kappa)']

num_splits = 5


# Create an array indexed by method, model, and split number
output = np.zeros((len(methodlist), len(modellist), num_splits))

#Make a StratifiedKFold object stratified by the variable sii
# This is necessary due to the small number of sii=3 values
kfold = StratifiedKFold(n_splits=num_splits, shuffle=True)

## i will count the split number 
i = 0
for train_index, test_index in kfold.split(train_cleaned, train_cleaned['sii']):
#for train_index, test_index in kfold.split(train_cleaned):
    train_tt = train_cleaned.iloc[train_index]
    train_ho = train_cleaned.iloc[test_index]

    # The number of sii=3 values is so small, we'll try to boost prediction 
    # performance by duplicating the rows with this value
    train_tt_sii3=train_tt[train_tt['sii']==3]
    train_tt_sii3=pd.concat([train_tt_sii3]*4, ignore_index=True)
    train_tt=pd.concat([train_tt,train_tt_sii3], ignore_index=True)
    train_tt.reset_index(drop=True, inplace=True)

    # j will enumerate the model
    j=0

    for pipeline_name, pipeline_obj in models.items():
        # The ordinal predictors can't predict PCIAT scores, so we'll leave them out of the first round of computations
        if 'ordinal' in pipeline_name:
            kappa_sii_comp = 0
            kappa_sii_comp_mod = 0
        else:
            # Fit and make predictions of PCIAT_Total
            pipeline_obj.fit(train_tt[predictors], train_tt['PCIAT-PCIAT_Total'])
            pred = pipeline_obj.predict(train_ho[predictors])

            # Compute sii from the predicted PCIAT scores using the standard bins
            bins = [0, 30, 49,79,100]
            pred_bin = np.digitize(pred, bins)-1

            # Try a slightly different set of bins suggested by the "tuning" below
            bins_mod = [0, 27, 43, 71, 100]
            pred_bin_mod = np.digitize(pred, bins_mod)-1

            # Compute kappa values
            kappa_sii_comp = cohen_kappa_score(train_ho['sii'], pred_bin, weights='quadratic')
            kappa_sii_comp_mod = cohen_kappa_score(train_ho['sii'], pred_bin_mod, weights='quadratic')
        
        # Store the kappa values in the output array
        output[0,j,i] = kappa_sii_comp
        output[1,j,i] = kappa_sii_comp_mod
        j=j+1

    j=0
    for pipeline_name, pipeline_obj in models.items():
        # Fit and make predictions of sii
        pipeline_obj.fit(train_tt[predictors], train_tt['sii'])
        pred = pipeline_obj.predict(train_ho[predictors])

        # Try two different ways of rounding the predictions
        pred_round = np.round(pred)
        pred_roundmod = np.round(pred+0.1)

        # Compute and record the kappa values
        kappa_sii_round = cohen_kappa_score(train_ho['sii'], pred_round, weights='quadratic')
        kappa_sii_roundmod = cohen_kappa_score(train_ho['sii'], pred_roundmod, weights='quadratic')

        output[2,j,i] = kappa_sii_round
        output[3,j,i] = kappa_sii_roundmod
        j=j+1
    i=i+1

# Create a new array by computing the average of the values in output along the third axis
output_avg = np.mean(output, axis=2)

In [59]:
# create a data frame from output using modellist as the names of the columns and methodlist as the names of the rows
output_df = pd.DataFrame(output_avg, columns=modellist, index=methodlist)

output_df

Unnamed: 0,slr_pipe,mlr_key_pipe,mlr_all_pipe,knn_pipe,svr_pipe,rf_pipe,ada_pipe,grad_pipe,xgb_pipe,ordinal_logistic_pipe,ordinal_knn_pipe,ordinal_rf_pipe,ordinal_ada_pipe,ordinal_grad_pipe,ordinal_xgb_pipe
Compute SII from PCIAT (Standard Bins) (kappa),0.326215,0.426822,0.41088,0.291561,0.243216,0.401811,0.356389,0.408368,0.348377,0.0,0.0,0.0,0.0,0.0,0.0
Compute SII from PCIAT (Modified Bins) (kappa),0.347795,0.461043,0.445265,0.338944,0.278466,0.412807,0.41757,0.433329,0.373473,0.0,0.0,0.0,0.0,0.0,0.0
Predict SII (rounded) (kappa),0.279295,0.410669,0.417569,0.311799,0.333834,0.378713,0.252133,0.408248,0.38517,0.384273,0.253311,0.307794,0.34374,0.426198,0.379592
Predict SII (+0.1 rounded) (kappa),0.329004,0.387273,0.403277,0.302074,0.349448,0.373556,0.196603,0.4085,0.378786,0.384273,0.253311,0.307794,0.34374,0.426198,0.379592


In [60]:
# Export output_df to a csv
output_df.to_csv('output_df.csv')

**Results of OOtB Investigation**

* Multiple linear regression, both using all predictors and using just the "key" features, had among the largest kappa values.
* Gradient boosting, both as a regressor and inside an ordinal classifier, was at or near the top of kappa values.
* Logistic regression inside an ordinal classifier did well.
* Random forest performed decently for predicting PCIAT scores and then computing sii values
* XGBoost performed decently for predicting sii values both via regression and inside an ordinal classifier.

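In general, adjusting the predicted sii values (adding 0.1 before rounding) decreased kappa for most of the regressors. It had no effect on the ordinal classifiers, whose predictions are already integers.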

Predicting PCIAT and then computing sii benefitted from adjusting the cutpoints.

In the sections below, we'll "tune" the cutpoints and then try tuning a few of the models and re-testing the loop.

**Tuning the Bins**

We noticed that our model struggled to predict higher output values.

When we adjusted the values for converting PCIAT scores to sii scores, we noticed an improvement in prediction when we lowered the cutpoints.

We sought to "tune" these cutpoints. However, we need to be mindful of overfitting. 

We'll look at the combinations of cutpoints that maximize kappa, and then from the top 20 (or so) select the cutpoints that are closest to the original ones (as measured by Euclidean distance).

In [55]:
# We'll use the multiple linear regression with "key" features to test the cutpoints
# This model performed as well as or better than most of the other (untuned) models
# Since it is quick to run, it should be a decent choice for doing this tuning

# Start by setting up the MLR pipeline
mlr_key_pipe=Pipeline([('mice_impute', Custom_MICE_Imputer()),
                ('add_zones', FunctionTransformer(zone_encoder)),
                ('selector', ColumnTransformer([('selector', 'passthrough', keyfeatures)], remainder="drop")),
                ('linear', LinearRegression())])

# Set the number of k-fold splits
num_splits = 5

# Set the number of different cutpoints to try
num_bincuts=15

# Create an array indexed by the three cutpoint offsets and the split number
output = np.zeros((num_bincuts, num_bincuts,num_bincuts, num_splits))

# Make a StratifiedKFold object, stratified on sii
kfold= StratifiedKFold(n_splits=num_splits, shuffle=True)

## i will count the split number 
i = 0

for train_index, test_index in kfold.split(train_cleaned, train_cleaned['sii']):
    train_tt = train_cleaned.iloc[train_index]
    train_ho = train_cleaned.iloc[test_index]

    # As above, we're going to "boost" the number of observations with sii=3 to try to improve predictive performance
    train_tt_sii3=train_tt[train_tt['sii']==3]
    train_tt_sii3=pd.concat([train_tt_sii3]*4, ignore_index=True)
    train_tt=pd.concat([train_tt,train_tt_sii3], ignore_index=True)
    train_tt.reset_index(drop=True, inplace=True)
    
    # Fit the pipe and make predictions
    mlr_key_pipe.fit(train_tt[predictors], train_tt['PCIAT-PCIAT_Total'])
    pred = mlr_key_pipe.predict(train_ho[predictors])

    # Iterate through values of the three cutpoints    
    for r in range(num_bincuts):
        for s in range(num_bincuts):
            for t in range(num_bincuts):
                bins = [0, 30-r, 49-s,79-t,100]
                pred_bin_mod = np.digitize(pred, bins)-1
                # Compute kappa for the binned predictions
                kappa_sii_comp_mod = cohen_kappa_score(train_ho['sii'], pred_bin_mod, weights='quadratic')
                output[r,s,t,i]=kappa_sii_comp_mod
                
    i=i+1

# Create a new array by averaging the values in output over the splits (the fourth axis)
output_avg = np.mean(output, axis=3)

Next, we'll examine the output of the tuning

In [57]:
# Flatten the array and sort the indices in descending order
sorted_indices_flat = np.argsort(output_avg.ravel())[::-1]

#Decide how many top values you want to look at. 
n=20

# Get the flat indices of the top n values
top_flat_indices = sorted_indices_flat[:n]

# Convert the flat indices to 3D indices
top_indices = [np.unravel_index(idx, output_avg.shape) for idx in top_flat_indices]

# Retrieve the top n values
top_values = output_avg.ravel()[top_flat_indices]

print("Top values:", top_values)
print("Locations of the top values:", top_indices)

Top values: [0.46371448 0.46365268 0.46320111 0.46320111 0.46320111 0.46320111
 0.46296255 0.46289302 0.4627613  0.46267516 0.46244046 0.46244046
 0.46244046 0.46244046 0.46227087 0.46222172 0.46222172 0.46222172
 0.46222172 0.4621337 ]
Locations of the top values: [(np.int64(3), np.int64(8), np.int64(13)), (np.int64(3), np.int64(8), np.int64(12)), (np.int64(3), np.int64(8), np.int64(9)), (np.int64(3), np.int64(8), np.int64(11)), (np.int64(3), np.int64(8), np.int64(8)), (np.int64(3), np.int64(8), np.int64(10)), (np.int64(3), np.int64(7), np.int64(13)), (np.int64(3), np.int64(7), np.int64(12)), (np.int64(3), np.int64(6), np.int64(13)), (np.int64(3), np.int64(6), np.int64(12)), (np.int64(3), np.int64(7), np.int64(11)), (np.int64(3), np.int64(7), np.int64(9)), (np.int64(3), np.int64(7), np.int64(8)), (np.int64(3), np.int64(7), np.int64(10)), (np.int64(3), np.int64(8), np.int64(14)), (np.int64(3), np.int64(6), np.int64(10)), (np.int64(3), np.int64(6), np.int64(9)), (np.int64(3), np.int64(6

**Results of Cutpoint Tuning**

Of the top 20 cutpoints, the combination that minimizes the Euclidean distance between the original and new cutpoints is:
[0, 30-3, 49-6,79-8,100] = [0, 27, 43, 71, 100]

We'll circle back around and try this in the loop above.
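
The selection rule described above can be sketched as follows. Since each grid index is the amount subtracted from the corresponding original cutpoint, the Euclidean distance between the original and new cutpoints equals the length of the index vector. The function name `closest_of_top_n` and the synthetic grid are ours; the values are made up and are not the notebook's results.

```python
import numpy as np

def closest_of_top_n(kappa_grid, n=20):
    """Among the n highest-kappa grid cells, return the offsets nearest the origin."""
    # Flat indices of the n largest values, best first
    flat_order = np.argsort(kappa_grid, axis=None)[::-1][:n]
    # One row per candidate, one column per cutpoint offset
    offsets = np.column_stack(np.unravel_index(flat_order, kappa_grid.shape))
    distances = np.linalg.norm(offsets, axis=1)
    best = offsets[np.argmin(distances)]
    return tuple(int(v) for v in best)

# Synthetic 3x3x3 grid with three clearly best cells
grid = np.zeros((3, 3, 3))
grid[2, 2, 2] = 0.50
grid[1, 0, 1] = 0.49
grid[2, 1, 0] = 0.48
chosen = closest_of_top_n(grid, n=3)

original = np.array([30, 49, 79])
new_cutpoints = original - np.array(chosen)
```

Preferring the candidate nearest the original cutpoints is a simple guard against overfitting: we give up a little kappa in exchange for staying close to the published definition of sii.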

**Hyperparameter Tuning**

The models above were run "out of the box." In the next sections, we'll try to tune a few of them (random forest, gradient boost, logistic regression inside an ordinal classifier, and, maybe, xgboost) to see how much we can improve their performance.

**Tuning the Random Forest Regressor**

In this section, we'll tune the random forest regressor.

Note that using a pipeline with GridSearchCV requires slightly different naming conventions for the parameter grid: 
https://stackoverflow.com/questions/34889110/random-forest-with-gridsearchcv-error-on-param-grid
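
Here is a minimal sketch of that naming convention on synthetic data. Each key in the grid is the pipeline step name, two underscores, then the parameter name. The step names `scale` and `rf`, the toy data, and the small grid are illustrative choices, with `StandardScaler` standing in for our custom preprocessing steps.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic regression data: the target depends mainly on the first column
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3))
y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=60)

toy_pipe = Pipeline([("scale", StandardScaler()),
                     ("rf", RandomForestRegressor(n_estimators=10, random_state=0))])

# Keys are '<step name>__<parameter name>', so 'rf__max_depth' reaches the forest
toy_grid = {"rf__max_depth": [1, 3]}
search = GridSearchCV(toy_pipe, param_grid=toy_grid, cv=3)
search.fit(X, y)
```

With no `scoring` argument, the search uses the estimator's own `score` method, which for a regressor is R squared.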

Additional suggestions for only applying preprocessing before doing the gridsearchcv (not sure if we need this):
https://stackoverflow.com/questions/43366561/use-sklearns-gridsearchcv-with-a-pipeline-preprocessing-just-once

In [16]:
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.metrics import cohen_kappa_score, make_scorer
from sklearn.pipeline import make_pipeline

#max_depths = range(1, 11)
max_depths = range(1, 2)
n_trees = [100, 500]

param_grid = {
    'rf__n_estimators': [50, 100, 200],
    'rf__max_depth': range(1,2),
    'rf__min_samples_split': [2, 4, 8]
}

# Cohen's kappa isn't included in the list of metrics by default
# We can make a kappa scorer to still use it in GridSearch
kappa_scorer = make_scorer(cohen_kappa_score, weights='quadratic')

# Instantiate a random forest pipeline
rf_pipe = Pipeline([('mice_impute', Custom_MICE_Imputer()),
                ('add_zones', FunctionTransformer(zone_encoder)),
                ('rf', RandomForestRegressor())])

grid_cv_rf = GridSearchCV(rf_pipe, 
                          param_grid = param_grid, 
                          #scoring = kappa_scorer,  #We don't want to use kappa for PCIAT predictions
                          cv = 5)

#clf_rf = make_pipeline(Custom_MICE_Imputer(),
#                       FunctionTransformer(zone_encoder),
#                       GridSearchCV(RandomForestRegressor(),
#                                 param_grid={'max_depth':max_depths,
#                                             'n_estimators': n_trees},
#                                scoring=kappa_scorer,
#                                cv=5,
#                                refit=True))

#clf_rf.fit(train_cleaned[predictors], train_cleaned['PCIAT-PCIAT_Total'])
#clf_rf.predict()

# We'll start by tuning on PCIAT, and can tune separately on sii
grid_cv_rf.fit(train_cleaned[predictors], train_cleaned['PCIAT-PCIAT_Total'])




In [18]:
## You can find the hyperparameter grid point that
## gave the best performance like so
## .best_params_

# Get .best_params_ from the fitted GridSearchCV object
grid_cv_rf.best_params_

{'rf__max_depth': 1, 'rf__min_samples_split': 2, 'rf__n_estimators': 100}

In [19]:
## You can find the best score like so
## .best_score_
grid_cv_rf.best_score_

np.float64(0.17582149615134404)

In [20]:
## Calling best_estimator_ returns the model with the 
## best avg cv performance after it has been refit on the
## entire data set
grid_cv_rf.best_estimator_

In [None]:
grid_cv_rf.best_estimator_.predict(train_ho[predictors])

**Using Neural Regression to Predict Ordinal Outcomes**

It looks like we could also use "neural regression" (PyTorch) to do the prediction. 

Here is a website that describes how to accomplish this: https://visualstudiomagazine.com/articles/2021/10/04/ordinal-classification-pytorch.aspx
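
The core trick in that approach is to encode each of the k ordinal classes as the midpoint of an equal-width bin on [0, 1], train a network with a sigmoid output against those targets, and then map outputs back to classes by bin. A small NumPy sketch of the two mappings is given here; the function names are ours, and sending an output that falls exactly on a bin edge to the upper class is our own convention.

```python
import numpy as np

def class_to_target(labels, k):
    """Map class i in 0..k-1 to the midpoint of the i-th of k equal bins on [0, 1]."""
    return (np.asarray(labels, dtype=float) + 0.5) / k

def output_to_class(outputs, k):
    """Map a network output in [0, 1] back to the class whose bin contains it."""
    classes = np.floor(np.asarray(outputs, dtype=float) * k).astype(int)
    # An output of exactly 1.0 belongs to the top class
    return np.clip(classes, 0, k - 1)

targets = class_to_target([0, 1, 2, 3], k=4)
classes = output_to_class([0.0, 0.24, 0.25, 0.6, 1.0], k=4)
```

For sii, k would be 4, so the four classes map to targets 0.125, 0.375, 0.625, and 0.875.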

The code below is copied from the website

In [None]:
# house_price_ord.py
# predict ordinal price from AC, sq ft, style, nearest school
# PyTorch 1.8.0-CPU Anaconda3-2020.02  Python 3.7.6
# Windows 10 

import numpy as np
import torch as T
device = T.device("cpu")  # apply to Tensor or Module

# -----------------------------------------------------------

class HouseDataset(T.utils.data.Dataset):
  # AC  sq ft   style  price   school
  # -1  0.2500  0 1 0    3     0 1 0
  #  1  0.1275  1 0 0    2     0 0 1
  # air condition: -1 = no, +1 = yes
  # style: art_deco, bungalow, colonial
  # price: k=4: 0 = low, 1 = medium, 2 = high, 3 = very high
  # school: johnson, kennedy, lincoln

  def __init__(self, src_file, k):
    # k for programmatic approach
    all_xy = np.loadtxt(src_file, 
      usecols=[0,1,2,3,4,5,6,7,8], delimiter="\t",
      comments="#", skiprows=0, dtype=np.float32)

    tmp_x = all_xy[:,[0,1,2,3,4,6,7,8]]
    tmp_y = all_xy[:,5]    # 1D -- 2D will be required

    n = len(tmp_y)
    for i in range(n):  # hard-coded is easy to understand
      if int(tmp_y[i])   == 0: tmp_y[i] = 0.125
      elif int(tmp_y[i]) == 1: tmp_y[i] = 0.375
      elif int(tmp_y[i]) == 2: tmp_y[i] = 0.625
      elif int(tmp_y[i]) == 3: tmp_y[i] = 0.875
      else: print("Fatal logic error ")

    tmp_y = np.reshape(tmp_y, (-1,1))  # 2D    

    self.x_data = T.tensor(tmp_x, \
      dtype=T.float32).to(device)
    self.y_data = T.tensor(tmp_y, \
      dtype=T.float32).to(device)

  def __len__(self):
    return len(self.x_data)

  def __getitem__(self, idx):
    preds = self.x_data[idx,:]  # or just [idx]
    price = self.y_data[idx,:] 
    return (preds, price)       # tuple of two matrices 

# -----------------------------------------------------------

class Net(T.nn.Module):
  def __init__(self):
    super(Net, self).__init__()
    self.hid1 = T.nn.Linear(8, 10)  # 8-(10-10)-1
    self.hid2 = T.nn.Linear(10, 10)
    self.oupt = T.nn.Linear(10, 1)  # [0.0 to 1.0]

    T.nn.init.xavier_uniform_(self.hid1.weight)
    T.nn.init.zeros_(self.hid1.bias)
    T.nn.init.xavier_uniform_(self.hid2.weight)
    T.nn.init.zeros_(self.hid2.bias)
    T.nn.init.xavier_uniform_(self.oupt.weight)
    T.nn.init.zeros_(self.oupt.bias)

  def forward(self, x):
    z = T.tanh(self.hid1(x))
    z = T.tanh(self.hid2(z))
    z = T.sigmoid(self.oupt(z))  # 
    return z

# -----------------------------------------------------------

def accuracy(model, ds, k):
  n_correct = 0; n_wrong = 0
  acc_delta = (1.0 / k) / 2   # if k=4 delta = 0.125
  for i in range(len(ds)):    # each input
    (X, y) = ds[i]            # (predictors, target)
    with T.no_grad():         # y target is like 0.375
      oupt = model(X)         # oupt is in [0.0, 1.0]

    if T.abs(oupt - y) <= acc_delta:
      n_correct += 1
    else:
      n_wrong += 1

  acc = (n_correct * 1.0) / (n_correct + n_wrong)
  return acc

# -----------------------------------------------------------

def accuracy_old(model, ds, k):
  model.eval()
  n_correct = 0; n_wrong = 0
  for i in range(len(ds)):
    (X, Y) = ds[i]            # (predictors, target)
    with T.no_grad():
      oupt = model(X)         # computed is in 0.0 to 1.0
    if oupt >= 0.0 and oupt < 0.25 and Y == 0.125:  # ugly
      n_correct += 1
    elif oupt >= 0.25 and oupt < 0.50 and Y == 0.375:
      n_correct += 1
    elif oupt >= 0.50 and oupt < 0.75 and Y == 0.625:
      n_correct += 1
    elif oupt >= 0.75 and Y == 0.875:
      n_correct += 1
    else:
      n_wrong += 1
  acc = (n_correct * 1.0) / (n_correct + n_wrong)
  return acc

# -----------------------------------------------------------

def train(net, ds, bs, lr, me, le):
  # network, dataset, batch_size, learn_rate, 
  # max_epochs, log_every
  train_ldr = T.utils.data.DataLoader(ds,
    batch_size=bs, shuffle=True)
  loss_func = T.nn.MSELoss()
  opt = T.optim.Adam(net.parameters(), lr=lr)

  for epoch in range(0, me):
    # T.manual_seed(1+epoch)  # recovery reproducibility
    epoch_loss = 0  # for one full epoch

    for (b_idx, batch) in enumerate(train_ldr):
      (X, y) = batch           # (predictors, targets)
      opt.zero_grad()          # prepare gradients
      oupt = net(X)            # predicted prices

      loss_val = loss_func(oupt, y)  # a tensor
      epoch_loss += loss_val.item()  # accumulate
      loss_val.backward()  # compute gradients
      opt.step()           # update weights

    if epoch % le == 0:
      print("epoch = %4d   loss = %0.4f" % \
       (epoch, epoch_loss))
      # TODO: save checkpoint

# -----------------------------------------------------------

def float_oupt_to_class(oupt, k):
  end_pts = np.zeros(k+1, dtype=np.float32) 
  delta = 1.0 / k
  for i in range(k):
    end_pts[i] = i * delta
  end_pts[k] = 1.0
  # if k=4, [0.0, 0.25, 0.50, 0.75, 1.0] 

  for i in range(k):
    if oupt >= end_pts[i] and oupt <= end_pts[i+1]:
      return i
  return -1  # fatal error 
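
# -----------------------------------------------------------

# A possible alternative to float_oupt_to_class(), sketched under
# the assumption that oupt is a plain Python float and that k
# equal-width bins are wanted. The name
# float_oupt_to_class_clipped is hypothetical. Out-of-range
# outputs are clamped to the nearest class instead of returning
# -1, and a value exactly on an interior end point goes to the
# upper bin (the loop above assigns it to the lower bin).
def float_oupt_to_class_clipped(oupt, k):
  idx = int(oupt * k)             # e.g. 0.30 * 4 = 1.2 -> 1
  return min(max(idx, 0), k - 1)  # clamp to [0, k-1]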

# -----------------------------------------------------------

def main():
  # 0. get started
  print("\nBegin House ordinal price demo \n")
  T.manual_seed(1)  # representative results 
  np.random.seed(1)
  
  # 1. create Dataset objects
  print("Creating Houses Dataset objects ")
  print("Converting ordinal labels to float targets ")
  train_file = ".\\Data\\houses_train_ord.txt"
  train_ds = HouseDataset(train_file, k=4)  # 200 rows

  test_file = ".\\Data\\houses_test_ord.txt"
  test_ds = HouseDataset(test_file, k=4)  # 40 rows

  # 2. create network
  print("\nCreating 8-10-10-1 neural network ")
  net = Net().to(device)
  net.train()   # set mode

  # 3. train model
  bat_size = 10
  lrn_rate = 0.010
  max_epochs = 500
  log_every = 100

  print("\nbat_size = %3d " % bat_size)
  print("lrn_rate = %0.3f " % lrn_rate)
  print("loss = MSELoss ")
  print("optimizer = Adam ")
  print("max_epochs = %3d " % max_epochs)

  print("\nStarting training ")
  train(net, train_ds, bat_size, lrn_rate, 
    max_epochs, log_every)
  print("Training complete ")

  # 4. evaluate model accuracy
  print("\nComputing model accuracy")
  net.eval()  # set mode
  acc_train = accuracy(net, train_ds, k=4) 
  print("Accuracy on train data = %0.4f" % \
    acc_train)

  acc_test = accuracy(net, test_ds, k=4) 
  print("Accuracy on test data  = %0.4f" % \
    acc_test)

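  # 5. save trained model (TODO)
  print("\nSaving trained model is not implemented (TODO) ")
  # the calls originally noted here are Keras-style:
  # model.save_weights(".\\Models\\houses_model_wts.h5")
  # model.save(".\\Models\\houses_model.h5")
  # a PyTorch equivalent (file name is a placeholder) would be:
  # T.save(net.state_dict(), ".\\Models\\houses_model.pt")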

  # 6. make a prediction
  print("\nPredicting house price for AC=no, sqft=2300, ")
  print(" style=colonial, school=kennedy: ")
  unk = np.array([[-1, 0.2300,  0,0,1,  0,1,0]],
    dtype=np.float32)
  unk = T.tensor(unk, dtype=T.float32).to(device) 

  with T.no_grad():
    pred_price = net(unk)
  pred_price = pred_price.item()  # scalar 0.0 to 1.0
  print("\nPredicted price raw output: %0.4f" % \
    pred_price)

  labels = ["low", "medium", "high", "very high"]
  c = float_oupt_to_class(pred_price, k=4)
  print("Predicted price ordinal label: %d " % c)
  print("Predicted price friendly class: %s " % \
    labels[c])

  print("\nEnd House ordinal price demo")

if __name__ == "__main__":
  main()

# ===========
# houses_train_ord.txt
# columns, left to right:
# AC (-1 = no), sq_ft (divided by 10,000), style (one-hot),
# price (0=low, 1=med, 2=high, 3=v. high), school (one-hot)
#   -1   0.1275   0   1   0   0   0   0   1
#    1   0.1100   1   0   0   0   1   0   0
#   -1   0.1375   0   0   1   0   0   1   0
#    1   0.1975   0   1   0   2   0   0   1
#   -1   0.1200   0   0   1   0   1   0   0
#   -1   0.2500   0   1   0   2   0   1   0
#    1   0.1275   1   0   0   1   0   0   1
#   -1   0.1750   0   0   1   1   0   0   1
#   -1   0.2500   0   1   0   2   0   0   1
#    1   0.1800   0   1   0   1   1   0   0
#    1   0.0975   1   0   0   0   0   0   1
#   -1   0.1100   0   1   0   0   0   1   0
#    1   0.1975   0   0   1   1   0   0   1
#   -1   0.3175   1   0   0   3   0   1   0
#   -1   0.1700   0   1   0   1   1   0   0
#    1   0.1650   0   1   0   1   0   1   0
#   -1   0.2250   0   1   0   2   0   1   0
#   -1   0.2125   0   1   0   2   0   1   0
#    1   0.1675   0   1   0   1   0   1   0
#    1   0.1550   1   0   0   1   0   1   0
#   -1   0.1375   0   0   1   0   1   0   0
#   -1   0.2425   0   1   0   2   1   0   0
#    1   0.3200   0   0   1   3   0   1   0
#   -1   0.3075   1   0   0   3   0   1   0
#   -1   0.2700   1   0   0   2   0   0   1
#    1   0.1700   0   1   0   1   0   0   1
#   -1   0.1475   1   0   0   1   1   0   0
#   -1   0.2500   0   1   0   2   0   0   1
#   -1   0.2750   1   0   0   2   0   0   1
#   -1   0.2000   1   0   0   2   1   0   0
#   -1   0.1100   0   0   1   0   1   0   0
#   -1   0.3400   1   0   0   3   0   1   0
#    1   0.3000   0   0   1   3   1   0   0
#    1   0.1550   0   1   0   1   0   1   0
#   -1   0.2150   0   1   0   1   0   0   1
#   -1   0.2900   0   0   1   3   0   1   0
#    1   0.2750   0   0   1   2   0   1   0
#    1   0.2175   0   1   0   2   0   1   0
#    1   0.2150   0   1   0   2   0   0   1
#    1   0.1050   1   0   0   1   1   0   0
#   -1   0.2775   1   0   0   2   0   0   1
#   -1   0.3225   1   0   0   3   0   1   0
#    1   0.2075   0   1   0   2   1   0   0
#   -1   0.3225   1   0   0   3   0   0   1
#    1   0.2800   0   0   1   3   0   0   1
#   -1   0.1575   0   1   0   1   0   0   1
#    1   0.3250   0   0   1   3   0   0   1
#   -1   0.2750   1   0   0   2   0   0   1
#    1   0.1250   1   0   0   1   1   0   0
#   -1   0.2325   0   1   0   2   0   0   1
#    1   0.1825   1   0   0   2   1   0   0
#   -1   0.2600   0   1   0   2   0   1   0
#   -1   0.3075   1   0   0   3   0   0   1
#   -1   0.2875   1   0   0   3   0   0   1
#    1   0.2300   0   1   0   2   0   1   0
#    1   0.3100   0   0   1   3   1   0   0
#   -1   0.2750   1   0   0   2   0   0   1
#    1   0.1125   0   1   0   0   0   0   1
#    1   0.2525   1   0   0   2   1   0   0
#    1   0.1625   0   1   0   1   0   1   0
#    1   0.1075   1   0   0   1   0   0   1
#   -1   0.2200   0   1   0   2   0   1   0
#   -1   0.2300   0   1   0   2   0   1   0
#   -1   0.3100   1   0   0   3   0   1   0
#   -1   0.2875   1   0   0   3   0   1   0
#    1   0.3375   0   0   1   3   0   0   1
#   -1   0.1450   0   0   1   0   1   0   0
#   -1   0.2650   1   0   0   2   1   0   0
#    1   0.2225   0   1   0   2   1   0   0
#   -1   0.2300   0   1   0   2   0   1   0
#    1   0.1025   0   1   0   0   0   1   0
#    1   0.1925   0   1   0   2   1   0   0
#   -1   0.2525   0   1   0   2   0   1   0
#   -1   0.1650   0   1   0   1   0   1   0
#    1   0.1650   0   1   0   1   0   1   0
#   -1   0.1300   1   0   0   1   0   1   0
#   -1   0.2900   1   0   0   3   1   0   0
#   -1   0.2175   0   1   0   1   0   0   1
#    1   0.2300   1   0   0   2   1   0   0
#   -1   0.3000   1   0   0   3   1   0   0
#    1   0.2125   0   1   0   1   1   0   0
#    1   0.2825   0   0   1   2   0   0   1
#    1   0.3125   0   0   1   3   0   1   0
#    1   0.2500   0   1   0   2   1   0   0
#   -1   0.2375   0   1   0   2   0   0   1
#    1   0.3375   0   0   1   3   0   1   0
#    1   0.2000   0   1   0   2   0   0   1
#   -1   0.2100   0   1   0   1   0   1   0
#   -1   0.3225   1   0   0   3   1   0   0
#    1   0.2375   0   0   1   2   1   0   0
#   -1   0.2250   0   1   0   2   0   1   0
#    1   0.1250   1   0   0   1   0   0   1
#   -1   0.1925   1   0   0   1   1   0   0
#   -1   0.2750   0   1   0   2   0   0   1
#    1   0.2200   0   1   0   2   1   0   0
#   -1   0.1675   0   1   0   1   1   0   0
#   -1   0.1700   0   1   0   1   0   0   1
#   -1   0.1350   0   0   1   0   0   1   0
#   -1   0.1600   0   1   0   1   0   1   0
#   -1   0.2125   0   1   0   1   0   0   1
#    1   0.1200   1   0   0   1   0   0   1
#   -1   0.2100   0   1   0   2   0   1   0
#   -1   0.1250   0   0   1   0   0   0   1
#   -1   0.2550   0   1   0   2   0   1   0
#    1   0.2750   0   0   1   2   0   1   0
#   -1   0.2200   0   0   1   1   1   0   0
#    1   0.0925   1   0   0   1   1   0   0
#    1   0.3350   0   0   1   3   0   1   0
#   -1   0.2250   0   1   0   2   0   0   1
#   -1   0.2425   0   1   0   2   1   0   0
#    1   0.1275   0   1   0   1   0   1   0
#    1   0.3350   0   1   0   3   1   0   0
#   -1   0.1850   0   1   0   1   0   0   1
#    1   0.1600   0   1   0   1   1   0   0
#   -1   0.2400   0   1   0   2   1   0   0
#    1   0.3300   0   0   1   3   0   0   1
#   -1   0.3075   1   0   0   3   1   0   0
#    1   0.2900   0   1   0   3   0   0   1
#   -1   0.0950   0   0   1   0   1   0   0
#   -1   0.1900   0   1   0   1   0   0   1
#    1   0.1375   0   1   0   1   1   0   0
#   -1   0.2100   0   1   0   1   1   0   0
#   -1   0.3025   1   0   0   3   1   0   0
#    1   0.1375   1   0   0   0   0   0   1
#   -1   0.1475   1   0   0   1   0   1   0
#    1   0.2150   0   1   0   2   1   0   0
#   -1   0.2400   0   1   0   2   1   0   0
#   -1   0.1375   0   0   1   0   0   0   1
#    1   0.2200   1   0   0   2   1   0   0
#   -1   0.1150   0   0   1   0   0   1   0
#    1   0.1825   0   0   1   2   0   1   0
#   -1   0.3225   1   0   0   3   0   0   1
#   -1   0.1450   0   0   1   0   0   0   1
#    1   0.1675   0   1   0   1   1   0   0
#    1   0.3325   0   0   1   3   0   1   0
#    1   0.1075   1   0   0   0   0   0   1
#   -1   0.1350   0   0   1   0   1   0   0
#   -1   0.1450   0   0   1   0   1   0   0
#    1   0.1575   0   1   0   1   1   0   0
#   -1   0.1825   0   1   0   1   0   0   1
#   -1   0.2450   0   1   0   2   0   1   0
#    1   0.1425   1   0   0   1   1   0   0
#    1   0.2175   0   1   0   2   0   0   1
#    1   0.2325   0   1   0   2   0   1   0
#   -1   0.2875   1   0   0   3   1   0   0
#    1   0.2625   0   1   0   2   0   0   1
#    1   0.1575   0   1   0   1   0   0   1
#    1   0.2750   0   0   1   2   1   0   0
#   -1   0.2500   0   1   0   2   1   0   0
#   -1   0.2400   0   1   0   2   0   1   0
#    1   0.1100   1   0   0   0   0   0   1
#   -1   0.2975   1   0   0   3   0   0   1
#   -1   0.1725   0   0   1   1   1   0   0
#    1   0.3225   0   0   1   3   1   0   0
#   -1   0.1450   0   0   1   0   0   0   1
#    1   0.1725   0   1   0   1   0   1   0
#    1   0.3050   0   0   1   3   1   0   0
#   -1   0.3200   1   0   0   3   0   0   1
#    1   0.1450   1   0   0   1   1   0   0
#   -1   0.3175   1   0   0   3   0   1   0
#    1   0.1475   1   0   0   1   0   1   0
#    1   0.2575   0   1   0   2   1   0   0
#    1   0.1200   1   0   0   1   0   0   1
#   -1   0.2425   0   1   0   2   0   1   0
#   -1   0.0900   1   0   0   0   1   0   0
#   -1   0.0925   0   0   1   0   1   0   0
#   -1   0.1650   0   0   1   1   0   1   0
#    1   0.1025   1   0   0   0   0   0   1
#   -1   0.1475   0   0   1   0   0   0   1
#    1   0.2225   1   0   0   2   0   0   1
#    1   0.3250   1   0   0   3   0   0   1
#    1   0.2800   0   0   1   2   1   0   0
#    1   0.2625   0   1   0   2   0   0   1
#    1   0.1450   1   0   0   1   0   1   0
#    1   0.2350   0   1   0   2   0   1   0
#   -1   0.3425   0   0   1   3   1   0   0
#   -1   0.1575   0   1   0   1   0   0   1
#   -1   0.3075   0   0   1   2   0   1   0
#   -1   0.0950   0   0   1   0   0   1   0
#   -1   0.1925   0   1   0   1   0   0   1
#    1   0.1300   1   0   0   1   1   0   0
#   -1   0.3075   1   0   0   3   0   1   0
#   -1   0.2000   0   1   0   1   1   0   0
#    1   0.2475   0   1   0   3   1   0   0
#   -1   0.2825   1   0   0   3   1   0   0
#    1   0.2425   0   1   0   3   0   1   0
#   -1   0.2625   0   0   1   2   1   0   0
#    1   0.0900   1   0   0   0   1   0   0
#    1   0.2800   0   0   1   2   0   0   1
#    1   0.2600   0   1   0   2   0   1   0
#    1   0.0900   0   1   0   0   0   1   0
#    1   0.2900   0   0   1   3   1   0   0
#    1   0.1950   0   1   0   2   0   1   0
#    1   0.2325   0   1   0   2   1   0   0
#    1   0.2025   0   1   0   1   0   1   0
#    1   0.3025   0   0   1   3   1   0   0
#   -1   0.1800   0   0   1   1   0   1   0
#   -1   0.2225   0   1   0   2   1   0   0
#   -1   0.1425   0   0   1   0   1   0   0
#   -1   0.2725   1   0   0   2   0   0   1
#
# houses_test_ord.txt
#    1   0.2550   0   1   0   2   1   0   0
#    1   0.1625   0   1   0   1   0   1   0
#   -1   0.2750   1   0   0   2   1   0   0
#   -1   0.1275   0   0   1   0   0   0   1
#   -1   0.1650   0   0   1   1   0   0   1
#    1   0.1450   1   0   0   1   0   1   0
#   -1   0.3275   1   0   0   3   1   0   0
#    1   0.2175   0   1   0   2   0   1   0
#    1   0.2725   0   0   1   2   0   1   0
#   -1   0.3075   1   0   0   3   0   1   0
#   -1   0.2600   1   0   0   2   0   1   0
#   -1   0.1525   0   0   1   0   0   1   0
#   -1   0.1450   0   0   1   0   1   0   0
#    1   0.2375   0   1   0   2   0   0   1
#   -1   0.1950   0   1   0   1   0   1   0
#   -1   0.2375   0   1   0   2   0   0   1
#    1   0.2475   0   1   0   2   1   0   0
#    1   0.3150   0   0   1   3   0   0   1
#    1   0.1525   1   0   0   1   1   0   0
#    1   0.3050   0   0   1   3   0   0   1
#    1   0.2350   0   1   0   2   0   0   1
#   -1   0.1525   0   0   1   0   0   0   1
#    1   0.2550   0   1   0   2   0   0   1
#    1   0.1200   0   1   0   1   1   0   0
#    1   0.2450   0   1   0   2   1   0   0
#   -1   0.3300   1   0   0   3   0   0   1
#    1   0.3275   1   0   0   3   1   0   0
#    1   0.2300   1   0   0   2   0   1   0
#    1   0.2275   0   1   0   2   0   0   1
#    1   0.2350   1   0   0   2   1   0   0
#    1   0.1475   1   0   0   1   1   0   0
#    1   0.2850   0   0   1   3   0   0   1
#    1   0.1000   0   0   1   0   1   0   0
#    1   0.1750   0   1   0   1   1   0   0
#    1   0.3075   0   0   1   3   0   0   1
#    1   0.1550   0   1   0   1   0   0   1
#   -1   0.0925   0   0   1   0   1   0   0
#   -1   0.1300   0   0   1   0   0   0   1
#    1   0.1425   0   0   1   1   1   0   0
#    1   0.2975   0   0   1   3   0   0   1
