### Basic Overview

We will explore random forest methods to build a model that predicts housing prices from the relevant data.

As detailed in https://roamanalytics.com/2016/10/28/are-categorical-variables-getting-lost-in-your-random-forests/, we
will use h2o instead of sklearn, because it is better suited to handling categorical variables.
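
To see why the encoding matters, consider how many candidate splits a tree can make on a single categorical column. The sketch below assumes that a one-hot encoding offers one "this level vs. the rest" split per level, while a tree that handles categories natively can, in principle, send any subset of levels to the left. The function names are illustrative and not part of h2o or sklearn, and the count describes the space of possible groupings, not how any particular library searches it.

```python
def one_hot_split_count(n_levels):
    """Each one-hot column offers a single split: one level vs. all others."""
    return n_levels


def grouped_split_count(n_levels):
    """Number of ways to divide n_levels into two non-empty groups."""
    return 2 ** (n_levels - 1) - 1


# Hypothetical example: a categorical column with 25 distinct levels.
n_levels = 25
print(one_hot_split_count(n_levels))   # 25
print(grouped_split_count(n_levels))   # 16777215
```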

In [1]:
import pandas as pd
import numpy as np
import sys
sys.path.append('../../common_routines/')
from relevant_functions import\
    evaluate_model_score_given_predictions,\
    evaluate_model_score


#### Get clean data first

In [2]:
train_data = pd.read_csv('../../cleaned_input/train_data.csv')
validation_data = pd.read_csv('../../cleaned_input/validation_data.csv')

In [3]:
train_data.columns

Index(['Id', 'MSSubClass', 'MSZoning', 'LotArea', 'Street', 'LotShape',
       'LandContour', 'Utilities', 'LotConfig', 'LandSlope',
       ...
       'LogGarageArea', 'LogWoodDeckSF', 'LogOpenPorchSF', 'LogEnclosedPorch',
       'Log3SsnPorch', 'LogScreenPorch', 'LogPoolArea', 'LogMiscVal',
       'LogSalePrice', 'LogMasVnrArea_times_not_missing'],
      dtype='object', length=102)

In [4]:
# ignore_index=True gives the combined frame a clean 0..n-1 index
train_validation_data = pd.concat([train_data, validation_data], ignore_index=True)

In [5]:
test_data = pd.read_csv('../../input/test.csv')

In [6]:
test_data.isnull().sum().sum()

7000

In [7]:
# Is the cleaned data indeed free of missing values?
train_data.isnull().sum().any()

False

In [8]:
dummy = pd.read_csv('../../input/train.csv')

In [9]:
dummy.columns

Index(['Id', 'MSSubClass', 'MSZoning', 'LotFrontage', 'LotArea', 'Street',
       'Alley', 'LotShape', 'LandContour', 'Utilities', 'LotConfig',
       'LandSlope', 'Neighborhood', 'Condition1', 'Condition2', 'BldgType',
       'HouseStyle', 'OverallQual', 'OverallCond', 'YearBuilt', 'YearRemodAdd',
       'RoofStyle', 'RoofMatl', 'Exterior1st', 'Exterior2nd', 'MasVnrType',
       'MasVnrArea', 'ExterQual', 'ExterCond', 'Foundation', 'BsmtQual',
       'BsmtCond', 'BsmtExposure', 'BsmtFinType1', 'BsmtFinSF1',
       'BsmtFinType2', 'BsmtFinSF2', 'BsmtUnfSF', 'TotalBsmtSF', 'Heating',
       'HeatingQC', 'CentralAir', 'Electrical', '1stFlrSF', '2ndFlrSF',
       'LowQualFinSF', 'GrLivArea', 'BsmtFullBath', 'BsmtHalfBath', 'FullBath',
       'HalfBath', 'BedroomAbvGr', 'KitchenAbvGr', 'KitchenQual',
       'TotRmsAbvGrd', 'Functional', 'Fireplaces', 'FireplaceQu', 'GarageType',
       'GarageYrBlt', 'GarageFinish', 'GarageCars', 'GarageArea', 'GarageQual',
       'GarageCond', 'PavedDrive

In [10]:
validation_data.isnull().sum().any()

False

#### Get h2o up and running !


In [11]:
# Using h2o
import h2o
h2o.init(nthreads = -1, max_mem_size = 15)

Checking whether there is an H2O instance running at http://localhost:54321 ..... not found.
Attempting to start a local H2O server...
  Java Version: openjdk version "1.8.0_121"; OpenJDK Runtime Environment (Zulu 8.20.0.5-macosx) (build 1.8.0_121-b15); OpenJDK 64-Bit Server VM (Zulu 8.20.0.5-macosx) (build 25.121-b15, mixed mode)
  Starting server from /Users/babs4JESUS/anaconda3/lib/python3.6/site-packages/h2o/backend/bin/h2o.jar
  Ice root: /var/folders/cz/3nvpl4mj0g5ds3hlsc15wxdr0000gn/T/tmp_6smdstl
  JVM stdout: /var/folders/cz/3nvpl4mj0g5ds3hlsc15wxdr0000gn/T/tmp_6smdstl/h2o_babs4JESUS_started_from_python.out
  JVM stderr: /var/folders/cz/3nvpl4mj0g5ds3hlsc15wxdr0000gn/T/tmp_6smdstl/h2o_babs4JESUS_started_from_python.err
  Server is running at http://127.0.0.1:54321
Connecting to H2O server at http://127.0.0.1:54321 ... successful.


| | |
|---|---|
| H2O cluster uptime: | 01 secs |
| H2O cluster timezone: | America/New_York |
| H2O data parsing timezone: | UTC |
| H2O cluster version: | 3.24.0.1 |
| H2O cluster version age: | 8 days |
| H2O cluster name: | H2O_from_python_babs4JESUS_zinejs |
| H2O cluster total nodes: | 1 |
| H2O cluster free memory: | 13.33 Gb |
| H2O cluster total cores: | 8 |
| H2O cluster allowed cores: | 8 |


### Brief framework.

We will build the model according to the following framework (similar to what we did for PCA).

Given a set of columns, we should be able to do the following:

1. Train model on training set.

2. Validate on validation set.

3. Generate predictions on test data.

4. Do cross validation on combined set of training/validation data.


In [12]:
ALL_CATEGORICAL_COLUMNS = ['MSSubClass',
 'MSZoning',
 'Street',
 'LotShape',
 'LandContour',
 'Utilities',
 'LotConfig',
 'LandSlope',
 'Neighborhood',
 'Condition1',
 'Condition2',
 'BldgType',
 'HouseStyle',
 'RoofStyle',
 'RoofMatl',
 'Exterior1st',
 'Exterior2nd',
 'MasVnrType',
 'ExterQual',
 'ExterCond',
 'Foundation',
 'BsmtQual',
 'BsmtCond',
 'BsmtExposure',
 'BsmtFinType1',
 'BsmtFinType2',
 'Heating',
 'HeatingQC',
 'CentralAir',
 'Electrical',
 'KitchenQual',
 'Functional',
 'FireplaceQu',
 'GarageType',
 'GarageFinish',
 'GarageQual',
 'GarageCond',
 'PavedDrive',
 'PoolQC',
 'Fence',
 'MiscFeature',
 'MoSold',
 'YrSold',
 'SaleType',
 'SaleCondition']

In [13]:
ALL_NUMERICAL_COLUMNS = ['LotArea',
 'OverallQual',
 'OverallCond',
 'YearBuilt',
 'YearRemodAdd',
 'MasVnrArea_times_not_missing',
 'MasVnrArea_not_missing',
 'BsmtFinSF1',
 'BsmtUnfSF',
 'TotalBsmtSF',
 '1stFlrSF',
 '2ndFlrSF',
 'LowQualFinSF',
 'GrLivArea',
 'BsmtFullBath',
 'BsmtHalfBath',
 'FullBath',
 'HalfBath',
 'BedroomAbvGr',
 'KitchenAbvGr',
 'TotRmsAbvGrd',
 'Fireplaces',
 'GarageYrBlt_times_not_missing',
 'GarageYrBlt_not_missing',
 'GarageCars',
 'GarageArea',
 'WoodDeckSF',
 'OpenPorchSF',
 'EnclosedPorch',
 '3SsnPorch',
 'ScreenPorch',
 'PoolArea',
 'MiscVal']

In [14]:
ALL_COLUMNS = ALL_CATEGORICAL_COLUMNS + ALL_NUMERICAL_COLUMNS

In [15]:
# Columns the model will be trained on
# Check out ExterQual and YearBuilt instead of YearRemodAdd
# Check out BsmtCond
#cat_cols_in_model = ['MSSubClass', 'Neighborhood', 'ExterQual', 'Foundation', 'BsmtQual', 'BsmtCond',
#                     'BsmtFinType1']
#numeric_cols_in_model = ['GrLivArea', 'OverallQual', 'OverallCond', 'YearRemodAdd', 'BsmtFinSF1', 
#                         'TotalBsmtSF']
# Check out GarageCond
cat_cols_in_model = ['MSSubClass', 'Neighborhood']
numeric_cols_in_model = ['GrLivArea', 'OverallQual', 'OverallCond', 'YearRemodAdd', 'BsmtFinSF1', 
                         'TotalBsmtSF', '1stFlrSF', '2ndFlrSF', 'GarageArea', 'LowQualFinSF']

all_cols_in_model = cat_cols_in_model + numeric_cols_in_model

dep_var_col = 'LogSalePrice'

In [16]:
all_cols_in_model

['MSSubClass',
 'Neighborhood',
 'GrLivArea',
 'OverallQual',
 'OverallCond',
 'YearRemodAdd',
 'BsmtFinSF1',
 'TotalBsmtSF',
 '1stFlrSF',
 '2ndFlrSF',
 'GarageArea',
 'LowQualFinSF']

#### Training model on the training set


In [17]:
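def get_h2o_frame_with_rel_factors(data):
    """Convert a pandas DataFrame to an H2OFrame with categorical columns as factors."""
    data_h2o = h2o.H2OFrame(data)
    for col in ALL_CATEGORICAL_COLUMNS:
        # asfactor() marks the column as categorical rather than numeric
        data_h2o[col] = data_h2o[col].asfactor()
    return data_h2o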

In [18]:
from h2o.estimators.random_forest import H2ORandomForestEstimator

In [19]:
hpr_1 = H2ORandomForestEstimator(model_id='housing_price_regression', seed=1)
hpr_1.train(x=all_cols_in_model, 
            y=dep_var_col, 
            training_frame=get_h2o_frame_with_rel_factors(train_data))

Parse progress: |█████████████████████████████████████████████████████████| 100%
drf Model Build progress: |███████████████████████████████████████████████| 100%


In [20]:
predict_out = hpr_1.predict(
    get_h2o_frame_with_rel_factors(train_data))                
train_data['Predictions'] = predict_out.as_data_frame()['predict'].values.tolist()

Parse progress: |█████████████████████████████████████████████████████████| 100%
drf prediction progress: |████████████████████████████████████████████████| 100%


In [21]:
#evaluate_model_score_given_predictions(np.log(train_data['Predictions'].values), 
#                                       np.log(train_data[dep_var_col].values))

In [22]:
evaluate_model_score_given_predictions((train_data['Predictions'].values), 
                                       (train_data[dep_var_col].values))

0.05519499883354487

#### Inspect the output model

It may seem trivial, but we should inspect the model to see exactly what it does. This is especially important in data science, where it is easy to get entangled in a quagmire of models and functions without clearly understanding what any of them does.


In [23]:
from h2o.tree import H2OTree
tree = H2OTree(model = hpr_1, tree_number = 0, tree_class = None)

In [24]:
tree

<h2o.tree.tree.H2OTree at 0x124eeef28>

In [25]:
len(tree)

1349

In [26]:
print(tree)

Tree related to model housing_price_regression. Tree number is 0, tree class is 'None'




In [27]:
tree.levels

[None,
 ['Blueste',
  'BrDale',
  'BrkSide',
  'Edwards',
  'IDOTRR',
  'MeadowV',
  'Mitchel',
  'NAmes',
  'NPkVill',
  'OldTown',
  'SWISU',
  'Sawyer'],
 ['Blmngtn',
  'ClearCr',
  'CollgCr',
  'Crawfor',
  'Gilbert',
  'NWAmes',
  'NoRidge',
  'NridgHt',
  'SawyerW',
  'Somerst',
  'StoneBr',
  'Timber',
  'Veenker'],
 None,
 None,
 None,
 None,
 ['30', '40', '50', '70', '75', '160', '180', '190'],
 ['20', '45', '60', '80', '85', '90', '120'],
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 ['30', '40', '70', '160', '180'],
 ['50', '75', '190'],
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 ['20',
  '30',
  '40',
  '45',
  '50',
  '75',
  '80',
  '85',
  '90',
  '120',
  '160',
  '180',
  '190'],
 ['60', '70'],
 None,
 None,
 None,
 None,
 None,
 None,
 ['Blmngtn',
  'ClearCr',
  'CollgCr',
  'Gilbert',
  'NWAmes',
  'SawyerW',
  'Timber

In [28]:
tree.tree_number

0

In [29]:
tree.show()

Tree related to model housing_price_regression. Tree number is 0, tree class is 'None'




In [30]:
print(tree.root_node)

Node ID 0 
Left child node ID = 1
Right child node ID = 2

Splits on column Neighborhood
  - Categorical levels going to the left node: ['Blueste', 'BrDale', 'BrkSide', 'Edwards', 'IDOTRR', 'MeadowV', 'Mitchel', 'NAmes', 'NPkVill', 'OldTown', 'SWISU', 'Sawyer']
  - Categorical levels going to the right node: ['Blmngtn', 'ClearCr', 'CollgCr', 'Crawfor', 'Gilbert', 'NWAmes', 'NoRidge', 'NridgHt', 'SawyerW', 'Somerst', 'StoneBr', 'Timber', 'Veenker']

NA values go to the RIGHT


In [31]:
print(tree.root_node.left_child)

Node ID 1 
Left child node ID = 3
Right child node ID = 4

Splits on column GrLivArea
Split threshold < 1377.5 to the left node, >= 1377.5 to the right node 

NA values go to the LEFT
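
The two nodes printed above show the two kinds of split a tree can make: a categorical split that sends a set of levels to the left, and a numeric split on a threshold. A minimal sketch of how a row would be routed through those two nodes, using plain dictionaries instead of the `H2OTree` API; the dictionary layout and the `route` function are illustrative assumptions, and NA handling is omitted.

```python
# The two splits printed above, written as plain dictionaries.
root = {
    "column": "Neighborhood",
    "left_levels": {"Blueste", "BrDale", "BrkSide", "Edwards", "IDOTRR",
                    "MeadowV", "Mitchel", "NAmes", "NPkVill", "OldTown",
                    "SWISU", "Sawyer"},
    "left": 1,
    "right": 2,
}
node_1 = {"column": "GrLivArea", "threshold": 1377.5, "left": 3, "right": 4}


def route(node, row):
    """Return the ID of the child node that `row` is sent to."""
    value = row[node["column"]]
    if "left_levels" in node:
        # Categorical split: listed levels go left, all others go right.
        return node["left"] if value in node["left_levels"] else node["right"]
    # Numeric split: values below the threshold go left.
    return node["left"] if value < node["threshold"] else node["right"]


# Hypothetical house, for illustration only.
house = {"Neighborhood": "NAmes", "GrLivArea": 1500}
print(route(root, house))    # 1
print(route(node_1, house))  # 4
```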


#### Testing model on validation set

In [32]:
predict_out = hpr_1.predict(
    get_h2o_frame_with_rel_factors(validation_data))                
validation_data['Predictions'] = predict_out.as_data_frame()['predict'].values.tolist()

Parse progress: |█████████████████████████████████████████████████████████| 100%
drf prediction progress: |████████████████████████████████████████████████| 100%


In [33]:
#evaluate_model_score_given_predictions(np.log(validation_data['Predictions'].values), 
#                                       np.log(validation_data[dep_var_col].values))

In [34]:
validation_score = evaluate_model_score_given_predictions((validation_data['Predictions'].values), 
                                                          (validation_data[dep_var_col].values))
print(validation_score)

0.13821357736206205
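
Because the target is `LogSalePrice`, the score is an error in log space. A rough way to read it, assuming the target is the natural log of the sale price, is to exponentiate the error to get a typical multiplicative error on the price itself. The helper name `as_relative_error` is illustrative.

```python
import math

train_score = 0.05519499883354487       # training-set score from above
validation_score = 0.13821357736206205  # validation-set score from above


def as_relative_error(log_error):
    """Convert an error in natural-log space to a relative price error."""
    return math.exp(log_error) - 1


print(f"train:      {as_relative_error(train_score):.1%}")
print(f"validation: {as_relative_error(validation_score):.1%}")
```

Under that assumption the validation error is roughly 15% of the price, against roughly 6% on the training set, which suggests the forest fits the training data much more closely than unseen data.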


#### Generate predictions on test data

In [35]:
test_data_one_hot = pd.read_csv('../../cleaned_input/test_data_one_hot.csv')

In [36]:
test_data['LogMasVnrArea_times_not_missing'] = test_data_one_hot['LogMasVnrArea_times_not_missing']

In [37]:
test_data[all_cols_in_model].isnull().sum()

MSSubClass      0
Neighborhood    0
GrLivArea       0
OverallQual     0
OverallCond     0
YearRemodAdd    0
BsmtFinSF1      1
TotalBsmtSF     1
1stFlrSF        0
2ndFlrSF        0
GarageArea      1
LowQualFinSF    0
dtype: int64

In [38]:
hpr_1 = H2ORandomForestEstimator(model_id='housing_price_regression', seed=1)
hpr_1.train(x=all_cols_in_model, 
            y=dep_var_col, 
            training_frame=get_h2o_frame_with_rel_factors(train_validation_data))

Parse progress: |█████████████████████████████████████████████████████████| 100%
drf Model Build progress: |███████████████████████████████████████████████| 100%


In [39]:
predict_out = hpr_1.predict(
    get_h2o_frame_with_rel_factors(test_data))                
test_data['Predictions'] = predict_out.as_data_frame()['predict'].values.tolist()

Parse progress: |█████████████████████████████████████████████████████████| 100%
drf prediction progress: |████████████████████████████████████████████████| 100%
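
The `Predictions` column holds predicted log prices. A minimal sketch of turning them back into prices, assuming `LogSalePrice` is the natural log of `SalePrice` (if the cleaning step used a different transform such as `log1p`, the inverse changes accordingly); the sample values are hypothetical.

```python
import math

# Hypothetical predicted log prices, for illustration only.
predicted_log_prices = [11.8, 12.2, 12.6]

# Invert the natural log to recover prices in the original units.
predicted_prices = [math.exp(p) for p in predicted_log_prices]

for log_price, price in zip(predicted_log_prices, predicted_prices):
    print(f"{log_price:.1f} -> {price:,.0f}")
```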




#### Cross validation

In [40]:
# Do a 5-fold cross validation (nfolds=5).
hpr_cross_val = H2ORandomForestEstimator(model_id='housing_price_regression', 
                                         seed=1, 
                                         nfolds=5,
                                         keep_cross_validation_predictions=True)
hpr_cross_val.train(x=all_cols_in_model, 
                    y=dep_var_col, 
                    training_frame=get_h2o_frame_with_rel_factors(train_validation_data))


Parse progress: |█████████████████████████████████████████████████████████| 100%
drf Model Build progress: |███████████████████████████████████████████████| 100%
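
In k-fold cross validation, every row is assigned to exactly one fold, and each fold is held out once while a model is trained on the remaining rows. A minimal sketch of that bookkeeping in plain Python, assuming a simple round-robin assignment (h2o chooses its own fold assignment, which may differ); `assign_folds` and `train_holdout_indices` are illustrative names.

```python
def assign_folds(n_rows, n_folds):
    """Assign each row index to a fold in round-robin order."""
    return [row % n_folds for row in range(n_rows)]


def train_holdout_indices(fold_ids, fold):
    """Split row indices into training rows and held-out rows for one fold."""
    train = [i for i, f in enumerate(fold_ids) if f != fold]
    holdout = [i for i, f in enumerate(fold_ids) if f == fold]
    return train, holdout


fold_ids = assign_folds(n_rows=10, n_folds=5)
for fold in range(5):
    train, holdout = train_holdout_indices(fold_ids, fold)
    print(fold, holdout)  # each row is held out exactly once
```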


In [41]:
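# Note: without call parentheses this displays the bound method, whose repr
# includes the model summary; call cross_validation_predictions() to get the frames.
hpr_cross_val.cross_validation_predictions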

Model Details
H2ORandomForestEstimator :  Distributed Random Forest
Model Key:  housing_price_regression


ModelMetricsRegression: drf
** Reported on train data. **

MSE: 0.019799110697422875
RMSE: 0.1407093127601115
MAE: 0.09757718828825032
RMSLE: 0.01093486090327993
Mean Residual Deviance: 0.019799110697422875

ModelMetricsRegression: drf
** Reported on cross-validation data. **

MSE: 0.01938395105795873
RMSE: 0.13922625850736178
MAE: 0.09577830267605716
RMSLE: 0.01083603165391223
Mean Residual Deviance: 0.01938395105795873
Cross-Validation Metrics Summary: 


| | mean | sd | cv_1_valid | cv_2_valid | cv_3_valid | cv_4_valid | cv_5_valid |
|---|---|---|---|---|---|---|---|
| mae | 0.0956881 | 0.0058530 | 0.1084217 | 0.0829694 | 0.0954006 | 0.0988700 | 0.0927787 |
| mean_residual_deviance | 0.0193422 | 0.0029547 | 0.0252202 | 0.0125499 | 0.0191893 | 0.0216115 | 0.0181401 |
| mse | 0.0193422 | 0.0029547 | 0.0252202 | 0.0125499 | 0.0191893 | 0.0216115 | 0.0181401 |
| r2 | 0.8790135 | 0.0117323 | 0.8703818 | 0.8985824 | 0.8720775 | 0.8562414 | 0.8977843 |
| residual_deviance | 0.0193422 | 0.0029547 | 0.0252202 | 0.0125499 | 0.0191893 | 0.0216115 | 0.0181401 |
| rmse | 0.1382107 | 0.0109537 | 0.1588085 | 0.1120262 | 0.1385254 | 0.1470084 | 0.134685 |
| rmsle | 0.0107523 | 0.0008834 | 0.0123688 | 0.0086034 | 0.0107532 | 0.0114827 | 0.0105534 |


Scoring History: 


| | timestamp | duration | number_of_trees | training_rmse | training_mae | training_deviance |
|---|---|---|---|---|---|---|
| | 2019-04-09 09:51:37 | 1.522 sec | 0.0 | | | |
| | 2019-04-09 09:51:37 | 1.537 sec | 1.0 | 0.1954341 | 0.1397954 | 0.0381945 |
| | 2019-04-09 09:51:37 | 1.548 sec | 2.0 | 0.1908295 | 0.1352168 | 0.0364159 |
| | 2019-04-09 09:51:37 | 1.559 sec | 3.0 | 0.1815573 | 0.1314005 | 0.0329631 |
| | 2019-04-09 09:51:37 | 1.570 sec | 4.0 | 0.1760658 | 0.1279177 | 0.0309992 |
| ... | ... | ... | ... | ... | ... | ... |
| | 2019-04-09 09:51:38 | 2.016 sec | 46.0 | 0.1414475 | 0.0981634 | 0.0200074 |
| | 2019-04-09 09:51:38 | 2.027 sec | 47.0 | 0.1413494 | 0.0980232 | 0.0199797 |
| | 2019-04-09 09:51:38 | 2.037 sec | 48.0 | 0.1411098 | 0.0977282 | 0.0199120 |



See the whole table with table.as_data_frame()
Variable Importances: 


| variable | relative_importance | scaled_importance | percentage |
|---|---|---|---|
| OverallQual | 2756.4650879 | 1.0 | 0.3048628 |
| Neighborhood | 2168.5063477 | 0.7866983 | 0.2398350 |
| GrLivArea | 1208.7586670 | 0.4385177 | 0.1336877 |
| TotalBsmtSF | 714.8059692 | 0.2593198 | 0.0790570 |
| GarageArea | 542.7357788 | 0.1968956 | 0.0600261 |
| 1stFlrSF | 459.9797363 | 0.1668730 | 0.0508734 |
| YearRemodAdd | 366.4526367 | 0.1329430 | 0.0405294 |
| MSSubClass | 345.7257690 | 0.1254236 | 0.0382370 |
| BsmtFinSF1 | 175.9045105 | 0.0638153 | 0.0194549 |

<bound method ModelBase.cross_validation_predictions of >
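
In the variable importance table above, `scaled_importance` is each variable's `relative_importance` divided by the largest one. A minimal sketch that recomputes it for the first three rows of the table:

```python
# relative_importance values copied from the first rows of the table above.
relative_importance = {
    "OverallQual": 2756.4650879,
    "Neighborhood": 2168.5063477,
    "GrLivArea": 1208.7586670,
}

largest = max(relative_importance.values())
scaled_importance = {
    name: value / largest for name, value in relative_importance.items()
}

for name, value in scaled_importance.items():
    print(f"{name}: {value:.7f}")
```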

In [42]:
def get_cross_validated_rmse(hpr_cross_val):
    # One prediction frame per fold; summing them combines the per-fold
    # holdout predictions into a single column of out-of-fold predictions.
    cv_preds = hpr_cross_val.cross_validation_predictions()
    result_cv = cv_preds[0]['predict'].as_data_frame().copy()
    for fold_preds in cv_preds[1:]:
        result_cv += fold_preds['predict'].as_data_frame()
    return evaluate_model_score_given_predictions(result_cv, train_validation_data['LogSalePrice'])

In [43]:
get_cross_validated_rmse(hpr_cross_val)

0.13922626299526575
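
The summation in `get_cross_validated_rmse` relies on each fold's prediction frame covering every row, with a prediction on that fold's held-out rows and zero elsewhere, so that adding the frames yields one out-of-fold prediction per row. This appears to be how h2o lays the frames out, given that the result matches the cross-validation RMSE reported above. A minimal sketch with plain lists and hypothetical numbers:

```python
# Hypothetical per-fold predictions for 6 rows and 3 folds:
# each fold predicts only its own held-out rows and has 0.0 elsewhere.
fold_predictions = [
    [12.1, 0.0, 0.0, 11.7, 0.0, 0.0],  # fold 0 held out rows 0 and 3
    [0.0, 11.9, 0.0, 0.0, 12.4, 0.0],  # fold 1 held out rows 1 and 4
    [0.0, 0.0, 12.0, 0.0, 0.0, 11.5],  # fold 2 held out rows 2 and 5
]

# Element-wise sum across folds gives one out-of-fold prediction per row.
out_of_fold = [sum(row_values) for row_values in zip(*fold_predictions)]
print(out_of_fold)  # [12.1, 11.9, 12.0, 11.7, 12.4, 11.5]
```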

#### Try somewhat of a greedy method to select columns

In [47]:
import operator
def get_cross_val_scores_new_col(base_model_cols):
    columns_to_cross_val_score = dict()
    for col in ALL_COLUMNS:
        cur_model_cols = base_model_cols + [col]
        print(cur_model_cols)

        # Do a 5-fold cross validation (nfolds=5).
        hpr_cross_val = H2ORandomForestEstimator(model_id='housing_price_regression', 
                                                 seed=1, 
                                                 nfolds=5,
                                                 keep_cross_validation_predictions=True)
        hpr_cross_val.train(x=cur_model_cols, 
                            y=dep_var_col, 
                            training_frame=get_h2o_frame_with_rel_factors(train_validation_data))

        cv_score = get_cross_validated_rmse(hpr_cross_val)

        columns_to_cross_val_score[col] = cv_score
    
    sorted_cross_val_scores = sorted(columns_to_cross_val_score.items(), key=operator.itemgetter(1))
    return sorted_cross_val_scores
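
The function above scores every candidate column once. Repeating it, each time keeping the best column, gives greedy forward selection. A minimal sketch of that loop with a pluggable scoring function; `forward_select`, `score_fn`, and the toy scores are illustrative stand-ins for the cross-validated RMSE, and unlike the function above the loop skips columns that are already in the model.

```python
def forward_select(candidates, score_fn, max_cols=None):
    """Greedily add the column that lowers the score most; stop when none helps."""
    selected = []
    best_score = float("inf")
    while max_cols is None or len(selected) < max_cols:
        remaining = [c for c in candidates if c not in selected]
        if not remaining:
            break
        # Score each remaining column added to the current selection.
        scores = {col: score_fn(selected + [col]) for col in remaining}
        best_col = min(scores, key=scores.get)
        if scores[best_col] >= best_score:
            break  # no remaining column improves the score
        selected.append(best_col)
        best_score = scores[best_col]
    return selected, best_score


# Toy scoring function (hypothetical): each useful column lowers the error,
# and every column adds a small complexity penalty.
GAINS = {"OverallQual": 0.10, "Neighborhood": 0.05, "Street": 0.0}


def toy_score(cols):
    return 0.40 - sum(GAINS[c] for c in cols) + 0.01 * len(cols)


print(forward_select(list(GAINS), toy_score))
```

With the toy scores, the loop picks `OverallQual`, then `Neighborhood`, and stops because adding `Street` would raise the score.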

In [48]:
sorted_cross_val_scores = get_cross_val_scores_new_col([])

['MSSubClass']
Parse progress: |█████████████████████████████████████████████████████████| 100%
drf Model Build progress: |███████████████████████████████████████████████| 100%
['MSZoning']
Parse progress: |█████████████████████████████████████████████████████████| 100%
drf Model Build progress: |███████████████████████████████████████████████| 100%
['Street']
Parse progress: |█████████████████████████████████████████████████████████| 100%
drf Model Build progress: |███████████████████████████████████████████████| 100%
['LotShape']
Parse progress: |█████████████████████████████████████████████████████████| 100%
drf Model Build progress: |███████████████████████████████████████████████| 100%
['LandContour']
Parse progress: |█████████████████████████████████████████████████████████| 100%
drf Model Build progress: |███████████████████████████████████████████████| 100%
['Utilities']
Parse progress: |█████████████████████████████████████████████████████████| 100%
drf Model Build progress: |

Parse progress: |█████████████████████████████████████████████████████████| 100%
drf Model Build progress: |███████████████████████████████████████████████| 100%
['OverallCond']
Parse progress: |█████████████████████████████████████████████████████████| 100%
drf Model Build progress: |███████████████████████████████████████████████| 100%
['YearBuilt']
Parse progress: |█████████████████████████████████████████████████████████| 100%
drf Model Build progress: |███████████████████████████████████████████████| 100%
['YearRemodAdd']
Parse progress: |█████████████████████████████████████████████████████████| 100%
drf Model Build progress: |███████████████████████████████████████████████| 100%
['MasVnrArea_times_not_missing']
Parse progress: |█████████████████████████████████████████████████████████| 100%
drf Model Build progress: |███████████████████████████████████████████████| 100%
['MasVnrArea_not_missing']
Parse progress: |█████████████████████████████████████████████████████████| 100%
dr

In [None]:
sorted_cross_val_scores

[('OverallQual', 0.22982421832125344),
 ('Neighborhood', 0.2661876508712513),
 ('GarageCars', 0.2863561789819541),
 ('ExterQual', 0.2941283886407502),
 ('BsmtQual', 0.29631406694885337),
 ('KitchenQual', 0.2975537585280438),
 ('GarageArea', 0.2992157116026678),
 ('YearBuilt', 0.31028553626759886),
 ('GrLivArea', 0.3110556632966484),
 ('GarageFinish', 0.3151021633979988),
 ('GarageYrBlt_times_not_missing', 0.31954443950602),
 ('FullBath', 0.32034100136445326),
 ('GarageType', 0.3275768023883375),
 ('YearRemodAdd', 0.3304578802248884),
 ('MSSubClass', 0.3304683876637387),
 ('FireplaceQu', 0.3340680181155671),
 ('Foundation', 0.3348166520747213),
 ('TotRmsAbvGrd', 0.33798477079298456),
 ('TotalBsmtSF', 0.3416650172863745),
 ('Fireplaces', 0.34385830432543923),
 ('HeatingQC', 0.350979295322223),
 ('BsmtFinType1', 0.358082663943684),
 ('1stFlrSF', 0.3586992985754942),
 ('MasVnrType', 0.3597555919081454),
 ('MSZoning', 0.3637349598674163),
 ('2ndFlrSF', 0.3661242254067586),
 ('Exterior1st', 

In [None]:
sorted_cross_val_scores = get_cross_val_scores_new_col(['OverallQual'])

['OverallQual', 'MSSubClass']
Parse progress: |█████████████████████████████████████████████████████████| 100%
drf Model Build progress: |███████████████████████████████████████████████| 100%
['OverallQual', 'MSZoning']
Parse progress: |█████████████████████████████████████████████████████████| 100%
drf Model Build progress: |███████████████████████████████████████████████| 100%
['OverallQual', 'Street']
Parse progress: |█████████████████████████████████████████████████████████| 100%
drf Model Build progress: |███████████████████████████████████████████████| 100%
['OverallQual', 'LotShape']
Parse progress: |█████████████████████████████████████████████████████████| 100%
drf Model Build progress: |███████████████████████████████████████████████| 100%
['OverallQual', 'LandContour']
Parse progress: |█████████████████████████████████████████████████████████| 100%
drf Model Build progress: |███████████████████████████████████████████████| 100%
['OverallQual', 'Utilities']
Parse progress: |█

drf Model Build progress: |███████████████████████████████████████████████| 100%
['OverallQual', 'SaleType']
Parse progress: |█████████████████████████████████████████████████████████| 100%
drf Model Build progress: |███████████████████████████████████████████████| 100%
['OverallQual', 'SaleCondition']
Parse progress: |█████████████████████████████████████████████████████████| 100%
drf Model Build progress: |███████████████████████████████████████████████| 100%
['OverallQual', 'LotArea']
Parse progress: |█████████████████████████████████████████████████████████| 100%
drf Model Build progress: |███████████████████████████████████████████████| 100%
['OverallQual', 'OverallQual']
Parse progress: |█████████████████████████████████████████████████████████| 100%
drf Model Build progress: |███████████████████████████████████████████████| 100%
['OverallQual', 'OverallCond']
Parse progress: |█████████████████████████████████████████████████████████| 100%
drf Model Build progress: |█████████████

In [None]:
sorted_cross_val_scores