# Chapter 5 - Ensemble machine learning, deep learning

2022 February 23

![kandc](img/kandc.jpg)

[Texas Monthly, Music Monday: Uncovering The Mystery Of The King & Carter Jazzing Orchestra](https://www.texasmonthly.com/the-daily-post/music-monday-uncovering-the-mystery-of-the-king-carter-jazzing-orchestra/)

## Ensemble machine learning

"Ensemble machine learning methods use multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms." [H2O.ai ensemble example](https://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-science/stacked-ensembles.html)

SuperLearner ensembles are powerful tools because they:
* elucidate issues of algorithmic bias and variance
* circumvent bias introduced by selecting single models
* offer a means to optimize prediction through the stacking/blending of weaker models
* allow for comparison of multiple algorithms, and/or comparison of the same model but tuned in many different ways
* use a second-level algorithm (a metalearner) that learns a weighted combination of the base-model predictions, makes few assumptions about how the data are distributed, and uses cross-validation to limit overfitting
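
The stacking idea in the list above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the SuperLearner or H2O implementation: the toy data, the two single-feature "stump" base learners, the modulo fold assignment, and the grid search over a single blending weight are all made up for the example.

```python
import random

def make_data(n, seed=1):
    """Hypothetical toy data: y depends on x0 and x1 plus noise."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x0, x1 = rng.uniform(-1, 1), rng.uniform(-1, 1)
        y = 1 if x0 + 0.5 * x1 + rng.gauss(0, 0.3) > 0 else 0
        rows.append(((x0, x1), y))
    return rows

def fit_stump(rows, feature):
    """Weak base learner: mean of y on each side of feature == 0."""
    left = [y for x, y in rows if x[feature] <= 0]
    right = [y for x, y in rows if x[feature] > 0]
    p_left = sum(left) / len(left) if left else 0.5
    p_right = sum(right) / len(right) if right else 0.5
    return lambda x: p_right if x[feature] > 0 else p_left

def cross_val_predictions(rows, fitters, nfolds=5):
    """Level-one data: each row is predicted by models that never saw it."""
    z = [None] * len(rows)
    for k in range(nfolds):
        train = [r for i, r in enumerate(rows) if i % nfolds != k]
        models = [fit(train) for fit in fitters]
        for i, (x, _) in enumerate(rows):
            if i % nfolds == k:  # fold = row index modulo nfolds
                z[i] = [m(x) for m in models]
    return z

def mse(preds, ys):
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)

def fit_metalearner(z, ys, steps=100):
    """Pick the convex weight w that minimizes cross-validated MSE."""
    best_w, best_loss = 0.0, float("inf")
    for s in range(steps + 1):
        w = s / steps
        loss = mse([w * a + (1 - w) * b for a, b in z], ys)
        if loss < best_loss:
            best_w, best_loss = w, loss
    return best_w, best_loss

rows = make_data(500)
ys = [y for _, y in rows]
fitters = [lambda r: fit_stump(r, 0), lambda r: fit_stump(r, 1)]

z = cross_val_predictions(rows, fitters)
w, ensemble_loss = fit_metalearner(z, ys)
base_losses = [mse([zi[j] for zi in z], ys) for j in range(2)]

# Refit the base learners on all rows and blend them with the learned weight
full_models = [fit(rows) for fit in fitters]

def ensemble_predict(x):
    return w * full_models[0](x) + (1 - w) * full_models[1](x)

print("weight on learner 0:", w)
print("base CV MSEs:", base_losses, "ensemble CV MSE:", ensemble_loss)
```

Because the weights 0 and 1 are both on the search grid, the blended cross-validated error can never be worse than that of the better base learner on the same level-one data; whether it is meaningfully better depends on how different the base learners are.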

The example below uses the h2o package, which requires Java to be installed on your machine.
* install Java: https://www.java.com/en/download/help/mac_install.html
* h2o SuperLearner example: https://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-science/stacked-ensembles.html

Check out some other great tutorials: 
* Python mlens library: https://mlens.readthedocs.io/en/0.1.x/install/
* Machine Learning Mastery: https://machinelearningmastery.com/super-learner-ensemble-in-python/
* KDNuggets: https://www.kdnuggets.com/2018/02/introduction-python-ensembles.html/2#comments

The quintessential R guide: 
* Guide to SuperLearner: https://cran.r-project.org/web/packages/SuperLearner/vignettes/Guide-to-SuperLearner.html

Read the papers: 
* [Van der Laan, M.J.; Polley, E.C.; Hubbard, A.E. Super Learner. Stat. Appl. Genet. Mol. Biol. 2007, 6, 1–21.](https://www.degruyter.com/document/doi/10.2202/1544-6115.1309/html)
* [Polley, E.C.; van der Laan, M.J. Super Learner in Prediction, UC Berkeley Division of Biostatistics Working Paper Series Paper 266.](https://biostats.bepress.com/ucbbiostat/paper266)

## H2O SuperLearner ensemble

In [49]:
# !pip install h2o

# Requires install of Java
# https://www.java.com/en/download/help/mac_install.html

In [21]:
import h2o
from h2o.estimators.random_forest import H2ORandomForestEstimator
from h2o.estimators.gbm import H2OGradientBoostingEstimator
from h2o.estimators.stackedensemble import H2OStackedEnsembleEstimator
from h2o.grid.grid_search import H2OGridSearch
h2o.init()

Checking whether there is an H2O instance running at http://localhost:54321 . connected.


| Property | Value |
|---|---|
| H2O_cluster_uptime: | 3 mins 11 secs |
| H2O_cluster_timezone: | America/Los_Angeles |
| H2O_data_parsing_timezone: | UTC |
| H2O_cluster_version: | 3.34.0.1 |
| H2O_cluster_version_age: | 5 months and 8 days !!! |
| H2O_cluster_name: | H2O_from_python_evanmuzzall_gwre63 |
| H2O_cluster_total_nodes: | 1 |
| H2O_cluster_free_memory: | 3.549 Gb |
| H2O_cluster_total_cores: | 16 |
| H2O_cluster_allowed_cores: | 16 |


In [22]:
# Import a sample binary outcome train/test set into H2O
train = h2o.import_file("https://s3.amazonaws.com/erin-data/higgs/higgs_train_10k.csv")
test = h2o.import_file("https://s3.amazonaws.com/erin-data/higgs/higgs_test_5k.csv")

Parse progress: |████████████████████████████████████████████████████████████████| (done) 100%
Parse progress: |████████████████████████████████████████████████████████████████| (done) 100%


In [3]:
train

response,x1,x2,x3,x4,x5,x6,x7,x8,x9,x10,x11,x12,x13,x14,x15,x16,x17,x18,x19,x20,x21,x22,x23,x24,x25,x26,x27,x28
1,0.869293,-0.635082,0.22569,0.32747,-0.689993,0.754202,-0.248573,-1.09206,0.0,1.37499,-0.653674,0.930349,1.10744,1.1389,-1.5782,-1.04699,0.0,0.65793,-0.0104546,-0.0457672,3.10196,1.35376,0.979563,0.978076,0.920005,0.721657,0.988751,0.876678
1,0.907542,0.329147,0.359412,1.49797,-0.31301,1.09553,-0.557525,-1.58823,2.17308,0.812581,-0.213642,1.27101,2.21487,0.499994,-1.26143,0.732156,0.0,0.398701,-1.13893,-0.00081911,0.0,0.30222,0.833048,0.9857,0.978098,0.779732,0.992356,0.798343
1,0.798835,1.47064,-1.63597,0.453773,0.425629,1.10487,1.28232,1.38166,0.0,0.851737,1.54066,-0.81969,2.21487,0.99349,0.35608,-0.208778,2.54822,1.25695,1.12885,0.900461,0.0,0.909753,1.10833,0.985692,0.951331,0.803252,0.865924,0.780118
0,1.34438,-0.876626,0.935913,1.99205,0.882454,1.78607,-1.64678,-0.942383,0.0,2.42326,-0.676016,0.736159,2.21487,1.29872,-1.43074,-0.364658,0.0,0.745313,-0.678379,-1.36036,0.0,0.946652,1.0287,0.998656,0.728281,0.8692,1.02674,0.957904
1,1.10501,0.321356,1.5224,0.882808,-1.20535,0.681466,-1.07046,-0.921871,0.0,0.800872,1.02097,0.971407,2.21487,0.596761,-0.350273,0.631194,0.0,0.479999,-0.373566,0.113041,0.0,0.755856,1.36106,0.98661,0.838085,1.1333,0.872245,0.808487
0,1.59584,-0.607811,0.00707492,1.81845,-0.111906,0.84755,-0.566437,1.58124,2.17308,0.755421,0.64311,1.42637,0.0,0.921661,-1.19043,-1.61559,0.0,0.651114,-0.654227,-1.27434,3.10196,0.823761,0.938191,0.971758,0.789176,0.430553,0.961357,0.957818
1,0.409391,-1.88468,-1.02729,1.67245,-1.6046,1.33801,0.0554274,0.0134659,2.17308,0.509783,-1.03834,0.707862,0.0,0.746918,-0.358465,-1.64665,0.0,0.367058,0.0694965,1.37713,3.10196,0.869418,1.22208,1.00063,0.545045,0.698653,0.977314,0.828786
1,0.933895,0.62913,0.527535,0.238033,-0.966569,0.547811,-0.0594392,-1.70687,2.17308,0.941003,-2.65373,-0.15722,0.0,1.03037,-0.175505,0.523021,2.54822,1.37355,1.29125,-1.46745,0.0,0.901837,1.08367,0.979696,0.7833,0.849195,0.894356,0.774879
1,1.40514,0.536603,0.689554,1.17957,-0.110061,3.2024,-1.52696,-1.57603,0.0,2.93154,0.567342,-0.130033,2.21487,1.78712,0.899499,0.585151,2.54822,0.401865,-0.151202,1.16349,0.0,1.66707,4.03927,1.17583,1.04535,1.54297,3.53483,2.74075
1,1.17657,0.104161,1.397,0.479721,0.265513,1.13556,1.53483,-0.253291,0.0,1.02725,0.534316,1.18002,0.0,2.40566,0.0875568,-0.976534,2.54822,1.25038,0.268541,0.530334,0.0,0.833175,0.773968,0.98575,1.1037,0.84914,0.937104,0.812364




In [23]:
print(train.shape)
print(test.shape)

(10000, 29)
(5000, 29)


In [24]:
# Identify predictors and response
x = train.columns
y = "response"
x.remove(y)

In [25]:
# For binary classification, response should be a factor
train[y] = train[y].asfactor()
test[y] = test[y].asfactor()

In [26]:
# Number of CV folds (to generate level-one data for stacking)
nfolds = 5

In [27]:
# There are a few ways to assemble a list of models to stack together:
# 1. Train individual models and put them in a list
# 2. Train a grid of models
# 3. Train several grids of models
# Note: All base models must have the same cross-validation folds and
# the cross-validated predicted values must be kept.


# 1. Generate a 2-model ensemble (GBM + RF)

# Train and cross-validate a GBM
my_gbm = H2OGradientBoostingEstimator(distribution="bernoulli",
                                      ntrees=10,
                                      max_depth=3,
                                      min_rows=2,
                                      learn_rate=0.2,
                                      nfolds=nfolds,
                                      fold_assignment="Modulo",
                                      keep_cross_validation_predictions=True,
                                      seed=1)
my_gbm.train(x=x, y=y, training_frame=train)

gbm Model Build progress: |██████████████████████████████████████████████████████| (done) 100%
Model Details
H2OGradientBoostingEstimator :  Gradient Boosting Machine
Model Key:  GBM_model_python_1645639753735_1


Model Summary: 


| number_of_trees | number_of_internal_trees | model_size_in_bytes | min_depth | max_depth | mean_depth | min_leaves | max_leaves | mean_leaves |
|---|---|---|---|---|---|---|---|---|
| 10.0 | 10.0 | 1580.0 | 3.0 | 3.0 | 3.0 | 8.0 | 8.0 | 8.0 |




ModelMetricsBinomial: gbm
** Reported on train data. **

MSE: 0.20052170746266884
RMSE: 0.44779650228945383
LogLoss: 0.5879177464092424
Mean Per-Class Error: 0.29613223631461116
AUC: 0.7735466157694937
AUCPR: 0.7909110775032231
Gini: 0.5470932315389874

Confusion Matrix (Act/Pred) for max f1 @ threshold = 0.4424199442800167: 


| Actual / Predicted | 0 | 1 | Error | Rate |
|---|---|---|---|---|
| 0 | 2433.0 | 2272.0 | 0.4829 | (2272.0/4705.0) |
| 1 | 818.0 | 4477.0 | 0.1545 | (818.0/5295.0) |
| Total | 3251.0 | 6749.0 | 0.309 | (3090.0/10000.0) |



Maximum Metrics: Maximum metrics at their respective thresholds


Unnamed: 0,metric,threshold,value,idx
0,max f1,0.44242,0.743441,244.0
1,max f2,0.335831,0.858813,324.0
2,max f0point5,0.542377,0.726119,166.0
3,max accuracy,0.5074,0.7042,192.0
4,max precision,0.790473,0.973684,3.0
5,max recall,0.158255,1.0,394.0
6,max specificity,0.803933,0.999575,0.0
7,max absolute_mcc,0.51933,0.407039,183.0
8,max min_per_class_accuracy,0.513347,0.703507,187.0
9,max mean_per_class_accuracy,0.51933,0.703868,183.0



Gains/Lift Table: Avg response rate: 52.95 %, avg score: 52.93 %


Unnamed: 0,group,cumulative_data_fraction,lower_threshold,lift,cumulative_lift,response_rate,score,cumulative_response_rate,cumulative_score,capture_rate,cumulative_capture_rate,gain,cumulative_gain,kolmogorov_smirnov
0,1,0.0165,0.788097,1.831345,1.831345,0.969697,0.794128,0.969697,0.794128,0.030217,0.030217,83.134461,83.134461,0.029154
1,2,0.0309,0.779039,1.770538,1.803008,0.9375,0.780161,0.954693,0.787619,0.025496,0.055713,77.053824,80.300766,0.052737
2,3,0.0403,0.769898,1.727844,1.785476,0.914894,0.776514,0.945409,0.785029,0.016242,0.071955,72.784441,78.547579,0.067279
3,4,0.052,0.766379,1.662591,1.757827,0.880342,0.767629,0.930769,0.781114,0.019452,0.091407,66.25909,75.782669,0.083756
4,5,0.1019,0.733553,1.680415,1.719918,0.88978,0.746516,0.910697,0.764171,0.083853,0.17526,68.041465,71.991834,0.155919
5,6,0.1501,0.703064,1.586872,1.677195,0.840249,0.720131,0.888075,0.750029,0.076487,0.251747,58.687245,67.719474,0.21604
6,7,0.21,0.672701,1.475547,1.619677,0.781302,0.68196,0.857619,0.730613,0.088385,0.340132,47.554706,61.967714,0.276583
7,8,0.3003,0.631813,1.294604,1.521928,0.685493,0.653431,0.805861,0.707405,0.116903,0.457035,29.460397,52.192787,0.333124
8,9,0.4003,0.582373,1.242682,1.452169,0.658,0.607055,0.768923,0.682336,0.124268,0.581303,24.268178,45.216866,0.384704
9,10,0.5,0.519662,1.102458,1.382436,0.583751,0.552458,0.732,0.656438,0.109915,0.691218,10.245751,38.243626,0.406415




ModelMetricsBinomial: gbm
** Reported on cross-validation data. **

MSE: 0.20591759601729728
RMSE: 0.4537814408030559
LogLoss: 0.5996635025616064
Mean Per-Class Error: 0.3109957762972908
AUC: 0.752800639024444
AUCPR: 0.7705701530928649
Gini: 0.5056012780488881

Confusion Matrix (Act/Pred) for max f1 @ threshold = 0.4159392211190996: 


| Actual / Predicted | 0 | 1 | Error | Rate |
|---|---|---|---|---|
| 0 | 1696.0 | 3009.0 | 0.6395 | (3009.0/4705.0) |
| 1 | 538.0 | 4757.0 | 0.1016 | (538.0/5295.0) |
| Total | 2234.0 | 7766.0 | 0.3547 | (3547.0/10000.0) |



Maximum Metrics: Maximum metrics at their respective thresholds


Unnamed: 0,metric,threshold,value,idx
0,max f1,0.415939,0.728428,268.0
1,max f2,0.279109,0.855982,346.0
2,max f0point5,0.561962,0.711702,154.0
3,max accuracy,0.517902,0.6888,186.0
4,max precision,0.835804,1.0,0.0
5,max recall,0.158348,1.0,392.0
6,max specificity,0.835804,1.0,0.0
7,max absolute_mcc,0.517902,0.377372,186.0
8,max min_per_class_accuracy,0.515693,0.687354,188.0
9,max mean_per_class_accuracy,0.519427,0.689004,185.0



Gains/Lift Table: Avg response rate: 52.95 %, avg score: 52.91 %


Unnamed: 0,group,cumulative_data_fraction,lower_threshold,lift,cumulative_lift,response_rate,score,cumulative_response_rate,cumulative_score,capture_rate,cumulative_capture_rate,gain,cumulative_gain,kolmogorov_smirnov
0,1,0.0101,0.789519,1.776382,1.776382,0.940594,0.800149,0.940594,0.800149,0.017941,0.017941,77.63816,77.63816,0.016666
1,2,0.0204,0.781367,1.815231,1.795997,0.961165,0.78539,0.95098,0.792698,0.018697,0.036638,81.523144,79.599696,0.034513
2,3,0.0335,0.77806,1.715575,1.764548,0.908397,0.779096,0.934328,0.787379,0.022474,0.059112,71.557497,76.454836,0.054436
3,4,0.0407,0.771218,1.678733,1.749367,0.888889,0.7742,0.92629,0.785047,0.012087,0.071199,67.873256,74.936719,0.064823
4,5,0.0523,0.764063,1.693204,1.73691,0.896552,0.766315,0.919694,0.780893,0.019641,0.09084,69.320439,73.691043,0.081914
5,6,0.101,0.72984,1.60936,1.675408,0.852156,0.745647,0.887129,0.763898,0.078376,0.169216,60.935988,67.540833,0.144987
6,7,0.1501,0.699633,1.565478,1.639448,0.828921,0.716835,0.868088,0.748503,0.076865,0.246081,56.547794,63.944843,0.203998
7,8,0.2,0.675508,1.426839,1.586402,0.755511,0.68607,0.84,0.732926,0.071199,0.31728,42.683857,58.640227,0.249268
8,9,0.3,0.628903,1.318225,1.49701,0.698,0.649476,0.792667,0.705109,0.131822,0.449103,31.822474,49.700976,0.316903
9,10,0.4,0.579228,1.21813,1.42729,0.645,0.605317,0.75575,0.680161,0.121813,0.570916,21.813031,42.72899,0.363265




Cross-Validation Metrics Summary: 


Unnamed: 0,Unnamed: 1,mean,sd,cv_1_valid,cv_2_valid,cv_3_valid,cv_4_valid,cv_5_valid
0,accuracy,0.662,0.011028,0.6605,0.659,0.6475,0.665,0.678
1,auc,0.753708,0.008455,0.767336,0.754202,0.749615,0.744636,0.752753
2,err,0.338,0.011028,0.3395,0.341,0.3525,0.335,0.322
3,err_count,676.0,22.056746,679.0,682.0,705.0,670.0,644.0
4,f0point5,0.667733,0.006292,0.664405,0.665501,0.66057,0.676335,0.671854
5,f1,0.732593,0.01129,0.734662,0.743802,0.733056,0.737666,0.713778
6,f2,0.812088,0.030635,0.821535,0.842984,0.823409,0.81123,0.761282
7,lift_top_group,1.770414,0.11952,1.897533,1.715529,1.677038,1.660517,1.901455
8,logloss,0.599664,0.005876,0.590319,0.597491,0.603376,0.604522,0.60261
9,max_per_class_error,0.581695,0.083577,0.597252,0.647312,0.644951,0.576419,0.44254



Scoring History: 


Unnamed: 0,Unnamed: 1,timestamp,duration,number_of_trees,training_rmse,training_logloss,training_auc,training_pr_auc,training_lift,training_classification_error
0,,2022-02-23 10:12:55,1.846 sec,0.0,0.499129,0.691406,0.5,0.5295,1.0,0.4705
1,,2022-02-23 10:12:55,1.925 sec,1.0,0.488236,0.669779,0.689191,0.67497,1.340835,0.3927
2,,2022-02-23 10:12:55,1.945 sec,2.0,0.479807,0.653146,0.719111,0.725388,1.54503,0.3958
3,,2022-02-23 10:12:55,1.962 sec,3.0,0.473698,0.641037,0.72465,0.731284,1.569032,0.3933
4,,2022-02-23 10:12:55,1.984 sec,4.0,0.467046,0.627816,0.746638,0.760063,1.683844,0.3692
5,,2022-02-23 10:12:55,2.003 sec,5.0,0.462994,0.619629,0.750899,0.766012,1.696473,0.3404
6,,2022-02-23 10:12:55,2.022 sec,6.0,0.458301,0.609973,0.756603,0.772386,1.733566,0.3515
7,,2022-02-23 10:12:55,2.043 sec,7.0,0.455569,0.604192,0.758101,0.773795,1.748075,0.3512
8,,2022-02-23 10:12:55,2.067 sec,8.0,0.452777,0.598335,0.763715,0.778986,1.754413,0.318
9,,2022-02-23 10:12:55,2.091 sec,9.0,0.449743,0.591929,0.768491,0.784821,1.776659,0.3207



Variable Importances: 


Unnamed: 0,variable,relative_importance,scaled_importance,percentage
0,x26,565.332336,1.0,0.410859
1,x28,204.503998,0.361741,0.148625
2,x27,189.230789,0.334725,0.137525
3,x23,124.527618,0.220273,0.090501
4,x6,121.551117,0.215008,0.088338
5,x25,99.598267,0.176176,0.072384
6,x4,44.359753,0.078467,0.032239
7,x10,18.008587,0.031855,0.013088
8,x22,3.660778,0.006475,0.00266
9,x18,3.097153,0.005478,0.002251



See the whole table with table.as_data_frame()
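
The training confusion matrix in the GBM output above can be checked by hand. The sketch below recomputes the per-class error rates, the overall error, and F1 from the four counts in that matrix; the variable names are illustrative and nothing here calls H2O.

```python
# Counts copied from the GBM training confusion matrix
# (threshold chosen by H2O to maximize F1)
tn, fp = 2433, 2272   # actual 0: predicted 0, predicted 1
fn, tp = 818, 4477    # actual 1: predicted 0, predicted 1

total = tn + fp + fn + tp

error_class0 = fp / (tn + fp)       # share of actual 0s predicted as 1
error_class1 = fn / (fn + tp)       # share of actual 1s predicted as 0
overall_error = (fp + fn) / total   # share of all rows misclassified

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(round(error_class0, 4), round(error_class1, 4), round(overall_error, 4))
print(round(f1, 6))
```

The rounded values agree with the Error column of the confusion matrix and with the "max f1" row of the training Maximum Metrics table.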




In [28]:
# Train and cross-validate a RF
my_rf = H2ORandomForestEstimator(ntrees=50,
                                 nfolds=nfolds,
                                 fold_assignment="Modulo",
                                 keep_cross_validation_predictions=True,
                                 seed=1)
my_rf.train(x=x, y=y, training_frame=train)

drf Model Build progress: |██████████████████████████████████████████████████████| (done) 100%
Model Details
H2ORandomForestEstimator :  Distributed Random Forest
Model Key:  DRF_model_python_1645639753735_129


Model Summary: 


| number_of_trees | number_of_internal_trees | model_size_in_bytes | min_depth | max_depth | mean_depth | min_leaves | max_leaves | mean_leaves |
|---|---|---|---|---|---|---|---|---|
| 50.0 | 50.0 | 943584.0 | 20.0 | 20.0 | 20.0 | 1402.0 | 1583.0 | 1496.78 |




ModelMetricsBinomial: drf
** Reported on train data. **

MSE: 0.20546981364562208
RMSE: 0.45328778236967965
LogLoss: 0.6008794722074892
Mean Per-Class Error: 0.3224386087972231
AUC: 0.7416906451357175
AUCPR: 0.7551298236228338
Gini: 0.48338129027143495

Confusion Matrix (Act/Pred) for max f1 @ threshold = 0.3888572089486607: 


| Actual / Predicted | 0 | 1 | Error | Rate |
|---|---|---|---|---|
| 0 | 1938.0 | 2767.0 | 0.5881 | (2767.0/4705.0) |
| 1 | 662.0 | 4633.0 | 0.125 | (662.0/5295.0) |
| Total | 2600.0 | 7400.0 | 0.3429 | (3429.0/10000.0) |



Maximum Metrics: Maximum metrics at their respective thresholds


Unnamed: 0,metric,threshold,value,idx
0,max f1,0.388857,0.729894,274.0
1,max f2,0.178865,0.852754,362.0
2,max f0point5,0.583101,0.697594,175.0
3,max accuracy,0.491198,0.6804,222.0
4,max precision,0.999849,1.0,0.0
5,max recall,4e-05,1.0,399.0
6,max specificity,0.999849,1.0,0.0
7,max absolute_mcc,0.491198,0.356604,222.0
8,max min_per_class_accuracy,0.526118,0.675027,205.0
9,max mean_per_class_accuracy,0.520683,0.677561,208.0



Gains/Lift Table: Avg response rate: 52.95 %, avg score: 52.91 %


Unnamed: 0,group,cumulative_data_fraction,lower_threshold,lift,cumulative_lift,response_rate,score,cumulative_response_rate,cumulative_score,capture_rate,cumulative_capture_rate,gain,cumulative_gain,kolmogorov_smirnov
0,1,0.01,0.945952,1.794145,1.794145,0.95,0.978593,0.95,0.978593,0.017941,0.017941,79.414542,79.414542,0.016879
1,2,0.0208,0.923077,1.713706,1.752379,0.907407,0.935488,0.927885,0.956212,0.018508,0.036449,71.370615,75.237888,0.033261
2,3,0.03,0.9,1.703822,1.737488,0.902174,0.909819,0.92,0.941984,0.015675,0.052125,70.382231,73.74882,0.047024
3,4,0.0404,0.882353,1.634343,1.710936,0.865385,0.890502,0.905941,0.928732,0.016997,0.069122,63.434299,71.093597,0.061045
4,5,0.05,0.867742,1.593484,1.688385,0.84375,0.874732,0.894,0.918364,0.015297,0.084419,59.348442,68.838527,0.073155
5,6,0.1,0.809095,1.567517,1.627951,0.83,0.835855,0.862,0.877109,0.078376,0.162795,56.751653,62.79509,0.133465
6,7,0.15,0.760766,1.507082,1.587661,0.798,0.783973,0.840667,0.846064,0.075354,0.238149,50.708215,58.766132,0.187352
7,8,0.2,0.72,1.401322,1.541076,0.742,0.738881,0.816,0.819268,0.070066,0.308215,40.1322,54.107649,0.230001
8,9,0.3022,0.647059,1.33235,1.470488,0.705479,0.681472,0.778623,0.772667,0.136166,0.444381,33.235024,47.048806,0.302192
9,10,0.4,0.585714,1.181807,1.399906,0.625767,0.615625,0.74125,0.734271,0.115581,0.559962,18.180712,39.990557,0.339983




ModelMetricsBinomial: drf
** Reported on cross-validation data. **

MSE: 0.19956810017797458
RMSE: 0.4467304558433134
LogLoss: 0.5833973417660576
Mean Per-Class Error: 0.30854313866569527
AUC: 0.7614513722267212
AUCPR: 0.777397370947399
Gini: 0.5229027444534424

Confusion Matrix (Act/Pred) for max f1 @ threshold = 0.382028770186007: 


| Actual / Predicted | 0 | 1 | Error | Rate |
|---|---|---|---|---|
| 0 | 1842.0 | 2863.0 | 0.6085 | (2863.0/4705.0) |
| 1 | 526.0 | 4769.0 | 0.0993 | (526.0/5295.0) |
| Total | 2368.0 | 7632.0 | 0.3389 | (3389.0/10000.0) |



Maximum Metrics: Maximum metrics at their respective thresholds


Unnamed: 0,metric,threshold,value,idx
0,max f1,0.382029,0.737836,289.0
1,max f2,0.204063,0.85472,367.0
2,max f0point5,0.571824,0.716855,182.0
3,max accuracy,0.505138,0.6903,220.0
4,max precision,0.977634,1.0,0.0
5,max recall,0.100147,1.0,391.0
6,max specificity,0.977634,1.0,0.0
7,max absolute_mcc,0.559896,0.384004,189.0
8,max min_per_class_accuracy,0.521841,0.68933,210.0
9,max mean_per_class_accuracy,0.55385,0.691457,192.0



Gains/Lift Table: Avg response rate: 52.95 %, avg score: 52.88 %


Unnamed: 0,group,cumulative_data_fraction,lower_threshold,lift,cumulative_lift,response_rate,score,cumulative_response_rate,cumulative_score,capture_rate,cumulative_capture_rate,gain,cumulative_gain,kolmogorov_smirnov
0,1,0.01,0.901617,1.77526,1.77526,0.94,0.929766,0.94,0.929766,0.017753,0.017753,77.525968,77.525968,0.016477
1,2,0.02,0.87734,1.77526,1.77526,0.94,0.88955,0.94,0.909658,0.017753,0.035505,77.525968,77.525968,0.032955
2,3,0.03,0.857939,1.737488,1.762669,0.92,0.86649,0.933333,0.895269,0.017375,0.05288,73.74882,76.266918,0.048629
3,4,0.0416,0.84,1.709485,1.747839,0.905172,0.848993,0.925481,0.882365,0.01983,0.07271,70.94852,74.783904,0.066121
4,5,0.05,0.83065,1.776159,1.752597,0.940476,0.835844,0.928,0.87455,0.01492,0.08763,77.6159,75.259679,0.079978
5,6,0.1,0.777339,1.650614,1.701605,0.874,0.801188,0.901,0.837869,0.082531,0.170161,65.061379,70.160529,0.149119
6,7,0.15,0.735883,1.552408,1.651873,0.822,0.754507,0.874667,0.810082,0.07762,0.247781,55.240793,65.187284,0.207823
7,8,0.2046,0.7,1.397406,1.583965,0.739927,0.715986,0.83871,0.784971,0.076298,0.324079,39.74065,58.39654,0.253941
8,9,0.3041,0.64,1.359014,1.510362,0.719598,0.668698,0.799737,0.746927,0.135222,0.459301,35.901415,51.036247,0.329864
9,10,0.4,0.581753,1.219007,1.44051,0.645464,0.610507,0.76275,0.71422,0.116903,0.576204,21.900666,44.050992,0.374504




Cross-Validation Metrics Summary: 


Unnamed: 0,Unnamed: 1,mean,sd,cv_1_valid,cv_2_valid,cv_3_valid,cv_4_valid,cv_5_valid
0,accuracy,0.6648,0.011719,0.6725,0.669,0.644,0.6695,0.669
1,auc,0.761771,0.006352,0.767895,0.759037,0.751985,0.764674,0.765263
2,err,0.3352,0.011719,0.3275,0.331,0.356,0.3305,0.331
3,err_count,670.4,23.43715,655.0,662.0,712.0,661.0,662.0
4,f0point5,0.668341,0.009353,0.673882,0.672582,0.656944,0.678371,0.659929
5,f1,0.74028,0.00814,0.740388,0.75,0.737463,0.745083,0.728466
6,f2,0.829747,0.014114,0.82146,0.847559,0.840477,0.826347,0.812889
7,lift_top_group,1.778484,0.163896,1.802657,1.869159,1.668211,1.568266,1.984127
8,logloss,0.583397,0.004707,0.576402,0.583977,0.589635,0.58394,0.583033
9,max_per_class_error,0.604207,0.055881,0.565539,0.629032,0.687296,0.592795,0.546371



Scoring History: 


Unnamed: 0,Unnamed: 1,timestamp,duration,number_of_trees,training_rmse,training_logloss,training_auc,training_pr_auc,training_lift,training_classification_error
0,,2022-02-23 10:13:13,10.307 sec,0.0,,,,,,
1,,2022-02-23 10:13:13,10.337 sec,1.0,0.642818,14.048395,0.577384,0.589593,1.144218,0.464917
2,,2022-02-23 10:13:13,10.369 sec,2.0,0.615404,11.895281,0.595767,0.598674,1.167648,0.470126
3,,2022-02-23 10:13:13,10.403 sec,3.0,0.596109,10.322973,0.606984,0.606263,1.188141,0.471944
4,,2022-02-23 10:13:13,10.435 sec,4.0,0.578952,8.963477,0.617051,0.613281,1.205225,0.47326
5,,2022-02-23 10:13:13,10.468 sec,5.0,0.566757,7.823122,0.621447,0.619131,1.222342,0.47265
6,,2022-02-23 10:13:13,10.500 sec,6.0,0.555494,6.782681,0.625539,0.623303,1.235544,0.472576
7,,2022-02-23 10:13:13,10.531 sec,7.0,0.542668,5.762143,0.63419,0.632386,1.256193,0.470778
8,,2022-02-23 10:13:13,10.565 sec,8.0,0.534223,5.001247,0.637938,0.63483,1.257749,0.470999
9,,2022-02-23 10:13:13,10.599 sec,9.0,0.52593,4.32727,0.643458,0.638581,1.259426,0.41698



See the whole table with table.as_data_frame()

Variable Importances: 


Unnamed: 0,variable,relative_importance,scaled_importance,percentage
0,x26,7803.849121,1.0,0.101371
1,x27,4741.325195,0.607562,0.061589
2,x28,4625.907227,0.592773,0.06009
3,x23,3613.875244,0.463089,0.046944
4,x25,3548.924072,0.454766,0.0461
5,x6,3477.234863,0.445579,0.045169
6,x4,3103.802002,0.397727,0.040318
7,x1,2902.952637,0.37199,0.037709
8,x10,2854.255615,0.36575,0.037077
9,x7,2711.995605,0.34752,0.035229



See the whole table with table.as_data_frame()




In [29]:
# Train a stacked ensemble using the GBM and RF above
ensemble = H2OStackedEnsembleEstimator(model_id="my_ensemble_binomial",
                                       base_models=[my_gbm, my_rf])
ensemble.train(x=x, y=y, training_frame=train)

# Eval ensemble performance on the test data
perf_stack_test = ensemble.model_performance(test)

stackedensemble Model Build progress: |██████████████████████████████████████████| (done) 100%


In [30]:
# Compare to base learner performance on the test set
perf_gbm_test = my_gbm.model_performance(test)
perf_rf_test = my_rf.model_performance(test)
baselearner_best_auc_test = max(perf_gbm_test.auc(), perf_rf_test.auc())
stack_auc_test = perf_stack_test.auc()
print("Best Base-learner Test AUC:  {0}".format(baselearner_best_auc_test))
print("Ensemble Test AUC:  {0}".format(stack_auc_test))

Best Base-learner Test AUC:  0.769204725074508
Ensemble Test AUC:  0.7731183158978566
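
The AUC values compared above have a simple interpretation: AUC is the probability that a randomly chosen positive row receives a higher score than a randomly chosen negative row. The following sketch computes it by brute force over all positive/negative pairs. The `auc` helper and the five scores are hypothetical and only illustrate the definition; H2O computes the metric differently in practice.

```python
def auc(scores, labels):
    """Share of (positive, negative) pairs ranked correctly; ties count as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical predicted probabilities and true labels
labels = [1, 1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.2]

print(auc(scores, labels))             # 5 of 6 pairs are ranked correctly
print(auc([0.5] * 5, labels))          # a constant score carries no ranking information

# The Gini value in the H2O output is a rescaled AUC: Gini = 2 * AUC - 1
gbm_train_auc = 0.7735466157694937     # copied from the GBM training metrics
gini = 2 * gbm_train_auc - 1
print(gini)
```

The last lines reproduce the Gini value reported next to the GBM training AUC, which is one quick way to sanity-check how the reported metrics relate to each other.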


In [31]:
# Generate predictions on a test set (if necessary)
pred = ensemble.predict(test)


# 2. Generate a random grid of models and stack them together

# Specify GBM hyperparameters for the grid
hyper_params = {"learn_rate": [0.01, 0.03],
                "max_depth": [3, 4, 5, 6, 9],
                "sample_rate": [0.7, 0.8, 0.9, 1.0],
                "col_sample_rate": [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]}
search_criteria = {"strategy": "RandomDiscrete", "max_models": 3, "seed": 1}

stackedensemble prediction progress: |███████████████████████████████████████████| (done) 100%
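
To see what a random discrete search with `max_models` of 3 means for the grid defined above, the sketch below enumerates every hyperparameter combination and draws three of them at random. It uses only the Python standard library and is an illustration of the idea; it does not reproduce H2O's internal sampling, so the same seed will not select the same models as `H2OGridSearch`.

```python
import itertools
import random

# Same hyperparameter grid as in the cell above
hyper_params = {"learn_rate": [0.01, 0.03],
                "max_depth": [3, 4, 5, 6, 9],
                "sample_rate": [0.7, 0.8, 0.9, 1.0],
                "col_sample_rate": [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]}

# Cartesian product of all values: every model a full grid search would train
names = list(hyper_params)
full_grid = [dict(zip(names, values))
             for values in itertools.product(*hyper_params.values())]

# Random discrete search: train only a small random subset of the grid
rng = random.Random(1)
sampled = rng.sample(full_grid, 3)

print(len(full_grid))  # 2 * 5 * 4 * 7 = 280 combinations
for combo in sampled:
    print(combo)
```

Training 3 of 280 candidate models keeps the example fast; in practice the budget is usually larger, or is set by a time limit instead of a model count.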


In [32]:
# Train the grid
grid = H2OGridSearch(model=H2OGradientBoostingEstimator(ntrees=10,
                                                        seed=1,
                                                        nfolds=nfolds,
                                                        fold_assignment="Modulo",
                                                        keep_cross_validation_predictions=True),
                     hyper_params=hyper_params,
                     search_criteria=search_criteria,
                     grid_id="gbm_grid_binomial")
grid.train(x=x, y=y, training_frame=train)

gbm Grid Build progress: |███████████████████████████████████████████████████████| (done) 100%
| | col_sample_rate | learn_rate | max_depth | sample_rate | model_ids | logloss |
|---|---|---|---|---|---|---|
| 0 | 0.4 | 0.03 | 3 | 0.7 | gbm_grid_binomial_model_3 | 0.667209869083727 |
| 1 | 0.2 | 0.03 | 4 | 0.8 | gbm_grid_binomial_model_2 | 0.6742791229554091 |
| 2 | 0.7 | 0.01 | 5 | 0.9 | gbm_grid_binomial_model_1 | 0.6770690446978346 |




In [33]:
# Train a stacked ensemble using the GBM grid
ensemble = H2OStackedEnsembleEstimator(model_id="my_ensemble_gbm_grid_binomial",
                                       base_models=grid.model_ids)
ensemble.train(x=x, y=y, training_frame=train)

# Eval ensemble performance on the test data
perf_stack_test = ensemble.model_performance(test)

# Compare to base learner performance on the test set
baselearner_best_auc_test = max([h2o.get_model(model).model_performance(test_data=test).auc() for model in grid.model_ids])
stack_auc_test = perf_stack_test.auc()
print("Best Base-learner Test AUC:  {0}".format(baselearner_best_auc_test))
print("Ensemble Test AUC:  {0}".format(stack_auc_test))

# Generate predictions on a test set (if necessary)
pred = ensemble.predict(test)

stackedensemble Model Build progress: |██████████████████████████████████████████| (done) 100%
Best Base-learner Test AUC:  0.748146530400473
Ensemble Test AUC:  0.7510921003414699
stackedensemble prediction progress: |███████████████████████████████████████████| (done) 100%


## Deep learning basics

Deep learning is a subfield of machine learning that uses multi-layered artificial neural networks to model datasets and predict outcomes. It works well for numeric, text, image, video, and sound data because these data can be represented as large matrices, and because the network feeds its prediction error back into the weights to make better predictions during the next epoch.

To understand deep networks, let's start with a toy example of the simplest feedforward neural network: a single-layer perceptron.

Read Goodfellow et al.'s Deep Learning book to learn more: https://www.deeplearningbook.org/

In [46]:
import pandas as pd

# Generate toy dataset
example = {'x1': [1, 0, 1, 1, 0], 
           'x2': [1, 1, 1, 1, 0], 
           'xm': [1, 0, 1, 1, 0],
           'y': ['yes', 'no', 'yes', 'yes', 'no']
           }
example_df = pd.DataFrame(data = example)
example_df

| | x1 | x2 | xm | y |
|---|---|---|---|---|
| 0 | 1 | 1 | 1 | yes |
| 1 | 0 | 1 | 0 | no |
| 2 | 1 | 1 | 1 | yes |
| 3 | 1 | 1 | 1 | yes |
| 4 | 0 | 0 | 0 | no |


![perceptron](img/perceptron.png)

Perceptron figure modified from [Sebastian Raschka's Single-Layer Neural Networks and Gradient Descent](https://sebastianraschka.com/Articles/2015_singlelayer_neurons.html)

Perceptron key terms: 
* **Layer:** a group of nodes at the same depth of the network. A model's topology is described by its layers, which are usually divided into variations of input, hidden, preprocessing, encoder/decoder, and output.
* **Inputs:** the features/covariates for a single observation. These are just the individual cells in a dataframe (the 1s and 0s from `example_df` above), but they could be words from a text or pixels from an image.
* **Weights:** the learnable parameters of a model that connect the input layer to the output via the net input (summation) and activation functions.
* **Bias term:** a constant input of 1 with its own learnable weight. It shifts the net input so that it is not forced to 0 whenever all inputs are 0.
* **Net input function:** computes the weighted sum of the inputs plus the bias.
* **Activation function:** determines whether a neuron fires. In binary classification, for example, it decides whether a 1 or a 0 is output.
* **Output:** one node that contains the y prediction.
* **Error:** how far off an output prediction was. The weights are updated in proportion to the error, scaled by the learning rate, to reduce the error in the next epoch.
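
The key terms above fit together in just a few lines of code. The following is a minimal perceptron sketch in plain Python, trained on the toy data from `example_df` with "yes" mapped to 1 and "no" mapped to 0. The function names, the learning rate of 0.1, the zero starting weights, and the step activation are choices made for this illustration.

```python
# Toy data from example_df: columns x1, x2, xm; y mapped yes -> 1, no -> 0
X = [[1, 1, 1], [0, 1, 0], [1, 1, 1], [1, 1, 1], [0, 0, 0]]
y = [1, 0, 1, 1, 0]

def net_input(x, weights, bias):
    """Weighted sum of the inputs plus the bias."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias

def activation(z):
    """Step function: the neuron fires (outputs 1) only if the net input is positive."""
    return 1 if z > 0 else 0

def predict(x, weights, bias):
    return activation(net_input(x, weights, bias))

def train(X, y, learning_rate=0.1, epochs=10):
    weights = [0.0] * len(X[0])
    bias = 0.0
    for epoch in range(epochs):
        n_errors = 0
        for x, target in zip(X, y):
            error = target - predict(x, weights, bias)
            if error != 0:
                n_errors += 1
                # Move each weight in proportion to the error and its input
                weights = [w + learning_rate * error * xi
                           for w, xi in zip(weights, x)]
                bias += learning_rate * error
        if n_errors == 0:  # every observation classified correctly
            break
    return weights, bias

weights, bias = train(X, y)
predictions = [predict(x, weights, bias) for x in X]
print(predictions)  # prints [1, 0, 1, 1, 0]
```

The toy data are linearly separable, so the perceptron rule converges after a couple of passes. A single perceptron cannot learn patterns that are not linearly separable, which is one motivation for adding hidden layers.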

## What makes a network "deep"?

A "deep" network is just a network with multiple hidden layers, which lets it model nonlinear transformations of the inputs.

* **Fully connected layer:** a layer where every node is connected to every node in the next layer (as indicated by the purple arrows in the figure below).

![deep](img/deep.png)

Example of "deep" network with two hidden layers modified from [DevSkrol's Artificial Neural Network Explained with an Regression Example](https://devskrol.com/2020/11/22/388/)

> NOTE: The bias term is not shown in this figure.
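
A forward pass through a small fully connected network with two hidden layers can be sketched as follows. The layer sizes (3 inputs, two hidden layers of 4 nodes, 1 output), the random starting weights, and the ReLU and sigmoid activations are assumptions for this illustration and are not read off the figure. Unlike the figure, the sketch includes a bias term for every node.

```python
import math
import random

def relu(z):
    return max(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dense(inputs, weights, biases, activation):
    """Fully connected layer: every output node sees every input."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def init_layer(n_in, n_out, rng):
    """One weight per (input, output) pair, plus one bias per output node."""
    weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

rng = random.Random(0)
layer_sizes = [3, 4, 4, 1]  # input, hidden 1, hidden 2, output (hypothetical sizes)
layers = [init_layer(n_in, n_out, rng)
          for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]

def forward(x):
    """Pass one observation through all layers and return the output node."""
    a = x
    for i, (weights, biases) in enumerate(layers):
        is_output = i == len(layers) - 1
        a = dense(a, weights, biases, sigmoid if is_output else relu)
    return a[0]

print(forward([1, 1, 1]))  # an untrained probability-like score between 0 and 1
```

The weights here are random, so the output is meaningless until the network is trained; training repeats the error-driven weight updates from the perceptron across all layers via backpropagation.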

Let's go through François Chollet's "Image classification from scratch" [tutorial](https://keras.io/examples/vision/image_classification_from_scratch/) to see how a deep architecture is used to classify images of cats versus dogs.

[Click here to open the Colab notebook](https://colab.research.google.com/github/keras-team/keras-io/blob/master/examples/vision/ipynb/image_classification_from_scratch.ipynb)

You should also check out his deep learning book! https://www.manning.com/books/deep-learning-with-python-second-edition

![dogcat](img/dogcat.jpg)