# Chapter 5 - Ensemble machine learning, deep learning

2022 February 23

![kandc](img/kandc.jpg)

[Texas Monthly, Music Monday: Uncovering The Mystery Of The King & Carter Jazzing Orchestra](https://www.texasmonthly.com/the-daily-post/music-monday-uncovering-the-mystery-of-the-king-carter-jazzing-orchestra/)

## Ensemble machine learning

"Ensemble machine learning methods use multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms." [H2O.ai ensemble example](https://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-science/stacked-ensembles.html)

In this manner, SuperLearner ensembles are powerful tools because they: 
* elucidate issues of algorithmic bias and variance
* circumvent bias introduced by selecting single models
* offer a means to optimize prediction through the stacking/blending of weaker models
* allow for comparison of multiple algorithms, and/or comparison of the same model but tuned in many different ways
* utilize a second-level algorithm (a metalearner) that combines the base learners' cross-validated predictions into a weighted prediction, making few assumptions about the distribution of the data and using cross-validation to limit overfitting
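
The weighting idea in the last bullet can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the outcome and prediction values are made up, the names `cv_pred_a`, `cv_pred_b`, `blend`, and `mse` are hypothetical, and a simple grid search over one weight stands in for the metalearner that a real SuperLearner implementation would fit.

```python
# Hypothetical cross-validated predicted probabilities from two base learners
# (made-up numbers for illustration, not output from any model in this chapter).
y_true = [1, 0, 1, 1, 0, 0]
cv_pred_a = [0.9, 0.4, 0.6, 0.7, 0.5, 0.2]
cv_pred_b = [0.6, 0.1, 0.8, 0.9, 0.3, 0.4]


def mse(pred, truth):
    """Mean squared error between predictions and 0/1 outcomes."""
    return sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(truth)


def blend(weight_a, pred_a, pred_b):
    """Convex combination: weight_a on learner A, the remainder on learner B."""
    return [weight_a * a + (1 - weight_a) * b for a, b in zip(pred_a, pred_b)]


# Try weights 0.00, 0.01, ..., 1.00 and keep the one with the lowest error
# on the cross-validated predictions.
candidates = [i / 100 for i in range(101)]
best_weight = min(candidates,
                  key=lambda w: mse(blend(w, cv_pred_a, cv_pred_b), y_true))
best_mse = mse(blend(best_weight, cv_pred_a, cv_pred_b), y_true)

print("MSE learner A:", round(mse(cv_pred_a, y_true), 4))
print("MSE learner B:", round(mse(cv_pred_b, y_true), 4))
print("Best weight on A:", best_weight, "blended MSE:", round(best_mse, 4))
```

Because weights of 0 and 1 are among the candidates, the blended error on these predictions can never be worse than the better single learner, which is the intuition behind stacking.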

The example below uses the h2o package and requires Java to be installed on your machine.
* install Java: https://www.java.com/en/download/help/mac_install.html
* h2o SuperLearner example: https://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-science/stacked-ensembles.html

Check out some other great tutorials: 
* Python mlens library: https://mlens.readthedocs.io/en/0.1.x/install/
* Machine Learning Mastery: https://machinelearningmastery.com/super-learner-ensemble-in-python/
* KDNuggets: https://www.kdnuggets.com/2018/02/introduction-python-ensembles.html/2#comments

The quintessential R guide: 
* Guide to SuperLearner: https://cran.r-project.org/web/packages/SuperLearner/vignettes/Guide-to-SuperLearner.html

Read the papers: 
* [Van der Laan, M.J.; Polley, E.C.; Hubbard, A.E. Super Learner. Stat. Appl. Genet. Mol. Biol. 2007, 6, 1–21.](https://www.degruyter.com/document/doi/10.2202/1544-6115.1309/html)
* [Polley, E.C.; van der Laan, M.J. Super Learner in Prediction, UC Berkeley Division of Biostatistics Working Paper Series Paper 266.](https://biostats.bepress.com/ucbbiostat/paper266)

## H2O SuperLearner ensemble

In [49]:
# !pip install h2o

# Requires install of Java
# https://www.java.com/en/download/help/mac_install.html

In [11]:
import h2o
from h2o.estimators.random_forest import H2ORandomForestEstimator
from h2o.estimators.gbm import H2OGradientBoostingEstimator
from h2o.estimators.stackedensemble import H2OStackedEnsembleEstimator
from h2o.grid.grid_search import H2OGridSearch

# Start a local H2O cluster, or connect to one that is already running
h2o.init()

Checking whether there is an H2O instance running at http://localhost:54321 . connected.


| H2O cluster property | Value |
|---|---|
| H2O_cluster_uptime | 20 mins 05 secs |
| H2O_cluster_timezone | America/Los_Angeles |
| H2O_data_parsing_timezone | UTC |
| H2O_cluster_version | 3.34.0.1 |
| H2O_cluster_version_age | 5 months and 8 days !!! |
| H2O_cluster_name | H2O_from_python_evanmuzzall_3e9b6s |
| H2O_cluster_total_nodes | 1 |
| H2O_cluster_free_memory | 2.861 Gb |
| H2O_cluster_total_cores | 16 |
| H2O_cluster_allowed_cores | 16 |


In [19]:
# Import a sample binary outcome train/test set into H2O
# Learn about subset of Higgs Boson dataset: https://www.kaggle.com/c/higgs-boson
train = h2o.import_file("https://s3.amazonaws.com/erin-data/higgs/higgs_train_10k.csv")
test = h2o.import_file("https://s3.amazonaws.com/erin-data/higgs/higgs_test_5k.csv")



In [13]:
train

response,x1,x2,x3,x4,x5,x6,x7,x8,x9,x10,x11,x12,x13,x14,x15,x16,x17,x18,x19,x20,x21,x22,x23,x24,x25,x26,x27,x28
1,0.869293,-0.635082,0.22569,0.32747,-0.689993,0.754202,-0.248573,-1.09206,0.0,1.37499,-0.653674,0.930349,1.10744,1.1389,-1.5782,-1.04699,0.0,0.65793,-0.0104546,-0.0457672,3.10196,1.35376,0.979563,0.978076,0.920005,0.721657,0.988751,0.876678
1,0.907542,0.329147,0.359412,1.49797,-0.31301,1.09553,-0.557525,-1.58823,2.17308,0.812581,-0.213642,1.27101,2.21487,0.499994,-1.26143,0.732156,0.0,0.398701,-1.13893,-0.00081911,0.0,0.30222,0.833048,0.9857,0.978098,0.779732,0.992356,0.798343
1,0.798835,1.47064,-1.63597,0.453773,0.425629,1.10487,1.28232,1.38166,0.0,0.851737,1.54066,-0.81969,2.21487,0.99349,0.35608,-0.208778,2.54822,1.25695,1.12885,0.900461,0.0,0.909753,1.10833,0.985692,0.951331,0.803252,0.865924,0.780118
0,1.34438,-0.876626,0.935913,1.99205,0.882454,1.78607,-1.64678,-0.942383,0.0,2.42326,-0.676016,0.736159,2.21487,1.29872,-1.43074,-0.364658,0.0,0.745313,-0.678379,-1.36036,0.0,0.946652,1.0287,0.998656,0.728281,0.8692,1.02674,0.957904
1,1.10501,0.321356,1.5224,0.882808,-1.20535,0.681466,-1.07046,-0.921871,0.0,0.800872,1.02097,0.971407,2.21487,0.596761,-0.350273,0.631194,0.0,0.479999,-0.373566,0.113041,0.0,0.755856,1.36106,0.98661,0.838085,1.1333,0.872245,0.808487
0,1.59584,-0.607811,0.00707492,1.81845,-0.111906,0.84755,-0.566437,1.58124,2.17308,0.755421,0.64311,1.42637,0.0,0.921661,-1.19043,-1.61559,0.0,0.651114,-0.654227,-1.27434,3.10196,0.823761,0.938191,0.971758,0.789176,0.430553,0.961357,0.957818
1,0.409391,-1.88468,-1.02729,1.67245,-1.6046,1.33801,0.0554274,0.0134659,2.17308,0.509783,-1.03834,0.707862,0.0,0.746918,-0.358465,-1.64665,0.0,0.367058,0.0694965,1.37713,3.10196,0.869418,1.22208,1.00063,0.545045,0.698653,0.977314,0.828786
1,0.933895,0.62913,0.527535,0.238033,-0.966569,0.547811,-0.0594392,-1.70687,2.17308,0.941003,-2.65373,-0.15722,0.0,1.03037,-0.175505,0.523021,2.54822,1.37355,1.29125,-1.46745,0.0,0.901837,1.08367,0.979696,0.7833,0.849195,0.894356,0.774879
1,1.40514,0.536603,0.689554,1.17957,-0.110061,3.2024,-1.52696,-1.57603,0.0,2.93154,0.567342,-0.130033,2.21487,1.78712,0.899499,0.585151,2.54822,0.401865,-0.151202,1.16349,0.0,1.66707,4.03927,1.17583,1.04535,1.54297,3.53483,2.74075
1,1.17657,0.104161,1.397,0.479721,0.265513,1.13556,1.53483,-0.253291,0.0,1.02725,0.534316,1.18002,0.0,2.40566,0.0875568,-0.976534,2.54822,1.25038,0.268541,0.530334,0.0,0.833175,0.773968,0.98575,1.1037,0.84914,0.937104,0.812364




In [14]:
print(train.shape)
print(test.shape)

(10000, 29)
(5000, 29)


In [15]:
# Identify predictors and response
x = train.columns
y = "response"
x.remove(y)

In [16]:
# For binary classification, response should be a factor
train[y] = train[y].asfactor()
test[y] = test[y].asfactor()

In [17]:
# Number of CV folds (to generate level-one data for stacking)
nfolds = 5
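
To make "level-one data" concrete, here is a minimal sketch in plain Python. It assumes that a modulo-style fold assignment places row `i` in fold `i % nfolds`, and it uses two deliberately trivial base learners. All names (`modulo_folds`, `fit_overall_mean`, `fit_group_mean`, `toy_x`, `toy_y`) and the data are hypothetical; this is not how H2O implements cross-validation internally.

```python
def modulo_folds(n_rows, n_folds):
    """Assign row i to fold i % n_folds (deterministic, evenly sized folds)."""
    return [i % n_folds for i in range(n_rows)]


def fit_overall_mean(train_x, train_y):
    """Toy base learner 1: always predicts the training mean of y."""
    mean_y = sum(train_y) / len(train_y)
    return lambda x: mean_y


def fit_group_mean(train_x, train_y):
    """Toy base learner 2: predicts the training mean of y among rows with the same x."""
    overall = sum(train_y) / len(train_y)
    groups = {}
    for x, y in zip(train_x, train_y):
        groups.setdefault(x, []).append(y)
    means = {x: sum(ys) / len(ys) for x, ys in groups.items()}
    return lambda x: means.get(x, overall)


def out_of_fold_predictions(xs, ys, folds, fit):
    """Predict every row with a model that never saw that row's fold."""
    preds = [None] * len(xs)
    for k in sorted(set(folds)):
        train_idx = [i for i, f in enumerate(folds) if f != k]
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        for i, f in enumerate(folds):
            if f == k:
                preds[i] = model(xs[i])
    return preds


# Made-up binary feature and outcome, for illustration only
toy_x = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
toy_y = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]
toy_folds = modulo_folds(len(toy_x), 5)

# Level-one data: one column of out-of-fold predictions per base learner
learners = [fit_overall_mean, fit_group_mean]
level_one = list(zip(*(out_of_fold_predictions(toy_x, toy_y, toy_folds, fit)
                       for fit in learners)))

print("folds:", toy_folds)
for row, outcome in zip(level_one, toy_y):
    print(row, outcome)
```

The metalearner is then trained on `level_one` against the outcome. Both base learners must share the same `toy_folds`, otherwise their columns would not describe the same held-out rows.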

In [18]:
# There are a few ways to assemble a list of models to stack together:
# 1. Train individual models and put them in a list
# 2. Train a grid of models
# 3. Train several grids of models
# Note: All base models must have the same cross-validation folds and
# the cross-validated predicted values must be kept.


# 1. Generate a 2-model ensemble (GBM + RF)

# Train and cross-validate a GBM
my_gbm = H2OGradientBoostingEstimator(distribution="bernoulli",
                                      ntrees=10,
                                      max_depth=3,
                                      min_rows=2,
                                      learn_rate=0.2,
                                      nfolds=nfolds,
                                      fold_assignment="Modulo",
                                      keep_cross_validation_predictions=True,
                                      seed=1)
my_gbm.train(x=x, y=y, training_frame=train)

In [10]:
# Train and cross-validate a RF
my_rf = H2ORandomForestEstimator(ntrees=50,
                                 nfolds=nfolds,
                                 fold_assignment="Modulo",
                                 keep_cross_validation_predictions=True,
                                 seed=1)
my_rf.train(x=x, y=y, training_frame=train)


In [29]:
# Train a stacked ensemble using the GBM and RF above
ensemble = H2OStackedEnsembleEstimator(model_id="my_ensemble_binomial",
                                       base_models=[my_gbm, my_rf])
ensemble.train(x=x, y=y, training_frame=train)

# Eval ensemble performance on the test data
perf_stack_test = ensemble.model_performance(test)

stackedensemble Model Build progress: |██████████████████████████████████████████| (done) 100%


In [30]:
# Compare to base learner performance on the test set
perf_gbm_test = my_gbm.model_performance(test)
perf_rf_test = my_rf.model_performance(test)
baselearner_best_auc_test = max(perf_gbm_test.auc(), perf_rf_test.auc())
stack_auc_test = perf_stack_test.auc()
print("Best Base-learner Test AUC:  {0}".format(baselearner_best_auc_test))
print("Ensemble Test AUC:  {0}".format(stack_auc_test))

Best Base-learner Test AUC:  0.769204725074508
Ensemble Test AUC:  0.7731183158978566
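
For intuition about what these AUC values measure: AUC is the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below computes it by brute force over all pairs. The function name `pairwise_auc` and the scores are hypothetical, and H2O's own AUC computation may differ in its details.

```python
def pairwise_auc(scores, labels):
    """Share of (positive, negative) pairs in which the positive case
    gets the higher score. Tied scores count as half a win."""
    positives = [s for s, label in zip(scores, labels) if label == 1]
    negatives = [s for s, label in zip(scores, labels) if label == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up labels and predicted probabilities, for illustration only
toy_labels = [1, 0, 1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.45, 0.7, 0.5, 0.2]

# 8 of the 9 positive/negative pairs are ranked correctly here
print(pairwise_auc(toy_scores, toy_labels))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so the ensemble's improvement over the best base learner above is real but small.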


In [31]:
# Generate predictions on a test set (if necessary)
pred = ensemble.predict(test)


# 2. Generate a random grid of models and stack them together

# Specify GBM hyperparameters for the grid
hyper_params = {"learn_rate": [0.01, 0.03],
                "max_depth": [3, 4, 5, 6, 9],
                "sample_rate": [0.7, 0.8, 0.9, 1.0],
                "col_sample_rate": [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]}
search_criteria = {"strategy": "RandomDiscrete", "max_models": 3, "seed": 1}

stackedensemble prediction progress: |███████████████████████████████████████████| (done) 100%
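
The idea behind a random discrete search can be sketched without H2O: enumerate every combination in the grid, then draw only a few of them. The names `toy_hyper_params`, `all_combos`, and `sampled` are hypothetical, and Python's `random` module will not reproduce the particular models H2O selects for a given seed.

```python
import itertools
import random

# Same grid values as above, copied into a separate variable
toy_hyper_params = {
    "learn_rate": [0.01, 0.03],
    "max_depth": [3, 4, 5, 6, 9],
    "sample_rate": [0.7, 0.8, 0.9, 1.0],
    "col_sample_rate": [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8],
}

# A full (Cartesian) grid would train one model per combination
names = sorted(toy_hyper_params)
all_combos = [dict(zip(names, values))
              for values in itertools.product(*(toy_hyper_params[n] for n in names))]
print("Combinations in the full grid:", len(all_combos))  # 2 * 5 * 4 * 7 = 280

# A random discrete search trains only a few of them, here 3
rng = random.Random(1)
sampled = rng.sample(all_combos, 3)
for combo in sampled:
    print(combo)
```

Training 3 models instead of 280 keeps the example fast, at the cost of exploring only a small part of the grid.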


In [32]:
# Train the grid
grid = H2OGridSearch(model=H2OGradientBoostingEstimator(ntrees=10,
                                                        seed=1,
                                                        nfolds=nfolds,
                                                        fold_assignment="Modulo",
                                                        keep_cross_validation_predictions=True),
                     hyper_params=hyper_params,
                     search_criteria=search_criteria,
                     grid_id="gbm_grid_binomial")
grid.train(x=x, y=y, training_frame=train)

gbm Grid Build progress: |███████████████████████████████████████████████████████| (done) 100%

| | col_sample_rate | learn_rate | max_depth | sample_rate | model_ids | logloss |
|---|---|---|---|---|---|---|
| 0 | 0.4 | 0.03 | 3 | 0.7 | gbm_grid_binomial_model_3 | 0.667209869083727 |
| 1 | 0.2 | 0.03 | 4 | 0.8 | gbm_grid_binomial_model_2 | 0.6742791229554091 |
| 2 | 0.7 | 0.01 | 5 | 0.9 | gbm_grid_binomial_model_1 | 0.6770690446978346 |




In [33]:
# Train a stacked ensemble using the GBM grid
ensemble = H2OStackedEnsembleEstimator(model_id="my_ensemble_gbm_grid_binomial",
                                       base_models=grid.model_ids)
ensemble.train(x=x, y=y, training_frame=train)

# Eval ensemble performance on the test data
perf_stack_test = ensemble.model_performance(test)

# Compare to base learner performance on the test set
baselearner_best_auc_test = max([h2o.get_model(model).model_performance(test_data=test).auc() for model in grid.model_ids])
stack_auc_test = perf_stack_test.auc()
print("Best Base-learner Test AUC:  {0}".format(baselearner_best_auc_test))
print("Ensemble Test AUC:  {0}".format(stack_auc_test))

# Generate predictions on a test set (if necessary)
pred = ensemble.predict(test)

stackedensemble Model Build progress: |██████████████████████████████████████████| (done) 100%
Best Base-learner Test AUC:  0.748146530400473
Ensemble Test AUC:  0.7510921003414699
stackedensemble prediction progress: |███████████████████████████████████████████| (done) 100%


## Deep learning basics

Deep learning is a subfield of machine learning that uses multi-layered artificial neural networks to model datasets and predict outcomes. It is well suited to numeric, text, image, video, and sound data because all of these can be represented as large numeric arrays, and because the network uses its prediction error to update its weights and make better predictions during the next epoch.

To understand deep networks, let's start with a toy example of a single-layer feedforward neural network: a perceptron.

Read Goodfellow et al.'s Deep Learning Book to learn more: https://www.deeplearningbook.org/

In [46]:
import pandas as pd

# generate toy dataset
example = {'x1': [1, 0, 1, 1, 0], 
           'x2': [1, 1, 1, 1, 0], 
           'xm': [1, 0, 1, 1, 0],
           'y': ['yes', 'no', 'yes', 'yes', 'no']
           }
example_df = pd.DataFrame(data = example)
example_df

| | x1 | x2 | xm | y |
|---|---|---|---|---|
| 0 | 1 | 1 | 1 | yes |
| 1 | 0 | 1 | 0 | no |
| 2 | 1 | 1 | 1 | yes |
| 3 | 1 | 1 | 1 | yes |
| 4 | 0 | 0 | 0 | no |


![perceptron](img/perceptron.png)

Perceptron figure modified from [Sebastian Raschka's Single-Layer Neural Networks and Gradient Descent](https://sebastianraschka.com/Articles/2015_singlelayer_neurons.html)

Perceptron key terms: 
* **Layer:** a group of nodes at the same depth in the network. The arrangement of layers defines the network topology of a deep learning model, usually divided into variations of input, hidden, preprocessing, encoder/decoder, and output layers.
* **Inputs:** the features/covariates for a single observation. These are just the individual cells in a dataframe (the 1s and 0s from `example_df` above), but they could be words from a text or pixels from an image.
* **Weights:** the learnable parameters of a model that connect the input layer to the output via the net input (summation) and activation functions.
* **Bias term:** a learnable offset added to the weighted sum, often drawn as an extra input fixed at "1". It lets the net input be nonzero even when every input is 0.
* **Net input function:** computes the weighted sum of the inputs plus the bias.
* **Activation function:** determines whether a neuron fires. In binary classification, for example, it decides whether a 1 or a 0 is output.
* **Output:** one node that contains the y prediction.
* **Error:** how far off an output prediction was. The weights are updated in proportion to the error, scaled by the learning rate, to reduce the error in the next epoch.
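
These terms fit together in a few lines of plain Python. The sketch below trains a perceptron on the same values as `example_df`, with `yes`/`no` encoded as 1/0. The function names, the learning rate of 0.1, the zero starting weights, and the step activation are illustrative assumptions, not taken from the figure.

```python
# Same values as example_df, with y encoded as yes=1 / no=0
toy_inputs = [[1, 1, 1], [0, 1, 0], [1, 1, 1], [1, 1, 1], [0, 0, 0]]
toy_targets = [1, 0, 1, 1, 0]


def net_input(weights, bias, row):
    """Net input function: weighted sum of the inputs plus the bias."""
    return bias + sum(w * x for w, x in zip(weights, row))


def activation(z):
    """Step activation: the neuron fires (1) when the net input is >= 0."""
    return 1 if z >= 0 else 0


def train_perceptron(inputs, targets, learning_rate=0.1, epochs=10):
    """Perceptron learning rule: nudge weights by learning_rate * error * input."""
    weights = [0.0] * len(inputs[0])
    bias = 0.0
    for epoch in range(epochs):
        n_errors = 0
        for row, target in zip(inputs, targets):
            error = target - activation(net_input(weights, bias, row))
            if error != 0:
                n_errors += 1
                weights = [w + learning_rate * error * x
                           for w, x in zip(weights, row)]
                bias += learning_rate * error
        if n_errors == 0:  # every observation classified correctly
            break
    return weights, bias


weights, bias = train_perceptron(toy_inputs, toy_targets)
predictions = [activation(net_input(weights, bias, row)) for row in toy_inputs]
print("weights:", weights, "bias:", bias)
print("predictions:", predictions)
```

Note that the last row is all zeros, so only the bias term can move its net input below the threshold. That is the practical role of the bias.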

## What makes a network "deep"?

A "deep" network is just a network with multiple hidden layers, which allow it to learn nonlinear transformations of the inputs.

* Fully connected layer: a layer where every node is connected to every node in the next layer (as indicated by the purple arrows in the figure below).
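
A forward pass through a small fully connected network with two hidden layers can be sketched in plain Python. Everything here is an assumption for illustration: the layer sizes (3 inputs, two hidden layers of 2 nodes, 1 output), the weight and bias values, and the choice of ReLU and sigmoid activations. A real network would learn its weights from data.

```python
import math


def dense(inputs, weights, biases):
    """Fully connected layer: every input feeds every node in the layer.
    `weights` holds one list of input weights per node."""
    return [b + sum(w * x for w, x in zip(node_weights, inputs))
            for node_weights, b in zip(weights, biases)]


def relu(values):
    """Nonlinear activation for hidden layers: negative values become 0."""
    return [max(0.0, v) for v in values]


def sigmoid(v):
    """Squash the output node to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-v))


# Hypothetical, hand-picked parameters (not learned from any data)
hidden1_w = [[0.5, -0.2, 0.1], [-0.3, 0.8, 0.4]]   # 3 inputs -> 2 nodes
hidden1_b = [0.0, 0.1]
hidden2_w = [[0.7, -0.5], [0.2, 0.9]]              # 2 nodes -> 2 nodes
hidden2_b = [0.05, -0.1]
output_w = [[1.0, -1.0]]                           # 2 nodes -> 1 output
output_b = [0.0]


def forward(row):
    """Pass one observation through both hidden layers and the output node."""
    h1 = relu(dense(row, hidden1_w, hidden1_b))
    h2 = relu(dense(h1, hidden2_w, hidden2_b))
    return sigmoid(dense(h2, output_w, output_b)[0])


print(forward([1, 1, 1]))
print(forward([0, 1, 0]))
```

Without the nonlinear `relu` between layers, the stacked layers would collapse into a single linear transformation, so depth alone would add nothing.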

![deep](img/deep.png)

Example of a "deep" network with two hidden layers, modified from [DevSkrol's Artificial Neural Network Explained with an Regression Example](https://devskrol.com/2020/11/22/388/)

>NOTE: The bias term is not shown in this figure.

Let's go through François Chollet's "Image classification from scratch" [tutorial](https://keras.io/examples/vision/image_classification_from_scratch/) to see how a deep network can be trained to classify images of cats versus dogs.

[Click here to open the Colab notebook](https://colab.research.google.com/github/keras-team/keras-io/blob/master/examples/vision/ipynb/image_classification_from_scratch.ipynb)

You should also check out his deep learning book! https://www.manning.com/books/deep-learning-with-python-second-edition

![dogcat](img/dogcat.jpg)