<h1 style="display: inline;" >6. Grid Search and Stacking (B2)


The performance of most machine learning models depends on the settings of several hyperparameters. Optimizing model performance is generally a manner of tuning these hyperparameters and testing performance. 

Testing performance on a subspace of parameter settings is known as a 'grid search'. In its default state, the user supplies a list of possible hyperparameter settings, and the search is conducted over all possibilities. 

Since this can be quite time-consuming, H2O allows you to stipulate the 'max_models' (i.e. maximum number of parameter combinations to consider), with each model chosen randomly from among the given possibilities (i.e. a "RandomDiscrete" search strategy).

Here we import the airline data and create a number of different models by random search of hyperparameter space.

**Importing the data:**

In [1]:
import h2o
h2o.init()

You can upgrade to the newest version of the module running from the command line
    $ pip2 install --upgrade requests
Checking whether there is an H2O instance running at http://localhost:54321. connected.


0,1
H2O cluster uptime:,59 mins 52 secs
H2O cluster version:,3.14.0.6
H2O cluster version age:,10 days
H2O cluster name:,ec2-user
H2O cluster total nodes:,2
H2O cluster free memory:,966 Mb
H2O cluster total cores:,4
H2O cluster allowed cores:,4
H2O cluster status:,"locked, healthy"
H2O connection url:,http://localhost:54321


In [3]:
flights = h2o.import_file("hdfs://ec2-34-204-73-232.compute-1.amazonaws.com:9000/allyears2k.csv")

Parse progress: |█████████████████████████████████████████████████████████| 100%


Our goal is to predict whether the arrival is delayed, "ArrDelay". We will use date and time, scheduled time, origin and destination, and airline.


In [4]:
pred = flights[['IsArrDelayed', 'Year','Month','DayofMonth','DayOfWeek','CRSDepTime','CRSArrTime','CRSElapsedTime','UniqueCarrier','FlightNum','TailNum','Origin','Dest','Distance']]
pred_train, pred_valid = pred.split_frame(ratios=[0.8])

# Building Models
** Generalized Linear Models:**

In [5]:
from h2o.grid.grid_search import H2OGridSearch
from h2o.estimators.glm import H2OGeneralizedLinearEstimator
hyper_parameters = {'alpha': [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0], 'lambda': [0, 1e-7, 1e-5, 1e-3, 1e-1]}
criteria = {"strategy": "RandomDiscrete", "max_models": 5, "seed": 1234}
gs1 = H2OGridSearch(H2OGeneralizedLinearEstimator(family='binomial', nfolds=2, fold_assignment="modulo", keep_cross_validation_predictions=True), hyper_parameters, search_criteria=criteria)

x_cols = ['Year','Month','DayofMonth','DayOfWeek','CRSDepTime','CRSArrTime','CRSElapsedTime','UniqueCarrier','FlightNum','TailNum','Origin','Dest','Distance']

gs1.train(x=x_cols, y="IsArrDelayed", training_frame=pred_train, validation_frame=pred_valid)
auc_glm = gs1.auc(valid=True)

glm Grid Build progress: |████████████████████████████████████████████████| 100%


**Gradient Boosted Machines:**

In [6]:
from h2o.estimators.gbm import H2OGradientBoostingEstimator
from h2o.grid.grid_search import H2OGridSearch
hyper_parameters = {'learn_rate': [0.01, 0.03], 'max_depth': [3,4,5,6,9], 'sample_rate': [0.7,0.8,0.9,1.0], 'col_sample_rate': [0.2,0.3,0.4,0.5,0.6,0.7,0.8]}
criteria = {"strategy": "RandomDiscrete", "max_models": 5, "seed": 1234}
gs2 = H2OGridSearch(H2OGradientBoostingEstimator(nfolds=2, fold_assignment="modulo", keep_cross_validation_predictions=True), hyper_parameters, search_criteria=criteria)

x_cols = ['Year','Month','DayofMonth','DayOfWeek','CRSDepTime','CRSArrTime','CRSElapsedTime','UniqueCarrier','FlightNum','TailNum','Origin','Dest','Distance']

gs2.train(x=x_cols, y="IsArrDelayed", training_frame=pred_train, validation_frame=pred_valid)
auc_gbm = gs2.auc(valid=True)

gbm Grid Build progress: |████████████████████████████████████████████████| 100%


**Multi-layer perceptrons:**

In [7]:
from h2o.estimators.deeplearning import H2ODeepLearningEstimator
from h2o.grid.grid_search import H2OGridSearch
hyper_parameters = {'activation': ["rectifier", "rectifier_with_dropout"], 'hidden': [[10, 10],[20, 15],[50,50,50]], 'l1': [0, 1e-3, 1e-5], 'l2': [0, 1e-3, 1e-5]}
criteria = {"strategy": "RandomDiscrete", "max_models": 5, "seed": 1234}
gs3 = H2OGridSearch(H2ODeepLearningEstimator(nfolds=2, fold_assignment="modulo", keep_cross_validation_predictions=True), hyper_parameters, search_criteria=criteria)
gs3.train(x=x_cols, y="IsArrDelayed", training_frame=pred_train, validation_frame=pred_valid)
auc_mlp = gs3.auc()

deeplearning Grid Build progress: |███████████████████████████████████████| 100%


# Evaluating Model Performance

In [8]:
auc = auc_glm
auc.update(auc_gbm)
auc.update(auc_mlp)
auc

{u'Grid_DeepLearning_py_4_sid_93ae_model_python_1508440543160_871_model_0': 0.7100269514556684,
 u'Grid_DeepLearning_py_4_sid_93ae_model_python_1508440543160_871_model_1': 0.7071959224861313,
 u'Grid_DeepLearning_py_4_sid_93ae_model_python_1508440543160_871_model_2': 0.6688345482458213,
 u'Grid_DeepLearning_py_4_sid_93ae_model_python_1508440543160_871_model_3': 0.7042174007318446,
 u'Grid_DeepLearning_py_4_sid_93ae_model_python_1508440543160_871_model_4': 0.6565614708058285,
 u'Grid_GBM_py_4_sid_93ae_model_python_1508440543160_47_model_0': 0.6830318707483571,
 u'Grid_GBM_py_4_sid_93ae_model_python_1508440543160_47_model_1': 0.6797463558257494,
 u'Grid_GBM_py_4_sid_93ae_model_python_1508440543160_47_model_2': 0.6922829123052446,
 u'Grid_GBM_py_4_sid_93ae_model_python_1508440543160_47_model_3': 0.6793678034410898,
 u'Grid_GBM_py_4_sid_93ae_model_python_1508440543160_47_model_4': 0.7132884927724197,
 u'Grid_GLM_py_4_sid_93ae_model_python_1508440543160_1_model_0': 0.6532549396398448,
 u'Gr

# Stacking

Next, we create a composite model as an ensemble of smaller models — a process known as stacking. The H2O stacking model creates an ensemble by fitting a GLM on top of the results of the individual models.

In [9]:
models = list(auc.keys())

In [10]:
from h2o.estimators.stackedensemble import H2OStackedEnsembleEstimator

stack = H2OStackedEnsembleEstimator(base_models=models)
stack.train(x=x_cols, y="IsArrDelayed", training_frame=pred_train, validation_frame=pred_valid)

stackedensemble Model Build progress: |███████████████████████████████████| 100%

ModelMetricsBinomialGLM: stackedensemble
** Reported on train data. **

MSE: 0.198670047585
RMSE: 0.445724183308
LogLoss: 0.581557336202
Null degrees of freedom: 35243
Residual degrees of freedom: 35238
Null deviance: 48422.9937834
Residual deviance: 40992.8135142
AIC: 41004.8135142
AUC: 0.762249457365
Gini: 0.524498914731
Confusion Matrix (Act/Pred) for max f1 @ threshold = 0.414364127528: 


0,1,2,3,4
,NO,YES,Error,Rate
NO,6562.0,9103.0,0.5811,(9103.0/15665.0)
YES,2269.0,17310.0,0.1159,(2269.0/19579.0)
Total,8831.0,26413.0,0.3227,(11372.0/35244.0)


Maximum Metrics: Maximum metrics at their respective thresholds



0,1,2,3
metric,threshold,value,idx
max f1,0.4143641,0.7527396,273.0
max f2,0.2566094,0.8664125,357.0
max f0point5,0.5588160,0.7294756,186.0
max accuracy,0.5179434,0.6947282,212.0
max precision,0.9150475,1.0,0.0
max recall,0.1259838,1.0,397.0
max specificity,0.9150475,1.0,0.0
max absolute_mcc,0.5569370,0.3817494,187.0
max min_per_class_accuracy,0.5429482,0.6902293,196.0


Gains/Lift Table: Avg response rate: 55.55 %



0,1,2,3,4,5,6,7,8,9,10,11
,group,cumulative_data_fraction,lower_threshold,lift,cumulative_lift,response_rate,cumulative_response_rate,capture_rate,cumulative_capture_rate,gain,cumulative_gain
,1,0.0100159,0.9017340,1.7847937,1.7847937,0.9915014,0.9915014,0.0178763,0.0178763,78.4793703,78.4793703
,2,0.0200034,0.8907397,1.7131557,1.7490255,0.9517045,0.9716312,0.0171102,0.0349865,71.3155677,74.9025497
,3,0.0300193,0.8772443,1.7185014,1.7388412,0.9546742,0.9659735,0.0172123,0.0521988,71.8501366,73.8841170
,4,0.0400068,0.8604212,1.6978140,1.7285989,0.9431818,0.9602837,0.0169569,0.0691557,69.7813984,72.8598922
,5,0.0500227,0.8440597,1.6420102,1.7112615,0.9121813,0.9506523,0.0164462,0.0856019,64.2010207,71.1261533
,6,0.1000170,0.7797209,1.6110925,1.6611912,0.8950057,0.9228369,0.0805455,0.1661474,61.1092498,66.1191224
,7,0.1500113,0.7377690,1.5222117,1.6148735,0.8456300,0.8971061,0.0761019,0.2422493,52.2211682,61.4873472
,8,0.2000057,0.7079815,1.4272012,1.5679620,0.7928490,0.8710455,0.0713520,0.3136013,42.7201154,56.7962049
,9,0.2999943,0.6555287,1.2949015,1.4769505,0.7193530,0.8204861,0.1294755,0.4430768,29.4901548,47.6950491







Here we can compare the performance of the stacked model to that of the individual models above. Performance varies due to the randomness of our grid search, but on last run the AUC improved from 0.713 (best individual model) to 0.762 (stacked model):

In [None]:
stack.model_performance()