# Hyperparameter Optimization using H2O

This notebook compares the performance of H2O’s grid search, randomized grid search, and AutoML for hyperparameter optimization of the H2ORandomForestEstimator model.

## Overview
The key steps involve performing grid search, randomized grid search, and using H2O AutoML to find the best hyperparameters and model performance.

## Procedure
- **Grid Search**:
  - Perform grid search for H2ORandomForestEstimator with specified hyperparameters.
  - Display sorted grid results and evaluate the best model's performance.
- **Randomized Grid Search**:
  - Perform hyperparameter optimization using randomized grid search.
  - Display sorted results and evaluate the best model's performance.
- **H2O AutoML**:
  - Use H2O AutoML to find the best model for the classification task.
  - Display the leaderboard and evaluate the best model's performance.

References:
- [H2O Documentation](https://docs.h2o.ai)
- [allyears2k Headers Dataset](https://www.h2o.ai/resources/datasets/allyears2k_headers.zip)

In [5]:
import h2o
from h2o.estimators.gbm import H2OGradientBoostingEstimator
from h2o import estimators
h2o.init()

# import the airlines dataset:
# This dataset is used to classify whether a flight will be delayed ('YES') or not ('NO')
# original data can be found at http://www.transtats.bts.gov/
airlines= h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/airlines/allyears2k_headers.zip")

# convert columns to factors
airlines["Year"]= airlines["Year"].asfactor()
airlines["Month"]= airlines["Month"].asfactor()
airlines["DayOfWeek"] = airlines["DayOfWeek"].asfactor()
airlines["Cancelled"] = airlines["Cancelled"].asfactor()
airlines['FlightNum'] = airlines['FlightNum'].asfactor()

# set the predictor names and the response column name
predictors = ["Origin", "Dest", "Year", "UniqueCarrier",
              "DayOfWeek", "Month", "Distance", "FlightNum"]
response = "IsDepDelayed"

# split into train (80%), validation (10%), and test (the remaining ~10%)
train, valid, test = airlines.split_frame(ratios=[.8, .1], seed=1234)

Checking whether there is an H2O instance running at http://localhost:54321. connected.


0,1
H2O_cluster_uptime:,8 mins 26 secs
H2O_cluster_timezone:,America/New_York
H2O_data_parsing_timezone:,UTC
H2O_cluster_version:,3.46.0.1
H2O_cluster_version_age:,24 days
H2O_cluster_name:,H2O_from_python_darien_b05g38
H2O_cluster_total_nodes:,1
H2O_cluster_free_memory:,7.925 Gb
H2O_cluster_total_cores:,12
H2O_cluster_allowed_cores:,12


Parse progress: |████████████████████████████████████████████████████████████████| (done) 100%
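
The `split_frame(ratios=[.8, .1], seed=1234)` call above returns three frames: one per listed ratio, plus one for the remainder. The sketch below illustrates the idea in plain Python, assuming each row is assigned independently by a seeded uniform draw, so the partition sizes are approximate rather than exact. `split_rows` is a hypothetical helper for illustration, not H2O's implementation.

```python
import random

def split_rows(n_rows, ratios, seed):
    """Assign row indices to len(ratios) + 1 partitions using a seeded uniform draw."""
    rng = random.Random(seed)
    # Cumulative upper bounds, e.g. roughly [0.8, 0.9] for ratios [0.8, 0.1]
    bounds = []
    total = 0.0
    for r in ratios:
        total += r
        bounds.append(total)
    parts = [[] for _ in range(len(ratios) + 1)]
    for i in range(n_rows):
        u = rng.random()
        # Count how many bounds u has passed; the last partition takes the remainder
        k = sum(u >= b for b in bounds)
        parts[k].append(i)
    return parts

train_idx, valid_idx, test_idx = split_rows(10_000, [0.8, 0.1], seed=1234)
print(len(train_idx), len(valid_idx), len(test_idx))
```

Because the assignment is random per row, the three sizes land near 80/10/10 percent without matching them exactly.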


## 5.1 Grid search

# 5.1.a

In [8]:
from h2o.grid.grid_search import H2OGridSearch

rf_estimator = estimators.H2ORandomForestEstimator()

ntrees = [10, 30, 50, 100]
max_depth = [1, 2, 4, 6]

# Cartesian grid: 4 x 4 = 16 candidate models
hyper_parameters = {"ntrees": ntrees, "max_depth": max_depth}

# Train the grid
rf_grid = H2OGridSearch(model=rf_estimator, hyper_params=hyper_parameters)
rf_grid.train(x=predictors, 
              y=response, 
              training_frame=train, 
              validation_frame=valid,
)


drf Grid Build progress: |███████████████████████████████████████████████████████| (done) 100%


Unnamed: 0,max_depth,ntrees,model_ids,logloss
,6.0,30.0,Grid_DRF_py_40_sid_a544_model_python_1712505565922_3059_model_8,0.6110924
,6.0,100.0,Grid_DRF_py_40_sid_a544_model_python_1712505565922_3059_model_16,0.6114624
,6.0,50.0,Grid_DRF_py_40_sid_a544_model_python_1712505565922_3059_model_12,0.6117272
,6.0,10.0,Grid_DRF_py_40_sid_a544_model_python_1712505565922_3059_model_4,0.61557
,4.0,100.0,Grid_DRF_py_40_sid_a544_model_python_1712505565922_3059_model_15,0.626486
,4.0,50.0,Grid_DRF_py_40_sid_a544_model_python_1712505565922_3059_model_11,0.6270048
,4.0,30.0,Grid_DRF_py_40_sid_a544_model_python_1712505565922_3059_model_7,0.6274358
,4.0,10.0,Grid_DRF_py_40_sid_a544_model_python_1712505565922_3059_model_3,0.6320038
,2.0,10.0,Grid_DRF_py_40_sid_a544_model_python_1712505565922_3059_model_2,0.6505406
,2.0,50.0,Grid_DRF_py_40_sid_a544_model_python_1712505565922_3059_model_10,0.6506954
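
A Cartesian grid search trains one model per combination of the listed values. The sketch below enumerates those combinations in plain Python to show where the 16 models come from (the model IDs above run up to `model_16`, although only ten rows are displayed). It illustrates the search space only and does not call H2O.

```python
from itertools import product

hyper_parameters = {"ntrees": [10, 30, 50, 100], "max_depth": [1, 2, 4, 6]}

# Build one dict per combination of hyperparameter values
names = list(hyper_parameters)
grid = [dict(zip(names, values)) for values in product(*hyper_parameters.values())]

print(len(grid))  # 16 combinations (4 x 4)
for combo in grid[:3]:
    print(combo)
```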


# 5.1.b

In [91]:
# Get the grid results
grid_results = rf_grid.get_grid(sort_by='accuracy', decreasing=True)
print(grid_results)


Hyper-Parameter Search Summary: ordered by decreasing accuracy
    max_depth    ntrees    model_ids                                                         accuracy
--  -----------  --------  ----------------------------------------------------------------  ----------
    6            50        Grid_DRF_py_40_sid_a544_model_python_1712505565922_3059_model_12  0.671596
    6            30        Grid_DRF_py_40_sid_a544_model_python_1712505565922_3059_model_8   0.668075
    6            100       Grid_DRF_py_40_sid_a544_model_python_1712505565922_3059_model_16  0.66784
    6            10        Grid_DRF_py_40_sid_a544_model_python_1712505565922_3059_model_4   0.66784
    4            50        Grid_DRF_py_40_sid_a544_model_python_1712505565922_3059_model_11  0.657277
    4            100       Grid_DRF_py_40_sid_a544_model_python_1712505565922_3059_model_15  0.656338
    4            30        Grid_DRF_py_40_sid_a544_model_python_1712505565922_3059_model_7   0.655634
    4            10

# 5.1.c

In [94]:
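# The grid is sorted by decreasing accuracy, so the first model is the best one
best_model = grid_results.models[0]

# Evaluate the best model on the held-out test set
performance = best_model.model_performance(test)

print(f"Best Model ID: {best_model.model_id}")
# accuracy() returns [threshold, value] pairs; [0][1] picks the accuracy value
print(f"Accuracy: {performance.accuracy()[0][1]:.4f}")
print(f"AUC score: {performance.auc():.4f}")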


Best Model ID: Grid_DRF_py_40_sid_a544_model_python_1712505565922_3059_model_12
Accuracy: 0.6781
AUC score: 0.7380


The best model in the grid search is a distributed random forest (DRF) with `max_depth=6` and `ntrees=50` (model 12 in the grid). On the test set it reaches an accuracy of 0.6781 and an AUC score of 0.7380.
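
The reported accuracy depends on the classification threshold. As far as I understand the H2O API, calling `accuracy()` without a threshold reports the value at the threshold that maximizes accuracy. The sketch below shows that idea in plain Python on invented toy scores; `max_accuracy` is a hypothetical helper, not an H2O function.

```python
def max_accuracy(scores, labels):
    """Return (threshold, accuracy) for the threshold with the highest accuracy.

    A row is predicted positive when its score is >= the threshold.
    """
    best_threshold, best_accuracy = None, -1.0
    for t in sorted(set(scores)):
        correct = sum((s >= t) == bool(y) for s, y in zip(scores, labels))
        acc = correct / len(labels)
        if acc > best_accuracy:
            best_threshold, best_accuracy = t, acc
    return best_threshold, best_accuracy

# Toy data, invented for illustration
scores = [0.1, 0.35, 0.4, 0.8, 0.65, 0.9]
labels = [0, 0, 1, 1, 0, 1]

threshold, accuracy = max_accuracy(scores, labels)
print(f"threshold={threshold}, accuracy={accuracy:.4f}")
```

With these toy values the best threshold is 0.4, where five of the six rows are classified correctly.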

## 5.2 Randomized grid search

# 5.2.a

In [21]:
rf_estimator = estimators.H2ORandomForestEstimator()

ntrees = [10, 30, 50, 100]
max_depth = [1, 2, 4, 6]

# Same 16-combination search space as in 5.1
hyper_parameters = {"ntrees": ntrees, "max_depth": max_depth}

search_criteria = {'strategy': 'RandomDiscrete', 'max_models': 10, 'seed': 9232}

# Train the grid
rf_grid2 = H2OGridSearch(model=rf_estimator, hyper_params=hyper_parameters, search_criteria=search_criteria)
rf_grid2.train(x=predictors,
               y=response,
               training_frame=train,
               validation_frame=valid,
)



drf Grid Build progress: |███████████████████████████████████████████████████████| (done) 100%
Hyper-Parameter Search Summary: ordered by decreasing accuracy
    max_depth    ntrees    model_ids                                                         accuracy
--  -----------  --------  ----------------------------------------------------------------  ----------
    6            100       Grid_DRF_py_40_sid_a544_model_python_1712505565922_6060_model_1   0.670423
    6            30        Grid_DRF_py_40_sid_a544_model_python_1712505565922_6060_model_9   0.665962
    4            50        Grid_DRF_py_40_sid_a544_model_python_1712505565922_6060_model_10  0.659624
    4            10        Grid_DRF_py_40_sid_a544_model_python_1712505565922_6060_model_4   0.655164
    4            30        Grid_DRF_py_40_sid_a544_model_python_1712505565922_6060_model_5   0.655164
    4            100       Grid_DRF_py_40_sid_a544_model_


# 5.2.b


In [22]:

# Get the random grid results
random_grid2_results = rf_grid2.get_grid(sort_by='accuracy', decreasing=True)
print(random_grid2_results)

Hyper-Parameter Search Summary: ordered by decreasing accuracy
    max_depth    ntrees    model_ids                                                         accuracy
--  -----------  --------  ----------------------------------------------------------------  ----------
    6            100       Grid_DRF_py_40_sid_a544_model_python_1712505565922_6060_model_1   0.670423
    6            30        Grid_DRF_py_40_sid_a544_model_python_1712505565922_6060_model_9   0.665962
    4            50        Grid_DRF_py_40_sid_a544_model_python_1712505565922_6060_model_10  0.659624
    4            10        Grid_DRF_py_40_sid_a544_model_python_1712505565922_6060_model_4   0.655164
    4            30        Grid_DRF_py_40_sid_a544_model_python_1712505565922_6060_model_5   0.655164
    4            100       Grid_DRF_py_40_sid_a544_model_python_1712505565922_6060_model_7   0.654695
    2            30        Grid_DRF_py_40_sid_a544_model_python_1712505565922_6060_model_3   0.64108
    2            1
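
With the `RandomDiscrete` strategy, only a random subset of the grid is trained, capped here at `max_models=10` of the 16 combinations. The sketch below illustrates the sampling idea in plain Python. It assumes uniform sampling without replacement and will not reproduce the specific combinations H2O picked for seed 9232.

```python
import random
from itertools import product

hyper_parameters = {"ntrees": [10, 30, 50, 100], "max_depth": [1, 2, 4, 6]}
search_criteria = {"strategy": "RandomDiscrete", "max_models": 10, "seed": 9232}

# Enumerate the full Cartesian grid (16 combinations)
names = list(hyper_parameters)
full_grid = [dict(zip(names, v)) for v in product(*hyper_parameters.values())]

# Draw max_models distinct combinations with a seeded generator
rng = random.Random(search_criteria["seed"])
sampled = rng.sample(full_grid, k=search_criteria["max_models"])

print(f"{len(sampled)} of {len(full_grid)} combinations sampled")
for combo in sampled:
    print(combo)
```

The trade-off is coverage versus cost: six combinations are never trained, so the best configuration from the full grid may be missed.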

## 5.3 H2O AutoML


# 5.3.a


In [23]:

from h2o.automl import H2OAutoML

# Initialize the AutoML object
aml = H2OAutoML(max_models=20, seed=9232)

# Train the AutoML
aml.train(x=predictors, y=response, training_frame=train, validation_frame=valid)


AutoML progress: |
12:39:44.888: User specified a validation frame with cross-validation still enabled. Please note that the models will still be validated using cross-validation only, the validation frame will be used to provide purely informative validation metrics on the trained models.
12:39:44.890: AutoML: XGBoost is not available; skipping it.

███████████████████████████████████████████████████████████████Job request failed Unexpected HTTP error: ('Connection aborted.', BadStatusLine('GET /3/Jobs/$03017f00000132d4ffffffff$_a460fe8ed666c9228a7e3806744456b HTTP/1.1\r\n')), will retry after 3s.
| (done) 100%


key,value
Stacking strategy,cross_validation
Number of base models (used / total),3/20
# GBM base models (used / total),0/10
# DRF base models (used / total),1/2
# DeepLearning base models (used / total),2/7
# GLM base models (used / total),0/1
Metalearner algorithm,GLM
Metalearner fold assignment scheme,Random
Metalearner nfolds,5
Metalearner fold_column,

Unnamed: 0,NO,YES,Error,Rate
NO,3485.0,1168.0,0.251,(1168.0/4653.0)
YES,667.0,4641.0,0.1257,(667.0/5308.0)
Total,4152.0,5809.0,0.1842,(1835.0/9961.0)

metric,threshold,value,idx
max f1,0.4944467,0.8349375,226.0
max f2,0.4009381,0.8977111,283.0
max f0point5,0.574844,0.8518398,181.0
max accuracy,0.5187538,0.8198976,213.0
max precision,0.9876612,1.0,0.0
max recall,0.272308,1.0,341.0
max specificity,0.9876612,1.0,0.0
max absolute_mcc,0.5203673,0.6385444,212.0
max min_per_class_accuracy,0.522363,0.8185757,211.0
max mean_per_class_accuracy,0.5203673,0.8195915,212.0

group,cumulative_data_fraction,lower_threshold,lift,cumulative_lift,response_rate,score,cumulative_response_rate,cumulative_score,capture_rate,cumulative_capture_rate,gain,cumulative_gain,kolmogorov_smirnov
1,0.0101395,0.9642759,1.8766014,1.8766014,1.0,0.9803605,1.0,0.9803605,0.0190279,0.0190279,87.6601356,87.6601356,0.0190279
2,0.0200783,0.9381837,1.8766014,1.8766014,1.0,0.9499913,1.0,0.9653278,0.0186511,0.037679,87.6601356,87.6601356,0.037679
3,0.0301175,0.9189456,1.8766014,1.8766014,1.0,0.9291041,1.0,0.9532532,0.0188395,0.0565185,87.6601356,87.6601356,0.0565185
4,0.0400562,0.9032262,1.8766014,1.8766014,1.0,0.9107966,1.0,0.9427189,0.0186511,0.0751696,87.6601356,87.6601356,0.0751696
5,0.0500954,0.8856107,1.8766014,1.8766014,1.0,0.8945477,1.0,0.9330653,0.0188395,0.094009,87.6601356,87.6601356,0.094009
6,0.1000904,0.8135001,1.8728331,1.8747191,0.997992,0.8482258,0.998997,0.8906881,0.0936323,0.1876413,87.2833081,87.4719108,0.1874264
7,0.1500853,0.7557493,1.8766014,1.8753461,1.0,0.782473,0.9993311,0.8546405,0.0938206,0.2814619,87.6601356,87.5346105,0.281247
8,0.2000803,0.7065385,1.8539917,1.8700102,0.9879518,0.7302968,0.9964877,0.8235702,0.0926903,0.3741522,85.3991702,87.0010183,0.3726478
9,0.3000703,0.6329329,1.667462,1.8025167,0.8885542,0.6676194,0.9605219,0.771604,0.1667295,0.5408817,66.7462049,80.2516726,0.5155217
10,0.4001606,0.5774851,1.4098038,1.7042892,0.7512538,0.6036645,0.9081786,0.7295981,0.1411078,0.6819894,40.9803827,70.428924,0.6033305

Unnamed: 0,NO,YES,Error,Rate
NO,845.0,1152.0,0.5769,(1152.0/1997.0)
YES,329.0,1934.0,0.1454,(329.0/2263.0)
Total,1174.0,3086.0,0.3477,(1481.0/4260.0)

metric,threshold,value,idx
max f1,0.4291946,0.7231258,270.0
max f2,0.1517097,0.8507755,388.0
max f0point5,0.5533079,0.6944024,191.0
max accuracy,0.5067752,0.6732394,222.0
max precision,0.9864963,1.0,0.0
max recall,0.1029932,1.0,397.0
max specificity,0.9864963,1.0,0.0
max absolute_mcc,0.5067752,0.3421678,222.0
max min_per_class_accuracy,0.5281019,0.668002,209.0
max mean_per_class_accuracy,0.524817,0.6703984,211.0

group,cumulative_data_fraction,lower_threshold,lift,cumulative_lift,response_rate,score,cumulative_response_rate,cumulative_score,capture_rate,cumulative_capture_rate,gain,cumulative_gain,kolmogorov_smirnov
1,0.0100939,0.9719079,1.8824569,1.8824569,1.0,0.9827336,1.0,0.9827336,0.0190013,0.0190013,88.2456916,88.2456916,0.0190013
2,0.0201878,0.9416241,1.7073446,1.7949008,0.9069767,0.9580741,0.9534884,0.9704038,0.0172338,0.0362351,70.7344644,79.490078,0.0342321
3,0.0300469,0.9171548,1.7479957,1.7795101,0.9285714,0.928654,0.9453125,0.9567047,0.0172338,0.0534688,74.7995707,77.9510053,0.0499636
4,0.0401408,0.9004599,1.7073446,1.7613632,0.9069767,0.9086193,0.9356725,0.944613,0.0172338,0.0707026,70.7344644,76.1363196,0.0651943
5,0.05,0.8763802,1.5238937,1.7145382,0.8095238,0.8879015,0.9107981,0.9334305,0.0150243,0.0857269,52.3893694,71.4538224,0.0762126
6,0.1002347,0.8084332,1.5569854,1.6355773,0.8271028,0.8429219,0.8688525,0.8880702,0.0782148,0.1639417,55.6985393,63.557732,0.1358996
7,0.15,0.7489386,1.5183969,1.5967005,0.8066038,0.7764471,0.8482003,0.8510372,0.0755634,0.2395051,51.8396852,59.6700545,0.1909322
8,0.2,0.7055275,1.4494034,1.5598763,0.7699531,0.7265418,0.8286385,0.8199133,0.0724702,0.3119753,44.9403447,55.987627,0.2388656
9,0.3,0.6307953,1.2814848,1.4670791,0.6807512,0.6663109,0.7793427,0.7687125,0.1281485,0.4401237,28.1484755,46.7079099,0.2989119
10,0.4,0.5749611,1.117985,1.3798056,0.5938967,0.6016234,0.7329812,0.7269403,0.1117985,0.5519222,11.7984976,37.9805568,0.3240805

Unnamed: 0,NO,YES,Error,Rate
NO,7221.0,9545.0,0.5693,(9545.0/16766.0)
YES,2355.0,16130.0,0.1274,(2355.0/18485.0)
Total,9576.0,25675.0,0.3376,(11900.0/35251.0)

metric,threshold,value,idx
max f1,0.3564042,0.7305254,280.0
max f2,0.1859148,0.847948,360.0
max f0point5,0.5625001,0.7108722,181.0
max accuracy,0.484417,0.6918669,216.0
max precision,0.9788691,0.9875,2.0
max recall,0.0599107,1.0,398.0
max specificity,0.9945203,0.9999404,0.0
max absolute_mcc,0.5343993,0.3831103,193.0
max min_per_class_accuracy,0.5126044,0.6911009,203.0
max mean_per_class_accuracy,0.5343993,0.6916352,193.0

group,cumulative_data_fraction,lower_threshold,lift,cumulative_lift,response_rate,score,cumulative_response_rate,cumulative_score,capture_rate,cumulative_capture_rate,gain,cumulative_gain,kolmogorov_smirnov
1,0.0100139,0.9394392,1.8043623,1.8043623,0.9461756,0.9613896,0.9461756,0.9613896,0.0180687,0.0180687,80.4362315,80.4362315,0.0169355
2,0.0200278,0.9181912,1.7233281,1.7638452,0.9036827,0.9280595,0.9249292,0.9447245,0.0172572,0.0353259,72.3328079,76.3845197,0.0321648
3,0.0300417,0.9042116,1.7071212,1.7449372,0.8951841,0.911074,0.9150142,0.9335077,0.0170949,0.0524209,70.7121232,74.4937209,0.0470529
4,0.0400272,0.891076,1.7390592,1.7434708,0.9119318,0.8974832,0.9142452,0.9245207,0.0173654,0.0697863,73.9059157,74.347082,0.0625693
5,0.0500128,0.8797161,1.7173886,1.7382633,0.9005682,0.8853097,0.9115145,0.9166919,0.017149,0.0869354,71.7388638,73.826326,0.0776308
6,0.1000255,0.8267037,1.6398302,1.6890467,0.8598979,0.8529982,0.8857062,0.884845,0.0820124,0.1689478,63.9830182,68.9046721,0.1449111
7,0.1500099,0.783841,1.5260375,1.6347309,0.800227,0.8044248,0.8572239,0.8580484,0.0762781,0.2452259,52.6037463,63.4730853,0.2001942
8,0.2000227,0.7455751,1.463516,1.5919211,0.7674419,0.7641487,0.8347752,0.8345701,0.0731945,0.3184203,46.3515987,59.1921066,0.2489345
9,0.3000199,0.6710794,1.3340925,1.5059863,0.6995745,0.7084786,0.7897126,0.7925436,0.1334055,0.4518258,33.4092484,50.5986331,0.3191764
10,0.400017,0.5946852,1.2161557,1.4335338,0.6377305,0.6326692,0.7517197,0.7525778,0.1216121,0.5734379,21.6155679,43.3533807,0.3646225

Unnamed: 0,mean,sd,cv_1_valid,cv_2_valid,cv_3_valid,cv_4_valid,cv_5_valid
accuracy,0.6671839,0.0115815,0.6587448,0.6771945,0.6544893,0.6811739,0.6643170
aic,8281.283,140.94276,8292.175,8188.164,8409.877,8095.222,8420.979
auc,0.7562525,0.0083935,0.7539476,0.7636517,0.7515472,0.7660461,0.7460697
err,0.3328161,0.0115815,0.3412552,0.3228055,0.3455107,0.3188261,0.3356831
err_count,2346.8,95.89682,2398.0,2269.0,2459.0,2227.0,2381.0
f0point5,0.6697307,0.0092720,0.6609719,0.6748992,0.6596254,0.6812984,0.6718582
f1,0.732205,0.0035600,0.7266925,0.7350847,0.7321059,0.7356049,0.7315368
f2,0.8077263,0.0088494,0.8069252,0.8070553,0.8224812,0.7993189,0.8028511
lift_top_group,1.8050249,0.0523627,1.8117794,1.7377949,1.7665415,1.8580035,1.8510054
loglikelihood,0.0,0.0,0.0,0.0,0.0,0.0,0.0
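
The confusion matrices and the `max f1` rows in the output above are linked: the first matrix (9,961 rows) appears to be reported at the threshold that maximizes F1. The check below uses only the counts printed above and treats `YES` as the positive class (an assumption) to reproduce the reported total error rate and F1 value.

```python
# Counts copied from the first confusion matrix above (rows = actual, columns = predicted)
tn, fp = 3485, 1168   # actual NO: predicted NO, predicted YES
fn, tp = 667, 4641    # actual YES: predicted NO, predicted YES

total = tn + fp + fn + tp
error_rate = (fp + fn) / total
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(f"error rate: {error_rate:.4f}")  # 0.1842, matching the Total row
print(f"F1: {f1:.4f}")                  # 0.8349, matching the max f1 row
```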


In [24]:
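import pickle
# Note: this pickles the Python-side AutoML object only; the trained models live
# in the H2O cluster. To persist a model to disk, h2o.save_model() can be used.
with open('aml.pkl', 'wb') as f:
    pickle.dump(aml, f)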


# 5.3.b


In [60]:
# Retrieve and print the AutoML leaderboard
import pandas as pd
pd.set_option('display.max_rows', None)
aml_lb = h2o.automl.get_leaderboard(aml)
print(aml_lb)

model_id                                                      auc    logloss     aucpr    mean_per_class_error      rmse       mse
StackedEnsemble_AllModels_1_AutoML_1_20240407_123944     0.756087   0.585832  0.767857                0.348354  0.447694  0.20043
StackedEnsemble_BestOfFamily_1_AutoML_1_20240407_123944  0.755393   0.586386  0.767101                0.339776  0.447966  0.200674
GBM_1_AutoML_1_20240407_123944                           0.752938   0.58934   0.76335                 0.351937  0.449191  0.201773
GBM_3_AutoML_1_20240407_123944                           0.751078   0.590976  0.760174                0.347079  0.449897  0.202407
GBM_2_AutoML_1_20240407_123944                           0.749332   0.592415  0.759057                0.353812  0.450614  0.203053
GBM_4_AutoML_1_20240407_123944                           0.746035   0.596445  0.755082                0.357029  0.452407  0.204672
GBM_5_AutoML_1_20240407_123944                           0.745896   0.595113  0.7556

### Best AML Model and AUC

In [80]:
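# The leader is the top-ranked model on the leaderboard
best_model = aml.leader

print(f"Best model: {best_model.model_id}")
# auc() with no arguments reports the training-set AUC, which is why the value
# printed here is higher than the cross-validated leaderboard AUC
print(f"AUC score: {best_model.auc():.4f}")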



Best model: StackedEnsemble_AllModels_1_AutoML_1_20240407_123944
AUC score: 0.9109


### Full Parameters

In [84]:
display(best_model)
display(best_model.params)

key,value
Stacking strategy,cross_validation
Number of base models (used / total),3/20
# GBM base models (used / total),0/10
# DRF base models (used / total),1/2
# DeepLearning base models (used / total),2/7
# GLM base models (used / total),0/1
Metalearner algorithm,GLM
Metalearner fold assignment scheme,Random
Metalearner nfolds,5
Metalearner fold_column,

Unnamed: 0,NO,YES,Error,Rate
NO,3485.0,1168.0,0.251,(1168.0/4653.0)
YES,667.0,4641.0,0.1257,(667.0/5308.0)
Total,4152.0,5809.0,0.1842,(1835.0/9961.0)

metric,threshold,value,idx
max f1,0.4944467,0.8349375,226.0
max f2,0.4009381,0.8977111,283.0
max f0point5,0.574844,0.8518398,181.0
max accuracy,0.5187538,0.8198976,213.0
max precision,0.9876612,1.0,0.0
max recall,0.272308,1.0,341.0
max specificity,0.9876612,1.0,0.0
max absolute_mcc,0.5203673,0.6385444,212.0
max min_per_class_accuracy,0.522363,0.8185757,211.0
max mean_per_class_accuracy,0.5203673,0.8195915,212.0

group,cumulative_data_fraction,lower_threshold,lift,cumulative_lift,response_rate,score,cumulative_response_rate,cumulative_score,capture_rate,cumulative_capture_rate,gain,cumulative_gain,kolmogorov_smirnov
1,0.0101395,0.9642759,1.8766014,1.8766014,1.0,0.9803605,1.0,0.9803605,0.0190279,0.0190279,87.6601356,87.6601356,0.0190279
2,0.0200783,0.9381837,1.8766014,1.8766014,1.0,0.9499913,1.0,0.9653278,0.0186511,0.037679,87.6601356,87.6601356,0.037679
3,0.0301175,0.9189456,1.8766014,1.8766014,1.0,0.9291041,1.0,0.9532532,0.0188395,0.0565185,87.6601356,87.6601356,0.0565185
4,0.0400562,0.9032262,1.8766014,1.8766014,1.0,0.9107966,1.0,0.9427189,0.0186511,0.0751696,87.6601356,87.6601356,0.0751696
5,0.0500954,0.8856107,1.8766014,1.8766014,1.0,0.8945477,1.0,0.9330653,0.0188395,0.094009,87.6601356,87.6601356,0.094009
6,0.1000904,0.8135001,1.8728331,1.8747191,0.997992,0.8482258,0.998997,0.8906881,0.0936323,0.1876413,87.2833081,87.4719108,0.1874264
7,0.1500853,0.7557493,1.8766014,1.8753461,1.0,0.782473,0.9993311,0.8546405,0.0938206,0.2814619,87.6601356,87.5346105,0.281247
8,0.2000803,0.7065385,1.8539917,1.8700102,0.9879518,0.7302968,0.9964877,0.8235702,0.0926903,0.3741522,85.3991702,87.0010183,0.3726478
9,0.3000703,0.6329329,1.667462,1.8025167,0.8885542,0.6676194,0.9605219,0.771604,0.1667295,0.5408817,66.7462049,80.2516726,0.5155217
10,0.4001606,0.5774851,1.4098038,1.7042892,0.7512538,0.6036645,0.9081786,0.7295981,0.1411078,0.6819894,40.9803827,70.428924,0.6033305

Unnamed: 0,NO,YES,Error,Rate
NO,845.0,1152.0,0.5769,(1152.0/1997.0)
YES,329.0,1934.0,0.1454,(329.0/2263.0)
Total,1174.0,3086.0,0.3477,(1481.0/4260.0)

metric,threshold,value,idx
max f1,0.4291946,0.7231258,270.0
max f2,0.1517097,0.8507755,388.0
max f0point5,0.5533079,0.6944024,191.0
max accuracy,0.5067752,0.6732394,222.0
max precision,0.9864963,1.0,0.0
max recall,0.1029932,1.0,397.0
max specificity,0.9864963,1.0,0.0
max absolute_mcc,0.5067752,0.3421678,222.0
max min_per_class_accuracy,0.5281019,0.668002,209.0
max mean_per_class_accuracy,0.524817,0.6703984,211.0

group,cumulative_data_fraction,lower_threshold,lift,cumulative_lift,response_rate,score,cumulative_response_rate,cumulative_score,capture_rate,cumulative_capture_rate,gain,cumulative_gain,kolmogorov_smirnov
1,0.0100939,0.9719079,1.8824569,1.8824569,1.0,0.9827336,1.0,0.9827336,0.0190013,0.0190013,88.2456916,88.2456916,0.0190013
2,0.0201878,0.9416241,1.7073446,1.7949008,0.9069767,0.9580741,0.9534884,0.9704038,0.0172338,0.0362351,70.7344644,79.490078,0.0342321
3,0.0300469,0.9171548,1.7479957,1.7795101,0.9285714,0.928654,0.9453125,0.9567047,0.0172338,0.0534688,74.7995707,77.9510053,0.0499636
4,0.0401408,0.9004599,1.7073446,1.7613632,0.9069767,0.9086193,0.9356725,0.944613,0.0172338,0.0707026,70.7344644,76.1363196,0.0651943
5,0.05,0.8763802,1.5238937,1.7145382,0.8095238,0.8879015,0.9107981,0.9334305,0.0150243,0.0857269,52.3893694,71.4538224,0.0762126
6,0.1002347,0.8084332,1.5569854,1.6355773,0.8271028,0.8429219,0.8688525,0.8880702,0.0782148,0.1639417,55.6985393,63.557732,0.1358996
7,0.15,0.7489386,1.5183969,1.5967005,0.8066038,0.7764471,0.8482003,0.8510372,0.0755634,0.2395051,51.8396852,59.6700545,0.1909322
8,0.2,0.7055275,1.4494034,1.5598763,0.7699531,0.7265418,0.8286385,0.8199133,0.0724702,0.3119753,44.9403447,55.987627,0.2388656
9,0.3,0.6307953,1.2814848,1.4670791,0.6807512,0.6663109,0.7793427,0.7687125,0.1281485,0.4401237,28.1484755,46.7079099,0.2989119
10,0.4,0.5749611,1.117985,1.3798056,0.5938967,0.6016234,0.7329812,0.7269403,0.1117985,0.5519222,11.7984976,37.9805568,0.3240805

Unnamed: 0,NO,YES,Error,Rate
NO,7221.0,9545.0,0.5693,(9545.0/16766.0)
YES,2355.0,16130.0,0.1274,(2355.0/18485.0)
Total,9576.0,25675.0,0.3376,(11900.0/35251.0)

metric,threshold,value,idx
max f1,0.3564042,0.7305254,280.0
max f2,0.1859148,0.847948,360.0
max f0point5,0.5625001,0.7108722,181.0
max accuracy,0.484417,0.6918669,216.0
max precision,0.9788691,0.9875,2.0
max recall,0.0599107,1.0,398.0
max specificity,0.9945203,0.9999404,0.0
max absolute_mcc,0.5343993,0.3831103,193.0
max min_per_class_accuracy,0.5126044,0.6911009,203.0
max mean_per_class_accuracy,0.5343993,0.6916352,193.0

group,cumulative_data_fraction,lower_threshold,lift,cumulative_lift,response_rate,score,cumulative_response_rate,cumulative_score,capture_rate,cumulative_capture_rate,gain,cumulative_gain,kolmogorov_smirnov
1,0.0100139,0.9394392,1.8043623,1.8043623,0.9461756,0.9613896,0.9461756,0.9613896,0.0180687,0.0180687,80.4362315,80.4362315,0.0169355
2,0.0200278,0.9181912,1.7233281,1.7638452,0.9036827,0.9280595,0.9249292,0.9447245,0.0172572,0.0353259,72.3328079,76.3845197,0.0321648
3,0.0300417,0.9042116,1.7071212,1.7449372,0.8951841,0.911074,0.9150142,0.9335077,0.0170949,0.0524209,70.7121232,74.4937209,0.0470529
4,0.0400272,0.891076,1.7390592,1.7434708,0.9119318,0.8974832,0.9142452,0.9245207,0.0173654,0.0697863,73.9059157,74.347082,0.0625693
5,0.0500128,0.8797161,1.7173886,1.7382633,0.9005682,0.8853097,0.9115145,0.9166919,0.017149,0.0869354,71.7388638,73.826326,0.0776308
6,0.1000255,0.8267037,1.6398302,1.6890467,0.8598979,0.8529982,0.8857062,0.884845,0.0820124,0.1689478,63.9830182,68.9046721,0.1449111
7,0.1500099,0.783841,1.5260375,1.6347309,0.800227,0.8044248,0.8572239,0.8580484,0.0762781,0.2452259,52.6037463,63.4730853,0.2001942
8,0.2000227,0.7455751,1.463516,1.5919211,0.7674419,0.7641487,0.8347752,0.8345701,0.0731945,0.3184203,46.3515987,59.1921066,0.2489345
9,0.3000199,0.6710794,1.3340925,1.5059863,0.6995745,0.7084786,0.7897126,0.7925436,0.1334055,0.4518258,33.4092484,50.5986331,0.3191764
10,0.400017,0.5946852,1.2161557,1.4335338,0.6377305,0.6326692,0.7517197,0.7525778,0.1216121,0.5734379,21.6155679,43.3533807,0.3646225

Unnamed: 0,mean,sd,cv_1_valid,cv_2_valid,cv_3_valid,cv_4_valid,cv_5_valid
accuracy,0.6671839,0.0115815,0.6587448,0.6771945,0.6544893,0.6811739,0.6643170
aic,8281.283,140.94276,8292.175,8188.164,8409.877,8095.222,8420.979
auc,0.7562525,0.0083935,0.7539476,0.7636517,0.7515472,0.7660461,0.7460697
err,0.3328161,0.0115815,0.3412552,0.3228055,0.3455107,0.3188261,0.3356831
err_count,2346.8,95.89682,2398.0,2269.0,2459.0,2227.0,2381.0
f0point5,0.6697307,0.0092720,0.6609719,0.6748992,0.6596254,0.6812984,0.6718582
f1,0.732205,0.0035600,0.7266925,0.7350847,0.7321059,0.7356049,0.7315368
f2,0.8077263,0.0088494,0.8069252,0.8070553,0.8224812,0.7993189,0.8028511
lift_top_group,1.8050249,0.0523627,1.8117794,1.7377949,1.7665415,1.8580035,1.8510054
loglikelihood,0.0,0.0,0.0,0.0,0.0,0.0,0.0


{'model_id': {'default': None,
  'actual': {'__meta': {'schema_version': 3,
    'schema_name': 'ModelKeyV3',
    'schema_type': 'Key<Model>'},
   'name': 'StackedEnsemble_AllModels_1_AutoML_1_20240407_123944',
   'type': 'Key<Model>',
   'URL': '/3/Models/StackedEnsemble_AllModels_1_AutoML_1_20240407_123944'},
  'input': None},
 'training_frame': {'default': None,
  'actual': {'__meta': {'schema_version': 3,
    'schema_name': 'FrameKeyV3',
    'schema_type': 'Key<Frame>'},
   'name': 'AutoML_1_20240407_123944_training_py_40_sid_a544',
   'type': 'Key<Frame>',
   'URL': '/3/Frames/AutoML_1_20240407_123944_training_py_40_sid_a544'},
  'input': {'__meta': {'schema_version': 3,
    'schema_name': 'FrameKeyV3',
    'schema_type': 'Key<Frame>'},
   'name': 'AutoML_1_20240407_123944_training_py_40_sid_a544',
   'type': 'Key<Frame>',
   'URL': '/3/Frames/AutoML_1_20240407_123944_training_py_40_sid_a544'}},
 'response_column': {'default': None,
  'actual': {'__meta': {'schema_version': 3,
    


# 5.3.c


In [85]:
# * (c) Display the AUC score of the best model for the test set. (2)

performance = best_model.model_performance(test)
print(f"AUC score (test set): {performance.auc():.4f}")


AUC score (test set): 0.7452
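
AUC can be read as the probability that a randomly chosen positive row receives a higher score than a randomly chosen negative row. The sketch below computes it directly from that definition on invented toy data. It is an illustration of the metric, not how H2O computes it internally, and `pairwise_auc` is a hypothetical helper.

```python
def pairwise_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy data, invented for illustration
scores = [0.1, 0.35, 0.4, 0.8, 0.65, 0.9]
labels = [0, 0, 1, 1, 0, 1]

auc_value = pairwise_auc(scores, labels)
print(f"AUC: {auc_value:.4f}")
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which puts the test AUC of 0.7452 in context.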


# 5.3.d

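AutoML trained no XGBoost models because XGBoost was not available in this H2O installation (see the "XGBoost is not available; skipping it" message in the 5.3.a log). Part (d) therefore identifies the best gradient boosting machine (GBM) model instead.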

In [90]:
# * (d) Identify the best XGBoost model among all the models tested using log loss as the criteria. (2)

aml_lb = h2o.automl.get_leaderboard(aml, extra_columns = 'ALL').sort(by='logloss', ascending=True)
display(aml_lb.head(49))

# xgb = aml.get_best_model(algorithm="xgboost", criterion="logloss")
gbm = aml.get_best_model(algorithm="gbm", criterion="logloss")

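print(f"Best gbm model: {gbm.model_id}")
# logloss() with no arguments reports the training-set log loss, not the
# cross-validated value shown on the leaderboard
print(f"Log loss: {gbm.logloss():.4f}")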

model_id,auc,logloss,aucpr,mean_per_class_error,rmse,mse,training_time_ms,predict_time_per_row_ms,algo
StackedEnsemble_AllModels_1_AutoML_1_20240407_123944,0.756087,0.585832,0.767857,0.348354,0.447694,0.20043,10583,0.119953,StackedEnsemble
StackedEnsemble_BestOfFamily_1_AutoML_1_20240407_123944,0.755393,0.586386,0.767101,0.339776,0.447966,0.200674,6984,0.110318,StackedEnsemble
GBM_1_AutoML_1_20240407_123944,0.752938,0.58934,0.76335,0.351937,0.449191,0.201773,876,0.011968,GBM
GBM_3_AutoML_1_20240407_123944,0.751078,0.590976,0.760174,0.347079,0.449897,0.202407,527,0.010315,GBM
GBM_2_AutoML_1_20240407_123944,0.749332,0.592415,0.759057,0.353812,0.450614,0.203053,490,0.01101,GBM
GBM_5_AutoML_1_20240407_123944,0.745896,0.595113,0.755672,0.355188,0.451947,0.204256,434,0.010734,GBM
GBM_4_AutoML_1_20240407_123944,0.746035,0.596445,0.755082,0.357029,0.452407,0.204672,857,0.010363,GBM
GBM_grid_1_AutoML_1_20240407_123944_model_2,0.743288,0.596842,0.754765,0.362023,0.45291,0.205128,385,0.011329,GBM
GBM_grid_1_AutoML_1_20240407_123944_model_1,0.741982,0.598302,0.75391,0.359084,0.453623,0.205774,535,0.014209,GBM
XRT_1_AutoML_1_20240407_123944,0.743518,0.602369,0.750886,0.35747,0.455421,0.207408,3657,0.020476,DRF


Best gbm model: GBM_1_AutoML_1_20240407_123944
Log loss: 0.5104


As mentioned above, AutoML trained no XGBoost models, so GBM models are considered instead. The best GBM with respect to log loss is `GBM_1_AutoML_1_20240407_123944`, which has the lowest leaderboard log loss among the GBMs (0.5893). The 0.5104 printed above comes from `gbm.logloss()`, which reports the training-set metric by default.
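
For reference, binary log loss is the average negative log-likelihood of the true labels under the predicted probabilities. The sketch below implements the standard formula in plain Python; the clipping constant `eps` is an assumption added to avoid `log(0)`.

```python
import math

def log_loss(probs, labels, eps=1e-15):
    """Mean binary cross-entropy; probs are predicted probabilities of the positive class."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1 - eps)  # clip to keep the logarithm finite
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(labels)

# A model that always predicts 0.5 scores ln(2), about 0.6931
baseline = log_loss([0.5, 0.5], [0, 1])
print(f"baseline log loss: {baseline:.4f}")
```

Lower is better, so the leaderboard values near 0.59 are a modest improvement over the 0.6931 baseline of an uninformative predictor.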
