# Problem 1 - Hyperparameter Optimization using H2O

Sources:
*   https://docs.h2o.ai/h2o/latest-stable/h2o-docs/performance-and-prediction.html
*   https://docs.h2o.ai/h2o/latest-stable/h2o-docs/automl.html


## 1.1

### (a)

In [None]:
import h2o
from h2o.grid.grid_search import H2OGridSearch
from h2o.estimators import H2ORandomForestEstimator
from h2o.automl import H2OAutoML

In [None]:
# Initialize H2O
h2o.init(nthreads=-1, max_mem_size=8)

# Load airlines dataset from the given link
airlines = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/airlines/allyears2k_headers.zip")

Checking whether there is an H2O instance running at http://localhost:54321..... not found.
Attempting to start a local H2O server...
  Java Version: openjdk version "11.0.17" 2022-10-18; OpenJDK Runtime Environment (build 11.0.17+8-post-Ubuntu-1ubuntu222.04); OpenJDK 64-Bit Server VM (build 11.0.17+8-post-Ubuntu-1ubuntu222.04, mixed mode, sharing)
  Starting server from /ext3/miniconda3/lib/python3.11/site-packages/h2o/backend/bin/h2o.jar
  Ice root: /state/partition1/job-40377324/tmpxttzbgvi
  JVM stdout: /state/partition1/job-40377324/tmpxttzbgvi/h2o_pi2018_started_from_python.out
  JVM stderr: /state/partition1/job-40377324/tmpxttzbgvi/h2o_pi2018_started_from_python.err
  Server is running at http://127.0.0.1:54321
Connecting to H2O server at http://127.0.0.1:54321 ... successful.


0,1
H2O_cluster_uptime:,01 secs
H2O_cluster_timezone:,America/New_York
H2O_data_parsing_timezone:,UTC
H2O_cluster_version:,3.44.0.2
H2O_cluster_version_age:,17 days
H2O_cluster_name:,H2O_from_python_pi2018_7vrsft
H2O_cluster_total_nodes:,1
H2O_cluster_free_memory:,8 Gb
H2O_cluster_total_cores:,14
H2O_cluster_allowed_cores:,14


Parse progress: |████████████████████████████████████████████████████████████████| (done) 100%


In [None]:
# Split the dataset into training and test sets
train, test = airlines.split_frame(ratios=[.8], seed=1234)

# Define predictors and response
response = "IsDepDelayed" # y
predictors = ["Origin", "Dest", "Year", "UniqueCarrier", "DayOfWeek", "Month", "Distance", "FlightNum"] #x

# Define hyperparameters for grid search
hyper_params = {'ntrees': [10, 30, 50, 100], 'max_depth': [1, 2, 4, 6]}

# Initialize H2ORandomForestEstimator
rf = H2ORandomForestEstimator()

In [None]:
# Perform Grid Search
grid = H2OGridSearch(model=rf, hyper_params=hyper_params,
                     search_criteria={'strategy': "Cartesian"})
grid.train(x=predictors, y=response, training_frame=train)

drf Grid Build progress: |███████████████████████████████████████████████████████| (done) 100%


max_depth,ntrees,model_ids,logloss
6.0,100.0,Grid_DRF_py_5_sid_b6a1_model_python_1700973201261_1_model_16,0.6171837
6.0,50.0,Grid_DRF_py_5_sid_b6a1_model_python_1700973201261_1_model_12,0.6175312
6.0,30.0,Grid_DRF_py_5_sid_b6a1_model_python_1700973201261_1_model_8,0.6194176
6.0,10.0,Grid_DRF_py_5_sid_b6a1_model_python_1700973201261_1_model_4,0.622914
4.0,100.0,Grid_DRF_py_5_sid_b6a1_model_python_1700973201261_1_model_15,0.6343926
4.0,50.0,Grid_DRF_py_5_sid_b6a1_model_python_1700973201261_1_model_11,0.6348209
4.0,30.0,Grid_DRF_py_5_sid_b6a1_model_python_1700973201261_1_model_7,0.6353276
4.0,10.0,Grid_DRF_py_5_sid_b6a1_model_python_1700973201261_1_model_3,0.642091
2.0,50.0,Grid_DRF_py_5_sid_b6a1_model_python_1700973201261_1_model_10,0.657764
2.0,30.0,Grid_DRF_py_5_sid_b6a1_model_python_1700973201261_1_model_6,0.6578709


**Answer:**
The summary above shows the grid search results ordered by increasing log loss, so the best models are at the top. Deeper trees clearly perform better: every model with a maximum depth of 6 has a lower log loss than every model with a depth of 4, and those in turn beat the models with a depth of 2. The number of trees is secondary: for a fixed depth, log loss generally decreases as trees are added. The best model therefore combines the largest depth and the largest number of trees in the grid (depth 6, 100 trees), followed by the other depth-6 models with 50, 30, and 10 trees. In short, within the ranges we searched, maximum depth matters most and the number of trees second, and the best values for both lie at the upper edge of the grid.
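
To make the size of this search concrete, here is a minimal sketch, using only the Python standard library rather than H2O itself, of how a Cartesian strategy enumerates a grid. The `hyper_params` dictionary is copied from the cell above; the names `names` and `combinations` are illustrative and not part of the H2O API.

```python
from itertools import product

# Same grid as in the cell above.
hyper_params = {'ntrees': [10, 30, 50, 100], 'max_depth': [1, 2, 4, 6]}

# A Cartesian search trains one model per element of the cross product.
names = sorted(hyper_params)
combinations = [dict(zip(names, values))
                for values in product(*(hyper_params[n] for n in names))]

print(len(combinations))  # 16
print(combinations[-1])   # {'max_depth': 6, 'ntrees': 100}
```

The grid therefore contains 16 models, of which the summary above displays ten rows.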

### (b)

In [None]:
# Display the grid results, sorted by accuracy
grid_results = grid.get_grid(sort_by='accuracy', decreasing=True)
print(grid_results)

Hyper-Parameter Search Summary: ordered by decreasing accuracy
    max_depth    ntrees    model_ids                                                     accuracy
--  -----------  --------  ------------------------------------------------------------  ----------
    6            100       Grid_DRF_py_5_sid_b6a1_model_python_1700973201261_1_model_16  0.671697
    6            50        Grid_DRF_py_5_sid_b6a1_model_python_1700973201261_1_model_12  0.66852
    6            30        Grid_DRF_py_5_sid_b6a1_model_python_1700973201261_1_model_8   0.668321
    6            10        Grid_DRF_py_5_sid_b6a1_model_python_1700973201261_1_model_4   0.658838
    4            100       Grid_DRF_py_5_sid_b6a1_model_python_1700973201261_1_model_15  0.654705
    4            50        Grid_DRF_py_5_sid_b6a1_model_python_1700973201261_1_model_11  0.654563
    4            30        Grid_DRF_py_5_sid_b6a1_model_python_1700973201261_1_model_7   0.649712
    4            10        Grid_DRF_py_5_sid_b6a1_mode

These are the same models as above, now ordered by decreasing accuracy instead of increasing log loss, and the rows shown appear in the same order. Again, the best models have the largest depth and, for a given depth, the most trees. The best model has a depth of 6 and 100 trees and achieves an accuracy of 0.67.
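
The two orderings agree here, but log loss and accuracy measure different things. The following is a minimal sketch of both metrics in plain Python; the function names, the toy labels and probabilities, and the fixed 0.5 threshold are assumptions made for illustration (H2O may select a different threshold when it reports accuracy).

```python
import math

def log_loss(y_true, p_pred):
    """Mean negative log-likelihood of the true labels (lower is better)."""
    return -sum(math.log(p if y == 1 else 1.0 - p)
                for y, p in zip(y_true, p_pred)) / len(y_true)

def accuracy(y_true, p_pred, threshold=0.5):
    """Share of rows whose thresholded prediction matches the label."""
    hits = sum((p >= threshold) == (y == 1) for y, p in zip(y_true, p_pred))
    return hits / len(y_true)

# Toy data, invented for illustration only.
y = [1, 0, 1, 0]
confident = [0.9, 0.1, 0.8, 0.2]
hesitant = [0.6, 0.4, 0.6, 0.4]

print(accuracy(y, confident), accuracy(y, hesitant))   # 1.0 1.0
print(log_loss(y, confident) < log_loss(y, hesitant))  # True
print(log_loss(y, [0.5] * 4))                          # ln 2, about 0.693
```

Both toy predictors classify every row correctly, yet the more confident one has the lower log loss. A constant prediction of 0.5 gives a log loss of ln 2 ≈ 0.693, so the values of about 0.62 in the grid above are only a modest improvement over an uninformative model.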

### (c)

In [None]:
# Identify the best model
best_model = grid_results.models[0]
print("Best model:", best_model)

Best model: Model Details
H2ORandomForestEstimator : Distributed Random Forest
Model Key: Grid_DRF_py_5_sid_b6a1_model_python_1700973201261_1_model_16


Model Summary: 
    number_of_trees    number_of_internal_trees    model_size_in_bytes    min_depth    max_depth    mean_depth    min_leaves    max_leaves    mean_leaves
--  -----------------  --------------------------  ---------------------  -----------  -----------  ------------  ------------  ------------  -------------
    100                100                         124693                 6            6            6             31            64            58.32

ModelMetricsBinomial: drf
** Reported on train data. **

MSE: 0.21389354824635162
RMSE: 0.46248626817058214
LogLoss: 0.6171836851866914
Mean Per-Class Error: 0.38007776922466097
AUC: 0.7236719301730956
AUCPR: 0.7352984343448201
Gini: 0.4473438603461912

Confusion Matrix (Act/Pred) for max f1 @ threshold = 0.4000107713270529
       NO    YES    Error    Rate
-----  ---- 

In [None]:
# Evaluate the best model’s performance on the test set and display the AUC score
performance = best_model.model_performance(test)
print("AUC: ", performance.auc())

AUC:  0.7185525921634769


**Answer:**
The best model is the Distributed Random Forest with 100 trees, a depth of 6, and an average of about 58 leaves per tree. On the test set it achieves an AUC of 0.72, almost the same as the AUC reported on the training data, which means it performs relatively well. The MSE of 0.21 is the value reported on the training data.
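
As a reminder of what the AUC measures, here is a minimal sketch that computes it as the fraction of (positive, negative) pairs that the model ranks correctly. The function name `auc` and the toy data are illustrative assumptions; this is not how H2O computes the metric internally.

```python
def auc(y_true, scores):
    """Probability that a random positive is scored above a random negative.

    Ties count as half a win. Quadratic in the number of rows, so this is
    only meant for small illustrative inputs.
    """
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    negatives = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Toy data, invented for illustration only.
labels = [1, 1, 1, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.2]
print(auc(labels, scores))  # 5 of 6 pairs are ordered correctly
```

Under this reading, a test AUC of about 0.72 means that the model scores a randomly chosen delayed flight above a randomly chosen on-time flight roughly 72% of the time, while 0.5 would correspond to random ranking.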

## 1.2

### (a)

In [None]:
# Perform Randomized Grid Search
# Search criteria for randomized search with a maximum of 10 models
search_criteria = {'strategy': 'RandomDiscrete', 'max_models': 10, 'seed': 1234}

# Create and train the grid
random_grid = H2OGridSearch(model=rf, hyper_params=hyper_params, search_criteria=search_criteria)
random_grid.train(x=predictors, y=response, training_frame=train)

drf Grid Build progress: |███████████████████████████████████████████████████████| (done) 100%


max_depth,ntrees,model_ids,logloss
6.0,30.0,Grid_DRF_py_5_sid_b6a1_model_python_1700973201261_763_model_3,0.6176122
6.0,10.0,Grid_DRF_py_5_sid_b6a1_model_python_1700973201261_763_model_2,0.6283258
4.0,30.0,Grid_DRF_py_5_sid_b6a1_model_python_1700973201261_763_model_1,0.6343485
4.0,10.0,Grid_DRF_py_5_sid_b6a1_model_python_1700973201261_763_model_8,0.6416632
2.0,100.0,Grid_DRF_py_5_sid_b6a1_model_python_1700973201261_763_model_10,0.6562125
2.0,50.0,Grid_DRF_py_5_sid_b6a1_model_python_1700973201261_763_model_4,0.6582721
2.0,30.0,Grid_DRF_py_5_sid_b6a1_model_python_1700973201261_763_model_7,0.6588143
2.0,10.0,Grid_DRF_py_5_sid_b6a1_model_python_1700973201261_763_model_5,0.6622335
1.0,100.0,Grid_DRF_py_5_sid_b6a1_model_python_1700973201261_763_model_9,0.6722703
1.0,10.0,Grid_DRF_py_5_sid_b6a1_model_python_1700973201261_763_model_6,0.6752225


Unlike 'regular' (Cartesian) grid search, randomized grid search does not try every combination of model hyper-parameters; it trains only as many randomly chosen combinations as we allow, in our case 10 of the 16 possible ones. The trend is the same as before: models with a larger maximum depth perform better, and for a given depth more trees help. However, the best model from the previous part (maximum depth of 6 and 100 trees) was not among the sampled combinations, so the best model here is the one with a maximum depth of 6 and 30 trees.
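
The chance of missing a particular combination can be estimated with a short sketch. It assumes that the random strategy draws distinct combinations uniformly from the grid, which is a simplification of whatever H2O does internally; the seed used here is arbitrary and unrelated to the `seed` passed to H2O.

```python
import random
from itertools import product

hyper_params = {'ntrees': [10, 30, 50, 100], 'max_depth': [1, 2, 4, 6]}
max_models = 10

# All (max_depth, ntrees) pairs in the grid.
grid = list(product(hyper_params['max_depth'], hyper_params['ntrees']))

# Draw max_models distinct combinations (assumed uniform sampling).
rng = random.Random(0)
sampled = rng.sample(grid, max_models)

# Chance that one particular combination, e.g. (6, 100), is left out.
p_missed = (len(grid) - max_models) / len(grid)
print(len(sampled), p_missed)  # 10 0.375
```

Under this assumption, any given combination, including the best one from the full grid, is skipped with probability 6/16 = 0.375.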

### (b)

In [None]:
# Display the grid results, sorted by accuracy
random_grid_results = random_grid.get_grid(sort_by='accuracy', decreasing=True)
print(random_grid_results)

Hyper-Parameter Search Summary: ordered by decreasing accuracy
    max_depth    ntrees    model_ids                                                       accuracy
--  -----------  --------  --------------------------------------------------------------  ----------
    6            30        Grid_DRF_py_5_sid_b6a1_model_python_1700973201261_763_model_3   0.669541
    6            10        Grid_DRF_py_5_sid_b6a1_model_python_1700973201261_763_model_2   0.658377
    4            30        Grid_DRF_py_5_sid_b6a1_model_python_1700973201261_763_model_1   0.65323
    4            10        Grid_DRF_py_5_sid_b6a1_model_python_1700973201261_763_model_8   0.637008
    2            50        Grid_DRF_py_5_sid_b6a1_model_python_1700973201261_763_model_4   0.63689
    2            30        Grid_DRF_py_5_sid_b6a1_model_python_1700973201261_763_model_7   0.630932
    2            100       Grid_DRF_py_5_sid_b6a1_model_python_1700973201261_763_model_10  0.628805
    1            100       Grid_DRF_p

As mentioned above, models with a larger maximum depth perform better, and among models of equal depth those with more trees tend to do better, although not strictly: at depth 2 the model with 100 trees is less accurate than the ones with 50 and 30 trees. The most accurate model is the one with a maximum depth of 6 and 30 trees, which achieves an accuracy of 0.67.

### (c)

In [None]:
# Identify the best model
best_random_model = random_grid_results.models[0]
print(best_random_model)

Model Details
H2ORandomForestEstimator : Distributed Random Forest
Model Key: Grid_DRF_py_5_sid_b6a1_model_python_1700973201261_763_model_3


Model Summary: 
    number_of_trees    number_of_internal_trees    model_size_in_bytes    min_depth    max_depth    mean_depth    min_leaves    max_leaves    mean_leaves
--  -----------------  --------------------------  ---------------------  -----------  -----------  ------------  ------------  ------------  -------------
    30                 30                          36503                  6            6            6             44            64            56.6333

ModelMetricsBinomial: drf
** Reported on train data. **

MSE: 0.21415083682804756
RMSE: 0.4627643426497417
LogLoss: 0.6176121669832266
Mean Per-Class Error: 0.38047446254674316
AUC: 0.721410292949934
AUCPR: 0.7337727146275909
Gini: 0.44282058589986795

Confusion Matrix (Act/Pred) for max f1 @ threshold = 0.394171012902394
       NO    YES    Error    Rate
-----  ----  -----  ---

In [None]:
# Evaluate the best model’s performance on the test set and display the AUC score
random_performance = best_random_model.model_performance(test)
random_auc_score = random_performance.auc()
print("AUC: ", random_auc_score)

AUC:  0.7198049282752954


**Answer:**
As mentioned, the best model has 30 trees, a maximum depth of 6, and an average of about 57 leaves per tree. As in the previous question, the AUC is relatively high at approximately 0.72. This is essentially the same as in part 1.1 with 'regular' grid search (slightly higher, in fact: 0.7198 vs. 0.7186), yet the model took less time to find and has fewer trees, which most likely makes it less computationally complex and therefore cheaper to train. Note that although randomized grid search is faster and in our case did not cost any performance, it can miss the best hyper-parameter combination, because unlike 'regular' grid search it does not test every combination.
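
A rough way to compare the cost of the two searches is to count the trees each one had to build. The sketch below does this in plain Python; treating the tree count as a proxy for training cost is an assumption, since deeper trees are more expensive than shallow ones. The list `random_models` is read off the randomized search summary in 1.2 (a).

```python
from itertools import product

ntrees_values = [10, 30, 50, 100]
max_depth_values = [1, 2, 4, 6]

# Cartesian search: every (max_depth, ntrees) pair is trained.
cartesian_trees = sum(n for _, n in product(max_depth_values, ntrees_values))

# (max_depth, ntrees) pairs read off the random search summary in 1.2 (a).
random_models = [(6, 30), (6, 10), (4, 30), (4, 10), (2, 100),
                 (2, 50), (2, 30), (2, 10), (1, 100), (1, 10)]
random_trees = sum(n for _, n in random_models)

print(cartesian_trees, random_trees)  # 760 380
```

By this measure the randomized search built half as many trees as the full grid while reaching practically the same test AUC.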

## 1.3

### (a)

In [None]:
# Initialize H2OAutoML
automl = H2OAutoML(max_models=20, seed=1) #, include_algos=["DeepLearning", "XGBoost"]

# Train using AutoML
automl.train(x=predictors, y=response, training_frame=train)

AutoML progress: |███████████████████████████████████████████████████████████████| (done) 100%


key,value
Stacking strategy,cross_validation
Number of base models (used / total),11/20
# GBM base models (used / total),3/7
# XGBoost base models (used / total),5/6
# DRF base models (used / total),2/2
# DeepLearning base models (used / total),1/4
# GLM base models (used / total),0/1
Metalearner algorithm,GLM
Metalearner fold assignment scheme,Random
Metalearner nfolds,5

Unnamed: 0,NO,YES,Error,Rate
NO,2999.0,1752.0,0.3688,(1752.0/4751.0)
YES,939.0,4378.0,0.1766,(939.0/5317.0)
Total,3938.0,6130.0,0.2673,(2691.0/10068.0)

metric,threshold,value,idx
max f1,0.4395611,0.7649166,242.0
max f2,0.2420599,0.8641403,338.0
max f0point5,0.5605701,0.7671398,180.0
max accuracy,0.5247394,0.7376838,198.0
max precision,0.9725215,1.0,0.0
max recall,0.1235624,1.0,390.0
max specificity,0.9725215,1.0,0.0
max absolute_mcc,0.5553316,0.4809077,183.0
max min_per_class_accuracy,0.5093724,0.7348129,206.0
max mean_per_class_accuracy,0.5392195,0.7397728,191.0

group,cumulative_data_fraction,lower_threshold,lift,cumulative_lift,response_rate,score,cumulative_response_rate,cumulative_score,capture_rate,cumulative_capture_rate,gain,cumulative_gain,kolmogorov_smirnov
1,0.0100318,0.927991,1.893549,1.893549,1.0,0.948266,1.0,0.948266,0.0189957,0.0189957,89.3548994,89.3548994,0.0189957
2,0.0200636,0.912041,1.856053,1.874801,0.980198,0.9200426,0.990099,0.9341543,0.0186195,0.0376152,85.6052974,87.4800984,0.0371942
3,0.0300954,0.8956902,1.893549,1.8810503,1.0,0.9026487,0.9933993,0.9236524,0.0189957,0.0566109,89.3548994,88.1050321,0.0561899
4,0.0401271,0.8870695,1.856053,1.874801,0.980198,0.8908566,0.990099,0.9154535,0.0186195,0.0752304,85.6052974,87.4800984,0.0743885
5,0.0500596,0.8735119,1.8367425,1.8672497,0.97,0.8799223,0.9861111,0.9084036,0.0182434,0.0934738,83.6742524,86.7249702,0.0920004
6,0.1000199,0.8227266,1.7730846,1.8202139,0.9363817,0.8459136,0.9612711,0.8771896,0.0885838,0.1820576,77.3084644,82.0213928,0.1738488
7,0.1502781,0.77892,1.6802441,1.7734031,0.8873518,0.8006258,0.9365499,0.851584,0.0844461,0.2665037,68.0244068,77.3403122,0.2462974
8,0.2001391,0.7414836,1.5691561,1.7225188,0.8286853,0.7597102,0.9096774,0.8286953,0.0782396,0.3447433,56.9156138,72.2518762,0.3064356
9,0.3001589,0.6648409,1.455419,1.633515,0.7686197,0.7032327,0.8626737,0.7868883,0.1455708,0.4903141,45.5418988,63.3514966,0.4029641
10,0.3999801,0.5911217,1.2567136,1.5394784,0.6636816,0.6266095,0.8130122,0.7468883,0.1254467,0.6157608,25.6713611,53.9478372,0.4572678

Unnamed: 0,NO,YES,Error,Rate
NO,7109.0,9657.0,0.576,(9657.0/16766.0)
YES,2373.0,16112.0,0.1284,(2373.0/18485.0)
Total,9482.0,25769.0,0.3413,(12030.0/35251.0)

metric,threshold,value,idx
max f1,0.3610693,0.7281602,283.0
max f2,0.176969,0.8485025,372.0
max f0point5,0.5608147,0.7076517,178.0
max accuracy,0.4967571,0.6890585,210.0
max precision,0.976162,1.0,0.0
max recall,0.0730931,1.0,399.0
max specificity,0.976162,1.0,0.0
max absolute_mcc,0.5150185,0.3767557,201.0
max min_per_class_accuracy,0.5128185,0.6868663,202.0
max mean_per_class_accuracy,0.5150185,0.688582,201.0

group,cumulative_data_fraction,lower_threshold,lift,cumulative_lift,response_rate,score,cumulative_response_rate,cumulative_score,capture_rate,cumulative_capture_rate,gain,cumulative_gain,kolmogorov_smirnov
1,0.0100139,0.9231134,1.8043623,1.8043623,0.9461756,0.9439106,0.9461756,0.9439106,0.0180687,0.0180687,80.4362315,80.4362315,0.0169355
2,0.0200278,0.9064982,1.7449372,1.7746498,0.9150142,0.9141833,0.9305949,0.929047,0.0174736,0.0355423,74.4937209,77.4649762,0.0326198
3,0.0300133,0.8924418,1.7498944,1.7664136,0.9176136,0.8993509,0.926276,0.919167,0.0174736,0.053016,74.9894417,76.6413579,0.0483637
4,0.0400272,0.8803131,1.663903,1.7407678,0.8725212,0.886277,0.9128278,0.9109387,0.0166622,0.0696781,66.3902973,74.0767765,0.0623418
5,0.0500411,0.8682711,1.7017189,1.7329536,0.8923513,0.8740045,0.9087302,0.9035476,0.0170408,0.086719,70.171895,73.2953575,0.0771162
6,0.1000255,0.8191489,1.6440077,1.6885059,0.8620885,0.8429744,0.8854226,0.8732782,0.0821747,0.1688937,64.4007735,68.850588,0.1447973
7,0.1500099,0.7769741,1.5141322,1.6304033,0.7939841,0.797673,0.8549546,0.848086,0.075683,0.2445767,51.4132206,63.0403306,0.1988293
8,0.2000227,0.7386506,1.4862313,1.5943552,0.7793534,0.7570568,0.8360516,0.8253255,0.0743305,0.3189072,48.6231313,59.4355196,0.2499582
9,0.3000199,0.6642188,1.3346335,1.5077895,0.6998582,0.7020252,0.7906581,0.7842293,0.1334596,0.4523668,33.4633479,50.7789476,0.3203138
10,0.400017,0.5898867,1.188565,1.427989,0.6232624,0.6271415,0.7488121,0.7449601,0.1188531,0.5712199,18.8564959,42.7989006,0.359959

Unnamed: 0,mean,sd,cv_1_valid,cv_2_valid,cv_3_valid,cv_4_valid,cv_5_valid
accuracy,0.6575674,0.0068684,0.6624911,0.6596321,0.6607819,0.6454726,0.6594595
auc,0.7526929,0.0019961,0.7515209,0.7557032,0.7533578,0.7504595,0.7524232
err,0.3424326,0.0068684,0.3375089,0.3403679,0.3392182,0.3545274,0.3405405
err_count,2414.2,50.206573,2371.0,2387.0,2421.0,2498.0,2394.0
f0point5,0.6616153,0.0056471,0.6664169,0.6612577,0.6645367,0.6520711,0.6637942
f1,0.7289358,0.0028274,0.7298006,0.7285340,0.7332819,0.7259763,0.7270862
f2,0.8115892,0.0066969,0.8065085,0.8110504,0.8178914,0.8187756,0.8037199
lift_top_group,1.7834879,0.0593886,1.8491638,1.7314646,1.8473775,1.7448102,1.7446233
logloss,0.5885047,0.0017558,0.5893163,0.5863451,0.5876075,0.5909843,0.5882701
max_per_class_error,0.5855004,0.0267832,0.5643565,0.57674,0.5894706,0.6300388,0.5668961


The output above shows that AutoML finds a model that performs better than the models from grid search and randomized grid search (cross-validated AUC of about 0.75, compared with about 0.72 for those models). However, it took much longer to run than the other methods, so we should consider whether the gain is worth the higher computational cost.
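
The confusion matrices in the output above can be turned into summary metrics by hand. The sketch below uses the counts from the first confusion matrix and assumes that rows are actual classes and columns are predicted classes, as in H2O's Act/Pred layout; the variable names are illustrative.

```python
# Counts copied from the first confusion matrix in the output above.
tn, fp = 2999, 1752   # actual NO:  predicted NO, predicted YES
fn, tp = 939, 4378    # actual YES: predicted NO, predicted YES

total = tn + fp + fn + tp
accuracy = (tn + tp) / total
error_rate = (fp + fn) / total
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(total, round(error_rate, 4), round(f1, 4))  # 10068 0.2673 0.7649
```

The error rate reproduces the 0.2673 in the Total row, and the F1 score matches the max f1 value of 0.7649 in the threshold table that follows the matrix.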

### (b)

In [None]:
# Display the AutoML leaderboard
leaderboard = automl.leaderboard
leaderboard.head(rows=leaderboard.nrows)

model_id,auc,logloss,aucpr,mean_per_class_error,rmse,mse
StackedEnsemble_AllModels_1_AutoML_1_20231125_233359,0.752628,0.588492,0.765959,0.352181,0.449103,0.201694
StackedEnsemble_BestOfFamily_1_AutoML_1_20231125_233359,0.751776,0.589156,0.76515,0.352557,0.449434,0.201991
XGBoost_grid_1_AutoML_1_20231125_233359_model_2,0.748797,0.592207,0.762795,0.362161,0.4509,0.203311
GBM_1_AutoML_1_20231125_233359,0.746962,0.593601,0.759656,0.353689,0.451444,0.203801
GBM_4_AutoML_1_20231125_233359,0.745632,0.595072,0.756256,0.353863,0.452035,0.204336
XGBoost_1_AutoML_1_20231125_233359,0.745275,0.595679,0.759311,0.356014,0.452394,0.20466
GBM_grid_1_AutoML_1_20231125_233359_model_1,0.744164,0.596406,0.75479,0.35314,0.452651,0.204893
XRT_1_AutoML_1_20231125_233359,0.743993,0.597298,0.754893,0.36765,0.453099,0.205298
XGBoost_grid_1_AutoML_1_20231125_233359_model_1,0.743921,0.596006,0.757338,0.357639,0.45266,0.204901
XGBoost_2_AutoML_1_20231125_233359,0.742837,0.597405,0.754799,0.356223,0.453201,0.205392


Looking at the leaderboard, AutoML trained 22 different models. The Stacked Ensemble models performed best, with the highest AUC of 0.75, followed by an XGBoost model.
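
The ranking depends on the metric used for sorting. The leaderboard above appears to be ordered by AUC; the sketch below re-sorts three of its rows by log loss in plain Python to show that the order can change. The tuple layout and variable names are illustrative.

```python
# (model_id, auc, logloss) for three rows copied from the leaderboard above.
rows = [
    ("GBM_grid_1_AutoML_1_20231125_233359_model_1", 0.744164, 0.596406),
    ("XRT_1_AutoML_1_20231125_233359", 0.743993, 0.597298),
    ("XGBoost_grid_1_AutoML_1_20231125_233359_model_1", 0.743921, 0.596006),
]

# Higher AUC is better, lower log loss is better.
by_auc = [r[0] for r in sorted(rows, key=lambda r: r[1], reverse=True)]
by_logloss = [r[0] for r in sorted(rows, key=lambda r: r[2])]

print(by_auc == by_logloss)  # False
print(by_logloss[0])         # the XGBoost grid model moves to the front
```

The differences between these models are in the third decimal place, so the choice of metric is enough to reorder them.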

In [None]:
# Identify the best performing model and print its parameters
best_model = automl.get_best_model()
best_model

key,value
Stacking strategy,cross_validation
Number of base models (used / total),11/20
# GBM base models (used / total),3/7
# XGBoost base models (used / total),5/6
# DRF base models (used / total),2/2
# DeepLearning base models (used / total),1/4
# GLM base models (used / total),0/1
Metalearner algorithm,GLM
Metalearner fold assignment scheme,Random
Metalearner nfolds,5

Unnamed: 0,NO,YES,Error,Rate
NO,2999.0,1752.0,0.3688,(1752.0/4751.0)
YES,939.0,4378.0,0.1766,(939.0/5317.0)
Total,3938.0,6130.0,0.2673,(2691.0/10068.0)

metric,threshold,value,idx
max f1,0.4395611,0.7649166,242.0
max f2,0.2420599,0.8641403,338.0
max f0point5,0.5605701,0.7671398,180.0
max accuracy,0.5247394,0.7376838,198.0
max precision,0.9725215,1.0,0.0
max recall,0.1235624,1.0,390.0
max specificity,0.9725215,1.0,0.0
max absolute_mcc,0.5553316,0.4809077,183.0
max min_per_class_accuracy,0.5093724,0.7348129,206.0
max mean_per_class_accuracy,0.5392195,0.7397728,191.0

group,cumulative_data_fraction,lower_threshold,lift,cumulative_lift,response_rate,score,cumulative_response_rate,cumulative_score,capture_rate,cumulative_capture_rate,gain,cumulative_gain,kolmogorov_smirnov
1,0.0100318,0.927991,1.893549,1.893549,1.0,0.948266,1.0,0.948266,0.0189957,0.0189957,89.3548994,89.3548994,0.0189957
2,0.0200636,0.912041,1.856053,1.874801,0.980198,0.9200426,0.990099,0.9341543,0.0186195,0.0376152,85.6052974,87.4800984,0.0371942
3,0.0300954,0.8956902,1.893549,1.8810503,1.0,0.9026487,0.9933993,0.9236524,0.0189957,0.0566109,89.3548994,88.1050321,0.0561899
4,0.0401271,0.8870695,1.856053,1.874801,0.980198,0.8908566,0.990099,0.9154535,0.0186195,0.0752304,85.6052974,87.4800984,0.0743885
5,0.0500596,0.8735119,1.8367425,1.8672497,0.97,0.8799223,0.9861111,0.9084036,0.0182434,0.0934738,83.6742524,86.7249702,0.0920004
6,0.1000199,0.8227266,1.7730846,1.8202139,0.9363817,0.8459136,0.9612711,0.8771896,0.0885838,0.1820576,77.3084644,82.0213928,0.1738488
7,0.1502781,0.77892,1.6802441,1.7734031,0.8873518,0.8006258,0.9365499,0.851584,0.0844461,0.2665037,68.0244068,77.3403122,0.2462974
8,0.2001391,0.7414836,1.5691561,1.7225188,0.8286853,0.7597102,0.9096774,0.8286953,0.0782396,0.3447433,56.9156138,72.2518762,0.3064356
9,0.3001589,0.6648409,1.455419,1.633515,0.7686197,0.7032327,0.8626737,0.7868883,0.1455708,0.4903141,45.5418988,63.3514966,0.4029641
10,0.3999801,0.5911217,1.2567136,1.5394784,0.6636816,0.6266095,0.8130122,0.7468883,0.1254467,0.6157608,25.6713611,53.9478372,0.4572678

Unnamed: 0,NO,YES,Error,Rate
NO,7109.0,9657.0,0.576,(9657.0/16766.0)
YES,2373.0,16112.0,0.1284,(2373.0/18485.0)
Total,9482.0,25769.0,0.3413,(12030.0/35251.0)

metric,threshold,value,idx
max f1,0.3610693,0.7281602,283.0
max f2,0.176969,0.8485025,372.0
max f0point5,0.5608147,0.7076517,178.0
max accuracy,0.4967571,0.6890585,210.0
max precision,0.976162,1.0,0.0
max recall,0.0730931,1.0,399.0
max specificity,0.976162,1.0,0.0
max absolute_mcc,0.5150185,0.3767557,201.0
max min_per_class_accuracy,0.5128185,0.6868663,202.0
max mean_per_class_accuracy,0.5150185,0.688582,201.0

group,cumulative_data_fraction,lower_threshold,lift,cumulative_lift,response_rate,score,cumulative_response_rate,cumulative_score,capture_rate,cumulative_capture_rate,gain,cumulative_gain,kolmogorov_smirnov
1,0.0100139,0.9231134,1.8043623,1.8043623,0.9461756,0.9439106,0.9461756,0.9439106,0.0180687,0.0180687,80.4362315,80.4362315,0.0169355
2,0.0200278,0.9064982,1.7449372,1.7746498,0.9150142,0.9141833,0.9305949,0.929047,0.0174736,0.0355423,74.4937209,77.4649762,0.0326198
3,0.0300133,0.8924418,1.7498944,1.7664136,0.9176136,0.8993509,0.926276,0.919167,0.0174736,0.053016,74.9894417,76.6413579,0.0483637
4,0.0400272,0.8803131,1.663903,1.7407678,0.8725212,0.886277,0.9128278,0.9109387,0.0166622,0.0696781,66.3902973,74.0767765,0.0623418
5,0.0500411,0.8682711,1.7017189,1.7329536,0.8923513,0.8740045,0.9087302,0.9035476,0.0170408,0.086719,70.171895,73.2953575,0.0771162
6,0.1000255,0.8191489,1.6440077,1.6885059,0.8620885,0.8429744,0.8854226,0.8732782,0.0821747,0.1688937,64.4007735,68.850588,0.1447973
7,0.1500099,0.7769741,1.5141322,1.6304033,0.7939841,0.797673,0.8549546,0.848086,0.075683,0.2445767,51.4132206,63.0403306,0.1988293
8,0.2000227,0.7386506,1.4862313,1.5943552,0.7793534,0.7570568,0.8360516,0.8253255,0.0743305,0.3189072,48.6231313,59.4355196,0.2499582
9,0.3000199,0.6642188,1.3346335,1.5077895,0.6998582,0.7020252,0.7906581,0.7842293,0.1334596,0.4523668,33.4633479,50.7789476,0.3203138
10,0.400017,0.5898867,1.188565,1.427989,0.6232624,0.6271415,0.7488121,0.7449601,0.1188531,0.5712199,18.8564959,42.7989006,0.359959

Unnamed: 0,mean,sd,cv_1_valid,cv_2_valid,cv_3_valid,cv_4_valid,cv_5_valid
accuracy,0.6575674,0.0068684,0.6624911,0.6596321,0.6607819,0.6454726,0.6594595
auc,0.7526929,0.0019961,0.7515209,0.7557032,0.7533578,0.7504595,0.7524232
err,0.3424326,0.0068684,0.3375089,0.3403679,0.3392182,0.3545274,0.3405405
err_count,2414.2,50.206573,2371.0,2387.0,2421.0,2498.0,2394.0
f0point5,0.6616153,0.0056471,0.6664169,0.6612577,0.6645367,0.6520711,0.6637942
f1,0.7289358,0.0028274,0.7298006,0.7285340,0.7332819,0.7259763,0.7270862
f2,0.8115892,0.0066969,0.8065085,0.8110504,0.8178914,0.8187756,0.8037199
lift_top_group,1.7834879,0.0593886,1.8491638,1.7314646,1.8473775,1.7448102,1.7446233
logloss,0.5885047,0.0017558,0.5893163,0.5863451,0.5876075,0.5909843,0.5882701
max_per_class_error,0.5855004,0.0267832,0.5643565,0.57674,0.5894706,0.6300388,0.5668961


In [None]:
print("Best Model Parameters: ", best_model.params)

Best Model Parameters:  {'model_id': {'default': None, 'actual': {'__meta': {'schema_version': 3, 'schema_name': 'ModelKeyV3', 'schema_type': 'Key<Model>'}, 'name': 'StackedEnsemble_AllModels_1_AutoML_1_20231125_233359', 'type': 'Key<Model>', 'URL': '/3/Models/StackedEnsemble_AllModels_1_AutoML_1_20231125_233359'}, 'input': None}, 'training_frame': {'default': None, 'actual': {'__meta': {'schema_version': 3, 'schema_name': 'FrameKeyV3', 'schema_type': 'Key<Frame>'}, 'name': 'AutoML_1_20231125_233359_training_py_5_sid_b6a1', 'type': 'Key<Frame>', 'URL': '/3/Frames/AutoML_1_20231125_233359_training_py_5_sid_b6a1'}, 'input': {'__meta': {'schema_version': 3, 'schema_name': 'FrameKeyV3', 'schema_type': 'Key<Frame>'}, 'name': 'AutoML_1_20231125_233359_training_py_5_sid_b6a1', 'type': 'Key<Frame>', 'URL': '/3/Frames/AutoML_1_20231125_233359_training_py_5_sid_b6a1'}}, 'response_column': {'default': None, 'actual': {'__meta': {'schema_version': 3, 'schema_name': 'ColSpecifierV3', 'schema_type':

**Answer:**
The best model is a Stacked Ensemble that uses the cross-validation stacking strategy and 11 of the 20 available base models: 3 GBM, 5 XGBoost, 2 DRF, and 1 DeepLearning model. The model achieved an AUC of 0.82 on the training data and 0.75 on the cross-validation data.
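
As a quick consistency check, the per-algorithm counts from the model summary can be added up in plain Python. The dictionary below is copied from that summary; the variable names are illustrative.

```python
# (used, total) per algorithm, copied from the model summary above.
base_models = {
    "GBM": (3, 7),
    "XGBoost": (5, 6),
    "DRF": (2, 2),
    "DeepLearning": (1, 4),
    "GLM": (0, 1),
}

used = sum(u for u, _ in base_models.values())
total = sum(t for _, t in base_models.values())
print(f"{used}/{total}")  # 11/20
```

The total of 20 base models matches `max_models=20`; together with the two Stacked Ensembles on the leaderboard this gives the 22 models mentioned in part (b).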





### (c)

In [None]:
# Display the AUC score of the best model for the test set
best_model_performance = best_model.model_performance(test)
print("AUC: ", best_model_performance.auc())

AUC:  0.7565548897901022


The previously mentioned best model, the Stacked Ensemble, achieves an AUC of 0.76 on the test data, which is somewhat higher than for the models from the previous two approaches, 'regular' grid search and randomized grid search (0.72 for both). We should still consider whether this increase of 0.04 is worth the significantly longer training time.

### (d)

In [None]:
# Identify the best XGBoost model
best_log_loss_model = automl.get_best_model(algorithm="xgboost", criterion="logloss")
performance = best_log_loss_model.model_performance(test)
best_log_loss = performance.logloss()
best_performance_auc = performance.auc()

# Display the best XGBoost model and its log loss
print("Best XGBoost Model: ", best_log_loss_model.model_id)
print("Best XGBoost Model Log Loss: ", best_log_loss)
print("Best XGBoost AUC: ", best_performance_auc)

Best XGBoost Model:  XGBoost_grid_1_AutoML_1_20231125_233359_model_2
Best XGBoost Model Log Loss:  0.5859663231369966
Best XGBoost AUC:  0.755329369507694


**Answer:**
Looking at the leaderboard from part 1.3 (b), the next best model after the two Stacked Ensemble models is an XGBoost model. On the test set it achieves an AUC of 0.7553, only slightly lower than the 0.7566 of the best Stacked Ensemble model. However, a single XGBoost model takes significantly less time to train than a Stacked Ensemble, because it does not require training many base models first. For this reason I would argue that a decrease of about 0.001 in AUC is worth it, as XGBoost is significantly faster (and cheaper) to train.
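
To put the four approaches side by side, the sketch below collects the test-set AUC values printed in the cells above and reports each model's gap to the best one. The dictionary keys are descriptive labels chosen for this sketch, not H2O model IDs.

```python
# Test-set AUC values copied from the outputs above.
test_auc = {
    "grid search DRF": 0.7185525921634769,
    "random search DRF": 0.7198049282752954,
    "AutoML stacked ensemble": 0.7565548897901022,
    "AutoML best XGBoost": 0.755329369507694,
}

best = max(test_auc, key=test_auc.get)
for name, value in sorted(test_auc.items(), key=lambda kv: kv[1], reverse=True):
    gap = test_auc[best] - value
    print(f"{name:26s} {value:.4f}  gap to best: {gap:.4f}")
```

The gap between the Stacked Ensemble and the best XGBoost model is about 0.0012, while both random forests trail the ensemble by roughly 0.037 to 0.038.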