The Boosters:


=> AdaBoost
=> GradientBoost
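=> XGBoost

All three algorithms are built on stagewise additive modelling: weak learners are added one at a time, and each new learner tries to correct the errors of the ensemble built so far.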

Some key points about each algorithm:

AdaBoost:
-> Combines multiple weak learners, typically decision stumps (decision trees with max_depth=1).
-> Each model is assigned a weight (α, alpha) based on its error rate. This weight decides the importance of that model in the final prediction.
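
To illustrate how a model's weight follows from its error rate, here is a minimal sketch using the classic binary AdaBoost formula α = ½·ln((1 − err) / err). The function name `stump_weight` is made up for this example, and library implementations may scale the weight differently (for instance by a learning rate).

```python
import math

def stump_weight(error_rate: float) -> float:
    """Weight (alpha) of one weak learner in classic binary AdaBoost."""
    eps = 1e-10  # clip to avoid log(0) or division by zero at error rates of 0 or 1
    error_rate = min(max(error_rate, eps), 1 - eps)
    return 0.5 * math.log((1 - error_rate) / error_rate)

# Lower error -> larger weight; an error rate of 0.5 (random guessing) gets weight 0
for err in (0.1, 0.3, 0.5):
    print(f"error={err:.1f} -> alpha={stump_weight(err):.3f}")
```

A stump that is worse than chance (error above 0.5) would receive a negative weight under this formula, which effectively flips its vote.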


GradientBoost:
-> The 1st prediction (model) is the mean of the target (for classification, the log-odds).
-> Pseudo-residuals (actual - pred) are calculated and passed as the target to the next model, which works on reducing this residual.
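
The two steps above can be sketched on a toy regression problem with squared loss. This is only an illustration under stated assumptions: the data (`X_toy`, `y_toy`), the tree depth, the learning rate and the number of stages are all invented for the example, and `GradientBoostingClassifier` works on log-odds rather than raw targets.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy data (hypothetical): a noisy sine curve
rng = np.random.default_rng(0)
X_toy = rng.uniform(0, 10, size=(200, 1))
y_toy = np.sin(X_toy[:, 0]) + rng.normal(scale=0.1, size=200)

learning_rate = 0.1
pred = np.full_like(y_toy, y_toy.mean())      # stage 0: predict the mean
mse_history = [np.mean((y_toy - pred) ** 2)]

for _ in range(50):
    residuals = y_toy - pred                  # pseudo-residuals for squared loss
    tree = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X_toy, residuals)
    pred = pred + learning_rate * tree.predict(X_toy)   # add the shrunken correction
    mse_history.append(np.mean((y_toy - pred) ** 2))

print(f"Training MSE at stage 0: {mse_history[0]:.4f}, after 50 trees: {mse_history[-1]:.4f}")
```

Each tree is trained on what the current ensemble still gets wrong, so the training error shrinks stage by stage.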


XGBoost:
-> Uses customized (regularized) DTs
-> Uses a similarity score to choose the splits when building the DTs
-> Handles missing values natively
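
As a rough sketch of the similarity score idea, the snippet below uses the simplified regression form, similarity = (sum of residuals)² / (number of residuals + λ), and the gain of a split as left + right − parent. The helper names and the residual values are invented for illustration; the real library works with gradients and Hessians of the chosen loss.

```python
import numpy as np

def similarity(residuals, reg_lambda=1.0):
    """Simplified similarity score of a node (squared-error regression case)."""
    residuals = np.asarray(residuals, dtype=float)
    return residuals.sum() ** 2 / (len(residuals) + reg_lambda)

def split_gain(left, right, reg_lambda=1.0):
    """Gain of splitting a parent node into `left` and `right` children."""
    parent = np.concatenate([left, right])
    return (similarity(left, reg_lambda)
            + similarity(right, reg_lambda)
            - similarity(parent, reg_lambda))

# Hypothetical residuals falling on each side of a candidate split
left = np.array([-10.5, -7.5])
right = np.array([6.5, 7.5])

gain = split_gain(left, right, reg_lambda=1.0)
gamma = 20  # a split is kept only if its gain exceeds gamma
print(f"gain={gain:.3f}, keep split: {gain > gamma}")
```

Larger `reg_lambda` shrinks every similarity score, and larger `gamma` prunes more splits, which is why both appear among the tuned hyperparameters later in the notebook.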

Dataset used:
Heart attack prediction
https://www.kaggle.com/datasets/rashikrahmanpritom/heart-attack-analysis-prediction-dataset


In [1]:
!mkdir -p ~/.kaggle
!cp kaggle.json ~/.kaggle/

In [2]:
!kaggle datasets download -d rashikrahmanpritom/heart-attack-analysis-prediction-dataset

Downloading heart-attack-analysis-prediction-dataset.zip to /content
  0% 0.00/4.11k [00:00<?, ?B/s]
100% 4.11k/4.11k [00:00<00:00, 3.60MB/s]


In [3]:
import zipfile

# Extract the downloaded archive; the context manager closes the file automatically
with zipfile.ZipFile('/content/heart-attack-analysis-prediction-dataset.zip', 'r') as zip_ref:
    zip_ref.extractall('/content')

In [4]:
import pandas as pd
import numpy as np

from sklearn.ensemble import AdaBoostClassifier
from sklearn.ensemble import GradientBoostingClassifier
from xgboost import XGBClassifier

In [5]:
df = pd.read_csv('/content/heart.csv')

In [6]:
df.head()

Unnamed: 0,age,sex,cp,trtbps,chol,fbs,restecg,thalachh,exng,oldpeak,slp,caa,thall,output
0,63,1,3,145,233,1,0,150,0,2.3,0,0,1,1
1,37,1,2,130,250,0,1,187,0,3.5,0,0,2,1
2,41,0,1,130,204,0,0,172,0,1.4,2,0,2,1
3,56,1,1,120,236,0,1,178,0,0.8,2,0,2,1
4,57,0,0,120,354,0,1,163,1,0.6,2,0,2,1


In [7]:
df.shape

(303, 14)

In [6]:
X = df.drop("output",axis=1)
y = df['output']

In [9]:
abc = AdaBoostClassifier()
gbc = GradientBoostingClassifier()
xgb = XGBClassifier() 

In [10]:
from sklearn.model_selection import cross_val_score, GridSearchCV,RepeatedStratifiedKFold


In [11]:
print("AdaBoostClassifier", np.mean(cross_val_score(abc,X, y, cv=10)))
print("GradientBoostingClassifier", np.mean(cross_val_score(gbc,X, y, cv=10)))
print("XGBClassifier", np.mean(cross_val_score(xgb,X, y, cv=10)))

AdaBoostClassifier 0.8112903225806452
GradientBoostingClassifier 0.801505376344086
XGBClassifier 0.8016129032258064


In [7]:
abc = AdaBoostClassifier()
gbc = GradientBoostingClassifier()
xgb = XGBClassifier() 

In [23]:
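abc_tuners = {
    'n_estimators':[10,50,100,200],
    'learning_rate':[0.001,0.01,0.1],
    # Note: 'SAMME.R' is deprecated in recent scikit-learn releases; keep only 'SAMME' there
    'algorithm': ['SAMME', 'SAMME.R']
}

gbc_tuners = {
    'learning_rate':[0.001,0.01,0.1],
    'n_estimators':[10,50,100,200],
    'max_leaf_nodes': [8,16,32],
    'subsample':[0.6,1],
    # Note: 'auto' is no longer accepted by recent scikit-learn releases; use 'sqrt' or None there
    'max_features':['auto',None]
}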

xgb_tuners={
    'n_estimators':[10,50,100,200],
    'eta':[0.1,0.3,0.5],
    'subsample':[0.6,1],
    'max_depth': [5, 10, 20],
    'gamma':[20,50],
    'reg_lambda':[1,5,10]
}
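
Before launching the searches it can help to know how many models each grid implies. The sketch below counts parameter combinations with scikit-learn's `ParameterGrid` and multiplies by the 10-fold, 3-repeat cross-validation used in this notebook. The helper `count_fits` and the `grids` dictionary are made up for this example; the value lists are copied from the grids above.

```python
from sklearn.model_selection import ParameterGrid

def count_fits(param_grid, n_splits=10, n_repeats=3):
    """Return (number of parameter combinations, cross-validation fits)."""
    n_candidates = len(ParameterGrid(param_grid))
    return n_candidates, n_candidates * n_splits * n_repeats

# Only the list lengths matter for counting
grids = {
    "AdaBoost": {"n_estimators": [10, 50, 100, 200],
                 "learning_rate": [0.001, 0.01, 0.1],
                 "algorithm": ["SAMME", "SAMME.R"]},
    "XGBoost": {"n_estimators": [10, 50, 100, 200],
                "eta": [0.1, 0.3, 0.5],
                "subsample": [0.6, 1],
                "max_depth": [5, 10, 20],
                "gamma": [20, 50],
                "reg_lambda": [1, 5, 10]},
}

for name, grid in grids.items():
    n_candidates, n_fits = count_fits(grid)
    print(f"{name}: {n_candidates} combinations, {n_fits} cross-validation fits")
```

On a dataset of 303 rows this is still manageable, but the count grows multiplicatively with every extra hyperparameter.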

In [None]:
def model_tuner():
  pass

In [11]:
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)

grid_search = GridSearchCV(estimator=abc, param_grid=abc_tuners, n_jobs=-1, cv=cv, scoring='accuracy')
grid_result = grid_search.fit(X, y)
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))

Best: 0.846308 using {'algorithm': 'SAMME', 'learning_rate': 0.1, 'n_estimators': 100}


In [13]:
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)

grid_search_gbc = GridSearchCV(estimator=gbc, param_grid=gbc_tuners, n_jobs=-1, cv=cv, scoring='accuracy')
grid_result_gbc = grid_search_gbc.fit(X, y)
print("Best: %f using %s" % (grid_result_gbc.best_score_, grid_result_gbc.best_params_))

Best: 0.823082 using {'learning_rate': 0.1, 'max_features': None, 'max_leaf_nodes': 32, 'n_estimators': 50, 'subsample': 0.6}


In [14]:
grid_search_xgb = GridSearchCV(estimator=xgb, param_grid=xgb_tuners, n_jobs=-1, cv=cv, scoring='accuracy')
grid_result_xgb = grid_search_xgb.fit(X, y)
print("Best: %f using %s" % (grid_result_xgb.best_score_, grid_result_xgb.best_params_))

Best: 0.827634 using {'eta': 0.1, 'gamma': 20, 'max_depth': 5, 'n_estimators': 50, 'reg_lambda': 10, 'subsample': 1}


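Let's check whether data preprocessing (feature standardization) helps the algorithms. Tree-based models split on thresholds, so scaling is not expected to change the results much.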

In [15]:
from sklearn.preprocessing import StandardScaler

ss = StandardScaler()
X_scaled = ss.fit_transform(X)
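
One caveat with scaling up front: fitting `StandardScaler` on all rows before cross-validation lets every fold see statistics computed from its own held-out rows. A possible alternative, sketched here on synthetic data (`X_demo`, `y_demo` are stand-ins, not the heart dataset), is to put the scaler in a pipeline so it is refit inside each training fold.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in with the same number of features as the heart dataset
X_demo, y_demo = make_classification(n_samples=300, n_features=13, random_state=0)

# Baseline: no scaling at all
raw_scores = cross_val_score(AdaBoostClassifier(random_state=0), X_demo, y_demo, cv=5)

# Scaler and model combined: the scaler is fit on the training part of each fold only
pipe = make_pipeline(StandardScaler(), AdaBoostClassifier(random_state=0))
scaled_scores = cross_val_score(pipe, X_demo, y_demo, cv=5)

print(f"raw: {raw_scores.mean():.3f}  scaled-in-pipeline: {scaled_scores.mean():.3f}")
```

If such a pipeline is passed to `GridSearchCV`, the grid keys need the step prefix, for example `adaboostclassifier__n_estimators`.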


In [19]:
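def model_tuner(model, tuning_params, X, y):
  """Grid-search `model` over `tuning_params`; return (best CV accuracy, best params)."""
  # 10-fold stratified CV repeated 3 times (30 fits per parameter combination)
  cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)

  grid_search = GridSearchCV(estimator=model, param_grid=tuning_params, n_jobs=-1, cv=cv, scoring='accuracy')
  grid_result = grid_search.fit(X, y)
  return grid_result.best_score_, grid_result.best_params_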



In [25]:
abc = AdaBoostClassifier()
gbc = GradientBoostingClassifier()
xgb = XGBClassifier() 

In [26]:
best_score_, best_params_ = model_tuner(abc, abc_tuners, X_scaled, y)
print("Best: %f using %s" % (best_score_, best_params_))

Best: 0.848459 using {'algorithm': 'SAMME', 'learning_rate': 0.1, 'n_estimators': 200}


In [27]:
best_score_, best_params_ = model_tuner(xgb, xgb_tuners, X_scaled, y)
print("Best: %f using %s" % (best_score_, best_params_))

Best: 0.827634 using {'eta': 0.1, 'gamma': 20, 'max_depth': 5, 'n_estimators': 50, 'reg_lambda': 10, 'subsample': 1}


In [28]:
best_score_, best_params_ = model_tuner(gbc, gbc_tuners, X_scaled, y)
print("Best: %f using %s" % (best_score_, best_params_))

Best: 0.824122 using {'learning_rate': 0.1, 'max_features': 'auto', 'max_leaf_nodes': 16, 'n_estimators': 50, 'subsample': 0.6}


Let's check the best models using AutoML (H2O)

In [30]:
!pip install h2o

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting h2o
  Downloading h2o-3.38.0.2.tar.gz (177.4 MB)
Building wheels for collected packages: h2o
  Building wheel for h2o (setup.py) ... done
  Created wheel for h2o: filename=h2o-3.38.0.2-py2.py3-none-any.whl size=177521195 sha256=ba075b490b90159da9de0d3fbdadfdd11178b26e9eec8f044804ed8c8eb82948
  Stored in directory: /root/.cache/pip/wheels/e4/ef/ab/a9b2e452e18b3dfea0b6114bc57c3b9e8b0e464eb2d03230e1
Successfully built h2o
Installing collected packages: h2o
Successfully installed h2o-3.38.0.2


In [31]:
import h2o
from h2o.automl import H2OAutoML

In [32]:
#Initializing h2o cluster
h2o.init()

Checking whether there is an H2O instance running at http://localhost:54321 ..... not found.
Attempting to start a local H2O server...
  Java Version: openjdk version "11.0.16" 2022-07-19; OpenJDK Runtime Environment (build 11.0.16+8-post-Ubuntu-0ubuntu118.04); OpenJDK 64-Bit Server VM (build 11.0.16+8-post-Ubuntu-0ubuntu118.04, mixed mode, sharing)
  Starting server from /usr/local/lib/python3.7/dist-packages/h2o/backend/bin/h2o.jar
  Ice root: /tmp/tmpw822k3dw
  JVM stdout: /tmp/tmpw822k3dw/h2o_unknownUser_started_from_python.out
  JVM stderr: /tmp/tmpw822k3dw/h2o_unknownUser_started_from_python.err
  Server is running at http://127.0.0.1:54321
Connecting to H2O server at http://127.0.0.1:54321 ... successful.


H2O_cluster_uptime:,03 secs
H2O_cluster_timezone:,Etc/UTC
H2O_data_parsing_timezone:,UTC
H2O_cluster_version:,3.38.0.2
H2O_cluster_version_age:,6 days
H2O_cluster_name:,H2O_from_python_unknownUser_buca13
H2O_cluster_total_nodes:,1
H2O_cluster_free_memory:,3.172 Gb
H2O_cluster_total_cores:,2
H2O_cluster_allowed_cores:,2


In [41]:
from sklearn.model_selection import train_test_split

In [42]:
X_train, X_test, y_train, y_test = train_test_split(df.drop('output',axis=1),df['output'],test_size=0.2,random_state=22 )

In [48]:
type(X_train),type(y_train)
y_frame=pd.DataFrame(y_train)
type(X_train),type(y_frame)

(pandas.core.frame.DataFrame, pandas.core.frame.DataFrame)

In [58]:
train_df = pd.concat([X_train,y_train],axis=1)
test_df = pd.concat([X_test,y_test],axis=1)


In [59]:
h2o_train_df = h2o.H2OFrame(train_df)
h2o_test_df = h2o.H2OFrame(test_df)

Parse progress: |████████████████████████████████████████████████████████████████| (done) 100%
Parse progress: |████████████████████████████████████████████████████████████████| (done) 100%


In [60]:
h2o_train_df['output'] = h2o_train_df['output'].asfactor()
h2o_test_df['output']=  h2o_test_df['output'].asfactor()

In [61]:
aml = H2OAutoML(max_models = 5, seed = 10, exclude_algos = ["StackedEnsemble"], verbosity="info", nfolds=0, balance_classes=True, max_after_balance_size=0.3)


In [64]:
x=list(X_train.columns)
y='output'

In [65]:
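# Note: the test split doubles as the validation (and leaderboard) frame here, so the reported metrics are likely optimistic
aml.train(x = x, y = y, training_frame = h2o_train_df, validation_frame=h2o_test_df)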


AutoML progress: |
14:29:11.868: Project: AutoML_1_20221103_142911
14:29:11.870: Cross-validation disabled by user: no fold column nor nfolds > 1.
14:29:11.872: Setting stopping tolerance adaptively based on the training frame: 0.05
14:29:11.872: Build control seed: 10
14:29:11.873: training frame: Frame key: AutoML_1_20221103_142911_training_py_2_sid_968f    cols: 14    rows: 242  chunks: 1    size: 4314  checksum: 1205836456694626329
14:29:11.884: validation frame: Frame key: py_3_sid_968f    cols: 14    rows: 61  chunks: 1    size: 2231  checksum: 2145970696736566986
14:29:11.884: leaderboard frame: Frame key: py_3_sid_968f    cols: 14    rows: 61  chunks: 1    size: 2231  checksum: 2145970696736566986
14:29:11.884: blending frame: NULL
14:29:11.884: response column: output
14:29:11.884: fold column: null
14:29:11.884: weights column: null
14:29:11.926: Loading execution steps: [{XGBoost : [def_2 (1g, 10w), def_1 (2g, 10w), def_3 (3g, 10w), grid_1 (4g, 90w), lr_search (7g, 30w)]}, {

Unnamed: 0,number_of_trees
,35.0

Unnamed: 0,0,1,Error,Rate
0,89.0,22.0,0.1982,(22.0/111.0)
1,4.0,127.0,0.0305,(4.0/131.0)
Total,93.0,149.0,0.1074,(26.0/242.0)

metric,threshold,value,idx
max f1,0.4130957,0.9071429,145.0
max f2,0.3825178,0.9453471,149.0
max f0point5,0.634675,0.9137056,111.0
max accuracy,0.4130957,0.892562,145.0
max precision,0.9917115,1.0,0.0
max recall,0.0661617,1.0,188.0
max specificity,0.9917115,1.0,0.0
max absolute_mcc,0.4130957,0.7900736,145.0
max min_per_class_accuracy,0.5533786,0.8828829,125.0
max mean_per_class_accuracy,0.5655742,0.8893817,122.0

group,cumulative_data_fraction,lower_threshold,lift,cumulative_lift,response_rate,score,cumulative_response_rate,cumulative_score,capture_rate,cumulative_capture_rate,gain,cumulative_gain,kolmogorov_smirnov
1,0.0123967,0.9883184,1.8473282,1.8473282,1.0,0.9907518,1.0,0.9907518,0.0229008,0.0229008,84.7328244,84.7328244,0.0229008
2,0.0206612,0.9872629,1.8473282,1.8473282,1.0,0.9875282,1.0,0.9894624,0.0152672,0.0381679,84.7328244,84.7328244,0.0381679
3,0.0330579,0.9844326,1.8473282,1.8473282,1.0,0.9864781,1.0,0.9883433,0.0229008,0.0610687,84.7328244,84.7328244,0.0610687
4,0.0413223,0.9807498,1.8473282,1.8473282,1.0,0.9820315,1.0,0.9870809,0.0152672,0.0763359,84.7328244,84.7328244,0.0763359
5,0.053719,0.9795764,1.8473282,1.8473282,1.0,0.9798495,1.0,0.9854121,0.0229008,0.0992366,84.7328244,84.7328244,0.0992366
6,0.1033058,0.9714745,1.8473282,1.8473282,1.0,0.9749477,1.0,0.9803892,0.0916031,0.1908397,84.7328244,84.7328244,0.1908397
7,0.1528926,0.957519,1.6933842,1.7974005,0.9166667,0.9649424,0.972973,0.9753794,0.0839695,0.2748092,69.3384224,79.7400454,0.2658002
8,0.2024793,0.9272907,1.8473282,1.8096277,1.0,0.9419918,0.9795918,0.9672028,0.0916031,0.3664122,84.7328244,80.9627668,0.3574032
9,0.3016529,0.8479324,1.7703562,1.7967165,0.9583333,0.8868186,0.9726027,0.9407751,0.1755725,0.5419847,77.0356234,79.6716512,0.5239667
10,0.4008264,0.7201695,1.6164122,1.7521051,0.875,0.7956696,0.9484536,0.9048727,0.1603053,0.7022901,61.6412214,75.2105139,0.657245

Unnamed: 0,0,1,Error,Rate
0,21.0,6.0,0.2222,(6.0/27.0)
1,5.0,29.0,0.1471,(5.0/34.0)
Total,26.0,35.0,0.1803,(11.0/61.0)

metric,threshold,value,idx
max f1,0.5257733,0.8405797,34.0
max f2,0.1316284,0.923913,47.0
max f0point5,0.7048687,0.8561644,27.0
max accuracy,0.5538145,0.8196721,32.0
max precision,0.9868347,1.0,0.0
max recall,0.1316284,1.0,47.0
max specificity,0.9868347,1.0,0.0
max absolute_mcc,0.5538145,0.6362683,32.0
max min_per_class_accuracy,0.5538145,0.8148148,32.0
max mean_per_class_accuracy,0.5538145,0.8191721,32.0

group,cumulative_data_fraction,lower_threshold,lift,cumulative_lift,response_rate,score,cumulative_response_rate,cumulative_score,capture_rate,cumulative_capture_rate,gain,cumulative_gain,kolmogorov_smirnov
1,0.0163934,0.9863723,1.7941176,1.7941176,1.0,0.9868347,1.0,0.9868347,0.0294118,0.0294118,79.4117647,79.4117647,0.0294118
2,0.0327869,0.9841105,1.7941176,1.7941176,1.0,0.9860641,1.0,0.9864494,0.0294118,0.0588235,79.4117647,79.4117647,0.0588235
3,0.0327869,0.9782498,0.0,1.7941176,0.0,0.0,1.0,0.9864494,0.0,0.0588235,-100.0,79.4117647,0.0588235
4,0.0491803,0.9753241,1.7941176,1.7941176,1.0,0.9762962,1.0,0.983065,0.0294118,0.0882353,79.4117647,79.4117647,0.0882353
5,0.0655738,0.973866,1.7941176,1.7941176,1.0,0.973866,1.0,0.9807653,0.0294118,0.1176471,79.4117647,79.4117647,0.1176471
6,0.1147541,0.9626427,1.7941176,1.7941176,1.0,0.9658952,1.0,0.9743924,0.0882353,0.2058824,79.4117647,79.4117647,0.2058824
7,0.1639344,0.947624,1.7941176,1.7941176,1.0,0.9517198,1.0,0.9675906,0.0882353,0.2941176,79.4117647,79.4117647,0.2941176
8,0.2131148,0.887206,1.7941176,1.7941176,1.0,0.9044521,1.0,0.9530202,0.0882353,0.3823529,79.4117647,79.4117647,0.3823529
9,0.3114754,0.8109767,1.495098,1.6996904,0.8333333,0.8417008,0.9473684,0.9178667,0.1470588,0.5294118,49.5098039,69.9690402,0.4923747
10,0.4098361,0.7324206,1.1960784,1.5788235,0.6666667,0.7694858,0.88,0.8822553,0.1176471,0.6470588,19.6078431,57.8823529,0.5359477

Unnamed: 0,timestamp,duration,number_of_trees,training_rmse,training_logloss,training_auc,training_pr_auc,training_lift,training_classification_error,validation_rmse,validation_logloss,validation_auc,validation_pr_auc,validation_lift,validation_classification_error
,2022-11-03 14:29:14,0.006 sec,0.0,0.5,0.6931472,0.5,0.5413223,1.0,0.4586777,0.5,0.6931472,0.5,0.557377,1.0,0.442623
,2022-11-03 14:29:14,0.039 sec,5.0,0.3481772,0.4057613,0.9355959,0.9368337,1.8473282,0.1198347,0.3865748,0.4660341,0.8839869,0.9116546,1.7941176,0.2131148
,2022-11-03 14:29:14,0.076 sec,10.0,0.312216,0.3303244,0.9490406,0.9531206,1.8473282,0.1115702,0.3690259,0.4186221,0.8910675,0.9190842,1.7941176,0.1967213
,2022-11-03 14:29:14,0.137 sec,15.0,0.3066,0.3117292,0.9482842,0.9526932,1.8473282,0.1198347,0.3673681,0.4099944,0.8959695,0.9161245,1.7941176,0.1803279
,2022-11-03 14:29:14,0.179 sec,20.0,0.3024315,0.3032442,0.9486968,0.9513245,1.8473282,0.1115702,0.3660201,0.4076762,0.8932462,0.9195245,1.7941176,0.1639344
,2022-11-03 14:29:14,0.246 sec,25.0,0.2985549,0.2955313,0.9475965,0.9490853,1.8473282,0.107438,0.3624162,0.3922638,0.8997821,0.9270086,1.7941176,0.1639344
,2022-11-03 14:29:14,0.305 sec,30.0,0.2942441,0.2893842,0.9517227,0.9535362,1.8473282,0.107438,0.3612231,0.3932875,0.9041394,0.9286637,1.7941176,0.1639344
,2022-11-03 14:29:14,0.359 sec,35.0,0.2924203,0.2864209,0.9521353,0.954376,1.8473282,0.107438,0.3680348,0.4050875,0.8943355,0.9212871,1.7941176,0.1803279

variable,relative_importance,scaled_importance,percentage
cp,74.8160248,1.0,0.2675664
caa,58.9933167,0.7885118,0.2109792
thall,35.7895927,0.478368,0.1279952
oldpeak,30.5465527,0.4082889,0.1092444
age,26.000042,0.3475197,0.0929846
thalachh,16.8727722,0.2255235,0.0603425
trtbps,9.9449205,0.132925,0.0355663
sex,8.3162766,0.1111564,0.0297417
restecg,6.4270611,0.0859049,0.0229853
exng,5.5999351,0.0748494,0.0200272


In [66]:
aml.leaderboard

model_id,auc,logloss,aucpr,mean_per_class_error,rmse,mse
XGBoost_2_AutoML_1_20221103_142911,0.894336,0.405088,0.921287,0.184641,0.368035,0.13545
GLM_1_AutoML_1_20221103_142911,0.889978,0.42894,0.908451,0.192266,0.371694,0.138156
DRF_1_AutoML_1_20221103_142911,0.870915,0.451553,0.895501,0.17756,0.386607,0.149465
XGBoost_1_AutoML_1_20221103_142911,0.868736,0.44982,0.895406,0.199891,0.388207,0.150704
GBM_1_AutoML_1_20221103_142911,0.5,0.687068,0.557377,0.5,0.496956,0.246966


In [67]:
pred = aml.leader.predict(h2o_test_df)

xgboost prediction progress: |███████████████████████████████████████████████████| (done) 100%


In [68]:
aml.leader.model_performance(h2o_test_df)

Unnamed: 0,0,1,Error,Rate
0,21.0,6.0,0.2222,(6.0/27.0)
1,5.0,29.0,0.1471,(5.0/34.0)
Total,26.0,35.0,0.1803,(11.0/61.0)

metric,threshold,value,idx
max f1,0.5257733,0.8405797,34.0
max f2,0.1316284,0.923913,47.0
max f0point5,0.7048687,0.8561644,27.0
max accuracy,0.5538145,0.8196721,32.0
max precision,0.9868347,1.0,0.0
max recall,0.1316284,1.0,47.0
max specificity,0.9868347,1.0,0.0
max absolute_mcc,0.5538145,0.6362683,32.0
max min_per_class_accuracy,0.5538145,0.8148148,32.0
max mean_per_class_accuracy,0.5538145,0.8191721,32.0

group,cumulative_data_fraction,lower_threshold,lift,cumulative_lift,response_rate,score,cumulative_response_rate,cumulative_score,capture_rate,cumulative_capture_rate,gain,cumulative_gain,kolmogorov_smirnov
1,0.0163934,0.9863723,1.7941176,1.7941176,1.0,0.9868347,1.0,0.9868347,0.0294118,0.0294118,79.4117647,79.4117647,0.0294118
2,0.0327869,0.9841105,1.7941176,1.7941176,1.0,0.9860641,1.0,0.9864494,0.0294118,0.0588235,79.4117647,79.4117647,0.0588235
3,0.0327869,0.9782498,0.0,1.7941176,0.0,0.0,1.0,0.9864494,0.0,0.0588235,-100.0,79.4117647,0.0588235
4,0.0491803,0.9753241,1.7941176,1.7941176,1.0,0.9762962,1.0,0.983065,0.0294118,0.0882353,79.4117647,79.4117647,0.0882353
5,0.0655738,0.973866,1.7941176,1.7941176,1.0,0.973866,1.0,0.9807653,0.0294118,0.1176471,79.4117647,79.4117647,0.1176471
6,0.1147541,0.9626427,1.7941176,1.7941176,1.0,0.9658952,1.0,0.9743924,0.0882353,0.2058824,79.4117647,79.4117647,0.2058824
7,0.1639344,0.947624,1.7941176,1.7941176,1.0,0.9517198,1.0,0.9675906,0.0882353,0.2941176,79.4117647,79.4117647,0.2941176
8,0.2131148,0.887206,1.7941176,1.7941176,1.0,0.9044521,1.0,0.9530202,0.0882353,0.3823529,79.4117647,79.4117647,0.3823529
9,0.3114754,0.8109767,1.495098,1.6996904,0.8333333,0.8417008,0.9473684,0.9178667,0.1470588,0.5294118,49.5098039,69.9690402,0.4923747
10,0.4098361,0.7324206,1.1960784,1.5788235,0.6666667,0.7694858,0.88,0.8822553,0.1176471,0.6470588,19.6078431,57.8823529,0.5359477


In [71]:
model_ids = list(aml.leaderboard['model_id'].as_data_frame().iloc[:,0])
model_ids

['XGBoost_2_AutoML_1_20221103_142911',
 'GLM_1_AutoML_1_20221103_142911',
 'DRF_1_AutoML_1_20221103_142911',
 'XGBoost_1_AutoML_1_20221103_142911',
 'GBM_1_AutoML_1_20221103_142911']

In [73]:
out = h2o.get_model([mid for mid in model_ids if "XGBoost" in mid][0])
out


Unnamed: 0,number_of_trees
,35.0

Unnamed: 0,0,1,Error,Rate
0,89.0,22.0,0.1982,(22.0/111.0)
1,4.0,127.0,0.0305,(4.0/131.0)
Total,93.0,149.0,0.1074,(26.0/242.0)

metric,threshold,value,idx
max f1,0.4130957,0.9071429,145.0
max f2,0.3825178,0.9453471,149.0
max f0point5,0.634675,0.9137056,111.0
max accuracy,0.4130957,0.892562,145.0
max precision,0.9917115,1.0,0.0
max recall,0.0661617,1.0,188.0
max specificity,0.9917115,1.0,0.0
max absolute_mcc,0.4130957,0.7900736,145.0
max min_per_class_accuracy,0.5533786,0.8828829,125.0
max mean_per_class_accuracy,0.5655742,0.8893817,122.0

group,cumulative_data_fraction,lower_threshold,lift,cumulative_lift,response_rate,score,cumulative_response_rate,cumulative_score,capture_rate,cumulative_capture_rate,gain,cumulative_gain,kolmogorov_smirnov
1,0.0123967,0.9883184,1.8473282,1.8473282,1.0,0.9907518,1.0,0.9907518,0.0229008,0.0229008,84.7328244,84.7328244,0.0229008
2,0.0206612,0.9872629,1.8473282,1.8473282,1.0,0.9875282,1.0,0.9894624,0.0152672,0.0381679,84.7328244,84.7328244,0.0381679
3,0.0330579,0.9844326,1.8473282,1.8473282,1.0,0.9864781,1.0,0.9883433,0.0229008,0.0610687,84.7328244,84.7328244,0.0610687
4,0.0413223,0.9807498,1.8473282,1.8473282,1.0,0.9820315,1.0,0.9870809,0.0152672,0.0763359,84.7328244,84.7328244,0.0763359
5,0.053719,0.9795764,1.8473282,1.8473282,1.0,0.9798495,1.0,0.9854121,0.0229008,0.0992366,84.7328244,84.7328244,0.0992366
6,0.1033058,0.9714745,1.8473282,1.8473282,1.0,0.9749477,1.0,0.9803892,0.0916031,0.1908397,84.7328244,84.7328244,0.1908397
7,0.1528926,0.957519,1.6933842,1.7974005,0.9166667,0.9649424,0.972973,0.9753794,0.0839695,0.2748092,69.3384224,79.7400454,0.2658002
8,0.2024793,0.9272907,1.8473282,1.8096277,1.0,0.9419918,0.9795918,0.9672028,0.0916031,0.3664122,84.7328244,80.9627668,0.3574032
9,0.3016529,0.8479324,1.7703562,1.7967165,0.9583333,0.8868186,0.9726027,0.9407751,0.1755725,0.5419847,77.0356234,79.6716512,0.5239667
10,0.4008264,0.7201695,1.6164122,1.7521051,0.875,0.7956696,0.9484536,0.9048727,0.1603053,0.7022901,61.6412214,75.2105139,0.657245

Unnamed: 0,0,1,Error,Rate
0,21.0,6.0,0.2222,(6.0/27.0)
1,5.0,29.0,0.1471,(5.0/34.0)
Total,26.0,35.0,0.1803,(11.0/61.0)

metric,threshold,value,idx
max f1,0.5257733,0.8405797,34.0
max f2,0.1316284,0.923913,47.0
max f0point5,0.7048687,0.8561644,27.0
max accuracy,0.5538145,0.8196721,32.0
max precision,0.9868347,1.0,0.0
max recall,0.1316284,1.0,47.0
max specificity,0.9868347,1.0,0.0
max absolute_mcc,0.5538145,0.6362683,32.0
max min_per_class_accuracy,0.5538145,0.8148148,32.0
max mean_per_class_accuracy,0.5538145,0.8191721,32.0

group,cumulative_data_fraction,lower_threshold,lift,cumulative_lift,response_rate,score,cumulative_response_rate,cumulative_score,capture_rate,cumulative_capture_rate,gain,cumulative_gain,kolmogorov_smirnov
1,0.0163934,0.9863723,1.7941176,1.7941176,1.0,0.9868347,1.0,0.9868347,0.0294118,0.0294118,79.4117647,79.4117647,0.0294118
2,0.0327869,0.9841105,1.7941176,1.7941176,1.0,0.9860641,1.0,0.9864494,0.0294118,0.0588235,79.4117647,79.4117647,0.0588235
3,0.0327869,0.9782498,0.0,1.7941176,0.0,0.0,1.0,0.9864494,0.0,0.0588235,-100.0,79.4117647,0.0588235
4,0.0491803,0.9753241,1.7941176,1.7941176,1.0,0.9762962,1.0,0.983065,0.0294118,0.0882353,79.4117647,79.4117647,0.0882353
5,0.0655738,0.973866,1.7941176,1.7941176,1.0,0.973866,1.0,0.9807653,0.0294118,0.1176471,79.4117647,79.4117647,0.1176471
6,0.1147541,0.9626427,1.7941176,1.7941176,1.0,0.9658952,1.0,0.9743924,0.0882353,0.2058824,79.4117647,79.4117647,0.2058824
7,0.1639344,0.947624,1.7941176,1.7941176,1.0,0.9517198,1.0,0.9675906,0.0882353,0.2941176,79.4117647,79.4117647,0.2941176
8,0.2131148,0.887206,1.7941176,1.7941176,1.0,0.9044521,1.0,0.9530202,0.0882353,0.3823529,79.4117647,79.4117647,0.3823529
9,0.3114754,0.8109767,1.495098,1.6996904,0.8333333,0.8417008,0.9473684,0.9178667,0.1470588,0.5294118,49.5098039,69.9690402,0.4923747
10,0.4098361,0.7324206,1.1960784,1.5788235,0.6666667,0.7694858,0.88,0.8822553,0.1176471,0.6470588,19.6078431,57.8823529,0.5359477

Unnamed: 0,timestamp,duration,number_of_trees,training_rmse,training_logloss,training_auc,training_pr_auc,training_lift,training_classification_error,validation_rmse,validation_logloss,validation_auc,validation_pr_auc,validation_lift,validation_classification_error
,2022-11-03 14:29:14,0.006 sec,0.0,0.5,0.6931472,0.5,0.5413223,1.0,0.4586777,0.5,0.6931472,0.5,0.557377,1.0,0.442623
,2022-11-03 14:29:14,0.039 sec,5.0,0.3481772,0.4057613,0.9355959,0.9368337,1.8473282,0.1198347,0.3865748,0.4660341,0.8839869,0.9116546,1.7941176,0.2131148
,2022-11-03 14:29:14,0.076 sec,10.0,0.312216,0.3303244,0.9490406,0.9531206,1.8473282,0.1115702,0.3690259,0.4186221,0.8910675,0.9190842,1.7941176,0.1967213
,2022-11-03 14:29:14,0.137 sec,15.0,0.3066,0.3117292,0.9482842,0.9526932,1.8473282,0.1198347,0.3673681,0.4099944,0.8959695,0.9161245,1.7941176,0.1803279
,2022-11-03 14:29:14,0.179 sec,20.0,0.3024315,0.3032442,0.9486968,0.9513245,1.8473282,0.1115702,0.3660201,0.4076762,0.8932462,0.9195245,1.7941176,0.1639344
,2022-11-03 14:29:14,0.246 sec,25.0,0.2985549,0.2955313,0.9475965,0.9490853,1.8473282,0.107438,0.3624162,0.3922638,0.8997821,0.9270086,1.7941176,0.1639344
,2022-11-03 14:29:14,0.305 sec,30.0,0.2942441,0.2893842,0.9517227,0.9535362,1.8473282,0.107438,0.3612231,0.3932875,0.9041394,0.9286637,1.7941176,0.1639344
,2022-11-03 14:29:14,0.359 sec,35.0,0.2924203,0.2864209,0.9521353,0.954376,1.8473282,0.107438,0.3680348,0.4050875,0.8943355,0.9212871,1.7941176,0.1803279

variable,relative_importance,scaled_importance,percentage
cp,74.8160248,1.0,0.2675664
caa,58.9933167,0.7885118,0.2109792
thall,35.7895927,0.478368,0.1279952
oldpeak,30.5465527,0.4082889,0.1092444
age,26.000042,0.3475197,0.0929846
thalachh,16.8727722,0.2255235,0.0603425
trtbps,9.9449205,0.132925,0.0355663
sex,8.3162766,0.1111564,0.0297417
restecg,6.4270611,0.0859049,0.0229853
exng,5.5999351,0.0748494,0.0200272


In [74]:
out.convert_H2OXGBoostParams_2_XGBoostParams()

({'silent': True,
  'eta': 0.3,
  'scale_pos_weight': 0.84732825,
  'objective': 'binary:logistic',
  'colsample_bylevel': 0.8,
  'nthread': 2,
  'seed': 13,
  'min_child_weight': 5.0,
  'max_depth': 10,
  'colsample_bytree': 0.8,
  'lambda': 1.0,
  'gamma': 0.0,
  'alpha': 0.0,
  'booster': 'gbtree',
  'grow_policy': 'depthwise',
  'nround': 10000,
  'subsample': 0.6,
  'max_delta_step': 0.0,
  'tree_method': 'exact'},
 10000)

In [75]:
from sklearn.metrics import classification_report, confusion_matrix

In [77]:
xgb_clf = XGBClassifier()

In [90]:
xgb_tuners_new = {
  'eta': [0.3],
  'scale_pos_weight':[0.84732825],
  'objective': ['binary:logistic'],
  'colsample_bylevel': [0.8],
  'nthread': [2],
  'seed': [13],
  'min_child_weight': [5.0],
  'max_depth': [10],
  'colsample_bytree': [0.8],
  'lambda':[1.0],
  'gamma': [0.0],
  'alpha': [0.0],
  'booster': ['gbtree'],
  'grow_policy':['depthwise'],
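  # Note: 'nround' is not an XGBClassifier argument (the scikit-learn wrapper uses n_estimators), so it is likely ignored
  'nround': [10000],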
  'subsample': [0.6],
  'max_delta_step': [0.0],
  'tree_method': ['exact']}

In [92]:
X = df.drop('output',axis=1)
y=df['output']

In [93]:
best_score_, best_params_ = model_tuner(xgb_clf, xgb_tuners_new, X, y)
print("Best: %f using %s" % (best_score_, best_params_))

Best: 0.840789 using {'alpha': 0.0, 'booster': 'gbtree', 'colsample_bylevel': 0.8, 'colsample_bytree': 0.8, 'eta': 0.3, 'gamma': 0.0, 'grow_policy': 'depthwise', 'lambda': 1.0, 'max_delta_step': 0.0, 'max_depth': 10, 'min_child_weight': 5.0, 'nround': 10000, 'nthread': 2, 'objective': 'binary:logistic', 'scale_pos_weight': 0.84732825, 'seed': 13, 'subsample': 0.6, 'tree_method': 'exact'}
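
The parameter names returned by H2O follow the native XGBoost style, while `XGBClassifier` has its own keyword names for several of them. Below is a small sketch of a translation step; the mapping table and the helper `to_sklearn_params` are assumptions written for this example, not part of H2O or XGBoost. Since the H2O model summary reports 35 trees, 10000 appears to be only an upper bound, so the example overrides the tree count.

```python
# Subset of the dict shown in the convert_H2OXGBoostParams_2_XGBoostParams() output
native_params = {
    'eta': 0.3, 'lambda': 1.0, 'alpha': 0.0, 'seed': 13, 'nthread': 2,
    'nround': 10000, 'max_depth': 10, 'subsample': 0.6, 'min_child_weight': 5.0,
}

# Assumed mapping from native-style names to XGBClassifier keyword names
NATIVE_TO_SKLEARN = {
    'eta': 'learning_rate',
    'lambda': 'reg_lambda',
    'alpha': 'reg_alpha',
    'seed': 'random_state',
    'nthread': 'n_jobs',
    'nround': 'n_estimators',
}

def to_sklearn_params(params):
    """Rename known native-style keys; leave all other keys unchanged."""
    return {NATIVE_TO_SKLEARN.get(key, key): value for key, value in params.items()}

sk_params = to_sklearn_params(native_params)
sk_params['n_estimators'] = 35  # number_of_trees reported for the H2O leader model
print(sk_params)

# Usage (requires xgboost): XGBClassifier(**sk_params)
```

Passing the translated dict directly to the constructor avoids wrapping every value in a single-element list for `GridSearchCV`; plain cross-validation gives the same kind of score estimate for one fixed configuration.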
