# Hyperparameter tuning for XGBoost
- Grid Search 
    - exhaustive search: evaluates every combination of the hyperparameter values listed for the ML model
    - takes a long time to run when there are many hyperparameters, because the number of combinations is the product of the number of values per hyperparameter
- Random Search 
    - evaluates a fixed number of hyperparameter combinations picked at random
    - the randomly selected combinations may not include the best-performing one
- Bayesian Optimization 
    - grid search and random search evaluate each combination independently, while Bayesian optimization uses the results of previous evaluations to choose the next combination to try

- source 
    - https://colab.research.google.com/drive/18ooFZ4e7cW_zpbvwhBzzhWxCze0Mi6LA#scrollTo=1-FxiavJMirS 
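A minimal sketch of how the first two strategies generate candidates, using only the standard library and the same three-parameter grid as the grid-search cell further down. `grid_candidates` and `random_candidates` are hypothetical helpers for illustration; in the notebook, `GridSearchCV` and `RandomizedSearchCV` do this enumeration and sampling internally.

```python
import itertools
import random

# Same value lists as the grid-search cell in this notebook
param_grid = {
    "colsample_bytree": [0.3, 0.5, 0.8],
    "reg_alpha": [0, 0.5, 1, 5],
    "reg_lambda": [0, 0.5, 1, 5],
}

def grid_candidates(grid):
    """Grid search: every combination (Cartesian product) of the listed values."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in itertools.product(*grid.values())]

def random_candidates(grid, n_iter, seed=0):
    """Random search: n_iter distinct combinations drawn at random from the grid."""
    rng = random.Random(seed)
    return rng.sample(grid_candidates(grid), n_iter)

all_combos = grid_candidates(param_grid)
sampled = random_candidates(param_grid, n_iter=10)
print(len(all_combos))  # 48 = 3 * 4 * 4
print(len(sampled))     # 10
```

Building the full list before sampling is only practical for small grids; it is done here to make the "subset of the grid" relationship explicit.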

# imports 

In [1]:
from sklearn import datasets 
import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier
from sklearn.metrics import accuracy_score, precision_recall_fscore_support as score 

# hyperparameters tuning 
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score, RandomizedSearchCV
from hyperopt import hp, fmin, tpe, STATUS_OK, Trials, space_eval

# mlflow
import mlflow 

# dataset creation

In [2]:
data=datasets.load_breast_cancer()
df=pd.DataFrame(data=data.data, columns=data.feature_names)
df['target']=data.target
df.info()  # prints the column summary itself and returns None
df.head()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 569 entries, 0 to 568
Data columns (total 31 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   mean radius              569 non-null    float64
 1   mean texture             569 non-null    float64
 2   mean perimeter           569 non-null    float64
 3   mean area                569 non-null    float64
 4   mean smoothness          569 non-null    float64
 5   mean compactness         569 non-null    float64
 6   mean concavity           569 non-null    float64
 7   mean concave points      569 non-null    float64
 8   mean symmetry            569 non-null    float64
 9   mean fractal dimension   569 non-null    float64
 10  radius error             569 non-null    float64
 11  texture error            569 non-null    float64
 12  perimeter error          569 non-null    float64
 13  area error               569 non-null    float64
 14  smoothness error         5

| | mean radius | mean texture | mean perimeter | mean area | mean smoothness | mean compactness | mean concavity | mean concave points | mean symmetry | mean fractal dimension | ... | worst texture | worst perimeter | worst area | worst smoothness | worst compactness | worst concavity | worst concave points | worst symmetry | worst fractal dimension | target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 17.99 | 10.38 | 122.8 | 1001.0 | 0.1184 | 0.2776 | 0.3001 | 0.1471 | 0.2419 | 0.07871 | ... | 17.33 | 184.6 | 2019.0 | 0.1622 | 0.6656 | 0.7119 | 0.2654 | 0.4601 | 0.1189 | 0 |
| 1 | 20.57 | 17.77 | 132.9 | 1326.0 | 0.08474 | 0.07864 | 0.0869 | 0.07017 | 0.1812 | 0.05667 | ... | 23.41 | 158.8 | 1956.0 | 0.1238 | 0.1866 | 0.2416 | 0.186 | 0.275 | 0.08902 | 0 |
| 2 | 19.69 | 21.25 | 130.0 | 1203.0 | 0.1096 | 0.1599 | 0.1974 | 0.1279 | 0.2069 | 0.05999 | ... | 25.53 | 152.5 | 1709.0 | 0.1444 | 0.4245 | 0.4504 | 0.243 | 0.3613 | 0.08758 | 0 |
| 3 | 11.42 | 20.38 | 77.58 | 386.1 | 0.1425 | 0.2839 | 0.2414 | 0.1052 | 0.2597 | 0.09744 | ... | 26.5 | 98.87 | 567.7 | 0.2098 | 0.8663 | 0.6869 | 0.2575 | 0.6638 | 0.173 | 0 |
| 4 | 20.29 | 14.34 | 135.1 | 1297.0 | 0.1003 | 0.1328 | 0.198 | 0.1043 | 0.1809 | 0.05883 | ... | 16.67 | 152.2 | 1575.0 | 0.1374 | 0.205 | 0.4 | 0.1625 | 0.2364 | 0.07678 | 0 |


In [3]:
# test train split
xtrain,xtest,ytrain,ytest=train_test_split(df.drop('target', axis=1), df['target'], test_size=0.2, random_state=42)
xtrain.shape, xtest.shape, ytrain.shape, ytest.shape

((455, 30), (114, 30), (455,), (114,))
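The 455/114 split follows from how `train_test_split` sizes a float `test_size`: as far as scikit-learn's documented behavior goes, the test count is rounded up and the remainder goes to training. A small arithmetic check:

```python
import math

n_samples = 569   # rows in the breast cancer dataset
test_size = 0.2

# test count is rounded up; the training set gets the remainder
n_test = math.ceil(test_size * n_samples)   # ceil(113.8)
n_train = n_samples - n_test
print(n_train, n_test)  # 455 114
```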

In [4]:
# Standardize the features: fit the scaler on the training set only, then apply it to the test set
sc=StandardScaler()
xtrain_transformed=pd.DataFrame(sc.fit_transform(xtrain),index=xtrain.index,columns=xtrain.columns)
xtest_transformed=pd.DataFrame(sc.transform(xtest),index=xtest.index,columns=xtest.columns)
# summary after standardization
xtrain_transformed.describe().T 

| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| mean radius | 455.0 | -1.811494e-15 | 1.001101 | -1.819583 | -0.683093 | -0.231498 | 0.459343 | 3.961679 |
| mean texture | 455.0 | -3.373126e-15 | 1.001101 | -2.2235 | -0.707536 | -0.118516 | 0.563199 | 4.715674 |
| mean perimeter | 455.0 | -3.634699e-15 | 1.001101 | -1.809497 | -0.690761 | -0.242938 | 0.48848 | 3.976811 |
| mean area | 455.0 | -2.537653e-16 | 1.001101 | -1.365036 | -0.660205 | -0.289597 | 0.319339 | 5.208312 |
| mean smoothness | 455.0 | -4.232024e-15 | 1.001101 | -3.100011 | -0.713204 | -0.08082 | 0.633173 | 4.864642 |
| mean compactness | 455.0 | 1.011157e-15 | 1.001101 | -1.607228 | -0.777087 | -0.24134 | 0.528128 | 3.964311 |
| mean concavity | 455.0 | 9.857804e-16 | 1.001101 | -1.119899 | -0.750539 | -0.344646 | 0.547387 | 4.256736 |
| mean concave points | 455.0 | 5.817081e-16 | 1.001101 | -1.26991 | -0.734905 | -0.391123 | 0.673757 | 4.022271 |
| mean symmetry | 455.0 | -5.910779e-15 | 1.001101 | -2.34543 | -0.701046 | -0.069151 | 0.535429 | 4.476124 |
| mean fractal dimension | 455.0 | -3.36727e-15 | 1.001101 | -1.776889 | -0.709792 | -0.177285 | 0.464223 | 4.815921 |
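In the standardized summary above, the means of order 1e-15 are floating-point round-off (effectively 0), and the std of 1.001101 instead of 1.0 comes from two different conventions: `StandardScaler` divides by the population standard deviation (ddof=0), while pandas `describe()` reports the sample standard deviation (ddof=1). The sample std of standardized data is therefore sqrt(n/(n-1)). A sketch with the standard library and a toy column:

```python
import math
import statistics

# Toy feature column (any values work)
x = [7.7, 11.7, 13.3, 15.7, 28.1, 14.1, 12.9]
n = len(x)

# StandardScaler-style standardization: population std (ddof=0)
mu = statistics.fmean(x)
sigma_pop = statistics.pstdev(x)
z = [(v - mu) / sigma_pop for v in x]

# describe()-style std: sample std (ddof=1), which equals sqrt(n / (n - 1))
print(statistics.stdev(z), math.sqrt(n / (n - 1)))

# With the 455 training rows:
print(round(math.sqrt(455 / 454), 6))  # 1.001101
```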


In [5]:
# summary before standardization
xtrain.describe().T

| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| mean radius | 455.0 | 14.117635 | 3.535815 | 7.691 | 11.705 | 13.3 | 15.74 | 28.11 |
| mean texture | 455.0 | 19.185033 | 4.266005 | 9.71 | 16.17 | 18.68 | 21.585 | 39.28 |
| mean perimeter | 455.0 | 91.882242 | 24.322027 | 47.92 | 75.1 | 85.98 | 103.75 | 188.5 |
| mean area | 455.0 | 654.377582 | 354.943187 | 170.4 | 420.3 | 551.7 | 767.6 | 2501.0 |
| mean smoothness | 455.0 | 0.095744 | 0.013923 | 0.05263 | 0.085825 | 0.09462 | 0.10455 | 0.1634 |
| mean compactness | 455.0 | 0.103619 | 0.05247 | 0.01938 | 0.06289 | 0.09097 | 0.1313 | 0.3114 |
| mean concavity | 455.0 | 0.088898 | 0.079468 | 0.0 | 0.02932 | 0.06154 | 0.13235 | 0.4268 |
| mean concave points | 455.0 | 0.04828 | 0.03806 | 0.0 | 0.02034 | 0.03341 | 0.073895 | 0.2012 |
| mean symmetry | 455.0 | 0.181099 | 0.027487 | 0.1167 | 0.16185 | 0.1792 | 0.1958 | 0.304 |
| mean fractal dimension | 455.0 | 0.062757 | 0.00721 | 0.04996 | 0.057645 | 0.06148 | 0.0661 | 0.09744 |


In [6]:
# params: apparently the default XGBClassifier parameters printed by an older xgboost release,
# kept for reference only (this dict is not passed to any model below)
params={
    'base_score': 0.5,
    'booster': 'gbtree',
    'colsample_bylevel': 1,
    'colsample_bynode': 1,
    'colsample_bytree': 1,
    'gamma': 0,
    'learning_rate': 0.1,
    'max_delta_step': 0,
    'max_depth': 3,
    'min_child_weight': 1,
    'missing': None,
    'n_estimators': 100,
    'n_jobs': 1,
    'nthread': None,
    'objective': 'binary:logistic',
    'random_state': 0,
    'reg_alpha': 0,
    'reg_lambda': 1,
    'scale_pos_weight': 1,
    'seed': None,
    'silent': None,
    'subsample': 1,
    'verbosity': 1
}

In [7]:
# Step 5: XGBoost classifier with no hyperparameter tuning
from basic import mlflow_utils  # project-local helper module (not shown in this notebook)

experimentid=mlflow_utils.create_mlflow_experiment(experiment_name='XGBoost_with_no_parameters',artifact_location='plain_xgboost',tags={'Xgboost':'no_parameters'})


with mlflow.start_run(run_name='no_parameter_run') as run:
    xgboost=XGBClassifier(seed=0).fit(xtrain_transformed,ytrain)
    xgboost_prediction=xgboost.predict(xtest_transformed)
    # predicted probability of the positive class (column 1)
    xgboost_prediction_prob=xgboost.predict_proba(xtest_transformed)[:,1]
    # per-class arrays ordered by label: index 0 = malignant, index 1 = benign
    precision,recall,fscore,support=score(ytest,xgboost_prediction)

    # log to MLflow (class 1, consistent with the tuned runs below)
    mlflow.log_params(xgboost.get_params())
    mlflow.log_metrics({'precision':precision[1],'recall':recall[1],'fscore':fscore[1],'support':support[1]})
    mlflow.xgboost.log_model(xgboost,f'{run.info.run_id}-xgboostmodel')
    print(precision,recall,fscore,support)


Experiment XGBoost_with_no_parameters already exists




[0.95238095 0.95833333] [0.93023256 0.97183099] [0.94117647 0.96503497] [43 71]
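`score` (`precision_recall_fscore_support`) returns one value per class, ordered by label, so each printed array is `[class 0, class 1]`. The printed values are consistent with the confusion counts below (for example 40/42 = 0.95238095 and 40/43 = 0.93023256); these counts are an inference from the output, not something the notebook logged. A plain-Python sketch recomputing the per-class metrics from them, with F1 as the harmonic mean 2PR/(P+R):

```python
def per_class_metrics(tp, fp, fn):
    """Precision, recall and F1 for one class from its confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Rows = true class, columns = predicted class (counts inferred from the printed metrics)
#              pred 0  pred 1
confusion = [[40,     3],    # true 0 (malignant)
             [2,      69]]   # true 1 (benign)

results = {}
for k in (0, 1):
    tp = confusion[k][k]
    fn = sum(confusion[k]) - tp                  # truly k, predicted as the other class
    fp = sum(row[k] for row in confusion) - tp   # predicted k, truly the other class
    p, r, f = per_class_metrics(tp, fp, fn)
    results[k] = (p, r, f, tp + fn)              # last entry is the support
    print(k, round(p, 8), round(r, 8), round(f, 8), tp + fn)
```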


# Grid search for XGBoost


In [28]:
from mlflow.models.signature import infer_signature
# define the search grid: 3 * 4 * 4 = 48 combinations
param_grid={
    'colsample_bytree':[0.3,0.5,0.8],
    'reg_alpha':[0,0.5,1,5],
    'reg_lambda':[0,0.5,1,5]
}
# scoring metric: recall of the positive class (label 1)
scoring=['recall']

# set up stratified k-fold cross-validation
kfold=StratifiedKFold(n_splits=3,shuffle=True,random_state=0)

# define grid search (GridSearchCV clones `estimator`, so the fitted model above is not modified)
grid_search=GridSearchCV(
    estimator=xgboost,
    param_grid=param_grid,
    scoring=scoring,
    refit='recall',
    #n_jobs=-1,
    cv=kfold,
    verbose=0
)

experimentid=mlflow_utils.create_mlflow_experiment(experiment_name='XGBoost_with_no_parameters',artifact_location='plain_xgboost',tags={'Xgboost':'no_parameters'})

# fit grid search inside an MLflow run
with mlflow.start_run(run_name='grid_search_cv') as run:
    grid_results=grid_search.fit(xtrain_transformed,ytrain)
    grid_predict=grid_search.predict(xtest_transformed)

    precision,recall,fscore,support=score(ytest,grid_predict)
    
    # log MLflow results
    mlflow.log_params(grid_results.best_params_)
    # best_score_ is the mean cross-validated recall of the best combination
    mlflow.log_metrics({'best_score':grid_results.best_score_})
    mlflow.log_metrics({'precision':precision[1],'recall':recall[1],'fscore':fscore[1],'support':support[1]})
    # the model was trained on the standardized features, so infer the signature from them
    model_signature=infer_signature(xtrain_transformed,ytrain,params={'model_name':'model1'})
    mlflow.sklearn.log_model(
        sk_model=grid_results,
        artifact_path='best_model_signature',
        signature=model_signature
    )



Experiment XGBoost_with_no_parameters already exists
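`StratifiedKFold(n_splits=3, shuffle=True, random_state=0)` makes each validation fold keep roughly the class ratio of the whole training set, so recall measured on each fold is comparable. The sketch below shows the idea in plain Python with a hypothetical `stratified_folds` helper (shuffle each class, then deal its samples round-robin into the folds); it is a simplified illustration, not scikit-learn's exact algorithm. The label counts 169/286 are an assumption, derived from the dataset's 212 malignant / 357 benign samples minus the 43/71 test supports printed earlier.

```python
import random
from collections import defaultdict

def stratified_folds(labels, n_splits, seed=0):
    """Split sample indices into n_splits folds with ~equal class mix (simplified)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(n_splits)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        for i, idx in enumerate(idxs):
            folds[i % n_splits].append(idx)   # deal this class's samples round-robin
    return folds

labels = [0] * 169 + [1] * 286   # assumed training-set class counts
folds = stratified_folds(labels, n_splits=3)
overall = sum(labels) / len(labels)

for k in range(3):
    val_idx = folds[k]                                                   # validation fold
    train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]  # other folds
    share = sum(labels[i] for i in val_idx) / len(val_idx)
    print(k, len(train_idx), len(val_idx), round(share, 3), round(overall, 3))
```

Each fold serves exactly once as the validation set; grid search repeats this for every combination and averages the fold scores into `best_score_`.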





# Random search for XGBoost

In [30]:
param_grid={
    'learning_rate':[0.0001,0.001,0.01,0.1,1],
    'max_depth':range(3,21,3),
    'gamma':[i/10.0 for i in range(0,5)],
    'colsample_bytree':[i/10.0 for i in range(3,10)],
    'reg_alpha':[1e-5,1e-2,0.1,1,10,100],
    'reg_lambda':[1e-5,1e-2,0.1,1,10,100]
}
scoring=['recall']

kfold=StratifiedKFold(n_splits=3,shuffle=True,random_state=0)
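The random-search grid above is much larger than the grid-search one, which is where a fixed `n_iter` pays off. A quick count with the standard library, plus the usual back-of-the-envelope argument for random search: if the best 5% of combinations are "good enough", the chance that at least one of n random draws lands there is 1 - 0.95^n. The 5% threshold is an illustrative assumption, not something measured in this notebook.

```python
import math

# Number of candidate values per hyperparameter in the random-search grid above
n_values = {
    "learning_rate": len([0.0001, 0.001, 0.01, 0.1, 1]),
    "max_depth": len(range(3, 21, 3)),                          # 3, 6, ..., 18
    "gamma": len([i / 10.0 for i in range(0, 5)]),              # 0.0 .. 0.4
    "colsample_bytree": len([i / 10.0 for i in range(3, 10)]),  # 0.3 .. 0.9
    "reg_alpha": len([1e-5, 1e-2, 0.1, 1, 10, 100]),
    "reg_lambda": len([1e-5, 1e-2, 0.1, 1, 10, 100]),
}

n_combinations = math.prod(n_values.values())
n_folds, n_iter = 3, 48

print(n_combinations)            # 37800 combinations
print(n_combinations * n_folds)  # 113400 CV fits for an exhaustive grid search
print(n_iter * n_folds)          # 144 CV fits for n_iter=48

# Chance that at least one of n_iter random draws falls in the best 5% of combinations
# (treats draws as independent, a close approximation when n_iter << n_combinations)
top_fraction = 0.05
p_hit = 1 - (1 - top_fraction) ** n_iter
print(round(p_hit, 3))
```

Both search methods also do one final refit of the best combination on the full training set.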

In [31]:
random_search=RandomizedSearchCV(
    estimator=xgboost,
    param_distributions=param_grid,
    n_iter=48,  # evaluate 48 random combinations (the same budget as the 48-combination grid)
    scoring=scoring,
    refit='recall',
    # n_jobs=-1,
    cv=kfold,
    random_state=0,  # make the sampled combinations reproducible
    verbose=0
    )

with mlflow.start_run(run_name='randomSearchCV') as run:
    random_result=random_search.fit(xtrain_transformed,ytrain)
    random_predict=random_result.predict(xtest_transformed)
    # compute test metrics for this model (not the grid-search predictions)
    precision,recall,fscore,support=score(ytest,random_predict)

    # log MLflow results
    mlflow.log_params(random_result.best_params_)
    mlflow.log_metrics({'best_score':random_result.best_score_})
    mlflow.log_metrics({'precision':precision[1],'recall':recall[1],'fscore':fscore[1],'support':support[1]})
    model_signature=infer_signature(xtrain_transformed,ytrain,params={'model_name':'model1'})
    mlflow.sklearn.log_model(
        sk_model=random_result,
        artifact_path='best_random_signature',
        signature=model_signature
    )




# Bayesian Optimization for XGBoost

In [58]:
from sklearn.metrics import accuracy_score,precision_score,recall_score,f1_score
from typing import Dict, List, Optional 
from functools import partial

def get_classification_metrics(y_true:pd.Series,y_pred:pd.Series,prefix:str)->Dict[str,float]:
    """
    Return accuracy, precision, recall and F1 (positive class = 1),
    with each key prefixed by `prefix`, e.g. 'test_f1'.
    """
    accuracy=accuracy_score(y_true,y_pred)
    precision=precision_score(y_true,y_pred)
    recall=recall_score(y_true,y_pred)
    f1=f1_score(y_true,y_pred)
    return {
        f'{prefix}_accuracy':accuracy,
        f'{prefix}_precision':precision,
        f'{prefix}_recall':recall,
        f'{prefix}_f1':f1
    }

In [89]:
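# search space: hp.choice samples one option from each list
# (fmin reports the chosen option's index, so map results back to values with space_eval)
space={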
    'learning_rate':hp.choice('learning_rate',[0.0001,0.001,0.01,0.1,1]),
    'max_depth':hp.choice('max_depth',range(3,21,3)),
    'gamma':hp.choice('gamma',[i/10.0 for i in range(0,5)]),
    'colsample_bytree':hp.choice('colsample_bytree',[i/10.0 for i in range(3,10)]),
    'reg_alpha':hp.choice('reg_alpha',[1e-5,1e-2,0.1,1,10,100]),
    'reg_lambda':hp.choice('reg_lambda',[1e-5,1e-2,0.1,1,10,100])
}

In [90]:
kfold=StratifiedKFold(n_splits=3,shuffle=True,random_state=0)

In [91]:
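def objective(params,xtrain_transformed,ytrain,xtest_transformed,ytest):
    # One model per trial, logged as a nested MLflow run under the parent run started below.
    # Note: scoring on the test set here uses the test set for model selection, so the
    # final test metrics are optimistically biased; cross-validation on the training data
    # (e.g. with `kfold` above) avoids that.
    with mlflow.start_run(nested=True) as run: 
        xgboost=XGBClassifier(seed=0,**params)
        xgboost.fit(xtrain_transformed,ytrain)
        ypred=xgboost.predict(xtest_transformed)

        metrics=get_classification_metrics(y_true=ytest,y_pred=ypred,prefix='test')
        mlflow.log_metrics(metrics)
        mlflow.log_params(params)
        mlflow.xgboost.log_model(xgboost,f'{run.info.run_id}-model')

    # fmin minimizes, so return the negative F1
    return -metrics['test_f1']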

In [101]:

with mlflow.start_run(run_name='hyper parameter ') as run:
    trials=Trials()
    best=fmin(
        fn=partial(
            objective,
            xtrain_transformed=xtrain_transformed,
            ytrain=ytrain,
            xtest_transformed=xtest_transformed,
            ytest=ytest
        ),
        space=space,
        algo=tpe.suggest,
        max_evals=48,
        trials=trials
    )
    best_run=sorted(trials.results,key=lambda x:x['loss'])[0]
    print(best_run)
    # fmin returns the index of each hp.choice option; space_eval maps indices back to values
    best_params=space_eval(space,best)
    best_xgboost=XGBClassifier(seed=0,**best_params)
    best_xgboost.fit(xtrain_transformed,ytrain)
    ypred=best_xgboost.predict(xtest_transformed)
    metrics=get_classification_metrics(y_true=ytest,y_pred=ypred,prefix='best_test')
    mlflow.log_metrics(metrics)
    mlflow.log_params(best_params)
    # log the tuned model (not the untuned `xgboost` from step 5)
    mlflow.xgboost.log_model(best_xgboost,f'{run.info.run_id}-best_model',registered_model_name='best_model')


100%|██████████| 48/48 [03:42<00:00,  4.63s/trial, best loss: -0.9722222222222222]
{'loss': -0.9722222222222222, 'status': 'ok'}


Successfully registered model 'best_model'.
Created version '1' of model 'best_model'.


In [100]:
# check the candidate values generated for colsample_bytree
for i in range(3,10):
    print(i/10.0)

0.3
0.4
0.5
0.6
0.7
0.8
0.9


In [88]:
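# This earlier attempt fails: `best` holds hp.choice indices (here 2 for colsample_bytree),
# not the option values. It also fits on the test set; the model should be fit on the training data.
best_xghoost=XGBClassifier(**best)
best_xghoost.fit(xtest_transformed,ytest)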

XGBoostError: value 2 for Parameter colsample_bytree exceed bound [0,1]
colsample_bytree: Subsample ratio of columns, resample on each tree construction.