<a id=000> </a>
# <center style='color:blue;background:yellow'> Hyperparameters Optimization with Bayesian Optimization </center>
Author:  [**Dayal Chand Aichara**](https://www.linkedin.com/in/dcaichara/) <div style="text-align:left">Date: 16-08-2019 </div>
***  
There are three steps to optimize hyperparameters with bayesian optimization.
1. Define Objective Function  <br>
    Write a function which has model and its output which has be maximized or minimized. 
2. Define Parameters Search Space (Range) <br>
    Write a domain space of parameters in which parameters have to be optimized.
3. Define Bayesian Optimization and Optimize.   <br>
    Put model function and domain space in Bayesian Optimization function and optimize parameters.


***
### Install <span style='color:blue;background:orange;font-family:romon;font-size:25px'>[**bayesian-optimization**](https://github.com/fmfn/BayesianOptimization) </span> python package via pip.
`pip install bayesian-optimization`

##  Notebook Content
1. [Import Libraries](#0)
1. [Data](#1)
1. [Simple Example](#2)
1. [LGBM](#3)
1. [CatBoost](#4)
1. [XGBoost](#5)

***

<a id=0> </a>
##  <span style='color:red;background:gray'>1. </span> <span style='color:blue;background:orange'>Import Libraries </span>

In [None]:
import pandas as pd
import numpy as np
import lightgbm as lgb
import catboost as cgb
import xgboost as xgb
from bayes_opt import BayesianOptimization
from sklearn.datasets import load_boston
from sklearn.metrics import r2_score

import warnings
warnings.filterwarnings('ignore')


<a id=1> </a>
##  <span style='color:red;background:gray'>2. </span> <span style='color:blue;background:orange'>Data </span>

In [None]:
boston=load_boston()
X =pd.DataFrame(boston.data,columns=boston.feature_names)
y = boston.target

In [5]:
df= X
df['Price'] = y
df.head()

Unnamed: 0,CRIM,ZN,INDUS,CHAS,NOX,RM,AGE,DIS,RAD,TAX,PTRATIO,B,LSTAT,Price
0,0.00632,18.0,2.31,0.0,0.538,6.575,65.2,4.09,1.0,296.0,15.3,396.9,4.98,24.0
1,0.02731,0.0,7.07,0.0,0.469,6.421,78.9,4.9671,2.0,242.0,17.8,396.9,9.14,21.6
2,0.02729,0.0,7.07,0.0,0.469,7.185,61.1,4.9671,2.0,242.0,17.8,392.83,4.03,34.7
3,0.03237,0.0,2.18,0.0,0.458,6.998,45.8,6.0622,3.0,222.0,18.7,394.63,2.94,33.4
4,0.06905,0.0,2.18,0.0,0.458,7.147,54.2,6.0622,3.0,222.0,18.7,396.9,5.33,36.2


In [6]:
X.shape

(506, 14)

### Data has 14 columns in which 13 are features and <span style='background:blue;color:gray'>**last column is Price**</span>. 

<a id=2> </a>
##  <span style='color:red;background:gray'>3. </span> <span style='color:blue;background:orange'>Simple Example </span>

In [None]:
# Define objective function
def simple_fx(x, y, z ):
    return -x ** 2 - (y - 1) -z** 2 + 1

In [None]:
# Search Space
pds = {'x': (1, 4), 'y': (-3, 3), 'z': (1,6)}

In [9]:
# optimization function and optimization
optimizer = BayesianOptimization(f=simple_fx,
                                 pbounds=pds,
                                 random_state=1)
optimizer.maximize(init_points=3,n_iter=10)

|   iter    |  target   |     x     |     y     |     z     |
-------------------------------------------------------------
| [0m 1       [0m | [0m-5.39    [0m | [0m 2.251   [0m | [0m 1.322   [0m | [0m 1.001   [0m |
| [95m 2       [0m | [95m-1.654   [0m | [95m 1.907   [0m | [95m-2.119   [0m | [95m 1.462   [0m |
| [0m 3       [0m | [0m-8.406   [0m | [0m 1.559   [0m | [0m-0.9266  [0m | [0m 2.984   [0m |
| [0m 4       [0m | [0m-11.46   [0m | [0m 2.593   [0m | [0m 0.8625  [0m | [0m 2.424   [0m |
| [0m 5       [0m | [0m-32.77   [0m | [0m 3.861   [0m | [0m 0.963   [0m | [0m 4.348   [0m |
| [0m 6       [0m | [0m-2.077   [0m | [0m 1.857   [0m | [0m-1.913   [0m | [0m 1.594   [0m |
| [95m 7       [0m | [95m-1.186   [0m | [95m 1.729   [0m | [95m-2.236   [0m | [95m 1.559   [0m |
| [95m 8       [0m | [95m 0.02719 [0m | [95m 1.519   [0m | [95m-2.223   [0m | [95m 1.375   [0m |
| [95m 9       [0m | [95m 1.4     [0m | 

In [10]:
# Check best results
optimizer.max

{'params': {'x': 1.0, 'y': -2.9903855634307375, 'z': 1.0},
 'target': 2.9903855634307375}

In [11]:
#Get search history
optimizer.res

[{'params': {'x': 2.251066014107722,
   'y': 1.3219469606529488,
   'z': 1.0005718740867244},
  'target': -5.390389235737196},
 {'params': {'x': 1.9069977178955193,
   'y': -2.119464655097322,
   'z': 1.461692973843989},
  'target': -1.653721990746281},
 {'params': {'x': 1.5587806341330128,
   'y': -0.9266356377417138,
   'z': 2.9838373711533497},
  'target': -8.406446885097736},
 {'params': {'x': 2.593123135025164,
   'y': 0.8625228467524133,
   'z': 2.423543047259528},
  'target': -11.460371342075145},
 {'params': {'x': 3.8608951135676843,
   'y': 0.9630016393643439,
   'z': 4.347945130143197},
  'target': -32.77413957207111},
 {'params': {'x': 1.8573822765338168,
   'y': -1.9128784857090562,
   'z': 1.5938529307268505},
  'target': -2.0773576002594583},
 {'params': {'x': 1.7294837840933757,
   'y': -2.236381944166488,
   'z': 1.5593432461249863},
  'target': -1.1862835745110636},
 {'params': {'x': 1.5186870970142565,
   'y': -2.222855440130697,
   'z': 1.3745031039706659},
  'target

<a id=3> </a>
##  <span style='color:red;background:gray'>4. </span> <span style='color:blue;background:orange'>LightGBM </span>

In [None]:
dtrain = lgb.Dataset(data=X, label=y)



In [None]:
def lgb_r2_score(preds, dtrain):
    labels = dtrain.get_label()
    return 'r2', r2_score(labels, preds), True

In [None]:
# Objective Function
def hyp_lgbm(num_leaves, feature_fraction, bagging_fraction, max_depth, min_split_gain, min_child_weight):
      
        params = {'application':'regression','num_iterations': 200,
                  'learning_rate':0.05, 'early_stopping_round':50,
                  'metric':'lgb_r2_score'} # Default parameters
        params["num_leaves"] = int(round(num_leaves))
        params['feature_fraction'] = max(min(feature_fraction, 1), 0)
        params['bagging_fraction'] = max(min(bagging_fraction, 1), 0)
        params['max_depth'] = int(round(max_depth))
        params['min_split_gain'] = min_split_gain
        params['min_child_weight'] = min_child_weight
        cv_results = lgb.cv(params, dtrain, nfold=5, seed=101,categorical_feature=[], stratified=False,
                            verbose_eval =None, feval=lgb_r2_score)
        # print(cv_results)
        return np.max(cv_results['r2-mean'])

In [None]:
# Domain space-- Range of hyperparameters 
pds = {'num_leaves': (80, 100),
          'feature_fraction': (0.1, 0.9),
          'bagging_fraction': (0.8, 1),
          'max_depth': (17, 25),
          'min_split_gain': (0.001, 0.1),
          'min_child_weight': (10, 25)
          }

In [16]:
# Surrogate model
optimizer = BayesianOptimization(hyp_lgbm, pds, random_state=77)
                                  
# Optimize
optimizer.maximize(init_points=5, n_iter=15)

|   iter    |  target   | baggin... | featur... | max_depth | min_ch... | min_sp... | num_le... |
-------------------------------------------------------------------------------------------------
| [0m 1       [0m | [0m 0.9766  [0m | [0m 0.9838  [0m | [0m 0.6138  [0m | [0m 23.03   [0m | [0m 12.09   [0m | [0m 0.009645[0m | [0m 95.76   [0m |
| [0m 2       [0m | [0m 0.976   [0m | [0m 0.8652  [0m | [0m 0.5329  [0m | [0m 18.92   [0m | [0m 18.18   [0m | [0m 0.04065 [0m | [0m 94.3    [0m |
| [0m 3       [0m | [0m 0.976   [0m | [0m 0.9673  [0m | [0m 0.5708  [0m | [0m 19.37   [0m | [0m 14.22   [0m | [0m 0.07085 [0m | [0m 88.45   [0m |
| [95m 4       [0m | [95m 0.9848  [0m | [95m 0.8115  [0m | [95m 0.6976  [0m | [95m 20.62   [0m | [95m 12.64   [0m | [95m 0.005888[0m | [95m 85.85   [0m |
| [95m 5       [0m | [95m 0.9848  [0m | [95m 0.8134  [0m | [95m 0.7009  [0m | [95m 17.51   [0m | [95m 16.48   [0m | [95m 0.03705 [0m |

In [17]:
optimizer.max

{'params': {'bagging_fraction': 0.8,
  'feature_fraction': 0.9,
  'max_depth': 25.0,
  'min_child_weight': 18.437809423883323,
  'min_split_gain': 0.004544331379525425,
  'num_leaves': 89.5835303879813},
 'target': 0.9921383303951028}

In [18]:
def bayesion_opt_lgbm(X=X, y=y, init_iter=3, n_iters=7, random_state=11, seed = 101, num_iterations = 200):
  dtrain = lgb.Dataset(data=X, label=y)
  def lgb_r2_score(preds, dtrain):
      labels = dtrain.get_label()
      return 'r2', r2_score(labels, preds), True
  # Objective Function
  def hyp_lgbm(num_leaves, feature_fraction, bagging_fraction, max_depth, min_split_gain, min_child_weight):
        
          params = {'application':'regression','num_iterations': num_iterations,
                    'learning_rate':0.05, 'early_stopping_round':50,
                    'metric':'lgb_r2_score'} # Default parameters
          params["num_leaves"] = int(round(num_leaves))
          params['feature_fraction'] = max(min(feature_fraction, 1), 0)
          params['bagging_fraction'] = max(min(bagging_fraction, 1), 0)
          params['max_depth'] = int(round(max_depth))
          params['min_split_gain'] = min_split_gain
          params['min_child_weight'] = min_child_weight
          cv_results = lgb.cv(params, dtrain, nfold=5, seed=seed,categorical_feature=[], stratified=False,
                              verbose_eval =None, feval=lgb_r2_score)
          # print(cv_results)
          return np.max(cv_results['r2-mean'])
  # Domain space-- Range of hyperparameters 
  pds = {'num_leaves': (50, 70),
            'feature_fraction': (0.1, 0.9),
            'bagging_fraction': (0.8, 1),
            'max_depth': (13, 23),
            'min_split_gain': (0.001, 0.1),
            'min_child_weight': (10, 25)
            }

  # Surrogate model
  optimizer = BayesianOptimization(hyp_lgbm, pds, random_state=random_state)
                                    
  # Optimize
  optimizer.maximize(init_points=init_iter, n_iter=n_iters)

bayesion_opt_lgbm(X=X, y=y, init_iter=5, n_iters=15, random_state=717, seed = 1011, num_iterations = 300)

|   iter    |  target   | baggin... | featur... | max_depth | min_ch... | min_sp... | num_le... |
-------------------------------------------------------------------------------------------------
| [0m 1       [0m | [0m 0.9853  [0m | [0m 0.9304  [0m | [0m 0.6844  [0m | [0m 14.93   [0m | [0m 17.8    [0m | [0m 0.06796 [0m | [0m 66.09   [0m |
| [95m 2       [0m | [95m 0.9917  [0m | [95m 0.9067  [0m | [95m 0.8264  [0m | [95m 17.43   [0m | [95m 13.84   [0m | [95m 0.02859 [0m | [95m 50.87   [0m |
| [0m 3       [0m | [0m 0.9862  [0m | [0m 0.8938  [0m | [0m 0.8533  [0m | [0m 19.0    [0m | [0m 23.03   [0m | [0m 0.03538 [0m | [0m 60.4    [0m |
| [0m 4       [0m | [0m 0.9769  [0m | [0m 0.8164  [0m | [0m 0.5826  [0m | [0m 13.68   [0m | [0m 16.29   [0m | [0m 0.05168 [0m | [0m 52.17   [0m |
| [0m 5       [0m | [0m 0.9742  [0m | [0m 0.9952  [0m | [0m 0.5607  [0m | [0m 19.89   [0m | [0m 23.38   [0m | [0m 0.09741 [0m | [0m 5

<a id=4> </a>
##  <span style='color:red;background:gray'>5. </span> <span style='color:blue;background:orange'>CatBoost </span>

In [None]:
cat_features = []

cv_dataset = cgb.Pool(data=X,
                  label=y,
                  cat_features=cat_features)

In [None]:
def hyp_cat(depth, bagging_temperature):
    params = {"iterations": 300,
              "learning_rate": 0.05,
              "eval_metric": "R2",
              "verbose": False}
    params[ "depth"] = int(round(depth))
    params["bagging_temperature"] = bagging_temperature

    scores = cgb.cv(cv_dataset,
                params,
                fold_count=3)
    return np.max(scores['test-R2-mean'])

In [None]:
pds = {'depth': (6, 10),
          'bagging_temperature': (1,5),
          }

In [23]:
# Surrogate model
optimizer = BayesianOptimization(hyp_cat, pds, random_state=100)
                                  
# Optimize
optimizer.maximize(init_points=3, n_iter=7)

|   iter    |  target   | baggin... |   depth   |
-------------------------------------------------
| [0m 1       [0m | [0m 0.9481  [0m | [0m 3.174   [0m | [0m 7.113   [0m |
| [0m 2       [0m | [0m 0.9275  [0m | [0m 2.698   [0m | [0m 9.379   [0m |
| [0m 3       [0m | [0m 0.936   [0m | [0m 1.019   [0m | [0m 6.486   [0m |
| [95m 4       [0m | [95m 0.9592  [0m | [95m 5.0     [0m | [95m 6.0     [0m |
| [0m 5       [0m | [0m 0.9571  [0m | [0m 5.0     [0m | [0m 7.292   [0m |
| [0m 6       [0m | [0m 0.9592  [0m | [0m 5.0     [0m | [0m 6.0     [0m |
| [95m 7       [0m | [95m 0.9598  [0m | [95m 4.991   [0m | [95m 6.017   [0m |
| [0m 8       [0m | [0m 0.9598  [0m | [0m 4.992   [0m | [0m 6.009   [0m |
| [0m 9       [0m | [0m 0.9596  [0m | [0m 4.996   [0m | [0m 6.004   [0m |
| [0m 10      [0m | [0m 0.9589  [0m | [0m 4.998   [0m | [0m 6.038   [0m |


In [24]:
optimizer.max

{'params': {'bagging_temperature': 4.991446336204385,
  'depth': 6.0172335892654525},
 'target': 0.9598253840405738}

<a id=5> </a>
##  <span style='color:red;background:gray'>6. </span> <span style='color:blue;background:orange'>XGBoost </span>

In [None]:
dtrain = xgb.DMatrix(X, y, feature_names=X.columns.values)
def xgb_r2(preds, dtrain):
    labels = dtrain.get_label()
    return 'r2', r2_score(preds, labels)

In [None]:
def hyp_xgb(max_depth, subsample, colsample_bytree,min_child_weight, gamma ):
    params = {
    'n_estimators': 300,
    'eta': 0.05,
    'objective': 'reg:linear',
    'eval_metric':'mae',
    'silent': 1
     }
    params['max_depth'] = int(round(max_depth))
    params['subsample'] = max(min(subsample, 1), 0)
    params['colsample_bytree'] = max(min(colsample_bytree, 1), 0)
    params['min_child_weight'] = int(min_child_weight)
    params['gamma'] = max(gamma, 0)
    scores = xgb.cv(params, dtrain, num_boost_round=1000,verbose_eval=False, early_stopping_rounds=5, feval=xgb_r2, maximize=True, nfold=5)
    return  scores['test-r2-mean'].iloc[-1]

In [None]:
pds ={
  'min_child_weight':(14, 20),
  'gamma':(0, 5),
  'subsample':(0.5, 1),
  'colsample_bytree':(0.1, 1),
  'max_depth': (6, 10)
}

In [28]:
# Surrogate model
optimizer = BayesianOptimization(hyp_xgb, pds, random_state=103)
                                  
# Optimize
optimizer.maximize(init_points=5, n_iter=15)

|   iter    |  target   | colsam... |   gamma   | max_depth | min_ch... | subsample |
-------------------------------------------------------------------------------------
| [0m 1       [0m | [0m 0.9732  [0m | [0m 0.4889  [0m | [0m 0.8711  [0m | [0m 6.684   [0m | [0m 18.97   [0m | [0m 0.7936  [0m |
| [95m 2       [0m | [95m 0.9774  [0m | [95m 0.5134  [0m | [95m 4.113   [0m | [95m 9.286   [0m | [95m 15.84   [0m | [95m 0.6004  [0m |
| [0m 3       [0m | [0m 0.97    [0m | [0m 0.4629  [0m | [0m 4.739   [0m | [0m 8.71    [0m | [0m 17.65   [0m | [0m 0.8362  [0m |
| [0m 4       [0m | [0m 0.8031  [0m | [0m 0.1072  [0m | [0m 1.683   [0m | [0m 7.429   [0m | [0m 16.92   [0m | [0m 0.9391  [0m |
| [95m 5       [0m | [95m 0.9909  [0m | [95m 0.7794  [0m | [95m 3.152   [0m | [95m 7.506   [0m | [95m 17.54   [0m | [95m 0.8687  [0m |
| [95m 6       [0m | [95m 0.9919  [0m | [95m 1.0     [0m | [95m 5.0     [0m | [95m 6.0     [0m

In [29]:
optimizer.max

{'params': {'colsample_bytree': 1.0,
  'gamma': 2.830706438799185,
  'max_depth': 8.201814553535812,
  'min_child_weight': 14.0,
  'subsample': 1.0},
 'target': 0.9989032}

***
## References: 
1. https://towardsdatascience.com/a-conceptual-explanation-of-bayesian-model-based-hyperparameter-optimization-for-machine-learning-b8172278050f <br>

2. https://towardsdatascience.com/an-introductory-example-of-bayesian-optimization-in-python-with-hyperopt-aae40fff4ffo <br>

3. https://medium.com/spikelab/hyperparameter-optimization-using-bayesian-optimization-f1f393dcd36d  <br>

4. https://www.kaggle.com/omarito/xgboost-bayesianoptimization <br>

5. https://github.com/fmfn/BayesianOptimization

***



<center><span style='color:red;background:pink;font-size:40px'>End of the Notebook </span> </center>

***

### <center style='color:blue;background:yellow'>  [GOT TO TOP](#000) </center>