# BayesSearch

By Alberto Valdés 

**Mail 1:** anvaldes@uc.cl 

**Mail 2:** alberto.valdes.gonzalez.96@gmail.com

In this notebook we discuss BayesSearch, a useful tool for hyperparameter tuning, that is, for finding the best hyperparameters of a model.

### How Bayesian Optimization Works

In Bayesian optimization, hyperparameter values, known as data points, are chosen randomly in the first iteration. After that, there is a trade-off between:

* **Active learning:** Choosing the point with the highest uncertainty in each iteration. This is also called exploration.

* **Best objective function:** Choosing a point from a region that currently has the best results. This is also called exploitation.

For example, if you are working on a maximization problem, for each iteration, the Bayesian optimization method runs the algorithm with several random hyperparameter values, and then either finds the point that achieves the maximal result (exploitation), or the point that has the highest uncertainty and thus the best potential to achieve an even better result (exploration).

There are several ways by which this method chooses, at each iteration, whether to go down the path of exploitation or exploration. Here are three common functions that can make this choice.

### 1. Upper Confidence Bound

With this function, the next selected point is the one with the highest upper confidence bound. Assuming a Gaussian process, this can be obtained as:

$ UCB(p) = \mu (p) + \kappa \cdot \sigma(p) $

$ \mu $ is the mean, $ \sigma $ is the standard deviation, and $ \kappa $ is an exploration parameter (larger values cause the function to perform more exploration).
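
To make the role of $ \kappa $ concrete, here is a minimal sketch that applies the UCB formula above to three candidate points. The candidate names and their mean and standard deviation values are hypothetical, chosen only for illustration; in a real run they would come from the Gaussian process fitted to the points evaluated so far.

```python
# Hypothetical candidates: (name, predicted mean, predicted standard deviation).
candidates = [
    ("A", 0.78, 0.01),  # good mean, low uncertainty
    ("B", 0.74, 0.05),  # lower mean, high uncertainty
    ("C", 0.70, 0.03),
]


def ucb(mu, sigma, kappa):
    """Upper confidence bound: mean plus kappa times standard deviation."""
    return mu + kappa * sigma


def pick(cands, kappa):
    """Return the name of the candidate with the highest UCB."""
    return max(cands, key=lambda c: ucb(c[1], c[2], kappa))[0]


best_low_kappa = pick(candidates, kappa=0.5)   # favours the best mean
best_high_kappa = pick(candidates, kappa=2.0)  # favours the uncertain point

print("kappa = 0.5 ->", best_low_kappa)
print("kappa = 2.0 ->", best_high_kappa)
```

With the small $ \kappa $ the point with the best mean wins (exploitation), while the larger $ \kappa $ moves the choice to the more uncertain point (exploration).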

### 2. Probability of Improvement (PI)

This function selects the next point with the highest probability of improving on the current maximum of the objective function, obtained from the previously evaluated points. The current maximum is denoted $ f_{max} $. This function also assumes a Gaussian process, with $ \Phi $ the standard normal cumulative distribution function:

$ PI(p) = \Phi \left( \frac{\mu(p) - f_{max} - \epsilon}{\sigma(p)} \right) $

The variable $ \epsilon $ is used to trade off between exploration and exploitation. Larger values result in more exploration vs. exploitation.
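
The following is a minimal sketch of Probability of Improvement, using the common textbook form in which the improvement $ \mu(p) - f_{max} - \epsilon $ is divided by $ \sigma(p) $ before applying $ \Phi $. The function names and the numeric values are assumptions made for illustration only.

```python
import math


def normal_cdf(z):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def probability_of_improvement(mu, sigma, f_max, eps=0.0):
    """Probability that a point beats f_max by more than eps."""
    if sigma <= 0.0:
        # No uncertainty: the improvement is either certain or impossible.
        return 1.0 if mu - f_max - eps > 0 else 0.0
    return normal_cdf((mu - f_max - eps) / sigma)


f_max = 0.78  # hypothetical best score observed so far

# Point close to the best score, with little uncertainty.
pi_confident = probability_of_improvement(mu=0.77, sigma=0.01, f_max=f_max)
# Point with a worse mean but much more uncertainty.
pi_uncertain = probability_of_improvement(mu=0.74, sigma=0.05, f_max=f_max)

print(round(pi_confident, 4), round(pi_uncertain, 4))
```

In this example the uncertain point has the higher probability of improvement even though its predicted mean is lower.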

### 3. Expected Improvement (EI)

This function quantifies how much improvement a new point is expected to achieve, and picks the point with the highest expected improvement. Again it assumes a Gaussian process:

$ EI(p) = ( \mu(p) - f_{max} - \epsilon ) \cdot \Phi(Z) + \sigma(p) \cdot \phi(Z), \quad Z = \frac{\mu(p) - f_{max} - \epsilon}{\sigma(p)} $

$ \Phi $ and $ \phi $ are the cumulative distribution function and the density of the standard normal distribution. As in the previous function, $ \epsilon $ is used to trade off between exploration and exploitation.
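
As a sketch of Expected Improvement, the code below uses the usual closed form under a Gaussian assumption, which combines the standard normal cumulative distribution function with its density. The function names and the sample values are hypothetical.

```python
import math


def normal_pdf(z):
    """Density of the standard normal distribution."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_improvement(mu, sigma, f_max, eps=0.0):
    """Expected amount by which a point exceeds f_max + eps."""
    improvement = mu - f_max - eps
    if sigma <= 0.0:
        # No uncertainty: the improvement is known exactly.
        return max(improvement, 0.0)
    z = improvement / sigma
    return improvement * normal_cdf(z) + sigma * normal_pdf(z)


f_max = 0.78  # hypothetical best score observed so far

ei_confident = expected_improvement(mu=0.77, sigma=0.01, f_max=f_max)
ei_uncertain = expected_improvement(mu=0.74, sigma=0.05, f_max=f_max)

print(ei_confident, ei_uncertain)
```

Unlike Probability of Improvement, this value also reflects the size of the possible gain, not just its likelihood.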

### Import the libraries

In [1]:
# scikit-optimize provides the skopt package used later for BayesSearchCV
!pip install -q xgboost scikit-optimize

In [2]:
import time
import warnings
import numpy as np
import pandas as pd
from sklearn import metrics
import matplotlib.pyplot as plt
from xgboost import XGBClassifier
from matplotlib.pyplot import figure
from sklearn.model_selection import GridSearchCV
from sklearn.linear_model import LinearRegression
from sklearn.exceptions import DataConversionWarning
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingClassifier


from sklearn.metrics import accuracy_score
from sklearn.metrics import precision_score
from sklearn.metrics import recall_score
from sklearn.metrics import make_scorer

warnings.simplefilter("ignore")
warnings.filterwarnings(action='ignore', category=DataConversionWarning)



In [3]:
def compute_auc(y, y_pred):

    fpr, tpr, thresholds = metrics.roc_curve(y, y_pred, pos_label=1)

    return metrics.auc(fpr, tpr)

In [4]:
start = time.time()

### i. Load the dataset

This dataset is about credit card fraud: each row is a transaction and `Class` equals 1 when the transaction is fraudulent.

In [5]:
df = pd.read_csv('creditcard.csv')

In [6]:
X_cols = [f'V{i}' for i in range(1, 28 + 1)]
X_cols = X_cols + ['Amount']

y_col = ['Class']

In [7]:
X = df[X_cols].copy()
y = df[y_col].copy()

In [8]:
X.isna().sum()

V1        0
V2        0
V3        0
V4        0
V5        0
V6        0
V7        0
V8        0
V9        0
V10       0
V11       0
V12       0
V13       0
V14       0
V15       0
V16       0
V17       0
V18       0
V19       0
V20       0
V21       0
V22       0
V23       0
V24       0
V25       0
V26       0
V27       0
V28       0
Amount    0
dtype: int64

In [9]:
y.isna().sum()

Class    0
dtype: int64

In [10]:
round(y.value_counts(normalize = True)*100, 2)

Class
0        99.83
1         0.17
dtype: float64

In [11]:
y.value_counts()

Class
0        284315
1           492
dtype: int64

**Note:** This is an imbalanced problem.
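
To see why accuracy is a poor guide for such a problem, here is a minimal sketch that uses the class counts printed above. It scores a hypothetical baseline that always predicts class 0; this baseline is an assumption for illustration and is not part of the notebook's pipeline.

```python
# Class counts taken from the value_counts() output above.
n_negative = 284315
n_positive = 492
total = n_negative + n_positive

# Hypothetical baseline: always predict class 0.
true_positives = 0
accuracy = n_negative / total          # every negative is right, every positive is wrong
recall = true_positives / n_positive   # no positive case is ever detected

print("Accuracy:", round(accuracy * 100, 2))
print("Recall:", round(recall * 100, 2))
```

The baseline looks almost perfect on accuracy while finding none of the positive cases, which is why recall is used as the score later in this notebook.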

### ii. Prepare the data

BayesSearchCV, like GridSearchCV, incorporates K-fold cross-validation, which makes it unnecessary to create a separate validation set.
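
As a rough illustration of how K-fold cross-validation reuses the training data, the sketch below splits ten sample indices into five folds in plain Python. The helper `k_fold_indices` is hypothetical and deliberately simplified: it does not shuffle or stratify, whereas scikit-learn normally uses stratified folds when an integer `cv` is passed with a classifier.

```python
def k_fold_indices(n_samples, n_splits):
    """Yield (train_indices, validation_indices) for consecutive folds."""
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for i in range(n_splits):
        # The first `extra` folds receive one additional sample.
        size = base + (1 if i < extra else 0)
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, val
        start += size


folds = list(k_fold_indices(n_samples=10, n_splits=5))

for fold_number, (train, val) in enumerate(folds, start=1):
    print(f"Fold {fold_number}: validate on {val}, train on {len(train)} samples")
```

Every sample is used for validation exactly once, so each candidate set of hyperparameters is scored on data it was not trained on.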

In [12]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.30, random_state = 10)

In [13]:
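# Use training-set statistics only, so that no information from the test set leaks into the scaling
mean_X = X_train.mean()
std_X = X_train.std()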

In [14]:
X_train = (X_train - mean_X)/std_X
X_test = (X_test - mean_X)/std_X

### iii. Hyperparameter Tuning

In [15]:
from skopt.space import Real, Categorical, Integer

In [16]:
from skopt import BayesSearchCV

In [17]:
total_iter = 100
number_CV = 5

In [18]:
model = XGBClassifier()

In [19]:
search_space = {
    'n_estimators': Integer(1, 50),
    'max_depth': Integer(1, 5)
    }

In [20]:
fit_params = {
    'early_stopping_rounds': 10,
    'eval_set':[(X_train, y_train)],
    'verbose': False,
}

In [21]:
opt = BayesSearchCV(
    estimator = model,
    search_spaces = search_space,
    fit_params = fit_params,
    cv = number_CV,
    scoring = 'recall',
    random_state = 42,
    n_iter = total_iter,
    verbose = 0
)
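
To give a feel for the cost of this search, here is a small arithmetic sketch based on the values of `total_iter`, `number_CV` and the search space defined above. It assumes that one candidate is evaluated per iteration and that the best model is refitted once at the end; both are assumptions about the default settings, so treat the totals as estimates.

```python
total_iter = 100  # number of Bayesian search iterations
number_CV = 5     # number of cross-validation folds

# Integer(1, 50) and Integer(1, 5) include both endpoints.
n_estimators_values = 50
max_depth_values = 5
grid_size = n_estimators_values * max_depth_values

# Model fits needed by the Bayesian search (assumes one candidate per iteration).
bayes_cv_fits = total_iter * number_CV
bayes_total_fits = bayes_cv_fits + 1  # plus one final refit (assumed default)

# Model fits an exhaustive grid search over the same space would need.
grid_cv_fits = grid_size * number_CV

print("Combinations in the search space:", grid_size)
print("Fits for the Bayesian search:", bayes_total_fits)
print("Cross-validation fits for an exhaustive grid:", grid_cv_fits)
```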

In [22]:
opt = opt.fit(X_train, y_train)

In [23]:
report = pd.DataFrame(opt.cv_results_)

In [24]:
report = report[['param_n_estimators', 'param_max_depth', 'rank_test_score', 'mean_test_score']]

In [25]:
report = report.sort_values(by = ['rank_test_score'], ascending = True)

In [26]:
report

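| | param_n_estimators | param_max_depth | rank_test_score | mean_test_score |
|---|---|---|---|---|
| 99 | 7 | 5 | 1 | 0.780483 |
| 97 | 7 | 5 | 1 | 0.780483 |
| 96 | 7 | 5 | 1 | 0.780483 |
| 95 | 7 | 5 | 1 | 0.780483 |
| 98 | 7 | 5 | 1 | 0.780483 |
| ... | ... | ... | ... | ... |
| 11 | 1 | 3 | 96 | 0.712072 |
| 45 | 33 | 1 | 97 | 0.706358 |
| 13 | 1 | 1 | 98 | 0.675252 |
| 21 | 22 | 1 | 99 | 0.666640 |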


In [27]:
n_est_opt = report.iloc[[0]]['param_n_estimators'].iloc[0]
max_depth_opt = report.iloc[[0]]['param_max_depth'].iloc[0]

In [28]:
print('N estimators:', n_est_opt)
print('Max Depth:', max_depth_opt)

N estimators: 7
Max Depth: 5


### iv. Final Training

In [29]:
clf = XGBClassifier(n_estimators = n_est_opt, max_depth = max_depth_opt, random_state = 10)

In [30]:
clf = clf.fit(X_train, y_train)

In [31]:
y_train_pred = clf.predict(X_train)
y_test_pred = clf.predict(X_test)

In [32]:
recall_train = recall_score(y_train, y_train_pred)
recall_test = recall_score(y_test, y_test_pred)

In [33]:
print('Recall Train:', round(recall_train*100, 2))
print('Recall Test:', round(recall_test*100, 2))

Recall Train: 82.91
Recall Test: 79.43


### v. Execution Time

In [34]:
end = time.time()

In [35]:
delta = (end - start)

hours = int(delta/3600)
mins = int((delta - hours*3600)/60)
segs = int(delta - hours*3600 - mins*60)
print(f'Running this notebook took {hours} hours, {mins} minutes and {segs} seconds.')

Running this notebook took 0 hours, 14 minutes and 0 seconds.
