# Bayesian Optimization for Hyperparameter Tuning with scikit-optimize

Two popular libraries for Bayesian optimization are `Scikit-Optimize` and `HyperOpt`. In machine learning, these libraries are often used to tune the hyperparameters of algorithms. This notebook uses scikit-optimize to tune a k-nearest neighbors classifier.

## Prepare the data and define the model

In [1]:
# example of bayesian optimization with scikit-optimize
from numpy import mean
from sklearn.datasets import make_blobs
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from skopt.space import Integer
from skopt.utils import use_named_args
from skopt import gp_minimize

# generate 2d classification dataset
X, y = make_blobs(n_samples=500, centers=3, n_features=2)
# define the model
model = KNeighborsClassifier()

## Define the search space

In this case, we will tune the number of neighbors (`n_neighbors`) and the power parameter of the Minkowski distance metric (`p`), where `p=1` gives the Manhattan distance and `p=2` gives the Euclidean distance. Each hyperparameter needs a range defined for a given data type. Here both are `Integer` dimensions, defined with the minimum, the maximum, and the name of the corresponding parameter of the scikit-learn model.

In [2]:
# define the space of hyperparameters to search
search_space = [Integer(1, 5, name='n_neighbors'), Integer(1, 2, name='p')]

## Define a function to evaluate a given set of hyperparameters

The optimizer minimizes this function, so smaller return values must indicate a better-performing model.

We can apply the `use_named_args()` decorator from scikit-optimize to the function definition. It lets the function be called with a list of values from the search space and receive them as named arguments.

As such, our custom function will take the hyperparameter values as keyword arguments, which can be passed to the model directly in order to configure it. We can define these arguments generically in Python using the `**params` argument to the function, then pass them to the model via the `set_params(**params)` method.

Once the model is configured, we can evaluate it. In this case, we will use 5-fold cross-validation on our dataset and evaluate the accuracy for each fold. We then report the performance of the model as one minus the mean accuracy across these folds. This means that a perfect model with an accuracy of 1.0 will return a value of 0.0 (`1.0 - mean accuracy`).

In [3]:
# define the function used to evaluate a given configuration
@use_named_args(search_space)
def evaluate_model(**params):
    # configure the model with the given hyperparameters
    model.set_params(**params)
    # calculate 5-fold cross validation
    result = cross_val_score(model, X, y, cv=5, n_jobs=-1, scoring='accuracy')
    # calculate the mean of the scores
    estimate = mean(result)
    return 1.0 - estimate

## Perform the optimization

This is achieved by calling the `gp_minimize()` function with the objective function and the defined search space.

By default, this function uses the `gp_hedge` acquisition function, which probabilistically chooses among the expected improvement, probability of improvement, and lower confidence bound acquisition functions at each iteration. This can be configured via the `acq_func` argument. The optimization also runs for 100 iterations by default, which can be controlled via the `n_calls` argument.

In [4]:
# perform optimization
result = gp_minimize(evaluate_model, search_space)



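Once run, we can access the best score via the `fun` attribute of the result and the best set of hyperparameters via its `x` attribute, a list ordered like the search space. Since `fun` holds the minimized value (one minus the mean accuracy), we subtract it from 1.0 to recover the accuracy.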

In [5]:
# summarize the findings
print('Best Accuracy: %.3f' % (1.0 - result.fun))
print('Best Parameters: n_neighbors=%d, p=%d' % (result.x[0], result.x[1]))

Best Accuracy: 1.000
Best Parameters: n_neighbors=5, p=1


## References

- [How to Implement Bayesian Optimization from Scratch in Python](https://machinelearningmastery.com/what-is-bayesian-optimization/)