
## Parameter Tuning with Hyperopt

[tutorial link](https://districtdatalabs.silvrback.com/parameter-tuning-with-hyperopt)

There are two common methods of parameter tuning: grid search and random search. Each has its pros and cons. Grid search is slow but thorough, covering the whole search space, while random search is fast but can miss important points in the search space. Luckily, a third option exists: Bayesian optimization. In this post, we will focus on one implementation of Bayesian optimization, a Python module called hyperopt.
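To make the trade-off concrete, here is a minimal sketch of both strategies in plain Python, with no tuning library involved. The toy `objective` and the helper names `grid_search` and `random_search` are hypothetical, chosen only for illustration.

```python
import random

def objective(x):
    # Hypothetical toy loss with its minimum at x = 0.3
    return (x - 0.3) ** 2

def grid_search(fn, low, high, n_points):
    # Evaluate fn on evenly spaced points and keep the best one
    step = (high - low) / (n_points - 1)
    candidates = [low + i * step for i in range(n_points)]
    return min(candidates, key=fn)

def random_search(fn, low, high, n_points, seed=0):
    # Evaluate fn on uniformly random points and keep the best one
    rng = random.Random(seed)
    candidates = [rng.uniform(low, high) for _ in range(n_points)]
    return min(candidates, key=fn)

grid_best = grid_search(objective, 0.0, 1.0, n_points=11)
random_best = random_search(objective, 0.0, 1.0, n_points=11)
print("grid search best x:", grid_best)
print("random search best x:", random_best)
```

In one dimension the two cost the same. The cost of grid search shows up with several parameters, because the number of grid points is the product of the points per parameter.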

Using Bayesian optimization for parameter tuning allows us to obtain the best parameters for a given model, e.g., logistic regression. This also allows us to perform optimal model selection. Typically, a machine learning engineer or data scientist will perform some form of manual parameter tuning (grid search or random search) for a few models, like decision tree, support vector machine, and k nearest neighbors, then compare the accuracy scores and select the best one for use. This approach risks comparing sub-optimal models. Maybe the data scientist found the optimal parameters for the decision tree, but missed the optimal parameters for SVM. This means their model comparison was flawed. K nearest neighbors may beat SVM every time if the SVM parameters are poorly tuned. Bayesian optimization allows the data scientist to find the best parameters for all models, and therefore to compare the best models. This results in better model selection, because you are comparing the best k nearest neighbors to the best decision tree.

### Objective Functions - A Motivating Example

Suppose you have a function defined over some range, and you want to minimize it. That is, you want to find the input value that results in the lowest output value. The trivial example below finds the value of x that minimizes the linear function y(x) = x.

In [0]:
import hyperopt

In [5]:
best = hyperopt.fmin(
    fn=lambda x:x,
    space = hyperopt.hp.uniform('x', 0, 1),
    algo = hyperopt.tpe.suggest,
    max_evals = 100
)

print (best)

100%|██████████| 100/100 [00:00<00:00, 425.62it/s, best loss: 0.0005860873247447989]
{'x': 0.0005860873247447989}


Let's break this down.

The function `fmin` first takes a function to minimize, denoted `fn`, which we specify here with the anonymous function `lambda x: x`. This function could be any valid value-returning function, such as mean absolute error in regression.

The next parameter specifies the search space, and in this example it is the continuous range of numbers between 0 and 1, specified by `hp.uniform('x', 0, 1)`. `hp.uniform` is a built-in hyperopt function that takes three parameters: the name, `'x'`, and the lower and upper bounds of the range, 0 and 1.

The parameter `algo` takes a search algorithm, in this case `tpe`, which stands for tree of Parzen estimators. This topic is beyond the scope of this blog post, but the mathochistic reader may peruse this for details. The `algo` parameter can also be set to `hyperopt.rand.suggest` for random search, but we do not cover that here as it is a widely known search strategy. We may cover it in a future post.

Finally, we specify the maximum number of evaluations, `max_evals`, that the `fmin` function will perform. The `fmin` function returns a Python dictionary mapping each parameter name to the best value found.

An example of the output for the function above is {'x': 0.000269455723739237}.

Here is the plot of the function. The red dot is the point we are trying to find.

![](https://silvrback.s3.amazonaws.com/uploads/403df379-e565-4f7f-81a4-911c8f77ae28/ex1_large.png)

#### More Complicated Examples
Here is a more complicated objective function: `lambda x: (x-1)**2`. This time we are trying to minimize the quadratic function `y(x) = (x-1)**2`. So we alter the search space to include what we know to be the optimal value (`x=1`) plus some sub-optimal ranges on either side: `hp.uniform('x', -5, 5)`.

In [10]:
best = hyperopt.fmin(
    fn = lambda x: (x-1)**2,
    space= hyperopt.hp.uniform('x', -5, 5),
    algo = hyperopt.tpe.suggest,
    max_evals = 1000
)

print (best)

100%|██████████| 1000/1000 [00:04<00:00, 214.87it/s, best loss: 2.448635558780595e-07]
{'x': 1.0004948368982585}


Here is the plot.

![](https://silvrback.s3.amazonaws.com/uploads/7499c5c5-5bd6-4bdb-8030-f82258f686ef/ex2_large.png)

Instead of minimizing an objective function, maybe we want to maximize it. To do this, we need only return the negative of the function. For example, we could have the function `y(x) = -(x**2)`:

![](https://silvrback.s3.amazonaws.com/uploads/04d36416-60fe-46a3-9cf4-7645b7059989/ex3_large.png)

How could we go about solving this? We just take the objective function `lambda x: -(x**2)` and return the negative, giving `lambda x: -1*-(x**2)` or just `lambda x: (x**2)`.

Here is a function with many local optima (infinitely many given an infinite range), which we are also trying to maximize:

![](https://silvrback.s3.amazonaws.com/uploads/661ff1fe-740b-4112-86dc-7f583184d143/ex5_large.png)
