<img src="../../imgs/CampQMIND_banner.png">

# Hyperparameter Tuning

Author: [Umur Gokalp](https://github.com/uGokalp)

Some model parameters are not learned from the data; they are set before training. We call these parameters "hyperparameters". Examples include the regularization terms (l1, l2), `C` and the kernel in support-vector machines, and most parameters of tree-based models.

Though sklearn's default parameters are often good enough, tuning them can be expected to increase accuracy by roughly 1-5% in most cases.

<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Hyperparameter-Tuning" data-toc-modified-id="Hyperparameter-Tuning-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Hyperparameter Tuning</a></span><ul class="toc-item"><li><span><a href="#The-General-Idea" data-toc-modified-id="The-General-Idea-1.1"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>The General Idea</a></span></li><li><span><a href="#A-Primer-of-Cross-Validation" data-toc-modified-id="A-Primer-of-Cross-Validation-1.2"><span class="toc-item-num">1.2&nbsp;&nbsp;</span>A Primer of Cross Validation</a></span></li></ul></li><li><span><a href="#Sklearn-GridSearchCV" data-toc-modified-id="Sklearn-GridSearchCV-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Sklearn GridSearchCV</a></span><ul class="toc-item"><li><span><a href="#Parameter-Grid-for-GridSearchCV" data-toc-modified-id="Parameter-Grid-for-GridSearchCV-2.1"><span class="toc-item-num">2.1&nbsp;&nbsp;</span>Parameter Grid for GridSearchCV</a></span></li><li><span><a href="#Making-a-scorer" data-toc-modified-id="Making-a-scorer-2.2"><span class="toc-item-num">2.2&nbsp;&nbsp;</span>Making a scorer</a></span></li></ul></li><li><span><a href="#GridSearchCV-in-Action" data-toc-modified-id="GridSearchCV-in-Action-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>GridSearchCV in Action</a></span></li><li><span><a href="#Sklearn-RandomizedSearchCV" data-toc-modified-id="Sklearn-RandomizedSearchCV-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Sklearn RandomizedSearchCV</a></span></li><li><span><a href="#RandomizedSearchCV-in-action" data-toc-modified-id="RandomizedSearchCV-in-action-5"><span class="toc-item-num">5&nbsp;&nbsp;</span>RandomizedSearchCV in action</a></span></li><li><span><a href="#Resources" data-toc-modified-id="Resources-6"><span class="toc-item-num">6&nbsp;&nbsp;</span>Resources</a></span></li></ul></div>

## The General Idea

You have:
- A model
- A loss function to optimize
- A set of parameters to optimize

You need:
- A parameter space

You can:
- Run an exhaustive search on all possible combinations of parameter values and test against the loss function
- Run a random search on some combinations of parameter values and test against the loss function


It's common to cross-validate each candidate model produced by the search. A set of parameters chosen strictly on one part of the dataset can cause the model to overfit to that part.
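The two search strategies listed above can be sketched with scikit-learn's `ParameterGrid` and `ParameterSampler` helpers. The parameter space below is hypothetical and chosen only for illustration; the snippet just enumerates candidates and does not fit any model.

```python
from sklearn.model_selection import ParameterGrid, ParameterSampler

# Hypothetical parameter space, chosen only for illustration.
space = {
    "C": [0.1, 1, 10, 100],
    "kernel": ["rbf", "poly", "sigmoid"],
}

# Exhaustive search: every combination (4 * 3 = 12 candidates).
grid_candidates = list(ParameterGrid(space))

# Random search: a fixed budget of combinations drawn from the same space.
random_candidates = list(ParameterSampler(space, n_iter=5, random_state=0))

print(len(grid_candidates), len(random_candidates))  # 12 5
```

The grid grows multiplicatively with every added parameter, while the random search budget stays at whatever `n_iter` you choose.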

## A Primer of Cross Validation

![](https://garthtarr.github.io/avfs/lectures/imgs/k_fold_cv.jpg)
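As a rough sketch of what the figure shows, k-fold cross-validation splits the data into k parts, trains on k-1 of them and validates on the remaining one, rotating until every part has served as the validation set once. The example assumes a synthetic dataset from `make_classification` and a default `SVC`, neither of which comes from this notebook.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold, cross_val_score
from sklearn.svm import SVC

# Synthetic data, used only so the example is self-contained.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# 5 folds: each sample is used for validation exactly once.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(SVC(), X, y, cv=cv)

print(scores)         # one accuracy score per fold
print(scores.mean())  # the averaged score used to compare candidates
```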

# Sklearn GridSearchCV

```python
from sklearn.model_selection import GridSearchCV

GridSearchCV(estimator, param_grid, scoring=None, n_jobs=None, cv=None)
```
__estimator:__ A sklearn estimator/model that implements `fit`. It must either provide a `score` method or be used together with the `scoring` argument

__param_grid:__ A dictionary mapping parameter names to lists of values to try (a list of such dictionaries is also accepted)

__scoring:__ Either a string such as 'f1', 'roc_auc' or a scorer function

__cv:__ Either an ```int``` specifying the number of folds for cross validation or a cross-validation splitter such as `StratifiedKFold`

__n_jobs:__ The number of jobs to run in parallel, i.e. the number of CPU cores you are willing to use. -1 is for all available CPUs

More on scoring strings: https://scikit-learn.org/stable/modules/model_evaluation.html#scoring-parameter



## Parameter Grid for GridSearchCV

For an exhaustive search, explicitly define which values the model should evaluate. The keys must share the same name as the arguments to the model.

```python
{
    'C': [0.1, 1, 10, 100],
    'gamma': [1, 0.1, 0.01, 0.001],
    'kernel': ['rbf', 'poly', 'sigmoid']
}
```
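`param_grid` can also be a list of dictionaries, which is useful when a parameter only matters for some settings. The sketch below is a hypothetical grid that pairs `degree` only with the polynomial kernel (other `SVC` kernels ignore it) and uses `ParameterGrid` to count the resulting candidates.

```python
from sklearn.model_selection import ParameterGrid

# Hypothetical grid: 'degree' is only combined with the polynomial kernel.
shared = {"C": [0.1, 1, 10, 100], "gamma": [1, 0.1, 0.01, 0.001]}
conditional_grid = [
    {"kernel": ["rbf", "sigmoid"], **shared},
    {"kernel": ["poly"], "degree": [2, 3], **shared},
]

# 2*4*4 candidates from the first dict plus 1*2*4*4 from the second.
n_candidates = len(ParameterGrid(conditional_grid))
print(n_candidates)  # 64
```

Passing `conditional_grid` as `param_grid` to `GridSearchCV` would evaluate exactly these candidates, avoiding fits that differ only in a parameter the kernel never reads.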

## Making a scorer

```python
from sklearn.metrics import make_scorer

make_scorer(score_func, greater_is_better=True, needs_proba=False)
```

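The arguments:
- score_func: a function with the signature `score_func(y_true, y_pred)` that returns a single number
- greater_is_better: boolean that distinguishes between maximizing (a score) and minimizing (a loss). When it is False, the scorer negates the result so that higher is still better
- needs_proba: boolean for metrics that need predicted probabilities instead of class labels, e.g. logarithmic loss. Recent scikit-learn releases replace this flag with a `response_method` argument, so check the documentation for your installed version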
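A minimal sketch of building and calling scorers, assuming synthetic data and a default `SVC`. The choice of `fbeta_score` and `mean_absolute_error` is only illustrative; any metric with a `(y_true, y_pred)` signature should work the same way.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import fbeta_score, make_scorer, mean_absolute_error
from sklearn.svm import SVC

# Extra keyword arguments (here beta=2) are forwarded to the metric.
f2_scorer = make_scorer(fbeta_score, beta=2)

# For an error metric, greater_is_better=False makes the scorer return
# the negated value, so that "higher is better" still holds.
mae_scorer = make_scorer(mean_absolute_error, greater_is_better=False)

# Synthetic data, used only so the example is self-contained.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)
model = SVC().fit(X, y)

# Scorers are called as scorer(estimator, X, y).
f2 = f2_scorer(model, X, y)
neg_mae = mae_scorer(model, X, y)
print(f2, neg_mae)
```

Either scorer can then be passed as `scoring=` to `GridSearchCV` or `RandomizedSearchCV`.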


# GridSearchCV in Action

```python
from seaborn import load_dataset
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import SVC
import pandas as pd
import warnings
warnings.filterwarnings("ignore")
```

```python
params = {
    'C': [0.1, 100],
    'gamma': [1, 0.1, 0.001],
    'kernel': ['rbf', 'poly', 'linear'],
    'degree': [3, 5]
}
```

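```python
# Drop columns that duplicate information held in other columns.
data = load_dataset('titanic').drop(["alive", "adult_male", "who", "class", 'embarked'], axis=1)
# One-hot encode the categorical columns.
hot_cols = ['pclass', 'sex', 'sibsp', 'parch', 'deck', 'embark_town', 'alone']
df = pd.get_dummies(data, columns=hot_cols)
# Fill the remaining missing values with each column's median.
df.fillna(df.median(), inplace=True)
df.head()
```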

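|   | survived | age | fare | pclass_1 | pclass_2 | pclass_3 | sex_female | sex_male | sibsp_0 | sibsp_1 | ... | deck_C | deck_D | deck_E | deck_F | deck_G | embark_town_Cherbourg | embark_town_Queenstown | embark_town_Southampton | alone_False | alone_True |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 22.0 | 7.25 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 |
| 1 | 1 | 38.0 | 71.2833 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | ... | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 |
| 2 | 1 | 26.0 | 7.925 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 |
| 3 | 1 | 35.0 | 53.1 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | ... | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 |
| 4 | 0 | 35.0 | 8.05 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 |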


```python
# Note: you should tune hyperparameters against validation data and evaluate the result of the grid search on a holdout set.
# In total you would need to divide your dataset into at least 3 parts (train, validation, holdout).
grid = GridSearchCV(estimator=SVC(max_iter=100000), param_grid=params, cv=StratifiedKFold(3), verbose=2, n_jobs=-1)
# Fitting on the whole dataset here is for illustrative purposes only; this is not standard practice.
grid.fit(df.drop("survived", axis=1), df.survived)
print(f"Best params are {grid.best_params_}")
print(f"Best score is {grid.best_score_}")
```

```
Fitting 3 folds for each of 36 candidates, totalling 108 fits
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  25 tasks      | elapsed:    3.6s
Best params are {'C': 100, 'degree': 3, 'gamma': 0.001, 'kernel': 'rbf'}
Best score is 0.7912457912457912
[Parallel(n_jobs=-1)]: Done 108 out of 108 | elapsed:    8.5s finished
```
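The comments in the cell above point out that the search should not be scored on the same data it was fitted on. A minimal sketch of the more standard workflow follows, assuming a synthetic dataset and a small hypothetical grid rather than the Titanic data: hold out a test set, let `GridSearchCV` cross-validate on the training portion only, then score the refitted best model on the holdout set.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.svm import SVC

# Synthetic data, used only so the example is self-contained.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Keep 25% of the rows aside; the search never sees them.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Hypothetical small grid, chosen only to keep the run short.
small_grid = {"C": [0.1, 1, 10], "gamma": [0.01, 0.1]}
grid = GridSearchCV(SVC(kernel="rbf"), small_grid, cv=StratifiedKFold(3))
grid.fit(X_train, y_train)  # cross-validation provides the validation folds

# With the default refit=True, score() uses the best model refitted on X_train.
holdout_accuracy = grid.score(X_test, y_test)
print(grid.best_params_)
print(holdout_accuracy)
```

Here the cross-validation folds play the role of the validation set, so two explicit splits (train and holdout) are enough.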


# Sklearn RandomizedSearchCV

```python
from sklearn.model_selection import RandomizedSearchCV
import scipy.stats as stats
```

RandomizedSearchCV differs from GridSearchCV in two ways: parameters can be given as distributions, so a much larger parameter space can be explored, and the `n_iter` parameter decides how many parameter combinations are sampled.

Each entry in the parameter dictionary must be either a list of values, which is sampled uniformly, or an object that provides an `rvs` method, where the `rvs` method draws a random sample from the distribution. Objects with `rvs` methods are implemented in the stats module of scipy.

For more information, see [scipy.stats](https://docs.scipy.org/doc/scipy/reference/tutorial/stats.html).


```python
# An illustration of scipy stats rvs
from scipy import stats
# uniform(loc, scale) samples from the interval [loc, loc + scale], here [0, 100]
[stats.uniform(0, scale=100).rvs() for i in range(10)]
# Instead of testing for C = [0.1, 1, 10, 100] explicitly we sample with rvs
```

```
[1.4991422914433672,
 3.728731101947025,
 97.38051860253182,
 43.420810866103146,
 37.61719889196103,
 75.80703272587459,
 84.34600913314632,
 66.63380694655658,
 52.106879406071094,
 60.422935551230005]
```
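Parameters such as `C` and `gamma` are often explored across several orders of magnitude, as in the grid `[0.1, 1, 10, 100]`. One option, not used in this notebook and shown here only as a sketch, is `scipy.stats.loguniform`, which spreads samples evenly across decades instead of evenly across the raw interval.

```python
from scipy import stats

# loguniform(a, b) samples between a and b, uniformly on a log scale.
log_C = stats.loguniform(1e-1, 1e2)
samples = log_C.rvs(size=1000, random_state=0)

# All samples stay inside [0.1, 100].
print(samples.min(), samples.max())

# About one third of the samples should land in each decade,
# e.g. in [1, 10), whereas uniform(0, 100) would put about 9% there.
middle_decade_share = ((samples >= 1) & (samples < 10)).mean()
print(middle_decade_share)
```

An object like `log_C` can be used as a value in the parameter dictionary of `RandomizedSearchCV`, since it provides an `rvs` method.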

# RandomizedSearchCV in action

```python
from sklearn.model_selection import RandomizedSearchCV
```

```python
random_params = {
    'C': stats.uniform(0, 100),      # uniform(loc, scale): samples from [0, 100]
    'gamma': stats.uniform(0.1, 1),  # samples from [0.1, 1.1]
    'kernel': ['rbf', 'poly', 'linear'],
    'degree': [3, 5]
}
```

```python
# n_iter and cv are left at their defaults (10 sampled candidates, 5 folds).
random_search = RandomizedSearchCV(SVC(max_iter=100000), random_params, random_state=15, verbose=2, n_jobs=-1)
# Fitting on the whole dataset here is for illustrative purposes only; this is not standard practice.
search = random_search.fit(df.drop("survived", axis=1), df.survived)
print(f"Best params are {search.best_params_}")
print(f"Best score is {search.best_score_}")
```

```
Fitting 5 folds for each of 10 candidates, totalling 50 fits
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  25 tasks      | elapsed:    3.3s
Best params are {'C': 70.59166434906898, 'degree': 3, 'gamma': 0.1394223117574981, 'kernel': 'rbf'}
Best score is 0.7127298976837613
[Parallel(n_jobs=-1)]: Done  50 out of  50 | elapsed:    4.2s finished
```
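Beyond `best_params_` and `best_score_`, every sampled candidate is recorded in the `cv_results_` attribute. The sketch below assumes synthetic data and a smaller search than the one above, with `n_iter` and `cv` set explicitly, and loads the results into a DataFrame to compare the top candidates.

```python
import pandas as pd
from scipy import stats
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

# Synthetic data, used only so the example is self-contained.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

search = RandomizedSearchCV(
    SVC(kernel="rbf"),
    {"C": stats.uniform(0, 100), "gamma": stats.uniform(0.1, 1)},
    n_iter=8,        # number of sampled parameter combinations
    cv=3,            # folds per combination
    random_state=0,
)
search.fit(X, y)

# One row per sampled candidate, with its mean and spread across folds.
results = pd.DataFrame(search.cv_results_)
columns = ["param_C", "param_gamma", "mean_test_score", "std_test_score", "rank_test_score"]
top = results[columns].sort_values("rank_test_score").head(3)
print(top)
```

Looking at `std_test_score` next to `mean_test_score` helps judge whether the gap between the best candidates is larger than the fold-to-fold noise.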


# Resources

- A simple Medium post on hyperparameter tuning: https://towardsdatascience.com/hyperparameter-tuning-c5619e7e6624

- A deep dive into hyperparameter tuning: https://www.kaggle.com/willkoehrsen/intro-to-model-tuning-grid-and-random-search

- A more interesting approach to hyperparameter tuning: https://www.kaggle.com/willkoehrsen/automated-model-tuning