# Hyperparameter Tuning
Hyperparameters are not learned during training, so they have to be tuned with search-based algorithms. The basic idea is to create combinations of hyperparameter values and compare the resulting model performance. Some popular approaches are:

1. Grid search (an exhaustive search in the parameter space)
2. Random search (a random search in the parameter space)
3. Bayesian Optimization
4. Evolutionary Algorithms 
5. Gradient-based optimization
6. Keras' Tuner
7. Population-based optimization
8. ParamILS 

[*Source here*](https://analyticsindiamag.com/top-8-approaches-for-tuning-hyperparameters-of-machine-learning-models/)
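
The basic idea described above (build combinations, score each one, keep the best) can be sketched by hand before reaching for a library helper. The snippet below is only an illustration: the tiny `search_space`, the 3-fold setting, and the variable names are assumptions made for this sketch, not the settings used later in this notebook.

```python
from itertools import product

from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

X, y = load_diabetes(return_X_y=True)

# Hypothetical, deliberately tiny search space (not the one used later).
search_space = {
    "n_estimators": [10, 50],
    "max_features": [0.3, 0.6],
}

results = []
names = list(search_space)
for values in product(*search_space.values()):
    params = dict(zip(names, values))
    model = RandomForestRegressor(random_state=0, **params)
    # The mean R^2 over 3 cross-validation folds is the score of this combination.
    score = cross_val_score(model, X, y, cv=3, scoring="r2").mean()
    results.append((score, params))

# Keep the combination with the highest mean score.
best_score, best_params = max(results, key=lambda item: item[0])
print(best_params, round(best_score, 3))
```

The methods in the list differ mainly in how they choose which combinations to try; the scoring step stays the same.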

## #1 Grid search & #2 Random Search

[*Some theory*](https://medium.com/@cjl2fv/an-intro-to-hyper-parameter-optimization-using-grid-search-and-random-search-d73b9834ca0a)

In [3]:
from sklearn.datasets import load_diabetes
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV, train_test_split
from sklearn.ensemble import RandomForestRegressor
import numpy as np

In [5]:
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.33, random_state=42)

**Our estimator** (the parameter space is defined separately for each search):

In [16]:
estimator = RandomForestRegressor(random_state=0)  # safe to reuse across searches: GridSearchCV and RandomizedSearchCV fit clones, so this object stays unfitted


### Grid search
Next we create a GridSearchCV object, which performs a grid search with cross-validation.
We tune two hyperparameters of our regressor: n_estimators and max_features. We create a grid with discrete values along each dimension (hyperparameter).
We use the R<sup>2</sup> score, which represents the proportion of variance of the output that is explained by the model. More about R<sup>2</sup> [here](https://scikit-learn.org/stable/modules/model_evaluation.html#r2-score).
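
For reference, R<sup>2</sup> can be computed by hand as one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below uses made-up numbers (they are not taken from the diabetes dataset) and checks the manual value against scikit-learn's `r2_score`.

```python
import numpy as np
from sklearn.metrics import r2_score

# Made-up targets and predictions, for illustration only.
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
r2_manual = 1.0 - ss_res / ss_tot

print(r2_manual, r2_score(y_true, y_pred))
```

A perfect model scores 1, a model that always predicts the mean of the targets scores 0, and worse models can score below 0.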

In [14]:
param_grid = {
    'n_estimators': np.arange(5, 100, 5),
    'max_features': np.arange(0.1, 1.0, 0.05)
}

grid_search = GridSearchCV(
    estimator = estimator,
    param_grid = param_grid,
    cv = 5,
    scoring = 'r2',
    verbose = 1,
    n_jobs = -1 # use all processors
)
grid_search.fit(X_train, y_train)

Fitting 5 folds for each of 342 candidates, totalling 1710 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  56 tasks      | elapsed:    0.4s
[Parallel(n_jobs=-1)]: Done 656 tasks      | elapsed:    5.4s
[Parallel(n_jobs=-1)]: Done 1656 tasks      | elapsed:   14.3s
[Parallel(n_jobs=-1)]: Done 1710 out of 1710 | elapsed:   14.9s finished


GridSearchCV(cv=5, estimator=RandomForestRegressor(random_state=0), n_jobs=-1,
             param_grid={'max_features': array([0.1 , 0.15, 0.2 , 0.25, 0.3 , 0.35, 0.4 , 0.45, 0.5 , 0.55, 0.6 ,
       0.65, 0.7 , 0.75, 0.8 , 0.85, 0.9 , 0.95]),
                         'n_estimators': array([ 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85,
       90, 95])},
             scoring='r2', verbose=1)
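
The "342 candidates" and "1710 fits" in the log follow directly from the grid: 19 values of n_estimators times 18 values of max_features, each evaluated on 5 folds. As a small sanity check, the grid can be enumerated with `ParameterGrid` without training anything; the variable names `n_candidates` and `n_fits` are just for this sketch.

```python
import numpy as np
from sklearn.model_selection import ParameterGrid

param_grid = {
    "n_estimators": np.arange(5, 100, 5),
    "max_features": np.arange(0.1, 1.0, 0.05),
}

n_candidates = len(ParameterGrid(param_grid))  # every combination of the two axes
n_fits = n_candidates * 5                      # cv = 5 folds per candidate

print(len(param_grid["n_estimators"]), len(param_grid["max_features"]))
print(n_candidates, n_fits)
```

The count grows multiplicatively with every extra hyperparameter, which is the main cost of an exhaustive grid search.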

In [11]:
# grid_search.cv_results_
print(grid_search.best_params_)
print(grid_search.best_score_)

{'max_features': 0.5000000000000001, 'n_estimators': 90}
0.4153440140065655
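
The best score above is a cross-validated R<sup>2</sup> on the training split, so it is worth checking the selected model on the held-out test split as well. The sketch below assumes a reduced grid (`small_grid`, chosen only to keep it fast) and is not the exact search from this section; with the default `refit=True`, `best_estimator_` is retrained on the whole training split and can be scored directly.

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42
)

# Reduced grid (an assumption, to keep the sketch fast).
small_grid = {"n_estimators": [30, 90], "max_features": [0.3, 0.5]}

search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid=small_grid,
    cv=5,
    scoring="r2",
)
search.fit(X_train, y_train)

# best_estimator_ was refit on all of X_train; score() returns R^2 for regressors.
test_r2 = search.best_estimator_.score(X_test, y_test)
print("CV R^2:  ", search.best_score_)
print("test R^2:", test_r2)
```

A large gap between the two numbers would suggest that the search has overfit to the cross-validation folds.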


### Random Search
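Here we define the parameter distributions as arrays of discrete values, so every value is equally likely to be sampled (a uniform distribution). Because no parameter is given as a continuous distribution, scikit-learn samples the combinations without replacement.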

In [20]:
param_distributions = {
    'n_estimators': np.arange(5, 100, 5),
    'max_features': np.arange(0.1, 1.0, 0.05)
}
rand_search = RandomizedSearchCV(
    estimator = estimator,
    param_distributions = param_distributions,
    cv = 5,
    scoring = 'r2',
    verbose = 1,
    n_jobs = -1,
    n_iter=50, # evaluate only 50 of the 342 possible combinations
    random_state=0
)
rand_search.fit(X_train, y_train)

Fitting 5 folds for each of 50 candidates, totalling 250 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  56 tasks      | elapsed:    0.7s
[Parallel(n_jobs=-1)]: Done 250 out of 250 | elapsed:    2.5s finished


RandomizedSearchCV(cv=5, estimator=RandomForestRegressor(random_state=0),
                   n_iter=50, n_jobs=-1,
                   param_distributions={'max_features': array([0.1 , 0.15, 0.2 , 0.25, 0.3 , 0.35, 0.4 , 0.45, 0.5 , 0.55, 0.6 ,
       0.65, 0.7 , 0.75, 0.8 , 0.85, 0.9 , 0.95]),
                                        'n_estimators': array([ 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85,
       90, 95])},
                   random_state=0, scoring='r2', verbose=1)
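
The search above samples from the same discrete arrays as the grid search. Random search can also draw from continuous or integer distributions, which lets it try values that a fixed grid would never contain. The sketch below is one possible variant, not the configuration used in this notebook: the `scipy.stats` distributions, their ranges, and `n_iter=10` are assumptions chosen for illustration.

```python
from scipy.stats import randint, uniform
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV, train_test_split

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42
)

param_distributions = {
    "n_estimators": randint(5, 100),    # integers in [5, 99]
    "max_features": uniform(0.1, 0.9),  # floats in [0.1, 1.0] (loc=0.1, scale=0.9)
}

search = RandomizedSearchCV(
    RandomForestRegressor(random_state=0),
    param_distributions=param_distributions,
    n_iter=10,  # assumption: fewer draws than the 50 above, to keep it quick
    cv=5,
    scoring="r2",
    random_state=0,
)
search.fit(X_train, y_train)

print(search.best_params_)
print(search.best_score_)
```

As with GridSearchCV, the result is available through `best_params_` and `best_score_`, so the two searches can be compared on the same cross-validated R<sup>2</sup>.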