# Hyperparameter Tuning


## Grid Search CV

Grid search fits a model on every combination of hyperparameters in a predefined grid and validates each result using cross-validation. An exhaustive search is not always feasible under time constraints, but you can control its cost by adjusting the granularity of the parameter space. For example, we can try consecutive powers of 10, or a smaller step for finer granularity.
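As a rough illustration of how granularity affects the size of the search, the sketch below builds a coarse grid of consecutive powers of 10 and a finer half-decade grid over the same range, then counts the combinations with `ParameterGrid`. The hyperparameter names `alpha` and `fit_intercept` are placeholders chosen for this example, not values taken from the notebook.

```python
import numpy as np
from sklearn.model_selection import ParameterGrid

# Coarse grid: consecutive powers of 10 from 1e-3 to 1e3 (7 values).
coarse_alphas = np.logspace(-3, 3, num=7).tolist()

# Finer grid over the same range: half-decade steps (13 values).
fine_alphas = np.logspace(-3, 3, num=13).tolist()

# `alpha` and `fit_intercept` are placeholder hyperparameter names.
coarse_grid = ParameterGrid({"alpha": coarse_alphas, "fit_intercept": [True, False]})
fine_grid = ParameterGrid({"alpha": fine_alphas, "fit_intercept": [True, False]})

# Each combination is fit once per cross-validation fold,
# so the cost grows with the product of the list lengths.
print(len(coarse_grid))  # 7 * 2 = 14 combinations
print(len(fine_grid))    # 13 * 2 = 26 combinations
```

Doubling the resolution of one parameter roughly doubles the number of fits, which is why the step size is usually the first thing to adjust when a search runs too long.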

Grid search does not guarantee that the estimator itself is optimal. It only finds the best combination within the defined parameter grid. Also note that if the `refit` parameter is set to `True` (the default), the best estimator is refit on the entire training set once the best parameters are found. This is generally a good idea because during the search each candidate is only fit on the training folds of the cross-validation, which hold out part of the data. Training on more data is likely to improve the overall performance of the model.

In [13]:
from sklearn.model_selection import GridSearchCV
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor


dataset = datasets.load_diabetes()
X, y = dataset.data, dataset.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

param_grid = [
    {'n_estimators': [3, 10, 30], 'max_features': [2, 4, 6, 8]},
    {'bootstrap': [False], 'n_estimators': [3, 10], 'max_features': [2, 3, 4]},
]

forest_reg = RandomForestRegressor()
grid_search = GridSearchCV(forest_reg, param_grid, cv=5, scoring='neg_mean_squared_error', return_train_score=True)
grid_search.fit(X_train, y_train)



GridSearchCV(cv=5, error_score='raise-deprecating',
       estimator=RandomForestRegressor(bootstrap=True, criterion='mse', max_depth=None,
           max_features='auto', max_leaf_nodes=None,
           min_impurity_decrease=0.0, min_impurity_split=None,
           min_samples_leaf=1, min_samples_split=2,
           min_weight_fraction_leaf=0.0, n_estimators='warn', n_jobs=None,
           oob_score=False, random_state=None, verbose=0, warm_start=False),
       fit_params=None, iid='warn', n_jobs=None,
       param_grid=[{'n_estimators': [3, 10, 30], 'max_features': [2, 4, 6, 8]}, {'bootstrap': [False], 'n_estimators': [3, 10], 'max_features': [2, 3, 4]}],
       pre_dispatch='2*n_jobs', refit=True, return_train_score=True,
       scoring='neg_mean_squared_error', verbose=0)

### Grid Search Results

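* `best_params_` - the combination of parameters from the parameter grid that achieved the best cross-validated score
* `best_estimator_` - the estimator built with those parameters, shown with all remaining parameters at their defaults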

In [14]:
print(grid_search.best_params_, end='\n\n')
print(grid_search.best_estimator_)

{'max_features': 4, 'n_estimators': 30}

RandomForestRegressor(bootstrap=True, criterion='mse', max_depth=None,
           max_features=4, max_leaf_nodes=None, min_impurity_decrease=0.0,
           min_impurity_split=None, min_samples_leaf=1,
           min_samples_split=2, min_weight_fraction_leaf=0.0,
           n_estimators=30, n_jobs=None, oob_score=False,
           random_state=None, verbose=0, warm_start=False)
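Because `refit=True`, `best_estimator_` has already been retrained on the full training set and can be used directly for prediction. The following is a minimal sketch of scoring it on the held-out test split. The smaller grid, `cv=3`, and the fixed `random_state=0` are assumptions made here so the example runs quickly and reproducibly; they are not the notebook's settings.

```python
import numpy as np
from sklearn import datasets
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = datasets.load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42
)

# A deliberately small grid so the sketch runs quickly.
param_grid = {"n_estimators": [10, 30], "max_features": [2, 4]}
search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    cv=3,
    scoring="neg_mean_squared_error",
    refit=True,  # the default: retrain the best candidate on all of X_train
)
search.fit(X_train, y_train)

# The refit model is available as best_estimator_.
final_model = search.best_estimator_
test_rmse = np.sqrt(mean_squared_error(y_test, final_model.predict(X_test)))
print(search.best_params_, test_rmse)
```

Calling `search.predict(X_test)` delegates to the same refit estimator, so either form can be used.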


Grid search also records many measurements accumulated during the search in `cv_results_`. We can use this as a data source for understanding how the scores changed across the parameter combinations that were tried.

In [17]:
import pandas as pd


pd.DataFrame(grid_search.cv_results_).head(3)

| | mean_fit_time | std_fit_time | mean_score_time | std_score_time | param_max_features | param_n_estimators | param_bootstrap | params | split0_test_score | split1_test_score | ... | mean_test_score | std_test_score | rank_test_score | split0_train_score | split1_train_score | split2_train_score | split3_train_score | split4_train_score | mean_train_score | std_train_score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.002772 | 0.000782 | 0.000644 | 0.00018 | 2 | 3 | | {'max_features': 2, 'n_estimators': 3} | -4208.587037 | -5282.414313 | ... | -4703.358859 | 400.680922 | 16 | -1177.05838 | -1219.181435 | -1422.753399 | -1239.068917 | -1173.38631 | -1246.289688 | 91.688947 |
| 1 | 0.009617 | 0.002998 | 0.00108 | 0.000192 | 2 | 10 | | {'max_features': 2, 'n_estimators': 10} | -3322.445833 | -4795.341864 | ... | -3880.621622 | 554.87123 | 10 | -659.387203 | -585.610886 | -825.062025 | -688.465612 | -691.871013 | -690.079348 | 77.565204 |
| 2 | 0.017478 | 0.000388 | 0.001491 | 7.2e-05 | 2 | 30 | | {'max_features': 2, 'n_estimators': 30} | -3310.73263 | -3959.101864 | ... | -3464.835533 | 264.065922 | 2 | -509.478884 | -484.545312 | -575.314759 | -570.353821 | -537.190717 | -535.376699 | 34.860479 |
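The full `cv_results_` frame is wide, so in practice it can help to keep only a few columns and sort by rank. The sketch below shows one way to do that. The reduced grid, `cv=3`, and `random_state=0` are assumptions made to keep the example fast and reproducible, and the `columns` list is just one reasonable selection.

```python
import pandas as pd
from sklearn import datasets
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

X, y = datasets.load_diabetes(return_X_y=True)

search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    {"n_estimators": [3, 10], "max_features": [2, 4]},
    cv=3,
    scoring="neg_mean_squared_error",
    return_train_score=True,  # needed for the *_train_score columns
)
search.fit(X, y)

results = pd.DataFrame(search.cv_results_)

# Keep the columns that summarize each candidate and put the best one first.
columns = [
    "params",
    "mean_train_score",
    "mean_test_score",
    "std_test_score",
    "rank_test_score",
]
ranked = results[columns].sort_values("rank_test_score")
print(ranked)
```

Comparing `mean_train_score` with `mean_test_score` in this view is a quick way to spot candidates that fit the training folds much better than the validation folds.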


For example, we can look at the mean test score for each parameter combination. Because the scoring is the negative mean squared error, taking the square root of the negated score gives the RMSE.

In [10]:
import numpy as np


cvres = grid_search.cv_results_
for mean_score, params in zip(cvres["mean_test_score"], cvres["params"]):
    print(np.sqrt(-mean_score), params)

69.86038737169817 {'max_features': 2, 'n_estimators': 3}
61.407704585275226 {'max_features': 2, 'n_estimators': 10}
59.71343089145715 {'max_features': 2, 'n_estimators': 30}
61.85193438193171 {'max_features': 4, 'n_estimators': 3}
62.45619248505091 {'max_features': 4, 'n_estimators': 10}
59.21863299666748 {'max_features': 4, 'n_estimators': 30}
65.76869902751439 {'max_features': 6, 'n_estimators': 3}
62.141164491215676 {'max_features': 6, 'n_estimators': 10}
59.6739232904195 {'max_features': 6, 'n_estimators': 30}
69.38288142066625 {'max_features': 8, 'n_estimators': 3}
60.997227728320084 {'max_features': 8, 'n_estimators': 10}
59.27140624574677 {'max_features': 8, 'n_estimators': 30}
70.19185955042897 {'bootstrap': False, 'max_features': 2, 'n_estimators': 3}
61.489026157826224 {'bootstrap': False, 'max_features': 2, 'n_estimators': 10}
66.77991226170997 {'bootstrap': False, 'max_features': 3, 'n_estimators': 3}
61.848008718409304 {'bootstrap': False, 'max_features': 3, 'n_estimators'

### Grid Search Extensions

In addition to fine-tuning library-defined hyperparameters, we can also use grid search to exhaustively search over the parameters of custom-defined transformers. These transformers can do anything, so the search space is limited only by our imagination. Options include:

* Optimized and exhaustive feature selection.
* Outlier handling.
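One way to search over such steps is to place them in a `Pipeline` and address their parameters with the `step__parameter` naming convention. The sketch below is illustrative only: `QuantileClipper` is a hypothetical transformer written for this example (it clips each feature to quantiles learned during `fit`), and `SelectKBest` stands in for feature selection rather than a truly exhaustive subset search.

```python
import numpy as np
from sklearn import datasets
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline


class QuantileClipper(BaseEstimator, TransformerMixin):
    """Hypothetical transformer: clip each feature to quantiles learned in fit."""

    def __init__(self, lower=0.0, upper=1.0):
        self.lower = lower
        self.upper = upper

    def fit(self, X, y=None):
        X = np.asarray(X)
        # Per-feature clipping bounds, learned from the training folds only.
        self.low_ = np.quantile(X, self.lower, axis=0)
        self.high_ = np.quantile(X, self.upper, axis=0)
        return self

    def transform(self, X):
        return np.clip(np.asarray(X), self.low_, self.high_)


X, y = datasets.load_diabetes(return_X_y=True)

pipeline = Pipeline([
    ("clip", QuantileClipper()),
    ("select", SelectKBest(score_func=f_regression)),
    ("forest", RandomForestRegressor(n_estimators=10, random_state=0)),
])

# Transformer parameters are searched exactly like estimator hyperparameters.
param_grid = {
    "clip__lower": [0.0, 0.05],
    "clip__upper": [0.95, 1.0],
    "select__k": [4, 8],
}

search = GridSearchCV(pipeline, param_grid, cv=3, scoring="neg_mean_squared_error")
search.fit(X, y)
print(search.best_params_)
```

Because the transformers live inside the pipeline, they are refit on each training fold, so the clipping bounds and selected features never see the validation fold.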

## Randomized Search

Grid search is exhaustive, meaning it may take a long time to run depending on the size and granularity of the search space, the size of the data, and the estimator involved.

In comes `RandomizedSearchCV`. With randomized search, the number of iterations is set in advance. In each iteration, a random combination of hyperparameters is sampled from the parameter space. For example, with 1000 iterations we randomly sample 1000 hyperparameter combinations. This way, we can choose how much computation we are willing to trade for a better result, and watch the results improve progressively.

Mathematically, as the number of iterations approaches infinity, `RandomizedSearchCV` covers the parameter space more and more completely, so in a probabilistic sense it behaves like `GridSearchCV`. We could achieve the same computational cost by iterating through the parameter grid in order and stopping at a time-out, but sampling at random increases the odds of finding something notably better. Take the number of estimators as a hyperparameter. In general, the difference in score between `n_estimators=10` and `n_estimators=11` is likely to be smaller than the difference between `n_estimators=10` and a value farther away (anything other than 9 or 11).
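A minimal sketch of the idea follows, using scikit-learn's `RandomizedSearchCV` with integer distributions from `scipy.stats.randint`. The ranges, `n_iter=10`, `cv=3`, and the fixed seeds are assumptions chosen to keep the example small and reproducible, not recommendations.

```python
from scipy.stats import randint
from sklearn import datasets
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

X, y = datasets.load_diabetes(return_X_y=True)

# Distributions instead of fixed lists: each iteration draws one value per key.
# randint(low, high) samples integers in [low, high), so high is exclusive.
param_distributions = {
    "n_estimators": randint(3, 101),  # integers 3..100
    "max_features": randint(2, 9),    # integers 2..8
}

random_search = RandomizedSearchCV(
    RandomForestRegressor(random_state=0),
    param_distributions,
    n_iter=10,  # the computation budget: number of sampled combinations
    cv=3,
    scoring="neg_mean_squared_error",
    random_state=42,  # makes the sampled combinations reproducible
)
random_search.fit(X, y)
print(random_search.best_params_)
```

Raising `n_iter` spends more computation for a better chance of landing near a good region, which is the trade-off described above.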