## Hyperparameter Tuning
Hyperparameters are the settings that define a model's architecture or control how it is trained; they are chosen before training rather than learned. The process of searching for the best combination of these settings is called hyperparameter tuning.
Building an accurate machine learning model depends heavily on choosing the right hyperparameters for training. For example, typical hyperparameters for a support vector machine (SVM) include C, kernel, and gamma. Hyperparameters are not model parameters and cannot be learned directly from the data. Model parameters are learned during training, when we optimize a loss function using a method such as gradient descent.
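A minimal sketch of the distinction, using the SVC estimator and breast-cancer dataset that appear later in this module. The values C=1.0 and gamma=0.001 are arbitrary illustrative choices: they are hyperparameters passed to the constructor, whereas the support vectors are model parameters that only exist after fit() has learned them from the data.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# Hyperparameters: chosen by us before training (values are arbitrary here)
clf = SVC(C=1.0, kernel='rbf', gamma=0.001)
print(clf.get_params()['C'], clf.get_params()['kernel'])

# Before fitting, no learned parameters exist yet
print(hasattr(clf, 'support_vectors_'))  # False

# Model parameters: learned from the data during fit
clf.fit(X, y)
print(clf.support_vectors_.shape)  # (number of support vectors, number of features)
```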

This module gives a practical introduction to some common methods for automated hyperparameter tuning in Python with scikit-learn. These techniques include Grid Search, Random Search, and more advanced optimization methods such as Bayesian optimization and genetic algorithms.

These methods can be used to optimize any parameter that is provided when constructing a model. 
To find the names and current values of all parameters of a given scikit-learn model, one can simply use:

model.get_params()
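A short sketch of how get_params() pairs with set_params(), both standard methods on every scikit-learn estimator. The keys returned by get_params() are exactly the names used in the parameter grids below; the values C=10 and gamma=0.001 are arbitrary.

```python
from sklearn.svm import SVC

model = SVC()

# get_params() returns a dict mapping each constructor argument to its current value
params = model.get_params()
print(sorted(params))  # names of all tunable constructor arguments

# set_params() changes hyperparameters in place and returns the estimator itself
model.set_params(C=10, gamma=0.001)
print(model.get_params()['C'], model.get_params()['gamma'])
```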

Often, a small subset of hyperparameters has a large impact on the model's predictive or computational performance, while the rest can be left at their default values.


## Grid Search 
The grid search method, provided by GridSearchCV, exhaustively generates candidates from a grid of parameter values specified with param_grid. It then evaluates every possible combination of parameters with cross-validation and retains the best one. For example, with the grid below, grid search explores the RBF kernel with every combination (the cross-product) of C values in [1, 10, 100, 1000] and gamma values in [0.001, 0.0001].
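To see what "exhaustively" means in practice, the sketch below uses scikit-learn's ParameterGrid to enumerate the cross-product described above without training anything. The fold count of 5 matches the cv=5 used later in this module.

```python
from sklearn.model_selection import ParameterGrid

# RBF kernel, 4 values of C x 2 values of gamma
param_grid = {'C': [1, 10, 100, 1000], 'gamma': [0.001, 0.0001], 'kernel': ['rbf']}

candidates = list(ParameterGrid(param_grid))
print(len(candidates))  # 8 = 4 * 2 * 1
for params in candidates[:3]:
    print(params)

# With 5-fold cross-validation, every candidate is trained once per fold
# (plus one final refit of the best candidate when refit=True, the default)
n_folds = 5
print(len(candidates) * n_folds)  # 40
```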

In [1]:
param_grid = [
    {'C': [1, 10, 100, 1000], 'gamma': [0.001, 0.0001], 'kernel': ['rbf']},
]

In [2]:
from sklearn.model_selection import GridSearchCV

In [10]:
from sklearn import datasets
from sklearn.model_selection import train_test_split
import pandas as pd
from sklearn.svm import SVC

dataset = datasets.load_breast_cancer()

X = dataset.data
Y = dataset.target

# split the dataset in training set and testing set using train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.2, random_state = 0)


# define parameter range 
para_grid = {
    'C': [1, 10, 100, 1000], 'gamma':[0.001, 0.0001], 'kernel': ['linear', 'rbf']
}


# define the grid model
grid = GridSearchCV(SVC(), para_grid, verbose=0, cv=5)

# fitting the model for grid search 
grid.fit(X_train, y_train) 

# print best parameter after tuning 
print('The best parameters are:', grid.best_params_) 

The best parameters are: {'C': 1, 'gamma': 0.001, 'kernel': 'linear'}


In [11]:
# results of hyperparameter optimization
cv_results = pd.DataFrame(grid.cv_results_)
cv_results

| | mean_fit_time | std_fit_time | mean_score_time | std_score_time | param_C | param_gamma | param_kernel | params | split0_test_score | split1_test_score | split2_test_score | split3_test_score | split4_test_score | mean_test_score | std_test_score | rank_test_score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.970332 | 0.657182 | 0.00071 | 7.4e-05 | 1 | 0.001 | linear | {'C': 1, 'gamma': 0.001, 'kernel': 'linear'} | 0.989011 | 0.967033 | 0.912088 | 0.967033 | 0.967033 | 0.96044 | 0.025631 | 1 |
| 1 | 0.011639 | 0.001181 | 0.003982 | 0.000456 | 1 | 0.001 | rbf | {'C': 1, 'gamma': 0.001, 'kernel': 'rbf'} | 0.967033 | 0.89011 | 0.879121 | 0.934066 | 0.923077 | 0.918681 | 0.031544 | 13 |
| 2 | 1.30433 | 0.909316 | 0.000952 | 0.000257 | 1 | 0.0001 | linear | {'C': 1, 'gamma': 0.0001, 'kernel': 'linear'} | 0.989011 | 0.967033 | 0.912088 | 0.967033 | 0.967033 | 0.96044 | 0.025631 | 1 |
| 3 | 0.00774 | 0.001382 | 0.003062 | 0.000716 | 1 | 0.0001 | rbf | {'C': 1, 'gamma': 0.0001, 'kernel': 'rbf'} | 0.956044 | 0.934066 | 0.923077 | 0.934066 | 0.956044 | 0.940659 | 0.013187 | 9 |
| 4 | 5.518434 | 3.770595 | 0.00098 | 0.000435 | 10 | 0.001 | linear | {'C': 10, 'gamma': 0.001, 'kernel': 'linear'} | 0.956044 | 0.956044 | 0.89011 | 0.956044 | 0.989011 | 0.949451 | 0.032301 | 5 |
| 5 | 0.011503 | 0.001281 | 0.004127 | 0.000603 | 10 | 0.001 | rbf | {'C': 10, 'gamma': 0.001, 'kernel': 'rbf'} | 0.934066 | 0.912088 | 0.868132 | 0.945055 | 0.923077 | 0.916484 | 0.026556 | 14 |
| 6 | 5.319839 | 3.592618 | 0.000736 | 0.000194 | 10 | 0.0001 | linear | {'C': 10, 'gamma': 0.0001, 'kernel': 'linear'} | 0.956044 | 0.956044 | 0.89011 | 0.956044 | 0.989011 | 0.949451 | 0.032301 | 5 |
| 7 | 0.005545 | 0.000173 | 0.002237 | 0.000282 | 10 | 0.0001 | rbf | {'C': 10, 'gamma': 0.0001, 'kernel': 'rbf'} | 0.967033 | 0.901099 | 0.945055 | 0.945055 | 0.923077 | 0.936264 | 0.022413 | 10 |
| 8 | 4.637252 | 2.293389 | 0.00076 | 0.000178 | 100 | 0.001 | linear | {'C': 100, 'gamma': 0.001, 'kernel': 'linear'} | 0.934066 | 0.967033 | 0.934066 | 0.945055 | 0.967033 | 0.949451 | 0.014906 | 5 |
| 9 | 0.012076 | 0.001502 | 0.003451 | 8.2e-05 | 100 | 0.001 | rbf | {'C': 100, 'gamma': 0.001, 'kernel': 'rbf'} | 0.934066 | 0.912088 | 0.868132 | 0.945055 | 0.923077 | 0.916484 | 0.026556 | 14 |


In [12]:
cv_results[['param_C', 'param_gamma', 'param_kernel', 'mean_test_score']]

| | param_C | param_gamma | param_kernel | mean_test_score |
|---|---|---|---|---|
| 0 | 1 | 0.001 | linear | 0.96044 |
| 1 | 1 | 0.001 | rbf | 0.918681 |
| 2 | 1 | 0.0001 | linear | 0.96044 |
| 3 | 1 | 0.0001 | rbf | 0.940659 |
| 4 | 10 | 0.001 | linear | 0.949451 |
| 5 | 10 | 0.001 | rbf | 0.916484 |
| 6 | 10 | 0.0001 | linear | 0.949451 |
| 7 | 10 | 0.0001 | rbf | 0.936264 |
| 8 | 100 | 0.001 | linear | 0.949451 |
| 9 | 100 | 0.001 | rbf | 0.916484 |
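In the table above, rows 0 and 2 (and likewise rows 4 and 6) have identical scores. This is expected: SVC only uses gamma with the 'rbf', 'poly' and 'sigmoid' kernels, so varying gamma under kernel='linear' retrains the same model. A minimal sketch of a list-of-dicts grid, which GridSearchCV treats as a union of sub-grids, so that gamma is only combined with the kernel that uses it:

```python
from sklearn.model_selection import ParameterGrid

# Grid used above: every C x gamma x kernel combination, gamma included for linear too
flat_grid = {'C': [1, 10, 100, 1000], 'gamma': [0.001, 0.0001], 'kernel': ['linear', 'rbf']}

# A list of dicts is searched as a union of sub-grids,
# so gamma is only paired with the rbf kernel
split_grid = [
    {'C': [1, 10, 100, 1000], 'kernel': ['linear']},
    {'C': [1, 10, 100, 1000], 'gamma': [0.001, 0.0001], 'kernel': ['rbf']},
]

print(len(ParameterGrid(flat_grid)))   # 16 candidates
print(len(ParameterGrid(split_grid)))  # 12 candidates: 4 linear + 8 rbf
```

Passing split_grid to GridSearchCV in place of para_grid would skip the four redundant linear-kernel candidates, which are also the slowest ones according to the mean_fit_time column of the full results.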


In [13]:
# to view other properties of our grid object
dir(grid)

['__abstractmethods__',
 '__class__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getstate__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__le__',
 '__lt__',
 '__module__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__setstate__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 '_abc_impl',
 '_check_is_fitted',
 '_check_n_features',
 '_check_refit_for_multimetric',
 '_estimator_type',
 '_format_results',
 '_get_param_names',
 '_get_tags',
 '_more_tags',
 '_pairwise',
 '_repr_html_',
 '_repr_html_inner',
 '_repr_mimebundle_',
 '_required_parameters',
 '_run_search',
 '_validate_data',
 'best_estimator_',
 'best_index_',
 'best_params_',
 'best_score_',
 'classes_',
 'cv',
 'cv_results_',
 'decision_function',
 'error_score',
 'estimator',
 'fit',
 'get_params',
 'inverse_transform',
 'multimetric_',
 'n_features_in_',
 'n_jobs',
 'n_splits
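Of the attributes listed above, best_params_, best_score_ and best_estimator_ are usually the most useful. The sketch below evaluates the tuned model on the held-out test set. To keep it fast, it assumes a smaller, rbf-only grid than the one used above (linear-kernel fits on this unscaled data take seconds each), so its chosen parameters and scores will differ from the output shown earlier.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Smaller rbf-only grid, chosen for speed (an assumption, not the grid used above)
grid = GridSearchCV(SVC(kernel='rbf'), {'C': [1, 10], 'gamma': [0.001, 0.0001]}, cv=5)
grid.fit(X_train, y_train)

print(grid.best_params_)  # winning combination
print(grid.best_score_)   # its mean cross-validated accuracy (mean_test_score)

# With refit=True (the default), best_estimator_ has been retrained on all of X_train
best_model = grid.best_estimator_
test_accuracy = best_model.score(X_test, y_test)
print(test_accuracy)
```

Because the test set was never seen during the search, test_accuracy is a less optimistic estimate than best_score_, which was itself used to pick the winner.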

## Randomized Search 

Random search is similar to grid search, with one key difference: instead of evaluating every combination of a fixed set of values, it samples candidate combinations at random. Each hyperparameter is given either a list of values, which is sampled uniformly, or a statistical distribution to sample from.

This lets us control the number of hyperparameter combinations that are tried. Unlike grid search, where every possible combination is attempted, random search lets us specify the number of candidates to train (the n_iter parameter of RandomizedSearchCV), so the number of iterations can be chosen according to our computational resources or the time taken per iteration.
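A sketch of sampling from distributions rather than fixed lists, using scikit-learn's ParameterSampler (the sampler RandomizedSearchCV uses internally) and scipy.stats.loguniform. The ranges below are an assumption chosen to span the C and gamma values of the grid above; a log-uniform distribution is a common choice when plausible values cover several orders of magnitude.

```python
from scipy.stats import loguniform
from sklearn.model_selection import ParameterSampler

param_distributions = {
    'C': loguniform(1, 1000),         # continuous, log-uniform between 1 and 1000
    'gamma': loguniform(1e-4, 1e-3),  # continuous, log-uniform between 1e-4 and 1e-3
    'kernel': ['linear', 'rbf'],      # lists are sampled uniformly
}

# n_iter fixes the budget: exactly 5 candidates are drawn, however large the space is
samples = list(ParameterSampler(param_distributions, n_iter=5, random_state=0))
for params in samples:
    print(params)
```

The same dictionary can be passed directly to RandomizedSearchCV(SVC(), param_distributions, n_iter=5, cv=5).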

In [14]:
from sklearn.model_selection import RandomizedSearchCV

# candidate values for each hyperparameter (lists are sampled uniformly)
params = {
    'C': [1, 10, 100, 1000], 'gamma':[0.001, 0.0001], 'kernel': ['linear', 'rbf']   
}

# define the randomized search: 5 sampled candidates, each scored with 5-fold CV
RS = RandomizedSearchCV(SVC(), params, verbose=0, cv=5, n_iter=5)

# fit RS
RS.fit(X_train, y_train)

# print best parameter after tuning 
print('The best parameters are:', RS.best_params_) 


The best parameters are: {'kernel': 'linear', 'gamma': 0.0001, 'C': 1}
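RandomizedSearchCV is called above without random_state, so re-running the cell can sample a different set of 5 combinations and print a different result. A minimal sketch, again with ParameterSampler, showing that fixing random_state makes the draw reproducible; passing random_state=0 to RandomizedSearchCV has the same effect on which candidates it evaluates.

```python
from sklearn.model_selection import ParameterSampler

params = {'C': [1, 10, 100, 1000], 'gamma': [0.001, 0.0001], 'kernel': ['linear', 'rbf']}

# Same random_state -> same 5 candidates on every run
first = list(ParameterSampler(params, n_iter=5, random_state=0))
second = list(ParameterSampler(params, n_iter=5, random_state=0))
print(first == second)  # True

# All values are lists here, so sampling is without replacement: 5 distinct grid points
print(len({tuple(sorted(p.items())) for p in first}))  # 5
```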


In [15]:
RS_results = pd.DataFrame(RS.cv_results_)
RS_results

| | mean_fit_time | std_fit_time | mean_score_time | std_score_time | param_kernel | param_gamma | param_C | params | split0_test_score | split1_test_score | split2_test_score | split3_test_score | split4_test_score | mean_test_score | std_test_score | rank_test_score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.134266 | 0.796562 | 0.000679 | 9.3e-05 | linear | 0.0001 | 1 | {'kernel': 'linear', 'gamma': 0.0001, 'C': 1} | 0.989011 | 0.967033 | 0.912088 | 0.967033 | 0.967033 | 0.96044 | 0.025631 | 1 |
| 1 | 6.091259 | 3.50784 | 0.000713 | 7.3e-05 | linear | 0.001 | 10 | {'kernel': 'linear', 'gamma': 0.001, 'C': 10} | 0.956044 | 0.956044 | 0.89011 | 0.956044 | 0.989011 | 0.949451 | 0.032301 | 4 |
| 2 | 7.974022 | 2.984244 | 0.000727 | 0.000143 | linear | 0.0001 | 1000 | {'kernel': 'linear', 'gamma': 0.0001, 'C': 1000} | 0.934066 | 0.967033 | 0.934066 | 0.967033 | 0.989011 | 0.958242 | 0.021308 | 2 |
| 3 | 8.315732 | 3.354364 | 0.000664 | 0.000116 | linear | 0.001 | 1000 | {'kernel': 'linear', 'gamma': 0.001, 'C': 1000} | 0.934066 | 0.967033 | 0.934066 | 0.967033 | 0.989011 | 0.958242 | 0.021308 | 2 |
| 4 | 0.010234 | 0.000429 | 0.00422 | 0.000422 | rbf | 0.001 | 1 | {'kernel': 'rbf', 'gamma': 0.001, 'C': 1} | 0.967033 | 0.89011 | 0.879121 | 0.934066 | 0.923077 | 0.918681 | 0.031544 | 5 |


In [16]:
RS_results[['param_kernel', 'param_gamma', 'param_C', 'mean_test_score']]

| | param_kernel | param_gamma | param_C | mean_test_score |
|---|---|---|---|---|
| 0 | linear | 0.0001 | 1 | 0.96044 |
| 1 | linear | 0.001 | 10 | 0.949451 |
| 2 | linear | 0.0001 | 1000 | 0.958242 |
| 3 | linear | 0.001 | 1000 | 0.958242 |
| 4 | rbf | 0.001 | 1 | 0.918681 |


### Conclusion
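Grid search exhaustively evaluates every hyperparameter combination, whereas random search lets us specify how many candidates to train, thereby controlling the number of combinations attempted. This introduces a trade-off between the assurance of finding the best combination in the search space and computation time. In this example, random search evaluated 5 of the 16 combinations and still found one of the two tied best combinations (C=1 with the linear kernel, mean cross-validation accuracy 0.96044).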
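A back-of-the-envelope sketch of that trade-off for this example. It assumes, as the scikit-learn documentation states, that RandomizedSearchCV samples without replacement when every hyperparameter is given as a list, and it takes the two rank-1 combinations from the grid-search results (C=1, linear kernel, either gamma) as "the best".

```python
from math import comb

n_candidates = 16  # size of the grid: 4 C x 2 gamma x 2 kernel
n_iter = 5         # candidates tried by RandomizedSearchCV above
n_best = 2         # tied rank-1 combinations in the grid-search results
n_folds = 5        # cv=5 in both searches

# Probability that none of the 5 distinct draws is one of the 2 best combinations
p_miss = comb(n_candidates - n_best, n_iter) / comb(n_candidates, n_iter)
p_hit = 1 - p_miss
print(f"P(at least one best combination sampled) = {p_hit:.3f}")  # 0.542

# Cost in model fits, ignoring the final refit of the winner
print(n_candidates * n_folds, "fits for grid search vs", n_iter * n_folds, "for random search")
```

So, under these assumptions, random search cut the number of fits from 80 to 25 at the price of roughly a 46% chance of missing both best combinations on any given run.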
