Hyperparameters are configuration properties that define a model, and they remain constant while the model is trained.

An easier way to think of hyperparameters is as part of the model design: values that you specify up front and that shape how the model is built.

When you talk about a machine learning model, there are typically three kinds of data associated with it:

- model inputs, which are what you use to train the model;
- model parameters, which are what you're trying to figure out during model training; and
- model hyperparameters, which make up the design of your model.


Model inputs refer to the training data, the data that your model uses to learn. You feed in training data during the training process, and this is what the model uses to find the model parameters.


Model parameters are the values learned from the training data, and they differ depending on whether you have a regression model or a classification model. In the case of a linear regression model, the coefficients and the intercept are your model parameters.

Finally, you have model hyperparameters. These are the configuration properties for your model. If it's a decision tree model, the maximum depth of the tree could be a hyperparameter.

If it's a regularized regression model that you are building, the value of alpha that multiplies the penalty term is a hyperparameter.


The best way to distinguish model parameters from hyperparameters is that model parameters are learned, and so change, during the training process, whereas model hyperparameters remain constant. They are something you specify up front.
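The distinction can be sketched in a few lines. This is a minimal illustration on synthetic data, not the dataset used in this notebook; the array shapes, the random seed, and the value of alpha are assumptions chosen only for the example.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic regression data (illustrative only, not the auto-mpg dataset)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 3))
y_demo = 2.0 * X_demo[:, 0] - 1.0 * X_demo[:, 1] + rng.normal(scale=0.1, size=100)

# Hyperparameter: chosen by us before training starts
model = Lasso(alpha=0.1)
print("alpha before fit:", model.alpha)

model.fit(X_demo, y_demo)

# Parameters: learned from the data during training
print("coefficients:", model.coef_)
print("intercept:", model.intercept_)

# The hyperparameter is unchanged by training
print("alpha after fit:", model.alpha)
```

The coefficients and the intercept only exist once `fit` has run, while `alpha` holds the same value before and after.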


Hyperparameter tuning in scikit-learn is commonly performed using grid search.


<img src="../files/Capture14.png">

Here is where you specify the values that you want to try for each hyperparameter, all the candidate values for alpha, say.

All of these values form a grid where every cell is a candidate model, that is, a particular design of a model that you want to evaluate.

You then use GridSearchCV, where CV stands for cross-validation, to evaluate each candidate model. Cross-validation is a technique for evaluating models on data they were not trained on.

Once you use GridSearchCV, scikit-learn takes care of the evaluation and cross-validation for you. Grid search is, however, computationally expensive. If you have two hyperparameters with three values each, then you have 3 multiplied by 3, which gives 9 candidate models that you have to train and evaluate to find the best one, and each candidate is trained once per cross-validation fold.

That can get pretty intense, so you need to be aware up front that the cost and complexity of grid search grow very quickly as you add hyperparameters and values.
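The arithmetic above can be checked with a short sketch that enumerates the grid by hand. The hyperparameter names, their values, and the number of folds are illustrative assumptions; scikit-learn's `ParameterGrid` class enumerates combinations in a similar way.

```python
from itertools import product

# Two hyperparameters with three values each (values are illustrative)
param_grid = {
    "alpha": [0.1, 0.5, 1.0],
    "max_iter": [500, 1000, 2000],
}

# Every combination of values is one candidate model (one cell of the grid)
names = list(param_grid)
candidates = [dict(zip(names, values)) for values in product(*param_grid.values())]

n_folds = 3  # assumed number of cross-validation folds

print("candidate models:", len(candidates))            # 3 * 3 = 9
print("total model fits:", len(candidates) * n_folds)  # 9 * 3 = 27
```

Adding a third hyperparameter with three values would triple both numbers, which is why the cost grows so quickly.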

And if you're performing grid search to find the best candidate model on a cloud platform such as AWS, GCP or Azure, cloud-based evaluation can quickly become very expensive. 

Also, grid search does not differentiate between important and trivial hyperparameters.

You might know up front which hyperparameters matter and which do not, but grid search does not have this information.


An alternative to grid search for hyperparameter tuning is random search of the hyperparameter space.

With random search, you specify ranges for the important hyperparameters, and a fixed number of candidate values are picked at random from those ranges to find the best possible candidate model.
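In scikit-learn, random search is available as `RandomizedSearchCV`. The following is a minimal sketch on synthetic data; the data, the sampling range for alpha, and the choice of five iterations are assumptions made for the example, not settings taken from this notebook.

```python
import numpy as np
from scipy.stats import uniform
from sklearn.linear_model import Lasso
from sklearn.model_selection import RandomizedSearchCV

# Synthetic regression data (illustrative only)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(120, 4))
y_demo = X_demo @ np.array([3.0, -2.0, 0.0, 1.0]) + rng.normal(scale=0.5, size=120)

# Sample alpha uniformly from the range 0.1 to 1.1 instead of listing every value
param_distributions = {"alpha": uniform(loc=0.1, scale=1.0)}

random_search = RandomizedSearchCV(
    Lasso(),
    param_distributions,
    n_iter=5,        # only 5 candidates are drawn and evaluated
    cv=3,
    random_state=0,  # makes the random draws repeatable
)
random_search.fit(X_demo, y_demo)

print(random_search.best_params_)
print(len(random_search.cv_results_["params"]), "candidates evaluated")
```

The cost is fixed by `n_iter` rather than by the size of the grid, which is the main appeal when there are many hyperparameters.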

So which machine learning model works best on your dataset? Well, you'll never know until you evaluate a bunch of different models with different hyperparameters, and this is where we'll use hyperparameter tuning.

We'll perform hyperparameter tuning using the GridSearchCV class in scikit-learn to find the best model for our dataset. Grid search is so called because it sets up the ranges of hyperparameter values that you specify in the form of a grid and trains a model with each combination. Once again, we'll work with the auto-mpg-processed dataset.

In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

import statsmodels.api as sm

from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score
from sklearn.preprocessing import StandardScaler

In [3]:
automobile_df = pd.read_csv('../datasets/auto-mpg-processed.csv')

In [4]:
from sklearn.linear_model import Lasso
from sklearn.svm import SVR
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor

In [5]:
import warnings
warnings.filterwarnings("ignore") # never print matching warnings

In [6]:
from sklearn.model_selection import GridSearchCV

We'll first perform grid search to find the best hyperparameters for our lasso regression model. Every regression model has different hyperparameters that you can tune, and each model can have more than one.

In [7]:
X = automobile_df.drop(columns=['mpg','age'])

Y = automobile_df['mpg']

In [8]:
x_train,x_test,y_train,y_test = train_test_split(X,Y,test_size=0.2)

In [9]:
parameters = {'alpha': [0.2,0.4,0.6,0.7,0.8,0.9,1.0]}

grid_search = GridSearchCV(Lasso(),parameters,cv=3,return_train_score=True)
grid_search.fit(x_train,y_train)

grid_search.best_params_

{'alpha': 1.0}

Here we use grid search to tune just one hyperparameter, the value of alpha.

Alpha is what we use to multiply the penalty term of the lasso regression model. The alpha values that we want to test range from 0.2 to 1.0. Instantiate the GridSearchCV object with the lasso estimator. CV, as you know, stands for cross-validation.

Pass in the parameter dictionary listing the values of alpha that you want to try.

Grid search will instantiate and train a lasso regression model for each value of alpha. We have seven values of alpha here, so seven candidate lasso regression models will be built and trained, each with a different value of alpha.

The grid search object will then try to find the best model for your dataset by evaluating each candidate using three-fold cross-validation. Your original training dataset is split into three parts, and each candidate model is trained in three separate runs. In each run, two parts of the dataset are used for training and the third part is used to evaluate the model, with a different part held out each time. The scores from the three runs are then averaged. This is what cross-validation is.
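The three-way split can be made visible with scikit-learn's `KFold` class, which, as far as I know, is what GridSearchCV uses for a regressor when `cv` is an integer. The twelve stand-in rows below are an assumption made to keep the output readable.

```python
import numpy as np
from sklearn.model_selection import KFold

# Twelve stand-in samples (illustrative only)
X_demo = np.arange(12).reshape(12, 1)

kfold = KFold(n_splits=3)
splits = list(kfold.split(X_demo))

for fold, (train_idx, valid_idx) in enumerate(splits, start=1):
    # Two thirds of the rows train the model, one third evaluates it
    print(f"fold {fold}: train on {len(train_idx)} rows, evaluate on {len(valid_idx)} rows")
    print("  held-out rows:", valid_idx)
```

Each row is held out exactly once across the three runs, so every sample contributes to the evaluation.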

Each candidate is scored using the default scoring mechanism of the estimator object. In the case of regression models, the default score is R-squared. If you want to use grid search in more advanced ways, you can specify other scoring mechanisms to evaluate your models as well.
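A different scoring mechanism is selected with the `scoring` argument of GridSearchCV. The sketch below uses negative mean squared error on synthetic data; the data and the alpha values are assumptions for illustration only.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV

# Synthetic regression data (illustrative only)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(90, 3))
y_demo = X_demo @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.3, size=90)

mse_search = GridSearchCV(
    Lasso(),
    {"alpha": [0.01, 0.1, 1.0]},
    cv=3,
    scoring="neg_mean_squared_error",  # negated so that higher is still better
)
mse_search.fit(X_demo, y_demo)

print(mse_search.best_params_)
print("best mean CV score (negative MSE):", mse_search.best_score_)
```

Error metrics are negated in scikit-learn's scoring names because grid search always picks the candidate with the highest score.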

Calling grid_search.fit trains and evaluates all of the candidate models, and grid_search.best_params_ returns the hyperparameters of the best one.

In the case of lasso regression here, you can see that the best value of alpha, the one which gave us the best model for our dataset, is 1. But how did a value of alpha equal to 1 stack up against the other models that grid search built and trained?

We can find out using a simple for loop that iterates over the models that grid search built. In this case there are 7, the number of values we specified for alpha.

In [11]:
for i in range(len(parameters['alpha'])):
    
    print('Parameters : ',grid_search.cv_results_['params'][i])
    
    print('Mean Test Score : ',grid_search.cv_results_['mean_test_score'][i])
    
    print('Rank: ',grid_search.cv_results_['rank_test_score'][i])

Parameters :  {'alpha': 0.2}
Mean Test Score :  0.6925363169147964
Rank:  7
Parameters :  {'alpha': 0.4}
Mean Test Score :  0.6953550564054829
Rank:  6
Parameters :  {'alpha': 0.6}
Mean Test Score :  0.6959665194466846
Rank:  5
Parameters :  {'alpha': 0.7}
Mean Test Score :  0.6960464964835806
Rank:  4
Parameters :  {'alpha': 0.8}
Mean Test Score :  0.6961197952739048
Rank:  3
Parameters :  {'alpha': 0.9}
Mean Test Score :  0.6961927195556029
Rank:  2
Parameters :  {'alpha': 1.0}
Mean Test Score :  0.696265448479771
Rank:  1


For each of the models that grid search built and trained, the grid_search.cv_results_ attribute holds the hyperparameters, the mean test score, and the rank of that particular model, whether it was the best model, the second best, and so on.

You can see here that alpha equal to 1 produced the best model, and all of the other models are listed with their corresponding ranks. The worst model, at rank 7, was the one with alpha equal to 0.2, with an R-squared of about 0.6925. The model at rank 1, with alpha equal to 1, had an R-squared of about 0.6963. That is not much of a difference, but it is enough for grid search to rank this candidate first. Since the best value lies at the upper edge of the grid, it may also be worth testing values of alpha above 1.0.
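As an alternative to a for loop, the results dictionary can be loaded into a pandas DataFrame and sorted by rank. This is a sketch on synthetic data with an assumed three-value grid, so the scores it prints are unrelated to the ones shown in this notebook.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV

# Synthetic regression data (illustrative only)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(90, 3))
y_demo = X_demo @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.3, size=90)

demo_search = GridSearchCV(Lasso(), {"alpha": [0.2, 0.6, 1.0]}, cv=3)
demo_search.fit(X_demo, y_demo)

# cv_results_ is a dictionary of equal-length arrays, one entry per candidate
results = pd.DataFrame(demo_search.cv_results_)
summary = results[
    ["params", "mean_test_score", "std_test_score", "rank_test_score"]
].sort_values("rank_test_score")

print(summary.to_string(index=False))
```

The `std_test_score` column shows how much the score varied across folds, which helps judge whether a small gap between two candidates is meaningful.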


Once you have the best possible value for the alpha parameter, you can instantiate a lasso regression model using this value of alpha and train it on your dataset.

In [18]:
lasso_model = Lasso(alpha=grid_search.best_params_['alpha']).fit(x_train,y_train)
y_pred = lasso_model.predict(x_test)

print("training score : ", lasso_model.score(x_train,y_train))
print("testing score : ", r2_score(y_test,y_pred))

training score :  0.7113149481002754
testing score :  0.6806665639877405


Printing the training score and the test score shows values that are close to the mean test score we got when we ran grid search. Thus, grid search is an easy and convenient mechanism available in scikit-learn to perform hyperparameter tuning for your models.
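Re-instantiating the model by hand is not strictly necessary. With the default setting `refit=True`, GridSearchCV retrains the best candidate on the whole training set and exposes it as `best_estimator_`. The sketch below shows this on synthetic data; the data, the split, and the alpha values are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic regression data (illustrative only)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(150, 3))
y_demo = X_demo @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.3, size=150)

X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=0)

# refit=True is the default, so the best candidate is retrained on all of X_tr
demo_search = GridSearchCV(Lasso(), {"alpha": [0.2, 0.6, 1.0]}, cv=3)
demo_search.fit(X_tr, y_tr)

best_model = demo_search.best_estimator_
print("best alpha:", best_model.alpha)
print("test R^2 via best_estimator_:", best_model.score(X_te, y_te))
print("test R^2 via the search object:", demo_search.score(X_te, y_te))
```

Both score calls go through the same refitted model, so they report the same value.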

### K-Nearest Neighbors

In [19]:
parameters = {'n_neighbors': [10,12,14,18,20,25,30,35,50]}

grid_search = GridSearchCV(KNeighborsRegressor(),parameters,cv=3,return_train_score=True)
grid_search.fit(x_train,y_train)

grid_search.best_params_

{'n_neighbors': 25}

In [20]:
for i in range(len(parameters['n_neighbors'])):
    
    print('Parameters : ',grid_search.cv_results_['params'][i])
    
    print('Mean Test Score : ',grid_search.cv_results_['mean_test_score'][i])
    
    print('Rank: ',grid_search.cv_results_['rank_test_score'][i])

Parameters :  {'n_neighbors': 10}
Mean Test Score :  0.6921393882411534
Rank:  9
Parameters :  {'n_neighbors': 12}
Mean Test Score :  0.6929877389878504
Rank:  8
Parameters :  {'n_neighbors': 14}
Mean Test Score :  0.6958488237227393
Rank:  7
Parameters :  {'n_neighbors': 18}
Mean Test Score :  0.6989496555105245
Rank:  6
Parameters :  {'n_neighbors': 20}
Mean Test Score :  0.7053507002683133
Rank:  4
Parameters :  {'n_neighbors': 25}
Mean Test Score :  0.7118030791078617
Rank:  1
Parameters :  {'n_neighbors': 30}
Mean Test Score :  0.7074296119146241
Rank:  2
Parameters :  {'n_neighbors': 35}
Mean Test Score :  0.7073293542798775
Rank:  3
Parameters :  {'n_neighbors': 50}
Mean Test Score :  0.6991845408446635
Rank:  5


In [23]:
k_neighbors_model = KNeighborsRegressor(n_neighbors=grid_search.best_params_['n_neighbors']).fit(x_train,y_train)
y_pred = k_neighbors_model.predict(x_test)

print("training score : ", k_neighbors_model.score(x_train,y_train))
print("testing score : ", r2_score(y_test,y_pred))

training score :  0.7305902712708826
testing score :  0.6893821614407205


### Decision Tree

In [25]:
parameters = {'max_depth': [1,2,3,4,5,7,8]}

grid_search = GridSearchCV(DecisionTreeRegressor(),parameters,cv=3,return_train_score=True)
grid_search.fit(x_train,y_train)

grid_search.best_params_

{'max_depth': 3}

In [26]:
for i in range(len(parameters['max_depth'])):
    
    print('Parameters : ',grid_search.cv_results_['params'][i])
    
    print('Mean Test Score : ',grid_search.cv_results_['mean_test_score'][i])
    
    print('Rank: ',grid_search.cv_results_['rank_test_score'][i])

Parameters :  {'max_depth': 1}
Mean Test Score :  0.5123229862572404
Rank:  7
Parameters :  {'max_depth': 2}
Mean Test Score :  0.6612507974874987
Rank:  4
Parameters :  {'max_depth': 3}
Mean Test Score :  0.6925394732079502
Rank:  1
Parameters :  {'max_depth': 4}
Mean Test Score :  0.6680105782603305
Rank:  3
Parameters :  {'max_depth': 5}
Mean Test Score :  0.6718559090438972
Rank:  2
Parameters :  {'max_depth': 7}
Mean Test Score :  0.6323535537036307
Rank:  5
Parameters :  {'max_depth': 8}
Mean Test Score :  0.6196843121468425
Rank:  6


In [27]:
d_tree_model = DecisionTreeRegressor(max_depth=grid_search.best_params_['max_depth']).fit(x_train,y_train)
y_pred = d_tree_model.predict(x_test)

print("training score : ", d_tree_model.score(x_train,y_train))
print("testing score : ", r2_score(y_test,y_pred))

training score :  0.7946369482319195
testing score :  0.7020737307060259


### Multiple Hyperparameters

There are often multiple hyperparameters that you can use to design your model, and grid search can tune several of them at once.

Now, we'll train a support vector regression model using two hyperparameters, epsilon and C. Support vector regression works by trying to fit as many points as possible into a margin surrounding the best-fit line, and epsilon determines the size of that margin. The parameter C, on the other hand, controls the penalty that we apply to points that lie outside this margin. Grid search will train models for every combination of these hyperparameters: 4 values of epsilon multiplied by 2 values of C gives a total of 8 candidate models. As these models are more expensive to train, this grid search takes about a minute or two on my machine.

In [29]:
parameters = {'epsilon': [0.5,0.1,0.2,0.3],
             'C':[0.2,0.3]}

grid_search = GridSearchCV(SVR(kernel='linear'),parameters,cv=3,return_train_score=True)
grid_search.fit(x_train,y_train)

grid_search.best_params_

{'C': 0.3, 'epsilon': 0.2}

And the best model here was the combination where C was equal to 0.3 and epsilon was 0.2. 
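To see how every combination ranked, iterate over the candidates in `cv_results_` directly rather than over the values of a single hyperparameter. The sketch below does this on synthetic data with a grid of the same shape; the data is an assumption, so the scores it prints will differ from the ones in this notebook.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

# Synthetic regression data (illustrative only)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(90, 3))
y_demo = X_demo @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.3, size=90)

svr_grid = {"epsilon": [0.1, 0.2, 0.3, 0.5], "C": [0.2, 0.3]}
svr_search = GridSearchCV(SVR(kernel="linear"), svr_grid, cv=3)
svr_search.fit(X_demo, y_demo)

results = svr_search.cv_results_

# One entry per candidate: 4 values of epsilon * 2 values of C = 8
for params, score, rank in zip(
    results["params"], results["mean_test_score"], results["rank_test_score"]
):
    print(f"rank {rank}: {params} -> mean test score {score:.4f}")
```

Looping over `results["params"]` works for any number of hyperparameters, because the list always has one entry per candidate model.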

Now that we know this combination, let's instantiate a support vector regressor using this value of C and this value of epsilon and train it on our dataset. And let's calculate the training and test score for this model. 

In [32]:
SVR_model = SVR(kernel='linear',
                epsilon=grid_search.best_params_['epsilon'],
                C=grid_search.best_params_['C']).fit(x_train,y_train)
y_pred = SVR_model.predict(x_test)

print("training score : ", SVR_model.score(x_train,y_train))
print("testing score : ", r2_score(y_test,y_pred))

training score :  0.7057710425121682
testing score :  0.6928936612688478


This model did fairly well, with a training R-squared of about 0.71 and a test R-squared of about 0.69. This is an example of how you'd use grid search to tune multiple hyperparameters to find the best possible model. The more hyperparameters you add, the longer it takes for grid search to find the best model.