<img src="http://imgur.com/1ZcRyrc.png" style="float: left; margin: 20px; height: 55px">

# Grid Search

---


![](https://snag.gy/aYcCt2.jpg)

### Learning Objective
- Understand what the terms gridsearch and hyperparameter refer to.
- Understand how to manually build a gridsearching procedure.
- Apply sklearn's `GridSearchCV` object with the Boston housing data to optimize a linear regression model.
- Practice using and evaluating attributes of the gridsearch object.
- Practice the gridsearch procedure independently optimizing regularized linear regression.

### Lesson Guide
- [What is "Gridsearching"? What are "hyperparameters"?](#intro)
- [An example](#example)
- [A more sophisticated example](#example2)
- [How many possible parameter combinations are there](#parameters)
- [Implementing GridSearchCV](#gridsearch)
- [Setup GridSearchCV Parameters](#setup)
- [Review results](#review)
- [Conclusion](#conclusion)

In [1]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

import seaborn as sns

plt.style.use('fivethirtyeight')

%matplotlib inline
%config InlineBackend.figure_format = 'retina'

<a id='intro'></a>

## What is "Gridsearching"? What are "hyperparameters"?

---

Models often have specifications that can be set. For example, when we choose a linear regression, we may decide to add a penalty to the loss function such as the Ridge or the Lasso. Those penalties require the regularization strength, alpha, to be set. 

**Model settings that we choose ourselves, rather than learn from the data, are called hyperparameters.**

Hyperparameters are different from the parameters of the model that result from a fit, such as the coefficients. The **hyperparameters are set prior to the fit** and determine the behaviour of the model.
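The distinction can be made concrete with a minimal sketch. The synthetic data and the names `X_demo` and `y_demo` are assumptions for illustration only; `alpha` and `fit_intercept` are set by us before the fit, while `coef_` and `intercept_` only exist after it.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic data (an assumption for illustration): y depends linearly on two features
rng = np.random.RandomState(0)
X_demo = rng.normal(size=(50, 2))
y_demo = 3.0 * X_demo[:, 0] - 2.0 * X_demo[:, 1] + rng.normal(scale=0.1, size=50)

# Hyperparameters: chosen by us before the fit
ridge = Ridge(alpha=1.0, fit_intercept=True)
hyperparameters = ridge.get_params()

# Parameters: learned from the data during the fit
ridge.fit(X_demo, y_demo)
learned_coefficients = ridge.coef_
learned_intercept = ridge.intercept_

print("Hyperparameters:", hyperparameters)
print("Learned coefficients:", learned_coefficients)
print("Learned intercept:", learned_intercept)
```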

There is often more than one hyperparameter to set for a model. For example, sklearn's linear regression has hyperparameters for whether to include an intercept term and whether to normalize the data. Other models may have many more. We want to know the *optimal* hyperparameter settings: the combination that results in the best model evaluation.

**Searching for the optimal set of hyperparameters by evaluating every combination of a predefined set of candidate values is called gridsearching.**

Gridsearching gets its name from the fact that we are searching over a "grid" of parameters. For example, imagine the values of the `fit_intercept` hyperparameter on the x-axis and those of `normalize` on the y-axis: each point on this grid is one combination, and we need to test all of them. You could add a further dimension to your search grid by testing regularization for Ridge or Lasso with varying alpha.
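As a rough sketch of what "all points on the grid" means, the combinations can be enumerated with the standard library. The grid below, including the three `alpha` values, is a hypothetical example and is not tied to any particular estimator.

```python
from itertools import product

# Hypothetical grid: two boolean settings plus three regularization strengths
param_grid = {
    "fit_intercept": [True, False],
    "normalize": [True, False],
    "alpha": [0.1, 1.0, 10.0],
}

# Every grid point is one combination of values, one taken from each list
names = list(param_grid)
grid_points = [dict(zip(names, values)) for values in product(*param_grid.values())]

print(len(grid_points))  # 2 * 2 * 3 = 12 combinations
for point in grid_points[:3]:
    print(point)
```

The number of combinations is the product of the list lengths, so every added hyperparameter multiplies the number of models to fit. scikit-learn offers `ParameterGrid` in `sklearn.model_selection` for the same kind of enumeration.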

Scikit-learn provides the `GridSearchCV` class, which makes this procedure straightforward to implement. **Gridsearching uses cross-validation internally to evaluate the performance of each set of hyperparameters.** More on this later.

<a id='example'></a>
## An Example

So far we haven't done much to tune linear regression apart from regularization. Regularization will be our main example, but let's first look at the basic options of the model itself to see how the mechanics of a grid search work.

### Linear Regression Parameters
| Parameter | Potential Values |
| --- | ---|
| **fit_intercept** | bool: True/False |
| **normalize** | bool:  True/False |

> The normalize parameter: If **True**, the regressors X will be normalized before regression. This parameter is ignored when `fit_intercept` is set to **False**.


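A grid search is equivalent to running and scoring the following code for every combination of the specified parameters. Note that this code scores each model on the data it was fit on, whereas `GridSearchCV` scores each combination with cross-validation: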

```python

from sklearn.linear_model import LinearRegression

# X and y are assumed to be an already loaded feature matrix and target

# Case 1
lm = LinearRegression(fit_intercept=True, normalize=False)
model = lm.fit(X, y)
score = model.score(X, y)

# Case 2
lm = LinearRegression(fit_intercept=False, normalize=False)
model = lm.fit(X, y)
score = model.score(X, y)

# Case 3
lm = LinearRegression(fit_intercept=True, normalize=True)
model = lm.fit(X, y)
score = model.score(X, y)

# Case 4
lm = LinearRegression(fit_intercept=False, normalize=True)
model = lm.fit(X, y)
score = model.score(X, y)

```

The cases are:

| Case | fit_intercept | normalize |
| ---- | ------------- |----------:|
|  1   | True          | False     |
|  2   | False         | False     |
|  3   | True          | True      |
|  4   | False         | True      |


- How could you test these cases in Python in one go?
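One possible answer is a loop over all combinations, scoring each with cross-validation. The following is only a sketch under stated assumptions: it uses synthetic data from `make_regression` and a `Ridge` model with a hypothetical grid over `fit_intercept` and `alpha`, rather than the exact cases in the table above.

```python
from itertools import product

from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

# Synthetic regression data, used here only so the sketch is self-contained
X_demo, y_demo = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# Hypothetical search grid
param_grid = {
    "fit_intercept": [True, False],
    "alpha": [0.1, 1.0, 10.0],
}

manual_results = []
for fit_intercept, alpha in product(param_grid["fit_intercept"], param_grid["alpha"]):
    model = Ridge(fit_intercept=fit_intercept, alpha=alpha)
    # Negative MSE on each of 5 held-out folds (higher is better)
    fold_scores = cross_val_score(
        model, X_demo, y_demo, cv=5, scoring="neg_mean_squared_error"
    )
    manual_results.append(
        {"fit_intercept": fit_intercept, "alpha": alpha, "mean_score": fold_scores.mean()}
    )

# The best combination is the one with the highest mean score
best = max(manual_results, key=lambda row: row["mean_score"])
print(best)
```

This is essentially the bookkeeping that `GridSearchCV` automates: enumerate the grid, cross-validate each candidate, and keep the best one.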

<a id='gridsearch'></a>
## Implementing GridSearchCV

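`GridSearchCV` runs cross-validation automatically.
In older scikit-learn releases the `cv` parameter defaulted to `3` folds; since version 0.22 the default is `5`. You can set it as high as the number of data points, which amounts to leave-one-out cross-validation.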
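The `cv` argument accepts either an integer or a splitter object. The sketch below assumes synthetic data and a one-parameter grid purely for illustration; it shows an integer fold count next to a `KFold` splitter that shuffles the rows before splitting.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV, KFold

# Synthetic data, an assumption made so the example is self-contained
X_demo, y_demo = make_regression(n_samples=60, n_features=3, noise=5.0, random_state=0)

param_grid = {"fit_intercept": [True, False]}

# An integer requests that many folds
search_int = GridSearchCV(LinearRegression(), param_grid, cv=4)
search_int.fit(X_demo, y_demo)

# A splitter object gives more control, e.g. shuffling before splitting
folds = KFold(n_splits=4, shuffle=True, random_state=0)
search_kfold = GridSearchCV(LinearRegression(), param_grid, cv=folds)
search_kfold.fit(X_demo, y_demo)

print(search_int.n_splits_, search_kfold.n_splits_)
```

Shuffling can matter when the rows of a dataset are ordered, because unshuffled folds may then differ systematically from each other.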

In [2]:
# Load gridsearch, libraries, test data

from sklearn import datasets
from sklearn.model_selection import GridSearchCV
from sklearn.linear_model import LinearRegression
import pandas as pd
import pprint

# Note: load_boston was removed in scikit-learn 1.2, so this cell requires an older release
boston = datasets.load_boston()

X = pd.DataFrame(boston.data, columns=boston.feature_names)
y = pd.Series(boston.target)

In [3]:
X.head()

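|   | CRIM | ZN | INDUS | CHAS | NOX | RM | AGE | DIS | RAD | TAX | PTRATIO | B | LSTAT |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 0.00632 | 18.0 | 2.31 | 0.0 | 0.538 | 6.575 | 65.2 | 4.09 | 1.0 | 296.0 | 15.3 | 396.9 | 4.98 |
| 1 | 0.02731 | 0.0 | 7.07 | 0.0 | 0.469 | 6.421 | 78.9 | 4.9671 | 2.0 | 242.0 | 17.8 | 396.9 | 9.14 |
| 2 | 0.02729 | 0.0 | 7.07 | 0.0 | 0.469 | 7.185 | 61.1 | 4.9671 | 2.0 | 242.0 | 17.8 | 392.83 | 4.03 |
| 3 | 0.03237 | 0.0 | 2.18 | 0.0 | 0.458 | 6.998 | 45.8 | 6.0622 | 3.0 | 222.0 | 18.7 | 394.63 | 2.94 |
| 4 | 0.06905 | 0.0 | 2.18 | 0.0 | 0.458 | 7.147 | 54.2 | 6.0622 | 3.0 | 222.0 | 18.7 | 396.9 | 5.33 |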


In [4]:
y.head()

0    24.0
1    21.6
2    34.7
3    33.4
4    36.2
dtype: float64

<a id='setup'></a>
## Setup GridSearchCV Parameters

In [5]:
# Set up our GridSearch parameters
search_parameters = {
    'fit_intercept':  [True, False], 
    'normalize':      [False, True]
}

# Initialize a blank model object
lm = LinearRegression()

# Initialize gridsearch
estimator = GridSearchCV(
    lm, # estimator
    search_parameters, # hyper-parameter space to search 
    cv=5, # number of folds
    scoring="neg_mean_squared_error" # scoring metric to optimise for
)

# Fit some data
results = estimator.fit(X, y)
print( results.best_estimator_)

LinearRegression(copy_X=True, fit_intercept=False, n_jobs=1, normalize=False)


<a id='review'></a>
## Review results

There are a number of interesting result properties to explore.

| Property | Use |
| --- | ---|
| **`results.param_grid`** | The parameter grid that was searched |
| **`results.best_score_`** | Mean cross-validated score of the best parameter combination |
| **`results.best_estimator_`** | The model refit with the best parameters; can be used directly, e.g. to predict |
| **`results.best_params_`** | The parameter combination that achieved the best score |
| **`results.cv_results_`** | Dictionary of scores and timings for every parameter combination |
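A minimal sketch of reading these attributes follows. It assumes synthetic data and a hypothetical `Ridge` grid over `alpha`, so the numbers it prints are unrelated to the housing results in this lesson.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# Synthetic data, an assumption made so the example is self-contained
X_demo, y_demo = make_regression(n_samples=150, n_features=4, noise=15.0, random_state=1)

search = GridSearchCV(
    Ridge(),
    {"alpha": [0.01, 1.0, 100.0]},  # hypothetical candidate values
    cv=5,
    scoring="neg_mean_squared_error",
)
search.fit(X_demo, y_demo)

# best_score_ is the mean negative MSE of the winning candidate;
# negate it and take the square root to express it as an RMSE
best_rmse = np.sqrt(-search.best_score_)

# best_estimator_ has already been refit on all the data (refit=True by default)
predictions = search.best_estimator_.predict(X_demo[:3])

print(search.best_params_)
print(best_rmse)
print(predictions)
```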

In [6]:
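# With the default refit=True, best_estimator_ has already been refit on all the data,
# so this explicit fit is optional
results.best_estimator_.fit(X,y)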

LinearRegression(copy_X=True, fit_intercept=False, n_jobs=1, normalize=False)

In [7]:
print( "Best estimator:")
print( results.best_estimator_)

print() 

print( "Best score:")
# best_score_ is a negative mean squared error, so negate it and take the root to get an RMSE
print( np.sqrt(-1 * results.best_score_))

print()

print( "Best params:")
print( results.best_params_)

print()

print( "Grid parameters")
print( results.param_grid)

print()
print( "CV results:")
pprint.pprint(results.cv_results_)

Best estimator:
LinearRegression(copy_X=True, fit_intercept=False, n_jobs=1, normalize=False)

Best score:
5.8681529185

Best params:
{'fit_intercept': False, 'normalize': False}

Grid parameters
{'fit_intercept': [True, False], 'normalize': [False, True]}

CV results:
{'mean_fit_time': array([ 0.01892209,  0.00117221,  0.00106263,  0.00103197]),
 'mean_score_time': array([ 0.00032353,  0.00026536,  0.00025039,  0.00024476]),
 'mean_test_score': array([-37.17394602, -37.17394602, -34.43521867, -34.43521867]),
 'mean_train_score': array([-20.7345436 , -20.7345436 , -23.31488392, -23.31488392]),
 'param_fit_intercept': masked_array(data = [True True False False],
             mask = [False False False False],
       fill_value = ?)
,
 'param_normalize': masked_array(data = [False True False True],
             mask = [False False False False],
       fill_value = ?)
,
 'params': [{'fit_intercept': True, 'normalize': False},
            {'fit_intercept': True, 'normalize': True},
        

### Read the results into a pandas dataframe

In [8]:
pd.DataFrame(results.cv_results_)



|   | mean_fit_time | mean_score_time | mean_test_score | mean_train_score | param_fit_intercept | param_normalize | params | rank_test_score | split0_test_score | split0_train_score | ... | split2_test_score | split2_train_score | split3_test_score | split3_train_score | split4_test_score | split4_train_score | std_fit_time | std_score_time | std_test_score | std_train_score |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 0.018922 | 0.000324 | -37.173946 | -20.734544 | True | False | {'fit_intercept': True, 'normalize': False} | 4 | -12.48065 | -24.588963 | ... | -33.119956 | -21.187043 | -80.833054 | -12.917454 | -33.584356 | -22.737189 | 0.035239 | 0.000138 | 23.102665 | 4.060823 |
| 1 | 0.001172 | 0.000265 | -37.173946 | -20.734544 | True | True | {'fit_intercept': True, 'normalize': True} | 3 | -12.48065 | -24.588963 | ... | -33.119956 | -21.187043 | -80.833054 | -12.917454 | -33.584356 | -22.737189 | 2.9e-05 | 4.3e-05 | 23.102665 | 4.060823 |
| 2 | 0.001063 | 0.00025 | -34.435219 | -23.314884 | False | False | {'fit_intercept': False, 'normalize': False} | 1 | -8.560442 | -28.269414 | ... | -23.779028 | -25.602626 | -86.095282 | -13.048049 | -31.591894 | -24.121012 | 4.3e-05 | 8e-06 | 26.848121 | 5.305852 |
| 3 | 0.001032 | 0.000245 | -34.435219 | -23.314884 | False | True | {'fit_intercept': False, 'normalize': True} | 1 | -8.560442 | -28.269414 | ... | -23.779028 | -25.602626 | -86.095282 | -13.048049 | -31.591894 | -24.121012 | 2.3e-05 | 2e-06 | 26.848121 | 5.305852 |
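A wide table like this is often easier to read after selecting a few columns and sorting by rank. The following is a sketch on synthetic data with a hypothetical `Ridge` grid; the column names are those produced by `cv_results_`, while `summary` and the `rmse` column are names introduced here for illustration.

```python
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# Synthetic data, an assumption made so the example is self-contained
X_demo, y_demo = make_regression(n_samples=150, n_features=4, noise=15.0, random_state=1)

search = GridSearchCV(
    Ridge(),
    {"alpha": [0.01, 1.0, 100.0], "fit_intercept": [True, False]},  # hypothetical grid
    cv=5,
    scoring="neg_mean_squared_error",
)
search.fit(X_demo, y_demo)

cv_table = pd.DataFrame(search.cv_results_)

# Keep the most informative columns and order candidates from best to worst
summary = (
    cv_table[["params", "mean_test_score", "std_test_score", "rank_test_score"]]
    .sort_values("rank_test_score")
    .reset_index(drop=True)
)

# Scores are negative MSE; add an RMSE column for readability
summary["rmse"] = (-summary["mean_test_score"]) ** 0.5

print(summary)
```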
