### Colab Activity 9.3: Using GridSearchCV

**Expected Time: 45 Minutes**


This activity focuses on using `GridSearchCV` to search over hyperparameter values for the `Ridge` estimator. You will first run a grid search over the hyperparameters of a single estimator. Then you will pass a pipeline to the grid search, naming both the pipeline step and the hyperparameter you are searching over.

#### Index

- [Problem 1](#Problem-1)
- [Problem 2](#Problem-2)
- [Problem 3](#Problem-3)
- [Problem 4](#Problem-4)
- [Problem 5](#Problem-5)

In [1]:
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.metrics import mean_squared_error
from sklearn.datasets import fetch_california_housing

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

### The Data

We again use the California housing dataset from scikit-learn. You will build regression models with `MedHouseVal` as the target. The data is loaded and previewed below.

In [2]:
cali = fetch_california_housing(as_frame=True)

In [3]:
cali.frame.head()

| | MedInc | HouseAge | AveRooms | AveBedrms | Population | AveOccup | Latitude | Longitude | MedHouseVal |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 8.3252 | 41.0 | 6.984127 | 1.02381 | 322.0 | 2.555556 | 37.88 | -122.23 | 4.526 |
| 1 | 8.3014 | 21.0 | 6.238137 | 0.97188 | 2401.0 | 2.109842 | 37.86 | -122.22 | 3.585 |
| 2 | 7.2574 | 52.0 | 8.288136 | 1.073446 | 496.0 | 2.80226 | 37.85 | -122.24 | 3.521 |
| 3 | 5.6431 | 52.0 | 5.817352 | 1.073059 | 558.0 | 2.547945 | 37.85 | -122.25 | 3.413 |
| 4 | 3.8462 | 52.0 | 6.281853 | 1.081081 | 565.0 | 2.181467 | 37.85 | -122.25 | 3.422 |


In [4]:
X = cali.frame.drop('MedHouseVal', axis = 1)
y = cali.frame['MedHouseVal']

In [5]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

### Problem 1

#### Dictionary for grid search



As discussed in the videos, to search over hyperparameters you have to create a dictionary with a key whose name is exactly that of the hyperparameter to search over.  With the `Ridge` estimator, this will be `alpha`.  Create a dictionary with `alpha` as the key and values `[0.1, 1.0, 10.0]` and assign it to the variable `params_dict` below.  

In [6]:

params_dict = {'alpha': [0.1, 1.0, 10.0]}


# Answer check
print(params_dict.values())
print(params_dict.keys())

dict_values([[0.1, 1.0, 10.0]])
dict_keys(['alpha'])
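
If you are unsure which names are valid keys, one way to check is to ask the estimator itself. The sketch below is an optional aside, not part of the graded answer; the variable names `valid_names` and `unknown` are illustrative only.

```python
from sklearn.linear_model import Ridge

# get_params() lists every hyperparameter the estimator accepts;
# the keys of a grid-search dictionary must come from this set.
valid_names = set(Ridge().get_params().keys())

params_dict = {'alpha': [0.1, 1.0, 10.0]}

# Collect any keys that Ridge would not recognize (hypothetical helper list).
unknown = [name for name in params_dict if name not in valid_names]

print('alpha' in valid_names)
print(unknown)
```

An empty `unknown` list means every key in the dictionary matches a hyperparameter of `Ridge`.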


### Problem 2

#### Creating the grid search object


Instantiate a `Ridge()` regressor and assign to `ridge`.

Next, use `GridSearchCV` to instantiate a grid search object using `ridge` as the estimator. Set the argument `param_grid` equal to `params_dict`. Assign your grid to `grid` below.

In [7]:

ridge = Ridge()
grid = GridSearchCV(ridge, param_grid=params_dict)


# Answer check
print(grid.get_params()['param_grid'])
print(grid)

{'alpha': [0.1, 1.0, 10.0]}
GridSearchCV(estimator=Ridge(), param_grid={'alpha': [0.1, 1.0, 10.0]})
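
The grid above relies on defaults: 5-fold cross-validation and the estimator's own `score` method, which for `Ridge` is R². As a hedged sketch, the same object can be built with those choices made explicit. The name `grid_mse` and the `neg_mean_squared_error` scorer are assumptions for illustration, not something the activity requires.

```python
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

params_dict = {'alpha': [0.1, 1.0, 10.0]}

# cv=5 matches the default; the scoring string is an illustrative choice.
# scikit-learn maximizes scores, so MSE is exposed in negated form.
grid_mse = GridSearchCV(
    Ridge(),
    param_grid=params_dict,
    cv=5,
    scoring='neg_mean_squared_error',
)

print(grid_mse.get_params()['cv'])
print(grid_mse.get_params()['scoring'])
```

Choosing a scorer this way changes which `alpha` is ranked best during cross-validation; it does not change how you compute `mean_squared_error` afterward.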


### Problem 3

#### Performing the grid search


- Use the `fit` function on `grid` to train your model using `X_train`  and `y_train`.
- Use the `predict` function on `grid` to compute the predictions on `X_train`. Assign your result to `train_preds`.
- Use the `predict` function on `grid` to compute the predictions on `X_test`. Assign your result to `test_preds`.
- Use the `mean_squared_error` function to compute the MSE between `y_train` and `train_preds`. Assign your result to `train_mse`.
- Use the `mean_squared_error` function to compute the MSE between `y_test` and `test_preds`. Assign your result to `test_mse`.



In [8]:

ridge = Ridge()
grid = GridSearchCV(ridge, param_grid=params_dict)
grid.fit(X_train, y_train)
train_preds = grid.predict(X_train)
test_preds = grid.predict(X_test)
train_mse = mean_squared_error(y_train, train_preds)
test_mse = mean_squared_error(y_test, test_preds)

# Answer check
print(f'Train MSE: {train_mse}')
print(f'Test MSE: {test_mse}')

Train MSE: 0.5233576299656519
Test MSE: 0.5305615027470352
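
To see how each candidate `alpha` fared during cross-validation, you can inspect the `cv_results_` attribute of a fitted grid. The following is a minimal sketch that uses synthetic data from `make_regression` as a stand-in for the housing data (an assumption, so it runs without a download); `X_demo`, `y_demo`, and `grid_demo` are hypothetical names.

```python
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data (an assumption, not the California housing data).
X_demo, y_demo = make_regression(n_samples=200, n_features=8, noise=10.0, random_state=42)

grid_demo = GridSearchCV(Ridge(), param_grid={'alpha': [0.1, 1.0, 10.0]})
grid_demo.fit(X_demo, y_demo)

# cv_results_ is a dictionary of arrays with one entry per candidate;
# a DataFrame makes it easier to read.
results = pd.DataFrame(grid_demo.cv_results_)
print(results[['param_alpha', 'mean_test_score', 'rank_test_score']])
```

The row with `rank_test_score` equal to 1 is the candidate the grid search selected.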


### Problem 4

#### Identify optimal alpha value


Use your fitted grid to determine the optimal alpha value.  Assign this as a float to `best_alpha` below.  (**Hint**: Use the `best_params_` attribute of the fitted grid.)

In [9]:

ridge = Ridge()
grid = GridSearchCV(ridge, param_grid=params_dict)
grid.fit(X_train, y_train)
# best_params_ is a dictionary, so index it by name to get the float
best_alpha = grid.best_params_['alpha']


# Answer check
print(f'Best alpha: {best_alpha}')

Best alpha: 0.1
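
Beyond `best_params_`, a fitted grid also exposes `best_score_` and `best_estimator_`. The sketch below, again on synthetic stand-in data (an assumption) with hypothetical `_demo` names, suggests how these attributes relate to each other and to `predict`.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data (an assumption, so the sketch runs without a download).
X_demo, y_demo = make_regression(n_samples=200, n_features=8, noise=10.0, random_state=42)

grid_demo = GridSearchCV(Ridge(), param_grid={'alpha': [0.1, 1.0, 10.0]})
grid_demo.fit(X_demo, y_demo)

# best_params_ is a dictionary; index it by name to get the bare value.
best_alpha_demo = grid_demo.best_params_['alpha']

# With the default refit=True, best_estimator_ is a Ridge refit on all of the
# data passed to fit, and grid_demo.predict delegates to it.
same_preds = np.allclose(
    grid_demo.predict(X_demo),
    grid_demo.best_estimator_.predict(X_demo),
)

print(type(grid_demo.best_params_).__name__)
print(grid_demo.best_estimator_.alpha == best_alpha_demo)
print(same_preds)
```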


### Problem 5

#### Pipeline with Grid Search


To use a `Pipeline` in a `GridSearchCV`, prefix each key in your parameter dictionary with the name of the pipeline step it belongs to, followed by two underscores.  For example, to search over a ridge estimator's alpha value, we create a pipeline with steps named `scale` and `ridge` that applies the `StandardScaler` followed by the `Ridge` regressor.  To search over the `alpha` parameter of the `ridge` step, we write `ridge__alpha`. (Note there are two underscores here.)
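
One way to confirm the exact key names is to list the pipeline's parameters. This is a small optional sketch; it rebuilds the same two-step pipeline used in this problem, and `names` is an illustrative variable.

```python
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipe = Pipeline([('scale', StandardScaler()), ('ridge', Ridge())])

# Every searchable name has the form <step name>__<parameter name>.
names = pipe.get_params().keys()

print('ridge__alpha' in names)
print('scale__with_mean' in names)
print('alpha' in names)
```

The bare name `alpha` is not a valid key here, because the pipeline itself has no `alpha` parameter; only its `ridge` step does.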

Below, you are provided a pipeline and dictionary ready to be used in a new grid search.  You are to instantiate, fit, and score a grid search on the train and test data using mean squared error. Create your grid object as `grid_2` below and assign the training error and test error to `model_2_train_mse` and `model_2_test_mse`.  Determine the optimal value for `alpha` and assign it as a dictionary to `model_2_best_alpha` below.

In [10]:
pipe = Pipeline([('scale', StandardScaler()), ('ridge', Ridge())])

In [11]:
param_dict = {'ridge__alpha': [0.001, 0.1, 1.0, 10.0, 100.0, 1000.0]}

In [12]:

grid_2 = GridSearchCV(pipe, param_grid=param_dict)
grid_2.fit(X_train, y_train)
train_preds = grid_2.predict(X_train)
test_preds = grid_2.predict(X_test)
model_2_train_mse = mean_squared_error(y_train, train_preds)
model_2_test_mse = mean_squared_error(y_test, test_preds)
model_2_best_alpha = grid_2.best_params_


# Answer check
print(f'Test MSE: {model_2_test_mse}')
print(f'Best Alpha: {list(model_2_best_alpha.values())[0]}')

Test MSE: 0.5305677582888797
Best Alpha: 0.001
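
The same naming rule extends to searching over more than one step at once. As a hedged sketch that goes slightly beyond what the activity asks, the example below adds a hypothetical `poly` step and searches over both its `degree` and the ridge `alpha`. It uses synthetic stand-in data (an assumption), and all `_demo` names are illustrative.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, ParameterGrid
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic stand-in data (an assumption, not the California housing data).
X_demo, y_demo = make_regression(n_samples=200, n_features=4, noise=10.0, random_state=42)

# 'poly' is a hypothetical extra step added for illustration.
pipe_demo = Pipeline([
    ('poly', PolynomialFeatures(include_bias=False)),
    ('scale', StandardScaler()),
    ('ridge', Ridge()),
])

param_grid_demo = {
    'poly__degree': [1, 2],
    'ridge__alpha': [0.1, 1.0, 10.0],
}

# The grid is the Cartesian product of the value lists: 2 * 3 candidates.
n_candidates = len(ParameterGrid(param_grid_demo))

grid_multi = GridSearchCV(pipe_demo, param_grid=param_grid_demo)
grid_multi.fit(X_demo, y_demo)

print(n_candidates)
print(sorted(grid_multi.best_params_.keys()))
```

Each added value multiplies the number of fits, so larger grids cost proportionally more time.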
