### Ridge Regression and Grid Search Cross Validation


This notebook uses `GridSearchCV` to search over candidate values of the `Ridge` estimator's hyperparameter `alpha`.
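For background, scikit-learn's `Ridge` fits coefficients $w$ by minimizing squared error plus an L2 penalty. The hyperparameter `alpha` ($\alpha \ge 0$) is the weight on that penalty:

$$
\min_{w} \; \lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_2^2
$$

With the default `fit_intercept=True` the intercept is not penalized. A larger `alpha` shrinks the coefficients more strongly toward zero, and $\alpha = 0$ reduces to ordinary least squares. Because the penalty acts on raw coefficient magnitudes, its effect depends on the scale of each feature, which is one reason to standardize features before fitting.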

```python
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.metrics import mean_squared_error
from sklearn.datasets import fetch_california_housing

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
```

#### Load data - California Housing Data

We again use the California housing dataset from scikit-learn and build regression models with `MedHouseVal` (median house value, in hundreds of thousands of dollars) as the target. The data is loaded and its first rows are shown below.

```python
cali = fetch_california_housing(as_frame=True)
cali.frame.head()
```

|   | MedInc | HouseAge | AveRooms | AveBedrms | Population | AveOccup | Latitude | Longitude | MedHouseVal |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 8.3252 | 41.0 | 6.984127 | 1.02381 | 322.0 | 2.555556 | 37.88 | -122.23 | 4.526 |
| 1 | 8.3014 | 21.0 | 6.238137 | 0.97188 | 2401.0 | 2.109842 | 37.86 | -122.22 | 3.585 |
| 2 | 7.2574 | 52.0 | 8.288136 | 1.073446 | 496.0 | 2.80226 | 37.85 | -122.24 | 3.521 |
| 3 | 5.6431 | 52.0 | 5.817352 | 1.073059 | 558.0 | 2.547945 | 37.85 | -122.25 | 3.413 |
| 4 | 3.8462 | 52.0 | 6.281853 | 1.081081 | 565.0 | 2.181467 | 37.85 | -122.25 | 3.422 |


```python
# Build training and test data
X = cali.frame.drop('MedHouseVal', axis=1)
y = cali.frame['MedHouseVal']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
```

#### Grid Search Cross Validation


Define a parameter grid that tries the values `[0.1, 1.0, 10.0]` for the `Ridge` hyperparameter `alpha`, then pass a `Ridge` estimator and this grid to `GridSearchCV`.
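To make the search concrete, here is a minimal sketch of what `GridSearchCV` does with a single-parameter grid, written as an explicit loop over `cross_val_score`. It assumes synthetic data from `make_regression` in place of the housing data (names such as `X_demo` and `manual_best_alpha` are illustrative only). It relies on two scikit-learn defaults: 5-fold cross-validation, and scoring with the estimator's own `score` method, which is R² for `Ridge`.

```python
# Sketch: GridSearchCV over one hyperparameter, written out by hand.
# Synthetic data from make_regression stands in for the housing features
# (an assumption, so the example runs offline).
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, cross_val_score

X_demo, y_demo = make_regression(n_samples=500, n_features=8, noise=20.0, random_state=0)
alphas = [0.1, 1.0, 10.0]

# Score each candidate alpha with 5-fold CV (GridSearchCV's default cv=5).
# With no `scoring` argument, both use Ridge.score, i.e. R^2 (higher is better).
mean_scores = []
for alpha in alphas:
    scores = cross_val_score(Ridge(alpha=alpha), X_demo, y_demo, cv=5)
    mean_scores.append(scores.mean())

manual_best_alpha = alphas[int(np.argmax(mean_scores))]

# GridSearchCV runs the same loop, then refits the best model on all the data
# passed to fit() (refit=True by default).
grid_demo = GridSearchCV(Ridge(), param_grid={"alpha": alphas}, cv=5)
grid_demo.fit(X_demo, y_demo)

print({a: round(float(s), 4) for a, s in zip(alphas, mean_scores)})
print("manual best alpha:", manual_best_alpha)
print("GridSearchCV best alpha:", grid_demo.best_params_["alpha"])
```

Because the default score is R² rather than MSE, the selected `alpha` is the one with the highest mean cross-validated R². Passing `scoring="neg_mean_squared_error"` to `GridSearchCV` would select on MSE instead.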


```python
# Candidate values for Ridge's regularization strength
params_dict = {"alpha": [0.1, 1.0, 10.0]}

ridge = Ridge()
grid = GridSearchCV(estimator=ridge, param_grid=params_dict)
print(grid.get_params()['param_grid'])
print(grid)

# Fit grid with X_train and y_train (the best model is then refit on all of X_train)
grid.fit(X_train, y_train)

# mean_squared_error expects (y_true, y_pred)
train_preds = grid.predict(X_train)
train_mse = mean_squared_error(y_train, train_preds)

test_preds = grid.predict(X_test)
test_mse = mean_squared_error(y_test, test_preds)
print(f'Train MSE: {train_mse}')
print(f'Test MSE: {test_mse}')

# Identify best alpha hyperparameter
best_alpha = grid.best_params_['alpha']
print(f'Best alpha: {best_alpha}')
```

```
{'alpha': [0.1, 1.0, 10.0]}
GridSearchCV(estimator=Ridge(), param_grid={'alpha': [0.1, 1.0, 10.0]})
Train MSE: 0.5233576299656519
Test MSE: 0.530561502747035
Best alpha: 0.1
```


#### Pipeline with Grid Search

Build a pipeline that standardizes the features with `StandardScaler` and then fits a `Ridge` regressor, and tune the pipeline's `alpha` with grid search.
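Two details make this work. First, `GridSearchCV` reaches into a pipeline step through the `<step name>__<parameter>` naming convention, which is why the grid key is `ridge__alpha`. Second, the whole pipeline is cloned and refit on each training fold, so `StandardScaler` learns its mean and variance from that fold only and never from the held-out fold. The minimal sketch below checks the naming and shows how to read the tuned step back out. It assumes synthetic `make_regression` data, and names such as `pipe_demo` and `best_ridge` are illustrative.

```python
# Sketch: addressing pipeline step parameters and reading back the tuned step.
# Synthetic data from make_regression is an assumption so the example runs offline.
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X_demo, y_demo = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=1)

pipe_demo = Pipeline([("scale", StandardScaler()), ("ridge", Ridge())])

# Step parameters are exposed as "<step name>__<parameter name>".
print("ridge__alpha" in pipe_demo.get_params())  # True

# Each CV split fits a fresh copy of the full pipeline (scaler + ridge)
# on the training fold only.
grid_demo = GridSearchCV(pipe_demo, param_grid={"ridge__alpha": [0.1, 1.0, 10.0]}, cv=5)
grid_demo.fit(X_demo, y_demo)

# The refit best pipeline is best_estimator_; individual steps are in named_steps.
best_ridge = grid_demo.best_estimator_.named_steps["ridge"]
print(best_ridge.alpha == grid_demo.best_params_["ridge__alpha"])  # True
print(best_ridge.coef_.shape)  # (5,)
```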

```python
# Standardize features, then fit Ridge
pipe = Pipeline([('scale', StandardScaler()), ('ridge', Ridge())])
param_dict = {'ridge__alpha': [0.001, 0.1, 1.0, 10.0, 100.0, 1000.0]}

grid_2 = GridSearchCV(estimator=pipe, param_grid=param_dict)
grid_2.fit(X_train, y_train)

model_2_best_alpha = grid_2.best_params_['ridge__alpha']

# mean_squared_error expects (y_true, y_pred)
train_preds = grid_2.predict(X_train)
model_2_train_mse = mean_squared_error(y_train, train_preds)

test_preds = grid_2.predict(X_test)
model_2_test_mse = mean_squared_error(y_test, test_preds)

# Print test MSE and best alpha
print(f'Test MSE: {model_2_test_mse}')
print(f'Best Alpha: {model_2_best_alpha}')
```


```
Test MSE: 0.5305677582888797
Best Alpha: 0.001
```
