# Employing Ridge Regression

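As mentioned before, OLS has a propensity to overfit, finding spurious relationships in the training data. **Ridge regression** helps prevent this by adding a penalty on the size of the coefficients, with the strength of the penalty set by the regularization parameter $\alpha$. A larger $\alpha$ means stronger regularization and less overfitting, at the cost of more bias.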
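To make the effect of $\alpha$ concrete, here is a minimal sketch of the ridge solution in closed form, assuming the usual objective $\|y - Xw\|^2 + \alpha \|w\|^2$ with an unpenalized intercept. The helper `ridge_closed_form` and the synthetic data are illustrative assumptions, not part of the original notebook.

```python
import numpy as np

def ridge_closed_form(X, y, alpha):
    """Minimize ||y - Xw||^2 + alpha * ||w||^2 on centered data (intercept not penalized)."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    n_features = X.shape[1]
    # Normal equations with the L2 penalty added to the diagonal
    return np.linalg.solve(Xc.T @ Xc + alpha * np.eye(n_features), Xc.T @ yc)

rng = np.random.default_rng(0)   # synthetic data, for illustration only
X = rng.normal(size=(50, 5))
y = X @ np.array([3.0, -2.0, 0.0, 1.0, 0.5]) + rng.normal(scale=0.5, size=50)

alphas = [0.0, 1.0, 10.0, 100.0]
norms = [np.linalg.norm(ridge_closed_form(X, y, a)) for a in alphas]
for a, n in zip(alphas, norms):
    print(f"alpha={a:>6}: ||w|| = {n:.3f}")
```

The coefficient norm shrinks as $\alpha$ grows, and $\alpha = 0$ reduces to OLS.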

Ridge regression is implemented via the `Ridge` object provided in **scikit-learn**.

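We will work with the Boston housing price dataset. In an earlier video we saw that removing some features may lead to better models, so we will remove the same features here. Note that `load_boston` was removed in scikit-learn 1.2, so the code below requires an earlier version.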
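The feature removal relies on `np.delete` with `axis=1`. As a small sketch on a toy array (a stand-in for the 13-column feature matrix, not the real data), dropping columns 2 and 6 looks like this:

```python
import numpy as np

# Toy stand-in for the feature matrix: 3 rows, 13 columns holding 0..38
X = np.arange(39).reshape(3, 13)

# Drop columns 2 and 6 (axis=1 selects columns); np.delete returns a new array
X_reduced = np.delete(X, [2, 6], axis=1)

print(X_reduced.shape)   # (3, 11)
print(X_reduced[0])      # first row without the entries at positions 2 and 6
```

The original array is left unchanged, which is why the notebook reassigns the result.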

In [1]:
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
import numpy as np

In [2]:
boston_obj = load_boston()
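# Split into training and test sets (the split is random, so results vary between runs)
data_train, data_test, price_train, price_test = train_test_split(boston_obj.data, boston_obj.target)
# Drop the columns at indices 2 (INDUS) and 6 (AGE)
data_train = np.delete(data_train, [2, 6], axis=1)
data_test = np.delete(data_test, [2, 6], axis=1)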

data_train[:5, :]

array([[  7.75223000e+00,   0.00000000e+00,   0.00000000e+00,
          7.13000000e-01,   6.30100000e+00,   2.78310000e+00,
          2.40000000e+01,   6.66000000e+02,   2.02000000e+01,
          2.72210000e+02,   1.62300000e+01],
       [  1.78667000e+01,   0.00000000e+00,   0.00000000e+00,
          6.71000000e-01,   6.22300000e+00,   1.38610000e+00,
          2.40000000e+01,   6.66000000e+02,   2.02000000e+01,
          3.93740000e+02,   2.17800000e+01],
       [  4.47910000e-01,   0.00000000e+00,   1.00000000e+00,
          5.07000000e-01,   6.72600000e+00,   3.65190000e+00,
          8.00000000e+00,   3.07000000e+02,   1.74000000e+01,
          3.60200000e+02,   8.05000000e+00],
       [  5.78900000e-02,   1.25000000e+01,   0.00000000e+00,
          4.09000000e-01,   5.87800000e+00,   6.49800000e+00,
          4.00000000e+00,   3.45000000e+02,   1.89000000e+01,
          3.96210000e+02,   8.10000000e+00],
       [  7.84200000e-01,   0.00000000e+00,   0.00000000e+00,
          5.38

## Fitting With Ridge Regression

We first fit a `Ridge` model with $\alpha = 1$, then check an example prediction and the MSE on the training set.

In [3]:
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import cross_val_score

In [4]:
ridge1 = Ridge(alpha=1)    # alpha is a hyperparameter controlling regularization
ridge1.fit(data_train, price_train)
ridge1.predict([[    # An example prediction
    1,      # Per capita crime rate
    25,     # Proportion of land zoned for large homes
    1,      # Tract bounds the Charles River
    0.3,    # NOX concentration
    10,     # Average number of rooms per dwelling
    10,     # Weighted distance to employment centers
    3,      # Index for highway accessibility
    400,    # Tax rate
    15,     # Pupil/teacher ratio
    200,    # B: index based on the proportion of Black residents
    5       # % lower status of population
]])

array([ 35.60976252])

In [5]:
predprice = ridge1.predict(data_train)
mean_squared_error(price_train, predprice)

24.805913396896099

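We can use cross-validation to pick a good value for $\alpha$. I will use `cross_val_score()` with 10 folds and the `neg_mean_squared_error` scoring, which reports the negated MSE so that higher scores are better.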
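As a rough sketch of what 10-fold cross-validation with negated MSE computes, the loop below does the fold splitting and scoring by hand. The helpers `ridge_fit` and `kfold_neg_mse` are hypothetical names, the folds are contiguous and unshuffled by assumption, and the data is synthetic; this is not scikit-learn's implementation.

```python
import numpy as np

def ridge_fit(X, y, alpha):
    """Closed-form ridge fit with an unpenalized intercept; returns (coef, intercept)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    coef = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    return coef, y_mean - x_mean @ coef

def kfold_neg_mse(X, y, alpha, k=10):
    """Return one negated MSE per fold, following the 'higher is better' convention."""
    folds = np.array_split(np.arange(len(y)), k)   # contiguous folds, no shuffling
    scores = []
    for test_idx in folds:
        train_mask = np.ones(len(y), dtype=bool)
        train_mask[test_idx] = False               # hold this fold out of training
        coef, intercept = ridge_fit(X[train_mask], y[train_mask], alpha)
        pred = X[test_idx] @ coef + intercept
        scores.append(-np.mean((y[test_idx] - pred) ** 2))
    return np.array(scores)

rng = np.random.default_rng(1)   # synthetic data, for illustration only
X = rng.normal(size=(100, 4))
y = X @ np.array([2.0, -1.0, 0.5, 0.0]) + rng.normal(scale=1.0, size=100)

scores = kfold_neg_mse(X, y, alpha=1.0)
print(scores.mean())   # closer to zero means lower average MSE
```

Each fold is held out once while the model is fit on the remaining nine, so every observation is used for validation exactly once.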

In [6]:
import pandas as pd
from pandas import DataFrame

In [7]:
alpha = [.125, .25, .5, 1, 2, 4, 8, 16, 32, 64, 128]    # Candidate alphas
res = dict()

for a in alpha:
    ridge2 = Ridge(alpha=a)
    res[a] = cross_val_score(ridge2, data_train, price_train, scoring='neg_mean_squared_error', cv=10)

res_df = DataFrame(res)

res_df

| Fold | 0.125 | 0.25 | 0.5 | 1.0 | 2.0 | 4.0 | 8.0 | 16.0 | 32.0 | 64.0 | 128.0 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -17.304222 | -17.234732 | -17.152534 | -17.09521 | -17.102031 | -17.167592 | -17.256061 | -17.357533 | -17.508714 | -17.773929 | -18.193869 |
| 1 | -26.522072 | -26.595469 | -26.734186 | -26.958167 | -27.229521 | -27.434632 | -27.463042 | -27.299119 | -27.050952 | -26.924008 | -27.095314 |
| 2 | -21.936741 | -21.933755 | -21.96207 | -22.065557 | -22.271016 | -22.551581 | -22.866333 | -23.221994 | -23.683907 | -24.351623 | -25.273893 |
| 3 | -33.058802 | -33.026976 | -32.992844 | -32.975853 | -32.985608 | -32.991102 | -32.939767 | -32.813181 | -32.66997 | -32.666176 | -32.999845 |
| 4 | -32.901645 | -32.77977 | -32.629602 | -32.514981 | -32.526166 | -32.718449 | -33.119272 | -33.789371 | -34.831769 | -36.328163 | -38.241471 |
| 5 | -20.348829 | -20.502741 | -20.771642 | -21.183444 | -21.701443 | -22.226018 | -22.68924 | -23.1263 | -23.659356 | -24.440287 | -25.558801 |
| 6 | -35.209947 | -35.134525 | -35.039888 | -34.962723 | -34.95291 | -35.030622 | -35.186064 | -35.419867 | -35.769407 | -36.3228 | -37.19951 |
| 7 | -19.130555 | -19.268035 | -19.51676 | -19.90485 | -20.378083 | -20.790514 | -21.014771 | -21.030392 | -20.898038 | -20.689463 | -20.431257 |
| 8 | -37.943773 | -37.998168 | -38.133085 | -38.402693 | -38.801072 | -39.229614 | -39.600395 | -39.94166 | -40.380643 | -41.042134 | -41.928551 |
| 9 | -22.658957 | -22.759768 | -22.939704 | -23.210058 | -23.501946 | -23.64683 | -23.462662 | -22.848066 | -21.82752 | -20.588279 | -19.431041 |


In [8]:
res_df.mean()

0.125     -26.701554
0.250     -26.723394
0.500     -26.787232
1.000     -26.927354
2.000     -27.144980
4.000     -27.378695
8.000     -27.559761
16.000    -27.684748
32.000    -27.828028
64.000    -28.112686
128.000   -28.635355
dtype: float64
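The best candidate can also be read off programmatically. The sketch below copies the mean scores from the output above into a plain dictionary (the names `mean_scores`, `best_alpha`, and `best_mse` are illustrative) and picks the largest score, since the values are negated MSEs.

```python
# Mean cross-validation scores (negated MSE), copied from the res_df.mean() output
mean_scores = {
    0.125: -26.701554, 0.25: -26.723394, 0.5: -26.787232, 1.0: -26.927354,
    2.0: -27.144980, 4.0: -27.378695, 8.0: -27.559761, 16.0: -27.684748,
    32.0: -27.828028, 64.0: -28.112686, 128.0: -28.635355,
}

# Scores are negated MSEs, so the best alpha has the largest (least negative) score
best_alpha = max(mean_scores, key=mean_scores.get)
best_mse = -mean_scores[best_alpha]

print(best_alpha, best_mse)
```

On the DataFrame itself, `res_df.mean().idxmax()` should give the same answer.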

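Because the scores are negated MSEs, the largest (least negative) mean corresponds to the smallest MSE. The mean MSE grows steadily with $\alpha$, so the smallest candidate performs best and I will choose $\alpha = 0.125$. Let's now see how ridge regression with this chosen $\alpha$ performs on the test set.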

In [None]:
ridge3 = Ridge(alpha=0.125)
ridge3.fit(data_train, price_train)

testpredprice = ridge3.predict(data_test)
mean_squared_error(price_test, testpredprice)

The resulting test MSE is higher than the corresponding MSE obtained via OLS, so on this dataset ridge regression does not appear to be a superior choice.
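A comparison of this kind can be sketched by fitting both models on the same split and scoring them on the same test set. The example below uses synthetic data from `make_regression` as a stand-in (an assumption, since the OLS fit is not shown in this section), so its numbers say nothing about the Boston result.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption; not the Boston dataset)
X, y = make_regression(n_samples=200, n_features=11, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fit both models on the same training data and score them on the same test data
models = {"OLS": LinearRegression(), "Ridge(alpha=0.125)": Ridge(alpha=0.125)}
test_mse = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    test_mse[name] = mean_squared_error(y_test, model.predict(X_test))

for name, mse in test_mse.items():
    print(f"{name}: test MSE = {mse:.3f}")
```

Fixing `random_state` keeps the split identical across runs, which makes the two test errors directly comparable.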