# Employing Ridge Regression
*Curtis Miller*

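As mentioned before, OLS is prone to overfitting: it can find spurious fits to noise in the training data. **Ridge regression** guards against this by adding a penalty on the size of the coefficients (L2 regularization). The strength of that penalty is controlled by the hyperparameter $\alpha$. A larger $\alpha$ means stronger regularization: coefficients are shrunk more toward zero, which reduces overfitting at the cost of some bias.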
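For reference, scikit-learn's `Ridge` documents its objective as

$$\min_{w} \; \lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_2^2,$$

with the intercept (when fitted) left unpenalized. The sketch below uses synthetic data, not the Boston data, and a hypothetical helper `ridge_closed_form`. It solves this objective in closed form on centered data, checks the result against `Ridge`, and shows the coefficient norm shrinking as $\alpha$ grows.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic data for illustration only (not the Boston data).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
true_w = np.array([3.0, -2.0, 0.5, 0.0, 1.0])
y = X @ true_w + 4.0 + rng.normal(scale=0.5, size=100)

def ridge_closed_form(X, y, alpha):
    """Hypothetical helper: minimize ||y - Xw||^2 + alpha*||w||^2, intercept unpenalized."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean          # centering takes the intercept out of the penalty
    p = X.shape[1]
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(p), Xc.T @ yc)
    b = y_mean - x_mean @ w                  # recover the intercept
    return w, b

norms = []
for alpha in [0.1, 10.0, 1000.0]:
    w, b = ridge_closed_form(X, y, alpha)
    model = Ridge(alpha=alpha).fit(X, y)
    # The hand-rolled solution should agree with scikit-learn's fit.
    assert np.allclose(w, model.coef_) and np.isclose(b, model.intercept_)
    norms.append(np.linalg.norm(w))
    print(f"alpha={alpha:>7}: ||w|| = {norms[-1]:.3f}")
```

The printed norms should decrease as $\alpha$ increases, which is the shrinkage described above.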


We will work with the Boston housing price dataset. In an earlier video we saw that removing some features may lead to better models, so we remove those same features (INDUS and AGE, columns 2 and 6) here as well.

In [1]:
from sklearn.datasets import load_boston    # removed in scikit-learn 1.2; requires an older version
from sklearn.model_selection import train_test_split
import numpy as np

In [2]:
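boston_obj = load_boston()
# Random 75/25 split; no random_state is set, so results differ from run to run
data_train, data_test, price_train, price_test = train_test_split(boston_obj.data, boston_obj.target)
# Drop columns 2 (INDUS) and 6 (AGE), as in the earlier video
data_train = np.delete(data_train, [2, 6], axis=1)
data_test = np.delete(data_test, [2, 6], axis=1)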

data_train[:5, :]

array([[1.27346e+00, 0.00000e+00, 1.00000e+00, 6.05000e-01, 6.25000e+00,
        1.79840e+00, 5.00000e+00, 4.03000e+02, 1.47000e+01, 3.38920e+02,
        5.50000e+00],
       [3.23700e-02, 0.00000e+00, 0.00000e+00, 4.58000e-01, 6.99800e+00,
        6.06220e+00, 3.00000e+00, 2.22000e+02, 1.87000e+01, 3.94630e+02,
        2.94000e+00],
       [8.24400e-02, 3.00000e+01, 0.00000e+00, 4.28000e-01, 6.48100e+00,
        6.18990e+00, 6.00000e+00, 3.00000e+02, 1.66000e+01, 3.79410e+02,
        6.36000e+00],
       [7.15100e-02, 0.00000e+00, 0.00000e+00, 4.49000e-01, 6.12100e+00,
        3.74760e+00, 3.00000e+00, 2.47000e+02, 1.85000e+01, 3.95150e+02,
        8.44000e+00],
       [1.10690e-01, 0.00000e+00, 1.00000e+00, 5.50000e-01, 5.95100e+00,
        2.88930e+00, 5.00000e+00, 2.76000e+02, 1.64000e+01, 3.96900e+02,
        1.79200e+01]])
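The `np.delete` call above removes columns by position, so it is worth auditing which features it actually drops. One way is to apply the same call to the column names. The names below are the 13 Boston features in the dataset's standard order; they are hardcoded here (rather than read from `boston_obj.feature_names`) so the sketch runs without the dataset.

```python
import numpy as np

# The 13 Boston housing column names, in the dataset's standard order.
feature_names = np.array([
    "CRIM", "ZN", "INDUS", "CHAS", "NOX", "RM", "AGE",
    "DIS", "RAD", "TAX", "PTRATIO", "B", "LSTAT",
])

# Same positional deletion as on the data (a 1-D array, so no axis is needed).
kept = np.delete(feature_names, [2, 6])
dropped = feature_names[[2, 6]]
print("dropped:", dropped.tolist())   # dropped: ['INDUS', 'AGE']
print("kept:", kept.tolist())
```

The 11 retained names match, in order, the features annotated in the example prediction below.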

## Fitting With Ridge Regression

Ridge regression is implemented in the `Ridge` object supplied by **scikit-learn**.

In [3]:
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import cross_val_score

In [4]:
ridge1 = Ridge(alpha=1)    # alpha is a hyperparameter controlling regularization
ridge1.fit(data_train, price_train)
ridge1.predict([[    # Predict for one hypothetical tract: one row, 11 features
    1,      # CRIM: per capita crime rate
    25,     # ZN: proportion of residential land zoned for lots over 25,000 sq. ft.
    1,      # CHAS: 1 if the tract bounds the Charles River, else 0
    0.3,    # NOX: nitric oxides concentration (parts per 10 million)
    10,     # RM: average number of rooms per dwelling
    10,     # DIS: weighted distance to five Boston employment centres
    3,      # RAD: index of accessibility to radial highways
    400,    # TAX: full-value property-tax rate per $10,000
    15,     # PTRATIO: pupil-teacher ratio
    200,    # B: index based on the proportion of Black residents
    5       # LSTAT: % lower status of the population
]])

array([40.3079661])
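Ridge regression changes how the coefficients are estimated, but the fitted model is still linear: `predict()` returns `X @ coef_ + intercept_`. A minimal check on synthetic data (random values shaped like the 11 retained features, not the Boston data):

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 11))        # synthetic stand-in for the 11 retained features
y = rng.normal(size=50)

model = Ridge(alpha=1).fit(X, y)
x_new = rng.normal(size=(1, 11))     # one new observation, passed as a 2-D array

# A prediction is just the weighted sum of features plus the intercept.
manual = x_new @ model.coef_ + model.intercept_
print(model.predict(x_new), manual)
```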

In [5]:
predprice = ridge1.predict(data_train)
mean_squared_error(price_train, predprice)    # in-sample (training) MSE

21.351348759006743
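The number above is the mean of squared residuals on the same data the model was fitted to, so it is an optimistic estimate of out-of-sample error; that is why cross-validation and a held-out test set follow. A small sketch with made-up values showing that `mean_squared_error` is exactly this average:

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Made-up targets and predictions, for illustration only.
y_true = np.array([20.0, 30.0, 25.0])
y_pred = np.array([21.0, 29.0, 22.0])

# Residuals are -1, 1, 3, so the squared errors are 1, 1, 9 and their mean is 11/3.
mse_manual = np.mean((y_true - y_pred) ** 2)
print(mse_manual, mean_squared_error(y_true, y_pred))
```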

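We can use cross-validation to pick a good value for $\alpha$. I will use `cross_val_score()` with 10 folds, scoring each candidate by negative mean squared error. scikit-learn scorers follow a "higher is better" convention, so MSE is reported with its sign flipped: scores closer to zero are better.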

In [6]:
import pandas as pd
from pandas import DataFrame

In [7]:
alpha = [.125, .25, .5, 1, 2, 4, 8, 16, 32, 64, 128]    # Candidate alphas
res = dict()    # Maps each alpha to its 10 per-fold scores

for a in alpha:
    ridge2 = Ridge(alpha=a)
    # One negative MSE per fold; closer to zero is better
    res[a] = cross_val_score(ridge2, data_train, price_train, scoring='neg_mean_squared_error', cv=10)

res_df = DataFrame(res)

res_df

| Fold | $\alpha$ = 0.125 | 0.25 | 0.5 | 1.0 | 2.0 | 4.0 | 8.0 | 16.0 | 32.0 | 64.0 | 128.0 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -21.136397 | -21.174217 | -21.239738 | -21.335265 | -21.439335 | -21.512837 | -21.543107 | -21.581599 | -21.734692 | -22.127897 | -22.87087 |
| 1 | -35.295151 | -35.369547 | -35.482326 | -35.61342 | -35.691366 | -35.608797 | -35.290118 | -34.770318 | -34.242058 | -33.991896 | -34.178383 |
| 2 | -47.688921 | -47.722503 | -47.790422 | -47.916107 | -48.12048 | -48.419823 | -48.838968 | -49.409453 | -50.166443 | -51.200846 | -52.684054 |
| 3 | -14.23417 | -14.217945 | -14.196529 | -14.173981 | -14.152665 | -14.130729 | -14.124144 | -14.205387 | -14.511105 | -15.171413 | -16.178626 |
| 4 | -36.5014 | -36.581716 | -36.702679 | -36.844673 | -36.940679 | -36.893613 | -36.636722 | -36.197699 | -35.740667 | -35.559596 | -35.988827 |
| 5 | -21.158588 | -21.178485 | -21.21977 | -21.292714 | -21.395285 | -21.523307 | -21.728595 | -22.172775 | -23.078143 | -24.5412 | -26.347936 |
| 6 | -20.030027 | -19.956883 | -19.857809 | -19.750478 | -19.651889 | -19.548398 | -19.40462 | -19.218301 | -19.107265 | -19.356012 | -20.194475 |
| 7 | -10.969761 | -11.003824 | -11.079949 | -11.223218 | -11.436194 | -11.694832 | -11.988998 | -12.348718 | -12.836282 | -13.52275 | -14.393156 |
| 8 | -12.562399 | -12.441207 | -12.280951 | -12.123326 | -12.029545 | -12.037861 | -12.165042 | -12.444006 | -12.934736 | -13.673027 | -14.549999 |
| 9 | -13.282412 | -13.18809 | -13.066362 | -12.946314 | -12.854185 | -12.780166 | -12.707226 | -12.678688 | -12.840152 | -13.426312 | -14.650781 |
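The loop above is a manual grid search. scikit-learn's `RidgeCV` wraps the same idea: given `alphas`, `cv` and `scoring`, it cross-validates every candidate, refits on all of the training data with the best one, and exposes the choice as `alpha_` (and its mean score as `best_score_`). A sketch on synthetic data from `make_regression`, standing in for the Boston training split:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import RidgeCV

# Synthetic regression problem standing in for the training split.
X, y = make_regression(n_samples=200, n_features=11, noise=10.0, random_state=0)

alphas = [.125, .25, .5, 1, 2, 4, 8, 16, 32, 64, 128]   # same candidate grid as above
# With an explicit cv, RidgeCV grid-searches alphas using the given scoring.
ridge_cv = RidgeCV(alphas=alphas, cv=10, scoring="neg_mean_squared_error")
ridge_cv.fit(X, y)

print("chosen alpha:", ridge_cv.alpha_)
print("best mean CV score (negative MSE):", ridge_cv.best_score_)
```

The manual loop remains useful when you want to inspect every fold's score, as in the table above.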


In [8]:
res_df.mean()

0.125     -23.285923
0.250     -23.283442
0.500     -23.291653
1.000     -23.321950
2.000     -23.371162
4.000     -23.415037
8.000     -23.442754
16.000    -23.502694
32.000    -23.719154
64.000    -24.257095
128.000   -25.203711
dtype: float64
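Because the scores are negated MSEs, the best $\alpha$ is the one with the *largest* mean score, which `idxmax()` finds directly (in the notebook, `res_df.mean().idxmax()`). The sketch below rebuilds the printed means by hand so it runs on its own:

```python
import pandas as pd

# Mean cross-validated scores (negative MSE) copied from the output above.
mean_scores = pd.Series({
    0.125: -23.285923, 0.25: -23.283442, 0.5: -23.291653, 1.0: -23.321950,
    2.0: -23.371162, 4.0: -23.415037, 8.0: -23.442754, 16.0: -23.502694,
    32.0: -23.719154, 64.0: -24.257095, 128.0: -25.203711,
})

# Largest (least negative) score means smallest MSE.
best_alpha = mean_scores.idxmax()
print(best_alpha)          # 0.25
print(-mean_scores.max())  # the corresponding mean CV MSE
```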

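Small values of $\alpha$ give the lowest cross-validated MSE (recall that the scores are negated, so values closer to zero are better), and the error grows steadily once $\alpha$ exceeds 1. Strictly speaking, $\alpha = 0.25$ has the best mean score (-23.283 versus -23.286 for $\alpha = 0.125$), but that difference is negligible next to the fold-to-fold variation, so I will use $\alpha = 0.125$. Let's now see how ridge regression with this $\alpha$ performs on the test set.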
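As $\alpha \to 0$ the penalty vanishes and ridge regression reduces to OLS, so a cross-validation curve that is best at the small end of the grid hints that regularization buys little on this data. A sketch on synthetic data comparing `Ridge` coefficients with `LinearRegression` coefficients:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 11))                       # synthetic features
y = X @ rng.normal(size=11) + rng.normal(size=200)   # synthetic linear target

ols = LinearRegression().fit(X, y)
gaps = []
for alpha in [1e-6, 1.0, 100.0]:
    ridge = Ridge(alpha=alpha).fit(X, y)
    # Largest absolute difference between ridge and OLS coefficients.
    gaps.append(np.max(np.abs(ridge.coef_ - ols.coef_)))
    print(f"alpha={alpha:g}: max |ridge coef - OLS coef| = {gaps[-1]:.2e}")
```

With a tiny $\alpha$ the two coefficient vectors are practically identical; the gap widens as $\alpha$ grows.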

In [9]:
ridge3 = Ridge(alpha=0.125)
ridge3.fit(data_train, price_train)

testpredprice = ridge3.predict(data_test)
mean_squared_error(price_test, testpredprice)

25.896595152731784

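This test MSE is higher than the one obtained with OLS earlier, so ridge regression does not appear to be a better choice for this problem. Keep in mind that `train_test_split()` was called without a fixed `random_state`, so the two models were probably scored on different test sets; a fair comparison would evaluate both on the same split.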
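A fairer comparison fits every model on one fixed split. The sketch below does this on synthetic data from `make_regression` (so its numbers will not match the ones above). It also includes a standardized ridge pipeline, a common refinement that is *not* used in this notebook: the ridge penalty treats all coefficients alike, and the Boston features sit on very different scales (TAX in the hundreds, NOX below 1), so scaling first makes the penalty act more evenly.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=300, n_features=11, noise=15.0, random_state=0)
# Fixing random_state means every model sees exactly the same train/test split.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

models = {
    "OLS": LinearRegression(),
    "Ridge(alpha=0.125)": Ridge(alpha=0.125),
    # Standardize features before applying the penalty.
    "Scaled Ridge(alpha=0.125)": make_pipeline(StandardScaler(), Ridge(alpha=0.125)),
}

test_mse = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    test_mse[name] = mean_squared_error(y_test, model.predict(X_test))
    print(f"{name:>26}: test MSE = {test_mse[name]:.3f}")
```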