# Ridge Regression

Ridge regression is a method we can use to fit a regression model when multicollinearity is present in the data, that is, when two or more predictors are highly correlated with each other.
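
One common way to check for multicollinearity is the variance inflation factor (VIF). The sketch below is illustrative only: it uses synthetic data rather than mtcars, the helper name `variance_inflation_factors` is hypothetical, and it relies on the fact that the diagonal of the inverse correlation matrix gives one VIF per predictor.

```python
import numpy as np

def variance_inflation_factors(X):
    """Return one VIF per column of X (rows are observations)."""
    # Correlation matrix of the predictors (columns are variables)
    corr = np.corrcoef(X, rowvar=False)
    # The j-th diagonal entry of the inverse correlation matrix equals VIF_j
    return np.diag(np.linalg.inv(corr))

# Synthetic predictors (assumed data, not mtcars): x2 is nearly a copy of x1
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + 0.1 * rng.normal(size=200)
x3 = rng.normal(size=200)
X_demo = np.column_stack([x1, x2, x3])

vifs = variance_inflation_factors(X_demo)
print(vifs)
```

A frequently quoted rule of thumb treats VIF values above roughly 5 to 10 as a sign of problematic multicollinearity. Here the first two columns should land far above that range, while the third stays close to 1.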

Least squares regression tries to minimize the sum of squared residuals (RSS).

Ridge regression instead seeks to minimize RSS plus a shrinkage penalty: lambda times the sum of the squared coefficients.
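
Written out, the two objectives look as follows. The notation is an assumption made for illustration: n observations, p predictors, coefficients β_j, and an intercept β_0 that is left out of the penalty.

```latex
\mathrm{RSS} = \sum_{i=1}^{n} \Bigl( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \Bigr)^2

\text{ridge objective} = \mathrm{RSS} + \lambda \sum_{j=1}^{p} \beta_j^2, \qquad \lambda \ge 0
```

With λ = 0 the penalty vanishes and the objective reduces to ordinary least squares. As λ grows, the coefficients are pulled toward zero.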

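In ridge regression, we select the value of lambda (called `alpha` in scikit-learn) that produces the lowest possible test error, as estimated by cross-validation. The test mean squared error is a common choice of error measure; the code below scores candidate values by mean absolute error instead.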
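
To see what the choice of lambda does, here is a minimal sketch of the closed-form ridge solution on centered data. It assumes synthetic, nearly collinear predictors, and the function name `ridge_coefficients` is made up for this example. It is not how scikit-learn solves the problem internally.

```python
import numpy as np

def ridge_coefficients(X, y, lam):
    """Closed-form ridge solution on centered data (intercept not penalized)."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    p = Xc.shape[1]
    # Solve (X'X + lam * I) beta = X'y
    return np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)

# Synthetic data (assumed for illustration): column 1 nearly duplicates column 0
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(50, 3))
X_demo[:, 1] = X_demo[:, 0] + 0.05 * rng.normal(size=50)
y_demo = 2.0 * X_demo[:, 0] - 1.0 * X_demo[:, 2] + rng.normal(size=50)

lambdas = [0.0, 1.0, 10.0, 100.0]
norms = [np.linalg.norm(ridge_coefficients(X_demo, y_demo, lam)) for lam in lambdas]
for lam, norm in zip(lambdas, norms):
    print(f"lambda={lam:>6}: coefficient norm = {norm:.3f}")
```

The coefficient norm shrinks as lambda increases, which is the trade-off cross-validation is tuning: more shrinkage gives more stable coefficients at the cost of some bias.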

In [1]:
import pandas as pd
from numpy import arange
from sklearn.linear_model import Ridge
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import RepeatedKFold

In [2]:
#define URL where data is located
url = "https://raw.githubusercontent.com/Statology/Python-Guides/main/mtcars.csv"

#read in data
data_full = pd.read_csv(url)

#select subset of data
data = data_full[["mpg", "wt", "drat", "qsec", "hp"]]

#view first six rows of data
data.head(6)

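|   | mpg  | wt    | drat | qsec  | hp  |
|---|------|-------|------|-------|-----|
| 0 | 21.0 | 2.62  | 3.9  | 16.46 | 110 |
| 1 | 21.0 | 2.875 | 3.9  | 17.02 | 110 |
| 2 | 22.8 | 2.32  | 3.85 | 18.61 | 93  |
| 3 | 21.4 | 3.215 | 3.08 | 19.44 | 110 |
| 4 | 18.7 | 3.44  | 3.15 | 17.02 | 175 |
| 5 | 18.1 | 3.46  | 2.76 | 20.22 | 105 |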


In [8]:
#define predictor and response variables
X = data[["mpg", "wt", "drat", "qsec"]]
y = data["hp"]

#define cross-validation method to evaluate model
cv = RepeatedKFold(n_splits=10, n_repeats=3, random_state=1)

#define model: try lambda (alpha) values from 0.01 to 0.99, scored by mean absolute error
model = RidgeCV(alphas=arange(0.01, 1, 0.01), cv=cv, scoring='neg_mean_absolute_error')

#fit model
model.fit(X, y)
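
Because the ridge penalty acts on the size of the coefficients, predictors measured on very different scales are penalized unevenly. A common remedy, not used in the cell above, is to standardize the predictors first. The sketch below assumes synthetic data and illustrative variable names, and uses RidgeCV's default cross-validation rather than the repeated k-fold setup.

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic predictors on very different scales (assumed data, not mtcars)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(60, 4)) * np.array([1.0, 10.0, 100.0, 1000.0])
y_demo = X_demo @ np.array([3.0, 0.2, 0.01, 0.002]) + rng.normal(size=60)

# Standardize each predictor, then fit ridge with a cross-validated alpha
pipeline = make_pipeline(StandardScaler(), RidgeCV(alphas=np.arange(0.01, 1, 0.01)))
pipeline.fit(X_demo, y_demo)

# make_pipeline names each step after its lowercased class name
fitted_ridge = pipeline.named_steps["ridgecv"]
print(fitted_ridge.alpha_, fitted_ridge.coef_)
```

The coefficients reported this way are on the standardized scale, so they are comparable across predictors but not directly comparable to coefficients fitted on the raw columns.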

In [9]:
#define new observation, with the same column names as X (values: mpg, wt, drat, qsec)
new = pd.DataFrame([[24, 2.5, 3.5, 18.5]], columns=X.columns)

#predict hp value using ridge regression model
model.predict(new)



array([104.16398018])

In [10]:
#display the lambda (alpha) value that produced the lowest cross-validated error
model.alpha_

0.99
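
The selected value, 0.99, is the largest value in the grid `arange(0.01, 1, 0.01)`, which suggests the best lambda may lie beyond the range that was searched. One possible follow-up is a wider, log-spaced grid. The sketch below uses synthetic data and an assumed grid of 1e-3 to 1e3; it does not report results for mtcars.

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import RepeatedKFold

# Synthetic stand-in for the real data (assumed for illustration)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(32, 4))
y_demo = X_demo @ np.array([1.0, -2.0, 0.5, 0.0]) + rng.normal(size=32)

# Log-spaced grid covers several orders of magnitude (range is an assumption)
alphas = np.logspace(-3, 3, 25)
cv = RepeatedKFold(n_splits=10, n_repeats=3, random_state=1)
wide_model = RidgeCV(alphas=alphas, cv=cv, scoring="neg_mean_absolute_error")
wide_model.fit(X_demo, y_demo)

# If the winner still sits on an edge of the grid, widen the grid again
at_edge = wide_model.alpha_ in (alphas[0], alphas[-1])
print(wide_model.alpha_, at_edge)
```

Checking whether the chosen value sits on the edge of the grid is a quick sanity test that the search range was wide enough.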
