# **Ridge regression**

## **Written by:** Aarish Asif Khan

## **Date:** 24 February 2024

Ridge regression is linear regression with L2 regularization. It is used to mitigate multicollinearity (high correlation between predictors) and overfitting in predictive modeling.

In standard linear regression, the model minimizes the sum of squared residuals between the observed and predicted values. When predictors are highly correlated, however, the estimated coefficients become highly sensitive to small changes in the data, which leads to overfitting. Ridge regression counters this by adding a penalty on the size of the coefficients.
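
Written as a formula, the ridge objective is the least-squares loss plus a penalty on the squared coefficients. The symbols below ($\beta_0$ for the intercept, $\beta$ for the coefficient vector, $n$ samples, $p$ features) are notation assumed here for illustration; $\alpha \ge 0$ corresponds to the `alpha` argument of scikit-learn's `Ridge` used in the cells that follow.

$$
\hat{\beta}^{\text{ridge}} = \arg\min_{\beta_0,\, \beta} \; \sum_{i=1}^{n} \left( y_i - \beta_0 - x_i^\top \beta \right)^2 + \alpha \sum_{j=1}^{p} \beta_j^2
$$

Setting $\alpha = 0$ recovers ordinary least squares, while larger values of $\alpha$ pull the coefficients more strongly toward zero. The intercept $\beta_0$ is not penalized.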

In [1]:
# Importing necessary libraries
import numpy as np

from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

In [2]:
# Generating example data
np.random.seed(0)
X = np.random.rand(100, 5)  # 100 samples, 5 features
y = 3 * X[:,0] + 2 * X[:,1] - 5 * X[:,2] + np.random.randn(100)  # linear combination with noise

In [3]:
# Splitting the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [4]:
# Standardizing the features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

In [6]:
# Instantiate Ridge Regression model
ridge_reg = Ridge(alpha=1.0)  # Alpha = 1.0 (regularization strength)

# Fit the model
ridge_reg.fit(X_train_scaled, y_train)

In [7]:
# Predict on test data
y_pred = ridge_reg.predict(X_test_scaled)

In [8]:
# Evaluate the model
mse = mean_squared_error(y_test, y_pred)
print("Mean Squared Error:", mse)

Mean Squared Error: 0.7973839258412255


In [9]:
# Inspect the fitted parameters (coefficients are on the standardized feature scale)
coefficients = ridge_reg.coef_
intercept = ridge_reg.intercept_
print("Coefficients:", coefficients)
print("Intercept:", intercept)


Coefficients: [ 0.82963334  0.48726674 -1.33920137  0.08776647 -0.07979223]
Intercept: -0.09720125320157909
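
The coefficients above are expressed per standard deviation of each feature, not per original unit. A uniform feature on [0, 1] has a standard deviation of about 0.29, which is roughly why the first coefficient is near 0.83 and not 3. The sketch below is a minimal check under stated assumptions: it regenerates similar synthetic data with its own random generator (so the numbers will not match the output above), compares scikit-learn's fit with the textbook closed-form ridge solution, and converts the coefficients back to the original feature units. The names `beta_closed`, `coef_original` and `intercept_original` are introduced here for illustration only.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.preprocessing import StandardScaler

# Synthetic data in the same spirit as the example above (illustrative only)
rng = np.random.default_rng(0)
X = rng.random((100, 5))
y = 3 * X[:, 0] + 2 * X[:, 1] - 5 * X[:, 2] + rng.standard_normal(100)

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

alpha = 1.0
model = Ridge(alpha=alpha).fit(X_scaled, y)

# Closed form: beta = (X^T X + alpha * I)^{-1} X^T (y - mean(y)).
# This simple form applies because the standardized columns have zero mean,
# so the unpenalized intercept is just mean(y).
n_features = X_scaled.shape[1]
y_centered = y - y.mean()
beta_closed = np.linalg.solve(
    X_scaled.T @ X_scaled + alpha * np.eye(n_features),
    X_scaled.T @ y_centered,
)

# Convert the standardized coefficients back to the original feature units
coef_original = model.coef_ / scaler.scale_
intercept_original = model.intercept_ - np.sum(coef_original * scaler.mean_)

print("sklearn coefficients:       ", model.coef_)
print("closed-form coefficients:   ", beta_closed)
print("original-scale coefficients:", coef_original)
print("original-scale intercept:   ", intercept_original)
```

On the original scale, the first three coefficients should land in the neighbourhood of the generating values 3, 2 and -5, shrunk slightly by the penalty and perturbed by noise.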


# **Comparison between the Ridge regression model and Linear regression**

Ridge Regression and Linear Regression both fit a linear model to the data, but they differ in how they handle overfitting and multicollinearity.

# **Linear Regression:**

* In Linear Regression, the goal is to find the best-fitting linear relationship between the independent variables (features) and the dependent variable (target).

* It aims to minimize the sum of squared differences between the observed and predicted values.

* Linear Regression does not impose any constraints on the coefficients of the features.

* It is susceptible to overfitting when the number of features is large relative to the number of observations, or when the features are highly correlated.
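
The sensitivity to correlated features can be seen in a small simulation. The sketch below is illustrative only: the helper `fit_ols_coefs`, the two-feature setup and the noise scales are assumptions made for this example. It refits ordinary least squares on many fresh synthetic samples and measures how much the estimated coefficients vary from one sample to the next.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)


def fit_ols_coefs(noise_scale, n_repeats=200, n_samples=50):
    """Fit OLS on repeated synthetic draws and return all coefficient vectors."""
    coefs = []
    for _ in range(n_repeats):
        x1 = rng.standard_normal(n_samples)
        # x2 is x1 plus independent noise; a small noise_scale makes x2 nearly a copy of x1
        x2 = x1 + noise_scale * rng.standard_normal(n_samples)
        X = np.column_stack([x1, x2])
        y = x1 + x2 + rng.standard_normal(n_samples)  # true coefficients: (1, 1)
        coefs.append(LinearRegression().fit(X, y).coef_)
    return np.array(coefs)


# Spread of the estimated coefficients across the repeated fits
std_near_collinear = fit_ols_coefs(noise_scale=0.01).std(axis=0)
std_less_correlated = fit_ols_coefs(noise_scale=1.0).std(axis=0)

print("Coefficient std, nearly collinear features:", std_near_collinear)
print("Coefficient std, less correlated features: ", std_less_correlated)
```

When the two features are almost identical, the data cannot tell their individual effects apart, so the fitted coefficients swing widely between samples even though their sum stays close to 2.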

# **Ridge Regression:**

* Ridge Regression is a regularization technique that adds a penalty term to the linear regression objective function.

* This penalty term is proportional to the squared L2 norm of the coefficients, that is, the sum of their squared values.

* The regularization parameter (alpha) controls the strength of the penalty term.

* Ridge Regression shrinks the coefficients of the features toward zero (without setting them exactly to zero), which reduces the effect of multicollinearity and helps prevent overfitting.

* It tends to produce more stable and robust models compared to linear regression when multicollinearity is present in the dataset.
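
The effect of alpha on the coefficients can be sketched as follows. The two nearly identical synthetic features and the grid of alpha values are assumptions chosen for illustration, not part of the analysis in this notebook.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
n_samples = 50
x1 = rng.standard_normal(n_samples)
x2 = x1 + 0.01 * rng.standard_normal(n_samples)  # nearly a copy of x1
X = np.column_stack([x1, x2])
y = x1 + x2 + rng.standard_normal(n_samples)  # true coefficients: (1, 1)

alphas = [0.001, 0.1, 1.0, 10.0, 100.0]
norms = []
for alpha in alphas:
    coef = Ridge(alpha=alpha).fit(X, y).coef_
    norms.append(np.linalg.norm(coef))  # L2 norm of the coefficient vector
    print(f"alpha={alpha:>7}: coef={coef}, L2 norm={norms[-1]:.3f}")
```

The L2 norm of the coefficient vector does not increase as alpha grows. With almost no penalty the two coefficients can take large offsetting values, whereas a moderate alpha spreads the effect more evenly across the correlated features.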

In [11]:
# Importing necessary libraries
import pandas as pd
import seaborn as sns 
import matplotlib.pyplot as plt 

from sklearn.model_selection import train_test_split
from sklearn.linear_model import Ridge, LinearRegression
from sklearn.metrics import accuracy_score

In [12]:
# Load the Titanic dataset
df = sns.load_dataset('titanic')

In [16]:
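# Data preprocessing
# Note: dropna() removes every row with a missing value in ANY column, not only
# in the columns selected below (the 'deck' column is mostly empty), so only a
# small part of the dataset remains.
df.dropna(inplace=True)

X = df[['age', 'fare', 'pclass']]
y = df['survived']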

In [17]:
# Splitting the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [18]:
# Ridge Regression
# 'survived' is a binary label, so the continuous predictions are rounded to the
# nearest integer before accuracy is computed.
ridge_reg = Ridge(alpha=1.0)  # Instantiate Ridge Regression model with alpha=1.0
ridge_reg.fit(X_train, y_train)  # Fit the model
ridge_pred = ridge_reg.predict(X_test)  # Predict on test set
ridge_accuracy = accuracy_score(y_test, ridge_pred.round())  # Calculate accuracy
print("Ridge Regression Accuracy:", ridge_accuracy)

Ridge Regression Accuracy: 0.7027027027027027


In [19]:
# Linear Regression
# As with Ridge above, the continuous predictions are rounded to 0/1 for accuracy.
linear_reg = LinearRegression()  # Instantiate Linear Regression model
linear_reg.fit(X_train, y_train)  # Fit the model
linear_pred = linear_reg.predict(X_test)  # Predict on test set
linear_accuracy = accuracy_score(y_test, linear_pred.round())  # Calculate accuracy
print("Linear Regression Accuracy:", linear_accuracy)


Linear Regression Accuracy: 0.7027027027027027
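
The two accuracies are identical. A plausible explanation is that with unscaled features and `alpha=1.0` the penalty is tiny compared with the least-squares term, so both models give almost the same predictions and rounding removes the remaining difference. The sketch below illustrates that effect on synthetic data; it does not use the Titanic data, and the stand-in features and the target rule are hypothetical values chosen only to mimic the feature ranges.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
n = 180

# Hypothetical stand-ins for age, fare and pclass (not the real Titanic data)
age = rng.uniform(1, 80, n)
fare = rng.uniform(0, 500, n)
pclass = rng.integers(1, 4, n)  # integer values 1, 2 or 3
X = np.column_stack([age, fare, pclass])

# Binary target loosely tied to the features, for illustration only
score = -0.01 * age + 0.002 * fare - 0.3 * pclass + rng.normal(0, 0.5, n)
y = (score > np.median(score)).astype(int)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

# Largest absolute difference between the two models' predictions
max_pred_gap = np.max(np.abs(ols.predict(X) - ridge.predict(X)))

print("OLS coefficients:  ", ols.coef_)
print("Ridge coefficients:", ridge.coef_)
print("Largest prediction difference:", max_pred_gap)
```

To see a visible difference between the two models, one option is to standardize the features first, as in the earlier synthetic example, and try larger alpha values.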
