#### About

> Ridge Regression

Ridge regression is a regularized linear regression technique.

Linear regression is a widely used method for fitting a linear model to a dataset with one or more input features. However, it can overfit or produce unstable coefficient estimates when the features are highly correlated (multicollinearity) or when there are many features relative to the number of samples. Ridge regression, which is linear regression with L2 regularization, addresses these issues by adding a penalty term to the linear regression objective function, yielding a more stable and robust model.

> The Ridge Regression Objective Function

The Ridge Regression objective function is a modified version of the ordinary least squares (OLS) objective function used in linear regression. The Ridge objective function is given by:

$$J(\beta) = \mathrm{RSS}(\beta) + \alpha \lVert \beta \rVert_2^2$$

where
- $J(\beta)$ is the ridge objective function.
- $\mathrm{RSS}(\beta)$ is the residual sum of squares, i.e. the sum of squared differences between the actual and predicted values.
- $\beta$ is the vector of regression coefficients.
- $\alpha \ge 0$ is the regularization hyperparameter, also known as the ridge hyperparameter.
- $\lVert \beta \rVert_2^2$ is the squared L2 norm of the regression coefficients.

The Ridge objective function consists of two terms: the RSS term, which measures the fit of the model to the training data, and the penalty term, which is proportional to the square of the L2 norm of the regression coefficients. The penalty term introduces a bias towards smaller coefficient values, allowing Ridge regression to shrink the coefficients and reduce their variance, making the model more stable and less prone to overfitting.
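As a minimal sketch of the two terms described above, the objective can be evaluated directly with NumPy. The function name `ridge_objective` and the toy numbers are illustrative assumptions, and the model here has no intercept.

```python
import numpy as np

def ridge_objective(X, y, beta, alpha):
    """Return RSS(beta) + alpha * ||beta||_2^2 for a model without an intercept."""
    residuals = y - X @ beta          # actual minus predicted values
    rss = residuals @ residuals       # residual sum of squares
    penalty = alpha * (beta @ beta)   # alpha times the squared L2 norm
    return rss + penalty

# Tiny hand-checkable example (made-up numbers): X @ beta equals y exactly
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([1.0, 2.0, 3.0])
beta = np.array([1.0, 2.0])

print(ridge_objective(X, y, beta, alpha=0.0))  # → 0.0 (perfect fit, no penalty)
print(ridge_objective(X, y, beta, alpha=0.5))  # → 2.5 (penalty is 0.5 * (1 + 4))
```

Even with a perfect fit, a non-zero alpha makes the objective grow with the size of the coefficients, which is what pushes the minimizer towards smaller values.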

> Closed form solution of Ridge Regression

Ridge regression can be solved in closed form, much like ordinary least squares. The Ridge regression coefficients are given by:

$$\beta_{\text{ridge}} = (X^T X + \alpha I)^{-1} X^T y$$

where
- $\beta_{\text{ridge}}$ is the vector of ridge regression coefficients.
- $X$ is the matrix of input features.
- $y$ is the vector of target values.
- $I$ is the identity matrix.
- $\alpha$ is the regularization parameter.

The Ridge regression coefficients can be computed directly from this closed-form solution using matrix multiplication and inversion. The regularization parameter $\alpha$ controls the strength of the penalty: larger values of $\alpha$ shrink the coefficients more strongly, while $\alpha = 0$ recovers the ordinary least squares solution (when $X^T X$ is invertible).
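The closed-form solution can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not how any particular library implements it: the helper name `ridge_closed_form` and the synthetic data are made up, there is no intercept term, and the linear system is solved with `np.linalg.solve` rather than by forming the inverse explicitly.

```python
import numpy as np

def ridge_closed_form(X, y, alpha):
    """Solve (X^T X + alpha * I) beta = X^T y for beta (no intercept)."""
    n_features = X.shape[1]
    A = X.T @ X + alpha * np.eye(n_features)
    return np.linalg.solve(A, X.T @ y)

# Synthetic data (illustrative assumption): 50 samples, 3 features
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
true_beta = np.array([2.0, -1.0, 0.5])
y = X @ true_beta + rng.normal(scale=0.1, size=50)

# Larger alpha values should give coefficient vectors with a smaller norm
norms = []
for alpha in [0.0, 1.0, 10.0, 100.0]:
    beta = ridge_closed_form(X, y, alpha)
    norms.append(float(np.linalg.norm(beta)))
    print(f"alpha={alpha:>6}: beta={np.round(beta, 3)}, ||beta||={norms[-1]:.3f}")
```

Note that scikit-learn's `Ridge` fits an unpenalized intercept by default, so its coefficients match this formula only with `fit_intercept=False` or on centered data.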





In [2]:
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

In [3]:
from sklearn.datasets import load_diabetes
diabetes = load_diabetes()
X = diabetes.data
y = diabetes.target

In [4]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)


In [5]:
# Initialize the Ridge regression model with alpha = 1.0
ridge = Ridge(alpha=1.0)


In [6]:
# Fit the Ridge model to the training data
ridge.fit(X_train, y_train)

In [7]:
# Predict on the test data
y_pred = ridge.predict(X_test)


In [8]:
# Calculate mean squared error
mse = mean_squared_error(y_test, y_pred)

In [9]:
print("Mean Squared Error:", mse)


Mean Squared Error: 3077.41593882723
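The run above fixes alpha at 1.0. As a hedged sketch of one way to pick alpha instead, scikit-learn's `RidgeCV` can choose it by cross-validation on the training split. The candidate grid `alphas` and the 5-fold setting are illustrative assumptions, not values from the original notebook.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import RidgeCV
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Same dataset and split as in the cells above
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Candidate alphas (assumed grid): 20 values from 0.001 to 100 on a log scale
alphas = np.logspace(-3, 2, 20)

# Select alpha with 5-fold cross-validation on the training data only
ridge_cv = RidgeCV(alphas=alphas, cv=5)
ridge_cv.fit(X_train, y_train)

test_mse = mean_squared_error(y_test, ridge_cv.predict(X_test))
print("Selected alpha:", ridge_cv.alpha_)
print("Test MSE:", test_mse)
```

Keeping the test split out of the alpha search means the reported test error is not used to tune the hyperparameter.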


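> Use cases and limitations of ridge regression

1. Multicollinearity - When the input features are highly correlated with each other, ridge regression mitigates the issue by shrinking the coefficients and reducing their variance.

2. Overfitting - The penalty term discourages large coefficients, which helps prevent overfitting, especially when there are many features relative to the number of samples.

3. Outliers - Ridge regression still minimizes a squared-error loss, so it is not inherently robust to outliers. A robust loss (for example, Huber regression) is a better fit when outliers are the main concern.

4. Feature selection - Ridge regression shrinks coefficients towards zero but does not generally set them exactly to zero, so it does not perform feature selection on its own. Lasso (L1 regularization) is the usual choice when a sparse model is needed.

5. Interpretability - A ridge model is still a linear model, so it is easier to interpret than more complex ML models, while the penalty term encourages smaller coefficients and reduces the risk of overfitting.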
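The multicollinearity point can be illustrated with a small simulation. This is a sketch on synthetic data, and the sample sizes, noise levels, and `alpha=1.0` are all assumptions chosen for illustration: two nearly identical features are generated repeatedly, and the spread of the fitted coefficients is compared between ordinary least squares and ridge.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)

ols_coefs, ridge_coefs = [], []
for _ in range(200):
    # Two highly correlated features: x2 is x1 plus a tiny amount of noise
    x1 = rng.normal(size=50)
    x2 = x1 + rng.normal(scale=0.01, size=50)
    X = np.column_stack([x1, x2])
    y = x1 + x2 + rng.normal(scale=0.5, size=50)

    # Fit both models on the same sample and record the coefficients
    ols_coefs.append(LinearRegression().fit(X, y).coef_)
    ridge_coefs.append(Ridge(alpha=1.0).fit(X, y).coef_)

# Standard deviation of each coefficient across the 200 repeated fits
ols_std = np.std(ols_coefs, axis=0)
ridge_std = np.std(ridge_coefs, axis=0)

print("OLS coefficient std:  ", ols_std)
print("Ridge coefficient std:", ridge_std)
```

In this setup the OLS coefficients swing widely from sample to sample because the data barely distinguishes the two features, while the ridge coefficients stay much more stable.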

