## Ridge Regression — Matrix Form

Ridge regression adds an L2 penalty to the ordinary least squares (OLS) cost function: $\lambda$ times the sum of the squared coefficients. Penalizing large coefficients helps prevent overfitting and stabilizes the estimates when predictors are highly correlated (multicollinearity).
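Using the $y$, $X$ and $\beta$ defined in the matrix form, the ridge objective is the residual sum of squares plus $\lambda \beta^T \beta$. Setting its gradient with respect to $\beta$ to zero gives the closed-form coefficients:

$$J(\beta) = (y - X\beta)^T (y - X\beta) + \lambda\, \beta^T \beta$$

$$\nabla_\beta J = -2X^T (y - X\beta) + 2\lambda \beta = 0 \;\Longrightarrow\; (X^T X + \lambda I)\,\beta = X^T y \;\Longrightarrow\; \beta = (X^T X + \lambda I)^{-1} X^T y$$

The Hessian $2(X^T X + \lambda I)$ is positive definite for $\lambda > 0$, so this stationary point is the unique minimum.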

### Matrix Form

$$\hat{y} = X\beta$$

where:

- $y$: vector of observed outputs $(n \times 1)$, and $\hat{y}$: vector of predicted outputs (same size)
- $X$: design matrix containing all $p$ predictors (and usually a leading column of 1s for the intercept) $(n \times (p + 1))$
- $\beta$: vector of coefficients $((p + 1) \times 1)$
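A minimal NumPy sketch of these shapes on a small made-up dataset (the values are illustrative assumptions, not from the source): prepending a column of 1s turns an $n \times p$ array of predictors into the $n \times (p + 1)$ design matrix, so $\beta_0$ multiplies the constant column.

```python
import numpy as np

# Made-up data: n = 4 observations, p = 2 predictors
X_raw = np.array([[1.0, 2.0],
                  [2.0, 0.5],
                  [3.0, 1.5],
                  [4.0, 3.0]])
y = np.array([3.0, 2.5, 4.0, 6.0])

n, p = X_raw.shape
# Prepend a column of 1s so that beta[0] acts as the intercept
X = np.column_stack([np.ones(n), X_raw])

print(X.shape)  # (n, p + 1)
print(y.shape)  # (n,)
```

NumPy stores $y$ as a 1-D array of length $n$ rather than an explicit $n \times 1$ column; matrix products treat it the same way. `np.insert(X_raw, 0, 1, axis=1)` builds the same design matrix.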


### Formula for Ridge Coefficients

$$\beta = (X^T X + \lambda I)^{-1} X^T y$$

where:

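- $I$ is the identity matrix of size $(p + 1) \times (p + 1)$
- For any $\lambda > 0$, adding $\lambda I$ makes $X^T X + \lambda I$ positive definite, and therefore invertible, even when $X^T X$ is singular (for example, with perfectly collinear predictors or when $p + 1 > n$)
- In practice the intercept $\beta_0$ is usually left unpenalized by setting the first diagonal entry of $I$ to 0, as the implementation below does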
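The invertibility claim can be checked numerically. This sketch assumes synthetic data in which one predictor is an exact copy of another, so $X^T X$ is singular; the full identity matrix is used, matching the formula above, and the system is solved with `np.linalg.solve` instead of forming the inverse explicitly.

```python
import numpy as np

# Synthetic data: the third column duplicates the second (perfect collinearity)
rng = np.random.default_rng(0)
n = 20
x1 = rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x1])
y = 3.0 + 2.0 * x1 + rng.normal(scale=0.1, size=n)

XtX = X.T @ X
print("rank of X^T X:", np.linalg.matrix_rank(XtX), "of", XtX.shape[0])

lam = 1.0
A = XtX + lam * np.identity(X.shape[1])
print("rank of X^T X + lambda*I:", np.linalg.matrix_rank(A), "of", A.shape[0])

# Unique ridge solution of (X^T X + lambda*I) beta = X^T y
beta = np.linalg.solve(A, X.T @ y)
print("beta:", beta)
```

The two duplicated columns receive equal coefficients: the fit depends only on their sum, and for a fixed sum the penalty $\beta_1^2 + \beta_2^2$ is smallest when they are equal.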


### Interpretation

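- When $\lambda = 0$: ridge regression reduces to **Ordinary Least Squares (OLS)**, provided $X^T X$ is invertible

$$\beta = (X^T X)^{-1} X^T y$$

- As $\lambda \to \infty$: the penalized coefficients shrink toward **zero** (an unpenalized intercept instead approaches the mean of $y$)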
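Both limits can be seen on synthetic data (an assumption for illustration). The helper `ridge_closed_form` is hypothetical and penalizes every entry of $\beta$, matching the formula with the full identity matrix; OLS is computed independently with `np.linalg.lstsq`.

```python
import numpy as np

def ridge_closed_form(X, y, lam):
    """Return (X^T X + lam * I)^{-1} X^T y; every entry of beta is penalized."""
    A = X.T @ X + lam * np.eye(X.shape[1])
    # Solving the linear system avoids forming the inverse explicitly
    return np.linalg.solve(A, X.T @ y)

# Synthetic data: n = 50, p = 3, plus an intercept column
rng = np.random.default_rng(42)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
y = X @ np.array([1.0, 2.0, -1.0, 0.5]) + rng.normal(scale=0.3, size=n)

# lambda = 0 reproduces the OLS solution
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
print("matches OLS at lambda = 0:", np.allclose(ridge_closed_form(X, y, 0.0), beta_ols))

# Larger lambda shrinks the coefficient vector toward zero
lambdas = [0.0, 1.0, 10.0, 100.0, 1e6]
norms = [np.linalg.norm(ridge_closed_form(X, y, lam)) for lam in lambdas]
for lam, norm in zip(lambdas, norms):
    print(f"lambda = {lam:>9}: ||beta|| = {norm:.4f}")
```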

### Components of $\beta$
$$
\beta =
\begin{bmatrix}
\beta_0 \\[4pt]
\beta_1 \\[2pt]
\beta_2 \\[2pt]
\vdots \\[2pt]
\beta_p
\end{bmatrix}
$$

where:

- $ \beta_0 $: intercept  
- $ \beta_1, \beta_2, \dots, \beta_p $: slope coefficients (associated with each predictor)
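When $\beta_0$ is left out of the penalty (first diagonal entry of $I$ set to 0), the first row of $(X^T X + \lambda I)\beta = X^T y$ reads $n\beta_0 + \sum_{j=1}^{p} \left(\sum_i x_{ij}\right)\beta_j = \sum_i y_i$, so the intercept is $\beta_0 = \bar{y} - \sum_{j=1}^{p} \bar{x}_j \beta_j$. A quick check on synthetic data (an assumption for illustration), splitting $\beta$ into intercept and slopes:

```python
import numpy as np

# Synthetic, non-centered predictors
rng = np.random.default_rng(1)
n, p = 40, 4
X_raw = rng.normal(loc=5.0, size=(n, p))
y = 10.0 + X_raw @ rng.normal(size=p) + rng.normal(size=n)

X = np.column_stack([np.ones(n), X_raw])
lam = 5.0
I = np.identity(p + 1)
I[0, 0] = 0.0  # do not penalize the intercept beta_0

beta = np.linalg.solve(X.T @ X + lam * I, X.T @ y)
intercept, slopes = beta[0], beta[1:]  # beta_0 and beta_1..beta_p

# The intercept absorbs the means: beta_0 = mean(y) - mean(X) . slopes
print(np.isclose(intercept, y.mean() - X_raw.mean(axis=0) @ slopes))
```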

```python
import numpy as np
import pandas as pd

from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

from sklearn.datasets import load_diabetes
```

```python
X, y = load_diabetes(return_X_y=True)
```

```python
X.shape
```

```
(442, 10)
```

```python
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```

```python
class RidgeRegression:

    def __init__(self, alpha=0.01):  # alpha plays the role of lambda
        self.alpha = alpha
        self.intercept_ = None
        self.coef_ = None

    def fit(self, X_train, y_train):
        # Prepend a column of 1s so that betas[0] is the intercept
        X_train = np.insert(X_train, 0, 1, axis=1)
        # Identity matrix of size (p + 1) x (p + 1)
        I = np.identity(X_train.shape[1])
        # Leave the intercept unpenalized
        I[0, 0] = 0
        # Closed form: beta = (X^T X + alpha * I)^{-1} X^T y
        betas = np.linalg.inv(np.dot(X_train.T, X_train) + self.alpha * I).dot(X_train.T).dot(y_train)
        self.intercept_ = betas[0]
        self.coef_ = betas[1:]

        print("Intercept: ", self.intercept_)
        print("====================================")
        print("Coefficients: ", self.coef_)

    def predict(self, X_test):
        # y_hat = X_test . coef_ + intercept_ (no column of 1s needed here)
        return np.dot(X_test, self.coef_) + self.intercept_
```

```python
ridge = RidgeRegression()

ridge.fit(X_train, y_train)
```

```
Intercept:  151.33659663172543
====================================
Coefficients:  [  40.69342168 -237.00801965  546.16179161  341.80931747 -430.14629956
  129.902301    -60.46081734  203.99084244  541.09802519   55.48255303]
```


```python
y_pred = ridge.predict(X_test)
y_pred
```

```
array([140.48932729, 180.39358466, 138.26095011, 292.70472351,
       122.54953663,  93.61127853, 256.94944065, 185.46640503,
        86.4960167 , 110.59467587,  95.04571587, 164.19550268,
        60.59798796, 205.82695673,  99.72760443, 131.91526636,
       220.91412088, 247.87634694, 195.84576355, 215.78308828,
       206.82609175,  89.01546302,  72.05374047, 188.47495433,
       155.71143723, 161.25320029, 189.08097216, 178.04173865,
        49.65268248, 110.50254797, 178.39994134,  90.08024148,
       132.14592247, 181.98946205, 173.37370782, 190.81087767,
       123.38010922, 118.90948131, 146.69459204,  60.67799313,
        74.18510938, 108.16651262, 162.96843997, 151.55290246,
       173.76202246,  64.5447612 ,  76.57353392, 109.83957197,
        56.57149752, 163.18082268, 155.2330795 ,  64.94611225,
       110.68142707, 108.69309211, 172.0029122 , 157.94954707,
        94.8588743 , 208.43411608, 118.81317959,  72.11719648,
       185.80485787, 203.47916991, 141.32147862, 105.78
```

```python
r2_score(y_test, y_pred)
```

```
0.4559819504579107
```
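Two sanity checks on the results above, sketched under the assumption that scikit-learn's `sklearn.linear_model.Ridge` is available. Its `alpha` multiplies the same squared L2 penalty and it does not penalize the intercept, so its coefficients should agree with the closed-form solution up to floating-point rounding. The last lines recompute $R^2 = 1 - SS_{res}/SS_{tot}$ directly from its definition.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Same data and split as above
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Closed-form ridge with an unpenalized intercept (lambda = alpha = 0.01)
alpha = 0.01
Xb = np.insert(X_train, 0, 1, axis=1)
I = np.identity(Xb.shape[1])
I[0, 0] = 0.0
betas = np.linalg.solve(Xb.T @ Xb + alpha * I, Xb.T @ y_train)

# scikit-learn minimizes ||y - Xw||^2 + alpha * ||w||^2 with an unpenalized intercept
sk = Ridge(alpha=alpha).fit(X_train, y_train)
print("coefficients match:", np.allclose(betas[1:], sk.coef_))
print("intercepts match:", np.isclose(betas[0], sk.intercept_))

# R^2 = 1 - SS_res / SS_tot on the test set
y_pred = X_test @ betas[1:] + betas[0]
ss_res = np.sum((y_test - y_pred) ** 2)
ss_tot = np.sum((y_test - y_test.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot
print("R^2 by hand:", r2_manual, "| r2_score:", r2_score(y_test, y_pred))
```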