## Ridge (L2 Regularization) and Lasso (L1 Regularization)

## When and Why to Use Each
### A. Ridge Regression (L2 Regularization):
- When Features are Highly Correlated (Multicollinearity): Ridge helps stabilize the model by reducing the sensitivity to small changes in the data when predictor variables are highly correlated.
- When You Want Small Coefficients, Not Zero: Ridge shrinks coefficients toward zero but doesn’t eliminate them entirely. Use this when all features are potentially useful, and you want to retain all predictors in the model.
- When Overfitting Occurs: If your model performs well on training data but poorly on test data, Ridge can prevent overfitting by penalizing large coefficients.
- When Interpretability is Less Critical: Ridge keeps every feature in the model, so it does not simplify the model for interpretation.
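
The multicollinearity point above can be illustrated with a small simulation. This is a minimal sketch on synthetic data, not part of the original notebook: the helper `coef_spread`, the noise levels, and the choice of scikit-learn's `Ridge` (imported as `SkRidge` to avoid clashing with the from-scratch class defined later) are all assumptions made for illustration. It refits each model on many resampled datasets with two nearly identical features and compares how much the fitted coefficients vary.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge as SkRidge

rng = np.random.default_rng(0)

def coef_spread(model_factory, n_repeats=200, n_samples=50):
    """Standard deviation of fitted coefficients across resampled datasets."""
    coefs = []
    for _ in range(n_repeats):
        x1 = rng.normal(size=n_samples)
        x2 = x1 + 0.01 * rng.normal(size=n_samples)  # nearly a copy of x1
        X = np.column_stack([x1, x2])
        y = x1 + x2 + rng.normal(size=n_samples)
        coefs.append(model_factory().fit(X, y).coef_)
    return np.std(coefs, axis=0)

ols_spread = coef_spread(LinearRegression)
ridge_spread = coef_spread(lambda: SkRidge(alpha=1.0))

print("OLS coefficient std:  ", ols_spread)
print("Ridge coefficient std:", ridge_spread)
```

With features this strongly correlated, the ordinary least squares coefficients should swing widely from one resample to the next, while the Ridge coefficients stay comparatively stable.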

### B. Lasso Regression (L1 Regularization):
- When You Need Feature Selection: Lasso forces some coefficients to exactly zero, effectively eliminating irrelevant features. Use this when you suspect that many features are irrelevant or redundant.
- When You Want a Sparse Model: Lasso produces simpler, more interpretable models by keeping only the most important features.
- When the Dataset is High Dimensional: For datasets with a large number of predictors, Lasso can help reduce dimensionality.
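
The contrast between shrinking coefficients and zeroing them can be checked directly. The following is a rough sketch on synthetic data (the sizes, the `true_coef` vector, and `alpha=0.1` are illustrative assumptions, and scikit-learn's `Ridge` is imported as `SkRidge` to avoid clashing with the from-scratch class defined later): only three of twenty features actually influence the target, and we count how many coefficients each model sets to exactly zero.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge as SkRidge

rng = np.random.default_rng(0)
n_samples, n_features = 200, 20

# Only the first three features carry signal; the rest are pure noise.
X = rng.normal(size=(n_samples, n_features))
true_coef = np.zeros(n_features)
true_coef[:3] = [3.0, -2.0, 1.5]
y = X @ true_coef + 0.5 * rng.normal(size=n_samples)

lasso = Lasso(alpha=0.1).fit(X, y)
ridge = SkRidge(alpha=0.1).fit(X, y)

n_zero_lasso = int(np.sum(lasso.coef_ == 0))
n_zero_ridge = int(np.sum(ridge.coef_ == 0))

print("Coefficients set to zero by Lasso:", n_zero_lasso)
print("Coefficients set to zero by Ridge:", n_zero_ridge)
```

Lasso is expected to drop most of the noise features while keeping the informative ones; Ridge keeps a small but nonzero weight on every feature.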

In [1]:
import pandas as pd
import numpy as np

## Code Implementation of Ridge Regression from Scratch

Cost Function:
$$
J_{Ridge}(\beta) = \frac{1}{2n} \sum_{i=1}^{n} (y_i - X_i \cdot \beta)^2 + \lambda \sum_{j=1}^{p} \beta_j^2
$$



Closed-form solution for the coefficients β, with constant factors absorbed into λ:
$$
\beta = (X^T X + \lambda I)^{-1} X^T y
$$
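
For readers who want the missing step, the closed form comes from setting the gradient of the cost function to zero. The sketch below writes the cost in matrix form; it is a standard derivation rather than something taken from the original notebook.

$$
J_{Ridge}(\beta) = \frac{1}{2n} (y - X\beta)^T (y - X\beta) + \lambda \beta^T \beta
$$

$$
\nabla_\beta J_{Ridge}(\beta) = -\frac{1}{n} X^T (y - X\beta) + 2\lambda\beta = 0
\quad\Longrightarrow\quad
(X^T X + 2n\lambda I)\,\beta = X^T y
$$

Taken literally, the $\frac{1}{2n}$ scaling puts $2n\lambda$ in front of the identity matrix. The simpler expression $(X^T X + \lambda I)^{-1} X^T y$ corresponds to folding that constant into the penalty strength, which is what the `alpha` parameter in the code plays the role of.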

In [2]:
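class Ridge:
    """Closed-form ridge regression without an intercept term."""

    def __init__(self, alpha=1.0) -> None:
        self.alpha = alpha   # regularization strength (lambda in the formula above)
        self.coeff_ = None   # fitted coefficients, set by fit()

    def fit(self, X, y):
        n_features = X.shape[1]
        I = np.eye(n_features)
        # Solve (X^T X + alpha * I) beta = X^T y instead of forming the inverse explicitly.
        # No intercept is fitted, so y is assumed to be centered.
        self.coeff_ = np.linalg.solve(X.T @ X + self.alpha * I, X.T @ y)
        return self

    def predict(self, X):
        return X @ self.coeff_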
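
One way to sanity-check the closed form is to compare it with scikit-learn's `Ridge` when the intercept is switched off, since that estimator minimizes the squared error plus `alpha` times the squared L2 norm of the coefficients. The snippet below is a sketch on synthetic data; `ridge_closed_form` is a hypothetical helper that mirrors the `fit` method above, and the data and `alpha` value are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge as SkRidge

def ridge_closed_form(X, y, alpha):
    """Hypothetical helper: solve (X^T X + alpha * I) beta = X^T y."""
    n_features = X.shape[1]
    A = X.T @ X + alpha * np.eye(n_features)
    return np.linalg.solve(A, X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + 0.1 * rng.normal(size=100)

beta = ridge_closed_form(X, y, alpha=0.1)

# fit_intercept=False makes scikit-learn solve the same intercept-free problem.
sk_model = SkRidge(alpha=0.1, fit_intercept=False).fit(X, y)

matches = np.allclose(beta, sk_model.coef_)
print("Closed form matches scikit-learn:", matches)
```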

In [3]:
from sklearn.datasets import fetch_california_housing, load_diabetes
from sklearn.metrics import mean_squared_error, r2_score
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression, Lasso
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler, StandardScaler


data = load_diabetes()

df = pd.DataFrame(data=data.data, columns=data.feature_names)
df["target"] = data.target


scaler = StandardScaler()

X = df.drop(["sex", "age", "target", "s1", "s2"], axis=1).to_numpy()
y = df["target"].to_numpy()


X = scaler.fit_transform(X)

X_train, X_test, y_train, y_test = train_test_split(X,y, test_size=0.2, random_state=42)

models = [LinearRegression(), Ridge(alpha=0.1), Lasso(alpha=0.1)]

 
for model in models:
    name = model.__class__.__name__

    print("Model: ", name)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    y_pred_tr = model.predict(X_train)

    rsc = r2_score(y_test, y_pred)
    mse = mean_squared_error(y_test, y_pred)

    rsct = r2_score(y_train, y_pred_tr)
    mset = mean_squared_error(y_train, y_pred_tr)

    print("R2 Score: ", rsc)
    print("MSE: ", mse)
    print("R2 Score Train: ", rsct)
    print("MSE Train: ", mset)
    print("-----------------------")

Model:  LinearRegression
R2 Score:  0.44678053230202086
MSE:  2931.0406900974717
R2 Score Train:  0.5015816339880688
MSE Train:  3028.588368869938
-----------------------
Model:  Ridge
R2 Score:  -4.2195061517951595
MSE:  27653.73564452668
R2 Score Train:  -3.261567843967107
MSE Train:  25894.98237928086
-----------------------
Model:  Lasso
R2 Score:  0.44758980284303596
MSE:  2926.7530519655697
R2 Score Train:  0.5015566256985108
MSE Train:  3028.740329190916
-----------------------
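
The strongly negative R2 for the from-scratch Ridge is consistent with its missing intercept: the features are standardized to zero mean, so predictions of the form `X @ coeff_` are centered near zero, while the target is not centered. A possible fix, sketched below under that assumption, is to center `X` and `y` before solving and to recover an unpenalized intercept afterwards. `RidgeWithIntercept` is a hypothetical name, and the synthetic data is only there to make the effect visible; the results printed above were not rerun with it.

```python
import numpy as np

class RidgeWithIntercept:
    """Hypothetical variant of the from-scratch Ridge with an unpenalized intercept."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.coef_ = None
        self.intercept_ = None

    def fit(self, X, y):
        # Center the data so the penalty applies only to the slopes.
        x_mean = X.mean(axis=0)
        y_mean = y.mean()
        Xc, yc = X - x_mean, y - y_mean
        A = Xc.T @ Xc + self.alpha * np.eye(X.shape[1])
        self.coef_ = np.linalg.solve(A, Xc.T @ yc)
        self.intercept_ = y_mean - x_mean @ self.coef_
        return self

    def predict(self, X):
        return X @ self.coef_ + self.intercept_

def r2(y_true, y_pred):
    """Coefficient of determination, written out explicitly."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Synthetic target with a large offset, loosely imitating an uncentered target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 150.0 + X @ np.array([20.0, -10.0, 5.0]) + rng.normal(size=200)

model = RidgeWithIntercept(alpha=0.1).fit(X, y)
beta_no_intercept = np.linalg.solve(X.T @ X + 0.1 * np.eye(3), X.T @ y)

r2_with = r2(y, model.predict(X))
r2_without = r2(y, X @ beta_no_intercept)
print("R2 with intercept:   ", r2_with)
print("R2 without intercept:", r2_without)
```

Centering `y_train` before calling the original `fit` and adding the mean back at prediction time would be an equivalent alternative.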


In [4]:
df

,age,sex,bmi,bp,s1,s2,s3,s4,s5,s6,target
0,0.038076,0.050680,0.061696,0.021872,-0.044223,-0.034821,-0.043401,-0.002592,0.019907,-0.017646,151.0
1,-0.001882,-0.044642,-0.051474,-0.026328,-0.008449,-0.019163,0.074412,-0.039493,-0.068332,-0.092204,75.0
2,0.085299,0.050680,0.044451,-0.005670,-0.045599,-0.034194,-0.032356,-0.002592,0.002861,-0.025930,141.0
3,-0.089063,-0.044642,-0.011595,-0.036656,0.012191,0.024991,-0.036038,0.034309,0.022688,-0.009362,206.0
4,0.005383,-0.044642,-0.036385,0.021872,0.003935,0.015596,0.008142,-0.002592,-0.031988,-0.046641,135.0
...,...,...,...,...,...,...,...,...,...,...,...
437,0.041708,0.050680,0.019662,0.059744,-0.005697,-0.002566,-0.028674,-0.002592,0.031193,0.007207,178.0
438,-0.005515,0.050680,-0.015906,-0.067642,0.049341,0.079165,-0.028674,0.034309,-0.018114,0.044485,104.0
439,0.041708,0.050680,-0.015906,0.017293,-0.037344,-0.013840,-0.024993,-0.011080,-0.046883,0.015491,132.0
440,-0.045472,-0.044642,0.039062,0.001215,0.016318,0.015283,-0.028674,0.026560,0.044529,-0.025930,220.0


In [48]:
std = df["target"].std()
var = df["target"].var()

std, var

(77.09300453299109, 5943.331347923785)

In [49]:
df.corr()["target"]

age       0.187889
sex       0.043062
bmi       0.586450
bp        0.441482
s1        0.212022
s2        0.174054
s3       -0.394789
s4        0.430453
s5        0.565883
s6        0.382483
target    1.000000
Name: target, dtype: float64

In [None]:
to_drop = ["sex", "age", "s1"]