# Part 2.3: Supervised Learning - Regularization

Regularization is a technique used to combat overfitting in machine learning models. It works by adding a penalty term to the model's cost function, which discourages the model from learning overly complex patterns or having excessively large coefficients.
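
As a rough illustration of the idea above, the sketch below computes a penalized cost by hand. The helper `penalized_cost` and the toy arrays are hypothetical names introduced here, and the exact scaling of the loss and penalty terms differs between libraries, so treat this as a sketch of the concept rather than any particular library's objective.

```python
import numpy as np

def penalized_cost(X, y, w, alpha, penalty="l2"):
    """Mean squared error plus alpha times an L1 or L2 penalty on w."""
    residual = y - X @ w
    data_loss = np.mean(residual ** 2)
    if penalty == "l2":
        reg = np.sum(w ** 2)        # Ridge-style penalty
    elif penalty == "l1":
        reg = np.sum(np.abs(w))     # Lasso-style penalty
    else:
        raise ValueError(f"unknown penalty: {penalty!r}")
    return data_loss + alpha * reg

# Toy data (illustrative only): y depends on the first feature alone
X_demo = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y_demo = np.array([2.0, 0.0, 2.0])
w_demo = np.array([2.0, 0.0])  # fits the toy data exactly

cost_unpenalized = penalized_cost(X_demo, y_demo, w_demo, alpha=0.0)
cost_l2 = penalized_cost(X_demo, y_demo, w_demo, alpha=0.1, penalty="l2")
cost_l1 = penalized_cost(X_demo, y_demo, w_demo, alpha=0.1, penalty="l1")
print(cost_unpenalized, cost_l2, cost_l1)
```

Even though `w_demo` fits the data perfectly, its cost is nonzero once `alpha > 0`: the optimizer is pushed to trade a little fit for smaller coefficients.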

In [1]:
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge, Lasso, ElasticNet
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

# Generate synthetic data: only the first three features affect y
np.random.seed(42)
X = np.random.rand(50, 10) # 50 samples, 10 features
y = 2 * X[:, 0] + 3 * X[:, 1] - 1.5 * X[:, 2] + np.random.randn(50)

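# Scale features before regularization: the penalty acts on coefficient size,
# so features on different scales would otherwise be penalized unevenly.
# Note: for simplicity the scaler is fit on all rows here; to avoid leaking
# test-set statistics, fit it on the training split only in real projects.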
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=42)
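
One way to keep scaling and splitting cleanly separated is to split first and let a pipeline fit the scaler on the training rows only. The sketch below is a possible variant, not the setup used for the outputs in this notebook; the names `X_raw`, `y_raw`, `X_tr`, `X_te`, and `model` are introduced here for illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data in the same spirit as above (illustrative only)
rng = np.random.RandomState(42)
X_raw = rng.rand(50, 10)
y_raw = 2 * X_raw[:, 0] + 3 * X_raw[:, 1] - 1.5 * X_raw[:, 2] + rng.randn(50)

# Split first, so the test rows never influence the scaler
X_tr, X_te, y_tr, y_te = train_test_split(
    X_raw, y_raw, test_size=0.2, random_state=42
)

# The pipeline fits StandardScaler on the training rows only and
# reuses that fitted transform when predicting or scoring
model = make_pipeline(StandardScaler(), Ridge(alpha=1.0))
model.fit(X_tr, y_tr)
print("Test R^2:", model.score(X_te, y_te))
```

The same pattern works with `Lasso` or `ElasticNet` in place of `Ridge`.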

### Ridge Regression (L2 Regularization)
Ridge adds a penalty proportional to the **sum of the squared coefficients** (the squared L2 norm). It shrinks all coefficients toward zero, large ones most strongly, but generally does not set any of them exactly to zero.

In [2]:
# The 'alpha' parameter controls the strength of the regularization
ridge = Ridge(alpha=1.0)
ridge.fit(X_train, y_train)

print("Ridge Coefficients:")
print(ridge.coef_)

Ridge Coefficients:
[ 0.86416937  1.10175701 -0.46152523  0.06023023 -0.121519    0.01500535
  0.08982306 -0.28407714  0.21526448  0.14893289]
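
To make the L2 penalty more concrete, the sketch below compares the closed-form ridge solution with scikit-learn's `Ridge`. It assumes no intercept (`fit_intercept=False`) so that the two solve the same problem; the data and the names `X_demo`, `y_demo`, `w_closed`, `w_sklearn`, and `w_ols` are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Illustrative data, unrelated to the notebook's X and y
rng = np.random.RandomState(0)
X_demo = rng.randn(40, 5)
y_demo = X_demo @ np.array([2.0, 3.0, -1.5, 0.0, 0.0]) + 0.1 * rng.randn(40)

alpha = 1.0

# Closed-form ridge solution without an intercept:
# w = (X^T X + alpha * I)^{-1} X^T y
w_closed = np.linalg.solve(
    X_demo.T @ X_demo + alpha * np.eye(X_demo.shape[1]),
    X_demo.T @ y_demo,
)

# Ridge with fit_intercept=False minimizes the same objective
w_sklearn = Ridge(alpha=alpha, fit_intercept=False).fit(X_demo, y_demo).coef_

# Ordinary least squares for comparison (the alpha = 0 case)
w_ols = np.linalg.lstsq(X_demo, y_demo, rcond=None)[0]

print("closed form :", w_closed)
print("sklearn     :", w_sklearn)
print("norms (ridge, OLS):", np.linalg.norm(w_closed), np.linalg.norm(w_ols))
```

The `alpha * I` term is what pulls the solution toward zero: the ridge coefficient vector has a smaller norm than the least-squares one, but no entry is forced to be exactly zero.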


### Lasso Regression (L1 Regularization)
Lasso (Least Absolute Shrinkage and Selection Operator) adds a penalty proportional to the **sum of the absolute values** of the coefficients (the L1 norm). This can shrink some coefficients to exactly zero, effectively performing feature selection.

In [3]:
lasso = Lasso(alpha=0.1)
lasso.fit(X_train, y_train)

print("Lasso Coefficients:")
print(lasso.coef_)
print("\nNotice how some coefficients are exactly zero.")

Lasso Coefficients:
[ 0.80092034  0.95719    -0.32510564  0.         -0.          0.
  0.         -0.17188662  0.22815463  0.        ]

Notice how some coefficients are exactly zero.
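
A simplified way to see why L1 produces exact zeros while L2 does not: in the idealized case of orthonormal features, the Lasso solution is the least-squares coefficient passed through a soft-thresholding step, whereas Ridge merely rescales it. The sketch below illustrates those two operations on a made-up coefficient vector; `soft_threshold` and `ridge_shrink` are hypothetical helper names, and the threshold corresponds to `alpha` only up to the scaling convention of the objective.

```python
import numpy as np

def soft_threshold(z, t):
    """L1-style shrinkage: move z toward zero by t, clipping at zero."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def ridge_shrink(z, alpha):
    """L2-style shrinkage: rescale z by a constant factor."""
    return z / (1.0 + alpha)

# Made-up least-squares coefficients (illustrative only)
z = np.array([3.0, -0.5, 0.2, -2.0])

l1_result = soft_threshold(z, 1.0)
l2_result = ridge_shrink(z, 1.0)

print("L1 (soft threshold):", l1_result)
print("L2 (rescale):       ", l2_result)
```

Coefficients whose magnitude is below the threshold are set to zero by the L1 step, while the L2 step keeps every coefficient nonzero.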


### ElasticNet
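ElasticNet is a middle ground between Ridge and Lasso: its penalty combines the L1 and L2 penalties. The `l1_ratio` parameter controls the mix (e.g., `l1_ratio=0.5` means 50% L1 and 50% L2); `l1_ratio=1` gives a pure L1 (Lasso) penalty, and values closer to 0 shift the weight toward L2.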

In [4]:
elastic_net = ElasticNet(alpha=0.1, l1_ratio=0.5)
elastic_net.fit(X_train, y_train)

print("ElasticNet Coefficients:")
print(elastic_net.coef_)

ElasticNet Coefficients:
[ 0.79506313  0.96903899 -0.37541863  0.         -0.0453539   0.
  0.01377875 -0.21202873  0.22770953  0.0721759 ]
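
To see how `l1_ratio` moves ElasticNet between the two penalties, the sketch below fits the model for several ratios and checks that `l1_ratio=1.0` matches `Lasso` with the same `alpha`. The data and the names `X_demo`, `y_demo`, `lasso_coef`, and `enet_l1_coef` are introduced here for illustration only.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso

# Illustrative data: only the first three of eight features matter
rng = np.random.RandomState(0)
X_demo = rng.randn(60, 8)
true_w = np.array([2.0, 3.0, -1.5, 0.0, 0.0, 0.0, 0.0, 0.0])
y_demo = X_demo @ true_w + rng.randn(60)

# With l1_ratio=1.0 the L2 part vanishes, leaving the Lasso objective
lasso_coef = Lasso(alpha=0.1).fit(X_demo, y_demo).coef_
enet_l1_coef = ElasticNet(alpha=0.1, l1_ratio=1.0).fit(X_demo, y_demo).coef_
print("Matches Lasso at l1_ratio=1.0:", np.allclose(lasso_coef, enet_l1_coef))

# Count exact zeros as the mix shifts from mostly L2 to pure L1
for ratio in (0.1, 0.5, 0.9, 1.0):
    coef = ElasticNet(alpha=0.1, l1_ratio=ratio).fit(X_demo, y_demo).coef_
    print(f"l1_ratio={ratio}: {np.sum(coef == 0)} coefficients exactly zero")
```

How many coefficients end up at zero depends on the data and on `alpha`, so the counts are best read as a trend rather than a rule.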
