# Ridge and Lasso Regularization Examples


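In machine learning, regularization is used to prevent overfitting, which occurs when a model memorizes the training data instead of learning patterns that generalize to new data.

It does this by adding a penalty term to the loss that grows with the size of the model's weights, so complex models (especially those with large weights) are discouraged.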


**L2 Regularization (Ridge)**

***Formula: Loss = Original Loss + λ * (w₁² + w₂² + ... + wₙ²)***

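1. Adds λ times the sum of squared weights to the loss (scikit-learn calls λ `alpha`).

2. Shrinks weights toward zero, but generally not exactly to zero.

3. Spreads weight across correlated features instead of concentrating it on one, which tends to give more stable coefficients.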
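The shrinking behaviour can be checked on a small example. The sketch below is illustrative only: it assumes the original loss is the sum of squared errors, ignores the intercept, and uses hypothetical toy data (`X_demo`, `y_demo`) and a hypothetical helper `ridge_closed_form` rather than anything from this notebook.

```python
import numpy as np

# Hypothetical toy data (not the housing data used later in this notebook).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 3))
true_w = np.array([3.0, -2.0, 0.5])
y_demo = X_demo @ true_w + rng.normal(scale=0.1, size=50)

def ridge_closed_form(X, y, lam):
    """Minimize ||y - Xw||^2 + lam * ||w||^2 (no intercept term)."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

# Larger lambda -> smaller weight vector, but no weight becomes exactly zero.
ridge_weights = {lam: ridge_closed_form(X_demo, y_demo, lam) for lam in [0.0, 10.0, 100.0]}
for lam, w in ridge_weights.items():
    print(f"lambda={lam:>5}: w={np.round(w, 3)}, ||w||={np.linalg.norm(w):.3f}")
```

With `lam=0.0` this reduces to ordinary least squares, so the first row serves as the unpenalized reference.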



**L1 Regularization (Lasso)**

***Formula: Loss = Original Loss + λ * (|w₁| + |w₂| + ... + |wₙ|)***

1. Adds λ times the sum of absolute weights to the loss.

2. Can drive some weights exactly to zero when λ is large enough.

3. Encourages sparsity, which acts as built-in feature selection.
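One way to see why the absolute-value penalty produces exact zeros is the special case of orthonormal features, where the Lasso solution is the least-squares weight vector passed through a soft-thresholding step, while Ridge only rescales it. The sketch below assumes that special case; the weights in `w_ols`, the `threshold`, and the `shrink_factor` are arbitrary illustrative values, and how they relate to λ depends on how the loss is scaled.

```python
import numpy as np

def soft_threshold(w, threshold):
    """Move each weight toward zero by `threshold`; weights within the threshold become 0."""
    return np.sign(w) * np.maximum(np.abs(w) - threshold, 0.0)

# Hypothetical least-squares weights: two large, two small.
w_ols = np.array([3.0, -2.0, 0.4, -0.1])

threshold = 0.5       # illustrative L1 threshold
shrink_factor = 0.5   # illustrative L2 strength

w_l1 = soft_threshold(w_ols, threshold)   # Lasso-style update
w_l2 = w_ols / (1.0 + shrink_factor)      # Ridge-style proportional shrinkage

print("L1:", w_l1)
print("L2:", np.round(w_l2, 3))
print("Exact zeros under L1:", int(np.sum(w_l1 == 0)))
```

The small weights fall inside the threshold and are removed entirely under L1, whereas under L2 every weight is scaled by the same factor and stays nonzero.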

In [5]:
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso, Ridge, LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

In [2]:
# 1. Create a synthetic dataset
np.random.seed(42)
n_samples = 200

# Meaningful features
area = np.random.normal(2000, 500, n_samples)           # Area in sqft
bedrooms = np.random.randint(1, 5, n_samples)           # Number of bedrooms
bathrooms = np.random.randint(1, 4, n_samples)          # Number of bathrooms
age = np.random.randint(0, 30, n_samples)               # Age of house
distance = np.random.normal(10, 5, n_samples)           # Distance to city in km
garage = np.random.randint(0, 3, n_samples)             # Garage size (0–2 cars)

# Noise features (unrelated to target)
noise1 = np.random.randn(n_samples)
noise2 = np.random.randn(n_samples)

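# Target variable: Price (in $1000s)
# Note: the noise term (std 10000) is much larger than the signal from the
# features (a few hundred on average), so the target is dominated by noise.
price = (area * 0.3 + bedrooms * 10 + bathrooms * 15 - age * 2 - distance * 3 + garage * 20
         + np.random.normal(0, 10000, n_samples))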

# 2. Combine into DataFrame
df = pd.DataFrame({
    'Area': area,
    'Bedrooms': bedrooms,
    'Bathrooms': bathrooms,
    'Age': age,
    'Distance_to_City': distance,
    'Garage_Size': garage,
    'Noise1': noise1,
    'Noise2': noise2,
    'Price': price
})

In [3]:
# 3. Split into features and target
X = df.drop('Price', axis=1)
y = df['Price']

In [4]:
# 4. Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)


In [6]:
# Baseline: ordinary least squares with no penalty
lin_reg = LinearRegression()
lin_reg.fit(X_train, y_train)
y_pred_lin = lin_reg.predict(X_test)

In [7]:
# Ridge (L2 penalty); alpha is the regularization strength (λ in the formula above)
ridge = Ridge(alpha=100)
ridge.fit(X_train, y_train)
y_pred_ridge = ridge.predict(X_test)

In [8]:
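# Lasso (L1 penalty); scikit-learn's Lasso divides the squared error by 2 * n_samples,
# so the same alpha value is not directly comparable to Ridge's alpha
lasso = Lasso(alpha=100)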
lasso.fit(X_train, y_train)
y_pred_lasso = lasso.predict(X_test)
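The value `alpha=100` above is fixed by hand. A common alternative is to pick the strength by cross-validation. The following is a minimal sketch of that idea using scikit-learn's `RidgeCV` and `LassoCV`; the data from `make_regression` and the candidate grid `alphas` are assumptions chosen for illustration, not part of this notebook's housing example.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV, RidgeCV

# Hypothetical regression data, used only to demonstrate the API.
X_cv, y_cv = make_regression(n_samples=200, n_features=8, n_informative=4,
                             noise=10.0, random_state=0)

# Candidate regularization strengths on a log scale (an arbitrary grid).
alphas = np.logspace(-2, 3, 20)

ridge_cv = RidgeCV(alphas=alphas).fit(X_cv, y_cv)        # efficient leave-one-out CV by default
lasso_cv = LassoCV(alphas=alphas, cv=5).fit(X_cv, y_cv)  # 5-fold CV

print(f"Ridge: selected alpha = {ridge_cv.alpha_:.4g}")
print(f"Lasso: selected alpha = {lasso_cv.alpha_:.4g}")
```

Both estimators expose the chosen value as `alpha_` and can be used for prediction like the manually configured models.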

In [12]:
print("Mean Squared Error:")
print(f"Linear Regression: {mean_squared_error(y_test, y_pred_lin):.2f}")
print(f"Ridge Regression: {mean_squared_error(y_test, y_pred_ridge):.2f}")
print(f"Lasso Regression: {mean_squared_error(y_test, y_pred_lasso):.2f}")

Mean Squared Error:
Linear Regression: 113779326.71
Ridge Regression: 111821824.39
Lasso Regression: 112947910.78
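MSE is expressed in squared target units, which makes the numbers above hard to read. Taking the square root gives the RMSE in the same units as `Price`, which can then be compared with the noise standard deviation of 10000 used when generating the data. The sketch below simply reuses the three values printed above; the dictionary names are hypothetical.

```python
import math

# MSE values copied from the printed output above.
reported_mse = {
    "Linear Regression": 113779326.71,
    "Ridge Regression": 111821824.39,
    "Lasso Regression": 112947910.78,
}

# RMSE is in the same units as the target (Price, in $1000s).
rmse_by_model = {name: math.sqrt(mse) for name, mse in reported_mse.items()}
for name, rmse in rmse_by_model.items():
    print(f"{name}: RMSE = {rmse:.1f}")
```

All three RMSE values come out a little above 10000, close to the injected noise level, which suggests the remaining error is mostly noise that no choice of linear model could remove.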


In [10]:
coeffs = pd.DataFrame({
    'Feature': X.columns,
    'Linear Coef': np.round(lin_reg.coef_, 2),
    'Ridge Coef': np.round(ridge.coef_, 2),
    'Lasso Coef': np.round(lasso.coef_, 2)
})

In [11]:
coeffs

| | Feature | Linear Coef | Ridge Coef | Lasso Coef |
|---|---|---|---|---|
| 0 | Area | 1.1 | 1.13 | 1.12 |
| 1 | Bedrooms | 1721.43 | 1108.53 | 1649.03 |
| 2 | Bathrooms | -687.66 | -324.26 | -534.17 |
| 3 | Age | -139.19 | -140.89 | -138.53 |
| 4 | Distance_to_City | -155.49 | -132.69 | -146.07 |
| 5 | Garage_Size | -720.9 | -371.06 | -584.54 |
| 6 | Noise1 | -712.28 | -451.75 | -627.83 |
| 7 | Noise2 | -693.18 | -463.49 | -613.06 |
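In the table above, none of the Lasso coefficients is exactly zero, including those of the two noise features. Two things work against sparsity here: the target is dominated by noise, and the features are on very different scales while the penalty treats all weights alike. The sketch below shows the sparsity effect under more favourable, assumed conditions: a hypothetical low-noise toy dataset (`X_toy`, `y_toy`), features standardized with `StandardScaler`, and an arbitrary `alpha=5.0`. It is not a rerun of the experiment above.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: two informative features and two pure-noise features.
rng = np.random.default_rng(0)
n = 200
area = rng.normal(2000, 500, n)
age = rng.uniform(0, 30, n)
noise_a = rng.normal(size=n)
noise_b = rng.normal(size=n)
X_toy = np.column_stack([area, age, noise_a, noise_b])
y_toy = 0.3 * area - 2.0 * age + rng.normal(0, 10, n)  # low noise relative to signal

# Standardize first so the L1 penalty treats every feature on the same scale.
model = make_pipeline(StandardScaler(), Lasso(alpha=5.0))
model.fit(X_toy, y_toy)

lasso_coefs = model.named_steps["lasso"].coef_
for name, coef in zip(["area", "age", "noise_a", "noise_b"], lasso_coefs):
    print(f"{name:>8}: {coef:.3f}")
print("Coefficients set to zero:", int(np.sum(lasso_coefs == 0)))
```

The coefficients reported here refer to the standardized features, so they are not directly comparable to coefficients fitted on raw units.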
