# Topics Covered in MLR

1. Ridge Regression
2. Lasso Regression
3. ElasticNet Regression

# A. Ridge Regression

**Ridge Regression** is a linear regression technique that helps prevent overfitting by adding a **penalty** on the size of the coefficients.

---

## 🔍 What is Ridge Regression?

**Ridge Regression** is a type of **regularized linear regression**. It modifies the **Ordinary Least Squares (OLS)** cost function by adding a **penalty term** based on the **squared L2 norm** of the coefficients (i.e., the sum of the squared coefficients).

---

## 📐 Cost Function

Given data $ X \in \mathbb{R}^{n \times p} $ and target $ y \in \mathbb{R}^n $, the cost function minimized by Ridge is:

$
J(\beta) = \sum_{i=1}^n (y_i - \hat{y}_i)^2 + \lambda \sum_{j=1}^{p} \beta_j^2
$

Or in vector form:

$
J(\boldsymbol{\beta}) = \| y - X\boldsymbol{\beta} \|^2 + \lambda \|\boldsymbol{\beta}\|^2
$

- $ \lambda \geq 0 $: Regularization strength
- $ \boldsymbol{\beta} $: Coefficient vector
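
The cost function above can be evaluated directly with NumPy. The snippet below is a minimal sketch: the helper name `ridge_cost` and the tiny dataset are made up for illustration, and no intercept term is included.

```python
import numpy as np

def ridge_cost(X, y, beta, lam):
    """Residual sum of squares plus lam times the squared L2 norm of beta."""
    residuals = y - X @ beta
    return float(residuals @ residuals + lam * (beta @ beta))

# Hypothetical toy data: 3 samples, 2 features, fitted exactly by beta_demo
X_demo = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y_demo = np.array([1.0, 2.0, 3.0])
beta_demo = np.array([1.0, 2.0])

cost_no_penalty = ridge_cost(X_demo, y_demo, beta_demo, lam=0.0)
cost_with_penalty = ridge_cost(X_demo, y_demo, beta_demo, lam=0.5)
print(cost_no_penalty, cost_with_penalty)
```

Because `beta_demo` fits the toy data exactly, the first cost is zero and the second consists only of the penalty, `0.5 * (1² + 2²)`.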

---

## 🤔 Intuition

- If $ \lambda = 0 $: Ridge becomes standard **Linear Regression**.
- As $ \lambda \to \infty $: Coefficients shrink closer to zero.
- Helps deal with **multicollinearity** and **overfitting**.
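
The shrinkage effect can be seen by fitting `Ridge` with increasing `alpha` and tracking the size of the coefficient vector. This is a small sketch on synthetic data (the data-generating coefficients and the alpha values are arbitrary choices for illustration).

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic data with known coefficients (assumed for illustration only)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 3))
y_demo = X_demo @ np.array([3.0, -2.0, 1.0]) + rng.normal(scale=0.1, size=100)

alphas_demo = [0.01, 1.0, 100.0, 10000.0]
coef_norms = []
for alpha in alphas_demo:
    model = Ridge(alpha=alpha).fit(X_demo, y_demo)
    coef_norms.append(float(np.linalg.norm(model.coef_)))
    print(f"alpha={alpha:>8}: coef={model.coef_}, L2 norm={coef_norms[-1]:.4f}")
```

The printed norm should fall as `alpha` grows, while no coefficient is driven exactly to zero.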

---

## 🧠 Mathematical Solution

The **closed-form solution** for Ridge Regression is:

$
\boldsymbol{\beta}_{\text{ridge}} = (X^TX + \lambda I)^{-1}X^Ty
$

Here:
- $ I $: Identity matrix
- For any $ \lambda > 0 $, the term $ \lambda I $ ensures that $ X^TX + \lambda I $ is invertible (even when $ X^TX $ is singular)
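
As a quick check of the closed-form solution, the sketch below solves the linear system with NumPy and compares it with scikit-learn's `Ridge`. It assumes no intercept (`fit_intercept=False`) so that both minimize the same objective; the random data and the `_demo`/`_dup` names are illustrative only.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(42)
X_demo = rng.normal(size=(50, 4))
y_demo = rng.normal(size=50)
lam = 2.0

# Solve (X^T X + lam * I) beta = X^T y; solve() avoids forming an explicit inverse
beta_closed = np.linalg.solve(X_demo.T @ X_demo + lam * np.eye(4), X_demo.T @ y_demo)

# Same objective in scikit-learn (no intercept)
beta_sklearn = Ridge(alpha=lam, fit_intercept=False).fit(X_demo, y_demo).coef_
print(np.allclose(beta_closed, beta_sklearn, atol=1e-6))

# Duplicating a column makes X rank-deficient, so X^T X is singular,
# yet the ridge system is still solvable for lam > 0
X_dup = np.hstack([X_demo, X_demo[:, :1]])
rank_dup = np.linalg.matrix_rank(X_dup)
beta_dup = np.linalg.solve(X_dup.T @ X_dup + lam * np.eye(5), X_dup.T @ y_demo)
print(rank_dup, beta_dup)
```

In the duplicated-column case the two identical features receive the same coefficient, which is one way Ridge stays stable under multicollinearity.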

---

## 🧪 In Scikit-learn

```python
from sklearn.linear_model import Ridge
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline

# Example: create a pipeline
ridge_model = Pipeline([
    ('scaler', StandardScaler()),
    ('ridge', Ridge(alpha=1.0))  # alpha is lambda
])

ridge_model.fit(X_train, y_train)
```

- Use `alpha` to control regularization strength.
- Try different values using **cross-validation**.
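
One way to try different values is a grid search over `alpha` inside the pipeline. The following is a sketch on synthetic data from `make_regression`; the alpha grid, the scoring choice, and the `_demo` names are assumptions for illustration, not a recommendation for any particular dataset.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic regression data (illustrative only)
X_demo, y_demo = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

alpha_grid = [0.01, 0.1, 1.0, 10.0, 100.0]
search = GridSearchCV(
    Pipeline([("scaler", StandardScaler()), ("ridge", Ridge())]),
    param_grid={"ridge__alpha": alpha_grid},  # <step name>__<parameter>
    scoring="neg_mean_squared_error",
    cv=5,
)
search.fit(X_demo, y_demo)

best_alpha = search.best_params_["ridge__alpha"]
print("Best alpha:", best_alpha)
print("Best CV score (negative MSE):", search.best_score_)
```

Searching over the whole pipeline means the scaler is refit on each training fold, so no information from the validation fold leaks into preprocessing.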

---

## 🔍 When to Use Ridge?

✅ High multicollinearity (features are highly correlated)  
✅ You want to **shrink** coefficients, but **not zero them out**  
❌ You need **feature selection** (use **Lasso** instead)

In [2]:
# import necessary libraries 
import pandas as pd 
import numpy as np
import matplotlib.pyplot as plt 
import seaborn as sns

%matplotlib inline

In [None]:
# import data
# https://www.kaggle.com/datasets/abhishek14398/50startups
data = pd.read_csv(
    '../Data/50_Startups_dataset.csv',
    usecols=['R&D Spend', 'Administration', 'Marketing Spend', 'State', 'Profit'],
)
data.head()

|   | R&D Spend | Administration | Marketing Spend | State      | Profit    |
|---|-----------|----------------|-----------------|------------|-----------|
| 0 | 165349.3  | 136897.9       | 471784.2        | New York   | 192261.93 |
| 1 | 162597.8  | 151377.69      | 443898.63       | California | 191792.16 |
| 2 | 153441.61 | 101145.65      | 407934.64       | Florida    | 191050.49 |
| 3 | 144372.51 | 118671.95      | 383199.72       | New York   | 182902.09 |
| 4 | 142107.44 | 91391.87       | 366168.52       | Florida    | 166188.04 |


In [31]:
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.pipeline import Pipeline
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(data.drop('Profit', axis=1),
                                                    data['Profit'],
                                                    test_size=0.2,
                                                    shuffle=True,
                                                    random_state=42)

# Define numeric and categorical columns
numeric_cols = X_train.select_dtypes(include=['number']).columns.tolist()
categorical_cols = ['State']  # Example categorical column

# Data Preprocessing (Scaling & Encoding)
preprocessor = ColumnTransformer(transformers=
                                 [('num_scalar', StandardScaler(), numeric_cols),
                                  ('encoder', OneHotEncoder(drop='first'), categorical_cols)],
                                 remainder='passthrough')

# make the pipeline
# Ridge fits a linear model with an L2 penalty on the coefficients; alpha sets the penalty strength.

regressor = Ridge(alpha=1.0)
pipeline = Pipeline([
    ('preprocessor', preprocessor),
    ('regressor', regressor )
])

# fit the pipeline 
pipeline.fit(X_train, y_train)

In [32]:
from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error

# predict
y_predict = pipeline.predict(X_test)

# Evaluate
r2 = r2_score(y_test, y_predict)
mse = mean_squared_error(y_test, y_predict)
rmse = np.sqrt(mse)
mae = mean_absolute_error(y_test, y_predict)

print(f"R² Score: {r2:.4f}")
print(f"MSE: {mse:.4f}")
print(f"RMSE: {rmse:.4f}")
print(f"MAE: {mae:.4f}")

R² Score: 0.8954
MSE: 84692813.0068
RMSE: 9202.8698
MAE: 7408.0195


In [33]:
print(f"Intercept:{pipeline.named_steps['regressor'].intercept_}")
print(f"Coefficients:{pipeline.named_steps['regressor'].coef_}")

Intercept:115404.39080519175
Coefficients:[35974.79310181 -1345.26415522  4931.37152459   615.77442059
    98.18045416]


In [34]:
# Cross-validation: RidgeCV selects alpha from a candidate grid
from sklearn.linear_model import RidgeCV

# Create pipeline with RidgeCV
pipeline = Pipeline([
    ('preprocessor', preprocessor),
    ('ridge_cv', RidgeCV(alphas=[0.01, 0.1, 1.0, 10.0, 100.0], scoring='neg_mean_squared_error', cv=5))
])

# Fit the pipeline
pipeline.fit(X_train, y_train)

In [35]:
# Predictions
y_pred = pipeline.predict(X_test)

# Evaluation
r2 = r2_score(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
mae = mean_absolute_error(y_test, y_pred)

# Print results
print("Best alpha:", pipeline.named_steps['ridge_cv'].alpha_)
print(f"R² Score: {r2:.4f}")
print(f"MSE: {mse:.4f}")
print(f"RMSE: {rmse:.4f}")
print(f"MAE: {mae:.4f}")

Best alpha: 0.1
R² Score: 0.8986
MSE: 82153032.5777
RMSE: 9063.8310
MAE: 6981.1391


# B. Lasso Regression

Lasso Regression is a linear regression technique with **L1 regularization**, which can shrink some coefficients to **exactly zero**, making it useful for **feature selection** as well as prediction.

---

## 📌 Key Concepts

- **L1 Regularization** adds a penalty proportional to the sum of the absolute values of the coefficients.
- Objective:  
  $
  \text{Minimize} \quad \frac{1}{2n} \sum_{i=1}^n (y_i - \hat{y}_i)^2 + \alpha \sum_{j=1}^p |\beta_j|
  $
- The **α (alpha)** parameter controls the strength of the regularization.
  - High α → more coefficients set to 0
  - Low α → behaves more like ordinary (unregularized) linear regression
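
The effect of α on sparsity can be checked by counting exact zeros in the fitted coefficients. This is a sketch on synthetic data; `alpha_max` below is the smallest α at which all Lasso coefficients are zero under the objective above (with an unpenalized intercept), and the multipliers applied to it are arbitrary illustrative choices.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# 10 features, only 3 of which carry signal (illustrative synthetic data)
X_demo, y_demo = make_regression(
    n_samples=200, n_features=10, n_informative=3, noise=5.0, random_state=0
)

# Smallest alpha for which every coefficient is zero: max |X_c^T y_c| / n
X_c = X_demo - X_demo.mean(axis=0)
y_c = y_demo - y_demo.mean()
alpha_max = float(np.max(np.abs(X_c.T @ y_c)) / len(y_demo))

zero_counts = []
for factor in [0.001, 0.1, 1.01]:
    coef = Lasso(alpha=factor * alpha_max, max_iter=10_000).fit(X_demo, y_demo).coef_
    zero_counts.append(int(np.sum(coef == 0.0)))
    print(f"alpha = {factor} * alpha_max: {zero_counts[-1]} of {coef.size} coefficients are zero")
```

At the largest setting every coefficient is zeroed and the model predicts the mean of `y`; at smaller settings more features stay in the model.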

---

## ✅ When to Use Lasso:
- When you suspect many features are irrelevant or redundant.
- When you want a **sparse model** (some coefficients = 0).
- Great for **automatic feature selection**.

In [36]:
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.pipeline import Pipeline
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(data.drop('Profit', axis=1),
                                                    data['Profit'],
                                                    test_size=0.2,
                                                    shuffle=True,
                                                    random_state=42)

# Define numeric and categorical columns
numeric_cols = X_train.select_dtypes(include=['number']).columns.tolist()
categorical_cols = ['State']  # Example categorical column

# Data Preprocessing (Scaling & Encoding)
preprocessor = ColumnTransformer(transformers=
                                 [('num_scalar', StandardScaler(), numeric_cols),
                                  ('encoder', OneHotEncoder(drop='first'), categorical_cols)],
                                 remainder='passthrough')

# make the pipeline
# Lasso fits a linear model with an L1 penalty, which can set some coefficients exactly to zero.

regressor = Lasso(alpha=1.0)
pipeline = Pipeline([
    ('preprocessor', preprocessor),
    ('regressor', regressor )
])

# fit the pipeline 
pipeline.fit(X_train, y_train)

In [37]:
print(f"Intercept:{pipeline.named_steps['regressor'].intercept_}")
print(f"Coefficients:{pipeline.named_steps['regressor'].coef_}")

Intercept:115325.87012672475
Coefficients:[38104.46384583 -1864.38169567  3384.47355808   931.28678079
     0.        ]


In [38]:
from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error

# predict
y_predict = pipeline.predict(X_test)

# Evaluate
r2 = r2_score(y_test, y_predict)
mse = mean_squared_error(y_test, y_predict)
rmse = np.sqrt(mse)
mae = mean_absolute_error(y_test, y_predict)

print(f"R² Score: {r2:.4f}")
print(f"MSE: {mse:.4f}")
print(f"RMSE: {rmse:.4f}")
print(f"MAE: {mae:.4f}")

R² Score: 0.8988
MSE: 81988410.2207
RMSE: 9054.7452
MAE: 6960.9325


In [None]:
# Using LassoCV to select alpha by cross-validation

from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LassoCV
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(data.drop('Profit', axis=1),
                                                    data['Profit'],
                                                    test_size=0.2,
                                                    shuffle=True,
                                                    random_state=42)

# Define numeric and categorical columns
numeric_cols = X_train.select_dtypes(include=['number']).columns.tolist()
categorical_cols = ['State']  # Example categorical column

# Data Preprocessing (Scaling & Encoding)
preprocessor = ColumnTransformer(transformers=
                                 [('num_scalar', StandardScaler(), numeric_cols),
                                  ('encoder', OneHotEncoder(drop='first'), categorical_cols)],
                                 remainder='passthrough')

# make the pipeline 

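# Candidate alphas span 1e-4 to 10 on a log scale; consider widening the grid if the selected alpha sits on its edge
regressor = LassoCV(alphas=np.logspace(-4, 1, 50), cv=5)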
pipeline = Pipeline([
    ('preprocessor', preprocessor),
    ('lasso_cv', regressor )
])

# fit the pipeline 
pipeline.fit(X_train, y_train)

In [40]:
# Predict
y_pred = pipeline.predict(X_test)

# Evaluation
r2 = r2_score(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
mae = mean_absolute_error(y_test, y_pred)

# Output
print("Best alpha:", pipeline.named_steps['lasso_cv'].alpha_)
print(f"R² Score: {r2:.4f}")
print(f"MSE: {mse:.4f}")
print(f"RMSE: {rmse:.4f}")
print(f"MAE: {mae:.4f}")

Best alpha: 10.0
R² Score: 0.8990
MSE: 81805133.4319
RMSE: 9044.6190
MAE: 6957.7885


In [41]:
print(f"Intercept:{pipeline.named_steps['lasso_cv'].intercept_}")
print(f"Coefficients:{pipeline.named_steps['lasso_cv'].coef_}")

Intercept:115339.66896437443
Coefficients:[38092.16304372 -1853.31949743  3389.5067514    891.86153036
    -0.        ]


# C. ElasticNet Regression

### 🔗 ElasticNet Regression – The Best of Both Worlds (Ridge + Lasso)

**ElasticNet** is a regularized regression method that combines both **L1** (Lasso) and **L2** (Ridge) penalties. It is especially useful when:

- You have **many features** and some of them are **correlated**.
- You want the **sparsity** of Lasso but with the **stability** of Ridge.

---

## 🧠 Mathematical Objective

ElasticNet solves the following optimization problem:

$
\text{Minimize: } \frac{1}{2n} \sum_{i=1}^n (y_i - \hat{y}_i)^2 + \alpha \left[ \rho \sum_{j=1}^p |\beta_j| + \frac{1 - \rho}{2} \sum_{j=1}^p \beta_j^2 \right]
$

Where:
- $ \alpha $ controls **overall regularization strength**.
- $ \rho \in [0, 1] $ is the **L1 ratio**:
  - $ \rho = 1 $: pure Lasso (L1)
  - $ \rho = 0 $: pure Ridge (L2)
  - In between: ElasticNet
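
The objective can be written out in NumPy and checked against scikit-learn, where `alpha` plays the role of $ \alpha $ and `l1_ratio` the role of $ \rho $. The sketch below uses synthetic data; the helper `elastic_net_objective` and the parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet, Lasso

def elastic_net_objective(X, y, beta, intercept, alpha, rho):
    """(1 / 2n) * RSS + alpha * (rho * L1 + (1 - rho) / 2 * squared L2)."""
    n = len(y)
    residuals = y - (X @ beta + intercept)
    l1 = np.sum(np.abs(beta))
    l2_squared = np.sum(beta ** 2)
    return float(residuals @ residuals / (2 * n) + alpha * (rho * l1 + 0.5 * (1 - rho) * l2_squared))

X_demo, y_demo = make_regression(n_samples=100, n_features=5, noise=5.0, random_state=0)

# rho = 1 reduces ElasticNet to Lasso
enet_l1 = ElasticNet(alpha=0.5, l1_ratio=1.0).fit(X_demo, y_demo)
lasso = Lasso(alpha=0.5).fit(X_demo, y_demo)
print(np.allclose(enet_l1.coef_, lasso.coef_))

# The fitted model should reach a lower objective than the all-zero model
enet = ElasticNet(alpha=0.5, l1_ratio=0.5).fit(X_demo, y_demo)
obj_fit = elastic_net_objective(X_demo, y_demo, enet.coef_, enet.intercept_, 0.5, 0.5)
obj_zero = elastic_net_objective(X_demo, y_demo, np.zeros(5), y_demo.mean(), 0.5, 0.5)
print(obj_fit, obj_zero)
```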

---

## 🎯 Why Use ElasticNet?

| Feature                   | Ridge       | Lasso        | ElasticNet |
|--------------------------|-------------|--------------|------------|
| Shrinks coefficients     | ✅           | ✅            | ✅          |
| Can set some to zero     | ❌           | ✅            | ✅          |
| Handles multicollinearity| ✅           | ❌ (unstable) | ✅          |
| Feature selection        | ❌           | ✅            | ✅          |

---

## ✅ Use ElasticNet When:
- Features are **correlated**
- You want **feature selection**
- You want **stable** models on **high-dimensional** data
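
The stability argument can be illustrated with an extreme case: two identical features. This is a sketch on synthetic data with arbitrary parameter values. With a pure L1 penalty the solution is not unique, so how Lasso splits the weight between the two copies depends on the solver, whereas the L2 part of ElasticNet makes the solution unique and shares the weight equally.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso

rng = np.random.default_rng(0)
x = rng.normal(size=200)
X_dup = np.column_stack([x, x])  # two perfectly correlated features
y_dup = 3.0 * x + rng.normal(scale=0.1, size=200)

lasso_coef = Lasso(alpha=0.1, max_iter=100_000).fit(X_dup, y_dup).coef_

# Tight tolerance so the two ElasticNet coefficients converge to the same value
enet_coef = ElasticNet(
    alpha=0.1, l1_ratio=0.5, tol=1e-10, max_iter=100_000
).fit(X_dup, y_dup).coef_

print("Lasso coefficients:     ", lasso_coef)
print("ElasticNet coefficients:", enet_coef)
```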

In [42]:
# using ElasticNet

from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.pipeline import Pipeline
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(data.drop('Profit', axis=1),
                                                    data['Profit'],
                                                    test_size=0.2,
                                                    shuffle=True,
                                                    random_state=42)

# Define numeric and categorical columns
numeric_cols = X_train.select_dtypes(include=['number']).columns.tolist()
categorical_cols = ['State']  # Example categorical column

# Data Preprocessing (Scaling & Encoding)
preprocessor = ColumnTransformer(transformers=
                                 [('num_scalar', StandardScaler(), numeric_cols),
                                  ('encoder', OneHotEncoder(drop='first'), categorical_cols)],
                                 remainder='passthrough')

# make the pipeline 

regressor = ElasticNet(alpha=0.1,
                       l1_ratio=0.7)
pipeline = Pipeline([
    ('preprocessor', preprocessor),
    ('lasso_cv', regressor )
])

# fit the pipeline 
pipeline.fit(X_train, y_train)

In [44]:
# Predict
y_pred = pipeline.predict(X_test)

# Evaluation
r2 = r2_score(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
mae = mean_absolute_error(y_test, y_pred)

# Output
print(f"R² Score: {r2:.4f}")
print(f"MSE: {mse:.4f}")
print(f"RMSE: {rmse:.4f}")
print(f"MAE: {mae:.4f}")

R² Score: 0.8945
MSE: 85455927.9970
RMSE: 9244.2376
MAE: 7490.1266


In [45]:
print(f"Intercept:{pipeline.named_steps['lasso_cv'].intercept_}")
print(f"Coefficients:{pipeline.named_steps['lasso_cv'].coef_}")

Intercept:115413.6964119894
Coefficients:[35597.46869461 -1256.21982502  5193.16002223   569.97176469
   118.87375499]


In [None]:
# using ElasticNetCV

from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.pipeline import Pipeline
from sklearn.linear_model import ElasticNetCV
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(data.drop('Profit', axis=1),
                                                    data['Profit'],
                                                    test_size=0.2,
                                                    shuffle=True,
                                                    random_state=42)

# Define numeric and categorical columns
numeric_cols = X_train.select_dtypes(include=['number']).columns.tolist()
categorical_cols = ['State']  # Example categorical column

# Data Preprocessing (Scaling & Encoding)
preprocessor = ColumnTransformer(transformers=
                                 [('num_scalar', StandardScaler(), numeric_cols),
                                  ('encoder', OneHotEncoder(drop='first'), categorical_cols)],
                                 remainder='passthrough')

# make the pipeline 

regressor = ElasticNetCV(
    l1_ratio=[.1, .5, .7, .9, .95, .99, 1],  # L1 ratio values to try
    alphas=np.logspace(-4, 1, 50),          # Alpha values to try
    cv=5,
    random_state=42
    )

pipeline = Pipeline([
    ('preprocessor', preprocessor),
    ('elasticnet_cv', regressor )
])

# fit the pipeline 
pipeline.fit(X_train, y_train)

In [50]:
# Predict
y_pred = pipeline.predict(X_test)

# Evaluation
r2 = r2_score(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
mae = mean_absolute_error(y_test, y_pred)

# Output
print("Best alpha:", pipeline.named_steps['elasticnet_cv'].alpha_)
print("Best l1_ratio:", pipeline.named_steps['elasticnet_cv'].l1_ratio_)
print(f"R² Score: {r2:.4f}")
print(f"MSE: {mse:.4f}")
print(f"RMSE: {rmse:.4f}")
print(f"MAE: {mae:.4f}")

Best alpha: 1.2067926406393288
Best l1_ratio: 0.99
R² Score: 0.8975
MSE: 82973279.3170
RMSE: 9108.9670
MAE: 7172.1751


In [54]:
print(f"Intercept:{pipeline.named_steps['elasticnet_cv'].intercept_}")
print(f"Coefficients:{pipeline.named_steps['elasticnet_cv'].coef_}")

Intercept:115377.63023813063
Coefficients:[ 3.70248848e+04 -1.59696399e+03  4.18283076e+03  7.49988518e+02
  3.59824018e+01]
