# Elastic Net Regression

Elastic Net regression addresses limitations of both Lasso and Ridge regression by combining their penalties. It penalizes the coefficients with both the $L_1$ and $L_2$ norms, which gives the following loss function:

$$\text{Loss}_{\text{elastic}} = \text{RSS} + \lambda \left( (1 - \alpha) \cdot \| \beta \|_2^2 + \alpha \cdot \| \beta \|_1 \right) $$

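Here $\lambda \ge 0$ controls the overall penalty strength and $\alpha \in [0, 1]$ sets the mix between the two penalties. Written out in full:

$$\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + \lambda \left( (1 - \alpha) \cdot \sum_{j=1}^{d} \beta_j^2 + \alpha \cdot \sum_{j=1}^{d} |\beta_j| \right)$$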
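As a quick check of the formula, the loss can be evaluated directly with NumPy. This is a minimal sketch: the function name `elastic_net_loss` and the toy data are made up for illustration, and no intercept term is included.

```python
import numpy as np

def elastic_net_loss(X, y, beta, lam, alpha):
    """RSS + lam * ((1 - alpha) * ||beta||_2^2 + alpha * ||beta||_1)."""
    residuals = y - X @ beta
    rss = np.sum(residuals ** 2)          # residual sum of squares
    l2_term = np.sum(beta ** 2)           # squared L2 norm (Ridge part)
    l1_term = np.sum(np.abs(beta))        # L1 norm (Lasso part)
    return rss + lam * ((1 - alpha) * l2_term + alpha * l1_term)

# Toy data, made up for illustration only
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([1.0, 2.0, 3.0])
beta = np.array([1.0, -2.0])

for alpha in (0.0, 0.5, 1.0):
    print(alpha, elastic_net_loss(X, y, beta, lam=1.0, alpha=alpha))
```

For this toy input the RSS is 32, the squared $L_2$ norm is 5, and the $L_1$ norm is 3, so the loss moves from 37 (pure Ridge penalty) to 35 (pure Lasso penalty) as $\alpha$ goes from 0 to 1.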

### Key Points:
- **Lasso Penalty ($L_1$ norm)**: Creates a sparse model by driving some coefficients to zero.
- **Ridge Penalty ($L_2$ norm)**: Shrinks the coefficients of highly correlated variables toward each other, so the model does not arbitrarily keep one and drop the rest, and stabilizes the $L_1$ regularization path.

### Tuning $\alpha$:
- **$\alpha$ = 0**: The equation reduces to Ridge Regression (only $L_2$ penalty remains).
- **$\alpha$ = 1**: The equation reduces to Lasso Regression (only $L_1$ penalty remains).
- **0 < $\alpha$ < 1**: A combination of both Ridge and Lasso penalties.

In practice, $\alpha$ is treated as a hyperparameter in $[0, 1]$ and tuned together with $\lambda$, typically by cross-validation.
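One possible way to pick the mixing parameter is a cross-validated grid search. The sketch below uses scikit-learn's `ElasticNetCV` on synthetic data (the data and the candidate grid are assumptions made for illustration). Note the naming clash: scikit-learn's `l1_ratio` plays the role of $\alpha$ in this section, and scikit-learn's `alpha` plays the role of $\lambda$, up to a different scaling of the objective.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNetCV

# Synthetic data, used only to keep the example self-contained
X, y = make_regression(n_samples=200, n_features=10, n_informative=4,
                       noise=10.0, random_state=0)

# Candidate mixing values (scikit-learn's l1_ratio); the penalty strengths
# are left to ElasticNetCV's default grid
candidates = [0.1, 0.5, 0.9, 1.0]
model = ElasticNetCV(l1_ratio=candidates, cv=5)
model.fit(X, y)

print("selected l1_ratio:", model.l1_ratio_)
print("selected alpha:", model.alpha_)
```

The selected values depend on the data and the folds, so they should be read as the result of this particular search rather than as generally good settings.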

### Elastic Net Cost in Argument Minimum Representation:

$$
\boldsymbol{\beta}_{\text{elastic}} = \underset{\beta \in \mathbb{R}^d}{\arg\min} \; \| y - X\beta \|_2^2 + \lambda \left( \alpha \| \beta \|_1 + (1 - \alpha) \| \beta \|_2^2 \right)
$$
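Because of the $L_1$ term, this minimization has no closed-form solution in general, so it is usually solved iteratively. The following is a minimal sketch of cyclic coordinate descent for exactly the objective above, assuming no intercept. The function names are illustrative, and this is not presented as how any particular library implements it. Holding all other coefficients fixed, the update for coordinate $j$ is $\beta_j = S(x_j^\top r_j,\ \lambda\alpha/2) \,/\, (x_j^\top x_j + \lambda(1-\alpha))$, where $r_j$ is the residual without feature $j$ and $S$ is the soft-thresholding operator.

```python
import numpy as np

def soft_threshold(z, t):
    """Shrink z toward zero by t; returns 0 when |z| <= t."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def elastic_net_cd(X, y, lam, alpha, n_iter=500):
    """Cyclic coordinate descent for
    ||y - X b||_2^2 + lam * (alpha * ||b||_1 + (1 - alpha) * ||b||_2^2).
    No intercept is fitted."""
    n, d = X.shape
    beta = np.zeros(d)
    col_sq = np.sum(X ** 2, axis=0)              # x_j^T x_j for each column
    for _ in range(n_iter):
        for j in range(d):
            # Residual with feature j's current contribution removed
            r_j = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ r_j
            beta[j] = soft_threshold(rho, lam * alpha / 2) / (
                col_sq[j] + lam * (1 - alpha))
    return beta

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([2.0, 0.0, -1.0]) + 0.1 * rng.normal(size=50)

# alpha = 0: compare against the Ridge closed form (X^T X + lam I)^{-1} X^T y
lam = 5.0
ridge_closed = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)
print(np.allclose(elastic_net_cd(X, y, lam, alpha=0.0), ridge_closed))

# alpha = 1 with lam above 2 * max|X^T y|: every coefficient is thresholded to zero
big_lam = 2.0 * np.max(np.abs(X.T @ y)) + 1.0
print(np.all(elastic_net_cd(X, y, big_lam, alpha=1.0) == 0))
```

The two checks correspond to the limiting cases of the objective: with $\alpha = 0$ the iteration converges to the Ridge solution, and with $\alpha = 1$ and a large enough $\lambda$ the Lasso part sets all coefficients to exactly zero.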


## Geometrical Interpretation of Elastic Net Regression 

<img src="../../assets/Elastic_net_Regression.png">

Elastic Net regression combines characteristics of Ridge and Lasso penalties in its geometric interpretation:

### Geometric Shapes

   - **Ridge Penalty**: Forms a circular shape in the parameter space ($L_2$ norm).
   - **Lasso Penalty**: Forms a diamond-shaped boundary ($L_1$ norm).
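   - **Elastic Net Penalty**: Combines these shapes into a diamond with outward-curved edges, sometimes called an "elastic ball." It keeps the corners on the axes from the Lasso penalty, which allow exact zeros, and gains the curvature of the Ridge penalty.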
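The three shapes can be compared numerically by measuring how far the boundary of each unit penalty region lies from the origin in a given direction. The sketch below is only an illustration: the helper `boundary_radius` is a made-up name, and the penalty level is fixed at 1 for convenience. Along direction $(\cos\theta, \sin\theta)$ the boundary distance $r$ solves $(1-\alpha)\,r^2 + \alpha\, s\, r = 1$, where $s = |\cos\theta| + |\sin\theta|$.

```python
import numpy as np

def boundary_radius(theta, alpha):
    """Distance r from the origin to the curve
    (1 - alpha) * ||b||_2^2 + alpha * ||b||_1 = 1
    along direction (cos(theta), sin(theta))."""
    s = abs(np.cos(theta)) + abs(np.sin(theta))   # L1 norm of the unit direction
    a, b = 1.0 - alpha, alpha * s                 # solve a*r^2 + b*r - 1 = 0
    if a == 0.0:                                  # pure Lasso: linear equation
        return 1.0 / b
    return (-b + np.sqrt(b ** 2 + 4.0 * a)) / (2.0 * a)

for name, alpha in [("ridge", 0.0), ("elastic net", 0.5), ("lasso", 1.0)]:
    on_axis = boundary_radius(0.0, alpha)
    diagonal = boundary_radius(np.pi / 4, alpha)
    print(f"{name:12s} on-axis r = {on_axis:.3f}, diagonal r = {diagonal:.3f}")
```

All three boundaries pass through the same point on each axis, which is where the corners sit. Along the diagonal the Elastic Net boundary (about 0.874 for $\alpha = 0.5$) lies between the Lasso diamond (about 0.707) and the Ridge circle (1), which is the bulging-diamond shape described above.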

### Optimization Goal

   - The objective is to find coefficients $\beta_1, \beta_2$ that minimize the penalized loss function, which trades off fit to the data (the elliptical RSS contours in the figure) against model complexity (the elastic ball).

### Interpretation
   - The red point on the figure denotes the Ridge estimate, found where the RSS contours (ellipses centered on the OLS solution) first touch the circle (Ridge penalty).
   - The purple point signifies the Lasso estimate, found where the contours first touch the diamond (Lasso penalty).
   - The green point represents the Elastic Net estimate, found where the contours first touch the elastic ball (Elastic Net penalty).

### Penalty Combination
   - Elastic Net penalty is expressed as $(1-\alpha)\cdot\sum_{j=1}^{d}{\beta_j^2} + \alpha\cdot\sum_{j=1}^{d}|\beta_j|$, where $\alpha$ adjusts the balance between Ridge $L_2$ and Lasso $L_1$ penalties.
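   - The penalty grows as the coefficients move away from the origin. Increasing $\lambda$ strengthens the regularization, which in the constrained picture corresponds to a smaller elastic ball and estimates pulled closer to zero.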

### Exercise: Elastic Net Regression 

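This exercise fits an Elastic Net model with scikit-learn's `ElasticNet` estimator on the [Boston House Prices Dataset](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_boston.html). Unlike Ridge, Elastic Net has no closed-form solution in general, so the estimator is fitted iteratively.

The dataset was distributed with older versions of scikit-learn (`load_boston` was removed in version 1.2); here it is loaded from a local CSV file. It has _506_ instances with _13_ **numerical/categorical features** describing Boston-area housing. The `MEDV` column is the **target variable**: the median value of owner-occupied homes in thousands of dollars.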

In [13]:
# Import necessary libraries
import pandas as pd 
import numpy as np
import matplotlib.pyplot as plt
from IPython.display import display, HTML

In [14]:
# Load the dataset
path = "../../assets/Datasets/House-Price.csv"
house = pd.read_csv(path)

house.head()

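| | CRIM | ZN | INDUS | CHAS | NOX | RM | AGE | DIS | RAD | TAX | PTRATIO | B | LSTAT | MEDV |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.01 | 18.0 | 2.31 | 0 | 0.54 | 6.58 | 65.2 | 4.09 | 1 | 296.0 | 15.3 | 396.9 | 4.98 | 24.0 |
| 1 | 0.03 | 0.0 | 7.07 | 0 | 0.47 | 6.42 | 78.9 | 4.97 | 2 | 242.0 | 17.8 | 396.9 | 9.14 | 21.6 |
| 2 | 0.03 | 0.0 | 7.07 | 0 | 0.47 | 7.18 | 61.1 | 4.97 | 2 | 242.0 | 17.8 | 392.83 | 4.03 | 34.7 |
| 3 | 0.03 | 0.0 | 2.18 | 0 | 0.46 | 7.0 | 45.8 | 6.06 | 3 | 222.0 | 18.7 | 394.63 | 2.94 | 33.4 |
| 4 | 0.07 | 0.0 | 2.18 | 0 | 0.46 | 7.15 | 54.2 | 6.06 | 3 | 222.0 | 18.7 | 396.9 | 5.33 | 36.2 |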


In [15]:
# Separate feature and target columns
X = house.drop(columns=["MEDV"])
y = house["MEDV"].values.reshape(-1, 1)

In [16]:
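# Standardize the features, then split into train and test sets.
# Note: the scaler is fit on all rows before the split, so test-set statistics
# influence the scaling; fitting it on the training rows only would avoid this.
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

X = StandardScaler().fit_transform(X)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)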
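The cell above fits the scaler on the full dataset before splitting. A common alternative is to estimate the scaling statistics on the training rows only and reuse them for the test rows. The sketch below shows that ordering with plain NumPy on synthetic data. It is a variation for comparison, not the procedure used in this exercise, and all names and data in it are illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(loc=5.0, scale=3.0, size=(100, 4))   # synthetic features
y = rng.normal(size=100)                             # synthetic target

# Shuffle the row indices and hold out 20% as a test set
idx = rng.permutation(len(X))
n_test = int(0.2 * len(X))
test_idx, train_idx = idx[:n_test], idx[n_test:]
X_train, X_test = X[train_idx], X[test_idx]
y_train, y_test = y[train_idx], y[test_idx]

# Estimate mean and standard deviation on the training rows only
mu = X_train.mean(axis=0)
sigma = X_train.std(axis=0)

X_train_std = (X_train - mu) / sigma
X_test_std = (X_test - mu) / sigma    # reuse the training statistics

print(X_train_std.mean(axis=0).round(6))
print(X_train_std.std(axis=0).round(6))
```

With this ordering the training features have zero mean and unit variance by construction, while the test features are only approximately standardized, which mirrors how unseen data would be treated.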

In [17]:
# Implement Elastic Net Regression
from sklearn.linear_model import ElasticNet
from sklearn.metrics import mean_squared_error

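# In scikit-learn, `alpha` is the overall penalty strength (the role of lambda above)
# and `l1_ratio` is the L1/L2 mix (the role of alpha above). scikit-learn scales
# the objective differently, so the values are not numerically interchangeable.
elastic = ElasticNet(alpha=0.1,
                     l1_ratio=0.5,
                     fit_intercept=True)

# Fit the model on the training set
elastic.fit(X_train, y_train)

# Predict on the test set with the fitted model
y_pred = elastic.predict(X_test)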
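The parameters passed to `ElasticNet` do not use the same scaling as the loss written earlier in this section. scikit-learn documents its objective as $\frac{1}{2n}\|y - Xw\|_2^2 + a\,\rho\,\|w\|_1 + \tfrac{1}{2}\,a\,(1-\rho)\,\|w\|_2^2$, with $a$ = `alpha` and $\rho$ = `l1_ratio`. Matching the penalty terms after dividing this section's loss by $2n$ gives the conversion sketched below. The helper names are hypothetical, and `n_samples` is assumed to be the number of training rows.

```python
def to_sklearn(lam, alpha, n_samples):
    """Map this section's (lambda, alpha) to scikit-learn's (alpha, l1_ratio)."""
    sk_alpha = lam * (2.0 - alpha) / (2.0 * n_samples)
    l1_ratio = alpha / (2.0 - alpha)
    return sk_alpha, l1_ratio

def from_sklearn(sk_alpha, l1_ratio, n_samples):
    """Map scikit-learn's (alpha, l1_ratio) back to this section's (lambda, alpha)."""
    alpha = 2.0 * l1_ratio / (1.0 + l1_ratio)
    lam = 2.0 * n_samples * sk_alpha / (2.0 - alpha)
    return lam, alpha

# Example with an illustrative sample size of 100
lam, alpha = from_sklearn(sk_alpha=0.1, l1_ratio=0.5, n_samples=100)
print(f"lambda = {lam:.3f}, alpha = {alpha:.3f}")
```

Under this mapping, `l1_ratio=0.5` corresponds to $\alpha = 2/3$ in this section's notation rather than an equal split, because scikit-learn halves the $L_2$ term.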

In [18]:
# Calculating mean squared error
mean_sq_error = mean_squared_error(y_test, y_pred) 
print(f"MSE: {mean_sq_error:.3f}")

MSE: 25.223


In [19]:
# Get coefficients of elastic net 
coefficients = elastic.coef_

index = ['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE',
         'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT']

pd.options.display.float_format = "{:,.2f}".format
pd.DataFrame(coefficients, columns=['Beta value'], index=index)

| | Beta value |
|---|---|
| CRIM | -0.74 |
| ZN | 0.32 |
| INDUS | -0.03 |
| CHAS | 0.73 |
| NOX | -1.39 |
| RM | 3.21 |
| AGE | -0.10 |
| DIS | -2.16 |
| RAD | 0.85 |
| TAX | -0.61 |


> **Elastic Net Regression** often performs better than Lasso or Ridge regression alone when predictors are highly correlated, because it both **shrinks coefficients** and tends to **select groups of highly correlated variables together**, addressing the main limitation of each method.