# Elastic Net Regression

## Summary

* **Elastic Net Regression** is a hybrid algorithm that combines the properties of both **Ridge Regression (L2)** and **Lasso Regression (L1)**.
* It serves a dual purpose: it helps to **reduce overfitting** while simultaneously performing **feature selection**.
* The algorithm modifies the cost function by adding **two penalty terms**: one based on the square of each slope (Ridge) and one based on the absolute value of each slope (Lasso).
* It is particularly useful when dealing with complex models that have a large number of features and are suffering from overfitting.

## Exam Notes

### When to use Elastic Net?

**Question:** When should you use Elastic Net Regression instead of just Ridge or Lasso?

**Answer:**  
You should use **Elastic Net** when your model is **overfitting** and also contains a **large number of features**.  
Ridge shrinks the slopes to reduce overfitting but keeps every feature, while Lasso can remove features by shrinking their slopes to zero. Elastic Net applies both penalties, so it addresses both problems at the same time.

### Composition of Elastic Net

**Question:** What is Elastic Net composed of?

**Answer:**  
Elastic Net is a combination of **Ridge Regression** and **Lasso Regression**.

---

## Elastic Net Regression Details

**Elastic Net Regression** is a regularization technique designed to leverage the strengths of both Ridge and Lasso regression.

### The Cost Function

In Elastic Net, the cost function is the standard Mean Squared Error (MSE) plus **two distinct penalty terms**.

$$
J(\theta) =
\frac{1}{2m} \sum_{i=1}^{m} \left(h_\theta(x^{(i)}) - y^{(i)}\right)^2
+
\lambda_1 \sum_{j=1}^{n} \theta_j^2
+
\lambda_2 \sum_{j=1}^{n} |\theta_j|
$$

Here $m$ is the number of training examples, $n$ is the number of features, and $\theta_j$ is the slope (coefficient) of feature $j$.

* **First Term**: Standard Mean Squared Error (MSE) from Linear Regression.
* **Second Term ($\lambda_1 \sum \theta_j^2$)**: **Ridge (L2) penalty**, used to reduce overfitting.
* **Third Term ($\lambda_2 \sum |\theta_j|$)**: **Lasso (L1) penalty**, used for feature selection.
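To make the three terms concrete, here is a minimal sketch that evaluates the cost for a tiny made-up dataset. The function name `elastic_net_cost`, the sample numbers, and the choice to leave the intercept unpenalized are all assumptions for illustration, not part of the notes above.

```python
def elastic_net_cost(X, y, theta, intercept, lam1, lam2):
    """Evaluate MSE/2 + lam1 * sum(slope^2) + lam2 * sum(|slope|).

    X is a list of feature rows, theta holds one slope per feature.
    Assumption: the intercept is not included in either penalty.
    """
    m = len(y)
    squared_error = 0.0
    for row, target in zip(X, y):
        prediction = intercept + sum(t * x for t, x in zip(theta, row))
        squared_error += (prediction - target) ** 2

    mse_term = squared_error / (2 * m)               # first term
    ridge_term = lam1 * sum(t ** 2 for t in theta)   # second term (L2)
    lasso_term = lam2 * sum(abs(t) for t in theta)   # third term (L1)
    return mse_term + ridge_term + lasso_term


# Hypothetical toy data: 3 examples, 2 features.
X = [[1.0, 2.0], [2.0, 0.0], [3.0, 1.0]]
y = [5.0, 4.0, 8.0]
theta = [2.0, 1.0]

cost = elastic_net_cost(X, y, theta, intercept=0.0, lam1=0.1, lam2=0.5)
print(round(cost, 4))  # 0.3333 (MSE/2) + 0.5 (Ridge) + 1.5 (Lasso) = 2.3333
```

If scikit-learn is used instead, its `ElasticNet` estimator takes `alpha` and `l1_ratio` rather than two separate penalties. Under the cost written above, that should correspond to $\lambda_2 = \text{alpha} \cdot \text{l1\_ratio}$ and $\lambda_1 = \text{alpha} \cdot (1 - \text{l1\_ratio}) / 2$.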

### Why use Elastic Net?

Elastic Net is useful in scenarios where multiple problems occur simultaneously:

1. **Overfitting**  
   * The model fits training data very well but performs poorly on test data.
   * The Ridge component ($\lambda_1$) helps reduce variance.

2. **High Dimensionality**  
   * The dataset contains a very large number of features.
   * The Lasso component ($\lambda_2$) shrinks coefficients of unimportant features to **zero**, performing feature selection.

By using Elastic Net, you effectively **tune the bias–variance tradeoff** while simultaneously controlling model complexity and feature importance.
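The two effects above can be seen in a small sketch. The notes do not say how the cost is minimized, so this example assumes coordinate descent with soft-thresholding, one common approach. The function names, the toy data, and the omission of an intercept (the data is already centered) are illustrative assumptions.

```python
def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam; return exactly 0.0 inside [-lam, lam]."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def fit_elastic_net(X, y, lam1, lam2, n_sweeps=100):
    """Coordinate descent for MSE/2 + lam1 * sum(slope^2) + lam2 * sum(|slope|).

    Assumption: no intercept, so X and y should be centered beforehand.
    """
    m, n = len(X), len(X[0])
    theta = [0.0] * n
    for _ in range(n_sweeps):
        for j in range(n):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # mean squared value of feature j
            for row, target in zip(X, y):
                partial = sum(theta[k] * row[k] for k in range(n) if k != j)
                rho += row[j] * (target - partial)
                z += row[j] ** 2
            rho /= m
            z /= m
            # The Lasso part (lam2) thresholds; the Ridge part (lam1) shrinks further.
            theta[j] = soft_threshold(rho, lam2) / (z + 2 * lam1)
    return theta


# Hypothetical data: y depends only on the first feature (y = 3 * x1).
X = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
y = [3.0, 3.0, -3.0, -3.0]

theta = fit_elastic_net(X, y, lam1=0.1, lam2=0.5)
print(theta)
```

On this toy data the slope of the irrelevant second feature comes out as exactly `0.0` (feature selection from the Lasso term), while the first slope is shrunk from the unpenalized value of 3 to about 2.08 (the combined effect of both penalties).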
