## 📌 Ridge vs Lasso Regression

Ridge and Lasso Regression are two popular techniques in machine learning used for **regularizing linear models** to avoid overfitting and improve predictive performance. Both methods add a **penalty term** to the model’s cost function to constrain the coefficients, but they differ in how they apply this penalty.

- **Ridge Regression (L2 regularization):**  
  Adds the **squared magnitude** of the coefficients as a penalty.

- **Lasso Regression (L1 regularization):**  
  Introduces a penalty based on the **absolute value** of the coefficients.


## 📌 What is Ridge Regression (L2 Regularization)?

Ridge regression, also known as **L2 regularization**, is a technique used in linear regression to **prevent overfitting** by adding a penalty term to the loss function. This penalty is proportional to the **square of the magnitude of the coefficients (weights)**.

It is especially useful when there are **many predictors**, when predictors are strongly correlated with each other, or when there is **not enough data** to estimate every coefficient reliably.

The standard loss function (mean squared error) is modified to include a regularization term:

Loss = MSE + $\lambda \sum_{i=1}^n w_i^2$

Here:  
- $\lambda$ = regularization parameter that controls the strength of the penalty ($\lambda \ge 0$; with $\lambda = 0$ the loss reduces to plain MSE).  
- $w_i$ = model coefficients (weights).  
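
As a minimal sketch of this formula, the function below evaluates the Ridge loss for a given set of weights. The function name `ridge_loss`, the toy data, and the choice to leave out an intercept are assumptions made for illustration only.

```python
def ridge_loss(X, y, w, lam):
    """Return MSE + lam * sum(w_i^2) for a linear model without an intercept."""
    n_samples = len(y)
    # Prediction for each row: dot product of the row with the weights.
    preds = [sum(x_j * w_j for x_j, w_j in zip(row, w)) for row in X]
    mse = sum((p - t) ** 2 for p, t in zip(preds, y)) / n_samples
    # L2 penalty: sum of squared weights, scaled by lam.
    penalty = lam * sum(w_j ** 2 for w_j in w)
    return mse + penalty


# Hypothetical toy data: 3 samples, 2 features.
X = [[1.0, 2.0], [2.0, 0.0], [3.0, 1.0]]
y = [5.0, 4.0, 8.0]
w = [2.0, 1.0]

print(ridge_loss(X, y, w, lam=0.0))  # plain MSE
print(ridge_loss(X, y, w, lam=0.5))  # MSE plus the L2 penalty
```

With `lam=0.0` the result is just the MSE; any positive `lam` adds a cost that grows with the squared size of the weights.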



🔹 **How coefficients shrink**  

The model minimizes the sum of the **MSE** and the **penalty**:  
Loss = MSE + $\lambda \sum_{i=1}^n w_i^2$  

If a coefficient $w_i$ is large, its penalty $\lambda w_i^2$ grows quadratically and dominates the loss.  
To keep the total loss small, the model accepts a slightly worse fit in exchange for smaller coefficients (slopes).  
This is why Ridge shrinks coefficients compared to plain linear regression, and why a larger $\lambda$ shrinks them more.  
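
The shrinkage can be seen numerically in the simplest case. The sketch below assumes a single feature and no intercept, where minimizing MSE + λw² has the closed-form solution w = Σxy / (Σx² + m·λ), with m the number of samples. The function name `ridge_slope` and the toy data are illustrative assumptions.

```python
def ridge_slope(x, y, lam):
    """Slope that minimizes MSE + lam * w^2 for one feature, no intercept."""
    n_samples = len(x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    # lam = 0 gives the ordinary least-squares slope sxy / sxx.
    return sxy / (sxx + n_samples * lam)


# Hypothetical toy data generated from y = 2x.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]

for lam in (0.0, 1.0, 10.0, 100.0):
    print(lam, ridge_slope(x, y, lam))
```

The printed slope starts at the unpenalized value for `lam = 0` and decreases steadily as `lam` grows.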

---

🔹 **Why they don’t become exactly zero**  

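The penalty is based on squares ($w_i^2$).  
Squaring makes the cost curve **smooth and round**, with no sharp corner at zero.  
The pull of the penalty on a coefficient is its gradient, $2\lambda w_i$, which gets weaker as $w_i$ approaches zero. Whether the model is fitted by gradient descent or by the closed-form matrix solution, coefficients are therefore pushed toward zero but, in general, do not land exactly on it.  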
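
One way to see this is a short derivation for the simplified case of a single feature and no intercept. The symbols $m$ (number of samples), $a = \frac{1}{m}\sum_k x_k^2$ and $b = \frac{1}{m}\sum_k x_k y_k$ are introduced here only for illustration. Setting the derivative of the loss to zero gives:

$$
\frac{d}{dw}\left[\frac{1}{m}\sum_{k}(y_k - w x_k)^2 + \lambda w^2\right]
= 2aw - 2b + 2\lambda w = 0
\quad\Longrightarrow\quad
w_{\text{ridge}} = \frac{b}{a + \lambda} = \frac{a}{a + \lambda}\, w_{\text{OLS}},
\qquad w_{\text{OLS}} = \frac{b}{a}
$$

Assuming $a > 0$, the factor $\frac{a}{a + \lambda}$ lies strictly between 0 and 1 for any finite $\lambda > 0$. Ridge therefore rescales the unpenalized slope, and the result is zero only if that slope was already zero.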

 


## 📌 Lasso Regression (L1 Regularization)

Lasso regression, also known as **L1 regularization**, is a linear regression technique that adds a penalty to the loss function to prevent overfitting. This penalty is based on the **absolute values of the coefficients**.  

Because the L1 penalty encourages **sparsity**, it can drive some coefficients to exactly **zero**. This reduces overfitting and, as a side effect, performs **feature selection**.  

---

🔹 **Loss Function:**  

Loss = MSE + $\lambda \sum_{i=1}^n |w_i|$

Where:  
- $\lambda$ = regularization parameter controlling penalty strength.  
- $w_i$ = model coefficients (slopes).  
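
To illustrate how this loss produces exact zeros, the sketch below solves the simplest case: one feature, no intercept. Under those assumptions the minimizer of MSE + λ|w| is given by the soft-thresholding rule, which subtracts λ/2 from the feature-target correlation term and returns zero when that term is not larger than λ/2 in magnitude. The names `soft_threshold` and `lasso_slope` and the toy data are illustrative assumptions.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t; return exactly 0.0 when |z| <= t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_slope(x, y, lam):
    """Slope that minimizes MSE + lam * |w| for one feature, no intercept."""
    n_samples = len(x)
    a = sum(xi * xi for xi in x) / n_samples               # mean of x^2
    b = sum(xi * yi for xi, yi in zip(x, y)) / n_samples   # mean of x*y
    return soft_threshold(b, lam / 2.0) / a


# Hypothetical toy data generated from y = 2x.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]

for lam in (0.0, 10.0, 40.0):
    print(lam, lasso_slope(x, y, lam))
```

For a small `lam` the slope is only reduced, but once `lam` is large enough the function returns exactly `0.0` rather than a tiny nonzero value.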


| Characteristic      | Ridge Regression                                                                 | Lasso Regression                                                                 |
|---------------------|----------------------------------------------------------------------------------|----------------------------------------------------------------------------------|
| **Regularization Type** | Applies L2 regularization, adding a penalty term proportional to the **square of the coefficients** | Applies L1 regularization, adding a penalty term proportional to the **absolute value of the coefficients** |
| **Feature Selection**   | Does **not** perform feature selection. All predictors are retained, although their coefficients are reduced in size to minimize overfitting | Performs **automatic feature selection**. Less important predictors are completely excluded by setting their coefficients to zero |
| **When to use**         | Best suited for situations where **all predictors are potentially relevant**, and the goal is to reduce overfitting rather than eliminate features | Ideal when you suspect that only a **subset of predictors** is important, and the model should focus on those while ignoring the irrelevant ones |
| **Output model**        | Produces a model that includes **all features**, but their coefficients are smaller in magnitude to prevent overfitting | Produces a model that is **simpler**, retaining only the most significant features and ignoring the rest by setting their coefficients to zero |
| **Impact on Prediction**| Reduces the magnitude of coefficients, shrinking them towards zero, but does not set any coefficients exactly to zero. All predictors remain in the model | Shrinks some coefficients to **exactly zero**, effectively removing their influence. This leads to a simpler model with fewer features |
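| **Computation**         | Has a closed-form solution, so fitting is typically fast                      | Has no closed-form solution in general because the penalty is not differentiable at zero; it needs an iterative solver such as coordinate descent, which may be slower |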
| **Example Use Case**    | Use when you have many predictors, all contributing to the outcome (e.g., predicting house prices where all features like size, location, etc., matter) | Use when you believe only some predictors are truly important (e.g., genetic studies where only a few genes out of thousands are relevant) |


## 📌 When to Use Ridge Regression?
Ridge Regression is most suitable when **all predictors are expected to contribute** to the outcome and none should be excluded from the model.  

- Reduces **overfitting** by shrinking the coefficients, ensuring they don’t become too large.  
- Keeps **all predictors** in the model, but with controlled influence.  

**Example:**  
When predicting **house prices**, features like:  
- Size  
- Number of bedrooms  
- Location  
- Year built  

…are all likely relevant. Ridge Regression keeps all of these features in the model but shrinks their coefficients, so that no single feature dominates and predictions are less sensitive to noise in the training data.  
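
The behaviour described above can be sketched with a small gradient-descent fit. The data below is a hypothetical toy set with three standardized features (not real house-price data), and `ridge_gradient_descent`, the learning rate, and the step count are illustrative choices rather than a recommended implementation.

```python
def ridge_gradient_descent(X, y, lam, lr=0.1, n_steps=500):
    """Minimize MSE + lam * sum(w_j^2) by plain gradient descent (no intercept)."""
    n_samples, n_features = len(X), len(X[0])
    w = [0.0] * n_features
    for _ in range(n_steps):
        residuals = [
            t - sum(xj * wj for xj, wj in zip(row, w)) for row, t in zip(X, y)
        ]
        # Gradient of the MSE part plus gradient of the L2 penalty (2 * lam * w_j).
        grad = [
            -2.0 / n_samples * sum(row[j] * r for row, r in zip(X, residuals))
            + 2.0 * lam * w[j]
            for j in range(n_features)
        ]
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


# Hypothetical toy data: y = 3.0*f1 + 0.5*f2 + 0.1*f3
X = [[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]
y = [3.6, 2.4, -2.6, -3.4]

weights = ridge_gradient_descent(X, y, lam=0.4)
print(weights)
```

All three weights come out smaller in magnitude than the values used to generate the data, but even the weakest feature keeps a nonzero weight.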

---

## 📌 When to Use Lasso Regression?
Lasso Regression is ideal when you suspect that **only a few predictors are truly important**, and the rest may add noise or redundancy.  

- Performs **automatic feature selection**.  
- Shrinks the coefficients of less important predictors to **zero**, effectively removing them from the model.  

**Example:**  
If you’re building a model with **hundreds of potential predictors** (like customer behavior features in marketing), Lasso can automatically select only the most impactful ones while ignoring the rest.  
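
A minimal sketch of this selection effect, using coordinate descent with soft-thresholding on a hypothetical toy dataset. The three features are standardized and mutually uncorrelated so the result is easy to check by hand. The function names, the data, and the value of `lam` are illustrative assumptions, and the code assumes no feature column is all zeros.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t; return exactly 0.0 when |z| <= t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_sweeps=100):
    """Minimize MSE + lam * sum(|w_j|), updating one weight at a time (no intercept)."""
    n_samples, n_features = len(X), len(X[0])
    w = [0.0] * n_features
    for _ in range(n_sweeps):
        for j in range(n_features):
            a = 0.0  # accumulates x_j^2
            b = 0.0  # accumulates x_j * (residual with feature j left out)
            for row, target in zip(X, y):
                pred_without_j = sum(
                    row[k] * w[k] for k in range(n_features) if k != j
                )
                a += row[j] ** 2
                b += row[j] * (target - pred_without_j)
            w[j] = soft_threshold(b / n_samples, lam / 2.0) / (a / n_samples)
    return w


# Hypothetical toy data: y = 3.0*f1 + 0.5*f2 + 0.1*f3
X = [[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]
y = [3.6, 2.4, -2.6, -3.4]

weights = lasso_coordinate_descent(X, y, lam=0.4)
print(weights)
```

Here the weakest feature ends up with a weight of exactly zero, while the two stronger features are kept with reduced weights.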
