# Regularization in Machine Learning

## What is Regularization?
Regularization is a technique used in machine learning to **prevent overfitting** by adding a penalty term to the model’s loss function.  

- **Overfitting**: The model learns noise and unnecessary details from training data, reducing performance on unseen data.  
- **Goal of regularization**: Encourage the model to keep its parameters (weights) small, which yields a simpler model that **generalizes** better.  

The modified loss function looks like this:  

$$
\text{Loss}_{\text{regularized}} = \text{Loss}_{\text{original}} + \lambda \cdot \text{Penalty}(w)
$$

Where:  
- **λ (lambda)** is the **regularization strength** (hyperparameter).  
- A higher λ means stronger regularization (more penalty on large weights).  
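
As a minimal sketch of the idea above, the snippet below adds a weight penalty to a mean-squared-error loss. The helper names (`mse`, `regularized_loss`), the choice of MSE as the base loss, and the sample numbers are assumptions made for illustration only.

```python
def mse(y_true, y_pred):
    """Mean squared error between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def regularized_loss(y_true, y_pred, weights, lam, penalty="l2"):
    """Base loss plus lam times a penalty on the weights (hypothetical helper)."""
    if penalty == "l1":
        reg = sum(abs(w) for w in weights)   # sum of absolute values
    elif penalty == "l2":
        reg = sum(w ** 2 for w in weights)   # sum of squares
    else:
        raise ValueError("penalty must be 'l1' or 'l2'")
    return mse(y_true, y_pred) + lam * reg


# Illustrative values, not real data
y_true = [1.0, 2.0, 3.0]
y_pred = [1.5, 2.0, 2.5]
weights = [3.0, -4.0]

base = regularized_loss(y_true, y_pred, weights, lam=0.0)    # no penalty
mild = regularized_loss(y_true, y_pred, weights, lam=0.1)    # small penalty
strong = regularized_loss(y_true, y_pred, weights, lam=1.0)  # large penalty
print(base, mild, strong)
```

With the predictions held fixed, the loss grows with λ, so during training the optimizer is pushed toward smaller weights as λ increases.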

## Types of Regularization

### 1. L1 Regularization (Lasso) → Sum of Absolute Coefficients
- **Penalty term**:  
  $$\text{Penalty}(w) = \sum_{i} |w_i|$$

- **Effect**:  
  - The penalty is proportional to the **absolute value** of the coefficients.   
  - Can drive some coefficients (weights) to **exactly zero**.  
  - Produces **sparse models** (only a few important features are kept).  
  - Helps with **feature selection**.  

- **When to use**:  
  - If you suspect many features are irrelevant.  
  - Useful for high-dimensional datasets (e.g., text classification, genomics).  
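
To see why an L1 penalty produces exact zeros, consider a simplified one-weight problem: minimizing `0.5 * (w - z)**2 + lam * abs(w)`, where `z` is the value the weight would take without a penalty. Its solution is the soft-thresholding rule sketched below. The function name `soft_threshold` and the sample values are assumptions for illustration; real Lasso solvers apply this kind of step inside an iterative algorithm.

```python
def soft_threshold(z, lam):
    """Minimizer of 0.5 * (w - z)**2 + lam * abs(w) over w."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0  # any z with |z| <= lam is set exactly to zero


# Hypothetical unpenalized weights
unpenalized = [3.0, -0.4, 0.2, -2.5, 0.05]
lam = 0.5
sparse = [soft_threshold(z, lam) for z in unpenalized]
print(sparse)  # [2.5, 0.0, 0.0, -2.0, 0.0]
```

Weights whose magnitude is below `lam` vanish entirely, which is the sparsity effect described above; the remaining weights are moved toward zero by `lam`.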

### 2. L2 Regularization (Ridge) → Sum of Squared Coefficients
- **Penalty term**:  
  $$\text{Penalty}(w) = \sum_{i} w_i^2$$
- **Effect**:  
  - The penalty is proportional to the **square** of the coefficients.   
  - Shrinks all coefficients towards zero but, in general, does **not set them exactly to zero**.  
  - Tends to spread weight across correlated features rather than picking one of them.  
  - Reduces model complexity while keeping all features.  

- **When to use**:  
  - If most features are useful but their weights should be kept small.  
  - Works well when features are correlated.  
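
The shrinkage behavior can be illustrated with the same kind of simplified one-weight problem: minimizing `0.5 * (w - z)**2 + lam * w**2`, where `z` is the unpenalized weight. Setting the derivative to zero gives `w = z / (1 + 2 * lam)`. The name `ridge_shrink` and the sample values are assumptions for this sketch.

```python
def ridge_shrink(z, lam):
    """Minimizer of 0.5 * (w - z)**2 + lam * w**2 over w."""
    return z / (1.0 + 2.0 * lam)


# Hypothetical unpenalized weights
unpenalized = [3.0, -0.4, 0.2, -2.5, 0.05]
for lam in (0.0, 0.5, 5.0):
    shrunk = [ridge_shrink(z, lam) for z in unpenalized]
    print(lam, [round(w, 3) for w in shrunk])
```

Every weight is scaled by the same factor, so larger `lam` values pull all weights closer to zero, yet a nonzero weight never becomes exactly zero for any finite `lam`.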

### 3. Elastic Net Regularization
- **Penalty term**:  
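  $$\text{Penalty}(w) = \alpha \sum_{i} |w_i| + (1 - \alpha) \sum_{i} w_i^2$$
  where **α ∈ [0, 1]** sets the balance between the L1 and L2 terms.  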
- **Effect**:  
  - Combines the benefits of **L1 (sparsity)** and **L2 (weight shrinkage)**.  
  - Some coefficients become zero, others shrink but remain nonzero.  
  - Helps when there are **many correlated features**.  

- **When to use**:  
  - If the dataset is high-dimensional and features are correlated.  
  - A middle ground between Lasso and Ridge.  
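
A rough sketch of how the two penalties combine, again for a simplified one-weight problem: minimizing `0.5 * (w - z)**2 + lam * (alpha * abs(w) + (1 - alpha) * w**2)`. The mixing weight `alpha` (between 0 and 1), the function names, and the sample values are assumptions of this sketch; libraries parameterize Elastic Net in slightly different ways.

```python
def soft_threshold(z, t):
    """Minimizer of 0.5 * (w - z)**2 + t * abs(w) over w."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def elastic_net_shrink(z, lam, alpha):
    """Minimizer of 0.5*(w - z)**2 + lam*(alpha*abs(w) + (1 - alpha)*w**2)."""
    # L1 part: soft-threshold; L2 part: rescale the result
    return soft_threshold(z, lam * alpha) / (1.0 + 2.0 * lam * (1.0 - alpha))


# Hypothetical unpenalized weights
unpenalized = [3.0, -0.4, 0.2, -2.5, 0.05]
lam = 1.0
for alpha in (1.0, 0.5, 0.0):  # pure L1, even mix, pure L2
    print(alpha, [round(elastic_net_shrink(z, lam, alpha), 3) for z in unpenalized])
```

With `alpha = 1.0` the update reduces to the L1 case and with `alpha = 0.0` to the L2 case; values in between zero out small weights while also rescaling the survivors.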


## Choosing λ (Regularization Strength)
- **λ = 0** → no regularization (risk of overfitting).  
- **Small λ** → mild penalty that discourages very large weights while leaving the fit mostly unchanged.  
- **Large λ** → strong penalty, can cause underfitting.  
- **Best λ** is usually found using **cross-validation**.  
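
The cross-validation search can be sketched in pure Python for a one-feature ridge model without an intercept. Everything here is an assumption made for illustration: the synthetic data, the candidate λ values, the helper names (`fit_ridge_1d`, `cv_error`), and the interleaved fold assignment. In practice a library routine would normally be used instead.

```python
import random


def fit_ridge_1d(xs, ys, lam):
    """Minimize sum((y - w*x)**2) + lam * w**2 for a single weight w."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


def cv_error(xs, ys, lam, k=5):
    """Average validation MSE over k interleaved folds."""
    n = len(xs)
    fold_errors = []
    for fold in range(k):
        val_idx = set(range(fold, n, k))  # every k-th sample is held out
        train_x = [xs[i] for i in range(n) if i not in val_idx]
        train_y = [ys[i] for i in range(n) if i not in val_idx]
        w = fit_ridge_1d(train_x, train_y, lam)
        err = sum((ys[i] - w * xs[i]) ** 2 for i in val_idx) / len(val_idx)
        fold_errors.append(err)
    return sum(fold_errors) / k


# Synthetic data: y = 2x plus Gaussian noise (illustrative only)
rng = random.Random(0)
xs = [rng.uniform(-1.0, 1.0) for _ in range(40)]
ys = [2.0 * x + rng.gauss(0.0, 0.5) for x in xs]

candidates = [0.0, 0.01, 0.1, 1.0, 10.0, 100.0]
best_lam = min(candidates, key=lambda lam: cv_error(xs, ys, lam))
print("best lambda:", best_lam)
```

Each candidate λ is scored only on data the model did not see during fitting, so the selected value reflects generalization rather than training fit.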