# Regularization Techniques in Machine Learning

### Definition:
Regularization is a technique used to prevent overfitting by penalizing model complexity, typically by adding a penalty term on the weights to the loss function.
It improves generalization, so the model performs well on unseen data and not only on the training data.

### Types of Regularization:

#### 1. **L2 Regularization (Ridge Regression):**
- **Formula:**
$$
\text{Loss function} = \text{Original Loss} + \lambda \sum_{i=1}^{n} w_i^2
$$
- Shrinks all coefficients toward zero but generally does not set any of them exactly to zero.
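
The shrinkage effect can be seen in a minimal NumPy sketch. It assumes the original loss is the sum of squared errors of a linear model without an intercept, which gives a closed-form solution; the function name `ridge_fit` and the synthetic data are hypothetical and used only for illustration.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form minimizer of ||y - Xw||^2 + lam * sum(w_i^2), no intercept."""
    n_features = X.shape[1]
    # Normal equations with lam added to the diagonal of X^T X.
    A = X.T @ X + lam * np.eye(n_features)
    return np.linalg.solve(A, X.T @ y)

# Synthetic data (assumption: made up for this example).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.normal(size=50)

w_ols = ridge_fit(X, y, lam=0.0)     # no penalty
w_ridge = ridge_fit(X, y, lam=10.0)  # with L2 penalty

print("unpenalized:", w_ols)
print("ridge      :", w_ridge)
```

With $\lambda > 0$ the weight vector has a smaller norm than the unpenalized fit, but its entries remain nonzero.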

#### 2. **L1 Regularization (Lasso Regression):**
- **Formula:**
$$
\text{Loss function} = \text{Original Loss} + \lambda \sum_{i=1}^{n} |w_i|
$$
- Tends to drive some coefficients exactly to zero, leading to sparse models.
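
One way to see where the exact zeros come from is proximal gradient descent (ISTA), sketched below. This is one possible solver, not the only one, and it assumes the original loss is $\tfrac{1}{2}\lVert y - Xw \rVert^2$ for a linear model without an intercept. The names `soft_threshold` and `lasso_ista` and the synthetic data are hypothetical.

```python
import numpy as np

def soft_threshold(w, t):
    """Shrink each entry toward zero by t; entries with |w_i| <= t become 0."""
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

def lasso_ista(X, y, lam, n_iter=500):
    """Approximately minimize 0.5 * ||y - Xw||^2 + lam * sum(|w_i|)."""
    w = np.zeros(X.shape[1])
    # Step size 1/L, where L is the largest eigenvalue of X^T X.
    step = 1.0 / np.linalg.norm(X, 2) ** 2
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y)            # gradient of the squared-error term
        w = soft_threshold(w - step * grad, step * lam)
    return w

# Synthetic data (assumption): only features 0 and 3 actually matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
true_w = np.array([3.0, 0.0, 0.0, -2.0, 0.0])
y = X @ true_w + 0.1 * rng.normal(size=100)

w_lasso = lasso_ista(X, y, lam=20.0)
print("lasso weights:", w_lasso)
```

The soft-thresholding step is what sets small weights exactly to zero, which is the source of the sparsity described above.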

#### 3. **Elastic Net Regularization:**
- **Formula:**
$$
\text{Loss function} = \text{Original Loss} + \lambda \left( \alpha \sum_{i=1}^{n} |w_i| + (1-\alpha) \sum_{i=1}^{n} w_i^2 \right)
$$
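- A mix of L1 and L2 regularization for a balance between feature selection and shrinkage. The mixing weight $\alpha \in [0, 1]$ sets the balance: $\alpha = 1$ gives pure L1 and $\alpha = 0$ gives pure L2.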
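
As a small illustration, the penalty term from the formula above can be computed directly. The function name `elastic_net_penalty` and the example weights are hypothetical.

```python
import numpy as np

def elastic_net_penalty(w, lam, alpha):
    """lam * (alpha * sum(|w_i|) + (1 - alpha) * sum(w_i^2))."""
    l1 = np.sum(np.abs(w))
    l2 = np.sum(w ** 2)
    return lam * (alpha * l1 + (1.0 - alpha) * l2)

w = np.array([1.0, -2.0, 0.5])  # example weights (assumption)

pure_l1 = elastic_net_penalty(w, lam=0.1, alpha=1.0)  # reduces to the L1 penalty
pure_l2 = elastic_net_penalty(w, lam=0.1, alpha=0.0)  # reduces to the L2 penalty
mixed = elastic_net_penalty(w, lam=0.1, alpha=0.5)    # equal mix of both

print(pure_l1, pure_l2, mixed)
```

Setting `alpha` to 1 or 0 recovers the L1 and L2 penalties, so both earlier cases are special cases of this one.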

#### 4. **Early Stopping (for Neural Networks):**
- **Description:**
  - A regularization technique for iteratively trained models such as neural networks.
  - Training stops once performance on a held-out validation set starts to degrade, before the model begins fitting noise in the training data.
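
**Notation:** In the penalty formulas above, $w_i$ is the $i$-th model weight (for a linear model, the slope associated with feature $i$), $n$ is the number of weights, and $\lambda \ge 0$ controls the strength of the penalty.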
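
The early stopping rule can be sketched as a check over per-epoch validation losses. The `patience` parameter (how many non-improving epochs to tolerate before stopping) is an assumption added here, as are the function name `early_stopping_epoch` and the example losses; in a real training loop the loss would be computed after each epoch rather than read from a list.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return (best_epoch, stopped_epoch) for a sequence of validation losses."""
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # Validation loss improved: remember this epoch and reset the counter.
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return best_epoch, epoch  # stop training here
    # Never triggered: training ran through every epoch.
    return best_epoch, len(val_losses) - 1

# Hypothetical validation losses: improving at first, then degrading.
val_losses = [1.0, 0.8, 0.7, 0.72, 0.75, 0.9]
best_epoch, stopped_epoch = early_stopping_epoch(val_losses, patience=2)
print(best_epoch, stopped_epoch)
```

In practice the weights from `best_epoch` are usually the ones kept, since later epochs performed worse on the validation set.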


### Summary Table

| **Regularization Type** | **Penalty** | **Main Effect** |
|-------------------------|-------------|------------------|
| **L2 (Ridge)** | $ \lambda \sum_{i=1}^{n} w_i^2 $ | Shrinks coefficients but doesn't set them to zero |
| **L1 (Lasso)** | $ \lambda \sum_{i=1}^{n} \lvert w_i \rvert $ | Shrinks some coefficients to zero (feature selection) |
| **Elastic Net** | $ \lambda \left( \alpha \sum_{i=1}^{n} \lvert w_i \rvert + (1-\alpha) \sum_{i=1}^{n} w_i^2 \right) $ | Mix of L1 and L2 (balance between feature selection and shrinkage) |
| **Early Stopping** | None (no explicit penalty term); training stops when validation performance degrades | Prevents overfitting in iterative models like neural networks |
