<center><h1 style="color:purple">Regularization in Machine Learning</h1></center>

---
Regularization is a vital technique in machine learning used to address **overfitting** and improve a model's **generalization ability**. It achieves this by adding a penalty term to the loss function, discouraging complex models that may overfit the training data. By constraining model parameters, regularization helps find a balance between **bias** and **variance** for optimal performance.

---

### **Overfitting and Underfitting**
- **Overfitting**: Occurs when a model learns the noise in training data, leading to poor generalization on unseen data. Symptoms include high training accuracy and low testing accuracy.
- **Underfitting**: Happens when a model fails to capture the patterns in the training data, resulting in poor performance on both training and testing datasets.
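
The overfitting symptom can be reproduced on a very small scale. The sketch below compares a model that memorizes its training points (a 1-nearest-neighbour lookup) with a one-parameter line through the origin. The data values and the helper names (`mse`, `memorizer`, `fit_line`) are hypothetical and chosen only for illustration: targets follow $y = 2x$, with hand-picked noise added to the training set.

```python
# Toy data (hypothetical): the true relationship is y = 2x.
# Training targets carry hand-picked noise; test targets are noise-free.
x_train = [0.0, 1.0, 2.0, 3.0, 4.0]
y_train = [0.5, 1.5, 4.6, 5.4, 8.5]
x_test = [0.4, 1.4, 2.4, 3.4]
y_test = [2 * x for x in x_test]


def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def memorizer(x):
    """1-nearest-neighbour lookup: return the stored target of the closest training point."""
    nearest = min(range(len(x_train)), key=lambda i: abs(x_train[i] - x))
    return y_train[nearest]


def fit_line(xs, ys):
    """Least-squares slope for a line through the origin, y = w * x."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


w = fit_line(x_train, y_train)

memo_train = mse(y_train, [memorizer(x) for x in x_train])
memo_test = mse(y_test, [memorizer(x) for x in x_test])
line_train = mse(y_train, [w * x for x in x_train])
line_test = mse(y_test, [w * x for x in x_test])

print(f"memorizer: train MSE = {memo_train:.3f}, test MSE = {memo_test:.3f}")
print(f"line:      train MSE = {line_train:.3f}, test MSE = {line_test:.3f}")
```

The memorizer reaches zero training error but a clearly larger test error than the simple line, which has a nonzero training error and generalizes better on this toy data.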

---

### **Bias-Variance Tradeoff**
- **Bias**: Error due to overly simplistic models that fail to capture underlying patterns (underfitting).
- **Variance**: Error caused by overly complex models that are sensitive to small fluctuations in the training set and end up fitting its noise (overfitting).
- **Goal**: Find the right balance between bias and variance for consistent and accurate predictions on new data.
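
For squared-error loss, this tradeoff can be written as a decomposition. As a sketch under the standard assumption (not stated above) that targets are generated as $y = f(x) + \varepsilon$ with $\mathbb{E}[\varepsilon] = 0$ and $\operatorname{Var}(\varepsilon) = \sigma^2$, the expected prediction error of a fitted model $\hat{f}$ at a point $x$ splits into three terms:

$$
\mathbb{E}\big[(y - \hat{f}(x))^2\big] = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2} + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}} + \underbrace{\sigma^2}_{\text{irreducible noise}}
$$

The expectation is taken over training sets and noise. Making a model more flexible typically lowers the first term and raises the second, while the third term cannot be reduced by any model.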

---

### **Types of Regularization Techniques**

#### 1. **Lasso Regularization (L1 Regularization)**
- Adds the absolute value of coefficients as a penalty term to the loss function.
- Promotes **sparse solutions**, reducing some feature coefficients to zero for **feature selection**.
- **Cost Function**:
  $$
  \text{Cost} = \frac{1}{n} \sum_{i=1}^n (y_i - \hat{y}_i)^2 + \lambda \sum_{i=1}^m |w_i|
  $$
  - **Variables**:
    - *$n$*: Number of examples
    - *$m$*: Number of features
    - *$y_i$*: Actual target value for the $i\text{th}$ example
    - *$\hat{y}_i$*: Predicted target value for the $i\text{th}$ example
    - *$w_i$*: Coefficients/weights of the features
    - *$\lambda$*: Regularization strength
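
To make the sparsity effect concrete, here is a minimal sketch in plain Python. It assumes a single feature and no intercept, in which case minimizing the cost above has a closed form: a soft-thresholding step with threshold $\lambda / 2$. The function names (`lasso_cost`, `soft_threshold`, `lasso_1d`) and the toy data are hypothetical, not part of any library.

```python
def lasso_cost(w, X, y, lam):
    """MSE plus lam * sum(|w_i|), matching the cost function above (no intercept)."""
    n = len(y)
    preds = [sum(wj * xj for wj, xj in zip(w, row)) for row in X]
    mse = sum((yi - pi) ** 2 for yi, pi in zip(y, preds)) / n
    return mse + lam * sum(abs(wj) for wj in w)


def soft_threshold(rho, t):
    """Shrink rho toward zero by t, returning exactly 0 when |rho| <= t."""
    if rho > t:
        return rho - t
    if rho < -t:
        return rho + t
    return 0.0


def lasso_1d(x, y, lam):
    """Minimizer of the cost above for a single feature and no intercept."""
    n = len(y)
    rho = sum(xi * yi for xi, yi in zip(x, y)) / n  # (1/n) * sum(x_i * y_i)
    z = sum(xi * xi for xi in x) / n                # (1/n) * sum(x_i ** 2)
    return soft_threshold(rho, lam / 2) / z


x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]  # toy data lying exactly on y = 2x

for lam in (0.0, 2.0, 20.0):
    w = lasso_1d(x, y, lam)
    cost = lasso_cost([w], [[xi] for xi in x], y, lam)
    print(f"lambda = {lam:5.1f} -> w = {w:.4f}, cost = {cost:.4f}")
```

With $\lambda = 0$ the unpenalized slope of 2 is recovered, a moderate $\lambda$ shrinks it, and a large enough $\lambda$ sets the coefficient to exactly zero, which is the mechanism behind feature selection.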

#### 2. **Ridge Regularization (L2 Regularization)**
- Adds the squared magnitude of coefficients as a penalty term to the loss function.
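- Shrinks coefficients toward zero without setting them exactly to zero, which makes the estimates less sensitive to the training data and handles **multicollinearity** effectively.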
- **Cost Function**:
  $$
  \text{Cost} = \frac{1}{n} \sum_{i=1}^n (y_i - \hat{y}_i)^2 + \lambda \sum_{i=1}^m w_i^2
  $$
  - **Variables**:
    - *$n$*: Number of examples
    - *$m$*: Number of features
    - *$y_i$*: Actual target value for the $i\text{th}$ example
    - *$\hat{y}_i$*: Predicted target value for the $i\text{th}$ example
    - *$w_i$*: Coefficients/weights of the features
    - *$\lambda$*: Regularization strength
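
The multicollinearity point can be illustrated with a small sketch. Setting the gradient of the cost above to zero (assuming no intercept) gives the linear system $(X^\top X / n + \lambda I)\, w = X^\top y / n$. The hypothetical helper `ridge_2d` below solves it for two features with Cramer's rule, using toy data whose two columns are identical.

```python
def ridge_2d(X, y, lam):
    """Solve (X^T X / n + lam * I) w = X^T y / n for two features (no intercept).

    The system comes from setting the gradient of the ridge cost to zero.
    """
    n = len(y)
    a11 = sum(r[0] * r[0] for r in X) / n + lam
    a12 = sum(r[0] * r[1] for r in X) / n
    a22 = sum(r[1] * r[1] for r in X) / n + lam
    b1 = sum(r[0] * yi for r, yi in zip(X, y)) / n
    b2 = sum(r[1] * yi for r, yi in zip(X, y)) / n
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        raise ValueError("singular system: no unique solution")
    # Cramer's rule for the symmetric 2x2 system
    w1 = (b1 * a22 - a12 * b2) / det
    w2 = (a11 * b2 - a12 * b1) / det
    return [w1, w2]


# Two identical columns: perfect multicollinearity (hypothetical toy data).
X = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
y = [2.0, 4.0, 6.0]

try:
    ridge_2d(X, y, 0.0)
except ValueError as err:
    print("lambda = 0:", err)

print("lambda = 1:", ridge_2d(X, y, 1.0))
```

Without a penalty the system is singular, so least squares has no unique solution. With $\lambda > 0$ the system becomes solvable and the weight is split evenly between the two duplicated features.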

#### 3. **Elastic Net Regularization**
- Combines L1 and L2 regularization, with the mix controlled by an additional hyperparameter $\alpha$.
- Balances feature selection (L1) and coefficient shrinkage (L2).
- **Cost Function**:
  $$
  \text{Cost} = \frac{1}{n} \sum_{i=1}^n (y_i - \hat{y}_i)^2 + \lambda \left( (1 - \alpha) \sum_{i=1}^m |w_i| + \alpha \sum_{i=1}^m w_i^2 \right)
  $$
  - **Variables**:
    - *$n$*: Number of examples
    - *$m$*: Number of features
    - *$y_i$*: Actual target value for the $i\text{th}$ example
    - *$\hat{y}_i$*: Predicted target value for the $i\text{th}$ example
    - *$w_i$*: Coefficients/weights of the features
    - *$\lambda$*: Regularization strength
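    - *$\alpha$*: Mixing hyperparameter in $[0, 1]$ controlling the balance between L1 and L2 regularization; in this formulation, $\alpha = 0$ gives a pure L1 (Lasso) penalty and $\alpha = 1$ a pure L2 (Ridge) penalty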
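
As a quick check of how the mixing parameter behaves in the formula above, the sketch below evaluates the cost for a fixed weight vector at several values of $\alpha$. The function name `elastic_net_cost` and the toy data are hypothetical; the weights are chosen to fit the data exactly so that only the penalty term contributes.

```python
def elastic_net_cost(w, X, y, lam, alpha):
    """MSE + lam * ((1 - alpha) * L1 + alpha * L2), following the formula above."""
    n = len(y)
    preds = [sum(wj * xj for wj, xj in zip(w, row)) for row in X]
    mse = sum((yi - pi) ** 2 for yi, pi in zip(y, preds)) / n
    l1 = sum(abs(wj) for wj in w)   # sum of |w_i|
    l2 = sum(wj ** 2 for wj in w)   # sum of w_i ** 2
    return mse + lam * ((1 - alpha) * l1 + alpha * l2)


X = [[1.0, 0.0], [0.0, 1.0]]
y = [1.0, -2.0]
w = [1.0, -2.0]  # fits the toy data exactly, so the MSE term is zero

for alpha in (0.0, 0.5, 1.0):
    print(f"alpha = {alpha}: cost = {elastic_net_cost(w, X, y, 0.1, alpha):.2f}")
```

At $\alpha = 0$ the cost equals the Lasso penalty $\lambda \sum |w_i| = 0.3$, and at $\alpha = 1$ it equals the Ridge penalty $\lambda \sum w_i^2 = 0.5$. Libraries may orient the mixing parameter differently: scikit-learn's `ElasticNet`, for instance, uses `l1_ratio`, where a value of 1 corresponds to a pure L1 penalty.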

---

### **Benefits of Regularization**
1. **Improves Generalization**: Reduces overfitting by focusing on the underlying patterns rather than noise.
2. **Feature Selection**: L1 regularization simplifies models by eliminating irrelevant features.
3. **Stabilizes Models**: Reduces sensitivity to small changes in the training data, so performance is more consistent across datasets.
4. **Handles Multicollinearity**: Controls the magnitudes of correlated coefficients.
5. **Enhances Performance**: Discourages excessively large weights on irrelevant or noisy features.
6. **Adjustable Complexity**: Hyperparameters *$\lambda$* or *$\alpha$* allow fine-tuning the balance between bias and variance.
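
One common way to tune $\lambda$ is to compare candidate values on held-out data. The sketch below does this for a single-feature Ridge model without an intercept, for which minimizing the cost has the closed form $w = \rho / (z + \lambda)$ with $\rho = \frac{1}{n}\sum x_i y_i$ and $z = \frac{1}{n}\sum x_i^2$. The helper names, the toy data, and the candidate grid are all hypothetical and chosen so that the effect is easy to see.

```python
def ridge_1d(x, y, lam):
    """Minimizer of MSE + lam * w**2 for one feature and no intercept."""
    n = len(y)
    rho = sum(xi * yi for xi, yi in zip(x, y)) / n
    z = sum(xi * xi for xi in x) / n
    return rho / (z + lam)


def mse(x, y, w):
    """Mean squared error of the prediction w * x."""
    return sum((yi - w * xi) ** 2 for xi, yi in zip(x, y)) / len(y)


# Hypothetical toy data: true relationship y = 2x, noisy training targets.
x_train = [0.0, 1.0, 2.0, 3.0, 4.0]
y_train = [0.5, 1.5, 4.6, 5.4, 8.5]
x_val = [0.4, 1.4, 2.4, 3.4]
y_val = [2 * x for x in x_val]

candidates = [0.0, 0.09, 1.0, 10.0]
results = []
for lam in candidates:
    w = ridge_1d(x_train, y_train, lam)
    results.append((mse(x_val, y_val, w), lam, w))
    print(f"lambda = {lam:5.2f}: w = {w:.3f}, "
          f"train MSE = {mse(x_train, y_train, w):.4f}, "
          f"validation MSE = {results[-1][0]:.4f}")

# Pick the candidate with the lowest validation error.
best_mse, best_lam, best_w = min(results)
print("selected lambda:", best_lam)
```

As $\lambda$ grows, the weight shrinks and the training error rises (more bias), while the validation error first falls and then rises again. On this toy data the small nonzero candidate is selected.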

---

### **Choosing Regularization Techniques**
- Use **Lasso** for feature selection, or when only a small subset of the features is expected to be relevant (a sparse model).
- Use **Ridge** for handling multicollinearity and ensuring stability.
- Use **Elastic Net** when a mix of L1 and L2 regularization is beneficial, for example when many features are correlated and some feature selection is still desired.

---

By applying regularization thoughtfully, machine learning practitioners can build robust models that generalize well across diverse datasets.


<img src="1.png" width=400> <img src="2.png" width=400>

<img src="3.png" width=400> <img src="4.png" width=400>

<img src="5.png" width=400>