## Regularization Techniques
Regularization techniques are methods used to **prevent overfitting** in machine learning models. Overfitting happens when a model fits the training data so closely that it also learns noise and random fluctuations, which leads to poor performance on unseen (test) data. Many regularization methods work by **adding a penalty** or constraint to the model's loss function during training, which discourages overly complex models and helps them generalize better to new data.

The penalty is typically a function of the model parameters (weights or coefficients) and is controlled by a **regularization parameter** (often denoted as λ) that balances fitting the training data well and keeping the model simple.

#### L1 Regularization (Lasso Regression)

L1 regularization, also called Lasso (Least Absolute Shrinkage and Selection Operator) regression, adds the sum of absolute values of the weights as a penalty term to the loss function:

$$
J(\theta) = J_{\text{original}}(\theta) + \lambda \sum_{j=1}^{n} |\theta_j|
$$

  where:
  - J(θ) is the regularized cost function
  - J_original(θ) is the original loss function (e.g., mean squared error)
  - λ is the regularization parameter controlling the strength of the penalty
  - θ_j are the model parameters (weights)
  - n is the number of model parameters

Key Characteristics:
- Encourages sparsity: with a large enough λ, some parameters are shrunk exactly to zero
- Acts as built-in feature selection, since features whose coefficients become zero drop out of the model
- Produces simpler, more interpretable models
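
The sparsity property can be illustrated with the soft-thresholding operator, which is the exact minimizer of the one-dimensional problem ½(θ − z)² + λ|θ|. The sketch below is a minimal illustration, not a full Lasso solver; the function name `soft_threshold` and the input values in `z` are assumptions chosen for this example.

```python
import numpy as np

def soft_threshold(z, lam):
    """Element-wise minimizer of 0.5 * (theta - z)**2 + lam * |theta|."""
    # Shrink every value toward zero by lam; values with |z| <= lam become exactly 0.
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

# Hypothetical unpenalized estimates for three weights (illustration only)
z = np.array([3.0, -4.0, 0.5])

for lam in (0.1, 1.0, 3.5):
    print(f"lambda={lam}: {soft_threshold(z, lam)}")
```

With λ = 1.0 the smallest weight (0.5) is set exactly to zero while the larger weights are only shrunk, which is the mechanism behind L1 feature elimination.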


#### L2 Regularization (Ridge Regression)

L2 regularization, also called Ridge regression, adds the sum of squared values of the weights as a penalty term:

$$
J(\theta) = J_{\text{original}}(\theta) + \lambda \sum_{j=1}^{n} \theta_j^2
$$

Key Characteristics:
- Shrinks coefficients towards zero but does not force them to be zero
- Helps reduce model complexity and improve stability
- Useful when many small/medium effects exist rather than sparse features
- Produces smaller, more robust weights
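
For linear regression with a squared-error loss, the L2-penalized problem has a closed-form solution, θ = (XᵀX + λI)⁻¹Xᵀy, which makes the shrinkage easy to observe. The following is a minimal sketch under stated assumptions: the helper `ridge_closed_form`, the synthetic data, and the λ values are all hypothetical, and no intercept is fitted.

```python
import numpy as np

def ridge_closed_form(X, y, lam):
    """Solve (X^T X + lam * I) theta = X^T y (no intercept, illustration only)."""
    n_features = X.shape[1]
    A = X.T @ X + lam * np.eye(n_features)
    return np.linalg.solve(A, X.T @ y)

# Synthetic data generated from assumed "true" weights
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
true_theta = np.array([3.0, -4.0, 0.5])
y = X @ true_theta + rng.normal(scale=0.1, size=50)

for lam in (0.0, 10.0, 100.0):
    theta = ridge_closed_form(X, y, lam)
    print(f"lambda={lam}: theta={theta.round(3)}, L2 norm={np.linalg.norm(theta):.3f}")
```

As λ grows, the norm of the weight vector decreases, but the individual weights stay non-zero, in contrast to L1.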


```python
# Manual implementation
import numpy as np

# Example weights
theta = np.array([3, -4, 0.5])
lambda_ = 0.1
original_cost = 10

# L1 regularization term: lambda * sum of absolute weights
l1_term = lambda_ * np.sum(np.abs(theta))
l1_regularized_cost = original_cost + l1_term

# L2 regularization term: lambda * sum of squared weights
l2_term = lambda_ * np.sum(theta ** 2)
l2_regularized_cost = original_cost + l2_term

print("L1 Regularized Cost:", l1_regularized_cost)
print("L2 Regularized Cost:", l2_regularized_cost)
```

```
L1 Regularized Cost: 10.75
L2 Regularized Cost: 12.525
```


```python
# Using scikit-learn
# For linear models with built-in regularization:

# Lasso Regression (L1):

from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=100, n_features=5, noise=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# alpha plays the role of λ (scikit-learn scales the Lasso data term by 1 / (2 * n_samples))
lasso = Lasso(alpha=0.1)
lasso.fit(X_train, y_train)

print("Lasso coefficients:", lasso.coef_)
```

```
Lasso coefficients: [61.43821843 98.34755587 60.9358236  55.43964642 35.89008021]
```


```python
# Ridge Regression (L2):

from sklearn.linear_model import Ridge

ridge = Ridge(alpha=0.1)
ridge.fit(X_train, y_train)

print("Ridge coefficients:", ridge.coef_)
```

```
Ridge coefficients: [61.44968224 98.32085667 60.97256679 55.4690014  35.94653687]
```


#### Dropout Regularization

Dropout is a regularization technique mainly used in deep learning. During each training iteration, a randomly chosen fraction p of the neurons in the network is "dropped" (their outputs are set to zero). This:
  - Prevents neurons from co-adapting too much
  - Forces the network to learn redundant representations and become more robust
  - Reduces overfitting and improves generalization

Key Characteristics:
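- Applied only during training; at inference, dropout is turned off and all neurons are active (activations are rescaled, commonly by 1/(1 − p) during training, so expected outputs match between the two modes)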
- Dropout rate p is a hyperparameter (typical values 0.2 to 0.5)
- Easy to implement and widely used in neural networks
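
The behavior described above can be sketched in a few lines of NumPy. This is a minimal illustration of the common "inverted dropout" variant, assuming activations are rescaled by 1/(1 − p) during training; the function name `dropout` and its signature are hypothetical and do not correspond to any particular framework's API.

```python
import numpy as np

def dropout(activations, p, training, rng):
    """Inverted dropout: zero each unit with probability p during training."""
    if not training or p == 0.0:
        return activations  # inference: the layer is the identity
    # Each unit is kept with probability 1 - p
    keep_mask = rng.random(activations.shape) >= p
    # Rescale the surviving units so the expected activation is unchanged
    return activations * keep_mask / (1.0 - p)

rng = np.random.default_rng(0)
a = np.ones((4, 5))

train_out = dropout(a, p=0.5, training=True, rng=rng)
eval_out = dropout(a, p=0.5, training=False, rng=rng)

print("training mode:\n", train_out)
print("inference mode:\n", eval_out)
```

In training mode each entry is either 0 (dropped) or 2.0 (kept and rescaled); in inference mode the input passes through unchanged.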


**Solved Example: Regularized Cost Calculation**

Given a linear regression model with three weights θ1 = 3, θ2 = -4, θ3 = 0.5, regularization parameter λ = 0.1, and original cost J_original = 10:

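L1 Regularization:

$$
J(\theta) = J_{\text{original}}(\theta) + \lambda \sum_{j=1}^{n} |\theta_j|
$$

$$
\begin{aligned}
J &= 10 + 0.1 \times (|3| + |-4| + |0.5|) \\
  &= 10 + 0.1 \times (3 + 4 + 0.5) \\
  &= 10 + 0.75 \\
  &= 10.75
\end{aligned}
$$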

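L2 Regularization:

$$
J(\theta) = J_{\text{original}}(\theta) + \lambda \sum_{j=1}^{n} \theta_j^2
$$

$$
\begin{aligned}
J &= 10 + 0.1 \times (3^2 + (-4)^2 + 0.5^2) \\
  &= 10 + 0.1 \times (9 + 16 + 0.25) \\
  &= 10 + 2.525 \\
  &= 12.525
\end{aligned}
$$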

Regularization helps ML models generalize by constraining their complexity. As the strength of regularization increases, Lasso drives more coefficients exactly to zero (eliminating the corresponding features), while Ridge regression shrinks all coefficients toward zero without eliminating any.
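
The contrast between the two penalties can be checked empirically by sweeping the regularization strength. The sketch below assumes a synthetic dataset in which only 3 of 10 features carry signal; the dataset settings and the `alpha` grid are illustrative choices. Note that scikit-learn scales the data term differently for `Lasso` and `Ridge`, so the same `alpha` value does not represent the same penalty strength in both models.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

# Synthetic data: 10 features, only 3 informative (assumed setup for illustration)
X, y = make_regression(n_samples=100, n_features=10, n_informative=3,
                       noise=10, random_state=42)

for alpha in (0.1, 1.0, 10.0, 1000.0):
    lasso = Lasso(alpha=alpha).fit(X, y)
    ridge = Ridge(alpha=alpha).fit(X, y)
    n_zero = int(np.sum(lasso.coef_ == 0))  # coefficients eliminated by L1
    print(f"alpha={alpha}: Lasso zero coefficients={n_zero}, "
          f"Ridge L2 norm={np.linalg.norm(ridge.coef_):.2f}")
```

For a sufficiently large `alpha`, Lasso sets every coefficient to zero, whereas the Ridge coefficients keep shrinking in norm but remain non-zero.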

**Summary**

- Regularization techniques help reduce overfitting and improve generalization of ML models.

- L1 regularization (Lasso) encourages sparse models by forcing some weights to zero, useful for feature selection.

- L2 regularization (Ridge) shrinks weights but keeps all in the model, stabilizing learning.

- Dropout is a robust regularization technique for neural networks, randomly dropping neurons during training.

- Python libraries such as scikit-learn (Lasso and Ridge models) and TensorFlow (dropout layers) provide easy-to-use implementations of these techniques.


