<a href="https://colab.research.google.com/github/Farah-Deeba-UNCC/Introduction-to-ML/blob/main/Regularization.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### L2 Regularization
L2 regularization, known as Ridge Regression when applied to linear regression, is a technique used in machine learning to prevent overfitting by adding a penalty term to the loss function. This penalty term shrinks the magnitude of the model coefficients, which tends to make the model generalize better to unseen data.

Mathematical Formulation
For a linear regression model, the objective function with L2 regularization is given by:
$J(\theta) = \frac{1}{2m} \left[ \sum_{i=1}^{m} (y^{(i)}- h_\theta(x^{(i)}))^2 + \lambda \sum_{j=1}^{n} \theta_j^2 \right]$

The first term is the sum of squared errors, which together with the $\frac{1}{2m}$ factor gives half the mean squared error (MSE); the second term is the L2 penalty. The penalty sum starts at $j = 1$, so the intercept $\theta_0$ is not penalized.
Ridge regression discourages large coefficient values, which reduces model complexity and improves generalization. Unlike L1 regularization (Lasso), which can force some weights to be exactly zero, L2 regularization shrinks all coefficients toward zero but generally does not make them exactly zero. L2 regularization tends to distribute weight more evenly across features, which makes it useful when dealing with high-dimensional data.
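
As a concrete reading of the objective above, here is a minimal NumPy sketch that evaluates $J(\theta)$ for a given parameter vector. The function name `ridge_cost`, the toy arrays, and the convention that column 0 of `X` is the all-ones bias column (so `theta[0]` is left out of the penalty) are assumptions made for illustration and are not part of the original notebook.

```python
import numpy as np

def ridge_cost(theta, X, y, lam):
    """Evaluate J(theta) = (1/2m) * [sum of squared errors + lam * sum_{j>=1} theta_j^2].

    Assumes column 0 of X is the bias column of ones, so theta[0] is not penalized.
    """
    m = len(y)
    residuals = y - X @ theta                 # y^(i) - h_theta(x^(i))
    sse = np.sum(residuals ** 2)              # data-fit term
    penalty = lam * np.sum(theta[1:] ** 2)    # L2 penalty, skipping the intercept
    return (sse + penalty) / (2 * m)

# Toy data (hypothetical): three samples, bias column plus one feature
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
y = np.array([1.0, 3.0, 5.0])
theta = np.array([1.0, 2.0])                  # fits the toy data exactly

print(ridge_cost(theta, X, y, lam=0.0))       # 0.0: no error and no penalty
print(ridge_cost(theta, X, y, lam=3.0))       # 2.0: penalty only, 3 * 2**2 / (2 * 3)
```

Even though `theta` fits the toy data perfectly, the cost is nonzero once `lam > 0`, which is what pushes the optimizer toward smaller coefficients.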

By tuning the λ parameter, one can control the strength of regularization. A higher λ imposes a larger penalty and yields a simpler model, while a lower λ lets the model fit the training data more closely. In the scikit-learn code in this notebook, the `alpha` argument plays the role of λ.
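
To see the effect of λ numerically, the sketch below solves ridge regression in closed form, $\theta = (X^\top X + \lambda I)^{-1} X^\top y$, on small synthetic data and prints the coefficient norm for increasing λ. The helper `ridge_closed_form`, the data, and the choice to omit the intercept (so every coefficient is penalized) are illustrative assumptions. The $\frac{1}{2m}$ factor in the objective does not change where the minimum is, so it does not appear in the solution.

```python
import numpy as np

def ridge_closed_form(X, y, lam):
    """Solve (X^T X + lam * I) theta = X^T y for theta (no intercept term)."""
    n_features = X.shape[1]
    A = X.T @ X + lam * np.eye(n_features)
    return np.linalg.solve(A, X.T @ y)

# Synthetic data (hypothetical): 50 samples, 5 features, known coefficients
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
true_theta = np.array([3.0, -2.0, 0.5, 0.0, 1.0])
y = X @ true_theta + rng.normal(scale=0.5, size=50)

lambdas = [0.0, 1.0, 10.0, 100.0]
norms = [np.linalg.norm(ridge_closed_form(X, y, lam)) for lam in lambdas]
for lam, norm in zip(lambdas, norms):
    print(f"lambda={lam:6.1f}  ||theta||_2 = {norm:.3f}")
```

The printed norms should decrease as λ grows, while none of the individual coefficients is expected to land exactly on zero.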

In [None]:
# L2 regularization (Ridge)
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import Ridge, LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from ipywidgets import interact

# Generate synthetic data
np.random.seed(42)
x = np.linspace(-3, 3, 30)
y = 2 * x**3 + x**2 - 2*x + np.random.normal(0, 5, size=x.shape)  # More noisy cubic function
x = x.reshape(-1, 1)

# Function to visualize the effect of L2 regularization
def plot_ridge(alpha):
    plt.figure(figsize=(12, 5))

    # Dense grid of x values for drawing smooth prediction curves
    x_plot = np.linspace(-3, 3, 100).reshape(-1, 1)

    # Fit unregularized linear regression on degree-10 polynomial features
    model_lr = make_pipeline(PolynomialFeatures(degree=10), LinearRegression())
    model_lr.fit(x, y)
    y_pred_lr = model_lr.predict(x_plot)

    # Fit Ridge regression (L2 penalty) on the same polynomial features
    model_ridge = make_pipeline(PolynomialFeatures(degree=10), Ridge(alpha=alpha))
    model_ridge.fit(x, y)
    y_pred_ridge = model_ridge.predict(x_plot)

    # Plot regression results
    plt.subplot(1, 2, 1)
    plt.scatter(x, y, label="Data", color='red')
    plt.plot(x_plot.ravel(), y_pred_lr, label="Linear Regression (no regularization)", linestyle='dashed', linewidth=2)
    plt.plot(x_plot.ravel(), y_pred_ridge, label=f"Ridge Regression (alpha={alpha})", linewidth=2)

    plt.xlabel("x")
    plt.ylabel("y")
    plt.title("Effect of L2 Regularization on Data Fitting")
    plt.legend()

    # Show effect on model coefficients
    plt.subplot(1, 2, 2)
    coefs_lr = model_lr.named_steps['linearregression'].coef_.flatten()
    coefs_ridge = model_ridge.named_steps['ridge'].coef_.flatten()
    plt.plot(range(len(coefs_lr)), coefs_lr, 'bo-', label="Linear Regression Coefficients")
    plt.plot(range(len(coefs_ridge)), coefs_ridge, 'ro-', label=f"Ridge Coefficients (alpha={alpha})")

    plt.xlabel("Coefficient Index")
    plt.ylabel("Coefficient Value")
    plt.title("Effect of Regularization on Coefficients")
    plt.legend()

    plt.tight_layout()
    plt.show()

    # Print theta values
    print(f"Theta values for Ridge Regression (alpha={alpha}):")
    print(coefs_ridge)

# Interactive widget to adjust alpha (regularization strength)
interact(plot_ridge, alpha=(0.001, 100.0, 1.0));



### L1 Regularization
L1 regularization, known as Lasso (Least Absolute Shrinkage and Selection Operator) when applied to linear regression, is a technique used in machine learning to prevent overfitting by adding a penalty proportional to the sum of the absolute values of the coefficients to the loss function. This encourages sparsity in the model parameters, effectively reducing the number of features used in the model.

Mathematical Formulation:
In a linear regression model, the L1 regularization modifies the cost function as follows:
$J(\theta) = \frac{1}{2m} \left[ \sum_{i=1}^{m} (y^{(i)}- h_\theta(x^{(i)}))^2 + \lambda \sum_{j=1}^{n} |\theta_j| \right]$


L1 regularization can shrink some coefficients to exactly zero, effectively performing feature selection by eliminating less informative features. This sparsity comes from the absolute-value penalty, which makes L1 regularization useful when feature selection is desired. By penalizing large coefficients, it also reduces model complexity and helps prevent overfitting.
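
One way to see why the L1 penalty produces exact zeros is the one-dimensional case. Minimizing $\frac{1}{2}(z - \theta)^2 + t|\theta|$ over $\theta$ gives the soft-thresholding rule $\theta = \mathrm{sign}(z)\max(|z| - t, 0)$, which is exactly zero whenever $|z| \le t$. Minimizing $\frac{1}{2}(z - \theta)^2 + \frac{t}{2}\theta^2$ instead gives $\theta = z / (1 + t)$, which only rescales $z$. The sketch below compares the two; the function names and sample values are illustrative assumptions, and this simplified setting is not the full multi-feature Lasso solver.

```python
import numpy as np

def soft_threshold(z, t):
    """Minimizer of 0.5*(z - theta)**2 + t*|theta| (one-dimensional L1 problem)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def ridge_shrink(z, t):
    """Minimizer of 0.5*(z - theta)**2 + 0.5*t*theta**2 (one-dimensional L2 problem)."""
    return z / (1.0 + t)

z = np.array([-3.0, -0.5, 0.2, 1.0, 4.0])   # hypothetical unpenalized estimates
t = 1.0

l1 = soft_threshold(z, t)
l2 = ridge_shrink(z, t)
print("L1:", l1)   # entries with |z| <= t become exactly zero
print("L2:", l2)   # every entry is halved, none becomes zero
```

With `t = 1.0`, only the two entries with magnitude above 1 survive soft-thresholding, whereas the L2 rule keeps all five entries nonzero.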

The regularization parameter λ balances the trade-off between fitting the training data well and keeping the coefficients small and few in number, which helps the model generalize.
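
The trade-off can be made visible by counting how many coefficients survive at different penalty strengths. The sketch below fits scikit-learn's `Lasso` on synthetic data in which only three of ten features matter; the data, the list of `alphas`, and the variable names are assumptions chosen for illustration. Note that scikit-learn's `Lasso` minimizes $\frac{1}{2m}\lVert y - Xw\rVert_2^2 + \alpha\lVert w\rVert_1$, so, reading the formula above literally, `alpha` corresponds to $\lambda / (2m)$ rather than to $\lambda$ itself.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic data (hypothetical): 100 samples, 10 features, 3 informative
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
true_theta = np.zeros(10)
true_theta[:3] = [4.0, -3.0, 2.0]            # only the first three features matter
y = X @ true_theta + rng.normal(scale=0.5, size=100)

alphas = [0.001, 0.1, 1.0, 100.0]
nonzero_counts = []
for alpha in alphas:
    model = Lasso(alpha=alpha, max_iter=10000).fit(X, y)
    nonzero_counts.append(int(np.count_nonzero(model.coef_)))
    print(f"alpha={alpha:7.3f}  nonzero coefficients: {nonzero_counts[-1]}")
```

At the smallest `alpha` the informative features all keep nonzero weights, and at the largest `alpha` the penalty dominates and every coefficient is driven to zero.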

In [None]:
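# L1 regularization (Lasso)
# Imports are repeated so this cell also runs on its own
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import Lasso, LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from ipywidgets import interact
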
# Generate synthetic data
np.random.seed(42)
x = np.linspace(-3, 3, 30)
y = 2 * x**3 + x**2 - 2*x + np.random.normal(0, 5, size=x.shape)  # More noisy cubic function
x = x.reshape(-1, 1)

# Function to visualize the effect of L1 regularization
def plot_lasso(alpha):
    plt.figure(figsize=(12, 5))

    # Dense grid of x values for drawing smooth prediction curves
    x_plot = np.linspace(-3, 3, 100).reshape(-1, 1)

    # Fit unregularized linear regression on degree-10 polynomial features
    model_lr = make_pipeline(PolynomialFeatures(degree=10), LinearRegression())
    model_lr.fit(x, y)
    y_pred_lr = model_lr.predict(x_plot)

    # Fit Lasso regression (L1 penalty) on the same polynomial features
    model_lasso = make_pipeline(PolynomialFeatures(degree=10), Lasso(alpha=alpha, max_iter=5000))
    model_lasso.fit(x, y)
    y_pred_lasso = model_lasso.predict(x_plot)

    # Plot regression results
    plt.subplot(1, 2, 1)
    plt.scatter(x, y, label="Data", color='red')
    plt.plot(x_plot.ravel(), y_pred_lr, label="Linear Regression (no regularization)", linestyle='dashed', linewidth=2)
    plt.plot(x_plot.ravel(), y_pred_lasso, label=f"Lasso Regression (alpha={alpha})", linewidth=2)

    plt.xlabel("x")
    plt.ylabel("y")
    plt.title("Effect of L1 Regularization on Data Fitting")
    plt.legend()

    # Show effect on model coefficients
    plt.subplot(1, 2, 2)
    coefs_lr = model_lr.named_steps['linearregression'].coef_.flatten()
    coefs_lasso = model_lasso.named_steps['lasso'].coef_.flatten()
    plt.plot(range(len(coefs_lr)), coefs_lr, 'bo-', label="Linear Regression Coefficients")
    plt.plot(range(len(coefs_lasso)), coefs_lasso, 'ro-', label=f"Lasso Coefficients (alpha={alpha})")

    plt.xlabel("Coefficient Index")
    plt.ylabel("Coefficient Value")
    plt.title("Effect of Regularization on Coefficients")
    plt.legend()

    plt.tight_layout()
    plt.show()

    # Print theta values
    print(f"Theta values for Lasso Regression (alpha={alpha}):")
    print(coefs_lasso)

# Interactive widget to adjust alpha (regularization strength)
interact(plot_lasso, alpha=(0.001, 100.0, 1.0));
