# **Homework Assignment: Analyzing and Plotting Bias in Penalized Regression**

-------------------------------

In this assignment, you will explore how **Ridge** and **Lasso** regression introduce **bias** into a model to reduce **variance**, and how the choice of the regularization parameter $\lambda$ affects this trade-off. The goal is to visualize and analyze the **bias-variance trade-off** and understand the conditions under which penalization helps or hinders model performance.

## **The Question**

**How does varying the regularization parameter $\lambda$ in Ridge and Lasso regression impact the trade-off between bias and variance?**

- Generate a synthetic dataset based on a **known** linear relationship:
  
  $$
  y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \ldots + \beta_p x_p + \epsilon,
  $$

  where $\epsilon \sim \mathcal{N}(0, \sigma^2)$.

  Use a **high-dimensional** setting (e.g., $p = 50$ predictors) with only a few non-zero true coefficients to emphasize the effects of regularization. To stress: the true coefficients $\beta_i$ must be known for this experiment, and most of them should be 0, with only a few non-zero.

- Investigate how increasing $\lambda$ influences the model’s **bias**, **variance**, and **Mean Squared Error (MSE)**.
- Plot **Bias²**, **Variance**, and **MSE** on a single graph for both Ridge and Lasso models.
- Explain how the MSE decomposes into bias and variance (review the MSE decomposition if needed).
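
A short derivation of the decomposition, for a generic estimator $\hat{\theta}$ of a fixed target $\theta$: add and subtract $\mathbb{E}[\hat{\theta}]$ inside the square, then expand.

$$
\begin{aligned}
\mathbb{E}\left[(\hat{\theta} - \theta)^2\right]
&= \mathbb{E}\left[\left(\hat{\theta} - \mathbb{E}[\hat{\theta}] + \mathbb{E}[\hat{\theta}] - \theta\right)^2\right] \\
&= \mathbb{E}\left[\left(\hat{\theta} - \mathbb{E}[\hat{\theta}]\right)^2\right]
 + 2\left(\mathbb{E}[\hat{\theta}] - \theta\right)\mathbb{E}\left[\hat{\theta} - \mathbb{E}[\hat{\theta}]\right]
 + \left(\mathbb{E}[\hat{\theta}] - \theta\right)^2 \\
&= \text{Var}(\hat{\theta}) + 0 + \text{Bias}^2(\hat{\theta}).
\end{aligned}
$$

The cross term vanishes because $\mathbb{E}[\hat{\theta}] - \theta$ is a constant and $\mathbb{E}\left[\hat{\theta} - \mathbb{E}[\hat{\theta}]\right] = 0$. In the regression setting of this assignment, $\theta$ is the true mean response $f(x) = \beta_0 + x^\top\beta$ at a test point; if the error is measured against a noisy observation $y = f(x) + \epsilon$ instead, an extra irreducible $\sigma^2$ term appears.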

**Does the regularization lead to an optimal trade-off point where MSE is minimized? Explain why this point exists.**




**Expected Outcome:**
 - As $\lambda$ increases:
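   - **Bias** increases (coefficients are shrunk harder toward zero, so the model becomes simpler and eventually underfits).
   - **Variance** decreases (the fitted model changes less from one training sample to the next).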
   - **MSE** forms a **U-shape**, revealing the optimal trade-off.

- Analyze how **Ridge** and **Lasso** differ in terms of their bias-variance trade-offs.
- Discuss situations where one method may outperform the other, considering factors like **feature sparsity** and **multicollinearity**.
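
The expected behavior can be checked exactly in a simplified case. Assume (an assumption for illustration, not the assignment's setup) an orthonormal design $X^\top X = I$, no intercept, $\sigma = 1$, and the unscaled ridge objective $\|y - Xw\|^2 + \lambda\|w\|^2$. Then $\hat{\beta}_\lambda = \hat{\beta}_{\text{OLS}}/(1+\lambda)$ with $\hat{\beta}_{\text{OLS}} \sim \mathcal{N}(\beta, \sigma^2 I)$, so, summed over coefficients, $\text{Bias}^2 = \left(\tfrac{\lambda}{1+\lambda}\right)^2\|\beta\|^2$ and $\text{Var} = \tfrac{p\sigma^2}{(1+\lambda)^2}$. Setting the derivative of their sum to zero gives $\lambda^* = p\sigma^2/\|\beta\|^2$. The sketch below evaluates these formulas on a grid using the assignment's sparse $\beta$.

```python
import numpy as np

# Assumptions: orthonormal design (X^T X = I), no intercept, sigma = 1,
# and the unscaled ridge objective ||y - Xw||^2 + lam * ||w||^2.
p, sigma = 50, 1.0
beta = np.zeros(p)
beta[[0, 5, 11, 12, 33, 38, 49]] = [1.5, -2.5, -3.0, 1.0, 2.5, 0.8, 1.2]

lams = np.logspace(-2, 2, 401)
shrink = 1.0 / (1.0 + lams)                         # ridge scales each OLS coefficient by 1/(1+lam)
bias2 = ((1.0 - shrink) ** 2) * np.sum(beta ** 2)   # sum_j (E[beta_hat_j] - beta_j)^2
var = p * sigma ** 2 * shrink ** 2                  # sum_j Var(beta_hat_j)
mse = bias2 + var

lam_star = p * sigma ** 2 / np.sum(beta ** 2)       # closed-form minimizer of bias2 + var
lam_grid = lams[np.argmin(mse)]                     # minimizer found on the grid
print(f"closed-form lambda* = {lam_star:.3f}, grid argmin = {lam_grid:.3f}")
print(f"MSE at lambda*: {np.interp(lam_star, lams, mse):.3f}, "
      f"near OLS: {mse[0]:.3f}, at lambda=100: {mse[-1]:.3f}")
```

Under the same orthonormal assumption with the lasso objective $\tfrac{1}{2}\|y - Xw\|^2 + \lambda\|w\|_1$, each coefficient is soft-thresholded, $\hat{\beta}_j = \operatorname{sign}(z_j)\max(|z_j| - \lambda, 0)$ with $z_j$ the OLS estimate, so small coefficients become exactly zero while large ones are shifted by $\lambda$ rather than scaled. This is the usual argument for why Lasso tends to do well when the true $\beta$ is sparse.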


# Reminder 

$$\text{Bias}^2(\hat{\theta}) = \left(\mathbb{E}[\hat{\theta}] - \theta\right)^2$$

$$\text{Var}(\hat{\theta}) = \mathbb{E}\left[ \left(\hat{\theta} - \mathbb{E}[\hat{\theta}] \right)^2 \right]$$

$$\text{MSE}(\hat{\theta}) = \mathbb{E}\left[(\hat{\theta} - \theta)^2\right] 
= \text{Var}(\hat{\theta}) + \text{Bias}^2(\hat{\theta})$$
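
A quick numerical check of the identity, assuming a toy shrinkage estimator $\hat{\theta} = c\,\bar{x}$ of a normal mean (the estimator, $c = 0.8$, and the other constants are illustrative choices, not part of the assignment). Because `np.var` uses `ddof=0` by default, matching the definition of Var above, the identity holds exactly for the simulated draws, not only in expectation. In theory, $\text{Bias}^2 = ((c-1)\theta)^2$ and $\text{Var} = c^2\sigma^2/n$.

```python
import numpy as np

# Illustrative constants (assumptions, not from the assignment)
rng = np.random.default_rng(0)
theta, sigma, n, runs = 2.0, 1.0, 10, 100_000
c = 0.8   # shrinkage factor: theta_hat = c * sample mean, deliberately biased

samples = rng.normal(theta, sigma, size=(runs, n))
theta_hat = c * samples.mean(axis=1)        # one estimate per simulated dataset

bias2 = (theta_hat.mean() - theta) ** 2     # (E[theta_hat] - theta)^2
var = theta_hat.var()                       # E[(theta_hat - E[theta_hat])^2], ddof=0
mse = np.mean((theta_hat - theta) ** 2)     # E[(theta_hat - theta)^2]

print(f"Bias^2 = {bias2:.4f}  (theory {((c - 1) * theta) ** 2:.4f})")
print(f"Var    = {var:.4f}  (theory {c**2 * sigma**2 / n:.4f})")
print(f"Bias^2 + Var = {bias2 + var:.4f},  MSE = {mse:.4f}")
```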

In [None]:
from sklearn.linear_model import Ridge, Lasso
from sklearn.model_selection import train_test_split
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(101)

beta_len = 50
n_samples = 100

beta = np.zeros(beta_len)
beta[[0, 5, 11, 12, 33, 38, 49]] = [1.5, -2.5, -3.0, 1.0, 2.5, 0.8, 1.2]
beta_0 = 5.0

X = np.random.randn(n_samples, beta_len)
Y = beta_0 + X.dot(beta) + np.random.normal(0, 1, n_samples)  # Adding noise
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.3, random_state=101)

In [None]:
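# Single Ridge fit with sklearn's default alpha=1.0, as a first look at the shrunken coefficients
ridge_model = Ridge(alpha=1.0)
ridge_model.fit(X_train, Y_train)
Y_pred = ridge_model.predict(X_test)
print("Predicted coefficients:\n", ridge_model.coef_)
print("Test MSE:", np.mean((Y_pred - Y_test) ** 2))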
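
To see how Lasso differs from Ridge on this sparse problem, the sketch below refits both models on the same split and counts exactly-zero coefficients. It is self-contained (it rebuilds the data with the same seed as the cells above); the Lasso penalty `alpha=0.1` is an arbitrary illustrative value, not a tuned one, and the `summary` dictionary is a hypothetical reporting helper. Note that sklearn's `Lasso` scales the squared loss by $1/(2n)$ while `Ridge` does not, so the two `alpha` values are not directly comparable.

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso
from sklearn.model_selection import train_test_split

# Rebuild the same synthetic data as the cells above
np.random.seed(101)
beta_len, n_samples = 50, 100
beta = np.zeros(beta_len)
beta[[0, 5, 11, 12, 33, 38, 49]] = [1.5, -2.5, -3.0, 1.0, 2.5, 0.8, 1.2]
X = np.random.randn(n_samples, beta_len)
Y = 5.0 + X.dot(beta) + np.random.normal(0, 1, n_samples)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.3, random_state=101)

models = {
    "Ridge": Ridge(alpha=1.0),
    "Lasso": Lasso(alpha=0.1, max_iter=10_000),   # alpha=0.1: arbitrary illustrative choice
}
summary = {}
for name, model in models.items():
    model.fit(X_train, Y_train)
    coef = model.coef_
    summary[name] = {
        "exact_zeros": int(np.sum(coef == 0.0)),                            # coefficients set exactly to 0
        "true_zeros_set_to_zero": int(np.sum((coef == 0.0) & (beta == 0.0))),
        "coef_sq_error": float(np.sum((coef - beta) ** 2)),                # ||coef - beta||^2
        "test_mse": float(np.mean((model.predict(X_test) - Y_test) ** 2)),
    }
    print(name, summary[name])
```

Ridge shrinks all 50 coefficients but leaves them non-zero, whereas the $\ell_1$ penalty typically sets some of the 43 truly-zero coefficients exactly to zero; that built-in variable selection is the main structural difference between the two penalties.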

In [None]:
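runs = 100
alphas = np.logspace(-2, 4, 20)
for_plot = []

# Fix the train/test split ONCE so every run predicts at the same test points.
# Only the noise is redrawn between runs; otherwise averaging predictions over
# runs (axis=0) would mix different x values and bias/variance would be meaningless.
train_idx, test_idx = train_test_split(np.arange(n_samples), test_size=0.3, random_state=101)
X_train, X_test = X[train_idx], X[test_idx]
true_Y_test = beta_0 + X_test.dot(beta)   # noise-free f(x) at the test points

for a in alphas:
    all_preds_Ridge = []
    all_mses = []

    for run in range(runs):
        # New noise realization -> new training responses, same design matrix X
        new_epsilon = np.random.normal(0, 1, n_samples)
        Y_new = beta_0 + X.dot(beta) + new_epsilon
        Y_train = Y_new[train_idx]

        model = Ridge(alpha=a)
        model.fit(X_train, Y_train)
        preds = model.predict(X_test)
        all_preds_Ridge.append(preds)

        # Error against the true f(x), so the irreducible noise sigma^2 is excluded
        all_mses.append(np.mean((preds - true_Y_test)**2))

    all_preds_Ridge = np.array(all_preds_Ridge)   # shape (runs, n_test)

    mean_prediction = np.mean(all_preds_Ridge, axis=0)
    bias_2 = np.mean((mean_prediction - true_Y_test)**2)
    var = np.mean(np.var(all_preds_Ridge, axis=0))
    mse = np.mean(all_mses)
    expected_mse = bias_2 + var   # equals mse up to floating-point error

    # sklearn's Ridge minimizes ||y - Xw||^2 + alpha * ||w||^2, so alpha is lambda itself
    labda = a
    for_plot.append((labda, mean_prediction, bias_2, var, mse, expected_mse))

    print(f"alpha: {a:.4g}")
    print(f"Bias^2: {bias_2:.4f}")
    print(f"Variance: {var:.4f}")
    print(f"MSE: {mse:.4f}")
    print(f"Expected MSE (Bias^2 + Variance): {expected_mse:.4f}\n")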

In [None]:
labdas = [item[0] for item in for_plot]
bias_squared = [item[2] for item in for_plot]
variances = [item[3] for item in for_plot]
mses = [item[4] for item in for_plot]
expected_mses = [item[5] for item in for_plot]

plt.figure(figsize=(10, 6))
plt.plot(labdas, bias_squared, label='Bias²', marker='o')
plt.plot(labdas, variances, label='Variance', marker='o')
plt.plot(labdas, mses, label='MSE', marker='o')
plt.plot(labdas, expected_mses, label='Bias² + Variance', marker='x', linestyle='--')

plt.xscale('log')
plt.xlabel('Lambda (Regularization Parameter)', fontsize=12)
plt.ylabel('Error', fontsize=12)
plt.title('Bias-Variance Trade-off for Ridge Regression', fontsize=14)
plt.legend()
plt.grid(True, linestyle='--', alpha=0.7)
plt.show()
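
The assignment asks for the same curves for Lasso. A minimal sketch, assuming the same data-generating process as above, that estimates Bias², Variance, and MSE by Monte Carlo for both models on a fixed test set and plots them side by side. `bias_variance_curve` and `make_model` are hypothetical names introduced here; the sketch uses `np.random.default_rng` and `runs=50` (to keep runtime short), so its numbers will not match the cells above exactly. The Lasso `alpha` grid is an assumption, chosen lower than Ridge's because sklearn's `Lasso` uses a $1/(2n)$-scaled loss.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import Ridge, Lasso
from sklearn.model_selection import train_test_split

# Same synthetic setup as above (sparse true coefficients, sigma = 1)
rng = np.random.default_rng(101)
p, n, sigma = 50, 100, 1.0
beta = np.zeros(p)
beta[[0, 5, 11, 12, 33, 38, 49]] = [1.5, -2.5, -3.0, 1.0, 2.5, 0.8, 1.2]
beta_0 = 5.0
X = rng.standard_normal((n, p))
train_idx, test_idx = train_test_split(np.arange(n), test_size=0.3, random_state=101)
X_train, X_test = X[train_idx], X[test_idx]
f_test = beta_0 + X_test @ beta          # noise-free targets at the fixed test points


def bias_variance_curve(make_model, alphas, runs=50):
    """Monte Carlo estimate of (Bias^2, Variance, MSE) at each alpha.

    make_model (hypothetical helper argument): callable alpha -> unfitted estimator.
    """
    bias2, var, mse = [], [], []
    for a in alphas:
        preds = np.empty((runs, len(test_idx)))
        for r in range(runs):
            y = beta_0 + X @ beta + rng.normal(0, sigma, n)   # fresh noise, same X
            preds[r] = make_model(a).fit(X_train, y[train_idx]).predict(X_test)
        mean_pred = preds.mean(axis=0)
        bias2.append(np.mean((mean_pred - f_test) ** 2))
        var.append(np.mean(preds.var(axis=0)))
        mse.append(np.mean((preds - f_test) ** 2))
    return np.array(bias2), np.array(var), np.array(mse)


ridge_alphas = np.logspace(-2, 4, 15)
lasso_alphas = np.logspace(-3, 1, 15)   # assumed grid; Lasso's alpha lives on a different scale
curves = {
    "Ridge": (ridge_alphas, bias_variance_curve(lambda a: Ridge(alpha=a), ridge_alphas)),
    "Lasso": (lasso_alphas, bias_variance_curve(lambda a: Lasso(alpha=a, max_iter=10_000), lasso_alphas)),
}

fig, axes = plt.subplots(1, 2, figsize=(12, 5))
for ax, (name, (alphas, (b2, v, m))) in zip(axes, curves.items()):
    ax.plot(alphas, b2, marker="o", label="Bias²")
    ax.plot(alphas, v, marker="o", label="Variance")
    ax.plot(alphas, m, marker="o", label="MSE")
    ax.axvline(alphas[np.argmin(m)], color="gray", linestyle="--", label="min MSE")
    ax.set_xscale("log")
    ax.set_xlabel("alpha (sklearn regularization parameter)")
    ax.set_title(f"Bias-variance trade-off: {name}")
    ax.legend()
axes[0].set_ylabel("Error")
plt.tight_layout()
plt.show()
```

The dashed vertical line marks the grid point with the smallest estimated MSE for each model; comparing where it falls, and how quickly Bias² grows past it, is one way to contrast the two penalties.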

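# ANSWER:
Increasing $\lambda$ raises **Bias²** and lowers **Variance**, but MSE is not monotone. For small $\lambda$ the drop in variance outweighs the added bias, so MSE falls; for large $\lambda$ the coefficients are shrunk so far toward zero that bias dominates and MSE rises. Because MSE = Bias² + Variance and the two terms move in opposite directions, their sum has a minimum at an intermediate $\lambda$: this is the optimal trade-off point, and its location depends on the noise level $\sigma^2$ and on the size of the true coefficients.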