<a href="https://colab.research.google.com/github/lucywowen/csci547_ML/blob/main/examples/lasso_collinearity_hack.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# LASSO vs Collinearity: A Hands‑On Demo

**Goal:** Show that with two highly collinear predictors, LASSO will often set one coefficient to exactly 0 (and which one gets dropped can flip with tiny data changes), while Ridge shrinks but typically keeps both nonzero.

**What you'll do:**
1. Generate synthetic data where `X2 ≈ X1` (strong collinearity) and the target is `y = 3·X1 + 2·X3 + noise`. (I'll provide some starter code.)
2. Fit LASSO across a range of α values and pick a sparse solution with exactly two non‑zero coefficients.
3. Plot the LASSO coefficient paths vs `log10(alpha)` to visualize when coefficients enter/leave the model.
4. Compare to Ridge (no exact zeros).
5. Repeat the experiment across many random seeds to **demonstrate arbitrariness** in which collinear feature survives.




In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from IPython.display import display

from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.linear_model import Lasso, Ridge, lasso_path

plt.rcParams.update({"figure.dpi": 144})
print("Libraries imported.")

Libraries imported.


## 1) Generate a dataset with two highly collinear features
Here I've made `X2 = X1 + ε`, where `ε ~ N(0, σ²)` with a **small** σ (`COLLINEAR_NOISE` below) to induce strong collinearity.
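A quick calculation shows why a small σ gives near-perfect correlation. This assumes, as in the code below, that `X1 ~ N(0, 1)` and that `ε` is drawn independently of `X1`:

```latex
\operatorname{Cov}(X_1, X_2) = \operatorname{Var}(X_1) = 1, \qquad
\operatorname{Var}(X_2) = 1 + \sigma^2
\;\Rightarrow\;
\rho(X_1, X_2) = \frac{1}{\sqrt{1 + \sigma^2}} \approx 1 - \frac{\sigma^2}{2}
```

With σ = 0.02 this gives a population correlation of about 0.9998, in line with the sample value of roughly 0.99976 printed in the correlation table.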

TO DO: Plot the correlations of the features as a heatmap.  

In [None]:
# Parameters you can tweak
SEED = 42
N = 300
COLLINEAR_NOISE = 0.02   # smaller -> stronger collinearity between X1 and X2
Y_NOISE = 1.0

rng = np.random.default_rng(SEED)
X1 = rng.normal(size=N)
X2 = X1 + rng.normal(scale=COLLINEAR_NOISE, size=N)
X3 = rng.normal(size=N)
y  = 3.0*X1 + 2.0*X3 + rng.normal(scale=Y_NOISE, size=N)

df = pd.DataFrame({"X1": X1, "X2": X2, "X3": X3, "y": y})

print("\nCorrelation among predictors (expect X1≈X2):")
display(df[["X1","X2","X3"]].corr())


Correlation among predictors (expect X1≈X2):


|    | X1        | X2        | X3        |
|----|-----------|-----------|-----------|
| X1 | 1.0       | 0.999762  | -0.072945 |
| X2 | 0.999762  | 1.0       | -0.075797 |
| X3 | -0.072945 | -0.075797 | 1.0       |


In [None]:
### Heatmap here
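One possible way to draw the heatmap is a plain matplotlib `imshow` with each cell annotated; this is a sketch, not the only approach (seaborn's `heatmap` would also work). So that it runs on its own, it regenerates the Section 1 data with the same seed and draw order.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Regenerate the Section 1 data (same seed, same order of random draws)
SEED, N, COLLINEAR_NOISE, Y_NOISE = 42, 300, 0.02, 1.0
rng = np.random.default_rng(SEED)
X1 = rng.normal(size=N)
X2 = X1 + rng.normal(scale=COLLINEAR_NOISE, size=N)
X3 = rng.normal(size=N)
y = 3.0 * X1 + 2.0 * X3 + rng.normal(scale=Y_NOISE, size=N)
df = pd.DataFrame({"X1": X1, "X2": X2, "X3": X3, "y": y})

corr = df[["X1", "X2", "X3"]].corr()

fig, ax = plt.subplots(figsize=(4, 3.5))
# Diverging colormap fixed to [-1, 1] so the colors mean the same thing across runs
im = ax.imshow(corr.values, cmap="coolwarm", vmin=-1, vmax=1)
ax.set_xticks(range(len(corr.columns)))
ax.set_xticklabels(corr.columns)
ax.set_yticks(range(len(corr.index)))
ax.set_yticklabels(corr.index)
# Write each correlation value in the center of its cell
for i in range(corr.shape[0]):
    for j in range(corr.shape[1]):
        ax.text(j, i, f"{corr.iloc[i, j]:.3f}", ha="center", va="center")
fig.colorbar(im, ax=ax, label="Pearson correlation")
ax.set_title("Predictor correlations")
plt.tight_layout()
plt.show()
```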

## 2) LASSO sweep over α to find a model with exactly 2 non‑zero coefficients
Ok! Here I have some sample code using one value of α and extracting the feature coefficients.  

TO DO: Now let's sweep over different values of α. Try [.00001, .0001, .001, .01, .1, 1] to start; the goal is to find an α that leaves exactly two non‑zero coefficients (ideally `X3` plus one of `X1` or `X2`).
What value of α do you get?

In [None]:
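# Feature matrix (columns X1, X2, X3) and target vector
X = df[["X1","X2","X3"]].values
y_arr = df["y"].values

# Standardize first so the L1 penalty treats all features on the same scale
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("lasso", Lasso(alpha=.001, fit_intercept=True, max_iter=20000, tol=1e-6, random_state=0))
])

pipe.fit(X, y_arr)
# Coefficients are on the standardized scale, in the order X1, X2, X3
coefs = pipe.named_steps["lasso"].coef_
print(coefs)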

[ 3.22749009 -0.41467568  2.05242318]


In [None]:
### Your turn!
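Here is a sketch of the sweep, reusing the pipeline from the sample cell and regenerating the same data so it runs on its own. "Non-zero" is taken literally (`coef != 0.0`), which works because scikit-learn's coordinate-descent `Lasso` produces exact zeros; `chosen_alpha` is a name introduced here for illustration.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.linear_model import Lasso

# Same data-generating process and seed as Section 1
rng = np.random.default_rng(42)
N = 300
X1 = rng.normal(size=N)
X2 = X1 + rng.normal(scale=0.02, size=N)
X3 = rng.normal(size=N)
y_arr = 3.0 * X1 + 2.0 * X3 + rng.normal(scale=1.0, size=N)
X = np.column_stack([X1, X2, X3])
feature_names = ["X1", "X2", "X3"]

alphas = [1e-5, 1e-4, 1e-3, 1e-2, 1e-1, 1]
n_nonzero = {}
chosen_alpha = None
for alpha in alphas:
    pipe = Pipeline([
        ("scaler", StandardScaler()),
        ("lasso", Lasso(alpha=alpha, max_iter=20000, tol=1e-6)),
    ])
    pipe.fit(X, y_arr)
    coefs = pipe.named_steps["lasso"].coef_
    kept = [name for name, c in zip(feature_names, coefs) if c != 0.0]
    n_nonzero[alpha] = len(kept)
    print(f"alpha={alpha:<8g} coefs={np.round(coefs, 3)} kept={kept}")
    # Remember the smallest alpha that leaves exactly two features
    if chosen_alpha is None and len(kept) == 2:
        chosen_alpha = alpha

print("Smallest alpha with exactly two non-zero coefficients:", chosen_alpha)
```

Picking the *smallest* such α is one reasonable rule (it shrinks the surviving coefficients least); you could equally report every α that satisfies the condition.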

## 3) LASSO coefficient paths
TO DO: Track (plot) how each coefficient changes along the regularization path. With `X1` and `X2` collinear, one of them will typically be driven to **exactly zero** for sufficiently large α.

Hint: You can reuse the alpha values from above, extended down to .000001: [.000001, .00001, .0001, .001, .01, .1, 1]. The plot will be much neater on a log-scaled x-axis (`plt.xscale('log')`).

In [1]:
### Plot coefficients across alphas
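A sketch using the already-imported `lasso_path`. One assumption to flag: `lasso_path` fits no intercept, so the sketch standardizes `X` and centers `y` itself, which matches the `StandardScaler` + `Lasso(fit_intercept=True)` pipeline. It also uses a denser log-spaced grid (`np.logspace(-6, 0, 60)`) than the seven hint values, purely for smoother curves.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import lasso_path

# Same data-generating process and seed as Section 1
rng = np.random.default_rng(42)
N = 300
X1 = rng.normal(size=N)
X2 = X1 + rng.normal(scale=0.02, size=N)
X3 = rng.normal(size=N)
y_arr = 3.0 * X1 + 2.0 * X3 + rng.normal(scale=1.0, size=N)
X = np.column_stack([X1, X2, X3])

# lasso_path has no intercept: standardize X and center y by hand
X_std = StandardScaler().fit_transform(X)
y_centered = y_arr - y_arr.mean()

alpha_grid = np.logspace(-6, 0, 60)
# coefs has shape (n_features, n_alphas); its columns line up with the
# returned `alphas`, so plot against those rather than `alpha_grid`
alphas, coefs, _ = lasso_path(X_std, y_centered, alphas=alpha_grid,
                              max_iter=20000, tol=1e-6)

fig, ax = plt.subplots(figsize=(6, 4))
for name, path in zip(["X1", "X2", "X3"], coefs):
    ax.plot(alphas, path, marker=".", label=name)
ax.set_xscale("log")
ax.axhline(0, color="gray", lw=0.8)
ax.set_xlabel("alpha (log scale)")
ax.set_ylabel("coefficient (standardized features)")
ax.set_title("LASSO coefficient paths")
ax.legend()
plt.tight_layout()
plt.show()
```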

## 4) Ridge comparison
TO DO: Compare these coefficients to Ridge. Remember, Ridge (L2) regularization shrinks coefficients but rarely sets them to **exactly** zero, even with strong collinearity.

In [None]:
### What's the output using Ridge?
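A side-by-side sketch, assuming `alpha=0.1` for LASSO and `alpha=1.0` for Ridge (both picked for illustration, not tuned). The two values are not directly comparable: scikit-learn's `Lasso` divides its squared loss by `2 * n_samples`, while `Ridge` minimizes `||y - Xw||² + alpha * ||w||²` with no such scaling.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.linear_model import Lasso, Ridge

# Same data-generating process and seed as Section 1
rng = np.random.default_rng(42)
N = 300
X1 = rng.normal(size=N)
X2 = X1 + rng.normal(scale=0.02, size=N)
X3 = rng.normal(size=N)
y_arr = 3.0 * X1 + 2.0 * X3 + rng.normal(scale=1.0, size=N)
X = np.column_stack([X1, X2, X3])

# Identical preprocessing; only the penalty differs
lasso = Pipeline([("scaler", StandardScaler()),
                  ("model", Lasso(alpha=0.1, max_iter=20000, tol=1e-6))]).fit(X, y_arr)
ridge = Pipeline([("scaler", StandardScaler()),
                  ("model", Ridge(alpha=1.0))]).fit(X, y_arr)

lasso_coef = lasso.named_steps["model"].coef_
ridge_coef = ridge.named_steps["model"].coef_
print("LASSO (alpha=0.1):", np.round(lasso_coef, 3))
print("Ridge (alpha=1.0):", np.round(ridge_coef, 3))
print("Exact zeros -> LASSO:", int(np.sum(lasso_coef == 0)),
      "| Ridge:", int(np.sum(ridge_coef == 0)))
```

Expect Ridge to split the `X1`/`X2` weight roughly evenly between the twins, while LASSO keeps only one of them.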

## 5) Repeat across many seeds to show arbitrariness
Ok now we get to the big thing!  Let's regenerate the dataset with different random seeds and fit a LASSO model for each. We then count how often `X1` survives vs `X2` (or both/neither). Plot correlations between the features as a heatmap and plot the alpha path. Go for it!

In [None]:
N_SEEDS = 10
records = []
for seed in range(N_SEEDS):
    ### You got this!!
    pass  # replace with: regenerate data with this seed, fit LASSO, record which of X1/X2 survive



## 6) Bonus! Elastic Net: bridging LASSO and Ridge

Elastic Net combines L1 and L2 penalties. The mixing parameter `l1_ratio` controls the blend:
- `l1_ratio = 1` → pure **LASSO** (L1)
- `l1_ratio = 0` → pure **Ridge** (L2)
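For reference, scikit-learn documents the `ElasticNet` objective as follows, writing \(n\) for the number of samples, \(\alpha\) for `alpha`, and \(r\) for `l1_ratio`:

```latex
\min_{w}\; \frac{1}{2n}\,\lVert y - Xw \rVert_2^2
  \;+\; \alpha\, r\, \lVert w \rVert_1
  \;+\; \frac{\alpha\,(1 - r)}{2}\, \lVert w \rVert_2^2
```

Setting \(r = 1\) recovers the `Lasso` objective exactly. Setting \(r = 0\) leaves a pure L2 penalty, but on a different scale from `Ridge`, whose squared loss is not divided by \(2n\).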

With highly collinear features, Elastic Net often shows a **grouping effect**: correlated predictors tend to be selected together more readily than with pure LASSO. Below, try to (a) pick `alpha` and `l1_ratio` using cross‑validation, (b) visualize Elastic Net coefficient paths for a few `l1_ratio` values, and (c) compare CV‑selected coefficients for Ridge, LASSO, and Elastic Net.


In [None]:
### Try using elastic net and see what happens!
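A sketch covering parts (a) and (c): it cross-validates `alpha` and `l1_ratio` with `ElasticNetCV`, then compares its coefficients with `LassoCV` and `RidgeCV`. The `l1_ratio` grid and Ridge `alphas` grid are illustrative assumptions. `l1_ratio = 0` is left out because `ElasticNetCV` cannot build its automatic alpha grid for a pure L2 penalty. `fit_scaled` is a hypothetical helper. For simplicity the scaler is fit on all rows before the internal CV, a small leak you could remove by cross-validating the whole pipeline.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.linear_model import ElasticNetCV, LassoCV, RidgeCV

# Same data-generating process and seed as Section 1
rng = np.random.default_rng(42)
N = 300
X1 = rng.normal(size=N)
X2 = X1 + rng.normal(scale=0.02, size=N)
X3 = rng.normal(size=N)
y_arr = 3.0 * X1 + 2.0 * X3 + rng.normal(scale=1.0, size=N)
X = np.column_stack([X1, X2, X3])


def fit_scaled(model):
    """Hypothetical helper: standardize, fit, and return the fitted estimator."""
    pipe = Pipeline([("scaler", StandardScaler()), ("model", model)]).fit(X, y_arr)
    return pipe.named_steps["model"]


L1_RATIOS = [0.1, 0.5, 0.7, 0.9, 0.95, 1.0]  # assumption: illustrative grid
enet = fit_scaled(ElasticNetCV(l1_ratio=L1_RATIOS, cv=5, max_iter=20000))
lasso = fit_scaled(LassoCV(cv=5, max_iter=20000))
ridge = fit_scaled(RidgeCV(alphas=np.logspace(-3, 3, 25)))

print(f"ElasticNetCV picked alpha={enet.alpha_:.4g}, l1_ratio={enet.l1_ratio_}")
print(f"LassoCV picked alpha={lasso.alpha_:.4g}; RidgeCV picked alpha={ridge.alpha_:.4g}")

# CV-selected coefficients side by side (standardized feature scale)
comparison = pd.DataFrame(
    {"Ridge": ridge.coef_, "LASSO": lasso.coef_, "ElasticNet": enet.coef_},
    index=["X1", "X2", "X3"],
)
print(comparison.round(3))
```

If CV lands on a small `l1_ratio`, look for the grouping effect: `X1` and `X2` both non-zero with similar values. If it picks `l1_ratio` near 1, the result should look much like LASSO.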

## Discussion
- **LASSO (L1)** promotes sparsity by adding an \(L_1\) penalty; under strong collinearity, multiple solutions give similar fit quality, so the penalty can **select one feature and zero out its twin**. Tiny data changes can flip which one survives.
- **Ridge (L2)** shares weight across collinear features—**shrinks** them but rarely makes them **exactly 0**.
- The path plot connects this intuition visually: as α increases, one of the collinear coefficients is pushed to 0 while the other remains.
- **Knobs to turn:**
  - Increase/decrease `COLLINEAR_NOISE` to strengthen/relax collinearity.
  - Try different α ranges or selection rules.
  - Add more redundant predictors to show LASSO picking a subset.
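The first two bullets can be made concrete in the idealized limit where the twins are identical, \(X_2 = X_1\) (that is, σ → 0). The fit then depends only on the sum \(s = b_1 + b_2\), so the penalty alone decides how \(s\) is split:

```latex
\begin{aligned}
X_1 b_1 + X_2 b_2 &= (b_1 + b_2)\,X_1 = s\,X_1 \\
|b_1| + |b_2| &\ge |s| &&\text{(equality for every split with } b_1 b_2 \ge 0\text{)} \\
b_1^2 + b_2^2 &\ge \tfrac{s^2}{2} &&\text{(equality only at } b_1 = b_2 = s/2\text{)}
\end{aligned}
```

The L1 penalty is therefore indifferent among all same-sign splits, including the corners \((s, 0)\) and \((0, s)\). With σ > 0, the tiny differences between `X1` and `X2` break this tie, typically at a corner, which is why the survivor can flip with small data changes. The L2 penalty has a unique minimizer at the equal split, which is why Ridge shares the weight.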
