# Notebook 01: Permutation Importance

## Breaking the Connection to Measure Impact

In the previous notebook, we learned that shuffling rows shouldn't change predictions. Now we'll apply a similar idea more strategically, shuffling a single column at a time, to measure **feature importance**.

Permutation importance answers a simple question: *"What happens to model performance when we break the association between a feature and the target?"*

---

## The Core Idea

### Intuition

Imagine you're predicting house prices, and one of your features is "number of bedrooms." This feature is plausibly important, since houses with more bedrooms tend to cost more.

Now, what if we randomly shuffle the "bedrooms" column? We break the connection between bedrooms and price while keeping the column's overall distribution intact. If the model's performance drops significantly, the model relied heavily on bedrooms. If performance barely changes, the model made little use of that feature.

### The Algorithm

Permutation importance works as follows:

1. **Train a model** on the training data and record its baseline score on a held-out test set (e.g., R², or negative RMSE so that higher is better)
2. **For each feature**:
   - Create a copy of the test set
   - Randomly permute (shuffle) that feature's values, leaving all other columns and the target untouched
   - Make predictions with the permuted feature
   - Compute the same score on those predictions
   - **Importance = Baseline Score - Permuted Score**
3. **Rank features** by importance (larger drop = more important)
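
The steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not scikit-learn's implementation: the helper name `manual_permutation_importance`, the synthetic data, and the choice of R² as the score are all assumptions made for the example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score


def manual_permutation_importance(model, X, y, n_repeats=5, seed=0):
    """Return the mean drop in R² per feature (X is a 2-D NumPy array)."""
    rng = np.random.default_rng(seed)
    baseline = r2_score(y, model.predict(X))
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            X_perm = X.copy()  # leave the original data untouched
            X_perm[:, j] = rng.permutation(X_perm[:, j])  # shuffle only column j
            drops.append(baseline - r2_score(y, model.predict(X_perm)))
        importances[j] = np.mean(drops)
    return importances


# Hypothetical synthetic data: feature 0 drives y, feature 1 is pure noise.
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 2))
y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=500)

# Fit on the first 400 rows, measure importance on the remaining 100.
model = LinearRegression().fit(X[:400], y[:400])
importances = manual_permutation_importance(model, X[400:], y[400:])
ranking = np.argsort(importances)[::-1]  # most important feature first
print(importances, ranking)
```

The model is never refit inside the loop; only the held-out inputs change, which is why the procedure is cheap relative to retraining.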

### Mathematical Formulation

For a feature $j$, permutation importance is:

$$\text{Importance}_j = \text{Score}(y, \hat{y}) - \text{Score}(y, \hat{y}_{\text{perm}(j)})$$

where:
- $\text{Score}$ is a metric (higher is better, like R²) or negative error (like -RMSE)
- $\hat{y}$ are the predictions on the original test set
- $\hat{y}_{\text{perm}(j)}$ are the predictions when feature $j$ is permuted

**Important**: For error metrics (RMSE, MAE), lower is better, so we plug in the **negative** error as the score. The importance then equals the increase in error caused by the permutation, and higher values still mean more importance:

$$\text{Importance}_j = -\text{RMSE}(y, \hat{y}) - \left(-\text{RMSE}(y, \hat{y}_{\text{perm}(j)})\right) = \text{RMSE}(y, \hat{y}_{\text{perm}(j)}) - \text{RMSE}(y, \hat{y})$$
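
A small numeric check may help with the sign convention. The arrays below are hypothetical toy values chosen only to keep the arithmetic simple; the local `rmse` helper is likewise just for this sketch.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error for two 1-D arrays."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Hypothetical toy values, not taken from any real model.
y = np.array([3.0, 5.0, 7.0, 9.0])
y_hat = np.array([2.5, 5.5, 6.5, 9.5])       # predictions on the original test set
y_hat_perm = np.array([5.5, 2.5, 9.5, 6.5])  # predictions after permuting feature j

score_baseline = -rmse(y, y_hat)       # negative RMSE: higher is better
score_permuted = -rmse(y, y_hat_perm)

# Score(original) - Score(permuted), i.e. the increase in RMSE.
importance = score_baseline - score_permuted
print(importance)
```

With a negated error as the score, a positive importance means the permutation made the error larger, which matches the "larger drop = more important" reading used for R².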

---

## Why Permutation Importance?

### Advantages

1. **Model-agnostic**: Works with any model (linear, trees, neural networks)
2. **Intuitive**: Easy to explain to non-technical stakeholders
3. **No retraining**: Fast to compute (just permute and predict)
4. **Handles interactions**: Shuffling a feature also breaks its interactions with other features, so the score drop reflects the feature's contribution in the context of the full model

### Limitations

1. **Correlated features**: If two features are highly correlated, permuting one might not hurt much because the other carries the signal
2. **Computational cost**: Requires multiple predictions (one per feature, often with repeats)
3. **Random variation**: Results can vary slightly between runs (use `n_repeats` to average)
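
The correlated-features limitation can be seen on synthetic data. The sketch below is an illustration under assumed settings (a Ridge model with `alpha=1.0`, Gaussian data, a near-duplicate column); it compares the importance of the same signal column when it stands alone and when a near-copy sits next to it.

```python
import numpy as np
from sklearn.inspection import permutation_importance
from sklearn.linear_model import Ridge


def first_feature_importance(X, y, n_train=1500):
    """Fit Ridge on the first rows, return permutation importance of column 0."""
    model = Ridge(alpha=1.0).fit(X[:n_train], y[:n_train])
    result = permutation_importance(
        model, X[n_train:], y[n_train:], n_repeats=10, random_state=0
    )
    return result.importances_mean[0]


rng = np.random.default_rng(0)
n = 2000
signal = rng.normal(size=n)
y = signal + rng.normal(scale=0.1, size=n)

# Case A: the signal appears in one column, next to an unrelated noise column.
X_single = np.column_stack([signal, rng.normal(size=n)])
# Case B: the signal appears twice (the second copy carries slight noise).
X_dup = np.column_stack([signal, signal + rng.normal(scale=0.01, size=n)])

imp_single = first_feature_importance(X_single, y)
imp_dup = first_feature_importance(X_dup, y)
print(f"signal alone: {imp_single:.3f}, signal with near-duplicate: {imp_dup:.3f}")
```

In case B the model spreads its weight across both copies, so shuffling one of them damages the predictions less, even though the underlying signal is just as informative.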

---

## Permutation Importance vs. Other Methods

### vs. Coefficient Magnitude (Linear Models)

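In linear models, coefficient magnitudes are often read as feature importance, but:
- Coefficients are only comparable across features when the features share a scale
- With correlated features, coefficients can become unstable and hard to interpret individually
- Permutation importance is expressed in units of the performance metric, so it does not depend on feature scaling (it is still affected by correlated features, as noted above)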

### vs. Feature Importance (Tree Models)

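Tree models (Random Forest, XGBoost) have built-in feature importance:
- Typically based on how much each feature's splits reduce impurity (or on how often the feature is used for splitting), accumulated during training
- Can be biased toward high-cardinality features
- Permutation importance computed on held-out data avoids that bias and reflects what matters for generalization, so it is often the more reliable ranking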

### vs. Label Permutation Test

**Important distinction**: Permutation importance permutes **features**, not labels.

- **Feature permutation**: Breaks feature-target association → measures feature importance
- **Label permutation**: Breaks all associations → tests if model learned anything (null hypothesis test)
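
For contrast, a label permutation test can be run with scikit-learn's `permutation_test_score`, which refits the model on shuffled targets. The sketch below assumes a StandardScaler + Ridge pipeline, 5-fold cross-validation, and 100 permutations; those settings are illustrative choices rather than recommendations.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.model_selection import permutation_test_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True)
model = make_pipeline(StandardScaler(), Ridge(alpha=1.0))

# Compare the cross-validated score on real labels with scores on shuffled labels.
score, perm_scores, p_value = permutation_test_score(
    model, X, y, cv=5, n_permutations=100, scoring="r2", random_state=0
)
print(f"R² on real labels:          {score:.3f}")
print(f"Mean R² on shuffled labels: {perm_scores.mean():.3f}")
print(f"p-value:                    {p_value:.3f}")
```

Unlike permutation importance, this procedure retrains the model for every shuffle and yields a single answer about the model as a whole, not one value per feature.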

---

## What We'll Do in This Notebook

1. **Train a Ridge regression model** on the diabetes dataset
2. **Compute permutation importance** using scikit-learn's built-in function
3. **Visualize results** with a bar plot
4. **Manual verification**: Manually permute one feature and observe the impact

Let's begin!



## Setup and Imports


In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.linear_model import Ridge
from sklearn.inspection import permutation_importance
from sklearn.metrics import mean_squared_error, r2_score

from src.utils import set_seed
from src.metrics import regression_report
from src.viz import barh

sns.set_style("whitegrid")
plt.rcParams['figure.figsize'] = (10, 6)

set_seed(42)
print("✓ Imports successful!")


## Step 1: Load and Prepare Data


In [None]:
# Load diabetes dataset
data = load_diabetes(as_frame=True)
X = data.data
y = data.target

# Split into train/test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(f"Training set: {X_train.shape}")
print(f"Test set: {X_test.shape}")
print(f"\nFeature names: {list(X.columns)}")


## Step 2: Train a Model with Pipeline

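We'll use a **Pipeline** with StandardScaler + Ridge. Standardization matters for Ridge regression because the L2 penalty is applied to all coefficients equally, regardless of each feature's units. Without scaling, a feature measured on a small numeric scale needs a large coefficient to have the same effect and is therefore penalized more heavily than a feature measured on a large scale.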


In [None]:
# === TODO (you code this) ===
# Train a Ridge or LinearRegression model on the diabetes data.
# Hints: use a Pipeline with StandardScaler for stability
# Acceptance:
# - Print test RMSE and R2
# - Pipeline is fitted successfully
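
One possible way to approach this step is sketched below. It repeats the data loading so that it runs on its own, and `alpha=1.0` is an assumed, untuned value; treat it as a starting point and not as the reference solution.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

data = load_diabetes(as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42
)

# Scaling lives inside the pipeline, so it is fit on the training split only.
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("ridge", Ridge(alpha=1.0)),  # assumed regularization strength
])
pipeline.fit(X_train, y_train)

y_pred = pipeline.predict(X_test)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))  # RMSE as the square root of MSE
r2 = r2_score(y_test, y_pred)
print(f"Test RMSE: {rmse:.2f}")
print(f"Test R²:   {r2:.3f}")
```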


## Step 3: Compute Permutation Importance

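Scikit-learn provides `permutation_importance()`, which permutes each feature `n_repeats` times and returns the mean and standard deviation of the importance scores (`importances_mean` and `importances_std`). Unless a `scoring` argument is given, it uses the estimator's own `score` method, which is R² for regressors.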


In [None]:
# === TODO (you code this) ===
# Compute permutation importance on the test set.
# Hints: from sklearn.inspection import permutation_importance; n_repeats=10
# Acceptance:
# - Bar plot of importances; short interpretation paragraph
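
A sketch of how the call might look is given below. It rebuilds the same assumed pipeline as a stand-in for whatever model was fitted in Step 2, and it picks `scoring="neg_root_mean_squared_error"` so that importances read as increases in RMSE; both are choices made for this example.

```python
import pandas as pd
from sklearn.datasets import load_diabetes
from sklearn.inspection import permutation_importance
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

data = load_diabetes(as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42
)
pipeline = make_pipeline(StandardScaler(), Ridge(alpha=1.0)).fit(X_train, y_train)

# Permute each test-set column 10 times and record the drop in the score.
result = permutation_importance(
    pipeline,
    X_test,
    y_test,
    n_repeats=10,
    random_state=42,
    scoring="neg_root_mean_squared_error",
)

# Collect the mean and spread per feature, most important first.
importance_df = pd.DataFrame(
    {"mean": result.importances_mean, "std": result.importances_std},
    index=X_test.columns,
).sort_values("mean", ascending=False)
print(importance_df)
```

The sorted table can feed a horizontal bar plot directly. When interpreting it, a feature whose mean importance is within roughly one standard deviation of zero is hard to distinguish from noise.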


## Step 4: Manual Permutation Check

As a sanity check, manually permute one feature and observe the impact.


In [None]:
# === TODO (you code this) ===
# Kaggle-style quick check: manually permute one obviously strong feature and recompute RMSE.
# Hints: copy X_test; shuffle one column with np.random.permutation
# Acceptance:
# - Show metric delta; if small, feature likely weak for this model
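
The manual check could look like the following sketch. The choice of `bmi` as the "obviously strong" feature is an assumption for illustration, and the pipeline is again rebuilt so that the block is self-contained.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

data = load_diabetes(as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42
)
pipeline = make_pipeline(StandardScaler(), Ridge(alpha=1.0)).fit(X_train, y_train)

feature = "bmi"  # assumed to be a strong feature for this model
rng = np.random.default_rng(42)

X_perm = X_test.copy()
# Assign a plain NumPy array: assigning a shuffled Series would be re-aligned
# on the index by pandas, which silently undoes the shuffle.
X_perm[feature] = rng.permutation(X_perm[feature].to_numpy())

rmse_baseline = np.sqrt(mean_squared_error(y_test, pipeline.predict(X_test)))
rmse_permuted = np.sqrt(mean_squared_error(y_test, pipeline.predict(X_perm)))
delta = rmse_permuted - rmse_baseline
print(f"Baseline RMSE: {rmse_baseline:.2f}")
print(f"Permuted RMSE: {rmse_permuted:.2f}")
print(f"Delta:         {delta:.2f}")
```

Because this uses a single shuffle, the delta will differ somewhat from the averaged value that `permutation_importance` reports over several repeats.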


## Summary

Permutation importance is a powerful, model-agnostic way to measure feature importance. It works by breaking the connection between a feature and the target through random permutation, then measuring the impact on model performance.

**Next**: Notebook 02 will explore regularization (Ridge and Lasso) to understand how we can control model complexity.
