# ⭐ Tutorial: Feature Importance with RiskLabAI

This notebook is a tutorial on the feature importance methods in the `RiskLabAI` library, based on Chapters 6-8 of *Advances in Financial Machine Learning* by Marcos López de Prado. The clustered variants follow his later book, *Machine Learning for Asset Managers*.

We will use the module's **Strategy Pattern** design, in which a single `FeatureImportanceController` is configured with one of several interchangeable importance strategies, to test five methods on one synthetic dataset.

**Outline:**
1.  **Generate Synthetic Data:** Create a dataset with known informative, redundant, and noise features using `get_test_dataset`.
2.  **Part 1: Standard Feature Importance:**
    * Mean Decrease Impurity (MDI)
    * Mean Decrease Accuracy (MDA)
    * Single Feature Importance (SFI)
3.  **Part 2: Clustered Feature Importance:**
    * First, we'll use `cluster_k_means_top` to group our features.
    * Clustered MDI
    * Clustered MDA
4.  **Part 3: Orthogonalized Importance:**
    * Use `orthogonal_features` to remove collinearity.
    * Calculate MDI on the orthogonal features.
    * Use `calculate_weighted_tau` to compare results.
5.  **Conclusion:** Compare the results and see which methods correctly identified the informative features.

## 0. Setup and Imports

First, we import `pandas`, `numpy`, `matplotlib`, `seaborn`, and `scikit-learn`.

From `RiskLabAI`, we import the `feature_importance` module (which exposes `FeatureImportanceController` together with the data, clustering, and orthogonalization utilities used below) and the `publication_plots` helpers.

In [None]:
# Standard Imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings

# Scikit-Learn Imports
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold

# RiskLabAI Imports
import RiskLabAI.features.feature_importance as fi
import RiskLabAI.utils.publication_plots as pub_plots

# --- Notebook Configuration ---
warnings.filterwarnings('ignore')
pub_plots.setup_publication_style()

## 1. Generate Synthetic Data

We use `get_test_dataset` to create our ground truth. We will generate a dataset with 40 features:
* **10 Informative (`I_`):** These features actually predict the target.
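* **15 Redundant (`R_`):** These are noisy copies of the informative features. Because they duplicate information that already exists, a model can lean on them instead of the originals, which lets them absorb importance that belongs to the informative features.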
* **15 Noise (`N_`):** These have no predictive value.

**The goal of a good feature importance algorithm is to assign high importance only to the 10 Informative features.**
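To make the ground truth concrete, here is a simplified, NumPy-only stand-in for this kind of dataset. It is a sketch under our own assumptions: the helper `make_toy_dataset`, the random linear labelling rule, and the `R_k_j` column naming are hypothetical and are not the library's implementation. The key idea it illustrates is that each redundant feature is an informative feature plus Gaussian noise scaled by `sigma_std`.

```python
import numpy as np
import pandas as pd


def make_toy_dataset(n_samples=1000, n_informative=3, n_redundant=2,
                     n_noise=2, sigma_std=0.5, seed=0):
    """Hypothetical stand-in for get_test_dataset (not the RiskLabAI code)."""
    rng = np.random.default_rng(seed)
    # Informative features drive the label through a random linear rule.
    informative = rng.standard_normal((n_samples, n_informative))
    weights = rng.standard_normal(n_informative)
    target = (informative @ weights > 0).astype(int)
    # Each redundant feature copies one informative feature and adds noise.
    sources = rng.integers(0, n_informative, size=n_redundant)
    redundant = informative[:, sources] + sigma_std * rng.standard_normal(
        (n_samples, n_redundant))
    # Noise features are independent of everything else.
    noise = rng.standard_normal((n_samples, n_noise))
    columns = ([f"I_{i}" for i in range(n_informative)]
               + [f"R_{k}_{j}" for k, j in enumerate(sources)]
               + [f"N_{i}" for i in range(n_noise)])
    features = pd.DataFrame(np.hstack([informative, redundant, noise]),
                            columns=columns)
    return features, pd.Series(target, name="target")


X_toy, y_toy = make_toy_dataset()
# Each R_k_j should be strongly correlated with its source I_j
# (theoretically 1 / sqrt(1 + sigma_std**2), about 0.89 for sigma_std=0.5).
for col in [c for c in X_toy.columns if c.startswith("R_")]:
    source = "I_" + col.split("_")[2]
    print(col, round(X_toy[col].corr(X_toy[source]), 2))
```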

In [None]:
X, y = fi.get_test_dataset(
    n_features=40,
    n_informative=10,
    n_redundant=15,
    n_samples=10000,
    random_state=42,
    sigma_std=0.5  # Std. dev. of the Gaussian noise added to each redundant feature
)

print("Features (X):")
print(X.head())
print("\nTarget (y):")
print(y.head())

## 2. Part 1: Standard Feature Importance

We will now test the three standard (non-clustered) methods. We will use a basic `RandomForestClassifier` as our model for all tests.

In [None]:
# Helper function to plot importance
def plot_importance(importance_df, title):
    # Sort a copy so the caller's DataFrame is not modified in place.
    importance_df = importance_df.sort_values('Mean', ascending=True)
    fig, ax = plt.subplots(figsize=(12, 16))
    ax.barh(
        importance_df.index,
        importance_df['Mean'],
        xerr=importance_df['StandardDeviation'],
        color='C0'
    )
    pub_plots.apply_plot_style(ax, title, 'Feature Importance', 'Feature Name')
    plt.show()

# Define the base classifier for all strategies
base_classifier = RandomForestClassifier(
    n_estimators=100, 
    criterion='entropy', 
    random_state=42,
    n_jobs=-1
)

### 2.1 Mean Decrease Impurity (MDI)

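MDI is the *fastest* method. It is computed in-sample (on the training data) and measures how much the splits on each feature decrease impurity (Gini or entropy), averaged over all the trees in the forest.

**Problem:** Because it is in-sample, every feature receives some importance, even pure noise. When features are correlated, their importance is also split among them, so the Redundant features absorb importance that belongs to the Informative ones.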
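A rough sketch of how MDI can be aggregated from a fitted scikit-learn forest: collect each tree's `feature_importances_`, then report the mean and the standard error across trees. The helper `mdi_from_forest` and its column names are our own assumptions, not the RiskLabAI `MDI` strategy. The toy data also makes the in-sample problem visible: a pure-noise column still receives non-zero importance.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier


def mdi_from_forest(forest, feature_names):
    """Hypothetical helper: mean and standard error of per-tree MDI."""
    per_tree = pd.DataFrame(
        [tree.feature_importances_ for tree in forest.estimators_],
        columns=feature_names,
    )
    result = pd.DataFrame({
        "Mean": per_tree.mean(),
        "StandardDeviation": per_tree.std() * len(per_tree) ** -0.5,
    })
    # Rescale both columns so that the mean importances sum to one.
    return result / result["Mean"].sum()


rng = np.random.default_rng(0)
X_mdi = pd.DataFrame({"I_0": rng.standard_normal(2000),
                      "N_0": rng.standard_normal(2000)})
y_mdi = (X_mdi["I_0"] > 0).astype(int)

forest = RandomForestClassifier(n_estimators=50, criterion="entropy",
                                random_state=0).fit(X_mdi, y_mdi)
mdi = mdi_from_forest(forest, X_mdi.columns)
print(mdi)  # N_0 is pure noise, yet its in-sample MDI is not zero
```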

In [None]:
print("Running MDI...")
# 1. Initialize the controller with the 'MDI' strategy
mdi_controller = fi.FeatureImportanceController(
    strategy_type='MDI', 
    classifier=base_classifier
)

# 2. Compute importance
mdi_importance = mdi_controller.calculate_importance(X, y)

plot_importance(mdi_importance, 'Feature Importance (MDI)')

**Analysis:** MDI performs poorly. As predicted, it spreads high importance across *both* the Informative (`I_`) and Redundant (`R_`) features, and because it is computed in-sample, even the Noise (`N_`) features receive non-zero importance. **On its own, it cannot cleanly distinguish signal from noise.**

### 2.2 Mean Decrease Accuracy (MDA)

MDA is a more robust, out-of-sample method. It works as follows:
1. Train the model on each cross-validation training fold.
2. Measure the baseline score (e.g., log-loss) on the corresponding test fold.
3. Shuffle *one feature* in the test fold and measure the new, worse score.
4. The deterioration in score, averaged across folds, is the feature's importance.

**This is much slower than MDI, but because it is out-of-sample it is far more reliable.**
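The four steps above can be sketched directly with scikit-learn. This is a minimal illustration, not the `MDA` strategy's code: the helper `mda_importance` is hypothetical, uses plain (unpurged) `KFold`, and reports the raw increase in log-loss rather than whatever normalized score the library may compute.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import log_loss
from sklearn.model_selection import KFold


def mda_importance(model, X, y, n_splits=5, seed=0):
    """Hypothetical helper: out-of-sample permutation importance (log-loss)."""
    rng = np.random.default_rng(seed)
    scores = {col: [] for col in X.columns}
    folds = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for train_idx, test_idx in folds.split(X):
        # Step 1: fit on the training fold.
        X_test, y_test = X.iloc[test_idx], y.iloc[test_idx]
        fitted = model.fit(X.iloc[train_idx], y.iloc[train_idx])
        # Step 2: baseline out-of-sample log-loss.
        base = log_loss(y_test, fitted.predict_proba(X_test), labels=fitted.classes_)
        for col in X.columns:
            # Step 3: shuffle one column in the test fold only.
            X_perm = X_test.copy()
            X_perm[col] = rng.permutation(X_perm[col].to_numpy())
            perm = log_loss(y_test, fitted.predict_proba(X_perm), labels=fitted.classes_)
            # Step 4: the increase in log-loss is this fold's importance.
            scores[col].append(perm - base)
    scores = pd.DataFrame(scores)
    return pd.DataFrame({"Mean": scores.mean(),
                         "StandardDeviation": scores.std() * len(scores) ** -0.5})


rng = np.random.default_rng(1)
X_mda = pd.DataFrame({"I_0": rng.standard_normal(1000),
                      "N_0": rng.standard_normal(1000)})
y_mda = (X_mda["I_0"] > 0).astype(int)
clf = RandomForestClassifier(n_estimators=50, random_state=0)
mda_demo = mda_importance(clf, X_mda, y_mda)
print(mda_demo)
```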

In [None]:
print("Running MDA (this may take a minute)...")
# 1. Initialize the controller
mda_controller = fi.FeatureImportanceController(
    strategy_type='MDA',
    classifier=base_classifier,
    n_splits=5 # 5-fold CV
)

# 2. Compute importance
mda_importance = mda_controller.calculate_importance(X, y)

plot_importance(mda_importance, 'Feature Importance (MDA)')

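**Analysis:** MDA is a clear improvement over MDI. Because it is computed out-of-sample, shuffling a Noise (`N_`) feature barely changes the test score, so those features receive importance close to zero. However, when only one feature is shuffled, the Informative (`I_`) and Redundant (`R_`) features can partly cover for each other, which dampens the measured importance of both. Part 2 addresses this.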

### 2.3 Single Feature Importance (SFI)

SFI measures the predictive power of each feature *in isolation*. It trains a model on *only* that one feature and measures its cross-validated score.

**Problem:** It fails to identify features that are only important in combination with others.
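Both the idea behind SFI and its blind spot fit in a few lines. In this hypothetical XOR example (our own toy data, not the notebook's dataset), the label depends only on the *combination* of the two features. Scored one at a time, each feature looks useless, even though together they are almost perfectly predictive. Accuracy is used instead of log-loss only to keep the numbers easy to read.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold, cross_val_score

rng = np.random.default_rng(0)
X_xor = rng.standard_normal((2000, 2))
# The label is the XOR of the two signs: neither feature is informative alone.
y_xor = ((X_xor[:, 0] > 0) ^ (X_xor[:, 1] > 0)).astype(int)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
cv = KFold(n_splits=5, shuffle=True, random_state=0)

# SFI: cross-validated score of a model trained on one feature at a time.
sfi_scores = [
    cross_val_score(clf, X_xor[:, [j]], y_xor, cv=cv, scoring="accuracy").mean()
    for j in range(X_xor.shape[1])
]
# Joint model: both features together.
joint_score = cross_val_score(clf, X_xor, y_xor, cv=cv, scoring="accuracy").mean()

print("SFI accuracy per feature:", np.round(sfi_scores, 2))  # close to 0.5 (chance)
print("Joint accuracy:", round(joint_score, 2))                # close to 1.0
```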

In [None]:
print("Running SFI (this may take a minute)...")
# 1. Initialize the controller
sfi_controller = fi.FeatureImportanceController(
    strategy_type='SFI',
    classifier=base_classifier,
    n_splits=5,
    scoring='log_loss'
)

# 2. Compute importance
sfi_importance = sfi_controller.calculate_importance(X, y)

plot_importance(sfi_importance, 'Feature Importance (SFI)')

**Analysis:** SFI works, but it also assigns high importance to the Redundant features. In isolation, a noisy copy of an informative feature is nearly as predictive as the original, so SFI has no way to tell them apart.

## 3. Part 2: Clustered Feature Importance

The problem with MDA is that when features are highly correlated (like our `I_` and `R_` features), shuffling one of them has little effect, because the model can fall back on its correlated copies. This is known as the *substitution effect*.

**Clustered Importance** solves this. It first groups correlated features into clusters and then measures importance at the cluster level; Clustered MDA, for example, shuffles the *entire cluster* at once.
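The substitution effect is easy to reproduce. In this hypothetical example, `I_0` and `R_0` are two noisy measurements of the same hidden factor. Shuffling `I_0` alone hurts test accuracy much less than shuffling both columns together (the clustered approach), because the model can still lean on `R_0`. A logistic regression and a single train/test split keep the sketch small and easy to reason about; the effect is not specific to linear models.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
factor = rng.standard_normal(4000)
# I_0 and R_0 are two noisy measurements of the same underlying factor.
X_sub = pd.DataFrame({"I_0": factor + 0.3 * rng.standard_normal(4000),
                      "R_0": factor + 0.3 * rng.standard_normal(4000)})
y_sub = (factor > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X_sub, y_sub, test_size=0.5,
                                          random_state=0)
model = LogisticRegression().fit(X_tr, y_tr)


def accuracy_after_shuffling(columns):
    """Test accuracy after independently shuffling the given columns."""
    X_perm = X_te.copy()
    for col in columns:
        X_perm[col] = rng.permutation(X_perm[col].to_numpy())
    return model.score(X_perm, y_te)


base = model.score(X_te, y_te)
drop_single = base - accuracy_after_shuffling(["I_0"])
drop_joint = base - accuracy_after_shuffling(["I_0", "R_0"])
print(f"drop when shuffling I_0 only:    {drop_single:.3f}")
print(f"drop when shuffling I_0 and R_0: {drop_joint:.3f}")
```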

First, we use `cluster_k_means_top` (defined in the `RiskLabAI.cluster` module and re-exported through the `feature_importance` package's `__init__.py`) to estimate the feature clusters from the correlation matrix.
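For intuition about what a correlation-based k-means clustering does, here is a deliberately simplified sketch: convert correlations into distances, run k-means for several values of *k*, and keep the *k* with the highest mean silhouette. The helper `cluster_features` is hypothetical. It skips any quality scoring or recursive refinement that `cluster_k_means_top` may perform, so its output should not be expected to match the library's.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_samples


def cluster_features(corr, max_clusters, seed=0):
    """Hypothetical helper: k-means on a correlation-based distance matrix."""
    # Distance in [0, 1]: 0 = perfectly correlated, 1 = perfectly anti-correlated.
    dist = np.sqrt(np.clip(0.5 * (1.0 - corr.to_numpy()), 0.0, None))
    best_score, best_labels = -np.inf, None
    for k in range(2, max_clusters + 1):
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(dist)
        # Mean silhouette over the rows of the distance matrix.
        score = silhouette_samples(dist, labels).mean()
        if score > best_score:
            best_score, best_labels = score, labels
    return {int(c): list(corr.columns[best_labels == c])
            for c in np.unique(best_labels)}


rng = np.random.default_rng(0)
factor_a, factor_b = rng.standard_normal((2, 1000))
# Two blocks of three features, each block driven by its own factor.
X_blocks = pd.DataFrame({
    **{f"A_{i}": factor_a + 0.3 * rng.standard_normal(1000) for i in range(3)},
    **{f"B_{i}": factor_b + 0.3 * rng.standard_normal(1000) for i in range(3)},
})
clusters_demo = cluster_features(X_blocks.corr(), max_clusters=4)
print(clusters_demo)
```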

In [None]:
print("Finding feature clusters...")
corr = X.corr()
corr_sorted, clusters, silh = fi.cluster_k_means_top(
    corr,
    max_clusters=int(X.shape[1]/2), # Max 20 clusters
    iterations=10,
    random_state=42
)

print(f"Found {len(clusters)} clusters.")

# Plot the sorted correlation matrix to confirm clusters
fig, ax = plt.subplots(figsize=(12, 10))
sns.heatmap(corr_sorted, ax=ax, cmap='viridis', vmin=-1, vmax=1)
pub_plots.apply_plot_style(ax, 'Clustered Correlation Matrix', '', '')
plt.show()

### 3.1 Clustered MDI

This method first computes standard MDI and then sums the importances of the features within each cluster. Summing within a cluster undoes the dilution caused by substitution effects, but the result is still computed in-sample and inherits MDI's other weaknesses.
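A minimal sketch of the aggregation, assuming a fitted scikit-learn forest and a `clusters` dict mapping a cluster id to a list of column names. That dict layout is our assumption and may differ from what `cluster_k_means_top` returns. For each tree, the MDI values of the features in each cluster are summed, and the mean and standard error are then taken across trees. The helper `clustered_mdi` is hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier


def clustered_mdi(forest, feature_names, clusters):
    """Hypothetical helper: per-tree MDI summed within each feature cluster."""
    per_tree = pd.DataFrame(
        [tree.feature_importances_ for tree in forest.estimators_],
        columns=feature_names,
    )
    # Sum the importances of the features in each cluster, tree by tree.
    per_cluster = pd.DataFrame({
        f"C_{cid}": per_tree[cols].sum(axis=1) for cid, cols in clusters.items()
    })
    return pd.DataFrame({
        "Mean": per_cluster.mean(),
        "StandardDeviation": per_cluster.std() * len(per_cluster) ** -0.5,
    })


rng = np.random.default_rng(0)
i0 = rng.standard_normal(2000)
X_cmdi = pd.DataFrame({
    "I_0": i0,
    "R_0": i0 + 0.3 * rng.standard_normal(2000),
    "N_0": rng.standard_normal(2000),
    "N_1": rng.standard_normal(2000),
})
y_cmdi = (i0 > 0).astype(int)
forest_cmdi = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_cmdi, y_cmdi)
cmdi = clustered_mdi(forest_cmdi, X_cmdi.columns, {0: ["I_0", "R_0"], 1: ["N_0", "N_1"]})
print(cmdi)
```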

In [None]:
print("Running Clustered MDI...")
# 1. Initialize the controller
c_mdi_controller = fi.FeatureImportanceController(
    strategy_type='ClusteredMDI',
    classifier=base_classifier,
    clusters=clusters # Pass in our found clusters
)

# 2. Compute importance
c_mdi_importance = c_mdi_controller.calculate_importance(X, y)

plot_importance(c_mdi_importance, 'Clustered Feature Importance (MDI)')

### 3.2 Clustered MDA

This is the most robust of the methods in this notebook. It works like MDA, but instead of shuffling one feature at a time, it shuffles all the features in a cluster together, so correlated features in the same cluster can no longer substitute for one another.
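Extending the permutation idea to clusters only changes the inner loop: every column in a cluster is shuffled (independently) before the test fold is re-scored. The sketch below is hypothetical. `clustered_mda` is our own helper using plain `KFold` and the raw increase in log-loss, and the `clusters` dict layout is assumed; it is not the library's `ClusteredMDA` strategy.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import log_loss
from sklearn.model_selection import KFold


def clustered_mda(model, X, y, clusters, n_splits=5, seed=0):
    """Hypothetical helper: permutation importance with whole clusters shuffled."""
    rng = np.random.default_rng(seed)
    scores = {f"C_{cid}": [] for cid in clusters}
    folds = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for train_idx, test_idx in folds.split(X):
        X_test, y_test = X.iloc[test_idx], y.iloc[test_idx]
        fitted = model.fit(X.iloc[train_idx], y.iloc[train_idx])
        base = log_loss(y_test, fitted.predict_proba(X_test), labels=fitted.classes_)
        for cid, cols in clusters.items():
            X_perm = X_test.copy()
            for col in cols:  # shuffle every feature in the cluster
                X_perm[col] = rng.permutation(X_perm[col].to_numpy())
            perm = log_loss(y_test, fitted.predict_proba(X_perm), labels=fitted.classes_)
            scores[f"C_{cid}"].append(perm - base)
    scores = pd.DataFrame(scores)
    return pd.DataFrame({"Mean": scores.mean(),
                         "StandardDeviation": scores.std() * len(scores) ** -0.5})


rng = np.random.default_rng(0)
i0 = rng.standard_normal(1000)
X_cmda = pd.DataFrame({"I_0": i0,
                       "R_0": i0 + 0.3 * rng.standard_normal(1000),
                       "N_0": rng.standard_normal(1000)})
y_cmda = pd.Series((i0 > 0).astype(int))
clf = RandomForestClassifier(n_estimators=50, random_state=0)
cmda = clustered_mda(clf, X_cmda, y_cmda, {0: ["I_0", "R_0"], 1: ["N_0"]})
print(cmda)
```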

In [None]:
print("Running Clustered MDA (this may take a minute)...")
# 1. Initialize the controller
c_mda_controller = fi.FeatureImportanceController(
    strategy_type='ClusteredMDA',
    classifier=base_classifier,
    clusters=clusters, # Pass in our found clusters
    n_splits=5
)

# 2. Compute importance
c_mda_importance = c_mda_controller.calculate_importance(X, y)

plot_importance(c_mda_importance, 'Clustered Feature Importance (MDA)')

**Analysis:** Clustered MDA gives the cleanest result. The chart shows that only a small number of clusters (those containing the Informative features) are important, while all the other clusters are not.

## 4. Part 3: Orthogonal Features & Weighted Tau

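An alternative to clustering is **orthogonalization**. This method uses PCA to transform the features into a set of uncorrelated principal components. Because the components are uncorrelated, there are no substitution effects among them, so we can run MDI (which is fast) on these new features. PCA ranks the components without ever seeing the labels. If the supervised MDI ranking broadly agrees with that unsupervised ranking, with the *first* principal components the most important, this is evidence that the model is not simply fitting noise.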
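The orthogonalization step can be sketched with NumPy alone, along the lines of the procedure López de Prado describes: standardize the features, eigen-decompose their correlation matrix, sort the components by eigenvalue, keep just enough of them to reach the variance threshold, and project. `orthogonalize` is a hypothetical helper; its return values need not match those of `orthogonal_features`.

```python
import numpy as np
import pandas as pd


def orthogonalize(X, variance_threshold=0.95):
    """Hypothetical helper: PCA projection of standardized features."""
    Z = (X - X.mean()) / X.std()
    eigvals, eigvecs = np.linalg.eigh(np.corrcoef(Z.to_numpy(), rowvar=False))
    order = np.argsort(eigvals)[::-1]  # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Keep the smallest number of components reaching the variance threshold.
    explained = np.cumsum(eigvals) / eigvals.sum()
    n_keep = min(int(np.searchsorted(explained, variance_threshold)) + 1, len(eigvals))
    columns = [f"PC_{i + 1}" for i in range(n_keep)]
    P = pd.DataFrame(Z.to_numpy() @ eigvecs[:, :n_keep], index=X.index, columns=columns)
    return P, pd.Series(eigvals[:n_keep], index=columns, name="eigenvalue")


rng = np.random.default_rng(0)
latent = rng.standard_normal((500, 2))
X_pca = pd.DataFrame({
    "I_0": latent[:, 0],
    "R_0": latent[:, 0] + 0.1 * rng.standard_normal(500),  # near-copy of I_0
    "I_1": latent[:, 1],
    "N_0": rng.standard_normal(500),
})
P, eigenvalues = orthogonalize(X_pca)
print(eigenvalues.round(2))
print(P.corr().round(3))  # off-diagonal entries are (numerically) zero
```

Here the near-duplicate pair `I_0`/`R_0` collapses into a single dominant component, and the tiny eigenvalue left over by the duplication is dropped by the 95% threshold.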

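We use `calculate_weighted_tau` to measure whether the most important features (by MDI) align with the most important principal components (by explained variance). It computes a weighted Kendall's tau, which gives more weight to agreement at the top of the two rankings than at the bottom.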
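The statistic itself is available as `scipy.stats.weightedtau`. Following the convention in AFML Chapter 8, one passes the feature importances and the *inverse* PC ranks, so that both vectors are large for the components that matter most and the hyperbolic weighting emphasizes the top of the ranking. Whether `calculate_weighted_tau` wraps SciPy in exactly this way is an assumption, and the importances below are toy values.

```python
import numpy as np
from scipy.stats import weightedtau

pc_rank = np.arange(1, 6)  # 1 = component with the largest eigenvalue
aligned = np.array([0.40, 0.25, 0.15, 0.12, 0.08])  # importance falls with PC rank
reversed_order = aligned[::-1]                      # importance rises with PC rank

# The first element of the result is the weighted tau statistic.
tau_aligned = weightedtau(aligned, pc_rank ** -1.0)[0]
tau_reversed = weightedtau(reversed_order, pc_rank ** -1.0)[0]
print(round(tau_aligned, 3), round(tau_reversed, 3))  # 1.0 -1.0
```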

In [None]:
print("Running Orthogonal Feature Importance...")
# 1. Get orthogonal features (X_ortho) and their component weights (pca_df)
X_ortho, pca_df = fi.orthogonal_features(X, variance_threshold=0.95)

# 2. Run MDI on the *orthogonal features*
ortho_mdi_controller = fi.FeatureImportanceController(
    strategy_type='MDI', 
    classifier=base_classifier
)
ortho_importance = ortho_mdi_controller.calculate_importance(X_ortho, y)

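# 3. Calculate Weighted Tau
# Compare the supervised ranking (MDI importance) with the unsupervised one
# (PC order). Assumes the columns of X_ortho are ordered by eigenvalue, so
# the i-th column has PC rank i (1 = largest eigenvalue).
mdi_values = ortho_importance['Mean']
pca_ranks = pd.Series(np.arange(1, X_ortho.shape[1] + 1), index=X_ortho.columns)
pca_ranks = pca_ranks.reindex(mdi_values.index)  # align on feature names

# As in AFML Ch. 8, pass the raw importances and the PC ranks; the function is
# assumed to weight by inverse rank so that the top components count most.
tau = fi.calculate_weighted_tau(mdi_values.values, pca_ranks.values)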

print(f"\nWeighted Kendall's Tau: {tau:.4f}")
plot_importance(ortho_importance, 'Feature Importance on Orthogonal Features')


**Analysis:** The plot shows that importance is concentrated in the first few principal components, which is what we expect. The high Weighted Tau score indicates that the MDI ranking agrees closely with the PCA ranking, which suggests the model is picking up genuine structure in the data rather than fitting noise.