<img src="../Images/DSC_Logo.png" style="width: 400px;">

In [None]:
!pip install pandas numpy matplotlib seaborn scikit-learn shap PyALE

# Post-hoc Model-Agnostic Interpretation Methods

Although interpretable or semi-interpretable supervised machine learning (ML) models are available, they are not the most commonly used in geosciences today. This is largely because more complex models such as ensemble methods (e.g., Gradient Boosting, Random Forests) or deep neural networks (NNs) often outperform them in predictive performance (Dramsch et al. 2020). However, the logic of complex models has to be translated into human-intuitive insights. As a result, much of modern XAI research has shifted toward **post-hoc** interpretation methods that explain the predictions of black-box models after the model has been trained and without simplifying the models themselves (Dwivedi et al. 2023). Since around 2015, this field has expanded rapidly, introducing both model-agnostic methods (e.g., SHAP, LIME, partial dependence plots) and model-specific methods. These methods can potentially be used not only to justify model predictions, but also to enhance process understanding in the geosciences (Jiang et al. 2024; see Notebook 1).

**Model-agnostic** interpretation methods aim to explain ML models without relying on their internal structures such as weights or coefficients. They follow the **SIPA** principle: 
- **S**ample from the data,
- **I**ntervene on it,
- make **P**redictions using the model,
- and then **A**ggregate the results.
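
The four SIPA steps can be sketched in a few lines. This is a minimal illustration, assuming a hypothetical black-box `predict` function and synthetic data (neither comes from this notebook). The intervention sets one feature to a fixed value and the aggregation is a simple mean, which corresponds to a single point on a partial dependence curve.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical black-box model: only its predict function is used.
def predict(X):
    return 2.0 * X[:, 0] + np.sin(X[:, 1])

X = rng.normal(size=(500, 2))  # synthetic data, for illustration only

# S: sample from the data
sample = X[rng.choice(len(X), size=200, replace=False)]

# I: intervene on it (set feature 0 to a fixed value for every row)
intervened = sample.copy()
intervened[:, 0] = 1.0

# P: make predictions using the model
preds = predict(intervened)

# A: aggregate the results
sipa_result = preds.mean()
print(f"Average prediction with feature 0 set to 1.0: {sipa_result:.3f}")
```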

A key benefit of model-agnostic interpretation is its flexibility: the same interpretation method can be applied to different models, and switching between interpretation methods does not require retraining the model.

Model-agnostic methods can be classified into **local** and **global** approaches. Local methods explain individual predictions. They can therefore also be used to find anomalies or to explain why specific predictions are wrong. Global model-agnostic methods, on the other hand, describe a model's overall behavior across the dataset. Two broad categories within global model-agnostic methods are **feature effects** and **feature importance**. In general, the usefulness of model-agnostic interpretation for both local and global methods depends on model performance (Molnar 2025).

This notebook introduces three local and three global model-agnostic interpretation methods, which are briefly explained below. For more details and additional methods, see for example Molnar (2025).

<div style="text-align: center;">
  <img src="../Images/XAI_Model_Agnostic.png" style="width: 400px;">
  <div style="font-size: 14px; margin-top: 8px;">Fig. 1 Overview of interpretability methods in machine learning, modified from Molnar (2025)</div>
</div>

## 1. Local Model-Agnostic Methods

## 1.1 Individual Conditional Expectation (ICE)

Individual Conditional Expectation (ICE) (Goldstein et al. 2015) plots show how the prediction for a single data instance changes when a specific feature is varied, while all other features are kept constant. Each line in an ICE plot represents one instance, with predictions generated by altering the chosen feature across a range of values. This allows us to visualize how sensitive the model’s predictions are to that feature on an individual level.

The method is simple and intuitive, revealing individual prediction patterns and heterogeneous feature effects. However, it is limited to one feature at a time and may produce unrealistic samples when features are correlated, since ICE varies one feature while holding the others constant, potentially violating feature dependencies in the data. When lines overlap, it often helps to draw them with transparency or to color them by another feature that may reveal an interaction. If plots become too crowded, sampling or using a partial dependence plot (PDP; see Sect. 2.1) may help. In addition, centered ICE plots (c-ICE) align all curves to the prediction at a chosen anchor point (e.g., the minimum value), showing only relative changes, and derivative ICE plots (d-ICE) visualize the rate of change of the predictions with respect to the feature. If the derivatives vary across instances, this indicates interactions.

See Section 2.1 for an example usage of ICE plots together with PDP plots.
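
As a rough illustration of how ICE, c-ICE, and d-ICE relate, the following sketch computes all three with NumPy. The `predict` function and the data are hypothetical stand-ins, chosen so that features 0 and 1 interact; nothing here is taken from the notebook's datasets.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical black-box model with an interaction between features 0 and 1.
def predict(X):
    return X[:, 0] * X[:, 1] + X[:, 2]

X = rng.uniform(-1, 1, size=(30, 3))   # synthetic instances
grid = np.linspace(-1, 1, 11)          # values for the feature of interest
feature = 0

# ICE: one curve per instance, varying only the chosen feature.
ice = np.empty((len(X), len(grid)))
for j, value in enumerate(grid):
    X_mod = X.copy()
    X_mod[:, feature] = value
    ice[:, j] = predict(X_mod)

# c-ICE: subtract each curve's prediction at the anchor (here the grid minimum).
c_ice = ice - ice[:, [0]]

# d-ICE: numerical derivative of each curve along the grid.
d_ice = np.gradient(ice, grid, axis=1)

# Slopes that differ between instances hint at an interaction.
print("Spread of slopes across instances:", d_ice[:, 0].std().round(3))
```

In this toy model the slope of each curve equals the instance's value of feature 1, so the spread of slopes is clearly non-zero.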

---
---

## 1.2 Local Interpretable Model-Agnostic Explanations (LIME)

LIME (Ribeiro, Singh, and Guestrin 2016) explains individual predictions of black box models by training a simple, understandable model (like linear regression or a decision tree) to mimic the black box model's behavior near a specific data point. It does this by creating slightly modified versions of that data point, asking the black box model for predictions on each one, weighting them based on how similar they are to the original, and then fitting the simple model to these weighted examples. The learned model should be a good approximation of the black box model predictions locally, but it does not have to be a good global approximation. LIME is able to explain any black box and can also be applied to text or image data.

LIME has the following steps:
1. **Select the instance** of interest we want to explain the prediction for.
2. **Generate samples** by perturbing the feature values, drawing new values from a **normal distribution**.
3. **Assign a weight** to each sample based on how close it is to the instance being explained. LIME gives more importance to samples that are closer to that instance. To do this, it uses an exponential kernel, which assigns higher weights to nearby points and lower weights to distant ones. The size of the neighborhood (how far away points can be while still influencing the explanation) is controlled by a parameter called kernel width.
4. **Make predictions on the perturbed samples** using the original black box model.
5. **Train a surrogate model** using the weighted samples & predictions.
6. **Interpret** the surrogate model.
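
The steps above can be sketched from scratch with NumPy. This is a simplified illustration and not the implementation of the `lime` package: the black-box function, the synthetic data, the kernel form, and the kernel width of 0.75 are all assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical black-box model (nonlinear), standing in for any trained model.
def black_box(X):
    return np.sin(X[:, 0]) + X[:, 1] ** 2

X_train = rng.normal(size=(1000, 2))     # synthetic training data
x0 = np.array([0.0, 1.0])                # 1. instance to explain

# 2. Generate samples from a normal distribution fitted per feature.
mean, std = X_train.mean(axis=0), X_train.std(axis=0)
Z = rng.normal(mean, std, size=(5000, 2))

# 3. Weight samples with an exponential kernel on the scaled distance to x0.
kernel_width = 0.75  # assumed value; explanations can change with it
dist = np.linalg.norm((Z - x0) / std, axis=1)
weights = np.exp(-(dist ** 2) / kernel_width ** 2)

# 4. Query the black box on the perturbed samples.
y_z = black_box(Z)

# 5. Fit a weighted linear surrogate (weighted least squares).
A = np.column_stack([np.ones(len(Z)), Z - x0])
sw = np.sqrt(weights)[:, None]
coef, *_ = np.linalg.lstsq(A * sw, y_z * sw[:, 0], rcond=None)

# 6. Interpret: the slopes are the local feature effects around x0.
intercept, local_effects = coef[0], coef[1:]
print("Local effects:", local_effects.round(2))
```

Rerunning the sketch with a different `kernel_width` is a quick way to see how sensitive the surrogate's slopes are to the neighborhood definition.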

Current limitations of LIME were summarized by Molnar (2025):
- Because data points are sampled from a Gaussian distribution, the correlation between features is ignored. This can lead to unlikely data points, which are then used to learn the local explanation models.
- Changing the kernel width can significantly affect the explanation. In high-dimensional data, defining what "close" means becomes even harder, since not all features are equally scaled or meaningful in the same way. The choice of neighborhood is an unsolved problem and different kernel widths should be tested.
- LIME explanations can be unstable: small changes in the input or repeating the process can lead to very different results.

---
---

## 1.3 Shapley Values and SHapley Additive exPlanations (SHAP)

**Shapley values** come from cooperative **game theory**, introduced by Lloyd Shapley (1953), to fairly distribute a payoff among players based on their contribution. In ML, the prediction for a specific instance is the “payout”, and the features are the players. The goal is to attribute the difference between the prediction and the average prediction to individual features.

Step-by-step computation of the Shapley value of one feature:
1. Form all possible **subsets** (coalitions) of the other features. The number of subsets grows exponentially with the number of features. That’s why we use Monte Carlo sampling to avoid having to evaluate every possible subset, which becomes computationally impossible as the number of features increases.
2. For each sampled subset:
    - Compute **prediction with the target feature**.
    - Compute **prediction without the target feature**.
    - The difference is the **marginal contribution** of the feature.
3. Average the marginal contributions, weighted appropriately by subset size. The result is the Shapley value for the feature.
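
A minimal Monte Carlo sketch of this procedure is shown below. It assumes a hypothetical linear `predict` function and synthetic background data, so that the estimate is easy to check by hand. "Removing" a feature is approximated by replacing its value with the value from a randomly drawn background row, and drawing a random feature order takes care of the subset-size weighting implicitly. The `shap` package uses more refined estimators.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical black-box model; linear so the result is easy to check.
def predict(X):
    return 3.0 * X[:, 0] + 1.0 * X[:, 1] - 2.0 * X[:, 2]

X_background = rng.normal(size=(500, 3))  # synthetic background data
x = np.array([1.0, 1.0, 1.0])             # instance to explain

def shapley_mc(predict, x, X_background, feature, n_iter=2000, rng=rng):
    """Monte Carlo estimate of the Shapley value of one feature for instance x."""
    n_features = len(x)
    contributions = np.empty(n_iter)
    for i in range(n_iter):
        z = X_background[rng.integers(len(X_background))]  # random background row
        order = rng.permutation(n_features)                # random feature order
        pos = np.where(order == feature)[0][0]
        # Features up to `feature` in the order come from x, the rest from z.
        x_with = z.copy()
        x_with[order[:pos + 1]] = x[order[:pos + 1]]       # coalition + feature
        x_without = z.copy()
        x_without[order[:pos]] = x[order[:pos]]            # coalition only
        contributions[i] = (predict(x_with[None, :]) - predict(x_without[None, :]))[0]
    return contributions.mean()

phi = np.array([shapley_mc(predict, x, X_background, j) for j in range(3)])
print("Estimated Shapley values:", phi.round(2))
```

For a linear model, each estimate should be close to the coefficient times the difference between the instance's feature value and the background mean, and the values should approximately sum to the difference between the prediction for `x` and the average prediction.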

The Shapley value of a feature quantifies how much that feature contributed to the difference between the actual prediction and the average prediction for the dataset. **SHAP** (SHapley Additive exPlanations), introduced by Lundberg & Lee (2017), is a practical framework for computing and visualizing Shapley values. It bridges Shapley values and LIME. There are multiple ways to estimate Shapley values for different algorithms, such as KernelSHAP, the permutation method, and TreeSHAP (for tree-based methods). The Python `shap` package includes a wide range of visualization and aggregation tools (some displayed below). These can provide information about feature importances and effects (including interaction effects). In contrast to SHAP, LIME approximates these effects using a local surrogate model, which may not fully capture the original model's behavior.

As with other model-agnostic methods, Shapley values don’t provide causality. They show contributions, not what-if (counterfactual) outcomes. They’re also computationally expensive, as exact calculation requires evaluating all feature subsets (so we rely on approximation via sampling). Another limitation is that SHAP assumes that features are independent unless you're using a version of SHAP that explicitly models conditional distributions. For example, see [`shap.TreeExplainer` documentation](https://shap.readthedocs.io/en/latest/generated/shap.TreeExplainer.html) where the `feature_perturbation` parameter controls this behavior.

---
In the following example, we build on the framework presented by Flora et al. (2024), who applied various XAI methods across multiple atmospheric science tasks. Our focus is on their road surface temperature prediction use case, which involves a binary classification task: predicting whether the road surface temperature is below freezing (0°C) based on near-surface meteorological conditions. The dataset contains approximately 1 million examples collected over two cool seasons and includes 30 input features from temperature variables (e.g., surface and 2-m air temperature), radiation and heat fluxes, cloud coverage, freezing duration metrics, and site-level features (e.g., urban/rural classification and wind speed). The model used is a Random Forest classifier, trained to distinguish freezing from non-freezing road conditions.

For demonstration purposes, we load a much smaller random sample of the original dataset to keep the notebook lightweight. After training on this dataset, we then apply SHAP to interpret the model's predictions, following the structure and insights from the original study.

In [None]:
import pandas as pd
import numpy as np
import shap
import matplotlib.pyplot as plt
plt.rcParams.update({'font.size': 12})

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

Load dataset:

In [None]:
df = pd.read_csv("../data/Flora_et_al_2024/road_surface_dataset_sampled.csv") 
df.dropna(inplace=True)
df.head()
df.columns

In [None]:
# Mapping from dataset column name to paper name (inferred from Table 3 in Flora et al. 2024; mappings are approximate and not guaranteed to be fully accurate)
column_mapping = {
    'sfc_temp': 'Tsfc', 'temp2m': 'T2m', 'dwpt2m': 'Td', 'hrrr_dT': 'HRRRdT',
    'swave_flux': 'S', 'vbd_flux': 'Vbd', 'vdd_flux': 'Vdd', 'uplwav_flux': 'λ↑',
    'dllwave_flux': 'λ↓', 'sat_irbt': 'Tirbt', 'lat_hf': 'Lhf', 'sens_hf': 'Shf',
    'gflux': 'G', 'd_ground': 'DG²', 'd_rad_d': 'DS² ↓', 'd_rad_u': 'DS² ↑',
    'tot_cloud': 'Ctotal', 'low_cloud': 'Clow', 'mid_cloud': 'Cmid', 'high_cloud': 'Chigh',
    'tmp2m_hrs_bl_frez': 'Hours T2m ≤ 0°C', 'tmp2m_hrs_ab_frez': 'Hours T2m ≥ 0°C',
    'sfcT_hrs_bl_frez': 'Hours Tsfc ≤ 0°C', 'sfcT_hrs_ab_frez': 'Hours Tsfc ≥ 0°C',
    'fric_vel': 'Vfric', 'sfc_rough': 'SR', 'wind10m': 'U10m',
    'urban': 'Urban', 'rural': 'Rural', 'date_marker': 'Date Marker'
}

# Reduce and rename
df_reduced = df[list(column_mapping)].rename(columns=column_mapping)

Define y and X:

In [None]:
y = df['cat_rt']
X = df_reduced

Apply train-test-split:

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

Train and evaluate model:

In [None]:
model = RandomForestClassifier(n_estimators=100, random_state=42, n_jobs=-1)
model.fit(X_train, y_train)

print("Classification Report:\n", classification_report(y_test, model.predict(X_test)))

SHAP scales poorly with large datasets. For this demonstration, we therefore compute approximate SHAP values. Setting `approximate=True` tells SHAP to speed up the computation by simplifying it, so the full path-dependent decomposition is not applied.

In [None]:
# SHAP TreeExplainer
explainer = shap.TreeExplainer(
    model,
    data=X_test,
    feature_perturbation='interventional'  # safe default
)

# Compute SHAP values with approximation (faster)
shap_values = explainer.shap_values(X_test, approximate=True)

Next, we generate several SHAP plots inspired by Flora et al. (2024). Please note that, in contrast to the original study, we work with a substantially smaller dataset and use SHAP’s approximation mode to speed up computation, which leads to some differences in the resulting explanations.

**1) SHAP summary plot**

A SHAP summary plot aggregates SHAP values for all samples and all features to show overall feature importance and effect distributions. Each dot represents the SHAP value of one feature for one sample. 
- Features are ordered top to bottom by overall importance (mean absolute SHAP value).
- The x-axis shows the SHAP value (impact on model output):
    - Negative values push the prediction lower (less freezing).
    - Positive values push the prediction higher (more freezing).
- The color shows the actual feature value for that sample:
    - Red = high feature value
    - Blue = low feature value

In [None]:
# Select SHAP values for class 1 (positive class)
shap_values_class1 = shap_values[:, :, 1]  # shape: (n_samples, n_features)

# Plot summary plot
shap.summary_plot(shap_values_class1, 
                  X_test,
                  plot_size=(8, 6))

- Tsfc (surface temperature) is the most important feature, which is consistent with physical intuition (Flora et al. 2024). Cold surface temperatures (blue dots) have positive SHAP values, increasing the predicted freezing probability. Warm temperatures (red dots) have negative SHAP values, decreasing it.
- The freezing duration features are also important. By comparison, most radiation and cloud features have smaller but non-negligible effects.

**2) SHAP dependence plot**

A SHAP dependence plot shows:
- The relationship between a single feature's value (code below: λ↑ on the x-axis) and its SHAP value (y-axis). The SHAP value measures how much that feature contributes to pushing the model output up or down for each sample.
- Points are colored by the value of a second interacting feature (code below: Tsfc), which helps visualize interaction effects.
    - Blue = colder surface temperatures.
    - Red = warmer surface temperatures.

In [None]:
shap.dependence_plot(
    ind='λ↑',
    shap_values=shap_values_class1,
    features=X_test,
    interaction_index='Tsfc',
    show=True
)

We see:
- When λ↑ (upward longwave radiation) is low (~200-280):
    - SHAP values are mostly positive, so λ↑ increases the freezing probability.
    - Points with blue colors (colder Tsfc - surface temperature) tend to have higher SHAP values.
- When λ↑ rises above ~300:
    - SHAP values become mostly negative, meaning high λ↑ actually reduces freezing probability.
    - This effect varies with Tsfc: red points (warmer surfaces) show a stronger negative impact.

In summary, there is a nonlinear relationship and an interaction between λ↑ and Tsfc: for colder surfaces, moderate λ↑ values help predict freezing, while for warmer surfaces, high λ↑ lowers the freezing prediction.

**3) SHAP waterfall plot**

With the SHAP waterfall plot, we examine how the model arrives at its prediction for one specific road surface instance. Here we use it to illustrate which features most influenced the model’s most uncertain decision, i.e., the instance whose predicted probability of a frozen road (class 1) is closest to 0.5 (the decision threshold).

In [None]:
# Get prediction probabilities for positive class
probs = model.predict_proba(X_test)[:, 1]

# Find sample closest to decision threshold (0.5)
index = np.argmin(np.abs(probs - 0.5))
print(f"Chosen index: {index}, prediction: {probs[index]:.3f}")

# Get instance
instance = X_test.iloc[[index]]

# Explain prediction
explainer = shap.TreeExplainer(model)
shap_values = explainer(instance)

# Create explanation for class 1
explanation = shap.Explanation(
    values=shap_values.values[0][:, 1],
    base_values=shap_values.base_values[0][1],
    data=shap_values.data[0],
    feature_names=shap_values.feature_names
)

# Plot waterfall
shap.plots.waterfall(explanation)

What does the plot show?
- Base value: The average model output over the whole dataset. For example, E[f(x)] = 0.396 means on average the model predicts a 39.6% chance of frozen road.
- Final prediction: The model’s prediction for this specific instance (f(x) = 0.5), i.e., 50% frozen probability.
- Colored bars: Each bar corresponds to a feature’s contribution (SHAP value) to move the prediction from the base value toward the final prediction.
    - Red bars (+) increase the predicted probability of freezing.
    - Blue bars (−) decrease the predicted probability of freezing.
- Feature values on the left: The actual value of the feature for this instance (e.g., T2m = 1.651), sorted by importance.

In summary, the final prediction of 0.5 comes from a balance of (temperature) features pushing both ways. 

---
---

## 2. Global Model-Agnostic Methods

## 2.1 Partial Dependence Plot (PDP)

While ICE plots (see Section 1.1) provide individual-level insights into how a prediction changes when one feature is varied, Partial Dependence Plots (PDPs) (Friedman 2001) offer a global perspective by averaging these effects over the entire dataset. A PDP shows the average predicted outcome for different values of a selected feature (or pair of features), marginalizing over all other features. This allows us to understand the general trend of how the model responds to a feature, without focusing on any single data point. The averaging process smooths out individual variations and provides a clearer picture of overall model behavior. However, like ICE plots, PDPs can be misleading when features are strongly correlated.
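
The link between ICE and PDP can be made explicit in a short sketch: the PDP is the pointwise average of the ICE curves. The `predict` function and data below are hypothetical, and the helper `partial_dependence_1d` is an illustrative name, not a library function.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical black-box model with an interaction term.
def predict(X):
    return X[:, 0] ** 2 + X[:, 0] * X[:, 1]

X = rng.normal(size=(400, 2))   # synthetic data
grid = np.linspace(-2, 2, 9)    # values for the feature of interest

def partial_dependence_1d(predict, X, feature, grid):
    """Return (ice, pdp): ICE curves per row and their average."""
    ice = np.empty((len(X), len(grid)))
    for j, value in enumerate(grid):
        X_mod = X.copy()
        X_mod[:, feature] = value     # same value for all rows
        ice[:, j] = predict(X_mod)    # other features stay as observed
    return ice, ice.mean(axis=0)

ice, pdp = partial_dependence_1d(predict, X, feature=0, grid=grid)
print("PDP values:", pdp.round(2))
```

In this toy example the individual curves differ in slope because of the interaction term, while the averaged curve mostly shows the quadratic main effect.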

---
We now use PDP and ICE plots to interpret a Random Forest regressor trained on a bike sharing demand dataset to predict the number of bike rentals using weather conditions, season, and time-related features. This example is adapted from Molnar (2025) and [scikit-learn.org](https://scikit-learn.org/stable/auto_examples/inspection/plot_partial_dependence.html).

In [None]:
from sklearn.datasets import fetch_openml
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D
plt.rcParams.update({'font.size': 12})

from sklearn.pipeline import make_pipeline
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
from sklearn.ensemble import RandomForestRegressor

from sklearn.inspection import PartialDependenceDisplay

Load and preprocess the dataset:

In [None]:
bikes = fetch_openml("Bike_Sharing_Demand", version=2, as_frame=True)
print(bikes)

X, y = bikes.data.copy(), bikes.target

# Downsample for faster computation
X, y = X.iloc[::5], y[::5]

# Simplify rare category in "weather"
X["weather"] = X["weather"].astype(object).replace("heavy_rain", "rain").astype("category")

Split into train and test based on year (train on year 0 and test on year 1):

In [None]:
mask_train = X["year"] == 0.0
X = X.drop(columns="year")
X_train, y_train = X[mask_train], y[mask_train]
X_test, y_test = X[~mask_train], y[~mask_train]

# Identify feature types
numerical_features = ["temp", "feel_temp", "humidity", "windspeed"]
categorical_features = X_train.columns.difference(numerical_features)

We inspect the dataset to determine the data types and then define the preprocessing accordingly: numerical features are passed through unchanged ("passthrough"), while categorical features are one-hot encoded.

In [None]:
X_train.info()

preprocessor = ColumnTransformer([
    ("num", "passthrough", numerical_features),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_features)
])

Define Random Forest model:

In [None]:
rf_model = make_pipeline(
    preprocessor,
    RandomForestRegressor(n_estimators=100, random_state=0)
)

Train the model:

In [None]:
rf_model.fit(X_train, y_train)

print(f"Test R^2 Score: {rf_model.score(X_test, y_test):.2f}")

Plot PDP and ICE for multiple features:

In [None]:
fig, ax = plt.subplots(ncols=3, figsize=(10, 3), sharey=True, constrained_layout=True)
PartialDependenceDisplay.from_estimator(
    rf_model,
    X_train,
    features=["temp", "humidity", "windspeed"],
    kind="both",
    centered=False,
    subsample=100,
    grid_resolution=20,
    ax=ax
)

for axis in ax:
    legend = axis.get_legend()
    if legend is not None:  # remove per-axis legends if present
        legend.remove()
    axis.set_ylabel("Predicted bike rentals")

plt.show()

- The x-axes in our plots include black tick marks at the bottom. These are rug plots and they show the distribution of the training data along each feature. The denser the ticks in a region, the more data the model saw there. Here, windspeed shows a cluster of samples at lower values (around 8), while temperature and humidity are more evenly distributed across their ranges.
- The y-axis shows the predicted outcome (number of bike rentals). We see that some ICE curves remain flat while the slope of others increases sharply beyond certain thresholds (e.g., 10°C or 80% humidity), indicating variation in individual predictions.
- The dashed line shows the average effects (PDPs). The combination of ICE and PDP gives both the average trend and the individual-level nuances captured by the Random Forest model. Overall, temperature affects bike rentals positively, and humidity negatively, while windspeed appears to have minimal impact because of rather flat curves (changing windspeed does not significantly alter predictions).
- While most ICE curves follow a similar pattern, the noticeable divergence in some temperature and humidity curves suggests potential interactions with other features.

---

Next, let's plot the ICE curves colored by season to explore potential interactions between humidity and seasonal patterns. 

To customize how ICE curves are plotted we have to compute the ICE curves ourselves (again for a subset of the data). For that, we first define a range of humidity values we’ll use to vary the feature. Inside the loop, we:
1. Fix one instance (row).
2. Change only humidity through the grid of values.
3. Keep all other features constant.
4. Predict each time, collect results = ICE curve.
5. Plot the ICE line, colored by season.

In [None]:
# Subset 100 random samples for ICE
subset = X_train.sample(n=100, random_state=0)
seasons = subset["season"].values

# Define a range of humidity values
humidity_range = np.linspace(X_train["humidity"].min(), X_train["humidity"].max(), 20)

# Define color palette for seasons
season_colors = {
    "WINTER": "#6b6bd6",  
    "SPRING": "#a3b8e0", 
    "SUMMER": "#5ecb89", 
    "FALL":   "#f2c04c" 
}

# Compute and plot ICE curves
fig, ax = plt.subplots(figsize=(5, 3), constrained_layout=True)

for _, row in subset.iterrows():
    season = row["season"].upper() 
    preds = []
    for h in humidity_range:
        row_mod = row.copy()
        row_mod["humidity"] = h
        preds.append(rf_model.predict(pd.DataFrame([row_mod]))[0])
    ax.plot(humidity_range, preds, color=season_colors[season], alpha=0.7)
ax.set_ylabel("Predicted bike rentals")

# Add custom legend
legend_handles = [
    Line2D([0], [0], color=color, lw=2, label=season.title())
    for season, color in season_colors.items()
]
ax.legend(handles=legend_handles, title="Season", loc="upper right")

plt.show()

- Coloring ICE curves by season reveals that different seasons have different baseline predictions (intercepts), with winter generally lower and summer higher.
- The effect of humidity varies slightly by season: in winter, humidity has less impact than in the other seasons, where predicted bike rentals drop more sharply above 60% humidity.

---

Plotting PDPs for two features of interest enables us to visualize interactions between them. Let's do that for the features "temp" and "humidity". The third (rightmost) subplot visualizes the model’s average predictions while systematically varying both features together, shown as a heatmap with contours (lines of constant predicted values). These reveal whether their influence on bike rentals is additive (no interaction) or non-additive (interaction present). `PartialDependenceDisplay` interprets the tuple `("temp", "humidity")` as a request for a two-way partial dependence plot and generates the appropriate 2D plot.

In [None]:
fig, ax = plt.subplots(ncols=3, figsize=(10, 3), constrained_layout=True)

PartialDependenceDisplay.from_estimator(
    rf_model,
    X_train,
    features=["temp", "humidity", ("temp", "humidity")],
    kind="average",
    subsample=50,
    grid_resolution=20,
    ax=ax
)

# Label only the two 1D plots; the 2D plot has humidity on its y-axis
for axis in ax[:2]:
    axis.set_ylabel("Predicted bike rentals")

plt.show()
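
The distinction between additive and non-additive effects can also be checked numerically. The sketch below is one simple way to do this on a toy problem, assuming a hypothetical `predict` function and synthetic data: it computes a 2D partial dependence surface on a grid and subtracts the part that can be written as a sum of a row effect and a column effect. What remains is attributed to the interaction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical black-box model: two main effects plus an interaction term.
def predict(X):
    return X[:, 0] + X[:, 1] + 0.5 * X[:, 0] * X[:, 1]

X = rng.normal(size=(300, 3))   # synthetic data; feature 2 is unused by the model
g0 = np.linspace(-1, 1, 5)      # grid for feature 0
g1 = np.linspace(-1, 1, 5)      # grid for feature 1

# 2D partial dependence: average prediction with both features fixed.
pd_2d = np.empty((len(g0), len(g1)))
for i, a in enumerate(g0):
    for j, b in enumerate(g1):
        X_mod = X.copy()
        X_mod[:, 0], X_mod[:, 1] = a, b
        pd_2d[i, j] = predict(X_mod).mean()

# Additive part: row means plus column means minus the grand mean.
additive = (pd_2d.mean(axis=1, keepdims=True)
            + pd_2d.mean(axis=0, keepdims=True)
            - pd_2d.mean())
interaction = pd_2d - additive
print("Largest absolute interaction on the grid:", np.abs(interaction).max().round(3))
```

A purely additive model would leave an interaction surface of (numerically) zero.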

> ### **Exercise 1:**
> Examine the 2D PDP plot of temperature and humidity (rightmost plot). At what temperature range does the relationship between these features and predicted bike rentals change noticeably, and what does this tell you about how the model uses interactions between features to make predictions?

---
---

## 2.2 Accumulated Local Effects (ALE)

Accumulated Local Effects (ALE) (Apley and Zhu 2020) plots help explain how a single feature affects a model’s predictions on average, and are designed to be a more reliable alternative to PDPs when features are correlated. Like PDPs, they aim to describe the model’s behavior, but they work very differently under the hood. Remember that PDPs show the effect of one feature by replacing it with fixed grid values across all observations while keeping all other features unchanged. The model then predicts for these altered instances, and the predictions are averaged. However, if the replaced feature is correlated with others, this process creates unrealistic or impossible combinations (e.g. a large house with only one room). These implausible combinations can bias the result, making PDPs unreliable in the presence of correlated features.

ALE avoids this problem by only considering prediction changes for real data points. Instead of replacing feature values globally, **ALE looks at small intervals of the feature and measures how predictions change within these narrow windows**. The key idea is to take the difference in predictions when a feature increases slightly (e.g., from 10°C to 12°C), but only for observations where this change actually makes sense given the data. These local differences are averaged per interval and accumulated across the feature range. In contrast to PDP, ALE never alters the other features and works entirely with values that appear in the data.

In addition to showing the effect of a single feature (1D ALE), ALE can also reveal how two features interact (2D ALE). These second-order ALE plots isolate the interaction effect, removing the influence of the individual features. For instance, both high temperature and high humidity may separately reduce bike rentals in our coding example, but their combination might reduce them even more or less than expected from the individual effects. ALE shows this directly, whereas PDP mixes main and interaction effects.

Limitations of ALE include that they reflect average effects and do not provide individual-level curves like ICE plots. Interpretation is only valid within intervals, not across the entire feature range. And if features are very strongly correlated, ALE may still struggle to isolate the effect of one feature from another.
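
To make the accumulate-local-differences idea concrete, here is a minimal first-order ALE sketch in NumPy. It is a simplified illustration and not how `PyALE` is implemented; the `predict` function, the correlated synthetic features, the quantile binning, and the centering choice are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical black-box model and strongly correlated synthetic features.
def predict(X):
    return X[:, 0] ** 2 + X[:, 1]

x0 = rng.uniform(0, 1, size=2000)
x1 = x0 + rng.normal(scale=0.05, size=2000)
X = np.column_stack([x0, x1])

def ale_1d(predict, X, feature, n_bins=10):
    """First-order ALE using quantile bins of the feature."""
    x = X[:, feature]
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1))
    # Assign each row to a bin (0 .. n_bins - 1).
    bins = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, n_bins - 1)
    local_effects = np.zeros(n_bins)
    counts = np.zeros(n_bins)
    for k in range(n_bins):
        rows = X[bins == k]
        if len(rows) == 0:
            continue
        lower, upper = rows.copy(), rows.copy()
        lower[:, feature] = edges[k]        # move rows to the lower bin edge
        upper[:, feature] = edges[k + 1]    # and to the upper bin edge
        local_effects[k] = (predict(upper) - predict(lower)).mean()
        counts[k] = len(rows)
    ale_values = np.concatenate([[0.0], np.cumsum(local_effects)])  # accumulate
    # Center so that the count-weighted average effect is zero.
    mid = (ale_values[:-1] + ale_values[1:]) / 2
    ale_values -= np.sum(mid * counts) / counts.sum()
    return edges, ale_values

edges, ale_values = ale_1d(predict, X, feature=0)
print("ALE at bin edges:", ale_values.round(3))
```

Only rows that actually fall into a bin are moved, and only to the edges of that bin, so the correlated second feature is never paired with feature values far from what was observed.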

---

The following code uses the `PyALE` library to compute and visualize a 2D ALE plot for the bike rental prediction model. Specifically, it investigates the interaction between temperature and humidity, i.e. the same two features we explored before using PDPs.

The `grid_size` parameter controls how finely the feature space is divided. A larger grid size gives a more detailed view of the interaction but may become noisy if data is sparse. With `grid_size=15`, both features are split into 15 intervals, creating a 15×15 grid (225 regions). For ~1700 rows, this corresponds to roughly 7 to 8 observations per region on average, although the actual counts vary because the data are not spread evenly. This is a compromise between detail and stability: fine enough to show interactions without becoming too noisy or too smooth.

In [None]:
from PyALE import ale

import pandas as pd
import matplotlib.pyplot as plt

# 1. Define feature names
features = X_train.columns.tolist()

# 2. Generate 2D ALE for "temp" and "humidity"
ale_eff_2d = ale(
    X=X_train,                         # DataFrame with original feature names
    model=rf_model,                   # Your pipeline model
    feature=["temp","humidity"],    # 2D interaction
    grid_size=15                      # Increase for finer resolution
)

# 3. The function automatically plots the 2D ALE!
plt.show()

The plot shows how the combined influence of the two features deviates from what we would expect if their effects were simply additive. This means that the color scale represents only the second-order interaction effect and not the overall contribution of temperature or humidity alone. Yellow to light-green shades indicate above-average predictions and darker violet shades below-average predictions once the main effects are taken into account (Molnar 2025).

- Overall, this plot confirms that the model detects nonlinear interactions: the impact of temperature on bike rentals depends on humidity and vice versa. If there were no interaction, the plot would display smooth, uniform color transitions along rows or columns, indicating that the combined effect is simply the sum of two independent (additive) effects. 
- However, the plot highlights non-additive effects, for example: cold and humid weather leads to a higher prediction than expected. While the PDPs showed that both low temperature and high humidity individually reduce bike rentals, the 2D ALE plot additionally reveals that their combined effect is not simply the sum of the two. Instead, in cold and humid conditions, the model predicts more rentals than the main effects alone would suggest, indicating a true interaction.

---
---

## 2.3 Permutation Feature Importance (PFI)

Feature importance tells us which input features matter most for the model's predictions. But how we calculate this importance really matters. In Notebook 3, we first saw the default feature importance from Random Forest. This is based on how much each feature helps to reduce uncertainty (impurity) in decision trees. It's fast but biased. It favors numerical features over categorical ones and, more generally, features with many unique values (even if they are random noise). In addition, impurity-based importance is computed from the training data, not from held-out test data.

To get a more reliable measure of feature usefulness, we can use Permutation Feature Importance (PFI). PFI takes a different approach: instead of looking at how the model was trained, it tests how the model behaves after training, typically using test data. It works by shuffling (permuting) the values of each feature and observing how much the model's performance drops. The idea is:
1. Measure how well the model performs normally (e.g., using accuracy).
2. Shuffle one feature’s values to break its connection to the target.
3. Re-measure performance.
- If performance drops a lot, the feature was important.
- If there's little or no change, the feature wasn’t contributing much.
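
The procedure above can be sketched by hand as follows. The data, the `predict` function standing in for a trained classifier, and the helper name `permutation_importance_manual` are hypothetical. In practice, scikit-learn provides `sklearn.inspection.permutation_importance` for this purpose.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: feature 0 drives the label, feature 1 is pure noise.
X = rng.normal(size=(1000, 2))
y = (X[:, 0] > 0).astype(int)

# Hypothetical "trained" classifier, standing in for any fitted model.
def predict(X):
    return (X[:, 0] > 0).astype(int)

def accuracy(y_true, y_pred):
    return np.mean(y_true == y_pred)

def permutation_importance_manual(predict, X, y, n_repeats=10, rng=rng):
    baseline = accuracy(y, predict(X))                     # 1. baseline score
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])   # 2. shuffle one feature
            drops.append(baseline - accuracy(y, predict(X_perm)))  # 3. re-measure
        importances[j] = np.mean(drops)                    # average drop in accuracy
    return importances

pfi = permutation_importance_manual(predict, X, y)
print("Mean accuracy drop per feature:", pfi.round(3))
```

Repeating the shuffle several times and averaging reduces the randomness of a single permutation.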

However, standard PFI shuffles each feature independently of all other features, which breaks its relationships with correlated features (a limitation it shares with PDPs). This can produce unrealistic or impossible data combinations. To mitigate this, conditional PFI shuffles more carefully, for example by only shuffling within logical subgroups (such as species, regions, or categories). This helps preserve realistic feature combinations while still measuring the feature's unique contribution.
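
One simple way to approximate the subgroup idea is to permute a feature separately inside each group. The sketch below is an assumption-laden illustration, not a full conditional sampling method: the helper `permute_within_groups` and the toy columns `region` and `temperature` are hypothetical.

```python
import numpy as np
import pandas as pd


def permute_within_groups(df, feature, group_col, seed=0):
    """Return a copy of df with `feature` shuffled separately inside each group."""
    rng = np.random.default_rng(seed)
    out = df.copy()
    for _, idx in df.groupby(group_col).groups.items():
        # Values are only exchanged between rows of the same group
        out.loc[idx, feature] = rng.permutation(df.loc[idx, feature].to_numpy())
    return out


toy = pd.DataFrame({
    "region": ["coast"] * 4 + ["mountain"] * 4,
    "temperature": [14.0, 15.0, 16.0, 17.0, 2.0, 3.0, 4.0, 5.0],
})
shuffled = permute_within_groups(toy, "temperature", "region")
print(shuffled)
```

Coastal temperatures stay within coastal rows and mountain temperatures within mountain rows, so no mountain site is assigned a coastal temperature. Passing such a permuted data frame to the model and measuring the performance drop would give a conditional variant of PFI.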

---

Let's now extend our workflow on the penguins dataset from Notebook 3 by adding PFI, which will help us evaluate how much each feature actually contributes to the model's performance on unseen data.

First, run the model as before:

In [None]:
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
plt.rcParams.update({'font.size': 12})

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OrdinalEncoder
from sklearn.metrics import accuracy_score, classification_report

# Load and clean the dataset
penguins = sns.load_dataset("penguins")
penguins = penguins.dropna()

# Separate features and target
X = penguins.drop(columns=['species'])
y = penguins['species']
feature_names = X.columns.tolist()

# Encode categorical features
categorical_cols = X.select_dtypes(include='object').columns
encoder = OrdinalEncoder()
X[categorical_cols] = encoder.fit_transform(X[categorical_cols])

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Train Random Forest model
rfc = RandomForestClassifier(max_features=X_train.shape[1], random_state=0)
rfc.fit(X_train, y_train)

# Evaluate model
y_pred = rfc.predict(X_test)
print(f"Number of trees in the forest: {len(rfc.estimators_)}")
accuracy = accuracy_score(y_test, y_pred)
print(f"Test set accuracy: {accuracy:.2f}")
print("\nClassification Report:")
print(classification_report(y_test, y_pred))

# Gini-based feature importance (impurity-based)
importances = rfc.feature_importances_
feature_imp_df = pd.DataFrame({
    'Feature': feature_names,
    'Gini Importance': importances
}).sort_values('Gini Importance', ascending=False)
print(feature_imp_df)

Next, calculate PFI:

In [None]:
from sklearn.inspection import permutation_importance

# PFI on Test Set
pfi_test_result = permutation_importance(
    rfc, X_test, y_test,
    n_repeats=10,
    random_state=42,
    n_jobs=-1
)
pfi_test_df = pd.DataFrame({
    'Feature': X_test.columns,
    'Permutation Importance (Test)': pfi_test_result.importances_mean
}).sort_values('Permutation Importance (Test)', ascending=False)

# PFI on Training Set
pfi_train_result = permutation_importance(
    rfc, X_train, y_train,
    n_repeats=10,
    random_state=42,
    n_jobs=-1
)
pfi_train_df = pd.DataFrame({
    'Feature': X_train.columns,
    'Permutation Importance (Train)': pfi_train_result.importances_mean
}).sort_values('Permutation Importance (Train)', ascending=False)

# Merge for side-by-side comparison
pfi_combined_df = pd.merge(
    pfi_train_df,
    pfi_test_df,
    on='Feature'
).set_index('Feature')

print(pfi_combined_df)

# Plot both side-by-side
fig, ax = plt.subplots(1, 2, figsize=(12, 4), sharey=True)

# Train PFI
pfi_combined_df.sort_values("Permutation Importance (Train)", ascending=True)['Permutation Importance (Train)'].plot.barh(ax=ax[0], color="salmon")
ax[0].set_title("Permutation Importance (Train Set)")

# Test PFI
pfi_combined_df.sort_values("Permutation Importance (Test)", ascending=True)['Permutation Importance (Test)'].plot.barh(ax=ax[1], color="skyblue")
ax[1].set_title("Permutation Importance (Test Set)")

plt.tight_layout()
plt.show()

- "bill_length_mm" is clearly the most important feature according to both PFI and Gini importance.
- "flipper_length_mm" and "bill_depth_mm" appear to be overestimated by Gini importance compared with PFI (on both the training and the test set).
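
Because PFI relies on random shuffles, it is worth checking how much the importances vary across repeats before ranking features. The result returned by `permutation_importance` also provides `importances_std` and the raw `importances` array (one column per repeat). The sketch below uses scikit-learn's bundled wine dataset as a stand-in so that it runs on its own; the names `rf_w`, `res` and `summary` are hypothetical, and in this notebook one would pass `rfc`, `X_test` and `y_test` instead.

```python
import pandas as pd
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Stand-in dataset bundled with scikit-learn (no download needed)
wine = load_wine(as_frame=True)
X_w_train, X_w_test, y_w_train, y_w_test = train_test_split(
    wine.data, wine.target, random_state=42
)
rf_w = RandomForestClassifier(random_state=0).fit(X_w_train, y_w_train)

res = permutation_importance(
    rf_w, X_w_test, y_w_test, n_repeats=10, random_state=42
)

# Mean and standard deviation of the score drop over the 10 repeats
summary = pd.DataFrame(
    {"mean": res.importances_mean, "std": res.importances_std},
    index=X_w_test.columns,
).sort_values("mean", ascending=False)
print(summary)
```

If the standard deviation of a feature is of similar size to its mean, its importance is hard to distinguish from zero on this dataset.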

## References and Further Learning

Apley, D. W. and Zhu, J.: Visualizing the effects of predictor variables in black box supervised learning models, Journal of the Royal Statistical Society Series B: Statistical Methodology, 82, 1059–1086, 2020.

Dramsch, J. S.: 70 years of machine learning in geoscience in review, Advances in Geophysics, 61, 1–55, 2020.

Dwivedi, R., Dave, D., Naik, H., Singhal, S., Omer, R., Patel, P., Qian, B., Wen, Z., Shah, T., and Morgan, G.: Explainable AI (XAI): Core ideas, techniques, and solutions, ACM Computing Surveys, 55, 1–33, doi:10.1145/3561048, 2023.

Flora, M. L., Potvin, C. K., McGovern, A., and Handler, S.: A machine learning explainability tutorial for atmospheric sciences, Artificial Intelligence for the Earth Systems, 3, e230018, doi:10.1175/AIES-D-23-0018.1, 2024.

Friedman, J. H.: Greedy function approximation: a gradient boosting machine, Annals of Statistics, 1189–1232, 2001.

Goldstein, A., Kapelner, A., Bleich, J., and Pitkin, E.: Peeking inside the black box: Visualizing statistical learning with plots of individual conditional expectation, Journal of Computational and Graphical Statistics, 24, 44–65, doi:10.1080/10618600.2014.907095, 2015.

James, G., Witten, D., Hastie, T., Tibshirani, R., and Taylor, J.: An Introduction to Statistical Learning: with Applications in Python, 2023.

Jiang, S., Sweet, L., Blougouras, G., Brenning, A., Li, W., Reichstein, M., Denzler, J., Shangguan, W., Yu, G., Huang, F., and Zscheischler, J.: How Interpretable Machine Learning Can Benefit Process Understanding in the Geosciences, Earth's Future, 12, doi:10.1029/2024EF004540, 2024.

Lundberg, S. M. and Lee, S.-I.: A unified approach to interpreting model predictions, Advances in Neural Information Processing Systems, 30, 2017.

Molnar, C.: Interpretable Machine Learning: A Guide for Making Black Box Models Explainable (3rd ed.). Retrieved from christophm.github.io/interpretable-ml-book/, 2025.

Ribeiro, M. T., Singh, S., and Guestrin, C.: "Why Should I Trust You?", in: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1135–1144, 2016.

Shapley, L. S.: A value for n-person games, in: Contributions to the Theory of Games (AM-28), 307–318, 1953.