# Interpretability & Model Inspection

One way to probe the models we build is to test them against the established knowledge of domain experts. In this final section, we’ll explore how to build intuitions about our machine learning model and avoid pitfalls like spurious correlations. These methods for model interpretability increase our trust in models, but they can also serve as an additional level of reproducibility in our research and as a valuable research artefact that can be discussed in a publication.

This part of the tutorial will also discuss why the feature importance of tree-based methods can serve as a starting point but often shouldn’t be used as the sole source of truth when interpreting features in applied research.

This section will introduce tools like `shap`, discuss feature importance, and cover manual inspection of models.

First we'll split the data from [the Data notebook](/notebooks/0-basic-data-prep-and-model.html) and load the model from [the Sharing notebook](https://ml.recipes/notebooks/3-model-sharing.html).

In [1]:
from pathlib import Path

DATA_FOLDER = Path("..", "..") / "data"
DATA_FILEPATH = DATA_FOLDER / "penguins_clean.csv"

In [2]:
import pandas as pd
penguins = pd.read_csv(DATA_FILEPATH)
penguins.head()

| Unnamed: 0 | Culmen Length (mm) | Culmen Depth (mm) | Flipper Length (mm) | Sex | Species |
|---|---|---|---|---|---|
| 0 | 39.1 | 18.7 | 181.0 | MALE | Adelie Penguin (Pygoscelis adeliae) |
| 1 | 39.5 | 17.4 | 186.0 | FEMALE | Adelie Penguin (Pygoscelis adeliae) |
| 2 | 40.3 | 18.0 | 195.0 | FEMALE | Adelie Penguin (Pygoscelis adeliae) |
| 3 | 36.7 | 19.3 | 193.0 | FEMALE | Adelie Penguin (Pygoscelis adeliae) |
| 4 | 39.3 | 20.6 | 190.0 | MALE | Adelie Penguin (Pygoscelis adeliae) |


In [3]:
from sklearn.model_selection import train_test_split
num_features = ["Culmen Length (mm)", "Culmen Depth (mm)", "Flipper Length (mm)"]
cat_features = ["Sex"]
features = num_features + cat_features
target = ["Species"]

X_train, X_test, y_train, y_test = train_test_split(penguins[features], penguins[target[0]], stratify=penguins[target[0]], train_size=.7, random_state=42)


In [4]:
from sklearn.svm import SVC
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from joblib import dump, load

MODEL_FOLDER = Path("..", "..") / "model"
MODEL_EXPORT_FILE = MODEL_FOLDER / "svc.joblib"

model = load(MODEL_EXPORT_FILE)
model.score(X_test, y_test)


With the model loaded correctly, we can start diving into explainable AI, interpretability and model inspection!

## Partial Dependence for Machine Learning Interpretability

The scikit-learn library provides powerful tools for understanding the relationship between features and the target variable in machine learning models, including the `PartialDependenceDisplay` class and the `partial_dependence` function. These tools enable us as researchers to visualize and quantify the impact of individual features on model predictions, helping to uncover complex patterns and relationships within the data.

Partial dependence plots attempt to show the "average response" of a model as a single feature changes.

The `PartialDependenceDisplay` class offers an intuitive way to visualize the partial dependence of the target variable on one or two features, showing how the model's predictions change as the values of these features vary while marginalizing over the values of all other features.

Similarly, the `partial_dependence` function computes the partial dependence of the target variable on one or more features, providing numerical values that quantify the magnitude and direction of the relationship between each feature and the target. 
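To make the "average response" idea concrete, here is a minimal from-scratch sketch of a one-dimensional partial dependence computation. It is only an illustration and not scikit-learn's implementation: `toy_model`, `partial_dependence_1d`, and the tiny `rows` dataset are hypothetical names and values made up for this example.

```python
def toy_model(row):
    """Hypothetical stand-in for a fitted model: ignores feature 1 entirely."""
    return 2.0 * row[0] + 0.0 * row[1] + 0.5 * row[2]


def partial_dependence_1d(predict, rows, feature_idx, grid):
    """For each grid value, overwrite one feature in every row and average the predictions."""
    averages = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = list(row)           # copy so the original data stays untouched
            modified[feature_idx] = value  # force the feature to the grid value
            total += predict(modified)
        averages.append(total / len(rows))  # marginalize over all other features
    return averages


# Made-up data: three rows with three features each
rows = [
    [1.0, 10.0, 2.0],
    [2.0, 20.0, 4.0],
    [3.0, 30.0, 6.0],
]
grid = [0.0, 1.0, 2.0]

pd_feature0 = partial_dependence_1d(toy_model, rows, 0, grid)
pd_feature1 = partial_dependence_1d(toy_model, rows, 1, grid)
print(pd_feature0)  # rises with the grid value: the model responds to feature 0
print(pd_feature1)  # constant: the model does not respond to feature 1
```

A flat curve like the one for feature 1 is what an unaffected feature looks like in a partial dependence plot, while the rising curve for feature 0 indicates a feature that pushes the prediction up.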

In [None]:
from sklearn.inspection import PartialDependenceDisplay, partial_dependence
from matplotlib import pyplot as plt


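# Use a small, reproducible sample to keep the grid computation cheap
pd_results = partial_dependence(model, X_train.sample(20, random_state=42), num_features)
print(pd_results.keys())
# Newer scikit-learn releases store the grid under "grid_values"; older ones used "values"
grid_key = "grid_values" if "grid_values" in pd_results else "values"
print(f"Example Values: {pd_results[grid_key][0]}, Average: {pd_results['average'][0][0].mean(axis=0)}")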


Luckily, scikit-learn provides the plotting functionality directly, and we can re-use the model we loaded earlier!

We'll create a partial dependence plot for each penguin class!

In [6]:
PartialDependenceDisplay.from_estimator(model, X_train, [0,1,2], target=list(y_train.unique())[0])
plt.title(f"Partial Dependence of SVC on Penguin Features for {y_train.unique()[0]}")
plt.show()


In [7]:
PartialDependenceDisplay.from_estimator(model, X_train, [0,1,2], target=list(y_train.unique())[1])
plt.title(f"Partial Dependence of SVC on Penguin Features for {y_train.unique()[1]}")
plt.show()


In [8]:
PartialDependenceDisplay.from_estimator(model, X_train, [0,1,2], target=list(y_train.unique())[2])
plt.title(f"Partial Dependence of SVC on Penguin Features for {y_train.unique()[2]}")
plt.show()


These plots can be very insightful if you know how to interpret them correctly.

Each plot shows the three numeric features and how varying each one changes the model's average prediction for a specific class.

Interestingly, we can see that the culmen length for Adelie is smaller, because larger values reduce the partial dependence. Chinstrap penguins, however, seem to have a larger culmen length, and Gentoo is almost unaffected by this feature!

Similarly, only Gentoo seems to have larger flippers, whereas for the other species the partial dependence drops for large flipper values.

I'm not a penguin expert, I just find them adorable, and I'm still able to glean this interpretable information from the plots. I think this is a great tool! 🐧

### Feature importances with Tree importance vs Permutation importance

Understanding feature importance is crucial in machine learning, as it helps us identify which features have the most significant impact on model predictions. 

Two standard methods for assessing feature importance are Tree Importance and Permutation Importance.
Tree Importance, usually associated with tree-based models like random forests, is derived from the splits in the trees. In scikit-learn, `feature_importances_` is impurity-based: it sums how much each feature's splits reduce the impurity criterion, weighted by the number of samples reaching the node, rather than simply counting splits.

Features whose splits reduce impurity the most are considered more important because they contribute more to the model's fit on the training data. One benefit of Tree Importance is its computational efficiency, as the importances are obtained as a by-product of training. However, Tree Importance can overestimate the importance of high-cardinality features (including purely random ones), can be misleading for correlated features, and is computed on the training data only, so it also reflects anything the model has overfit to.

On the other hand, Permutation Importance assesses feature importance by measuring the decrease in model performance when the values of a feature are randomly shuffled. Features that, when shuffled, lead to a significant decrease in model performance are deemed more important. Permutation Importance is model-agnostic and can be applied to any fitted model, which makes it versatile. It also captures a feature's contribution through interactions, is not biased towards high-cardinality features, and can be computed on held-out data. However, strongly correlated features can still mask each other's importance, and the method is computationally more expensive, especially with large numbers of features, since the model must be re-scored for every feature and every repeat.
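The shuffling procedure can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not the scikit-learn implementation: `toy_predict`, `accuracy`, `permutation_importance_sketch`, and the synthetic data are hypothetical and exist only for this example.

```python
import random


def toy_predict(row):
    """Hypothetical classifier that uses only feature 0 and ignores feature 1."""
    return 1 if row[0] > 0.5 else 0


def accuracy(predict, rows, labels):
    """Fraction of rows for which the prediction matches the label."""
    return sum(predict(r) == y for r, y in zip(rows, labels)) / len(rows)


def permutation_importance_sketch(predict, rows, labels, n_repeats=10, seed=42):
    """Mean drop in accuracy after shuffling each feature column in turn."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    importances = []
    for j in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)  # break the link between feature j and the labels
            shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            drops.append(baseline - accuracy(predict, shuffled, labels))
        importances.append(sum(drops) / n_repeats)
    return importances


# Synthetic data: 20 rows, 2 features; the labels depend only on feature 0
rows = [[i / 19, (i * 7) % 20 / 19] for i in range(20)]
labels = [toy_predict(r) for r in rows]

importances = permutation_importance_sketch(toy_predict, rows, labels)
print(importances)
```

Because the toy classifier never looks at feature 1, shuffling that column cannot change the score, so its importance is zero. Shuffling feature 0 destroys the only signal the classifier uses, so its importance is positive.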

People are interested in feature importances for several reasons. Firstly, feature importances provide insights into the underlying relationships between features and the target variable, aiding in feature selection and dimensionality reduction. 

Moreover, understanding feature importances helps researchers and practitioners interpret model predictions and identify potential areas for improvement or further investigation. Feature importances can also inform domain experts and stakeholders about which features are driving model decisions, enhancing transparency and trust in machine learning systems.

We'll start out by training a different type of model in this section, a standard Random Forest. Then we can directly compare the tree-based feature importance with permutation importances. The data split from [the Data notebook](/notebooks/0-basic-data-prep-and-model.html) we established earlier remains the same, and the pre-processing is also the same, even though Random Forests handle non-normalised data well.

In [9]:
from sklearn.ensemble import RandomForestClassifier

num_transformer = StandardScaler()
cat_transformer = OneHotEncoder(handle_unknown='ignore')

preprocessor = ColumnTransformer(transformers=[
    ('num', num_transformer, num_features),
    ('cat', cat_transformer, cat_features)
])

rf = Pipeline(steps=[
    ('preprocessor', preprocessor),
    ('classifier', RandomForestClassifier(random_state=42)),
])

rf.fit(X_train, y_train)
rf.score(X_test, y_test)


Now we can simply plot the feature importances obtained from training the model.

With a fixed `random_state`, as above, the result is reproducible. Without it, these would differ slightly between runs, because each tree in a Random Forest is trained on a randomly selected subset of the data.

In [None]:
pd.Series(rf.named_steps["classifier"].feature_importances_, index=num_features+['Female', 'Male']).plot.bar()
plt.show()


The tree-based feature importance shows the importances as the "random forest sees them", which means we get the `Sex` feature split into female and male columns by the `OneHotEncoder`. This also means that these two columns of the categorical feature are strongly correlated with each other.

We can clearly see that the `Culmen length` is the most important feature in determining which penguin we're facing. `Culmen depth` seems to be slightly less important than `Flipper length`. `Sex` seems to be entirely unimportant.
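As a small aside on the one-hot encoded `Sex` columns, the sketch below shows why two indicator columns for a two-category feature are correlated. It uses only the five `Sex` values visible in the table preview at the top of this notebook, and `pearson` is a hypothetical helper written for this example. It is an illustration of the idea, not a statement about the full dataset.

```python
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation coefficient between two equally long lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# The five Sex values from the preview rows of the penguin table
sex = ["MALE", "FEMALE", "FEMALE", "FEMALE", "MALE"]

# Manual one-hot encoding into two indicator columns
female = [1.0 if s == "FEMALE" else 0.0 for s in sex]
male = [1.0 if s == "MALE" else 0.0 for s in sex]

correlation = pearson(female, male)
print(f"Correlation between the two one-hot columns: {correlation:.2f}")
```

With only two categories, each column is the exact complement of the other, so knowing one column fully determines the second. This is one reason the importance of `Sex` can end up split across both columns.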

Now we can use the more sophisticated permutation importance. 

Luckily, scikit-learn implements this feature for us and we can just import it:

In [11]:
from sklearn.inspection import permutation_importance

result = permutation_importance(
    rf, X_train, y_train, n_repeats=10, random_state=42
)

fi_rf_train = pd.Series(result.importances_mean, index=features)
fi_rf_train.plot.bar()
plt.show()


We can see that the permutation importance gives a lower weight to `Culmen Depth` in the Random Forest and a slightly higher importance to `Sex`.

Overall it's similar in that `Culmen Length` is still the most important and `Flipper length` is the second most important, while the relative importance changes somewhat. 

These differences can be much more pronounced in more complex models.

The really neat feature, however, is that we can apply this to the SVM model, which does not have internal importances!

In [12]:
result = permutation_importance(
    model, X_train, y_train, n_repeats=10, random_state=42
)

fi_svm_train = pd.Series(result.importances_mean, index=features)
fi_svm_train.plot.bar()
plt.show()


Here we can see that `Culmen length` is still the most important and `Sex` is mostly unimportant, but the relative importances of `Culmen depth` and `Flipper length` change.

In [None]:
# Repeat the permutation importance on the held-out test set for both models
result = permutation_importance(rf, X_test, y_test, n_repeats=10, random_state=42)
fi_rf_test = pd.Series(result.importances_mean, index=features)
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=42)
fi_svm_test = pd.Series(result.importances_mean, index=features)

# Compare train and test importances side by side
pd.DataFrame({"RF Train": fi_rf_train, "SVM Train": fi_svm_train, "RF Test": fi_rf_test, "SVM Test": fi_svm_test}).plot.bar()
plt.show()

## Shap Inspection


In [13]:
import shap

rf = RandomForestClassifier()
rf.fit(X_train[num_features], y_train)

explainer = shap.Explainer(rf)
explainer


In [14]:
shap_values = explainer.shap_values(X_test[num_features])


In [15]:
shap.initjs()


In [16]:
shap.force_plot(explainer.expected_value[0], shap_values[0][0], feature_names=num_features)


In [17]:
shap.force_plot(explainer.expected_value[0], shap_values[0], feature_names=num_features)


## Model Inspection

Several tools can help us check that a model is doing what it's supposed to do. Scikit-learn classifiers mostly work out of the box, which is why we don't necessarily have to debug the models.

Sometimes we have to switch off regularization in scikit-learn to achieve the model state we expect.

In neural networks, we are working with many moving parts. The first step is a practical one: overfit the network on a small batch of data. This ensures that the model is capable of learning and that all the connections are made as expected. It works as a first-order sanity check that the model can learn at all.
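The overfitting check can be sketched without any deep learning framework. The example below is a hedged stand-in: it trains a tiny linear model with plain gradient descent instead of a real neural network, and `overfit_small_batch`, the learning rate, the step count, and the four-sample batch are all assumptions made for illustration. With a real network, the same pattern applies: train on one small batch and confirm that the loss approaches zero.

```python
def overfit_small_batch(batch_x, batch_y, lr=0.1, steps=2000):
    """Train a tiny linear model on one small batch and return the loss history."""
    n_features = len(batch_x[0])
    weights = [0.0] * n_features
    bias = 0.0
    losses = []
    for _ in range(steps):
        # Forward pass: linear prediction for every sample in the batch
        preds = [sum(w * x for w, x in zip(weights, row)) + bias for row in batch_x]
        errors = [p - y for p, y in zip(preds, batch_y)]
        losses.append(sum(e * e for e in errors) / len(errors))  # mean squared error
        # Backward pass: gradient of the mean squared error for each parameter
        for j in range(n_features):
            grad_w = 2 * sum(e * row[j] for e, row in zip(errors, batch_x)) / len(errors)
            weights[j] -= lr * grad_w
        bias -= lr * 2 * sum(errors) / len(errors)
    return losses


# A made-up batch of four samples whose targets the model can fit exactly
batch_x = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.5, 0.2]]
batch_y = [2.0 * a - 3.0 * b + 1.0 for a, b in batch_x]

losses = overfit_small_batch(batch_x, batch_y)
print(f"first loss: {losses[0]:.4f}, last loss: {losses[-1]:.8f}")
```

If the final loss does not shrink towards zero on such a small batch, the problem is more likely in the model wiring, the loss, or the optimizer setup than in the data.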

A more in-depth solution for PyTorch is [Pytorch Surgeon](https://github.com/archinetai/surgeon-pytorch), which can be used to extract submodels of the complete architecture for debugging purposes.

Some example code from the Pytorch Surgeon docs (torch and surgeon are not installed to save space):


In [18]:
import torch
import torch.nn as nn
from surgeon_pytorch import Extract, get_nodes

class SomeModel(nn.Module):

    def __init__(self):
        super().__init__()
        self.layer1 = nn.Linear(5, 3)
        self.layer2 = nn.Linear(3, 2)
        self.layer3 = nn.Linear(2, 1)  # in_features must match the 2 outputs of layer2

    def forward(self, x):
        x1 = torch.relu(self.layer1(x))
        x2 = torch.sigmoid(self.layer2(x1))
        y = self.layer3(x2).tanh()
        return y

model = SomeModel()
print(get_nodes(model)) # ['x', 'layer1', 'relu', 'layer2', 'sigmoid', 'layer3', 'tanh']

ModuleNotFoundError: No module named 'torch'

This enables us to extract the model at one of the nodes above:

In [19]:
model_ext = Extract(model, node_out='sigmoid')
x = torch.rand(1, 5)
sigmoid = model_ext(x)
print(sigmoid) # tensor([[0.5570, 0.3652]], grad_fn=<SigmoidBackward0>)

NameError: name 'Extract' is not defined