# AI Explainability

In [None]:
!pip install -r requirements.txt

In [None]:
from pathlib import Path

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

import shap

from tqdm.notebook import tqdm

from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import MinMaxScaler

from sklearn.model_selection import train_test_split
from sklearn.metrics import log_loss

The goal of this notebook is to take you through some AI explainability tooling, starting with simple models that can be easily understood, moving right up to Shapley values.

## 1 - Load and Clean the Data

As we did in the tree-based-methods course, we'll be working with a dataset called adult census which can be found [here](https://archive.ics.uci.edu/ml/datasets/adult). This contains US census information from 1994. The task is to predict whether or not an individual in the dataset earns more than $50k.

In [None]:
if not Path('adult-census.csv').exists():
    !wget https://s3-eu-west-1.amazonaws.com/faculty-client-teaching-materials/explainability/adult-census.csv
df = pd.read_csv("adult-census.csv")

As before we'll need to clean the salary column to make it suitable for prediction.

In [None]:
def convert_salary(salary):
    """
    salary: str
        This should be an entry from df["salary"]
    """

    if salary == " <=50K" or salary == " <=50K.":
        output = 0
    elif salary == " >50K" or salary == " >50K.":
        output = 1
    else:
        raise ValueError(f"Invalid input {salary}")
    return output

In [None]:
df["salary"] = df["salary"].map(convert_salary)

Recall that the classes are imbalanced, as shown below:

In [None]:
df["salary"].value_counts() / len(df)

So about 24% of people have a salary above $50k.

Let's now split off the target column from our features. We'll also drop the `education` column, as this has already been encoded in a sensible way numerically using `education-num`.

In [None]:
X = df.drop(
    ["salary", "education"], axis=1
).copy()  # this stops X being a reference to df
y = df["salary"]

Finally, just as before, we need to convert the categorical features into numerical columns. These categorical columns are listed below.

In [None]:
cat_cols = [
    "workclass",
    "marital-status",
    "occupation",
    "relationship",
    "race",
    "sex",
    "native-country",
]

Before we use `pd.get_dummies`, let's have a look at the different values within each category. The reason for this is we want to be slightly careful about dropping the redundant categories - our goal is to understand how important different features are to a model and this might become difficult if we discard an arbitrary category.

In [None]:
X = pd.get_dummies(X, drop_first=False)

categories = {}
for col in cat_cols:
    categories[col] = [c for c in X.columns if c.split("_")[0] == col]

In [None]:
categories

Looking through this, it is clear that most categorical features have a catch-all category, denoted either by "?" or "Other". The exceptions are `sex`, `marital-status` and `relationship`. We'll handle each case as follows:
- `sex`: keep only the male column, as it's a binary variable.
- `marital-status` and `relationship`: keep all of their categories.
- `native-country`: drop it entirely, as it has very high cardinality.
- All other categorical features: remove the catch-all ("?" or "Other") column.

In [None]:
to_drop = ["sex_ Female"]
for col in X.columns:
    if (
        col.split("_")[-1] in [" ?", " Other"]
        or col.split("_")[0] == "native-country"
    ):
        to_drop.append(col)
print(to_drop)

In [None]:
X.drop(to_drop, axis=1, inplace=True)

Let's now create a training and testing set. We won't be tuning hyperparameters here, so we can do without a validation set.

In [None]:
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

Finally, it will be helpful later to normalize our features.

In [None]:
scaler = MinMaxScaler()
scaler.fit(X_train)  # note we fit the scaler with just the training set

X_train = pd.DataFrame(scaler.transform(X_train), columns=X_train.columns)
X_test = pd.DataFrame(scaler.transform(X_test), columns=X_test.columns)
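
To make the scaling step concrete, here is a minimal sketch of the min-max transform, $(x - \min) / (\max - \min)$, written in plain Python. The helper names `fit_min_max` and `apply_min_max` and the toy ages are made up for illustration; the real work in this notebook is done by `MinMaxScaler`. Note that the minimum and maximum come from the training values only, so test values can land outside $[0, 1]$.

```python
def fit_min_max(train_column):
    """Learn the min and max from the training values only."""
    return min(train_column), max(train_column)


def apply_min_max(column, lo, hi):
    """Rescale values with (x - lo) / (hi - lo), using previously fitted lo/hi."""
    span = hi - lo
    if span == 0:
        # a constant column carries no information; map everything to 0
        return [0.0 for _ in column]
    return [(value - lo) / span for value in column]


# hypothetical toy data, not taken from adult-census
train_age = [20, 30, 40, 60]
test_age = [25, 70]

lo, hi = fit_min_max(train_age)
scaled_train = apply_min_max(train_age, lo, hi)
scaled_test = apply_min_max(test_age, lo, hi)  # 70 exceeds the training max

print(scaled_train)
print(scaled_test)
```

Fitting on the training set alone avoids leaking information from the test set into the preprocessing.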

## 2 - Intrinsically Interpretable Models

Let's now fit an intrinsically interpretable model, namely logistic regression. You haven't come across logistic regression in a lot of detail, but a simple way of thinking about it is as the classification counterpart to linear regression. It's defined as follows:
$$ p_i = \sigma\left( \beta_0 + \sum_{j=1}^k \beta_j x_{ij} \right)$$
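where $p_i$ is the predicted probability that the $i$-th data point belongs to the positive class (here, earning more than $50k), $x_{ij}$ is the value of the $j$-th feature for that point, there are $k$ features, and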
$$\sigma(x):= \frac{1}{1 + e^{-x}}$$

The coefficients naturally give a way of explaining the model and also therefore of measuring feature importance. As we've normalised our features, we can compare the relative sizes of the coefficients to each other to measure importance.
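
As a rough illustration of the formula above, the sketch below evaluates $p_i$ by hand for a single row. The function names and the coefficient values are hypothetical, chosen only to show how the sign and size of a coefficient move the predicted probability when features lie in $[0, 1]$.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba_one(x, beta0, betas):
    """Compute p = sigma(beta_0 + sum_j beta_j * x_j) for one row x."""
    z = beta0 + sum(b * xj for b, xj in zip(betas, x))
    return sigmoid(z)


# hypothetical intercept and coefficients for two features scaled to [0, 1]
beta0 = -1.0
betas = [2.0, -0.5]

p_low = predict_proba_one([0.0, 0.0], beta0, betas)   # first feature at its minimum
p_high = predict_proba_one([1.0, 0.0], beta0, betas)  # first feature at its maximum

print(round(p_low, 3), round(p_high, 3))
```

Moving the first feature from 0 to 1 raises the probability because its coefficient is positive; a negative coefficient would lower it.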

Let's now fit our logistic regression.

In [None]:
logistic = LogisticRegression(max_iter=1000)
logistic.fit(X_train, y_train)

Let's compare the accuracies across the training and testing sets.

In [None]:
print(logistic.score(X_train, y_train))
print(logistic.score(X_test, y_test))

Scaling appears to have helped the model greatly! Let's plot a graph of its coefficients. You can access the coefficients of logistic regression by looking at the `.coef_` attribute.

In [None]:
logistic.coef_

This is actually a 2-dimensional array with one row, so in most of what follows we'll want to plug in `logistic.coef_[0]`.

**Ex:** Write a function that, given a list of feature importances and their labels, plots them on a bar chart.

In [None]:
def plot_feature_importances(scores, labels, normalize=False):
    """
    scores: array-like
        A list of feature scores.
    labels: array-like
        A list of feature labels, the same length as scores.
    normalize: bool
        Whether or not to normalize the scores by L1-norm.
    """

    # put the scores in a dataframe
    to_plot = pd.DataFrame({"score": scores, "label": labels})

    if normalize:
        to_plot["score"] /= np.absolute(to_plot["score"].values).sum()

    # sort the dataframe by score
    to_plot = to_plot.sort_values("score", ascending=False)

    plt.figure(figsize=(10, 10))
    plt.bar(to_plot["label"], to_plot["score"])
    plt.xticks(rotation="vertical")
    plt.ylabel("Score")
    plt.show()

In [None]:
plot_feature_importances(np.absolute(logistic.coef_[0]), X.columns)

This is a decent first attempt, but it is pretty hard to interpret due to the sheer number of features. One way we could combat this is by combining the dummied-categorical columns together. Let's do that now.

The function below will try and combine categorical columns scores together.

In [None]:
from collections import defaultdict


def combine_importances(scores, labels):
    """
    Combine categorical scores together.
    
    WARNING:
        A column is treated as categorical if its name contains an underscore.
        This works for adult-census but won't necessarily
        for other datasets.
    """
    score_dict = defaultdict(float)
    for (score, label) in zip(scores, labels):
        cat_col_split = label.split("_")[0]
        score_dict[cat_col_split] += abs(score)
    return list(score_dict.values()), list(score_dict.keys())

Let's try plotting that instead.

In [None]:
plot_feature_importances(
    *combine_importances(logistic.coef_[0], X.columns), normalize=True
)

That looks a lot better. One downside is that we've lost the ability to tell the direction in which features affect predictions (which we could have done in the first plot had we kept the signs of the coefficients). For example, it's no longer possible to tell whether being more educated has a positive or a negative effect on your earnings; we just know it's significant.

## 3 - Model Specific Methods

Some models are not intrinsically interpretable, but due to their structure come with a way of calculating feature importances. We'll have a look at random forests in this section.

In [None]:
forest = RandomForestClassifier(
    n_estimators=100, random_state=42, min_samples_leaf=10
)
forest.fit(X_train, y_train)

In [None]:
print(forest.score(X_train, y_train))
print(forest.score(X_test, y_test))

Decision trees have a natural way of measuring feature importance: one can calculate how much splitting on a feature decreases the Gini impurity (or entropy). To apply this to a random forest, you simply average these importances across the trees in the forest. You can access the result by looking at the `.feature_importances_` attribute; note that the values sum to 1.


In [None]:
forest.feature_importances_
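
To make the impurity decrease described above concrete, here is a small sketch for binary labels. The functions `gini` and `impurity_decrease` and the toy label lists are illustrative assumptions. The sketch covers a single split only, whereas the library's importances also combine many splits across many trees and normalise the result.

```python
def gini(labels):
    """Gini impurity of a list of binary (0/1) labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p1 = sum(labels) / n
    return 1.0 - p1 ** 2 - (1.0 - p1) ** 2


def impurity_decrease(parent, left, right):
    """Parent impurity minus the size-weighted impurity of the two children."""
    n = len(parent)
    weighted_children = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(parent) - weighted_children


# hypothetical labels reaching one node of a tree
parent = [0, 0, 0, 0, 1, 1, 1, 1]

# a mediocre split: each child is still mixed
weak = impurity_decrease(parent, [0, 0, 0, 1], [0, 1, 1, 1])

# a perfect split: each child is pure
perfect = impurity_decrease(parent, [0, 0, 0, 0], [1, 1, 1, 1])

print(weak, perfect)
```

A feature that is chosen for splits with large decreases, at nodes that many samples reach, ends up with a high importance.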

**Ex:** Plot the feature importance graph where the individual dummied-columns are combined together like the above.

In [None]:
plot_feature_importances(
    *combine_importances(forest.feature_importances_, X.columns), normalize=True
)

The random forest seems to be using different features to the logistic regression. In particular, the logistic regression makes a lot of use of `capital-gain`.

## 4 - Model Agnostic Methods: Permutation Feature Importance

A big upside of the above approaches with random forests and logistic regression is that we essentially got the feature importances "for free". A downside is that we can't easily compare them to each other; we can normalize them both but they are essentially very different methods. One way of overcoming this difficulty is to use "permutation" feature importance on both models.

To calculate this, you simply shuffle a column (or a group of columns) and see how it affects the model's performance with respect to some scoring metric. Often we will repeat this process many times and output the average. The ability to shuffle a group is very useful: we can easily measure the importance of a categorical feature by shuffling its dummy columns together.
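
One detail worth spelling out is why a group of dummy columns should be shuffled together, row by row, rather than column by column. The sketch below uses a made-up one-hot block and Python's `random` module to show the difference. Similarly, `np.random.permutation` shuffles a multi-dimensional array only along its first axis, which is the row-wise behaviour we want.

```python
import random

rnd = random.Random(0)  # fixed seed so the sketch is repeatable

# hypothetical one-hot block: every row has exactly one 1
block = [
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
    [1, 0, 0],
    [0, 1, 0],
]

# joint shuffle: permute whole rows, so each row stays a valid one-hot vector
joint = block[:]
rnd.shuffle(joint)

# independent shuffle: permute each column separately, which can produce
# rows with no 1 or with several 1s (categories that never existed)
columns = [list(col) for col in zip(*block)]
for col in columns:
    rnd.shuffle(col)
independent = [list(row) for row in zip(*columns)]

print([sum(row) for row in joint])        # always exactly one 1 per row
print([sum(row) for row in independent])  # may contain rows summing to 0, 2 or 3
```

Shuffling jointly breaks the link between the categorical feature and the target while keeping every row a realistic data point.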

**Ex:** Write a function that will calculate the shuffle feature importance of a column (or group of columns). You may find `np.random.permutation` useful.

In [None]:
def get_shuffle_importance(scoring_function, data, target, column, n_iters=5):
    """
    scoring_function: function
        A function with which to measure model, should take
        arguments in the form scoring_function(data, target).
    data: pandas.DataFrame
        The data with which to calculate feature importance.
    target: array-like
        The target that was being predicted
    column: str or array-like
        If str this should be the column to calculate feature importance for.
        Otherwise a list (or similar) of columns, which are shuffled together.
    n_iters: int, optional
        The number of times to do the shuffling, default is 5.
    """

    perm_scores = []
    opt_score = scoring_function(data, target)
    for _ in range(n_iters):
        data0 = data.copy()  # to make sure we don't change the original data
        data0[column] = np.random.permutation(data0[column])

        # calculate the perm score
        perm_score = scoring_function(data0, target)

        perm_scores.append(perm_score)

    perm_scores = np.array(perm_scores)

    perm_scores = np.absolute(perm_scores - opt_score) / opt_score
    return perm_scores.mean()

Let's try using it for the logistic and random forest models using accuracy as our metric. To do this we'll first need to group the dummied columns back together. This is done below for you.

In [None]:
column_groups = defaultdict(list)
for col in X.columns:
    if col.split("_")[0] != col:  # then it's a categorical
        column_groups[col.split("_")[0]].append(col)
    else:
        column_groups[col].append(col)

**Ex:** Use the `column_groups` dictionary to create a dataframe with three columns: `label`, the original column names; `forest_acc`, the forest accuracy feature importance; and `logistic_acc`, the logistic accuracy feature importance. Calculate these importances on the training set. Call the dataframe `importances_df`.

In [None]:
importances_dict = {"label": [], "forest_acc": [], "logistic_acc": []}

for feature, group in tqdm(column_groups.items()):
    importances_dict["label"].append(feature)

    logistic_acc = get_shuffle_importance(
        logistic.score, X_train, y_train, group
    )
    forest_acc = get_shuffle_importance(forest.score, X_train, y_train, group)

    importances_dict["forest_acc"].append(forest_acc)
    importances_dict["logistic_acc"].append(logistic_acc)

importances_df = pd.DataFrame(importances_dict)

In [None]:
importances_df

**Ex:** Compare the feature importance graphs made by this method to the previous ones.

In [None]:
plot_feature_importances(importances_df["forest_acc"], importances_df["label"])

In [None]:
plot_feature_importances(
    importances_df["logistic_acc"], importances_df["label"]
)

You may find that they're a bit different! The above demonstrates a clear problem with feature importance; different metrics which *a priori* sound reasonable give different answers. Another downside is that they can't explain anything about local feature importance; `education-num` may be important in general but it might be more or less significant for individual points.

Before moving on, you may want to think about the following:
- To aggregate the importance of categorical features for random forests using `.feature_importances_`, we simply summed the individual components. However, a more correct way to do this would be to sum the components within each tree before averaging across trees. You may want to see if you can do this yourself.
- A big advantage of the permutation approach is that it allows us to calculate feature importance on a validation set distinct from the training set. Try doing that above and see if it changes things much.
- We chose accuracy above, but we could have chosen log loss. You could try rerunning the above using log loss as your scoring function instead, using the functions defined below.

In [None]:
def forest_logloss(data, target):
    return log_loss(target, forest.predict_proba(data)[:, 1])


def logistic_logloss(data, target):
    return log_loss(target, logistic.predict_proba(data)[:, 1])
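
For intuition about what log loss measures, here is a minimal hand-rolled version for binary targets. The function name `binary_log_loss`, the clipping constant and the toy predictions are assumptions made for this sketch; in the notebook itself we rely on `sklearn.metrics.log_loss`.

```python
import math


def binary_log_loss(y_true, p_pred, eps=1e-15):
    """Mean negative log-likelihood of 0/1 labels under predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)


# hypothetical labels and three sets of predicted probabilities
y = [1, 0, 1, 0]
confident_right = [0.9, 0.1, 0.9, 0.1]
unsure = [0.5, 0.5, 0.5, 0.5]
confident_wrong = [0.1, 0.9, 0.1, 0.9]

print(binary_log_loss(y, confident_right))
print(binary_log_loss(y, unsure))
print(binary_log_loss(y, confident_wrong))
```

Unlike accuracy, lower is better here, and confident mistakes are punished heavily. Because `get_shuffle_importance` takes the absolute relative change in score, it can be used with either metric.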

## 5 - Model Agnostic Methods: Shapley Values

Shapley values are a way of explaining ML models that comes from cooperative game theory. They allow us to give local explanations of individual data points, as well as global information by aggregating over those points. Let's try it out below for the random forest; we use the `shap` package found [here](https://github.com/slundberg/shap).
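
To give a feel for the game-theoretic definition, the sketch below computes exact Shapley values for a tiny made-up game by averaging each player's marginal contribution over every ordering of the players. The value function `toy_value` and the three "feature" names are hypothetical. Enumerating all orderings is only feasible for a handful of players, which is why sampling-based approximations are used in practice.

```python
from itertools import permutations


def shapley_values(players, value):
    """Exact Shapley values: average marginal contribution over all orderings."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            # how much does adding p to the current coalition change the payoff?
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: total / len(orderings) for p, total in totals.items()}


def toy_value(coalition):
    """Hypothetical payoff of a coalition of features."""
    score = 0.0
    if "age" in coalition:
        score += 0.10
    if "education" in coalition:
        score += 0.20
    if "age" in coalition and "education" in coalition:
        score += 0.06  # interaction, split equally between the two
    return score  # "hours" never changes the payoff


players = ["age", "education", "hours"]
phi = shapley_values(players, toy_value)
print(phi)
```

The values add up to the payoff of the full coalition minus that of the empty one, the interaction is shared equally, and a player that never changes the payoff receives zero.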

One disadvantage of shap is that it can be quite slow. To speed things up, we'll use a subsample below.

In [None]:
sample = X_train.sample(100)

In [None]:
explainer = shap.SamplingExplainer(
    lambda x: forest.predict_proba(x)[:, 1], sample
)
shap_values = explainer.shap_values(
    sample, nsamples=100, l1_reg=f"num_features({sample.shape[1]})"
)

Each point now has its own explanation (or feature importance). In order to aggregate these together, we first take the absolute value and then take the mean.

In [None]:
global_shaps = np.absolute(shap_values).mean(axis=0)

We'll now plot the feature importances, combining the dummied categorical columns together as before.

In [None]:
plot_feature_importances(*combine_importances(global_shaps, sample.columns))

The results are quite similar, though `hours-per-week` and `age` look more important here. The `shap` package also has its own plotting built in. Let's check it out below:

In [None]:
shap.summary_plot(shap_values, features=sample)

The above shows how much variability there is in feature importance across different points, and hence the importance of local explanations. If you have some time, try repeating the above with the logistic regression, and look at the `shap` package in more detail.

The `shap` package has downsides though, some of which are unique to it and others which are a problem for Shapley values more generally:
- It's quite slow and often requires some sampling.
- You may recall from the slides that Shapley values are meant to "sum to the accuracy of the model". This cannot be true with `shap` though, as we haven't passed the true labels anywhere.
- It would have been better if we could have passed our column groups to the explainer class directly, as opposed to treating the dummy columns as independent features and doing the aggregation afterwards.

This is actually a topic of current research, and is something which the Faculty R&D team are actively involved with. For more details on AI Safety, see the blogs on our website, for example [here](https://faculty.ai/blog/machine-learning-model-explainability-through-shapley-values/).