# Model Agnostic Methods

As models grow more complex, our ability to understand them with our monkey brains decreases. We can understand how they function in principle, but following their exact mathematical inner workings becomes impossible. Such a model is commonly called a **black box**. There is a widely shared notion that black-box models are uninterpretable and that we cannot understand how they work.

In my opinion, this notion is lazy and is spread by fearmongers who have never attempted it. Once trained, a model is typically deterministic: the same input produces the same output. Even if we can't see what is going on inside, we can still observe the input and the output. We can fall back on the naive, childish ways of Dee Dee: "Ooh, what does this button do?"

This is the essence of model-agnostic methods. We modify the input in a controlled way and observe the change in output in order to reason about the behaviour of the model. 
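
As a minimal sketch of this idea, we can nudge one input while holding the others fixed and watch how the output moves. The `black_box` function and the `probe` helper below are hypothetical stand-ins for illustration, not the network trained later in this notebook.

```python
import numpy as np

def black_box(x):
    # Hypothetical stand-in for an opaque model: we only ever call it,
    # we never look inside.
    return 3.0 * x[:, 0] - 2.0 * x[:, 1] ** 2

def probe(model, x, feature_idx, delta):
    """Shift one feature by `delta` and return the change in each prediction."""
    x_mod = x.copy()
    x_mod[:, feature_idx] += delta
    return model(x_mod) - model(x)

x = np.array([[1.0, 0.0],
              [1.0, 2.0]])
effect_f0 = probe(black_box, x, feature_idx=0, delta=1.0)
effect_f1 = probe(black_box, x, feature_idx=1, delta=1.0)
print(effect_f0, effect_f1)
```

The first feature moves the output by the same amount for both rows, while the effect of the second depends on where we start. Both observations come from inputs and outputs alone.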

This notebook will explore the following techniques:

1. Partial Dependence Plot (PDP) - observe the mean prediction as a given feature is set to a given level for the **entire** dataset
2. Permutation Feature Importance - measure the dip in performance as information about a given feature is lost (via permutation)

## Data Loading

In [5]:
import math

import numpy as np
import pandas as pd
from ipywidgets import interact, interactive, fixed, interact_manual
import ipywidgets as widgets
import fastprogress

import seaborn as sns
sns.set('talk')
sns.set_style('white')

In [6]:
data = pd.read_csv("data/Nasa.csv", usecols = [1,4])
# data['brand'] = data['name'].apply(lambda x: x.split(' ')[0])
# data['selling_price'] = data['selling_price'] * 0.014
# data['selling_price'] = data['selling_price'] / 1000
data.head()

Unnamed: 0,TIME,T2M_
0,0,-5.84125
1,0,-3.54462
2,1,-2.18427
3,1,-0.3078
4,2,2.10742


In [7]:
valid_ds = pd.read_csv("data/valid.csv",index_col=0)
valid_ids = valid_ds.index

## Train a Neural Network via fast.ai

The fast.ai library provides us with a modern neural net architecture for tabular data that can handle categorical variables by associating them with learned embeddings.
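
To make "learned embeddings" concrete, here is a rough NumPy sketch of the lookup step only. The category names, the embedding width of 4 and the random matrix are assumptions for illustration; in a real network the rows of the matrix are trained along with the other weights, and this is not fast.ai's implementation.

```python
import numpy as np

# Hypothetical categories for a single categorical feature.
categories = ["Diesel", "Petrol", "CNG"]
code_of = {cat: i for i, cat in enumerate(categories)}

# One row per category. Random here; learned during training in a real network.
rng = np.random.default_rng(0)
embedding = rng.normal(size=(len(categories), 4))

def embed(values):
    """Map category names to integer codes, then look up their embedding rows."""
    codes = np.array([code_of[v] for v in values])
    return embedding[codes]

vectors = embed(["Petrol", "Diesel", "Petrol"])
print(vectors.shape)
```

Rows with the same category always receive the same vector, which is what lets the network treat a category as a point in a small continuous space.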



In [11]:
!pip install fastai
from fastai.tabular.all import *

Collecting fastai
  Using cached fastai-2.0.12-py3-none-any.whl (355 kB)


ERROR: Could not find a version that satisfies the requirement torch>=1.6.0 (from fastai) (from versions: 0.1.2, 0.1.2.post1, 0.1.2.post2)
ERROR: No matching distribution found for torch>=1.6.0 (from fastai)


ModuleNotFoundError: No module named 'fastai'

In [None]:
CATEGORICAL_FEATURES = ["year", "fuel", "seller_type", "transmission", "owner", "brand"]

dls = TabularDataLoaders.from_df(data, 
                                 cat_names=CATEGORICAL_FEATURES, 
                                 cont_names=["km_driven"],
                                 valid_idx=valid_ids,
                                 y_names="selling_price", 
                                 procs=[Categorify, FillMissing, Normalize])

In [None]:
# The Learner keyword for the loss is `loss_func`
learn = tabular_learner(dls, loss_func=mse, metrics=rmse)
learn.lr_find()

In [None]:
learn.fit_one_cycle(12, 0.01)

In [None]:
learn.fit_one_cycle(12, 0.01)

In [None]:
learn.fit_one_cycle(12, 0.001)

In [None]:
learn.fit_one_cycle(12, 0.001)

In [None]:
learn.model

## Partial Dependence Plot

The procedure to obtain a partial dependence plot is as follows: for a given feature, define a set of feature values. For each feature value, copy the input data but replace the original feature value with the surrogate. Afterwards, run inference and record the mean prediction. The PDP is simply a plot of the mean predictions (over the entire dataset) at each feature value.

For example, for continuous variables, we might evaluate the PDP on a grid of about 100 values spaced (max - min) / 100 apart.
For categorical variables, it is even easier: substitute each category in turn for the feature across the entire dataset and observe the mean prediction.
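
Stripped of the fast.ai specifics, the procedure can be sketched against any prediction function. The `partial_dependence` helper, the toy DataFrame and `toy_predict` below are hypothetical and exist only to show the mechanics.

```python
import numpy as np
import pandas as pd

def partial_dependence(predict, df, feature, grid):
    """Mean prediction over all of `df` with `feature` forced to each grid value."""
    means = {}
    for value in grid:
        modified = df.copy()
        modified[feature] = value  # overwrite the feature for every row
        means[value] = float(np.mean(predict(modified)))
    return pd.Series(means)

# Hypothetical toy data and model, for illustration only.
toy = pd.DataFrame({"a": [0.0, 1.0, 2.0, 3.0],
                    "b": [1.0, 1.0, 2.0, 2.0]})

def toy_predict(d):
    return 2.0 * d["a"].to_numpy() + d["b"].to_numpy()

grid = np.linspace(toy["a"].min(), toy["a"].max(), 4)
pdp = partial_dependence(toy_predict, toy, "a", grid)
print(pdp)
```

Because the toy model is linear in `a`, the resulting curve is a straight line with slope 2, offset by the mean of `b`.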

In [None]:
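feat = "km_driven"
min_fv = data[feat].min()
max_fv = data[feat].max()
preds_by_fv = {}
# Sweep the feature over about 100 evenly spaced values (np.arange excludes the maximum)
for fv in fastprogress.progress_bar(np.arange(min_fv, max_fv, (max_fv - min_fv) / 100)):
    modified_df = data.copy()
    modified_df[feat] = fv  # overwrite the feature for every row
    modified_dl = learn.dls.test_dl(modified_df)
    with learn.no_bar():
        preds = learn.get_preds(dl=modified_dl)
    preds_by_fv[fv] = preds[0].mean().item()  # mean prediction over the whole dataset
ax = pd.Series(preds_by_fv).plot(figsize=(16,9),lw=6)
sns.despine(ax=ax)
ax.set_title(f"Partial Dependence Plot for {feat}")
ax.set_xlabel("Feature Value")
ax.set_ylabel("Mean Prediction")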

In [None]:
@interact(cat_feat=CATEGORICAL_FEATURES)
def plot_cat_feat_pdp(cat_feat="year"):
    preds_by_fv = {}
    for fv in learn.dls.categorify[cat_feat]:
        if fv == '#na#': continue
        modified_df = data.copy()
        modified_df[cat_feat] = fv
        modified_dl = learn.dls.test_dl(modified_df)
        with learn.no_bar():  preds = learn.get_preds(dl=modified_dl)
        preds_by_fv[fv] = preds[0].mean().item()
    ax = pd.Series(preds_by_fv).sort_values().plot(kind='barh',figsize=(16,9))
    sns.despine(ax=ax)
    ax.set_title(f"Partial Dependence Plot for {cat_feat}")
    ax.set_xlabel("Mean Prediction")
    ax.set_ylabel("Feature Value")

## Permutation Feature Importance

Permutation feature importance attempts to quantify the utility of a given feature by breaking the connection between the feature and the target variable.

The process is as follows:

1. Pick a feature
2. Randomly shuffle the feature for the validation dataset
3. Measure the resulting drop in prediction quality (here, the increase in RMSE over the unpermuted baseline)

The reasoning is that permuting an important feature will cause a larger drop, because the model relies on it.
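
The steps above can be sketched with plain NumPy against any prediction function. The `permutation_importance` helper, the synthetic data and `toy_predict` are hypothetical and only illustrate the mechanics: the toy model uses column 0 and ignores column 1.

```python
import numpy as np

def rmse(pred, target):
    return float(np.sqrt(np.mean((pred - target) ** 2)))

def permutation_importance(predict, X, y, feature_idx, n_repeats=10, seed=0):
    """Mean increase in RMSE after shuffling one column of X."""
    rng = np.random.default_rng(seed)
    baseline = rmse(predict(X), y)
    increases = []
    for _ in range(n_repeats):
        X_perm = X.copy()
        # Shuffle a single column, leaving every other column untouched.
        X_perm[:, feature_idx] = rng.permutation(X_perm[:, feature_idx])
        increases.append(rmse(predict(X_perm), y) - baseline)
    return float(np.mean(increases))

# Hypothetical data: the target depends on column 0 only.
data_rng = np.random.default_rng(42)
X = data_rng.normal(size=(200, 2))

def toy_predict(X):
    return 5.0 * X[:, 0]

y = toy_predict(X)
imp_used = permutation_importance(toy_predict, X, y, feature_idx=0)
imp_ignored = permutation_importance(toy_predict, X, y, feature_idx=1)
print(imp_used, imp_ignored)
```

Shuffling the column the model ignores leaves the error unchanged, while shuffling the one it relies on raises it.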

In [None]:
def predict_with_ds(ds):
    # Run inference on a DataFrame and return the tuple produced by get_preds
    dl = learn.dls.test_dl(ds)
    with learn.no_bar():
        return learn.get_preds(dl=dl)

# Note: this shadows the fastai `rmse` metric imported earlier
def rmse(preds, targets):
    return math.sqrt(((preds - targets)**2).mean().item())

In [None]:
# Baseline error on the untouched validation set
error_before = rmse(*predict_with_ds(valid_ds))
error_before

In [None]:
def calculate_permutation_loss(feat):
    permuted = valid_ds.copy()
    # Shuffle one column to break its link with the target
    permuted[feat] = np.random.permutation(permuted[feat])
    error_after = rmse(*predict_with_ds(permuted))
    return error_after - error_before

calculate_permutation_loss("km_driven")

Since randomness is involved, it helps to repeat the experiment a number of times and take the mean.

Let's calculate the importance for each feature and analyze the results:

In [None]:
permutation_losses = {}
for feat in ["km_driven"] + CATEGORICAL_FEATURES:
    # repeat the experiment 10 times and take the mean
    feature_losses = np.mean([calculate_permutation_loss(feat) for _ in range(10)])
    permutation_losses[feat] = feature_losses
ax = pd.Series(permutation_losses).sort_values().plot(figsize=(16,9), kind='barh')
sns.despine(ax=ax)
ax.set_title("Permutation Feature Importance")
ax.set_xlabel("Increase in RMSE");