<p>
<center><img width="400" height="100" src="./imgs/ais.png" class="imagedim">
</p>

# Explainer Dashoards Demo Notebook
*[Stepan Vondracek](https://people.telekom.de/businesscard?wr=200083284)*





This is a demo notebook meant to accompany a presentation given within the AIS' Knowledge exchange.
The capabilities of the explainer dashboards will be demonstrated using the
[Titanic dataset](https://www.kaggle.com/c/titanic) aka the Wonderwall of data science. However, I believe that given
the more general audience, the AI/ML experts will pardon me (and the public will appreciate the understandability
of the data set.

<p>
<center><img width="300" height="300" src="./imgs/wonder.png" class="imagedim">
</p>


## 1 Intro

In [None]:
# Run if the requirements are not satisfied
# !pip install -r requirements.txt

In [20]:
# imports
import pandas as pd

In this notebook, I use the kaggle titanic data set. The test data does not contain the actual outcome, hence I will just split the train data set.

In [21]:
titanic_data = pd.read_csv("./data/titanic_train.csv")

In [14]:
# Create the new train/test split as kaggle test set does not contain target variable
from sklearn.model_selection import train_test_split

data = {'train': (train_test_split(titanic_data, test_size=0.2))[0],
        'test': (train_test_split(titanic_data, test_size=0.2))[1]}

This notebook works just for the showcase of Explainer Dashboards, hence I will not perform any extensive
feature engineering. I will just convert the *sex* and *passenger class* variables to dummies and drop the
nominal variables and then drop all rows affected by missing observations

In [15]:
for name, tbl in data.items():
    data[name] = (pd.get_dummies(tbl, columns=['Sex', 'Pclass'], drop_first=True)
                 .drop(['Ticket', 'Cabin', 'Embarked', 'Name'], axis=1)
                 .set_index('PassengerId')).dropna()


For the purposes of this showcase, I will use only two models. The first is just a GLM using the logit link function, the second (to demonstrate
the capabilities of SHAP with more complex models) will be Random Forrest which I have previously tuned for
the particular specification.

In [30]:
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

X = data['train'].drop('Survived', axis=1)
y = data['train']['Survived']

models_dict = {
    'logit': LogisticRegression(fit_intercept=False),
    'random_forrest': RandomForestClassifier(criterion='gini',
                                             n_estimators=700,
                                             min_samples_split=10,
                                             min_samples_leaf=3,
                                             max_features='auto',
                                             oob_score=True,
                                             random_state=1,
                                             n_jobs=-1)
}

The following cells will just create, fit and predict using the particular specifications.

In [31]:
%%capture
#  Add features as an attribute, so I can easily later easily use them as an argument
models_fit = {}
for mod, specs in models_dict.items():
    models_fit[mod] = specs.fit(X, y)
    specs.features = list(X.columns)

In [32]:
# Get predictions on the test data
models_pred = {}
for mod, fit in models_fit.items():
    models_pred[mod] = fit.predict(data['test'][fit.features])

In [33]:
# Get MAE of both models
from sklearn.metrics import mean_absolute_error as mae
models_mae = {}
for mod, pred in models_pred.items():
    models_mae[mod] = mae(pred, data['test']['Survived'])

for k, v in models_mae.items():
    print(f"Model {k} has a MAE value of: {v:.2f}")


Model logit has a MAE value of: 0.22
Model random_forrest has a MAE value of: 0.09


#  2 Explainer

Now it's time to demonstrate the capabilities of  Explainer Dashboards

In [35]:
from explainerdashboard.explainers import ClassifierExplainer

In [36]:
%%capture
explainers = {}

for model, specs in models_dict.items():
    explainers[model] = ClassifierExplainer(model=specs,
                                            X=data['test'][models_fit[mod].features],
                                            y=data['test']['Survived'],
                                            model_output='probability',
                                            index_name="Passenger ID"
                                            )

In [None]:
from explainerdashboard import ExplainerDashboard
ExplainerDashboard(explainers['random_forrest']).run(mode='external')

Building ExplainerDashboard..
Detected notebook environment, consider setting mode='external', mode='inline' or mode='jupyterlab' to keep the notebook interactive while the dashboard is running...
Generating layout...
Calculating shap values...
Calculating prediction probabilities...
Calculating metrics...
Calculating confusion matrices...
Calculating classification_dfs...
Calculating roc auc curves...
Calculating pr auc curves...
Calculating liftcurve_dfs...



In a future version, `df.iloc[:, i] = newvals` will attempt to set the values inplace instead of always setting a new array. To retain the old behavior, use either `df[df.columns[i]] = newvals` or, if columns are non-unique, `df.isetitem(i, newvals)`


In a future version, `df.iloc[:, i] = newvals` will attempt to set the values inplace instead of always setting a new array. To retain the old behavior, use either `df[df.columns[i]] = newvals` or, if columns are non-unique, `df.isetitem(i, newvals)`


In a future version, `df.iloc[:, i] = newvals` will attempt to set the values inplace instead of always setting a new array. To retain the old behavior, use either `df[df.columns[i]] = newvals` or, if columns are non-unique, `df.isetitem(i, newvals)`


In a future version, `df.iloc[:, i] = newvals` will attempt to set the values inplace instead of always setting a new array. To retain the old behavior, use either `df[df.columns[i]] = newvals` or, if columns are non-unique, `df.isetitem(i,

Calculating shap interaction values... (this may take a while)
Reminder: TreeShap computational complexity is O(TLD^2), where T is the number of trees, L is the maximum number of leaves in any tree and D the maximal depth of any tree. So reducing these will speed up the calculation.
Calculating dependencies...
Calculating permutation importances (if slow, try setting n_jobs parameter)...
Calculating predictions...
Calculating pred_percentiles...
Calculating ShadowDecTree for each individual decision tree...
Reminder: you can store the explainer (including calculated dependencies) with explainer.dump('explainer.joblib') and reload with e.g. ClassifierExplainer.from_file('explainer.joblib')
Registering callbacks...
Building ExplainerDashboard..
Generating layout...



In a future version, `df.iloc[:, i] = newvals` will attempt to set the values inplace instead of always setting a new array. To retain the old behavior, use either `df[df.columns[i]] = newvals` or, if columns are non-unique, `df.isetitem(i, newvals)`


In a future version, `df.iloc[:, i] = newvals` will attempt to set the values inplace instead of always setting a new array. To retain the old behavior, use either `df[df.columns[i]] = newvals` or, if columns are non-unique, `df.isetitem(i, newvals)`


In a future version, `df.iloc[:, i] = newvals` will attempt to set the values inplace instead of always setting a new array. To retain the old behavior, use either `df[df.columns[i]] = newvals` or, if columns are non-unique, `df.isetitem(i, newvals)`


In a future version, `df.iloc[:, i] = newvals` will attempt to set the values inplace instead of always setting a new array. To retain the old behavior, use either `df[df.columns[i]] = newvals` or, if columns are non-unique, `df.isetitem(i,

Calculating dependencies...
Reminder: you can store the explainer (including calculated dependencies) with explainer.dump('explainer.joblib') and reload with e.g. ClassifierExplainer.from_file('explainer.joblib')
Registering callbacks...
Starting ExplainerDashboard on http://10.180.109.80:8050
You can terminate the dashboard with ExplainerDashboard.terminate(8050)


## 3 Conclusion

This very simple demo notebook was meant to briefly demonstrate the easy-to-use, yet capable Python library. Feel free
to contact me as I will surely explore their capabilities more in depth.