<a href="https://colab.research.google.com/github/VondracekS/ExplainabilityExchange/blob/master/Colab_ExplainerDashboard_DTSEKnowledgeExchange.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<p>
<center><img width="400" height="100" src="https://github.com/VondracekS/ExplainabilityExchange/blob/master/imgs/ais.png?raw=1" class="imagedim">
</p>

# Explainer Dashoards Demo Notebook
*[Stepan Vondracek](https://people.telekom.de/businesscard?wr=200083284)*





This is a demo notebook meant to accompany a presentation given within the AIS' Knowledge exchange.
The capabilities of the explainer dashboards will be demonstrated using the
[Titanic dataset](https://www.kaggle.com/c/titanic) aka the Wonderwall of data science. However, I believe that given
the more general audience, the AI/ML experts will pardon me (and the public will appreciate the understandability
of the data set.

<p>
<center><img width="300" height="300" src="https://github.com/VondracekS/ExplainabilityExchange/blob/master/imgs/wonder.png?raw=1" class="imagedim">
</p>


## 1 Intro

In [18]:
!git clone https://github.com/VondracekS/ExplainabilityExchange.git

Cloning into 'ExplainabilityExchange'...
remote: Enumerating objects: 25, done.[K
remote: Counting objects: 100% (25/25), done.[K
remote: Compressing objects: 100% (22/22), done.[K
remote: Total 25 (delta 0), reused 25 (delta 0), pack-reused 0[K
Unpacking objects: 100% (25/25), done.


In [3]:
%cd ExplainabilityExchange/

/content/ExplainabilityExchange


In [19]:
# Run if the requirements are not satisfied
!pip install -r requirements.txt

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


In [20]:
# imports
import pandas as pd

In this notebook, I use the kaggle titanic data set. The test data does not contain the actual outcome, hence I will just split the train data set.

In [21]:
titanic_data = pd.read_csv("./data/titanic_train.csv")

In [22]:
# Create the new train/test split as kaggle test set does not contain target variable
from sklearn.model_selection import train_test_split

data = {'train': (train_test_split(titanic_data, test_size=0.2))[0],
        'test': (train_test_split(titanic_data, test_size=0.2))[1]}

This notebook works just for the showcase of Explainer Dashboards, hence I will not perform any extensive
feature engineering. I will just convert the *sex* and *passenger class* variables to dummies and drop the
nominal variables and then drop all rows affected by missing observations

In [23]:
for name, tbl in data.items():
    data[name] = (pd.get_dummies(tbl, columns=['Sex', 'Pclass'], drop_first=True)
                 .drop(['Ticket', 'Cabin', 'Embarked', 'Name'], axis=1)
                 .set_index('PassengerId')).dropna()


For the purposes of this showcase, I will use only two models. The first is just a GLM using the logit link function, the second (to demonstrate
the capabilities of SHAP with more complex models) will be Random Forrest which I have previously tuned for
the particular specification.

In [24]:
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

X = data['train'].drop('Survived', axis=1)
y = data['train']['Survived']

models_dict = {
    'logit': LogisticRegression(fit_intercept=False),
    'random_forrest': RandomForestClassifier(criterion='gini',
                                             n_estimators=700,
                                             min_samples_split=10,
                                             min_samples_leaf=3,
                                             max_features='auto',
                                             oob_score=True,
                                             random_state=1,
                                             n_jobs=-1)
}

The following cell is used just to create a regression table for the presentation

In [26]:
import statsmodels.api as sm

X_sm = sm.add_constant(X)
sm_mod = sm.Logit(y, X_sm).fit()

print(sm_mod.summary())

Optimization terminated successfully.
         Current function value: 0.443225
         Iterations 6
                           Logit Regression Results                           
Dep. Variable:               Survived   No. Observations:                  566
Model:                          Logit   Df Residuals:                      558
Method:                           MLE   Df Model:                            7
Date:                Mon, 28 Nov 2022   Pseudo R-squ.:                  0.3445
Time:                        19:26:25   Log-Likelihood:                -250.87
converged:                       True   LL-Null:                       -382.71
Covariance Type:            nonrobust   LLR p-value:                 3.362e-53
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const          4.3515      0.576      7.558      0.000       3.223       5.480
Age           -0.0420      0.


In a future version of pandas all arguments of concat except for the argument 'objs' will be keyword-only



The following cells will just create, fit and predict using the particular specifications.

In [27]:
%%capture
#  Add features as an attribute, so I can easily later easily use them as an argument
models_fit = {}
for mod, specs in models_dict.items():
    models_fit[mod] = specs.fit(X, y)
    specs.features = list(X.columns)

In [28]:
# Get predictions on the test data
models_pred = {}
for mod, fit in models_fit.items():
    models_pred[mod] = fit.predict(data['test'][fit.features])

In [29]:
# Get MAE of both models
from sklearn.metrics import mean_absolute_error as mae
models_mae = {}
for mod, pred in models_pred.items():
    models_mae[mod] = mae(pred, data['test']['Survived'])

for k, v in models_mae.items():
    print(f"Model {k} has a MAE value of: {v:.2f}")


Model logit has a MAE value of: 0.19
Model random_forrest has a MAE value of: 0.10


#  2 Explainer

Now it's time to demonstrate the capabilities of  Explainer Dashboards

In [13]:
from explainerdashboard.explainers import ClassifierExplainer


The dash_html_components package is deprecated. Please replace
`import dash_html_components as html` with `from dash import html`

The dash_core_components package is deprecated. Please replace
`import dash_core_components as dcc` with `from dash import dcc`


In [30]:
%%capture
explainers = {}

for model, specs in models_dict.items():
    explainers[model] = ClassifierExplainer(model=specs,
                                            X=data['test'][models_fit[mod].features],
                                            y=data['test']['Survived'],
                                            model_output='probability',
                                            index_name="Passenger ID"
                                            )

In [32]:
from explainerdashboard import ExplainerDashboard
ExplainerDashboard(explainers['random_forrest']).run()

Building ExplainerDashboard..
Detected google colab environment, setting mode='external'
Generating layout...
Calculating dependencies...
Reminder: you can store the explainer (including calculated dependencies) with explainer.dump('explainer.joblib') and reload with e.g. ClassifierExplainer.from_file('explainer.joblib')
Registering callbacks...
Starting ExplainerDashboard on http://172.28.0.2:8050
You can terminate the dashboard with ExplainerDashboard.terminate(8050)
Dash app running on:


<IPython.core.display.Javascript object>

## 3 Conclusion

This very simple demo notebook was meant to briefly demonstrate the easy-to-use, yet capable Python library. Feel free
to contact me as I will surely explore their capabilities more in depth.