<a href="https://colab.research.google.com/github/VondracekS/ExplainabilityExchange/blob/master/ExplainerDashboardDemo.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Explainer Dashboard Demo

This notebook demonstrates how easy it is to use [Explainer Dashboards](https://explainerdashboard.readthedocs.io/en/latest/). It works with the well-known [miles-per-gallon](https://data.world/dataman-udit/cars-data) and [penguins](https://github.com/allisonhorst/palmerpenguins) data sets.

In [289]:
# first, install the explainerdashboard library
!pip install explainerdashboard

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


To keep the demo short, I skip any extensive EDA and apply only the very basic transformations needed to feed the data to our models. Should you want to play around with the data sets a little more, feel free to do so.

In [224]:
# load the data
from seaborn import load_dataset
data_cls = load_dataset("penguins")
data_reg = load_dataset("mpg").drop(['name', 'origin'], axis=1)

In [271]:
# train-test split
from sklearn.model_selection import train_test_split
X_reg_train, X_reg_test, y_reg_train, y_reg_test = train_test_split(data_reg.drop('mpg', axis=1), data_reg[['mpg']])
X_cls_train, X_cls_test, y_cls_train, y_cls_test = train_test_split(data_cls.drop('species', axis=1), data_cls[['species']])

In [318]:
# the very basic data transformations
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.impute import SimpleImputer
from sklearn.compose import ColumnTransformer

def transformation_pipeline(data: pd.DataFrame) -> pd.DataFrame:
  """
  Sklearn-based preprocessing pipeline to be used for both of our data sets.

  Numeric columns are median-imputed and standardized; all other columns
  are imputed with their most frequent value and one-hot encoded.
  """
  num_features = data.select_dtypes(include='number').columns.tolist()
  cat_features = data.select_dtypes(exclude='number').columns.tolist()

  numeric_transformer = Pipeline(
      steps=[
          ('imputer', SimpleImputer(strategy='median')),
          ('scaler', StandardScaler())]
  )
  categorical_transformer = Pipeline(
      steps=[
          # the median is undefined for strings, so impute the mode instead
          ('imputer', SimpleImputer(strategy='most_frequent')),
          ('encoder', OneHotEncoder())]
  )
  preprocessor = ColumnTransformer(
      transformers=[
          ('numeric', numeric_transformer, num_features),
          ('categorical', categorical_transformer, cat_features)
      ],
      sparse_threshold=0  # always return a dense array for the DataFrame
  )
  return pd.DataFrame(preprocessor.fit_transform(data),
                      columns=preprocessor.get_feature_names_out(),
                      index=data.index)

In [319]:
from typing import Any
from explainerdashboard import (ClassifierExplainer, RegressionExplainer,
                                ExplainerDashboard)
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

def run_explainer_dashboard(task: str, data: pd.DataFrame, 
                            target_var: str, model: Any, 
                            port: int = 8050, mode: str = 'inline') -> None:
  """
  Preprocess the features, fit the model and launch an ExplainerDashboard.
  """
  # Preprocess the features once, before splitting, so that the training and
  # test sets share the same columns and scaling. This is a demo shortcut:
  # for an honest evaluation, fit the preprocessing on the training set only.
  X = transformation_pipeline(data.drop(target_var, axis=1))
  X.index = data.index

  # Keep the target as an untransformed 1-D Series.
  if task == 'classification':
    # encode class names as integer codes and keep the names for display
    codes, labels = pd.factorize(data[target_var], sort=True)
    y = pd.Series(codes, index=data.index, name=target_var)
  else:
    y = data[target_var]

  X_train, X_test, y_train, y_test = train_test_split(X, y)
  model.fit(X_train, y_train)

  if task == 'classification':
    explainer = ClassifierExplainer(model, X_test, y_test, labels=list(labels))
  else:
    explainer = RegressionExplainer(model, X_test, y_test)

  ExplainerDashboard(explainer, mode=mode).run(port=port)
  print("Explainer dashboard running!")

In [320]:
run_explainer_dashboard('regression', data_reg, 'mpg', RandomForestRegressor())



Changing class type to RandomForestRegressionExplainer...
Generating self.shap_explainer = shap.TreeExplainer(model)
Building ExplainerDashboard..
Generating layout...
Calculating shap values...
Calculating predictions...
Calculating residuals...
Calculating absolute residuals...
Calculating shap interaction values...
Reminder: TreeShap computational complexity is O(TLD^2), where T is the number of trees, L is the maximum number of leaves in any tree and D the maximal depth of any tree. So reducing these will speed up the calculation.
Calculating dependencies...
Calculating importances...
Calculating ShadowDecTree for each individual decision tree...
Reminder: you can store the explainer (including calculated dependencies) with explainer.dump('explainer.joblib') and reload with e.g. ClassifierExplainer.from_file('explainer.joblib')
Registering callbacks...
Starting ExplainerDashboard inline (terminate it with ExplainerDashboard.terminate(8050))
Dash is running on http://127.0.0.1:8050/





Explainer dashboard running!


In [313]:
from explainerdashboard.datasets import titanic_embarked, feature_descriptions

X_train, y_train, X_test, y_test = titanic_embarked()
model = RandomForestClassifier(n_estimators=50, max_depth=10).fit(X_train, y_train)

explainer = ClassifierExplainer(model, X_test, y_test, 
                                cats=['Sex', 'Deck'], 
                                descriptions=feature_descriptions,
                                labels=['Queenstown', 'Southampton', 'Cherbourg'])

ExplainerDashboard(explainer, mode='inline').run(port=8049)

Detected RandomForestClassifier model: Changing class type to RandomForestClassifierExplainer...
Note: model_output=='probability', so assuming that raw shap output of RandomForestClassifier is in probability space...
Generating self.shap_explainer = shap.TreeExplainer(model)
Building ExplainerDashboard..
Generating layout...
Calculating shap values...
Calculating prediction probabilities...
Calculating metrics...
Calculating confusion matrices...
Calculating classification_dfs...
Calculating roc auc curves...
Calculating pr auc curves...
Calculating liftcurve_dfs...
Calculating shap interaction values... (this may take a while)
Reminder: TreeShap computational complexity is O(TLD^2), where T is the number of trees, L is the maximum number of leaves in any tree and D the maximal depth of any tree. So reducing these will speed up the calculation.
Calculating dependencies...
Calculating permutation importances (if slow, try setting n_jobs parameter)...
Calculating pred_percentiles...
Cal

INFO:dash.dash:Dash is running on http://127.0.0.1:8049/




Unfortunately, the inline dashboard output does not render completely in this notebook. To see the full output, please visit the [hosted Titanic example](http://titanicexplainer.herokuapp.com/multiclass/).