# Explainer examples
**Introduction**

In this notebook we use the Pima Indians Diabetes dataset from the National Institute of Diabetes and Digestive and Kidney Diseases.

The task is to predict whether a person has diabetes, based on other available measurements such as body mass index and insulin level.

This notebook shows how you can use the `Explainer` object for interactive visualization in your Jupyter notebook.

Another interesting question is which parameter diabetes depends on most.

All this plotting functionality gets called by the `ExplainerDashboard` to construct the interactive dashboard.

# Google colab link:

[https://colab.research.google.com/github/oegedijk/explainerdashboard/blob/master/explainer_examples.ipynb](https://colab.research.google.com/github/oegedijk/explainerdashboard/blob/master/explainer_examples.ipynb)

# notebook properties

Display multiple outputs per cell:

In [None]:
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

# ClassifierExplainer:

## train model

In [None]:
from sklearn.ensemble import RandomForestClassifier
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split

In [None]:
from google.colab import files

uploaded = files.upload()

for fn in uploaded.keys():
  print('User uploaded file "{name}" with length {length} bytes'.format(
      name=fn, length=len(uploaded[fn])))

MessageError: ignored

In [None]:
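# Load the uploaded dataset (expects diabetes.csv in the working directory)
diabetes = pd.read_csv('diabetes.csv')

# Features are every column except the binary target "Outcome"
X = diabetes.drop(["Outcome"], axis=1)
Y = diabetes["Outcome"]

# Hold out 10% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.1, random_state=0)

# Shallow random forest; no random_state is set, so results can vary between runs
model = RandomForestClassifier(n_estimators=50, max_depth=5)
model.fit(X_train, y_train)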
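
The introduction asks which parameter diabetes depends on most. One rough first answer is the impurity-based feature importance that a fitted random forest exposes. The sketch below is self-contained and therefore uses synthetic data from `make_classification`; the column names are borrowed from the dataset only as labels, and the resulting ranking says nothing about real diabetes risk factors.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Column names borrowed from the dataset purely as labels; the values are synthetic.
feature_names = ["Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
                 "Insulin", "BMI", "DiabetesPedigreeFunction", "Age"]

# Synthetic stand-in data (assumption: used only so the example runs on its own).
X_demo, y_demo = make_classification(n_samples=300, n_features=8,
                                     n_informative=4, random_state=0)
X_demo = pd.DataFrame(X_demo, columns=feature_names)

demo_model = RandomForestClassifier(n_estimators=50, max_depth=5, random_state=0)
demo_model.fit(X_demo, y_demo)

# Impurity-based importances sum to 1; sort them from most to least important.
importances = pd.Series(demo_model.feature_importances_, index=feature_names)
importances = importances.sort_values(ascending=False)
print(importances)
```

In the notebook itself the same idea would apply to the fitted `model` and the columns of `X_train`. Impurity-based importances can be biased toward features with many distinct values, so treat the ranking as a starting point rather than a conclusion.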

In [None]:
Y.value_counts()

0    500
1    268
Name: Outcome, dtype: int64
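
The counts above mean roughly 65% of the rows are negative and 35% positive. The notebook's split does not stratify on the label, so the small test set may drift from that ratio. The sketch below shows one option, the `stratify` argument of `train_test_split`, on placeholder data that only reproduces the class counts; the feature column is a dummy.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Labels with the same class counts as the output above: 500 zeros, 268 ones.
y_all = np.array([0] * 500 + [1] * 268)
# Placeholder single-column features (assumption: stand-in for the real columns).
X_all = np.arange(len(y_all)).reshape(-1, 1)

# stratify=y_all keeps the class ratio close to the full data in both parts.
_, _, y_tr, y_te = train_test_split(
    X_all, y_all, test_size=0.1, random_state=0, stratify=y_all
)

print("positive share, all data:", y_all.mean())
print("positive share, test set:", y_te.mean())
```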

## <font color="green">**Diverse Counterfactual Explanations for Machine Learning**</font>

In [None]:
!pip install dice_ml

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting dice_ml
  Downloading dice_ml-0.9-py3-none-any.whl (2.6 MB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m2.6/2.6 MB[0m [31m36.2 MB/s[0m eta [36m0:00:00[0m
Installing collected packages: dice_ml
Successfully installed dice_ml-0.9


In [None]:
# Display the DataFrame (calling diabetes.describe without parentheses only prints the bound method)
diabetes

     Pregnancies  Glucose  BloodPressure  SkinThickness  Insulin   BMI  \
0              6      148             72             35        0  33.6   
1              1       85             66             29        0  26.6   
2              8      183             64              0        0  23.3   
3              1       89             66             23       94  28.1   
4              0      137             40             35      168  43.1   
..           ...      ...            ...            ...      ...   ...   
763           10      101             76             48      180  32.9   
764            2      122             70             27        0  36.8   
765            5      121             72             23      112  26.2   
766            1      126             60              0        0  30.1   
767            1       93             70             31        0  30.4   

     DiabetesPedigreeFunction  Age  Outcome  
0                       0.627  

In [None]:
import dice_ml
# Dataset
data_dice = dice_ml.Data(dataframe=diabetes,
                         # For perturbation strategy
                         continuous_features=['Pregnancies', 'Glucose', 'BloodPressure', 'SkinThickness',
                                              'Insulin', 'BMI', 'DiabetesPedigreeFunction', 'Age'],
                         outcome_name='Outcome')


In [None]:
# Model
rf_dice = dice_ml.Model(model=model,
                        # There exist backends for tf, torch, ...
                        backend="sklearn")
explainer = dice_ml.Dice(data_dice,
                         rf_dice,
                         # Random sampling, genetic algorithm, kd-tree,...
                         method="random")


In [None]:
# %% Create explanation
# Generate CF based on the blackbox model
input_datapoint = X_test[20:21]
cf = explainer.generate_counterfactuals(input_datapoint,
                                  total_CFs=10,
                                  desired_class="opposite")
# Visualize it
cf.visualize_as_dataframe(show_only_changes=True)


100%|██████████| 1/1 [00:01<00:00,  1.69s/it]

Query instance (original outcome : 1)

| | Pregnancies | Glucose | BloodPressure | SkinThickness | Insulin | BMI | DiabetesPedigreeFunction | Age | Outcome |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 4 | 144 | 82 | 32 | 0 | 38.5 | 0.554 | 37 | 1 |

Diverse Counterfactual set (new outcome: 0.0)

| | Pregnancies | Glucose | BloodPressure | SkinThickness | Insulin | BMI | DiabetesPedigreeFunction | Age | Outcome |
|---|---|---|---|---|---|---|---|---|---|
| 0 | - | 78.0 | - | - | - | 39.00000000000001 | 0.554 | 55.0 | 0.0 |
| 1 | - | 55.0 | - | - | 339.0 | 39.00000000000001 | 0.554 | - | 0.0 |
| 2 | - | - | - | - | - | 24.0 | 0.554 | - | 0.0 |
| 3 | - | 94.0 | - | - | - | 14.9 | 0.554 | - | 0.0 |
| 4 | - | 46.0 | - | - | 53.0 | 39.00000000000001 | 0.554 | - | 0.0 |
| 5 | - | 40.0 | - | - | - | 39.00000000000001 | 0.554 | - | 0.0 |
| 6 | - | - | - | - | - | 39.00000000000001 | 0.554 | 24.0 | 0.0 |
| 7 | - | 44.0 | - | - | - | 39.00000000000001 | 2.178 | - | 0.0 |
| 8 | - | - | - | - | - | 2.7 | 0.554 | - | 0.0 |
| 9 | - | - | - | - | - | 29.5 | 0.554 | - | 0.0 |
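
Several of the unconstrained counterfactuals above are hard to act on, for example a BMI of 2.7 or a glucose value of 40. A simple post-hoc check is to keep only the rows that fall inside bounds you consider plausible. The sketch below transcribes the Glucose and BMI columns from the table, filling "-" cells with the query's glucose value of 144 and writing the displayed BMI of 39.00000000000001 as 39.0; the bounds themselves are illustrative assumptions, not clinical guidance.

```python
import pandas as pd

# Glucose and BMI of the ten counterfactuals, transcribed from the table above.
# "-" (unchanged) Glucose cells are filled with the query value 144.
counterfactuals = pd.DataFrame({
    "Glucose": [78, 55, 144, 94, 46, 40, 144, 44, 144, 144],
    "BMI": [39.0, 39.0, 24.0, 14.9, 39.0, 39.0, 39.0, 39.0, 2.7, 29.5],
})

# Illustrative plausibility bounds (assumption, not clinical guidance).
plausible_bounds = {"Glucose": (50, 190), "BMI": (18, 40)}

# Keep a row only if every bounded feature lies inside its interval (inclusive).
mask = pd.Series(True, index=counterfactuals.index)
for column, (low, high) in plausible_bounds.items():
    mask &= counterfactuals[column].between(low, high)

plausible = counterfactuals[mask]
print(plausible)
```

With these bounds, half of the ten counterfactuals are discarded. In DiCE the generated rows are usually reachable as a DataFrame via `cf.cf_examples_list[0].final_cfs_df`, so the same filter could in principle be applied there; check that attribute against your installed version.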


In [None]:
diabetes[['Glucose', 'BMI']].describe()

| | Glucose | BMI |
|---|---|---|
| count | 768.0 | 768.0 |
| mean | 120.894531 | 31.992578 |
| std | 31.972618 | 7.88416 |
| min | 0.0 | 0.0 |
| 25% | 99.0 | 27.3 |
| 50% | 117.0 | 32.0 |
| 75% | 140.25 | 36.6 |
| max | 199.0 | 67.1 |
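
The summary above reports a minimum of 0.0 for both Glucose and BMI, which is not a plausible measurement and most likely marks missing values. One way to choose bounds for counterfactuals is to ignore those zeros and take percentiles of the remaining values. The sketch below does this on a small hypothetical sample; treating zero as missing and the choice of the 5th and 95th percentiles are both assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature sample; in the notebook this would be the `diabetes` frame.
sample = pd.DataFrame({
    "Glucose": [0, 85, 99, 117, 140, 183, 199],
    "BMI": [0.0, 23.3, 27.3, 32.0, 36.6, 43.1, 67.1],
})

# Assumption: a value of 0 encodes "not measured", so mask it as missing.
cleaned = sample.replace(0, np.nan)

# Use the 5th and 95th percentiles of the remaining values as bounds.
bounds = cleaned.quantile([0.05, 0.95])
suggested_range = {
    column: [float(bounds.loc[0.05, column]), float(bounds.loc[0.95, column])]
    for column in cleaned.columns
}
print(suggested_range)
```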


In [None]:
# %% Create feasible (conditional) Counterfactuals
features_to_vary=['Glucose',
                  'BMI']
permitted_range={'Glucose':[50,190],
                'BMI':[18, 40]}
# Generate counterfactuals that change only these features, within the permitted ranges
cf = explainer.generate_counterfactuals(input_datapoint,
                                  total_CFs=10,
                                  desired_class="opposite",
                                  permitted_range=permitted_range,
                                  features_to_vary=features_to_vary)
# Visualize it
cf.visualize_as_dataframe(show_only_changes=True)


100%|██████████| 1/1 [00:00<00:00,  1.09it/s]

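Query instance (original outcome : 1)

| | Pregnancies | Glucose | BloodPressure | SkinThickness | Insulin | BMI | DiabetesPedigreeFunction | Age | Outcome |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 4 | 144 | 82 | 32 | 0 | 38.5 | 0.554 | 37 | 1 |

Diverse Counterfactual set (new outcome: 0.0)

| | Pregnancies | Glucose | BloodPressure | SkinThickness | Insulin | BMI | DiabetesPedigreeFunction | Age | Outcome |
|---|---|---|---|---|---|---|---|---|---|
| 0 | - | 136.0 | - | - | - | 18.1 | 0.554 | - | 0.0 |
| 1 | - | 89.0 | - | - | - | 31.1 | 0.554 | - | 0.0 |
| 2 | - | - | - | - | - | 24.5 | 0.554 | - | 0.0 |
| 3 | - | 104.0 | - | - | - | 32.6 | 0.554 | - | 0.0 |
| 4 | - | 84.0 | - | - | - | 35.0 | 0.554 | - | 0.0 |
| 5 | - | - | - | - | - | 18.6 | 0.554 | - | 0.0 |
| 6 | - | - | - | - | - | 24.0 | 0.554 | - | 0.0 |
| 7 | - | - | - | - | - | 27.9 | 0.554 | - | 0.0 |
| 8 | - | - | - | - | - | 21.5 | 0.554 | - | 0.0 |
| 9 | - | 161.0 | - | - | - | 18.9 | 0.554 | - | 0.0 |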
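
To build intuition for what `features_to_vary` and `permitted_range` do, the sketch below runs a hand-rolled random search: it samples candidates only for the allowed features, only inside the allowed ranges, and keeps those whose predicted class differs from the query's. This is a simplified illustration and not DiCE's actual algorithm; the training data, the labeling rule, and names such as `toy_model` are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in data (not the diabetes dataset) with a simple labeling rule.
train = pd.DataFrame({
    "Glucose": rng.uniform(50, 200, size=500),
    "BMI": rng.uniform(15, 50, size=500),
})
labels = ((train["Glucose"] > 120) & (train["BMI"] > 28)).astype(int)
toy_model = RandomForestClassifier(n_estimators=50, max_depth=5, random_state=0)
toy_model.fit(train, labels)

# Query point using the Glucose and BMI values of the instance in the notebook.
query = pd.DataFrame({"Glucose": [144.0], "BMI": [38.5]})
original_class = int(toy_model.predict(query)[0])

# Only these features may change, and only within these bounds.
permitted_range = {"Glucose": [50, 190], "BMI": [18, 40]}

# Sample candidates uniformly inside the permitted ranges.
candidates = pd.DataFrame({
    name: rng.uniform(low, high, size=1000)
    for name, (low, high) in permitted_range.items()
})

# Keep the candidates whose predicted class differs from the query's.
flipped = candidates[toy_model.predict(candidates) != original_class]

# Rank by range-scaled L1 distance to the query and keep the ten closest.
widths = pd.Series({n: high - low for n, (low, high) in permitted_range.items()})
distance = ((flipped - query.iloc[0]).abs() / widths).sum(axis=1)
toy_counterfactuals = flipped.loc[distance.sort_values().index[:10]]
print(toy_counterfactuals)
```

Sorting by distance favors proximity only. DiCE additionally aims for diversity among the returned counterfactuals, which this sketch does not attempt.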
