In [2]:
pip install dice-ml

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting dice-ml
  Downloading dice_ml-0.9-py3-none-any.whl (2.6 MB)
Installing collected packages: dice-ml
Successfully installed dice-ml-0.9


In [3]:
import pandas as pd
import dice_ml
from dice_ml.utils import helpers  

In [4]:
%load_ext autoreload
%autoreload 2

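The features parameter must list the features in the same order that was used to train the ML model: pass an OrderedDict, or a plain dict, which preserves insertion order in Python 3.7 and later. Continuous features are described by their [min, max] range and categorical features by their list of categories.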

In [5]:
d = dice_ml.Data(features={'age': [17, 90],
                           'workclass': ['Government', 'Other/Unknown', 'Private', 'Self-Employed'],
                           'education': ['Assoc', 'Bachelors', 'Doctorate', 'HS-grad', 'Masters',
                                         'Prof-school', 'School', 'Some-college'],
                           'marital_status': ['Divorced', 'Married', 'Separated', 'Single', 'Widowed'],
                           'occupation': ['Blue-Collar', 'Other/Unknown', 'Professional', 'Sales', 'Service', 'White-Collar'],
                           'race': ['Other', 'White'],
                           'gender': ['Female', 'Male'],
                           'hours_per_week': [1, 99]},
                 outcome_name='income')
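Because only metadata is supplied, a quick sanity check of a query against that metadata can catch ordering or range mistakes early. The sketch below is a plain-Python illustration and not part of dice-ml: `check_query` and `FEATURES` are hypothetical names, only a subset of the features is used for brevity, and the convention it assumes (a list of two numbers is a continuous range, a list of strings is a category set) simply mirrors the cell above.

```python
# Hypothetical helper, not a dice-ml API: validates a query against feature metadata.
FEATURES = {
    'age': [17, 90],
    'workclass': ['Government', 'Other/Unknown', 'Private', 'Self-Employed'],
    'gender': ['Female', 'Male'],
    'hours_per_week': [1, 99],
}


def check_query(features, query):
    """Return a list of problems found in `query` (an empty list means OK)."""
    problems = []
    # Feature order must match the order used to train the model.
    if list(query) != list(features):
        problems.append(f"feature order {list(query)} != {list(features)}")
    for name, spec in features.items():
        if name not in query:
            problems.append(f"missing feature: {name}")
            continue
        value = query[name]
        if all(isinstance(v, (int, float)) for v in spec):
            low, high = spec  # continuous feature: [min, max]
            if not isinstance(value, (int, float)) or not low <= value <= high:
                problems.append(f"{name}={value} outside [{low}, {high}]")
        elif value not in spec:  # categorical feature: allowed categories
            problems.append(f"{name}={value!r} not in {spec}")
    return problems


good = {'age': 22, 'workclass': 'Private', 'gender': 'Female', 'hours_per_week': 45}
bad = {'age': 12, 'workclass': 'Private', 'gender': 'Female', 'hours_per_week': 45}
print(check_query(FEATURES, good))
print(check_query(FEATURES, bad))
```

For a single-row DataFrame query, something like `query_instance.iloc[0].to_dict()` would produce a dictionary that this helper can inspect.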

We first explain a RandomForest model that has been pre-trained on the Adult dataset.

In [6]:
backend = 'sklearn'
sk_modelpath = helpers.get_adult_income_modelpath(backend=backend)  # pretrained model
m = dice_ml.Model(model_path=sk_modelpath, backend=backend)

The next two steps are the same as when using DiCE with training data: we specify the explanation method (here, the genetic algorithm) and provide an input query instance.

In [7]:
exp = dice_ml.Dice(d, m, method="genetic")

https://scikit-learn.org/stable/modules/model_persistence.html#security-maintainability-limitations


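When generating counterfactuals with the genetic method, the initialization argument must be set to "random": the default, "kdtree", is not supported for private data, where only feature metadata is available.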
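To illustrate why random initialization can work from metadata alone, here is a minimal sketch that draws a candidate population from feature ranges and categories. It is not DiCE's implementation: `sample_candidates` is a hypothetical helper, the feature subset is chosen for brevity, and uniform sampling is an assumption made purely for illustration.

```python
import random

# Subset of the metadata from the Data cell above, for brevity.
FEATURES = {
    'age': [17, 90],
    'education': ['Assoc', 'Bachelors', 'Doctorate', 'HS-grad'],
    'hours_per_week': [1, 99],
}


def sample_candidates(features, n, seed=0):
    """Draw n candidate instances uniformly from the metadata (illustrative only)."""
    rng = random.Random(seed)
    candidates = []
    for _ in range(n):
        row = {}
        for name, spec in features.items():
            if all(isinstance(v, (int, float)) for v in spec):
                low, high = spec
                row[name] = rng.randint(low, high)  # integer within [low, high]
            else:
                row[name] = rng.choice(spec)  # one of the allowed categories
        candidates.append(row)
    return candidates


population = sample_candidates(FEATURES, n=5)
for row in population:
    print(row)
```

Every value drawn this way respects the declared ranges and categories, so no training rows are needed to seed the search.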



In [8]:
query_instance = pd.DataFrame({'age': 22,
                               'workclass': 'Private',
                               'education': 'HS-grad',
                               'marital_status': 'Single',
                               'occupation': 'Service',
                               'race': 'White',
                               'gender': 'Female',
                               'hours_per_week': 45}, index=[0])

In [9]:
# generate counterfactuals
dice_exp = exp.generate_counterfactuals(query_instance, total_CFs=4, desired_class="opposite",
                                        initialization="random")
# visualize the results
dice_exp.visualize_as_dataframe(show_only_changes=True)


100%|██████████| 1/1 [00:05<00:00,  5.00s/it]

Query instance (original outcome : 0)





Unnamed: 0,age,workclass,education,marital_status,occupation,race,gender,hours_per_week,income
0,22,Private,HS-grad,Single,Service,White,Female,45,0



Diverse Counterfactual set without sparsity correction since only metadata about each  feature is available (new outcome: 1)


Unnamed: 0,age,workclass,education,marital_status,occupation,race,gender,hours_per_week,income
0,17.0,Government,Some-college,Married,-,-,-,56.0,1
0,23.0,-,Doctorate,Widowed,Other/Unknown,-,Male,58.0,1
0,40.0,Government,Doctorate,-,-,-,-,86.0,1
0,17.0,-,Prof-school,Married,Blue-Collar,-,Male,56.0,1


# Explaining pre-trained deep learning models
We can also explain a trained model built with TensorFlow or PyTorch. Below, we use a trained ML model whose accuracy on test datasets is comparable to other popular baselines. This sample model ships with the package.


In [10]:
import tensorflow as tf  # noqa

backend = 'TF' + tf.__version__[0]  # TF2
ML_modelpath = helpers.get_adult_income_modelpath(backend=backend)
m = dice_ml.Model(model_path=ML_modelpath, backend=backend, func="ohe-min-max")
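The func="ohe-min-max" argument indicates that the model expects one-hot-encoded categorical features and min-max-scaled continuous features. The sketch below illustrates that kind of transformation using only the metadata. It is an illustration under assumptions: `ohe_min_max` is a hypothetical name, only three features are shown, and the column names and ordering it produces are not claimed to match what dice-ml generates internally.

```python
# Subset of the metadata from the Data cell above, for brevity.
FEATURES = {
    'age': [17, 90],
    'gender': ['Female', 'Male'],
    'hours_per_week': [1, 99],
}


def ohe_min_max(features, instance):
    """Encode one instance: scale continuous features to [0, 1], one-hot the rest."""
    encoded = {}
    for name, spec in features.items():
        value = instance[name]
        if all(isinstance(v, (int, float)) for v in spec):
            low, high = spec
            encoded[name] = (value - low) / (high - low)  # min-max scaling
        else:
            # One indicator column per allowed category.
            for category in spec:
                encoded[f"{name}_{category}"] = 1.0 if value == category else 0.0
    return encoded


encoded = ohe_min_max(FEATURES, {'age': 22, 'gender': 'Female', 'hours_per_week': 45})
print(encoded)
```

Note that the scaling relies on the declared [min, max] ranges, which is one reason the metadata needs to reflect the values seen during training.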

In [11]:
# initiate DiCE
exp = dice_ml.Dice(d, m, method="gradient")

In [12]:

# query instance as a single-row DataFrame; columns: feature names, values: feature values
query_instance = pd.DataFrame({'age': 22,
                               'workclass': 'Private',
                               'education': 'HS-grad',
                               'marital_status': 'Single',
                               'occupation': 'Service',
                               'race': 'White',
                               'gender': 'Female',
                               'hours_per_week': 45}, index=[0])

In [13]:
# generate counterfactuals
dice_exp = exp.generate_counterfactuals(query_instance, total_CFs=4, desired_class="opposite")
# visualize the results
dice_exp.visualize_as_dataframe(show_only_changes=True)

Diverse Counterfactuals found! total time taken: 00 min 46 sec
Query instance (original outcome : 0)


Unnamed: 0,age,workclass,education,marital_status,occupation,race,gender,hours_per_week,income
0,22.0,Private,HS-grad,Single,Service,White,Female,45.0,0.019



Diverse Counterfactual set without sparsity correction since only metadata about each  feature is available (new outcome: 1)


Unnamed: 0,age,workclass,education,marital_status,occupation,race,gender,hours_per_week,income
0,60.0,Self-Employed,Prof-school,Married,Professional,-,-,43.0,0.911
1,38.0,Other/Unknown,Assoc,Married,-,-,-,55.0,0.74
2,90.0,-,Doctorate,-,-,-,-,99.0,0.755
3,70.0,-,-,-,White-Collar,Other,Male,73.0,0.525
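In the tables above, `-` marks a feature whose value is unchanged from the query instance, which is the effect of show_only_changes=True. As a small post-processing sketch, the rows of the last table can be transcribed by hand and the number of changed features counted for each counterfactual. `count_changes` is a hypothetical helper, not part of dice-ml.

```python
FEATURE_NAMES = ['age', 'workclass', 'education', 'marital_status',
                 'occupation', 'race', 'gender', 'hours_per_week']

# Rows transcribed from the gradient-method output above; '-' means "unchanged".
ROWS = [
    [60.0, 'Self-Employed', 'Prof-school', 'Married', 'Professional', '-', '-', 43.0],
    [38.0, 'Other/Unknown', 'Assoc', 'Married', '-', '-', '-', 55.0],
    [90.0, '-', 'Doctorate', '-', '-', '-', '-', 99.0],
    [70.0, '-', '-', '-', 'White-Collar', 'Other', 'Male', 73.0],
]


def count_changes(row):
    """Number of features that differ from the query instance."""
    return sum(1 for value in row if value != '-')


changes = [count_changes(row) for row in ROWS]
for index, n_changed in enumerate(changes):
    print(f"counterfactual {index}: {n_changed} of {len(FEATURE_NAMES)} features changed")
```

Counterfactuals that change fewer features are generally easier to act on; as the output header states, no sparsity correction is applied when only metadata about each feature is available.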
