In [1]:
pip install dice-ml

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting dice-ml
  Downloading dice_ml-0.9-py3-none-any.whl (2.6 MB)
Installing collected packages: dice-ml
Successfully installed dice-ml-0.9


In [2]:
import numpy as np
import timeit
import random

from sklearn.compose import ColumnTransformer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.ensemble import RandomForestClassifier

import dice_ml
from dice_ml.utils import helpers  # helper functions

In [3]:
%load_ext autoreload
%autoreload 2

We use the "adult" income dataset from the UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/datasets/adult). For demonstration purposes, we transform the data as described in the dice_ml.utils.helpers module.

In [4]:
dataset = helpers.load_adult_income_dataset()

In [5]:
dataset

,age,workclass,education,marital_status,occupation,race,gender,hours_per_week,income
0,28,Private,Bachelors,Single,White-Collar,White,Female,60,0
1,30,Self-Employed,Assoc,Married,Professional,White,Male,65,1
2,32,Private,Some-college,Married,White-Collar,White,Male,50,0
3,20,Private,Some-college,Single,Service,White,Female,35,0
4,41,Self-Employed,Some-college,Married,White-Collar,White,Male,50,0
...,...,...,...,...,...,...,...,...,...
26043,28,Private,HS-grad,Married,White-Collar,White,Male,40,0
26044,18,Private,School,Single,Blue-Collar,White,Male,55,0
26045,22,Private,Some-college,Single,White-Collar,White,Female,40,0
26046,42,Self-Employed,Bachelors,Divorced,White-Collar,Other,Male,30,0


In [6]:
dataset = helpers.load_adult_income_dataset()
target = dataset["income"] # outcome variable
train_dataset, test_dataset, _, _ = train_test_split(dataset,
                                                     target,
                                                     test_size=0.2,
                                                     random_state=0,
                                                     stratify=target)
# DiCE data object built from the training split
d = dice_ml.Data(dataframe=train_dataset,
                 continuous_features=['age', 'hours_per_week'],
                 outcome_name='income')

# Pre-trained ML model shipped with DiCE (TensorFlow 2 backend).
# func="ohe-min-max" one-hot encodes categorical features and
# min-max scales continuous ones before they reach the model.
m = dice_ml.Model(model_path=helpers.get_adult_income_modelpath(),
                  backend='TF2', func="ohe-min-max")
# DiCE explanation instance
exp = dice_ml.Dice(d, m)


With DiCE, generating explanations is a simple three-step process: set up a dataset, train a model, and then invoke DiCE to generate counterfactual examples for any input. DiCE can also work with pre-trained models, with or without their original training data.
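The three steps can also be sketched with a scikit-learn model in place of the pre-trained TensorFlow one. The block below is a minimal illustration under stated assumptions: the data frame `toy_df` is synthetic (not the adult data), the rule that defines `income` is invented for the example, and `build_explainer` is a hypothetical helper name. The DiCE import is deferred inside that helper so that the training part runs even where DiCE is not installed.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Step 1: set up a dataset (synthetic values, for illustration only)
rng = np.random.default_rng(0)
n_rows = 200
toy_df = pd.DataFrame({
    "age": rng.integers(18, 65, size=n_rows),
    "hours_per_week": rng.integers(10, 80, size=n_rows),
    "occupation": rng.choice(["Service", "Professional", "Blue-Collar"], size=n_rows),
})
# Invented labeling rule so that the toy problem is learnable
toy_df["income"] = (toy_df["hours_per_week"] > 45).astype(int)

numerical = ["age", "hours_per_week"]
categorical = ["occupation"]

# Step 2: train a model (preprocessing and classifier in one pipeline)
preprocess = ColumnTransformer(transformers=[
    ("num", StandardScaler(), numerical),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
])
toy_clf = Pipeline(steps=[
    ("preprocess", preprocess),
    ("classifier", RandomForestClassifier(random_state=0)),
])
toy_clf.fit(toy_df.drop(columns="income"), toy_df["income"])


# Step 3: wrap data and model for DiCE (hypothetical helper)
def build_explainer(dataframe, model):
    import dice_ml  # deferred so steps 1 and 2 run without DiCE installed

    d = dice_ml.Data(dataframe=dataframe,
                     continuous_features=numerical,
                     outcome_name="income")
    m = dice_ml.Model(model=model, backend="sklearn")
    # "random" selects the model-agnostic randomized sampling method
    return dice_ml.Dice(d, m, method="random")
```

With DiCE installed, `build_explainer(toy_df, toy_clf)` should return an explainer whose `generate_counterfactuals` method is called in the same way as in the cell that follows.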

For any given input, we can now generate counterfactual explanations. For example, the model assigns the following input to class 0 (low income), and we would like to know what minimal changes would flip the prediction to class 1 (high income).

In [46]:
query_instance = test_dataset.drop(columns="income")[0:1]
dice_exp = exp.generate_counterfactuals(query_instance, total_CFs=10, desired_class="opposite")
# Visualize counterfactual explanation
dice_exp.visualize_as_dataframe()

100%|██████████| 1/1 [00:01<00:00,  1.44s/it]

Query instance (original outcome : 0)

,age,workclass,education,marital_status,occupation,race,gender,hours_per_week,income
0,29,Private,HS-grad,Married,Blue-Collar,White,Female,38,0



Diverse Counterfactual set (new outcome: 1.0)

,age,workclass,education,marital_status,occupation,race,gender,hours_per_week,income
0,29.0,Private,HS-grad,Married,Professional,White,Female,94.0,1
1,29.0,Private,HS-grad,Married,Service,White,Female,55.0,1
2,29.0,Private,HS-grad,Married,Professional,White,Female,99.0,1
3,39.0,Private,HS-grad,Married,Other/Unknown,White,Female,38.0,1
4,29.0,Private,Some-college,Married,Blue-Collar,White,Female,89.0,1
5,54.0,Private,Prof-school,Married,Blue-Collar,White,Female,38.0,1
6,44.0,Private,HS-grad,Married,Other/Unknown,White,Female,38.0,1
7,29.0,Private,Prof-school,Married,Blue-Collar,White,Female,67.0,1
8,51.0,Private,HS-grad,Married,Blue-Collar,White,Female,57.0,1
9,58.0,Private,HS-grad,Married,Blue-Collar,White,Female,49.0,1
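To read a table like the one above, it helps to list only the features that differ from the query instance. The sketch below does this by hand for four of the rows shown (rows 0, 1, 3 and 5), copied into plain dictionaries; `changed_features` is a hypothetical helper, not part of DiCE. In DiCE itself, `visualize_as_dataframe(show_only_changes=True)` gives a similar view.

```python
# Query instance from the output above (outcome column omitted)
query = {
    "age": 29, "workclass": "Private", "education": "HS-grad",
    "marital_status": "Married", "occupation": "Blue-Collar",
    "race": "White", "gender": "Female", "hours_per_week": 38,
}

# Four counterfactual rows copied from the table above
counterfactuals = [
    {**query, "occupation": "Professional", "hours_per_week": 94},  # row 0
    {**query, "occupation": "Service", "hours_per_week": 55},       # row 1
    {**query, "age": 39, "occupation": "Other/Unknown"},            # row 3
    {**query, "age": 54, "education": "Prof-school"},               # row 5
]


def changed_features(original, counterfactual):
    """Map each changed feature to its (old value, new value) pair."""
    return {
        name: (original[name], counterfactual[name])
        for name in original
        if original[name] != counterfactual[name]
    }


for cf_row in counterfactuals:
    print(changed_features(query, cf_row))
```

Each of these rows changes exactly two features, which is what makes them easy to act on: the fewer features a counterfactual changes, the simpler the explanation.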



DiCE can generate counterfactual examples using the following methods.

# Model-agnostic methods

1. Randomized sampling
2. KD-Tree (for counterfactuals within the training data)
3. Genetic algorithm
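In DiCE, a model-agnostic method is chosen through the `method` argument of `dice_ml.Dice` (`"random"`, `"kdtree"` or `"genetic"`). To give a feel for the first of these, here is a toy sketch of randomized sampling: perturb one or two features at random and keep the candidates whose prediction flips. It is an illustration of the idea, not DiCE's implementation; `toy_model`, `FEATURE_RANGES` and the scoring rule are all invented for the example.

```python
import random


def toy_model(person):
    """Hypothetical classifier: returns 1 (high income) above a score threshold."""
    score = 0.05 * person["age"] + 0.04 * person["hours_per_week"]
    if person["occupation"] == "Professional":
        score += 1.0
    return 1 if score >= 4.0 else 0


# Assumed value ranges to sample from (illustrative only)
FEATURE_RANGES = {
    "age": list(range(17, 91)),
    "hours_per_week": list(range(1, 100)),
    "occupation": ["Blue-Collar", "Service", "Professional", "White-Collar"],
}


def random_counterfactuals(query, predict, total_cfs, max_tries=10_000, seed=0):
    """Sample perturbed copies of `query` until `total_cfs` flip the prediction."""
    rng = random.Random(seed)
    desired = 1 - predict(query)  # binary task: the opposite class
    found = []
    for _ in range(max_tries):
        candidate = dict(query)
        # Change only one or two features so the result stays close to the query
        n_changes = rng.randint(1, 2)
        for feature in rng.sample(sorted(FEATURE_RANGES), n_changes):
            candidate[feature] = rng.choice(FEATURE_RANGES[feature])
        if predict(candidate) == desired and candidate not in found:
            found.append(candidate)
            if len(found) == total_cfs:
                break
    return found


query = {"age": 29, "occupation": "Blue-Collar", "hours_per_week": 38}
cfs = random_counterfactuals(query, toy_model, total_cfs=3)
for cf in cfs:
    print(cf)
```

Because the search only calls `predict`, it works with any model, which is what "model-agnostic" means here.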


# Gradient-based methods

1. An explicit loss-based method described in Mothilal et al. (2020) (the default for deep learning models).
2. A Variational AutoEncoder (VAE)-based method described in Mahajan et al. (2019) (see the BaseVAE notebook).

Both gradient-based methods require a differentiable model, such as a neural network.
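The reason a differentiable model is needed can be seen in a small sketch: a gradient-based search moves the input itself along the gradient of a loss. The example below is a deliberately simplified stand-in, not the method of Mothilal et al. (2020): it finds a single counterfactual for a hand-written logistic model, uses a log-loss toward class 1 plus a squared-distance proximity penalty, and omits the diversity term that the published method uses when optimizing several counterfactuals jointly. The weights, bias and hyperparameters are invented for the example.

```python
import math

# Hypothetical logistic model over two numeric features
WEIGHTS = [1.0, 2.0]
BIAS = -3.0


def predict_proba(x):
    """Probability of class 1 under the toy logistic model."""
    logit = sum(w * xi for w, xi in zip(WEIGHTS, x)) + BIAS
    return 1.0 / (1.0 + math.exp(-logit))


def gradient_counterfactual(query, proximity_weight=0.1, learning_rate=0.1, steps=500):
    """Gradient descent on: -log p(x) + proximity_weight * ||x - query||^2."""
    cf = list(query)
    for _ in range(steps):
        p = predict_proba(cf)
        # Gradient of -log p(x) is -(1 - p) * w;
        # gradient of the proximity term is 2 * proximity_weight * (x - query).
        grad = [
            -(1.0 - p) * w + 2.0 * proximity_weight * (xi - qi)
            for w, xi, qi in zip(WEIGHTS, cf, query)
        ]
        cf = [xi - learning_rate * g for xi, g in zip(cf, grad)]
    return cf


query = [0.5, 0.5]
cf = gradient_counterfactual(query)
print("query probability:", predict_proba(query))
print("counterfactual:", cf, "probability:", predict_proba(cf))
```

The proximity weight trades off validity against closeness: a larger value keeps the counterfactual nearer to the query but may stop it from crossing the decision boundary.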