## Diverse Counterfactual Explanations (DiCE) for ML
https://github.com/interpretml/DiCE

## Installing DiCE

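DiCE can be installed in two ways. The stable release is available from PyPI via `pip install dice-ml`. To install the latest (dev) version of DiCE and its dependencies instead, clone this repo and run `pip install -e .` from the top-most folder of the repo.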

In [6]:
pip install dice-ml

Note: you may need to restart the kernel to use updated packages.


In [7]:
# Run from the top-most folder of the cloned repo; "-e" requires a path argument (here ".")
pip install -e .

Note: you may need to restart the kernel to use updated packages.


In [8]:
#pip install -r requirements.txt
# Additional dependencies for deep learning models
#pip install -r requirements-deeplearning.txt
# For running unit tests
#pip install -r requirements-test.txt

## Load data

In [10]:
import pandas as pd
# Giving current root path
PATH = "./"

# name of dataset
DATASET_NAME = "diabetes.csv"

# variable containing the class labels in this case the dataset contains:
# 0 - if not diabetes
# 1 - if diabetes
class_var = "Outcome"

# load dataset
dataset_path = PATH + "datasets/" + DATASET_NAME
data = pd.read_csv( dataset_path )

# features
feature_names = data.drop([class_var], axis=1).columns.to_list()

# balance dataset: undersample the majority class (no diabetes)
# so that both classes have the same number of rows
yes_data = data[data[class_var] == 1]
no_data = data[data[class_var] == 0].sample(n=len(yes_data))

balanced_data = pd.concat([no_data, yes_data])

# one-hot encode the data and scale the inputs to the range [0, 1]
# NOTE: encode_data is a project helper function; it must be defined or imported before this cell runs
X, Y, encoder, scaler = encode_data(balanced_data, class_var)

n_features = X.shape[1]
n_classes = len(data[class_var].unique())

# load existing training data
print("Loading training data...")
X_train, Y_train, X_test, Y_test, X_validation, Y_validation = load_training_data(dataset_path)

print("====================Features====================")
print(feature_names)
print("================================================")

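Note: as written, this cell fails with `NameError: name 'encode_data' is not defined`. `encode_data` and `load_training_data` are helper functions that are not defined in this notebook and must be defined or imported before the cell is run.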
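Since `encode_data` is not shown in this notebook, the following is only a minimal sketch of what such a helper might do, based on the cell's comments (one-hot encoding plus scaling to [0, 1]). The function name `encode_data_sketch`, its return values, and the toy data are assumptions made for illustration; the project's real helper may behave differently.

```python
import numpy as np
import pandas as pd


def encode_data_sketch(df, class_var):
    """Hypothetical stand-in for encode_data: scale features, one-hot labels."""
    features = df.drop(columns=[class_var])
    mins = features.min(axis=0)
    maxs = features.max(axis=0)
    # Avoid division by zero for constant columns.
    span = (maxs - mins).replace(0, 1)
    # Min-max scaling: every feature column ends up in [0, 1].
    X = ((features - mins) / span).to_numpy(dtype=float)

    # One-hot encode the labels: one column per class, sorted by class value.
    classes = np.sort(df[class_var].unique())
    Y = (df[class_var].to_numpy()[:, None] == classes[None, :]).astype(float)
    return X, Y, classes, (mins, maxs)


# Tiny made-up frame that reuses the label column name of the diabetes data.
toy = pd.DataFrame({
    "Glucose": [85, 183, 137, 110],
    "BMI": [26.6, 23.3, 43.1, 30.0],
    "Outcome": [0, 1, 1, 0],
})
X_toy, Y_toy, classes, (mins, maxs) = encode_data_sketch(toy, "Outcome")
print(X_toy.shape, Y_toy.shape)
```

Returning the per-feature minimum and maximum alongside the encoded arrays makes it possible to map scaled values back to the original units later, which is presumably the role of the `scaler` object returned by the real helper.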

## Load trained model

In [None]:
# the best performing model was obtained with 5 hidden layers with 12 neurons each
model_name = "model_h5_N12"

# specify paths where the blackbox model was saved
path_serialisation_model = PATH + "training/" + DATASET_NAME.replace(".csv", "") + "/model/" 
path_serialisation_histr = PATH + "training/" + DATASET_NAME.replace(".csv", "") + "/history/" 

# load model and model performance history
print("Loading Blackbox model...")
model_history = load_model_history( model_name, path_serialisation_histr )
model = load_model( model_name, path_serialisation_model )

# check model
model.summary()

In [None]:
## Determine the feature range from the training set.
diabetes_feature_range = (X_train.min(axis=0), X_train.max(axis=0))
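One plausible use of such a range is to keep candidate counterfactuals inside the values seen during training. The sketch below illustrates this with `np.clip` on made-up data; the array names and values are hypothetical, and the notebook does not state that the range is used in exactly this way.

```python
import numpy as np

# Made-up training matrix: 4 rows, 3 already-scaled features.
X_train_toy = np.array([
    [0.10, 0.40, 0.00],
    [0.35, 0.90, 0.20],
    [0.80, 0.55, 0.75],
    [0.60, 0.45, 1.00],
])

# Same construction as in the notebook cell: per-feature (minimum, maximum).
feature_range = (X_train_toy.min(axis=0), X_train_toy.max(axis=0))

# A hypothetical candidate that drifts outside the observed range.
candidate = np.array([[0.95, 0.30, -0.10]])

# np.clip broadcasts the per-feature bounds across the single row.
clipped = np.clip(candidate, feature_range[0], feature_range[1])
print(clipped)
```

Each out-of-range value is pulled back to the nearest bound for its own feature, while values already inside the range are left unchanged.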

In [None]:
import numpy as np

## Get an example instance from the test set.
example_idx = 5  ## Can be changed to any valid test-set index.
example_data = np.expand_dims(X_test[example_idx], axis=0)

## Getting started with DiCE

With DiCE, generating explanations is a simple three-step process: set up a dataset, train a model, and then invoke DiCE to generate counterfactual examples for any input.

In [None]:
import dice_ml
from dice_ml.utils import helpers # helper functions
# Dataset for training an ML model
d = dice_ml.Data(dataframe=helpers.load_adult_income_dataset(),
                 continuous_features=['age', 'hours_per_week'],
                 outcome_name='income')
# Pre-trained ML model
m = dice_ml.Model(model_path=helpers.get_adult_income_modelpath())
# DiCE explanation instance
exp = dice_ml.Dice(d,m)

For any given input, we can now generate counterfactual explanations. For example, the following input leads to class 0 (low income).

In [None]:
query_instance = {'age':22,
    'workclass':'Private',
    'education':'HS-grad',
    'marital_status':'Single',
    'occupation':'Service',
    'race': 'White',
    'gender':'Female',
    'hours_per_week': 45}
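Depending on the installed dice-ml version, the query may need to be passed as a pandas DataFrame rather than a plain dict. This is an assumption about newer releases and should be checked against the documentation of the version in use. A minimal sketch of the conversion, where `query_df` is a name introduced here for illustration:

```python
import pandas as pd

query_instance = {'age': 22,
                  'workclass': 'Private',
                  'education': 'HS-grad',
                  'marital_status': 'Single',
                  'occupation': 'Service',
                  'race': 'White',
                  'gender': 'Female',
                  'hours_per_week': 45}

# Wrap the dict in a list so pandas builds a one-row frame
# with one column per feature.
query_df = pd.DataFrame([query_instance])
print(query_df.shape)
```

If the installed version accepts a dict directly, this step is unnecessary.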

Using DiCE, we can now generate examples that would have been classified as class 1 (high income).

In [5]:
# Generate counterfactual examples
dice_exp = exp.generate_counterfactuals(query_instance, total_CFs=4, desired_class="opposite")
# Visualize counterfactual explanation
dice_exp.visualize_as_dataframe()

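Note: this cell fails with `NameError: name 'exp' is not defined` when it is run before the cell that creates `exp`. Run the setup cell above first.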
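To make the idea of a counterfactual concrete without depending on DiCE, here is a toy sketch that searches a coarse grid for the smallest feature changes that flip a prediction. The classifier rule, the feature names, the grid, and the cost function are all invented for illustration. This is not DiCE's algorithm, and it makes no attempt to enforce diversity among the returned examples.

```python
from itertools import product


def predict(glucose, bmi):
    """Made-up rule standing in for a black-box classifier."""
    return 1 if 2 * glucose + 5 * bmi > 400 else 0


def find_counterfactuals(glucose, bmi, total_cfs=3):
    """Return the cheapest grid perturbations that flip the prediction."""
    original = predict(glucose, bmi)
    found = []
    # Try every combination of changes on a coarse grid.
    for d_glucose, d_bmi in product(range(-60, 61, 10), range(-10, 11, 2)):
        if predict(glucose + d_glucose, bmi + d_bmi) != original:
            # Cost: sum of absolute changes, each scaled by its grid range.
            cost = abs(d_glucose) / 60 + abs(d_bmi) / 10
            found.append((cost, glucose + d_glucose, bmi + d_bmi))
    # Cheapest changes first.
    found.sort()
    return [(g, b) for _, g, b in found[:total_cfs]]


query = (150, 30)  # predicted class 1 under the toy rule
cfs = find_counterfactuals(*query)
print(predict(*query), cfs)
```

Each returned pair is a nearby input that the toy rule assigns to the opposite class, which is the same question `generate_counterfactuals(..., desired_class="opposite")` asks of the real model.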