# 1. Counterfactual Examples

A counterfactual explanation of a prediction describes the smallest change to the feature values that changes the prediction to a predefined output. In the following code, we generate counterfactual examples on a heart disease dataset, showing the smallest changes to a patient's feature values that would flip the model's prediction, for example from heart disease (1) to no heart disease (0).
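
To make the definition concrete, here is a minimal sketch of a brute-force counterfactual search on a toy model. Everything in it is hypothetical: the two-feature logistic model, its weights, the candidate grid, and the range-normalized L1 distance are assumptions chosen for illustration. It is not meant to reproduce DiCE's own search, only the idea of "closest input with a different prediction".

```python
import math
from itertools import product

# Hypothetical toy model: a logistic score over two features.
# The weights and bias are made up; this is not the network trained below.
WEIGHTS = {"chol": 0.02, "ca": 1.5}
BIAS = -6.0


def predict_proba(x):
    """Return the toy model's probability of class 1 for a feature dict."""
    z = BIAS + sum(WEIGHTS[k] * x[k] for k in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))


def nearest_counterfactual(query, grid, ranges):
    """Search a grid of candidates and return the one closest to `query`
    (range-normalized L1 distance) whose predicted class differs."""
    original_class = predict_proba(query) >= 0.5
    best, best_dist = None, float("inf")
    names = list(grid)
    for values in product(*(grid[n] for n in names)):
        candidate = dict(zip(names, values))
        if (predict_proba(candidate) >= 0.5) == original_class:
            continue  # same prediction, so not a counterfactual
        dist = sum(abs(candidate[n] - query[n]) / ranges[n] for n in names)
        if dist < best_dist:
            best, best_dist = candidate, dist
    return best, best_dist


query = {"chol": 229, "ca": 2}
grid = {"chol": range(150, 301, 10), "ca": range(0, 4)}  # assumed search grid
ranges = {"chol": 150, "ca": 3}  # assumed feature ranges used for scaling

cf, dist = nearest_counterfactual(query, grid, ranges)
print(cf, round(dist, 3))
```

With these made-up weights, the closest grid point that flips the prediction lowers `ca` by one and `chol` by nine. A tool such as DiCE additionally asks for several counterfactuals that differ from each other, which a single nearest-point search does not provide.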

Installing the necessary libraries:

In [None]:
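# Note: the PyPI package `DiCE` is a separate project from dice-ml.
# The counterfactual code below only uses dice-ml (imported as dice_ml).
!pip install DiCE
!pip install dice-ml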

Collecting DiCE
  Downloading https://files.pythonhosted.org/packages/57/14/e6199545c8c24bfba94238d9bc49696fa86b4d059655ddebec0e0a79f537/dice-3.1.0-py2.py3-none-any.whl
Installing collected packages: DiCE
Successfully installed DiCE-3.1.0
Collecting dice-ml
  Downloading https://files.pythonhosted.org/packages/ee/5a/7c94dea50f61a7a4a793fa9b06f2940b5af32d145d1b248311c9bc88744f/dice_ml-0.4-py3-none-any.whl (134kB)
Installing collected packages: dice-ml
Successfully installed dice-ml-0.4


Importing the necessary dependencies:

In [None]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from matplotlib.pyplot import rcParams
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.datasets import load_breast_cancer
import dice
import dice_ml
from dice_ml.utils import helpers
import tensorflow as tf
from tensorflow import keras
tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR)
%ls

heartu.csv  sample_data/


In [None]:
from google.colab import files
uploaded = files.upload()

Saving heartu.csv to heartu (3).csv


In [None]:
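import io
# Colab may rename a re-uploaded file (e.g. "heartu (3).csv"),
# so read whichever name the upload was stored under.
filename = next(iter(uploaded))
df = pd.read_csv(io.BytesIO(uploaded[filename]))
# The dataset is now stored in a pandas DataFrame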

In [None]:
df.head()

Unnamed: 0,age,sex,cp,trestbps,chol,fbs,restecg,thalach,exang,oldpeak,slope,ca,thal,condition
0,69,1,0,160,234,1,2,131,0,0.1,1,1,0,0
1,69,0,0,140,239,0,0,151,0,1.8,0,2,0,0
2,66,0,0,150,226,0,0,114,0,2.6,2,0,0,0
3,65,1,0,138,282,1,2,174,0,1.4,1,1,0,1
4,64,1,0,110,211,0,2,144,1,1.8,1,0,0,0


In [None]:
features=['age', 'sex', 'cp', 'trestbps', 'chol', 'fbs', 'restecg', 'thalach', 'exang', 'oldpeak', 'slope', 'ca', 'thal']
X = df[features]
y = df.condition

In [None]:
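sess = tf.compat.v1.InteractiveSession()
# Describe the dataset to DiCE. Every feature is declared continuous here,
# including the integer-coded ones such as sex, cp, and thal.
d = dice_ml.Data(dataframe=df,
                 continuous_features=['age', 'sex', 'cp', 'trestbps', 'chol', 'fbs', 'restecg',
                                      'thalach', 'exang', 'oldpeak', 'slope', 'ca', 'thal'],
                 outcome_name='condition')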

In [None]:
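# One-hot encode and normalize the data, then keep the training split
train, _ = d.split_data(d.normalize_data(d.one_hot_encoded_data))
# Separate the input features from the target column
X_train = train.loc[:, train.columns != 'condition']
y_train = train.loc[:, train.columns == 'condition']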
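
As far as we understand it, `normalize_data` rescales each continuous feature to the range [0, 1] with min-max scaling. The sketch below illustrates that idea in plain Python on the `age` and `trestbps` values from the five rows shown by `df.head()` above. The helper `min_max_scale` is a hypothetical name, not part of dice-ml, and the real method takes the minimum and maximum from the training data rather than from five rows.

```python
def min_max_scale(values):
    """Scale a list of numbers to [0, 1] using its own minimum and maximum."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: map a constant column to 0.0 to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Values copied from the first five rows of the dataset.
age = [69, 69, 66, 65, 64]
trestbps = [160, 140, 150, 138, 110]

scaled_age = min_max_scale(age)
scaled_trestbps = min_max_scale(trestbps)
print(scaled_age)
print(scaled_trestbps)
```

Scaling puts features such as `chol` and `oldpeak` on a comparable footing, so that no single feature dominates the network's inputs simply because of its units.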

In [None]:
ann_model = keras.Sequential()
ann_model.add(keras.layers.Dense(20, input_shape=(X_train.shape[1],), 
                                 kernel_regularizer=keras.regularizers.l1(0.001), 
                                 activation=tf.nn.relu))
ann_model.add(keras.layers.Dense(1, activation=tf.nn.sigmoid))

ann_model.compile(loss='binary_crossentropy', 
                  optimizer=tf.keras.optimizers.Adam(0.01), 
                  metrics=['accuracy'])
ann_model.fit(X_train, y_train, validation_split=0.20, epochs=100, verbose=0, class_weight={0:1,1:2})

<tensorflow.python.keras.callbacks.History at 0x7feed5fbbcf8>

In [None]:
# Select 'TF1' or 'TF2' to match the installed TensorFlow major version
backend = 'TF'+tf.__version__[0]
m = dice_ml.Model(model=ann_model, backend=backend)

In [None]:
exp = dice_ml.Dice(d, m)

In [None]:
query_instance = {
    'age': 67, 'sex': 1, 'cp': 3, 'trestbps': 120, 'chol': 229,
    'fbs': 0, 'restecg': 2, 'thalach': 129, 'exang': 1, 'oldpeak': 2.6,
    'slope': 1, 'ca': 2, 'thal': 2
}

The `condition` target takes two values:

- 0: the patient does not have heart disease.
- 1: the patient has heart disease.

From this dataset, we take a specific instance where the patient has heart disease. We generate 4 different counterfactuals, each showing a small set of changes to the feature values that would change the predicted condition of the patient.


In [None]:
# Generating counterfactual examples
dice_exp = exp.generate_counterfactuals(
    query_instance,            # the patient record defined above
    total_CFs=4,               # number of counterfactual examples to generate
    desired_class="opposite",  # flip the predicted class (here from 1 to 0)
    # Only these features may change; age, sex, and cp are held fixed
    features_to_vary=['thalach', 'exang', 'oldpeak', 'slope', 'ca', 'thal',
                      'restecg', 'fbs', 'chol', 'trestbps'])
# Visualizing counterfactual explanation
dice_exp.visualize_as_dataframe()



Diverse Counterfactuals found! total time taken: 00 min 46 sec
Query instance (original outcome : 1)


Unnamed: 0,age,sex,cp,trestbps,chol,fbs,restecg,thalach,exang,oldpeak,slope,ca,thal,condition
0,67.0,1.0,3.0,120.0,229.0,0.0,2.0,129.0,1.0,2.6,1.0,2.0,2.0,0.99965



Diverse Counterfactual set (new outcome : 0)


Unnamed: 0,age,sex,cp,trestbps,chol,fbs,restecg,thalach,exang,oldpeak,slope,ca,thal,condition
0,67.0,1.0,3.0,94.0,229.0,1.0,2.0,129.0,0.0,2.6,1.0,0.0,0.0,0.283
1,67.0,1.0,3.0,120.0,186.0,1.0,1.0,124.0,0.0,2.6,2.0,0.0,0.0,0.159
2,67.0,1.0,3.0,120.0,229.0,1.0,1.0,172.0,0.0,3.1,1.0,0.0,2.0,0.315
3,67.0,1.0,3.0,120.0,229.0,0.0,1.0,129.0,0.0,0.6,1.0,0.0,0.0,0.274


A patient's sex, age, and type of chest pain cannot be changed. We therefore left `age`, `sex`, and `cp` out of `features_to_vary`, and each counterfactual keeps those features at their original values.

The 4 counterfactuals are different ways of solving the same problem: changing the predicted target value from 1 to 0.

For example, in the second counterfactual on the list (index 1), cholesterol drops from 229 to 186 and the predicted probability of heart disease falls from 0.99965 to 0.159. Among several other changes, `thal` goes from 2 to 0, meaning that the thallium test on the heart would need to show normal results with no defects.

A recurring theme in all four counterfactuals is the reduction of `ca` from 2 to 0. `ca` signifies the number of blocked vessels of the heart. This suggests that, for this model and this patient, `ca` is one of the most influential features in the prediction of heart disease. From these results, the most important factor in changing the predicted condition appears to be reducing the number of blocked vessels, for example through procedures such as angioplasty.
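
The observations above can be checked mechanically by comparing each counterfactual with the query instance. The sketch below does this in plain Python using the values copied from the output tables. The helper `changed_features` is a hypothetical name introduced for illustration, not part of dice-ml.

```python
FEATURES = ["age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
            "thalach", "exang", "oldpeak", "slope", "ca", "thal"]

# Query instance and the four counterfactuals, copied from the output above.
query = [67, 1, 3, 120, 229, 0, 2, 129, 1, 2.6, 1, 2, 2]
counterfactuals = [
    [67, 1, 3, 94, 229, 1, 2, 129, 0, 2.6, 1, 0, 0],
    [67, 1, 3, 120, 186, 1, 1, 124, 0, 2.6, 2, 0, 0],
    [67, 1, 3, 120, 229, 1, 1, 172, 0, 3.1, 1, 0, 2],
    [67, 1, 3, 120, 229, 0, 1, 129, 0, 0.6, 1, 0, 0],
]


def changed_features(query_row, cf_row):
    """Map each feature that differs to its (original, counterfactual) pair."""
    return {name: (q, c)
            for name, q, c in zip(FEATURES, query_row, cf_row)
            if q != c}


changes = [changed_features(query, cf) for cf in counterfactuals]

# Features that change in every counterfactual, and features that never change.
always_changed = set(FEATURES).intersection(*(set(c) for c in changes))
never_changed = set(FEATURES) - set().union(*(set(c) for c in changes))

for i, change in enumerate(changes):
    print(i, change)
print("changed in all:", sorted(always_changed))
print("never changed:", sorted(never_changed))
```

On these four rows, `age`, `sex`, and `cp` never change, which matches the `features_to_vary` setting. Besides `ca`, the comparison shows that `exang` also moves from 1 to 0 in every counterfactual.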
