# Counterfactual explanations with ordinally encoded categorical variables

This example notebook illustrates how to obtain counterfactual explanations for instances with a mixture of ordinally encoded categorical and numerical variables. We generate counterfactuals for instances in the *adult* dataset, where a model predicts whether a person's income is above or below $50k. Based on the ALIBI master notebooks: https://github.com/SeldonIO/alibi/blob/master/examples/

**First: check that the runtime type is set to GPU; change it and reset the runtime if necessary!**

In [2]:
!pip install alibi

Collecting alibi
Installing collected packages: alibi
Successfully installed alibi-0.3.2


In [0]:
import warnings
warnings.filterwarnings("ignore", category=FutureWarning)
warnings.filterwarnings("ignore", category=UserWarning)

# to reverse
# warnings.filterwarnings("default", category=FutureWarning)
# warnings.filterwarnings("default", category=UserWarning)

In [4]:
import tensorflow as tf
tf.logging.set_verbosity(tf.logging.ERROR)  # suppress deprecation messages
from tensorflow.keras import backend as K
from tensorflow.keras.layers import Dense, Input, Embedding, Concatenate, Reshape, Dropout, Lambda
from tensorflow.keras.models import Model
from tensorflow.keras.utils import to_categorical

import matplotlib
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import os
from sklearn.preprocessing import OneHotEncoder
from time import time
from alibi.datasets import fetch_adult
from alibi.explainers import CounterFactualProto

## Load adult dataset

The `fetch_adult` function returns a `Bunch` object containing the features, the targets, the feature names and a mapping of the categories in each categorical variable.

In [0]:
adult = fetch_adult()
data = adult.data
target = adult.target
feature_names = adult.feature_names
category_map_tmp = adult.category_map
target_names = adult.target_names

Define shuffled training and test set:

In [0]:
def set_seed(s=0):
    np.random.seed(s)
    tf.set_random_seed(s)

In [0]:
set_seed()
data_perm = np.random.permutation(np.c_[data, target])
X = data_perm[:,:-1]
y = data_perm[:,-1]

In [0]:
idx = 30000
# note: slicing from idx+1 leaves the row at position idx out of both splits
y_train, y_test = y[:idx], y[idx+1:]

Reorganize data so categorical features come first:

In [0]:
X = np.c_[X[:, 1:8], X[:, 11], X[:, 0], X[:, 8:11]]

Adjust `feature_names` and `category_map` as well:

In [10]:
feature_names = feature_names[1:8] + feature_names[11:12] + feature_names[0:1] + feature_names[8:11]
print(feature_names)

['Workclass', 'Education', 'Marital Status', 'Occupation', 'Relationship', 'Race', 'Sex', 'Country', 'Age', 'Capital Gain', 'Capital Loss', 'Hours per week']
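
The slicing pattern above can be checked on a toy row. The sketch below is only an illustration: `row` and `names` are made-up stand-ins for `X` and `feature_names`, and each entry of `row` equals its original column index so the new order is easy to read off.

```python
import numpy as np

# Hypothetical 12-column row whose entries equal their original column index.
row = np.arange(12).reshape(1, 12)

# Same slices as in the notebook: columns 1-7, then 11, then 0, then 8-10.
reordered = np.c_[row[:, 1:8], row[:, 11], row[:, 0], row[:, 8:11]]
print(reordered[0].tolist())
# [1, 2, 3, 4, 5, 6, 7, 11, 0, 8, 9, 10]

# Applying the same slices to a list of names keeps names and columns aligned.
names = ['f{}'.format(i) for i in range(12)]
names_reordered = names[1:8] + names[11:12] + names[0:1] + names[8:11]
print(names_reordered)
```

Because the data and the names are reordered with the same slices, column `j` of the reordered array still corresponds to entry `j` of the reordered name list.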


In [0]:
category_map = {}
for i, (_, v) in enumerate(category_map_tmp.items()):
    category_map[i] = v
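
A small sketch of this re-keying step, using a made-up mapping (`category_map_old` and its labels are hypothetical). It assumes the dictionary iterates in the same order as the reordered categorical columns, which the loop above relies on as well.

```python
# Hypothetical mapping keyed by the original column index of each categorical feature.
category_map_old = {
    1: ['Private', 'Self-employed'],
    3: ['Married', 'Single'],
    11: ['United-States', 'Other'],
}

# Re-key by position so that key i refers to the i-th categorical column
# after the categorical features have been moved to the front.
category_map_new = {i: labels for i, labels in enumerate(category_map_old.values())}
print(category_map_new)
```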

Create a dictionary whose keys are the categorical column indices and whose values are the number of categories of each variable in the dataset. This dictionary is used later by the counterfactual explainer.

In [12]:
cat_vars_ord = {}
n_categories = len(list(category_map.keys()))
for i in range(n_categories):
    cat_vars_ord[i] = len(np.unique(X[:, i]))
print(cat_vars_ord)

{0: 9, 1: 7, 2: 4, 3: 9, 4: 6, 5: 5, 6: 2, 7: 11}
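
As a minimal sketch of the same idea on a toy array (`X_toy` and `n_cat_cols` are made up for illustration), the dictionary maps each categorical column index to the number of distinct ordinal codes found in that column:

```python
import numpy as np

# Hypothetical data: 2 ordinally encoded categorical columns, then 1 numerical column.
X_toy = np.array([[0, 1, 0.5],
                  [1, 0, -0.3],
                  [0, 1, 0.9],
                  [2, 1, 0.1]])
n_cat_cols = 2  # assumption: the categorical columns come first

# Count the distinct codes observed in each categorical column.
cat_vars_toy = {i: len(np.unique(X_toy[:, i])) for i in range(n_cat_cols)}
print(cat_vars_toy)  # {0: 3, 1: 2}
```

Note that counting unique values only gives the true number of categories if every category occurs at least once in the array being counted.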


## Preprocess data

Scale numerical features between -1 and 1:

In [0]:
X_num = X[:, -4:].astype(np.float32, copy=False)
xmin, xmax = X_num.min(axis=0), X_num.max(axis=0)
rng = (-1., 1.)
X_num_scaled = (X_num - xmin) / (xmax - xmin) * (rng[1] - rng[0]) + rng[0]
X_num_scaled_train = X_num_scaled[:idx, :]
X_num_scaled_test = X_num_scaled[idx+1:, :]
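
Counterfactuals are reported in the scaled space, so it can help to map values back to the original units. The sketch below restates the scaling formula as a function and adds its inverse; `scale`, `unscale` and the toy column are hypothetical names and data, not part of the notebook.

```python
import numpy as np

def scale(x, xmin, xmax, rng=(-1.0, 1.0)):
    """Min-max scale x from [xmin, xmax] to the interval rng."""
    return (x - xmin) / (xmax - xmin) * (rng[1] - rng[0]) + rng[0]

def unscale(x_scaled, xmin, xmax, rng=(-1.0, 1.0)):
    """Invert scale(): map values in rng back to [xmin, xmax]."""
    return (x_scaled - rng[0]) / (rng[1] - rng[0]) * (xmax - xmin) + xmin

# Hypothetical numerical column with minimum 0 and maximum 200.
x = np.array([0.0, 50.0, 200.0])
x_scaled = scale(x, x.min(), x.max())        # values -1.0, -0.5, 1.0
x_back = unscale(x_scaled, x.min(), x.max()) # recovers 0, 50, 200
print(x_scaled, x_back)
```

With the notebook's variables, the same inverse could be applied using `xmin` and `xmax` as computed above.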

Combine numerical and categorical data:

In [14]:
X = np.c_[X[:, :-4], X_num_scaled].astype(np.float32, copy=False)
X_train, X_test = X[:idx, :], X[idx+1:, :]
print(X_train.shape, X_test.shape)

(30000, 12) (2560, 12)


## Train a neural net

The neural net will use entity embeddings for the categorical variables.

In [0]:
def nn_ord():
    
    x_in = Input(shape=(12,))
    layers_in = []
    
    # embedding layers
    for i, (_, v) in enumerate(cat_vars_ord.items()):
        emb_in = Lambda(lambda x: x[:, i:i+1])(x_in)
        emb_dim = int(max(min(np.ceil(.5 * v), 50), 2))
        emb_layer = Embedding(input_dim=v+1, output_dim=emb_dim, input_length=1)(emb_in)
        emb_layer = Reshape(target_shape=(emb_dim,))(emb_layer)
        layers_in.append(emb_layer)
        
    # numerical layers
    num_in = Lambda(lambda x: x[:, -4:])(x_in)
    num_layer = Dense(16)(num_in)
    layers_in.append(num_layer)
    
    # combine
    x = Concatenate()(layers_in)
    x = Dense(60, activation='relu')(x)
    x = Dropout(.2)(x)
    x = Dense(60, activation='relu')(x)
    x = Dropout(.2)(x)
    x = Dense(60, activation='relu')(x)
    x = Dropout(.2)(x)
    x_out = Dense(2, activation='softmax')(x)
    
    nn = Model(inputs=x_in, outputs=x_out)
    nn.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    
    return nn
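
The embedding size rule used inside `nn_ord` can be evaluated on its own. The sketch below wraps the expression in a hypothetical helper `embedding_dim` and applies it to the category counts printed earlier.

```python
import numpy as np

def embedding_dim(n_categories):
    """Half the number of categories, rounded up, clipped to the range [2, 50]."""
    return int(max(min(np.ceil(0.5 * n_categories), 50), 2))

# Category counts copied from the printed `cat_vars_ord` output.
cat_vars_ord = {0: 9, 1: 7, 2: 4, 3: 9, 4: 6, 5: 5, 6: 2, 7: 11}

emb_dims = {col: embedding_dim(n) for col, n in cat_vars_ord.items()}
print(emb_dims)
# {0: 5, 1: 4, 2: 2, 3: 5, 4: 3, 5: 3, 6: 2, 7: 6}

# Width of the concatenated layer: all embeddings plus the 16-unit numerical layer.
concat_width = sum(emb_dims.values()) + 16
print(concat_width)  # 46
```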

In [16]:
set_seed()
nn = nn_ord()
nn.summary()
nn.fit(X_train, to_categorical(y_train), batch_size=128, epochs=20, verbose=0)

Model: "model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_1 (InputLayer)            [(None, 12)]         0                                            
__________________________________________________________________________________________________
lambda (Lambda)                 (None, 1)            0           input_1[0][0]                    
__________________________________________________________________________________________________
lambda_1 (Lambda)               (None, 1)            0           input_1[0][0]                    
__________________________________________________________________________________________________
lambda_2 (Lambda)               (None, 1)            0           input_1[0][0]                    
______________________________________________________________________________________________

<tensorflow.python.keras.callbacks.History at 0x7fc46f02dcf8>

## Generate counterfactual

Original instance:

In [0]:
X = X_test[0].reshape((1,) + X_test[0].shape)

Initialize counterfactual parameters:

In [0]:
shape = X.shape

beta = .01
c_init = 1.
c_steps = 5
max_iterations = 500
rng = (-1., 1.)  # scale features between -1 and 1
rng_shape = (1,) + data.shape[1:]
feature_range = ((np.ones(rng_shape) * rng[0]).astype(np.float32), 
                 (np.ones(rng_shape) * rng[1]).astype(np.float32))

In [19]:
shape

(1, 12)

Initialize explainer. Since the `Embedding` layers in `tf.keras` do not let gradients propagate through, we will only make use of the model's predict function, treat it as a black box and perform numerical gradient calculations. 

In [0]:
set_seed()

# define predict function
predict_fn = lambda x: nn.predict(x)

cf = CounterFactualProto(predict_fn,
                         shape,
                         beta=beta,
                         cat_vars=cat_vars_ord,
                         max_iterations=max_iterations,
                         feature_range=feature_range,
                         c_init=c_init,
                         c_steps=c_steps,
                         eps=(.01, .01)  # perturbation size for numerical gradients
                        )
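
To illustrate what a numerical gradient of a black-box predict function means, here is a minimal sketch using central finite differences. It is not the library's implementation, which may use a different scheme; `numerical_gradient`, `toy_predict` and the weights `W` are hypothetical and chosen only so the example runs without a trained model.

```python
import numpy as np

def numerical_gradient(predict_fn, x, eps=0.01):
    """Central finite-difference estimate of the gradient of predict_fn at x.

    x has shape (1, n_features); the result has shape (n_outputs, n_features).
    """
    n_features = x.shape[1]
    n_outputs = predict_fn(x).shape[1]
    grad = np.zeros((n_outputs, n_features))
    for j in range(n_features):
        step = np.zeros_like(x)
        step[0, j] = eps  # perturb one feature at a time
        grad[:, j] = (predict_fn(x + step)[0] - predict_fn(x - step)[0]) / (2 * eps)
    return grad

# Hypothetical black-box model: a linear map followed by a softmax.
W = np.array([[1.0, -2.0], [0.5, 0.0], [-1.0, 1.0]])  # (3 features, 2 classes)

def toy_predict(x):
    z = x @ W
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

x0 = np.array([[0.2, -0.1, 0.4]])
grad = numerical_gradient(toy_predict, x0)
print(grad.shape)  # (2, 3)
```

Each gradient estimate costs two calls to the predict function per feature, which is one reason the black-box route is slower than using analytic gradients.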

Fit explainer. Please check the [documentation](../doc/source/methods/CFProto.ipynb) for more info about the optional arguments.

In [0]:
cf.fit(X_train, d_type='abdm', disc_perc=[25, 50, 75])
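
The `disc_perc` argument lists percentiles. Assuming these serve as cut points for binning the numerical features (the library internals are not shown in this notebook), the binning step alone can be sketched with NumPy on a made-up feature `x_num`:

```python
import numpy as np

# Hypothetical numerical feature, already scaled to [-1, 1].
rng_state = np.random.RandomState(0)
x_num = rng_state.uniform(-1.0, 1.0, size=1000)

disc_perc = [25, 50, 75]
bin_edges = np.percentile(x_num, disc_perc)  # three cut points
x_binned = np.digitize(x_num, bin_edges)     # bin index 0..3 for every value

print(bin_edges)
print(np.bincount(x_binned))  # about 250 values per bin
```

Three percentiles produce four bins, so each numerical feature is reduced to a small number of ordered groups in this sketch.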

Explain instance:

In [0]:
set_seed()
explanation = cf.explain(X)

Helper function to describe explanations:

In [0]:
def describe_instance(X, explanation, eps=1e-2):
    print('Original instance: {}  -- proba: {}'.format(target_names[explanation['orig_class']],
                                                       explanation['orig_proba'][0]))
    print('Counterfactual instance: {}  -- proba: {}'.format(target_names[explanation['cf']['class']],
                                                             explanation['cf']['proba'][0]))
    print('\nCounterfactual perturbations...')
    print('\nCategorical:')
    X_orig_ord = X
    X_cf_ord = explanation['cf']['X']
    delta_cat = {}
    for i, (_, v) in enumerate(category_map.items()):
        cat_orig = v[int(X_orig_ord[0, i])]
        cat_cf = v[int(X_cf_ord[0, i])]
        if cat_orig != cat_cf:
            delta_cat[feature_names[i]] = [cat_orig, cat_cf]
    if delta_cat:
        for k, v in delta_cat.items():
            print('{}: {}  -->   {}'.format(k, v[0], v[1]))
    print('\nNumerical:')
    delta_num = X_cf_ord[0, -4:] - X_orig_ord[0, -4:]
    n_keys = len(list(cat_vars_ord.keys()))
    for i in range(delta_num.shape[0]):
        if np.abs(delta_num[i]) > eps:
            print('{}: {:.2f}  -->   {:.2f}'.format(feature_names[i+n_keys],
                                            X_orig_ord[0,i+n_keys],
                                            X_cf_ord[0,i+n_keys]))

In [24]:
describe_instance(X, explanation)

Original instance: <=50K  -- proba: [0.83916825 0.16083172]
Counterfactual instance: >50K  -- proba: [0.44147754 0.5585224 ]

Counterfactual perturbations...

Categorical:

Numerical:
Capital Gain: -1.00  -->   -0.88


The model predicts an income above $50k once the person's capital gain is increased.

In [0]:
X = X_test[71].reshape((1,) + X_test[71].shape)

In [0]:
cf.fit(X_train, d_type='abdm', disc_perc=[25, 50, 75])

In [0]:
set_seed()
explanation = cf.explain(X)

In [28]:
describe_instance(X, explanation)

Original instance: <=50K  -- proba: [0.9950428  0.00495721]
Counterfactual instance: >50K  -- proba: [0.43720254 0.5627974 ]

Counterfactual perturbations...

Categorical:
Education: High School grad  -->   Doctorate
Relationship: Own-child  -->   Wife
Race: Black  -->   Asian-Pac-Islander
Country: United-States  -->   Other

Numerical:


Do you see some problems here?
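
One way to make such problems explicit is to list the changed features that a person cannot realistically act on. This is a minimal sketch: the set `immutable` is an assumption chosen for illustration, and `delta_cat` is copied by hand from the printed output for this instance.

```python
# Categorical changes reported for the instance above, copied from the printed output.
delta_cat = {
    'Education': ['High School grad', 'Doctorate'],
    'Relationship': ['Own-child', 'Wife'],
    'Race': ['Black', 'Asian-Pac-Islander'],
    'Country': ['United-States', 'Other'],
}

# Assumption: features treated as not actionable; adjust this set to the use case.
immutable = {'Race', 'Sex', 'Country'}

flagged = sorted(k for k in delta_cat if k in immutable)
print(flagged)  # ['Country', 'Race']
```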

In [0]:
X = X_test[241].reshape((1,) + X_test[241].shape)

In [0]:
cf.fit(X_train, d_type='abdm', disc_perc=[25, 50, 75])

In [0]:
set_seed()
explanation = cf.explain(X)

In [32]:
describe_instance(X, explanation)

Original instance: <=50K  -- proba: [0.9981041  0.00189582]
Counterfactual instance: >50K  -- proba: [0.2696767 0.7303234]

Counterfactual perturbations...

Categorical:
Occupation: White-Collar  -->   Other
Relationship: Own-child  -->   Not-in-family

Numerical:
Capital Gain: -1.00  -->   -0.79


Change their job?

In [0]:
X = X_test[26].reshape((1,) + X_test[26].shape)

In [0]:
cf.fit(X_train, d_type='abdm', disc_perc=[25, 50, 75])

In [0]:
set_seed()
explanation = cf.explain(X)

In [36]:
describe_instance(X, explanation)

Original instance: <=50K  -- proba: [0.9735909  0.02640909]
Counterfactual instance: >50K  -- proba: [0.3968212 0.6031788]

Counterfactual perturbations...

Categorical:
Education: High School grad  -->   Doctorate
Marital Status: Separated  -->   Widowed

Numerical:


Interesting suggestions!

The model predicts an income above $50k if the person furthers their education; the counterfactual also changes the marital status from separated to widowed.

In [0]:
X = X_test[6].reshape((1,) + X_test[6].shape)

In [0]:
cf.fit(X_train, d_type='abdm', disc_perc=[25, 50, 75])

In [0]:
set_seed()
explanation = cf.explain(X)

In [41]:
describe_instance(X, explanation)

Original instance: >50K  -- proba: [0.43869647 0.5613035 ]
Counterfactual instance: <=50K  -- proba: [0.53004414 0.4699559 ]

Counterfactual perturbations...

Categorical:
Education: High School grad  -->   Dropout

Numerical:


OK. We're leaving now...