# Counterfactuals Membership Inference Experiment

In [1]:
import pandas as pd
import sklearn.ensemble as es
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier
import numpy as np
import logging
import time
import dice_ml

In [2]:
%run experiment_setup.ipynb

This notebook tests whether membership inference is possible when counterfactual (CF) explanations are drawn from the training data. Membership inference means that an attacker with access to the explanations can determine, for any given sample, whether it was part of the training data.

First we define the class that runs the experiment for the different variations. The attacker obtains a counterfactual for the test sample ("counterfactual \#1"). They then query the explainer a second time to obtain a counterfactual for counterfactual \#1 ("counterfactual \#2"). Counterfactual \#2 should have the same class as the original test sample. Because every counterfactual is drawn from the training data, counterfactual \#2 being equal to the test sample means that the test sample must be part of the training data.
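The two-step query can be illustrated with a toy stand-in for the explainer. The sketch below is not DiCE's implementation: it assumes a hypothetical explainer, `nearest_with_class`, that returns the closest training point of the desired class, and it uses the training labels in place of model predictions. All data values are made up.

```python
import numpy as np

def nearest_with_class(train_X, train_y, query, desired_class):
    """Hypothetical explainer: closest training point with the desired class."""
    candidates = train_X[train_y == desired_class]
    distances = np.linalg.norm(candidates - query, axis=1)
    return candidates[np.argmin(distances)]

def infer_membership(train_X, train_y, sample, sample_class):
    # Counterfactual #1: nearest training point of the opposite class.
    cf1 = nearest_with_class(train_X, train_y, sample, 1 - sample_class)
    # Counterfactual #2: nearest training point of the sample's own class,
    # measured from counterfactual #1.
    cf2 = nearest_with_class(train_X, train_y, cf1, sample_class)
    # The sample is flagged as a member only if counterfactual #2 reproduces it.
    return bool(np.isclose(cf2, sample).all())

# Made-up training data: two points of class 0 and one of class 1.
train_X = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]])
train_y = np.array([0, 1, 0])

near_member = np.array([0.0, 0.0])  # in the training data, close to the class-1 point
non_member = np.array([0.2, 0.1])   # not in the training data
far_member = np.array([5.0, 5.0])   # in the training data, far from the class-1 point

print(infer_membership(train_X, train_y, near_member, 0))
print(infer_membership(train_X, train_y, non_member, 0))
print(infer_membership(train_X, train_y, far_member, 0))
```

In this toy setting the attack never flags a non-member, but it also misses `far_member`, because counterfactual \#2 lands on a different training point. A member is only detected when it happens to be the nearest same-class neighbour of its own counterfactual.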

In [3]:
class CounterfactualMembershipInference(XaiPrivacyExperiment):
    def train_explainer(self, data_train, model):
        # train explainer on training data
        d = dice_ml.Data(dataframe=data_train, continuous_features=self.continuous_features,
                         outcome_name=self.outcome_name)
        m = dice_ml.Model(model=model, backend="sklearn", model_type='classifier')

        # use method="kdtree" to get counterfactuals drawn from the training data
        return dice_ml.Dice(d, m, method="kdtree")
        
    @staticmethod
    def membership_inference_attack_model_access(explainer, samples_df, model):
        inferred_membership = np.empty(len(samples_df))
        
        for index in range(len(samples_df)):
            # needs double brackets so that iloc returns a dataframe instead of a series
            sample_df = samples_df.iloc[[index], :]

            logging.debug(f'Test sample: {sample_df.to_numpy()}')

            # there is an issue with dice where desired_class="opposite" does not calculate counterfactuals of opposite class 
            # for class 1: https://github.com/interpretml/DiCE/issues/215
            # this is why we need to manually set the desired class. This requires access to the model which would otherwise not 
            # be necessary.
            model_pred = model.predict(sample_df)[0]
            logging.debug(f'Prediction by model: {model_pred}')

            # get one counterfactual for test sample:
            e1 = explainer.generate_counterfactuals(sample_df, total_CFs=1, desired_class=int(1-model_pred))
            cf_dataframe = e1.cf_examples_list[0].final_cfs_df
            # remove label (in case of using sparse counterfactuals)
            # cf_dataframe = cf_dataframe.iloc[:, :-1]
            logging.debug('1st counterfactual: %s' % cf_dataframe.to_numpy())

            # get model prediction as workaround against the DiCE bug mentioned above
            model_pred = model.predict(cf_dataframe)[0]
            logging.debug(f'Prediction by model: {model_pred}')

            # get counterfactual for counterfactual:
            e2 = explainer.generate_counterfactuals(cf_dataframe, total_CFs=1, desired_class=int(1-model_pred))
            cf_cf_df = e2.cf_examples_list[0].final_cfs_df
            # remove label (in case of using sparse counterfactuals)
            # cf_cf_df = cf_cf_df.iloc[:, :-1]

            logging.debug('2nd counterfactual: %s' % cf_cf_df.to_numpy())
            logging.debug(f'Prediction by model: {model.predict(cf_cf_df)[0]}')

            # if the counter-counterfactual is equal to the test sample, then it is part of the training data:
            # np.isclose is used for comparison because explainer may round floating point values
            result = np.isclose(cf_cf_df.to_numpy().astype(float), sample_df.to_numpy().astype(float)).all()

            logging.debug('Inferred membership: %s' % result)
            inferred_membership[index] = result
        
        return inferred_membership
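
The final comparison in the attack uses `np.isclose` instead of exact equality. A minimal illustration with made-up values shows why: if the explainer returns a value that differs from the original only by a tiny rounding error, an exact comparison fails, while `np.isclose` (default tolerances `rtol=1e-05`, `atol=1e-08`) still treats the rows as equal.

```python
import numpy as np

# Made-up feature rows: the second differs only by a tiny rounding error.
original = np.array([39.0, 26.97, 80.0])
returned = np.array([39.0, 26.9700001, 80.0])

exact_match = bool((original == returned).all())
close_match = bool(np.isclose(original, returned).all())

print(exact_match)  # False
print(close_match)  # True
```

The trade-off is that a non-member lying within the tolerance of a training sample would also be counted as a match.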

# Dataset 1: Heart Disease

We now generate five counterfactuals for the first sample from the training data to demonstrate counterfactual explanations in general.

In [5]:
features = data_num.drop('heart_disease_label', axis=1)
labels = data_num['heart_disease_label']

# Train a random forest on training data.
model = es.RandomForestClassifier(random_state=0)
model = model.fit(features, labels)

# Train explainer
d = dice_ml.Data(dataframe=data_num, continuous_features=continuous_features_num, outcome_name=outcome_name_num)
m = dice_ml.Model(model=model, backend="sklearn", model_type='classifier')

# Generating counterfactuals from training data (kd-tree)
exp = dice_ml.Dice(d, m, method="kdtree")

In [6]:
e1 = exp.generate_counterfactuals(features[0:1], total_CFs=5, desired_class="opposite")
e1.visualize_as_dataframe(show_only_changes=True)

Query instance (original outcome : 0)

| index | age | cigs_per_day | total_chol | sys_bp | dia_bp | bmi | heart_rate | glucose | heart_disease_label |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 39.0 | 0.0 | 195.0 | 106.0 | 70.0 | 26.97 | 80.0 | 77.0 | 0.0 |

Diverse Counterfactual set (new outcome: 1.0)

| index | age | cigs_per_day | total_chol | sys_bp | dia_bp | bmi | heart_rate | glucose | heart_disease_label |
|---|---|---|---|---|---|---|---|---|---|
| 4188 | 44.0 | - | 180.0 | 106.9 | - | 23.98 | 92.0 | 67.0 | 1.0 |
| 1358 | 64.0 | - | 210.0 | 120.0 | 70.1 | 24.77 | - | - | 1.0 |
| 2633 | 43.0 | - | 196.0 | 121.5 | 86.5 | 20.82 | 92.0 | - | - |
| 1345 | 49.0 | - | 211.0 | 104.0 | 66.5 | 24.17 | 75.0 | 87.0 | 1.0 |
| 1253 | 50.0 | - | 196.0 | 126.0 | 88.0 | 26.73 | - | 77.1 | 1.0 |


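We can see that the counterfactuals are similar to the query sample and that most of them have a flipped prediction. These are the two general properties of counterfactual explanations. Because `show_only_changes=True` is set, a "-" marks a value that is unchanged from the query instance; the "-" in the label column of row 2633 therefore means that this counterfactual keeps the original outcome.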

We will now do a small proof of concept of the experiment with logging enabled to demonstrate how it works.

In [7]:
logging.root.setLevel(logging.DEBUG)

experiment_num = CounterfactualMembershipInference(data_num, continuous_features_num, outcome_name_num, 13)
experiment_num.membership_inference_experiment(10, DecisionTreeClassifier(random_state=13), model_access=True)

logging.root.setLevel(logging.ERROR)

DEBUG:root:[[ 55.    0.  165.  166.  101.   24.8  65.   90. ]] taken from training data
DEBUG:root:[[ 36.    10.   214.   119.    76.    21.67  67.    75.  ]] taken from control data
DEBUG:root:[[ 52.    0.  269.  157.5  83.   26.6  70.   80. ]] taken from training data
DEBUG:root:[[ 36.    0.  237.  142.   82.   27.5  53.   87. ]] taken from control data
DEBUG:root:[[ 58.  10. 226. 125.  75.  24.  75.  73.]] taken from training data
DEBUG:root:[[ 37.     0.   300.   118.5   85.5   25.83  68.    82.  ]] taken from control data
DEBUG:root:[[ 43.    20.   184.   127.5   81.    28.31 108.    75.  ]] taken from training data
DEBUG:root:[[ 53.    20.   221.   131.    89.    24.09  90.    95.  ]] taken from control data
DEBUG:root:[[ 44.     0.   254.   130.    80.    28.15  80.    74.  ]] taken from training data
DEBUG:root:[[ 42.    20.   200.    95.    55.    23.68  60.    83.  ]] taken from control data
DEBUG:root:Test sample: [[ 55.    0.  165.  166.  101.   24.8  65.   90. ]]
[... log output truncated ...]
DEBUG:root:2nd counterfactual: [[ 40.    20.   214.   109.5   69.    20.32  80.    81.  ]]
DEBUG:root:Prediction by model: 0.0
DEBUG:root:Inferred membership: False
INFO:root:Accuracy: 0.5, precision: nan, recall: 0.0


In [8]:
results_ = {'dataset': [], 'model': [], 'accuracy': [], 'precision': [], 'recall': []}

results = pd.DataFrame(data = results_)

We can now begin with the actual experiments.

In [9]:
logging.info("features: continuous, model: decision tree.")

start_time = time.time()

experiment_num = CounterfactualMembershipInference(data_num, continuous_features_num, outcome_name_num, 0)
accuracy, precision, recall = experiment_num.membership_inference_experiment(100, DecisionTreeClassifier(random_state=0),\
                                                                             model_access=True)

print(f'accuracy: {accuracy}, precision: {precision}, recall: {recall}')
results.loc[len(results.index)] = ['continuous', 'decision tree', accuracy, precision, recall]

print("--- %s seconds ---" % (time.time() - start_time))

accuracy: 0.55, precision: 1.0, recall: 0.1
--- 43.54416060447693 seconds ---


In [10]:
logging.info("features: continuous, model: random forest.")

start_time = time.time()

experiment_num = CounterfactualMembershipInference(data_num, continuous_features_num, outcome_name_num, 0)
accuracy, precision, recall = experiment_num.membership_inference_experiment(100, es.RandomForestClassifier(random_state=0),\
                                                                             model_access=True)

print(f'accuracy: {accuracy}, precision: {precision}, recall: {recall}')
results.loc[len(results.index)] = ['continuous', 'random forest', accuracy, precision, recall]

print("--- %s seconds ---" % (time.time() - start_time))

accuracy: 0.55, precision: 1.0, recall: 0.1
--- 230.20228385925293 seconds ---





In [11]:
logging.info("features: continuous, model: neural network.")

start_time = time.time()

experiment_num = CounterfactualMembershipInference(data_num, continuous_features_num, outcome_name_num, 0)
accuracy, precision, recall = experiment_num.membership_inference_experiment(100,\
                                        MLPClassifier(hidden_layer_sizes=(32, 32, 32), random_state=0), model_access=True)

print(f'accuracy: {accuracy}, precision: {precision}, recall: {recall}')
results.loc[len(results.index)] = ['continuous', 'neural network', accuracy, precision, recall]

print("--- %s seconds ---" % (time.time() - start_time))

accuracy: 0.5, precision: nan, recall: 0.0
--- 38.32210850715637 seconds ---





# Dataset 2: Census Income (categorical)

In [12]:
# DiCE needs categorical features to be strings:
categorical_features = data_cat.columns.difference([outcome_name_cat])

for col in categorical_features:
    data_cat[col] = data_cat[col].astype(str)
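
As a small illustration of the conversion above, the following sketch applies the same loop to a toy dataframe. The column names and values are hypothetical and do not come from the census dataset; only the outcome column is left untouched.

```python
import pandas as pd

# Hypothetical toy data: one integer-coded and one string-coded categorical
# feature, plus a numeric outcome column.
toy = pd.DataFrame({
    "education_level": [1, 2, 3],
    "workclass": ["private", "public", "private"],
    "income_label": [0, 1, 0],
})
outcome_name = "income_label"

# Every column except the outcome is treated as categorical.
categorical = toy.columns.difference([outcome_name])

for col in categorical:
    toy[col] = toy[col].astype(str)

print(toy["education_level"].tolist())
print(toy["income_label"].tolist())
```

Integer codes such as `1` become the strings `"1"`, while the outcome column keeps its numeric type.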

In [13]:
logging.info("features: categorical, model: decision tree.")

start_time = time.time()

experiment_cat = CounterfactualMembershipInference(data_cat, continuous_features_cat, outcome_name_cat, 0)
accuracy, precision, recall = experiment_cat.membership_inference_experiment(100,\
                                        DecisionTreeClassifier(random_state=0), model_access=True)

print(f'accuracy: {accuracy}, precision: {precision}, recall: {recall}')
results.loc[len(results.index)] = ['categorical', 'decision tree', accuracy, precision, recall]

print("--- %s seconds ---" % (time.time() - start_time))

accuracy: 0.5, precision: nan, recall: 0.0
--- 24.721978425979614 seconds ---


In [14]:
logging.info("features: categorical, model: random forest.")

start_time = time.time()

experiment_cat = CounterfactualMembershipInference(data_cat, continuous_features_cat, outcome_name_cat, 0)
accuracy, precision, recall = experiment_cat.membership_inference_experiment(100,\
                                        es.RandomForestClassifier(random_state=0), model_access=True)

print(f'accuracy: {accuracy}, precision: {precision}, recall: {recall}')
results.loc[len(results.index)] = ['categorical', 'random forest', accuracy, precision, recall]

print("--- %s seconds ---" % (time.time() - start_time))

accuracy: 0.5, precision: nan, recall: 0.0
--- 50.81438994407654 seconds ---





In [15]:
logging.info("features: categorical, model: neural network.")

start_time = time.time()

experiment_cat = CounterfactualMembershipInference(data_cat, continuous_features_cat, outcome_name_cat, 0)
accuracy, precision, recall = experiment_cat.membership_inference_experiment(100,\
                                        MLPClassifier(hidden_layer_sizes=(32, 32, 32), random_state=0), model_access=True)

print(f'accuracy: {accuracy}, precision: {precision}, recall: {recall}')
results.loc[len(results.index)] = ['categorical', 'neural network', accuracy, precision, recall]

print("--- %s seconds ---" % (time.time() - start_time))

accuracy: 0.5, precision: nan, recall: 0.0
--- 31.722259044647217 seconds ---





# Results

This section lists the results of all variations of the membership inference experiment with counterfactuals. In each experiment, half the samples were picked randomly from the training data, while the other half were picked randomly from the control data not used for training. Both datasets originate from the same source dataset.

Accuracy is the percentage of samples whose membership (true or false) was correctly inferred. An algorithm guessing at random would achieve an accuracy of 50 percent.

Precision is the percentage of samples predicted to be training samples that are actually in the training data. It is undefined (NaN) when the attack does not predict any sample to be a member.

Recall is the percentage of training samples whose membership (true) was correctly inferred.
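
The three definitions above can be written out as a short sketch. The helper `attack_metrics` is hypothetical and is not the function used in `experiment_setup.ipynb`. The example inputs are an assumption as well: 100 samples with 50 members, of which 5 are flagged, chosen only because such a split is consistent with the reported values for the continuous decision tree.

```python
import math

def attack_metrics(true_membership, inferred_membership):
    """Accuracy, precision and recall of a membership inference attack."""
    pairs = list(zip(true_membership, inferred_membership))
    tp = sum(1 for t, p in pairs if t and p)          # members flagged as members
    fp = sum(1 for t, p in pairs if not t and p)      # non-members flagged as members
    fn = sum(1 for t, p in pairs if t and not p)      # members that were missed
    tn = sum(1 for t, p in pairs if not t and not p)  # non-members left unflagged

    accuracy = (tp + tn) / len(pairs)
    # Precision is undefined when nothing was flagged as a member.
    precision = tp / (tp + fp) if (tp + fp) > 0 else math.nan
    recall = tp / (tp + fn) if (tp + fn) > 0 else math.nan
    return accuracy, precision, recall

# Assumed split: 50 members followed by 50 non-members.
truth = [True] * 50 + [False] * 50

# Attack flags 5 of the members and none of the non-members.
some_hits = [True] * 5 + [False] * 95
print(attack_metrics(truth, some_hits))  # (0.55, 1.0, 0.1)

# Attack flags nothing: accuracy stays at 0.5 and precision is NaN.
no_hits = [False] * 100
print(attack_metrics(truth, no_hits))    # (0.5, nan, 0.0)
```

This also shows why an accuracy of 0.5 together with a NaN precision means the attack never identified a single member.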

In [16]:
results

| index | dataset | model | accuracy | precision | recall |
|---|---|---|---|---|---|
| 0 | continuous | decision tree | 0.55 | 1.0 | 0.1 |
| 1 | continuous | random forest | 0.55 | 1.0 | 0.1 |
| 2 | continuous | neural network | 0.5 | NaN | 0.0 |
| 3 | categorical | decision tree | 0.5 | NaN | 0.0 |
| 4 | categorical | random forest | 0.5 | NaN | 0.0 |
| 5 | categorical | neural network | 0.5 | NaN | 0.0 |


In [17]:
results.to_csv('results/cf-membership-inference-results.csv', index=False, na_rep='NaN')