# Counterfactuals Training Data Extraction Experiment

In [1]:
import numpy as np
import pandas as pd
import sklearn.ensemble as es
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier
import random
import logging
import warnings
import dice_ml

In [2]:
threads = 31

In [3]:
%run experiment_setup.ipynb

INFO:xai-privacy:Loading dataset 1: heart disease (numeric features) ...
INFO:xai-privacy:Loading dataset 2: census income (categorical features) ...


Feature Age: removed 0 rows for missing values.
Feature RestingBP: removed 59 rows for missing values.
Feature Cholesterol: removed 27 rows for missing values.
Feature FastingBS: add unknown category 2.0
Feature RestingECG: add unknown category 3.0
Feature MaxHR: removed 0 rows for missing values.
Feature Oldpeak: removed 7 rows for missing values.
Feature ST_Slope: add unknown category 4.0
Feature CA: add unknown category 4.0
Feature Thal: add unknown category 8.0
Dropped 271 of 1097
Dropped 273 of 1097
Dropped 277 of 1097
Dropped: 2399 of 32561
census: Dropped 3848 of 30162
num: Dropped 19859 of 30162
cat: Dropped 12136 of 30162


In [4]:
logger = logging.getLogger('xai-privacy')

This notebook will test whether training data extraction is possible with counterfactuals (CF) that are drawn from the training data. Training data extraction means an attacker can find out the feature values of samples from the training data without prior knowledge of them. The attacker only has access to the model's prediction function and the explanation.

This attack should be trivial because any counterfactual that is shown as an explanation was picked directly from the training data.

The idea for counterfactual training data extraction is as follows: the attacker makes repeated queries to the explainer with random input values. To generate these queries, the attacker needs to know the minimum and maximum value of each numeric feature in the training data and the possible categories of each categorical feature. The returned counterfactuals are the extracted training data.
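As a self-contained illustration of this idea (not the DiCE-based implementation used in this notebook), the sketch below uses made-up training points, a stand-in 1-nearest-neighbour model, and a hypothetical `explain` function that returns the nearest training points of the opposite class. All names and values are assumptions chosen for illustration.

```python
import random

# Toy training data: (feature vector, label). Values are made up for illustration.
TRAIN = [
    ((30, 120), 0),
    ((45, 140), 0),
    ((50, 160), 1),
    ((62, 180), 1),
    ((70, 150), 1),
]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def predict(x):
    """Stand-in model: label of the nearest training point (1-NN)."""
    return min(TRAIN, key=lambda row: sq_dist(row[0], x))[1]


def explain(x, total_cfs=2):
    """Hypothetical explainer: nearest training points with the opposite label."""
    target = 1 - predict(x)
    candidates = [features for features, label in TRAIN if label == target]
    return sorted(candidates, key=lambda p: sq_dist(p, x))[:total_cfs]


def extract(num_queries, bounds, seed=0):
    """Query the explainer with random inputs and collect every counterfactual."""
    rng = random.Random(seed)
    extracted = set()
    for _ in range(num_queries):
        # The attacker only knows the (min, max) range of each feature.
        query = tuple(rng.randint(lo, hi) for lo, hi in bounds)
        extracted.update(explain(query))
    return extracted


bounds = [(30, 70), (120, 180)]
leaked = extract(num_queries=50, bounds=bounds)
train_points = {features for features, _ in TRAIN}
print(len(leaked), "of", len(train_points), "training points recovered")
```

Because the toy explainer only ever returns rows of `TRAIN`, every element of `leaked` is a real training point, which mirrors why the attack cannot produce false positives.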

First, we implement the `train_explainer` and `training_data_extraction_no_model_access` functions:

In [5]:
class CounterfactualTDE(TrainingDataExtraction):
    def train_explainer(self, data_train, model):
        # Train the explainer on the training data
        d = dice_ml.Data(dataframe=data_train, continuous_features=self.numeric_features,
                         outcome_name=self.outcome_name)
        m = dice_ml.Model(model=model, backend="sklearn", model_type='classifier')

        # Use method "kdtree" so that counterfactuals are drawn from the training data
        return dice_ml.Dice(d, m, method="kdtree")
        
    @staticmethod
    def training_data_extraction_no_model_access(explainer, num_queries, feature_formats, rng):
        rng = np.random.default_rng(rng)
        seed = rng.integers(100000).item()
        random.seed(seed)
        
        # Get all feature names
        feature_names = []
        
        for feature in feature_formats:
            feature_names.append(feature['name'])
        
        samples_df = pd.DataFrame(columns=feature_names)
    
        # This is the default number of counterfactuals per query used on the GitHub page of DiCE
        cfs_per_query = 4
    
        # Generate random samples as queries for the explainer.
        for i in range(num_queries):
            sample = {}
            for feature in feature_formats:
                if feature['isCont']:
                    sample[feature['name']] = rng.integers(feature['min'], feature['max'])
                else:
                    sample[feature['name']] = random.choice(feature['categories'])
            sample_df = pd.DataFrame(sample, index=[0])
            samples_df = pd.concat([samples_df, sample_df], ignore_index=True)

        # Cast categorical features to string again because of DiCE peculiarities
        for feature in feature_formats:
            if not feature['isCont']:
                samples_df[feature['name']] = samples_df[feature['name']].astype(str)
            else:
                samples_df[feature['name']] = samples_df[feature['name']].astype(int)
                
        # Generate counterfactuals for all random query samples
        e1 = explainer.generate_counterfactuals(samples_df, total_CFs=cfs_per_query, desired_class='opposite')
                
        # Collect all extracted samples in this dataframe
        extracted_samples_df = pd.DataFrame(columns=feature_names)
        for index in range(len(samples_df)):
            cfs_of_sample = e1.cf_examples_list[index].final_cfs_df
            logger.debug(f'Sample {index}: Counterfactuals \n {cfs_of_sample.to_numpy()}')

            extracted_samples_df = pd.concat([extracted_samples_df, cfs_of_sample], ignore_index=True)
        
        return extracted_samples_df

# Executing Training Data Extraction

We now generate five counterfactuals for the first sample from the training data to demonstrate counterfactual explanations in general.

In [6]:
features = data_heart.drop(outcome_name_heart, axis=1)
labels = data_heart[outcome_name_heart]

# Train a random forest on training data.
model = es.RandomForestClassifier(random_state=0)
model = model.fit(features, labels)

# Train explainer
d = dice_ml.Data(dataframe=data_heart, continuous_features=numeric_features_heart, outcome_name=outcome_name_heart)

m = dice_ml.Model(model=model, backend="sklearn", model_type='classifier')
# Generating counterfactuals from training data (kd-tree)
exp = dice_ml.Dice(d, m, method="kdtree")

In [7]:
e1 = exp.generate_counterfactuals(features[0:1], total_CFs=5, desired_class="opposite")
e1.visualize_as_dataframe(display_sparse_df=False)

Query instance (original outcome : 0)

| | Age | Sex | ChestPainType | RestingBP | Cholesterol | FastingBS | RestingECG | MaxHR | ExerciseAngina | Oldpeak | ST_Slope | CA | Thal | HeartDisease |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 47.0 | 1.0 | 1.0 | 110.0 | 249.0 | 0.0 | 0.0 | 150.0 | 0.0 | 0.0 | 4.0 | 4.0 | 8.0 | 0.0 |

Diverse Counterfactual set without sparsity correction (new outcome:  1

| | Age | Sex | ChestPainType | RestingBP | Cholesterol | FastingBS | RestingECG | MaxHR | ExerciseAngina | Oldpeak | ST_Slope | CA | Thal |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 388 | 35.0 | 1.0 | 2.0 | 110.0 | 257.0 | 0.0 | 0.0 | 140.0 | 0.0 | 0.0 | 4.0 | 4.0 | 8.0 |
| 434 | 47.0 | 1.0 | 3.0 | 108.0 | 243.0 | 0.0 | 0.0 | 152.0 | 0.0 | 0.0 | 1.0 | 0.0 | 3.0 |
| 705 | 54.0 | 1.0 | 3.0 | 120.0 | 237.0 | 0.0 | 0.0 | 150.0 | 1.0 | 1.5 | 4.0 | 4.0 | 7.0 |
| 320 | 46.0 | 1.0 | 4.0 | 120.0 | 249.0 | 0.0 | 2.0 | 144.0 | 0.0 | 0.8 | 1.0 | 0.0 | 7.0 |
| 1111 | 33.0 | 0.0 | 4.0 | 100.0 | 246.0 | 0.0 | 0.0 | 150.0 | 1.0 | 1.0 | 2.0 | 4.0 | 8.0 |


We can see that the counterfactuals are similar to the query sample and that they have a flipped prediction. These are the two general properties of counterfactual explanations.
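One simple way to inspect the similarity property is to list the features in which a counterfactual differs from the query. The sketch below is a minimal illustration, with the query row and the counterfactual with index 434 copied by hand from the output above; the variable names are chosen for this example only.

```python
FEATURES = ["Age", "Sex", "ChestPainType", "RestingBP", "Cholesterol", "FastingBS",
            "RestingECG", "MaxHR", "ExerciseAngina", "Oldpeak", "ST_Slope", "CA", "Thal"]

# Query instance (predicted outcome 0) and counterfactual 434 (outcome 1).
query = [47.0, 1.0, 1.0, 110.0, 249.0, 0.0, 0.0, 150.0, 0.0, 0.0, 4.0, 4.0, 8.0]
cf_434 = [47.0, 1.0, 3.0, 108.0, 243.0, 0.0, 0.0, 152.0, 0.0, 0.0, 1.0, 0.0, 3.0]

# Split the features into those the counterfactual changes and those it keeps.
changed = [name for name, q, c in zip(FEATURES, query, cf_434) if q != c]
unchanged = [name for name, q, c in zip(FEATURES, query, cf_434) if q == c]

print("changed:", changed)
print("unchanged:", unchanged)
```

Several of the changed numeric features (for example `RestingBP` and `MaxHR`) move only by a few units, which is what makes the counterfactual look close to the query.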

We will now do a small proof of concept of the experiment with logging enabled to demonstrate how it works.

In [8]:
logger.setLevel(logging.DEBUG)
logging.root.setLevel(logging.ERROR)

EXP = CounterfactualTDE(data_heart, numeric_features_heart, outcome_name_heart, random_state=0)
EXP.training_data_extraction_experiment(num_queries=12, model=es.RandomForestClassifier(random_state=0), model_access=False)

logger.setLevel(logging.INFO)

DEBUG:xai-privacy:Numeric Features: ['Age', 'RestingBP', 'Cholesterol', 'MaxHR', 'Oldpeak']
DEBUG:xai-privacy:Categorical Features: ['CA', 'ChestPainType', 'ExerciseAngina', 'FastingBS', 'RestingECG', 'ST_Slope', 'Sex', 'Thal']
DEBUG:xai-privacy:Sample 0: Counterfactuals 
 [[51.0 '1.0' '3.0' 110.0 175.0 '0.0' '0.0' 123.0 '0.0' 0.6 '1.0' '0.0'
  '3.0']
 [49.0 '1.0' '3.0' 118.0 149.0 '0.0' '2.0' 126.0 '0.0' 0.8 '1.0' '3.0'
  '3.0']
 [55.0 '1.0' '4.0' 116.0 186.0 '1.0' '1.0' 102.0 '0.0' 0.0 '4.0' '4.0'
  '8.0']
 [54.0 '1.0' '4.0' 120.0 188.0 '0.0' '0.0' 113.0 '0.0' 1.4 '2.0' '1.0'
  '7.0']]
DEBUG:xai-privacy:Sample 1: Counterfactuals 
 [[55.0 '1.0' '3.0' 0.0 0.0 '0.0' '0.0' 155.0 '0.0' 1.5 '2.0' '4.0' '8.0']
 [32.0 '1.0' '1.0' 95.0 0.0 '2.0' '0.0' 127.0 '

Total time: 30.86s (training model: 0.38s, training explainer: 0.06s, experiment: 30.42s)
Number of extracted samples: 39
Number of accurate extracted samples: 39
Precision: 1.0, recall: 3.25


The proof of concept shows that each extracted sample is an actual training sample (precision of 100%). Recall is above 100% here because only 12 queries are issued and each query can return several counterfactuals: the reported recall of 3.25 equals 39 extracted samples divided by 12 queries. Recall reaches a meaningful value once the experiment is executed with as many queries as there are training samples, because the attack can never extract more distinct samples than the training data contains.
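The printed figures can be reproduced by hand. The sketch below assumes that recall is reported relative to the number of queries; this is inferred from the output, since the helper that computes the metrics is not shown here. The count of 826 heart rows is likewise an assumption derived from the setup log (1097 rows minus 271 dropped).

```python
# Proof of concept: 12 queries, 39 extracted samples, all of them accurate.
poc_queries = 12
poc_extracted = 39
poc_accurate = 39

poc_precision = poc_accurate / poc_extracted
poc_recall = poc_accurate / poc_queries

# Full run on the heart dataset with a decision tree: one query per training row.
# Assumption: 826 rows remain after preprocessing (1097 - 271).
full_queries = 1097 - 271
full_accurate = 540
full_recall = full_accurate / full_queries

print(poc_precision, poc_recall, round(full_recall, 6))
```

With one query per training row, the denominator equals the size of the training data, so recall can no longer exceed 1.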

Now we begin executing the actual experiment. We begin by defining the table that will hold the results for all our different experiment variations. Then we execute all variations of the experiment for this dataset. We vary the model between a decision tree, a random forest and a neural network. Each model uses the default configuration of scikit-learn.

In [9]:
results_ = {'dataset': [], 'model': [], 'precision': [], 'recall': []}

results = pd.DataFrame(data = results_)

In [10]:
dataset_dicts = [data_heart_dict, data_heart_num_dict, data_heart_cat_dict, data_census_dict, data_census_num_dict, data_census_cat_dict]

dt_dict = {'name': 'decision tree', 'model': DecisionTreeClassifier}
rf_dict = {'name': 'random forest', 'model': es.RandomForestClassifier}
nn_dict = {'name': 'neural network', 'model': MLPClassifier}

model_dicts = [dt_dict, rf_dict, nn_dict]

# We set the number of extractions to the length of the dataset
num_queries_dict = { 'heart': len(data_heart), 'heart numeric': len(data_heart_num), 'heart categorical': len(data_heart_cat), 'census': len(data_census), 'census numeric': len(data_census_num), 'census categorical': len(data_census_cat)}

In [11]:
# remove pandas warnings
warnings.simplefilter(action='ignore', category=pd.errors.PerformanceWarning)

In [12]:
# This will run the experiment for each dataset and model combination

results = run_all_experiments(CounterfactualTDE, dataset_dicts, model_dicts, random_state=0, num_queries=num_queries_dict, model_access=False, threads=threads, results_table=results, is_mem_inf=False, convert_cat_to_str=True)

dataset: heart, model: decision tree


Total time: 59.68s (training model: 0.04s, training explainer: 0.08s, experiment: 59.56s)
Number of extracted samples: 540
Number of accurate extracted samples: 540
Precision: 1.0, recall: 0.6537530266343826
dataset: heart, model: random forest


Total time: 125.34s (training model: 0.78s, training explainer: 0.09s, experiment: 124.47s)
Number of extracted samples: 482
Number of accurate extracted samples: 482
Precision: 1.0, recall: 0.5835351089588378
dataset: heart, model: neural network


Total time: 111.11s (training model: 4.31s, training explainer: 0.09s, experiment: 106.71s)
Number of extracted samples: 469
Number of accurate extracted samples: 469
Precision: 1.0, recall: 0.5677966101694916
dataset: heart numeric, model: decision tree


Total time: 28.31s (training model: 0.01s, training explainer: 0.01s, experiment: 28.30s)
Number of extracted samples: 532
Number of accurate extracted samples: 532
Precision: 1.0, recall: 0.6456310679611651
dataset: heart numeric, model: random forest


Total time: 83.56s (training model: 0.29s, training explainer: 0.03s, experiment: 83.24s)
Number of extracted samples: 504
Number of accurate extracted samples: 504
Precision: 1.0, recall: 0.6116504854368932
dataset: heart numeric, model: neural network


Total time: 67.75s (training model: 2.53s, training explainer: 0.03s, experiment: 65.19s)
Number of extracted samples: 493
Number of accurate extracted samples: 493
Precision: 1.0, recall: 0.5983009708737864
dataset: heart categorical, model: decision tree


Total time: 16.14s (training model: 0.03s, training explainer: 0.04s, experiment: 16.07s)
Number of extracted samples: 709
Number of accurate extracted samples: 709
Precision: 1.0, recall: 0.8646341463414634
dataset: heart categorical, model: random forest


Total time: 24.14s (training model: 0.71s, training explainer: 0.05s, experiment: 23.38s)
Number of extracted samples: 648
Number of accurate extracted samples: 648
Precision: 1.0, recall: 0.7902439024390244
dataset: heart categorical, model: neural network


Total time: 50.73s (training model: 2.80s, training explainer: 0.04s, experiment: 47.89s)
Number of extracted samples: 645
Number of accurate extracted samples: 645
Precision: 1.0, recall: 0.7865853658536586
dataset: census, model: decision tree


Total time: 1179.12s (training model: 1.40s, training explainer: 0.28s, experiment: 1177.43s)
Number of extracted samples: 2027
Number of accurate extracted samples: 2027
Precision: 1.0, recall: 0.07703123812419245
dataset: census, model: random forest


Total time: 2539.67s (training model: 35.51s, training explainer: 0.19s, experiment: 2503.98s)
Number of extracted samples: 1476
Number of accurate extracted samples: 1476
Precision: 1.0, recall: 0.05609181424336855
dataset: census, model: neural network


Total time: 2617.26s (training model: 132.49s, training explainer: 0.19s, experiment: 2484.57s)
Number of extracted samples: 1866
Number of accurate extracted samples: 1866
Precision: 1.0, recall: 0.07091282207190089
dataset: census numeric, model: decision tree


Total time: 99.89s (training model: 0.06s, training explainer: 0.03s, experiment: 99.81s)
Number of extracted samples: 1377
Number of accurate extracted samples: 1377
Precision: 1.0, recall: 0.13365039308939144
dataset: census numeric, model: random forest


Total time: 289.17s (training model: 1.28s, training explainer: 0.03s, experiment: 287.87s)
Number of extracted samples: 1080
Number of accurate extracted samples: 1080
Precision: 1.0, recall: 0.10482383771716976
dataset: census numeric, model: neural network


Total time: 679.19s (training model: 26.76s, training explainer: 0.02s, experiment: 652.40s)
Number of extracted samples: 1468
Number of accurate extracted samples: 1468
Precision: 1.0, recall: 0.14248277200815296
dataset: census categorical, model: decision tree


Total time: 745.76s (training model: 1.23s, training explainer: 0.56s, experiment: 743.97s)
Number of extracted samples: 14178
Number of accurate extracted samples: 14178
Precision: 1.0, recall: 0.7865305669588373
dataset: census categorical, model: random forest


Total time: 1372.50s (training model: 16.91s, training explainer: 0.48s, experiment: 1355.11s)
Number of extracted samples: 14038
Number of accurate extracted samples: 14038
Precision: 1.0, recall: 0.7787640075446577
dataset: census categorical, model: neural network


Total time: 1525.03s (training model: 88.70s, training explainer: 0.41s, experiment: 1435.93s)
Number of extracted samples: 12541
Number of accurate extracted samples: 12541
Precision: 1.0, recall: 0.6957172972373239


# Results

Precision is the fraction of extracted samples that actually appear in the training data.

Recall is the ratio of the number of extracted training samples to the number of all training samples.
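These two definitions can be written down as a small set-based helper. This is a minimal sketch with made-up rows; the function name `precision_recall` and the choice to count each extracted row only once are assumptions, not the implementation used by the experiment framework.

```python
def precision_recall(extracted, training):
    """Compute precision and recall of extracted rows against the training rows.

    Rows are hashable tuples. Duplicate extracted rows are counted once.
    """
    extracted_set = set(extracted)
    training_set = set(training)
    accurate = extracted_set & training_set
    precision = len(accurate) / len(extracted_set) if extracted_set else 0.0
    recall = len(accurate) / len(training_set) if training_set else 0.0
    return precision, recall


# Made-up example: four training rows, two distinct rows extracted (one twice).
training = [(47, 1), (35, 1), (54, 0), (33, 0)]
extracted = [(47, 1), (54, 0), (47, 1)]

p, r = precision_recall(extracted, training)
print(p, r)
```

In this toy case every extracted row is a training row, so precision is 1.0, while only half of the training rows were recovered, so recall is 0.5.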

In [13]:
results

| | dataset | model | precision | recall |
|---|---|---|---|---|
| 0 | heart | decision tree | 1.0 | 0.653753 |
| 1 | heart | random forest | 1.0 | 0.583535 |
| 2 | heart | neural network | 1.0 | 0.567797 |
| 3 | heart numeric | decision tree | 1.0 | 0.645631 |
| 4 | heart numeric | random forest | 1.0 | 0.61165 |
| 5 | heart numeric | neural network | 1.0 | 0.598301 |
| 6 | heart categorical | decision tree | 1.0 | 0.864634 |
| 7 | heart categorical | random forest | 1.0 | 0.790244 |
| 8 | heart categorical | neural network | 1.0 | 0.786585 |
| 9 | census | decision tree | 1.0 | 0.077031 |
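To compare datasets at a glance, the recall values can be summarised as a range per dataset. The sketch below is a minimal illustration in plain Python; the values are copied by hand from the run logs above (rounded to six decimals, in the order decision tree, random forest, neural network), and the dictionary layout is an assumption made for this example.

```python
# Recall per dataset, copied from the logged results above.
recall_by_dataset = {
    "heart": [0.653753, 0.583535, 0.567797],
    "heart numeric": [0.645631, 0.611650, 0.598301],
    "heart categorical": [0.864634, 0.790244, 0.786585],
    "census": [0.077031, 0.056092, 0.070913],
    "census numeric": [0.133650, 0.104824, 0.142483],
    "census categorical": [0.786531, 0.778764, 0.695717],
}

# Lowest and highest recall across the three models for each dataset.
recall_range = {name: (min(v), max(v)) for name, v in recall_by_dataset.items()}

for name, (lo, hi) in recall_range.items():
    print(f"{name}: recall {lo:.1%} to {hi:.1%}")
```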


In [14]:
results.to_csv('results/2-1-cf-training-data-extraction-results.csv', index=False, na_rep='NaN', float_format='%.3f')

# Discussion

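In our experiments, training data extraction with counterfactuals drawn from the training data reaches a recall of roughly 10% to 65% for the numeric-only datasets (10% to 14% on census numeric, 60% to 65% on heart numeric) and 70% to 86% for the categorical-only datasets (70% to 79% on census categorical, 79% to 86% on heart categorical). On the full datasets with mixed features, recall is 57% to 65% for heart and 6% to 8% for census. Since the attack cannot produce any false positive samples, precision is always 100%.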