# Running attribute inference attacks on regular and anonymized data

In this tutorial we show how to run both black-box and white-box attribute inference attacks. This is demonstrated on the Nursery dataset (the original dataset can be found here: https://archive.ics.uci.edu/ml/datasets/nursery).

The sensitive feature we are trying to infer is the 'social' feature, after we have turned it into a binary feature (the original value 'problematic' receives the new value 1 and the rest 0).
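As a rough illustration of this preprocessing, the sketch below binarizes a toy `social` column and then standardizes the result. The category names are those of the UCI Nursery dataset, and the class counts (3465 and 1718) are the ones used for `priors` later in this tutorial. The z-score step is an assumption inferred from the values that appear in the data, not a confirmed description of how the prepared CSV files were produced.

```python
import math

# Toy column using the three 'social' categories of the UCI Nursery dataset.
raw_social = ['nonprob', 'slightly_prob', 'problematic', 'nonprob']

# Binarize: 'problematic' -> 1, everything else -> 0.
binary_social = [1 if v == 'problematic' else 0 for v in raw_social]
print(binary_social)  # [0, 0, 1, 0]

# Assumption: the prepared data was z-scored (zero mean, unit variance).
# The class counts are the ones used for `priors` later in the tutorial.
n_negative, n_positive = 3465, 1718
p = n_positive / (n_negative + n_positive)  # mean of the binary column
std = math.sqrt(p * (1 - p))                # population standard deviation

# Standardized value for each of the two binary values.
standardized = {0: (0 - p) / std, 1: (1 - p) / std}
print(standardized)
```

Under that assumption, 0 maps to roughly -0.704 and 1 to roughly 1.420, which are the two values the `social` column takes in the training data.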

## Load data

In [1]:
import pandas as pd

df = pd.read_csv('Nursery_social_prepared_train.csv', sep=',', engine='python')

df

Unnamed: 0,label,children,social,parents_pretentious,parents_great_pret,parents_usual,has_nurs_very_crit,has_nurs_improper,has_nurs_proper,has_nurs_critical,...,form_incomplete,form_foster,housing_critical,housing_convenient,housing_less_conv,finance_convenient,finance_inconv,health_priority,health_recommended,health_not_recom
0,1,0.444450,-0.704142,1.434509,-0.713050,-0.711204,2.000724,-0.499216,-0.500723,-0.505541,...,-0.567324,-0.577425,1.428869,-0.711511,-0.709974,0.985252,-0.985252,1.399405,-0.708745,-0.698019
1,1,0.444450,-0.704142,-0.697102,1.402427,-0.711204,-0.499819,2.003141,-0.500723,-0.505541,...,-0.567324,-0.577425,-0.699854,1.405459,-0.709974,-1.014968,1.014968,1.399405,-0.708745,-0.698019
2,3,1.335242,1.420169,1.434509,-0.713050,-0.711204,-0.499819,-0.499216,1.997111,-0.505541,...,-0.567324,-0.577425,-0.699854,-0.711511,1.408503,-1.014968,1.014968,-0.714590,1.410946,-0.698019
3,3,0.444450,-0.704142,-0.697102,-0.713050,1.406067,-0.499819,-0.499216,-0.500723,1.978079,...,1.762661,-0.577425,-0.699854,1.405459,-0.709974,0.985252,-0.985252,1.399405,-0.708745,-0.698019
4,3,0.444450,-0.704142,-0.697102,-0.713050,1.406067,-0.499819,-0.499216,1.997111,-0.505541,...,-0.567324,-0.577425,1.428869,-0.711511,-0.709974,0.985252,-0.985252,1.399405,-0.708745,-0.698019
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5178,0,-1.337132,-0.704142,1.434509,-0.713050,-0.711204,-0.499819,-0.499216,-0.500723,1.978079,...,-0.567324,-0.577425,-0.699854,-0.711511,1.408503,-1.014968,1.014968,-0.714590,-0.708745,1.432625
5179,1,-1.337132,1.420169,-0.697102,1.402427,-0.711204,-0.499819,2.003141,-0.500723,-0.505541,...,1.762661,-0.577425,-0.699854,1.405459,-0.709974,-1.014968,1.014968,-0.714590,1.410946,-0.698019
5180,3,-0.446341,1.420169,-0.697102,-0.713050,1.406067,-0.499819,2.003141,-0.500723,-0.505541,...,1.762661,-0.577425,-0.699854,1.405459,-0.709974,-1.014968,1.014968,-0.714590,1.410946,-0.698019
5181,1,-0.446341,-0.704142,-0.697102,-0.713050,1.406067,2.000724,-0.499216,-0.500723,-0.505541,...,-0.567324,-0.577425,1.428869,-0.711511,-0.709974,0.985252,-0.985252,1.399405,-0.708745,-0.698019


## Train decision tree model

In [2]:
from sklearn.tree import DecisionTreeClassifier

import os
import sys
sys.path.insert(0, os.path.abspath('..'))
from art.estimators.classification.scikitlearn import ScikitlearnDecisionTreeClassifier

features = df.drop(['label'], axis=1)
labels = df.loc[:, 'label']
model = DecisionTreeClassifier()
model.fit(features, labels)

art_classifier = ScikitlearnDecisionTreeClassifier(model)

df_test = pd.read_csv('Nursery_prepared_test.csv', sep=',', engine='python')
features_test = df_test.drop(['label'], axis=1)
labels_test = df_test.loc[:, 'label']
test_data = features_test.to_numpy()

print('Base model accuracy: ', model.score(features_test, labels_test))

Base model accuracy:  0.6851851851851852


## Attack
### Black-box attack
The black-box attack trains an additional classifier (called the attack model) to predict the attacked feature's value from the remaining n-1 features together with the original (attacked) model's predictions.
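To make the idea concrete, here is a minimal, dependency-free sketch of such an attack on toy data. It is not ART's implementation: `target_model`, the records, and the majority-vote lookup table that stands in for a real attack classifier are all hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict

def target_model(record):
    """Toy stand-in for the attacked model: the label depends on both features."""
    other, sensitive = record
    return other + 2 * sensitive

# Toy records: (other_feature, sensitive_feature), both binary.
records = [(i % 2, (i // 2) % 2) for i in range(40)]
attack_train, attack_eval = records[:20], records[20:]

def attack_input(record):
    """Remaining features plus the target model's prediction."""
    other, _sensitive = record
    return (other, target_model(record))

# "Train" the attack model: most common sensitive value per attack input.
votes = defaultdict(Counter)
for rec in attack_train:
    votes[attack_input(rec)][rec[1]] += 1
attack_model = {key: counts.most_common(1)[0][0] for key, counts in votes.items()}

# Infer the sensitive feature for records the attack model was not trained on.
inferred = [attack_model[attack_input(rec)] for rec in attack_eval]
accuracy = sum(guess == rec[1] for guess, rec in zip(inferred, attack_eval)) / len(attack_eval)
print(accuracy)
```

In this toy setup the prediction encodes the sensitive value completely, so the attack recovers it for every record. With a real model the leakage is only partial, which is what the accuracy figures in this tutorial measure.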
#### Train attack model

In [12]:
import numpy as np
from art.attacks.inference import AttributeInferenceBlackBox

attack_feature = 1
data = features.to_numpy()

# training data without attacked feature
x_train_for_attack = np.delete(data, attack_feature, 1)
# only attacked feature
x_train_feature = data[:, attack_feature].copy().reshape(-1, 1)

bb_attack = AttributeInferenceBlackBox(art_classifier, attack_feature=attack_feature)

# get original model's predictions
x_train_predictions = np.array([np.argmax(arr) for arr in art_classifier.predict(data)]).reshape(-1,1)

# train attack model
bb_attack.fit(test_data)

#### Infer sensitive feature and check accuracy

In [13]:
# get inferred values
values = [-0.704141531, 1.420169037]
inferred_train_bb = bb_attack.infer(x_train_for_attack, x_train_predictions, values=values)
# check accuracy
train_acc = np.sum(inferred_train_bb == x_train_feature.reshape(1,-1)) / len(inferred_train_bb)
print(train_acc)

0.6158595408064828


This means that the attack correctly infers the attacked feature for about 62% of the training set.

## White-box attacks
These two attacks do not train any additional model. Instead, they use additional information encoded within the attacked decision tree model to compute the probability of each value of the attacked feature, and output the value with the highest probability.
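The "score each candidate value, then take the maximum" idea can be sketched in a few lines. This is a deliberately simplified illustration and not the algorithm of either ART class: the real attacks draw on information stored in the tree, which this sketch does not model. Here a candidate simply scores its prior if the model reproduces the observed label for it, and `toy_tree` and `infer_sensitive` are hypothetical names.

```python
def infer_sensitive(other_features, observed_label, predict, values, priors):
    """Return the candidate value with the highest prior-weighted score.

    Simplified illustration: a candidate scores its prior if the model,
    given that candidate, reproduces the observed label, and 0 otherwise.
    """
    best_value, best_score = None, -1.0
    for value, prior in zip(values, priors):
        agrees = predict(other_features, value) == observed_label
        score = prior if agrees else 0.0
        if score > best_score:
            best_value, best_score = value, score
    return best_value

def toy_tree(other, sensitive):
    """Hypothetical tree: splits on the sensitive feature only when other == 1."""
    if other == 1:
        return 'A' if sensitive > 0 else 'B'
    return 'C'

values = [-0.704141531, 1.420169037]
priors = [3465 / 5183, 1718 / 5183]

# The path through the tree reveals the sensitive value.
print(infer_sensitive(1, 'A', toy_tree, values, priors))
# The tree never looks at the sensitive feature, so the larger prior wins.
print(infer_sensitive(0, 'C', toy_tree, values, priors))
```

The second call shows why the priors matter: when the model carries no information about the attacked feature for a record, the guess defaults to the more common value.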
### First attack

In [14]:
from art.attacks.inference import AttributeInferenceWhiteBoxLifestyleDecisionTree

wb_attack = AttributeInferenceWhiteBoxLifestyleDecisionTree(art_classifier, attack_feature=attack_feature)

priors = [3465 / 5183, 1718 / 5183]

# get inferred values
inferred_train_wb1 = wb_attack.infer(x_train_for_attack, x_train_predictions, values=values, priors=priors)

# check accuracy
train_acc = np.sum(inferred_train_wb1 == x_train_feature.reshape(1,-1)) / len(inferred_train_wb1)
print(train_acc)

0.6183677406907196


### Second attack

In [15]:
from art.attacks.inference import AttributeInferenceWhiteBoxDecisionTree

wb2_attack = AttributeInferenceWhiteBoxDecisionTree(art_classifier, attack_feature=attack_feature)

# get inferred values
inferred_train_wb2 = wb2_attack.infer(x_train_for_attack, x_train_predictions, values=values, priors=priors)

# check accuracy
train_acc = np.sum(inferred_train_wb2 == x_train_feature.reshape(1,-1)) / len(inferred_train_wb2)
print(train_acc)

0.6905267219756898


The white-box attacks correctly infer the attacked feature value for about 62% and 69% of the training set, respectively.

# Anonymized data
## k=100

Now we will apply the same attacks on an anonymized version of the same dataset (k=100). The data has been anonymized on the quasi-identifiers: form, housing, finance, social, health.

k=100 means that each record in the anonymized dataset is identical to at least 99 others on the quasi-identifier values (i.e., when looking only at those 5 features, the records are indistinguishable).
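The k-anonymity property described above can be checked by grouping records on their quasi-identifier values and looking at the group sizes. The following is a minimal sketch in plain Python; the function name `is_k_anonymous` and the toy rows are illustrative and not part of any library used in this tutorial.

```python
from collections import Counter

def is_k_anonymous(rows, quasi_identifiers, k):
    """Return True if every combination of quasi-identifier values
    occurs in at least k rows."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return all(count >= k for count in groups.values())

# Toy rows (hypothetical values); 'children' is not a quasi-identifier here.
rows = [
    {'form': 'complete', 'housing': 'convenient', 'children': 1},
    {'form': 'complete', 'housing': 'convenient', 'children': 2},
    {'form': 'foster', 'housing': 'critical', 'children': 1},
    {'form': 'foster', 'housing': 'critical', 'children': 3},
]
qi = ['form', 'housing']

print(is_k_anonymous(rows, qi, k=2))  # True
print(is_k_anonymous(rows, qi, k=3))  # False
```

Features outside the quasi-identifier set may still differ within a group, which is why the anonymized dataset can contain more distinct rows than groups.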

In [4]:
anon_df = pd.read_csv('Nursery_anonymized_prepared_train.csv', sep=',', engine='python')

anon_df

Unnamed: 0,label,children,social,parents_pretentious,parents_great_pret,parents_usual,has_nurs_very_crit,has_nurs_improper,has_nurs_proper,has_nurs_critical,...,form_foster,form_complete,housing_critical,housing_convenient,housing_less_conv,finance_convenient,finance_inconv,health_priority,health_recommended,health_not_recom
0,1,0.444450,-0.403596,1.434509,-0.713050,-0.711204,2.000724,-0.499216,-0.500723,-0.505541,...,-0.464554,-0.952322,0.942423,-0.536975,-0.572078,0.680053,-0.680053,1.399405,-0.708745,-0.698019
1,1,0.444450,-0.403596,-0.697102,1.402427,-0.711204,-0.499819,2.003141,-0.500723,-0.505541,...,-0.464554,-0.952322,-1.061095,1.862284,-0.572078,-1.470474,1.470474,1.399405,-0.708745,-0.698019
2,3,1.335242,2.477724,1.434509,-0.713050,-0.711204,-0.499819,-0.499216,1.997111,-0.505541,...,-0.464554,-0.952322,-1.061095,-0.536975,1.748015,0.680053,-0.680053,-0.714590,1.410946,-0.698019
3,3,0.444450,-0.403596,-0.697102,-0.713050,1.406067,-0.499819,-0.499216,-0.500723,1.978079,...,2.152602,-0.952322,-1.061095,1.862284,-0.572078,0.680053,-0.680053,1.399405,-0.708745,-0.698019
4,3,0.444450,-0.403596,-0.697102,-0.713050,1.406067,-0.499819,-0.499216,1.997111,-0.505541,...,-0.464554,1.050065,0.942423,-0.536975,-0.572078,0.680053,-0.680053,1.399405,-0.708745,-0.698019
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5178,0,-1.337132,-0.403596,1.434509,-0.713050,-0.711204,-0.499819,-0.499216,-0.500723,1.978079,...,-0.464554,1.050065,0.942423,-0.536975,-0.572078,0.680053,-0.680053,-0.714590,-0.708745,1.432625
5179,1,-1.337132,2.477724,-0.697102,1.402427,-0.711204,-0.499819,2.003141,-0.500723,-0.505541,...,-0.464554,1.050065,-1.061095,1.862284,-0.572078,-1.470474,1.470474,-0.714590,1.410946,-0.698019
5180,3,-0.446341,2.477724,-0.697102,-0.713050,1.406067,-0.499819,2.003141,-0.500723,-0.505541,...,-0.464554,1.050065,-1.061095,1.862284,-0.572078,-1.470474,1.470474,-0.714590,1.410946,-0.698019
5181,1,-0.446341,-0.403596,-0.697102,-0.713050,1.406067,2.000724,-0.499216,-0.500723,-0.505541,...,-0.464554,1.050065,0.942423,-0.536975,-0.572078,0.680053,-0.680053,1.399405,-0.708745,-0.698019


In [84]:
# number of distinct rows in original data
len(df.drop_duplicates().index)

4479

In [85]:
# number of distinct rows in anonymized data
len(anon_df.drop_duplicates().index)

1079

## Train decision tree model

In [5]:
anon_features = anon_df.drop(['label'], axis=1)
anon_labels = anon_df.loc[:, 'label']
anon_model = DecisionTreeClassifier()
anon_model.fit(anon_features, anon_labels)

anon_art_classifier = ScikitlearnDecisionTreeClassifier(anon_model)

print('Anonymized model accuracy: ', anon_model.score(features_test, labels_test))

Anonymized model accuracy:  0.6304012345679012


## Attack
### Black-box attack

In [140]:
anon_bb_attack = AttributeInferenceBlackBox(anon_art_classifier, attack_feature=attack_feature)

# get original model's predictions
anon_x_train_predictions = np.array([np.argmax(arr) for arr in anon_art_classifier.predict(data)]).reshape(-1,1)

# train attack model
anon_bb_attack.fit(test_data)

# get inferred values
inferred_train_anon_bb = anon_bb_attack.infer(x_train_for_attack, anon_x_train_predictions, values=values)
# check accuracy
train_acc = np.sum(inferred_train_anon_bb == x_train_feature.reshape(1,-1)) / len(inferred_train_anon_bb)
print(train_acc)

0.6280146633224002


### White-box attacks

In [107]:
anon_wb_attack = AttributeInferenceWhiteBoxLifestyleDecisionTree(anon_art_classifier, attack_feature=attack_feature)

# get inferred values
inferred_train_anon_wb1 = anon_wb_attack.infer(x_train_for_attack, anon_x_train_predictions, values=values, priors=priors)

# check accuracy
train_acc = np.sum(inferred_train_anon_wb1 == x_train_feature.reshape(1,-1)) / len(inferred_train_anon_wb1)
print(train_acc)

0.6143160331854138


In [108]:
anon_wb2_attack = AttributeInferenceWhiteBoxDecisionTree(anon_art_classifier, attack_feature=attack_feature)

# get inferred values
inferred_train_anon_wb2 = anon_wb2_attack.infer(x_train_for_attack, anon_x_train_predictions, values=values, priors=priors)

# check accuracy
train_acc = np.sum(inferred_train_anon_wb2 == x_train_feature.reshape(1,-1)) / len(inferred_train_anon_wb2)
print(train_acc)

0.6955431217441637


The accuracy of the attacks remains more or less the same. This may be because the prior probability of the larger class has increased (from 0.67 to 0.85). Let's check the precision and recall for each case:

In [141]:
def calc_precision_recall(predicted, actual, positive_value=1):
    score = 0  # both predicted and actual are positive
    num_positive_predicted = 0  # predicted positive
    num_positive_actual = 0  # actual positive
    for i in range(len(predicted)):
        if predicted[i] == positive_value:
            num_positive_predicted += 1
        if actual[i] == positive_value:
            num_positive_actual += 1
        if predicted[i] == actual[i]:
            if predicted[i] == positive_value:
                score += 1
    
    if num_positive_predicted == 0:
        precision = 1
    else:
        precision = score / num_positive_predicted  # the fraction of predicted “Yes” responses that are correct
    if num_positive_actual == 0:
        recall = 1
    else:
        recall = score / num_positive_actual  # the fraction of “Yes” responses that are predicted correctly

    return precision, recall
    
# black-box regular
print(calc_precision_recall(inferred_train_bb, x_train_feature, positive_value=1.420169037))
# black-box anonymized
print(calc_precision_recall(inferred_train_anon_bb, x_train_feature, positive_value=1.420169037))

(0.4103703703703704, 0.3224679860302678)
(0.41613418530351437, 0.3032596041909197)


In [110]:
# white-box 1 regular
print(calc_precision_recall(inferred_train_wb1, x_train_feature, positive_value=1.420169037))
# white-box 1 anonymized
print(calc_precision_recall(inferred_train_anon_wb1, x_train_feature, positive_value=1.420169037))

# white-box 2 regular
print(calc_precision_recall(inferred_train_wb2, x_train_feature, positive_value=1.420169037))
# white-box 2 anonymized
print(calc_precision_recall(inferred_train_anon_wb2, x_train_feature, positive_value=1.420169037))

(0.3402948402948403, 0.1612339930151339)
(0.3426651735722284, 0.1781140861466822)
(0.5880551301684533, 0.22351571594877764)
(0.612540192926045, 0.2217694994179278)


Precision and recall remain almost the same, sometimes even slightly increasing.
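For intuition about why precision and recall are worth checking alongside accuracy, here is a small sketch on made-up predictions (the lists are illustrative, not tutorial data). Unlike `calc_precision_recall` above, this hypothetical helper returns `None` when a metric is undefined.

```python
def precision_recall(predicted, actual, positive=1):
    """Precision and recall for one positive class (None if undefined)."""
    tp = sum(p == positive and a == positive for p, a in zip(predicted, actual))
    predicted_pos = sum(p == positive for p in predicted)
    actual_pos = sum(a == positive for a in actual)
    precision = tp / predicted_pos if predicted_pos else None
    recall = tp / actual_pos if actual_pos else None
    return precision, recall

def accuracy(predicted, actual):
    """Fraction of positions where the prediction equals the actual value."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

actual   = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
cautious = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]  # rarely predicts the positive value
eager    = [1, 1, 0, 1, 1, 0, 0, 0, 0, 0]  # predicts it more often

for name, pred in [('cautious', cautious), ('eager', eager)]:
    print(name, accuracy(pred, actual), precision_recall(pred, actual))
```

Both toy predictors reach an accuracy of 0.7 and a precision of 0.5, yet the second one finds twice as many of the actual positives, so accuracy alone does not distinguish them.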

Now let's see what happens when we increase k to 1000.

## k=1000

Now we apply the attacks on an anonymized version of the same dataset (k=1000). The data has been anonymized on the quasi-identifiers: form, housing, finance, social, health.

In [6]:
anon2_df = pd.read_csv('Nursery_anonymized1000_prepared_train.csv', sep=',', engine='python')

anon2_df

Unnamed: 0,label,children,social,parents_pretentious,parents_great_pret,parents_usual,has_nurs_very_crit,has_nurs_improper,has_nurs_proper,has_nurs_critical,...,form_completed.1,form_complete,housing_convenient,housing_less_conv,housing_critical,finance_inconv,finance_convenient,health_priority,health_recommended,health_not_recom
0,4,0.444450,0,1.434509,-0.713050,-0.711204,2.000724,-0.499216,-0.500723,-0.505541,...,-0.708745,-0.698019,1.399405,-0.708745,-0.698019,1.399405,-1.399405,1.399405,-0.708745,-0.698019
1,4,0.444450,0,-0.697102,1.402427,-0.711204,-0.499819,2.003141,-0.500723,-0.505541,...,-0.708745,-0.698019,1.399405,-0.708745,-0.698019,1.399405,-1.399405,1.399405,-0.708745,-0.698019
2,3,1.335242,0,1.434509,-0.713050,-0.711204,-0.499819,-0.499216,1.997111,-0.505541,...,1.410946,-0.698019,-0.714590,1.410946,-0.698019,-0.714590,0.714590,-0.714590,1.410946,-0.698019
3,3,0.444450,0,-0.697102,-0.713050,1.406067,-0.499819,-0.499216,-0.500723,1.978079,...,-0.708745,-0.698019,1.399405,-0.708745,-0.698019,1.399405,-1.399405,1.399405,-0.708745,-0.698019
4,3,0.444450,0,-0.697102,-0.713050,1.406067,-0.499819,-0.499216,1.997111,-0.505541,...,-0.708745,-0.698019,1.399405,-0.708745,-0.698019,1.399405,-1.399405,1.399405,-0.708745,-0.698019
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5178,0,-1.337132,0,1.434509,-0.713050,-0.711204,-0.499819,-0.499216,-0.500723,1.978079,...,-0.708745,1.432625,-0.714590,-0.708745,1.432625,-0.714590,0.714590,-0.714590,-0.708745,1.432625
5179,4,-1.337132,0,-0.697102,1.402427,-0.711204,-0.499819,2.003141,-0.500723,-0.505541,...,1.410946,-0.698019,-0.714590,1.410946,-0.698019,-0.714590,0.714590,-0.714590,1.410946,-0.698019
5180,3,-0.446341,0,-0.697102,-0.713050,1.406067,-0.499819,2.003141,-0.500723,-0.505541,...,1.410946,-0.698019,-0.714590,1.410946,-0.698019,-0.714590,0.714590,-0.714590,1.410946,-0.698019
5181,4,-0.446341,0,-0.697102,-0.713050,1.406067,2.000724,-0.499216,-0.500723,-0.505541,...,-0.708745,-0.698019,1.399405,-0.708745,-0.698019,1.399405,-1.399405,1.399405,-0.708745,-0.698019


In [96]:
# number of distinct rows in anonymized data
len(anon2_df.drop_duplicates().index)

261

## Train decision tree model

In [7]:
anon2_features = anon2_df.drop(['label'], axis=1)
anon2_labels = anon2_df.loc[:, 'label']
anon2_model = DecisionTreeClassifier()
anon2_model.fit(anon2_features, anon2_labels)

anon2_art_classifier = ScikitlearnDecisionTreeClassifier(anon2_model)

print('Anonymized model accuracy: ', anon2_model.score(features_test, labels_test))

Anonymized model accuracy:  0.8742283950617284


## Attack
### Black-box attack

In [142]:
anon2_bb_attack = AttributeInferenceBlackBox(anon2_art_classifier, attack_feature=attack_feature)

# get original model's predictions
anon2_x_train_predictions = np.array([np.argmax(arr) for arr in anon2_art_classifier.predict(data)]).reshape(-1,1)

# train attack model
anon2_bb_attack.fit(test_data)

# get inferred values
inferred_train_anon2_bb = anon2_bb_attack.infer(x_train_for_attack, anon2_x_train_predictions, values=values)
# check accuracy
train_acc = np.sum(inferred_train_anon2_bb == x_train_feature.reshape(1,-1)) / len(inferred_train_anon2_bb)
print(train_acc)

0.585568203743006


### White-box attacks

In [146]:
anon2_wb_attack = AttributeInferenceWhiteBoxLifestyleDecisionTree(anon2_art_classifier, attack_feature=attack_feature)

# get inferred values
inferred_train_anon2_wb1 = anon2_wb_attack.infer(x_train_for_attack, anon2_x_train_predictions, values=values, priors=priors)

# check accuracy
train_acc = np.sum(inferred_train_anon2_wb1 == x_train_feature.reshape(1,-1)) / len(inferred_train_anon2_wb1)
print(train_acc)

0.6685317383754582


In [114]:
anon2_wb2_attack = AttributeInferenceWhiteBoxDecisionTree(anon2_art_classifier, attack_feature=attack_feature)

# get inferred values
inferred_train_anon2_wb2 = anon2_wb2_attack.infer(x_train_for_attack, anon2_x_train_predictions, values=values, priors=priors)

# check accuracy
train_acc = np.sum(inferred_train_anon2_wb2 == x_train_feature.reshape(1,-1)) / len(inferred_train_anon2_wb2)
print(train_acc)

0.6685317383754582


The accuracy of the first white-box attack has slightly increased, while the accuracy of the other two attacks has decreased. Let's check the precision and recall for each case:

In [143]:
# black-box regular
print(calc_precision_recall(inferred_train_bb, x_train_feature, positive_value=1.420169037))
# black-box anonymized
print(calc_precision_recall(inferred_train_anon2_bb, x_train_feature, positive_value=1.420169037))

# white-box 1 regular
print(calc_precision_recall(inferred_train_wb1, x_train_feature, positive_value=1.420169037))
# white-box 1 anonymized
print(calc_precision_recall(inferred_train_anon2_wb1, x_train_feature, positive_value=1.420169037))

# white-box 2 regular
print(calc_precision_recall(inferred_train_wb2, x_train_feature, positive_value=1.420169037))
# white-box 2 anonymized
print(calc_precision_recall(inferred_train_anon2_wb2, x_train_feature, positive_value=1.420169037))

(0.4103703703703704, 0.3224679860302678)
(0.3348694316436252, 0.25378346915017463)
(0.3402948402948403, 0.1612339930151339)
(1, 0.0)
(0.5880551301684533, 0.22351571594877764)
(1, 0.0)


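Recall decreased in all cases, and precision decreased for the black-box attack. In the anonymized version of the white-box attacks, no records were predicted with the positive value for the attacked feature, so recall is 0 and the reported precision of 1 is only the default that `calc_precision_recall` returns when there are no positive predictions.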
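The `(1, 0.0)` results and the identical accuracy of the two anonymized white-box attacks can be reproduced with simple arithmetic. The sketch below assumes that the class counts in the `priors` list (3465 and 1718) describe the training set being attacked, and considers an attack that never predicts the positive value.

```python
# Class counts of the attacked feature, taken from the `priors` list
# defined earlier in this tutorial (assumed to describe the training set).
n_negative, n_positive = 3465, 1718

# An attack that never predicts the positive value:
true_positives = 0
predicted_positives = 0

# Every negative record is "correct", every positive record is missed.
accuracy = n_negative / (n_negative + n_positive)
recall = true_positives / n_positive
# Precision is 0/0 here; calc_precision_recall returns 1 in that case.
precision = 1 if predicted_positives == 0 else true_positives / predicted_positives

print(accuracy)             # about 0.6685, the share of the larger class
print((precision, recall))  # (1, 0.0)
```

The resulting accuracy agrees with the 0.6685 printed for both white-box attacks, which suggests that at k=1000 these attacks fall back to always guessing the more common value.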

## k=100, all QI
Now let's see what happens if we define all 8 features in the Nursery dataset as quasi-identifiers.

In [8]:
anon3_df = pd.read_csv('Nursery_anonymized_allQI_prepared_train.csv', sep=',', engine='python')

anon3_df

Unnamed: 0,label,children,social,parents_pretentious,parents_great_pret,parents_usual,has_nurs_very_crit,has_nurs_improper,has_nurs_less_proper,has_nurs_critical,...,form_foster,form_incomplete,housing_critical,housing_less_conv,housing_convenient,finance_convenient,finance_inconv,health_priority,health_recommended,health_not_recom
0,4,2.209469,0.0,2.463928,-1.184900,-0.615189,1.074753,-0.388929,-0.562272,-0.396783,...,-0.479224,-0.440726,1.059043,-0.767083,-0.433525,0.618467,-0.618467,1.389210,-0.703529,-0.698019
1,4,0.535492,0.0,-0.405856,0.843953,-0.615189,-0.930447,2.571161,-0.562272,-0.396783,...,-0.479224,-0.440726,-0.944249,1.303641,-0.433525,-1.616900,1.616900,1.389210,-0.703529,-0.698019
2,3,0.535492,0.0,2.463928,-1.184900,-0.615189,-0.930447,-0.388929,1.778497,-0.396783,...,-0.479224,-0.440726,-0.944249,1.303641,-0.433525,-1.616900,1.616900,-0.719834,1.421406,-0.698019
3,3,-1.138485,0.0,-0.405856,-1.184900,1.625517,-0.930447,-0.388929,-0.562272,2.520272,...,2.086705,-0.440726,-0.944249,-0.767083,2.306671,0.618467,-0.618467,-0.719834,1.421406,-0.698019
4,3,-1.138485,0.0,-0.405856,-1.184900,1.625517,-0.930447,-0.388929,1.778497,-0.396783,...,-0.479224,-0.440726,-0.944249,1.303641,-0.433525,0.618467,-0.618467,1.389210,-0.703529,-0.698019
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5178,0,0.535492,0.0,-0.405856,0.843953,-0.615189,1.074753,-0.388929,-0.562272,-0.396783,...,-0.479224,-0.440726,1.059043,-0.767083,-0.433525,0.618467,-0.618467,-0.719834,-0.703529,1.432625
5179,4,0.535492,0.0,-0.405856,0.843953,-0.615189,-0.930447,2.571161,-0.562272,-0.396783,...,-0.479224,-0.440726,-0.944249,1.303641,-0.433525,-1.616900,1.616900,1.389210,-0.703529,-0.698019
5180,3,-1.138485,0.0,-0.405856,-1.184900,1.625517,-0.930447,2.571161,-0.562272,-0.396783,...,-0.479224,2.268982,-0.944249,1.303641,-0.433525,0.618467,-0.618467,1.389210,-0.703529,-0.698019
5181,4,-1.138485,0.0,-0.405856,-1.184900,1.625517,1.074753,-0.388929,-0.562272,-0.396783,...,2.086705,-0.440726,1.059043,-0.767083,-0.433525,0.618467,-0.618467,1.389210,-0.703529,-0.698019


In [105]:
# number of distinct rows in anonymized data
len(anon3_df.drop_duplicates().index)

30

In [16]:
# train model
anon3_features = anon3_df.drop(['label'], axis=1)
anon3_labels = anon3_df.loc[:, 'label']
anon3_model = DecisionTreeClassifier()
anon3_model.fit(anon3_features, anon3_labels)

anon3_art_classifier = ScikitlearnDecisionTreeClassifier(anon3_model)

print('Anonymized model accuracy: ', anon3_model.score(features_test, labels_test))

# Black-box attack

anon3_bb_attack = AttributeInferenceBlackBox(anon3_art_classifier, attack_feature=attack_feature)

# get original model's predictions
anon3_x_train_predictions = np.array([np.argmax(arr) for arr in anon3_art_classifier.predict(data)]).reshape(-1,1)

# train attack model
anon3_bb_attack.fit(test_data)

# get inferred values
inferred_train_anon3_bb = anon3_bb_attack.infer(x_train_for_attack, anon3_x_train_predictions, values=values)
# check accuracy
train_acc = np.sum(inferred_train_anon3_bb == x_train_feature.reshape(1,-1)) / len(inferred_train_anon3_bb)
print(train_acc)

# White box attacks

anon3_wb_attack = AttributeInferenceWhiteBoxLifestyleDecisionTree(anon3_art_classifier, attack_feature=attack_feature)

# get inferred values
inferred_train_anon3_wb1 = anon3_wb_attack.infer(x_train_for_attack, anon3_x_train_predictions, values=values, priors=priors)

# check accuracy
train_acc = np.sum(inferred_train_anon3_wb1 == x_train_feature.reshape(1,-1)) / len(inferred_train_anon3_wb1)
print(train_acc)

anon3_wb2_attack = AttributeInferenceWhiteBoxDecisionTree(anon3_art_classifier, attack_feature=attack_feature)

# get inferred values
inferred_train_anon3_wb2 = anon3_wb2_attack.infer(x_train_for_attack, anon3_x_train_predictions, values=values, priors=priors)

# check accuracy
train_acc = np.sum(inferred_train_anon3_wb2 == x_train_feature.reshape(1,-1)) / len(inferred_train_anon3_wb2)
print(train_acc)

Anonymized model accuracy:  0.7445987654320988
0.5653096662164769
0.6685317383754582
0.6685317383754582


In [145]:
# black-box anonymized
print(calc_precision_recall(inferred_train_anon3_bb, x_train_feature, positive_value=1.420169037))

# white-box 1 anonymized
print(calc_precision_recall(inferred_train_anon3_wb1, x_train_feature, positive_value=1.420169037))

# white-box 2 anonymized
print(calc_precision_recall(inferred_train_anon3_wb2, x_train_feature, positive_value=1.420169037))

(0.317115551694179, 0.21245634458672877)
(1, 0.0)
(1, 0.0)


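Recall decreased in all cases, and so did the precision of the black-box attack. The white-box attacks again predicted no positive values, so their precision of 1 is only the default returned by `calc_precision_recall`. Accuracy of two of the attacks (the black-box attack and the second white-box attack) also decreased.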