# Running attribute inference attacks on regular and differentially private models created using DiffPrivLib

In this tutorial we will show how to run attribute inference attacks on both regular models and models trained using differential privacy. This will be demonstrated on the Nursery dataset (original dataset can be found here: https://archive.ics.uci.edu/ml/datasets/nursery), with a Naive Bayes classifier trained using the IBM Differential Privacy Library (https://github.com/IBM/differential-privacy-library).

The sensitive feature we are trying to infer is the 'social' feature, after we have turned it into a binary feature (the original value 'problematic' receives the new value 1 and the rest 0).
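The preprocessing script that produced the prepared CSV files is not part of this notebook. The two values of `social` in the prepared data (-0.704142 and 1.420169) are consistent with binarizing the column and then standardizing it to zero mean and unit variance, with roughly a third of the rows positive. A minimal sketch of that transformation, assuming population (ddof=0) standardization as done by scikit-learn's `StandardScaler`; the helper name `binarize_and_standardize` and the toy column are hypothetical:

```python
import numpy as np

def binarize_and_standardize(social_values, positive_value="problematic"):
    """Hypothetical helper: 'problematic' -> 1, any other value -> 0, then z-score."""
    binary = np.array([1.0 if v == positive_value else 0.0 for v in social_values])
    # np.std uses ddof=0 (population std), the same convention as StandardScaler
    return (binary - binary.mean()) / binary.std()

# toy column in which exactly one third of the rows are 'problematic'
social = ["nonprob", "slightly_prob", "problematic"] * 4
z = binarize_and_standardize(social)
print(np.unique(z))  # two distinct values: approximately -0.7071 and 1.4142
```

With exactly one third positive, the two values are -1/√2 and √2; the slightly different values in the Nursery file reflect its actual positive fraction.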

To run this example you must first install DiffPrivLib by running: pip install diffprivlib

## Load data

In [1]:
import pandas as pd

df = pd.read_csv('Nursery_social_prepared_train.csv', sep=',', engine='python')

df

| | label | children | social | parents_pretentious | parents_great_pret | parents_usual | has_nurs_very_crit | has_nurs_improper | has_nurs_proper | has_nurs_critical | ... | form_incomplete | form_foster | housing_critical | housing_convenient | housing_less_conv | finance_convenient | finance_inconv | health_priority | health_recommended | health_not_recom |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0.444450 | -0.704142 | 1.434509 | -0.713050 | -0.711204 | 2.000724 | -0.499216 | -0.500723 | -0.505541 | ... | -0.567324 | -0.577425 | 1.428869 | -0.711511 | -0.709974 | 0.985252 | -0.985252 | 1.399405 | -0.708745 | -0.698019 |
| 1 | 1 | 0.444450 | -0.704142 | -0.697102 | 1.402427 | -0.711204 | -0.499819 | 2.003141 | -0.500723 | -0.505541 | ... | -0.567324 | -0.577425 | -0.699854 | 1.405459 | -0.709974 | -1.014968 | 1.014968 | 1.399405 | -0.708745 | -0.698019 |
| 2 | 3 | 1.335242 | 1.420169 | 1.434509 | -0.713050 | -0.711204 | -0.499819 | -0.499216 | 1.997111 | -0.505541 | ... | -0.567324 | -0.577425 | -0.699854 | -0.711511 | 1.408503 | -1.014968 | 1.014968 | -0.714590 | 1.410946 | -0.698019 |
| 3 | 3 | 0.444450 | -0.704142 | -0.697102 | -0.713050 | 1.406067 | -0.499819 | -0.499216 | -0.500723 | 1.978079 | ... | 1.762661 | -0.577425 | -0.699854 | 1.405459 | -0.709974 | 0.985252 | -0.985252 | 1.399405 | -0.708745 | -0.698019 |
| 4 | 3 | 0.444450 | -0.704142 | -0.697102 | -0.713050 | 1.406067 | -0.499819 | -0.499216 | 1.997111 | -0.505541 | ... | -0.567324 | -0.577425 | 1.428869 | -0.711511 | -0.709974 | 0.985252 | -0.985252 | 1.399405 | -0.708745 | -0.698019 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 5178 | 0 | -1.337132 | -0.704142 | 1.434509 | -0.713050 | -0.711204 | -0.499819 | -0.499216 | -0.500723 | 1.978079 | ... | -0.567324 | -0.577425 | -0.699854 | -0.711511 | 1.408503 | -1.014968 | 1.014968 | -0.714590 | -0.708745 | 1.432625 |
| 5179 | 1 | -1.337132 | 1.420169 | -0.697102 | 1.402427 | -0.711204 | -0.499819 | 2.003141 | -0.500723 | -0.505541 | ... | 1.762661 | -0.577425 | -0.699854 | 1.405459 | -0.709974 | -1.014968 | 1.014968 | -0.714590 | 1.410946 | -0.698019 |
| 5180 | 3 | -0.446341 | 1.420169 | -0.697102 | -0.713050 | 1.406067 | -0.499819 | 2.003141 | -0.500723 | -0.505541 | ... | 1.762661 | -0.577425 | -0.699854 | 1.405459 | -0.709974 | -1.014968 | 1.014968 | -0.714590 | 1.410946 | -0.698019 |
| 5181 | 1 | -0.446341 | -0.704142 | -0.697102 | -0.713050 | 1.406067 | 2.000724 | -0.499216 | -0.500723 | -0.505541 | ... | -0.567324 | -0.577425 | 1.428869 | -0.711511 | -0.709974 | 0.985252 | -0.985252 | 1.399405 | -0.708745 | -0.698019 |


## Train Naive Bayes model

In [9]:
from sklearn.naive_bayes import GaussianNB

import os
import sys
sys.path.insert(0, os.path.abspath('..'))
from art.estimators.classification.scikitlearn import ScikitlearnClassifier

features = df.drop(['label'], axis=1)
labels = df.loc[:, 'label']

model = GaussianNB()
model.fit(features, labels)

art_classifier = ScikitlearnClassifier(model)

df_test = pd.read_csv('Nursery_prepared_test.csv', sep=',', engine='python')
features_test = df_test.drop(['label'], axis=1)
labels_test = df_test.loc[:, 'label']
test_data = features_test.to_numpy()

print('Base model accuracy: ', model.score(features_test, labels_test))

Base model accuracy:  0.5655864197530864


## Attack
The black-box attack trains an additional classifier (called the attack model) that predicts the attacked feature's value from the remaining n-1 features together with the original (attacked) model's predictions.
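To make the attack model's input concrete: each training row for the attack model is the original row with the attacked column removed, plus one extra column holding the attacked model's predicted class, and the target is the removed column. The sketch below builds those arrays with NumPy on a toy matrix. The helper `build_attack_dataset` and the toy numbers are hypothetical, and ART's internal feature encoding may differ (for example, in how the predictions are represented).

```python
import numpy as np

def build_attack_dataset(x, model_probs, attack_feature):
    """Hypothetical helper: split x into attack-model inputs and targets.

    x              -- (rows, features) matrix
    model_probs    -- (rows, classes) scores returned by the attacked model
    attack_feature -- index of the sensitive column
    """
    # all columns except the sensitive one
    remaining = np.delete(x, attack_feature, axis=1)
    # attacked model's predicted class for each row, as a column vector
    predicted_class = np.argmax(model_probs, axis=1).reshape(-1, 1)
    attack_x = np.hstack([remaining, predicted_class])
    # the attack model learns to predict the removed column
    attack_y = x[:, attack_feature]
    return attack_x, attack_y

x = np.array([[0.4, -0.7,  1.4],
              [0.4,  1.4, -0.7],
              [1.3, -0.7, -0.7]])
probs = np.array([[0.1, 0.9],
                  [0.8, 0.2],
                  [0.3, 0.7]])
attack_x, attack_y = build_attack_dataset(x, probs, attack_feature=1)
print(attack_x)
print(attack_y)
```

Any classifier can then be fit on `(attack_x, attack_y)`; at inference time the same construction (without the target) is applied to the records whose sensitive value is unknown.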
#### Train attack model

In [3]:
import numpy as np
from art.attacks.inference import AttributeInferenceBlackBox

attack_feature = 1
data = features.to_numpy()

# training data without attacked feature
x_train_for_attack = np.delete(data, attack_feature, 1)
# only attacked feature
x_train_feature = data[:, attack_feature].copy().reshape(-1, 1)

bb_attack = AttributeInferenceBlackBox(art_classifier, attack_feature=attack_feature)

# get original model's predictions
x_train_predictions = np.argmax(art_classifier.predict(data), axis=1).reshape(-1, 1)

# train attack model (on the held-out test data, used here as the attacker's auxiliary data)
bb_attack.fit(test_data)

#### Infer sensitive feature and check accuracy

In [4]:
# get inferred values
# the two possible values of the attacked feature (the standardized versions of 0 and 1)
values = [-0.704141531, 1.420169037]
inferred_train_bb = bb_attack.infer(x_train_for_attack, x_train_predictions, values=values)
# check accuracy
train_acc = np.sum(inferred_train_bb == x_train_feature.reshape(1,-1)) / len(inferred_train_bb)
print(train_acc)

0.6326451861856068


This means that for 63% of the training set, the attacked feature is inferred correctly using this attack.
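Accuracy on its own is hard to interpret when the sensitive value is imbalanced, as it is here ('problematic' is the minority value). A simple reference point is the accuracy of an "attack" that ignores the model and always guesses the most common value. A minimal sketch, assuming `inferred` and `actual` hold the same standardized values as `inferred_train_bb` and `x_train_feature`; the helper name `accuracy_vs_majority` and the toy arrays are hypothetical, and the majority value is read from the data with `np.unique` rather than hard-coded:

```python
import numpy as np

def accuracy_vs_majority(inferred, actual):
    """Hypothetical helper: attack accuracy and the majority-guess baseline."""
    inferred = np.ravel(inferred)
    actual = np.ravel(actual)
    # distinct values of the sensitive feature and how often each occurs
    values, counts = np.unique(actual, return_counts=True)
    majority_value = values[np.argmax(counts)]
    attack_acc = np.mean(inferred == actual)
    baseline_acc = np.mean(actual == majority_value)
    return attack_acc, baseline_acc

# toy example: 2 of 6 records are positive (value 1.42)
actual = np.array([-0.70, -0.70, 1.42, -0.70, 1.42, -0.70])
inferred = np.array([-0.70, 1.42, 1.42, -0.70, -0.70, -0.70])
# both are 4/6 here: this toy attack is no better than always guessing -0.70
print(accuracy_vs_majority(inferred, actual))
```

In the notebook this could be called as `accuracy_vs_majority(inferred_train_bb, x_train_feature)`. The precision and recall reported further down imply that roughly a third of the training records are positive, so the majority-guess baseline is around two thirds, which is a useful yardstick for the 63% above.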

# Differential privacy


## Train Naive Bayes model

In [5]:
import diffprivlib.models as dp

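# epsilon defaults to 1.0; since no bounds are given, diffprivlib computes them
# from the training data and emits a PrivacyLeakWarning
priv_model = dp.GaussianNB()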
priv_model.fit(features, labels)

priv_art_classifier = ScikitlearnClassifier(priv_model)

print('Differentially private model accuracy: ', priv_model.score(features_test, labels_test))

Differentially private model accuracy:  0.6087962962962963




## Attack
### Black-box attack

In [6]:
priv_bb_attack = AttributeInferenceBlackBox(priv_art_classifier, attack_feature=attack_feature)

# get original model's predictions
priv_x_train_predictions = np.argmax(priv_art_classifier.predict(data), axis=1).reshape(-1, 1)

# train attack model
priv_bb_attack.fit(test_data)

# get inferred values
inferred_train_priv_bb = priv_bb_attack.infer(x_train_for_attack, priv_x_train_predictions, values=values)
# check accuracy
train_acc = np.sum(inferred_train_priv_bb == x_train_feature.reshape(1,-1)) / len(inferred_train_priv_bb)
print('Accuracy of attack on training data:', train_acc)

Accuracy of attack on training data: 0.6156666023538491


Let's check the precision and recall for each case:

In [7]:
def calc_precision_recall(predicted, actual, positive_value=1):
    score = 0  # both predicted and actual are positive
    num_positive_predicted = 0  # predicted positive
    num_positive_actual = 0  # actual positive
    for i in range(len(predicted)):
        if predicted[i] == positive_value:
            num_positive_predicted += 1
        if actual[i] == positive_value:
            num_positive_actual += 1
        if predicted[i] == actual[i]:
            if predicted[i] == positive_value:
                score += 1
    
    if num_positive_predicted == 0:
        precision = 1
    else:
        precision = score / num_positive_predicted  # the fraction of predicted “Yes” responses that are correct
    if num_positive_actual == 0:
        recall = 1
    else:
        recall = score / num_positive_actual  # the fraction of “Yes” responses that are predicted correctly

    return precision, recall
    
# black-box regular
print(calc_precision_recall(inferred_train_bb, x_train_feature, positive_value=1.420169037))
# black-box differential privacy
print(calc_precision_recall(inferred_train_priv_bb, x_train_feature, positive_value=1.420169037))

(0.43413597733711046, 0.35681024447031434)
(0.3983679525222552, 0.31257275902211873)


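In this run, precision and recall drop only slightly for the differentially private model (precision from 0.43 to 0.40, recall from 0.36 to 0.31). Because differential privacy adds random noise, other runs may show even smaller differences, and sometimes even a slight increase.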
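`calc_precision_recall` loops over the records one by one. The same quantities, precision = TP / (TP + FP) and recall = TP / (TP + FN) with the 'problematic' value as the positive class, can be computed with vectorized NumPy operations. A sketch that keeps the notebook function's convention of returning 1 when a denominator is zero; `precision_recall_vectorized` is a hypothetical name, and it uses `np.isclose` instead of exact equality so that tiny floating-point differences in the standardized values do not matter:

```python
import numpy as np

def precision_recall_vectorized(predicted, actual, positive_value=1):
    """Hypothetical vectorized counterpart of calc_precision_recall."""
    predicted_pos = np.isclose(np.ravel(predicted), positive_value)
    actual_pos = np.isclose(np.ravel(actual), positive_value)
    # true positives: predicted positive and actually positive
    true_pos = np.sum(predicted_pos & actual_pos)
    n_pred, n_actual = predicted_pos.sum(), actual_pos.sum()
    # same convention as the loop version: an empty denominator yields 1
    precision = true_pos / n_pred if n_pred else 1.0
    recall = true_pos / n_actual if n_actual else 1.0
    return precision, recall

predicted = np.array([1.42, 1.42, -0.70, -0.70, 1.42])
actual = np.array([1.42, -0.70, 1.42, 1.42, 1.42])
print(precision_recall_vectorized(predicted, actual, positive_value=1.42))
```

In the notebook it could be called as `precision_recall_vectorized(inferred_train_bb, x_train_feature, positive_value=1.420169037)`.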

Now let's try a lower epsilon value (a smaller privacy budget, which gives a stronger privacy guarantee at the cost of more noise).
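Why a smaller epsilon means more noise can be seen with the Laplace mechanism, a standard building block of differential privacy: a statistic with sensitivity Δ is released with Laplace noise of scale Δ/ε, so dividing epsilon by 10 multiplies the noise scale by 10. The sketch below is a generic illustration with hypothetical numbers (`true_mean`, `sensitivity`), not the exact mechanism diffprivlib's `GaussianNB` uses internally:

```python
import numpy as np

def laplace_mechanism(true_value, sensitivity, epsilon, rng, size=None):
    """Release true_value with Laplace noise of scale sensitivity / epsilon."""
    return true_value + rng.laplace(loc=0.0, scale=sensitivity / epsilon, size=size)

rng = np.random.default_rng(0)
true_mean = 0.5     # hypothetical statistic, e.g. a per-class feature mean
sensitivity = 0.01  # hypothetical sensitivity of that statistic

spread = {}
for epsilon in (1.0, 0.1):
    releases = laplace_mechanism(true_mean, sensitivity, epsilon, rng, size=10_000)
    # the spread of the released values grows as epsilon shrinks
    spread[epsilon] = releases.std()
    print(f"epsilon={epsilon}: std of released values = {spread[epsilon]:.4f}")
```

The standard deviation of Laplace noise is √2 times its scale, so the printed spreads should be close to 0.014 and 0.14.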

In [14]:
priv2_model = dp.GaussianNB(epsilon=0.1)
priv2_model.fit(features, labels)

priv2_art_classifier = ScikitlearnClassifier(priv2_model)

print('Differentially private model accuracy: ', priv2_model.score(features_test, labels_test))

priv2_bb_attack = AttributeInferenceBlackBox(priv2_art_classifier, attack_feature=attack_feature)

# get original model's predictions
priv2_x_train_predictions = np.argmax(priv2_art_classifier.predict(data), axis=1).reshape(-1, 1)

# train attack model
priv2_bb_attack.fit(test_data)

# get inferred values
inferred_train_priv2_bb = priv2_bb_attack.infer(x_train_for_attack, priv2_x_train_predictions, values=values)
# check accuracy
train_acc = np.sum(inferred_train_priv2_bb == x_train_feature.reshape(1,-1)) / len(inferred_train_priv2_bb)
print('Accuracy of attack on training data:', train_acc)

print('Precision and recall for differentially private model:', calc_precision_recall(inferred_train_priv2_bb, x_train_feature, positive_value=1.420169037))



Differentially private model accuracy:  0.4417438271604938
Accuracy of attack on training data: 0.589041095890411
Precision and recall for differentially private model: (0.33650793650793653, 0.2467986030267753)


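With epsilon = 0.1, the attack's accuracy, precision and recall all decreased further (to about 0.59, 0.34 and 0.25), but the model's own test accuracy also dropped sharply, to about 0.44.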

*You can play with the epsilon value to see how it affects model accuracy and attack accuracy.

**Due to the randomness introduced by differential privacy, each run may yield slightly different results even with the same parameters.
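Because of that randomness, comparing single runs can be misleading, especially when differences are as small as the precision and recall changes above. One option is to repeat each experiment several times and report the mean and standard deviation. A minimal sketch; `summarize_runs` and `noisy_experiment` are hypothetical stand-ins, and in the notebook `run_once` would retrain the diffprivlib model and rerun the attack:

```python
import numpy as np

def summarize_runs(run_once, n_runs=10):
    """Call run_once() n_runs times and return (mean, sample std) of its scores."""
    scores = np.array([run_once() for _ in range(n_runs)])
    return scores.mean(), scores.std(ddof=1)

rng = np.random.default_rng(42)

def noisy_experiment():
    # hypothetical experiment: a score of 0.6 plus random noise
    return 0.6 + rng.normal(scale=0.02)

mean, std = summarize_runs(noisy_experiment, n_runs=20)
print(f"score: {mean:.3f} +/- {std:.3f}")
```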