# Running ART attribute inference attacks on regular and differentially private models created using DiffPrivLib

In this tutorial we show how to run a black-box attribute inference attack from the Adversarial Robustness Toolbox (ART) on models trained with and without differential privacy. This is demonstrated on the iris dataset (https://archive.ics.uci.edu/ml/datasets/iris), using a Naive Bayes classifier trained with the IBM Differential Privacy Library (https://github.com/IBM/differential-privacy-library).

## Load data

In [19]:
from sklearn import datasets
from sklearn.model_selection import train_test_split
import numpy as np

dataset = datasets.load_iris()
x_train_iris, x_test_iris, y_train_iris, y_test_iris = train_test_split(dataset.data, dataset.target, test_size=0.2)

attack_feature = 2  # petal length

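# the attack treats the attacked feature as categorical, so bin petal length (cm) in place:
# <= 2 -> 0.0, (2, 5] -> 1.0, > 5 -> 2.0
def transform_feature(x):
    x[x <= 2] = 0.0
    x[(x > 2) & (x <= 5)] = 1.0
    x[x > 5] = 2.0

# the possible values of the binned attacked feature
values = [0.0, 1.0, 2.0]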

# training data without attacked feature
x_train_for_attack = np.delete(x_train_iris, attack_feature, 1)
# only attacked feature
x_train_feature = x_train_iris[:, attack_feature].copy().reshape(-1, 1)
transform_feature(x_train_feature)
# training data with attacked feature (after transformation)
x_train = np.concatenate((x_train_for_attack[:, :attack_feature], x_train_feature), axis=1)
x_train = np.concatenate((x_train, x_train_for_attack[:, attack_feature:]), axis=1)
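`transform_feature` bins petal length with three boolean masks, and the two `np.concatenate` calls put the binned column back at index `attack_feature`. As an alternative sketch (not the notebook's code), the same binning can be written with `np.digitize` and the re-insertion with `np.insert`. It runs on the full iris data rather than the random training split, so the shapes differ from `x_train`.

```python
import numpy as np
from sklearn import datasets

# Full iris data (the notebook uses a random 80% training split instead)
x = datasets.load_iris().data
attack_feature = 2  # petal length (cm)

# Same three bins as transform_feature: x <= 2 -> 0, 2 < x <= 5 -> 1, x > 5 -> 2.
# right=True makes each bin closed on its right edge, matching the <= comparisons.
binned = np.digitize(x[:, attack_feature], bins=[2.0, 5.0], right=True).astype(float)

# Drop the original column, then insert the binned column at the same position
x_others = np.delete(x, attack_feature, axis=1)
x_binned = np.insert(x_others, attack_feature, binned, axis=1)

print(x_binned.shape)                          # (150, 4)
print(np.unique(x_binned[:, attack_feature]))  # [0. 1. 2.]
```

Unlike `transform_feature`, this version returns new arrays instead of modifying its input in place.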

## Train Naive Bayes model

In [20]:
from sklearn.naive_bayes import GaussianNB

import os
import sys
# make a local ART source checkout (one directory up) importable
sys.path.insert(0, os.path.abspath('..'))
from art.estimators.classification.scikitlearn import ScikitlearnClassifier

model = GaussianNB()
model.fit(x_train, y_train_iris)

art_classifier = ScikitlearnClassifier(model)

## Attack
### Black-box attack
The black-box attack trains an additional classifier (the attack model) to predict the value of the attacked feature from the remaining n-1 features together with the original (attacked) model's predictions.
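The idea can be illustrated without ART. The following is a conceptual sketch using only scikit-learn, not ART's implementation: the one-hot encoding of the predictions and the choice of a shallow `DecisionTreeClassifier` as attack model are assumptions made for illustration, and ART may build its attack features and attack model differently.

```python
import numpy as np
from sklearn import datasets
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Iris with petal length (column 2) binned into 0/1/2, as in the notebook
data = datasets.load_iris()
x, y = data.data.copy(), data.target
attack_feature = 2
x[:, attack_feature] = np.digitize(x[:, attack_feature], bins=[2.0, 5.0], right=True)

# Target (attacked) model trained on all four features
target_model = GaussianNB().fit(x, y)

# Attack-model inputs: the other n-1 features plus the target model's predicted
# class (one-hot encoded). Attack-model labels: the binned attacked feature.
x_rest = np.delete(x, attack_feature, axis=1)
pred_onehot = np.eye(3)[target_model.predict(x)]
attack_x = np.hstack([x_rest, pred_onehot])
attack_y = x[:, attack_feature]

# Hypothetical attack model; not necessarily the model ART uses
attack_model = DecisionTreeClassifier(max_depth=3, random_state=0).fit(attack_x, attack_y)

# Inference: the attacker knows the other features and can query the target model
inferred = attack_model.predict(attack_x)
print("attack accuracy:", np.mean(inferred == attack_y))
```

As in the notebook, the attack model is trained and evaluated on the same records, so the printed accuracy measures how well the attack recovers the feature for training-set members.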
#### Train attack model

In [21]:
import numpy as np
from art.attacks.inference import AttributeInferenceBlackBox

bb_attack = AttributeInferenceBlackBox(art_classifier, attack_feature=attack_feature)

# get original model's predictions
x_train_predictions = np.array([np.argmax(arr) for arr in art_classifier.predict(x_train)]).reshape(-1,1)

# train attack model
bb_attack.fit(x_train)
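The cell above turns the classifier's output into class labels with a Python loop over `np.argmax`. Assuming, as that use of `argmax` implies, that the wrapped classifier returns one probability per class, the same labels can be computed with a single vectorized call. This sketch calls scikit-learn's `predict_proba` directly instead of going through ART.

```python
import numpy as np
from sklearn import datasets
from sklearn.naive_bayes import GaussianNB

x, y = datasets.load_iris(return_X_y=True)
model = GaussianNB().fit(x, y)

# Per-class probabilities, shape (n_samples, n_classes)
probs = model.predict_proba(x)

# Notebook style: argmax of each row in a Python loop, then a column vector
preds_loop = np.array([np.argmax(arr) for arr in probs]).reshape(-1, 1)
# Vectorized equivalent: argmax along the class axis
preds_vec = np.argmax(probs, axis=1).reshape(-1, 1)

print(preds_vec.shape)                        # (150, 1)
print(np.array_equal(preds_loop, preds_vec))  # True
```

`argmax` returns a column index into `model.classes_`; for iris the classes are 0, 1 and 2, so index and label coincide.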

#### Infer sensitive feature and check accuracy

In [22]:
# get inferred values
inferred_train_bb = bb_attack.infer(x_train_for_attack, x_train_predictions, values=values)
# check accuracy
train_acc = np.sum(inferred_train_bb == x_train_feature.reshape(1,-1)) / len(inferred_train_bb)
print('Accuracy of attack on training data:', train_acc)

Accuracy of attack on training data: 0.9916666666666667


This means that the attacked feature is inferred correctly for 99% of the training set. Note that the attack model is both trained and evaluated on the training data here.
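The accuracy line relies on NumPy broadcasting: comparing the inferred values with `x_train_feature.reshape(1,-1)` gives one comparison per sample only if the inferred values form a flat 1-D array. The sketch below, on made-up toy arrays, shows what happens with each shape and a shape-agnostic alternative.

```python
import numpy as np

actual = np.array([[0.0], [1.0], [2.0], [2.0]])  # column vector, like x_train_feature
inferred_1d = np.array([0.0, 1.0, 2.0, 1.0])     # flat predictions (toy values)

# (4,) compared with (1, 4) broadcasts to (1, 4): one comparison per sample
ok = inferred_1d == actual.reshape(1, -1)
print(ok.shape, ok.sum() / len(inferred_1d))  # (1, 4) 0.75

# With predictions shaped (4, 1), the same expression broadcasts to (4, 4)
# and compares every prediction with every label
inferred_col = inferred_1d.reshape(-1, 1)
bad = inferred_col == actual.reshape(1, -1)
print(bad.shape)  # (4, 4)

# Flattening both sides gives per-sample accuracy regardless of input shape
acc = np.mean(inferred_col.ravel() == actual.ravel())
print(acc)  # 0.75
```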

# Differentially private model

Now we apply the same attack to a differentially private Naive Bayes model trained with the diffprivlib library, using its default parameters (epsilon=1).

## Train model

In [23]:
import diffprivlib.models as dp

priv_model = dp.GaussianNB()
priv_model.fit(x_train, y_train_iris)

priv_art_classifier = ScikitlearnClassifier(priv_model)



## Attack
### Black-box attack

In [24]:
priv_bb_attack = AttributeInferenceBlackBox(priv_art_classifier, attack_feature=attack_feature)

# get original model's predictions
priv_x_train_predictions = np.array([np.argmax(arr) for arr in priv_art_classifier.predict(x_train)]).reshape(-1,1)

# train attack model
priv_bb_attack.fit(x_train)

# get inferred values
inferred_train_priv_bb = priv_bb_attack.infer(x_train_for_attack, priv_x_train_predictions, values=values)
# check accuracy
train_acc = np.sum(inferred_train_priv_bb == x_train_feature.reshape(1,-1)) / len(inferred_train_priv_bb)
print('Accuracy of attack on training data:', train_acc)

Accuracy of attack on training data: 0.9333333333333333


Attack accuracy decreases slightly, from 0.99 to 0.93. Now let's check precision and recall, treating the value 2.0 as the positive class:

In [25]:
def calc_precision_recall(predicted, actual, positive_value=1):
    score = 0  # both predicted and actual are positive
    num_positive_predicted = 0  # predicted positive
    num_positive_actual = 0  # actual positive
    for i in range(len(predicted)):
        if predicted[i] == positive_value:
            num_positive_predicted += 1
        if actual[i] == positive_value:
            num_positive_actual += 1
        if predicted[i] == actual[i]:
            if predicted[i] == positive_value:
                score += 1
    
    if num_positive_predicted == 0:
        precision = 1
    else:
        precision = score / num_positive_predicted  # the fraction of predicted “Yes” responses that are correct
    if num_positive_actual == 0:
        recall = 1
    else:
        recall = score / num_positive_actual  # the fraction of “Yes” responses that are predicted correctly

    return precision, recall
    
# black-box regular
print('Precision and recall for regular model:', calc_precision_recall(inferred_train_bb, x_train_feature, positive_value=2))
# black-box differential privacy
print('Precision and recall for differentially private model:', calc_precision_recall(inferred_train_priv_bb, x_train_feature, positive_value=2))

Precision and recall for regular model: (0.9705882352941176, 1.0)
Precision and recall for differentially private model: (0.8787878787878788, 0.8787878787878788)


Both decrease: precision drops from 0.97 to 0.88 and recall from 1.0 to 0.88.
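`calc_precision_recall` counts true positives, predicted positives and actual positives in a Python loop. The same computation can be sketched with NumPy boolean masks. The function name `precision_recall_vectorized` is hypothetical, and it keeps the original's convention of returning 1 when there are no predicted or no actual positives.

```python
import numpy as np

def precision_recall_vectorized(predicted, actual, positive_value=1):
    # Flatten so that (n,) and (n, 1) inputs are compared element by element
    predicted = np.asarray(predicted).ravel()
    actual = np.asarray(actual).ravel()
    pred_pos = predicted == positive_value  # predicted positive
    act_pos = actual == positive_value      # actual positive
    true_pos = np.sum(pred_pos & act_pos)   # both predicted and actual positive
    # Same convention as calc_precision_recall: 1 when the denominator is zero
    precision = true_pos / pred_pos.sum() if pred_pos.any() else 1.0
    recall = true_pos / act_pos.sum() if act_pos.any() else 1.0
    return float(precision), float(recall)

p, r = precision_recall_vectorized(np.array([2, 2, 1, 0, 2]),
                                   np.array([[2], [1], [1], [0], [2]]),
                                   positive_value=2)
print(p, r)  # 0.6666666666666666 1.0
```

Here two of the three predicted 2s are correct (precision 2/3) and both actual 2s are found (recall 1.0).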

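Now let's try a much lower epsilon (0.01, which gives stronger privacy) and specify the bounds of each feature explicitly. Without explicit bounds, diffprivlib computes them from the training data itself, which leaks additional privacy; the library emits a `PrivacyLeakWarning` in that case.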
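Why both epsilon and the bounds matter can be illustrated with the Laplace mechanism for a bounded mean. This is a generic sketch, not diffprivlib's implementation (see its documentation for the mechanisms `GaussianNB` actually uses). It assumes the replace-one-record notion of neighbouring datasets with the record count n treated as public, so the sensitivity of the clipped mean is (upper - lower) / n and the noise scale is (upper - lower) / (n * epsilon). The feature values are synthetic.

```python
import numpy as np

def dp_mean(values, lower, upper, epsilon, rng):
    """Epsilon-DP mean of values clipped to [lower, upper] via the Laplace mechanism."""
    values = np.clip(np.asarray(values, dtype=float), lower, upper)
    n = len(values)
    # Replacing one record moves the clipped mean by at most (upper - lower) / n
    sensitivity = (upper - lower) / n
    # Laplace noise with scale sensitivity / epsilon yields epsilon-DP
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon)
    return values.mean() + noise

rng = np.random.default_rng(0)
# Synthetic stand-in for one feature with bounds (4.3, 7.9); values are made up
feature = rng.uniform(4.3, 7.9, size=120)
for eps in (1.0, 0.01):
    scale = (7.9 - 4.3) / len(feature) / eps
    estimate = dp_mean(feature, 4.3, 7.9, eps, rng)
    print(f"epsilon={eps}: noise scale={scale:.3f}, "
          f"true mean={feature.mean():.3f}, dp mean={estimate:.3f}")
```

With 120 records in [4.3, 7.9], the noise scale is 0.03 at epsilon = 1 and 3.0 at epsilon = 0.01: a hundredfold smaller epsilon means a hundredfold more noise, and wider bounds would increase it further.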

In [26]:
# (min, max) per feature, matching the iris value ranges; the binned petal length spans 0-2
bounds = [(4.3, 7.9), (2.0, 4.4), (0.0, 2.0), (0.1, 2.5)]

priv2_model = dp.GaussianNB(epsilon=0.01, bounds=bounds)
priv2_model.fit(x_train, y_train_iris)

priv2_art_classifier = ScikitlearnClassifier(priv2_model)

priv2_bb_attack = AttributeInferenceBlackBox(priv2_art_classifier, attack_feature=attack_feature)

# get original model's predictions
priv2_x_train_predictions = np.array([np.argmax(arr) for arr in priv2_art_classifier.predict(x_train)]).reshape(-1,1)

# train attack model
priv2_bb_attack.fit(x_train)

# get inferred values
inferred_train_priv2_bb = priv2_bb_attack.infer(x_train_for_attack, priv2_x_train_predictions, values=values)
# check accuracy
train_acc = np.sum(inferred_train_priv2_bb == x_train_feature.reshape(1,-1)) / len(inferred_train_priv2_bb)
print('Accuracy of attack on training data:', train_acc)

print('Precision and recall for differentially private model:', calc_precision_recall(inferred_train_priv2_bb, x_train_feature, positive_value=2))

Accuracy of attack on training data: 0.925
Precision and recall for differentially private model: (0.8787878787878788, 0.8787878787878788)


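Attack accuracy drops slightly further, from 0.933 to 0.925, while precision and recall for the value 2.0 are unchanged in this run (0.88 for both). Exact numbers vary from run to run, because both the train/test split and the differential-privacy noise are random, so differences this small should be read with caution.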