# Running database reconstruction attacks on the Iris dataset

In this tutorial we will show how to run a database reconstruction attack on the Iris dataset and evaluate its effectiveness against models trained non-privately (i.e., naively with scikit-learn) and models trained with differential privacy guarantees.

## Preliminaries

The database reconstruction attack targets a machine learning model `model` that was trained on a dataset of `n` examples. Given `model` and `n-1` of those examples (i.e., the training dataset with the target row removed), the attacker seeks to reconstruct the remaining `n`th example.

In this example, we train a Gaussian naive Bayes classifier (`model`) on the training dataset, then remove a single row from that dataset and seek to reconstruct that row using `model`. In typical runs the attack recovers the row almost exactly; the run shown in this tutorial reaches an RMSE of roughly 6e-08.

We then show that launching the same attack on an ML model trained with differential privacy guarantees protects the training dataset and prevents the attacker from recovering the target row precisely.
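
To build intuition for why a non-private model can leak the missing row, here is a minimal sketch under simplifying assumptions. It uses a made-up single-class dataset and only the per-feature mean, which is one of the statistics a Gaussian naive Bayes model stores. It is an illustration of the leakage, not a description of how the attack used later in this tutorial is implemented.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy single-class training set: 5 rows, 4 features (values are made up).
x_full = rng.uniform(0.0, 10.0, size=(5, 4))

# A non-private model exposes the exact per-feature mean of its training rows.
released_mean = x_full.mean(axis=0)

# The attacker knows every row except the target.
target_index = 2
x_known = np.delete(x_full, target_index, axis=0)

# mean = (sum of known rows + missing row) / n, so solve for the missing row.
n = x_known.shape[0] + 1
x_reconstructed = n * released_mean - x_known.sum(axis=0)

# Largest per-feature gap between the reconstruction and the true row.
print(np.abs(x_reconstructed - x_full[target_index]).max())
```

Because the mean is released exactly, the only error in the reconstruction comes from floating-point rounding.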

## Example usage

## Load data

First, we load the data of interest and split it into train/test subsets.

In [1]:
from sklearn import datasets
from sklearn.model_selection import train_test_split
import numpy as np

dataset = datasets.load_iris()

In [2]:
x_train, x_test, y_train, y_test = train_test_split(dataset.data, dataset.target, test_size=0.2)

## Train model

We can now train a Gaussian naive Bayes classifier using the full training dataset. This is the model that will be used to attack the training dataset later.

In [3]:
import sklearn.naive_bayes as naive_bayes
from art.estimators.classification.scikitlearn import ScikitlearnGaussianNB

model1 = naive_bayes.GaussianNB().fit(x_train, y_train)
non_private_art = ScikitlearnGaussianNB(model1)

In [4]:
print("Model accuracy (on the test dataset): {}".format(model1.score(x_test, y_test)))

Model accuracy (on the test dataset): 0.9666666666666667


## Launch and evaluate attack

We now select a row from the training dataset and remove it. This is the **target row**, which the attack will seek to reconstruct. The attacker will have access only to the remaining rows, `x_public` and `y_public`.

In [5]:
target_row = int(np.random.random() * x_train.shape[0])

x_public = np.delete(x_train, target_row, axis=0)
y_public = np.delete(y_train, target_row, axis=0)
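
If a repeatable choice of target row is wanted, the same selection can be sketched with a seeded NumPy generator. The `_demo` names, the seed, and the random stand-in arrays are assumptions for illustration; the stand-ins only mimic the shape of an 80% split of the 150-row Iris dataset.

```python
import numpy as np

rng = np.random.default_rng(42)  # the seed value is arbitrary

# Stand-in arrays shaped like an 80% Iris training split (120 rows, 4 features).
x_train_demo = rng.normal(size=(120, 4))
y_train_demo = rng.integers(0, 3, size=120)

# Draw a row index uniformly from [0, number of rows).
target_row_demo = int(rng.integers(0, x_train_demo.shape[0]))

# Remove that row from both the features and the labels.
x_public_demo = np.delete(x_train_demo, target_row_demo, axis=0)
y_public_demo = np.delete(y_train_demo, target_row_demo, axis=0)

print(x_public_demo.shape, y_public_demo.shape)
```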

We can now launch the attack, and seek to infer the value of the target row. This is typically completed in less than a second.

In [6]:
from art.attacks.inference.reconstruction import DatabaseReconstruction

dbrecon = DatabaseReconstruction(non_private_art)

x, y = dbrecon.reconstruct(x_public, y_public)

We can evaluate the accuracy of the attack using the root-mean-square error (RMSE) between the true target row and the inferred row. A value close to zero means the inferred row closely matches the true one.

In [7]:
print("Inference RMSE: {}".format(
    np.sqrt(((x_train[target_row] - x) ** 2).sum() / x_train.shape[1])))

Inference RMSE: 5.789723287688911e-08
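
The RMSE computation can be wrapped in a small helper so it can be reused for both attacks. This is a sketch; the function name `rmse` is hypothetical, and it flattens its inputs so that a `(1, 4)` array and a `(4,)` array can be compared directly.

```python
import numpy as np

def rmse(true_row, inferred_row):
    """Root-mean-square error between two rows holding the same features."""
    true_row = np.asarray(true_row, dtype=float).ravel()
    inferred_row = np.asarray(inferred_row, dtype=float).ravel()
    return float(np.sqrt(np.mean((true_row - inferred_row) ** 2)))

# Made-up rows: only the last feature differs, by 2.
example_error = rmse([1.0, 2.0, 3.0, 4.0], [[1.0, 2.0, 3.0, 6.0]])
print(example_error)
```

Taking the mean of the squared differences is equivalent to dividing their sum by the number of features, as done in the cell above.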


We can confirm that the attack also inferred the correct label `y`.

In [8]:
np.argmax(y) == y_train[target_row]

True

# Attacking a model trained with differential privacy

We can mitigate this attack by training the public ML model with differential privacy. We will use [diffprivlib](https://github.com/Trusted-AI/differential-privacy-library) to train a differentially private Gaussian naive Bayes classifier. Differential privacy typically costs some model accuracy; we can limit that loss by choosing an `epsilon` value appropriate to our needs (larger values give weaker privacy but usually higher accuracy).
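
To illustrate how `epsilon` trades privacy for accuracy, the sketch below applies the textbook Laplace mechanism to a bounded mean and then tries to solve for a missing value from the noisy result. The helper `private_mean` and the data are hypothetical, and the mechanisms diffprivlib uses internally may differ; the bounds 1.1 and 6.9 simply mirror the third feature's bounds used in this tutorial.

```python
import numpy as np

rng = np.random.default_rng(0)

def private_mean(values, lower, upper, epsilon, rng):
    """Laplace mechanism for the mean of values clipped to [lower, upper]."""
    clipped = np.clip(values, lower, upper)
    # Replacing one row changes the mean by at most (upper - lower) / n.
    sensitivity = (upper - lower) / len(clipped)
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon)
    return clipped.mean() + noise

# Made-up feature values; the last one is the attacker's target.
values = np.array([1.4, 4.7, 5.1, 1.5, 5.9, 4.5, 1.3, 6.0])
known, target = values[:-1], values[-1]
n = len(values)

for epsilon in (0.1, 1.0, 10.0):
    noisy_mean = private_mean(values, 1.1, 6.9, epsilon, rng)
    # Same algebra an attacker would use on an exact mean.
    guess = n * noisy_mean - known.sum()
    print(f"epsilon={epsilon}: reconstruction error = {abs(guess - target):.3f}")
```

The noise added to the mean is multiplied by `n` when solving for the missing value, so smaller `epsilon` values tend to produce larger reconstruction errors.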

## Train the model

In [9]:
from diffprivlib import models

model2 = models.GaussianNB(bounds=([4.3, 2.0, 1.1, 0.1], [7.9, 4.4, 6.9, 2.5]), epsilon=3).fit(x_train, y_train)
private_art = ScikitlearnGaussianNB(model2)

model2.score(x_test, y_test)

0.7

## Launch and evaluate attack

We then launch the same attack as before. In this case, the attack may take several seconds to return a result.

In [10]:
dbrecon = DatabaseReconstruction(private_art)

x_dp, y_dp = dbrecon.reconstruct(x_public, y_public)

In this case, the RMSE shows that the attack has been far less successful.

In [11]:
print("Inference RMSE (with differential privacy): {}".format(
    np.sqrt(((x_train[target_row] - x_dp) ** 2).sum() / x_train.shape[1])))

Inference RMSE (with differential privacy): 2.2594246979517965


This is confirmed by inspecting the inferred value and the true value.

In [12]:
x_dp, x_train[target_row]

(array([[4.80000094, 3.00000298, 1.39999864, 0.30000296]]),
 array([6.4, 2.7, 5.3, 1.9]))
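
A per-feature breakdown makes the gap easier to read. The sketch below copies the two arrays printed above (so it reflects this particular run only) and computes the absolute error for each feature, then recomputes the RMSE from those errors.

```python
import numpy as np

# Values copied from the output above.
x_inferred = np.array([[4.80000094, 3.00000298, 1.39999864, 0.30000296]])
x_true = np.array([6.4, 2.7, 5.3, 1.9])

# Absolute error per feature, in the dataset's original units.
abs_error = np.abs(x_true - x_inferred.ravel())
print(abs_error)

# RMSE recomputed from the per-feature errors.
rmse_from_output = float(np.sqrt(np.mean(abs_error ** 2)))
print(rmse_from_output)
```

The largest error, about 3.9, is on the third feature, and the recomputed RMSE agrees with the value reported earlier up to the rounding of the printed arrays.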

In fact, the attack may not even be able to correctly infer the target label.

In [13]:
np.argmax(y_dp), y_train[target_row]

(0, 2)