## Label-only Membership Inference:

#### Intuition
The basic intuition behind the attack is that samples used in training tend to lie farther from the decision boundary than non-training data.

The attacker perturbs the input in various ways to measure how much noise is needed to "change the classifier's mind" about its prediction for a given sample. Because the model tends to be more confident on its training data, the attacker typically has to perturb a training sample more before the model changes its predicted label. The amount of perturbation needed therefore serves as a proxy for the sample's distance from the decision boundary. Both attacks listed below use HopSkipJump, a decision-based adversarial attack, to find these perturbations.
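To make the intuition concrete, the NumPy sketch below estimates a sample's distance to the decision boundary using only predicted labels. It assumes a hypothetical toy 2-D linear classifier (`w`, `b`, `predict_label`) and, instead of HopSkipJump, uses a simpler label-only search: bisection along many random unit directions, keeping the smallest step that flips the label. For a linear model the true distance is $|w \cdot x + b| / \lVert w \rVert$, so the estimate can be checked against it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy classifier: only the predicted label is exposed to the attacker.
w = np.array([1.0, 1.0])
b = -1.0


def predict_label(x: np.ndarray) -> int:
    return int(x @ w + b > 0)


def flip_distance(x: np.ndarray, direction: np.ndarray,
                  max_radius: float = 10.0, tol: float = 1e-6) -> float:
    """Smallest step along a unit `direction` that changes the label (bisection), or inf."""
    y0 = predict_label(x)
    if predict_label(x + max_radius * direction) == y0:
        return np.inf  # no label change within max_radius along this direction
    lo, hi = 0.0, max_radius  # invariant: label unchanged at lo, changed at hi
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if predict_label(x + mid * direction) == y0:
            lo = mid
        else:
            hi = mid
    return hi


def label_only_distance(x: np.ndarray, n_directions: int = 2000) -> float:
    """Estimate the boundary distance as the minimum flip distance over random directions."""
    dirs = rng.normal(size=(n_directions, x.size))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    return min(flip_distance(x, d) for d in dirs)


x_far = np.array([2.0, 2.0])   # true distance 3 / sqrt(2) ≈ 2.121
x_near = np.array([0.6, 0.6])  # true distance 0.2 / sqrt(2) ≈ 0.141
for x in (x_far, x_near):
    true_dist = abs(x @ w + b) / np.linalg.norm(w)
    print(f"true={true_dist:.3f}  label-only estimate={label_only_distance(x):.3f}")
```

The point farther from the boundary needs a larger perturbation before its label flips; that gap is the signal the membership attack thresholds.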

#### Learning
Given some estimate of a sample's distance from the model's decision boundary, the attacker compares it to a threshold $\tau$. Any sample whose distance is greater than $\tau$ is classified as a training sample.

There are two ways to learn the distance threshold $\tau$:

- with data (Choquette-Choo et al., https://arxiv.org/abs/2007.14321)
- without data (Li et al., https://arxiv.org/abs/2007.15528)

#### With data
In this scenario, the attacker needs to know, for a subset of the data, whether each sample was used in training or not. It computes the distance to the decision boundary for each of these samples and sets $\tau$ to the value that maximizes membership inference accuracy on them. Misclassified samples are treated as non-training samples.
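The supervised calibration can be sketched in NumPy as follows. `calibrate_tau_supervised` is a hypothetical helper, and the distances are made-up toy values rather than HopSkipJump output. The sketch tries every observed distance as a candidate threshold; ART may search a different candidate grid.

```python
import numpy as np


def calibrate_tau_supervised(dist_members, correct_members, dist_nonmembers, correct_nonmembers):
    """Choose tau that maximises membership accuracy on labelled calibration data.

    Misclassified samples get distance 0, so they always land on the non-member side.
    A sample is predicted to be a member iff its distance is strictly greater than tau.
    """
    d_in = np.where(correct_members, dist_members, 0.0)
    d_out = np.where(correct_nonmembers, dist_nonmembers, 0.0)
    distances = np.concatenate([d_in, d_out])
    is_member = np.concatenate([np.ones(len(d_in), dtype=bool), np.zeros(len(d_out), dtype=bool)])

    best_tau, best_acc = None, -1.0
    for tau in np.unique(distances):  # every observed distance is a candidate threshold
        acc = np.mean((distances > tau) == is_member)
        if acc > best_acc:  # strict '>' keeps the smallest tau among ties
            best_tau, best_acc = float(tau), float(acc)
    return best_tau, best_acc


# Toy calibration data (made-up distances, for illustration only).
tau, acc = calibrate_tau_supervised(
    dist_members=np.array([3.0, 2.5, 4.0, 1.8]),
    correct_members=np.array([True, True, True, False]),  # last member misclassified -> distance 0
    dist_nonmembers=np.array([0.5, 1.0, 2.6, 0.2]),
    correct_nonmembers=np.array([True, True, True, True]),
)
print(tau, acc)  # 1.0 0.75
```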

#### Without data
Here the attacker generates random data and uses the same perturbation technique as before to measure each sample's distance from the decision boundary. Finally, the attacker calibrates $\tau$ by choosing a suitable top-$t$ percentile of these distances.
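The unsupervised calibration can be sketched as follows, assuming that random inputs are drawn uniformly from the classifier's clip range and that "top-$t$ percentile" means `np.percentile(distances, t)`; both are interpretations, not necessarily ART's exact behaviour. `random_inputs` and `calibrate_tau_unsupervised` are hypothetical helpers, and the distances are placeholders rather than HopSkipJump output.

```python
import numpy as np


def random_inputs(num_samples, input_shape, clip_min, clip_max, rng):
    """Draw inputs uniformly from the model's valid input range; float32 matches the model weights."""
    return rng.uniform(clip_min, clip_max, size=(num_samples, *input_shape)).astype(np.float32)


def calibrate_tau_unsupervised(random_distances, top_t=50):
    """Set tau to the top_t-th percentile of distances measured on random inputs."""
    return float(np.percentile(random_distances, top_t))


rng = np.random.default_rng(0)
x_rand = random_inputs(100, (1, 28, 28), 0.0, 255.0, rng)
print(x_rand.shape, x_rand.dtype)  # (100, 1, 28, 28) float32

# Placeholder distances; in the real attack these would come from HopSkipJump on x_rand.
toy_distances = np.arange(1, 101, dtype=float)
print(calibrate_tau_unsupervised(toy_distances, top_t=50))  # 50.5
```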

#### Overview
How to implement the attack using ART.

1. Preliminaries
   - Load data and the model to be attacked
   - Wrap the model in an ART classifier wrapper
2. Attack (w/ data, w/o data)
   - Instantiate the attack (w/ data, w/o data)
   - Calibrate the distance threshold $\tau$ (w/ data, w/o data)
3. Infer membership on evaluation data (w/ data, w/o data)

In [1]:
!pip install adversarial-robustness-toolbox



In [2]:
import torch
from torch import nn
import numpy as np

#### Preliminaries:

In [3]:
from art.utils import load_mnist

# data
(x_train, y_train), (x_test, y_test), _min, _max = load_mnist(raw=True)

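# add a channel axis, (N, 28, 28) -> (N, 1, 28, 28), for channels-first Conv2d;
# cast to float32 to match PyTorch's default parameter dtype
x_train = np.expand_dims(x_train, axis=1).astype(np.float32)
x_test = np.expand_dims(x_test, axis=1).astype(np.float32)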

In [4]:
# model (input: 1x28x28 grayscale image)
model = nn.Sequential(
    nn.Conv2d(1, 16, 4, stride=2, padding=1),   # -> 16x14x14
    nn.ReLU(),
    nn.Conv2d(16, 32, 4, stride=2, padding=1),  # -> 32x7x7
    nn.ReLU(),
    nn.Flatten(),                               # -> 1568 features
    nn.Linear(32 * 7 * 7, 100),
    nn.ReLU(),
    nn.Linear(100, 10)                          # 10 class logits
)
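The input size of the first `nn.Linear` layer, `32*7*7`, follows from the convolution output-size formula given in the PyTorch `nn.Conv2d` documentation, $H_{out} = \left\lfloor \frac{H_{in} + 2p - d(k-1) - 1}{s} \right\rfloor + 1$. A small helper (hypothetical name `conv2d_out`) reproduces the numbers:

```python
def conv2d_out(size: int, kernel: int, stride: int = 1, padding: int = 0, dilation: int = 1) -> int:
    """Spatial output size of nn.Conv2d along one dimension (formula from the PyTorch docs)."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


h = 28                                            # MNIST height/width
h = conv2d_out(h, kernel=4, stride=2, padding=1)  # first conv  -> 14
h = conv2d_out(h, kernel=4, stride=2, padding=1)  # second conv -> 7
flat = 32 * h * h                                 # 32 channels after the second conv
print(h, flat)  # 7 1568
```

The result matches the `in_features=1568` shown in the wrapped model's summary.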

#### Wrap model in PyTorchClassifier:

In [5]:
import torch.optim as optim
from art.estimators.classification.pytorch import PyTorchClassifier

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters())

art_model = PyTorchClassifier(
    model=model,
    loss=criterion,
    optimizer=optimizer,
    channels_first=True,
    input_shape=(1, 28, 28),
    nb_classes=10,
    clip_values=(_min, _max),
)
art_model.fit(x_train, y_train, nb_epochs=10, batch_size=128)

# predicted class = index of the largest logit
pred = np.argmax(art_model.predict(x_test), axis=1)

print('Base model accuracy: ', np.sum(pred == y_test) / len(y_test))

Base model accuracy:  0.985


In [6]:
art_model

art.estimators.classification.pytorch.PyTorchClassifier(model=ModelWrapper(
  (_model): Sequential(
    (0): Conv2d(1, 16, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
    (1): ReLU()
    (2): Conv2d(16, 32, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
    (3): ReLU()
    (4): Flatten(start_dim=1, end_dim=-1)
    (5): Linear(in_features=1568, out_features=100, bias=True)
    (6): ReLU()
    (7): Linear(in_features=100, out_features=10, bias=True)
  )
), loss=CrossEntropyLoss(), optimizer=Adam (
Parameter Group 0
    amsgrad: False
    betas: (0.9, 0.999)
    capturable: False
    differentiable: False
    eps: 1e-08
    foreach: None
    fused: None
    lr: 0.001
    maximize: False
    weight_decay: 0
), input_shape=(1, 28, 28), nb_classes=10, channels_first=True, clip_values=array([  0., 255.], dtype=float32), preprocessing_defences=None, postprocessing_defences=None, preprocessing=StandardisationMeanStdPyTorch(mean=0.0, std=1.0, apply_fit=True, apply_predict=True, device

#### Save the Model:

In [7]:
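# save the model weights (state_dict) and optimizer state, not the architecture;
# ART appends '.model' and '.optimizer' to the given filename
art_model.save('/content/drive/MyDrive/AI SECURITY/AI Red Teaming/CODE/Model/pytorchclassifier.pt')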

## Attack (supervised, with data):

#### Instantiate attack:

In [8]:
from art.attacks.inference.membership_inference import LabelOnlyDecisionBoundary

mia_label_only = LabelOnlyDecisionBoundary(art_model)

#### Calibrate distance threshold:

In [9]:
# number of samples used to calibrate distance threshold
attack_train_size = 1500
attack_test_size = 1500

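# pool training (member) and test (non-member) data
x = np.concatenate([x_train, x_test])
y = np.concatenate([y_train, y_test])
# membership ground truth: 1 = used in training, 0 = not used in training
training_sample = np.array([1] * len(x_train) + [0] * len(x_test))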
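Because `training_sample` contains 60,000 members and only 10,000 non-members for MNIST, raw accuracy over the full pool is easy to misread: guessing "member" for everything already scores 6/7. The NumPy sketch below (hypothetical helpers `membership_metrics` and `balanced_indices`) computes accuracy, precision and recall and shows why evaluating on a balanced subset is easier to interpret. It does not call ART.

```python
import numpy as np


def membership_metrics(inferred, truth):
    """Accuracy, precision and recall of membership predictions (1 = member)."""
    inferred = np.asarray(inferred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = np.sum(inferred & truth)
    accuracy = np.mean(inferred == truth)
    precision = tp / max(np.sum(inferred), 1)
    recall = tp / max(np.sum(truth), 1)
    return float(accuracy), float(precision), float(recall)


def balanced_indices(truth, n_per_class, rng):
    """Pick n_per_class members and n_per_class non-members at random, without replacement."""
    truth = np.asarray(truth)
    members = rng.choice(np.flatnonzero(truth == 1), n_per_class, replace=False)
    nonmembers = rng.choice(np.flatnonzero(truth == 0), n_per_class, replace=False)
    return np.concatenate([members, nonmembers])


# Same layout as training_sample for MNIST: 60,000 members followed by 10,000 non-members.
truth = np.array([1] * 60000 + [0] * 10000)

# A trivial "everything is a member" guess already scores 6/7 accuracy on the full pool...
print(membership_metrics(np.ones_like(truth), truth))  # ≈ (0.857, 0.857, 1.0)

# ...but only 0.5 on a balanced subset.
idx = balanced_indices(truth, 1000, np.random.default_rng(0))
print(membership_metrics(np.ones(len(idx)), truth[idx]))  # (0.5, 0.5, 1.0)
```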

In [13]:
mia_label_only.calibrate_distance_threshold(x_train[:attack_train_size], y_train[:attack_train_size],
                                            x_test[:attack_test_size], y_test[:attack_test_size])

HopSkipJump:   0%|          | 0/1500 [00:00<?, ?it/s]

RuntimeError: Input type (double) and bias type (float) should be the same
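The traceback shows a float64 (`double`) tensor reaching the float32 convolution layers. Since the inputs prepared above are float32, the double-precision arrays are presumably created inside the attack (for example during HopSkipJump's interpolation steps, possibly depending on the installed NumPy and ART versions); this diagnosis is an assumption. One workaround, sketched below, is to make the model cast its input to float32 as its first layer. `CastToFloat32` and `build_model` are hypothetical names; the model would have to be rebuilt, re-wrapped in `PyTorchClassifier` and retrained before calling `calibrate_distance_threshold` again.

```python
import torch
from torch import nn


class CastToFloat32(nn.Module):
    """Hypothetical helper layer: converts any incoming tensor to float32."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x.float()


def build_model() -> nn.Sequential:
    # Same architecture as before, with the dtype cast prepended.
    return nn.Sequential(
        CastToFloat32(),
        nn.Conv2d(1, 16, 4, stride=2, padding=1),
        nn.ReLU(),
        nn.Conv2d(16, 32, 4, stride=2, padding=1),
        nn.ReLU(),
        nn.Flatten(),
        nn.Linear(32 * 7 * 7, 100),
        nn.ReLU(),
        nn.Linear(100, 10),
    )


model = build_model()
x_double = torch.rand(2, 1, 28, 28, dtype=torch.float64)  # double input, as in the traceback
logits = model(x_double)  # no dtype error: the first layer casts to float32
print(logits.dtype, tuple(logits.shape))  # torch.float32 (2, 10)
```

Prepending the cast shifts the `Sequential` indices by one (the first convolution's weights become `1.weight` instead of `0.weight`), so weights saved from the original model will not load into this one without renaming the keys.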