# Label Only Membership Inference

### Attack Scenario:

- **Black Box** access to an overfitted classifier with no access to actual $D_{train}$
- Predict API returns **only labels instead of confidence vectors**
- We have some samples over the training data distribution, $D_{out}$, such that $D_{train} \cap D_{out} = \varnothing$


### Attack Target: 
- Use a shadow model to attack locally and extract membership leakage features
- Use data perturbations in order to exploit test/training data approximation relevancies to the classification boundaries.
- Train attack model based on this assumption and compare with original attack

Implemented based on [this paper](https://arxiv.org/abs/2007.14321).

In [1]:
import numpy as np
import matplotlib.pyplot as plt

import math
import tensorflow as tf
from tensorflow.keras import datasets, layers, models
from sklearn.model_selection import train_test_split
from tensorflow import keras
from tensorflow.keras import regularizers

# for image interpolation
import scipy.ndimage.interpolation as interpolation

from tqdm import tqdm
import sys
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))


Num GPUs Available:  1


## Target Model

### Model Architecture

In [19]:
D_TARGET_SIZE = 5000

In [20]:
def f_target(X_train, y_train, X_test=None, y_test=None, epochs=100):
  """
  Returns a trained target model, if test data are specified we will evaluate the model and print its accuracy
  """
  model = models.Sequential()
  model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)))
  model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)))
  model.add(layers.MaxPooling2D((2, 2)))
  model.add(layers.Conv2D(64, (3, 3), activation='relu'))
  model.add(layers.Conv2D(64, (3, 3), activation='relu'))
  model.add(layers.MaxPooling2D((2, 2)))

  model.add(layers.Flatten())
  model.add(layers.Dense(512, activation='relu'))

  model.add(layers.Dense(10))
  
  optimizer = keras.optimizers.Adam(learning_rate=0.001)
  model.compile(optimizer=optimizer,
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
  if X_test is None or y_test is None:
    history = model.fit(X_train, y_train, epochs=epochs, 
                    validation_split=0.2)
  else:
    history = model.fit(X_train, y_train, epochs=epochs, 
                    validation_data=(X_test, y_test))
  return model

In [21]:
with tf.device('/gpu:0'):
  (train_images, train_labels), (test_images, test_labels) = datasets.cifar10.load_data()
  # use the rest as testing - 'out' records
  attacker_labels = np.concatenate((train_labels[D_TARGET_SIZE:], test_labels))
  attacker_images = np.concatenate((train_images[D_TARGET_SIZE:], test_images))

  target_images = train_images[:D_TARGET_SIZE] # as the paper attack train wiht only 200 records
  target_labels = train_labels[:D_TARGET_SIZE]


In [22]:
with tf.device('/gpu:0'):
  train_images, eval_images, train_labels, eval_labels = train_test_split(target_images, target_labels, test_size=0.2, shuffle=True)
  target_model = f_target(train_images, train_labels, eval_images, eval_labels, epochs=50) 

Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoch 27/50
Epoch 28/50
Epoch 29/50
Epoch 30/50
Epoch 31/50
Epoch 32/50
Epoch 33/50
Epoch 34/50
Epoch 35/50
Epoch 36/50
Epoch 37/50
Epoch 38/50
Epoch 39/50
Epoch 40/50
Epoch 41/50
Epoch 42/50
Epoch 43/50
Epoch 44/50
Epoch 45/50
Epoch 46/50
Epoch 47/50
Epoch 48/50
Epoch 49/50
Epoch 50/50


### Target Model prediction API

In [68]:
# API of model to get predictions : returns labels only
def target_predict(X):
  prob = layers.Softmax()
  ret = prob(target_model.predict(X)).numpy()
  return np.apply_along_axis(np.argmax, 1, ret).reshape((-1, 1))

## Shadow Models

### Shadow Model Architecture

### Shadow Dataset Composition

In [146]:
N_SHADOWS = 5
D_SHADOW_SIZE = D_TARGET_SIZE

In [147]:
def f_shadow(X_train, y_train, X_test=None, y_test=None, epochs=25):
  model = models.Sequential()
  model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)))
  model.add(layers.MaxPooling2D((2, 2)))
  model.add(layers.Conv2D(64, (3, 3), activation='relu'))
  model.add(layers.MaxPooling2D((2, 2)))
  model.add(layers.Conv2D(128, (3, 3), activation='relu'))
  model.add(layers.MaxPooling2D((2, 2)))

  model.add(layers.Flatten())
  model.add(layers.Dense(128, activation='relu'))

  model.add(layers.Dense(10)   )
  
  optimizer = keras.optimizers.Adam()
  model.compile(optimizer=optimizer,
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
  if X_test is None or y_test is None:
    history = model.fit(X_train, y_train, epochs=epochs, 
                    validation_split=0.2)
  else:
    history = model.fit(X_train, y_train, epochs=epochs, 
                    validation_data=(X_test, y_test))
  return model

In [148]:

def divide_dataset(n_shadows, shadow_dataset_size, X, y):
  D_shadows = []
  rng = np.random.default_rng()
  for i in range(n_shadows):
    sample_i = np.random.choice(range(X.shape[0]), shadow_dataset_size, replace=False)
    assert np.unique(sample_i).shape[0] == shadow_dataset_size # sanity check
    D_shadows.append((X[sample_i, :], y[sample_i, :]))
  return D_shadows

# returns a list of 'n_shadows' datasets
def generate_shadow_dataset(target_model, n_shadows, shadow_dataset_size, n_classes, attacker_X=None, attacker_y=None):
  # param target model is not used yet


  # in case we give test data we will just divide those to train the shadow models
  if attacker_X is not None and attacker_y is not None:
    return divide_dataset(n_shadows, shadow_dataset_size, attacker_X, attacker_y)
  else:
    raise ValueError("X and y provided are None.")


def create_shadows(D_shadows):
  shadow_models = [] # shadow model list

  for D_shadow in D_shadows:
    # sample data to feed/evaluate the model
    X_shadow, y_shadow = D_shadow
    shadow_X_train, shadow_X_test, shadow_y_train, shadow_y_test = train_test_split(X_shadow, y_shadow, shuffle=True, test_size=0.33)

    # generate the shadow model
    shadow_model = f_shadow(shadow_X_train, shadow_y_train, shadow_X_test, shadow_y_test)

    D_shadow = (shadow_X_train, shadow_y_train), (shadow_X_test, shadow_y_test)
    shadow_models.append((shadow_model, D_shadow))

  return shadow_models # return a list where every item is (model, acc), train-data, test-data

In [149]:
# generate shadow datasets
D_shadows = generate_shadow_dataset(target_model, N_SHADOWS, D_SHADOW_SIZE, 10, attacker_images, attacker_labels)

In [150]:
# train the shadow models
shadow_models = create_shadows(D_shadows)

Epoch 1/25
Epoch 2/25
Epoch 3/25
Epoch 4/25
Epoch 5/25
Epoch 6/25
Epoch 7/25
Epoch 8/25
Epoch 9/25
Epoch 10/25
Epoch 11/25
Epoch 12/25
Epoch 13/25
Epoch 14/25
Epoch 15/25
Epoch 16/25
Epoch 17/25
Epoch 18/25
Epoch 19/25
Epoch 20/25
Epoch 21/25
Epoch 22/25
Epoch 23/25
Epoch 24/25
Epoch 25/25
Epoch 1/25
Epoch 2/25
Epoch 3/25
Epoch 4/25
Epoch 5/25
Epoch 6/25
Epoch 7/25
Epoch 8/25
Epoch 9/25
Epoch 10/25
Epoch 11/25
Epoch 12/25
Epoch 13/25
Epoch 14/25
Epoch 15/25
Epoch 16/25
Epoch 17/25
Epoch 18/25
Epoch 19/25
Epoch 20/25
Epoch 21/25
Epoch 22/25
Epoch 23/25
Epoch 24/25
Epoch 25/25
Epoch 1/25
Epoch 2/25
Epoch 3/25
Epoch 4/25
Epoch 5/25
Epoch 6/25
Epoch 7/25
Epoch 8/25
Epoch 9/25
Epoch 10/25
Epoch 11/25
Epoch 12/25
Epoch 13/25
Epoch 14/25
Epoch 15/25
Epoch 16/25
Epoch 17/25
Epoch 18/25
Epoch 19/25
Epoch 20/25
Epoch 21/25
Epoch 22/25
Epoch 23/25
Epoch 24/25
Epoch 25/25
Epoch 1/25
Epoch 2/25
Epoch 3/25
Epoch 4/25
Epoch 5/25
Epoch 6/25
Epoch 7/25
Epoch 8/25
Epoch 9/25
Epoch 10/25
Epoch 11/25
Epoc

## Attack Model

### Attack Model Architecture
The attack model is consisted of 1 swallow layer of 10 neurons just as proposed in Shokri et al. and in the relative label only attack paper.

### Perturbed Queries for feature extraction and Attack Dataset

In order to construct the actual attack dataset we have 2 perturbation functions:
- Translate
- Rotate

that can apply the necessary augmentations in order to acquire the feature vector for a query.

This works by applying all augmentations to the input X and querying the target model in order to return a binary vector $x_{attack}$ where $$x_{attack_p} = 1 \; if \;y_p == y_{true} \; else \; 0, \forall p \in Perturbations(X)$$

where $y_p$ is the label for pertubation $p$ of input $X$.

In [164]:
r = 3 # rotate range => creating 2*r+1 rotations 
d = 1 # translate range =? creating 4*d + 1 translates

In [173]:
def __f_attack(X_train, y_train, X_test, y_test, epochs=50):
  print(X_train.shape, X_test.shape)
  model = models.Sequential()
  model.add(layers.Dense(10, input_shape=(X_train.shape[1],)))
  model.add(layers.LeakyReLU(alpha=0.3))
  model.add(layers.Dense(1, activation='sigmoid'))
  
  model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])
  history = model.fit(X_train, y_train, epochs=epochs,
                    validation_data=(X_test, y_test), verbose=True)
  
  return model

def f_attack(X, y):
  # X_i = (class, probability vector, )
  classes = np.unique(train_labels) # all class labels
  with tf.device('/gpu:0'):
  # split to train and test datasets
    X_train, X_test, y_train, y_test = train_test_split(X, y, shuffle=True, test_size=0.3)
    attack_model = __f_attack(X_train, y_train, X_test, y_test)

  return attack_model

In [166]:
# create all relative rotates for interpolation (returns 2*r + 1 translates)
def create_rotates(r):
  if r is None:
    return None
  if r == 1:
    return [0.0]
  rotates = np.linspace(-r, r, (r * 2 + 1))
  return rotates

# create all possible translates (returns 4*d+1 translates)
def create_translates(d):
  if d is None:
    return None

  def all_shifts(mshift):
    if mshift == 0:
      return [(0, 0, 0, 0)]
    
    all_pairs = []
    start = (0, mshift, 0, 0)
    end = (0, mshift, 0, 0)
    vdir = -1
    hdir = -1
    first_time = True
    while (start[1] != end[1] or start[2] != end[2]) or first_time:
      all_pairs.append(start)
      start = (0, start[1] + vdir, start[2] + hdir, 0)
      if abs(start[1]) == mshift:
        vdir *= -1
      if abs(start[2]) == mshift:
        hdir *= -1
      first_time = False
    all_pairs = [(0, 0, 0, 0)] + all_pairs  # add no shift
    return all_pairs

  translates = all_shifts(d)
  return translates


def apply_augment(d, augment, type_):
  if type_ == 'd':
    d = interpolation.shift(d, augment, mode='constant')
  elif type_ == 'r':
    d = interpolation.rotate(d, augment, (1, 2), reshape=False)
  else:
    raise ValueError(f'Augmentation Type: \'{type_}\' doesn\'t exist. Try \'r\' or \'d\'')
  return d

# param model the model to query
# param X the input to perurb
# param y_pred is the predictions of the model for given input
def augmented_queries(model, X, y_pred):
  #create perturbations
  rotates = create_rotates(r)
  translates = create_translates(d)

  X_attack = None
  print(f"Applying {len(rotates)} Rotation")
  for rot in rotates:
    #  create perturbed image
    X_perturbed = apply_augment(X, rot, 'r')
    # return query line
    y_perturbed = target_predict(X_perturbed)
    X_attack_col = (y_pred == y_perturbed).astype(int) # transform the prediction column into a binary collumn where x_i = 1 when y_true == y_pred else 0
    
    if X_attack is None:
      X_attack = X_attack_col
    else:
      X_attack = np.concatenate((X_attack, X_attack_col), axis=1)
  print("OK")

  print(f"Applying {len(translates)} Translates")
  for tra in translates:
    X_perturbed = apply_augment(X, tra, 'd')
    # return query line
    y_perturbed = target_predict(X_perturbed)
    X_attack_col = (y_pred == y_perturbed).astype(int) # transform the prediction column into a binary collumn where x_i = 1 when y_true == y_pred else 0
    # concate the col to the rest of x_attack feature vector
    if X_attack is None:
      X_attack = X_attack_col
    else:
      X_attack = np.concatenate((X_attack, X_attack_col), axis=1)
  print("OK")
  return X_attack

In [167]:
# lol = train_images[:2]
# print("labels: ", train_labels[:2, 0])
# m = shadow_models[0][0]
# # get the y_pred 
# prob = layers.Softmax()
# ret = prob(m.predict(X)).numpy()
# y_pred = np.apply_along_axis(np.argmax, 1, ret).reshape((-1, 1))
# print('pred:', y_pred)

In [168]:
# helper function to prepare each shadow dataset batch
def prepare_batch(model, X, y, in_D):
  #decide membership
  y_member = np.ones(shape=(y.shape[0], 1)) if in_D else np.zeros(shape=(y.shape[0], 1))

  # get the y_pred 
  prob = layers.Softmax()
  ret = prob(model.predict(X)).numpy()
  y_pred = np.apply_along_axis(np.argmax, 1, ret).reshape((-1, 1))
  perturbed_queries_res = augmented_queries(model, X, y_pred)
  
  # return an instance <actual class, predicted class, perturbed_queries_res from shadow models, 'in'/'out' D_target membership> 
  return np.concatenate((y.reshape(-1, 1), y_pred, perturbed_queries_res, y_member), axis=1)

def generate_attack_dataset(shadow_models, n_classes):
  # input is a list where items are model, (X_train, y_train), (X_test, y_test)

  D_attack = None
  # D_attack_i format = <class, prob_vec, membership label (1 or 0)> 
  for shadow_model, ((X_train, y_train), (X_test, y_test)) in shadow_models:
    s = min(X_train.shape[0], X_test.shape[0])
    print(f"Preparing shadow batch of size {2*s}")
    batch = np.concatenate((
        prepare_batch(shadow_model, X_train[:s], y_train[:s], True), # members of shadow dataset 
        prepare_batch(shadow_model, X_test[:s], y_test[:s], False)   # non members of shadow dataset
    ))   

    D_attack = np.concatenate((D_attack, batch)) if D_attack is not None else batch  
    print("Done!")
  return D_attack 

In [169]:
D_attack = generate_attack_dataset(shadow_models, 10)


Preparing shadow batch of size 3300
Applying 7 Rotation
OK
Applying 5 Translates
OK
Applying 7 Rotation
OK
Applying 5 Translates
OK
Done!
Preparing shadow batch of size 3300
Applying 7 Rotation
OK
Applying 5 Translates
OK
Applying 7 Rotation
OK
Applying 5 Translates
OK
Done!
Preparing shadow batch of size 3300
Applying 7 Rotation
OK
Applying 5 Translates
OK
Applying 7 Rotation
OK
Applying 5 Translates
OK
Done!
Preparing shadow batch of size 3300
Applying 7 Rotation
OK
Applying 5 Translates
OK
Applying 7 Rotation
OK
Applying 5 Translates
OK
Done!
Preparing shadow batch of size 3300
Applying 7 Rotation
OK
Applying 5 Translates
OK
Applying 7 Rotation
OK
Applying 5 Translates
OK
Done!


In [174]:
attack_model_bundle = f_attack(D_attack[:, :-1], D_attack[:, -1])

(11550, 14) (4950, 14)
Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoch 27/50
Epoch 28/50
Epoch 29/50
Epoch 30/50
Epoch 31/50
Epoch 32/50
Epoch 33/50
Epoch 34/50
Epoch 35/50
Epoch 36/50
Epoch 37/50
Epoch 38/50
Epoch 39/50
Epoch 40/50
Epoch 41/50
Epoch 42/50
Epoch 43/50
Epoch 44/50
Epoch 45/50
Epoch 46/50
Epoch 47/50
Epoch 48/50
Epoch 49/50
Epoch 50/50


## Attack Evaluation

In [175]:
def evaluate_attack(attack_model, X_attack, y_attack, n_classes):
  acc_per_class = []
  for c in range(n_classes):
    class_instances = X_attack[:, 0] == c # get same class samples
    test_loss, test_acc = attack_model.evaluate(X_attack[class_instances, :], y_attack[class_instances], verbose=0)
    acc_per_class.append(test_acc)
    print(f"class-{c+1}: {test_acc}")
  return acc_per_class



In [176]:
# create a test dataset 

D_in = prepare_batch(target_model, train_images[:1000], train_labels[:1000], True)
print("Testing with 'in' data only:")
res_in = evaluate_attack(attack_model_bundle, D_in[:, :-1], D_in[:, -1], 10)

D_out = prepare_batch(target_model, attacker_images[:1000], attacker_labels[:1000], False)
print("\nTesting with 'out' data only:")
res_out = evaluate_attack(attack_model_bundle, D_out[:, :-1], D_out[:, -1], 10)

print("\nTesting with all prev data: ")
res_all = evaluate_attack(attack_model_bundle, np.concatenate((D_out[:, :-1], D_in[:, :-1])), np.concatenate((D_out[:, -1], D_in[:, -1])), 10)

print(f"\nTotal attack accuracy: {np.mean(res_all)}")

Applying 7 Rotation
OK
Applying 5 Translates
OK
Testing with 'in' data only:
class-1: 0.9732142686843872
class-2: 1.0
class-3: 0.9387755393981934
class-4: 0.9368420839309692
class-5: 0.9909090995788574
class-6: 0.9418604373931885
class-7: 1.0
class-8: 0.957446813583374
class-9: 1.0
class-10: 0.9892473220825195
Applying 7 Rotation
OK
Applying 5 Translates
OK

Testing with 'out' data only:
class-1: 0.561904788017273
class-2: 0.450549453496933
class-3: 0.800000011920929
class-4: 0.8723404407501221
class-5: 0.5978260636329651
class-6: 0.800000011920929
class-7: 0.4020618498325348
class-8: 0.4954954981803894
class-9: 0.3199999928474426
class-10: 0.48695650696754456

Testing with all prev data: 
class-1: 0.774193525314331
class-2: 0.7382199168205261
class-3: 0.868686854839325
class-4: 0.9047619104385376
class-5: 0.8118811845779419
class-6: 0.8674033284187317
class-7: 0.7010309100151062
class-8: 0.707317054271698
class-9: 0.6837209463119507
class-10: 0.7115384340286255

Total attack accuracy:

# Questions for meeting

- Paper said of accuracy above 80%? Am I doing something wrong?
- How to get more features in this threat model, so that attack is even more successful
- Should we try to attack model with MIA defences on, or is it too early?
- What's next?
