<h1> Preparing and carrying out the attack </h1>

The original paper can be found here: https://arxiv.org/abs/1602.02697

The goal of this script is to fool the discriminative model: starting from images that it classifies in a certain way, we modify them slightly so that the model changes its mind and classifies them differently and, in general, wrongly.

In [1]:
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals

import functools
import os

from PIL import Image

import numpy as np
from keras.preprocessing.image import ImageDataGenerator

from keras.applications.inception_v3 import preprocess_input
from six.moves import xrange

import logging
import tensorflow as tf
from tensorflow.python.platform import flags
from keras.models import load_model

from cleverhans.loss import CrossEntropy
from cleverhans.model import Model
from cleverhans.utils import to_categorical
from cleverhans.utils import set_log_level
from cleverhans.utils_tf import train, model_eval, batch_eval
from cleverhans.attacks import FastGradientMethod
from cleverhans.attacks_tf import jacobian_graph, jacobian_augmentation
from cleverhans.utils_keras import KerasModelWrapper
from cleverhans.utils_keras import cnn_model

from cleverhans_tutorials.tutorial_models import HeReLuNormalInitializer
from cleverhans.utils import TemporaryLogLevel

Using TensorFlow backend.


In [2]:
# Set logging level to see debug information
set_log_level(logging.DEBUG)

<h4> SETTINGS </h4>
Define the settings

In [3]:
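# Make GPUs 0 and 1 visible to TensorFlow (must be set before the first session is created)
os.environ['CUDA_VISIBLE_DEVICES'] = '0,1'

IM_WIDTH, IM_HEIGHT = 299, 299  # Input size expected by Inception V3
BATCH_SIZE = 32  # Batch size of the Keras image generators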

In [4]:
input_path = 'Food12'  # Input images path
out_path = 'attack_data'  # Path for the adversarial images
batch_size = 128  # Size of training batches
learning_rate = 0.001  # Learning rate for training

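# Substitute-related settings
data_aug = 5  # Number of substitute training iterations (rho); data is augmented after all but the last
nb_epochs_s = 100  # Training epochs for the substitute in each iteration
lmbda = 0.1  # Step size lambda for Jacobian augmentation, from arxiv.org/abs/1602.02697
aug_batch_size = 512  # Batch size for augmentation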

In [5]:
# Seed random number generator so tutorial is reproducible
rng = np.random.RandomState([2017, 8, 30])

<h3> Create the substitute </h3>

We are going to train a substitute model whose goal is to behave like the model we want to attack.

First, we prepare the training and evaluation data using the Keras ImageDataGenerator class.

The substitute is trained with TensorFlow, so the inputs and labels have to be defined as placeholders.

In [6]:
train_datagen = ImageDataGenerator(preprocessing_function=preprocess_input,
                                   rotation_range=30,
                                   width_shift_range=0.2,
                                   height_shift_range=0.2,
                                   shear_range=0.2,
                                   zoom_range=0.2,
                                   horizontal_flip=True
)

test_datagen = ImageDataGenerator(preprocessing_function=preprocess_input)


train_generator = train_datagen.flow_from_directory('{}/training'.format(input_path),
                                                    target_size=(IM_WIDTH, IM_HEIGHT),
                                                    batch_size=BATCH_SIZE,
)
validation_generator = test_datagen.flow_from_directory('{}/evaluation'.format(input_path),
                                                        target_size=(IM_WIDTH, IM_HEIGHT),
                                                        batch_size=BATCH_SIZE,
)



Found 711 images belonging to 12 classes.
Found 577 images belonging to 12 classes.


In [7]:
# get one batch of data
x_train, y_train = train_generator.next()
x_test, y_test = validation_generator.next()

In [8]:
# Initialize substitute training set reserved for adversary
X_sub = x_test
Y_sub = np.argmax(y_test, axis=1)

In [9]:
# Obtain Image parameters
img_rows, img_cols, nchannels = x_train.shape[1:4]
nb_classes = y_train.shape[1]

In [10]:
# Define input TF placeholder
x = tf.placeholder(tf.float32, shape=(None, img_rows, img_cols, nchannels))
y = tf.placeholder(tf.float32, shape=(None, nb_classes))

<h3> Get the predictions of the discriminative model </h3>

Once the data is ready, we need to get some predictions from the model that we want to attack.

Because our model is stored locally, we load the saved model and obtain its predictions directly. If your model is instead served through an API or on a remote server, this is where you should change how the predictions are obtained.
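If the black box is a remote service, the rest of the notebook only needs one label per input from it, since the augmentation loop takes the argmax of its outputs. The plain-Python sketch below shows one way to wrap an arbitrary prediction function into such a label-only oracle that queries in batches. `make_label_oracle` and `toy_predict` are hypothetical names, and the sketch assumes the prediction function returns one list of class probabilities per input.

```python
def make_label_oracle(predict_fn, batch_size=128):
    """Wrap a prediction function so that callers only see class labels.

    predict_fn is assumed to take a list of inputs and return one list of
    class probabilities per input, as a local Keras model or a remote API might.
    """
    def oracle(inputs):
        labels = []
        # Query in batches, e.g. to respect the request limits of a remote API
        for start in range(0, len(inputs), batch_size):
            batch = inputs[start:start + batch_size]
            for probs in predict_fn(batch):
                # Keep only the most probable class, like np.argmax(bbox_val, axis=1)
                labels.append(max(range(len(probs)), key=probs.__getitem__))
        return labels
    return oracle


# Hypothetical stand-in for a remote model: class 0 for negative inputs, else class 1
def toy_predict(batch):
    return [[0.8, 0.2] if v < 0 else [0.1, 0.9] for v in batch]


oracle = make_label_oracle(toy_predict, batch_size=2)
print(oracle([-1.0, 2.0, 3.0]))  # [0, 1, 1]
```

Exposing only labels matches the threat model of the paper, where the adversary observes the black box's decisions but not its probabilities.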

In [12]:
# Simulate the black-box model locally
# You could replace this by a remote labeling API for instance
print("Setting up the black-box model.")
black_box_model = load_model('inceptionv3-ft120_910acc.model')
black_box_model_wrapper = KerasModelWrapper(black_box_model)
bbox_preds = black_box_model_wrapper.get_probs(x)

Setting up the black-box model.



<h3> Define the substitute </h3>

The objective of the substitute is to copy the behaviour of the model we want to attack. We therefore do not aim for the highest possible accuracy; instead, we want to reproduce the model's decision boundaries, so that its errors become ours too. The substitute should predict as closely as possible to the way the discriminative model does.

For this reason the architecture can be simple: layers can be added or modified, but this is not necessary.
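Because the target is the black box's labels rather than the ground truth, a natural way to measure how well the substitute copies it is the agreement rate between their predictions. A minimal sketch in plain Python; `agreement_rate` is a hypothetical helper, not a cleverhans function, and the label lists are made up to show that copying the black box's mistake raises agreement while lowering accuracy.

```python
def agreement_rate(substitute_labels, oracle_labels):
    """Fraction of inputs on which the substitute predicts the oracle's label.

    This measures how well the decision boundaries match, independently of
    whether either model is correct with respect to the ground truth.
    """
    if len(substitute_labels) != len(oracle_labels):
        raise ValueError("label lists must have the same length")
    if not substitute_labels:
        return 0.0
    matches = sum(s == o for s, o in zip(substitute_labels, oracle_labels))
    return matches / len(substitute_labels)


ground_truth = [0, 1, 2, 2]
oracle_labels = [0, 1, 1, 2]       # the black box makes one mistake (index 2)
substitute_labels = [0, 1, 1, 2]   # the substitute copies that mistake
print(agreement_rate(substitute_labels, oracle_labels))  # 1.0
print(agreement_rate(substitute_labels, ground_truth))   # 0.75
```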

The substitute is built on top of cleverhans, which requires TensorFlow. Cleverhans is a very useful library for working with adversarial attacks, defenses and related techniques for neural networks.

In [13]:
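# Create the TF session. Reuse the session in which Keras loaded the black-box
# weights: a fresh tf.Session() would not contain them.
from keras import backend as K
sess = K.get_session()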

Create the substitute model by subclassing the cleverhans Model class.

In [22]:
class ModelSubstitute(Model):
    def __init__(self, scope, nb_classes, nb_filters=200, **kwargs):
        del kwargs
        Model.__init__(self, scope, nb_classes, locals())
        self.nb_filters = nb_filters

    def fprop(self, x, **kwargs):
        del kwargs
        # Dense layers initialized with He-normal weights (suited to ReLU)
        my_dense = functools.partial(
            tf.layers.dense, kernel_initializer=HeReLuNormalInitializer)
        with tf.variable_scope(self.scope, reuse=tf.AUTO_REUSE):
            # Flatten the image, then two ReLU hidden layers of nb_filters units each
            y = tf.layers.flatten(x)
            y = my_dense(y, self.nb_filters, activation=tf.nn.relu)
            y = my_dense(y, self.nb_filters, activation=tf.nn.relu)
            # Linear output layer: one logit per class
            logits = my_dense(y, self.nb_classes)
            return {self.O_LOGITS: logits,
                    self.O_PROBS: tf.nn.softmax(logits=logits)}

In [23]:
model_sub = ModelSubstitute('model_s', nb_classes)
preds_sub = model_sub.get_logits(x)
loss_sub = CrossEntropy(model_sub, smoothing=0)

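Next, we define the Jacobian of the substitute symbolically: jacobian_graph returns, for each class, the gradient of the corresponding output of the substitute with respect to the input x. This Jacobian is what drives the data augmentation.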
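To make the quantity concrete: for a model with class outputs F_1, ..., F_K, the Jacobian with respect to the input has one row per class, dF_k/dx, and jacobian_graph builds these rows with TensorFlow's symbolic gradients. The sketch below approximates the same rows numerically with central differences on a hypothetical toy linear model, where the exact answer is its weight matrix. It is an illustration only, not how cleverhans computes the Jacobian.

```python
def numerical_jacobian(f, x, eps=1e-6):
    """Central-difference Jacobian: jac[k][i] approximates dF_k/dx_i.

    f maps a list of input values to a list of class scores (logits);
    row k corresponds to the k-th per-class gradient.
    """
    n_out = len(f(x))
    jac = [[0.0] * len(x) for _ in range(n_out)]
    for i in range(len(x)):
        x_plus = list(x)
        x_plus[i] += eps
        x_minus = list(x)
        x_minus[i] -= eps
        f_plus, f_minus = f(x_plus), f(x_minus)
        for k in range(n_out):
            jac[k][i] = (f_plus[k] - f_minus[k]) / (2 * eps)
    return jac


# Hypothetical toy substitute with 2 inputs and 3 classes: logits = W x,
# so the exact Jacobian is W itself
W = [[1.0, -2.0], [0.5, 0.0], [-1.0, 3.0]]


def logits(x):
    return [sum(w * v for w, v in zip(row, x)) for row in W]


J = numerical_jacobian(logits, [0.3, -0.7])
for row in J:
    print([round(v, 6) for v in row])
```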

In [24]:
# Define the Jacobian symbolically using TensorFlow
grads = jacobian_graph(preds_sub, x, nb_classes)

<h4> Data augmentation </h4>

The model that is going to be attacked is local and was built by ourselves, but in general we cannot obtain as many predictions as we would like to train our substitute (the more data, the better), so we need to augment the data we have. Cleverhans is a very useful tool for this.

Data augmentation is interleaved with training: after each substitute training iteration the data is augmented, except after the last iteration, when training is finished and no further augmentation is performed.

Training and augmentation alternate in each iteration (rho) because the augmentation uses the Jacobian of the current substitute, so augmenting after each training round produces more informative synthetic data.
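The bookkeeping of this loop is easy to lose track of, so the plain-Python sketch below replays it without any model. It assumes the initial set is the single batch of 32 evaluation images drawn earlier (BATCH_SIZE), labeled with the ground truth, and data_aug = 5 from the settings. `augmentation_schedule` is a hypothetical helper. It shows that the substitute is trained on 32, 64, 128, 256 and finally 512 points, that 480 black-box labels are requested in total, and that the lmbda_coef expression used in the loop gives -1 for rho < 3 and +1 afterwards.

```python
def augmentation_schedule(initial_size, data_aug):
    """Replay the bookkeeping of the substitute training loop.

    Returns one (rho, training_set_size, lambda_sign) tuple per iteration and
    the number of black-box queries used to label synthetic points.
    lambda_sign reproduces `2 * int(int(rho / 3) != 0) - 1` from the loop and
    is None on the last iteration, where no augmentation takes place.
    """
    size, queries, rows = initial_size, 0, []
    for rho in range(data_aug):
        if rho < data_aug - 1:
            lambda_sign = 2 * int(int(rho / 3) != 0) - 1
            rows.append((rho, size, lambda_sign))
            queries += size  # the new half of the doubled set is labeled by the oracle
            size *= 2        # jacobian_augmentation doubles the training set
        else:
            rows.append((rho, size, None))
    return rows, queries


rows, queries = augmentation_schedule(initial_size=32, data_aug=5)
for row in rows:
    print(row)
print("black-box queries:", queries)  # black-box queries: 480
```

The exponential growth is why the number of iterations is kept small: each extra iteration doubles both the training set and the number of oracle queries.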

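For the data augmentation, the Jacobian-based method from the paper is used. For every point x in the current substitute training set with label y, it creates a new synthetic point x + lambda * sign(dF_y(x)/dx), where F is the substitute and lambda is lmbda multiplied by lmbda_coef. The new points are then labeled by querying the black box, so the training set doubles at every augmentation step.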
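A single augmentation step can be sketched in plain Python with a hypothetical toy linear substitute whose class-y gradient is row y of its weight matrix. It mirrors the idea behind cleverhans' jacobian_augmentation rather than its exact code, and the test also shows what a negative lambda (lmbda_coef = -1) does: it moves the points in the opposite direction.

```python
def sign(v):
    # -1, 0 or +1, like np.sign for a scalar
    return (v > 0) - (v < 0)


def jacobian_augmentation_step(points, labels, grad_fn, lmbda):
    """One Jacobian-based augmentation step (plain-Python sketch).

    For every point x with label y, create x + lmbda * sign(dF_y/dx), where
    grad_fn(x, y) returns the gradient of the substitute's class-y output
    with respect to x. Returns the old points followed by the new ones,
    so the set doubles in size.
    """
    new_points = []
    for x, y in zip(points, labels):
        grad = grad_fn(x, y)
        new_points.append([xi + lmbda * sign(gi) for xi, gi in zip(x, grad)])
    return points + new_points


# Hypothetical toy linear substitute: the gradient of logit y is row y of W
W = [[1.0, -2.0], [0.0, 3.0]]


def grad_fn(x, y):
    return W[y]


augmented = jacobian_augmentation_step([[0.5, 0.5], [0.25, 0.0]], [0, 1], grad_fn, 0.25)
print(augmented)  # [[0.5, 0.5], [0.25, 0.0], [0.75, 0.25], [0.25, 0.25]]
```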

In [None]:
for rho in xrange(data_aug):
    print("Substitute training epoch #" + str(rho))
    train_params = {
        'nb_epochs': nb_epochs_s,
        'batch_size': batch_size,
        'learning_rate': learning_rate
    }
    with TemporaryLogLevel(logging.WARNING, "cleverhans.utils.tf"):
        train(sess, loss_sub, x, y, X_sub,
              to_categorical(Y_sub, nb_classes),
              init_all=False, args=train_params, rng=rng,
              var_list=model_sub.get_params())

    # If we are not at last substitute training iteration, augment dataset
    if rho < data_aug - 1:
        print("Augmenting substitute training data.")
        # Perform the Jacobian augmentation
        lmbda_coef = 2 * int(int(rho / 3) != 0) - 1
        X_sub = jacobian_augmentation(sess, x, X_sub, Y_sub, grads,
                                      lmbda_coef * lmbda, aug_batch_size)

        print("Labeling substitute training data.")
        # Label the newly generated synthetic points using the black-box
        Y_sub = np.hstack([Y_sub, Y_sub])
        X_sub_prev = X_sub[int(len(X_sub)/2):]
        eval_params = {'batch_size': batch_size}
        bbox_val = batch_eval(sess, [x], [bbox_preds], [X_sub_prev],
                              args=eval_params)[0]
        # Note here that we take the argmax because the adversary
        # only has access to the label (not the probabilities) output
        # by the black-box model
        Y_sub[int(len(X_sub)/2):] = np.argmax(bbox_val, axis=1)

Substitute training epoch #0




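Evaluate the substitute once it has been trained. Note that x_test is also the initial substitute training set (X_sub), so this accuracy is measured on images the substitute has already seen.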

In [None]:
# Evaluate the substitute model on clean test examples
accuracies = {}  # Collects the accuracies reported at the end of the notebook
eval_params = {'batch_size': batch_size}
acc = model_eval(sess, x, y, preds_sub, x_test, y_test, args=eval_params)
accuracies['sub'] = acc

<h3> Create the adversarial examples </h3>

Once the substitute has been trained, its gradient is used to create the adversarial images.

The adversarial examples are generated with the fast gradient sign method (FGSM), which moves every input value by eps in the direction of the sign of the gradient of the substitute's loss.
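The sketch below applies FGSM in plain Python to a hypothetical two-class linear softmax model, using eps = 0.3 as in fgsm_par and the [-1, 1] range of Inception-preprocessed inputs as clipping bounds. For cross-entropy on a linear model the input gradient has the closed form W^T (p - onehot(y)), so no autodiff is needed. In the notebook, cleverhans' FastGradientMethod computes the gradient on the substitute instead, and the resulting images are then fed to the black box.

```python
import math


def sign(v):
    return (v > 0) - (v < 0)


def logits_of(x, W):
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(x, y, W):
    return -math.log(softmax(logits_of(x, W))[y])


def fgsm(x, y, W, eps, clip_min, clip_max):
    """FGSM (L-infinity) on a toy linear softmax classifier with logits = W x.

    For the cross-entropy loss the input gradient is W^T (p - onehot(y)).
    Every input value moves by eps in the direction of the gradient sign,
    then the result is clipped to [clip_min, clip_max].
    """
    p = softmax(logits_of(x, W))
    residual = [p_k - (1.0 if k == y else 0.0) for k, p_k in enumerate(p)]
    grad = [sum(W[k][i] * residual[k] for k in range(len(W))) for i in range(len(x))]
    return [min(max(xi + eps * sign(gi), clip_min), clip_max)
            for xi, gi in zip(x, grad)]


W = [[2.0, -1.0], [-1.0, 2.0]]  # hypothetical 2-class "substitute"
x, y = [0.5, 0.1], 0
x_adv = fgsm(x, y, W, eps=0.3, clip_min=-1.0, clip_max=1.0)
print(x_adv)
print(cross_entropy(x, y, W), cross_entropy(x_adv, y, W))
```

In this toy case a single step of size 0.3 is enough to move the input across the decision boundary, from class 0 to class 1.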

In [None]:
# Initialize the Fast Gradient Sign Method (FGSM) attack object.
# Inputs are preprocessed for Inception V3 into [-1, 1], so clip to that range.
fgsm_par = {'eps': 0.3, 'ord': np.inf, 'clip_min': -1., 'clip_max': 1.}
fgsm = FastGradientMethod(model_sub, sess=sess)

Now we build the operation that generates the adversarial images; it is applied to the evaluation subset in the following cells.

In [None]:
# Build the symbolic graph that crafts adversarial examples using the substitute;
# the images themselves are computed when x_adv_sub is evaluated
eval_params = {'batch_size': batch_size}
x_adv_sub = fgsm.generate(x, **fgsm_par)

The black-box model can then be tested on the adversarial images.

In [None]:
# Evaluate the accuracy of the "black-box" model on adversarial examples
accuracy = model_eval(sess, x, y, black_box_model_wrapper.get_probs(x_adv_sub),
                      x_test, y_test, args=eval_params)
print('Test accuracy of oracle on adversarial examples generated using the substitute: ' + str(accuracy))
accuracies['bbox_on_sub_adv_ex'] = accuracy

# Evaluate the accuracy of the "black-box" model on clean images
# (only available if the saved model was compiled with an accuracy metric)
history = black_box_model.evaluate(x_test, y_test, verbose=0)
accuracies['bbox_on_clean_images'] = 'not available'
if isinstance(history, list) and len(history) > 1:
    accuracies['bbox_on_clean_images'] = history[1]

The accuracies of the substitute and of the black-box model on clean and on adversarial images are stored.
If the substitute training worked, the accuracy of the substitute and of the black-box model on the clean test images should be similar, and the accuracy of the black-box model should be lower on adversarial images than on clean ones.
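The comparison described above can be summarised in two numbers: the accuracy drop, and the transferability rate, i.e. among the images the black box classifies correctly when clean, the fraction it misclassifies after the attack. A minimal sketch on plain label lists (for example the argmax of the black-box predictions converted with .tolist()); `accuracy` and `transfer_success_rate` are hypothetical helpers, and the labels in the usage example are made up for illustration.

```python
def accuracy(pred_labels, true_labels):
    return sum(p == t for p, t in zip(pred_labels, true_labels)) / len(true_labels)


def transfer_success_rate(clean_preds, adv_preds, true_labels):
    """Among examples the black box gets right on clean inputs,
    the fraction it gets wrong after the attack."""
    correct = [i for i, (p, t) in enumerate(zip(clean_preds, true_labels)) if p == t]
    if not correct:
        return 0.0
    fooled = sum(adv_preds[i] != true_labels[i] for i in correct)
    return fooled / len(correct)


true_labels = [0, 1, 2, 3]
clean_preds = [0, 1, 2, 0]   # 3 of 4 correct on clean images
adv_preds = [1, 1, 0, 0]     # 1 of 4 correct on adversarial images
print(accuracy(clean_preds, true_labels), accuracy(adv_preds, true_labels))  # 0.75 0.25
print(transfer_success_rate(clean_preds, adv_preds, true_labels))
```

Restricting the rate to images that were correct when clean avoids counting as "fooled" the images the black box already got wrong.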

In [None]:
print('Accuracy of the substitute on clean images: {}'.format(accuracies['sub']))
print('Accuracy of the black-box model on clean images: {}'.format(accuracies['bbox_on_clean_images']))
print('Accuracy of the black-box model on adversarial images: {}'.format(accuracies['bbox_on_sub_adv_ex']))

Optionally, we can save the adversarial images that fool the black-box model, i.e. those classified differently from the corresponding original image.

First, we obtain the predictions on both sets, so we can check whether the classification is the same (the black-box model was not fooled) or different.

In [None]:
adv_images = sess.run(x_adv_sub, feed_dict={x: x_test})

original_pred = black_box_model.predict(x_test)
attack_pred = black_box_model.predict(adv_images)

In [None]:
# Keep only the pairs whose black-box prediction changed, i.e. the attack fooled the oracle
img_to_save = []
for op, ap, x_img, adv_img in zip(np.argmax(original_pred, axis=1),
                                  np.argmax(attack_pred, axis=1), x_test, adv_images):
    if op != ap:
        img_to_save.append((x_img, adv_img))

if not os.path.exists(out_path):
    os.makedirs(out_path)


def to_uint8(img):
    # Images for the Inception classifier are normalized to the [-1, 1] interval,
    # so map them back to [0, 255] pixel values before saving.
    img = np.clip(img, -1.0, 1.0)
    return np.rint((img + 1.0) * 127.5).astype(np.uint8)


# Save each original image next to its adversarial counterpart
for i, (x_img, adv_img) in enumerate(img_to_save):
    for filename, img in [('file_{}.jpg'.format(i), x_img),
                          ('file_{}_attack.jpg'.format(i), adv_img)]:
        with tf.gfile.Open(os.path.join(out_path, filename), 'wb') as f:
            Image.fromarray(to_uint8(img)).save(f, format='JPEG')
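Saving relies on inverting the Inception V3 preprocessing, which (assuming Keras' inception_v3 preprocess_input) maps pixels from [0, 255] to [-1, 1] via x / 127.5 - 1. The plain-Python sketch below writes both directions for a single pixel value and checks that the round trip is exact. It rounds instead of truncating: a plain astype(np.uint8) truncates, so a value computed as 254.99999 would become 254. The function names are hypothetical.

```python
def inception_preprocess(pixel):
    """Map a pixel value in [0, 255] to [-1, 1] (x / 127.5 - 1)."""
    return pixel / 127.5 - 1.0


def inception_deprocess(value):
    """Inverse mapping to an integer pixel in [0, 255], clipping out-of-range values."""
    value = min(max(value, -1.0), 1.0)
    return int(round((value + 1.0) * 0.5 * 255.0))


print(inception_preprocess(0), inception_preprocess(255))  # -1.0 1.0
print(inception_deprocess(inception_preprocess(200)))      # 200
```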