## Introduction

This notebook shows how to load and evaluate the MNIST and CIFAR-10 models synthesized and trained as described in the following paper:

M. Sinn, M. Wistuba, B. Buesser, M.-I. Nicolae, M. N. Tran: **Evolutionary Search for Adversarially Robust Neural Networks**, *ICLR SafeML Workshop 2019 (arXiv link to the paper will be added shortly)*.

The models were saved in `.h5` format using Python 3.6, TensorFlow 1.11.0, and Keras 2.2.4.

In [1]:
import warnings
warnings.filterwarnings('ignore')
from keras.datasets import mnist, cifar10
from keras.models import load_model
from keras.utils.np_utils import to_categorical
import numpy as np

from art import config
from art.estimators.classification import KerasClassifier
from art.attacks.evasion import ProjectedGradientDescent
from art.utils import get_file

Using TensorFlow backend.


## MNIST

Three different MNIST models are available. Use the following URLs to access them:
- `mnist_ratio=0.h5`: trained on 100% benign samples (https://www.dropbox.com/s/bv1xwjaf1ov4u7y/mnist_ratio%3D0.h5?dl=1)
- `mnist_ratio=0.5.h5`: trained on 50% benign and 50% adversarial samples (https://www.dropbox.com/s/0skvoxjd6klvti3/mnist_ratio%3D0.5.h5?dl=1)
- `mnist_ratio=1.h5`: trained on 100% adversarial samples (https://www.dropbox.com/s/oa2kowq7kgaxh1o/mnist_ratio%3D1.h5?dl=1)
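
The three files differ only in the ratio that appears in the file name, so the choice of model can be parameterized. The following is a minimal sketch, not part of the original notebook: the names `MNIST_MODEL_URLS` and `mnist_model_file` are hypothetical, and the URLs are copied from the list above.

```python
# Hypothetical helper (an assumption, not from the original notebook):
# maps the adversarial-sample ratio to the file name and URL listed above.
MNIST_MODEL_URLS = {
    "0": "https://www.dropbox.com/s/bv1xwjaf1ov4u7y/mnist_ratio%3D0.h5?dl=1",
    "0.5": "https://www.dropbox.com/s/0skvoxjd6klvti3/mnist_ratio%3D0.5.h5?dl=1",
    "1": "https://www.dropbox.com/s/oa2kowq7kgaxh1o/mnist_ratio%3D1.h5?dl=1",
}


def mnist_model_file(ratio):
    """Return (file name, URL) for a ratio key: '0', '0.5' or '1'."""
    if ratio not in MNIST_MODEL_URLS:
        raise KeyError("unknown ratio %r, expected one of %s"
                       % (ratio, sorted(MNIST_MODEL_URLS)))
    return "mnist_ratio=%s.h5" % ratio, MNIST_MODEL_URLS[ratio]


filename, url = mnist_model_file("0.5")
```

The returned pair can be passed as the file name and `url` arguments of `get_file`.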

Load data:

In [2]:
(X_train, y_train), (X_test, y_test) = mnist.load_data()
X_train = X_train.reshape(X_train.shape[0], 28, 28, 1).astype('float32') / 255
X_test = X_test.reshape(X_test.shape[0], 28, 28, 1).astype('float32') / 255
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)

Downloading data from https://s3.amazonaws.com/img-datasets/mnist.npz


For example, load the model trained on 50% benign and 50% adversarial samples:

In [3]:
path = get_file('mnist_ratio=0.5.h5', extract=False, path=config.ART_DATA_PATH,
                url='https://www.dropbox.com/s/0skvoxjd6klvti3/mnist_ratio%3D0.5.h5?dl=1')
model = load_model(path)
classifier = KerasClassifier(model=model, use_logits=False, clip_values=(0, 1))

Assess accuracy on the first `n` benign test samples:

In [4]:
n = 10000
y_pred = classifier.predict(X_test[:n])
accuracy = np.mean(np.argmax(y_pred, axis=1) == np.argmax(y_test[:n], axis=1))
print("Accuracy on first %i benign test samples: %f" % (n, accuracy))

Accuracy on first 10000 benign test samples: 0.995100
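
The accuracy expression used in this notebook compares the arg-max of the predicted class probabilities with the arg-max of the one-hot labels. A minimal sketch of that computation as a reusable function follows; the name `accuracy_from_one_hot` and the toy arrays are made up for illustration and do not appear in the original notebook.

```python
import numpy as np


def accuracy_from_one_hot(y_pred, y_true):
    """Fraction of rows where the predicted class matches the true class."""
    pred_labels = np.argmax(y_pred, axis=1)  # most probable class per sample
    true_labels = np.argmax(y_true, axis=1)  # position of the 1 in each one-hot row
    return float(np.mean(pred_labels == true_labels))


# Toy data (illustrative only): 3 samples, 2 classes, 2 of 3 predictions correct.
toy_pred = np.array([[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]])
toy_true = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 0.0]])
toy_accuracy = accuracy_from_one_hot(toy_pred, toy_true)
```

With the notebook's variables, the equivalent call would be `accuracy_from_one_hot(classifier.predict(X_test[:n]), y_test[:n])`.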


Define adversarial attack:

In [5]:
attack = ProjectedGradientDescent(classifier, eps=0.3, eps_step=0.01, max_iter=40, targeted=False,
                                  num_random_init=1)  # number of random restarts (an integer)

Assess accuracy on the first `n` adversarial test samples:

In [6]:
n = 10
X_test_adv = attack.generate(X_test[:n], y=y_test[:n])
y_adv_pred = classifier.predict(X_test_adv)
accuracy = np.mean(np.argmax(y_adv_pred, axis=1) == np.argmax(y_test[:n], axis=1))
print("Accuracy on first %i adversarial test samples: %f" % (n, accuracy))

Accuracy on first 10 adversarial test samples: 0.900000
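
A useful sanity check is that each adversarial sample stays within the attack budget. The sketch below assumes the attack uses the L-infinity norm (to our knowledge the default of ART's `ProjectedGradientDescent` when no `norm` argument is given); the function name `max_linf_perturbation` and the toy arrays are hypothetical.

```python
import numpy as np


def max_linf_perturbation(x, x_adv):
    """Largest absolute per-pixel change for each sample (L-infinity distance)."""
    x = np.asarray(x, dtype=np.float64)
    x_adv = np.asarray(x_adv, dtype=np.float64)
    # Flatten every sample to a vector, then take the maximum over its pixels.
    return np.abs(x_adv - x).reshape(x.shape[0], -1).max(axis=1)


# Toy data (illustrative only): two MNIST-shaped images with known perturbations.
toy_x = np.full((2, 28, 28, 1), 0.5)
toy_x_adv = toy_x.copy()
toy_x_adv[0, 0, 0, 0] += 0.3   # sample 0 is perturbed by 0.3 in one pixel
toy_x_adv[1, 5, 5, 0] -= 0.1   # sample 1 is perturbed by 0.1 in one pixel
toy_linf = max_linf_perturbation(toy_x, toy_x_adv)
```

Applied to the notebook's data, `max_linf_perturbation(X_test[:n], X_test_adv)` would be expected to stay at or below `eps=0.3`, up to floating-point tolerance.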


## CIFAR-10

As with MNIST, three different CIFAR-10 models are available at the following URLs:
- `cifar-10_ratio=0.h5`: trained on 100% benign samples (https://www.dropbox.com/s/hbvua7ynhvara12/cifar-10_ratio%3D0.h5?dl=1)
- `cifar-10_ratio=0.5.h5`: trained on 50% benign and 50% adversarial samples (https://www.dropbox.com/s/96yv0r2gqzockmw/cifar-10_ratio%3D0.5.h5?dl=1)
- `cifar-10_ratio=1.h5`: trained on 100% adversarial samples (https://www.dropbox.com/s/7btc2sq7syf68at/cifar-10_ratio%3D1.h5?dl=1)

Load data (unlike MNIST above, pixel values are kept in the original [0, 255] range):

In [7]:
(X_train, y_train), (X_test, y_test) = cifar10.load_data()
X_train = X_train.reshape(X_train.shape[0], 32, 32, 3).astype('float32')
X_test = X_test.reshape(X_test.shape[0], 32, 32, 3).astype('float32')
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)

For example, load the model trained on 50% benign and 50% adversarial samples:

In [8]:
path = get_file('cifar-10_ratio=0.5.h5', extract=False, path=config.ART_DATA_PATH,
                url='https://www.dropbox.com/s/96yv0r2gqzockmw/cifar-10_ratio%3D0.5.h5?dl=1')
model = load_model(path)
classifier = KerasClassifier(model=model, use_logits=False, clip_values=(0, 255))

Assess accuracy on the first `n` benign test samples:

In [9]:
n = 100
y_pred = classifier.predict(X_test[:n])
accuracy = np.mean(np.argmax(y_pred, axis=1) == np.argmax(y_test[:n], axis=1))
print("Accuracy on first %i benign test samples: %f" % (n, accuracy))

Accuracy on first 100 benign test samples: 0.940000


Define adversarial attack:

In [10]:
attack = ProjectedGradientDescent(classifier, eps=8, eps_step=2, max_iter=10, targeted=False,
                                  num_random_init=1)  # number of random restarts (an integer)
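
Because the CIFAR-10 inputs here are not rescaled, the attack budget is expressed in raw pixel units on the [0, 255] scale, whereas the MNIST attack above uses the [0, 1] scale. The sketch below shows the conversion between the two scales; the helper `rescale_eps` is hypothetical and only illustrates the arithmetic.

```python
def rescale_eps(eps, old_max=255.0, new_max=1.0):
    """Convert a perturbation budget between two pixel-value ranges starting at 0."""
    return eps * new_max / old_max


# Budgets from the CIFAR-10 attack, expressed on the [0, 1] scale.
eps_unit = rescale_eps(8)        # 8/255, roughly 0.031
eps_step_unit = rescale_eps(2)   # 2/255, roughly 0.008
```

On the [0, 1] scale, the CIFAR-10 budget of 8/255 is therefore about ten times smaller than the MNIST budget of 0.3.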

Assess accuracy on the first `n` adversarial test samples:

In [11]:
n = 100
X_test_adv = attack.generate(X_test[:n], y=y_test[:n])
y_adv_pred = classifier.predict(X_test_adv)
accuracy = np.mean(np.argmax(y_adv_pred, axis=1) == np.argmax(y_test[:n], axis=1))
print("Accuracy on first %i adversarial test samples: %f" % (n, accuracy))

Accuracy on first 100 adversarial test samples: 0.470000
