<table style="border: none" align="center">
   <tr style="border: none">
      <th style="border: none"><font face="verdana" size="4" color="black"><b>  Demonstrate adversarial training using ART  </b></font></th>
   </tr> 
</table>

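In this notebook, we demonstrate adversarial training with the Adversarial Robustness Toolbox (ART) on the MNIST dataset. We first evaluate a baseline classifier against adversarial examples and then compare it with an adversarially trained classifier.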


## Contents

1. [Load prereqs and data](#prereqs)
2. [Train and evaluate a baseline classifier](#classifier)
3. [Adversarially train a robust classifier](#adv_training)
4. [Evaluate the robust classifier](#evaluation)

<a id="prereqs"></a>
## 1. Load prereqs and data

In [None]:
!pip install tensorflow --upgrade
!pip install adversarial-robustness-toolbox==0.9

from keras.models import load_model

from art.utils import load_dataset
from art.classifiers import KerasClassifier
from art.attacks.fast_gradient import FastGradientMethod
from art.attacks.iterative_method import BasicIterativeMethod
from art.defences.adversarial_trainer import AdversarialTrainer

import numpy as np

%matplotlib inline
import matplotlib.pyplot as plt

In [None]:
# Create a directory for ART's datasets (-p: create parents, no error if it already exists)
!mkdir -p ../.art/data

(x_train, y_train), (x_test, y_test), min_, max_ = load_dataset('mnist')

<a id="classifier"></a>
## 2. Train and evaluate a baseline classifier

Training these models can take a long time, so for this example we load previously trained models from their saved state instead.

Run the code cell below to download the saved original and robust models from GitHub into the working directory of the current Python kernel (for example, on Watson Studio). We then load the original classifier model.

In [None]:
import requests

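def download_files(link, name):
    """Download the file at `link` and save it locally as `name`."""
    response = requests.get(link)
    response.raise_for_status()  # fail loudly on a bad URL or HTTP error
    with open(name, 'wb') as f:  # the context manager closes the file
        f.write(response.content)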


orig_model_link = 'https://github.com/IBM/ML-Pipelines-101/raw/master/models/mnist_cnn_original.h5'
robust_model_link = 'https://github.com/IBM/ML-Pipelines-101/raw/master/models/mnist_cnn_robust.h5'

download_files(orig_model_link, 'mnist_cnn_original.h5')
download_files(robust_model_link, 'mnist_cnn_robust.h5')


In [None]:
classifier_model = load_model("mnist_cnn_original.h5")
classifier = KerasClassifier(classifier_model, clip_values=(min_, max_), use_logits=False)

In [None]:
classifier_model.summary()

Evaluate the classifier's performance on the first 100 original (clean) test samples:

In [None]:
x_test_pred = np.argmax(classifier.predict(x_test[:100]), axis=1)
nb_correct_pred = np.sum(x_test_pred == np.argmax(y_test[:100], axis=1))

print("Original test data (first 100 images):")
print("Correctly classified: {}".format(nb_correct_pred))
print("Incorrectly classified: {}".format(100-nb_correct_pred))
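The clean and adversarial evaluations in this notebook all repeat the same arg-max comparison. A small helper such as the hypothetical `count_correct` below (not part of ART) could keep those cells shorter. Like the code above, it assumes predictions are per-class scores and labels are one-hot encoded.

```python
import numpy as np

def count_correct(probs, y_onehot):
    """Count rows whose highest-scoring class matches the one-hot label."""
    pred_labels = np.argmax(probs, axis=1)
    true_labels = np.argmax(y_onehot, axis=1)
    return int(np.sum(pred_labels == true_labels))

# Toy check: 3 samples, 2 classes; the last prediction is wrong
probs = np.array([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]])
y_onehot = np.array([[1, 0], [0, 1], [0, 1]])
print(count_correct(probs, y_onehot))  # 2
```

In this notebook, `count_correct(classifier.predict(x_test[:100]), y_test[:100])` would compute the same value as `nb_correct_pred`.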

Generate adversarial examples for the same 100 test images with the Fast Gradient Method, using a perturbation size of `eps=0.5`:

In [None]:
attacker = FastGradientMethod(classifier, eps=0.5)
x_test_adv = attacker.generate(x_test[:100])

Then evaluate the classifier on these adversarial examples:

In [None]:
x_test_adv_pred = np.argmax(classifier.predict(x_test_adv), axis=1)
nb_correct_adv_pred = np.sum(x_test_adv_pred == np.argmax(y_test[:100], axis=1))

print("Adversarial test data (first 100 images):")
print("Correctly classified: {}".format(nb_correct_adv_pred))
print("Incorrectly classified: {}".format(100-nb_correct_adv_pred))
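With its default $\ell_\infty$ norm, `FastGradientMethod` is the fast gradient sign method (FGSM). It takes a single step of size `eps` in the direction of the sign of the loss gradient with respect to the input and then clips to the valid range: $x_{adv} = \mathrm{clip}(x + \epsilon \cdot \mathrm{sign}(\nabla_x L(x, y)))$. Because `load_dataset` scales MNIST pixels to $[0, 1]$, `eps=0.5` is a very large perturbation. The NumPy sketch below illustrates that single step on a hypothetical binary logistic-regression model (weights `w`, bias `b`), where the input gradient has the closed form $(p - y)\,w$. It is an illustration only, not ART's implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm_logistic(x, y, w, b, eps, clip_min=0.0, clip_max=1.0):
    """One FGSM step against a binary logistic-regression model.

    x: (n, d) inputs, y: (n,) labels in {0, 1}, w: (d,) weights, b: bias.
    """
    p = sigmoid(x @ w + b)                     # predicted P(y = 1 | x)
    grad = (p - y)[:, None] * w[None, :]       # d(cross-entropy)/dx, closed form
    x_adv = x + eps * np.sign(grad)            # move every input dimension by eps
    return np.clip(x_adv, clip_min, clip_max)  # stay inside the valid input range

# Usage on a single point the model classifies correctly as class 1
w, b = np.array([1.0, -1.0]), 0.0
x, y = np.array([[0.6, 0.4]]), np.array([1.0])
x_adv = fgsm_logistic(x, y, w, b, eps=0.3)
print(x_adv)                         # approximately [[0.3 0.7]]
print(sigmoid(x_adv @ w + b) < 0.5)  # [ True]: the prediction flips to class 0
```

Every input dimension moves by exactly `eps` (unless clipped), which is why FGSM is cheap but coarse compared with the iterative attack used for training below.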

<a id="adv_training"></a>
## 3. Adversarially train a robust classifier

In [None]:
robust_classifier_model = load_model("mnist_cnn_robust.h5")
robust_classifier = KerasClassifier(robust_classifier_model, clip_values=(min_, max_), use_logits=False) 

Note: the robust classifier has the same architecture as above, except that the first dense layer has **1024** units instead of **128**. (This was recommended by Madry et al. (2017), *Towards Deep Learning Models Resistant to Adversarial Attacks*.)
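For context, Madry et al. (2017) formulate adversarial training as a saddle-point problem. The inner maximization searches for a worst-case perturbation $\delta$ inside an $\ell_\infty$ ball of radius $\epsilon$, and the outer minimization fits the model parameters $\theta$ to those worst-case inputs. In the formula below, $L$ is the training loss and $\mathcal{D}$ the data distribution. The BIM/PGD attack approximates the inner maximization; this notebook uses $\epsilon = 0.3$ in the `BasicIterativeMethod` cell. Madry et al. also report that robustness against strong attacks requires substantially more model capacity, which motivates the wider dense layer.

$$
\min_{\theta} \; \mathbb{E}_{(x, y) \sim \mathcal{D}} \left[ \max_{\|\delta\|_\infty \le \epsilon} L(\theta, x + \delta, y) \right]
$$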

In [None]:
robust_classifier_model.summary()

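As also recommended by Madry et al., we generate the adversarial examples used during training with the Basic Iterative Method (BIM), which is projected gradient descent (PGD) without a random start: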

In [None]:
attacks = BasicIterativeMethod(robust_classifier, eps=0.3, eps_step=0.01, max_iter=40)
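BIM repeats the signed-gradient step `max_iter` times with a smaller step size `eps_step`. After each step, it projects back into the $\ell_\infty$ ball of radius `eps` around the clean input. PGD as described by Madry et al. additionally starts from a random point inside that ball. With `eps=0.3`, `eps_step=0.01` and `max_iter=40`, the steps could move a pixel by up to 0.4, so the projection is what keeps the perturbation within 0.3. The NumPy sketch below illustrates both variants for any model whose input gradient is supplied through `grad_fn`, a hypothetical callback. It is a simplified illustration, not ART's `BasicIterativeMethod`.

```python
import numpy as np

def projected_gradient_attack(x, y, grad_fn, eps, eps_step, max_iter,
                              clip_min=0.0, clip_max=1.0,
                              random_init=False, rng=None):
    """L-infinity BIM (random_init=False) or PGD with a random start (True).

    grad_fn(x, y) must return the gradient of the loss with respect to x.
    """
    rng = np.random.default_rng() if rng is None else rng
    x_adv = x.copy()
    if random_init:
        # PGD: begin at a uniformly random point inside the eps-ball around x
        x_adv = np.clip(x_adv + rng.uniform(-eps, eps, size=x.shape),
                        clip_min, clip_max)
    for _ in range(max_iter):
        # Small signed-gradient step that increases the loss
        x_adv = x_adv + eps_step * np.sign(grad_fn(x_adv, y))
        # Project onto the eps-ball around the clean input, then the valid range
        x_adv = np.clip(x_adv, x - eps, x + eps)
        x_adv = np.clip(x_adv, clip_min, clip_max)
    return x_adv

# Usage with a hypothetical logistic-regression model (weights w, bias b)
w, b = np.array([1.0, -1.0]), 0.0

def logistic_input_grad(x, y):
    p = 1.0 / (1.0 + np.exp(-(x @ w + b)))
    return (p - y)[:, None] * w[None, :]

x, y = np.array([[0.6, 0.4]]), np.array([1.0])
x_adv = projected_gradient_attack(x, y, logistic_input_grad,
                                  eps=0.3, eps_step=0.01, max_iter=40)
print(np.max(np.abs(x_adv - x)))  # never exceeds eps = 0.3
```

Clipping against the arrays `x - eps` and `x + eps` is the $\ell_\infty$ projection: each coordinate is bounded independently, so no per-sample norm computation is needed.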

Perform adversarial training:

In [None]:
# We performed this step beforehand, starting from a randomly initialized model.
# Adversarial training takes about 80 minutes on an NVIDIA V100.
# The resulting model is the one loaded from mnist_cnn_robust.h5 above.

# These are the commands we used for adversarial training
# (ratio=1.0: every sample in each batch is replaced by its adversarial counterpart):

# trainer = AdversarialTrainer(robust_classifier, attacks, ratio=1.0)
# trainer.fit(x_train, y_train, nb_epochs=83, batch_size=50)
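The `ratio` argument of `AdversarialTrainer` is the fraction of each training batch that is replaced by adversarial examples, so `ratio=1.0` trains on adversarial examples only. Because the attack targets the classifier being trained, fresh adversarial examples are needed for every batch, which is why adversarial training is much slower than standard training. The self-contained sketch below mimics that loop on a toy problem. It uses a hypothetical two-cluster 2-D dataset, a NumPy logistic-regression model, and single-step FGSM in place of BIM to keep it short. It illustrates the idea and is not ART's implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm(x, y, w, b, eps):
    """Single-step FGSM on logistic regression, clipped to [0, 1]."""
    grad = (sigmoid(x @ w + b) - y)[:, None] * w[None, :]
    return np.clip(x + eps * np.sign(grad), 0.0, 1.0)

def adversarial_train(x, y, eps, ratio=1.0, nb_epochs=50, batch_size=16,
                      lr=0.5, seed=0):
    """Toy adversarial training of binary logistic regression.

    In every batch, ceil(ratio * batch size) samples are replaced by FGSM
    examples crafted against the *current* parameters before the update.
    """
    rng = np.random.default_rng(seed)
    w, b = np.zeros(x.shape[1]), 0.0
    for _ in range(nb_epochs):
        order = rng.permutation(len(x))
        for start in range(0, len(x), batch_size):
            idx = order[start:start + batch_size]
            xb, yb = x[idx].copy(), y[idx]
            nb_adv = int(np.ceil(ratio * len(idx)))
            adv = rng.choice(len(idx), size=nb_adv, replace=False)
            xb[adv] = fgsm(xb[adv], yb[adv], w, b, eps)
            # Plain gradient step on the mean cross-entropy of the mixed batch
            err = sigmoid(xb @ w + b) - yb
            w = w - lr * (xb.T @ err) / len(idx)
            b = b - lr * err.mean()
    return w, b

def accuracy(x, y, w, b):
    return float(np.mean((sigmoid(x @ w + b) > 0.5) == (y == 1)))

# Hypothetical data: two Gaussian clusters inside the unit square
rng = np.random.default_rng(1)
x0 = rng.normal(0.25, 0.05, size=(100, 2))
x1 = rng.normal(0.75, 0.05, size=(100, 2))
x = np.clip(np.vstack([x0, x1]), 0.0, 1.0)
y = np.concatenate([np.zeros(100), np.ones(100)])

w, b = adversarial_train(x, y, eps=0.1, ratio=1.0)
print("clean accuracy:", accuracy(x, y, w, b))
print("FGSM accuracy (eps=0.1):", accuracy(fgsm(x, y, w, b, 0.1), y, w, b))
```

Setting `ratio` below 1.0 keeps part of each batch clean, which is a common way to trade some robustness for clean accuracy.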

<a id="evaluation"></a>
## 4. Evaluate the robust classifier

Evaluate the robust classifier's performance on the original test data:

In [None]:
x_test_robust_pred = np.argmax(robust_classifier.predict(x_test[:100]), axis=1)
nb_correct_robust_pred = np.sum(x_test_robust_pred == np.argmax(y_test[:100], axis=1))

print("Original test data (first 100 images):")
print("Correctly classified: {}".format(nb_correct_robust_pred))
print("Incorrectly classified: {}".format(100-nb_correct_robust_pred))

Evaluate the robust classifier's performance on the adversarial test data (**white-box** setting: the attack is computed with the robust classifier's own gradients):

In [None]:
attacker_robust = FastGradientMethod(robust_classifier, eps=0.5)
x_test_adv_robust = attacker_robust.generate(x_test[:100])

In [None]:
x_test_adv_robust_pred = np.argmax(robust_classifier.predict(x_test_adv_robust), axis=1)
nb_correct_adv_robust_pred = np.sum(x_test_adv_robust_pred == np.argmax(y_test[:100], axis=1))

print("Adversarial test data (first 100 images):")
print("Correctly classified: {}".format(nb_correct_adv_robust_pred))
print("Incorrectly classified: {}".format(100-nb_correct_adv_robust_pred))

Compare the performance of the original and the robust classifier over a range of attack strengths `eps`, then plot the results:

In [None]:
eps_range = [0.01, 0.02, 0.03, 0.04, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
nb_correct_original = []
nb_correct_robust = []

for eps in eps_range:
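    # Update the attack strength for BOTH attackers; otherwise the robust
    # classifier would always be attacked with the eps it was created with (0.5)
    attacker.set_params(eps=eps)
    attacker_robust.set_params(eps=eps)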
    x_test_adv = attacker.generate(x_test[:100])
    x_test_adv_robust = attacker_robust.generate(x_test[:100])
    
    x_test_adv_pred = np.argmax(classifier.predict(x_test_adv), axis=1)
    nb_correct_original += [np.sum(x_test_adv_pred == np.argmax(y_test[:100], axis=1))]
    
    x_test_adv_robust_pred = np.argmax(robust_classifier.predict(x_test_adv_robust), axis=1)
    nb_correct_robust += [np.sum(x_test_adv_robust_pred == np.argmax(y_test[:100], axis=1))]

eps_range = [0] + eps_range
nb_correct_original = [nb_correct_pred] + nb_correct_original
nb_correct_robust = [nb_correct_robust_pred] + nb_correct_robust

In [None]:
fig, ax = plt.subplots()
ax.plot(np.array(eps_range), np.array(nb_correct_original), 'b--', label='Original classifier')
ax.plot(np.array(eps_range), np.array(nb_correct_robust), 'r--', label='Robust classifier')

legend = ax.legend(loc='upper center', shadow=True, fontsize='large')
legend.get_frame().set_facecolor('#00FFCC')

plt.xlabel('Attack strength (eps)')
plt.ylabel('Correct predictions')
plt.show()
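The sweep above has to update every attacker's `eps` by hand, and forgetting one silently evaluates that model at the wrong strength. One way to rule out that mistake is to pass each model together with an attack function that takes `eps` as an argument, as in the hypothetical `robustness_curve` helper below. It is a sketch that works with any `predict_fn`/`attack_fn` pair; the toy usage relies on a one-dimensional linear model so it runs without ART.

```python
import numpy as np

def robustness_curve(eps_range, models, x, y_onehot):
    """Count correct predictions under attack for every model at every eps.

    models maps a name to (predict_fn, attack_fn): predict_fn(x) returns class
    scores and attack_fn(x, eps) returns adversarial inputs for that model.
    """
    true_labels = np.argmax(y_onehot, axis=1)
    curves = {name: [] for name in models}
    for eps in eps_range:
        for name, (predict_fn, attack_fn) in models.items():
            x_adv = attack_fn(x, eps)  # same eps for every model in this round
            pred = np.argmax(predict_fn(x_adv), axis=1)
            curves[name].append(int(np.sum(pred == true_labels)))
    return curves

# Toy 1-D model: predicts class 1 if x > 0, else class 0
def toy_predict(x):
    return np.hstack([-x, x])

def toy_attack(x, eps):
    return x - eps * np.sign(x)  # push each input towards the decision boundary

x = np.array([[0.05], [0.2], [-0.3]])
y_onehot = np.array([[0, 1], [0, 1], [1, 0]])
print(robustness_curve([0.0, 0.1, 0.25], {"toy": (toy_predict, toy_attack)}, x, y_onehot))
# {'toy': [3, 2, 1]}
```

With ART, an entry could be `'Original classifier': (classifier.predict, lambda x, eps: FastGradientMethod(classifier, eps=eps).generate(x))`, which builds a fresh attack for each `eps` instead of mutating a shared one.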