This notebook experiments with different smoother approximations of RELU as proposed in the paper - [Smooth Adversarial Training](https://arxiv.org/abs/2006.14536). The authors show that RELU hurts the adversarial robustness of models and if it is replaced with its smoother approximations like Swish, GELU, Parametric SoftPlus (proposed in the same paper) then the adversarial robustness is enhanced greatly - 

<center>
<img src="https://i.ibb.co/YbLDW3n/Screen-Shot-2020-10-26-at-3-36-20-PM.png" width=650></img>
</center>

The authors attribute this performance boost boost due to the fact that smoother activation functions help in producing more informed gradients that, in turn, help to create harder adversarial examples *during* training. So, we end up training our model to be robust against harder adversarial examples chances. This is desirable for many practical purposes. 

For the purpose of this notebook we will using GELU, and Swish which are available via TensorFlow core.  Here's a figure from the same paper depicting the forward and backward nature of smoother activation functions - 

<center>
<img src="https://i.ibb.co/YthVyZC/image.png" width=680></img>
</center>

**Note** the notebook uses code from [this tutorial](https://www.tensorflow.org/neural_structured_learning/tutorials/adversarial_keras_cnn_mnist). 

## Initial Setup

In [None]:
!pip install -q tf-nightly # `tf-nightly` because of gelu and swish
!pip install -q neural-structured-learning

In [27]:
import matplotlib.pyplot as plt
import neural_structured_learning as nsl
import numpy as np

import tensorflow as tf
tf.get_logger().setLevel('INFO')

import tensorflow_datasets as tfds
tfds.disable_progress_bar()

print("TensorFlow version:", tf.__version__)

TensorFlow version: 2.5.0-dev20201104


## Define Hyperparameters

In [3]:
class HParams(object):
  def __init__(self):
    self.input_shape = [28, 28, 1]
    self.num_classes = 10
    self.conv_filters = [32, 64, 64]
    self.kernel_size = (3, 3)
    self.pool_size = (2, 2)
    self.num_fc_units = [64]
    self.batch_size = 32
    self.epochs = 5
    self.adv_multiplier = 0.2
    self.adv_step_size = 0.2
    self.adv_grad_norm = 'infinity'

HPARAMS = HParams()

## FashionMNIST Dataset

In [4]:
datasets = tfds.load('fashion_mnist')

train_dataset = datasets['train']
test_dataset = datasets['test']

IMAGE_INPUT_NAME = 'image'
LABEL_INPUT_NAME = 'label'

[1mDownloading and preparing dataset fashion_mnist/3.0.1 (download: 29.45 MiB, generated: 36.42 MiB, total: 65.87 MiB) to /root/tensorflow_datasets/fashion_mnist/3.0.1...[0m
Shuffling and writing examples to /root/tensorflow_datasets/fashion_mnist/3.0.1.incompleteYQSH9Z/fashion_mnist-train.tfrecord
Shuffling and writing examples to /root/tensorflow_datasets/fashion_mnist/3.0.1.incompleteYQSH9Z/fashion_mnist-test.tfrecord
[1mDataset fashion_mnist downloaded and prepared to /root/tensorflow_datasets/fashion_mnist/3.0.1. Subsequent calls will reuse this data.[0m


In [5]:
def normalize(features):
  features[IMAGE_INPUT_NAME] = tf.cast(
      features[IMAGE_INPUT_NAME], dtype=tf.float32) / 255.0
  return features

def convert_to_tuples(features):
  return features[IMAGE_INPUT_NAME], features[LABEL_INPUT_NAME]

def convert_to_dictionaries(image, label):
  return {IMAGE_INPUT_NAME: image, LABEL_INPUT_NAME: label}

train_dataset = train_dataset.map(normalize).shuffle(10000).batch(HPARAMS.batch_size).map(convert_to_tuples)
test_dataset = test_dataset.map(normalize).batch(HPARAMS.batch_size).map(convert_to_tuples)

## Model Utils

In [6]:
def build_base_model(hparams, activation="relu"):
  """Builds a model according to the architecture defined in `hparams`."""
  inputs = tf.keras.Input(
      shape=hparams.input_shape, dtype=tf.float32, name=IMAGE_INPUT_NAME)

  x = inputs
  for i, num_filters in enumerate(hparams.conv_filters):
    x = tf.keras.layers.Conv2D(
        num_filters, hparams.kernel_size, activation=activation)(
            x)
    if i < len(hparams.conv_filters) - 1:
      # max pooling between convolutional layers
      x = tf.keras.layers.MaxPooling2D(hparams.pool_size)(x)
  x = tf.keras.layers.Flatten()(x)
  for num_units in hparams.num_fc_units:
    x = tf.keras.layers.Dense(num_units, activation=activation)(x)
  pred = tf.keras.layers.Dense(hparams.num_classes, activation='softmax')(x)
  model = tf.keras.Model(inputs=inputs, outputs=pred)
  return model

base_model = build_base_model(HPARAMS)
base_model.summary()

Model: "model"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
image (InputLayer)           [(None, 28, 28, 1)]       0         
_________________________________________________________________
conv2d (Conv2D)              (None, 26, 26, 32)        320       
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 13, 13, 32)        0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 11, 11, 64)        18496     
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 5, 5, 64)          0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 3, 3, 64)          36928     
_________________________________________________________________
flatten (Flatten)            (None, 576)               0     

## Train Baseline Model and Evaluation

Let's start with our baseline model that include RELU as its primary non-linearity.

In [7]:
base_model.compile(optimizer='adam', loss='sparse_categorical_crossentropy',
                   metrics=['acc'])
base_model.fit(train_dataset, epochs=HPARAMS.epochs)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<tensorflow.python.keras.callbacks.History at 0x7f20fab21d68>

In [8]:
results = base_model.evaluate(test_dataset)
relu_named_results = dict(zip(base_model.metrics_names, results))
print('\naccuracy:', relu_named_results['acc'])


accuracy: 0.9061999917030334


## GELU Model

In [9]:
gelu_model = build_base_model(HPARAMS, tf.nn.gelu)
gelu_model.compile(optimizer='adam', loss='sparse_categorical_crossentropy',
                   metrics=['acc'])
gelu_model.fit(train_dataset, epochs=HPARAMS.epochs)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<tensorflow.python.keras.callbacks.History at 0x7f20bc0305f8>

In [10]:
results = gelu_model.evaluate(test_dataset)
gelu_named_results = dict(zip(gelu_model.metrics_names, results))
print('\naccuracy:', gelu_named_results['acc'])


accuracy: 0.9025999903678894


## Swish Model

In [11]:
swish_model = build_base_model(HPARAMS, tf.nn.gelu)
swish_model.compile(optimizer='adam', loss='sparse_categorical_crossentropy',
                   metrics=['acc'])
swish_model.fit(train_dataset, epochs=HPARAMS.epochs)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<tensorflow.python.keras.callbacks.History at 0x7f20bbe28a20>

In [12]:
results = swish_model.evaluate(test_dataset)
swish_named_results = dict(zip(swish_model.metrics_names, results))
print('\naccuracy:', swish_named_results['acc'])


accuracy: 0.9065999984741211


We see all the three models yielding similar results. Now, we are interested to see how harder adversarial examples each of these models can produce.

## Adversarially Fooling the Models

To do this, we first create a configuration for producing adversarial pertubations and then we use that to wrap our models with `nsl.keras.AdversarialRegularization` .

In [13]:
adv_config = nsl.configs.make_adv_reg_config(
    multiplier=HPARAMS.adv_multiplier,
    adv_step_size=HPARAMS.adv_step_size,
    adv_grad_norm=HPARAMS.adv_grad_norm
)

In [15]:
def get_reference_model(reference_model):
    reference_model = nsl.keras.AdversarialRegularization(
        base_model,
        label_keys=[LABEL_INPUT_NAME],
        adv_config=adv_config)
    reference_model.compile(
        optimizer='adam',
        loss='sparse_categorical_crossentropy',
        metrics=['acc'])
    
    return reference_model

`nsl` expects the inputs to be in a dictionary format - `{'image': image, 'label': label}` for example.

In [16]:
train_set_for_adv_model = train_dataset.map(convert_to_dictionaries)
test_set_for_adv_model = test_dataset.map(convert_to_dictionaries)

Now, we evaluate these different models under adversarial robustness.

In [22]:
def benchmark_model(reference_model, models_to_eval):
    perturbed_images, labels, predictions = [], [], []
    metrics = {
        name: tf.keras.metrics.SparseCategoricalAccuracy()
        for name in models_to_eval.keys()
    }
    
    for batch in test_set_for_adv_model:
        perturbed_batch = reference_model.perturb_on_batch(batch)
        # Clipping makes perturbed examples have the same range as regular ones.
        perturbed_batch[IMAGE_INPUT_NAME] = tf.clip_by_value(                          
            perturbed_batch[IMAGE_INPUT_NAME], 0.0, 1.0)
        y_true = perturbed_batch.pop(LABEL_INPUT_NAME)
        perturbed_images.append(perturbed_batch[IMAGE_INPUT_NAME].numpy())
        labels.append(y_true.numpy())
        predictions.append({})
        for name, model in models_to_eval.items():
            y_pred = model(perturbed_batch)
            metrics[name](y_true, y_pred)
            predictions[-1][name] = tf.argmax(y_pred, axis=-1).numpy()

    for name, metric in metrics.items():
        print('%s model accuracy: %f' % (name, metric.result().numpy()))

In [28]:
# We take the RELU model to create adversarial examples first,
# then use that model to evaluate on the adversarial examples
relu_adv_model = get_reference_model(base_model)
models_to_eval = {
    'relu': base_model,
}
benchmark_model(relu_adv_model, models_to_eval)





































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































relu model accuracy: 0.024700


In [30]:
# We take the GELU model to create adversarial examples first,
# then use that model to evaluate on the adversarial examples
gelu_adv_model = get_reference_model(gelu_model)
models_to_eval = {
    'gelu': gelu_model,
}
benchmark_model(gelu_adv_model, models_to_eval)





































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































gelu model accuracy: 0.106900


In [26]:
# We take the Swish model to create adversarial examples first,
# then use that model to evaluate on the adversarial examples
swish_adv_model = get_reference_model(swish_model)
models_to_eval = {
    'swish': swish_model,
}
benchmark_model(swish_adv_model, models_to_eval)





































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































swish model accuracy: 0.118400


Notice that the RELU model fails considerably compared to the GELU and Swish models in terms of validation accuracy. Next we are going to use the Swish model (you can use GELU model too) to generate the adversarial examples and we will use the RELU model to evaluate it on those examples. 

In [31]:
swish_adv_model = get_reference_model(swish_model)
models_to_eval = {
    'relu': base_model,
}
benchmark_model(swish_adv_model, models_to_eval)





































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































relu model accuracy: 0.024700


Let's now see what happens if we swap the models i.e. use the RELU model to generate the adversarial examples and use the Swish model for evaluation. 

In [32]:
relu_adv_model = get_reference_model(base_model)
models_to_eval = {
    'swish': swish_model,
}
benchmark_model(relu_adv_model, models_to_eval)





































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































swish model accuracy: 0.118400


This indeed suggests that the Swish model is able to produce harder adversarial examples than the RELU model.

## Adversarial Training with Swish

We now train the Swish model with adversarial regularization. 

In [33]:
swish_adv_model = build_base_model(HPARAMS, tf.nn.swish)
adv_model = nsl.keras.AdversarialRegularization(
    swish_adv_model,
    label_keys=[LABEL_INPUT_NAME],
    adv_config=adv_config
)
adv_model.compile(optimizer='adam', 
                  loss='sparse_categorical_crossentropy',
                   metrics=['acc'])
adv_model.fit(train_set_for_adv_model, epochs=HPARAMS.epochs)

Epoch 1/5








Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Cause: <cyfunction Socket.send at 0x7f2111d41e58> is not a module, class, method, function, traceback, frame, or code object


Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Cause: <cyfunction Socket.send at 0x7f2111d41e58> is not a module, class, method, function, traceback, frame, or code object


Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Cause: <cyfunction Socket.send at 0x7f2111d41e58> is not a module, class, method, function, traceback, frame, or code object





Cause: while/else statement not yet supported


Cause: while/else statement not yet supported


Cause: while/else statement not yet supported








Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<tensorflow.python.keras.callbacks.History at 0x7f2080a4d780>

We can now compare the performance of the Swish model and this adversarially regularized Swish model to see the benefits.

In [34]:
swish_ref_model = get_reference_model(swish_model)
models_to_eval = {
    'swish': swish_model,
    'swish-adv': adv_model.base_model
}
benchmark_model(swish_ref_model, models_to_eval)





































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































swish model accuracy: 0.118400
swish-adv model accuracy: 0.437000
