# Project Part 3: Adversarial Examples, Transferability and Robustification



We recommend using Google Colab to edit and run this notebook. You can also install Jupyter on your own computer.

In [None]:
import torch
import numpy as np
from sklearn.datasets import fetch_openml
from torch import nn
import torch.nn.functional as F

from sklearn.model_selection import train_test_split
from torch.utils.data import DataLoader, TensorDataset
from tqdm import tqdm
from sklearn.metrics import accuracy_score
import matplotlib.pyplot as plt

## 0. Prepare data

You can familiarise yourself with MNIST, a small dataset, through its Wikipedia article [https://en.wikipedia.org/wiki/MNIST_database](https://en.wikipedia.org/wiki/MNIST_database). MNIST is composed of 28x28 grayscale images of handwritten digits. This is a classification task with 10 classes (one per digit).

In [None]:
# Data Loading
mnist = fetch_openml('mnist_784', as_frame=False, cache=True)


In [None]:
x = mnist["data"]
y = mnist["target"]

In [None]:
# Data exploration
print(f"Shape of x: {x.shape}")
print(f"Min, max x: {x.min(), x.max()}")
print(f"Shape of y: {y.shape}")
print(f"Classes in y: {np.unique(y)}")

Shape of x: (70000, 784)
Min, max x: (0.0, 255.0)
Shape of y: (70000,)
Classes in y: ['0' '1' '2' '3' '4' '5' '6' '7' '8' '9']


In [None]:
# Preprocessing
x = torch.from_numpy(x.astype(float)).float()
y = torch.from_numpy(y.astype(int)).type(torch.LongTensor)
# Shape
x = x.reshape(-1, 1, 28, 28)
# Scaler
x = (x - x.min()) / (x.max() - x.min())


In [None]:
# Split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42, stratify=y, shuffle=True)
x_train, x_val, y_train, y_val = train_test_split(x_train, y_train, test_size=0.2, random_state=42, stratify=y_train, shuffle=True)

## 1. Adversarial examples

The goal of this first part is to generate adversarial examples on a simple dataset called MNIST. MNIST is a dataset of 28x28 grayscale images of handwritten digits, together with their associated labels 0, 1, ..., 9.

You can use the following resource to help you: [https://pytorch.org/tutorials/beginner/basics/optimization_tutorial.html#](https://pytorch.org/tutorials/beginner/basics/optimization_tutorial.html#).


1. Train a Neural Network using the PyTorch library.

The architecture of the model and the training hyperparameters are given below.
We recommend using these hyperparameters, the SGD optimizer and the cross-entropy loss.
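
As a small illustration of how the SGD optimizer and the cross-entropy loss fit together in PyTorch, the sketch below runs a single optimisation step on a toy linear model. The names `toy_model`, `toy_optimizer`, `toy_loss_fn` and the random data are assumptions made for this example only; they are not part of the assignment. Note that `nn.CrossEntropyLoss` expects raw logits and integer class labels, which is why the `Net` architecture given below returns logits without a softmax.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical stand-in for the real network: 4 input features, 3 classes
toy_model = nn.Linear(4, 3)
toy_optimizer = torch.optim.SGD(toy_model.parameters(), lr=0.001, momentum=0.9)
toy_loss_fn = nn.CrossEntropyLoss()

# Random toy batch: 8 examples with integer labels in {0, 1, 2}
inputs = torch.randn(8, 4)
labels = torch.randint(0, 3, (8,))

weights_before = toy_model.weight.detach().clone()

logits = toy_model(inputs)            # raw scores, no softmax
loss = toy_loss_fn(logits, labels)    # scalar loss averaged over the batch

toy_optimizer.zero_grad()             # clear gradients from any previous step
loss.backward()                       # compute gradients of the loss
toy_optimizer.step()                  # update the parameters
```

The same three calls (`zero_grad`, `backward`, `step`) are what a training loop typically repeats for every batch.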


In [None]:
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        self.conv2_drop = nn.Dropout2d()
        self.fc1 = nn.Linear(320, 50)
        self.fc2 = nn.Linear(50, 10)

    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
        x = x.view(-1, 320)
        x = F.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        x = self.fc2(x)
        return x

In [None]:
learning_rate = 0.001
momentum = 0.9
epochs = 10
batch_size = 64


In [None]:
## YOUR CODE HERE: use SGD with the provided hyperparameters
model_0 = None
optimizer = None

In [None]:
## YOUR CODE HERE: use CrossEntropyLoss.
loss_func = None

In [None]:
def train_loop(dataloader, model, loss_fn, optimizer, batch_size):
    size = len(dataloader.dataset)
    for batch, (X, y) in tqdm(enumerate(dataloader), total=len(dataloader)):


        # Compute prediction and loss


        ## YOUR CODE HERE:

        # Backpropagation

        ## YOUR CODE HERE:



In [None]:
## GIVEN, to evaluate the progress of the training at each epoch
def val_loop(dataloader, model, loss_fn, epoch_i):
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    test_loss, correct = 0, 0

    with torch.no_grad():
        for X, y in dataloader:
            pred = model(X)
            test_loss += loss_fn(pred, y).item()
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()

    test_loss /= num_batches
    correct /= size
    print(f"Epoch {epoch_i}, Val Error: Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f}")

In [None]:
def train_model(model, x_train, y_train, x_val, y_val, optimizer, batch_size, loss_func, epochs):
    # Data processing
    train_dataset = TensorDataset(x_train, y_train)
    train_loader = DataLoader(
        dataset=train_dataset,
        batch_size=batch_size,
        shuffle=True,
        num_workers=2,
    )
    val_dataset = TensorDataset(x_val, y_val)
    val_loader = DataLoader(
        dataset=val_dataset,
        batch_size=2000,
        shuffle=True,
        num_workers=2,
    )

    # Main train loop
    ## YOUR CODE HERE:


In [None]:
## YOUR CODE HERE: train the model using the training function you just implemented.


2. Evaluate the clean accuracy of the neural network on a test set that has not been used for training.

In [None]:
# Set model into evaluation mode
model_0.eval()

In [None]:
## YOUR CODE HERE: Evaluate model accuracy

accuracy = None
print(f"Clean accuracy of the model is {accuracy}.")

3. Implement and execute the PGD attack on 1000 examples from the test set. The hyperparameters of PGD are given below.
The perturbation is bounded in L-infinity norm by epsilon (eps), which means that each pixel can be changed by a value between -eps and +eps. We initially set the maximum perturbation to eps = 32/255. For simplicity, you can set the step size to alpha = eps / 10 and run PGD with a single random restart.

You can find the description of PGD in the paper [https://arxiv.org/abs/1706.06083](https://arxiv.org/abs/1706.06083) and an example of another adversarial attack in the PyTorch documentation [https://pytorch.org/tutorials/beginner/fgsm_tutorial.html](https://pytorch.org/tutorials/beginner/fgsm_tutorial.html).
Tip: use the F.cross_entropy loss during the attack.
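
To make the update rule concrete, one PGD iteration can be written as $x^{t+1} = \mathrm{clip}_{[0,1]}\big(\Pi_{\epsilon}(x^{t} + \alpha \, \mathrm{sign}(\nabla_{x} L(f(x^{t}), y)))\big)$, where $\Pi_{\epsilon}$ projects onto the L-infinity ball of radius eps around the clean input. The sketch below shows a random start followed by a single such iteration on a toy linear classifier. It is only an illustration under stated assumptions: `toy_model`, `x_clean`, `eps_demo` and `alpha_demo` are hypothetical names, and a complete attack would repeat the iteration `n_iter` times.

```python
import torch
import torch.nn.functional as F
from torch import nn

torch.manual_seed(0)

toy_model = nn.Linear(4, 3)        # hypothetical stand-in classifier
x_clean = torch.rand(2, 4)         # toy inputs already scaled to [0, 1]
y_true = torch.tensor([0, 2])
eps_demo = 32 / 255
alpha_demo = eps_demo / 10

# Random start inside the eps-ball, kept in the valid pixel range
noise = torch.empty_like(x_clean).uniform_(-eps_demo, eps_demo)
x_adv = (x_clean + noise).clamp(0, 1)

# One PGD iteration: gradient of the loss with respect to the input
x_adv.requires_grad_(True)
loss = F.cross_entropy(toy_model(x_adv), y_true)
(grad,) = torch.autograd.grad(loss, x_adv)

with torch.no_grad():
    # Step in the direction that increases the loss
    x_adv = x_adv + alpha_demo * grad.sign()
    # Project back onto the L-infinity ball around the clean input
    x_adv = torch.min(torch.max(x_adv, x_clean - eps_demo), x_clean + eps_demo)
    # Keep pixel values valid
    x_adv = x_adv.clamp(0, 1)
```

Projecting after every step is what guarantees that the final perturbation never exceeds eps, regardless of the number of iterations.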


In [None]:
n_examples = 1000
eps = 32/255
n_iter = 50
alpha = eps / 10


In [None]:
## YOUR CODE HERE: Generate adversarial examples


4. Show the robust accuracy of model_0, that is, the accuracy of the model on the adversarial examples.

In [None]:
## YOUR CODE HERE: Evaluate model robust accuracy


5. Show the impact of the maximum perturbation allowed (denoted epsilon).

In [None]:
eps = [8/255, 16/255, 32/255, 64/255]
alpha = [e/10 for e in eps]

In [None]:
## YOUR CODE HERE: compute the adversarial examples for each provided epsilon
## (maximum l-infinity norm of the perturbation), and compute the associated robust accuracy
## Use a graph to display your results. You may use Matplotlib (https://matplotlib.org/stable/index.html).

6. Using matplotlib, plot 10 adversarial examples along with their corresponding original images. Choose one original image per class (all 10 classes should be represented). For each image (adversarial and original), show the predicted class on the plot.
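
One possible way to pick one image per class is to keep the first index at which each label occurs. The sketch below does this on a toy label tensor with four hypothetical classes; `toy_labels` and `first_index_per_class` are names invented for this example.

```python
import torch

# Toy labels standing in for the labels of the attacked test examples
toy_labels = torch.tensor([3, 0, 1, 3, 2, 0, 1, 2])

# Map each class to the first position where it appears
first_index_per_class = {}
for idx, label in enumerate(toy_labels.tolist()):
    first_index_per_class.setdefault(label, idx)

# Indices ordered by class, ready to be used for plotting
selected_indices = [first_index_per_class[c] for c in sorted(first_index_per_class)]
```

The resulting indices can then be used to select both the original image and its adversarial counterpart, so that each pair is plotted side by side.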


In [None]:
## YOUR CODE HERE

**Question**: Please comment on your results for this section.

**ANSWER HERE**


## 2. Transferability

In this section we will see how adversarial examples generated on one model can be adversarial on another model using a different architecture.
Suppose there is a second model whose parameters are unknown. For instance, it could be a model deployed on a cloud platform. We will use the examples generated in Section 1 on model_0 to fool this new model, denoted model_1.
We say that model_0 is a surrogate for model_1.

1. Define a neural network architecture for MNIST different from the one used in Section 1.

In [None]:
## GIVEN
class FullyConnectedNetwork(nn.Module):
    def __init__(self):
        super(FullyConnectedNetwork, self).__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10),
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits

2. Train the neural network model_1 with the same hyperparameters as model_0


In [None]:
## YOUR CODE HERE
model_1 = None
optimizer = None  # create a new optimizer object when you train a new model

3. What fraction of the adversarial examples that are successful on model_0 transfers to model_1 (i.e. is also adversarial for model_1)?
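
The transfer rate can be sketched as follows, assuming that an adversarial example counts as successful when the model's prediction differs from the true label. The prediction tensors below are made-up toy values, and all variable names are hypothetical. A stricter definition could also require that the clean example was classified correctly in the first place.

```python
import torch

# Toy true labels and toy predictions on the adversarial examples
toy_labels = torch.tensor([0, 1, 2, 3, 4, 5])
preds_adv_model_0 = torch.tensor([0, 7, 8, 9, 4, 3])
preds_adv_model_1 = torch.tensor([0, 7, 2, 1, 9, 5])

fooled_0 = preds_adv_model_0 != toy_labels   # successful on the surrogate
fooled_1 = preds_adv_model_1 != toy_labels   # misclassified by the target

# Among the examples that fool model_0, the share that also fools model_1
n_successful = fooled_0.sum().item()
n_transferred = (fooled_0 & fooled_1).sum().item()
transfer_rate = n_transferred / n_successful
```

The denominator counts only the examples that fool the surrogate, so failed attacks on model_0 do not dilute the ratio.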


In [None]:
model_1.eval()
## YOUR CODE HERE

What do you conclude about the robustness of the model? Can [secrecy](https://en.wikipedia.org/wiki/Security_through_obscurity) defend a model?

**ANSWER HERE**

## 3. Use adversarial training to robustify the model

Adversarial training is a common method to make models robust to adversarial examples, as described in this paper [https://arxiv.org/abs/1706.06083](https://arxiv.org/abs/1706.06083). In this section, you should update the training loop so that 3/4 of each batch is used for training as is, while the remaining fourth is first perturbed with PGD and then used for training. You can limit the number of PGD iterations to 10. Use the model_0 architecture from Section 1 in this section.
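
The batch split described above can be sketched as follows. This is a minimal illustration, not the expected solution: `fake_attack` is a hypothetical placeholder that adds bounded random noise instead of running PGD, and the `_demo` names are assumptions made for this example.

```python
import torch

torch.manual_seed(0)

batch_size_demo = 8
X_batch = torch.rand(batch_size_demo, 1, 28, 28)    # toy images in [0, 1]
y_batch = torch.randint(0, 10, (batch_size_demo,))  # toy labels
eps_demo = 32 / 255


def fake_attack(x, eps):
    """Placeholder for PGD: bounded random noise, NOT a real attack."""
    noise = torch.empty_like(x).uniform_(-eps, eps)
    return (x + noise).clamp(0, 1)


# Use the actual batch length: the last batch of an epoch may be smaller
adv_size_demo = X_batch.shape[0] // 4

# Perturb the first fourth and keep the remaining three fourths clean
X_adv_part = fake_attack(X_batch[:adv_size_demo], eps_demo)
X_mixed = torch.cat([X_adv_part, X_batch[adv_size_demo:]], dim=0)

# Labels are unchanged: adversarial examples keep their true label
y_mixed = y_batch
```

The mixed batch `X_mixed` with labels `y_mixed` would then go through the usual forward pass, loss computation and backpropagation.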

1. Train model_robust using adversarial training. You may want to train it for more epochs (about twice as many) to reach a similar clean accuracy.

In [None]:
n_iter = 10  # Fewer iterations to speed up training. Once the model is trained, we will still evaluate its robust accuracy with more iterations, i.e. with a stronger attack.
eps = 32/255
alpha = eps / 5
model_robust = Net()  # newly initialized NN

In [None]:
def train_loop(dataloader, model, loss_fn, optimizer, batch_size):
    size = len(dataloader.dataset)
    adv_size = int(batch_size/4)
    for batch, (X, y) in tqdm(enumerate(dataloader), total=len(dataloader)):

        # Generate adversarial examples for a fourth of the data

        model.eval()
        ## YOUR CODE HERE
        model.train()

        # Compute prediction and loss

        ## YOUR CODE HERE:

        # Backpropagation

        ## YOUR CODE HERE:



In [None]:
## YOUR CODE HERE: The rest of training implementation is unchanged.
## Do not reuse the same optimizer object!!!

2. Compare the robust accuracies of model_0 and model_robust using the same PGD hyperparameters for different values of eps. Use a graph to show your results.

In [None]:
n_examples = 1000
n_iter = 50
eps = [8/255, 16/255, 32/255, 64/255]
alpha = [e/10 for e in eps]

In [None]:
## YOUR CODE HERE

**Questions**: Please comment on your results. Does adversarial training appear to be a valid defense? Discuss the threats to the validity of the robust accuracy evaluation carried out here. What could be done to improve the evaluation of the model's robustness?

**ANSWER HERE**