HHU Deep Learning, WS2023/24, 26.01.2024

Lecture: Prof. Dr. Markus Kollmann

Exercises: Nikolas Adaloglou, Felix Michels

# Assignment 13 - Adversarial Attacks

---

Submit the solved notebook (not a zip), named with your full name and the assignment number, e.g. `max_mustermann_a1.ipynb` for assignment 1. If we see that you have genuinely tried to solve the exercise, you will receive 1 point for this assignment, regardless of the quality of your solution.

## <center> DUE FRIDAY 02.02.2024 2:30 pm </center>

Drop-off link: [https://uni-duesseldorf.sciebo.de/s/VmmJHs0Iws92Jyk](https://uni-duesseldorf.sciebo.de/s/VmmJHs0Iws92Jyk)

---

This exercise is about adversarial attacks. Feel free to have a look at the original [paper](https://arxiv.org/abs/1412.6572) by Goodfellow et al. before writing any code.

### Some Tips and Remarks:

* Use `torch.clamp` for the clip operations.
* You can limit yourself to the first $1000$ images of the test set.
* Do not copy code from anywhere.
* Do not submit the checkpoints or datasets!
* If you run into problems, let us know!
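
As a quick illustration of the first tip, `torch.clamp` limits every element of a tensor to a given range. The values below are made up for the example.

```python
import torch

# Made-up pixel values, some of them outside the valid range [-1, 1]
x = torch.tensor([-1.5, -0.2, 0.7, 1.3])

# clamp returns a new tensor; x itself is left unchanged
clipped = torch.clamp(x, min=-1.0, max=1.0)
print(clipped)
```

Values inside the range pass through unchanged, and values outside are set to the nearest bound.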

# Part I. Preparation

In [None]:
!wget -c https://github.com/HHU-MMBS/Deep-Learning-Exercise-Extras/raw/main/a13_adversarial_attack/data.tar.gz
!tar xf data.tar.gz

In [None]:
import torch
from torch import nn
from torch.nn import functional as F
import torchvision
import torchvision.transforms as transforms
import matplotlib.pyplot as plt
from tqdm import tqdm

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Part II. Loading a pretrained ResNet18

**Task:** Load the checkpoint `resnet18-cifar10.pth` and validate that the loaded model has a test accuracy of approximately $94\%$.

In [None]:
import torchvision.models as models
from models import ResNet18

# Load the ResNet18 using the given checkpoint
### START CODE HERE ### (approx. 4 lines)
### END CODE HERE ###

Initialise the CIFAR-10 dataset with dataloaders for the train and test sets. Choose the `batch_size` according to your system hardware. The input images are expected to be normalized to $[-1,1]$! The cells below expect the test dataloader to be named `testloader`.

In [None]:
### START CODE HERE ### (approx. 7 lines)
### END CODE HERE ###
test_batch, test_labels = next(iter(testloader))

Implement the function below which calculates the model accuracy for a given dataloader.
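
The general pattern is to count correct predictions batch by batch without tracking gradients. Here is a minimal sketch on a toy linear model with random data; `toy_model` and `toy_loader` are hypothetical stand-ins, not the ResNet18 or the CIFAR-10 loader.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)
toy_model = nn.Linear(4, 3)  # hypothetical 3-class classifier
toy_data = TensorDataset(torch.randn(10, 4), torch.randint(0, 3, (10,)))
toy_loader = DataLoader(toy_data, batch_size=5)

correct, total = torch.tensor(0), 0
toy_model.eval()
with torch.no_grad():  # no gradients are needed for evaluation
    for inputs, targets in toy_loader:
        preds = toy_model(inputs).argmax(dim=1)
        correct += (preds == targets).sum()
        total += targets.size(0)

accuracy = (correct / float(total)).item()
print(accuracy)
```

Keeping `correct` as a tensor means the final `.item()` call works as in the template below.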

In [None]:
def calc_accuracy(model, device, dataloader):
    correct, total = 0, 0
    ### START CODE HERE ### (approx. 4 lines)
    ### END CODE HERE ###
    return (correct/float(total)).item()

In [None]:
calc_accuracy(model, device, testloader)

# Part III. Fast-gradient-sign-method (FGSM)

**Task:** Implement the fast gradient sign method (FGSM) as described in the [paper](https://arxiv.org/abs/1412.6572), section 4. Clip the values of the adversarial image to $[-1,1]$. Assume that the objective $J$ is the cross entropy loss.
Do not modify the input data.
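
For orientation, the FGSM update takes a single step of size $\epsilon$ in the direction of the sign of the input gradient of the loss. The sketch below applies it to a toy linear classifier on random inputs; `toy_model`, `x`, and `y` are hypothetical stand-ins, and the code illustrates the idea under these assumptions rather than serving as a reference solution.

```python
import torch
from torch import nn
from torch.nn import functional as F

torch.manual_seed(0)
toy_model = nn.Linear(4, 3)        # hypothetical stand-in classifier
x = torch.rand(5, 4) * 2 - 1       # inputs already scaled to [-1, 1]
y = torch.randint(0, 3, (5,))      # made-up labels
epsilon = 0.1

# Work on a copy so that the original input stays untouched
x_in = x.clone().requires_grad_(True)
loss = F.cross_entropy(toy_model(x_in), y)
grad, = torch.autograd.grad(loss, x_in)

# One signed gradient step, then clip back to the valid range
x_adv = torch.clamp(x_in.detach() + epsilon * grad.sign(), -1.0, 1.0)
print((x_adv - x).abs().max().item())
```

Using `torch.autograd.grad` with respect to the input avoids accumulating gradients in the model parameters.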

In [None]:
def fgsm(model, device, batch, labels, epsilon):
    """
    Returns adversarial examples of the batch
    """
    batch = batch.to(device)
    labels = labels.to(device)
    ### START CODE HERE ### (approx. 5 lines)
    ### END CODE HERE ###

The fooling rate is the fraction of adversarial images for which the model's prediction differs from its prediction on the original image. Plot the resulting fooling rate on the test images in `test_batch` against different values of $\epsilon$.

Find a good value for $\epsilon$ that results in a fooling rate of $70\%$ on the test images.
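
As a small worked example of the definition, the fooling rate can be computed by comparing two prediction tensors. The predictions below are made up.

```python
import torch

# Made-up class predictions for six images
clean_preds = torch.tensor([3, 5, 1, 0, 7, 2])
adv_preds = torch.tensor([3, 8, 1, 4, 7, 9])

# Fraction of images whose prediction changed (here 3 of 6)
fooling_rate = (adv_preds != clean_preds).float().mean().item()
print(fooling_rate)
```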

In [None]:
fooling_rates = []
predictions = model(test_batch.to(device)).argmax(1)
### START CODE HERE ### (approx. 5 lines)
### END CODE HERE ###
plt.figure(figsize=(8,5))
plt.plot(torch.linspace(0,0.4,20), fooling_rates)
plt.xlabel("Epsilon")
plt.ylabel("Fooling Rate")

### Expected Result

![fooling_rate.png](https://github.com/HHU-MMBS/Deep-Learning-Exercise-Extras/raw/main/a13_adversarial_attack/figs/fooling_rate.png)

In [None]:
# Run the fgsm function with your epsilon
### START CODE HERE ### (approx. 1 lines)
### END CODE HERE ###
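# Move the adversarial images to the CPU so that both tensors are on the same device
perturbations = advs.detach().cpu() - test_batch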

adv_predictions = model(advs.to(device)).argmax(1)
predictions = model(test_batch.to(device)).argmax(1)
(adv_predictions != predictions).float().mean().item()

Visualize at least $10$ adversarial images and compare them with the original images. Can you recognize the adversarial perturbations?

Check out `torchvision.utils.make_grid` for good-looking plots here.

In [None]:
### START CODE HERE ### (approx. 7 lines)
### END CODE HERE ###

# Part IV. Basic Iterative Method (BIM)

An extension of FGSM is the basic iterative method (BIM) as proposed in this [paper](https://arxiv.org/abs/1607.02533).

**Task:** Implement targeted BIM as described in Section 2.3 of the paper. This method expects a target label as additional input and changes the image so that the classifier outputs this target label for the adversarial image if the attack succeeds. Again, clip the values of the adversarial image to $[-1,1]$.
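
Section 2.3 of the paper picks the target as the class the model considers least likely for the clean image, which is consistent with a function signature that takes no explicit target. The sketch below assumes that choice and runs the iteration on a toy linear classifier; `toy_model` and `x` are hypothetical stand-ins, and the hyperparameter values are arbitrary.

```python
import torch
from torch import nn
from torch.nn import functional as F

torch.manual_seed(0)
toy_model = nn.Linear(4, 3)        # hypothetical stand-in classifier
x = torch.rand(5, 4) * 2 - 1       # inputs already scaled to [-1, 1]
epsilon, step_size, steps = 0.1, 0.02, 10

# Assumed target: the least-likely class for the clean input
with torch.no_grad():
    target = toy_model(x).argmin(dim=1)

adv = x.clone()
for _ in range(steps):
    adv.requires_grad_(True)
    loss = F.cross_entropy(toy_model(adv), target)
    grad, = torch.autograd.grad(loss, adv)
    with torch.no_grad():
        # Step towards the target class, i.e. decrease the loss
        adv = adv - step_size * grad.sign()
        # Stay within the epsilon-ball around x and within the valid range
        adv = torch.max(torch.min(adv, x + epsilon), x - epsilon)
        adv = adv.clamp(-1.0, 1.0)

print((adv - x).abs().max().item())
```

Unlike FGSM, the step subtracts the signed gradient, because the loss is computed against the target label and should go down.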

In [None]:
def bim(model, device, data, step_size, steps=20, epsilon=0.05):
    data = data.to(device)
    ### START CODE HERE ### (approx. 11 lines)
    ### END CODE HERE ###
    return advs

Tune the hyperparameters of the attack such that the fooling rate is above $90\%$.

In [None]:
# Run the bim function
### START CODE HERE ### (approx. 1 lines)
### END CODE HERE ###

In [None]:
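# Move the adversarial images to the CPU so that both tensors are on the same device
perturbations_bim = advs_bim.detach().cpu() - test_batch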
adv_predictions = model(advs_bim.to(device)).argmax(1).cpu()
torch.mean((adv_predictions != test_labels).float())

Again, visualize the adversarial images.

In [None]:
### START CODE HERE ### (approx. 7 lines)
### END CODE HERE ###

Compare the resulting perturbation norm with the perturbation norm of the adversarial images produced by FGSM, and plot both perturbations.
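
The task does not fix which norm to use. A minimal sketch, assuming per-image $L_2$ and $L_\infty$ norms, on made-up perturbation tensors (`pert_a` and `pert_b` are hypothetical, not attack outputs):

```python
import torch

torch.manual_seed(0)
# Hypothetical perturbations for 8 CIFAR-sized images
pert_a = 0.05 * torch.sign(torch.randn(8, 3, 32, 32))  # every pixel moved by 0.05
pert_b = 0.01 * torch.randn(8, 3, 32, 32)              # small Gaussian noise

def per_image_norms(perturbation):
    """Return the L2 and L-infinity norm of each image's perturbation."""
    flat = perturbation.flatten(start_dim=1)
    l2 = flat.norm(p=2, dim=1)
    linf = flat.abs().max(dim=1).values
    return l2, linf

l2_a, linf_a = per_image_norms(pert_a)
l2_b, linf_b = per_image_norms(pert_b)
print(l2_a.mean().item(), linf_a.mean().item())
print(l2_b.mean().item(), linf_b.mean().item())
```

Averaging the per-image norms over the batch gives a single number per attack that can be compared directly.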

In [None]:
### START CODE HERE ### (approx. 7 lines)
### END CODE HERE ###