# SPML HW3: Breaking Defenses & Black-Box Attacks

In [None]:
name = ''
std_id = ''

In [None]:
import torch
from torch import nn
from torch.optim import Adam
import torch.nn.functional as F
from torch.nn import CrossEntropyLoss
from torch.utils.data import DataLoader

from torchvision import transforms
from torchvision.models import resnet18, mobilenet_v2
from torchvision.datasets.cifar import CIFAR10

from tqdm import trange, tqdm

torch.manual_seed(0)

device = 'cuda' if torch.cuda.is_available() else 'cpu'
device

# CIFAR10 Dataset (5 points)

In [None]:
norm_mean = (0.4914, 0.4822, 0.4465)
norm_std = (0.2023, 0.1994, 0.2010)
batch_size = 128

mu = torch.tensor(norm_mean).view(3,1,1).to(device)
std = torch.tensor(norm_std).view(3,1,1).to(device)

# TODO: Set the upper and lower limits of valid image values
upper_limit = ...
lower_limit = ...

transform_train = transforms.Compose([
])

transform_test = transforms.Compose([
])

trainset = CIFAR10(root='./data', train=True, download=True, transform=transform_train)
trainloader = DataLoader(trainset, batch_size=batch_size, shuffle=True, num_workers=2)

testset = CIFAR10(root='./data', train=False, download=True, transform=transform_test)
testloader = DataLoader(testset, batch_size=batch_size, shuffle=False, num_workers=2)


classes = ('plane', 'car', 'bird', 'cat', 'deer',
           'dog', 'frog', 'horse', 'ship', 'truck')


# Defensive Distillation (25 points)

[Defensive distillation](https://arxiv.org/abs/1511.04508) proceeds in four steps:

1.   **Train the teacher network**, setting the temperature of the softmax to T during the training phase.
2.   **Compute soft labels** by applying the teacher network to each instance in the training set, again evaluating the softmax at temperature T.
3.   **Train the distilled network** (a network with the same architecture as the teacher network) on the soft labels, using a softmax at temperature T.
4.   Finally, when running the distilled network at test time to classify new inputs, use temperature 1.
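The four steps above can be sketched in PyTorch as follows. This is a minimal illustration, not a reference implementation. The helper names `teacher_loss`, `soft_labels`, `distillation_loss`, and `predict` are hypothetical. The sketch also assumes the soft labels are computed on the fly for each batch, with the teacher frozen, rather than precomputed once for the whole training set.

```python
import torch
import torch.nn.functional as F


def teacher_loss(logits: torch.Tensor, labels: torch.Tensor, T: float) -> torch.Tensor:
    # Step 1: ordinary cross-entropy, but on logits divided by the temperature T.
    return F.cross_entropy(logits / T, labels)


@torch.no_grad()
def soft_labels(teacher: torch.nn.Module, x: torch.Tensor, T: float) -> torch.Tensor:
    # Step 2: the teacher's temperature-T softmax, used as fixed targets.
    # Switches the teacher to eval mode and builds no autograd graph.
    teacher.eval()
    return F.softmax(teacher(x) / T, dim=1)


def distillation_loss(student_logits: torch.Tensor, targets: torch.Tensor, T: float) -> torch.Tensor:
    # Step 3: cross-entropy between the soft labels and the student's
    # temperature-T softmax, averaged over the batch.
    log_probs = F.log_softmax(student_logits / T, dim=1)
    return -(targets * log_probs).sum(dim=1).mean()


def predict(model: torch.nn.Module, x: torch.Tensor) -> torch.Tensor:
    # Step 4: at test time the distilled network runs at temperature 1,
    # i.e. on its raw logits.
    model.eval()
    with torch.no_grad():
        return model(x).argmax(dim=1)
```

Raising T flattens the teacher's softmax, so the soft labels carry information about the relative similarity of the wrong classes. The student is trained to match that information rather than one-hot labels.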



## Train the teacher

In [3]:
def train_step(model, dataloader, loss_fn, optimizer, temperature):
    # TODO: Return loss and accuracy for each epoch
    pass


def train_teacher(model, n_epochs, loader=trainloader, temp=100):
    # TODO: Log the accuracy and loss for each epoch
    pass

You can use a pre-trained resnet to speed up the training process.

In [None]:
teacher = ...

train_teacher(teacher, 15)

## Test the teacher

In [6]:
def test_clean(model, dataloader=testloader):
    # TODO: Return the clean accuracy of the model
    pass

Print the clean accuracy of the teacher.

In [None]:
print(f'Teacher Accuracy {test_clean(teacher):.2f}%')

## Train the student

In [26]:
def distill(model, teacher, dataloader, optimizer, T):
    # TODO: Get soft labels from teacher model
    # TODO: Get student model outputs
    # TODO: Compute the distillation loss
    # TODO: Return the accuracy (on real labels) and loss (on soft labels)
    pass


def train_student(model, teacher, n_epochs, loader=trainloader, temp=100):
    # TODO: Log the accuracy and loss for each epoch
    pass

This time use a `resnet18` without the pretrained weights.

In [None]:
student = ...

train_student(student, teacher, 15)

## Test the student

In [None]:
print(f'Student Accuracy {test_clean(student):.2f}%')

# Attack (15 points)

Implement the FGSM and PGD attacks and the `test_attack` function to report the robust accuracy for different values of epsilon.
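Here is one way to write the two attacks. The sketch makes two assumptions: the model takes images with pixel values in [0, 1], and it is already in `eval()` mode.

Because this notebook normalizes inputs with `mu`/`std`, you would need to do one of the following:
- Fold the normalization into the model.
- Clamp to `lower_limit`/`upper_limit` and express ϵ in normalized units.

The functions `fgsm` and `pgd` below are hypothetical stand-ins for the `attack_fgsm`/`attack_pgd` stubs.
- **FGSM** takes a single step of size ϵ along the sign of the loss gradient.
- **PGD** starts from a random point in the ϵ-ball and repeats smaller steps of size α, projecting back into the ball after each one.

```python
import torch
import torch.nn.functional as F


def fgsm(model, x, y, epsilon):
    # One signed-gradient step of size epsilon that increases the loss.
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad, = torch.autograd.grad(loss, x)
    x_adv = x + epsilon * grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()


def pgd(model, x, y, epsilon, alpha, num_iters=10):
    # Start from a uniformly random point in the L-inf ball around x.
    x_adv = (x + torch.empty_like(x).uniform_(-epsilon, epsilon)).clamp(0.0, 1.0)
    for _ in range(num_iters):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        x_adv = x_adv + alpha * grad.sign()
        # Project back onto the epsilon-ball, then onto the valid pixel range.
        x_adv = torch.min(torch.max(x_adv, x - epsilon), x + epsilon).clamp(0.0, 1.0)
    return x_adv.detach()
```

A step size well below ϵ (for example α = 2/255 for ϵ = 8/255) lets PGD explore the ball over several iterations. It does not have to jump straight to a corner.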

In [37]:
def attack_fgsm(model, x, y, epsilon):
    # TODO: Return perturbed input
    pass


def attack_pgd(model, x, y, epsilon, alpha=0.2, num_iters=10):
    # TODO: Return perturbed input
    pass


def test_attack(model, epsilon, attack=attack_fgsm, loader=testloader):
    # TODO: Return the robust accuracy for FGSM or PGD
    pass

Report the robust accuracy of the teacher under FGSM and PGD for `ϵ = [1, 2, 4, 8, 16]` (in units of 1/255).
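The sketch below shows how robust accuracy could be computed for any attack with the `attack(model, x, y, epsilon)` signature used by the stubs above. The name `robust_accuracy` and the `device` argument are illustrative. The attack itself needs gradients, so only the final forward pass runs under `torch.no_grad()`.

```python
import torch


def robust_accuracy(model, attack, loader, epsilon, device="cpu"):
    # Percentage of test points that are still classified correctly
    # after `attack(model, x, y, epsilon)` perturbs them.
    model.eval()
    correct, total = 0, 0
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        x_adv = attack(model, x, y, epsilon)  # gradients are needed here
        with torch.no_grad():
            correct += (model(x_adv).argmax(dim=1) == y).sum().item()
        total += y.numel()
    return 100.0 * correct / total
```

For example, `acc = robust_accuracy(teacher, attack_fgsm, testloader, eps / 255, device)`.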

In [None]:
epsilons = [1, 2, 4, 8, 16]

for eps in epsilons:
    acc = test_attack(teacher, eps / 255, attack_fgsm)
    print(f'FGSM with ϵ={eps}/255 has Accuracy: {acc:.2f}%')
    acc = test_attack(teacher, eps / 255, attack_pgd)
    print(f'PGD  with ϵ={eps}/255 has Accuracy: {acc:.2f}%')

Do the same for the student:

In [None]:
for eps in epsilons:
    acc = test_attack(student, eps / 255, attack_fgsm)
    print(f'FGSM with ϵ={eps}/255 has Accuracy: {acc:.2f}%')
    acc = test_attack(student, eps / 255, attack_pgd)
    print(f'PGD  with ϵ={eps}/255 has Accuracy: {acc:.2f}%')

What do you see?

`your response:`

# Transferring Adversarial Examples (15 points)

Train yet another model to use as the surrogate (set the temperature to 1).

In [None]:
model = ...

Print the surrogate accuracy.

Report the robust accuracy of the surrogate for `ϵ = [1, 2, 4, 8, 16]` (in units of 1/255).

Implement the following function to transfer attacks from a surrogate model to an oracle.
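Below is a possible shape for `transfer_attack`, sketched as a hypothetical `transfer_accuracy`. It uses FGSM on the surrogate and assumes inputs in [0, 1]. The key point is that gradients come only from the surrogate. The oracle is only queried for its predictions and is never differentiated.

```python
import torch
import torch.nn.functional as F


def transfer_accuracy(oracle, surrogate, loader, epsilon, device="cpu"):
    # Craft FGSM examples using only the surrogate's gradients, then measure
    # how often the oracle still classifies them correctly.
    surrogate.eval()
    oracle.eval()
    correct, total = 0, 0
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        x = x.clone().requires_grad_(True)
        loss = F.cross_entropy(surrogate(x), y)
        grad, = torch.autograd.grad(loss, x)
        x_adv = (x + epsilon * grad.sign()).clamp(0.0, 1.0).detach()
        with torch.no_grad():
            correct += (oracle(x_adv).argmax(dim=1) == y).sum().item()
        total += y.numel()
    return 100.0 * correct / total
```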

In [47]:
def transfer_attack(oracle, model, eps, loader=testloader):
    # TODO: Craft adversarial examples on the surrogate `model` and return the oracle's accuracy on them
    pass

Transfer FGSM attacks with `ϵ = [1, 2, 4, 8, 16]` (in units of 1/255) from your surrogate to the student.

In [None]:
for eps in epsilons:
    acc = transfer_attack(student, model, eps/255)
    print(f'FGSM with ϵ={eps}/255 has Accuracy: {acc:.2f}%')

- What can be inferred from these results?
- How are the accuracies of the student and the surrogate under attack related?
- Does Defensive Distillation obfuscate the gradients? Why?

`your response:`

# ZOO-Based Black-Box Attacks (25 points)

Following [Black-box Adversarial Attacks with Limited Queries and Information](https://arxiv.org/abs/1804.08598), first estimate the gradients with NES (natural evolution strategies), then attack the model using these estimates.
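Below is a minimal sketch of the query-limited setting from the paper. It computes an antithetic NES estimate of the loss gradient, $\hat g = \frac{1}{2n\sigma}\sum_{i=1}^{n}\big(f(x+\sigma u_i)-f(x-\sigma u_i)\big)u_i$ with $u_i \sim \mathcal{N}(0, I)$, and uses it in place of the true gradient inside a PGD-style loop.

The sketch treats the model as a black box `loss_fn(x) -> float`. This interface is hypothetical; one choice is the cross-entropy of the true label computed under `torch.no_grad()`. The sketch works on a single image in [0, 1], and the names `nes_gradient` and `nes_pgd` are illustrative. The paper's partial-information attack adds further steps on top of this, which are not shown here.

```python
import torch


def nes_gradient(loss_fn, x, num_samples=50, sigma=1e-3):
    # Antithetic NES estimate of d loss_fn / d x using 2 * num_samples queries.
    # loss_fn only needs to return a scalar per query; no backprop is used.
    grad = torch.zeros_like(x)
    for _ in range(num_samples):
        u = torch.randn_like(x)
        grad += (loss_fn(x + sigma * u) - loss_fn(x - sigma * u)) * u
    return grad / (2 * num_samples * sigma)


def nes_pgd(loss_fn, x, epsilon, alpha, num_steps, num_samples=50, sigma=1e-3):
    # Untargeted L-inf attack: ascend the estimated gradient, project each step.
    x_adv = x.clone()
    for _ in range(num_steps):
        g = nes_gradient(loss_fn, x_adv, num_samples, sigma)
        x_adv = x_adv + alpha * g.sign()
        x_adv = torch.min(torch.max(x_adv, x - epsilon), x + epsilon).clamp(0.0, 1.0)
    return x_adv
```

For a single image `x` with label `y`, one possible black box is `loss_fn = lambda z: F.cross_entropy(model(z.unsqueeze(0)), y.view(1)).item()`, evaluated inside `torch.no_grad()`. Each attack step then costs `2 * num_samples` model queries.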

In [49]:
def nes_gradient_estimate(model, x, y, epsilon, num_samples, sigma):
    # TODO: Return the estimated gradient
    pass

In [None]:
def partial_information_attack(model, x, y, epsilon, num_samples, sigma, num_steps, alpha):
    # TODO: Return the perturbed image
    pass

Now run this attack on your models and report the results. (You **DON'T** need to run the attack on the entire test set; that would take a long time!)

# Adversarially Robust Distillation (15 points)

In this section, we test another type of distillation, [Adversarially Robust Distillation](https://arxiv.org/abs/1905.09747), to see whether it produces a robust student.



1.   We distill a robust teacher from [RobustBench](https://robustbench.github.io/) onto a smaller architecture.
2.   We minimize the KL divergence between the temperature-scaled softmax outputs of the student and the teacher to ensure fidelity. (You can also include the classification loss, as described in the paper, but you may choose to ignore it.)
3.   At each step of the distillation, attack the student (with either FGSM or PGD) to find an adversarial example $X + \delta$ for each data point $X$. Then minimize $t^2 \times \text{KL}(S(X+\delta), T(X))$, where $S$ and $T$ are the student and teacher networks respectively, both evaluated with a softmax at temperature $t$.
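The objective in step 3 can be sketched as below. The sketch rests on several assumptions:
- `ard_loss` and `ard_step` are hypothetical names.
- The inner attack is a single FGSM step that maximizes the student's cross-entropy on the true labels.
- Inputs are in [0, 1].
- The notation $\text{KL}(S(X+\delta), T(X))$ leaves the direction of the divergence open. The sketch uses `F.kl_div`, which computes $\text{KL}(T \,\|\, S)$ when given the student's log-probabilities and the teacher's probabilities.

```python
import torch
import torch.nn.functional as F


def ard_loss(student_adv_logits, teacher_clean_logits, t=1.0):
    # t^2 * KL between temperature-t softmaxes. F.kl_div(input, target) takes
    # log-probabilities as `input`, probabilities as `target`, and computes
    # sum target * (log target - input), i.e. KL(teacher || student).
    log_p_student = F.log_softmax(student_adv_logits / t, dim=1)
    p_teacher = F.softmax(teacher_clean_logits / t, dim=1)
    return t * t * F.kl_div(log_p_student, p_teacher, reduction="batchmean")


def ard_step(student, teacher, x, y, optimizer, epsilon, t=1.0):
    # 1) FGSM on the student (cross-entropy on true labels) to get x + delta.
    x_req = x.clone().requires_grad_(True)
    grad, = torch.autograd.grad(F.cross_entropy(student(x_req), y), x_req)
    x_adv = (x + epsilon * grad.sign()).clamp(0.0, 1.0).detach()
    # 2) Teacher targets on the clean input; the teacher is never updated.
    with torch.no_grad():
        teacher_logits = teacher(x)
    # 3) Match the student's output on x + delta to the teacher's on x.
    loss = ard_loss(student(x_adv), teacher_logits, t)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The $t^2$ factor keeps the gradient magnitude of the softened loss roughly comparable across temperatures.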



In [None]:
! pip install git+https://github.com/RobustBench/robustbench.git

In [None]:
from robustbench.utils import load_model

teacher = load_model(model_name='Gowal2021Improving_R18_ddpm_100m', dataset='cifar10', threat_model='Linf')

In [None]:
def ard(student, teacher, dataloader, optimizer, eps, attack):
    # TODO
    pass


def adv_train_student(model, teacher, n_epochs, eps=8/255, loader=trainloader):
    # TODO
    pass

In [None]:
student = mobilenet_v2(weights=None)

# TODO: Adjust the classifier for the 10 CIFAR-10 classes and train the student


Now report the accuracy of the student on the test dataset.

In [None]:
# TODO: Clean accuracy

# TODO: FGSM with eps=8/255

# TODO: PGD with eps=8/255
