# SPML HW4: Data Poisoning & Backdoor Attacks

It is **strongly recommended** that you read the entire notebook before you start coding as this will help you better understand the tasks and avoid mistakes and redundant implementations.

In [None]:
######### Make sure to RUN this cell #########
name = ''
std_id = ''
##############################################

In [None]:
import numpy as np
from tqdm import trange, tqdm
from sklearn.manifold import TSNE
from matplotlib import pyplot as plt

import torch
from torch import nn
from torch.optim import Adam
import torch.nn.functional as F
from torch.nn import CrossEntropyLoss, MSELoss
from torch.utils.data import DataLoader, TensorDataset, ConcatDataset

from torchvision import transforms
from torchvision.datasets.cifar import CIFAR10
from torchvision.models import resnet18, ResNet18_Weights


device = 'cuda' if torch.cuda.is_available() else 'cpu'
device

# CIFAR10 Dataset (5 points)

Load the CIFAR10 dataset.

In [None]:
# TODO: Load the CIFAR10 dataset

trainloader = ...
testloader = ...

print(f'The trainloader consists of {len(trainloader.dataset)} samples.')
print(f'The testloader consists of {len(testloader.dataset)} samples.')

# Pre-Trained ResNet18 (5 points)

Load the pre-trained resnet18 architecture from torchvision.

In [None]:
class ResNet18(nn.Module):
    def __init__(self):
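        super().__init__()
        # Drop the final average-pool and fc layers. For 32x32 inputs the
        # remaining feature map is 512x1x1, which flattens to 512 features.
        self.feature_extractor = nn.Sequential(
            *list(resnet18(weights=ResNet18_Weights.DEFAULT).children())[:-2]
            )
        self.fc = nn.Linear(512, 10)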

    def get_features(self, x):
        features = self.feature_extractor(x)
        return torch.flatten(features, start_dim=1)

    def forward(self, x):
        logits = self.fc(self.get_features(x))
        return logits

In [None]:
# TODO: Load the pre-trained model
clean_model = ...

# Fine-tuning ResNet18 (5 points)

Train and fine-tune the model on CIFAR10 data.

In [None]:
def train_model(model, loader, optimizer, criterion, n_epochs=30):
    # TODO: Train the model on the loader for n_epochs
    pass

In [None]:
# TODO: Define the optimizer, criterion and train the model

criterion = ...

optimizer = ...

train_model(clean_model, trainloader, optimizer, criterion)

# TODO: Save the model weights for future use


# Testing (5 points)

Report the clean accuracy of the model you trained.

In [None]:
def test_model(model, loader):
    # TODO: Return the accuracy on the loader
    pass

In [None]:
acc = test_model(clean_model, testloader)
print(f'Clean accuracy on the clean model is {acc:.2f}%')

# Poisoning Example Generation (20 points)

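We want to implement Algorithm 1 from the [Poison Frogs!](https://arxiv.org/abs/1804.00792) paper. Here $f$ is the feature extractor (the network up to its penultimate layer), $t$ is the target instance, $b$ is the base instance, $\lambda$ is the learning rate (`lr`), and $\beta$ (`beta`) controls how close the poison stays to the base. The procedure is as follows: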


1.   Initialize x: $x_0 \leftarrow b$
2.   Define: $L_p = \| f(x) - f(t) \|^2$
3.   For $i=1$ to `max_iters` do:
  
  3.1. Forward step: $\hat{x}_i = x_{i-1} - \lambda \nabla L_p(x_{i-1})$

  3.2. Backward step: $x_i = \frac{\hat{x}_i + \lambda \beta b}{1 + \beta \lambda}$
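
The two steps above can be sketched on a toy problem. The example below is only an illustration under stated assumptions: it replaces the network with a hypothetical linear feature map $f(x) = Wx$, so the gradient of $L_p$ can be written by hand as $2 W^\top (Wx - Wt)$. The names `poison_linear` and `W` are made up for this sketch; with a real network the gradient would come from autograd instead.

```python
import numpy as np

def poison_linear(W, t, b, lr=0.01, beta=0.25, max_iters=1000):
    """Forward-backward splitting for a toy linear feature map f(x) = W @ x."""
    x = b.copy()                 # step 1: initialize the poison at the base
    f_t = W @ t                  # target features are fixed throughout
    for _ in range(max_iters):
        grad = 2.0 * W.T @ (W @ x - f_t)              # gradient of ||f(x) - f(t)||^2
        x_hat = x - lr * grad                         # forward (gradient) step
        x = (x_hat + lr * beta * b) / (1.0 + beta * lr)  # backward (proximal) step
    return x

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 8))      # hypothetical feature map: 8 inputs -> 4 features
t = rng.normal(size=8)           # toy target
b = rng.normal(size=8)           # toy base
poison = poison_linear(W, t, b)

# Feature-space distance to the target before and after the optimization.
dist_before = np.sum((W @ b - W @ t) ** 2)
dist_after = np.sum((W @ poison - W @ t) ** 2)
print(dist_before, dist_after)
```

The forward step pulls the poison toward the target in feature space, while the backward step pulls it back toward the base in input space, so a larger `beta` keeps the poison looking more like the base.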



In [None]:
def poisoning_example_generation(model, t, b, lr, beta=0.25, max_iters=1000):
    # TODO: Implement the given algorithm
    pass

Visualize the following sample (*don't change the index*). This is the sample we are going to use as our base.

In [None]:
base = testloader.dataset[int(std_id) % 846][0]
# TODO: Visualize the base image and print its class


What class does the model think this base image belongs to?
Print the logits and the predicted class.

If the base image is misclassified, increment the index until you find an image that is classified correctly (*now you may change the index if necessary*).

In [None]:
# TODO: Print the output of the model on the base image


Now choose another image as your target. Visualize this target image and print its correct label and the model's prediction.

In [None]:
# TODO: Select, visualize and show the prediction for the target instance


Now use `poisoning_example_generation` to generate a poison instance from your base and target images. Visualize this poison instance and print the model's prediction on it.

In [None]:
# TODO: Generate poison instance, visualize it and predict its label

poison = ...


# Poisoned Dataloader (5 points)

Add the poisoned instance to the trainloader and call the resulting loader `poisoned_loader`.
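
One possible way to extend a dataset is sketched below, using the `TensorDataset` and `ConcatDataset` classes imported at the top of the notebook. The clean set here is a small random stand-in rather than CIFAR10, and `base_class = 3` is an arbitrary assumption for illustration; the poison is labelled with the base class because it is meant to look like a base-class image.

```python
import torch
from torch.utils.data import ConcatDataset, DataLoader, TensorDataset

# Stand-in for the clean training set: 8 random "images" with integer labels.
clean_set = TensorDataset(torch.rand(8, 3, 32, 32), torch.randint(0, 10, (8,)))

# A single poison image, labelled with the (assumed) base class.
poison = torch.rand(3, 32, 32)
base_class = 3
poison_set = TensorDataset(poison.unsqueeze(0), torch.tensor([base_class]))

# ConcatDataset chains the two datasets without copying the clean data.
poisoned_set = ConcatDataset([clean_set, poison_set])
poisoned_loader = DataLoader(poisoned_set, batch_size=4, shuffle=True)

print(len(clean_set), len(poisoned_loader.dataset))
```

With the real CIFAR10 dataset the labels are plain Python integers and the images pass through the dataset's transforms, so the poison's label type and preprocessing may need to match those of the clean samples for batching to work.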

In [None]:
# TODO: Add poison instance to create the poisoned trainloader

poisoned_loader = ...

print(f'The trainloader consists of {len(trainloader.dataset)} samples.')
print(f'The poisoned trainloader consists of {len(poisoned_loader.dataset)} samples.')

# Poisoned Training (5 points)

 Make a copy of your clean model and call it `attacked_model`. Fine-tune the last layer of the `attacked_model` on the `poisoned_loader`.
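
A minimal sketch of last-layer fine-tuning follows. It uses a hypothetical `TinyNet` that only mimics the `feature_extractor` / `fc` layout of the `ResNet18` class above, and it assumes that freezing the extractor's parameters and handing only `fc` to the optimizer is the intended setup.

```python
import copy
import torch
from torch import nn
from torch.optim import Adam

class TinyNet(nn.Module):
    """Hypothetical stand-in with the same attribute names as ResNet18 above."""
    def __init__(self):
        super().__init__()
        self.feature_extractor = nn.Sequential(nn.Flatten(), nn.Linear(12, 8), nn.ReLU())
        self.fc = nn.Linear(8, 10)

    def forward(self, x):
        return self.fc(self.feature_extractor(x))

clean_model = TinyNet()
attacked_model = copy.deepcopy(clean_model)   # independent copy of all weights

# Freeze the feature extractor so that only the last layer is updated.
for p in attacked_model.feature_extractor.parameters():
    p.requires_grad = False

optimizer = Adam(attacked_model.fc.parameters(), lr=1e-3)
trainable = [n for n, p in attacked_model.named_parameters() if p.requires_grad]

# One training step on random data to confirm the extractor does not move.
before = attacked_model.feature_extractor[1].weight.clone()
x, y = torch.rand(4, 3, 2, 2), torch.randint(0, 10, (4,))
loss = nn.functional.cross_entropy(attacked_model(x), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
extractor_unchanged = torch.equal(before, attacked_model.feature_extractor[1].weight)
print(trainable, extractor_unchanged)
```

Note that BatchNorm layers inside a frozen extractor still update their running statistics while the module is in training mode, so putting the extractor in `eval()` mode during fine-tuning may also be worth considering.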

In [None]:
# TODO: Fine-tune the last layer of the attacked model using the poisoned trainloader

attacked_model = ...



# TODO: Save the model weights for future use


Report the clean accuracy of the `attacked_model` on the testloader.

In [None]:
acc = test_model(attacked_model, testloader)
print(f'Clean accuracy on the attacked model is {acc:.2f}%')

Now report the model's prediction on the base, target and poison instances.

In [None]:
# TODO: Predict the label of the base, target, and poison


Was the attack successful? Why? What can we do to improve the attack success rate?


`your response:`

# Feature Space Visualization (20 points)

Using `t-SNE`, visualize the feature space of the `clean_model` and the `attacked_model` with data from the base and target classes (use a different color for each class). Visualize the poison instance as well and mark it differently (e.g. you can use stars to show poison samples).

***Note: To avoid redundancy, implement this function in such a way that it supports multiple poison samples as opposed to just one!***
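
The sketch below shows one way to handle several poison samples at once. It uses random vectors as stand-ins for the features returned by `get_features`, so the array names and sizes are assumptions. Since scikit-learn's `TSNE` offers no way to project new points into an existing embedding, the base, target and poison features are embedded jointly and split apart afterwards.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# Random stand-ins for extracted features (rows = samples, columns = features).
base_feats = rng.normal(loc=0.0, size=(40, 16))
target_feats = rng.normal(loc=3.0, size=(40, 16))
poison_feats = rng.normal(loc=3.0, size=(5, 16))   # any number of poisons works

# Embed everything in one call, then slice the result back into groups.
all_feats = np.concatenate([base_feats, target_feats, poison_feats])
embedded = TSNE(n_components=2, perplexity=20, random_state=0).fit_transform(all_feats)

n_base, n_target = len(base_feats), len(target_feats)
base_2d = embedded[:n_base]
target_2d = embedded[n_base:n_base + n_target]
poison_2d = embedded[n_base + n_target:]
print(base_2d.shape, target_2d.shape, poison_2d.shape)
```

Each group can then be drawn with its own `plt.scatter` call, for example with `marker='*'` for the poison points.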

In [None]:
def feature_space_visualizaion(model, loader, poison, base_class, target_class):
    # TODO: Visualize the feature space using t-SNE
    pass

In [None]:
# TODO: Visualize the clean model


In [None]:
# TODO: Visualize the attacked model


What do you see? What did you expect? Why?

`your response:`

# Watermark Poisoning (10 points)

A base watermarked image with target opacity $\gamma$ is formed by taking a weighted combination of the base and the target images.
$$b \leftarrow \gamma \cdot t + (1 - \gamma ) \cdot b$$

We use this method to boost the power of poison attacks.
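
As a quick numeric illustration of the blend, the sketch below applies the formula to two constant toy images. The helper name `watermark` is hypothetical, and pixel values are assumed to lie in $[0, 1]$, in which case the convex combination stays in that range as well.

```python
import numpy as np

def watermark(t, b, gamma=0.3):
    """Blend the target t into the base b with opacity gamma."""
    return gamma * t + (1.0 - gamma) * b

t = np.full((3, 2, 2), 1.0)   # toy all-white target image
b = np.zeros((3, 2, 2))       # toy all-black base image
blended = watermark(t, b, gamma=0.3)
print(blended[0])
```

With $\gamma = 0$ the result is the base unchanged, and with $\gamma = 1$ it is the target itself, so small values of $\gamma$ keep the watermark faint.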

In [None]:
def poisoning_watermark_generation(t, b, gamma=0.3):
    # TODO: Perform watermarking
    pass

Generate 100 poisons by adding a low-opacity watermark of the target instance to 100 base images and visualize the results. (*All base samples must come from the same class.*)

In [None]:
# TODO: Generate 100 poison samples and visualize the results


# Watermark Dataloader (5 points)

Add the watermark instances to the trainloader and call the resulting loader `watermark_loader`.

In [None]:
# TODO: Add watermark instances to create the watermark trainloader

watermark_loader = ...

print(f'The trainloader consists of {len(trainloader.dataset)} samples.')
print(f'The poisoned trainloader consists of {len(poisoned_loader.dataset)} samples.')
print(f'The watermark loader consists of {len(watermark_loader.dataset)} samples.')

# Watermark Training (5 points)

Repeat the training steps with another copy of the `clean_model`: train this new network on the `watermark_loader` and report its clean accuracy.

In [None]:
# TODO: Repeat training steps for the new model


# Watermarking Results (10 points)

Now check if the poisoning attack is successful.

In [None]:
# TODO: Evaluate the attack


Using the `feature_space_visualizaion` function you wrote earlier (modify it if necessary), visualize the feature space of this model as well.

In [None]:
# TODO: Visualize the attacked model


Summarize your findings.

`your response:`