# Generating adversarial examples

In [None]:
! pip install -U torch torchvision --quiet

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms
import numpy as np
import matplotlib.pyplot as plt

# Exercise 1: Generating adversarial examples on MNIST data [45 mins]

In this exercise, we will work with a computer vision model and generate adversarial examples. The generation procedure is very similar to the counterfactual generation code that we developed last week.

First, follow the process below to load the model and generate some predictions.

(Adapted from [PyTorch](https://docs.pytorch.org/tutorials/beginner/fgsm_tutorial.html))

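Download the pretrained neural network model from [here](https://drive.google.com/file/d/1HJV2nUHJqclXQ8flKvcWmjZ-OU5DGatl/view?usp=drive_link): open the link and press `ctrl+S`. Save the file as `lenet_mnist_model.pth` in the same folder as this notebook, since the loading code below uses that relative path.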

Next, let us put together the structure of the model.

In [None]:
# LeNet Model definition
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 32, 3, 1)
        self.conv2 = nn.Conv2d(32, 64, 3, 1)
        self.dropout1 = nn.Dropout(0.25)
        self.dropout2 = nn.Dropout(0.5)
        self.fc1 = nn.Linear(9216, 128)
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = self.conv1(x)
        x = F.relu(x)
        x = self.conv2(x)
        x = F.relu(x)
        x = F.max_pool2d(x, 2)
        x = self.dropout1(x)
        x = torch.flatten(x, 1)
        x = self.fc1(x)
        x = F.relu(x)
        x = self.dropout2(x)
        x = self.fc2(x)
        output = F.log_softmax(x, dim=1)
        return output

In [None]:
# If you have a cuda GPU, feel free to set the device to "cuda"
device = "cpu"
print(f"Using {device} device")

# Initialize the network
model = Net().to(device)

model_path = "lenet_mnist_model.pth"  # This is the file we downloaded
# Load the pretrained model
model.load_state_dict(torch.load(model_path, map_location=device, weights_only=True))

# Set the model in evaluation mode. In this case this is for the Dropout layers
model.eval()

Now let us download the MNIST test data.

In [None]:
# MNIST Test dataset and dataloader declaration
test_loader = torch.utils.data.DataLoader(
    datasets.MNIST('../data', train=False, download=True, transform=transforms.Compose([
            transforms.ToTensor(),
            transforms.Normalize((0.1307,), (0.3081,)),  # Standardize with the MNIST mean and std (the result is not confined to [0, 1])
            ])),
        batch_size=1, shuffle=True)

Next, a helper function to convert the images back to the original pixel scale.

In [None]:
# restores the tensors to their original scale
def denorm(batch, mean=[0.1307], std=[0.3081]):
    """
    Convert a batch of tensors to their original scale.

    Args:
        batch (torch.Tensor): Batch of normalized tensors.
        mean (torch.Tensor or list): Mean used for normalization.
        std (torch.Tensor or list): Standard deviation used for normalization.

    Returns:
        torch.Tensor: batch of tensors without normalization applied to them.
    """
    if isinstance(mean, list):
        mean = torch.tensor(mean).to(device)
    if isinstance(std, list):
        std = torch.tensor(std).to(device)

    return batch * std.view(1, -1, 1, 1) + mean.view(1, -1, 1, 1)
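To see what the normalization and its inverse do to pixel values, here is a minimal sketch that applies the same arithmetic to three hand-picked pixel values. The tensor `pixels` is made up for illustration; the mean and standard deviation are the ones used above. Normalized MNIST values range from roughly -0.42 to 2.82, which is why any clamping to [0, 1] should happen after undoing the normalization.

```python
import torch

mean, std = 0.1307, 0.3081  # MNIST statistics used by transforms.Normalize above

# Illustrative pixel intensities on the original [0, 1] scale
pixels = torch.tensor([0.0, 0.5, 1.0])

# What transforms.Normalize computes for a single-channel image
normalized = (pixels - mean) / std

# What denorm computes: the inverse mapping back to [0, 1]
restored = normalized * std + mean

print("normalized:", normalized)
print("restored:  ", restored)
```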

Let us compute the model accuracy on the test set.

In [None]:
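import tqdm

labels = []
preds = []
# Gradients are not needed for evaluation, so disable them to save time and memory
with torch.no_grad():
    for x, y in tqdm.tqdm(test_loader):
        x = x.to(device)  # keep the input on the same device as the model
        y_hat = model(x).argmax(-1).item()
        labels.append(y.item())
        preds.append(y_hat)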

In [None]:
labels = np.array(labels)
preds = np.array(preds)
print(f"Accuracy: {(labels==preds).mean()}")

Now let us plot the last image from the evaluation loop.

In [None]:
plt.figure()
plt.imshow(denorm(x).squeeze().cpu(), cmap="gray")  # drop batch and channel dims; move to CPU for plotting
plt.xticks([], [])
plt.yticks([], [])
plt.show()


## Your task

Write a function that, given an input example and a target label, generates an adversarial example using the FGSM method.

Plot the original and the adversarial example.
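One possible starting point is sketched below. Standard FGSM is untargeted and adds the sign of the gradient of the true-label loss; since the task asks for a target label, this sketch uses the targeted variant, which subtracts the sign of the gradient of the target-label loss. The function name `fgsm_targeted`, the value of `epsilon`, and the small `toy_model` (a stand-in so the block runs on its own) are assumptions for illustration; in the notebook you would pass the pretrained `model` and an image from `test_loader` instead.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def fgsm_targeted(model, x, target, epsilon, mean=0.1307, std=0.3081):
    """One-step targeted FGSM on a normalized, single-channel input batch.

    Assumes `model` returns log-probabilities (as `Net` does) and that `x`
    was normalized with the given `mean` and `std`.
    """
    x = x.clone().detach().requires_grad_(True)
    # Loss with respect to the label we want the model to predict
    loss = F.nll_loss(model(x), target)
    grad, = torch.autograd.grad(loss, x)

    # Take the step in pixel space so epsilon is measured on the [0, 1] scale
    x_pixels = x.detach() * std + mean
    # Targeted attack: move against the gradient to lower the target loss
    x_adv_pixels = torch.clamp(x_pixels - epsilon * grad.sign(), 0.0, 1.0)
    # Re-apply the normalization the model expects
    return (x_adv_pixels - mean) / std


# Hypothetical stand-in model and input, only so that the sketch is runnable
torch.manual_seed(0)
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10), nn.LogSoftmax(dim=1)).eval()
x = (torch.rand(1, 1, 28, 28) - 0.1307) / 0.3081
target = torch.tensor([3])

x_adv = fgsm_targeted(toy_model, x, target, epsilon=0.1)

with torch.no_grad():
    loss_before = F.nll_loss(toy_model(x), target).item()
    loss_after = F.nll_loss(toy_model(x_adv), target).item()
print(f"target loss before: {loss_before:.3f}, after: {loss_after:.3f}")
print("prediction after attack:", toy_model(x_adv).argmax(-1).item())
```

A single step is not guaranteed to reach the target label, so it is worth checking the prediction and trying a few values of `epsilon`. For the plot, both images can be shown with `plt.imshow` after undoing the normalization, as in the plotting cell above.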

In [None]:
# Your code here

# Exercise 2: Counterfactual examples with sparsity [25 mins]

Take the Census Income prediction task from the last lecture. Generate counterfactuals with an L1 distance metric as part of the objective function.

Try different strengths of the L1 penalty. What differences do you observe w.r.t. the original counterfactual?

You can use the solution from last week. It's already uploaded to Moodle.
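As a rough illustration of the idea, the sketch below adds an L1 term to a gradient-based counterfactual search. It does not use the Census Income model: `toy_model` is a hypothetical fixed linear classifier with four features, and the function name `l1_counterfactual`, the penalty weights, the step count, and the learning rate are all assumptions chosen so that the block runs on its own.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def l1_counterfactual(model, x, target, lam, steps=300, lr=0.05):
    """Search for a counterfactual by minimizing prediction loss + lam * L1 distance.

    Assumes `model` maps a (1, d) float tensor to a single logit.
    """
    x_cf = x.clone().detach().requires_grad_(True)
    optimizer = torch.optim.Adam([x_cf], lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        pred_loss = F.binary_cross_entropy_with_logits(model(x_cf), target)
        l1_dist = (x_cf - x).abs().sum()  # proximity to the original input
        (pred_loss + lam * l1_dist).backward()
        optimizer.step()
    return x_cf.detach()


# Hypothetical stand-in for the Census Income classifier: a fixed linear model
toy_model = nn.Linear(4, 1)
with torch.no_grad():
    toy_model.weight.copy_(torch.tensor([[2.0, 0.5, -0.3, 0.1]]))
    toy_model.bias.fill_(-1.0)
for p in toy_model.parameters():
    p.requires_grad_(False)  # only the input is optimized

x = torch.zeros(1, 4)      # original input, predicted as class 0 (logit -1)
target = torch.ones(1, 1)  # desired counterfactual class

results = {}
for lam in [0.0, 0.1, 1.0]:
    x_cf = l1_counterfactual(toy_model, x, target, lam)
    l1 = (x_cf - x).abs().sum().item()
    logit = toy_model(x_cf).item()
    results[lam] = (l1, logit)
    print(f"lam={lam}: L1 distance={l1:.2f}, logit={logit:.2f}, x_cf={x_cf.squeeze(0).tolist()}")
```

A penalty that is too strong can keep the search from crossing the decision boundary, so it is worth checking the predicted label for every strength you try. Plain gradient steps also rarely produce exact zeros, so small residual changes may need to be thresholded when counting changed features.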

In [None]:
# Your code here

# Exercise 3: Feature-aware distance function and post-processing counterfactuals [25 mins]

Now change the distance function so that, for categorical features, you do not use the L1 distance but the indicator $\mathbb{I}[x \neq x_c]$, which is 1 if the feature value differs from the original and 0 otherwise.
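A minimal sketch of such a feature-aware distance is shown below, written with NumPy for a single example. It assumes that categorical features are stored as integer codes rather than one-hot vectors; the function name `mixed_distance` and the example vectors are made up for illustration.

```python
import numpy as np


def mixed_distance(x, x_cf, categorical_idx):
    """L1 distance on numeric features plus a mismatch count on categorical ones.

    Assumes 1-D feature vectors with integer-coded categorical features.
    """
    x = np.asarray(x, dtype=float)
    x_cf = np.asarray(x_cf, dtype=float)

    # Boolean mask that marks the categorical positions
    is_cat = np.zeros(x.shape[-1], dtype=bool)
    is_cat[list(categorical_idx)] = True

    numeric_part = np.abs(x[~is_cat] - x_cf[~is_cat]).sum()
    categorical_part = (x[is_cat] != x_cf[is_cat]).sum()  # sum of indicators
    return float(numeric_part + categorical_part)


# Illustrative example: features 1 and 2 are categorical codes
x = [0.5, 2, 1, 0.2]
x_cf = [0.7, 2, 3, 0.2]
d = mixed_distance(x, x_cf, categorical_idx=[1, 2])
print(d)  # 0.2 from the numeric change plus 1 for the changed category, up to rounding
```

The indicator has zero gradient almost everywhere, so this version is suited to evaluating or comparing counterfactuals. Using it inside a gradient-based search would need some differentiable surrogate, which is a design choice left open here.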

Also, post-process the counterfactuals: change as many features as possible back to their original values while still preserving the counterfactual label.
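One simple way to approach the post-processing step is a greedy pass over the changed features, sketched below. The function name `revert_features`, the `predict` callable (assumed to map a 1-D feature vector to a class label), and the toy linear classifier are hypothetical. Greedy reverting is a heuristic and is not guaranteed to find the largest set of features that can be reset.

```python
import numpy as np


def revert_features(x, x_cf, predict, target):
    """Greedily reset counterfactual features to their original values.

    A feature is reset only if the prediction stays equal to `target`.
    """
    x = np.asarray(x, dtype=float)
    x_sparse = np.array(x_cf, dtype=float)
    # Try the smallest changes first; they are the most likely to be unnecessary
    order = np.argsort(np.abs(x_sparse - x))
    for i in order:
        if x_sparse[i] == x[i]:
            continue  # feature was never changed
        candidate = x_sparse.copy()
        candidate[i] = x[i]
        if predict(candidate) == target:
            x_sparse = candidate  # the change was not needed, keep the reset
    return x_sparse


# Hypothetical linear classifier used only to make the sketch runnable
w = np.array([2.0, 0.5, -0.3, 0.1])
b = -1.0


def predict(v):
    return int(v @ w + b > 0)


x = np.zeros(4)                          # original input, class 0
x_cf = np.array([1.0, 0.4, -0.2, 0.3])   # counterfactual, class 1

x_sparse = revert_features(x, x_cf, predict, target=1)
print("before:", x_cf, "after:", x_sparse, "label:", predict(x_sparse))
```

In this toy case only the first feature is needed to keep the counterfactual label, so the other three changes are reset to their original values.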

In [None]:
# Your code here