# About this assignment
In this assignment, you will implement two adversarial attacks against ResNet18 (FGSM and PGD), as well as two defenses against adversarial attacks (adversarial training and SAP). There are three goals for this assignment:
1. Learning about and evaluate base adversarial attacks and defenses in a simple setting.
2. Learning to use Pytorch's Lightning framework to simplify and modularize your code.
3. Learning to use Pytorch to adjust/manipulate the *architecture* of a pretrained model.

# Imports

If you're running this notebook in Colab, you'll want to uncomment and run the following line.

If you're running this notebook locally or on a Grace cluster, you can separately install any packages you use. 

Note: for this assignment, if your local machine is not GPU-compatible, you will probably want to use Colab or a Grace cluster.

In [1]:
!pip install lightning



In [2]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

%matplotlib inline

import os
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torchvision
import lightning as L
from torchmetrics import Accuracy
from lightning.pytorch.callbacks import ModelCheckpoint, LearningRateMonitor, EarlyStopping

# Config
Just run the next code block, but double check the one after that.

In [3]:
device = torch.device("cuda:0") if torch.cuda.is_available() else torch.device("cpu")
print(device)


CLASSES = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
NUM_CLASSES = len(CLASSES)

cpu


In [4]:
# If you run into memory issues, you can reduce the batch size
BATCH_SIZE = 128

# Change these to the relative paths you'd like to use
# for the CIFAR-10 data and model checkpoints
DATA_PATH = 'data/'
CHECKPOINT_PATH = 'models/checkpoints/'

# The different models we'll be fine-tuning
SAVE_NAMES = [
    'baseline',
    'adv_train',    # Adversarial training a la Madry et al.
    'SAP_conv', # Full SAP post-convolution a la Dhillon et al.
]
SAVE_NAMES = {
    name: os.path.join(CHECKPOINT_PATH, name) for name in SAVE_NAMES
}

# Results dictionary
We set up for storing experiment results here. Just run the following block.

In [5]:
models = {name: None for name in SAVE_NAMES.keys()}
attacks = {
    'id': None,
    'fgsm': None,
    'pgd': None,
}

results_dic = {
    'model': [],
    'attack': [],
    'top_k': [],
    'accuracy': [],
}
results_trainer = L.Trainer(accelerator='auto', devices=1)

GPU available: True (mps), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs


# Data processing
You can just run these three blocks of code. They import the CIFAR10 data from Torchvision and split them into train/validation/test sets.

We also takes a sample for later visualization purposes.

In [6]:
# Pretrained normalization based on https://discuss.pytorch.org/t/how-to-preprocess-input-for-pre-trained-networks/683
means, stds = [0.49139968, 0.48215827, 0.44653124], [0.24703233, 0.24348505, 0.26158768]
means, stds = np.array(means), np.array(stds)

In [7]:
import torchvision.transforms.v2 as transforms

def get_cifar_loaders(batch_size):
    # Transformations applied to images before passing them to the model
    transform = transforms.Compose(
        [
            # transforms.Resize(256),
            # transforms.CenterCrop(224),
            transforms.ToImage(), # Converts to tensor
            transforms.ToDtype(torch.float32, scale=True),
            transforms.Normalize(mean=means, std=stds)
        ])

    trainset = torchvision.datasets.CIFAR10(root=DATA_PATH, train=True,
                                            download=True, transform=transform)
    # The train set is of size 50000
    trainset, valset = torch.utils.data.random_split(trainset, [40000, 10000])
    trainloader = torch.utils.data.DataLoader(trainset, batch_size=batch_size,
                                            shuffle=True, num_workers=2)
    valloader = torch.utils.data.DataLoader(valset, batch_size=batch_size,
                                            shuffle=False, num_workers=2)

    testset = torchvision.datasets.CIFAR10(root=DATA_PATH, train=False,
                                        download=True, transform=transform)
    testloader = torch.utils.data.DataLoader(testset, batch_size=batch_size,
                                            shuffle=False, num_workers=2)
    
    return trainloader, valloader, testloader

In [8]:
trainloader, valloader, testloader = get_cifar_loaders(BATCH_SIZE)
sample_images, sample_labels = next(iter(trainloader))
sample_images, sample_labels = sample_images.to(device), sample_labels.to(device)

Files already downloaded and verified
Files already downloaded and verified


# Base Resnet Class
Here we've implemented a ResNet18 model in the Pytorch Lightning framework. Here is [Lightning's documentation](https://lightning.ai/docs/pytorch/stable/).

The main code to look at are ```__init__``` and ```training_step```. If you'd like to use Lightning on your own project, the other methods may be useful reference, but as always we defer to the documentation.

In [9]:
class LResnet(L.LightningModule):
    def __init__(self, adv_train_method = None): #EDITED
        super().__init__()
        # Set loss module
        self.loss_module = nn.CrossEntropyLoss()
        # Example input for visualizing the graph in Tensorboard
        # CIFAR-10 images are 32x32
        self.example_input_array = torch.zeros((1, 3, 32, 32), dtype=torch.float32)
        self.num_target_classes = 10
        # Accuracy metric for training logs and testing evaluation
        self.accuracy = Accuracy(task="multiclass", num_classes=self.num_target_classes, top_k=1)
        # Adversarial generation method for training
        self.adv_train_method = adv_train_method # EDITED

        # Load pretrained model weights
        self.model = torchvision.models.resnet18(
            weights=torchvision.models.ResNet18_Weights.IMAGENET1K_V1
        )
        # Change final layer from 1000 (ImageNet) classes to 10 (CIFAR-10) classes
        self.model.fc = nn.Linear(self.model.fc.in_features, self.num_target_classes)

    def forward(self, imgs):
        return self.model(imgs)

    def configure_optimizers(self):
        optimizer = optim.AdamW(self.parameters(), lr=1e-5, weight_decay=0.1)
        return [optimizer] # Lightning has enables multi-optimizer training, e.g. for GANs
    
    def training_step(self, batch, batch_idx):
        imgs, labels = batch
        if self.adv_train_method is not None:
            opt = self.optimizers()
            opt.zero_grad()
            # Change the images to adversarial examples
            imgs = self.adv_train_method(self.model, imgs, labels)
            # adv_train_method sets the model to eval
            self.model.train()
            # Reset accumulated gradients from adversarial generation
            opt.zero_grad()
        # Once we have the correct training images,
        # we can use the usual Lightning forward pass
        outputs = self.model(imgs)
        loss = self.loss_module(outputs, labels)
        acc = self.accuracy(outputs, labels)
        # Log accuracy and loss per-batch for Tensorboard
        self.log('train_acc', acc, on_step=False, on_epoch=True)
        self.log('train_loss', loss, prog_bar=True)
        return loss
    
    def validation_step(self, batch, batch_idx):
        imgs, labels = batch
        outputs = self.model(imgs)
        loss = self.loss_module(outputs, labels)
        self.log('val_loss', loss)
        # No need to return to call backward() on the loss
    
    def test_step(self, batch, batch_idx):
        imgs, labels = batch
        outputs = self.model(imgs)
        acc = self.accuracy(outputs, labels)
        self.log("test_acc", acc, prog_bar=True)
        # No need to return to call backward() on the loss

## Example training code
Run the following code block. It is an example of how to code a training loop with Lightning. If you change hyperparameters for your experiments later, you will need to comment at the end on the changes you've made.

In [10]:
save_key = 'baseline'
baseline_model = LResnet()
baseline_trainer = L.Trainer(
    default_root_dir = SAVE_NAMES[save_key], # Where to save the model
    accelerator='auto',
    devices=1,
    max_epochs=30,
    callbacks=[
        ModelCheckpoint( # Save the best model by validation loss
            dirpath=SAVE_NAMES[save_key],
            monitor='val_loss',
            save_top_k=1,
            mode='min',
            save_weights_only=True,
            every_n_epochs=1,
        ),
        EarlyStopping( # Stop training early if val_loss doesn't improve
            monitor='val_loss', 
            patience=3, 
            verbose=True, 
            mode='min',
        ),
        LearningRateMonitor('epoch') # Log learning rate each epoch
    ],
)

# These two lines are optional, but they make the Tensorboard logs look nicer
baseline_trainer.logger._log_graph = True  # If True, we plot the computation graph in tensorboard
baseline_trainer.logger._default_hp_metric = None  # Optional logging argument that we don't need

# This is all you need to train the model
baseline_trainer.fit(baseline_model, trainloader, valloader)
# Load best checkpoint after training
baseline_model = LResnet.load_from_checkpoint(
    baseline_trainer.checkpoint_callback.best_model_path
).to(device)

# Store the model in the dictionary
models[save_key] = baseline_model

GPU available: True (mps), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
/Users/aj/anaconda3/lib/python3.11/site-packages/lightning/pytorch/callbacks/model_checkpoint.py:653: Checkpoint directory /Users/aj/Downloads/models/checkpoints/baseline exists and is not empty.

  | Name        | Type               | Params | In sizes       | Out sizes
--------------------------------------------------------------------------------
0 | loss_module | CrossEntropyLoss   | 0      | ?              | ?        
1 | accuracy    | MulticlassAccuracy | 0      | ?              | ?        
2 | model       | ResNet             | 11.2 M | [1, 3, 32, 32] | [1, 10]  
--------------------------------------------------------------------------------
11.2 M    Trainable params
0         Non-trainable params
11.2 M    Total params
44.727    Total estimated model params size (MB)


Sanity Checking: |                                        | 0/? [00:00<?, ?it/s]

/Users/aj/anaconda3/lib/python3.11/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:436: Consider setting `persistent_workers=True` in 'val_dataloader' to speed up the dataloader worker initialization.
/Users/aj/anaconda3/lib/python3.11/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:436: Consider setting `persistent_workers=True` in 'train_dataloader' to speed up the dataloader worker initialization.


Training: |                                               | 0/? [00:00<?, ?it/s]

Validation: |                                             | 0/? [00:00<?, ?it/s]

Metric val_loss improved. New best score: 1.524


Validation: |                                             | 0/? [00:00<?, ?it/s]

Metric val_loss improved by 0.329 >= min_delta = 0.0. New best score: 1.195


Validation: |                                             | 0/? [00:00<?, ?it/s]

Metric val_loss improved by 0.159 >= min_delta = 0.0. New best score: 1.036


Validation: |                                             | 0/? [00:00<?, ?it/s]

Metric val_loss improved by 0.090 >= min_delta = 0.0. New best score: 0.946


Validation: |                                             | 0/? [00:00<?, ?it/s]

Metric val_loss improved by 0.061 >= min_delta = 0.0. New best score: 0.885


Validation: |                                             | 0/? [00:00<?, ?it/s]

Metric val_loss improved by 0.043 >= min_delta = 0.0. New best score: 0.843


Validation: |                                             | 0/? [00:00<?, ?it/s]

Metric val_loss improved by 0.026 >= min_delta = 0.0. New best score: 0.817


Validation: |                                             | 0/? [00:00<?, ?it/s]

Metric val_loss improved by 0.020 >= min_delta = 0.0. New best score: 0.796


Validation: |                                             | 0/? [00:00<?, ?it/s]

Metric val_loss improved by 0.011 >= min_delta = 0.0. New best score: 0.785


Validation: |                                             | 0/? [00:00<?, ?it/s]

Metric val_loss improved by 0.008 >= min_delta = 0.0. New best score: 0.776


Validation: |                                             | 0/? [00:00<?, ?it/s]

Metric val_loss improved by 0.004 >= min_delta = 0.0. New best score: 0.773


Validation: |                                             | 0/? [00:00<?, ?it/s]

Validation: |                                             | 0/? [00:00<?, ?it/s]

Validation: |                                             | 0/? [00:00<?, ?it/s]

Monitored metric val_loss did not improve in the last 3 records. Best score: 0.773. Signaling Trainer to stop.


# Adversarial attacks
Implement the FGSM and PGD attacks. These are white-box evasion attacks, and they were covered in class. Make sure that the final outputs are detached!

Once you finish this part and the previous one, you can head to the Experiments section to test your attacks on the baseline (pretrained) model.

In [11]:
# Used as a baseline
def id(model, imgs, labels):
    return imgs.detach()

def fgsm(model, imgs, labels, device='cpu'):
    r"""
    Args:
        model (nn.Module): Model to attack, e.g. self.model in the LResnet definition.
        imgs (Tensor): Tensor of images. Size (BATCH_SIZE, C, H, W). Normalized according to means, stds.
        labels (Tensor): Tensor of labels. Size (BATCH_SIZE,). Each element is an integer in [0, NUM_CLASSES).
    Returns:
        adv_imgs (Tensor): Adversarial images. Same dimensions and normalization as imgs. Detached.
            Each adversarial image in the batch is L_infinity distance at most eps away from the original image.
            Images generated by the Fast Gradient Sign Method (FGSM).
    """
    eps = 8/255 # Maximum perturbation
    model.eval()
    # YOUR CODE HERE
    model.to(device)
    imgs = imgs.to(device)
    labels = labels.to(device)   
    
    imgs.requires_grad = True
    outputs = model(imgs)
    loss = nn.CrossEntropyLoss()(outputs, labels)
    model.zero_grad()
    loss.backward()
    imgs.requires_grad = True
    adv_imgs = imgs + eps * imgs.grad.sign()
    adv_imgs = torch.clamp(adv_imgs, 0, 1).detach()  # Ensure pixel values are valid and detach from the graph
    return adv_imgs

def pgd(model, imgs, labels):
    r"""
    Args:
        model (nn.Module): Model to attack, e.g. self.model in the LResnet definition.
        imgs (Tensor): Tensor of images. Size (BATCH_SIZE, C, H, W). Normalized according to means, stds.
        labels (Tensor): Tensor of labels. Size (BATCH_SIZE,). Each element is an integer in [0, NUM_CLASSES).
    Returns:
        adv_imgs (Tensor): Adversarial images. Same dimensions and normalization as imgs. Detached.
            Each adversarial image in the batch is L_infinity distance at most eps away from the original image.
            Images generated by the Projected Gradient Descent (PGD)
    """
    iters = 20 # Number of steps in PGD
    eps = 8/255 # Maximum perturbation
    alpha = 2/255 # Step size
    adv_imgs = imgs.clone().detach()  # Start with the original images
    adv_imgs = adv_imgs + torch.randn_like(adv_imgs) * eps  # Add initial random perturbation
    adv_imgs = torch.clamp(adv_imgs, 0, 1)  # Ensure still in image range

    for _ in range(iters):
        adv_imgs.requires_grad = True
        outputs = model(adv_imgs)
        model.zero_grad()
        loss = nn.CrossEntropyLoss()(outputs, labels)
        loss.backward()
        with torch.no_grad():
            # Apply perturbation
            adv_imgs = adv_imgs + alpha * adv_imgs.grad.sign()
            # Project back into the epsilon-ball around original image
            delta = torch.clamp(adv_imgs - imgs, min=-eps, max=eps)
            adv_imgs = torch.clamp(imgs + delta, min=0, max=1)
    
    return adv_imgs.detach()
    

attacks['id'] = id
attacks['fgsm'] = fgsm
attacks['pgd'] = pgd

# Adversarial Defenses

## Adversarial Training
Implement the training loop for an adversarially trained model using PGD as the adversarial example generation method.

Your code should look very similar to the baseline example above. Be sure to save your model in the right place and to store your model in the ```models``` dictionary. You can adjust ```max_epochs``` (although early stopping should handle the cases you'd want to) or any other hyperparameters if you'd like. You will need to comment at the end on any changes you've made.

In [12]:
save_key = 'adv_train'
adv_train_model = LResnet(adv_train_method=pgd)

adv_trainer = L.Trainer(
    default_root_dir=SAVE_NAMES[save_key],
    accelerator='auto',
    devices=1,
    max_epochs=30,
    callbacks=[
        ModelCheckpoint(
            dirpath=SAVE_NAMES[save_key],
            monitor='val_loss',
            save_top_k=1,
            mode='min',
            save_weights_only=True,
            every_n_epochs=1,
        ),
        EarlyStopping(
            monitor='val_loss',
            patience=3,
            verbose=True,
            mode='min',
        ),
        LearningRateMonitor('epoch')
    ],
)

adv_trainer.fit(adv_train_model, trainloader, valloader)

# Load best checkpoint after training
adv_train_model = LResnet.load_from_checkpoint(
    adv_trainer.checkpoint_callback.best_model_path
).to(device)

# Store the model in the dictionary
models[save_key] = adv_train_model


GPU available: True (mps), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
/Users/aj/anaconda3/lib/python3.11/site-packages/lightning/pytorch/callbacks/model_checkpoint.py:653: Checkpoint directory /Users/aj/Downloads/models/checkpoints/adv_train exists and is not empty.

  | Name        | Type               | Params | In sizes       | Out sizes
--------------------------------------------------------------------------------
0 | loss_module | CrossEntropyLoss   | 0      | ?              | ?        
1 | accuracy    | MulticlassAccuracy | 0      | ?              | ?        
2 | model       | ResNet             | 11.2 M | [1, 3, 32, 32] | [1, 10]  
--------------------------------------------------------------------------------
11.2 M    Trainable params
0         Non-trainable params
11.2 M    Total params
44.727    Total estimated model params size (MB)


Sanity Checking: |                                        | 0/? [00:00<?, ?it/s]

Training: |                                               | 0/? [00:00<?, ?it/s]

Validation: |                                             | 0/? [00:00<?, ?it/s]

Metric val_loss improved. New best score: 7.589


Validation: |                                             | 0/? [00:00<?, ?it/s]

Metric val_loss improved by 2.570 >= min_delta = 0.0. New best score: 5.019


Validation: |                                             | 0/? [00:00<?, ?it/s]

Metric val_loss improved by 1.023 >= min_delta = 0.0. New best score: 3.996


Validation: |                                             | 0/? [00:00<?, ?it/s]

Metric val_loss improved by 0.863 >= min_delta = 0.0. New best score: 3.132


Validation: |                                             | 0/? [00:00<?, ?it/s]

Metric val_loss improved by 0.172 >= min_delta = 0.0. New best score: 2.960


Validation: |                                             | 0/? [00:00<?, ?it/s]

Validation: |                                             | 0/? [00:00<?, ?it/s]

Metric val_loss improved by 0.028 >= min_delta = 0.0. New best score: 2.931


Validation: |                                             | 0/? [00:00<?, ?it/s]

Validation: |                                             | 0/? [00:00<?, ?it/s]

Validation: |                                             | 0/? [00:00<?, ?it/s]

Monitored metric val_loss did not improve in the last 3 records. Best score: 2.931. Signaling Trainer to stop.


## SAP
### Function implementation
Implement a function that applies [Stochastic Activation Pruning](https://arxiv.org/pdf/1803.01442.pdf) (SAP) to a Tensor. Also read the description from the [Obfuscated Gradients](https://arxiv.org/pdf/1802.00420.pdf) paper (SAP is described in Section 5.3.1).

Roughly, the algorithm keeps each activation from the previous layer (or, generally, Module) with probability proportional to its absolute value, making this choice independently for each activation, and rescales the kept activations so that the average total activation is not changed.

That is:
1. Let the activation being passed in (from a single image, i.e. assuming batch size 1) be $act$.
2. Let $p$ be the same shape as the feature $act$, with values proportional to $|act|$ (absolute value applied element-wise) and sum 1.
3. Let $N$ be the number of entries in the feature. Draw $N$ times *with replacement* from the entries with probability mass function $p$. Set the selected entries to 1 and the remaining entries to 0 in a Tensor $m$ of the same shape as $p$ (and therefore $act$).
4. Apply the mask to get $act \circ m$ (element-wise multiplication). Divide each entry by the probability of keeping that entry (i.e. having corresponding 1 in $m$). Return the result.

Now, the above method runs very slowly. Here's another approach that the authors of Obfuscated Gradients actually use instead:  
- Essentially, if we leave each entry with the same probability of being selected as in the original SAP method, but assume we choose whether or not to keep each entry independently (instead of drawing with replacement from all the entries many times), we get a much faster filter. Specifically, once we get $p$ and $N$, the probability of keeping entry $j$ is $q:=1-e^{-Np_j}$. Consider it an exercise to prove that this is the case :)
- For the reason from the "Bonus" part at the end of this assignment, the authors of Obfuscated Gradients use probability $1-e^{-2Np_j}.$ Do this as well.
- Normalization is easier because $q$ is records precisely the probability of keeping each entry.
- The time-save is mostly in vectorization.

You may use either approach, although the latter is *much* faster.

Read the above papers for more details. You may also find [Erratum](https://arxiv.org/abs/2010.00071) interesting.

In [13]:
def sap(act):
    r"""
    Args:
        act (Tensor): Tensor of activations of shape (K, C, H, W), where K is the batch size.
        The values of C, H, W depend on the layer.
    Returns:
        Tensor of the same shape as act, masked and rescaled according to the SAP method.
    """
    # YOUR CODE HERE
    N = act.numel() / act.shape[0]  # Total number of entries per example in the batch
    abs_act = act.abs()
    
    # Compute probabilities proportional to the absolute value of activations, normalized
    p = abs_act / abs_act.view(act.shape[0], -1).sum(dim=1, keepdim=True).view(act.shape[0], 1, 1, 1)
    
    # Compute the probability of keeping each entry
    q = 1 - torch.exp(-2 * N * p)
    
    # Generate the mask: draw random values and compare to q
    random_vals = torch.rand_like(act)
    mask = (random_vals < q).float()
    
    # Apply the mask and normalize
    pruned_act = act * mask / q.clamp(min=1e-5)  # Clamp q to avoid division by zero
    
    return pruned_act

### Adjusted Model
The change you need to make to apply the defense to a ResNet model is simple: simply replace each Conv2d module with a very similar module that applies SAP immediately after convolution. Run the next block and complete the one after that.

In [14]:
class SAP_Conv2d(nn.Conv2d):
    def __init__(
            self,
            in_channels,
            out_channels,
            kernel_size,
            stride=1,
            padding=0,
            groups=1,
            bias=True,
            dilation=1,
    ):
        super().__init__(in_channels, out_channels, kernel_size, stride,
                         padding, dilation, groups, bias)
        
    # This is the important part
    def _conv_forward(self, input, weight, bias):
        act = super()._conv_forward(input, weight, bias)
        masked_act = sap(act)
        return masked_act

In [15]:
# Transforms LResnet to use SAP_Conv2d instead of nn.Conv2d
def to_sap_conv(model):
    r"""
    Args:
        model (LResnet): Model to modify.
    Returns:
        None. The model is modified in place.
        EVERY nn.Conv2d layer is replaced with SAP_Conv2d.
    """
    for name, module in model.named_children():
        if isinstance(module, nn.Conv2d):
            # Create a new SAP_Conv2d layer with the same parameters as the existing Conv2d layer
            sap_conv = SAP_Conv2d(
                in_channels=module.in_channels,
                out_channels=module.out_channels,
                kernel_size=module.kernel_size,
                stride=module.stride,
                padding=module.padding,
                dilation=module.dilation,
                groups=module.groups,
                bias=(module.bias is not None)
            )
            # Replace the Conv2d layer with the SAP_Conv2d layer
            setattr(model, name, sap_conv)
        else:
            # Recursively apply the same procedure to child modules
            to_sap_conv(module)

In [31]:
def to_sap_conv(model):
    for name, module in model.named_children():
        if isinstance(module, nn.Conv2d):
            sap_conv = SAP_Conv2d(
                in_channels=module.in_channels,
                out_channels=module.out_channels,
                kernel_size=module.kernel_size,
                stride=module.stride,
                padding=module.padding,
                dilation=module.dilation,
                groups=module.groups,
                bias=(module.bias is not None)
            )
            
            sap_conv.weight.data = module.weight.data.clone()
            if module.bias is not None:
                sap_conv.bias.data = module.bias.data.clone()
            
            model._modules[name] = sap_conv
        else:
            to_sap_conv(module)



In [32]:
sap_conv_model = LResnet()
to_sap_conv(sap_conv_model) 

for module in sap_conv_model.modules():
    if isinstance(module, nn.Conv2d):
        print("Found a nn.Conv2d layer that was not replaced.")
        break
else:
    print("All nn.Conv2d layers have been replaced.")


Found a nn.Conv2d layer that was not replaced.


In [33]:
sap_conv_model = LResnet()
to_sap_conv(sap_conv_model)  
for module in sap_conv_model.modules():
    print(module)

LResnet(
  (loss_module): CrossEntropyLoss()
  (accuracy): MulticlassAccuracy()
  (model): ResNet(
    (conv1): SAP_Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
    (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (relu): ReLU(inplace=True)
    (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
    (layer1): Sequential(
      (0): BasicBlock(
        (conv1): SAP_Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
        (conv2): SAP_Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (1): BasicBlock(
        (conv1): SAP_Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=

### Training
Train an LResnet defended by SAP.

Your code should look very similar to the baseline example above. Be sure to save your model in the right place and to store your model in the ```models``` dictionary. You can adjust ```max_epochs``` (although early stopping should handle the cases you'd want to) or any other hyperparameters if you'd like. You will need to comment at the end on any changes you've made.

In [67]:
sap_conv_model = LResnet()
to_sap_conv(sap_conv_model)  


save_key = 'SAP_conv'
sap_trainer = L.Trainer(
    default_root_dir=SAVE_NAMES[save_key],  # Use the SAP_conv save key
    accelerator='auto',
    devices=1,
    max_epochs=50,  # Adjust as necessary
    callbacks=[
        ModelCheckpoint(
            dirpath=SAVE_NAMES[save_key],
            monitor='val_loss',
            save_top_k=1,
            mode='min',
            save_weights_only=True,
            every_n_epochs=1,
        ),
        EarlyStopping(
            monitor='val_loss',
            patience=3,
            verbose=True,
            mode='min',
        ),
        LearningRateMonitor('epoch'),
    ],
)

# Fit the model using the train and validation data loaders
sap_trainer.fit(sap_conv_model, trainloader, valloader)

best_sap_conv_model = LResnet.load_from_checkpoint(
    sap_trainer.checkpoint_callback.best_model_path
)

best_sap_conv_model = best_sap_conv_model.to(device)



# Store the model in the dictionary
models[save_key] = best_sap_conv_model

GPU available: True (mps), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
/Users/aj/anaconda3/lib/python3.11/site-packages/lightning/pytorch/callbacks/model_checkpoint.py:653: Checkpoint directory /Users/aj/Downloads/models/checkpoints/SAP_conv exists and is not empty.

  | Name        | Type               | Params | In sizes       | Out sizes
--------------------------------------------------------------------------------
0 | loss_module | CrossEntropyLoss   | 0      | ?              | ?        
1 | accuracy    | MulticlassAccuracy | 0      | ?              | ?        
2 | model       | ResNet             | 11.2 M | [1, 3, 32, 32] | [1, 10]  
--------------------------------------------------------------------------------
11.2 M    Trainable params
0         Non-trainable params
11.2 M    Total params
44.727    Total estimated model params size (MB)


Sanity Checking: |                                        | 0/? [00:00<?, ?it/s]

/Users/aj/anaconda3/lib/python3.11/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:436: Consider setting `persistent_workers=True` in 'val_dataloader' to speed up the dataloader worker initialization.
/Users/aj/anaconda3/lib/python3.11/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:436: Consider setting `persistent_workers=True` in 'train_dataloader' to speed up the dataloader worker initialization.


Training: |                                               | 0/? [00:00<?, ?it/s]

Validation: |                                             | 0/? [00:00<?, ?it/s]

Metric val_loss improved. New best score: 1.980


Validation: |                                             | 0/? [00:00<?, ?it/s]

Metric val_loss improved by 0.273 >= min_delta = 0.0. New best score: 1.707


Validation: |                                             | 0/? [00:00<?, ?it/s]

Metric val_loss improved by 0.151 >= min_delta = 0.0. New best score: 1.556


Validation: |                                             | 0/? [00:00<?, ?it/s]

Metric val_loss improved by 0.113 >= min_delta = 0.0. New best score: 1.443


Validation: |                                             | 0/? [00:00<?, ?it/s]

Metric val_loss improved by 0.076 >= min_delta = 0.0. New best score: 1.367


Validation: |                                             | 0/? [00:00<?, ?it/s]

Metric val_loss improved by 0.074 >= min_delta = 0.0. New best score: 1.293


Validation: |                                             | 0/? [00:00<?, ?it/s]

Metric val_loss improved by 0.047 >= min_delta = 0.0. New best score: 1.246


Validation: |                                             | 0/? [00:00<?, ?it/s]

Metric val_loss improved by 0.043 >= min_delta = 0.0. New best score: 1.203


Validation: |                                             | 0/? [00:00<?, ?it/s]

Metric val_loss improved by 0.040 >= min_delta = 0.0. New best score: 1.164


Validation: |                                             | 0/? [00:00<?, ?it/s]

Metric val_loss improved by 0.049 >= min_delta = 0.0. New best score: 1.115


Validation: |                                             | 0/? [00:00<?, ?it/s]

Metric val_loss improved by 0.023 >= min_delta = 0.0. New best score: 1.092


Validation: |                                             | 0/? [00:00<?, ?it/s]

Metric val_loss improved by 0.030 >= min_delta = 0.0. New best score: 1.062


Validation: |                                             | 0/? [00:00<?, ?it/s]

Metric val_loss improved by 0.035 >= min_delta = 0.0. New best score: 1.028


Validation: |                                             | 0/? [00:00<?, ?it/s]

Metric val_loss improved by 0.017 >= min_delta = 0.0. New best score: 1.010


Validation: |                                             | 0/? [00:00<?, ?it/s]

Metric val_loss improved by 0.025 >= min_delta = 0.0. New best score: 0.986


Validation: |                                             | 0/? [00:00<?, ?it/s]

Metric val_loss improved by 0.023 >= min_delta = 0.0. New best score: 0.963


Validation: |                                             | 0/? [00:00<?, ?it/s]

Metric val_loss improved by 0.008 >= min_delta = 0.0. New best score: 0.955


Validation: |                                             | 0/? [00:00<?, ?it/s]

Metric val_loss improved by 0.036 >= min_delta = 0.0. New best score: 0.919


Validation: |                                             | 0/? [00:00<?, ?it/s]

Validation: |                                             | 0/? [00:00<?, ?it/s]

Metric val_loss improved by 0.014 >= min_delta = 0.0. New best score: 0.905


Validation: |                                             | 0/? [00:00<?, ?it/s]

Metric val_loss improved by 0.023 >= min_delta = 0.0. New best score: 0.883


Validation: |                                             | 0/? [00:00<?, ?it/s]

Metric val_loss improved by 0.002 >= min_delta = 0.0. New best score: 0.881


Validation: |                                             | 0/? [00:00<?, ?it/s]

Metric val_loss improved by 0.013 >= min_delta = 0.0. New best score: 0.868


Validation: |                                             | 0/? [00:00<?, ?it/s]

Metric val_loss improved by 0.020 >= min_delta = 0.0. New best score: 0.848


Validation: |                                             | 0/? [00:00<?, ?it/s]

Metric val_loss improved by 0.007 >= min_delta = 0.0. New best score: 0.841


Validation: |                                             | 0/? [00:00<?, ?it/s]

Metric val_loss improved by 0.002 >= min_delta = 0.0. New best score: 0.839


Validation: |                                             | 0/? [00:00<?, ?it/s]

Metric val_loss improved by 0.010 >= min_delta = 0.0. New best score: 0.829


Validation: |                                             | 0/? [00:00<?, ?it/s]

Metric val_loss improved by 0.013 >= min_delta = 0.0. New best score: 0.816


Validation: |                                             | 0/? [00:00<?, ?it/s]

Metric val_loss improved by 0.014 >= min_delta = 0.0. New best score: 0.802


Validation: |                                             | 0/? [00:00<?, ?it/s]

Metric val_loss improved by 0.001 >= min_delta = 0.0. New best score: 0.800


Validation: |                                             | 0/? [00:00<?, ?it/s]

Metric val_loss improved by 0.004 >= min_delta = 0.0. New best score: 0.796


Validation: |                                             | 0/? [00:00<?, ?it/s]

Validation: |                                             | 0/? [00:00<?, ?it/s]

Metric val_loss improved by 0.007 >= min_delta = 0.0. New best score: 0.789


Validation: |                                             | 0/? [00:00<?, ?it/s]

Metric val_loss improved by 0.000 >= min_delta = 0.0. New best score: 0.789


Validation: |                                             | 0/? [00:00<?, ?it/s]

Metric val_loss improved by 0.002 >= min_delta = 0.0. New best score: 0.788


Validation: |                                             | 0/? [00:00<?, ?it/s]

Metric val_loss improved by 0.016 >= min_delta = 0.0. New best score: 0.772


Validation: |                                             | 0/? [00:00<?, ?it/s]

Metric val_loss improved by 0.004 >= min_delta = 0.0. New best score: 0.768


Validation: |                                             | 0/? [00:00<?, ?it/s]

Metric val_loss improved by 0.009 >= min_delta = 0.0. New best score: 0.759


Validation: |                                             | 0/? [00:00<?, ?it/s]

Validation: |                                             | 0/? [00:00<?, ?it/s]

Metric val_loss improved by 0.011 >= min_delta = 0.0. New best score: 0.748


Validation: |                                             | 0/? [00:00<?, ?it/s]

Metric val_loss improved by 0.003 >= min_delta = 0.0. New best score: 0.746


Validation: |                                             | 0/? [00:00<?, ?it/s]

Metric val_loss improved by 0.000 >= min_delta = 0.0. New best score: 0.745


Validation: |                                             | 0/? [00:00<?, ?it/s]

Metric val_loss improved by 0.004 >= min_delta = 0.0. New best score: 0.741


Validation: |                                             | 0/? [00:00<?, ?it/s]

Metric val_loss improved by 0.007 >= min_delta = 0.0. New best score: 0.734


Validation: |                                             | 0/? [00:00<?, ?it/s]

Validation: |                                             | 0/? [00:00<?, ?it/s]

Validation: |                                             | 0/? [00:00<?, ?it/s]

Metric val_loss improved by 0.009 >= min_delta = 0.0. New best score: 0.725


Validation: |                                             | 0/? [00:00<?, ?it/s]

Validation: |                                             | 0/? [00:00<?, ?it/s]

Metric val_loss improved by 0.007 >= min_delta = 0.0. New best score: 0.718


Validation: |                                             | 0/? [00:00<?, ?it/s]

`Trainer.fit` stopped: `max_epochs=50` reached.


# Evaluation

## Methods
These two functions help us modularize the experiments we run. Complete ```eval_attack``` to compute the accuracy of each model (baseline, adversarially trained, SAP) on images. We take every batch in ```loader```, apply ```attack_method``` to the batch, and check the accuracy of ```model``` in predicting the class of each adversarial image. Output a Float between 0 and 1.

```top_k``` describes how we determine accuracy. For example, ```top_k=2``` means if the model predicts the correct class within its two highest-scoring classes, it's counted as correct.

Complete the next code block and just run the one after that.

In [69]:
def eval_attack(model, attack_method, loader, top_k, max_batches=0):
    r"""
    Args:
        model (LResnet): Model to attack.
        attack_method (function): Adversarial generation method. One of id, fgsm, pgd.
        loader (DataLoader): Data loader for the dataset to evaluate on.
        top_k (int): The number of top predictions to check for correctness.
        max_batches (int): Maximum number of batches to evaluate. If 0, evaluate on the entire dataloader.
    Returns:
        float: Accuracy of the model on the (adversarially perturbed) dataset.
    """
    # YOUR CODE HERE
    model.eval()  # Set the model to evaluation mode
    correct = 0
    total = 0

    for batch_idx, (images, labels) in enumerate(loader):
        if max_batches and batch_idx >= max_batches:
            break  # Stop evaluation if max_batches is reached

        images, labels = images.to(device), labels.to(device)
        images.requires_grad = True
        adv_images = attack_method(model, images, labels)  # Generate adversarial examples

        outputs = model(adv_images)  # Get model predictions for adversarial images
        _, pred = outputs.topk(top_k, 1, True, True)
        pred = pred.t()
        correct += pred.eq(labels.view(1, -1).expand_as(pred)).sum().item()

        total += labels.size(0)
    
    accuracy = correct / total
    return accuracy

In [70]:
def run_experiment(model, attack, top_k=2, max_batches=0):
    # If we're re-running an experiment, remove the old results
    for i in range(len(results_dic['model'])):
        if results_dic['model'][i] == model and results_dic['attack'][i] == attack and results_dic['top_k'][i] == top_k:
            results_dic['model'].pop(i)
            results_dic['attack'].pop(i)
            results_dic['top_k'].pop(i)
            results_dic['accuracy'].pop(i)
            break
    # Run the experiment
    acc = eval_attack(
        models[model], 
        attacks[attack], 
        testloader, 
        top_k=top_k, 
        max_batches=max_batches
    )
    # Store the results
    results_dic['model'].append(model)
    results_dic['attack'].append(attack)
    results_dic['top_k'].append(top_k)
    results_dic['accuracy'].append(acc)

## Experiments

### Baseline
The following code runs experiments with all three attacks (including the baseline identity) on the baseline model. Feel free to adjust the parameters or code how you'd like. You will need to comment later on any adjustments you've made.

In [71]:
torch.set_grad_enabled(True)
torch.autograd.set_detect_anomaly(True)

<torch.autograd.anomaly_mode.set_detect_anomaly at 0x2d6876710>

In [72]:
for attack_method in ['id', 'fgsm', 'pgd']:
    print(f"Running experiment baseline with attack {attack_method}...")
    mb = 0
    # I've found 100 batches about matches the time of the other attacks' experiments
    if attack_method == 'pgd':
        mb = 100
    run_experiment('baseline', attack_method, max_batches=mb)

Running experiment baseline with attack id...
Running experiment baseline with attack fgsm...
Running experiment baseline with attack pgd...


### Adversarially trained
Run the same experiments on the adversarially trained model. You should be able to use very similar code.

In [73]:
# YOUR CODE HERE
for attack_method in ['id','pgd', 'fgsm']:
    print(f"Running experiment baseline with attack {attack_method}...")
    mb = 0
    # I've found 100 batches about matches the time of the oth=er attacks' experiments
    if attack_method == 'pgd':
        mb = 100
    run_experiment('adv_train', attack_method, max_batches=mb)

Running experiment baseline with attack id...
Running experiment baseline with attack pgd...
Running experiment baseline with attack fgsm...


### SAP
Run the same experiments on the model defended by SAP. You should be able to use very similar code.

In [74]:
# YOUR CODE HERE
for attack_method in ['id', 'fgsm', 'pgd']:
    print(f"Running experiment baseline with attack {attack_method}...")
    mb = 0
    # I've found 100 batches about matches the time of the other attacks' experiments
    if attack_method == 'pgd':
        mb = 100
    run_experiment('SAP_conv', attack_method, max_batches=mb)

Running experiment baseline with attack id...
Running experiment baseline with attack fgsm...
Running experiment baseline with attack pgd...


### Display results
We've already stored the results in a dictionary. Let's put them in a Pandas DataFrame to make them nicer to look at. Export your results to a CSV to save them.

It might take some manual work, but if you run any training loop more than once you should probably keep track, e.g. in a spreadsheet or in file names, of which one is which. In particular, always ensure you will know which model is the most recently trained: even better, ensure you'll still know in a month or more.

In [None]:
df_results = pd.DataFrame(results_dic)
df_results

In [None]:
# YOUR CODE HERE
df_results.to_csv('model_evaluation_results.csv', index=False)

### Tensorboard
Use the cell below to open a Tensorboard session, and check out the train accuracy/loss and validation loss over the training period. Take screenshots or export images of some salient graphs. Briefly describe what you notice. See [the documentation](https://colab.research.google.com/github/tensorflow/tensorboard/blob/master/docs/tensorboard_in_notebooks.ipynb) to find how to use Tensorboard with Google Colab.

In [48]:
%load_ext tensorboard

The tensorboard extension is already loaded. To reload it, use:
  %reload_ext tensorboard


In [49]:
%reload_ext tensorboard

In [66]:
tensorboard --logdir=models/checkpoints/

(YOUR ANSWER HERE)

### Final question
Comment on your results and any adjustments you've made to the experiments.
1. What did you expect? What met or differed from your expectations?
2. How would you compare the attacks?
3. How would you compare the defenses  
    a. In raw performance?  
    b. In performance against adversarial examples?  
    c. In training time?  

(YOUR ANSWER HERE)

### Bonus
Technically, because SAP is stochastic, the authors average the outputs of 100 runs. Try implementing this. How does the model's regular performance change? How does its performance against adversarial attacks change?