<a href="https://colab.research.google.com/github/catebarry/adversarial-patches/blob/main/adversarial_patches.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# AIPI 590 - XAI | Assignment 7: Adversarial Patches
### Catie Barry


## Description
In this notebook, we will create an adversarial patch. We will use the Torchvision ResNet34 model pretrained on ImageNet and test the patch on a small subset of ImageNet.

Most of this code is replicated from the GitHub tutorial cited below. As a creative component, I disguise an adversarial patch for the German short-haired pointer label to look like a sticker for a cafe (with a coffee cup logo).


**Note:** You will need access to a GPU to run this code.

**Sources:**
- Code and setup in this notebook modified from a [tutorial](https://github.com/AIPI-590-XAI/Duke-AI-XAI/blob/main/adversarial-ai-example-notebooks/adversarial_attacks_patches.ipynb) originally created by Phillip Lippe and modified by Dr. Brinnae Bent.
- AI assistance is noted throughout the code comments and in the statement at the bottom of the notebook.

In [None]:
# Connects to any needed files from GitHub and Google Drive
import os

# Remove Colab default sample_data
!rm -r ./sample_data

# Clone GitHub files to colab workspace
repo_name = "adversarial-patches"
git_path = 'https://github.com/catebarry/adversarial-patches.git'
!git clone "{git_path}"

# Install dependencies from requirements.txt file
#!pip install -r "{os.path.join(repo_name,'requirements.txt')}" #Add if using requirements.txt

# Change working directory to location of notebook
notebook_dir = ''
path_to_notebook = os.path.join(repo_name,notebook_dir)
%cd "{path_to_notebook}"
%ls

In [None]:
## Standard libraries
import os
import json
import math
import time
import numpy as np
import scipy.linalg

## Imports for plotting
import matplotlib.pyplot as plt
%matplotlib inline
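# set_matplotlib_formats is deprecated in IPython.display; use the matplotlib_inline backend instead
import matplotlib_inline.backend_inline
matplotlib_inline.backend_inline.set_matplotlib_formats('svg', 'pdf') # For export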
from matplotlib.colors import to_rgb
import matplotlib
matplotlib.rcParams['lines.linewidth'] = 2.0
import seaborn as sns
sns.set()

## Progress bar
from tqdm.notebook import tqdm

## PyTorch
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.utils.data as data
import torch.optim as optim
# Torchvision
import torchvision
from torchvision.datasets import CIFAR10
from torchvision import transforms
import torchvision.transforms.functional as TF
from PIL import Image
import random
import torchvision.transforms as T
# PyTorch Lightning
try:
    import pytorch_lightning as pl
except ModuleNotFoundError: # Google Colab does not have PyTorch Lightning installed by default. Hence, we do it here if necessary
    # Quote the requirement so the shell does not treat ">=1.4" as an output redirect
    !pip install --quiet "pytorch-lightning>=1.4"
    import pytorch_lightning as pl
from pytorch_lightning.callbacks import LearningRateMonitor, ModelCheckpoint

# Path to the folder where the datasets are/should be downloaded (e.g. MNIST)
DATASET_PATH = "../data"
# Path to the folder where the pretrained models are saved
CHECKPOINT_PATH = "../saved_models/tutorial10"

# Setting the seed
pl.seed_everything(42)

# Ensure that all operations are deterministic on GPU (if used) for reproducibility
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False

# Fetching the device that will be used throughout this notebook
device = torch.device("cpu") if not torch.cuda.is_available() else torch.device("cuda:0")
print("Using device", device)

Next come a few download statements, covering both a dataset and a few pretrained patches we will use later.

In [None]:
import urllib.request
from urllib.error import HTTPError
import zipfile
# Github URL where the dataset is stored for this tutorial
base_url = "https://raw.githubusercontent.com/phlippe/saved_models/main/tutorial10/"
# Files to download
pretrained_files = [(DATASET_PATH, "TinyImageNet.zip"), (CHECKPOINT_PATH, "patches.zip")]
# Create checkpoint path if it doesn't exist yet
os.makedirs(DATASET_PATH, exist_ok=True)
os.makedirs(CHECKPOINT_PATH, exist_ok=True)

# For each file, check whether it already exists. If not, try downloading it.
for dir_name, file_name in pretrained_files:
    file_path = os.path.join(dir_name, file_name)
    if not os.path.isfile(file_path):
        file_url = base_url + file_name
        print(f"Downloading {file_url}...")
        try:
            urllib.request.urlretrieve(file_url, file_path)
        except HTTPError as e:
            print("Something went wrong. Please try to download the file from the GDrive folder, or contact the author with the full output including the following error:\n", e)
        if file_name.endswith(".zip"):
            print("Unzipping file...")
            with zipfile.ZipFile(file_path, 'r') as zip_ref:
                zip_ref.extractall(file_path.rsplit("/",1)[0])

## Setup

For our experiments, we use a common CNN architecture pretrained on the ImageNet dataset (provided by PyTorch's torchvision package), namely a ResNet34.

In [None]:
# Load CNN architecture pretrained on ImageNet
os.environ["TORCH_HOME"] = CHECKPOINT_PATH
pretrained_model = torchvision.models.resnet34(weights='IMAGENET1K_V1')
pretrained_model = pretrained_model.to(device)

# No gradients needed for the network
pretrained_model.eval()
for p in pretrained_model.parameters():
    p.requires_grad = False

To perform adversarial attacks, we also need a dataset to work on. Given that the CNN model has been trained on ImageNet, it is only fair to perform the attacks on data from ImageNet. For this, we provide a small set of pre-processed images from the original ImageNet dataset (note that this dataset is shared under the same [license](http://image-net.org/download-faq) as the original ImageNet dataset). Specifically, we have 5 images for each of the 1000 labels of the dataset. We can load the data below, and create a corresponding data loader.

In [None]:
# Mean and Std from ImageNet
NORM_MEAN = np.array([0.485, 0.456, 0.406])
NORM_STD = np.array([0.229, 0.224, 0.225])
# No resizing and center crop necessary as images are already preprocessed.
plain_transforms = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=NORM_MEAN,
                         std=NORM_STD)
])

# Load dataset and create data loader
imagenet_path = os.path.join(DATASET_PATH, "TinyImageNet/")
assert os.path.isdir(imagenet_path), f"Could not find the ImageNet dataset at expected path \"{imagenet_path}\". " + \
                                     f"Please make sure to have downloaded the ImageNet dataset here, or change the {DATASET_PATH=} variable."
dataset = torchvision.datasets.ImageFolder(root=imagenet_path, transform=plain_transforms)
data_loader = data.DataLoader(dataset, batch_size=32, shuffle=False, drop_last=False, num_workers=8)

# Load label names to interpret the label numbers 0 to 999
with open(os.path.join(imagenet_path, "label_list.json"), "r") as f:
    label_names = json.load(f)

def get_label_index(lab_str):
    assert lab_str in label_names, f"Label \"{lab_str}\" not found. Check the spelling of the class."
    return label_names.index(lab_str)

Before we start with our attacks, we should verify the performance of our model. As ImageNet has 1000 classes, simply looking at the accuracy is not sufficient to tell the performance of a model. A common alternative metric is "Top-5 accuracy", which tells us how many times the true label has been within the 5 most-likely predictions of the model. As models usually perform quite well on those, we report the error (1 - accuracy) instead of the accuracy:

In [None]:
def eval_model(dataset_loader, img_func=None):
    tp, tp_5, counter = 0., 0., 0.
    for imgs, labels in tqdm(dataset_loader, desc="Validating..."):
        imgs = imgs.to(device)
        labels = labels.to(device)
        if img_func is not None:
            imgs = img_func(imgs, labels)
        with torch.no_grad():
            preds = pretrained_model(imgs)
        tp += (preds.argmax(dim=-1) == labels).sum()
        tp_5 += (preds.topk(5, dim=-1)[1] == labels[...,None]).any(dim=-1).sum()
        counter += preds.shape[0]
    acc = tp.float().item()/counter
    top5 = tp_5.float().item()/counter
    print(f"Top-1 error: {(100.0 * (1 - acc)):4.2f}%")
    print(f"Top-5 error: {(100.0 * (1 - top5)):4.2f}%")
    return acc, top5
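
To make the top-1 and top-k bookkeeping in `eval_model` concrete, here is a minimal sketch on made-up logits. It uses NumPy in place of torch tensors, and the values, the six-class setup, and the choice of `k = 3` are illustrative assumptions rather than model output.

```python
import numpy as np

# Toy logits for 3 images over 6 classes (illustrative values, not model output).
logits = np.array([
    [0.1, 2.0, 0.3, 0.0, 1.5, 0.2],   # true class 1: ranked first
    [3.0, 0.1, 0.2, 0.4, 0.5, 2.5],   # true class 4: ranked third
    [0.9, 0.8, 0.7, 0.6, 0.5, 0.0],   # true class 5: ranked last
])
labels = np.array([1, 4, 5])
k = 3  # top-3 here because the toy problem has only 6 classes

# Top-1: the highest-scoring class must equal the label.
top1_hits = logits.argmax(axis=-1) == labels

# Top-k: the label must appear among the k highest-scoring classes.
topk_idx = np.argsort(-logits, axis=-1)[:, :k]
topk_hits = (topk_idx == labels[:, None]).any(axis=-1)

top1_error = 1.0 - top1_hits.mean()
topk_error = 1.0 - topk_hits.mean()
print(f"Top-1 error: {100 * top1_error:.2f}%")
print(f"Top-{k} error: {100 * topk_error:.2f}%")
```

The second image counts as a top-k hit but a top-1 miss, which is exactly the gap between the two error rates that `eval_model` reports.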

In [None]:
_ = eval_model(data_loader)

The ResNet34 achieves a decent error rate of 4.3% for the top-5 predictions. Next, we can look at some predictions of the model to get more familiar with the dataset. The function below plots an image along with a bar diagram of its predictions. We also prepare it to show adversarial examples for later applications.

In [None]:
def show_prediction(img, label, pred, K=5, adv_img=None, noise=None):

    if isinstance(img, torch.Tensor):
        # Tensor image to numpy
        img = img.cpu().permute(1, 2, 0).numpy()
        img = (img * NORM_STD[None,None]) + NORM_MEAN[None,None]
        img = np.clip(img, a_min=0.0, a_max=1.0)
        label = label.item()

    # Plot on the left the image with the true label as title.
    # On the right, have a horizontal bar plot with the top k predictions including probabilities
    if noise is None or adv_img is None:
        fig, ax = plt.subplots(1, 2, figsize=(10,2), gridspec_kw={'width_ratios': [1, 1]})
    else:
        fig, ax = plt.subplots(1, 5, figsize=(12,2), gridspec_kw={'width_ratios': [1, 1, 1, 1, 2]})

    ax[0].imshow(img)
    ax[0].set_title(label_names[label])
    ax[0].axis('off')

    if adv_img is not None and noise is not None:
        # Visualize adversarial images
        adv_img = adv_img.cpu().permute(1, 2, 0).numpy()
        adv_img = (adv_img * NORM_STD[None,None]) + NORM_MEAN[None,None]
        adv_img = np.clip(adv_img, a_min=0.0, a_max=1.0)
        ax[1].imshow(adv_img)
        ax[1].set_title('Adversarial')
        ax[1].axis('off')
        # Visualize noise
        noise = noise.cpu().permute(1, 2, 0).numpy()
        noise = noise * 0.5 + 0.5 # Scale between 0 to 1
        ax[2].imshow(noise)
        ax[2].set_title('Noise')
        ax[2].axis('off')
        # buffer
        ax[3].axis('off')

    if abs(pred.sum().item() - 1.0) > 1e-4:
        pred = torch.softmax(pred, dim=-1)
    topk_vals, topk_idx = pred.topk(K, dim=-1)
    topk_vals, topk_idx = topk_vals.cpu().numpy(), topk_idx.cpu().numpy()
    ax[-1].barh(np.arange(K), topk_vals*100.0, align='center', color=["C0" if topk_idx[i]!=label else "C2" for i in range(K)])
    ax[-1].set_yticks(np.arange(K))
    ax[-1].set_yticklabels([label_names[c] for c in topk_idx])
    ax[-1].invert_yaxis()
    ax[-1].set_xlabel('Confidence')
    ax[-1].set_title('Predictions')

    plt.show()
    plt.close()

Let's visualize a few images below:

In [None]:
exmp_batch, label_batch = next(iter(data_loader))
with torch.no_grad():
    preds = pretrained_model(exmp_batch.to(device))
for i in range(1,17,5):
    show_prediction(exmp_batch[i], label_batch[i], preds[i])

The bar plot on the right shows the top-5 predictions of the model with their class probabilities. We denote the class probabilities with "confidence" as it somewhat resembles how confident the network is that the image is of one specific class. Some of the images have a highly peaked probability distribution, and we would expect the model to be rather robust against noise for those. However, we will see below that this is not always the case. Note that all of the images are of fish because the data loader doesn't shuffle the dataset. Otherwise, we would get different images every time we run the notebook, which would make it hard to discuss the results on the static version.

## Adversarial Patches

Instead of changing every pixel by a small amount, as attacks such as the Fast Gradient Sign Method (FGSM) [1] do, we can change a small part of the image to whatever values we like. In other words, we will create a small image patch that covers a minor part of the original image but causes the model to confidently predict a specific class of our choosing. This form of attack is an even bigger threat in real-world applications than FGSM.

### How to train an adversarial patch:
- First, as in other gradient-based attacks, we compute gradients of the loss with respect to the input and update the adversarial input accordingly. However, we do not update every pixel: we replace part of the input image with our patch and compute gradients only for the patch.
- Second, we don't just do this for one image; we want the patch to work with any possible image. Hence, we have a whole training loop where we train the patch using SGD.
- Lastly, image patches are usually designed to make the model predict a specific class, not just any arbitrary class other than the true label.

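The following function places the patch at a random location with a random rotation and scale, so that the trained patch becomes robust to these transformations. The commented-out cell is the original version, which only randomizes the location.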

In [None]:
#def place_patch(img, patch):
#    for i in range(img.shape[0]):
#        h_offset = np.random.randint(0,img.shape[2]-patch.shape[1]-1)
#        w_offset = np.random.randint(0,img.shape[3]-patch.shape[2]-1)
#        img[i,:,h_offset:h_offset+patch.shape[1],w_offset:w_offset+patch.shape[2]] = patch_forward(patch)
#    return img

In [None]:
# Include random location, rotation, and scale
# Generated with help from Claude Sonnet 4.5 on 11/2/25 at 8pm

def place_patch(img, patch, rotate_max=25, scale_range=(0.85, 1.15)):
    B, C, H, W = img.shape
    device = patch.device
    img = img.to(device)

    # Convert patch parameter to normalized image tensor
    patch_norm = patch_forward(patch)

    # Random rotation and scale (same for entire batch)
    angle = random.uniform(-rotate_max, rotate_max)
    scale = random.uniform(scale_range[0], scale_range[1])

    # Apply transformations using torchvision (differentiable)
    patch_transformed = TF.rotate(patch_norm, angle, interpolation=TF.InterpolationMode.BILINEAR)

    # Scale by resizing
    ph, pw = patch_norm.shape[-2:]
    new_h = int(ph * scale)
    new_w = int(pw * scale)
    patch_transformed = TF.resize(patch_transformed, [new_h, new_w], interpolation=TF.InterpolationMode.BILINEAR)

    pt_h, pt_w = patch_transformed.shape[-2:]

    # Random location (same for entire batch)
    max_h_off = max(0, H - pt_h)
    max_w_off = max(0, W - pt_w)
    h_off = random.randint(0, max_h_off) if max_h_off > 0 else 0
    w_off = random.randint(0, max_w_off) if max_w_off > 0 else 0

    h_end = min(h_off + pt_h, H)
    w_end = min(w_off + pt_w, W)
    actual_h = h_end - h_off
    actual_w = w_end - w_off

    # Place patch using mask (no in-place operations)
    mask = torch.zeros((1, C, H, W), device=device)
    mask[:, :, h_off:h_end, w_off:w_end] = 1.0

    patch_full = torch.zeros((1, C, H, W), device=device)
    patch_full[:, :, h_off:h_end, w_off:w_end] = patch_transformed[:, :actual_h, :actual_w].unsqueeze(0)

    result = img * (1 - mask) + patch_full * mask

    return result
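
The mask-based blending at the end of `place_patch` can be checked in isolation. The sketch below is a simplified stand-in, assuming NumPy arrays instead of torch tensors, a fixed offset instead of a random one, and no rotation or scaling. The sizes and values are made up for illustration.

```python
import numpy as np

H, W = 6, 6
img = np.full((1, 3, H, W), 0.2, dtype=np.float32)   # batch with one constant "image"
patch = np.full((3, 2, 2), 0.9, dtype=np.float32)    # small 2x2 "patch"
h_off, w_off = 1, 3                                  # fixed here; random in place_patch

ph, pw = patch.shape[-2:]

# Mask is 1 where the patch goes and 0 everywhere else.
mask = np.zeros((1, 3, H, W), dtype=np.float32)
mask[:, :, h_off:h_off + ph, w_off:w_off + pw] = 1.0

# Patch embedded in an otherwise empty canvas of image size.
patch_full = np.zeros((1, 3, H, W), dtype=np.float32)
patch_full[:, :, h_off:h_off + ph, w_off:w_off + pw] = patch

# Blend: keep the image outside the mask, use the patch inside it.
result = img * (1 - mask) + patch_full * mask
print(result[0, 0])
```

Because `result` is built from a new expression rather than by writing into `img`, the input batch is left unchanged. In the torch version, the same structure means the output depends on the patch only inside the masked region.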

The patch itself will be an `nn.Parameter` whose values are in the range between $-\infty$ and $\infty$. Images are, however, naturally limited in their range, and thus we write a small function that maps the parameter into the image value range of ImageNet:

In [None]:
TENSOR_MEANS, TENSOR_STD = torch.FloatTensor(NORM_MEAN)[:,None,None], torch.FloatTensor(NORM_STD)[:,None,None]
def patch_forward(patch):
    means = TENSOR_MEANS.to(patch.device)
    std = TENSOR_STD.to(patch.device)
    # Ensure patch is on the same device as TENSOR_MEANS and TENSOR_STD
    # patch = patch.to(TENSOR_MEANS.device)

    # Map patch values from [-infty,infty] to ImageNet min and max
    patch = (torch.tanh(patch) + 1 - 2 * means) / (2 * std)
    return patch
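
As a sanity check on this mapping, the sketch below re-implements it in NumPy for a single value per channel and confirms two properties: the resulting pixel values stay in $[0, 1]$ for any parameter value, and the mapping can be inverted with `arctanh`, which is what the coffee-logo initialization later relies on. The helper names `to_normalized`, `to_pixels`, and `to_param` are hypothetical and not part of the notebook.

```python
import numpy as np

NORM_MEAN = np.array([0.485, 0.456, 0.406])
NORM_STD = np.array([0.229, 0.224, 0.225])

def to_normalized(param):
    """NumPy version of patch_forward for one value per channel, shape (3,)."""
    return (np.tanh(param) + 1 - 2 * NORM_MEAN) / (2 * NORM_STD)

def to_pixels(normalized):
    """Undo the ImageNet normalization to get pixel values in [0, 1]."""
    return normalized * NORM_STD + NORM_MEAN

def to_param(pixels, eps=1e-3):
    """Invert the mapping: pixel values in [0, 1] -> unconstrained parameter."""
    t = np.clip(2.0 * pixels - 1.0, -1 + eps, 1 - eps)  # avoid arctanh(+-1) = +-inf
    return np.arctanh(t)

# Even extreme parameter values map to valid pixel intensities.
for p in (-10.0, 0.0, 10.0):
    print(p, to_pixels(to_normalized(np.full(3, p))))

# Round trip: pixels -> parameter -> normalized -> pixels.
pixels = np.array([0.25, 0.5, 0.75])
recovered = to_pixels(to_normalized(to_param(pixels)))
print(recovered)
```

Algebraically, the composition reduces to $(\tanh(p) + 1)/2$ in pixel space, so a zero parameter corresponds to mid-gray.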

Before looking at the actual training code, we can write a small evaluation function. We evaluate the success of a patch by how many times we were able to fool the network into predicting our target class. A simple function for this is implemented below.

In [None]:
def eval_patch(model, patch, val_loader, target_class):
    model.eval()
    tp, tp_5, counter = 0., 0., 0.
    with torch.no_grad():
        for img, img_labels in tqdm(val_loader, desc="Validating...", leave=False):
            # For stability, place the patch at 4 random locations per image, and average the performance
            for _ in range(4):
                patch_img = place_patch(img, patch)
                patch_img = patch_img.to(device)
                img_labels = img_labels.to(device)
                pred = model(patch_img)
                # In the accuracy calculation, we need to exclude the images that are of our target class
                # as we would not "fool" the model into predicting those
                tp += torch.logical_and(pred.argmax(dim=-1) == target_class, img_labels != target_class).sum()
                tp_5 += torch.logical_and((pred.topk(5, dim=-1)[1] == target_class).any(dim=-1), img_labels != target_class).sum()
                counter += (img_labels != target_class).sum()
    acc = tp/counter
    top5 = tp_5/counter
    return acc, top5
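
The exclusion of target-class images is easy to overlook, so here is a minimal sketch of the same counting on made-up predictions. It uses NumPy rather than torch, and the arrays are illustrative assumptions.

```python
import numpy as np

target_class = 2
preds = np.array([2, 2, 0, 2, 1, 2])    # hypothetical predictions on patched images
labels = np.array([0, 2, 0, 1, 1, 3])   # true labels; the second image already is class 2

# Only images whose true label differs from the target can be "fooled".
eligible = labels != target_class
fooled = (preds == target_class) & eligible

fooling_rate = fooled.sum() / eligible.sum()
naive_rate = (preds == target_class).mean()   # would also count the genuine class-2 image
print(f"Fooling rate: {fooling_rate:.3f}, naive rate: {naive_rate:.3f}")
```

The naive rate is inflated by the image that truly belongs to the target class, which is why `eval_patch` masks those images out of both the numerator and the denominator.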

Finally, we can look at the training loop. Given a model to fool, a target class to design the patch for, and a patch size $k$ in pixels, we create a parameter of size $3\times k\times k$. In this notebook the parameter is initialized from `init_param` (defined further below) instead of zeros. These are the only parameters we train; the network itself remains untouched. We use a simple SGD optimizer with momentum to minimize the classification loss of the model, given the patch in the image, with respect to the target class. The loss starts out very high because the network initially performs well, but it decreases quickly once we start changing the patch. In the end, the patch will represent patterns that are characteristic of the class. For instance, if we wanted the model to predict a "goldfish" in every image, we would expect the pattern to look somewhat like a goldfish. Over the iterations, the optimizer fine-tunes the pattern and, hopefully, achieves a high fooling accuracy.
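
Written out, the loss minimized in the training cell below is the cross-entropy toward the target class plus the two regularizers added in this notebook. The notation is introduced here only for illustration: $f$ is the frozen network, $A(x, p)$ is the image $x$ with patch $p$ pasted in by `place_patch`, $y_t$ is the target class, and $\hat{p}$ is the patch converted to pixel space $[0, 1]$.

$$
\mathcal{L}(p) = \mathrm{CE}\big(f(A(x, p)),\, y_t\big) + \lambda_{tv}\,\mathrm{TV}(\hat{p}) + \lambda_{dark}\,\max\big(0,\ \mathrm{mean}(\hat{p}) - b\big)
$$

$$
\mathrm{TV}(\hat{p}) = \underset{c,i,j}{\mathrm{mean}}\,\big|\hat{p}_{c,i+1,j} - \hat{p}_{c,i,j}\big| + \underset{c,i,j}{\mathrm{mean}}\,\big|\hat{p}_{c,i,j+1} - \hat{p}_{c,i,j}\big|
$$

The code uses $\lambda_{tv} = 2\times10^{-2}$, $\lambda_{dark} = 5\times10^{-2}$, and a target brightness of $b = 0.35$.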

In [None]:
def patch_attack(model, target_class, patch_size=64, num_epochs=5):
    # Leave a small set of images out to check generalization
    # In most of our experiments, the performance on the hold-out data points
    # was as good as on the training set. Overfitting is unlikely given the
    # small size of the patches.
    train_set, val_set = torch.utils.data.random_split(dataset, [4500, 500])
    train_loader = data.DataLoader(train_set, batch_size=32, shuffle=True, drop_last=True, num_workers=8)
    val_loader = data.DataLoader(val_set, batch_size=32, shuffle=False, drop_last=False, num_workers=4)

    # Create parameter and optimizer
    if not isinstance(patch_size, tuple):
        patch_size = (patch_size, patch_size)
    #patch = nn.Parameter(torch.zeros(3, patch_size[0], patch_size[1]), requires_grad=True)
    # ADDED THE FOLLOWING LINE OF CODE
    patch = nn.Parameter(init_param.clone().to(device), requires_grad=True)
    optimizer = torch.optim.SGD([patch], lr=1e-1, momentum=0.8)
    loss_module = nn.CrossEntropyLoss()

    # Training loop
    for epoch in range(num_epochs):
        t = tqdm(train_loader, leave=False)
        batch_idx = 0

        for img, _ in t:
            img = place_patch(img, patch)
            #img = img.to(device)
            pred = model(img)
            labels = torch.zeros(img.shape[0], device=pred.device, dtype=torch.long).fill_(target_class)
            loss = loss_module(pred, labels)

            ##### ADDED CODE FOR REGULARIZERS, generated by Claude Sonnet 4.5 on 11/2/25 at 8pm

            patch_norm = patch_forward(patch)
            # Convert to pixel space for regularization
            patch_pixels = torch.clamp(
                patch_norm * TENSOR_STD.to(patch_norm.device) + TENSOR_MEANS.to(patch_norm.device),
                0.0, 1.0
            )

            lambda_tv = 2e-2      # Smoothness (reduce jagged edges)
            lambda_dark = 5e-2    # Keep it darker
            target_brightness = 0.35  # Slightly darker than middle gray

            # Total variation (smoothness)
            tv_h = (patch_pixels[:,1:,:] - patch_pixels[:,:-1,:]).abs().mean()
            tv_w = (patch_pixels[:,:,1:] - patch_pixels[:,:,:-1]).abs().mean()
            tv = tv_h + tv_w

            # Darkness penalty
            dark = F.relu(patch_pixels.mean() - target_brightness)

            loss = loss + lambda_tv * tv + lambda_dark * dark

            #####

            optimizer.zero_grad()
            #loss.mean().backward()
            loss.backward()

            optimizer.step()
            batch_idx += 1
            t.set_description(f"Epoch {epoch}, Loss: {loss.item():4.2f}")

    # Final validation
    acc, top5 = eval_patch(model, patch, val_loader, target_class)

    return patch.data, {"acc": acc.item(), "top5": top5.item()}
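
To see what the two regularizers in the training loop reward, the sketch below evaluates them on two hand-made patches. It is a NumPy re-implementation under the same definitions as the loop above (mean absolute neighbour differences and a ReLU on the mean brightness). The helper names and the example patches are assumptions for illustration.

```python
import numpy as np

def total_variation(pixels):
    """Sum of the mean absolute vertical and horizontal neighbour differences, (C, H, W) input."""
    tv_h = np.abs(pixels[:, 1:, :] - pixels[:, :-1, :]).mean()
    tv_w = np.abs(pixels[:, :, 1:] - pixels[:, :, :-1]).mean()
    return tv_h + tv_w

def darkness_penalty(pixels, target_brightness=0.35):
    """Zero while the mean brightness is at or below the target, linear above it."""
    return max(0.0, float(pixels.mean()) - target_brightness)

flat_dark = np.full((3, 8, 8), 0.30)                    # uniform, dark patch
checker = np.indices((8, 8)).sum(axis=0) % 2            # 0/1 checkerboard pattern
noisy_bright = np.broadcast_to(checker, (3, 8, 8)).astype(float)

for name, p in [("flat_dark", flat_dark), ("noisy_bright", noisy_bright)]:
    print(name, total_variation(p), darkness_penalty(p))
```

The uniform dark patch incurs no penalty from either term, while the checkerboard is penalized by both. With the small weights used in training, these terms nudge the patch toward a smooth, dark appearance without dominating the cross-entropy loss.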

To get some experience with what to expect from an adversarial patch attack, we want to train multiple patches for different classes. As the training of a patch can take one or two minutes on a GPU, we have provided a couple of pre-trained patches including their results on the full dataset. The results are saved in a JSON file, which is loaded below.

In [None]:
# Load evaluation results of the pretrained patches
json_results_file = os.path.join(CHECKPOINT_PATH, "patch_results.json")
json_results = {}
if os.path.isfile(json_results_file):
    with open(json_results_file, "r") as f:
        json_results = json.load(f)

# If you train new patches, you can save the results via calling this function
def save_results(patch_dict):
    result_dict = {cname: {psize: [t.item() if isinstance(t, torch.Tensor) else t
                                   for t in patch_dict[cname][psize]["results"]]
                           for psize in patch_dict[cname]}
                   for cname in patch_dict}
    with open(os.path.join(CHECKPOINT_PATH, "patch_results.json"), "w") as f:
        json.dump(result_dict, f, indent=4)
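
One detail of this JSON round trip matters later: patch sizes are integer keys in memory but become strings once written to disk, which is why `get_patches` looks results up with `str(patch_size)`. The sketch below illustrates this with made-up numbers in the same nested layout. The result values and the `None` placeholders for the patch tensors are assumptions.

```python
import json

# Hypothetical results in the layout that get_patches produces (values are made up).
patch_dict = {
    "toaster": {64: {"results": (0.91, 0.99), "patch": None}},
    "goldfish": {64: {"results": (0.88, 0.97), "patch": None}},
}

# Keep only the (top-1, top-5) results, as save_results does.
result_dict = {
    cname: {psize: list(patch_dict[cname][psize]["results"]) for psize in patch_dict[cname]}
    for cname in patch_dict
}

# json.dumps converts the integer patch sizes into string keys.
encoded = json.dumps(result_dict, indent=4)
decoded = json.loads(encoded)
print(decoded["toaster"]["64"])
```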

Additionally, we implement a function to train and evaluate patches for a list of classes and patch sizes. The pretrained patches include the classes *toaster*, *goldfish*, *school bus*, *lipstick*, and *pineapple*. We chose the classes arbitrarily to cover multiple domains (animals, vehicles, fruits, devices, etc.). We trained each class for three different patch sizes: $32\times32$ pixels, $48\times48$ pixels, and $64\times64$ pixels. We can load them in the two cells below.

In [None]:
def get_patches(class_names, patch_sizes):
    result_dict = dict()

    # Loop over all classes and patch sizes
    for name in class_names:
        result_dict[name] = dict()
        for patch_size in patch_sizes:
            c = label_names.index(name)
            file_name = os.path.join(CHECKPOINT_PATH, f"{name}_{patch_size}_patch.pt")
            # Load patch if pretrained file exists, otherwise start training
            if not os.path.isfile(file_name):
                patch, val_results = patch_attack(pretrained_model, target_class=c, patch_size=patch_size, num_epochs=5)
                print(f"Validation results for {name} and {patch_size}:", val_results)
                torch.save(patch, file_name)
            else:
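                # map_location keeps loading robust if the patch was saved on a different device
                patch = torch.load(file_name, map_location=device)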
            # Load evaluation results if exist, otherwise manually evaluate the patch
            if name in json_results:
                results = json_results[name][str(patch_size)]

            else:
                results = eval_patch(pretrained_model, patch, data_loader, target_class=c)
            # Store results and the patches in a dict for better access
            result_dict[name][patch_size] = {
                "results": results,
                "patch": patch
            }

    return result_dict

The following code initializes the patch to an image of a coffee logo on a brown background.

In [None]:
# this block of code was generated with help from AI on 11/2/25 at 9:10pm

# Load coffee_logo.png from repo and convert to init_param
from PIL import Image

PATCH_SIZE = 64
USER_PNG_PATH = "coffee_logo.png"
assert os.path.exists(USER_PNG_PATH), f"File not found: {USER_PNG_PATH}"

img = Image.open(USER_PNG_PATH).convert("RGB") # Convert to RGB directly

ph = pw = PATCH_SIZE

# Resize the image to the patch size
img = img.resize((pw, ph), Image.LANCZOS)

# Convert to 0..1 floats
arr = np.array(img).astype(np.float32) / 255.0  # shape (H,W,3)

# Use existing ImageNet mean/std variables from the notebook; fall back if not present
try:
    MEAN = np.array(NORM_MEAN)
    STD  = np.array(NORM_STD)
except Exception:
    MEAN = np.array([0.485,0.456,0.406], dtype=np.float32)
    STD  = np.array([0.229,0.224,0.225], dtype=np.float32)

# Invert the notebook's patch_forward mapping to get pre-tanh parameter
norm_pix = (arr - MEAN[None,None,:]) / STD[None,None,:]
tanh_p = 2.0 * (STD[None,None,:] * norm_pix) - 1.0 + 2.0 * MEAN[None,None,:]
tanh_p = np.clip(tanh_p, -0.999, 0.999)
init_param_np = 0.5 * np.log((1.0 + tanh_p) / (1.0 - tanh_p))
init_param = torch.tensor(np.transpose(init_param_np.astype(np.float32), (2,0,1)))  # (3,H,W)

# Visual check (optional)
try:
    from IPython.display import display
    display(img.resize((256,256))) # Display the resized image directly
except Exception:
    pass

print("init_param ready, shape:", init_param.shape)

In [None]:
class_names = ['toaster', 'goldfish', 'school bus', 'lipstick', 'pineapple', 'German short-haired pointer']
patch_sizes = [64] #[32, 48, 64]

patch_dict = get_patches(class_names, patch_sizes)
# save_results(patch_dict) # Uncomment if you add new class names and want to save the new results

Before looking at the quantitative results, we can actually visualize the patches.

In [None]:
def show_patches():
    fig, ax = plt.subplots(len(patch_sizes), len(class_names), figsize=(len(class_names)*2.2, len(patch_sizes)*2.2))
    ax = np.atleast_2d(ax)
    for c_idx, cname in enumerate(class_names):
        for p_idx, psize in enumerate(patch_sizes):
            patch = patch_dict[cname][psize]["patch"]
            patch = (torch.tanh(patch) + 1) / 2 # Parameter to pixel values
            patch = patch.cpu().permute(1, 2, 0).numpy()
            patch = np.clip(patch, a_min=0.0, a_max=1.0)
            ax[p_idx][c_idx].imshow(patch)
            ax[p_idx][c_idx].set_title(f"{cname}, size {psize}")
            ax[p_idx][c_idx].axis('off')
    fig.subplots_adjust(hspace=0.3, wspace=0.3)
    plt.show()
show_patches()

You can see that the German short-haired pointer patch is dark and shows the outline of the coffee mug. This is due to its initialization from the coffee cup image on a brown background and to the regularization terms, which encourage smooth, less noisy, and darker patches.
It was also trained to be robust to random rotations, scaling, and locations.

In [None]:
%%html
<!-- Some HTML code to increase font size in the following table -->
<style>
th {font-size: 120%;}
td {font-size: 120%;}
</style>

In [None]:
import tabulate
from IPython.display import display, HTML

def show_table(top_1=True):
    i = 0 if top_1 else 1
    table = [[name] + [f"{(100.0 * patch_dict[name][psize]['results'][i]):4.2f}%" for psize in patch_sizes]
             for name in class_names]
    display(HTML(tabulate.tabulate(table, tablefmt='html', headers=["Class name"] + [f"Patch size {psize}x{psize}" for psize in patch_sizes])))

First, we create a table of top-1 accuracy, that is, the fraction of images for which the target class is the model's highest-scoring prediction.

In [None]:
show_table(top_1=True)

The German short-haired pointer patch doesn't perform as well in top-1 accuracy, likely because it is subject to many more constraints than the other patches, which makes it less likely to push the target class to the top prediction consistently.

Let's also take a look at the top-5 accuracy:

In [None]:
show_table(top_1=False)

The patch performs much better in the top-5 accuracy, showing that it is still rather robust despite the constraints put on it to disguise the patch.

Finally, let's create some example visualizations of the patch attack in action.

In [None]:
def perform_patch_attack(patch):
    patch_batch = exmp_batch.clone()
    patch_batch = place_patch(patch_batch, patch)
    with torch.no_grad():
        patch_preds = pretrained_model(patch_batch.to(device))
    for i in range(1,17,5):
        show_prediction(patch_batch[i], label_batch[i], patch_preds[i])

In [None]:
perform_patch_attack(patch_dict['German short-haired pointer'][64]['patch'])

## AI Assistance Statement
Portions of this notebook were developed with assistance from Anthropic's Claude Sonnet 4.5 (November 2025). Claude was used for debugging and code development assistance (for robustness augmentations and disguising the patch). All generated code and explanations were reviewed, tested, and edited to ensure correctness and alignment with assignment requirements.

Google Gemini 2.5 Flash was used to generate the coffee mug png on 11/2/25 at 6pm.

Specific dates/times and prompts to generate code blocks are listed below:

| Date/Time | Summary | Code Added / Modified |
|:-----------|:---------|:---------------------|
| **Nov 2, 2025<br>~11:00 PM EST** | Implement random transformations for patch placement | Created `place_patch()` with random rotation, scale, and location augmentation |
| **Nov 2, 2025<br>~11:10 PM EST** | Fix gradient flow and device mismatches | Updated `place_patch()` to handle CUDA/CPU correctly, removed non-differentiable per-image transforms, and replaced in-place ops with safe blending |
| **Nov 2, 2025<br>~11:00 PM EST** | Validate gradient flow and training stability | Simplified patch placement (fixed location) to isolate bugs; confirmed gradients update the patch correctly |
| **Nov 2, 2025<br>~11:00 PM EST** | Finalize robust transformation implementation | Reintroduced differentiable rotation/scale using `torchvision.transforms` (`TF.rotate`, `TF.resize`) |
| **Nov 2, 2025<br>~11:00 PM EST** | Add visual regularization for realistic patch appearance | Added Total Variation (TV) and darkness losses to training loop to encourage smooth, dark, printable patch |
| **Nov 2, 2025<br>~11:00 PM EST** | Implement coffee-logo initialization for disguise | Loaded `coffee_logo.png`, composited over brown badge, converted to `init_param` (pre-tanh) for patch initialization |

---

Key Issues Resolved:
- Fixed device mismatches (CPU ↔ GPU)  
- Restored proper gradient flow through patch parameter  
- Eliminated non-differentiable image transforms and in-place ops  
- Ensured evaluation tests patch robustness at multiple locations  
- Added realistic **coffee-logo initialization** for disguise  
- Added **TV + darkness regularization** for visual plausibility and printability

## References

This tutorial was originally created by Phillip Lippe.
[![View notebooks on Github](https://img.shields.io/static/v1.svg?logo=github&label=Repo&message=View%20On%20Github&color=lightgrey)](https://github.com/phlippe/uvadlc_notebooks/blob/master/docs/tutorial_notebooks/tutorial10/Adversarial_Attacks.ipynb)
[![Open In Collab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/phlippe/uvadlc_notebooks/blob/master/docs/tutorial_notebooks/tutorial10/Adversarial_Attacks.ipynb)  

[1] Goodfellow, Ian J., Jonathon Shlens, and Christian Szegedy. "Explaining and harnessing adversarial examples." ICLR 2015.

[2] Hendrik Metzen, Jan, et al. "Universal adversarial perturbations against semantic image segmentation." Proceedings of the IEEE International Conference on Computer Vision. 2017.

[3] Anant Jain. "Breaking neural networks with adversarial attacks." [Blog post](https://towardsdatascience.com/breaking-neural-networks-with-adversarial-attacks-f4290a9a45aa) 2019.