# Adversarial Patch (Target = “jellyfish”)

This notebook demonstrates how to create an adversarial patch that fools a pretrained ResNet34 ImageNet classifier. The goal is to generate a printable patch that causes the model to misclassify diverse natural images as a specific target class, here “jellyfish.” The notebook walks through patch initialization, augmentation, optimization, and evaluation, following the adversarial patch framework described in Brown et al. (2017).

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ailina-aniwan/xai-adversarial-patches/blob/main/adversarial_attacks_patches.ipynb)


In [1]:
!pip install -q torch torchvision pillow matplotlib tqdm kornia




## Setup & Imports

In [2]:
import os, glob, io, random, time
import numpy as np
from PIL import Image, ImageOps, ImageFilter
from tqdm import tqdm

import torch
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader

import torchvision
from torchvision import transforms, models, datasets
from torchvision.transforms.functional import to_pil_image, to_tensor

import kornia.augmentation as K

# Device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("Torch:", torch.__version__)
print("Torchvision:", torchvision.__version__)
print("Device:", device)

# ImageNet normalization
IM_MEAN = torch.tensor([0.485, 0.456, 0.406], device=device).view(3,1,1)
IM_STD  = torch.tensor([0.229, 0.224, 0.225], device=device).view(3,1,1)
def norm(x):     # [0,1] -> normalized
    return (x - IM_MEAN) / IM_STD
def inv_norm(x): # normalized -> [0,1]
    return x * IM_STD + IM_MEAN

# Target index lookup for "jellyfish"
weights = models.ResNet34_Weights.IMAGENET1K_V1
categories = weights.meta["categories"]
classes = categories
target_idx = [i for i,c in enumerate(categories) if "jellyfish" in c.lower()][0]
target_name = categories[target_idx]
print(f"Target idx for 'jellyfish': {target_idx}  ({target_name})")


Torch: 2.2.0
Torchvision: 0.17.0
Device: cpu
Target idx for 'jellyfish': 107  (jellyfish)


This section imports the necessary libraries, defines the device, and sets up basic image normalization for the pretrained ResNet34 model. It also specifies the target class ("jellyfish") that the patch will try to induce.
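As a quick sanity check, the two normalization helpers should be inverses of each other up to floating-point error. The sketch below re-declares them on the CPU so it runs on its own; the random batch `x` is a stand-in for real images and is not data from the notebook.

```python
import torch

# ImageNet channel statistics, shaped [3,1,1] so they broadcast over [B,3,H,W]
IM_MEAN = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
IM_STD = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)

def norm(x):      # [0,1] pixels -> normalized model input
    return (x - IM_MEAN) / IM_STD

def inv_norm(x):  # normalized model input -> [0,1] pixels
    return x * IM_STD + IM_MEAN

x = torch.rand(2, 3, 8, 8)  # hypothetical batch of "images" in [0,1]
roundtrip = inv_norm(norm(x))
max_err = (roundtrip - x).abs().max().item()
print(f"max round-trip error: {max_err:.2e}")
```

Keeping the patch in [0,1] pixel space and normalizing only once, right before the forward pass, is what lets the later cells clamp the patch to a printable range.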

## Path & Image Setup

In [3]:
# Create output folders
os.makedirs("sample_images", exist_ok=True)
os.makedirs("test_images", exist_ok=True)
os.makedirs("patch_outputs", exist_ok=True)

# Load STL10 directly
transform = transforms.Compose([
    transforms.Resize((800, 800)),
    transforms.ToTensor()
])

# Use temporary download path
dataset = datasets.STL10(root="/tmp/stl10_temp", split="train", download=True, transform=transform)
stl10_classes = dataset.classes
print(f"Loaded {len(dataset)} images across {len(stl10_classes)} classes: {stl10_classes}")

# Random subset for training/testing backgrounds
torch.manual_seed(42)
indices = torch.randperm(len(dataset))
train_idx = indices[:45]   # 45 training images
test_idx  = indices[45:55] # 10 held-out test images

# Save JPEGs only
def save_images(indices, out_folder):
    for i, idx in enumerate(indices):
        img_tensor, label = dataset[idx]
        img = transforms.ToPILImage()(img_tensor)
        img = ImageOps.fit(img, (800, 800), method=Image.Resampling.LANCZOS)
        fname = f"{stl10_classes[label]}_{i:03d}.jpg"
        img.save(os.path.join(out_folder, fname), quality=95)
    print(f"Saved {len(indices)} clean images to {out_folder}/")

save_images(train_idx, "sample_images")
save_images(test_idx, "test_images")


Files already downloaded and verified
Loaded 5000 images across 10 classes: ['airplane', 'bird', 'car', 'cat', 'deer', 'dog', 'horse', 'monkey', 'ship', 'truck']
Saved 45 clean images to sample_images/
Saved 10 clean images to test_images/


We use the STL10 dataset only as a convenient source of diverse natural images to serve as backgrounds during patch training. The class labels from STL10 are not used in this experiment.

## Hyperparameters & Print Settings

In [4]:
# Square patch with circular mask inside
PATCH_SIZE = 160
BATCH_SIZE = 8
LR = 0.05
EPOCHS = 200
TV_WEIGHT = 1e-3

# Print/export settings
PRINT_DPI = 300
PRINT_DIAM_CM = 9
PRINT_PX = int((PRINT_DIAM_CM / 2.54) * PRINT_DPI)


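Here we define the core hyperparameters: patch size, batch size, learning rate, number of epochs, and total variation weight, plus the print resolution and diameter. Note that the training cells below override `PATCH_SIZE`, `EPOCHS`, `LR`, and `TV_WEIGHT`, and the export cell overrides `PRINT_PX`, so the values in this cell act as defaults rather than the settings used for the logged run.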
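The pixel count follows from the print diameter and the resolution: pixels = cm / 2.54 × DPI. Below is a minimal sketch of that conversion and its inverse; the helper names `cm_to_px` and `px_to_cm` are illustrative and do not appear in the notebook.

```python
CM_PER_INCH = 2.54

def cm_to_px(diam_cm, dpi):
    """Pixels needed to print `diam_cm` centimetres at `dpi` dots per inch (truncated)."""
    return int(diam_cm / CM_PER_INCH * dpi)

def px_to_cm(px, dpi):
    """Physical size in centimetres of `px` pixels printed at `dpi`."""
    return px / dpi * CM_PER_INCH

print(cm_to_px(9, 300))               # 1062
print(round(px_to_cm(160, 300), 2))   # 1.35
```

At 300 DPI the 9 cm target needs 1062 pixels, while a 160-pixel patch printed without upscaling would be only about 1.35 cm wide, which is one reason the patch is resized before export.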

## Model (ResNet34) & Preprocessing

In [5]:
weights = models.ResNet34_Weights.IMAGENET1K_V1
model = models.resnet34(weights=weights).to(device)
model.eval()

# Preprocessing pipeline supplied by the weights (resize, center crop, ImageNet normalization)
preprocess = weights.transforms()

# norm() and inv_norm() from the setup cell already use the same ImageNet mean/std,
# so they are reused here instead of being redefined.

print("Model & preprocessing ready.")


Model & preprocessing ready.


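We use the official torchvision ResNet34 together with the preprocessing pipeline bundled with its weights, so that inputs are resized, cropped, and normalized with the same statistics the model was trained on.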

## Mask, TV Loss, and Patch Augmentation

In [6]:
def make_circular_mask(h, w, radius=None, center=None):
    yy, xx = np.mgrid[:h, :w]
    if center is None: center = (w//2, h//2)
    if radius is None: radius = min(h, w)//2
    cx, cy = center
    mask = ((xx - cx)**2 + (yy - cy)**2) <= radius**2
    return torch.from_numpy(mask.astype(np.float32))  # [H,W]

mask = make_circular_mask(PATCH_SIZE, PATCH_SIZE, radius=int(PATCH_SIZE*0.48)).unsqueeze(0).to(device)  # [1,H,W]

def tv_loss(x):  # x: [B,3,H,W]
    dh = torch.mean(torch.abs(x[:,:,1:,:] - x[:,:,:-1,:]))
    dw = torch.mean(torch.abs(x[:,:,:,1:] - x[:,:,:,:-1]))
    return dh + dw

def augment_patch_on_image_batch(imgs_norm, patch_tensor, mask_tensor):
    """
    imgs_norm: [B,3,224,224] normalized
    patch_tensor: [3,H,W] in [0,1]
    mask_tensor: [1,H,W]
    returns: [B,3,224,224] normalized, with patch applied and global augs
    """
    B, C, Himg, Wimg = imgs_norm.shape
    imgs_unnorm = inv_norm(imgs_norm.clone())  # [0,1]

    for i in range(B):
        # Random scale/rotation for the patch
        scale = random.uniform(0.6, 1.2)
        angle = random.uniform(-30, 30)
        pH = max(4, int(PATCH_SIZE * scale)); pW = pH

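        # Resize/rotate patch & mask with PIL.
        # Note: detach() plus the PIL round trip cuts the autograd graph,
        # so no gradient flows back to patch_tensor through this function.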
        pil_patch = to_pil_image(patch_tensor.detach().cpu().clamp(0,1))
        pil_mask  = to_pil_image(mask_tensor.squeeze(0).detach().cpu())
        pil_patch = pil_patch.resize((pW,pH), resample=Image.BILINEAR).rotate(angle, resample=Image.BILINEAR, expand=False)
        pil_mask  = pil_mask.resize((pW,pH), resample=Image.BILINEAR).rotate(angle, resample=Image.BILINEAR, expand=False)

        pt = to_tensor(pil_patch).to(device)  # [3,pH,pW]
        mt = to_tensor(pil_mask).to(device)   # [1,pH,pW]

        # Random position
        max_x, max_y = Wimg - pW, Himg - pH
        x = 0 if max_x <= 0 else random.randint(0, max_x)
        y = 0 if max_y <= 0 else random.randint(0, max_y)

        # Alpha blend
        region = imgs_unnorm[i, :, y:y+pH, x:x+pW]
        imgs_unnorm[i, :, y:y+pH, x:x+pW] = pt * mt + region * (1 - mt)

        # Occasionally simulate JPEG compression (phone camera)
        if random.random() < 0.2:
            pil_tmp = to_pil_image(imgs_unnorm[i].cpu().clamp(0,1))
            q = random.randint(60, 95)
            buf = io.BytesIO()
            pil_tmp.save(buf, format='JPEG', quality=q)
            buf.seek(0)
            imgs_unnorm[i] = to_tensor(Image.open(buf).convert('RGB')).to(device)

    # Apply global augs (in [0,1] pixel space), then normalize for model input
    aug = K.AugmentationSequential(
        K.RandomResizedCrop((224,224), scale=(0.95,1.0), ratio=(0.9,1.1), p=1.0),
        K.RandomAffine(degrees=10, translate=0.06, scale=(0.98,1.02), p=0.6),
        K.RandomGaussianBlur((3,3), (0.1, 2.0), p=0.25),
        K.ColorJitter(0.25, 0.25, 0.25, 0.05, p=0.6),
        K.RandomGrayscale(p=0.05),
        data_keys=["input"]
    ).to(device)

    imgs_aug = aug(imgs_unnorm)  # Kornia expects [0,1] range
    imgs_back = norm(imgs_aug)   # normalize afterward
    return imgs_back

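This section creates a circular mask that restricts the patch to a disc, defines the total variation (TV) loss that penalizes differences between neighboring pixels, and implements an augmentation pipeline that applies random scaling, rotation, placement, occasional JPEG compression, and color perturbations. Note that `augment_patch_on_image_batch` detaches the patch and round-trips it through PIL, so it cannot pass gradients back to the patch; the training loop below does not call it and uses a fixed center overlay instead.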
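One way to keep random scale, rotation, and placement differentiable is to warp the patch with `torch.nn.functional.affine_grid` and `grid_sample` instead of PIL. The following is a minimal sketch of that alternative, not the notebook's method; `place_patch` and its parameters are hypothetical names, and the tensors at the bottom are toy inputs.

```python
import math
import torch
import torch.nn.functional as F

def place_patch(img, patch, mask, scale, angle_deg, tx, ty):
    """
    img:   [3,H,W] in [0,1]
    patch: [3,h,w] in [0,1], may require grad
    mask:  [1,h,w], 1 inside the patch shape and 0 outside
    scale: patch width as a fraction of the image width
    tx, ty: patch centre in normalized image coordinates ([-1, 1])
    """
    a = math.radians(angle_deg)
    c, s = math.cos(a) / scale, math.sin(a) / scale
    # theta maps output (image) coords to input (patch) coords:
    # p_in = R(-a) @ (p_out - t) / scale
    theta = torch.tensor([[c, s, -(c * tx + s * ty)],
                          [-s, c, s * tx - c * ty]],
                         dtype=patch.dtype, device=patch.device).unsqueeze(0)
    H, W = img.shape[-2:]
    grid = F.affine_grid(theta, [1, 3, H, W], align_corners=False)
    # Zero padding means everything outside the warped patch samples as 0
    warped_patch = F.grid_sample(patch.unsqueeze(0), grid,
                                 padding_mode="zeros", align_corners=False)[0]
    warped_mask = F.grid_sample(mask.unsqueeze(0), grid,
                                padding_mode="zeros", align_corners=False)[0]
    return img * (1 - warped_mask) + warped_patch * warped_mask

img = torch.full((3, 64, 64), 0.5)                  # toy grey background
patch = torch.rand(3, 16, 16, requires_grad=True)   # toy patch
mask = torch.ones(1, 16, 16)                        # toy square mask
out = place_patch(img, patch, mask, scale=0.5, angle_deg=20.0, tx=0.1, ty=-0.2)
out.sum().backward()
print(out.shape, patch.grad.abs().sum().item() > 0)
```

Because the warp is bilinear interpolation expressed in tensor operations, `patch.grad` is populated after the backward pass, which is what an optimizer needs in order to train under random transformations.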

## Dataset Loader

In [7]:
# Simple dataset from folder
class SimpleImageFolder(Dataset):
    def __init__(self, root, transform=None):
        self.paths = sorted(glob.glob(os.path.join(root, "*.*")))
        self.transform = transform
    def __len__(self): return len(self.paths)
    def __getitem__(self, idx):
        p = self.paths[idx]
        img = Image.open(p).convert('RGB')
        return (self.transform(img) if self.transform else img), 0

dataset = SimpleImageFolder("sample_images", transform=preprocess)
loader = DataLoader(dataset, batch_size=BATCH_SIZE, shuffle=True, num_workers=0, drop_last=True)
print("Train images:", len(dataset))

Train images: 45


A simple dataset class loads images from the `sample_images` folder and applies the same preprocessing as the pretrained model. This provides batches of backgrounds for overlaying the patch during optimization. With 45 images, a batch size of 8, and `drop_last=True`, each epoch yields five full batches.
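Because the loader is created with `drop_last=True`, the number of optimizer steps per epoch equals the number of full batches. A small sketch of the arithmetic follows; `batches_per_epoch` is an illustrative helper, not a PyTorch API.

```python
def batches_per_epoch(n_images, batch_size, drop_last=True):
    """Return (number of batches, images left unused) for one pass over the data."""
    full, remainder = divmod(n_images, batch_size)
    if drop_last:
        return full, remainder            # the incomplete batch is discarded
    return full + (remainder > 0), 0      # the incomplete batch is kept

print(batches_per_epoch(45, 8))                    # (5, 5)
print(batches_per_epoch(45, 8, drop_last=False))   # (6, 0)
```

With the notebook's settings, each epoch runs five updates on 40 images; since `shuffle=True`, the five skipped images generally change from epoch to epoch.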

## Initialize the Patch & Optimizer

In [8]:
# Initialize patch and optimizer
patch = torch.rand(3, PATCH_SIZE, PATCH_SIZE, device=device, requires_grad=True)
with torch.no_grad():
    patch[0].mul_(0.25).add_(0.15)  # R (lower)
    patch[1].mul_(0.25).add_(0.40)  # G (mid)
    patch[2].mul_(0.25).add_(0.60)  # B (higher)
    patch.clamp_(0,1)

optimizer = torch.optim.Adam([patch], lr=LR)

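The patch is initialized as a bluish random tensor: each channel is drawn uniformly from a 0.25-wide band, with red lowest (0.15 to 0.40), green in the middle (0.40 to 0.65), and blue highest (0.60 to 0.85). It is clamped to the valid [0,1] range and created with `requires_grad=True` so that Adam can optimize its pixels directly. The circular mask is defined separately and applied only when the patch is overlaid on an image. The training section below re-initializes both the patch and the optimizer.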

## Training Loop

In [9]:
# Patch initialization (overrides the 160-pixel patch and mask defined above)
PATCH_SIZE = 200

base = torch.tensor([0.30, 0.50, 0.90], device=device).view(3,1,1)
patch = (base + 0.05*torch.randn(3, PATCH_SIZE, PATCH_SIZE, device=device)).clamp(0,1)
patch.requires_grad_(True)

# circular mask (1 inside, 0 outside)
yy, xx = torch.meshgrid(torch.arange(PATCH_SIZE, device=device),
                        torch.arange(PATCH_SIZE, device=device),
                        indexing='ij')
mask = (((yy - PATCH_SIZE/2)**2 + (xx - PATCH_SIZE/2)**2) < (PATCH_SIZE/2)**2).float().unsqueeze(0)


In [10]:
EPOCHS     = 150
LR         = 0.02
TV_WEIGHT  = 1e-2
LOG_EVERY  = 5

optimizer = torch.optim.Adam([patch], lr=LR)
save_dir  = "patch_outputs"; os.makedirs(save_dir, exist_ok=True)

best_prob = 0.0
loss_history, prob_history, grad_history = [], [], []

def tv_loss(x):  # x: [B,3,H,W]
    dh = x[:,:,1:,:] - x[:,:,:-1,:]   # differences along height
    dw = x[:,:,:,1:] - x[:,:,:,:-1]   # differences along width
    return (dh.abs().mean() + dw.abs().mean())

def center_overlay(img_norm_bchw, patch_01, mask_01):
    """img_norm_bchw is normalized; convert to [0,1], overlay, renorm."""
    img_pix = inv_norm(img_norm_bchw.clone())
    pt = patch_01.clamp(0,1)
    mt = mask_01

    B, C, H, W = img_pix.shape
    ph, pw = pt.shape[-2:]
    y = (H - ph)//2; x = (W - pw)//2
    img_pix[:, :, y:y+ph, x:x+pw] = pt*mt + img_pix[:, :, y:y+ph, x:x+pw]*(1-mt)
    return norm(img_pix)  # back to normalized for the model

for epoch in range(1, EPOCHS+1):
    if len(loader) == 0:
        print("No images in sample_images/. Add files and re-run.")
        break

    running = 0.0
    for imgs, _ in loader:
        imgs = imgs.to(device)
        # overlay in pixel space, renormalize once
        patched = center_overlay(imgs, patch.unsqueeze(0), mask)

        logits = model(patched)                # [B, 1000]
        tgt = logits[:, target_idx]
        # max logit among other classes
        logits_others = logits.clone()
        logits_others[:, target_idx] = -1e9
        other_max = logits_others.max(dim=1).values
        margin = tgt - other_max
        loss_attack = -margin.mean()           # maximize margin
        loss_tv = TV_WEIGHT * tv_loss(patch.unsqueeze(0))
        loss = loss_attack + loss_tv

        optimizer.zero_grad()
        loss.backward()

        # gradient diagnostics (only the last batch's norm is logged per epoch)
        grad_norm = patch.grad.detach().norm().item() if patch.grad is not None else 0.0

        optimizer.step()
        with torch.no_grad():
            patch.clamp_(0,1)

        running += loss.item()

    avg_loss = running / len(loader)
    loss_history.append(avg_loss)
    grad_history.append(grad_norm)

    # quick eval on a few training imgs (center overlay)
    if epoch % LOG_EVERY == 0 or epoch == 1:
        eval_paths = sorted(glob.glob("sample_images/*.*"))[:12]
        probs = []
        for p in eval_paths:
            x = preprocess(Image.open(p).convert("RGB")).to(device).unsqueeze(0)
            x_patched = center_overlay(x, patch.unsqueeze(0), mask)  # reuse helper
            with torch.no_grad():
                pr = F.softmax(model(x_patched), dim=1)[0, target_idx].item()
            probs.append(pr)

        mean_prob = float(sum(probs)/len(probs)) if probs else 0.0
        prob_history.append((epoch, mean_prob))

        # save snapshots
        to_pil_image(patch.detach().cpu()).save(os.path.join(save_dir, f"patch_epoch_{epoch:03d}.png"))
        if mean_prob > best_prob:
            best_prob = mean_prob
            torch.save(patch.detach().cpu(), os.path.join(save_dir, "best_patch.pt"))
            to_pil_image(patch.detach().cpu()).save(os.path.join(save_dir, "best_patch.png"))

        print(f"Epoch {epoch:3d}/{EPOCHS} | loss={avg_loss:.4f} | "
              f"meanP(jellyfish)={mean_prob:.5f} | grad={grad_norm:.3e} | best={best_prob:.5f}")


Epoch   1/150 | loss=0.3077 | meanP(jellyfish)=0.99990 | grad=1.067e+01 | best=0.99990
Epoch   5/150 | loss=-40.0440 | meanP(jellyfish)=1.00000 | grad=9.810e+00 | best=1.00000
Epoch  10/150 | loss=-56.6253 | meanP(jellyfish)=1.00000 | grad=7.011e+00 | best=1.00000
Epoch  15/150 | loss=-65.5386 | meanP(jellyfish)=1.00000 | grad=6.642e+00 | best=1.00000
Epoch  20/150 | loss=-72.2801 | meanP(jellyfish)=1.00000 | grad=1.115e+01 | best=1.00000
Epoch  25/150 | loss=-78.1701 | meanP(jellyfish)=1.00000 | grad=7.279e+00 | best=1.00000
Epoch  30/150 | loss=-81.3215 | meanP(jellyfish)=1.00000 | grad=8.425e+00 | best=1.00000
Epoch  35/150 | loss=-84.6282 | meanP(jellyfish)=1.00000 | grad=8.788e+00 | best=1.00000
Epoch  40/150 | loss=-86.3415 | meanP(jellyfish)=1.00000 | grad=1.007e+01 | best=1.00000
Epoch  45/150 | loss=-87.7628 | meanP(jellyfish)=1.00000 | grad=7.351e+00 | best=1.00000
Epoch  50/150 | loss=-89.8029 | meanP(jellyfish)=1.00000 | grad=7.907e+00 | best=1.00000
Epoch  55/150 | loss=-9

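The training loop optimizes the patch to increase the ResNet34 model’s confidence in the “jellyfish” class. The loss combines a margin-based adversarial objective (the target logit minus the largest non-target logit, negated so that minimizing it widens the margin) with total variation regularization for smoothness. Because the margin is unbounded, the loss keeps decreasing even after the target probability saturates at 1.0, as the log shows. The patch is pasted at a fixed center position, clamped to [0,1] after each update, and saved every `LOG_EVERY` epochs for inspection.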
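The objective can be written as L = −mean(z_t − max_{k≠t} z_k) + λ·TV(patch), where z are the logits, t is the target index, and λ is `TV_WEIGHT`. The standalone sketch below evaluates both terms on tiny hand-made tensors; the logits and images are made up for illustration, and the target is masked with `-inf` rather than the `-1e9` used in the notebook.

```python
import torch

def margin_loss(logits, target_idx):
    """Negative mean margin between the target logit and the best other logit."""
    tgt = logits[:, target_idx]
    others = logits.clone()
    others[:, target_idx] = float("-inf")   # exclude the target from the max
    return -(tgt - others.max(dim=1).values).mean()

def tv_loss(x):
    """Mean absolute difference between neighbouring pixels; x is [B,C,H,W]."""
    dh = (x[:, :, 1:, :] - x[:, :, :-1, :]).abs().mean()
    dw = (x[:, :, :, 1:] - x[:, :, :, :-1]).abs().mean()
    return dh + dw

# Two samples, three classes, target class 2 (toy values)
logits = torch.tensor([[2.0, 5.0, 1.0],
                       [0.0, 1.0, 4.0]])
print(margin_loss(logits, target_idx=2).item())   # 0.5

flat = torch.zeros(1, 1, 2, 2)                     # constant image
checker = torch.tensor([[[[0.0, 1.0],
                          [1.0, 0.0]]]])           # maximally rough 2x2 image
print(tv_loss(flat).item(), tv_loss(checker).item())   # 0.0 2.0
```

The first sample loses to class 1 by 4 and the second wins by 3, so the mean margin is −0.5 and the loss is 0.5; the TV term is zero for a constant image and grows with pixel-to-pixel variation.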

## Export Print-Ready Patch

In [11]:
PRINT_PX = 600  # 600 px at 300 DPI is about 5.1 cm; overrides the 9 cm setting above

# Load the best saved patch
best_png = os.path.join("patch_outputs", "best_patch.png")
if os.path.exists(best_png):
    best_patch_pil = Image.open(best_png).convert("RGB")
else:
    best_patch_pil = to_pil_image(patch.detach().cpu().clamp(0, 1))
    best_patch_pil.save(os.path.join("patch_outputs", "final_patch.png"))

# Create single large print patch
final_print = best_patch_pil.resize((PRINT_PX, PRINT_PX), resample=Image.LANCZOS)
single_path = os.path.join("patch_outputs", f"jellyfish_patch_print_{PRINT_PX}px.png")
final_print.save(single_path, dpi=(300, 300))

# Create 3x3 sticker sheet layout
gap = 40  # pixel spacing between stickers
sheet_w = PRINT_PX * 3 + gap * 4
sheet_h = PRINT_PX * 3 + gap * 4
canvas = Image.new("RGB", (sheet_w, sheet_h), (255, 255, 255))

for i in range(3):
    for j in range(3):
        x = gap + j * (PRINT_PX + gap)
        y = gap + i * (PRINT_PX + gap)
        canvas.paste(final_print, (x, y))

sheet_path = os.path.join("patch_outputs", "jellyfish_patch_sticker_sheet_3x3.png")
canvas.save(sheet_path, dpi=(300, 300))

print("\nExport complete.")
print("Saved single large patch:", single_path)
print("Saved 3x3 sticker sheet:", sheet_path)



Export complete.
Saved single large patch: patch_outputs/jellyfish_patch_print_600px.png
Saved 3x3 sticker sheet: patch_outputs/jellyfish_patch_sticker_sheet_3x3.png


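After training, the best-performing patch is upscaled to 600×600 pixels for printing; at 300 DPI this corresponds to about 5.1 cm per side. Both a single patch and a 3×3 sticker sheet are exported so the attack can be tested under real-world conditions. The exported image is the full square tensor; only the disc inside the circular mask is applied during training and evaluation.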
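The sheet dimensions follow from the tile size and the gap between tiles. Here is a minimal sketch of that layout arithmetic, using the same 600 px tile, 40 px gap, and 300 DPI as the export cell; `sheet_layout` is a hypothetical helper, not part of the notebook.

```python
def sheet_layout(tile_px, gap_px, rows=3, cols=3, dpi=300):
    """Sheet size in pixels and centimetres, plus the top-left corner of every tile."""
    width_px = cols * tile_px + (cols + 1) * gap_px
    height_px = rows * tile_px + (rows + 1) * gap_px
    size_cm = (width_px / dpi * 2.54, height_px / dpi * 2.54)
    corners = [
        (gap_px + c * (tile_px + gap_px), gap_px + r * (tile_px + gap_px))
        for r in range(rows)
        for c in range(cols)
    ]
    return (width_px, height_px), size_cm, corners

size_px, size_cm, corners = sheet_layout(tile_px=600, gap_px=40)
print(size_px, [round(v, 1) for v in size_cm], corners[0], corners[-1])
# (1960, 1960) [16.6, 16.6] (40, 40) (1320, 1320)
```

The sheet comes out at 1960×1960 pixels, roughly 16.6 cm per side at 300 DPI, which fits on an A4 or US Letter page.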

## Evaluation on Held-Out Photos

In [12]:
# Evaluate on held-out test_images/
test_paths = sorted(glob.glob("test_images/*.*"))
if not test_paths:
    print("Add photos to test_images/ for evaluation.")
else:
    successes, probs = 0, []
    for pth in test_paths:
        img = preprocess(Image.open(pth).convert("RGB")).to(device).unsqueeze(0)
        with torch.no_grad():
            img_u = inv_norm(img.clone())
            pt = to_tensor(best_patch_pil).to(device)
            mt = mask.to(device)
            _,_,H,W = img_u.shape
            x = (W - pt.shape[2])//2; y = (H - pt.shape[1])//2
            # overlay patch at image center
            img_u[0,:, y:y+pt.shape[1], x:x+pt.shape[2]] = pt*mt + img_u[0,:, y:y+pt.shape[1], x:x+pt.shape[2]]*(1-mt)
            
            # forward through model
            out = model(norm(img_u))
            prob = F.softmax(out, dim=1)[0, target_idx].item()
            pred = torch.argmax(out, dim=1).item()

        # print results (use ImageNet class list)
        print(os.path.basename(pth), "-> pred:", categories[pred], f"prob({target_name})={prob:.3f}")
        probs.append(prob)
        if pred == target_idx: 
            successes += 1

    print("\nTop-1 success rate:", successes / len(test_paths))
    print("Mean target probability:", sum(probs) / len(probs))

bird_005.jpg -> pred: jellyfish prob(jellyfish)=1.000
bird_007.jpg -> pred: jellyfish prob(jellyfish)=1.000
car_008.jpg -> pred: jellyfish prob(jellyfish)=1.000
cat_002.jpg -> pred: jellyfish prob(jellyfish)=1.000
deer_004.jpg -> pred: jellyfish prob(jellyfish)=1.000
dog_001.jpg -> pred: jellyfish prob(jellyfish)=1.000
horse_000.jpg -> pred: jellyfish prob(jellyfish)=1.000
horse_003.jpg -> pred: jellyfish prob(jellyfish)=1.000
monkey_006.jpg -> pred: jellyfish prob(jellyfish)=1.000
monkey_009.jpg -> pred: jellyfish prob(jellyfish)=1.000

Top-1 success rate: 1.0
Mean target probability: 1.0


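We evaluate the best saved patch on the ten held-out test images by overlaying it at the image center and recording the top-1 prediction and the target-class probability for each. All ten images are classified as “jellyfish” with a reported probability of 1.000. This is a digital evaluation that uses the same placement and size as training, with the 200-pixel patch pasted onto a 224-pixel input.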
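When reading these numbers, it can help to know how much of the input the patch occupies. The sketch below counts the pixels inside a centered disc using the same inequality as the training mask; `circle_coverage` is an illustrative helper, and the 64-pixel size is a hypothetical comparison point that the notebook does not use.

```python
def circle_coverage(patch_size, image_size=224):
    """Fraction of an image_size x image_size input covered by a disc of diameter patch_size."""
    c = patch_size / 2
    inside = sum(
        1
        for y in range(patch_size)
        for x in range(patch_size)
        if (y - c) ** 2 + (x - c) ** 2 < c ** 2   # same test as the training mask
    )
    return inside / image_size ** 2

for size in (64, 200):
    print(size, f"{circle_coverage(size):.1%}")
```

Under these assumptions the 200-pixel disc covers a bit under two thirds of the 224×224 input, whereas a 64-pixel disc would cover well under a tenth, so a smaller patch would be a considerably harder test of the attack.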

## References

This tutorial was originally created by Phillip Lippe. 
[![View notebooks on Github](https://img.shields.io/static/v1.svg?logo=github&label=Repo&message=View%20On%20Github&color=lightgrey)](https://github.com/phlippe/uvadlc_notebooks/blob/master/docs/tutorial_notebooks/tutorial10/Adversarial_Attacks.ipynb)
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/phlippe/uvadlc_notebooks/blob/master/docs/tutorial_notebooks/tutorial10/Adversarial_Attacks.ipynb)  
