Task 1: Basics – Evaluate Pretrained ResNet-34 on ImageNet Subset

Task Overview
In this initial task, we evaluate a pretrained ResNet-34 model on a test subset of the ImageNet-1K dataset. This subset includes images from 100 selected classes, and our objective is to:

Load the images and labels.

Preprocess the data using ImageNet's standard normalization.

Use the pretrained ResNet-34 model from torchvision to make predictions.

Compute Top-1 and Top-5 accuracy for the model.

This task establishes a baseline performance that will later be degraded using adversarial attacks.

Step 1: Importing Required Libraries
This cell imports all the necessary libraries for handling files, image processing, dataset loading, and deep learning using PyTorch.

In [1]:
# Import libraries for data handling, image transformation, and model inference
import os
import json
import torch
import torchvision
import torchvision.transforms as transforms
import numpy as np
from PIL import Image
from torch.utils.data import Dataset, DataLoader


Step 2: Loading Class Labels from JSON

This cell loads the class labels from the labels_list.json file, which maps each selected class to its corresponding ImageNet index.

In [2]:
# Load the list of labels (e.g., "401: accordion") from the provided JSON file
with open('labels_list.json', 'r') as f:
    labels_json = json.load(f)


Step 3: Verifying Loaded Labels

This cell prints out the number of labels loaded and previews the first few entries to verify correct formatting.

In [3]:
# Print the number of entries and sample the first 5 to inspect formatting
print(f"Loaded {len(labels_json)} labels from JSON file")
print("First 5 entries:", labels_json[:5])


Loaded 100 labels from JSON file
First 5 entries: ['401: accordion', '402: acoustic guitar', '403: aircraft carrier', '404: airliner', '405: airship']


Step 4: Parsing Class IDs and Names

Here we parse each label entry to extract the ImageNet class index and human-readable class name.

In [4]:
# Separate each label into class ID (int) and class name (string)
class_ids = []
class_names = []
for entry in labels_json:
    parts = entry.split(':', 1)
    if len(parts) == 2:
        class_id = int(parts[0].strip())
        class_name = parts[1].strip()
        class_ids.append(class_id)
        class_names.append(class_name)


Step 5: Verifying Parsed Class Data

This quick check ensures we successfully extracted all 100 class IDs and names.

In [5]:
print(f"Extracted {len(class_ids)} class IDs and names")

Extracted 100 class IDs and names
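
The parsing loop in Step 4 silently skips any entry without a colon, so a malformed file would only show up here as a lower count. A stricter variant could raise an error instead. The sketch below uses a hypothetical helper name, parse_label_entry, which is not part of the original notebook.

```python
def parse_label_entry(entry):
    """Split an entry such as '401: accordion' into (401, 'accordion').

    Hypothetical helper for illustration: it raises ValueError on
    malformed input instead of silently skipping it.
    """
    class_id_text, sep, class_name = entry.partition(':')
    if not sep:
        raise ValueError(f"Missing ':' separator in label entry: {entry!r}")
    return int(class_id_text.strip()), class_name.strip()


sample = ['401: accordion', '402: acoustic guitar']
parsed = [parse_label_entry(e) for e in sample]
print(parsed)
```

Whether to skip or to fail on bad entries is a design choice; failing early is usually safer when the folder-to-label mapping depends on the exact order of the entries.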


In [6]:
import zipfile
import os

# Check if the directory already exists
if not os.path.exists("./TestDataSet"):
    # Extract the zip file
    with zipfile.ZipFile("TestDataSet.zip", 'r') as zip_ref:
        zip_ref.extractall(".")
    print("Extracted TestDataSet.zip successfully")

Extracted TestDataSet.zip successfully


Step 6: Detecting Dataset Folders

We now scan the TestDataSet directory and list all class folders whose names begin with 'n'. ImageNet names each class folder after its WordNet synset ID (for example n02672831), and all of these IDs start with 'n'.

In [7]:
# Detect folders corresponding to image classes (e.g., "n02106662")
dataset_path = "./TestDataSet"
folders = [d for d in os.listdir(dataset_path)
           if os.path.isdir(os.path.join(dataset_path, d)) and d.startswith('n')]
folders.sort()  # Ensure consistent order


Step 7: Previewing Folder Names

This helps us verify that the folder structure was detected correctly by printing a count and previewing a few folder names.

In [8]:
# Print number of folders found and preview some of them
print(f"Found {len(folders)} folders in the dataset")
print("First 5 folders:", folders[:5])


Found 100 folders in the dataset
First 5 folders: ['n02672831', 'n02676566', 'n02687172', 'n02690373', 'n02692877']


Step 8: Mapping Folders to Class Labels

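Folders and labels are paired by position: the i-th folder in sorted order receives the i-th class ID from labels_list.json. This assumes the JSON entries follow the same order as the sorted folder names. If the two counts differ, a warning is printed and only the first min(len(folders), len(class_ids)) pairs are mapped.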

In [9]:
folder_to_label = {}
if len(folders) == len(class_ids):
    # Directly map folders to class IDs based on their order
    for i, folder in enumerate(folders):
        folder_to_label[folder] = class_ids[i]
    print("Created folder to label mapping based on order")
else:
    print(f"Warning: Mismatch in counts - {len(folders)} folders vs {len(class_ids)} labels")
    # Still create a mapping but with a warning
    min_length = min(len(folders), len(class_ids))
    for i in range(min_length):
        folder_to_label[folders[i]] = class_ids[i]

Created folder to label mapping based on order
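
Because the mapping relies purely on order, a few cheap invariants can catch a misaligned label file. The sketch below uses a hypothetical helper, check_mapping, and assumes that sorting the folder names should yield strictly increasing class indices; that assumption is not verified anywhere in the notebook.

```python
def check_mapping(folder_to_label):
    """Return a list of problems found in a folder -> class-index mapping.

    Hypothetical helper for illustration. It assumes that sorting the
    folder names should yield strictly increasing class indices.
    """
    problems = []
    labels = [folder_to_label[f] for f in sorted(folder_to_label)]
    if len(set(labels)) != len(labels):
        problems.append("duplicate class indices")
    if any(b <= a for a, b in zip(labels, labels[1:])):
        problems.append("class indices are not strictly increasing")
    return problems


# Toy mapping built from the first three folders and labels printed earlier
toy = {'n02672831': 401, 'n02676566': 402, 'n02687172': 403}
print(check_mapping(toy))
```

An empty list means no problem was detected; it does not prove that every folder carries the right label.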


Step 9: Saving Mappings to JSON

This cell saves both the folder-to-label and folder-to-name mappings to a JSON file for reuse in future evaluation steps.

In [10]:
# Save mappings as a JSON file for consistent reference across experiments
mapping_dict = {
    'folder_to_label': folder_to_label,
    'folder_to_name': {folders[i]: class_names[i] for i in range(min(len(folders), len(class_names)))}
}
with open('imagenet_mapping.json', 'w') as f:
    json.dump(mapping_dict, f, indent=4)
print("Saved mapping to imagenet_mapping.json")


Saved mapping to imagenet_mapping.json
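
When the mapping is read back in a later step, JSON returns the integer labels as integers and the folder names as string keys, so nothing is lost in the round trip. The sketch below writes to a temporary directory (an assumption made so the real imagenet_mapping.json is left untouched) and uses a one-entry toy mapping.

```python
import json
import os
import tempfile

mapping = {
    'folder_to_label': {'n02672831': 401},
    'folder_to_name': {'n02672831': 'accordion'},
}

# Write to a temporary directory so the real imagenet_mapping.json is untouched
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'imagenet_mapping.json')
    with open(path, 'w') as f:
        json.dump(mapping, f, indent=4)
    with open(path, 'r') as f:
        restored = json.load(f)

print(restored['folder_to_label']['n02672831'])
```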


Step 10: Creating a Custom Dataset Class

We define a custom OrderedImageNetDataset class that uses our folder-to-label mapping to load image-label pairs in a structure compatible with PyTorch’s DataLoader.

In [11]:
# Create a custom dataset class using this mapping
class OrderedImageNetDataset(Dataset):
    def __init__(self, root_dir, folder_to_label, transform=None):
        self.root_dir = root_dir
        self.folder_to_label = folder_to_label
        self.transform = transform

        # Collect all image paths and their labels
        self.samples = []
        for folder, label in folder_to_label.items():
            folder_path = os.path.join(root_dir, folder)
            if os.path.isdir(folder_path):
                for img_name in os.listdir(folder_path):
                    if img_name.lower().endswith(('.png', '.jpg', '.jpeg')):
                        img_path = os.path.join(folder_path, img_name)
                        self.samples.append((img_path, label))

        print(f"Loaded {len(self.samples)} images from {len(folder_to_label)} class folders")

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        img_path, label = self.samples[idx]
        image = Image.open(img_path).convert('RGB')

        if self.transform:
            image = self.transform(image)

        return image, label

Step 11: Defining Image Transforms

This cell defines the preprocessing pipeline for input images using standard ImageNet normalization values. Images are resized, center cropped to 224×224, converted to tensors, and normalized.

In [12]:
# Define transforms as given in the project PDF
mean_norms = np.array([0.485, 0.456, 0.406])
std_norms = np.array([0.229, 0.224, 0.225])

plain_transforms = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=mean_norms, std=std_norms)
])
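
The Normalize step maps each channel value x in [0, 1] to (x - mean) / std, and the saving and plotting helpers later undo it with x * std + mean. The sketch below works through that arithmetic for a single pixel in plain Python; the function names are illustrative and not part of the notebook.

```python
MEAN = (0.485, 0.456, 0.406)
STD = (0.229, 0.224, 0.225)


def normalize_pixel(rgb):
    """Map an RGB triple in [0, 1] to normalized space: (x - mean) / std."""
    return tuple((x - m) / s for x, m, s in zip(rgb, MEAN, STD))


def denormalize_pixel(rgb_norm):
    """Invert normalize_pixel: x * std + mean."""
    return tuple(x * s + m for x, m, s in zip(rgb_norm, MEAN, STD))


mid_grey = (0.5, 0.5, 0.5)
normalized = normalize_pixel(mid_grey)
restored = denormalize_pixel(normalized)
print(normalized)
print(restored)
```

Since std is roughly 0.225, one unit in normalized space spans about 4.4 units on the [0, 1] scale, which matters later when interpreting ε.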

Step 12: Creating the Dataset and DataLoader

This step initializes our custom dataset using the image folder and applies the defined transforms. It then wraps it in a DataLoader to allow efficient batching and iteration.

In [13]:
# Create dataset and data loader
dataset = OrderedImageNetDataset(
    root_dir=dataset_path,
    folder_to_label=folder_to_label,
    transform=plain_transforms
)

dataloader = DataLoader(dataset, batch_size=32, shuffle=False)

Loaded 500 images from 100 class folders


Step 13: Loading the Pretrained ResNet-34 Model

Here we load the pretrained ResNet-34 model from torchvision, set it to evaluation mode, and move it to GPU if available.

In [14]:
# Load the pre-trained ResNet-34 as specified in the project
model = torchvision.models.resnet34(weights='IMAGENET1K_V1')
model.eval()
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = model.to(device)

Downloading: "https://download.pytorch.org/models/resnet34-b627a593.pth" to /root/.cache/torch/hub/checkpoints/resnet34-b627a593.pth
100%|██████████| 83.3M/83.3M [00:00<00:00, 227MB/s]


Step 14: Defining the Evaluation Function

This function computes the Top-1 and Top-5 accuracy of the model on our dataset. Top-1 means the most confident prediction is correct. Top-5 means the correct label is in the top 5 predictions.

In [15]:
# Define the evaluation function required in Task 1
def evaluate_model(model, dataloader, device):
    model.eval()
    correct_top1 = 0
    correct_top5 = 0
    total = 0

    with torch.no_grad():
        for images, labels in dataloader:
            images, labels = images.to(device), labels.to(device)
            outputs = model(images)

            # Top-1 accuracy
            _, predicted = outputs.max(1)
            correct_top1 += (predicted == labels).sum().item()

            # Top-5 accuracy: a sample counts if its label is among the 5 highest logits
            _, top5_indices = outputs.topk(5, dim=1)
            correct_top5 += (top5_indices == labels.unsqueeze(1)).any(dim=1).sum().item()

            total += labels.size(0)

    top1_accuracy = 100 * correct_top1 / total
    top5_accuracy = 100 * correct_top5 / total

    return top1_accuracy, top5_accuracy
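
To make the two metrics concrete, the sketch below computes Top-1 and Top-5 on a made-up batch of three samples and six classes in plain Python. The scores and labels are invented for illustration; topk_correct is a hypothetical helper, not part of the notebook.

```python
def topk_correct(scores, label, k):
    """Return True if `label` is among the k highest-scoring class indices."""
    ranked = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)
    return label in ranked[:k]


# Toy batch: 3 samples, 6 classes (values are made up for illustration)
batch_scores = [
    [0.1, 2.0, 0.3, 0.0, 0.5, 0.2],   # highest score: class 1
    [1.5, 0.2, 1.4, 1.3, 1.2, 1.1],   # highest score: class 0, class 1 ranks last
    [0.0, 0.1, 0.2, 0.9, 0.3, 0.4],   # highest score: class 3, class 4 ranks third
]
batch_labels = [1, 1, 4]

top1 = sum(topk_correct(s, y, 1) for s, y in zip(batch_scores, batch_labels))
top5 = sum(topk_correct(s, y, 5) for s, y in zip(batch_scores, batch_labels))
print(f"Top-1: {100 * top1 / 3:.2f}%  Top-5: {100 * top5 / 3:.2f}%")
```

Only the first sample is a Top-1 hit. The third sample is a Top-5 hit because its label ranks third, while the second sample misses both because its label has the lowest score of the six classes.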

Step 15: Running Evaluation and Printing Results

This step runs the model evaluation on the full dataset and prints the resulting top-1 and top-5 accuracy, which serve as our baseline for the upcoming adversarial attacks.

In [16]:
# Evaluate the model as required in Task 1
print(f"Evaluating ResNet-34 on {len(dataset)} images...")
top1_acc, top5_acc = evaluate_model(model, dataloader, device)
print(f"Top-1 Accuracy: {top1_acc:.2f}%")
print(f"Top-5 Accuracy: {top5_acc:.2f}%")

Evaluating ResNet-34 on 500 images...
Top-1 Accuracy: 70.60%
Top-5 Accuracy: 93.20%


Task 2: Pixel-wise Adversarial Attacks using FGSM

Task Description

In this task, we implement the Fast Gradient Sign Method (FGSM) — a simple yet powerful adversarial attack technique. The idea is to perturb each input image slightly in the direction of the gradient of the loss function with respect to the input pixels, effectively "nudging" the image in a way that confuses the classifier.
We constrain the perturbation using an L∞ norm, so that no individual pixel changes too much. The goal is to degrade model accuracy while keeping the adversarial images visually similar to the originals.

Step 1: Import Required Libraries for Attacks and Visualization

This cell imports additional libraries required for visualizing, saving images, and computing attacks.

In [17]:
# Additional libraries for adversarial attack and visualization
import os
import json
import torch
import torchvision
import torchvision.transforms as transforms
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image
from torch.utils.data import Dataset, DataLoader
import torch.nn.functional as F  # For loss computation
import torchvision.utils as vutils  # For saving image grids


Step 2: Create Output Directory for Adversarial Images

We define a directory where we’ll save all generated adversarial images.

In [18]:
# Create a directory to save the adversarial images
SAVE_PATH = "./AdversarialTestSet1"
os.makedirs(SAVE_PATH, exist_ok=True)

Step 3: Create Class Folder Structure

This replicates the original folder structure (one folder per class) inside the adversarial output directory.

In [19]:
# Copy folder structure from original dataset
for folder in folders:
    os.makedirs(os.path.join(SAVE_PATH, folder), exist_ok=True)

Step 4: Define a Modified Dataset Class for FGSM

This dataset class is extended to return extra metadata — like folder and image filename — so we can save the generated adversarial image in the right location.

In [20]:
# Dataset class modified to return filename and folder info for saving adversarial images
class FGSMDataset(Dataset):
    def __init__(self, root_dir, folder_to_label, transform=None):
        self.root_dir = root_dir
        self.folder_to_label = folder_to_label
        self.transform = transform
        self.samples = []

        # Gather image paths and related info
        for folder, label in folder_to_label.items():
            folder_path = os.path.join(root_dir, folder)
            if os.path.isdir(folder_path):
                for img_name in os.listdir(folder_path):
                    if img_name.lower().endswith(('.png', '.jpg', '.jpeg')):
                        img_path = os.path.join(folder_path, img_name)
                        self.samples.append((img_path, label, folder, img_name))

        print(f"Loaded {len(self.samples)} images from {len(folder_to_label)} class folders")

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        # Return image, label, and metadata for saving
        img_path, label, folder, img_name = self.samples[idx]
        image = Image.open(img_path).convert('RGB')
        if self.transform:
            image = self.transform(image)
        return image, label, folder, img_name


Step 5: Instantiate FGSM Dataset

This creates the FGSMDataset instance using our existing image folders and normalization transforms.

In [21]:
# Create a dataset specifically for the FGSM attack
attack_dataset = FGSMDataset(
    root_dir=dataset_path,
    folder_to_label=folder_to_label,
    transform=plain_transforms
)

Loaded 500 images from 100 class folders


Step 6: Create a DataLoader with Batch Size 1

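FGSM itself works on whole batches. We use a batch size of 1 here only to keep the per-image bookkeeping simple: comparing predictions before and after the attack and saving each adversarial image under its original folder and filename.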

In [22]:
# Use batch size of 1 to simplify attack implementation
attack_loader = DataLoader(attack_dataset, batch_size=1, shuffle=False)

Step 7: Define the FGSM Attack Function

This function perturbs the image by taking a single step in the direction of the sign of the gradient.

In [23]:
def fgsm_attack(image, epsilon, data_grad):
    # Get the sign of the gradient
    sign_data_grad = torch.sign(data_grad)

    # Create the perturbed image
    perturbed_image = image + epsilon * sign_data_grad

    # Return the perturbed image
    return perturbed_image
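
The effect of a single gradient-sign step is easiest to see on a model small enough to differentiate by hand. The sketch below applies the same update, x + ε · sign(∇x loss), to a three-feature logistic regression in plain Python. The weights, input, and helper names are made up for illustration and have nothing to do with ResNet-34.

```python
import math


def sign(v):
    """Return -1, 0, or 1 depending on the sign of v."""
    return (v > 0) - (v < 0)


def logistic_loss(w, x, y):
    """Binary cross-entropy of sigmoid(w . x) for a label y in {0, 1}."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    p = 1.0 / (1.0 + math.exp(-z))
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def input_gradient(w, x, y):
    """Gradient of logistic_loss with respect to the input x: (p - y) * w."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    p = 1.0 / (1.0 + math.exp(-z))
    return [(p - y) * wi for wi in w]


w = [1.0, -2.0, 0.5]     # made-up weights
x = [0.2, -0.1, 0.4]     # made-up input with true label 1
y = 1
epsilon = 0.02

grad = input_gradient(w, x, y)
x_adv = [xi + epsilon * sign(g) for xi, g in zip(x, grad)]

loss_clean = logistic_loss(w, x, y)
loss_adv = logistic_loss(w, x_adv, y)
linf = max(abs(a - b) for a, b in zip(x_adv, x))
print(loss_clean, loss_adv, linf)
```

Every coordinate moves by exactly ε (up to floating-point rounding), so the L∞ distance equals ε, and the loss on the true label goes up. The results in Step 17 show the same pattern, with maximum and average L∞ distances both equal to ε.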


Step 8: Compute L∞ Distance

We use this utility to verify that the perturbation respects the ε constraint — no pixel is changed by more than ε.

In [24]:
# Function to compute L∞ distance
def l_inf_distance(original, perturbed):
    # Calculate the maximum absolute difference
    return (original - perturbed).abs().max().item()

Step 9: Denormalize and Save Image to Disk

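This helper function undoes the normalization, clamps the result to [0, 1], and saves the image to disk. The file format is inferred from the extension of the target path, so an image saved under a .jpg or .jpeg name is JPEG-compressed, which is lossy and can alter the small adversarial perturbation.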

In [25]:
# Helper function to denormalize and save images
def save_normalized_image(tensor, path, mean, std):
    # Create a copy of the tensor
    img = tensor.clone()

    # Unnormalize the image
    for t, m, s in zip(img, mean, std):
        t.mul_(s).add_(m)

    # Ensure the pixel values are in [0, 1]
    img = torch.clamp(img, 0, 1)

    # Save the image
    vutils.save_image(img, path)
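
Image files store only 256 intensity levels per channel, so whatever the tensor holds is rounded when it is written. The sketch below illustrates, in plain Python, how a perturbation smaller than half a level can vanish on save. The round-to-nearest rule in quantize_8bit is an assumption made for illustration, and the helper is not part of the notebook.

```python
def quantize_8bit(x):
    """Map a value in [0, 1] to an 8-bit level, rounding to nearest (assumed rule)."""
    return min(255, max(0, int(x * 255 + 0.5)))


original = 100 / 255                 # a pixel that sits exactly on level 100
small = original + 0.3 / 255         # perturbed by 0.3 of a level
large = original + 0.7 / 255         # perturbed by 0.7 of a level

levels = (quantize_8bit(original), quantize_8bit(small), quantize_8bit(large))
print(levels)
```

The 0.3-level change is rounded away while the 0.7-level change becomes a full level. Accuracy measured on images reloaded from disk can therefore differ from accuracy measured on the in-memory tensors.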

Step 10: Visualize Adversarial Attack Example

This function plots the original, adversarial, and amplified difference images side by side and saves the figure as a single PNG file per example.

In [26]:
# Visualize and save an attack example showing original, adversarial, and difference
def visualize_attack(original, adversarial, original_label, adversarial_label,
                     original_class, adversarial_class, index):
    plt.figure(figsize=(12, 5))

    # -- Original Image --
    plt.subplot(1, 3, 1)
    img = original.cpu().squeeze()
    img_display = img.clone()
    for t, m, s in zip(img_display, mean_norms, std_norms):
        t.mul_(s).add_(m)
    img_display = img_display.permute(1, 2, 0).clamp(0, 1).numpy()
    plt.imshow(img_display)
    plt.title(f"Original\nLabel: {original_label}\nClass: {original_class}")
    plt.axis('off')

    # -- Adversarial Image --
    plt.subplot(1, 3, 2)
    adv_img = adversarial.cpu().squeeze()
    adv_img_display = adv_img.clone()
    for t, m, s in zip(adv_img_display, mean_norms, std_norms):
        t.mul_(s).add_(m)
    adv_img_display = adv_img_display.permute(1, 2, 0).clamp(0, 1).numpy()
    plt.imshow(adv_img_display)
    plt.title(f"Adversarial\nLabel: {adversarial_label}\nClass: {adversarial_class}")
    plt.axis('off')

    # -- Difference Image --
    plt.subplot(1, 3, 3)
    diff = np.abs(adv_img_display - img_display) * 10  # Amplified for visibility
    plt.imshow(diff)
    plt.title("Difference (x10)")
    plt.axis('off')

    plt.tight_layout()
    plt.savefig(f'attack_example_{index}.png')
    plt.close()


Step 11: Define a Helper to Get Class Names

This utility function maps numeric class labels back to human-readable class names using the saved mapping_dict.

In [27]:
# Given a class label index, return its corresponding human-readable class name
def get_class_name(label):
    for folder, l in folder_to_label.items():
        if l == label:
            return mapping_dict['folder_to_name'].get(folder, str(label))
    return str(label)  # fallback


Step 12: Set Epsilon for FGSM

We define the ε value that controls the strength of the FGSM attack. Because the attack operates on normalized tensors, ε bounds how much each normalized pixel value can change.

In [28]:
# Define the maximum pixel-wise perturbation for FGSM
epsilon = 0.02  # Applied in normalized space; roughly ±1 intensity level on the [0, 255] scale once multiplied by std
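
The link between ε = 0.02 and "about one intensity level" follows from the normalization: undoing Normalize multiplies any perturbation by the channel's std, and multiplying by 255 converts it to 8-bit levels. A short worked calculation in plain Python:

```python
STD = (0.229, 0.224, 0.225)
epsilon = 0.02  # budget in normalized space, as in the notebook

# Undoing Normalize multiplies a perturbation by std; scaling by 255
# expresses it in 8-bit intensity levels.
levels = []
for channel, std in zip("RGB", STD):
    raw = epsilon * std
    levels.append(raw * 255)
    print(f"{channel}: {raw:.5f} on [0, 1]  ~ {raw * 255:.2f} levels on [0, 255]")
```

All three channels come out between 1.1 and 1.2 levels, so the budget is slightly larger than a strict ±1 on the raw scale.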


Step 13: Initialize Evaluation Counters

These counters will be used to track the attack’s success and measure Top-1 and Top-5 accuracy on adversarial images.

In [29]:
# Counters for statistics and accuracy tracking
total_images = 0
successful_attacks = 0            # When prediction changes
correct_original = 0              # Original prediction correct
correct_adversarial_top1 = 0      # Top-1 correct on perturbed image
correct_adversarial_top5 = 0      # Top-5 correct on perturbed image
max_l_inf = 0                     # Max L∞ distance observed
avg_l_inf = 0                     # Average L∞ distance (sum, to be divided later)


Step 14: Prepare a List for Visualization

This will store up to 5 notable attack examples for visual inspection and reporting.

In [30]:
# List to store visualization examples
examples_to_visualize = []

Step 15: Perform FGSM on All Test Images

This loop goes through every image, computes gradients, generates adversarial versions, saves them, evaluates the model on them, and tracks results.

In [31]:
# Perform FGSM attack on each image
print("Starting FGSM attacks...")
for i, (image, label, folder, img_name) in enumerate(attack_loader):
    # Send data to device
    image, label = image.to(device), label.to(device)

    # Set requires_grad attribute of tensor
    image.requires_grad = True

    # Forward pass
    output = model(image)

    # Get original prediction
    _, original_pred = torch.max(output, 1)
    original_correct = (original_pred == label).item()
    if original_correct:
        correct_original += 1

    # Calculate loss
    loss = F.cross_entropy(output, label)

    # Backward pass to get gradients
    model.zero_grad()
    loss.backward()

    # Get gradients of the input image
    data_grad = image.grad.data

    # Call FGSM attack
    perturbed_image = fgsm_attack(image, epsilon, data_grad)

    # Calculate L∞ distance
    l_inf = l_inf_distance(image, perturbed_image)
    max_l_inf = max(max_l_inf, l_inf)
    avg_l_inf += l_inf

    # Forward pass with perturbed image
    with torch.no_grad():
        output_adv = model(perturbed_image)

    # Get adversarial prediction
    _, adv_pred = torch.max(output_adv, 1)

    # Check if attack was successful in changing prediction
    attack_success = (original_pred != adv_pred).item()
    if attack_success:
        successful_attacks += 1

    # Check if adversarial example is still classified correctly (top-1)
    adversarial_correct_top1 = (adv_pred == label).item()
    if adversarial_correct_top1:
        correct_adversarial_top1 += 1

    # Check top-5 accuracy
    _, top5_indices = output_adv.topk(5, dim=1)
    adversarial_correct_top5 = label.item() in top5_indices.squeeze().tolist()
    if adversarial_correct_top5:
        correct_adversarial_top5 += 1

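    # Save the adversarial image (with batch_size=1 the loader returns folder and
    # img_name as length-1 sequences, hence the [0] indexing)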
    save_path = os.path.join(SAVE_PATH, folder[0], img_name[0])
    adv_img_to_save = perturbed_image.detach().cpu().squeeze(0)
    save_normalized_image(adv_img_to_save, save_path, mean_norms, std_norms)

    # Collect examples for visualization (original correct but adversarial wrong)
    if original_correct and not adversarial_correct_top1 and len(examples_to_visualize) < 5:
        examples_to_visualize.append({
            'original': image.detach(),
            'adversarial': perturbed_image.detach(),
            'original_label': original_pred.item(),
            'adversarial_label': adv_pred.item(),
            'true_label': label.item(),
            'index': len(examples_to_visualize) + 1
        })

    # Update counter
    total_images += 1

    # Print progress
    if (i + 1) % 50 == 0:
        print(f"Processed {i + 1}/{len(attack_loader)} images")

Starting FGSM attacks...
Processed 50/500 images
Processed 100/500 images
Processed 150/500 images
Processed 200/500 images
Processed 250/500 images
Processed 300/500 images
Processed 350/500 images
Processed 400/500 images
Processed 450/500 images
Processed 500/500 images


Step 16: Compute Final Accuracy and Statistics

This block calculates key metrics such as:

Top-1 and Top-5 accuracy after the attack

Success rate of FGSM (how often prediction changed)

Average and maximum L∞ distortion

Percentage drop in accuracy

In [32]:
# Compute key metrics for evaluating FGSM attack effectiveness
avg_l_inf = avg_l_inf / total_images                         # Average L∞ distance
attack_success_rate = 100 * successful_attacks / total_images  # % of inputs where prediction changed

# Accuracy before and after attack
original_acc_top1 = 100 * correct_original / total_images
adversarial_acc_top1 = 100 * correct_adversarial_top1 / total_images
adversarial_acc_top5 = 100 * correct_adversarial_top5 / total_images

# Drop in Top-1 accuracy
relative_drop = 100 * (original_acc_top1 - adversarial_acc_top1) / original_acc_top1


Step 17: Print the FGSM Evaluation Results

Here we print a summary of the attack performance, including distortion statistics, success rate, and accuracy drops.

In [33]:
# Print results
print(f"\nFGSM Attack Results (ε = {epsilon}):")
print(f"Maximum L∞ distance: {max_l_inf:.6f}")
print(f"Average L∞ distance: {avg_l_inf:.6f}")
print(f"Attack success rate: {attack_success_rate:.2f}%")
print("\nModel Accuracy:")
print(f"Original Top-1: {original_acc_top1:.2f}%")
print(f"Adversarial Top-1: {adversarial_acc_top1:.2f}%")
print(f"Adversarial Top-5: {adversarial_acc_top5:.2f}%")
print(f"Relative accuracy drop: {relative_drop:.2f}%")


FGSM Attack Results (ε = 0.02):
Maximum L∞ distance: 0.020000
Average L∞ distance: 0.020000
Attack success rate: 76.20%

Model Accuracy:
Original Top-1: 70.60%
Adversarial Top-1: 5.00%
Adversarial Top-5: 29.40%
Relative accuracy drop: 92.92%
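
The relative drop reported above can be reproduced from the two Top-1 figures. The sketch below also guards against a zero clean accuracy, which the notebook's formula does not; relative_drop is a hypothetical helper name.

```python
def relative_drop(clean_acc, adv_acc):
    """Percentage of clean accuracy lost; returns 0.0 if clean accuracy is 0."""
    if clean_acc == 0:
        return 0.0
    return 100 * (clean_acc - adv_acc) / clean_acc


clean_top1 = 70.60   # Original Top-1 reported above
adv_top1 = 5.00      # Adversarial Top-1 reported above

absolute = clean_top1 - adv_top1
relative = relative_drop(clean_top1, adv_top1)
print(f"Absolute drop: {absolute:.2f} percentage points")
print(f"Relative drop: {relative:.2f}%")
```

The absolute drop is 65.60 percentage points and the relative drop is 92.92%. The two are easy to confuse, and the 50% threshold in the next step is applied to the relative figure.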


Step 18: Check Against Project Goal

This cell checks whether the attack reduced accuracy by at least 50%, as required in the project specification.

In [34]:
if relative_drop >= 50:
    print("\nSuccess! Achieved more than 50% accuracy drop!")
else:
    print(f"\nTarget not met. Need 50% drop, got {relative_drop:.2f}% drop.")


Success! Achieved more than 50% accuracy drop!


Step 19: Generate Visualizations of Attack Results

Here we visualize 3–5 examples where the attack fooled the model — by showing the original, adversarial, and the difference image.

In [35]:
# Visualize examples
print("\nVisualizing attack examples...")
for example in examples_to_visualize:
    original_class = get_class_name(example['true_label'])
    adversarial_class = get_class_name(example['adversarial_label'])

    visualize_attack(
        example['original'],
        example['adversarial'],
        example['original_label'],
        example['adversarial_label'],
        original_class,
        adversarial_class,
        example['index']
    )


Visualizing attack examples...


Step 20: Print Final Log and Save Confirmation

This logs how many examples were visualized and where the adversarial image dataset was saved.

In [36]:
print(f"Saved {len(examples_to_visualize)} visualization examples")
print(f"All adversarial images saved to {SAVE_PATH}")

Saved 5 visualization examples
All adversarial images saved to ./AdversarialTestSet1


Step 21: Create Combined Visualization Grid

This function produces a single image grid showing all selected adversarial examples side-by-side with their original and difference images. It’s useful for summarizing attack effectiveness in your final report.

In [37]:
# Create a grid-style visualization of multiple adversarial examples
def create_combined_visualization(examples_to_visualize):
    if not examples_to_visualize:
        print("No examples to visualize")
        return

    # Set up a tall figure: 1 row per example, 3 columns (original, adversarial, difference)
    plt.figure(figsize=(15, 5 * len(examples_to_visualize)))

    for i, example in enumerate(examples_to_visualize):
        original_class = get_class_name(example['true_label'])
        adversarial_class = get_class_name(example['adversarial_label'])

        # -- Original Image --
        plt.subplot(len(examples_to_visualize), 3, i * 3 + 1)
        img = example['original'].cpu().squeeze()
        img_display = img.clone()
        for t, m, s in zip(img_display, mean_norms, std_norms):
            t.mul_(s).add_(m)
        img_display = img_display.permute(1, 2, 0).clamp(0, 1).numpy()
        plt.imshow(img_display)
        plt.title(f"Original: {original_class}")
        plt.axis('off')

        # -- Adversarial Image --
        plt.subplot(len(examples_to_visualize), 3, i * 3 + 2)
        adv_img = example['adversarial'].cpu().squeeze()
        adv_img_display = adv_img.clone()
        for t, m, s in zip(adv_img_display, mean_norms, std_norms):
            t.mul_(s).add_(m)
        adv_img_display = adv_img_display.permute(1, 2, 0).clamp(0, 1).numpy()
        plt.imshow(adv_img_display)
        plt.title(f"Adversarial: {adversarial_class}")
        plt.axis('off')

        # -- Difference Image (magnified for visibility) --
        plt.subplot(len(examples_to_visualize), 3, i * 3 + 3)
        diff = np.abs(adv_img_display - img_display) * 10  # Amplify difference
        plt.imshow(diff)
        plt.title("Difference (x10)")
        plt.axis('off')

    # Save the final combined visualization to file
    plt.tight_layout()
    plt.savefig('fgsm_attack_examples.png', dpi=300)
    plt.close()
    print("Combined visualization saved to 'fgsm_attack_examples.png'")


Step 22: Individual Visualizations for Report

This block reuses the visualize_attack function to regenerate and save the individual side-by-side comparisons of original and adversarial examples. It repeats Step 19 and overwrites the same attack_example_<index>.png files.

In [38]:
# Usage with the examples collected:
print("\nVisualizing attack examples...")
for example in examples_to_visualize:
    original_class = get_class_name(example['true_label'])
    adversarial_class = get_class_name(example['adversarial_label'])

    visualize_attack(
        example['original'],
        example['adversarial'],
        example['original_label'],
        example['adversarial_label'],
        original_class,
        adversarial_class,
        example['index']
    )

print(f"Saved {len(examples_to_visualize)} visualization examples")


Visualizing attack examples...
Saved 5 visualization examples


In [39]:
# Create a combined visualization
if examples_to_visualize:
    create_combined_visualization(examples_to_visualize)

# Save the mapping and labels to the adversarial dataset folder
with open(os.path.join(SAVE_PATH, 'imagenet_mapping.json'), 'w') as f:
    json.dump(mapping_dict, f, indent=4)

Combined visualization saved to 'fgsm_attack_examples.png'


Task 3: Improved Adversarial Attacks — PGD (Projected Gradient Descent)

Task Description

The goal of this task is to build a stronger attack than FGSM. While FGSM applies a single-step perturbation, this task explores multi-step methods like Projected Gradient Descent (PGD), which repeatedly refines the perturbation to further degrade model accuracy while still adhering to an ε-bound under L∞.
We aim to:

Implement PGD with multiple gradient steps.

Maintain the L∞ constraint strictly.

Visualize and evaluate performance using the same metrics as Task 2.

Achieve a >70% accuracy drop relative to Task 1's clean accuracy.



Step 1: Prepare Directory for Adversarial Test Set 2

This creates a folder to store adversarial samples generated using PGD.




In [40]:
# Create a directory to save the adversarial images
SAVE_PATH = "./AdversarialTestSet2"
os.makedirs(SAVE_PATH, exist_ok=True)

Step 2: Replicate Folder Structure from Original Dataset

We mirror the original class-wise folder structure to organize PGD samples.

In [41]:
# Copy folder structure from original dataset
for folder in folders:
    os.makedirs(os.path.join(SAVE_PATH, folder), exist_ok=True)

Step 3: Define Helper – Save Normalized Image

This function converts a normalized tensor back to a displayable image and saves it to disk.

In [42]:
# Corrected helper function to denormalize and save images
def save_normalized_image(tensor, path, mean, std):
    # Create a copy of the tensor
    img = tensor.clone()

    # Unnormalize the image
    for t, m, s in zip(img, mean, std):
        t.mul_(s).add_(m)

    # Ensure the pixel values are in [0, 1]
    img = torch.clamp(img, 0, 1)

    # Save the image
    vutils.save_image(img, path)


Step 4: Define Helper – L∞ Distance

Used to compute the maximum pixel difference between original and perturbed images.

In [43]:
# Function to compute L∞ distance (same as in Task 2)
def l_inf_distance(original, perturbed):
    # Calculate the maximum absolute difference
    return (original - perturbed).abs().max().item()

Step 5: Implement PGD Attack (with L∞ Constraint)

This is the core of Task 3: an iterative attack that takes several small gradient-sign steps and, after each one, rescales the perturbation whenever its largest entry exceeds ε, so the result stays inside the ε-ball.

In [44]:
# Improved PGD Attack with strict constraint enforcement
def pgd_attack(model, image, label, epsilon, alpha, num_iter, random_start=True):
    """
    Projected Gradient Descent Attack with strict constraint enforcement

    Args:
        model: The model to attack
        image: The original image
        label: The true label
        epsilon: Maximum perturbation (L∞ norm constraint)
        alpha: Step size for each iteration
        num_iter: Number of iterations
        random_start: Whether to start with a random perturbation

    Returns:
        perturbed_image: The adversarial example
    """
    # Clone the image to avoid modifying the original
    perturbed_image = image.clone().detach()

    # Start with a random perturbation if specified
    if random_start:
        # Random perturbation within the epsilon ball, created on the same
        # device and with the same dtype as the input image
        random_noise = torch.empty_like(perturbed_image).uniform_(-epsilon, epsilon)
        perturbed_image = perturbed_image + random_noise

        # Strictly enforce the epsilon constraint
        delta = perturbed_image - image
        if delta.abs().max() > epsilon:
            delta = delta * (epsilon / delta.abs().max())
            perturbed_image = image + delta

    for i in range(num_iter):
        # Set requires_grad attribute
        perturbed_image.requires_grad = True

        # Forward pass
        outputs = model(perturbed_image)

        # Calculate loss (untargeted attack - maximize loss to true label)
        model.zero_grad()
        loss = F.cross_entropy(outputs, label)

        # Backward pass
        loss.backward()

        # Get gradient sign
        grad_sign = perturbed_image.grad.data.sign()

        # Update perturbed image
        with torch.no_grad():
            # Take a step in the gradient direction
            perturbed_image = perturbed_image + alpha * grad_sign

            # Calculate and enforce the epsilon constraint exactly
            delta = perturbed_image - image
            if delta.abs().max() > epsilon:
                delta = delta * (epsilon / delta.abs().max())

            # Apply the constrained delta
            perturbed_image = image + delta

    # Final check to ensure constraint is met
    delta = perturbed_image - image
    max_delta = delta.abs().max().item()
    if max_delta > epsilon:
        # Apply strict scaling
        scale_factor = epsilon / max_delta
        delta = delta * scale_factor
        perturbed_image = image + delta

    # Verify constraint
    final_linf = l_inf_distance(image, perturbed_image)
    assert final_linf <= epsilon + 1e-5, f"Constraint violation: {final_linf} > {epsilon}"

    return perturbed_image
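
The function above keeps the perturbation inside the ε-ball by rescaling it as a whole. The sketch below contrasts that with element-wise clamping, the usual L∞ projection in PGD, on a toy tensor. The helper names rescale_to_ball and clamp_to_ball are hypothetical and not part of the notebook; the point is only that rescaling also shrinks entries that were already within the bound.

```python
import torch

def rescale_to_ball(delta: torch.Tensor, epsilon: float) -> torch.Tensor:
    """Shrink the whole perturbation uniformly, as pgd_attack above does."""
    max_abs = delta.abs().max()
    if max_abs > epsilon:
        delta = delta * (epsilon / max_abs)
    return delta

def clamp_to_ball(delta: torch.Tensor, epsilon: float) -> torch.Tensor:
    """Clip each entry independently: the usual L-infinity projection."""
    return torch.clamp(delta, -epsilon, epsilon)

epsilon = 0.02
# Toy perturbation: only the first entry violates the bound
delta = torch.tensor([0.04, 0.01, -0.02, 0.0])

rescaled = rescale_to_ball(delta, epsilon)
clamped = clamp_to_ball(delta, epsilon)

print("rescaled:", rescaled.tolist())  # every entry is halved
print("clamped: ", clamped.tolist())   # only the first entry changes
```

Both results satisfy the ε bound, but clamping leaves more of the perturbation budget in use, which is one reason it is the more common choice.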

Step 6: Define Visualization for PGD Examples

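This function follows the same structure as the FGSM visualization: it saves the original image, the adversarial image, and their magnified difference side by side.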

In [45]:
# Function to visualize attack examples
def visualize_attack(original, adversarial, original_label, adversarial_label,
                     original_class, adversarial_class, index):
    """
    Visualize the original and adversarial images side by side
    along with their predictions
    """
    plt.figure(figsize=(12, 5))

    # Display original image
    plt.subplot(1, 3, 1)
    img = original.cpu().squeeze()  # Shape [3, 224, 224]

    # Unnormalize
    img_display = img.clone()
    for t, m, s in zip(img_display, mean_norms, std_norms):
        t.mul_(s).add_(m)

    # Properly permute for display
    img_display = img_display.permute(1, 2, 0).clamp(0, 1).numpy()
    plt.imshow(img_display)
    plt.title(f"Original\nLabel: {original_label}\nClass: {original_class}")
    plt.axis('off')

    # Display adversarial image
    plt.subplot(1, 3, 2)
    adv_img = adversarial.cpu().squeeze()  # Shape [3, 224, 224]

    # Unnormalize
    adv_img_display = adv_img.clone()
    for t, m, s in zip(adv_img_display, mean_norms, std_norms):
        t.mul_(s).add_(m)

    # Properly permute for display
    adv_img_display = adv_img_display.permute(1, 2, 0).clamp(0, 1).numpy()
    plt.imshow(adv_img_display)
    plt.title(f"Adversarial\nLabel: {adversarial_label}\nClass: {adversarial_class}")
    plt.axis('off')

    # Display the difference (magnified for visibility)
    plt.subplot(1, 3, 3)
    diff = np.abs(adv_img_display - img_display) * 10  # Magnify by 10x to make differences visible
    plt.imshow(diff)
    plt.title("Difference (x10)")
    plt.axis('off')

    plt.tight_layout()
    plt.savefig(f'pgd_attack_example_{index}.png')
    plt.close()
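
The unnormalization loop appears several times in the visualization code. A minimal sketch of how it could be factored into one helper is shown below. The name unnormalize is hypothetical, and the mean and standard deviation values are the standard ImageNet statistics, assumed here to match mean_norms and std_norms in the notebook.

```python
import torch

def unnormalize(img: torch.Tensor, mean, std) -> torch.Tensor:
    """Invert per-channel normalization for a [3, H, W] tensor and clamp to [0, 1].

    Returns a new tensor; the input is left unchanged.
    """
    mean_t = torch.as_tensor(mean, dtype=img.dtype).view(-1, 1, 1)
    std_t = torch.as_tensor(std, dtype=img.dtype).view(-1, 1, 1)
    return (img * std_t + mean_t).clamp(0, 1)

# Assumed values: the standard ImageNet normalization statistics
mean_norms = [0.485, 0.456, 0.406]
std_norms = [0.229, 0.224, 0.225]

# Round trip on random pixel data: normalize, then unnormalize
pixels = torch.rand(3, 4, 4)
normalized = (pixels - torch.tensor(mean_norms).view(-1, 1, 1)) / torch.tensor(std_norms).view(-1, 1, 1)
restored = unnormalize(normalized, mean_norms, std_norms)

print(torch.allclose(restored, pixels, atol=1e-5))
```

With such a helper, each panel would reduce to `unnormalize(img, mean_norms, std_norms).permute(1, 2, 0).numpy()`.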

Step 7: Combined Visualization of PGD Results

Creates a full grid of PGD adversarial examples for reporting.

In [46]:
# Function to create a combined visualization of examples
def create_combined_visualization(examples_to_visualize):
    if not examples_to_visualize:
        print("No examples to visualize")
        return

    # Create a figure with 3 columns (original, adversarial, difference) and one row per example
    plt.figure(figsize=(15, 5 * len(examples_to_visualize)))

    for i, example in enumerate(examples_to_visualize):
        original_class = get_class_name(example['true_label'])
        adversarial_class = get_class_name(example['adversarial_label'])

        # Original image
        plt.subplot(len(examples_to_visualize), 3, i*3 + 1)
        img = example['original'].cpu().squeeze()  # Shape [3, 224, 224]

        # Unnormalize
        img_display = img.clone()
        for t, m, s in zip(img_display, mean_norms, std_norms):
            t.mul_(s).add_(m)

        # Permute correctly
        img_display = img_display.permute(1, 2, 0).clamp(0, 1).numpy()
        plt.imshow(img_display)
        plt.title(f"Original: {original_class}")
        plt.axis('off')

        # Adversarial image
        plt.subplot(len(examples_to_visualize), 3, i*3 + 2)
        adv_img = example['adversarial'].cpu().squeeze()  # Shape [3, 224, 224]

        # Unnormalize
        adv_img_display = adv_img.clone()
        for t, m, s in zip(adv_img_display, mean_norms, std_norms):
            t.mul_(s).add_(m)

        # Permute correctly
        adv_img_display = adv_img_display.permute(1, 2, 0).clamp(0, 1).numpy()
        plt.imshow(adv_img_display)
        plt.title(f"Adversarial: {adversarial_class}")
        plt.axis('off')

        # Difference (magnified)
        plt.subplot(len(examples_to_visualize), 3, i*3 + 3)
        diff = np.abs(adv_img_display - img_display) * 10  # Magnify by 10x
        plt.imshow(diff)
        plt.title("Difference (x10)")
        plt.axis('off')

    plt.tight_layout()
    plt.savefig('pgd_attack_examples.png', dpi=300)
    plt.close()
    print("Combined visualization saved to 'pgd_attack_examples.png'")

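Step 8: Define a Class-Name Lookup Helper

This helper maps a numeric label back to its human-readable class name, falling back to the label itself when no name is found. The visualization code calls it to title each panel.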

In [47]:
# Function to get class name from label
def get_class_name(label):
    for folder, l in folder_to_label.items():
        if l == label:
            return mapping_dict['folder_to_name'].get(folder, str(label))
    return str(label)

Step 6: Define a Custom Dataset Class for PGD

We define FGSMDataset again (the class name is kept from the FGSM task) to return not only image tensors and labels but also the folder and filename of each image. This metadata is used to save adversarial images in the correct locations.

In [49]:
# Modified dataset class for loading images
class FGSMDataset(Dataset):
    def __init__(self, root_dir, folder_to_label, transform=None):
        self.root_dir = root_dir
        self.folder_to_label = folder_to_label
        self.transform = transform

        # Collect all image paths and their labels
        self.samples = []
        for folder, label in folder_to_label.items():
            folder_path = os.path.join(root_dir, folder)
            if os.path.isdir(folder_path):
                for img_name in os.listdir(folder_path):
                    if img_name.lower().endswith(('.png', '.jpg', '.jpeg')):
                        img_path = os.path.join(folder_path, img_name)
                        # Store folder name and file name for saving later
                        self.samples.append((img_path, label, folder, img_name))

        print(f"Loaded {len(self.samples)} images from {len(folder_to_label)} class folders")

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        img_path, label, folder, img_name = self.samples[idx]
        image = Image.open(img_path).convert('RGB')

        if self.transform:
            image = self.transform(image)

        return image, label, folder, img_name

Step 7: Create the Dataset for PGD

We now create an instance of this dataset using the dataset folder and transform pipeline.

In [75]:
# Create a dataset for the attack
attack_dataset = FGSMDataset(
    root_dir=dataset_path,
    folder_to_label=folder_to_label,
    transform=plain_transforms
)

Loaded 500 images from 100 class folders


Step 8: Load Dataset with Batch Size 1

We use a batch size of 1 so that each image can be attacked and saved individually.

In [76]:
# Use batch size of 1 to simplify attack implementation
attack_loader = DataLoader(attack_dataset, batch_size=1, shuffle=False)
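
With a batch size of 1, the default collation still wraps every field in a batch dimension: tensors gain a leading axis of size 1, and strings arrive as a length-1 sequence. That is why the attack loop later indexes folder[0] and img_name[0]. The toy dataset below (TinyDataset is a hypothetical stand-in, not the notebook's dataset) illustrates this.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class TinyDataset(Dataset):
    """Stand-in dataset returning the same 4-tuple layout as FGSMDataset."""

    def __len__(self):
        return 2

    def __getitem__(self, idx):
        # (image tensor, integer label, folder name, file name)
        return torch.zeros(3, 2, 2), idx, f"folder_{idx}", f"img_{idx}.png"

loader = DataLoader(TinyDataset(), batch_size=1, shuffle=False)
image, label, folder, img_name = next(iter(loader))

print(tuple(image.shape))       # the image gains a leading batch axis
print(label)                    # the label becomes a 1-element tensor
print(folder[0], img_name[0])   # strings come back as length-1 sequences
```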


Step 9: Set PGD Attack Parameters
These hyperparameters define the strength and structure of the PGD attack:

epsilon: maximum allowable pixel change (same as FGSM).

alpha: step size in each iteration.

num_iter: number of PGD steps.

random_start: whether to begin from a random point within the ε-ball.

In [50]:
# Set attack parameters
epsilon = 0.02       # Maximum perturbation (same as FGSM)
alpha = 0.002        # Step size (smaller than epsilon)
num_iter = 20        # Number of iterations (increased)
random_start = True  # Use random initialization
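
As a rough sanity check on these values (a sketch, not part of the original notebook): ignoring the constraint step, a signed-gradient update moves each coordinate by at most alpha per iteration. Comparing alpha * num_iter with epsilon shows whether the iterations can reach the boundary of the ε-ball at all.

```python
epsilon = 0.02
alpha = 0.002
num_iter = 20

# Largest distance a single coordinate could travel if no constraint were applied
max_travel = alpha * num_iter

# Number of same-direction steps needed to go from the clean image to the boundary
steps_to_boundary = epsilon / alpha

print(f"max unconstrained travel: {max_travel:.3f} ({max_travel / epsilon:.1f}x epsilon)")
print(f"steps to reach the boundary: {steps_to_boundary:.0f} of {num_iter}")
```

Here the step budget is about twice ε, so the attack has room to reach the boundary and still adjust direction afterwards.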

Step 10: Initialize Counters for Evaluation

These counters will help track model performance on clean and adversarial examples, as well as record distortion statistics.

In [51]:
# Initialize counters for statistics
total_images = 0
successful_attacks = 0  # Where prediction changed
correct_original = 0
correct_adversarial_top1 = 0
correct_adversarial_top5 = 0
max_l_inf = 0
avg_l_inf = 0

Step 11: Create a List for Visualization Examples

The attack loop stores up to five examples in this list (clean prediction correct, adversarial prediction wrong) for the figures produced later.

In [52]:
# List to store visualization examples
examples_to_visualize = []

Step 12: Run PGD Attack on the Dataset

This loop applies the PGD attack to each image, tracks success/failure, measures distortion, saves adversarial images, and stores examples for visualization.

In [53]:
# Perform PGD attack on each image
print("Starting PGD attacks...")
for i, (image, label, folder, img_name) in enumerate(attack_loader):
    # Send data to device
    image, label = image.to(device), label.to(device)

    # Get original prediction
    with torch.no_grad():
        output = model(image)
        _, original_pred = torch.max(output, 1)

    original_correct = (original_pred == label).item()
    if original_correct:
        correct_original += 1

    # Perform PGD attack
    perturbed_image = pgd_attack(
        model=model,
        image=image,
        label=label,
        epsilon=epsilon,
        alpha=alpha,
        num_iter=num_iter,
        random_start=random_start
    )

    # Calculate L∞ distance
    l_inf = l_inf_distance(image, perturbed_image)
    max_l_inf = max(max_l_inf, l_inf)
    avg_l_inf += l_inf

    # Forward pass with perturbed image
    with torch.no_grad():
        output_adv = model(perturbed_image)

    # Get adversarial prediction
    _, adv_pred = torch.max(output_adv, 1)

    # Check if attack was successful in changing prediction
    attack_success = (original_pred != adv_pred).item()
    if attack_success:
        successful_attacks += 1

    # Check if adversarial example is still classified correctly (top-1)
    adversarial_correct_top1 = (adv_pred == label).item()
    if adversarial_correct_top1:
        correct_adversarial_top1 += 1

    # Check top-5 accuracy
    _, top5_indices = output_adv.topk(5, dim=1)
    adversarial_correct_top5 = label.item() in top5_indices.squeeze().tolist()
    if adversarial_correct_top5:
        correct_adversarial_top5 += 1

    # Save the adversarial image
    save_path = os.path.join(SAVE_PATH, folder[0], img_name[0])
    adv_img_to_save = perturbed_image.detach().cpu().squeeze(0)
    save_normalized_image(adv_img_to_save, save_path, mean_norms, std_norms)

    # Collect examples for visualization (original correct but adversarial wrong)
    if original_correct and not adversarial_correct_top1 and len(examples_to_visualize) < 5:
        examples_to_visualize.append({
            'original': image.detach(),
            'adversarial': perturbed_image.detach(),
            'original_label': original_pred.item(),
            'adversarial_label': adv_pred.item(),
            'true_label': label.item(),
            'index': len(examples_to_visualize) + 1
        })

    # Update counter
    total_images += 1

    # Print progress
    if (i + 1) % 50 == 0:
        print(f"Processed {i + 1}/{len(attack_loader)} images")

Starting PGD attacks...
Processed 50/500 images
Processed 100/500 images
Processed 150/500 images
Processed 200/500 images
Processed 250/500 images
Processed 300/500 images
Processed 350/500 images
Processed 400/500 images
Processed 450/500 images
Processed 500/500 images
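
The loop above checks top-5 membership with a Python list lookup, which works because the batch size is 1. If larger batches were used, the same check could be vectorized. The sketch below assumes logits of shape [N, num_classes] and integer labels of shape [N]; topk_correct is a hypothetical helper name.

```python
import torch

def topk_correct(logits: torch.Tensor, labels: torch.Tensor, k: int = 5) -> torch.Tensor:
    """Return a boolean tensor [N]: True where the label is among the k largest logits."""
    topk_indices = logits.topk(k, dim=1).indices          # shape [N, k]
    return (topk_indices == labels.unsqueeze(1)).any(dim=1)

# Toy batch: two samples, four classes, checked with k=2
logits = torch.tensor([[0.10, 0.90, 0.30, 0.20],
                       [0.80, 0.05, 0.10, 0.05]])
labels = torch.tensor([2, 3])

hits = topk_correct(logits, labels, k=2)
print(hits.tolist())
```

Summing `hits` over all batches and dividing by the number of images would give the top-k accuracy.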


Step 13: Compute PGD Evaluation Statistics

After looping through all images, we compute final statistics like distortion, accuracy, and relative accuracy drop.

In [54]:
# Calculate final statistics
avg_l_inf = avg_l_inf / total_images
attack_success_rate = 100 * successful_attacks / total_images
original_acc_top1 = 100 * correct_original / total_images
adversarial_acc_top1 = 100 * correct_adversarial_top1 / total_images
adversarial_acc_top5 = 100 * correct_adversarial_top5 / total_images
relative_drop = 100 * (original_acc_top1 - adversarial_acc_top1) / original_acc_top1


Step 14: Print Final Results for PGD

Print the attack's results: distortion, success rate, and accuracy degradation.

In [58]:
# Print results
print(f"\nPGD Attack Results (ε = {epsilon}, α = {alpha}, steps = {num_iter}):")
print(f"Maximum L∞ distance: {max_l_inf:.6f}")
print(f"Average L∞ distance: {avg_l_inf:.6f}")
print(f"Attack success rate: {attack_success_rate:.2f}%")
print("\nModel Accuracy:")
print(f"Original Top-1: {original_acc_top1:.2f}%")
print(f"Adversarial Top-1: {adversarial_acc_top1:.2f}%")
print(f"Adversarial Top-5: {adversarial_acc_top5:.2f}%")
print(f"Relative accuracy drop: {relative_drop:.2f}%")


PGD Attack Results (ε = 0.02, α = 0.002, steps = 20):
Maximum L∞ distance: 0.020000
Average L∞ distance: 0.020000
Attack success rate: 82.00%

Model Accuracy:
Original Top-1: 70.60%
Adversarial Top-1: 0.40%
Adversarial Top-5: 12.00%
Relative accuracy drop: 99.43%
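
The relative drop can be reproduced from the two Top-1 figures reported above. The helper below is a hypothetical variant of the notebook's formula; the guard against a zero baseline is an addition, since the notebook's expression would raise ZeroDivisionError if the clean accuracy were 0.

```python
def relative_accuracy_drop(original_acc: float, adversarial_acc: float) -> float:
    """Percentage of the original accuracy that the attack removed."""
    if original_acc == 0:
        return 0.0  # nothing to lose; avoids division by zero
    return 100 * (original_acc - adversarial_acc) / original_acc

# Top-1 accuracies reported in the output above
drop = relative_accuracy_drop(70.60, 0.40)
print(f"Relative accuracy drop: {drop:.2f}%")
```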


In [59]:
if relative_drop >= 70:
    print("\nSuccess! Achieved more than 70% accuracy drop!")
else:
    print(f"\nTarget not met. Need 70% drop, got {relative_drop:.2f}% drop.")


Success! Achieved more than 70% accuracy drop!


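Step 15: Visualize PGD Attack Examples Individually

This block loops through the adversarial examples stored during the attack loop (the first five images that were classified correctly before the attack and misclassified after it) and uses visualize_attack() to save an individual comparison image for each.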

In [60]:
# Visualize examples
print("\nVisualizing attack examples...")
for example in examples_to_visualize:
    original_class = get_class_name(example['true_label'])
    adversarial_class = get_class_name(example['adversarial_label'])

    visualize_attack(
        example['original'],
        example['adversarial'],
        example['original_label'],
        example['adversarial_label'],
        original_class,
        adversarial_class,
        example['index']
    )

print(f"Saved {len(examples_to_visualize)} visualization examples")



Visualizing attack examples...
Saved 5 visualization examples


Step 16: Create Combined Visualization Grid

This cell calls create_combined_visualization() to build a single grid image of all stored examples, which is convenient for a report or presentation. It also writes the class mapping to the adversarial dataset folder.

In [61]:
# Create a combined visualization
if examples_to_visualize:
    create_combined_visualization(examples_to_visualize)

# Save the mapping and labels to the adversarial dataset folder
with open(os.path.join(SAVE_PATH, 'imagenet_mapping2.json'), 'w') as f:
    json.dump(mapping_dict, f, indent=4)

Combined visualization saved to 'pgd_attack_examples.png'


In [100]:
import os
import json
import torch
import torchvision
import torchvision.transforms as transforms
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image
from torch.utils.data import Dataset, DataLoader
import torch.nn.functional as F
import torchvision.utils as vutils
import random

Task 4: Localized Patch Attacks (Targeted and Optimized)

Task Description

In this task, we design localized adversarial attacks that modify only a small patch of the image. Unlike global attacks such as FGSM and PGD, which may alter every pixel, a patch attack confines the perturbation to one square region and leaves the rest of the image unchanged. The goal is to:

Apply perturbation only in a localized square area (e.g., 32×32 pixels),

Choose the most sensitive area of the image (highest gradient),

Perform targeted misclassification, pushing predictions toward specific classes.

We aim to generate:

Successful adversarial samples within patch bounds,

Visualizations of perturbed patches,

A new dataset AdversarialTestSet3

In [62]:
# Create a directory to save the adversarial images
SAVE_PATH = "./AdversarialTestSet3"
os.makedirs(SAVE_PATH, exist_ok=True)

Step 1: Replicate Folder Structure for Adversarial Output

We mirror the original folder structure (one per class) to save patch-attacked images in an organized way.




In [63]:
# Copy folder structure from original dataset
for folder in folders:
    os.makedirs(os.path.join(SAVE_PATH, folder), exist_ok=True)

In [64]:
# Helper function to denormalize and save images
def save_normalized_image(tensor, path, mean, std):
    # Create a copy of the tensor
    img = tensor.clone()

    # Unnormalize the image
    for t, m, s in zip(img, mean, std):
        t.mul_(s).add_(m)

    # Ensure the pixel values are in [0, 1]
    img = torch.clamp(img, 0, 1)

    # Save the image
    vutils.save_image(img, path)

In [65]:
# Function to compute L∞ distance (only in the patch region)
def patch_l_inf_distance(original, perturbed, patch_mask):
    # Calculate the maximum absolute difference only in the patch region
    diff = (original - perturbed).abs() * patch_mask
    return diff.max().item()
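
To see what this metric measures, the sketch below builds a 32×32 mask on a blank 224×224 image and perturbs pixels both inside and outside the patch. The patch position and perturbation values are arbitrary illustrations. The function is repeated so the example runs on its own.

```python
import torch

def patch_l_inf_distance(original, perturbed, patch_mask):
    # Maximum absolute difference, counted only where the mask is 1
    diff = (original - perturbed).abs() * patch_mask
    return diff.max().item()

image = torch.zeros(1, 3, 224, 224)

# 32x32 patch at an arbitrary location (rows 50-81, columns 100-131)
patch_mask = torch.zeros_like(image)
patch_mask[:, :, 50:82, 100:132] = 1.0

perturbed = image.clone()
perturbed[:, :, 50:82, 100:132] += 0.5  # change inside the patch
perturbed[:, :, 0, 0] += 0.9            # stray change outside the patch

inside = patch_l_inf_distance(image, perturbed, patch_mask)
overall = (image - perturbed).abs().max().item()
coverage = patch_mask[0, 0].mean().item()

print(f"L-inf inside patch: {inside:.2f}")
print(f"L-inf over whole image: {overall:.2f}")
print(f"patch covers {100 * coverage:.2f}% of the pixels")
```

Because the mask zeroes out everything else, this metric cannot detect changes outside the patch; a separate check on the unmasked difference is needed to confirm the rest of the image is untouched.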

Step 2: Identify High-Gradient Patch Location

This method finds the most sensitive 32×32 region in the image by computing the gradient of the loss w.r.t. the input pixels. The region with the highest accumulated gradient magnitude is selected as the target patch location.

In [66]:
# Find the most important region to attack based on gradient magnitude
def find_important_region(model, image, label, patch_size=32):
    """Locate the region with highest gradient energy"""
    image.requires_grad = True
    output = model(image)
    loss = F.cross_entropy(output, label)
    model.zero_grad()
    loss.backward()

    # Get gradient magnitude
    grad_magnitude = image.grad.abs().sum(1).squeeze(0)  # Sum across channels

    # Use a sliding window to find region with highest gradient energy
    max_energy = -float('inf')
    max_x, max_y = 0, 0

    height, width = grad_magnitude.shape

    for y in range(height - patch_size + 1):
        for x in range(width - patch_size + 1):
            window = grad_magnitude[y:y+patch_size, x:x+patch_size]
            energy = window.sum().item()
            if energy > max_energy:
                max_energy = energy
                max_x, max_y = x, y

    return max_x, max_y
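
The nested loop above evaluates every window position in Python, which is slow for a 224×224 gradient map. A possible vectorized alternative, sketched here under the assumption that grad_magnitude is a 2-D tensor, uses average pooling with stride 1: the window with the largest mean is also the window with the largest sum. The function name find_max_energy_window is hypothetical, and the result may differ from the loop on exact ties or near-ties.

```python
import torch
import torch.nn.functional as F

def find_max_energy_window(grad_magnitude: torch.Tensor, patch_size: int = 32):
    """Return (x, y) of the top-left corner of the window with the largest sum."""
    # avg_pool2d expects [N, C, H, W]; stride 1 yields one value per window position
    window_means = F.avg_pool2d(grad_magnitude[None, None], kernel_size=patch_size, stride=1)[0, 0]
    flat_index = torch.argmax(window_means).item()
    y, x = divmod(flat_index, window_means.shape[1])
    return x, y

# Toy gradient map: weak noise plus one strong 8x8 block at rows 20-27, columns 30-37
torch.manual_seed(0)
grad_magnitude = torch.rand(64, 64) * 0.01
grad_magnitude[20:28, 30:38] += 1.0

x, y = find_max_energy_window(grad_magnitude, patch_size=8)
print(f"best window starts at x={x}, y={y}")
```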

Step 3: Optimized Patch Attack – Targeted & Adaptive

This function:

Finds the most sensitive region in the image to perturb,

Selects a list of likely or effective target classes,

Iteratively applies gradient updates only inside the patch to fool the model into predicting a specific target class,

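Falls back, if no target succeeds, to the lowest-loss attempt (when it at least changes the prediction), then to a high-contrast checkerboard pattern inside the patch, and finally to the best attempt regardless.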


In [67]:
# Enhanced single patch attack with optimized targeting and placement
def optimized_patch_attack(model, image, label, patch_size=32, epsilon=1.0, alpha=0.1, num_iter=100):
    """
    Enhanced single-patch attack that finds the most vulnerable location
    and uses a highly-optimized targeted approach
    """
    batch_size, channels, height, width = image.shape

    # First, find the most important region to attack
    with torch.enable_grad():
        patch_x, patch_y = find_important_region(model, image.clone(), label, patch_size)

    # Create a mask for the patch location
    patch_mask = torch.zeros_like(image)
    patch_mask[:, :, patch_y:patch_y+patch_size, patch_x:patch_x+patch_size] = 1.0

    # Try multiple target classes
    successful_perturbed = None
    successful_target = None
    best_loss = float('inf')

    # Get predictions for the original image
    with torch.no_grad():
        original_output = model(image)
        _, original_pred = torch.max(original_output, 1)

        # Find the most confusing classes (second highest prediction and others)
        # These are good targets for the attack
        probs = F.softmax(original_output, dim=1)
        sorted_probs, sorted_indices = torch.sort(probs, dim=1, descending=True)

        # Create a prioritized list of targets:
        # 1. Classes the model already finds plausible (ranks 2-5 of its prediction)
        # 2. A fixed list of fallback target classes
        target_candidates = []

        # Add the classes ranked 2-5 (the top prediction at index 0 is skipped)
        for i in range(1, 5):
            if i < sorted_indices.shape[1]:
                target_candidates.append(sorted_indices[0, i].item())

        # Add some known effective targets
        fixed_targets = [0, 150, 281, 389, 417, 504, 970]
        target_candidates.extend(fixed_targets)

        # Remove duplicates while preserving order
        target_candidates = list(dict.fromkeys(target_candidates))

        # Make sure the true label isn't a target
        if label.item() in target_candidates:
            target_candidates.remove(label.item())

    # For each target class, try to generate an adversarial example
    for target_idx in target_candidates[:10]:  # Try up to 10 targets
        target_class = torch.tensor([target_idx], device=image.device)

        # Initialize with some noise in the patch area
        perturbed_image = image.clone().detach()
        noise = torch.randn_like(image) * 0.1
        perturbed_image = perturbed_image + noise * patch_mask

        # Try a more direct adversarial pattern for this target
        for i in range(num_iter):
            perturbed_image.requires_grad = True
            outputs = model(perturbed_image)

            # Targeted attack: maximize target class, minimize true class
            model.zero_grad()

            # Get logits for target and true class
            target_logits = outputs[:, target_idx]
            true_logits = outputs.gather(1, label.unsqueeze(1)).squeeze()

            # Use a margin loss
            loss = true_logits - target_logits + 10.0

            # Check if we're already succeeding
            with torch.no_grad():
                _, pred = torch.max(outputs, 1)
                if pred.item() == target_idx:
                    successful_perturbed = perturbed_image.detach()
                    successful_target = target_class
                    break

            # Backward pass
            loss.backward()

            # Update perturbed image (only in patch area)
            with torch.no_grad():
                # Only update the patch area
                grad = perturbed_image.grad.data
                update = alpha * torch.sign(grad) * patch_mask
                perturbed_image = perturbed_image - update  # Minimize loss

                # Project perturbation back to epsilon ball (only in patch)
                delta = (perturbed_image - image) * patch_mask
                delta = torch.clamp(delta, -epsilon, epsilon)
                perturbed_image = image + delta

                # If this is our best attempt so far, save it
                if loss.item() < best_loss:
                    best_loss = loss.item()
                    best_perturbed = perturbed_image.detach()

        # If we found a successful attack with this target, return it
        if successful_perturbed is not None:
            return successful_perturbed, patch_mask

    # If no target was completely successful, use a stronger approach
    # with the most promising target (the one with lowest loss)

    # If we have a "best" perturbed image, check if it actually works
    if 'best_perturbed' in locals():
        with torch.no_grad():
            outputs = model(best_perturbed)
            _, pred = torch.max(outputs, 1)
            if pred.item() != label.item():
                return best_perturbed, patch_mask

    # If everything else failed, try a universal adversarial pattern
    # Use a high-contrast pattern that often works well
    universal_perturbed = image.clone()

    # Create a checkerboard pattern
    for c in range(channels):
        for y in range(patch_size):
            for x in range(patch_size):
                if (x + y) % 2 == 0:
                    universal_perturbed[:, c, patch_y+y, patch_x+x] = image[:, c, patch_y+y, patch_x+x] + epsilon
                else:
                    universal_perturbed[:, c, patch_y+y, patch_x+x] = image[:, c, patch_y+y, patch_x+x] - epsilon

    # Ensure we stay within bounds in the patch area
    delta = (universal_perturbed - image) * patch_mask
    delta = torch.clamp(delta, -epsilon, epsilon)
    universal_perturbed = image + delta

    # Check if this works
    with torch.no_grad():
        outputs = model(universal_perturbed)
        _, pred = torch.max(outputs, 1)
        if pred.item() != label.item():
            return universal_perturbed, patch_mask

    # If all else fails, return our best attempt
    if 'best_perturbed' in locals():
        return best_perturbed, patch_mask
    else:
        return universal_perturbed, patch_mask
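
The targeted objective inside the loop is the true-class logit minus the target-class logit, plus a constant. The sketch below isolates it on toy logits (illustrative values, not model outputs) to show the gradient it produces; targeted_margin_loss is a hypothetical helper name. Since the constant 10.0 is added without any clamping, it shifts the loss value but does not change the gradient.

```python
import torch

def targeted_margin_loss(logits, true_label, target_idx, margin=10.0):
    """True-class logit minus target-class logit, plus a constant offset."""
    true_logit = logits.gather(1, true_label.unsqueeze(1)).squeeze(1)
    target_logit = logits[:, target_idx]
    return (true_logit - target_logit + margin).sum()

# Toy logits for one sample and three classes
logits = torch.tensor([[2.0, 0.5, -1.0]], requires_grad=True)
true_label = torch.tensor([0])
target_idx = 1

loss = targeted_margin_loss(logits, true_label, target_idx)
loss.backward()

# One signed descent step, mirroring the patch update in the function above
with torch.no_grad():
    stepped = logits - 0.1 * logits.grad.sign()
new_loss = targeted_margin_loss(stepped, true_label, target_idx)

print("loss before:", loss.item())
print("gradient:", logits.grad.tolist())
print("loss after one step:", new_loss.item())
```

The gradient is +1 on the true class and -1 on the target class, so descending it lowers the true-class logit and raises the target-class logit.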


Step 4: Visualize Individual Patch Attack Examples
This function saves a four-panel figure for each example:

Original image

Adversarial image

Patch mask

Difference image (amplified)

In [68]:
# Function to visualize attack examples
def visualize_patch_attack(original, adversarial, patch_mask, original_label, adversarial_label,
                     original_class, adversarial_class, index):
    """
    Visualize the original, adversarial, and patch mask
    """
    plt.figure(figsize=(15, 5))

    # Display original image
    plt.subplot(1, 4, 1)
    img = original.cpu().squeeze()

    # Unnormalize
    img_display = img.clone()
    for t, m, s in zip(img_display, mean_norms, std_norms):
        t.mul_(s).add_(m)

    # Properly permute for display
    img_display = img_display.permute(1, 2, 0).clamp(0, 1).numpy()
    plt.imshow(img_display)
    plt.title(f"Original\nLabel: {original_label}\nClass: {original_class}")
    plt.axis('off')

    # Display adversarial image
    plt.subplot(1, 4, 2)
    adv_img = adversarial.cpu().squeeze()

    # Unnormalize
    adv_img_display = adv_img.clone()
    for t, m, s in zip(adv_img_display, mean_norms, std_norms):
        t.mul_(s).add_(m)

    # Properly permute for display
    adv_img_display = adv_img_display.permute(1, 2, 0).clamp(0, 1).numpy()
    plt.imshow(adv_img_display)
    plt.title(f"Adversarial\nLabel: {adversarial_label}\nClass: {adversarial_class}")
    plt.axis('off')

    # Display patch mask
    plt.subplot(1, 4, 3)
    mask = patch_mask.cpu().squeeze()[0]  # Take first channel of mask
    plt.imshow(mask.numpy(), cmap='gray')
    plt.title("Patch Location")
    plt.axis('off')

    # Display the difference (magnified for visibility)
    plt.subplot(1, 4, 4)
    diff = np.abs(adv_img_display - img_display) * 3  # Magnify by 3x
    plt.imshow(diff)
    plt.title("Difference (x3)")
    plt.axis('off')

    plt.tight_layout()
    plt.savefig(f'patch_attack_example_{index}.png')
    plt.close()

In [69]:
# Function to create a combined visualization
def create_combined_patch_visualization(examples_to_visualize):
    if not examples_to_visualize:
        print("No examples to visualize")
        return

    # Create a figure with 4 columns and one row per example
    plt.figure(figsize=(20, 5 * len(examples_to_visualize)))

    for i, example in enumerate(examples_to_visualize):
        original_class = get_class_name(example['true_label'])
        adversarial_class = get_class_name(example['adversarial_label'])

        # Original image
        plt.subplot(len(examples_to_visualize), 4, i*4 + 1)
        img = example['original'].cpu().squeeze()

        # Unnormalize
        img_display = img.clone()
        for t, m, s in zip(img_display, mean_norms, std_norms):
            t.mul_(s).add_(m)

        # Permute correctly
        img_display = img_display.permute(1, 2, 0).clamp(0, 1).numpy()
        plt.imshow(img_display)
        plt.title(f"Original: {original_class}")
        plt.axis('off')

        # Adversarial image
        plt.subplot(len(examples_to_visualize), 4, i*4 + 2)
        adv_img = example['adversarial'].cpu().squeeze()

        # Unnormalize
        adv_img_display = adv_img.clone()
        for t, m, s in zip(adv_img_display, mean_norms, std_norms):
            t.mul_(s).add_(m)

        # Permute correctly
        adv_img_display = adv_img_display.permute(1, 2, 0).clamp(0, 1).numpy()
        plt.imshow(adv_img_display)
        plt.title(f"Adversarial: {adversarial_class}")
        plt.axis('off')

        # Patch mask
        plt.subplot(len(examples_to_visualize), 4, i*4 + 3)
        mask = example['patch_mask'].cpu().squeeze()[0]
        plt.imshow(mask.numpy(), cmap='gray')
        plt.title("Patch Location")
        plt.axis('off')

        # Difference (magnified)
        plt.subplot(len(examples_to_visualize), 4, i*4 + 4)
        diff = np.abs(adv_img_display - img_display) * 3  # Magnify by 3x
        plt.imshow(diff)
        plt.title("Difference (x3)")
        plt.axis('off')

    plt.tight_layout()
    plt.savefig('patch_attack_examples.png', dpi=300)
    plt.close()
    print("Combined visualization saved to 'patch_attack_examples.png'")

In [70]:
# Create a dataset for the attack
attack_dataset = FGSMDataset(
    root_dir=dataset_path,
    folder_to_label=folder_to_label,
    transform=plain_transforms
)


Loaded 500 images from 100 class folders


In [71]:
# Use batch size of 1 for attack implementation
attack_loader = DataLoader(attack_dataset, batch_size=1, shuffle=False)

In [72]:
# Set attack parameters
patch_size = 32    # Size of the square patch
epsilon = 0.5      # Maximum perturbation (increased significantly)
alpha = 0.1        # Step size
num_iter = 100     # More iterations for better convergence
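
Because the attack operates on normalized tensors, epsilon is measured in units of the per-channel standard deviation rather than raw pixel intensity. The sketch below converts the bound to 0-255 intensity steps, assuming the tensors were normalized with the standard ImageNet statistics; std_norms is written out here as an assumption, and epsilon_in_pixel_units is a hypothetical helper.

```python
# Assumed: the standard ImageNet per-channel standard deviations
std_norms = [0.229, 0.224, 0.225]

def epsilon_in_pixel_units(epsilon: float, std) -> list:
    """Convert an L-inf bound on normalized tensors to 0-255 intensity steps per channel."""
    return [epsilon * s * 255 for s in std]

patch_eps = epsilon_in_pixel_units(0.5, std_norms)   # patch attack setting above
pgd_eps = epsilon_in_pixel_units(0.02, std_norms)    # PGD setting from Task 3

print("patch attack:", [round(v, 1) for v in patch_eps])
print("PGD attack:  ", [round(v, 2) for v in pgd_eps])
```

Under these assumptions the patch bound corresponds to roughly 29 intensity levels per channel, compared with a little over 1 level for the PGD setting, which is why the patch is clearly visible while the PGD perturbation is not.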


In [73]:
# Initialize counters for statistics
total_images = 0
successful_attacks = 0
correct_original = 0
correct_adversarial_top1 = 0
correct_adversarial_top5 = 0
max_patch_l_inf = 0
avg_patch_l_inf = 0

In [74]:
# List to store visualization examples
examples_to_visualize = []

Step 5: Run Patch Attack and Save Results

This loop runs optimized_patch_attack(...) for each image and saves the results and metrics.

In [75]:
# Perform optimized patch attack on each image
print("Starting optimized patch attacks...")
for i, (image, label, folder, img_name) in enumerate(attack_loader):
    # Send data to device
    image, label = image.to(device), label.to(device)

    # Get original prediction
    with torch.no_grad():
        output = model(image)
        _, original_pred = torch.max(output, 1)

    original_correct = (original_pred == label).item()
    if original_correct:
        correct_original += 1

    # Perform optimized patch attack
    perturbed_image, patch_mask = optimized_patch_attack(
        model=model,
        image=image,
        label=label,
        patch_size=patch_size,
        epsilon=epsilon,
        alpha=alpha,
        num_iter=num_iter
    )

    # Calculate L∞ distance in the patch region
    l_inf = patch_l_inf_distance(image, perturbed_image, patch_mask)
    max_patch_l_inf = max(max_patch_l_inf, l_inf)
    avg_patch_l_inf += l_inf

    # Forward pass with perturbed image
    with torch.no_grad():
        output_adv = model(perturbed_image)

    # Get adversarial prediction
    _, adv_pred = torch.max(output_adv, 1)

    # Check if attack was successful in changing prediction
    attack_success = (original_pred != adv_pred).item()
    if attack_success:
        successful_attacks += 1

    # Check if adversarial example is still classified correctly (top-1)
    adversarial_correct_top1 = (adv_pred == label).item()
    if adversarial_correct_top1:
        correct_adversarial_top1 += 1

    # Check top-5 accuracy
    _, top5_indices = output_adv.topk(5, dim=1)
    adversarial_correct_top5 = label.item() in top5_indices.squeeze().tolist()
    if adversarial_correct_top5:
        correct_adversarial_top5 += 1

    # Save the adversarial image
    save_path = os.path.join(SAVE_PATH, folder[0], img_name[0])
    adv_img_to_save = perturbed_image.detach().cpu().squeeze(0)
    save_normalized_image(adv_img_to_save, save_path, mean_norms, std_norms)

    # Collect examples for visualization (original correct but adversarial wrong)
    if original_correct and attack_success and len(examples_to_visualize) < 5:
        examples_to_visualize.append({
            'original': image.detach(),
            'adversarial': perturbed_image.detach(),
            'patch_mask': patch_mask.detach(),
            'original_label': original_pred.item(),
            'adversarial_label': adv_pred.item(),
            'true_label': label.item(),
            'index': len(examples_to_visualize) + 1
        })

    # Update counter
    total_images += 1

    # Print progress
    if (i + 1) % 50 == 0:
        print(f"Processed {i + 1}/{len(attack_loader)} images")
        # Print intermediate results
        if total_images > 0:
            interim_attack_rate = 100 * successful_attacks / total_images
            print(f"Current attack success rate: {interim_attack_rate:.2f}%")

Starting optimized patch attacks...
Processed 50/500 images
Current attack success rate: 90.00%
Processed 100/500 images
Current attack success rate: 92.00%
Processed 150/500 images
Current attack success rate: 94.00%
Processed 200/500 images
Current attack success rate: 93.50%
Processed 250/500 images
Current attack success rate: 93.60%
Processed 300/500 images
Current attack success rate: 93.67%
Processed 350/500 images
Current attack success rate: 93.71%
Processed 400/500 images
Current attack success rate: 93.75%
Processed 450/500 images
Current attack success rate: 93.56%
Processed 500/500 images
Current attack success rate: 93.60%
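
The loop above reports an L∞ distance restricted to the patch region. As a minimal sketch of what such a measurement could look like, assuming the mask is 1 inside the patch and 0 elsewhere, the function below takes the largest absolute change under the mask. `masked_l_inf` is a hypothetical stand-in written with NumPy arrays for illustration; it is not the notebook's `patch_l_inf_distance`.

```python
import numpy as np

def masked_l_inf(original, perturbed, mask):
    """Largest absolute per-element change inside the masked region."""
    # Multiplying by the 0/1 mask zeroes out every difference outside the patch
    diff = np.abs(perturbed - original) * mask
    return float(diff.max())

# Toy 3x8x8 "image" with a 4x4 patch (shapes chosen only for illustration)
original = np.zeros((3, 8, 8), dtype=np.float32)
mask = np.zeros_like(original)
mask[:, 2:6, 2:6] = 1.0

perturbed = original.copy()
perturbed[:, 2:6, 2:6] += 0.5   # change inside the patch
perturbed[0, 0, 0] += 0.9       # change outside the patch, ignored by the mask

print(masked_l_inf(original, perturbed, mask))
```

Under this definition a value equal to ε means at least one pixel in the patch moved by the full budget.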


Step 14: Compute Patch Attack Metrics

Final statistics include the attack success rate, the clean and adversarial accuracy, the relative accuracy drop, and the maximum and average L∞ distortion inside the patch.

In [76]:
# Calculate final statistics
avg_patch_l_inf = avg_patch_l_inf / total_images
attack_success_rate = 100 * successful_attacks / total_images
original_acc_top1 = 100 * correct_original / total_images
adversarial_acc_top1 = 100 * correct_adversarial_top1 / total_images
adversarial_acc_top5 = 100 * correct_adversarial_top5 / total_images
relative_drop = 100 * (original_acc_top1 - adversarial_acc_top1) / original_acc_top1

# Print results
print(f"\nOptimized Patch Attack Results (patch size = {patch_size}x{patch_size}, ε = {epsilon}):")
print(f"Maximum L∞ distance in patch: {max_patch_l_inf:.6f}")
print(f"Average L∞ distance in patch: {avg_patch_l_inf:.6f}")
print(f"Attack success rate: {attack_success_rate:.2f}%")
print("\nModel Accuracy:")
print(f"Original Top-1: {original_acc_top1:.2f}%")
print(f"Adversarial Top-1: {adversarial_acc_top1:.2f}%")
print(f"Adversarial Top-5: {adversarial_acc_top5:.2f}%")
print(f"Relative accuracy drop: {relative_drop:.2f}%")


Optimized Patch Attack Results (patch size = 32x32, ε = 0.5):
Maximum L∞ distance in patch: 0.500000
Average L∞ distance in patch: 0.491455
Attack success rate: 93.60%

Model Accuracy:
Original Top-1: 70.60%
Adversarial Top-1: 5.60%
Adversarial Top-5: 81.60%
Relative accuracy drop: 92.07%
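
The relative drop reported above can be checked by hand from the two Top-1 figures. The sketch below repeats the same formula; `relative_accuracy_drop` is a hypothetical helper name, and the zero guard is an added assumption for the case where the clean accuracy is 0.

```python
def relative_accuracy_drop(clean_acc, adv_acc):
    """Relative Top-1 drop in percent: (clean - adversarial) / clean * 100."""
    if clean_acc == 0:
        # Assumption: report no drop rather than dividing by zero
        return 0.0
    return 100.0 * (clean_acc - adv_acc) / clean_acc

# Values taken from the printed results above
drop = relative_accuracy_drop(70.60, 5.60)
print(f"Absolute drop: {70.60 - 5.60:.2f} points, relative drop: {drop:.2f}%")
```

Note that the attack success rate counts every changed prediction, including images the model already misclassified, so it need not match the relative accuracy drop.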


Step 15: Visualize and Save Examples

Display side-by-side examples to visually confirm localized perturbation impact.

In [77]:
# Visualize examples
print("\nVisualizing attack examples...")
for example in examples_to_visualize:
    original_class = get_class_name(example['true_label'])
    adversarial_class = get_class_name(example['adversarial_label'])

    visualize_patch_attack(
        example['original'],
        example['adversarial'],
        example['patch_mask'],
        example['original_label'],
        example['adversarial_label'],
        original_class,
        adversarial_class,
        example['index']
    )

print(f"Saved {len(examples_to_visualize)} visualization examples")



Visualizing attack examples...
Saved 5 visualization examples


In [78]:
# Create a combined visualization
if examples_to_visualize:
    create_combined_patch_visualization(examples_to_visualize)


Combined visualization saved to 'patch_attack_examples.png'


In [79]:
# Save the mapping and labels to the adversarial dataset folder
with open(os.path.join(SAVE_PATH, 'imagenet_mapping.json'), 'w') as f:
    json.dump(mapping_dict, f, indent=4)

In [80]:
# Save the patch attack results
results = {
    'patch_size': patch_size,
    'epsilon': epsilon,
    'alpha': alpha,
    'num_iter': num_iter,
    'max_patch_l_inf': float(max_patch_l_inf),
    'avg_patch_l_inf': float(avg_patch_l_inf),
    'attack_success_rate': float(attack_success_rate),
    'original_acc_top1': float(original_acc_top1),
    'adversarial_acc_top1': float(adversarial_acc_top1),
    'adversarial_acc_top5': float(adversarial_acc_top5),
    'relative_drop': float(relative_drop)
}

with open('patch_attack_results.json', 'w') as f:
    json.dump(results, f, indent=4)
print("Results saved to 'patch_attack_results.json'")

Results saved to 'patch_attack_results.json'


Task 5: Transferability Analysis of Adversarial Attacks

Task Objective

In this final task, we test how well the adversarial examples generated for ResNet-34 (FGSM, PGD, Patch) transfer to other model architectures. This helps us understand whether adversarial vulnerabilities are model-specific or shared across architectures.

Step 1: Import Required Libraries

This cell imports essential libraries for data loading, visualization, model evaluation, and analysis.

In [81]:
# Import libraries for data loading, model evaluation, analysis, and plotting
import os
import json
import torch
import torchvision
import torchvision.transforms as transforms
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image
from torch.utils.data import Dataset, DataLoader
import pandas as pd
import seaborn as sns

Step 2: Define Dataset Paths

We define the paths to the original and all three adversarial test sets.

In [82]:
# Paths to all datasets
ORIGINAL_PATH = "./TestDataSet"
FGSM_PATH = "./AdversarialTestSet1"
PGD_PATH = "./AdversarialTestSet2"
PATCH_PATH = "./AdversarialTestSet3"

Step 3: Load Class Mapping

We load the previously saved mapping between folder names and label indices to ensure consistency.




In [83]:
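# Load the folder-to-label mapping saved in the earlier tasks
try:
    with open('imagenet_mapping.json', 'r') as f:
        mapping_dict = json.load(f)
    folder_to_label = mapping_dict['folder_to_label']
except (FileNotFoundError, json.JSONDecodeError, KeyError) as err:
    # Catch only the expected failures instead of using a bare except
    print(f"Warning: Could not load imagenet_mapping.json ({err!r}), results may be incorrect")
    folder_to_label = {}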

Step 4: Define Image Transformations

We reuse the same preprocessing transforms as used in earlier tasks to ensure consistency across evaluations.

In [84]:
# Transform for preprocessing images
mean_norms = np.array([0.485, 0.456, 0.406])
std_norms = np.array([0.229, 0.224, 0.225])

plain_transforms = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=mean_norms, std=std_norms)
])


Step 5: Define Dataset Loader Class

This class loads images from the dataset folders and applies the required transforms.

In [85]:
# Dataset class for loading images
class OrderedImageNetDataset(Dataset):
    def __init__(self, root_dir, folder_to_label, transform=None):
        self.root_dir = root_dir
        self.folder_to_label = folder_to_label
        self.transform = transform

        # Collect all image paths and their labels
        self.samples = []
        for folder, label in folder_to_label.items():
            folder_path = os.path.join(root_dir, folder)
            if os.path.isdir(folder_path):
                # os.listdir returns entries in arbitrary order; sort for a deterministic sample order
                for img_name in sorted(os.listdir(folder_path)):
                    if img_name.lower().endswith(('.png', '.jpg', '.jpeg')):
                        img_path = os.path.join(folder_path, img_name)
                        self.samples.append((img_path, label))

        print(f"Loaded {len(self.samples)} images from {len(folder_to_label)} class folders")

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        img_path, label = self.samples[idx]
        image = Image.open(img_path).convert('RGB')

        if self.transform:
            image = self.transform(image)

        return image, label

Step 6: Define the Evaluation Function

This function computes Top-1 and Top-5 accuracy for a given model and dataloader.

In [86]:
# Evaluation function
def evaluate_model(model, dataloader, device):
    model.eval()
    correct_top1 = 0
    correct_top5 = 0
    total = 0

    with torch.no_grad():
        for images, labels in dataloader:
            images, labels = images.to(device), labels.to(device)
            outputs = model(images)

            # Top-1 accuracy
            _, predicted = outputs.max(1)
            correct_top1 += (predicted == labels).sum().item()

            # Top-5 accuracy
            _, top5_indices = outputs.topk(5, dim=1)
            for i in range(labels.size(0)):
                if labels[i] in top5_indices[i]:
                    correct_top5 += 1

            total += labels.size(0)

    top1_accuracy = 100 * correct_top1 / total
    top5_accuracy = 100 * correct_top5 / total

    return top1_accuracy, top5_accuracy
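
The Top-5 check above loops over the batch one sample at a time. As a minimal sketch of the same membership test done in one broadcasted comparison, the NumPy version below counts rows whose true label appears among the k highest scores. `topk_correct` and the toy scores are illustrative assumptions, not part of the notebook.

```python
import numpy as np

def topk_correct(logits, labels, k=5):
    """Count rows whose true label is among the k highest-scoring classes."""
    # Sort descending by score and keep the first k class indices per row
    topk = np.argsort(-logits, axis=1)[:, :k]
    # Broadcast labels against the k candidates; a row counts if any column matches
    return int((topk == labels[:, None]).any(axis=1).sum())

# Two samples, six classes (toy scores)
logits = np.array([
    [0.1, 0.5, 0.2, 0.9, 0.3, 0.0],
    [0.8, 0.1, 0.7, 0.2, 0.6, 0.05],
])
labels = np.array([1, 5])

print(topk_correct(logits, labels, k=1))  # neither label has the highest score
print(topk_correct(logits, labels, k=5))  # only the first label is in the top five
```

In PyTorch the same idea can be expressed by comparing the `topk` indices against `labels.unsqueeze(1)` and reducing with `any(dim=1)`.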

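Step 7: Load Models for Transferability

We use pretrained models from torchvision: DenseNet-121, VGG-16, MobileNetV2, and EfficientNet-B0. ResNet-34, the source model for the attacks, is not reloaded here.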

In [87]:
# Load different model architectures for testing transferability
def load_models():
    models = {}


    # Different architectures to test transferability
    models['DenseNet-121'] = torchvision.models.densenet121(weights='IMAGENET1K_V1')
    models['VGG-16'] = torchvision.models.vgg16(weights='IMAGENET1K_V1')
    models['MobileNetV2'] = torchvision.models.mobilenet_v2(weights='IMAGENET1K_V1')
    models['EfficientNetB0'] = torchvision.models.efficientnet_b0(weights='IMAGENET1K_V1')

    # Set all models to evaluation mode
    for model_name, model in models.items():
        model.eval()

    return models

Step 10: Create Dataloaders for All Datasets

This function prepares dataloaders for the original and adversarial datasets (FGSM, PGD, Patch) using consistent transforms and mappings.




In [88]:
# Create dataloaders for all datasets
def create_dataloaders():
    dataloaders = {}

    # Original dataset
    if os.path.exists(ORIGINAL_PATH):
        original_dataset = OrderedImageNetDataset(
            root_dir=ORIGINAL_PATH,
            folder_to_label=folder_to_label,
            transform=plain_transforms
        )
        dataloaders['Original'] = DataLoader(original_dataset, batch_size=32, shuffle=False)

    # FGSM adversarial dataset
    if os.path.exists(FGSM_PATH):
        fgsm_dataset = OrderedImageNetDataset(
            root_dir=FGSM_PATH,
            folder_to_label=folder_to_label,
            transform=plain_transforms
        )
        dataloaders['FGSM'] = DataLoader(fgsm_dataset, batch_size=32, shuffle=False)

    # PGD adversarial dataset
    if os.path.exists(PGD_PATH):
        pgd_dataset = OrderedImageNetDataset(
            root_dir=PGD_PATH,
            folder_to_label=folder_to_label,
            transform=plain_transforms
        )
        dataloaders['PGD'] = DataLoader(pgd_dataset, batch_size=32, shuffle=False)

    # Patch adversarial dataset
    if os.path.exists(PATCH_PATH):
        patch_dataset = OrderedImageNetDataset(
            root_dir=PATCH_PATH,
            folder_to_label=folder_to_label,
            transform=plain_transforms
        )
        dataloaders['Patch'] = DataLoader(patch_dataset, batch_size=32, shuffle=False)

    return dataloaders
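
The four blocks in create_dataloaders() differ only in the dataset name and path, so the same logic can be driven by a dictionary. The sketch below shows that pattern under stated assumptions: `build_loaders` and the `make_loader` callable are hypothetical names, and the demo uses a temporary directory and a string in place of a real DataLoader.

```python
import os
import tempfile

def build_loaders(paths, make_loader):
    """Build one loader per dataset whose root directory exists."""
    loaders = {}
    for name, root in paths.items():
        if os.path.exists(root):
            loaders[name] = make_loader(root)
        else:
            # Make the skip visible instead of silently dropping the dataset
            print(f"Warning: {root} not found, skipping {name}")
    return loaders

# Demo with a temporary directory; only one of the two paths exists
with tempfile.TemporaryDirectory() as tmp:
    present = os.path.join(tmp, "TestDataSet")
    os.mkdir(present)
    paths = {
        "Original": present,
        "FGSM": os.path.join(tmp, "AdversarialTestSet1"),
    }
    loaders = build_loaders(
        paths,
        make_loader=lambda root: f"loader({os.path.basename(root)})",
    )

print(loaders)
```

Reporting a skipped dataset matters here because the later analysis looks up FGSM, PGD, and Patch rows with `.values[0]`, which raises an IndexError when a dataset is missing.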


Step 11: Run Evaluation Across All Models and Datasets

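The evaluate_transferability() function loads the models, builds the dataloaders with create_dataloaders(), evaluates every model on every dataset, and returns the results as a DataFrame.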

In [89]:
# Evaluate all models on all datasets
def evaluate_transferability():
    # Load all models
    print("Loading models...")
    models = load_models()

    # Create dataloaders
    print("Loading datasets...")
    dataloaders = create_dataloaders()

    # Set device
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    print(f"Using device: {device}")

    # Move models to device
    for model_name, model in models.items():
        models[model_name] = model.to(device)

    # Results will be stored in this dictionary
    results = {
        'Model': [],
        'Dataset': [],
        'Top-1 Accuracy': [],
        'Top-5 Accuracy': []
    }

    # Evaluate each model on each dataset
    for model_name, model in models.items():
        print(f"\nEvaluating {model_name}...")

        for dataset_name, dataloader in dataloaders.items():
            print(f"  Testing on {dataset_name} dataset...")
            top1_acc, top5_acc = evaluate_model(model, dataloader, device)

            # Store results
            results['Model'].append(model_name)
            results['Dataset'].append(dataset_name)
            results['Top-1 Accuracy'].append(top1_acc)
            results['Top-5 Accuracy'].append(top5_acc)

            print(f"    Top-1 Accuracy: {top1_acc:.2f}%")
            print(f"    Top-5 Accuracy: {top5_acc:.2f}%")

    # Convert results to DataFrame
    results_df = pd.DataFrame(results)

    return results_df

Step 12: Analyze Transferability Results

This function:

Computes relative accuracy drops from clean data to adversarial inputs,

Summarizes attack effectiveness per model and per method,

Displays a formatted table of accuracy drop (%) for each model across FGSM, PGD, and Patch attacks.

In [90]:
def analyze_transferability(results_df):
    print("\n--- Transferability Analysis ---")

    # Exclude ResNet-34 (the source model) if present; load_models() does not add it, so this is only a safeguard
    transferability_df = results_df[results_df['Model'] != 'ResNet-34'].copy()

    # Add a column for accuracy drop
    transferability_df['Accuracy Drop'] = 0.0

    # For each model and attack, calculate the drop from original accuracy
    for model in transferability_df['Model'].unique():
        # Get original accuracy for this model
        original_acc = transferability_df[(transferability_df['Model'] == model) &
                                         (transferability_df['Dataset'] == 'Original')]['Top-1 Accuracy'].values[0]

        # Calculate drop for each attack
        for attack in ['FGSM', 'PGD', 'Patch']:
            attack_acc = transferability_df[(transferability_df['Model'] == model) &
                                           (transferability_df['Dataset'] == attack)]['Top-1 Accuracy'].values[0]

            # Calculate absolute and relative drop
            absolute_drop = original_acc - attack_acc
            relative_drop = (absolute_drop / original_acc) * 100

            # Store in dataframe
            idx = (transferability_df['Model'] == model) & (transferability_df['Dataset'] == attack)
            transferability_df.loc[idx, 'Accuracy Drop'] = relative_drop

    # Display results by attack type
    print("\nAccuracy Drop by Attack Type (higher = more effective transfer):")
    for attack in ['FGSM', 'PGD', 'Patch']:
        print(f"\n{attack} Attack Transferability:")

        attack_data = transferability_df[transferability_df['Dataset'] == attack]

        for model in attack_data['Model'].unique():
            drop = attack_data[attack_data['Model'] == model]['Accuracy Drop'].values[0]
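            # Note: this is the Top-1 error rate on the adversarial set (100 - accuracy);
            # it also counts images the model already misclassified before the attack
            attack_success_rate = 100 - attack_data[attack_data['Model'] == model]['Top-1 Accuracy'].values[0]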

            print(f"  {model}: {drop:.2f}% accuracy drop ({attack_success_rate:.2f}% attack success rate)")

    # Display results by model
    print("\nModel Vulnerability to Transferred Attacks:")
    for model in transferability_df['Model'].unique():
        model_data = transferability_df[(transferability_df['Model'] == model) &
                                       (transferability_df['Dataset'] != 'Original')]

        avg_drop = model_data['Accuracy Drop'].mean()
        print(f"  {model}: {avg_drop:.2f}% average accuracy drop")

    # Calculate average transferability for each attack
    print("\nAverage Transferability Across All Models:")
    for attack in ['FGSM', 'PGD', 'Patch']:
        attack_data = transferability_df[transferability_df['Dataset'] == attack]
        avg_drop = attack_data['Accuracy Drop'].mean()

        print(f"  {attack}: {avg_drop:.2f}% average accuracy drop")

    # Create a summary table
    print("\nSummary Table (Accuracy Drop %):")
    print("-" * 60)
    print(f"{'Model':<15} {'FGSM':<10} {'PGD':<10} {'Patch':<10} {'Average':<10}")
    print("-" * 60)

    for model in sorted(transferability_df['Model'].unique()):
        fgsm_drop = transferability_df[(transferability_df['Model'] == model) &
                                      (transferability_df['Dataset'] == 'FGSM')]['Accuracy Drop'].values[0]
        pgd_drop = transferability_df[(transferability_df['Model'] == model) &
                                     (transferability_df['Dataset'] == 'PGD')]['Accuracy Drop'].values[0]
        patch_drop = transferability_df[(transferability_df['Model'] == model) &
                                       (transferability_df['Dataset'] == 'Patch')]['Accuracy Drop'].values[0]
        avg = (fgsm_drop + pgd_drop + patch_drop) / 3

        print(f"{model:<15} {fgsm_drop:<10.2f} {pgd_drop:<10.2f} {patch_drop:<10.2f} {avg:<10.2f}")

    print("-" * 60)

    # Calculate averages for each attack
    fgsm_avg = transferability_df[transferability_df['Dataset'] == 'FGSM']['Accuracy Drop'].mean()
    pgd_avg = transferability_df[transferability_df['Dataset'] == 'PGD']['Accuracy Drop'].mean()
    patch_avg = transferability_df[transferability_df['Dataset'] == 'Patch']['Accuracy Drop'].mean()
    total_avg = (fgsm_avg + pgd_avg + patch_avg) / 3

    print(f"{'Average':<15} {fgsm_avg:<10.2f} {pgd_avg:<10.2f} {patch_avg:<10.2f} {total_avg:<10.2f}")
    print("-" * 60)

    return transferability_df
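
The relative drop computed row by row above can also be obtained from a pivoted table. The sketch below is one possible compact form, using the DenseNet-121 and VGG-16 Top-1 accuracies reported in this notebook's run output; the variable names `top1` and `relative_drop` are illustrative.

```python
import pandas as pd

# Top-1 accuracies from the notebook's run output (two models shown)
results_df = pd.DataFrame({
    "Model": ["DenseNet-121"] * 4 + ["VGG-16"] * 4,
    "Dataset": ["Original", "FGSM", "PGD", "Patch"] * 2,
    "Top-1 Accuracy": [70.80, 61.60, 63.80, 64.00, 66.80, 55.20, 56.60, 57.40],
})

# One row per model, one column per dataset
top1 = results_df.pivot(index="Model", columns="Dataset", values="Top-1 Accuracy")

# Relative drop in percent: (original - attacked) / original * 100
attacks = ["FGSM", "PGD", "Patch"]
relative_drop = (1 - top1[attacks].div(top1["Original"], axis=0)) * 100

print(relative_drop.round(2))
```

For these two models FGSM produces the largest relative drop of the three transferred attacks.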

Step 13: Visualize Transferability Results

This function generates three clean visualizations:

Heatmaps of Top-1 and Top-5 accuracy

Bar plots comparing accuracy across models and datasets

Line plot of relative accuracy drop (effectiveness of each attack)

In [91]:
def visualize_results(results_df):
    import matplotlib.pyplot as plt
    import seaborn as sns
    import numpy as np
    import pandas as pd

    # 1. Top-1 and Top-5 Accuracy heatmaps
    plt.figure(figsize=(14, 10))

    plt.subplot(2, 1, 1)
    top1_pivot = results_df.pivot_table(
        values='Top-1 Accuracy',
        index='Model',
        columns='Dataset'
    )
    sns.heatmap(top1_pivot, annot=True, cmap="YlGnBu", fmt=".2f", cbar_kws={'label': 'Accuracy (%)'})
    plt.title('Top-1 Accuracy Across Models and Datasets')

    plt.subplot(2, 1, 2)
    top5_pivot = results_df.pivot_table(
        values='Top-5 Accuracy',
        index='Model',
        columns='Dataset'
    )
    sns.heatmap(top5_pivot, annot=True, cmap="YlGnBu", fmt=".2f", cbar_kws={'label': 'Accuracy (%)'})
    plt.title('Top-5 Accuracy Across Models and Datasets')

    plt.tight_layout()
    plt.savefig('transferability_heatmap.png', dpi=300)
    plt.close()

    # 2. Bar plots
    plt.figure(figsize=(14, 8))
    models = results_df['Model'].unique()
    bar_width = 0.2
    index = np.arange(len(models))

    plt.subplot(2, 1, 1)
    for i, dataset in enumerate(['Original', 'FGSM', 'PGD', 'Patch']):
        dataset_results = results_df[results_df['Dataset'] == dataset]
        dataset_results = dataset_results.set_index('Model').reindex(models)
        plt.bar(index + i*bar_width, dataset_results['Top-1 Accuracy'], bar_width, label=dataset)

    plt.xlabel('Model Architecture')
    plt.ylabel('Top-1 Accuracy (%)')
    plt.title('Top-1 Accuracy Comparison Across Models and Attacks')
    plt.xticks(index + bar_width * 1.5, models, rotation=45, ha='right')
    plt.legend()
    plt.grid(axis='y', linestyle='--', alpha=0.7)

    plt.subplot(2, 1, 2)
    for i, dataset in enumerate(['Original', 'FGSM', 'PGD', 'Patch']):
        dataset_results = results_df[results_df['Dataset'] == dataset]
        dataset_results = dataset_results.set_index('Model').reindex(models)
        plt.bar(index + i*bar_width, dataset_results['Top-5 Accuracy'], bar_width, label=dataset)

    plt.xlabel('Model Architecture')
    plt.ylabel('Top-5 Accuracy (%)')
    plt.title('Top-5 Accuracy Comparison Across Models and Attacks')
    plt.xticks(index + bar_width * 1.5, models, rotation=45, ha='right')
    plt.legend()
    plt.grid(axis='y', linestyle='--', alpha=0.7)

    plt.tight_layout()
    plt.savefig('transferability_barplot.png', dpi=300)
    plt.close()

    # 3. Line plot: Attack Effectiveness (Accuracy Drop)
    plt.figure(figsize=(10, 6))
    drops = []
    models_list = []
    attacks_list = []

    for model in results_df['Model'].unique():
        original_acc = results_df[(results_df['Model'] == model) &
                                  (results_df['Dataset'] == 'Original')]['Top-1 Accuracy'].values[0]

        for attack in ['FGSM', 'PGD', 'Patch']:
            attack_acc = results_df[(results_df['Model'] == model) &
                                    (results_df['Dataset'] == attack)]['Top-1 Accuracy'].values[0]
            drop = (original_acc - attack_acc) / original_acc * 100

            drops.append(drop)
            models_list.append(model)
            attacks_list.append(attack)

    drops_df = pd.DataFrame({
        'Model': models_list,
        'Attack': attacks_list,
        'Accuracy Drop (%)': drops
    })

    for attack in ['FGSM', 'PGD', 'Patch']:
        attack_data = drops_df[drops_df['Attack'] == attack]
        plt.plot(attack_data['Model'], attack_data['Accuracy Drop (%)'],
                 marker='o', label=attack)

    plt.grid(linestyle='--', alpha=0.7)
    plt.xlabel('Model Architecture')
    plt.ylabel('Accuracy Drop (%)')
    plt.title('Attack Effectiveness Across Model Architectures')
    plt.legend()
    plt.xticks(rotation=45, ha='right')

    plt.tight_layout()
    plt.savefig('attack_effectiveness.png', dpi=300)
    plt.close()

    print("Visualizations saved to 'transferability_heatmap.png', 'transferability_barplot.png', and 'attack_effectiveness.png'")
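
The bar plots place bar i of each group at `index + i*bar_width` and the tick labels at `index + bar_width * 1.5`. The short sketch below shows where the 1.5 comes from: with four bars per group, the tick sits midway between the first and last bar centers. `grouped_bar_positions` is a hypothetical helper used only to illustrate the arithmetic.

```python
def grouped_bar_positions(n_groups, n_bars, bar_width):
    """Return bar centers per series and the tick position that centers each group."""
    # Series i is shifted right by i bar widths within every group
    bars = [[g + i * bar_width for g in range(n_groups)] for i in range(n_bars)]
    # The group center lies halfway between the first and last bar centers
    ticks = [g + bar_width * (n_bars - 1) / 2 for g in range(n_groups)]
    return bars, ticks

# Four models, four datasets (Original, FGSM, PGD, Patch), width 0.2 as above
bars, ticks = grouped_bar_positions(n_groups=4, n_bars=4, bar_width=0.2)
print([round(t, 2) for t in ticks])
```

With a different number of datasets the multiplier changes to `(n_bars - 1) / 2`, so the hard-coded 1.5 is specific to four bars per group.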


Step 14: Main Function – Run the Entire Transferability Pipeline

This function orchestrates:

Accuracy evaluation across all models & datasets

CSV saving of raw results

Analysis of cross-model attack transferability

Generation of plots and insights for reporting

In [92]:
# Main function to run the entire analysis
def main():
    print("Starting transferability analysis (Task 5)...")

    # Evaluate transferability
    results_df = evaluate_transferability()

    # Save results to CSV
    results_df.to_csv('transferability_results.csv', index=False)
    print("Results saved to 'transferability_results.csv'")

    # Analyze transferability
    analyze_transferability(results_df)

    # Visualize results
    visualize_results(results_df)

    print("\nTransferability analysis complete!")

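    # Print summary and insights
    # Note: the statements below are fixed text, not computed from results_df.
    # Check them against the measured accuracies before reporting; for example,
    # in this run FGSM lowers Top-1 accuracy more than PGD for DenseNet-121 and VGG-16.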
    print("\n--- Summary and Insights ---")
    print("1. Transferability between architectures: Different model architectures exhibit varying levels of robustness to adversarial attacks generated for ResNet-34.")
    print("2. Attack effectiveness comparison: PGD attacks generally transfer better than FGSM attacks, while patch attacks may show unique transferability patterns.")
    print("3. Model robustness: Some architectures appear more naturally robust to transferred attacks than others.")
    print("4. Mitigation strategies: These results suggest potential approaches for improving model robustness, such as ensemble methods and adversarial training.")

    return results_df

In [93]:
# Run the analysis if this script is executed directly
if __name__ == "__main__":
    results_df = main()

Downloading: "https://download.pytorch.org/models/densenet121-a639ec97.pth" to /root/.cache/torch/hub/checkpoints/densenet121-a639ec97.pth


Starting transferability analysis (Task 5)...
Loading models...


100%|██████████| 30.8M/30.8M [00:00<00:00, 204MB/s]
Downloading: "https://download.pytorch.org/models/vgg16-397923af.pth" to /root/.cache/torch/hub/checkpoints/vgg16-397923af.pth
100%|██████████| 528M/528M [00:02<00:00, 214MB/s]
Downloading: "https://download.pytorch.org/models/mobilenet_v2-b0353104.pth" to /root/.cache/torch/hub/checkpoints/mobilenet_v2-b0353104.pth
100%|██████████| 13.6M/13.6M [00:00<00:00, 164MB/s]
Downloading: "https://download.pytorch.org/models/efficientnet_b0_rwightman-7f5810bc.pth" to /root/.cache/torch/hub/checkpoints/efficientnet_b0_rwightman-7f5810bc.pth
100%|██████████| 20.5M/20.5M [00:00<00:00, 192MB/s]


Loading datasets...
Loaded 500 images from 100 class folders
Loaded 500 images from 100 class folders
Loaded 500 images from 100 class folders
Loaded 500 images from 100 class folders
Using device: cuda

Evaluating DenseNet-121...
  Testing on Original dataset...
    Top-1 Accuracy: 70.80%
    Top-5 Accuracy: 91.20%
  Testing on FGSM dataset...
    Top-1 Accuracy: 61.60%
    Top-5 Accuracy: 85.20%
  Testing on PGD dataset...
    Top-1 Accuracy: 63.80%
    Top-5 Accuracy: 86.60%
  Testing on Patch dataset...
    Top-1 Accuracy: 64.00%
    Top-5 Accuracy: 85.60%

Evaluating VGG-16...
  Testing on Original dataset...
    Top-1 Accuracy: 66.80%
    Top-5 Accuracy: 91.20%
  Testing on FGSM dataset...
    Top-1 Accuracy: 55.20%
    Top-5 Accuracy: 81.60%
  Testing on PGD dataset...
    Top-1 Accuracy: 56.60%
    Top-5 Accuracy: 81.40%
  Testing on Patch dataset...
    Top-1 Accuracy: 57.40%
    Top-5 Accuracy: 81.20%

Evaluating MobileNetV2...
  Testing on Original dataset...
    Top-1 Accur