# **Visualizing Adversarial Attacks on Image Classification Models**

This notebook visualizes the adversarial attack you performed for your assignment by comparing original images with their adversarially perturbed versions.  
For each image, we show:
- The **original image** and its correct label.
- The **adversarial image** generated by an attack.
- The **new label predicted by the model after the attack**.

Viewing each pair side by side helps you understand how adversarial attacks can deceive deep learning models and change their predictions.

In [8]:
import torch
import matplotlib.pyplot as plt
import json

# Inverse normalization based on ResNet50_Weights.DEFAULT normalization parameters
def inv_normalize(tensor):
    mean = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1).to(tensor.device)
    std = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1).to(tensor.device)
    inv_tensor = tensor * std + mean
    return torch.clamp(inv_tensor, 0, 1)
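
As a quick sanity check, the sketch below normalizes a random image with the same mean and standard deviation and confirms that the inverse recovers it. The `normalize` helper and the random input are assumptions made for illustration; they are not part of the assignment code.

```python
import torch

MEAN = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
STD = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)

def normalize(tensor):
    # Hypothetical forward transform: (x - mean) / std, applied per channel
    return (tensor - MEAN.to(tensor.device)) / STD.to(tensor.device)

def inv_normalize(tensor):
    # Same inverse as in the cell above, repeated so this sketch is self-contained
    inv_tensor = tensor * STD.to(tensor.device) + MEAN.to(tensor.device)
    return torch.clamp(inv_tensor, 0, 1)

torch.manual_seed(0)
image = torch.rand(3, 8, 8)                 # random CHW image with values in [0, 1)
recovered = inv_normalize(normalize(image))

# The round trip should reproduce the input up to floating-point error
roundtrip_ok = torch.allclose(recovered, image, atol=1e-5)
print(roundtrip_ok)
```

Note that the final `torch.clamp` hides any pixel values an attack pushed outside the valid range, so the displayed adversarial image may differ slightly from the tensor the model actually received.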


In [9]:
def show_comparison(original, adversarial, title_original="Original", title_adv="Adversarial"):
    # Move tensors to CPU, convert to NumPy arrays and transpose to HWC format
    orig_np = original.cpu().detach().numpy().transpose(1, 2, 0)
    adv_np = adversarial.cpu().detach().numpy().transpose(1, 2, 0)

    fig, axs = plt.subplots(1, 2, figsize=(10, 5))
    axs[0].imshow(orig_np)
    axs[0].set_title(title_original)
    axs[0].axis('off')
    axs[1].imshow(adv_np)
    axs[1].set_title(title_adv)
    axs[1].axis('off')
    plt.show()

In [10]:
# Load ImageNet labels from the JSON file
with open("../data/imagenet_labels.json", "r") as f:
    imagenet_classes = json.load(f)

# Function to get class label
def get_label(class_index):
    return imagenet_classes[class_index] if 0 <= class_index < len(imagenet_classes) else f"Class {class_index}"
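
The lookup above assumes the JSON file holds a list ordered by class index. If your label file is instead a dictionary keyed by string indices (an alternative layout assumed here, not something this notebook requires), a more tolerant lookup might look like the sketch below. The name `lookup_label` and the toy label data are hypothetical.

```python
def lookup_label(classes, class_index):
    """Return a readable label for class_index from a list or a dict of labels."""
    class_index = int(class_index)  # also accepts single-element tensors and NumPy integers
    if isinstance(classes, dict):
        # Keys of a JSON object are always strings after json.load
        return classes.get(str(class_index), f"Class {class_index}")
    if 0 <= class_index < len(classes):
        return classes[class_index]
    # Fall back to a generic name for out-of-range indices
    return f"Class {class_index}"

list_labels = ["tench", "goldfish", "great white shark"]  # toy data
dict_labels = {"0": "tench", "1": "goldfish"}             # toy data

print(lookup_label(list_labels, 1))
print(lookup_label(dict_labels, 1))
print(lookup_label(list_labels, 7))
```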


## **Load and Visualize Adversarial Images**
This step loads the saved results and visualizes:
- The original image with its true label.
- The adversarial image with the **incorrect label** predicted by the model after the attack.

### **Expected Outputs**
Each pair of images will have:
1. **Original Image** with its correct label.
2. **Adversarial Image** with the label predicted after the attack.
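
The loading cell below expects a dictionary with four entries. The following is a minimal sketch of how such a file could be written and read back, assuming batches of normalized images in NCHW layout. The random tensors, the batch size, and the file name `example_results.pt` are placeholders, not the assignment's actual output.

```python
import os
import tempfile

import torch

batch_size = 4  # placeholder batch size
results = {
    "clean_images": torch.randn(batch_size, 3, 224, 224),   # normalized originals
    "adv_images": torch.randn(batch_size, 3, 224, 224),     # normalized adversarial images
    "labels": torch.randint(0, 1000, (batch_size,)),        # true class indices
    "adv_labels_predictions": torch.randint(0, 1000, (batch_size,)),  # predictions after the attack
}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "example_results.pt")  # placeholder file name
    torch.save(results, path)
    # map_location="cpu" allows loading on a machine without a GPU
    loaded = torch.load(path, map_location="cpu")

print(sorted(loaded.keys()))
```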


In [11]:
# Load the results from the attack
results_path = "path/to/results"  # replace with the path to your saved results file
num_images = 5
# map_location="cpu" lets the file load on machines without a GPU
results = torch.load(results_path, map_location="cpu")

adv_images = results["adv_images"]
original_images = results["clean_images"]
labels = results["labels"]
adv_labels_predictions = results["adv_labels_predictions"]

num_show = min(num_images, adv_images.shape[0])

# De-normalize only the images that will be displayed.
original_images_to_show = torch.stack([inv_normalize(img) for img in original_images[:num_show]])
adv_images_to_show = torch.stack([inv_normalize(img) for img in adv_images[:num_show]])

# Loop over the pairs and display them side by side.
for i in range(num_show):
    # int() accepts both Python integers and single-element tensors;
    # get_label falls back to "Class <index>" for out-of-range indices.
    original_label = get_label(int(labels[i]))
    adv_label = get_label(int(adv_labels_predictions[i]))

    show_comparison(original_images_to_show[i], adv_images_to_show[i],
                    title_original=f"Original: {original_label}",
                    title_adv=f"Adversarial: {adv_label}")


# **Conclusion** 
By comparing the original and adversarial images, we can see how small perturbations can trick deep learning models into making incorrect predictions.

### **Key Takeaways**
- **Adversarial examples** are visually similar to original images but mislead the model.
- **Even widely used, high-accuracy models such as ResNet50 can be fooled** by adversarial perturbations.
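
One way to make "visually similar" concrete is to measure the size of the perturbation in pixel space. The sketch below computes per-image L∞ and L2 distances between original and adversarial batches. The function name `perturbation_norms`, the budget `epsilon = 8 / 255`, and the synthetic tensors are assumptions for illustration; with real data you would pass the de-normalized tensors from the loading cell instead.

```python
import torch

def perturbation_norms(original, adversarial):
    """Per-image L-infinity and L2 norms of (adversarial - original) for NCHW batches."""
    delta = (adversarial - original).flatten(start_dim=1)
    linf = delta.abs().max(dim=1).values   # largest change to any single pixel value
    l2 = delta.norm(p=2, dim=1)            # Euclidean size of the whole perturbation
    return linf, l2

torch.manual_seed(0)
original = torch.rand(2, 3, 8, 8) * 0.5 + 0.25   # synthetic images in [0.25, 0.75)
epsilon = 8 / 255                                 # hypothetical perturbation budget
# Synthetic perturbation: every pixel value shifted by +epsilon or -epsilon
adversarial = original + epsilon * torch.sign(torch.randn_like(original))

linf, l2 = perturbation_norms(original, adversarial)
print(linf, l2)
```

A small L∞ value means no single pixel value changed by much, which is why the two images can look nearly identical even when the predicted label differs.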

Feel free to experiment with different adversarial attack methods and analyze their effects!
