
This workshop is based on an original notebook here: https://savan77.github.io/blog/imagenet_adv_examples.html

# Step 1: Installation & Setup

All of the code blocks in this step are already correct; you just need to run them to set things up.

This code block imports all the Python libraries we'll need later:

In [None]:
import torch
import torch.nn
import torch.nn.functional as F
import torchvision.models as models
from torchvision.utils import save_image
from PIL import Image
from torchvision import transforms
import numpy as np
import requests, io
import matplotlib.pyplot as plt
from torch.autograd import Variable
import json
%matplotlib inline

This code block downloads the list of ImageNet class labels (into a `data` folder) that we'll need later:

In [None]:
!gdown https://drive.google.com/drive/folders/1GpzV817YM1-1tqglgsUVISbtcgitvzbP?usp=share_link -O ./data --folder

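This code block creates a `visualize` function that shows the original image, the perturbation, and the adversarial image side by side so we can compare them. (It also saves the two images as `original.png` and `adversarial.png`.)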

In [None]:
def visualize(x, x_adv, x_grad, epsilon, clean_pred, adv_pred, clean_prob, adv_prob):
  x = x.squeeze(0)     #remove batch dimension # B X C H X W ==> C X H X W
  x = x.mul(torch.FloatTensor(std).view(3,1,1)).add(torch.FloatTensor(mean).view(3,1,1)) #reverse of normalization op- "unnormalize"
  save_image(x, "original.png")
  x = np.transpose(x.numpy(), (1,2,0))   # C X H X W  ==>   H X W X C
  x = np.clip(x, 0, 1)
  
  x_adv = x_adv.squeeze(0)
  x_adv = x_adv.mul(torch.FloatTensor(std).view(3,1,1)).add(torch.FloatTensor(mean).view(3,1,1)) #reverse of normalization op
  save_image(x_adv, "adversarial.png")
  x_adv = np.transpose( x_adv.numpy() , (1,2,0))   # C X H X W  ==>   H X W X C
  x_adv = np.clip(x_adv, 0, 1)
  
  x_grad = x_grad.squeeze(0).numpy()
  x_grad = np.transpose(x_grad, (1,2,0))
  x_grad = np.clip(x_grad, 0, 1)
  
  figure, ax = plt.subplots(1,3, figsize=(18,8))
  ax[0].imshow(x)
  ax[0].set_title('Clean Example', fontsize=20)
  
  ax[1].imshow(x_grad)
  ax[1].set_title('Perturbation', fontsize=20)
  ax[1].set_yticklabels([])
  ax[1].set_xticklabels([])
  ax[1].set_xticks([])
  ax[1].set_yticks([])

  
  ax[2].imshow(x_adv)
  ax[2].set_title('Adversarial Example', fontsize=20)
  
  ax[0].axis('off')
  ax[2].axis('off')

  ax[0].text(1.1,0.5, "+{}*".format(round(epsilon,3)), size=15, ha="center", 
            transform=ax[0].transAxes)
  
  ax[0].text(0.5,-0.13, "Prediction: {}\n Probability: {}".format(clean_pred, clean_prob), size=15, ha="center", 
        transform=ax[0].transAxes)
  
  ax[1].text(1.1,0.5, " = ", size=15, ha="center", transform=ax[1].transAxes)

  ax[2].text(0.5,-0.13, "Prediction: {}\n Probability: {}".format(adv_pred, adv_prob), size=15, ha="center", 
        transform=ax[2].transAxes)

  plt.show()

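This code block creates a `get_image` function that downloads an image from a URL and returns it in three different formats: the original PIL image, a resized and normalized PyTorch tensor, and a PyTorch `Variable` wrapping that tensor so that gradients can be computed with respect to the image.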

In [None]:
# When normalizing images, use this mean and standard deviation
# (Don't change these.)
mean = [0.485, 0.456, 0.406]
std = [0.229, 0.224, 0.225]

def get_image(url):
  # Get image from URL
  response = requests.get(url)
  img = Image.open(io.BytesIO(response.content)).convert('RGB')

  # Preprocess image into desired tensor format
  preprocess = transforms.Compose([
    transforms.Resize((299,299)),
    transforms.ToTensor(),
    transforms.Normalize(mean, std)
  ])

  img_tensor = preprocess(img)
  img_tensor = img_tensor.unsqueeze(0) # add batch dimension (C*H*W => B*C*H*W)

  img_variable = Variable(img_tensor, requires_grad=True)

  return (img, img_tensor, img_variable)
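
`transforms.Normalize` subtracts the per-channel `mean` and divides by the per-channel `std`; `visualize` later undoes this with `x * std + mean` before displaying the image. The plain-Python sketch below applies both steps to a single made-up pixel to show that they cancel out. The pixel values are illustrative, and `normalize_pixel` / `unnormalize_pixel` are hypothetical helpers, not torchvision functions.

```python
# ImageNet channel statistics, copied from the cell above
mean = [0.485, 0.456, 0.406]
std = [0.229, 0.224, 0.225]

def normalize_pixel(rgb):
    """What transforms.Normalize does to one pixel: (value - mean) / std, per channel."""
    return [(v - m) / s for v, m, s in zip(rgb, mean, std)]

def unnormalize_pixel(rgb):
    """What visualize() does to undo it: value * std + mean, per channel."""
    return [v * s + m for v, m, s in zip(rgb, mean, std)]

pixel = [1.0, 0.5, 0.0]   # a made-up pixel as produced by ToTensor(), values in [0, 1]
normalized = normalize_pixel(pixel)
restored = unnormalize_pixel(normalized)

print([round(v, 3) for v in normalized])   # normalized values can be negative or above 1
print([round(v, 6) for v in restored])     # back to the original pixel
```

One consequence worth noticing: the attacks below change the *normalized* tensor, so a change of `epsilon` there corresponds to roughly `epsilon * std` on the original 0-to-1 pixel scale (for `epsilon = 0.02` on the red channel, 0.02 × 0.229 ≈ 0.0046, a bit more than 1/255).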

Make sure you have run all 4 of these code blocks before moving on to the next step.

# Step 2: View ImageNet Gallery

We are going to be attacking the Inception v3 model, which was trained by Google to classify images into one of 1,000 different categories. The model was trained using a dataset called "ImageNet", a huge collection of millions of images, each pre-labeled with one of those 1,000 classes.

The following gallery shows one example image from the ImageNet dataset for each of the 1,000 classes so that you can see what all the different classes are.

Open the gallery and spend some time exploring the dataset:

https://github.com/PullJosh/imagenet-sample-images/blob/master/gallery.md#gallery-of-imagenet-sample-images

# Step 3: Download & Test the "Inception v3" Model

## Step 3A: Download the model

Since we're using someone else's pre-trained model, we can just download it! (We don't need to train the model ourselves.) Run the following code block to download the model.

In [None]:
# Download the Inception v3 model, pre-trained so that
# it works right out of the box.
inceptionv3 = models.inception_v3(weights=models.Inception_V3_Weights.DEFAULT)

# Set the model to "evaluation" mode, because we want to *use*
# the model, not train it.
inceptionv3.eval()

print("Model downloaded!")

## Step 3B: Download the label names

This model classifies images into one of 1,000 different classes. Each class has a name ("label"), but the model itself only outputs numbers, so we downloaded the names separately in Step 1. The following code block loads them from `data/labels.json` into the `labels` variable and prints them out.

**You should be able to see that the label numbers correspond to the numbers shown in [the gallery](https://github.com/PullJosh/imagenet-sample-images/blob/master/gallery.md#gallery-of-imagenet-sample-images).**

In [None]:
with open("data/labels.json", "r") as file:
  # Get the labels from the file
  labels = json.load(file)
  
  # Convert label indices from strings to numbers
  labels = {int(index):label for index, label in labels.items()}

print("Labels:", labels)
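
JSON object keys are always strings, which is why the cell above converts them with `int(...)`: `torch.argmax` gives us an integer, and the integer `282` would not match the string key `"282"`. A small illustration, using a made-up two-entry excerpt that assumes `labels.json` maps index strings to class names (it is not the real file):

```python
import json

# Made-up excerpt, assumed to have the same shape as data/labels.json (index -> class name)
raw = json.loads('{"9": "ostrich", "282": "tiger cat"}')
print(raw.keys())          # the keys come back as strings: '9', '282'
print(282 in raw)          # False: the integer 282 is not a key

# Same conversion as the cell above, so integer indices work
labels_by_index = {int(index): label for index, label in raw.items()}
print(labels_by_index[282])
```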

## Step 3C: Get an image from the internet

This part is fun! We need an image to classify, so we will download one from the internet. The following code grabs an image from a URL and then displays it.

**You can change the image URL if you want! Just make sure you choose something that the AI will be able to classify (it needs to belong to one of the 1,000 different classes).** It's also best *NOT* to use one of the training images, so don't take one directly from the gallery. Use Google Images instead.

In [None]:
(img, img_tensor, img_variable) = get_image("https://i.imgur.com/AcVntvT.png")
img # Show image

## Step 3D: Ask the model to classify the image

Let's pass our image into the `inceptionv3` model and see what the output is:

In [None]:
# Get neural network's predictions for this image
output = inceptionv3.forward(img_variable)

print(output)

As you can see, the model gives a huge list of 1,000 numbers as output (these raw scores are often called *logits*). Each of the 1,000 numbers corresponds to one of the classes, and a larger value means the model thinks that class is a better fit for your image. Large negative numbers mean the model thinks the class is a bad fit.

Let's convert these positive and negative numbers into a probability distribution: 1,000 numbers that are each between 0 and 1 (corresponding to 0% likelihood and 100% likelihood) and that add up to 1.

In [None]:
# Convert output to a probability distribution using softmax
# (The probability distribution tells you how likely it is that
# the image belongs to each of the 1000 different classes.)
output_probs = F.softmax(output, dim=1)

print(output_probs)

As you can see, these numbers are all positive now (although most of them are extremely small... like $10^{-5}$ small).
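
`F.softmax` turns each row of scores $z_1, \dots, z_K$ into $p_k = e^{z_k} / \sum_j e^{z_j}$. Because $e^{z}$ is always positive, every probability is positive, and dividing by the sum makes them add up to 1. Here is a plain-Python sketch of that formula on three made-up scores; `softmax` below is our own helper, not the PyTorch function.

```python
import math

def softmax(logits):
    """Plain-Python softmax over one row of scores: exp(z_k) / sum_j exp(z_j)."""
    # Subtracting the largest score first gives the same result but avoids overflow in exp()
    largest = max(logits)
    exps = [math.exp(z - largest) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, -1.0]   # made-up raw scores for 3 classes
probs = softmax(logits)

print([round(p, 3) for p in probs])   # all positive, same ranking as the scores
print(sum(probs))                     # 1.0, up to floating-point rounding
```

Softmax never changes the ranking of the scores, so taking the `argmax` of `output_probs` (as the next cell does) picks the same class as taking it of the raw `output`.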

Finally, we can figure out the best class for the image by finding which class has the highest probability. The following code does that:

In [None]:
# Get the index of the most likely class (the predicted label for this image)
label_index = int(torch.argmax(output_probs))
label = labels[label_index]

# Get probability of most likely class
pred_prob = float(output_probs[0][label_index])

print(f"I predict with {round(pred_prob * 100, 2)}% confidence that this image belongs to class {label_index} ({label})")

If you stuck with the original cat image we gave you, it should say 74.89% confidence that the image belongs to class 282 (tiger cat).

If you chose your own image, hopefully the output you got seems reasonable.

# Method 1: Move Away from the Correct Answer

Remember, in Method 1 we are going to *increase* the loss of the model, because that means it will be less accurate at classifying the image.

The gradient tells us how to adjust each pixel in order to increase the loss, as seen here:

<center>
  <img src="https://miro.medium.com/max/1400/1*jR_MrEHPtGlcn1PRvxN6xw.jpeg" width="400px" />
</center>

That's exactly what we're about to do.
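
Written as an equation, this one-step update is the Fast Gradient Sign Method (FGSM) from Goodfellow et al. Here $x$ is the (normalized) input image, $y$ is the correct label (the `target` we set in Method 1A), $\theta$ stands for the model's weights, $J$ is the cross-entropy loss, and $\epsilon$ is the step size. Taking the sign means every pixel value moves by $\epsilon$, either up or down (or stays put where the gradient is exactly zero):

$$
x_{\text{adv}} = x + \epsilon \cdot \operatorname{sign}\big(\nabla_x J(\theta, x, y)\big)
$$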

## Method 1A: Set the `target` to move AWAY from

In [None]:
# Set the target to be 282 (tiger cat), because that's the
# CORRECT classification of the input image. (We will try
# to make the model predict anything OTHER than tiger cat.)

# TODO: If you are using a different image, change this number to
# be the CORRECT classification of that image.
target = Variable(torch.LongTensor([282]), requires_grad=False)
print(target)

## Method 1B: Calculate the loss & gradient based on the chosen target

This code computes the loss and then back-propagates it through the entire network. The part we care about is the gradient with respect to the input image, which PyTorch stores in `img_variable.grad`.
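
For a single image, `torch.nn.CrossEntropyLoss` takes the raw scores (not the softmax output) plus the index of the target class, and returns $-\log p_{\text{target}}$, where $p$ is the softmax of the scores. A high loss therefore means the model gives the target class a low probability, which is exactly what Method 1 wants for the correct class. A plain-Python sketch with made-up scores; `cross_entropy` is our own helper, not the PyTorch class.

```python
import math

def cross_entropy(logits, target):
    """Single-example cross-entropy: -log(softmax(logits)[target]).
    Computed as logsumexp(logits) - logits[target], the same value in a numerically stable form."""
    largest = max(logits)
    log_sum_exp = largest + math.log(sum(math.exp(z - largest) for z in logits))
    return log_sum_exp - logits[target]

logits = [2.0, 1.0, -1.0]   # made-up scores; class 0 is the model's favorite

print(cross_entropy(logits, 0))  # low loss: the model already leans toward class 0
print(cross_entropy(logits, 2))  # high loss: class 2 gets the lowest score
```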

In [None]:
# The loss function for the model is called Cross Entropy Loss
# and it is used when you're classifying something
loss = torch.nn.CrossEntropyLoss()

# In this case, the calculated loss should measure the difference
# between what the model predicted and the correct output value `target`
# which we just specified.
loss_cal = loss(output, target)

# Calculate the gradients for everything (including the input image,
# which is what we care about) in terms of this loss function.
if img_variable.grad is not None:
  img_variable.grad.zero_() # Flush (reset) gradients first if needed
loss_cal.backward(retain_graph=True)

## Method 1C: Generate the adversarial image

Now that we have the gradient computed, we're ready to adjust the input image.

The variables at your disposal are `img_variable.data`, `epsilon`, and `gradient`. You want to set `adv_img_tensor` to be the adversarial image. Can you find the simple equation to do this? **Edit the code below with your equation.**

![Slide screenshot](https://i.imgur.com/tSnryEX.png)

In [None]:
epsilon = 0.02

# Get the computed gradient for the input image
gradient = torch.sign(img_variable.grad.data)

# TODO: Generate the adversarial image using
# img_variable.data, epsilon, and gradient.
# You should fill this in using a very simple equation
# with just those three variables.
adv_img_tensor = # ???

Hopefully your `adv_img_tensor` is correct! We'll find out in the next step.

## Method 1D: Ask the model to classify the adversarial image

**The following code is incomplete!** In step 3D we asked the model to classify the original image. Now we want to ask it to classify the adversarial image, `adv_img_variable`. Using step 3D as a reference, can you fill in all the `# ???` spaces in the code?

Once you do it successfully, you'll be able to see what the model thinks of your adversarial image vs. the input image.

In [None]:
# Get neural network's predictions for this adversarial image
adv_img_variable = Variable(adv_img_tensor)
adv_output = # ???

adv_output_probs = # ???

adv_label_index = # ???
adv_label = # ???

adv_pred_prob = # ???

print(f"I predict with {round(pred_prob * 100, 2)}% confidence that the original image belongs to class {label_index} ({label})")
print(f"I predict with {round(adv_pred_prob * 100, 2)}% confidence that the newly-generated adversarial image belongs to class {adv_label_index} ({adv_label})")
print()

visualize(img_tensor, adv_img_tensor, gradient, epsilon, label, adv_label, pred_prob, adv_pred_prob)

Hopefully you were able to successfully trick the model into making the wrong prediction.
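
To see the whole Method 1 recipe end to end on something small enough to check by hand, here is a plain-Python sketch using a made-up linear classifier with 2 classes and a 3-pixel "image". The weights `W`, the input `x`, and `epsilon = 0.3` are all invented for illustration. For a linear model, the gradient of the cross-entropy loss with respect to the input can be written out directly as $W^\top(p - \text{onehot}(y))$, so we compute it by hand instead of calling `backward()`.

```python
import math

def softmax(logits):
    largest = max(logits)
    exps = [math.exp(z - largest) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sign(v):
    """Like torch.sign for a single number: -1, 0, or 1."""
    return (v > 0) - (v < 0)

# Made-up linear "classifier" over a 3-pixel image: logits = W @ x
W = [[1.0, -2.0, 0.5],    # class 0 weights
     [-1.0, 1.5, 1.0]]    # class 1 weights

def predict(x):
    return softmax([sum(w * xi for w, xi in zip(row, x)) for row in W])

def input_gradient(x, target):
    """Gradient of the cross-entropy loss w.r.t. x for logits = W @ x.
    d(loss)/d(logits) = probs - one_hot(target); the chain rule through W gives W^T times that."""
    probs = predict(x)
    dlogits = [p - (1.0 if k == target else 0.0) for k, p in enumerate(probs)]
    return [sum(dlogits[k] * W[k][j] for k in range(len(W))) for j in range(len(x))]

x = [0.6, 0.1, 0.3]   # clean "image", classified as class 0
y_true = 0
epsilon = 0.3

grad = input_gradient(x, y_true)
# Untargeted FGSM: move every pixel by epsilon in the direction that INCREASES the loss
x_adv = [xi + epsilon * sign(g) for xi, g in zip(x, grad)]

print("clean probs:      ", [round(p, 3) for p in predict(x)])
print("adversarial probs:", [round(p, 3) for p in predict(x_adv)])
```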


# Method 2: Move Toward a Specific "Target" Wrong Answer

## Method 2A: Set the target to move TOWARD

**It's decision time!** With method 2, you get to choose what output label you want to trick the model into believing. We've included 288 (leopard) as a default, but you can change it to anything you want. Refer to the [ImageNet gallery](https://github.com/PullJosh/imagenet-sample-images/blob/master/gallery.md#gallery-of-imagenet-sample-images) for a reminder of which number corresponds to which label. Choose your favorite and put it in the code!

In [None]:
# Set the target to be 288 (leopard), because that's the
# TARGET classification that we want to trick the model
# into believing.

# You can choose any target you want! Ideally the computer
# will be able to generate an adversarial image that tricks
# the model into believing that this is the correct output class.
target = Variable(torch.LongTensor([288]), requires_grad=False)
print(target)

## Method 2B: Calculate the loss & gradient based on the chosen target

Just like in Method 1, we're going to compute the gradient of the loss function. But this time, it's the loss based on the target you chose above, which means that in Method 2C we will want to *decrease* the loss rather than increase it.

**This code is already correct. You just need to run it.**

In [None]:
# In this case, the calculated loss should measure the difference
# between what the model predicted and the correct output value `target`
# which we just specified.
loss_cal = loss(output, target)

# Calculate the gradients for everything (including the input image,
# which is what we care about) in terms of this loss function.
if img_variable.grad is not None:
  img_variable.grad.zero_() # Flush (reset) gradients first if needed
loss_cal.backward(retain_graph=True)

## Method 2C: Generate the adversarial image

Aha! Time to generate the adversarial image. This is very similar to Method 1C, but this time we want to *decrease* the loss rather than increase it, which means that we need to do the *opposite* of what the gradient tells us to do.

**You should insert an equation here that is similar to what you did in 1C, but updated to go *against* the gradient rather than with it.**

In [None]:
epsilon = 0.02

# Get the computed gradient for the input image
gradient = torch.sign(img_variable.grad.data)

# TODO: Generate the adversarial image using
# img_variable.data, epsilon, and gradient.
# (Remember that this time you want to MINIMIZE the loss,
# moving closer to the target)
adv_img_tensor = # ???

print(gradient)

## Method 2D: Ask the model to classify the adversarial image

Once again, we have generated an adversarial image and are ready to see what the model thinks.

**This code is already complete.** Run it to see the model's prediction. Did you trick the model?

**Then, try this method again but with a different chosen target in Method 2A.** Does it give you the output you wanted? Does it not? These things don't always work, but you should be able to play around with it and get a nice result eventually.

In [None]:
# Get neural network's predictions for this adversarial image
adv_img_variable = Variable(adv_img_tensor)
adv_output = inceptionv3.forward(adv_img_variable)

adv_output_probs = F.softmax(adv_output, dim=1)

adv_label_index = int(torch.argmax(adv_output_probs))
adv_label = labels[adv_label_index]

adv_pred_prob = float(adv_output_probs[0][adv_label_index])

print(f"I predict with {round(pred_prob * 100, 2)}% confidence that the original image belongs to class {label_index} ({label})")
print(f"I predict with {round(adv_pred_prob * 100, 2)}% confidence that the newly-generated adversarial image belongs to class {adv_label_index} ({adv_label})")
print()

visualize(img_tensor, adv_img_tensor, gradient, epsilon, label, adv_label, pred_prob, adv_pred_prob)
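
Method 2's key change is the direction of the step: we move *against* the gradient of the loss for a chosen target class. Here is a self-contained plain-Python sketch on a made-up 3-class linear model. Purely for readability, pixel k is the evidence for class k, so `W` is the identity matrix; the input, the target, and `epsilon` are all invented.

```python
import math

def softmax(logits):
    largest = max(logits)
    exps = [math.exp(z - largest) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sign(v):
    return (v > 0) - (v < 0)

# Made-up 3-class linear model: logits = W @ x, with W = identity
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]

def predict(x):
    return softmax([sum(w * xi for w, xi in zip(row, x)) for row in W])

def input_gradient(x, target):
    """d(cross-entropy)/dx for logits = W @ x: W^T (probs - one_hot(target))."""
    probs = predict(x)
    dlogits = [p - (1.0 if k == target else 0.0) for k, p in enumerate(probs)]
    return [sum(dlogits[k] * W[k][j] for k in range(len(W))) for j in range(len(x))]

def argmax(values):
    return max(range(len(values)), key=lambda k: values[k])

x = [0.5, 0.2, 0.1]   # clean input: class 0 has the highest score
target = 2            # the wrong class we want the model to predict
epsilon = 0.3

grad = input_gradient(x, target)
# Targeted FGSM: step AGAINST the gradient to DECREASE the loss for `target`
x_adv = [xi - epsilon * sign(g) for xi, g in zip(x, grad)]

print("clean prediction:      ", argmax(predict(x)))
print("adversarial prediction:", argmax(predict(x_adv)))
```

As in Method 2D, success isn't guaranteed: in this toy example, a smaller step such as `epsilon = 0.1` leaves the prediction at class 0.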

# Method 3: Iteratively Move Toward a Specific Wrong Answer

## Method 3A: Set the target to move TOWARD

Just like in Method 2, you can choose whatever target you want and the computer will try to produce an image that the model classifies as the target you set.

In [None]:
# Choose whatever target you want! 9 is ostrich.
target = Variable(torch.LongTensor([9]), requires_grad=False)

## Method 3B: Iteratively calculate the gradient & apply it to the image many times

This step works like Methods 2B and 2C, but instead of one big nudge it applies several small nudges in a row, recalculating the gradient after each one, to get a better result.

**We've completed the code for you because it's pretty annoying to do yourself.** Writing this code requires a bit of battling against PyTorch to make sure all the gradients are calculated correctly, and it's not super fun. Definitely try to understand it though!

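**However, there are things to tweak!** Try changing `epsilon` (the maximum total change allowed for each pixel value), `num_steps` (how many nudges to apply), and `alpha` (the size of each nudge) and watch how your results change.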
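
Written as an equation, the loop below is an iterative version of the targeted step from Method 2 (closely related to what the literature calls the Basic Iterative Method). Here $x$ is the original image (`img_tensor`), $x^{(t)}$ is the work-in-progress image (`img_variable.data`) after $t$ steps, $\alpha$ is `alpha`, and $\epsilon$ is `epsilon`. Each step moves by $\alpha$ against the gradient for the target class, then clips the *total* change from $x$ to at most $\epsilon$ per pixel value:

$$
x^{(0)} = x, \qquad
x^{(t+1)} = x + \operatorname{clip}_{[-\epsilon,\ \epsilon]}\Big( x^{(t)} - \alpha \cdot \operatorname{sign}\big(\nabla_x J(\theta, x^{(t)}, y_{\text{target}})\big) - x \Big)
$$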

In [None]:
# In past methods, we have left img_variable as is and
# created a new output variable called adv_img_variable
# containing our output tensor.

# This time, since we're using an iterative method, we will
# repeatedly edit the contents of img_variable to make it
# more and more likely to be classified as the desired `target`

# In case you run this cell multiple times, reset img_variable
# to its initial value before any changes were made to it:
img_variable.data = img_tensor

epsilon = 0.25
num_steps = 5
alpha = 0.025

for i in range(num_steps):
  # Reset gradient
  if img_variable.grad is not None:
    img_variable.grad.zero_()

  # Classify this work-in-progress image
  output = inceptionv3.forward(img_variable)

  # Compute the loss value compared to the target output we're aiming for
  loss = torch.nn.CrossEntropyLoss()
  loss_cal = loss(output, target)
  loss_cal.backward()

  # Update the image based on the gradient
  x_grad = alpha * torch.sign(img_variable.grad.data)
  adv_temp = img_variable.data - x_grad
  total_grad = adv_temp - img_tensor
  total_grad = torch.clamp(total_grad, -epsilon, epsilon)
  x_adv = img_tensor + total_grad
  img_variable.data = x_adv

## Method 3C: Ask the model to classify the adversarial image

Once again, we can ask the model to classify the generated image and see what it thinks.

In [None]:
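output_adv = inceptionv3.forward(img_variable)
output_adv_probs = F.softmax(output_adv, dim=1)
# Use the same 0-to-1 probability format as pred_prob so the two are comparable
x_adv_label_index = int(torch.argmax(output_adv_probs))
x_adv_pred = labels[x_adv_label_index]
x_adv_pred_prob = float(output_adv_probs[0][x_adv_label_index])
print(f"I predict with {round(x_adv_pred_prob * 100, 2)}% confidence that the iteratively-generated adversarial image belongs to class {x_adv_label_index} ({x_adv_pred})")
visualize(img_tensor, img_variable.data, total_grad, epsilon, label, x_adv_pred, pred_prob, x_adv_pred_prob)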
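
To see the clipping at work outside of Inception v3, here is a plain-Python sketch of the same loop on a made-up 3-class linear model (identity weights, so pixel k is the evidence for class k). The input, target, `epsilon`, `alpha`, and `num_steps` are all invented. Watch the target probability climb until the total perturbation reaches `epsilon`; after that, further steps are clipped away.

```python
import math

def softmax(logits):
    largest = max(logits)
    exps = [math.exp(z - largest) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sign(v):
    return (v > 0) - (v < 0)

# Made-up 3-class linear model: logits = W @ x, with W = identity
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]

def predict(x):
    return softmax([sum(w * xi for w, xi in zip(row, x)) for row in W])

def input_gradient(x, target):
    """d(cross-entropy)/dx for logits = W @ x: W^T (probs - one_hot(target))."""
    probs = predict(x)
    dlogits = [p - (1.0 if k == target else 0.0) for k, p in enumerate(probs)]
    return [sum(dlogits[k] * W[k][j] for k in range(len(W))) for j in range(len(x))]

x = [0.5, 0.2, 0.1]   # clean input: class 0 has the highest score
target = 2
epsilon, alpha, num_steps = 0.3, 0.1, 5

x_adv = list(x)
target_probs = []
for step in range(num_steps):
    grad = input_gradient(x_adv, target)
    # Small step of size alpha AGAINST the gradient (toward the target class)
    adv_temp = [xa - alpha * sign(g) for xa, g in zip(x_adv, grad)]
    # Clip the TOTAL change from the clean input to [-epsilon, epsilon]
    total = [min(max(a - xi, -epsilon), epsilon) for a, xi in zip(adv_temp, x)]
    x_adv = [xi + t for xi, t in zip(x, total)]
    target_probs.append(predict(x_adv)[target])
    print(f"step {step + 1}: P(target) = {target_probs[-1]:.3f}")
```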

# Further Reading

[PyTorch FGSM Adversarial Attack Tutorial](https://pytorch.org/tutorials/beginner/fgsm_tutorial.html)
If you want to work through hands-on examples of adversarial attacks at your own pace, check out this tutorial.

[Ian Goodfellow Presents Adversarial Examples (video)](https://www.youtube.com/watch?v=CIfsB_EYsVI)
Learn more about FGSM and alternative attacks, as well as the broader implications of adversarial attacks.