# 14757 Homework 3 (150 pts)

## **Due:** Wednesday October 23 at 3pm ET / 12 noon PT

## Submission Instructions

*   Download your completed notebook by clicking File->Download .ipynb and submit it on Gradescope
*   Check your submission on Gradescope to make sure that all your code, code output and written responses appear correctly

## Tips for getting started

*   Make sure to set your Colab runtime type to GPU hardware acceleration.

*   You will require knowledge of PyTorch to complete this assignment. We recommend that you read all four sections of the following tutorial. The starter code for Problem 2 is adapted from the last section.

  https://pytorch.org/tutorials/beginner/deep_learning_60min_blitz.html

*   As you work on this assignment you will likely need to refer to the PyTorch documentation.

  https://pytorch.org/docs/stable/index.html

## Problem 1: Gradescope Autograder Placeholder (0 pts)

Gradescope requires that problem 1 be autograded for code submissions, but there are no autograded problems. Continue to problem 2.

## Problem 2: Tweaking a Convolutional Neural Network (90 pts)

In this problem you will tweak an existing convolutional neural network image classifier that runs on the CIFAR-10 data set.

The starter code is adapted from the tutorial at https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html.

First run the cell below to observe the following steps.

* Load and normalize the CIFAR-10 training and test datasets using ``torchvision``
* Define a Convolutional Neural Network
* Define a loss function and an optimizer
* Train the network on the training data
* Test the network on the test data
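For the network-definition step, it may help to see where the `16 * 5 * 5` input size of `fc1` in the starter code comes from. The sketch below redoes that arithmetic in plain Python, assuming unpadded, stride-1 convolutions and 2x2 max pooling as in the starter `Net`. The helper `conv_out` is a hypothetical name introduced here for illustration, not part of PyTorch.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial size after a convolution or pooling window (floor division)."""
    return (size + 2 * padding - kernel) // stride + 1

size = 32                                   # CIFAR-10 images are 32x32
size = conv_out(size, kernel=5)             # conv1: 32 -> 28
size = conv_out(size, kernel=2, stride=2)   # pool:  28 -> 14
size = conv_out(size, kernel=5)             # conv2: 14 -> 10
size = conv_out(size, kernel=2, stride=2)   # pool:  10 -> 5

channels = 16                               # output channels of conv2
flat_features = channels * size * size
print(flat_features)                        # 400, i.e. 16 * 5 * 5
```

If you change the convolutional part of the network, the flattened size fed to the first fully connected layer generally changes too, so the same arithmetic is worth repeating for your modified architecture.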

Now tweak the code in the cell below as follows.

* (20 pts) Train the model on a GPU
* (20 pts) Add one more convolutional layer to the model
* (20 pts) Apply dropout to the input layer
* (10 pts) Increase the batch size for training
* (10 pts) Increase the number of training epochs
* (10 pts) Change the optimization algorithm for training

We will not grade you on the accuracy of your classifier, just on the tweaks themselves.
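As a reminder of how device placement works in PyTorch, here is a minimal sketch on a toy model. It is not the assignment's network: `toy_model` and `toy_batch` are hypothetical names used only for illustration. The key point is that a module's parameters and the tensors passed to it have to live on the same device.

```python
import torch
import torch.nn as nn

# Pick the GPU when one is visible, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

toy_model = nn.Linear(8, 2).to(device)   # hypothetical stand-in for a network
toy_batch = torch.randn(4, 8)            # hypothetical batch, created on the CPU

# Move the batch to the model's device before the forward pass.
toy_batch = toy_batch.to(device)
toy_out = toy_model(toy_batch)
print(toy_out.shape, toy_out.device)
```

The same pattern applies to labels and to any tensor compared against the model's output.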

In [None]:
import torch
import torchvision
import torchvision.transforms as transforms
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

# Load and normalize the CIFAR10 training and test datasets

transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

batch_size = 4

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=batch_size,
                                          shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=batch_size,
                                         shuffle=False, num_workers=2)

classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

# Define a Convolutional Neural Network

class Net(nn.Module):
  def __init__(self):
    super().__init__()
    self.conv1 = nn.Conv2d(3, 6, 5)
    self.pool = nn.MaxPool2d(2, 2)
    self.conv2 = nn.Conv2d(6, 16, 5)
    self.fc1 = nn.Linear(16 * 5 * 5, 120)
    self.fc2 = nn.Linear(120, 84)
    self.fc3 = nn.Linear(84, 10)

  def forward(self, x):
    x = self.pool(F.relu(self.conv1(x)))
    x = self.pool(F.relu(self.conv2(x)))
    x = x.view(-1, 16 * 5 * 5)
    x = F.relu(self.fc1(x))
    x = F.relu(self.fc2(x))
    x = self.fc3(x)
    return x

net = Net()

# Define a loss function and optimizer

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)

# Train the network on the training data

print('Starting Training')

for epoch in range(2):  # loop over the dataset multiple times

  running_loss = 0.0
  for i, data in enumerate(trainloader, 0):
    # get the inputs; data is a list of [inputs, labels]
    inputs, labels = data

    # zero the parameter gradients
    optimizer.zero_grad()

    # forward + backward + optimize
    outputs = net(inputs)
    loss = criterion(outputs, labels)
    loss.backward()
    optimizer.step()

    # print statistics
    running_loss += loss.item()
    if i % 2000 == 1999:    # print every 2000 mini-batches
      print('[%d, %5d] loss: %.3f' %
            (epoch + 1, i + 1, running_loss / 2000))
      running_loss = 0.0

print('Finished Training', end='\n\n')

# Test the network on the test data

correct = 0
total = 0
with torch.no_grad():
  for data in testloader:
    images, labels = data
    outputs = net(images)
    _, predicted = torch.max(outputs.data, 1)
    total += labels.size(0)
    correct += (predicted == labels).sum().item()

print('Accuracy of the network on the 10000 test images: %d %%' % (
    100 * correct / total))

## Problem 3: Attacking a Neural Network Classifier (60 pts)

In this problem you will implement two evasion attacks against a neural network classifier for the MNIST handwritten digits data set.

The starter code is adapted from the following tutorial that you may find useful to read:

https://pytorch.org/tutorials/beginner/fgsm_tutorial.html

Run the cell below to

*   load the MNIST handwritten digit test set
*   define the neural network structure and load pretrained parameters
*   define test and display helper functions
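The cell below calls `model.eval()` because the network contains dropout layers, which behave differently during training and evaluation. The following sketch illustrates that difference on a standalone `nn.Dropout` layer (the starter model itself uses `nn.Dropout2d` and `F.dropout`, which follow the same train/eval convention).

```python
import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)
x = torch.ones(1000)

# Training mode: each element is zeroed with probability p,
# and the survivors are scaled by 1 / (1 - p) = 2.
drop.train()
y_train = drop(x)

# Evaluation mode: dropout is the identity, so outputs are deterministic.
drop.eval()
y_eval = drop(x)

print(y_train.unique(), torch.equal(y_eval, x))
```

Without evaluation mode, the attacked classifier would give randomly varying predictions for the same input, which would make the accuracy numbers hard to interpret.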



In [None]:
from __future__ import print_function
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms
import numpy as np
import matplotlib.pyplot as plt

# MNIST Test dataset and dataloader declaration
test_loader = torch.utils.data.DataLoader(
  datasets.MNIST('../data', train=False, download=True,
                 transform=transforms.Compose([transforms.ToTensor()])),
  batch_size=1, shuffle=True)

# LeNet Model definition
class Net(nn.Module):
  def __init__(self):
    super(Net, self).__init__()
    self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
    self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
    self.conv2_drop = nn.Dropout2d()
    self.fc1 = nn.Linear(320, 50)
    self.fc2 = nn.Linear(50, 10)

  def forward(self, x):
    x = F.relu(F.max_pool2d(self.conv1(x), 2))
    x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
    x = x.view(-1, 320)
    x = F.relu(self.fc1(x))
    x = F.dropout(x, training=self.training)
    x = self.fc2(x)
    return F.log_softmax(x, dim=1)

# Define what device we are using
print("CUDA Available: ",torch.cuda.is_available())
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Initialize the network
model = Net().to(device)

# Load the pretrained model
!mkdir -p ./data/
!wget http://www.andrew.cmu.edu/user/dvaroday/14757/data/hw3/lenet_mnist_model.pth
!mv lenet_mnist_model.pth ./data
pretrained_model = "data/lenet_mnist_model.pth"
model.load_state_dict(torch.load(pretrained_model, map_location='cpu'))

# Set the model in evaluation mode. In this case this is for the dropout layers.
model.eval()

# HELPER FUNCTIONS

# Test attack_fn() on test data
def test(attack_fn, model, device, test_loader, epsilon):
  # Accuracy counter
  correct = 0
  adv_examples = []

  # Loop over all examples in test set
  for data, target in test_loader:

    # Send the data and label to the device
    data, target = data.to(device), target.to(device)
    # Set requires_grad attribute of tensor. Important for Attack
    data.requires_grad = True
    # Call attack function
    perturbed_data = attack_fn(data, target, epsilon)
    # Re-classify the perturbed image
    output = model(perturbed_data)

    # Check for success

    # Get the index of the max log-probability
    final_pred = output.max(1, keepdim=True)[1]
    if final_pred.item() == target.item():
      correct += 1
      # Special case for saving 0 epsilon examples
      if (epsilon == 0) and (len(adv_examples) < 5):
        adv_ex = perturbed_data.squeeze().detach().cpu().numpy()
        adv_examples.append( (target.item(), final_pred.item(), adv_ex) )
    else:
      # Save some adv examples for visualization later
      if len(adv_examples) < 5:
        adv_ex = perturbed_data.squeeze().detach().cpu().numpy()
        adv_examples.append( (target.item(), final_pred.item(), adv_ex) )

  # Calculate final accuracy for this epsilon
  final_acc = correct/float(len(test_loader))
  print("Epsilon: {}\tTest Accuracy = {} / {} = {}".format(epsilon, correct, len(test_loader), final_acc))

  # Return the accuracy and adversarial examples
  return final_acc, adv_examples

# Display adversarial examples
def display_examples(epsilons, examples):
  cnt = 0
  plt.figure(figsize=(8,10))
  for i in range(len(epsilons)):
    for j in range(len(examples[i])):
      cnt += 1
      plt.subplot(len(epsilons),len(examples[0]),cnt)
      plt.xticks([], [])
      plt.yticks([], [])
      if j == 0:
        plt.ylabel("Eps: {}".format(epsilons[i]), fontsize=14)
      orig,adv,ex = examples[i][j]
      plt.title("{} -> {}".format(orig, adv))
      plt.imshow(ex, cmap="gray")
  plt.tight_layout()
  plt.show()

### Fast gradient sign method (FGSM) attack

Run the cell below to

*   define `fgsm_attack()` as an implementation of FGSM
*   run the FGSM attack against all images in the test set for several values of `epsilon`
*   collect accuracy statistics and display sample adversarial examples
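For reference, the update computed by `fgsm_attack()` in the cell below can be written as a single formula. Here $x$ is the input image, $y$ its true label, $\theta$ the model parameters, and $J$ the loss (the negative log-likelihood in this notebook). The symbol names are chosen for this note and are not taken from the starter code.

$$
x_{\text{adv}} = \operatorname{clip}_{[0,1]}\Big(x + \epsilon \cdot \operatorname{sign}\big(\nabla_x J(\theta, x, y)\big)\Big)
$$

Every pixel moves by exactly $\epsilon$ in the direction that increases the loss, and the result is clipped back to the valid pixel range.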




In [None]:
# FGSM attack code
def fgsm_attack(image, target, epsilon):
  # Forward pass the data through the model
  output = model(image)
  # Calculate the loss
  loss = F.nll_loss(output, target)
  # Zero all existing gradients
  model.zero_grad()
  # Calculate gradients of model in backward pass
  loss.backward()
  # Collect the element-wise sign of the data gradient
  sign_data_grad = image.grad.data.sign()
  # Create the perturbed image by adjusting each pixel of the input image
  perturbed_image = image + epsilon*sign_data_grad
  # Clip to maintain [0,1] range
  perturbed_image = torch.clamp(perturbed_image, 0, 1)
  # Return the perturbed image
  return perturbed_image

# Epsilon values for FGSM attack
# Pixel value range is [0,1] so epsilon < 1
epsilons = [0, .05, .1, .15, .2, .25, .3]

accuracies_fgsm = []
examples_fgsm = []

# Run test for each epsilon
for eps in epsilons:
  acc, ex = test(fgsm_attack, model, device, test_loader, eps)
  accuracies_fgsm.append(acc)
  examples_fgsm.append(ex)

# Plot several examples of adversarial samples at each epsilon
display_examples(epsilons, examples_fgsm)

### Iterative method attack

**3.1** (30 pts) Complete the function `iterative_attack()` to implement the iterative method attack described in the lecture titled *Evasion Attacks on Neural Network Classifiers.* Use the provided values of `num_iterations = 3` and `alpha = epsilon/2`.
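As a rough guide, one common formulation of the iterative method (often called the basic iterative method) is shown below. It is stated under the assumption that the perturbation is kept inside an $\epsilon$-ball around the original image. The lecture's formulation takes precedence if it differs. Here $\delta_t$ is the perturbation after $t$ steps, $x$ the original image, $y$ its true label, and $J$ the loss.

$$
\delta_0 = 0, \qquad
\delta_{t+1} = \operatorname{clip}_{[-\epsilon,\,\epsilon]}\Big(\delta_t + \alpha \cdot \operatorname{sign}\big(\nabla_x J(\theta, x + \delta_t, y)\big)\Big)
$$

With three steps of size $\alpha = \epsilon/2$, the unclipped perturbation could reach $1.5\,\epsilon$ per pixel, which is why the clipping step matters in this formulation.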

Note that you should **not** modify the argument `image`: it is a tensor with `requires_grad = True`, and backpropagation stores the gradient of the loss with respect to the input in its `grad` field. Instead, we have initialized a tensor called `delta` that you can use to perturb the image.
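One detail worth keeping in mind when calling `backward()` more than once is that PyTorch accumulates gradients in `grad` rather than overwriting them. The toy sketch below, which is unrelated to the MNIST model, shows the accumulation and also shows that a gradient taken through `x + delta` still lands in `x.grad`. The tensors `x` and `delta` here are small illustrative stand-ins.

```python
import torch

x = torch.tensor([1.0, 2.0], requires_grad=True)

(x * x).sum().backward()
first = x.grad.clone()        # d/dx sum(x^2) = 2x -> [2., 4.]

(x * x).sum().backward()      # a second backward pass adds to x.grad
second = x.grad.clone()       # [4., 8.]

x.grad.zero_()                # reset before the next gradient computation
delta = torch.tensor([0.5, -0.5])
((x + delta) ** 2).sum().backward()
third = x.grad.clone()        # 2 * (x + delta) -> [3., 3.]

print(first, second, third)
```

Note that the last gradient is evaluated at the perturbed point `x + delta` even though `x` itself was never modified.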

In [None]:
# Iterative method attack code
def iterative_attack(image, target, epsilon):

  num_iterations = 3
  alpha = epsilon/2
  delta = torch.zeros(image.shape).to(device)

  # START EDITING HERE - DON'T REMOVE THIS COMMENT

  # END EDITING HERE - DON'T REMOVE THIS COMMENT

  # Return the perturbed image
  perturbed_image = torch.clamp(image+delta, 0, 1)
  return perturbed_image

accuracies_iterative = []
examples_iterative = []

# Run test for each epsilon
for eps in epsilons:
  acc, ex = test(iterative_attack, model, device, test_loader, eps)
  accuracies_iterative.append(acc)
  examples_iterative.append(ex)

# Plot several examples of adversarial samples at each epsilon
display_examples(epsilons, examples_iterative)

Run the cell below to compare the accuracies under the FGSM and iterative method attacks. You should see that, for the same `epsilon`, the iterative method degrades accuracy more than FGSM does.

In [None]:
plt.figure(figsize=(5,5))
plt.plot(epsilons, accuracies_fgsm, 'b*-', label='FGSM')
plt.plot(epsilons, accuracies_iterative, 'ro-', label='iterative method')
plt.yticks(np.arange(0, 1.1, step=0.1))
plt.xticks(np.arange(0, .35, step=0.05))
plt.legend(loc='lower left')
plt.xlabel('epsilon')
plt.ylabel('accuracy')
plt.show()

### Targeted iterative method

**3.2** (30 pts) Complete the function `targeted_iterative_attack()` to implement the targeted iterative method attack described in the lecture titled *Evasion Attacks on Neural Network Classifiers.* Your attack should attempt to perturb `image` so that it gets classified as `target_digit = 8`. Continue to use the provided values of `num_iterations = 3` and `alpha = epsilon/2`.

If your implementation is correct, many of the adversarial examples displayed by this cell will show a predicted label of 8 in their titles (for example, `3 -> 8`).
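As a rough guide, a common way to write a targeted iterative step is shown below. It assumes the perturbation $\delta_t$ is kept inside an $\epsilon$-ball, and the lecture's formulation takes precedence if it differs. Here $x$ is the original image, $y_{\text{target}}$ the class the attacker wants (8 in this problem), and $J$ the loss.

$$
\delta_0 = 0, \qquad
\delta_{t+1} = \operatorname{clip}_{[-\epsilon,\,\epsilon]}\Big(\delta_t - \alpha \cdot \operatorname{sign}\big(\nabla_x J(\theta, x + \delta_t, y_{\text{target}})\big)\Big)
$$

The sign is negative because the attack descends the loss with respect to the target class, rather than ascending the loss with respect to the true class.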

In [None]:
# Targeted iterative method attack code
def targeted_iterative_attack(image, target, epsilon):

  target_digit = 8

  num_iterations = 3
  alpha = epsilon/2
  delta = torch.zeros(image.shape).to(device)

  # START EDITING HERE - DON'T REMOVE THIS COMMENT

  # END EDITING HERE - DON'T REMOVE THIS COMMENT

  # Return the perturbed image
  perturbed_image = torch.clamp(image+delta, 0, 1)
  return perturbed_image

accuracies_targeted = []
examples_targeted = []

# Run test for each epsilon
for eps in epsilons:
  acc, ex = test(targeted_iterative_attack, model, device, test_loader, eps)
  accuracies_targeted.append(acc)
  examples_targeted.append(ex)

# Plot several examples of adversarial samples at each epsilon
display_examples(epsilons, examples_targeted)