# Adversarial Attack Experiment

#### Link to ReadMe Section:
https://gitlab.cs.vt.edu/sdeepti/facial-expression-recognition#adversarial-attack

#### Citations:
- https://pytorch.org/tutorials/beginner/fgsm_tutorial.html
- https://uvadlc-notebooks.readthedocs.io/en/latest/tutorial_notebooks/tutorial10/Adversarial_Attacks.html

**Motivation**:
Adversarial machine learning, a technique that attempts to fool models with deceptive data, is a growing threat in the AI and machine learning research community. Therefore, to test our model's robustness, we used the Fast Gradient Sign Method (FGSM). FGSM is a white-box attack because it relies on the model's internals: it needs the gradient of the loss with respect to the input image.

#### 1. Initial Set-Up

This cell adds the imports the rest of the notebook needs. `torch` and `torchvision` are used to work with our model and load the dataset, and `sklearn` provides the evaluation metrics. We also import `torch.nn.functional` as `F`, which supplies the `log_softmax` function used when computing the attack's loss.

In [None]:
import os
import numpy as np

import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms, models
import torch.nn.functional as F

# For plotting
import matplotlib.pyplot as plt
# For metrics
from sklearn.metrics import confusion_matrix, classification_report, accuracy_score, f1_score

This cell sets up the constants used throughout the code, such as the path to the pre-trained model and the dataset directory. It also creates the list of epsilon values that this experiment investigates.

In [None]:
# path to trained model
model_path = '../main_resnet50/FEC_resnet50_trained_face_images_80_10_10.pt'
# directory to dataset
data_dir = '../face_images_80_10_10'

num_classes = 7
device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Mean and Std from ImageNet
NORM_MEAN = np.array([0.485, 0.456, 0.406])
NORM_STD = np.array([0.229, 0.224, 0.225])

#List of epsilon values to use for the run.
#It is important to keep 0 in the list because it represents the model performance on the original test set.
#Also, intuitively we would expect the larger the epsilon, the more noticeable the perturbations
#but the more effective the attack in terms of degrading model accuracy.

epsilons = [0, .05, .1, .15, .2, .25, .3]
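The later cells call `model`, but no cell shown in this notebook loads it. A minimal sketch of that step follows, under the assumption that the checkpoint at `model_path` was written with `torch.save(model, path)` (the whole model object rather than a `state_dict`). The helper name `load_trained_model` is hypothetical. If the file holds a `state_dict` instead, the architecture would have to be rebuilt first and the weights loaded with `load_state_dict`.

```python
import torch


def load_trained_model(path, device):
    """Load a fully pickled model and prepare it for inference (hypothetical helper)."""
    # Assumption: the file was created with torch.save(model, path).
    # weights_only=False allows unpickling a full nn.Module on recent PyTorch
    # versions; only use it with checkpoints you trust.
    model = torch.load(path, map_location=device, weights_only=False)
    model = model.to(device)
    # eval() fixes dropout and batch-norm behaviour so the attack targets
    # the model exactly as it would be used for prediction.
    model.eval()
    return model
```

With the constants above, this would be called as `model = load_trained_model(model_path, device)`.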

#### 2. Perform Pre-Processing Steps

This performs the desired pre-processing steps for the images that belong to the testing dataset. They will be resized and normalized as per the ImageNet standards. The necessary data loader will be created for the test set.

In [None]:
# transforms for test set
test_transforms = transforms.Compose([
	transforms.Resize(size=(224, 224)),
	transforms.ToTensor(),
	# use ImageNet standard mean and std dev for transfer learning
	transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])

batch_size = 32
# load test dataset and create dataloader
test_set = datasets.ImageFolder(os.path.join(data_dir, 'test'), transform=test_transforms)
test_loader = DataLoader(test_set, batch_size=batch_size, shuffle=True)

#map indexes to class names
idx_to_class = {v: k for k, v in test_set.class_to_idx.items()}
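Because the attack later perturbs these already-normalized tensors, an epsilon is measured in units of the per-channel standard deviation rather than raw pixel intensity. The sketch below (NumPy only; the `normalize` and `denormalize` helper names and the random image are illustrative, not taken from the project) shows the round trip that the plotting code relies on, and how a normalized-space step of epsilon maps to `epsilon * std` in the [0, 1] pixel range.

```python
import numpy as np

# ImageNet statistics, as used in the transforms above
NORM_MEAN = np.array([0.485, 0.456, 0.406])
NORM_STD = np.array([0.229, 0.224, 0.225])


def normalize(img_hwc):
    """Map an H x W x C image in [0, 1] to normalized space."""
    return (img_hwc - NORM_MEAN) / NORM_STD


def denormalize(img_hwc):
    """Invert normalize(): back to the [0, 1] pixel range."""
    return img_hwc * NORM_STD + NORM_MEAN


rng = np.random.default_rng(0)
pixels = rng.random((4, 4, 3))           # fake image, values in [0, 1)
restored = denormalize(normalize(pixels))

epsilon = 0.1
# Adding epsilon in normalized space changes each pixel by epsilon * std
perturbed = denormalize(normalize(pixels) + epsilon)
pixel_step = epsilon * NORM_STD
print('per-channel pixel change:', pixel_step)
```

So an epsilon of 0.1 in this notebook corresponds to a change of roughly 0.023 per channel on the [0, 1] pixel scale.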

#### 3. Create Function to Perform FGSM Attack

Given an image, we create an adversarial example by the following expression:

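*perturbed image = image + epsilon * sign(data_grad)*

Writing $x$ for the input image, $y$ for its label, and $\tilde{x}$ for the adversarial example, this is:

$$\tilde{x} = x + \epsilon \cdot \text{sign}\left(\nabla_x J(\theta, x, y)\right)$$

The term $J(\theta, x, y)$ represents the loss of the network with parameters $\theta$ for classifying input image $x$ as label $y$; *data_grad* is the gradient $\nabla_x J(\theta, x, y)$ of that loss with respect to the image; $\epsilon$ is the intensity of the noise, and $\tilde{x}$ is the final adversarial example. The equation resembles a gradient step and is essentially that, applied to the input rather than to the weights. We change the input image $x$ in the direction that maximizes the loss $J(\theta, x, y)$. This is the opposite of training, where we try to minimize the loss. The sign function and $\epsilon$ can be seen as gradient clipping and learning rate, respectively. We only allow our attack to change each pixel value by $\epsilon$. The attack is also very fast, as it only requires a single forward and backward pass. The implementation is as shown below: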

In [None]:
def fgsm_attack(model, imgs, labels, epsilon):
    # Track gradients with respect to a copy of the input images
    inp_imgs = imgs.clone().requires_grad_()
    preds = model(inp_imgs.to(device))
    preds = F.log_softmax(preds, dim=-1)
    # Calculate loss by NLL of the true labels
    loss = -torch.gather(preds, 1, labels.to(device).unsqueeze(dim=-1))
    loss.sum().backward()
    # Collect the element-wise sign of the data gradient
    noise_grad = torch.sign(inp_imgs.grad.to(imgs.device))
    # Update image to adversarial example as written above
    fake_imgs = imgs + epsilon * noise_grad
    fake_imgs.detach_()
    return fake_imgs, noise_grad
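As a sanity check of the update rule, the sketch below applies the same single-step attack to a tiny linear classifier on random data (a stand-in chosen for illustration, not the project's ResNet-50). It uses `F.cross_entropy`, which combines the log-softmax and NLL steps used above, and reports how far the inputs moved and how the loss changed.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical stand-in classifier: 8 input features, 3 classes
toy_model = torch.nn.Linear(8, 3)
x = torch.randn(4, 8)              # batch of 4 fake inputs
y = torch.tensor([0, 1, 2, 1])     # their "true" labels
epsilon = 0.1

# Forward and backward pass to get the gradient of the loss w.r.t. the input
x_in = x.clone().requires_grad_()
loss = F.cross_entropy(toy_model(x_in), y, reduction='sum')
loss.backward()

# FGSM step: move every input value by epsilon in the direction that raises the loss
x_adv = (x + epsilon * x_in.grad.sign()).detach()

with torch.no_grad():
    clean_loss = F.cross_entropy(toy_model(x), y, reduction='sum').item()
    adv_loss = F.cross_entropy(toy_model(x_adv), y, reduction='sum').item()
max_change = (x_adv - x).abs().max().item()

print(f'clean loss {clean_loss:.4f} -> adversarial loss {adv_loss:.4f}')
print(f'largest change to any input value: {max_change:.4f}')
```

For a linear model the loss is convex in the input, so this step cannot lower it. For a deep network the increase is only expected to first order, which is one reason larger epsilons do not always behave predictably.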

#### 4. Function to Visualize Adversarial Images & Model Confidence

The functions below plot an image together with a bar chart of the model's top-K class predictions and their confidence scores. For adversarial examples, the figure shows the original image first, then the perturbed image and the noise that was added to it, and finally the bar chart on the right.

In [None]:
def graph_fgsm_confidence(model, device, image_batch, label_batch, epsilon):
    print(f'epsilon is {epsilon}')
    adv_images, noise_grad = fgsm_attack(model, image_batch, label_batch, epsilon)

    with torch.no_grad():
        adv_preds = model(adv_images.to(device))
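    # Plot every 60th sample; cap the range at the batch size so the index stays valid
    for i in range(1, min(489, len(image_batch)), 60):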
        filename = f'adversarial_sample_{i}_{epsilon}.png'
        show_prediction(image_batch[i], label_batch[i], adv_preds[i], filename, epsilon, adv_img=adv_images[i], noise=noise_grad[i])


def show_prediction(img, label, pred, filename, epsilon, K=5, adv_img=None, noise=None):

	if isinstance(img, torch.Tensor):
		# Tensor image to numpy
		img = img.cpu().permute(1, 2, 0).numpy()
		img = (img * NORM_STD[None,None]) + NORM_MEAN[None,None]
		img = np.clip(img, a_min=0.0, a_max=1.0)
		label = label.item()

	# Plot on the left the image with the true label as title.
	# On the right, have a horizontal bar plot with the top k predictions including probabilities
	if noise is None or adv_img is None:
		fig, ax = plt.subplots(1, 2, figsize=(10,2), gridspec_kw={'width_ratios': [1, 1]})
	else:
		fig, ax = plt.subplots(1, 5, figsize=(12,2), gridspec_kw={'width_ratios': [1, 1, 1, 1, 2]})

	ax[0].imshow(img)
	ax[0].set_title(idx_to_class[label])
	ax[0].axis('off')

	if adv_img is not None and noise is not None:
		# Visualize adversarial images
		adv_img = adv_img.cpu().permute(1, 2, 0).numpy()
		adv_img = (adv_img * NORM_STD[None,None]) + NORM_MEAN[None,None]
		adv_img = np.clip(adv_img, a_min=0.0, a_max=1.0)
		ax[1].imshow(adv_img)
		ax[1].set_title(f'Adversarial (epsilon={epsilon})')
		ax[1].axis('off')
		# Visualize noise
		noise = noise.cpu().permute(1, 2, 0).numpy()
		noise = noise * 0.5 + 0.5 # Scale between 0 to 1
		ax[2].imshow(noise)
		ax[2].set_title('Noise')
		ax[2].axis('off')
		# buffer
		ax[3].axis('off')

	if abs(pred.sum().item() - 1.0) > 1e-4:
		pred = torch.softmax(pred, dim=-1)
	topk_vals, topk_idx = pred.topk(K, dim=-1)
	topk_vals, topk_idx = topk_vals.cpu().numpy(), topk_idx.cpu().numpy()
	ax[-1].barh(np.arange(K), topk_vals*100.0, align='center', color=["C0" if topk_idx[i]!=label else "C2" for i in range(K)])
	ax[-1].set_yticks(np.arange(K))
	ax[-1].set_yticklabels([idx_to_class[c].title() for c in topk_idx])
	ax[-1].invert_yaxis()
	ax[-1].set_xlabel('Confidence')
	ax[-1].set_title('Predictions')

	plt.tight_layout()
	plt.savefig(filename, bbox_inches='tight')
	plt.show()
	#plt.close()

<div align="center">
<img height=150 src="https://git.cs.vt.edu/sdeepti/facial-expression-recognition/-/raw/main/Images/epsilon_0.png">
</div>

<div align="center">
<img height=150 src="https://git.cs.vt.edu/sdeepti/facial-expression-recognition/-/raw/main/Images/epsilon_0.01.png">
</div>

#### 5. Function to Evaluate Model Performance

It is important that we verify the performance of our model. Sometimes, simply looking at the accuracy of a model is not sufficient to make clear conclusions about the model's performance. A common alternative metric is “Top-5 accuracy”, which tells us how many times the true label has been within the 5 most-likely predictions of the model. As models usually perform quite well on those, we report the error (1 - accuracy) instead of the accuracy:

In [None]:
def eval_model(model, device, test_loader, img_func=None):
    tp, tp_5 = 0.0, 0.0
    counter = 0.0

    for images, labels in test_loader:
        images = images.to(device)
        labels = labels.to(device)

        if img_func is not None:
            images = img_func(images, labels)
        with torch.no_grad():
            preds = model(images)
        tp += (preds.argmax(dim=-1) == labels).sum()
        tp_5 += (preds.topk(5, dim=-1)[1] == labels[...,None]).any(dim=-1).sum()
        counter += preds.shape[0]
    acc = tp.float().item()/counter
    top5 = tp_5.float().item()/counter
    print(f'Top-1 error: {(100.0 * (1 - acc)):4.2f}%')
    print(f"Top-5 error: {(100.0 * (1 - top5)):4.2f}%")
    return acc, top5


print('performance with no attack:')
eval_model(model, device, test_loader)
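To make the top-k bookkeeping in `eval_model` concrete, here is a small worked example on hand-written scores (the values are invented for illustration, and k=2 is used instead of 5 so that the difference is visible with only three classes).

```python
import torch

# Rows are samples, columns are class scores
scores = torch.tensor([[0.1, 0.7, 0.2],   # best: class 1, runner-up: class 2
                       [0.5, 0.3, 0.2],   # best: class 0, runner-up: class 1
                       [0.2, 0.3, 0.5]])  # best: class 2, runner-up: class 1
labels = torch.tensor([1, 1, 0])

# Top-1: the single highest-scoring class must equal the label
top1_hits = (scores.argmax(dim=-1) == labels).sum().item()

# Top-2: the label must appear anywhere among the 2 highest-scoring classes
topk_idx = scores.topk(2, dim=-1)[1]
top2_hits = (topk_idx == labels[..., None]).any(dim=-1).sum().item()

top1_acc = top1_hits / len(labels)
top2_acc = top2_hits / len(labels)
print(f'top-1 accuracy: {top1_acc:.2f}, top-2 accuracy: {top2_acc:.2f}')
```

Only the first sample is a top-1 hit, while the second also counts under top-2. Note that with 7 classes, top-5 is a lenient metric, since the true label only has to avoid the two lowest-scoring classes.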

#### 6. Call to Evaluate the Model on FGSM Attack

In [None]:
for eps in epsilons:
    print(f'evaluating epsilon: {eps}')
    _ = eval_model(model, device, test_loader, img_func=lambda x, y: fgsm_attack(model, x, y, epsilon=eps)[0])

### Results

<div align="center">
<img src="https://git.cs.vt.edu/sdeepti/facial-expression-recognition/-/raw/main/Images/epsilon_graph_small.png">
</div>

The results show that even a small epsilon value can have a drastic impact on the performance of the model. An epsilon of 0 reproduces our original accuracy of 96%, but an epsilon of 0.1 drops the accuracy to 4.7%.

<div align="center">
<img height=200 src="https://git.cs.vt.edu/sdeepti/facial-expression-recognition/-/raw/main/Images/epsilon_0.3.png">
</div>

Moreover, when the epsilon value is too high, such as 0.3, the attack acts much like adding plain noise to the image. The model still misclassifies the image, but it is no longer confident in a single label: it is more confused and spreads its confidence across several labels.
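One way to put a number on this spread of confidence, offered here as an illustrative sketch rather than as part of the original experiment, is the entropy of the softmax output: it is low when one class dominates and approaches log(7) when all seven classes are about equally likely. The logits below are invented for illustration and are not model outputs.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()


def entropy(probs):
    """Shannon entropy in nats; the small constant avoids log(0)."""
    return float(-(probs * np.log(probs + 1e-12)).sum())


# Made-up logits for 7 classes
confident_logits = np.array([8.0, 0.5, 0.2, 0.1, 0.0, -0.3, -0.5])  # one clear winner
spread_logits = np.array([1.2, 1.0, 0.9, 0.8, 0.7, 0.5, 0.4])       # no clear winner

h_conf = entropy(softmax(confident_logits))
h_spread = entropy(softmax(spread_logits))
h_max = float(np.log(7))  # entropy of a uniform distribution over 7 classes

print(f'confident: {h_conf:.3f}, spread: {h_spread:.3f}, maximum: {h_max:.3f}')
```

Averaging such a score over the test set for each epsilon would be one way to check whether predictions really become more diffuse as epsilon grows.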

These results indicate that our model is not robust against this kind of white-box attack. However, it is important to note that defending a model against such attacks is particularly hard, since the attacker has access to the model's internal architecture, parameters, and gradients.