### Setup
* RobustBench: this library is used to load the robust classifier. For more information, visit: https://github.com/RobustBench/robustbench

* Foolbox: this library is used to generate adversarial examples. For more information, visit: https://github.com/bethgelab/foolbox

In [1]:
from robustbench.utils import clean_accuracy
from robustbench.utils import load_model
import matplotlib.pyplot as plt
from torch import unique
import foolbox as fb
import numpy as np
import pickle
import torch
import os

### Download and preprocess the data:

* We will use 1000 test examples from the CIFAR-10 dataset. The model did not see these images during training, so the goal is to fool its predictions on new images.

In [2]:
import gdown
output_file = 'cifar10.pt'
file_id = "1A5gQCE0bHZhBlfcLQ2fFP5UygpgVkdAX"
gdown.download(f"https://drive.google.com/uc?id={file_id}", output_file)

Downloading...
From: https://drive.google.com/uc?id=1A5gQCE0bHZhBlfcLQ2fFP5UygpgVkdAX
To: c:\Users\Elyas\OneDrive - The University of Colorado Denver\Desktop\Projects\decoy_challenge\cifar10.pt
100%|██████████| 12.3M/12.3M [00:00<00:00, 17.5MB/s]


'cifar10.pt'

In [3]:
cifar_data = torch.load('cifar10.pt')


In [4]:
# Extract the images and labels tensors
x_test = cifar_data['images'] / 255.0
y_test = cifar_data['labels']
orig_x_test = x_test

print(unique(y_test, return_counts=True))

(tensor([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]), tensor([100, 100, 100, 100, 100, 100, 100, 100, 100, 100]))


In [5]:
print(x_test.shape, y_test.shape)
print(torch.max(x_test), torch.min(x_test))

torch.Size([1000, 3, 32, 32]) torch.Size([1000])
tensor(1.) tensor(0.)
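
The shape and range checks above can be bundled into a small helper. This is a minimal sketch: `check_batch` is a hypothetical helper name, and the random tensors only stand in for the real contents of `cifar10.pt`.

```python
import torch

def check_batch(images: torch.Tensor, labels: torch.Tensor, num_classes: int = 10) -> None:
    """Raise AssertionError if the batch does not look like scaled CIFAR-10 data."""
    assert images.ndim == 4 and tuple(images.shape[1:]) == (3, 32, 32), images.shape
    assert tuple(labels.shape) == (images.shape[0],), labels.shape
    assert images.dtype.is_floating_point
    assert images.min().item() >= 0.0 and images.max().item() <= 1.0
    assert labels.min().item() >= 0 and labels.max().item() < num_classes

# Stand-in data (NOT the challenge data): uint8 pixels scaled to [0, 1].
raw = torch.randint(0, 256, (20, 3, 32, 32), dtype=torch.uint8)
x_demo = raw / 255.0
y_demo = torch.arange(20) % 10
check_batch(x_demo, y_demo)
```

Running the same check on the adversarial examples before saving them may catch values that drift outside `[0, 1]`.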


### Loading the robust model

* IMPORTANT: Do not change this part of the code. Your final adversarial examples will be evaluated by how successfully they fool this model.

In [6]:
model = load_model(model_name='Kireev2021Effectiveness_RLATAugMix', dataset='cifar10', threat_model='corruptions')

  checkpoint = torch.load(model_path, map_location=torch.device('cpu'))


### GPU Utilization

* To shorten the running time, we move the model and the data to the GPU when one is available.

In [7]:
# Check if GPU is available and set the device accordingly
if torch.cuda.is_available():
    device = torch.device('cuda')
    print("Using GPU:", torch.cuda.get_device_name(0))
else:
    device = torch.device('cpu')
    print("Using CPU")

model = model.to(device)
x_test = x_test.to(device)
y_test = y_test.to(device)

Using GPU: NVIDIA GeForce RTX 3090


### Adversarial Example Generation -- Adversarial Perturbation

* The baseline for this challenge is the PGD algorithm from the foolbox library; the cells below run foolbox's L2 Carlini & Wagner attack instead. This is the most important part of the challenge: which algorithm works best?

* Many other adversarial example generation algorithms exist. Don't forget to check out other libraries!

  * One popular alternative is the Adversarial Robustness Toolbox (ART).
  * Your task is to find the algorithms that work best according to our evaluation metrics.
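
For reference, the PGD baseline mentioned above can be sketched in plain PyTorch. This is an illustrative sketch rather than foolbox's implementation (foolbox exposes the attack as `fb.attacks.LinfPGD`). The function name `pgd_linf`, the default `eps`, `step_size`, and `steps` values, and the toy linear classifier are all assumptions made for this example.

```python
import torch
import torch.nn.functional as F

def pgd_linf(model, x, y, eps=8 / 255, step_size=2 / 255, steps=10):
    """Untargeted L-infinity PGD with a random start; inputs are assumed to lie in [0, 1]."""
    x = x.detach()
    # Random start inside the eps-ball, clipped to the valid pixel range.
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        (grad,) = torch.autograd.grad(loss, x_adv)
        # Ascend the loss along the gradient sign, then project back into the eps-ball.
        x_adv = x_adv.detach() + step_size * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()

# Toy stand-in classifier (NOT the challenge model), just to exercise the function.
torch.manual_seed(0)
toy_model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
x_demo = torch.rand(4, 3, 32, 32)
y_demo = torch.tensor([0, 1, 2, 3])
x_adv_demo = pgd_linf(toy_model, x_demo, y_demo)
linf = (x_adv_demo - x_demo).abs().amax(dim=(1, 2, 3))
```

The projection step is what separates PGD from a minimization attack such as Carlini & Wagner: PGD stays inside a fixed perturbation budget, while C&W searches for the smallest perturbation that still fools the model.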

In [8]:
model_fb = fb.PyTorchModel(model, bounds=(0, 1))
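
As background for the hyperparameters in the next cell, the untargeted L2 Carlini & Wagner objective can be written as follows. This is the formulation from the original attack paper, given here as a reference sketch; foolbox's implementation may differ in detail. The symbols $Z$ (logits), $y$ (true label), $w$ (the unconstrained optimization variable), $c$, and $\kappa$ are notation introduced for this note.

$$
\min_{w}\; \lVert x' - x \rVert_2^2 + c \cdot f(x'), \qquad x' = \tfrac{1}{2}\bigl(\tanh(w) + 1\bigr)
$$

$$
f(x') = \max\Bigl(Z(x')_y - \max_{i \neq y} Z(x')_i,\; -\kappa\Bigr)
$$

Under this reading, `confidence` plays the role of $\kappa$, `initial_const` is the starting value of $c$, `binary_search_steps` is the number of rounds used to tune $c$, and `steps` and `stepsize` control the inner gradient-based optimization. The $\tanh$ change of variables keeps $x'$ inside the pixel bounds $[0, 1]$ without explicit clipping.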

In [9]:
import torch
import foolbox as fb
import itertools

# Carlini Wagner hyperparameters
confidence_grid = [0]  # Adjust this to vary the attack confidence
cw_steps_grid = [2000]  # Number of steps for the attack
binary_search_steps = [9]  # Binary search steps for C&W attack
stepsize = [0.001]  # Step size for each attack iteration
initial_const = [0.1]  # Initial constant for C&W attack
epsilon_grid = [None]  # C&W doesn't rely on a fixed epsilon

# Combine all hyperparameters into a grid
cw_grid = list(itertools.product(confidence_grid, cw_steps_grid, binary_search_steps, stepsize, initial_const, epsilon_grid))

# Function to get the confidence of predictions
def get_confidence(logits, labels):
    # `labels` is unused: the confidence is that of the predicted class, not of the true class

    # Apply softmax to convert logits to probabilities
    probabilities = torch.softmax(logits, dim=-1)

    # Keep the highest probability per example (the predicted class index is discarded)
    predicted_confidences, _ = probabilities.max(dim=-1)

    # Return the confidence for each prediction
    return predicted_confidences

# Function to run the Carlini & Wagner attack (confidences are computed separately in the loop below)
def run_cw(model_fb, x_test, y_test, binary_search_steps, confidence, steps, step_size, initial_const, epsilon):
    # Initialize the C&W attack
    attack = fb.attacks.L2CarliniWagnerAttack(
        binary_search_steps=binary_search_steps,
        steps=steps,
        stepsize=step_size,
        confidence=confidence,
        initial_const=initial_const
    )


    # Run the attack; with epsilons=None the perturbations are returned without clipping to a budget
    _, advs, success = attack(model_fb, x_test, y_test, epsilons=epsilon)

    return advs, success

# Iterate through the attack hyperparameter grid
results = []

# Carlini & Wagner grid search
for params in cw_grid:
    confidence, cw_step, binary_search_step, step_size, initial_const, epsilon = params
    print(f"Running Carlini & Wagner with params: confidence={confidence}, steps={cw_step}, binary_search_steps={binary_search_step}, stepsize={step_size}, initial_const={initial_const}, epsilon={epsilon}")

    # Run the C&W attack with current parameters
    advs, success = run_cw(model_fb, x_test, y_test, binary_search_step, confidence, cw_step, step_size, initial_const, epsilon)
    print(success)
    
    # Evaluate on the clean test data to get clean confidences
    clean_logits = model_fb(x_test)
    clean_confidences = get_confidence(clean_logits, y_test)

    # Evaluate on the adversarial examples to get adversarial confidences
    adv_logits = model_fb(advs)
    adv_confidences = get_confidence(adv_logits, y_test)

    # Get confidence of incorrect adversarial predictions
    incorrect_mask = success.bool()
    incorrect_confidence = adv_confidences[incorrect_mask].mean().item()  # Average confidence for incorrect predictions

    # Calculate confidence gap (difference between clean and adversarial confidences)
    confidence_gap = (clean_confidences[incorrect_mask] - adv_confidences[incorrect_mask]).mean().item()

    # Calculate the perturbation magnitude (mean L2 norm between original and adversarial examples)
    perturbation_magnitude = torch.norm(advs - x_test, p=2, dim=(1, 2, 3)).mean().item()

    # Store results
    results.append({
        'attack': 'CW',
        'confidence': confidence,
        'steps': cw_step,
        'binary_search_steps': binary_search_step,
        'stepsize': step_size,
        'initial_const': initial_const,
        'confidence_gap': confidence_gap,
        'avg_confidence_incorrect': incorrect_confidence,
        'perturbation_magnitude': perturbation_magnitude,
    })
    print(f"Average Confidence Incorrect: {incorrect_confidence}, Confidence Gap: {confidence_gap}, Perturbation Magnitude: {perturbation_magnitude}, Score: {1 - success.float().mean()}")

# After the grid search, pick the parameters with the highest average confidence on fooled examples
best_result = max(results, key=lambda x: x['avg_confidence_incorrect'])
print(f"Best result: {best_result}")


Running Carlini & Wagner with params: confidence=0, steps=30, binary_search_steps=9, stepsize=0.001, initial_const=0.1, epsilon=None
tensor([False, False,  True, False,  True,  True,  True, False, False,  True,
        False, False,  True, False,  True, False, False,  True,  True, False,
         True,  True, False,  True,  True,  True,  True,  True,  True,  True,
        False, False, False,  True, False, False, False,  True, False,  True,
        False,  True, False,  True,  True, False,  True,  True, False, False,
        False, False,  True, False, False,  True,  True,  True,  True,  True,
         True, False, False, False, False, False, False,  True, False,  True,
        False, False,  True, False,  True,  True,  True,  True,  True, False,
        False,  True,  True,  True, False,  True,  True, False,  True, False,
         True,  True,  True,  True, False, False,  True, False,  True,  True,
         True,  True,  True,  True,  True,  True,  True,  True,  True,  True,
         

### Let's compare the accuracies before and after perturbation!

In [13]:
print('Robust accuracy: {:.1%}'.format(1 - success.float().mean()))
print(clean_accuracy(model, x_test, y_test))

Robust accuracy: 4.6%
0.941
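
The comparison above relies on the `success` mask returned by the attack. As a cross-check, the same quantities can be recomputed directly from model predictions. This is a minimal sketch: `summarize_attack` is a hypothetical helper, and the toy model, the toy data, and the uniform-shift "attack" are placeholders rather than the challenge setup.

```python
import torch

@torch.no_grad()
def summarize_attack(model, x, x_adv, y):
    """Return clean/robust accuracy, mean confidence on fooled inputs, and mean L2 distance."""
    clean_pred = model(x).argmax(dim=-1)
    adv_prob = torch.softmax(model(x_adv), dim=-1)
    adv_conf, adv_pred = adv_prob.max(dim=-1)
    fooled = adv_pred != y
    l2 = (x_adv - x).flatten(start_dim=1).norm(p=2, dim=1)
    return {
        "clean_accuracy": (clean_pred == y).float().mean().item(),
        "robust_accuracy": (~fooled).float().mean().item(),
        # Report None instead of NaN when no input was fooled.
        "mean_conf_fooled": adv_conf[fooled].mean().item() if fooled.any() else None,
        "mean_l2": l2.mean().item(),
    }

# Toy stand-ins (NOT the challenge model or data).
torch.manual_seed(0)
toy_model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
x_demo = torch.rand(8, 3, 32, 32)
y_demo = toy_model(x_demo).argmax(dim=-1)   # labels the toy model classifies correctly
x_adv_demo = (x_demo + 0.01).clamp(0, 1)    # placeholder "attack": a small uniform shift
stats = summarize_attack(toy_model, x_demo, x_adv_demo, y_demo)
```

In the notebook, the call would presumably be `summarize_attack(model, x_test, advs, y_test)`, with the tensor `advs` taken before it is wrapped in a list for saving.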


### Save Perturbation

In [11]:
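# Wrap the batch in a one-element list under a new name, so re-running this cell
# does not nest the list again (re-running `advs = [advs]` raises
# "AttributeError: 'list' object has no attribute 'shape'")
advs_to_save = [advs]
print(advs_to_save[0].shape)

# Create the 'challenge' directory if it doesn't exist
os.makedirs('challenge', exist_ok=True)

# Path to save the adversarial examples
file_path = os.path.join('challenge', 'advs.pkl')

# Save the adversarial examples
with open(file_path, 'wb') as f:
    pickle.dump(advs_to_save, f)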
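
A save/load round trip that can be re-run safely might look like the sketch below. The one-element-list layout is inferred from this notebook. The helper names `save_advs` and `load_advs` and the move to CPU before pickling are assumptions, so check them against the challenge's submission rules.

```python
import os
import pickle
import tempfile

import torch

def save_advs(advs: torch.Tensor, path: str) -> None:
    """Pickle a one-element list holding the adversarial batch (moved to CPU)."""
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "wb") as f:
        pickle.dump([advs.detach().cpu()], f)

def load_advs(path: str) -> torch.Tensor:
    """Load the pickled list and return the single tensor it contains."""
    with open(path, "rb") as f:
        data = pickle.load(f)
    assert isinstance(data, list) and len(data) == 1
    return data[0]

# Round trip in a temporary directory with stand-in data (NOT real adversarial examples).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "challenge", "advs.pkl")
    demo = torch.rand(4, 3, 32, 32)
    save_advs(demo, demo_path)
    save_advs(demo, demo_path)  # calling twice is harmless: the input tensor is never rebound
    restored = load_advs(demo_path)
```

Because the wrapping happens inside `save_advs`, the caller's tensor keeps its type no matter how many times the cell runs.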

### Load the pickled adversarial examples

In [12]:
import pickle

filename = 'challenge/CW230058.pkl'

with open(filename, 'rb') as file:
    data = pickle.load(file)
    print(data[0].shape)
    
# Note: `success` still comes from the attack cell above, not from the loaded file
print('Robust accuracy: {:.1%}'.format(1 - success.float().mean()))
print(clean_accuracy(model, x_test, y_test))

torch.Size([1000, 3, 32, 32])
Robust accuracy: 4.6%
0.941
