# Machine Learning Systems Task 1
_Daniel Jones, Praful Chunchu, Ravi Patel and Austin Staton_

**Objective**: Generating adversarial attacks in the context of a zero-knowledge threat model.

We explore three adversarial attacks: _Projected Gradient Descent_, _Fast Gradient Sign Method_, and the _Basic Iterative Method_. Each attack is run with a fixed set of tuned parameters, which also lets us compare how well different machine learning models withstand it.
 
### Experimental Design
We will be attacking three different models: an undefended model, the [vanilla Athena](https://github.com/softsys4ai/athena) with a manually selected ensemble of 5 weak defenses, and PGD-ADT. 

To compare error rates fairly, each attack is run with identical parameters against all three models. That is, for any _one_ attack (PGD, FGSM, or BIM), the parameters testing the attack's efficacy remain consistent across the three independent models.

All generated adversarial examples use the same epsilon values: `0.03`, `0.06`, `0.12`, `0.24`, and `0.48`. This lets us measure the effectiveness of each adversarial example (AE) relative to the others we generated. For BIM, each epsilon is run at 50 and 100 iterations to explore the impact of the iteration count on the error rate.

We expect this to give some experimental consistency to our results.
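The experimental grid described above can be sketched in a few lines of Python. This is only an illustration: `build_attack_configs` and `eps_tag` are hypothetical helpers, the key and description patterns are inferred from the configuration entries shown later in this report, and the real `attack-zk-mnist.json` may carry additional keys.

```python
import json

# Values taken from the experimental design described in this report.
EPSILONS = [0.03, 0.06, 0.12, 0.24, 0.48]
BIM_ITERATIONS = [50, 100]


def eps_tag(eps):
    """Format 0.03 as '003' and 0.48 as '048' (assumed naming pattern)."""
    return f"{round(eps * 100):03d}"


def build_attack_configs():
    """Return one entry per (attack, epsilon[, max_iter]) combination."""
    entries = []
    # PGD and FGSM: one variant per epsilon.
    for attack in ("pgd", "fgsm"):
        for eps in EPSILONS:
            entries.append({
                "attack": attack,
                "description": f"{attack}_eps{eps_tag(eps)}",
                "eps": eps,
            })
    # BIM: one variant per (iteration count, epsilon) pair.
    for max_iter in BIM_ITERATIONS:
        for eps in EPSILONS:
            entries.append({
                "attack": "bim",
                "description": f"bim_eps{eps_tag(eps)}iter{max_iter}",
                "eps": eps,
                "max_iter": max_iter,
            })
    # Keys follow the "configsN" convention (assumed from the report).
    return {f"configs{i}": entry for i, entry in enumerate(entries)}


configs = build_attack_configs()
print(len(configs))
print(json.dumps(configs["configs10"], indent=2))
```

Generating the grid this way makes it easy to confirm that every model sees exactly the same parameters for a given attack.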


#### Subsampling
To increase execution speed, we generated subsamples of the data using the provided `subsample.py` script. This script generated subsamples at a ratio of `0.1`; the subsamples and sublabels used for this experiment can be found in the `/task1/SubSample` folder.
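As a rough illustration of what subsampling at a ratio of `0.1` involves, here is a minimal sketch in plain Python. It is not the provided `subsample.py`; the function names, the seed handling, and the toy data are all assumptions made for this example.

```python
import random


def subsample_indices(num_samples, ratio=0.1, seed=0):
    """Pick round(num_samples * ratio) distinct indices, returned in sorted order."""
    if not 0.0 < ratio <= 1.0:
        raise ValueError("ratio must be in (0, 1]")
    k = max(1, round(num_samples * ratio))
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    return sorted(rng.sample(range(num_samples), k))


def subsample(samples, labels, ratio=0.1, seed=0):
    """Apply the same indices to samples and labels so they stay aligned."""
    if len(samples) != len(labels):
        raise ValueError("samples and labels must have the same length")
    idx = subsample_indices(len(samples), ratio, seed)
    return [samples[i] for i in idx], [labels[i] for i in idx]


# Toy stand-in data: 1000 "images" (plain integers here) with digit labels.
toy_samples = list(range(1000))
toy_labels = [i % 10 for i in toy_samples]
sub_x, sub_y = subsample(toy_samples, toy_labels, ratio=0.1, seed=42)
print(len(sub_x), len(sub_y))
```

The important property is that samples and labels are drawn with the same indices, so each subsampled image keeps its correct label.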


# Relevant Files
All of the adversarial examples generated with `craft_adversarial_examples.py` are located in the `/task1/Adversarial_Examples` folder. There are three attack methods, each run at 5 epsilon values and named according to their variant. BIM has 10 variants because each epsilon was run at both 50 and 100 iterations. This totals 20 variants generated as part of this experiment.

All configurations used to generate the adversarial examples are in `attack-zk-mnist.json`, located in `/task1/configs`.

The subsamples and sublabels used during this experiment are located in the `task1/SubSamples` folder and are named `sublabels-mnist-ratio_0.1-112490.080191753.npy` and `subsamples-mnist-ratio_0.1-112490.080191753.npy`.

The active weak defenses chosen for the Ensemble are located in `/task1/configs/athena-mnist.json` in the `active_wds` JSON node.
 
 

***
# Projected Gradient Descent (PGD)
PGD attacks are white-box attacks: they use the gradient of the model's loss with respect to the input to find a perturbation that causes misclassification. The attack takes repeated gradient steps and, after each step, projects the perturbed input back into a ball of radius `epsilon` around the original, so `epsilon` caps the input distortion. We executed the PGD attack with five different values of `epsilon`.
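The "projected" part of PGD can be sketched as follows. This is a minimal illustration that assumes an L-infinity constraint and pixel values in `[0, 1]`; `project_linf` and `random_start` are hypothetical helper names and do not come from the library used to generate our examples.

```python
import random


def clip(value, low, high):
    """Clamp value into the closed interval [low, high]."""
    return max(low, min(high, value))


def project_linf(x_adv, x_orig, eps, pixel_min=0.0, pixel_max=1.0):
    """Project x_adv into the L-infinity ball of radius eps around x_orig,
    then into the valid pixel range."""
    projected = []
    for adv, orig in zip(x_adv, x_orig):
        adv = clip(adv, orig - eps, orig + eps)  # stay within eps of the original
        adv = clip(adv, pixel_min, pixel_max)    # stay a valid pixel value
        projected.append(adv)
    return projected


def random_start(x_orig, eps, rng):
    """Optional PGD initialisation: a random point inside the eps-ball."""
    noisy = [p + rng.uniform(-eps, eps) for p in x_orig]
    return project_linf(noisy, x_orig, eps)


x = [0.0, 0.5, 1.0]            # three toy "pixels"
candidate = [0.3, 0.45, 0.7]   # a perturbed candidate that strays too far
projected = project_linf(candidate, x, eps=0.12)
start = random_start(x, 0.12, random.Random(0))
print(projected)
```

Whatever the gradient step does, the projection guarantees that no pixel ends up more than `eps` away from its original value.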

#### The Inputs
The values of epsilon for the attacks are `0.03`, `0.06`, `0.12`, `0.24`, and `0.48`. Increasing epsilon has two effects. First, the inputs (images) can be pushed further toward regions the model misclassifies. Second, as a consequence of the first, the images become more distorted. Choosing epsilon is therefore a constrained optimization problem that must be tuned to each attack's purpose.

As an example, if one was attempting to bypass the content filtering of an image upload service, the image would need to be _mostly_ recoverable. Bypassing a content filter to upload an unrecognizable image would not make sense in practical applications.

The inputs, in JSON form, looked like the below:

    {
      "configs0": {
        "attack": "pgd",
        "description": "pgd_eps003",
        "eps": 0.03
      },
      "configs1": {
        "attack": "pgd",
        "description": "pgd_eps006",
        "eps": 0.06
      },
      "configs2": {
        "attack": "pgd",
        "description": "pgd_eps012",
        "eps": 0.12
      },
      "configs3": {
        "attack": "pgd",
        "description": "pgd_eps024",
        "eps": 0.24
      },
      "configs4": {
        "attack": "pgd",
        "description": "pgd_eps048",
        "eps": 0.48
      }
    }


### Generated Examples
The results matched our hypothesis. As the value of the tuned parameter `epsilon` increased, the images became more distorted and the models made more errors.
 
 
**Images at Various Epsilons**

As seen below, as the value of epsilon increases, the recognizability of the image decreases while the error rate of the classifier increases.
 
In the title of each image, `X->Y` denotes the classifier's interpretation of the number: `X` is the original label of the image, and `Y` is the model's classification after perturbation.

0.03            |  0.06 | 0.12 | 0.24 | 0.48
:-------------------------:|:-------------------------:|:-------------------------:|:-------------------------:|:-------------------------:|
![](images/pgd_eps003.png)  |  ![](images/pgd_eps006.png) |  ![](images/pgd_eps012.png) |  ![](images/pgd_024.png) |  ![](images/pgd_eps048.png)


### Results of Evaluated Models
| Adversarial Example | UM         | Ensemble   | PGD-ADT    |
|---------------------|------------|------------|------------|
| PGD_eps_0.03        | 0.04137235 | 0.00201816 | 0.00605449 |
| PGD_eps_0.06        | 0.20282543 | 0.00302725 | 0.01210898 |
| PGD_eps_0.12        | 0.84661958 | 0.01009082 | 0.02926337 |
| PGD_eps_0.24        | 0.99091826 | 0.03632694 | 0.10191726 |
| PGD_eps_0.48        | 0.99091826 | 0.29868819 | 0.64883956 |

In this table, it is evident that the error rate increased as epsilon increased. This makes sense, since epsilon represents the magnitude of damage to the image. With enough perturbation, even a human begins to misclassify an image, so a classifier would be expected to misclassify it too.
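For reference, the error rate reported in these tables can be thought of as the fraction of misclassified inputs. The sketch below assumes exactly that definition; `error_rate` is a hypothetical helper, and the evaluation script we actually used may compute the metric differently (for example, by excluding inputs the model already misclassified before the attack).

```python
def error_rate(predictions, true_labels):
    """Fraction of inputs whose predicted label differs from the true label."""
    if len(predictions) != len(true_labels):
        raise ValueError("predictions and true_labels must have the same length")
    if not predictions:
        raise ValueError("cannot compute an error rate on zero inputs")
    wrong = sum(1 for p, t in zip(predictions, true_labels) if p != t)
    return wrong / len(predictions)


# Toy example: four digits, one of which is misclassified after perturbation.
toy_truth = [7, 2, 1, 0]
toy_preds = [7, 2, 3, 0]
toy_rate = error_rate(toy_preds, toy_truth)
print(toy_rate)
```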

![PGD Chart](images/PGD_Evaluations.png)

As the chart shows, given our chosen ensemble of weak defenses, Vanilla Athena consistently outperforms both the UM and PGD-ADT models, with a far lower error rate.

***
# Fast Gradient Sign Method (FGSM)
FGSM adversarial attacks are white-box attacks that exploit the gradient of a neural network's loss with respect to its input. FGSM is designed to prioritize speed: unlike PGD, it takes a single step rather than iteratively solving the constrained optimization problem between data integrity and perturbation.

FGSM uses the sign of the loss gradient (conceptually, the "direction" in which the model most easily misclassifies the data) and moves the input a "distance" of `epsilon` in that direction.

This vector, with a direction (the sign of the loss gradient) and a magnitude (epsilon), is added to the input to fool a classifier.
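To make the direction-and-magnitude idea concrete, here is a minimal FGSM sketch against a toy logistic-regression model, written in plain Python to stay dependency-free. The model, its weights, and the helper names are assumptions for illustration only; the attacks in this report were run against MNIST classifiers, not this toy model.

```python
import math


def sign(v):
    """Return -1, 0, or 1 depending on the sign of v."""
    return (v > 0) - (v < 0)


def predict_proba(x, weights, bias):
    """Probability of class 1 under a logistic model: sigmoid(w.x + b)."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def loss_gradient_wrt_input(x, y, weights, bias):
    """Gradient of the binary cross-entropy loss with respect to the input.
    For p = sigmoid(w.x + b), dL/dx_i = (p - y) * w_i."""
    p = predict_proba(x, weights, bias)
    return [(p - y) * w for w in weights]


def fgsm(x, y, weights, bias, eps):
    """Single step of size eps along the sign of the loss gradient,
    clipped to the valid pixel range [0, 1]."""
    grad = loss_gradient_wrt_input(x, y, weights, bias)
    return [min(1.0, max(0.0, xi + eps * sign(g))) for xi, g in zip(x, grad)]


weights, bias = [2.0, -3.0, 1.0], 0.0   # toy model parameters
x, y = [0.6, 0.2, 0.5], 1               # toy input with true label 1
x_adv = fgsm(x, y, weights, bias, eps=0.12)
print(predict_proba(x, weights, bias), predict_proba(x_adv, weights, bias))
```

After the step, the model's confidence in the true label drops, even though no input value moved by more than `eps`.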

#### The Inputs
The values of epsilon (i.e., distance/magnitude) for the FGSM attacks are: `0.03`, `0.06`, `0.12`, `0.24`, and `0.48`. In FGSM, `epsilon` is a scalar that determines how much perturbation is applied to the input.

The inputs, in JSON format, looked like the below:


    {
      "configs5": {
        "attack": "fgsm",
        "description": "fgsm_eps003",
        "eps": 0.03
      },
      "configs6": {
        "attack": "fgsm",
        "description": "fgsm_eps006",
        "eps": 0.06
      },
      "configs7": {
        "attack": "fgsm",
        "description": "fgsm_eps012",
        "eps": 0.12
      },
      "configs8": {
        "attack": "fgsm",
        "description": "fgsm_eps024",
        "eps": 0.24
      },
      "configs9": {
        "attack": "fgsm",
        "description": "fgsm_eps048",
        "eps": 0.48
      }
    }


### Generated Examples
The results matched our hypothesis. As the tuned parameter `epsilon` increased, the adversarial image moved further away from its original classification.

**Images at Various Epsilons**

As seen below, the recognizability of the image decreases as `epsilon` increases, while the error rate of the classifier increases.

In the title of each image, `X->Y` denotes the classifier's interpretation of the number: `X` is the original label of the image, and `Y` is the model's classification after perturbation.

0.03            |  0.06 | 0.12 | 0.24 | 0.48
:-------------------------:|:-------------------------:|:-------------------------:|:-------------------------:|:-------------------------:|
![](images/fgsm_eps003.png)  |  ![](images/fgsm_eps006.png) |  ![](images/fgsm_eps012_2.png) |  ![](images/fgsm_eps024.png) |  ![](images/fgsm_eps048.png)


### Results of Evaluated Models
| Adversarial Example | UM         | Ensemble   | PGD-ADT    |
|---------------------|------------|------------|------------|
| FGSM_eps_0.03       | 0.02421796 | 0.00201816 | 0.00605449 |
| FGSM_eps_0.06       | 0.09788093 | 0.00201816 | 0.0110999  |
| FGSM_eps_0.12       | 0.35822402 | 0.00605449 | 0.02522704 |
| FGSM_eps_0.24       | 0.81533804 | 0.03733602 | 0.10292634 |
| FGSM_eps_0.48       | 0.91321897 | 0.73360242 | 0.86074672 |

For the undefended model, the error rate rose steadily with epsilon. For Vanilla Athena (the ensemble of weak defenses) and PGD-ADT, the error rate stayed low through an epsilon of `0.24` and rose sharply only at `0.48`. This could be attributed to these models being built to defend against this kind of adversarial attack.

![FGSM Chart](images/FGSM_Evaluations.png)

Much like our results from the evaluation of PGD, Athena with our chosen ensemble consistently outperformed UM and PGD-ADT.



***
# Basic Iterative Method (BIM)
BIM attacks are a variant of FGSM attacks. The same direction is computed from the loss gradient, and the overall magnitude is bounded by epsilon; however, the attack is applied 'iteratively' as many small steps, with the accumulated perturbation clipped so that it never exceeds epsilon.
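The iterative procedure can be sketched as a short loop. This is an illustration under stated assumptions: the per-step size `eps_step`, the `grad_fn` callback, and the toy quadratic loss are all hypothetical, and the step size used by the library that generated our examples is not shown in this report.

```python
def sign(v):
    """Return -1, 0, or 1 depending on the sign of v."""
    return (v > 0) - (v < 0)


def bim(x, grad_fn, eps, eps_step, max_iter):
    """Repeat small signed-gradient steps, clipping after each one so the
    total perturbation stays within eps and pixels stay in [0, 1]."""
    x_adv = list(x)
    for _ in range(max_iter):
        grad = grad_fn(x_adv)
        stepped = [a + eps_step * sign(g) for a, g in zip(x_adv, grad)]
        x_adv = [
            min(1.0, max(0.0, min(o + eps, max(o - eps, s))))
            for s, o in zip(stepped, x)
        ]
    return x_adv


# Toy loss: squared distance from a fixed target, so increasing the loss
# pushes every pixel away from the target value 0.5.
target = [0.5, 0.5]


def toy_grad(x_adv):
    return [2.0 * (a - t) for a, t in zip(x_adv, target)]


x = [0.6, 0.3]
adv_50 = bim(x, toy_grad, eps=0.12, eps_step=0.01, max_iter=50)
adv_100 = bim(x, toy_grad, eps=0.12, eps_step=0.01, max_iter=100)
print(adv_50, adv_100)
```

In this toy setting the perturbation reaches the `eps` boundary well before 50 iterations, so running 100 iterations yields the same result. Whether the same effect applies to a real model depends on the step size and the loss surface.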

#### The Inputs
The epsilon values are the same as those used for PGD and FGSM. With BIM, however, we ran each epsilon value at `50` and `100` iterations to observe how the error changes as the iteration count increases across the different epsilon values.

The inputs, in JSON form, looked like the below:

    {
      "configs10": {
        "attack": "bim",
        "description": "bim_eps003iter50",
        "eps": 0.03,
        "max_iter": 50
      },
      "configs11": {
        "attack": "bim",
        "description": "bim_eps006iter50",
        "eps": 0.06,
        "max_iter": 50
      },
      "configs12": {
        "attack": "bim",
        "description": "bim_eps012ter50",
        "eps": 0.12,
        "max_iter": 50
      },
      "configs13": {
        "attack": "bim",
        "description": "bim_eps024iter50",
        "eps": 0.24,
        "max_iter": 50
      },
      "configs14": {
        "attack": "bim",
        "description": "bim_eps048iter50",
        "eps": 0.48,
        "max_iter": 50
      },
      "configs15": {
        "attack": "bim",
        "description": "bim_eps003iter100",
        "eps": 0.03,
        "max_iter": 100
      },
      "configs16": {
        "attack": "bim",
        "description": "bim_eps006iter100",
        "eps": 0.06,
        "max_iter": 100
      },
      "configs17": {
        "attack": "bim",
        "description": "bim_eps0121ter100",
        "eps": 0.12,
        "max_iter": 100
      },
      "configs18": {
        "attack": "bim",
        "description": "bim_eps024iter100",
        "eps": 0.24,
        "max_iter": 100
      },
      "configs19": {
        "attack": "bim",
        "description": "bim_eps048iter100",
        "eps": 0.48,
        "max_iter": 100
      }
    }


### Generated Examples
We previously found that the error rate of all three models increases with the magnitude of epsilon.

With BIM, we can also examine the relationship between the number of iterations and the error rate.
 
**Images at Various Epsilons (50 Iterations)**

As the value of epsilon increases, the recognizability of the image decreases while the error rate of the classifier increases. See the table of images below.

0.03            |  0.06 | 0.12 | 0.24 | 0.48
:-------------------------:|:-------------------------:|:-------------------------:|:-------------------------:|:-------------------------:|
![](images/bim_eps003_iter50.png)  |  ![](images/bim_eps006_iter50.png) |  ![](images/bim_eps012_iter50.png) |  ![](images/bim_eps024_iter50.png) |  ![](images/bim_eps048_iter50.png)


**Images at Various Epsilons (100 Iterations)**
 
Interestingly, doubling the iterations at the same epsilon does not double the error rate, and sometimes does not increase it by a significant amount.


0.03            |  0.06 | 0.12 | 0.24 | 0.48
:-------------------------:|:-------------------------:|:-------------------------:|:-------------------------:|:-------------------------:|
![](images/bim_eps003_iter100.png)  |  ![](images/bim_eps006_iter100.png) |  ![](images/bim_eps012_iter100.png) |  ![](images/bim_eps024_iter100.png) |  ![](images/bim_eps048_iter100.png)




As seen from the above images, there was not a human-noticeable difference between images manipulated through BIM at 50 iterations and images manipulated by BIM at 100 iterations.

### Results of Evaluated Models
| Adversarial Example | UM (50 iter) | Ensemble (50 iter) | PGD-ADT (50 iter) | UM (100 iter) | Ensemble (100 iter) | PGD-ADT (100 iter) |
|---------------------|--------------|--------------------|-------------------|---------------|---------------------|--------------------|
| BIM_eps_0.03        | 0.05247225   | 0.00201816         | 0.00605449        | 0.05348133    | 0.00201816          | 0.00605449         |
| BIM_eps_0.06        | 0.30575177   | 0.00302725         | 0.01311806        | 0.3148335     | 0.00302725          | 0.013118063        |
| BIM_eps_0.12        | 0.98486377   | 0.01009082         | 0.02825429        | 0.98789102    | 0.01009082          | 0.02926337         |
| BIM_eps_0.24        | 0.99091826   | 0.07568113         | 0.1654894         | 0.99091826    | 0.08072654          | 0.166498486        |
| BIM_eps_0.48        | 0.99091826   | 0.65489405         | 0.95358224        | 0.99091826    | 0.65993946          | 0.956609485        |

In FGSM and PGD, a larger epsilon led to a higher error rate. In this iterative approach, we also see a weak positive relationship between the number of iterations and the error rate. Iteratively attacking the models with BIM seemed to cause the highest error rates when compared to the other attack methodologies. It was also the **most computationally expensive** (in execution time) due to its iterative nature.

An interesting point is that while there was an increase in error across the models between 50 iteration experiments and 100 iteration experiments, there was not a significant benefit in running BIM at 100 iterations.
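The size of that difference can be checked directly from the reported numbers. The sketch below copies the error rates from the BIM results table and computes the change from 50 to 100 iterations for each model; the helper name `iteration_deltas` is hypothetical.

```python
# Error rates copied from the BIM results table, keyed by epsilon.
# Each tuple is (UM, Ensemble, PGD-ADT).
BIM_50 = {
    0.03: (0.05247225, 0.00201816, 0.00605449),
    0.06: (0.30575177, 0.00302725, 0.01311806),
    0.12: (0.98486377, 0.01009082, 0.02825429),
    0.24: (0.99091826, 0.07568113, 0.1654894),
    0.48: (0.99091826, 0.65489405, 0.95358224),
}
BIM_100 = {
    0.03: (0.05348133, 0.00201816, 0.00605449),
    0.06: (0.3148335, 0.00302725, 0.013118063),
    0.12: (0.98789102, 0.01009082, 0.02926337),
    0.24: (0.99091826, 0.08072654, 0.166498486),
    0.48: (0.99091826, 0.65993946, 0.956609485),
}
MODELS = ("UM", "Ensemble", "PGD-ADT")


def iteration_deltas(results_50, results_100):
    """Per-epsilon change in error rate when going from 50 to 100 iterations."""
    return {
        eps: tuple(b - a for a, b in zip(results_50[eps], results_100[eps]))
        for eps in results_50
    }


deltas = iteration_deltas(BIM_50, BIM_100)
largest = max(abs(d) for row in deltas.values() for d in row)
for eps, row in deltas.items():
    print(eps, {m: round(d, 6) for m, d in zip(MODELS, row)})
print("largest change:", round(largest, 6))
```

In every case the change is below `0.01`, that is, under one percentage point of error rate.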

This information is also visible in the charts below.

50 Iterations            |  100 Iterations
:-------------------------:|:--------------------------:
![](images/BIM_50_Iterations_Evaluations.png)  |  ![](images/BIM_100_Iterations_Evaluations.png)



# Conclusion 

After generating 20 adversarial examples and evaluating them against Vanilla Athena, an Undefended Model, and PGD-ADT, the results favor Vanilla Athena with the chosen ensemble of 5 defenses. It is interesting to note that no matter what the attack method was, Athena has a significant advantage over the other models, especially when the image was still in a state that looked mostly recoverable and recognizable. 

***
## Contributions
### Austin Staton
Worked primarily on the report and developed AEs for the PGD attack method with Daniel Jones.
### Daniel Jones
Worked on developing AEs for the FGSM and BIM attack methods with Praful Chunchu and Ravi Patel. Also worked on data aggregation and some repository maintenance.
### Praful Chunchu
Worked on developing AEs for FGSM and BIM attack methods with Ravi Patel and Daniel Jones.
### Ravi Patel
Worked on developing AEs for FGSM and BIM attack methods with Praful Chunchu and Daniel Jones.