# Machine Learning Systems Task 1
_Daniel Jones, Praful Chunchu, Ravi Patel and Austin Staton_

**Objective**: Generating adversarial attacks in the context of a zero-knowledge threat model.

We will explore several adversarial attacks: _Projected Gradient Descent_, the _Fast Gradient Sign Method_, and the _Basic Iterative Method_. We generate the adversarial examples with a specific set of tuned parameters for each attack, which also lets us compare how well different machine learning models withstand them.
 
### Experimental Design
We will be attacking three different models: an undefended model, the [vanilla Athena](https://github.com/softsys4ai/athena), and PGD-ADT. 

To compare the differences in success (or rather, the differences in error rates) between the approaches, each attack sends identical parameters to every model. So, for any _one_ attack (PGD, FGSM, or BIM), the parameters used to test the attack's efficacy remain consistent across the three independent models. This does not mean that the parameter values are shared between attack types: each attack has its own set of values.

We expect this to give some experimental consistency to our results.

# Projected Gradient Descent (PGD)
PGD attacks are white-box attacks that use the gradients of the model's loss, computed through each layer's weights, to craft a perturbation. The attack has a parameter, `epsilon`, that bounds how far each input may be perturbed: the attack searches for the most damaging perturbation within that bound, which keeps the input distortion limited. We executed the PGD attack with five different values of `epsilon`.
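
To make the role of `epsilon` concrete, here is a minimal NumPy sketch of an L-infinity PGD attack. It is not the code used for our experiments: the toy logistic model, the helper names `loss_gradient` and `pgd_attack`, the step size, and the number of steps are all assumptions chosen for illustration. Only the `epsilon` values come from this report.

```python
import numpy as np

def loss_gradient(x, y, w, b):
    """Gradient of the binary cross-entropy loss w.r.t. the input x
    for a toy logistic model (an illustrative stand-in for a real network)."""
    p = 1.0 / (1.0 + np.exp(-(x @ w + b)))  # predicted probability of class 1
    return (p - y) * w

def pgd_attack(x, y, w, b, eps, step_size, num_steps, rng):
    """L-infinity PGD: random start, signed-gradient ascent on the loss,
    and projection back onto the eps-ball around x after every step."""
    x_adv = np.clip(x + rng.uniform(-eps, eps, size=x.shape), 0.0, 1.0)
    for _ in range(num_steps):
        grad = loss_gradient(x_adv, y, w, b)
        x_adv = x_adv + step_size * np.sign(grad)
        x_adv = np.clip(x_adv, x - eps, x + eps)  # projection onto the eps-ball
        x_adv = np.clip(x_adv, 0.0, 1.0)          # keep a valid pixel range
    return x_adv

rng = np.random.default_rng(0)
w, b = rng.normal(size=16), 0.0        # hypothetical model weights
x, y = rng.uniform(0.2, 0.8, size=16), 1.0  # hypothetical input with true label 1

for eps in (0.03, 0.07, 0.09, 0.12, 0.18):
    x_adv = pgd_attack(x, y, w, b, eps, step_size=eps / 4, num_steps=10, rng=rng)
    # The largest per-pixel change never exceeds eps.
    print(eps, float(np.max(np.abs(x_adv - x))))
```

The projection step is what separates PGD from plain gradient ascent: however many steps are taken, the adversarial input stays within `epsilon` of the original.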

#### The Inputs
The values of `epsilon` for the attacks are `0.03`, `0.07`, `0.09`, `0.12`, and `0.18`. When we increase `epsilon`, two things happen. First, the inputs (images) are increasingly able to take advantage of the model's weights. Second, as an effect of the first, the image is increasingly distorted. This is a constrained optimization problem that needs to be tuned to each attack's purpose.

As an example, if one was attempting to bypass the content filtering of an image upload service, the image would need to be _mostly_ recoverable. Bypassing a content filter to upload an unrecognizable image would not make much sense in practical applications.

The inputs, in JSON form, looked like the below:

{
  "num_attacks": 5,
  "configs0": {
    "attack": "pgd",
    "description": "PGD_eps0.03",
    "eps": 0.03
  },
  "configs1": {
    "attack": "pgd",
    "description": "PGD_eps0.07",
    "eps": 0.07
  },
  "configs2": {
    "attack": "pgd",
    "description": "PGD_eps0.09",
    "eps": 0.09
  },
  "configs3": {
    "attack": "pgd",
    "description": "PGD_eps0.12",
    "eps": 0.12
  },
  "configs4": {
    "attack": "pgd",
    "description": "PGD_eps0.18",
    "eps": 0.18
  }
}
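
As a rough illustration of how a configuration in this shape can be consumed, the sketch below parses a shortened copy of the JSON and walks the `configsN` entries in order. The helper name `iter_attack_configs` is hypothetical, and how the project itself reads this file is not shown in this report.

```python
import json

# Shortened copy of the configuration above (two of the five entries).
config_text = """
{
  "num_attacks": 2,
  "configs0": {"attack": "pgd", "description": "PGD_eps0.03", "eps": 0.03},
  "configs1": {"attack": "pgd", "description": "PGD_eps0.07", "eps": 0.07}
}
"""

def iter_attack_configs(config):
    """Yield each `configsN` entry in order, using `num_attacks` as the count."""
    for i in range(config["num_attacks"]):
        yield config[f"configs{i}"]

config = json.loads(config_text)
for entry in iter_attack_configs(config):
    print(entry["attack"], entry["description"], entry["eps"])
```

Driving the loop from `num_attacks` means a mismatch between the count and the entries present fails immediately with a `KeyError` instead of silently skipping an attack.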

### Generated Examples
The results matched our hypothesis: as the value of the tuned parameter `epsilon` increased, more distortion was created in the image, the model's weights were exploited more effectively, and more errors occurred.


**Error Rates**
 * `epsilon: 0.03` -> `error_rate: 0.052` (5.2%)
 * `epsilon: 0.07` -> `error_rate: 0.306` (30.6%)
 * `epsilon: 0.09` -> `error_rate: 0.563` (56.3%)
 * `epsilon: 0.12` -> `error_rate: 0.855` (85.5%)
 * `epsilon: 0.18` -> `error_rate: 1.0` (100%)
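
For reference, an error rate of this kind can be computed as the fraction of adversarial examples that the model misclassifies. The sketch below assumes that definition, which this report does not spell out; the function name `error_rate` and the sample labels are made up for illustration.

```python
import numpy as np

def error_rate(predicted_labels, true_labels):
    """Fraction of samples whose predicted label differs from the true label
    (assumed definition of the error rate)."""
    predicted = np.asarray(predicted_labels)
    true = np.asarray(true_labels)
    return float(np.mean(predicted != true))

# Hypothetical labels: two of the five predictions are wrong.
true_labels = [7, 2, 1, 0, 4]
predicted_labels = [7, 2, 8, 0, 9]
print(error_rate(predicted_labels, true_labels))
```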
 
 
 **One Generated Image at Each Epsilon Value**

 As shown below, as the value of epsilon increases, the image becomes less recognizable, while the error rate of the classifier increases.

![Epsilon 0.03 Error](img/pgd_eps003_error.png)
![Epsilon 0.07 Error](img/pgd_eps007_error.png)
![Epsilon 0.09 Error](img/pgd_eps009_error.png)
![Epsilon 0.12 Error](img/pgd_eps012_error.png)
![Epsilon 0.18 Error](img/pgd_eps018_error.png)


### Results of Evaluated Models
| **Epsilon** | **Undefended Model** | **Ensemble of WDs** | **PGD-ADT** |
|:---------:|:------------:|:---------:|:------------:|
| x | 0.04137235116044399 | 0.0030272452068617556 | 0.006054490413723511 |
| x | 0.2956609485368315 | 0.007063572149344097 | |

# Fast Gradient Sign Method (FGSM)
FGSM adversarial attacks are white-box attacks that exploit the gradient of a neural network's loss with respect to its input. The method is designed for speed: it takes a single gradient step, rather than iteratively solving the constrained optimization problem between data integrity and perturbation, as PGD does.

FGSM uses the sign of the gradient of the loss function (roughly, the "direction" in input space toward a different classification) to determine where the model is most easily misled, and moves the input a "distance" of `epsilon` in that direction.

The resulting perturbation vector, with a direction (the sign of the gradient) and a magnitude (`epsilon`), is added to the input to fool the classifier.
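
The single-step update described above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the code behind our results: the toy logistic model and the names `loss_gradient` and `fgsm_attack` are assumptions, and only the `epsilon` values are taken from this report.

```python
import numpy as np

def loss_gradient(x, y, w, b):
    """Gradient of the binary cross-entropy loss w.r.t. the input x
    for a toy logistic model (an illustrative stand-in for a real network)."""
    p = 1.0 / (1.0 + np.exp(-(x @ w + b)))  # predicted probability of class 1
    return (p - y) * w

def fgsm_attack(x, y, w, b, eps):
    """One step of size eps along the sign of the loss gradient,
    then clipping to the valid pixel range [0, 1]."""
    direction = np.sign(loss_gradient(x, y, w, b))
    return np.clip(x + eps * direction, 0.0, 1.0)

rng = np.random.default_rng(0)
w, b = rng.normal(size=16), 0.0             # hypothetical model weights
x, y = rng.uniform(0.2, 0.8, size=16), 1.0  # hypothetical input with true label 1

for eps in (0.1, 0.5, 0.7, 0.8, 0.9):
    x_adv = fgsm_attack(x, y, w, b, eps)
    # The logit for the true class drops as eps grows, until clipping limits it.
    print(eps, float(x @ w + b), float(x_adv @ w + b))
```

Because pixel values are clipped to `[0, 1]`, large values of `epsilon` saturate: once most pixels hit the boundary, increasing `epsilon` further changes the input very little.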

#### The Inputs
The values of `epsilon` (i.e., distance) for the FGSM attacks are: `0.1`, `0.5`, `0.7`, `0.8`, and `0.9`. In FGSM, `epsilon` is a scalar that determines how much perturbation is applied to the input.

The inputs, in JSON form, looked like the below:


{
  "num_attacks": 5,
  "configs0": {
    "attack": "fgsm",
    "description": "fgsm_eps0.1",
    "eps": 0.1
  },
  "configs1": {
    "attack": "fgsm",
    "description": "fgsm_eps0.5",
    "eps": 0.5
  },
  "configs2": {
    "attack": "fgsm",
    "description": "fgsm_eps0.7",
    "eps": 0.7
  },
  "configs3": {
    "attack": "fgsm",
    "description": "fgsm_eps0.8",
    "eps": 0.8
  },
  "configs4": {
    "attack": "fgsm",
    "description": "fgsm_eps0.9",
    "eps": 0.9
  }
}

### Generated Examples
The results matched our hypothesis: as the value of the tuned parameter `epsilon` increased, the perturbed image moved a greater 'distance' away from its original classification.

**Error Rates**
 * `epsilon: 0.1` -> `error_rate: 0.273` (27.3%)
 * `epsilon: 0.5` -> `error_rate: 0.904` (90.4%)
 * `epsilon: 0.7` -> `error_rate: 0.906` (90.6%)
 * `epsilon: 0.8` -> `error_rate: 0.917` (91.7%)
 * `epsilon: 0.9` -> `error_rate: 0.908` (90.8%)
 
 
**One Generated Image at Each Epsilon Value**

As shown below, as the value of epsilon increases, the image becomes less recognizable. The error rate of the classifier rises sharply between `0.1` and `0.5`, then levels off at roughly 90% to 92%.

![Epsilon 0.1 Error](img/fgsm_eps01.png)
![Epsilon 0.5 Error](img/fgsm_eps0.5.png)
![Epsilon 0.7 Error](img/fgsm_eps07.png)
![Epsilon 0.8 Error](img/fgsm_eps08.png)
![Epsilon 0.9 Error](img/fgsm_eps09.png)


### FGSM Eval_Model Results
| **Undefended Model (UM)** | **Ensemble** | **PGD-ADT** |
|:---------:|:------------:|:---------:|
| 0.8990918264379415 | 0.8940464177598385 | 0.8809283551967709 |

# Basic Iterative Method (BIM)

{
  "num_attacks": 5,
  "configs0": {
    "attack": "bim",
    "description": "bim_eps0.9iter100",
    "eps": 0.9,
    "max_iter": 100
  },
  "configs1": {
    "attack": "bim",
    "description": "bim_eps0.4iter75",
    "eps": 0.4,
    "max_iter": 75
  },
  "configs2": {
    "attack": "bim",
    "description": "bim_eps0.01ter100",
    "eps": 0.01,
    "max_iter": 100
  },
  "configs3": {
    "attack": "bim",
    "description": "bim_eps0.9iter40",
    "eps": 0.9,
    "max_iter": 40
  },
  "configs4": {
    "attack": "bim",
    "description": "bim_eps0.6iter25",
    "eps": 0.6,
    "max_iter": 25
  }
}
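
To show how `eps` and `max_iter` interact, here is a minimal NumPy sketch of BIM as it is commonly described: repeated small signed-gradient steps, each clipped so the result stays within `eps` of the original input. The toy gradient function, the name `bim_attack`, and the step size `eps_step` are assumptions for illustration. The step size used in our runs is not recorded in the configuration above.

```python
import numpy as np

def bim_attack(x, grad_fn, eps, eps_step, max_iter):
    """Basic Iterative Method: start at x, take max_iter signed-gradient steps
    of size eps_step, clipping to the eps-ball around x after each one."""
    x_adv = x.copy()
    for _ in range(max_iter):
        x_adv = x_adv + eps_step * np.sign(grad_fn(x_adv))
        x_adv = np.clip(x_adv, x - eps, x + eps)  # stay within eps of the original
        x_adv = np.clip(x_adv, 0.0, 1.0)          # keep a valid pixel range
    return x_adv

rng = np.random.default_rng(0)
w = rng.normal(size=16)  # hypothetical weights of a toy logistic model

def grad_fn(x):
    """Loss gradient w.r.t. x for the toy model, with true label 1."""
    p = 1.0 / (1.0 + np.exp(-(x @ w)))
    return (p - 1.0) * w

x = rng.uniform(0.2, 0.8, size=16)  # hypothetical input
for eps, max_iter in ((0.9, 100), (0.4, 75), (0.01, 100), (0.9, 40), (0.6, 25)):
    x_adv = bim_attack(x, grad_fn, eps, eps_step=eps / 10, max_iter=max_iter)
    print(eps, max_iter, float(np.max(np.abs(x_adv - x))))
```

In this sketch, raising `max_iter` cannot push the perturbation beyond `eps`, so a very small `eps` such as `0.01` stays a small perturbation even with 100 iterations.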

### Generated Examples
The results matched our hypothesis: as the value of the tuned parameter `epsilon` increased, more distortion was created in the image, the model's weights were exploited more effectively, and more errors occurred.


**Error Rates**
 * `epsilon: 0.9; max_iter: 100` -> `error_rate: 1.0` (100%)
 * `epsilon: 0.4; max_iter: 75` -> `error_rate: 1.0` (100%)
 * `epsilon: 0.01; max_iter: 100` -> `error_rate: 0.018` (1.8%)
 * `epsilon: 0.9; max_iter: 40` -> `error_rate: 1.0` (100%)
 * `epsilon: 0.6; max_iter: 25` -> `error_rate: 1.0` (100%)
 
 
 **One Generated Image at Each Epsilon Value**

 As shown below, as the value of epsilon increases, the image becomes less recognizable, while the error rate of the classifier increases.

![Epsilon 0.9 Iter 100 Error](img/bim_eps0.9iter100.png)
![Epsilon 0.4 Iter 75 Error](img/bim_eps0.4iter75.png)
![Epsilon 0.01 Iter 100 Error](img/bim_eps0.01iter100.png)
![Epsilon 0.9 Iter 40 Error](img/bim_eps0.9iter40.png)
![Epsilon 0.6 Iter 25 Error](img/bim_eps0.6iter25.png)


### BIM Eval_Model Results
{'UM': 0.009081735620585268, 'Ensemble': 0.0030272452068617556, 'PGD-ADT': 0.005045408678102927}

## Eval_Model Results when evaluating against all 15 adversarial examples generated

{'UM': 0.009081735620585268, 'Ensemble': 0.0030272452068617556, 'PGD-ADT': 0.005045408678102927}