# Machine Learning Systems Task 2 
## Option 1: Optimization-Based White-Box Attack
_Daniel Jones, Praful Chunchu, Ravi Patel and Austin Staton_

**Objective**: Generate adversarial examples against vanilla Athena under an optimization-based white-box threat model.

We extend the [Athena](https://arxiv.org/pdf/2001.00308.pdf) framework to create adversarial attacks under a white-box threat model. That is, the generated attacks are specifically designed to exploit a given model together with its ensemble of defenses.

Earlier work provided us with data for attacks in a more general, zero-knowledge context. This prior data serves as a baseline for comparison. The contrast between zero-knowledge and white-box threats highlights how much more effective adversarial examples are when they are tailored to a specific model. We also explore Expectation Over Transformation (EOT) attacks to demonstrate their effectiveness under an optimization-based white-box threat model.

## Relevant Files
* The 24 sets of generated (subsampled) adversarial examples are located at:
  * `/task2/data/ae_wb_*.npy`

* Subsamples/sublabels for a 1:10 ratio are located at:
  * `/task2/data/sublabels-mnist-ratio_0.1-112490.080191753.npy` and
  * `/task2/data/subsamples-mnist-ratio_0.1-112490.080191753.npy`.

* Subsamples/sublabels for a 1:100 ratio are located at:
  * `/task2/data/sublabels-mnist-ratio_0.01-3.860350964.npy` and
  * `/task2/data/subsamples-mnist-ratio_0.01-3.860350964.npy`.

* JSON configurations for the various adversarial examples are located at:
  * `/task2/configs/`.


***
# Vanilla Athena

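The Athena framework combines multiple, diverse weak defenses into an ensemble that protects a model against adversarial inputs. Our experiments used an arbitrarily chosen set of 5 weak defenses: rotation by 270 degrees, distortion along the y axis, Poisson noise, salt noise, and an entropy filter. They were configured as follows: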

```json
{
  "configs3": {
    "type": "rotate",
    "subtype": "",
    "id": 3,
    "description": "rotate270",
    "angle": 270
  },
  "configs39": {
    "type": "distort",
    "subtype": "y",
    "id": 39,
    "description": "distort_y",
    "r1": 5.0,
    "r2": 2.0,
    "c": 28.0
  },
  "configs42": {
    "type": "noise",
    "subtype": "poisson",
    "id": 42,
    "description": "noise_poisson",
    "noise": "poisson"
  },
  "configs43": {
    "type": "noise",
    "subtype": "salt",
    "id": 43,
    "description": "noise_salt",
    "noise": "salt"
  },
  "configs55": {
    "type": "filter",
    "subtype": "entropy",
    "id": 55,
    "description": "filter_entropy",
    "radius": 2
  }
}
```

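To make the weak-defense configuration concrete, here is a minimal sketch of how three of its entries (rotate270, noise_salt, noise_poisson) might be applied to a batch of images. This is not Athena's implementation. The function `apply_weak_defense`, the rotation direction, the 5% salt density, and the 0-255 Poisson scale are all assumptions. distort_y and filter_entropy are deliberately left unimplemented.

```python
import numpy as np

def apply_weak_defense(config, images, rng=None):
    """Hypothetical, simplified stand-in for one weak-defense transformation.

    images: float array of shape (n, h, w) with values in [0, 1].
    Only rotate, salt noise and Poisson noise are sketched here.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    kind, subtype = config["type"], config["subtype"]
    if kind == "rotate":
        # np.rot90 turns counter-clockwise in 90-degree steps; the direction
        # Athena uses for "rotate270" is an assumption here.
        return np.rot90(images, k=config["angle"] // 90, axes=(1, 2))
    if kind == "noise" and subtype == "salt":
        out = images.copy()
        out[rng.random(images.shape) < 0.05] = 1.0  # assumed 5% salt density
        return out
    if kind == "noise" and subtype == "poisson":
        # treat intensities as photon counts on an assumed 0-255 scale
        return np.clip(rng.poisson(images * 255.0) / 255.0, 0.0, 1.0)
    raise NotImplementedError(f"no sketch for {config['description']}")

# Toy usage on four random 28x28 "images".
images = np.random.default_rng(1).random((4, 28, 28))
rotated = apply_weak_defense(
    {"type": "rotate", "subtype": "", "angle": 270, "description": "rotate270"}, images
)
```

Dispatching on the `type`/`subtype` pair mirrors how the configuration entries are keyed. In Athena, each weak defense additionally has its own classifier that sees the transformed input.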

***
# Experimental Approach

There were two parts to experimentation: 
* First, adversarial examples were generated for the white-box threat model and compared to the zero-knowledge examples. Most adversarial examples were generated with the same configurations used in Task 1. This lets us observe how AE behavior changes when the target model changes but the AE configurations stay the same.
* Second, Expectation Over Transformation (EOT) adversarial examples, which are designed to remain adversarial under input transformations, were crafted over a distribution of transformations.

#### White-Box vs. Zero-Knowledge Threats
We generated adversarial examples for three types of adversarial attacks:
 * Projected Gradient Descent (PGD),
 * the Fast Gradient Sign Method (FGSM),
 * and the Basic Iterative Method (BIM).
 
For each attack method, we varied epsilon (the maximum allowed perturbation of each input) when attacking the model defended by vanilla Athena. The epsilon values were `0.03`, `0.06`, `0.12`, `0.24`, and `0.48` for all three attacks.

These epsilon values and attack configurations were kept identical between the white-box and zero-knowledge tasks as an experimental control.
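The three attacks differ mainly in how many gradient steps they take and where they start. The sketch below assumes an L∞ threat model and a toy linear softmax classifier (weights `W`, bias `b`), so that the input gradient has a closed form. The function names, the BIM/PGD default step sizes, and the uniform random start are illustrative assumptions, not the attack code used to generate our AEs.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def loss_grad(x, y, W, b):
    """Gradient of the cross-entropy loss w.r.t. the input of a linear softmax model."""
    p = softmax(x @ W + b)               # (n, k) class probabilities
    p[np.arange(len(y)), y] -= 1.0       # dL/dlogits = p - onehot(y)
    return p @ W.T                       # (n, d) gradient w.r.t. x

def fgsm(x, y, W, b, eps):
    """One step of size eps along the sign of the gradient."""
    return np.clip(x + eps * np.sign(loss_grad(x, y, W, b)), 0.0, 1.0)

def bim(x, y, W, b, eps, max_iter=50, step=None):
    """Iterated FGSM with small steps, projected back into the eps-ball each time."""
    step = eps / max_iter if step is None else step  # assumed default step size
    x_adv = x.copy()
    for _ in range(max_iter):
        x_adv = x_adv + step * np.sign(loss_grad(x_adv, y, W, b))
        x_adv = np.clip(np.clip(x_adv, x - eps, x + eps), 0.0, 1.0)
    return x_adv

def pgd(x, y, W, b, eps, max_iter=50, step=None, rng=None):
    """BIM started from a uniformly random point inside the eps-ball."""
    rng = np.random.default_rng(0) if rng is None else rng
    step = 2.5 * eps / max_iter if step is None else step  # assumed default step size
    x_adv = np.clip(x + rng.uniform(-eps, eps, size=x.shape), 0.0, 1.0)
    for _ in range(max_iter):
        x_adv = x_adv + step * np.sign(loss_grad(x_adv, y, W, b))
        x_adv = np.clip(np.clip(x_adv, x - eps, x + eps), 0.0, 1.0)
    return x_adv

# Toy usage: random "images" flattened to 784 values and a random linear model.
rng = np.random.default_rng(42)
W, b = rng.normal(scale=0.05, size=(784, 10)), np.zeros(10)
x, y = rng.random((8, 784)), rng.integers(0, 10, size=8)
for eps in (0.03, 0.06, 0.12, 0.24, 0.48):
    x_adv = bim(x, y, W, b, eps, max_iter=50)
    print(eps, np.abs(x_adv - x).max())
```

FGSM takes a single `eps`-sized step. BIM takes `max_iter` smaller steps and projects back into the `eps`-ball after each one. PGD is BIM with a random starting point.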

#### Expectation Over Transformation (EOT)
Adversarial examples can be made robust to input transformations, so that they stay adversarial after the input is transformed. EOT builds this robustness by optimizing the attack over a distribution of transformations applied to each example, while constraining the expected effective distance between the transformed adversarial and original inputs seen by the classifier (_1. Athalye, Engstrom, et al._).
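For reference, the EOT formulation from Athalye et al. chooses the adversarial input $x'$ by maximizing the expected log-likelihood of a target class $y_t$ over a transformation distribution $T$. This is subject to a bound $\epsilon$ on the expected distance $d$ between the transformed adversarial and original inputs. The notation follows their paper:

$$
\begin{aligned}
\hat{x}' &= \arg\max_{x'} \; \mathbb{E}_{t \sim T}\left[\log P\left(y_t \mid t(x')\right)\right] \\
\text{subject to} \quad & \mathbb{E}_{t \sim T}\left[d\left(t(x'), t(x)\right)\right] < \epsilon, \qquad x' \in [0, 1]^d
\end{aligned}
$$

In practice, the gradient of the expectation is estimated by sampling, $\nabla_{x'} \mathbb{E}_{t \sim T}[\log P(y_t \mid t(x'))] \approx \frac{1}{N}\sum_{i=1}^{N} \nabla_{x'} \log P(y_t \mid t_i(x'))$ with $t_i \sim T$. This $N$ corresponds to the `num_samples` field in the configurations below. For an untargeted attack, the same estimator is applied to the loss on the true label, which is maximized instead.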

In our experiments, the transformation is a rotation by an angle between -45 and +45 degrees. These EOT adversarial examples were generated separately from the basic, non-EOT adversarial examples crafted against vanilla Athena, and at different epsilon values.
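The following is a minimal sketch of EOT applied to FGSM with the rotation distribution used here: `num_samples` angles drawn uniformly between `min_angle` and `max_angle`. It makes several assumptions. A toy linear softmax model stands in for Athena, and inputs are 28x28 images flattened to 784 values. Rotation is nearest-neighbour with zero fill. `rotation_index` and `eot_fgsm` are hypothetical helpers, not part of Athena.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def loss_grad(x, y, W, b):
    """Cross-entropy gradient w.r.t. the input of a toy linear softmax model."""
    p = softmax(x @ W + b)
    p[np.arange(len(y)), y] -= 1.0
    return p @ W.T

def rotation_index(side, angle_deg):
    """Nearest-neighbour rotation of a side x side image about its centre.

    Returns, for every output pixel, the flat index of the input pixel it copies,
    or -1 where the rotated position falls outside the image (filled with 0).
    """
    theta = np.deg2rad(angle_deg)
    c = (side - 1) / 2.0
    rows, cols = np.mgrid[0:side, 0:side]
    r, q = rows - c, cols - c
    # inverse-map each output coordinate back into the input image
    src_r = np.rint(np.cos(theta) * r + np.sin(theta) * q + c).astype(int)
    src_c = np.rint(-np.sin(theta) * r + np.cos(theta) * q + c).astype(int)
    valid = (src_r >= 0) & (src_r < side) & (src_c >= 0) & (src_c < side)
    return np.where(valid, src_r * side + src_c, -1).ravel()

def eot_fgsm(x, y, W, b, eps, num_samples=50, min_angle=-45.0, max_angle=45.0,
             side=28, rng=None):
    """FGSM whose gradient is averaged over randomly sampled rotations (EOT)."""
    rng = np.random.default_rng(0) if rng is None else rng
    grad = np.zeros_like(x)
    for angle in rng.uniform(min_angle, max_angle, size=num_samples):
        src = rotation_index(side, angle)
        valid = src >= 0
        x_t = np.where(valid, x[:, np.where(valid, src, 0)], 0.0)  # t(x)
        g_t = loss_grad(x_t, y, W, b)                              # dL/dt(x)
        # chain rule through the rotation: scatter-add each output pixel's
        # gradient back onto the input pixel it was copied from
        np.add.at(grad, (slice(None), src[valid]), g_t[:, valid])
    grad /= num_samples
    return np.clip(x + eps * np.sign(grad), 0.0, 1.0)

# Toy usage mirroring the fgsm_eps035_EOT_50 configuration.
rng = np.random.default_rng(3)
W, b = rng.normal(scale=0.05, size=(784, 10)), np.zeros(10)
x, y = rng.random((4, 784)), rng.integers(0, 10, size=4)
x_adv = eot_fgsm(x, y, W, b, eps=0.35, num_samples=50)
```

Nearest-neighbour rotation only copies pixels, so its backward pass is a scatter-add (`np.add.at`). Repeated source indices are accumulated correctly, which keeps each per-angle gradient exact without needing a deep-learning framework.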

PGD and FGSM were used to craft the EOT adversarial examples. The tuned parameters, mainly `eps` (epsilon) and `num_samples` (the number of transformations sampled from the distribution), are listed below, along with all other configurations used to generate adversarial examples:

```json
{
  "num_attacks": 24,
  "configs0": {
    "attack": "pgd",
    "description": "pgd_eps003",
    "eps": 0.03
  },
  "configs1": {
    "attack": "pgd",
    "description": "pgd_eps006",
    "eps": 0.06
  },
  "configs2": {
    "attack": "pgd",
    "description": "pgd_eps012",
    "eps": 0.12
  },
  "configs3": {
    "attack": "pgd",
    "description": "pgd_eps024",
    "eps": 0.24
  },
  "configs4": {
    "attack": "pgd",
    "description": "pgd_eps048",
    "eps": 0.48
  },
  "configs5": {
    "attack": "fgsm",
    "description": "fgsm_eps003",
    "eps": 0.03
  },
  "configs6": {
    "attack": "fgsm",
    "description": "fgsm_eps006",
    "eps": 0.06
  },
  "configs7": {
    "attack": "fgsm",
    "description": "fgsm_eps012",
    "eps": 0.12
  },
  "configs8": {
    "attack": "fgsm",
    "description": "fgsm_eps024",
    "eps": 0.24
  },
  "configs9": {
    "attack": "fgsm",
    "description": "fgsm_eps048",
    "eps": 0.48
  },
  "configs10": {
    "attack": "bim",
    "description": "bim_eps003iter50",
    "eps": 0.03,
    "max_iter": 50
  },
  "configs11": {
    "attack": "bim",
    "description": "bim_eps006iter50",
    "eps": 0.06,
    "max_iter": 50
  },
  "configs12": {
    "attack": "bim",
    "description": "bim_eps012ter50",
    "eps": 0.12,
    "max_iter": 50
  },
  "configs13": {
    "attack": "bim",
    "description": "bim_eps024iter50",
    "eps": 0.24,
    "max_iter": 50
  },
  "configs14": {
    "attack": "bim",
    "description": "bim_eps048iter50",
    "eps": 0.48,
    "max_iter": 50
  },
  "configs15": {
    "attack": "pgd",
    "description": "pgd_eps05_EOT_50",
    "eps": 0.5,
    "distribution": {
      "num_samples": 50,
      "transformation": "rotation",
      "min_angle": -45,
      "max_angle": 45
    }
  },
  "configs16": {
    "attack": "pgd",
    "description": "pgd_eps05_EOT_100",
    "eps": 0.5,
    "distribution": {
      "num_samples": 100,
      "transformation": "rotation",
      "min_angle": -45,
      "max_angle": 45
    }
  },
  "configs17": {
    "attack": "pgd",
    "description": "pgd_eps05_EOT_200",
    "eps": 0.5,
    "distribution": {
      "num_samples": 200,
      "transformation": "rotation",
      "min_angle": -45,
      "max_angle": 45
    }
  },
  "configs18": {
    "attack": "fgsm",
    "description": "fgsm_eps05_EOT_50",
    "eps": 0.5,
    "distribution": {
      "num_samples": 50,
      "transformation": "rotation",
      "min_angle": -45,
      "max_angle": 45
    }
  },
  "configs19": {
    "attack": "fgsm",
    "description": "fgsm_eps05_EOT_100",
    "eps": 0.5,
    "distribution": {
      "num_samples": 100,
      "transformation": "rotation",
      "min_angle": -45,
      "max_angle": 45
    }
  },
  "configs20": {
    "attack": "fgsm",
    "description": "fgsm_eps05_EOT_200",
    "eps": 0.5,
    "distribution": {
      "num_samples": 200,
      "transformation": "rotation",
      "min_angle": -45,
      "max_angle": 45
    }
  },
  "configs21": {
    "attack": "fgsm",
    "description": "fgsm_eps035_EOT_50",
    "eps": 0.35,
    "distribution": {
      "num_samples": 50,
      "transformation": "rotation",
      "min_angle": -45,
      "max_angle": 45
    }
  },
  "configs22": {
    "attack": "fgsm",
    "description": "fgsm_eps035_EOT_100",
    "eps": 0.35,
    "distribution": {
      "num_samples": 100,
      "transformation": "rotation",
      "min_angle": -45,
      "max_angle": 45
    }
  },
  "configs23": {
    "attack": "fgsm",
    "description": "fgsm_eps035_EOT_200",
    "eps": 0.35,
    "distribution": {
      "num_samples": 200,
      "transformation": "rotation",
      "min_angle": -45,
      "max_angle": 45
    }
  }
}
```
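Each attack configuration follows the same layout: a `num_attacks` count plus entries `configs0`, `configs1`, and so on, where EOT attacks additionally carry a `distribution` block. Below is a small, hypothetical loader (`load_attack_configs` and `is_eot` are illustrative names, not Athena functions). It checks that count before use, because a mismatch between `num_attacks` and the actual entries is an easy mistake when these files are edited by hand.

```python
import json

def load_attack_configs(text):
    """Return the per-attack dicts from an attack config in the layout above."""
    cfg = json.loads(text)
    keys = [k for k in cfg if k.startswith("configs")]
    if len(keys) != cfg["num_attacks"]:
        raise ValueError(f"num_attacks={cfg['num_attacks']} but found {len(keys)} entries")
    return [cfg[f"configs{i}"] for i in range(cfg["num_attacks"])]

def is_eot(attack):
    """EOT attacks are the ones that carry a transformation distribution."""
    return "distribution" in attack

# Toy usage with a two-attack excerpt in the same format.
example = """{
  "num_attacks": 2,
  "configs0": {"attack": "fgsm", "description": "fgsm_eps003", "eps": 0.03},
  "configs1": {"attack": "fgsm", "description": "fgsm_eps035_EOT_50", "eps": 0.35,
               "distribution": {"num_samples": 50, "transformation": "rotation",
                                "min_angle": -45, "max_angle": 45}}
}"""
attacks = load_attack_configs(example)
eot_names = [a["description"] for a in attacks if is_eot(a)]
```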


#### Subsampling 
Adversarial examples were generated on subsamples of the data. To keep the comparisons valid, **identical** subsamples were used to generate the adversarial examples for both the zero-knowledge (old) and white-box (new) threats.

These subsamples/sublabels were drawn at a ratio of 1:10 (`0.1`). This subsample size increased computational cost substantially; however, the experiment needed to keep its controls consistent with the zero-knowledge baseline.

We kept computational cost more reasonable when generating the Expectation Over Transformation (EOT) adversarial examples. Because these extend the previous comparisons rather than being part of them, using smaller subsamples here does not affect the validity of those comparisons. They were subsampled at a ratio of 1:100 (`0.01`).
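Here is a sketch of drawing a reproducible subsample at a given ratio. The function name `subsample` and the per-class stratification are our assumptions; the report does not say how the Task 1 subsamples were drawn.

```python
import numpy as np

def subsample(data, labels, ratio, rng=None):
    """Hypothetical helper: keep `ratio` of each class, chosen without replacement."""
    rng = np.random.default_rng() if rng is None else rng
    keep = []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        n = max(1, int(round(ratio * len(idx))))   # at least one sample per class
        keep.append(rng.choice(idx, size=n, replace=False))
    keep = np.sort(np.concatenate(keep))            # preserve the original order
    return data[keep], labels[keep]

# Toy usage: 1,000 fake samples, 100 per class, subsampled at 1:10.
labels = np.repeat(np.arange(10), 100)
data = np.arange(1000, dtype=float).reshape(1000, 1)
sub_data, sub_labels = subsample(data, labels, 0.1, rng=np.random.default_rng(0))
```

What preserves the control is drawing the subsample once, saving it with `np.save`, and reloading it with `np.load` for every attack. Resampling on each run would silently give the zero-knowledge and white-box attacks different inputs.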

***
# White-Box vs. Zero-Knowledge Results

Adversarial examples created in an optimized white-box context were generally more effective against vanilla Athena, matching expectations. As in Task 1, AEs were evaluated against the ensemble of 5 weak defenses shown above using the AVEP ensemble strategy.
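Below is a minimal sketch of the AVEP strategy as we understand it. Each weak defense produces a probability vector per input. AVEP averages those vectors across defenses, and the predicted label is the argmax of the average. The error rate here is simply the fraction of AEs whose ensemble prediction differs from the true label. `avep_predict` and `error_rate` are hypothetical names. Athena's own evaluation code may define the error rate differently, for example by excluding inputs that were already misclassified.

```python
import numpy as np

def avep_predict(prob_list):
    """AVEP: average the per-defense probability vectors, then take the argmax.

    prob_list: sequence of (n, k) arrays, one per weak defense.
    """
    avg = np.mean(np.stack(prob_list, axis=0), axis=0)  # (n, k)
    return avg.argmax(axis=1)

def error_rate(pred, labels):
    """Fraction of inputs whose ensemble prediction is not the true label."""
    return float(np.mean(pred != labels))

# Two toy weak defenses scoring two inputs over two classes.
probs_a = np.array([[0.6, 0.4], [0.2, 0.8]])
probs_b = np.array([[0.3, 0.7], [0.45, 0.55]])
# averages are about [0.45, 0.55] and [0.325, 0.675], so both inputs go to class 1
pred = avep_predict([probs_a, probs_b])
print(pred, error_rate(pred, np.array([0, 1])))
```

Averaging probabilities lets a confident defense outvote several uncertain ones. This is the main behavioral difference from a simple majority vote.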

### Projected Gradient Descent
Below are the results for the earlier zero-knowledge (ZK) and the new white-box (WB) PGD adversarial examples, evaluated against the model protected by the ensemble of weak defenses, **Athena**.

| Epsilon | ZK Error Rate | WB Error Rate| Effective Gain | 
|:-----|------------|------------|----------|
| 0.03 | 0.00201816 | 0.00201816 | 1.0x |
| 0.06 | 0.00302725 | 0.00504540 | 1.7x |
| 0.12 | 0.01009082 | 0.01311806 | 1.3x |
| 0.24 | 0.03632694 | 0.07568113 | 2.1x |
| 0.48 | 0.29868819 | 0.49243189 | 1.6x |

![PGD](img/pgd_zk_v_wb.png)

These results match our expectations: tailoring adversarial examples to a specific model should increase that model's error rate. Notably, at epsilon `0.24` the error rate roughly doubles.
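The Effective Gain column is the ratio of the white-box error rate to the zero-knowledge error rate at the same epsilon. This short sketch recomputes it from the Athena PGD error rates listed above. `effective_gain` is a hypothetical helper and returns NaN where the zero-knowledge error rate is zero.

```python
import numpy as np

def effective_gain(wb_err, zk_err):
    """White-box / zero-knowledge error-rate ratio; NaN where the ZK rate is 0."""
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(zk_err > 0, wb_err / zk_err, np.nan)

# Vanilla Athena PGD error rates, copied from the table above.
eps = np.array([0.03, 0.06, 0.12, 0.24, 0.48])
zk = np.array([0.00201816, 0.00302725, 0.01009082, 0.03632694, 0.29868819])
wb = np.array([0.00201816, 0.00504540, 0.01311806, 0.07568113, 0.49243189])

for e, g in zip(eps, effective_gain(wb, zk)):
    print(f"eps={e:.2f}: {g:.2f}x")
```

The same function applies unchanged to the FGSM, BIM, and undefended-model tables.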

This, however, is not the most interesting trend: a higher error rate on the model an attack was crafted for is expected. The interesting trend appears when the same tailored adversarial examples are sent to an **undefended model**; that data is below:

| Epsilon | ZK Error Rate | WB Error Rate| Effective Gain | 
|:-----|------------|------------|----------|
| 0.03 | 0.04137235 | 0.0        | 0.0x |
| 0.06 | 0.20282543 | 0.00504541 | 0.0249x |
| 0.12 | 0.84661958 | 0.02522704 | 0.0298x |
| 0.24 | 0.99091826 | 0.13622603 | 0.1375x |
| 0.48 | 0.99091826 | 0.72653885 | 0.7332x |

When the PGD adversarial examples tailored to Athena are sent to the undefended model instead, their effectiveness drops sharply. At epsilon `0.06`, the white-box error rate is only about `0.025` times the zero-knowledge error rate.

We can conclude that, for PGD at these epsilon values, the optimized white-box adversarial examples designed for vanilla Athena succeed against their intended model but are highly _ineffective_ against the undefended model.

### Fast Gradient Sign Method
Below are the results for the earlier zero-knowledge (ZK) and the new white-box (WB) FGSM adversarial examples, evaluated against the model protected by the ensemble of weak defenses, **Athena**.

| Epsilon | ZK Error Rate | WB Error Rate| Effective Gain | 
|:-----|------------|------------|----------|
| 0.03 | 0.00201816 | 0.00201816 | 1.0x |
| 0.06 | 0.00201816 | 0.00403633 | 2.0x |
| 0.12 | 0.00605449 | 0.00807265 | 1.3x |
| 0.24 | 0.03733602 | 0.02825429 | 0.76x |
| 0.48 | 0.73360242 | 0.64984864 | 0.89x |

![FGSM](img/fgsm_zk_v_wb.png)

At low epsilons (`0.03` to `0.12`), the FGSM results resemble those of PGD: tailored adversarial examples are at least as effective as zero-knowledge ones. At higher epsilons (`0.24` and `0.48`), however, the white-box FGSM examples are less effective than the zero-knowledge ones (0.76x and 0.89x).

As with PGD, the FGSM adversarial examples are less successful on a model they were not tailored to. Below are the results of the adversarial examples tailored to Athena when sent to the **undefended model**.

| Epsilon | ZK Error Rate | WB Error Rate| Effective Gain | 
|:-----|------------|-------------|----------|
| 0.03 | 0.02421796 | 0.0         | 0.0x |
| 0.06 | 0.09788093 | 0.006054490 | 0.062x |
| 0.12 | 0.35822402 | 0.017154389 | 0.048x |
| 0.24 | 0.81533804 | 0.137235116 | 0.168x |
| 0.48 | 0.91321897 | 0.885973764 | 0.970x |

Again, effectiveness drops at every epsilon when the examples are run against the unintended model. At no point does the general, zero-knowledge adversarial example perform worse against the undefended model than the tailored one.


### Basic Iterative Method
Below are the results for the earlier zero-knowledge (ZK) and the new white-box (WB) BIM adversarial examples, evaluated against the model protected by the ensemble of weak defenses, **Athena**. For consistency, all of this data was collected using `50` BIM iterations.

| Epsilon | ZK Error Rate | WB Error Rate| Effective Gain | 
|:-----|------------|------------|----------|
| 0.03 | 0.00201816 | 0.00302725 | 1.5x |
| 0.06 | 0.00302725 | 0.00403633 | 1.3x |
| 0.12 | 0.01009082 | 0.02825429 | 2.8x |
| 0.24 | 0.07568113 | 0.31382442 | 4.1x |
| 0.48 | 0.65489495 | 0.94349142 | 1.4x |

![BIM](img/bim_zk_v_wb.png)

The Athena-specific white-box adversarial examples are consistently better than the general BIM AEs at fooling the model. At an epsilon of `0.24` with 50 iterations, the error rate increases by a factor of 4.1. This is the largest gain of any attack tested, and it shows the importance of defending against white-box attacks.

As with PGD and FGSM, BIM's tailored adversarial examples were less successful against the undefended model. Below is data comparing zero-knowledge and white-box AEs against the **undefended model** at `50` BIM iterations.

| Epsilon | ZK Error Rate | WB Error Rate| Effective Gain | 
|:-----|------------|------------|----------|
| 0.03 | 0.05247225 | 0.00201816 | 0.038x |
| 0.06 | 0.30575177 | 0.00706357 | 0.023x |
| 0.12 | 0.98486377 | 0.04540869 | 0.046x |
| 0.24 | 0.99091826 | 0.41069627 | 0.414x |
| 0.48 | 0.99091826 | 0.95660949 | 0.965x |

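BIM shows the largest gains of the three attacks when tailored to vanilla Athena (up to 4.1x). Against the undefended model, its tailored examples again fall far short of the zero-knowledge ones at small epsilons (0.023x at `0.06`). In absolute terms, however, they still produce higher error rates on the undefended model than the tailored PGD and FGSM examples at every epsilon tested.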

***
# Expectation Over Transformation Results

EOT, which crafts adversarial examples that stay robust across a distribution of transformations, was highly effective against vanilla Athena. We used rotation as the transformation and applied EOT to PGD and FGSM. Initially, all EOT adversarial examples were crafted with an epsilon of `0.5`. The error rates of the undefended model (UM), Athena, and the PGD adversarially trained model (PGD-ADT) against EOT **PGD** examples with an epsilon of `0.5` are below:

| Number of Samples | UM | Athena | PGD-ADT |
|:--|--|--|--|
| 50  | 0.8 | 0.2 | 0.2 |
| 100 | 0.6 | 0.3 | 0.4 | 
| 200 | 0.8 | 0.5 | 0.3 |

Similarly, the results for **FGSM** with an epsilon value of `0.5` are below:

| Number of Samples | UM | Athena | PGD-ADT |
|:--|--|--|--|
| 50  | 0.9 | 0.9 | 0.8 |
| 100 | 0.9 | 0.9 | 0.8 | 
| 200 | 0.9 | 0.8 | 0.8 |

FGSM created very high error rates at this epsilon value. To obtain more interesting data, an identical test was executed at an epsilon of `0.35`. Its results are below.

| Number of Samples | UM | Athena | PGD-ADT |
|:--|--|--|--|
| 50  | 0.8 | 0.1 | 0.2 |
| 100 | 0.6 | 0.3 | 0.4 | 
| 200 | 0.8 | 0.4 | 0.3 |

At this epsilon, the error rate on Athena rises with the number of sampled transformations (0.1, 0.3, and 0.4 for 50, 100, and 200 samples). The undefended model and PGD-ADT show no consistent trend.


# Conclusions
The comparison between zero-knowledge and optimized white-box adversarial examples showed that the white-box approach is generally more effective. Adversarial examples optimized against vanilla Athena fooled it more often than general, zero-knowledge adversarial examples did.

Interestingly, however, these optimized white-box adversarial examples fared far worse on the undefended model. Optimizing adversarial examples for one model diminished their success against another.

Expectation Over Transformation produced high error rates on Athena, especially when compared to similar values of epsilon in the non-EOT approaches. Generally, as the number of samples in the distribution increased, effectiveness did as well.

***
## Citations

1. Anish Athalye, Logan Engstrom, Andrew Ilyas, Kevin Kwok. _Synthesizing Robust Adversarial Examples_. ICML 2018.

***
## Contributions
**Austin Staton**: Worked on data analysis and the creation of the report.

**Daniel Jones**: Developed the project's original approach, collected data, and generated AEs.

**Praful Chunchu**: Worked on the development of AEs.

**Ravi Patel**: Worked on the development of AEs.