## Attack Surface of ML
### Taxonomy
* Knowledge (what the attacker knows about the model)
  * Black box
  * Gray box
  * White box
* Target
  * Training
  * Testing
* Goals
  * Confidence reduction
  * Misclassification

### Poisoning Attacks
* Manipulate training dataset

### Evasion Attacks
* Manipulate input samples at test time to cause misclassification

### Model Extraction
* Discover the parameters of the model

### Membership Inference
* Infer whether a data point was part of the training dataset, or was drawn from the same distribution as the training data

### Security Goals

| Goal                  | Attack               |
|-----------------------|----------------------|
| Data integrity        | Poisoning attack     |
| Model integrity       | Evasion attack       |
| Model confidentiality | Model extraction     |
| Privacy               | Membership inference |

## Adversarial Examples and Evasion Attacks

### Adversarial Example
* An input crafted to cause the model to make a mistake
* Built by adding noise or features to a legitimate input to fool the model

### Attacking a Network
* **Goal 1:** find an input that is not of class Y (say, iguana) but will be classified as Y
$$L(\hat{y}, y) = \dfrac{1}{2} (\hat{y} - y_{iguana})^2$$

* **Goal 2:** find an input that is of class Y (say, cat) but will be classified as Y' (say, iguana); the $\lambda$ term keeps $x$ close to the original image $x_{cat}$
$$L(\hat{y}, y) = \dfrac{1}{2} (\hat{y} - y_{iguana})^2 + \lambda (x - x_{cat})^2$$
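The Goal 2 loss can be minimized over the input $x$ by gradient descent. Below is a minimal sketch, assuming a toy linear model $\hat{y} = w^Tx + b$ in place of a real network; the weights, `x_cat`, `y_target`, `lam`, and the step size are all made-up illustration values.

```python
import numpy as np

def attack_loss(x, x_ref, w, b, y_target, lam):
    """0.5 * (y_hat - y_target)^2 + lam * ||x - x_ref||^2 for a linear model."""
    y_hat = w @ x + b
    return 0.5 * (y_hat - y_target) ** 2 + lam * np.sum((x - x_ref) ** 2)

def attack_grad(x, x_ref, w, b, y_target, lam):
    """Gradient of attack_loss with respect to the input x."""
    y_hat = w @ x + b
    return (y_hat - y_target) * w + 2.0 * lam * (x - x_ref)

# Toy values (assumptions for illustration only).
w = np.array([1.0, -2.0, 0.5])
b = 0.0
x_cat = np.array([0.5, 0.2, -0.1])   # the "cat" input we start from
y_target = 3.0                       # the output we want ("iguana")
lam = 0.1

x = x_cat.copy()
y_initial = w @ x + b
initial_loss = attack_loss(x, x_cat, w, b, y_target, lam)

# Plain gradient descent on the input; the model weights stay fixed.
for _ in range(200):
    x = x - 0.05 * attack_grad(x, x_cat, w, b, y_target, lam)

y_final = w @ x + b
final_loss = attack_loss(x, x_cat, w, b, y_target, lam)
```

A larger `lam` keeps `x` closer to `x_cat` at the cost of landing further from `y_target`.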

### Fast-Gradient Sign Method (FGSM)
* Perturb the image so that it is misclassified but still looks like the original
$$x_{adv} = x + \epsilon \cdot \text{sign}(\nabla_x J(\theta, x, y))$$

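* How to change the objective function to misclassify as a chosen class?
  * One standard option is the targeted variant of FGSM: step down the gradient of the loss computed against a target label $y_{target}$, instead of up the gradient of the loss for the true label $y$
$$x_{adv} = x - \epsilon \cdot \text{sign}(\nabla_x J(\theta, x, y_{target}))$$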
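A minimal FGSM sketch, assuming a logistic-regression model so that the input gradient has the closed form $\nabla_x J = (\sigma(w^Tx + b) - y)\,w$. The weights, input, and $\epsilon$ are made-up illustration values; a real network would get this gradient from automatic differentiation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(x, y, w, b):
    """Cross-entropy loss J of a logistic-regression model on one sample."""
    p = sigmoid(w @ x + b)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def fgsm(x, y, w, b, eps):
    """One FGSM step: move every feature by eps in the direction that raises J."""
    grad_x = (sigmoid(w @ x + b) - y) * w
    return x + eps * np.sign(grad_x)

# Toy model and input (assumptions for illustration only).
w = np.array([2.0, -1.0, 0.5])
b = 0.0
x = np.array([1.0, -1.0, 0.5])
y = 1.0
eps = 0.5

x_adv = fgsm(x, y, w, b, eps)
loss_before = loss(x, y, w, b)
loss_after = loss(x_adv, y, w, b)
```

Each feature moves by exactly `eps`, so the perturbation is bounded in the max norm while the loss on the true label goes up.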

### Exercise
Let $b = 0$ and
$$\begin{aligned}
w^T &= \begin{bmatrix}
1 & 3 & -1 & 2 & 2 & 3
\end{bmatrix}\\
x &= \begin{bmatrix}
1\\
-1\\
2\\
0\\
3\\
-2
\end{bmatrix}\\
\hat{y} &= w^Tx + b\\
&= 1 - 3 - 2 + 0 + 6 - 6\\
&= -4
\end{aligned}$$

* How to change $\hat{y}$ radically while keeping $x^*$ close to $x$? Take a small step $\epsilon = 0.2$ along the gradient of $\hat{y}$:
$$\begin{aligned}
\dfrac{\partial \hat{y}}{\partial x} &= w\\
x^* &= x + \epsilon w\\
&= \begin{bmatrix}
1\\
-1\\
2\\
0\\
3\\
-2
\end{bmatrix} + \begin{bmatrix}
0.2 * 1\\
0.2 * 3\\
0.2 * (-1)\\
0.2 * 2\\
0.2 * 2\\
0.2 * 3
\end{bmatrix}\\
&= \begin{bmatrix}
1.2\\
-0.4\\
1.8\\
0.4\\
3.4\\
-1.4
\end{bmatrix}\\
\end{aligned}$$

Thus,
$$\begin{aligned}
\hat{y}^* &= w^Tx^* + b\\
&= \hat{y} + \epsilon \, w^Tw\\
&= -4 + 0.2 * 28\\
&= 1.6
\end{aligned}$$
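The exercise can be checked numerically. This sketch assumes $x$ has six entries with $0$ in the fourth position, $b = 0$, and $\epsilon = 0.2$, since those values reproduce both $\hat{y} = -4$ and the perturbed output $1.6$.

```python
import numpy as np

w = np.array([1.0, 3.0, -1.0, 2.0, 2.0, 3.0])
x = np.array([1.0, -1.0, 2.0, 0.0, 3.0, -2.0])  # fourth entry 0 is an assumption
b = 0.0                                          # assumed bias
eps = 0.2

y_hat = w @ x + b          # original prediction

# For a linear model the gradient of y_hat with respect to x is w itself,
# so stepping along w changes the output by eps * ||w||^2.
x_star = x + eps * w
y_star = w @ x_star + b    # prediction on the perturbed input

max_change = np.max(np.abs(x_star - x))  # largest per-feature change
```

No single feature moves by more than $0.6$, yet the output changes by $\epsilon \lVert w \rVert^2 = 5.6$ and flips sign.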

## Black Box Attacks and Transferability
* Steps
  1. Query the remote ML model through its API with chosen inputs to obtain their labels
  2. Use this labeled data to train a local surrogate ML model
  3. Use the local model to craft adversarial examples that are misclassified by the remote model
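The three steps can be sketched end to end. Everything here is a toy stand-in: `remote_predict` is a hypothetical function playing the role of the remote API (a hidden linear classifier), the surrogate is a logistic regression trained by plain gradient descent, and the query count, step size, and $\epsilon$ are illustration values.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical remote model: the attacker sees labels only, never _W_REMOTE.
_W_REMOTE = np.array([1.5, -2.0, 0.5])

def remote_predict(X):
    return (X @ _W_REMOTE > 0).astype(int)

# Step 1: query the remote model with chosen inputs to obtain labels.
rng = np.random.default_rng(0)
X_query = rng.normal(size=(500, 3))
y_query = remote_predict(X_query)

# Step 2: train a local surrogate on the (input, label) pairs.
w_s, b_s = np.zeros(3), 0.0
for _ in range(500):
    p = sigmoid(X_query @ w_s + b_s)
    w_s -= 0.5 * (X_query.T @ (p - y_query)) / len(y_query)
    b_s -= 0.5 * np.mean(p - y_query)
surrogate_labels = (X_query @ w_s + b_s > 0).astype(int)
agreement = np.mean(surrogate_labels == y_query)

# Step 3: craft an adversarial example on the surrogate (FGSM) and
# submit it to the remote model.
x0 = np.array([0.3, -0.2, 0.1])
y0 = remote_predict(x0[None, :])[0]
grad_x = (sigmoid(w_s @ x0 + b_s) - y0) * w_s   # surrogate gradient only
x_adv = x0 + 0.4 * np.sign(grad_x)
y_adv = remote_predict(x_adv[None, :])[0]
```

The attack never touches the remote weights; it succeeds only because the surrogate's gradient points in a direction that also hurts the remote model.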

### Transferability
* Ability of an attack crafted against a surrogate local model to be effective against an unknown model

#### Intra-technique Transferability
* Models A and B are trained using the same ML technique

#### Cross-technique Transferability
* Models A and B are trained using different ML techniques

#### Results
* Cell $(i, j)$ represents the percentage of adversarial samples produced using model $i$ that are misclassified by model $j$.
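A transferability matrix of this kind can be computed as sketched below. The two linear classifiers, the data, and $\epsilon$ are made-up illustration values, and FGSM is used as the crafting method only as an assumption; the notes do not say which attack produced the reported results.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(model, X):
    w, b = model
    return (X @ w + b > 0).astype(int)

def fgsm_batch(model, X, y, eps):
    """FGSM against a logistic-regression model, one row of X per sample."""
    w, b = model
    grad = (sigmoid(X @ w + b) - y)[:, None] * w
    return X + eps * np.sign(grad)

def transfer_matrix(models, X, y, eps):
    """T[i, j] = % of samples crafted on model i that model j misclassifies."""
    n = len(models)
    T = np.zeros((n, n))
    for i, source in enumerate(models):
        X_adv = fgsm_batch(source, X, y, eps)
        for j, target in enumerate(models):
            T[i, j] = 100.0 * np.mean(predict(target, X_adv) != y)
    return T

# Toy data and two hypothetical models with similar decision boundaries.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X @ np.array([1.0, -1.0]) > 0).astype(int)
models = [(np.array([1.0, -1.0]), 0.0), (np.array([0.8, -1.2]), 0.1)]

T = transfer_matrix(models, X, y, eps=0.3)
```

Diagonal cells measure the attack against the model it was crafted on; off-diagonal cells measure how well it transfers.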

## Generative Adversarial Network (GAN)
* Composed of two neural networks: a **Discriminator** and a **Generator**
  * The two compete against each other
* **Goal**: Given training data, generate new samples from the same distribution, i.e., learn $p_{model}(x)$ similar to $p_{data}(x)$

### Models
<img src="./pictures/gan_models.png" alt="Discriminator and Generator" width="800"/>

#### Discriminative Model
* Model that classifies data into two categories: real or fake

#### Generative Model
* Model trained on some distribution $D$ that, when given samples from some random distribution $Z$, produces a distribution $D'$ which is close to $D$

### Math
#### Binary Cross Entropy Loss
$$L(\hat{y}, y) = [y \log \hat{y} + (1-y) \log (1 - \hat{y})]$$

This is written without the usual leading minus sign, so better predictions give larger values.
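A small sketch of this quantity in NumPy, keeping the sign convention used in these notes (no leading minus, so the standard binary cross entropy is its negative). The function name `bce_objective` and the clipping constant `tiny` are assumptions added for illustration.

```python
import numpy as np

def bce_objective(y_hat, y, tiny=1e-12):
    """y*log(y_hat) + (1-y)*log(1-y_hat); clipped to avoid log(0)."""
    y_hat = np.clip(y_hat, tiny, 1.0 - tiny)
    return y * np.log(y_hat) + (1.0 - y) * np.log(1.0 - y_hat)

# Real sample (y = 1): only the log(y_hat) term survives.
real_term = bce_objective(0.9, 1.0)

# Generated sample (y = 0): only the log(1 - y_hat) term survives.
fake_term = bce_objective(0.1, 0.0)

# A perfect prediction gives (almost exactly) the maximum value 0.
perfect = bce_objective(1.0, 1.0)
```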

#### Discriminator
* Data from $p_{data}(x)$:
$$\begin{aligned}
y &= 1 \text{ // true label}\\
\hat{y} &= D(x) \text{ // output of the discriminator model}\\
L(\hat{y}, y) &= L(D(x), 1) = \log (D(x))
\end{aligned}$$

* Data from Generator:
$$\begin{aligned}
y &= 0 \text{ // true label}\\
\hat{y} &= D(G(z)) \text{ // output of the discriminator model}\\
L(\hat{y}, y) &= L(D(G(z)), 0) = \log (1 - D(G(z)))
\end{aligned}$$

<img src="./pictures/discriminator_loss1.png" alt="Discriminator Loss, y=1" width="500"/>
<img src="./pictures/discriminator_loss2.png" alt="Discriminator Loss, y=0" width="500"/>
<!-- ![Discriminator Loss, y=1](./pictures/discriminator_loss1.png)
![Discriminator Loss, y=0](./pictures/discriminator_loss2.png) -->

* Objective:
$$\max [\log (D(x)) + \log (1 - D(G(z)))]$$

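* Why max?
Look at the graphs. When:
  * $y = 1$ and $D(x) = 1$, loss is 0
  * $y = 1$ and $D(x) = 0$, loss is $-\infty$
  * $y = 0$ and $D(G(z)) = 0$, loss is 0
  * $y = 0$ and $D(G(z)) = 1$, loss is $-\infty$

Correct predictions give the largest value (0) and wrong predictions drive it towards $-\infty$. Thus, the discriminator needs to maximize this loss.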


#### Generator
$$\begin{aligned}
y &= 0 \text{ // because fake image}\\
\hat{y} &= D(G(z)) \text{ // output of Discriminator}\\
L(\hat{y}, y) &= L(D(G(z)), 0) = \log (1 - D(G(z)))
\end{aligned}$$

<img src="./pictures/generator_loss.png" alt="Generator Loss, y=0" width="500"/>
<!-- ![Generator Loss, y=0](./pictures/generator_loss.png) -->

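The generator wants $D(G(z))$ close to 1, which drives $\log (1 - D(G(z)))$ towards $-\infty$, so it minimizes. The term $\log D(x)$ does not depend on $G$, so adding it does not change the generator's optimum; it is included only so that the two objectives can be combined:
$$\min [\log (D(x)) + \log (1 - D(G(z)))]$$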

Combining the two, we have:
$$\min_{G} \max_{D} [\log (D(x)) + \log (1 - D(G(z)))]$$

Averaged over $m$ samples $x^{(i)}$ and $z^{(i)}$:
$$\min_{G} \max_{D} \dfrac{1}{m} \sum_{i=1}^{m} [\log (D(x^{(i)})) + \log (1 - D(G(z^{(i)})))]$$
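The averaged objective is easy to evaluate once the discriminator outputs are known. The sketch below takes those outputs as plain arrays; `gan_value` is a hypothetical helper name and the numbers are made up, with no actual networks involved.

```python
import numpy as np

def gan_value(d_real, d_fake):
    """Mini-batch estimate of the objective.

    d_real[i] = D(x_i) on real samples, d_fake[i] = D(G(z_i)) on generated ones.
    """
    d_real = np.asarray(d_real, dtype=float)
    d_fake = np.asarray(d_fake, dtype=float)
    return np.mean(np.log(d_real) + np.log(1.0 - d_fake))

# Discriminator is winning: real samples score high, generated ones low.
v_sharp = gan_value([0.9, 0.8], [0.1, 0.2])

# Discriminator is fooled: it outputs 0.5 for everything.
v_fooled = gan_value([0.5, 0.5], [0.5, 0.5])
```

The discriminator pushes this value up towards 0, while the generator pulls it down; when the discriminator outputs 0.5 everywhere the value is $-\log 4$.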

## Momentum
$$\begin{aligned}
\end{aligned}$$