## Introduction to Security
* __Risk__
  * a function of __threats__ exploiting __vulnerabilities__ to damage or obtain unauthorized access to __assets__
  * potential for loss or damage to an __asset__ as a result of a __threat__ exploiting a __vulnerability__
* __Assets__
  * what we are trying to protect
  * software or hardware
  * contain information or support information related activities
* __Vulnerabilities__
  * weakness in system that could be exploited or triggered by a threat
  * Source:
    * bad software / hardware
    * bad design
    * bad policy / configuration
    * system misuse
* __Threat__
  * specific means by which an attacker can put a system at risk
* __Attack__
  * when someone attempts to exploit a vulnerability
  * Kinds:
    * Passive (eavesdropping, keylogger)
    * Active (password guessing)
    * DoS or DDoS
* __Compromise__
  * when an attack is successful

### Security Goals / Triads
* Confidentiality
  * assets are _accessed_ (viewing, printing or simply knowing its existence) only by authorized parties
* Integrity
  * assets are _modified_ only by authorized parties
* Availability
  * degree to which system is accessible and in functioning condition

* Privacy
  * individual's desire to control who has access to their personal data

### Threat Assessment and Model
* Threat assessment
  * what kind of threats
  * capabilities of the adversary
  * limitations of the adversary
* Threat model
  * result of threat assessment
  * characterization of the threats a system might face

* Evaluation of security
  * Identify the security goals
    * what assets need protection
  * Perform a threat assessment
  * Security analysis
    * any feasible attacks that can violate security goals

## Attack Surface of ML
### Taxonomy
* Knowledge
  * Black
  * Gray
  * White
* Target
  * Training
  * Testing
* Goals
  * Confidence reduction
  * Misclassification

### Poisoning Attacks
* Manipulate training dataset
* Decision boundary is changed
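
To make the boundary shift concrete, here is a minimal sketch using a toy 1-D nearest-class-mean classifier. The classifier, the data points and the names `fit_boundary` and `predict` are all illustrative assumptions, not part of the notes.

```python
import numpy as np

def fit_boundary(x, y):
    """Toy 1-D classifier: the boundary is the midpoint of the two class means."""
    return (x[y == 0].mean() + x[y == 1].mean()) / 2.0

def predict(x, boundary):
    """Class 1 if the input lies above the boundary, else class 0."""
    return (x > boundary).astype(int)

# Clean training set: class 0 clusters near 1, class 1 clusters near 7.
x_train = np.array([0.0, 1.0, 2.0, 6.0, 7.0, 8.0])
y_train = np.array([0, 0, 0, 1, 1, 1])
clean_boundary = fit_boundary(x_train, y_train)

# Poisoning: the attacker injects points deep in class-1 territory labeled 0.
x_poison = np.array([9.0, 10.0, 11.0])
y_poison = np.array([0, 0, 0])
poisoned_boundary = fit_boundary(
    np.concatenate([x_train, x_poison]),
    np.concatenate([y_train, y_poison]),
)

probe = np.array([6.0])
label_before = predict(probe, clean_boundary)[0]
label_after = predict(probe, poisoned_boundary)[0]
```

The probe input never changes; only the training set does, and the learned boundary moves past it.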

### Evasion Attacks
* Manipulate input samples at test time to cause misclassification
* Decision boundary does not change but the input is changed

### Model Extraction
* Discover the parameters of the model

### Membership Inference
* Infer whether a data point is part of the training dataset (or drawn from the same distribution as the training data)

### Security Goals

| Goal                  | Attack               |
|-----------------------|----------------------|
| Data integrity        | Poisoning attack     |
| Model integrity       | Evasion attack       |
| Model confidentiality | Model extraction     |
| Privacy               | Membership inference |

## Adversarial Examples and Evasion Attacks

### Adversarial Example
* Input to model to cause the model to make a mistake
* Add noise, features to fool the model

### Evasion Attacks
* **Goal 1:** find an input that is not Y (say, iguana) but will be classified as Y
$$\begin{align*}
L(\hat{y}, y) &= \dfrac{1}{2} (\hat{y} - y_{iguana})^2\\
x &= x - \alpha \dfrac{\partial L}{\partial x}
\end{align*}$$

* **Goal 2:** find an input that is Y (say, cat) but will be classified as Y' (say, iguana)
$$\begin{align*}
L(\hat{y}, y) &= \dfrac{1}{2} (\hat{y} - y_{iguana})^2 + \dfrac{\lambda}{2} \lVert x - x_{cat} \rVert^2\\
x &= x - \alpha \left( (\hat{y} - y_{iguana}) \dfrac{\partial \hat{y}}{\partial x} + \lambda (x - x_{cat}) \right)
\end{align*}$$
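
The Goal 2 update can be sketched as a short loop. This assumes a linear model $\hat{y} = w^Tx + b$ and a penalty scaled so that its gradient is $\lambda (x - x_{cat})$; the weights, the starting point and the function name `targeted_attack` are made up for illustration.

```python
import numpy as np

def targeted_attack(w, b, x_start, y_target, alpha=0.01, lam=0.1, steps=200):
    """Gradient descent on the input x (the model parameters stay fixed)."""
    x = x_start.copy()
    for _ in range(steps):
        y_hat = w @ x + b
        # Gradient of the squared-error term plus the stay-close penalty.
        grad = (y_hat - y_target) * w + lam * (x - x_start)
        x = x - alpha * grad
    return x

w = np.array([1.0, -2.0, 0.5])      # hypothetical model weights
b = 0.0
x_cat = np.array([1.0, 1.0, 1.0])   # hypothetical starting input
y_iguana = 3.0                      # hypothetical target output

x_adv = targeted_attack(w, b, x_cat, y_iguana)
y_before = w @ x_cat + b
y_after = w @ x_adv + b
```

Raising `lam` keeps `x_adv` closer to `x_cat` at the cost of landing further from the target output.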

### Fast-Gradient Sign Method (FGSM)
* Perturb the image so that it is misclassified but still looks like the original
$$adv\_x = x + \epsilon * sign(\nabla_x J(\theta, x, y))$$
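
A minimal sketch of one FGSM step, assuming a logistic-regression model with binary cross-entropy as $J$ (so that $\nabla_x J = (p - y)\,w$). The weights, the input and $\epsilon$ are illustrative values only.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm(x, y, w, b, eps):
    """One FGSM step: move every feature by eps in the direction that increases J."""
    p = sigmoid(w @ x + b)
    grad_x = (p - y) * w            # gradient of binary cross-entropy w.r.t. x
    return x + eps * np.sign(grad_x)

w = np.array([2.0, -1.0, 0.5])      # hypothetical model weights
b = 0.0
x = np.array([0.3, 0.1, 0.2])       # classified as class 1
y = 1.0

adv_x = fgsm(x, y, w, b, eps=0.25)
p_clean = sigmoid(w @ x + b)
p_adv = sigmoid(w @ adv_x + b)
```

Each feature moves by at most `eps`, which is what keeps the adversarial input visually close to the original.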

* How to change the objective function to misclassify?
???

#### Exercise
$$\begin{aligned}
w &= \begin{bmatrix}
1 & 3 & -1 & 2 & 2 & 3
\end{bmatrix}^T\\
x &= \begin{bmatrix}
1\\
-1\\
2\\
0\\
3\\
-2
\end{bmatrix}\\
\hat{y} &= w^Tx + b\\
&= -4 \quad \text{(with } b = 0 \text{)}
\end{aligned}$$

* How to change $x \rightarrow x^*$ so that $\hat{y}$ changes radically but $x^* \approx x$?
$$\begin{aligned}
\dfrac{\partial \hat{y}}{\partial x} &= w\\
x^* &= x + \epsilon w, \quad \epsilon = 0.2\\
&= \begin{bmatrix}
1\\
-1\\
2\\
0\\
3\\
-2
\end{bmatrix} + \begin{bmatrix}
0.2 * 1\\
0.2 * 3\\
\dots
\end{bmatrix}\\
&= \begin{bmatrix}
1.2\\
-0.4\\
\dots
\end{bmatrix}\\
\end{aligned}$$

Thus,
$$\begin{aligned}
\hat{y} &= 1.6
\end{aligned}$$
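
The arithmetic can be checked numerically. The sketch assumes a six-dimensional input $x = (1, -1, 2, 0, 3, -2)$, $b = 0$ and $\epsilon = 0.2$, which are the values that reproduce the stated outputs $-4$ and $1.6$ for the given $w$.

```python
import numpy as np

w = np.array([1.0, 3.0, -1.0, 2.0, 2.0, 3.0])
x = np.array([1.0, -1.0, 2.0, 0.0, 3.0, -2.0])  # assumed input (see lead-in)
b = 0.0                                          # assumed bias
eps = 0.2

y_hat = w @ x + b            # output on the original input

# For a linear model the gradient of y_hat w.r.t. x is w itself.
x_star = x + eps * w
y_hat_star = w @ x_star + b  # output on the perturbed input
```

The output moves by $\epsilon \lVert w \rVert^2 = 0.2 \times 28 = 5.6$ even though each feature changes by at most $0.6$.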

## Black Box Attacks and Transferability
* Steps
  1. Query the remote ML model through its API with chosen inputs to obtain their labels
  2. Use this labeled data to train a local surrogate ML model
  3. Use the local model to craft adversarial examples that are also misclassified by the remote model
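
The three steps can be sketched end to end on a toy problem. Everything here is an assumption for illustration: the remote model is a hidden linear classifier reachable only through `remote_predict`, the surrogate is logistic regression trained by gradient descent, and the adversarial examples are crafted with an FGSM-style step using a deliberately large `eps`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical remote model: the attacker only sees the labels it returns.
_w_remote = np.array([1.5, -2.0, 1.0])
_b_remote = 0.2

def remote_predict(X):
    return (X @ _w_remote + _b_remote > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Step 1: query the remote model to label attacker-chosen inputs.
X_query = rng.normal(size=(500, 3))
y_query = remote_predict(X_query)

# Step 2: train a local surrogate (logistic regression, plain gradient descent).
w_s, b_s = np.zeros(3), 0.0
for _ in range(500):
    p = sigmoid(X_query @ w_s + b_s)
    w_s -= 0.5 * X_query.T @ (p - y_query) / len(y_query)
    b_s -= 0.5 * np.mean(p - y_query)

# Step 3: craft adversarial examples on the surrogate, then try them remotely.
X_test = rng.normal(size=(200, 3))
y_test = remote_predict(X_test)
p_test = sigmoid(X_test @ w_s + b_s)
grad = (p_test - y_test)[:, None] * w_s[None, :]   # surrogate loss gradient w.r.t. x
X_adv = X_test + 1.0 * np.sign(grad)               # eps = 1.0, large for a toy demo

surrogate_agreement = np.mean((X_test @ w_s + b_s > 0) == (y_test == 1))
transfer_rate = np.mean(remote_predict(X_adv) != y_test)
```

`transfer_rate` is the fraction of surrogate-crafted examples that also flip the remote model's label, which is the quantity the transferability discussion measures.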

### Transferability
* Ability of an attack crafted against a surrogate local model to be effective against an unknown model

#### Intra-technique Transferability
* Models A and B are trained using the same ML technique

#### Cross-technique Transferability
* Models A and B are trained using different ML techniques

#### Results
* Cell $(i, j)$ represents percentage of adversarial samples produced using model $i$ misclassified by model $j$.

## Generative Adversarial Network (GAN)
* Comprised of two neural networks - **Discriminator** and **Generator**
  * Both compete against each other
* **Goal**: Given training data, generate new samples from the same distribution, i.e., learn $p_{model}(x)$ similar to $p_{data}(x)$

### Models
<img src="./pictures/gan_models.png" alt="Discriminator and Generator" width="800"/>

#### Discriminative Model
* Model that classifies data into two categories - fake or not

#### Generative Model
* Model trained on some distribution $D$ that, when given samples from a random distribution $Z$, produces a distribution $D'$ which is close to $D$

### Math:
##### Binary Cross Entropy Loss
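$$L(\hat{y}, y) = y \log \hat{y} + (1-y) \log (1 - \hat{y})$$
* Written here without the usual leading minus sign, so a perfect prediction gives 0 and the worst prediction gives $-\infty$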

#### Discriminator
* Data from $p_{data}(x)$:
$$\begin{aligned}
y &= 1 \text{ // true label}\\
\hat{y} &= D(x) \text{ // output of the discriminator model}\\
L(\hat{y}, y) &= L(D(x), 1) = \log (D(x))
\end{aligned}$$

* Data from Generator:
$$\begin{aligned}
y &= 0 \text{ // true label}\\
\hat{y} &= D(G(z)) \text{ // output of the discriminator model}\\
L(\hat{y}, y) &= L(D(G(z)), 0) = \log (1 - D(G(z)))
\end{aligned}$$

<img src="./pictures/discriminator_loss1.png" alt="Discriminator Loss, y=1" width="500"/>
<img src="./pictures/discriminator_loss2.png" alt="Discriminator Loss, y=0" width="500"/>
<!-- ![Discriminator Loss, y=1](./pictures/discriminator_loss1.png)
![Discriminator Loss, y=0](./pictures/discriminator_loss2.png) -->

* Objective:
$$\max [\log (D(x)) + \log (1 - D(G(z)))]$$

* Why max?
Look at the graphs, when:
  * $y = 1$ and $D(x) = 1$, loss is 0
  * $y = 1$ and $D(x) = 0$, loss is $-\infty$
  * $y = 0$ and $D(x) = 0$, loss is 0
  * $y = 0$ and $D(x) = 1$, loss is $-\infty$
Thus, the discriminator needs to maximize this quantity (0 is its best value, $-\infty$ its worst).


#### Generator
$$\begin{aligned}
y &= 0 \text{ // because fake image}\\
\hat{y} &= D(G(z)) \text{ // output of Discriminator}\\
L(\hat{y}, y) &= L(D(G(z)), 0) = \log (1 - D(G(z)))
\end{aligned}$$

<img src="./pictures/generator_loss.png" alt="Generator Loss, y=0" width="500"/>
<!-- ![Generator Loss, y=0](./pictures/generator_loss.png) -->

The term $\log D(x)$ does not depend on $G$; we add it just so that we can combine the two:
$$\min_{G} [\log (D(x)) + \log (1 - D(G(z)))]$$

Combining the two, we have:
$$\min_{G} \max_{D} [\log (D(x)) + \log (1 - D(G(z)))]$$

For all samples:
$$\min_{G} \max_{D} \dfrac{1}{m} \sum_{i=1}^{m} \left[\log D(x^{(i)}) + \log \left(1 - D(G(z^{(i)}))\right)\right]$$
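
To see how the averaged objective behaves, the sketch below evaluates it for a few hand-picked discriminator outputs. The function name `gan_value` and the numbers are illustrative; no networks are trained here.

```python
import numpy as np

def gan_value(d_real, d_fake):
    """Average of log D(x) + log(1 - D(G(z))) over a batch of m pairs."""
    d_real = np.asarray(d_real, dtype=float)   # D(x) on real samples
    d_fake = np.asarray(d_fake, dtype=float)   # D(G(z)) on generated samples
    return float(np.mean(np.log(d_real) + np.log(1.0 - d_fake)))

# Discriminator doing well: real scored near 1, fake scored near 0.
strong_discriminator = gan_value([0.9, 0.95], [0.1, 0.05])

# Generator doing well: the fakes are also scored near 1.
strong_generator = gan_value([0.9, 0.95], [0.9, 0.95])

# Discriminator cannot tell the difference: outputs 0.5 everywhere.
undecided = gan_value([0.5, 0.5], [0.5, 0.5])
```

The discriminator pushes the value up toward 0, the generator pushes it down, and an undecided discriminator gives $2 \log 0.5 = -\log 4$.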

### Issues
* Vanishing gradient
* Mode collapse (generator always produces same output)
* Nash equilibrium is hard to reach - both models must achieve convergence concurrently
* Counting
* Perspective

## Adversarial Example Detection

### Attribute-Steered Model
#### Attribute witness extraction
* Intersection of __attribute substitution__ and __attribute preservation__.
* New model is created by:
  * __Neuron weakening:__ weaken the non-witness neurons
  * __Neuron strengthening:__ strengthen the witness neurons

* Detection (??)
  * what does false positive on benign input mean?
  * incorrect classification to a class, is it classification with trigger?

### Neural network invariant checking
* Allow correct behaviors and forbid malicious behaviors (eg: assert certain _behaviors_)
* __Value invariants:__ possible value distribution for each neuron
* __Provenance invariants:__ possible delta between the values [_pattern_] of two layers of neurons
* Examples (??)

## Adversarial Sample Defenses
### Gradient Masking
* Hide or destroy the gradient (so that all gradient based attacks fail)
* Defense techniques:
  * __Distillation defense:__ changes the scale of last hidden layer
  * __Input preprocessing:__ transforms input images by resizing, cropping, discretizing pixels
  * __Defense GAN:__ uses GAN to transform perturbed images into clean images

#### Evading gradient masking
* Approximate gradients
* Hiding or breaking gradients makes the loss surface zig-zaggy; during the backward pass, replace the function that has a difficult gradient with one that has a well-behaved gradient
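
A minimal sketch of this idea, in the spirit of what is usually called backward-pass differentiable approximation. It assumes the defense is input quantization in front of a logistic-regression model; the weights, the input and the step size are illustrative values.

```python
import numpy as np

def quantize(x, levels=10):
    """Preprocessing defense: discretize features (gradient is zero almost everywhere)."""
    return np.round(x * levels) / levels

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.array([2.0, -1.0])   # hypothetical model weights
b = 0.0

def defended_model(x):
    return sigmoid(w @ quantize(x) + b)

x = np.array([0.32, 0.11])
y = 1.0

# The true gradient through the defense is flat, so a naive attack gets no signal.
h = 1e-4
numeric_grad = np.array(
    [(defended_model(x + h * e) - defended_model(x)) / h for e in np.eye(2)]
)

# Approximation: forward pass uses quantize, backward pass treats it as identity.
p = defended_model(x)
approx_grad = (p - y) * w
x_adv = x + 0.3 * np.sign(approx_grad)

p_adv = defended_model(x_adv)
```

The finite-difference gradient is zero, yet the step built from the approximate gradient still flips the defended model's prediction.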

### Certified Adversarial Robustness
* Model gives a prediction and a certificate that the prediction is constant (holds) within an $\ell_2$ ball around the input
* Randomized smoothing leads to smoother decision boundaries
  * Smooth $f$ into a new classifier $g$ (the “smoothed classifier”) as follows:
$$g(x) = \text{the most probable prediction by } f \text{ of random Gaussian corruptions of } x$$
* Large random perturbations “drown out” small adversarial perturbations (??)
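
The definition of $g$ can be approximated by sampling. The sketch below is a Monte Carlo majority vote only; it does not compute the certificate (the certified radius), and `base_classifier` is a made-up stand-in for $f$.

```python
import numpy as np

def base_classifier(x):
    """Hypothetical base classifier f: class 1 if the features sum to a positive value."""
    return int(np.sum(x) > 0)

def smoothed_classifier(f, x, sigma, n_samples, rng):
    """Estimate g(x): the class f predicts most often on Gaussian corruptions of x."""
    noise = rng.normal(scale=sigma, size=(n_samples, x.shape[0]))
    votes = np.array([f(x + n) for n in noise])
    return int(np.bincount(votes).argmax())

rng = np.random.default_rng(0)
x = np.array([2.0, 2.0])
g_x = smoothed_classifier(base_classifier, x, sigma=0.5, n_samples=1000, rng=rng)
```

A small adversarial shift of `x` changes only a few of the votes, which is the intuition behind the noise "drowning out" the perturbation.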

## Confidentiality and Privacy Threats in ML
### Model Inversion
* Extract sensitive inputs by leveraging knowledge about the model structure and some information about an individual or object

#### Type 1
* For example, if one of the features is sensitive, attacker can judge the feature by setting it to 0 or 1 and checking the output label, $y$
* What if $y$ does not change on changing feature?
* White box or black box? White box - the attacker has information about the parameters; black box - the attacker can change an input feature but cannot see the parameters.
* Formally:
  * infer $x_n$ given $f$, $x_1, x_2, ..., x_{n-1}$ and $y$ (??) where $x_n \in \{v_1, v_2, ..., v_s\}$
  * compute $y_j = f(x_1, x_2, ..., x_{n-1}, v_j)$ for each $j$
  * output $v_j$ that maximizes:
  $$dist(y, y_j) \times P(v_j | x_1, x_2, ..., x_{n-1})$$
  * what is $y$? (??)
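
The enumeration can be sketched as follows. Two assumptions are made here: $y$ is taken to be the model output observed for the individual, and $dist$ is treated as an agreement score that is highest when $y_j$ matches $y$. The toy model, the prior and all names are hypothetical.

```python
def invert_sensitive_feature(f, known_features, y_observed, candidates, prior, agreement):
    """Try every candidate value v_j and keep the one with the highest score."""
    best_value, best_score = None, float("-inf")
    for v in candidates:
        y_j = f(known_features + [v])                    # y_j = f(x_1, ..., x_{n-1}, v_j)
        score = agreement(y_observed, y_j) * prior[v]    # dist(y, y_j) * P(v_j | ...)
        if score > best_score:
            best_value, best_score = v, score
    return best_value

def toy_model(x):
    """Hypothetical model f with three binary features; the last one is sensitive."""
    return int(x[0] + 2 * x[1] + 3 * x[2] > 3)

def agreement(y, y_j):
    """Assumed score: high when the candidate reproduces the observed output."""
    return 1.0 if y == y_j else 0.05

prior = {0: 0.7, 1: 0.3}   # assumed P(v_j | known features)

inferred_when_y1 = invert_sensitive_feature(toy_model, [1, 0], 1, [0, 1], prior, agreement)
inferred_when_y0 = invert_sensitive_feature(toy_model, [1, 0], 0, [0, 1], prior, agreement)
```

If the output does not change across candidates, the agreement factor is identical for all of them and the prior alone decides.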

#### Type 2
* Given $f$ and $y$, infer $X$
* Use gradient descent to search for input $X$ which maximizes probability of $y$
* White box or black box?

### Model Extraction
* Learn a close approximation of the model $f$ using as few queries to the model as possible
* For example, logistic regression function can be converted to a linear equation in $n+1$ variables
* __Extraction attack:__ learn model architecture or parameters
* __Oracle attack:__ construct a substitute model
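
For the logistic regression case, $n + 1$ queries suffice because $\log \frac{p}{1 - p} = w^Tx + b$ is linear in the $n + 1$ unknowns. The sketch assumes the API returns exact probabilities; the secret parameters and the `query` function are hypothetical.

```python
import numpy as np

# Hypothetical secret model behind the API.
_w_secret = np.array([0.8, -1.2, 0.5])
_b_secret = 0.3

def query(X):
    """Black-box API: returns the predicted probability for each row of X."""
    return 1.0 / (1.0 + np.exp(-(X @ _w_secret + _b_secret)))

n = 3
# n + 1 queries: the zero vector and the n unit vectors.
X = np.vstack([np.zeros((1, n)), np.eye(n)])
p = query(X)

# Invert the sigmoid to get linear equations w.x + b = logit(p).
logits = np.log(p / (1.0 - p))
A = np.hstack([X, np.ones((n + 1, 1))])
theta = np.linalg.solve(A, logits)

w_hat, b_hat = theta[:n], theta[n]
```

If the API only returned labels or rounded probabilities, this exact recovery would no longer work and more queries would be needed.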

### Membership Inference
* Given an input $x$ and black-box access to the model $f$, determine if $x \in D$, meaning whether $x$ was part of the training data (or distribution (??))
* Privacy concern?
  * If $x$ was used to train a medical model and one can determine that $x \in D$, one can infer whether the individual has a health issue

#### Attack stages
1. Development of shadow dataset
  * Goal: develop a dataset $D'$ which closely emulates the original dataset $D$
  * Techniques: statistics-based, query-based, active learning, region-based
2. Generation of attack model training dataset
  * Takes input from shadow dataset $D'$ as $(x', y')$ and outputs a probability vector $p = (p_1, p_2, ..., p_k)$ and a binary label indicating "in" or "out"
  * Partition $D'$ into $D_1, D_2, ..., D_s$
  * $\forall j,$ train $f_j$ to output "in" for $D_j$ and "out" for $D' \setminus D_j$
  * Obtain attack training data, $p = (p_1, p_2, ..., p_k)$ and label "in" or "out"
3. Training and deployment of membership inference attack model
  * Given input of probability vector, return "in" or "out"
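
Stage 2 can be sketched as below. The shadow dataset here is random placeholder data standing in for $D'$, and the shadow models are small softmax regressions; both are assumptions made to keep the example self-contained.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(Z):
    Z = Z - Z.max(axis=1, keepdims=True)
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def train_softmax(X, y, k, steps=300, lr=0.5):
    """Hypothetical shadow-model trainer: softmax regression by gradient descent."""
    W = np.zeros((X.shape[1], k))
    Y = np.eye(k)[y]
    for _ in range(steps):
        W -= lr * X.T @ (softmax(X @ W) - Y) / len(y)
    return W

# Stage 1 is assumed done: placeholder shadow dataset D' with k classes.
k, s = 3, 4
X_shadow = rng.normal(size=(120, 5))
y_shadow = rng.integers(0, k, size=120)

# Stage 2: partition D', train one shadow model per part, record (p, in/out).
parts = np.array_split(rng.permutation(120), s)
attack_X, attack_y = [], []
for j in range(s):
    in_idx = parts[j]
    out_idx = np.concatenate([parts[i] for i in range(s) if i != j])
    W_j = train_softmax(X_shadow[in_idx], y_shadow[in_idx], k)
    attack_X.append(softmax(X_shadow[in_idx] @ W_j))     # members of D_j -> "in"
    attack_y.append(np.ones(len(in_idx)))
    attack_X.append(softmax(X_shadow[out_idx] @ W_j))    # rest of D' -> "out"
    attack_y.append(np.zeros(len(out_idx)))

attack_X = np.vstack(attack_X)         # probability vectors p
attack_y = np.concatenate(attack_y)    # 1 = "in", 0 = "out"
```

Stage 3 would then fit any binary classifier on `attack_X` and `attack_y` and apply it to the probability vector the target model returns for $x$.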

## Differential Privacy
* Guarantees:
  * Raw data will not be viewed
  * Output will have distortions

### Confidence Interval
* Range of values within which we are fairly sure (say, $x\%$) the true value lies
$$\overline{X} \pm Z\dfrac{s}{\sqrt{n}}$$
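
A small numerical sketch of the formula, assuming $Z = 1.96$ (the usual value for a 95% interval) and a made-up sample.

```python
import numpy as np

def confidence_interval(samples, z=1.96):
    """Return (lower, upper) = mean -/+ z * s / sqrt(n), with s the sample std."""
    x = np.asarray(samples, dtype=float)
    mean = x.mean()
    s = x.std(ddof=1)                      # sample standard deviation
    half_width = z * s / np.sqrt(len(x))
    return mean - half_width, mean + half_width

lower, upper = confidence_interval([4.0, 5.0, 6.0, 5.0, 5.0])
```

Collecting more samples shrinks the interval at a rate of $1/\sqrt{n}$.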

### Standard DP
* Analyst sends a query to a software called _DP guard_
* Guard sends the query to the DB or model to retrieve the output
* Guard adds __noise__ to the output (in order to protect the confidentiality of the individual whose data was accessed from DB) and sends back the response to the analyst

### Local DP
* Users anonymize their data themselves and send it to the aggregator
* Aggregator doesn't have access to the real data
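
One classic way for users to anonymize a single yes/no answer locally is randomized response. The notes do not name this technique; it is used here only as an illustration, with made-up data and $\epsilon = 1$.

```python
import numpy as np

def randomized_response(bits, eps, rng):
    """Each user reports the true bit with probability e^eps / (1 + e^eps), else flips it."""
    p_truth = np.exp(eps) / (1.0 + np.exp(eps))
    keep = rng.random(bits.shape[0]) < p_truth
    return np.where(keep, bits, 1 - bits)

def estimate_fraction(reports, eps):
    """Aggregator side: undo the known flipping bias to estimate the true fraction of 1s."""
    p_truth = np.exp(eps) / (1.0 + np.exp(eps))
    return (reports.mean() - (1.0 - p_truth)) / (2.0 * p_truth - 1.0)

rng = np.random.default_rng(0)
true_bits = (rng.random(20000) < 0.3).astype(int)   # users' private answers
reports = randomized_response(true_bits, eps=1.0, rng=rng)
estimate = estimate_fraction(reports, eps=1.0)
```

The aggregator never sees `true_bits`, yet the population-level estimate stays close to the true fraction; the price is extra variance in that estimate.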

#### Advantages / disadvantages
* Local DP less prone to data leak as the aggregator does not have access to real data
* Might be less accurate (?)

### Formalism
* Whether or not one more record is added to (or removed from) $D$, the probability of obtaining a given result $R$ should be nearly the same. $A = 1$ is the ideal case in which both probabilities are identical; the further $A$ is from 1, the more the two results can deviate.
$$\dfrac{P(Q(D_I) = R)}{P(Q(D_{I \pm i}) = R)} \leq A$$

* Putting $A = e^\epsilon$:
$$\dfrac{P(R \mid D_I)}{P(R \mid D_{I \pm i})} \leq e^\epsilon$$

#### Global sensitivity
* $F(D) = X$ is a deterministic, non-privatized function over dataset $D$ which returns $X$, a vector of $k$ real numbers.
* Global sensitivity $\Delta F$ is the worst-case $L_1$ difference (the sum of the absolute coordinate differences) between the outputs on datasets $D_1$ and $D_2$ differing by at most one element:
$$\Delta F = \max_{D_1, D_2} \left|\left| F(D_1) - F(D_2) \right|\right|_{L1}$$

#### Noise adding mechanism
* Privatizing by adding noise from Laplace distribution:
$$P(R = x | D \text{ is true world}) = \dfrac{\epsilon}{2 \Delta F} \exp\left(-\dfrac{\left| x - F(D) \right| \epsilon}{\Delta F}\right)$$
* Laplace ($\epsilon$-differential privacy)
$$R = F(D) + Lap\left(\dfrac{\Delta F}{\epsilon}\right)$$
* Exponential ($\epsilon$-differential privacy)
  * Works with both numeric and categorical data
  * Releases the identity of the element with MAX noisy score and not that of the score itself (??)
* Gaussian ($(\epsilon, \delta)$-differential privacy)
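
The Laplace mechanism can be sketched for a counting query, whose global sensitivity is 1 because adding or removing one record changes the count by at most 1. The data, the query `count_over_40` and $\epsilon = 0.5$ are illustrative assumptions.

```python
import numpy as np

def count_over_40(ages):
    """Counting query F(D): one record changes the result by at most 1."""
    return float(np.sum(np.asarray(ages) > 40))

def laplace_mechanism(true_answer, sensitivity, eps, rng):
    """Release F(D) plus Laplace noise with scale (sensitivity / eps)."""
    return true_answer + rng.laplace(loc=0.0, scale=sensitivity / eps)

def laplace_pdf(x, center, sensitivity, eps):
    """Density of the released value: eps / (2 dF) * exp(-|x - F(D)| * eps / dF)."""
    return eps / (2.0 * sensitivity) * np.exp(-abs(x - center) * eps / sensitivity)

rng = np.random.default_rng(0)
ages = [23, 45, 67, 34, 52, 41]
neighbor = ages[:-1]               # same dataset with one record removed
eps, delta_f = 0.5, 1.0

true_count = count_over_40(ages)
noisy_count = laplace_mechanism(true_count, delta_f, eps, rng)

# Ratio of output densities on the two neighboring datasets at some value x.
x = 3.2
ratio = laplace_pdf(x, count_over_40(ages), delta_f, eps) / laplace_pdf(
    x, count_over_40(neighbor), delta_f, eps
)
```

For any `x` the ratio stays between $e^{-\epsilon}$ and $e^{\epsilon}$, which is the bound from the formalism; a smaller `eps` means more noise and a tighter bound.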