## Application
- Fraud Detection
    - $x^{(i)}$ = features of user $i$'s activities
    - Identify unusual users by checking which have $P(x) < \epsilon$
- Manufacturing
- Monitoring computers in a data center
    - Features: memory usage, CPU load, network traffic, etc.

## Algorithm
$P(x) = \prod_{j=1}^n P(x_j; \mu_j,\sigma_j^2)$
- $\mu_j = \frac{1}{m}\sum_{i=1}^m x_j^{(i)}$
- $\sigma_j^{2}=\frac{1}{m}\sum_{i=1}^m(x_j^{(i)}-\mu_j)^2$

If $P(x) < \epsilon$, flag $x$ as an anomaly; otherwise, treat it as normal.
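
The estimation formulas and the threshold rule above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the helper names `fit_gaussian` and `density`, the synthetic training data, and the value `epsilon = 1e-4` are all assumptions made for the example.

```python
import math
import random

def fit_gaussian(X):
    """Return per-feature means and variances (dividing by m, as in the formulas above)."""
    m, n = len(X), len(X[0])
    mu = [sum(row[j] for row in X) / m for j in range(n)]
    sigma2 = [sum((row[j] - mu[j]) ** 2 for row in X) / m for j in range(n)]
    return mu, sigma2

def density(x, mu, sigma2):
    """P(x): product of the univariate Gaussian densities of each feature."""
    p = 1.0
    for x_j, mu_j, s2_j in zip(x, mu, sigma2):
        p *= math.exp(-((x_j - mu_j) ** 2) / (2.0 * s2_j)) / math.sqrt(2.0 * math.pi * s2_j)
    return p

# Synthetic "normal" data with two features (assumed for illustration only).
random.seed(0)
X_train = [[random.gauss(5.0, 1.0), random.gauss(10.0, 2.0)] for _ in range(1000)]
mu, sigma2 = fit_gaussian(X_train)

epsilon = 1e-4  # hypothetical threshold; in practice chosen on a cross-validation set
X_new = [[5.1, 9.5], [12.0, 25.0]]  # one typical point, one far from the training data
flags = [density(x, mu, sigma2) < epsilon for x in X_new]  # True means "anomaly"
```

The first point lies close to the training mean and is not flagged; the second is several standard deviations away in both features and falls below the threshold.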

## Implementation

> Aircraft engines example

### Split the dataset
- Original Dataset: 10,000 normal engines and 20 anomalous engines
- Training set: 6,000 normal and 0 anomalous
- Cross Validation: 2,000 normal and 10 anomalous
- Test set: 2,000 normal and 10 anomalous
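
A rough sketch of this split, assuming the engines are identified by integer IDs (the ID ranges and variable names below are hypothetical). The anomalous engines are kept out of the training set and divided between the cross-validation and test sets.

```python
import random

random.seed(0)
normal_ids = list(range(10_000))             # hypothetical IDs of normal engines
anomalous_ids = list(range(10_000, 10_020))  # hypothetical IDs of anomalous engines
random.shuffle(normal_ids)
random.shuffle(anomalous_ids)

# Training set: normal engines only, so no labels are needed.
train = normal_ids[:6000]

# Cross-validation and test sets: (id, label) pairs, with label 1 for anomalous.
cv = [(i, 0) for i in normal_ids[6000:8000]] + [(i, 1) for i in anomalous_ids[:10]]
test = [(i, 0) for i in normal_ids[8000:]] + [(i, 1) for i in anomalous_ids[10:]]
```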

### Evaluation Metrics
- True positive, false positive, false negative, true negative
- Precision/Recall
- $F_1$ score

Use the cross-validation set to choose the value of $\epsilon$ that gives the best result on these metrics (for example, the highest $F_1$ score).
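
One way to pick the threshold is to scan candidate values between the smallest and largest density seen on the cross-validation set and keep the one with the highest $F_1$ score. The sketch below assumes the densities `p_cv` have already been computed; the function names, the number of steps, and the toy values are illustrative assumptions.

```python
def f1_score(y_true, y_pred):
    """F1 score, treating label 1 (anomaly) as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # precision or recall is zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def select_epsilon(p_cv, y_cv, num_steps=1000):
    """Linear scan over candidate thresholds; returns (best_epsilon, best_f1)."""
    lo, hi = min(p_cv), max(p_cv)
    best_eps, best_f1 = lo, 0.0
    for k in range(1, num_steps + 1):
        eps = lo + (hi - lo) * k / num_steps
        y_pred = [1 if p < eps else 0 for p in p_cv]
        score = f1_score(y_cv, y_pred)
        if score > best_f1:
            best_eps, best_f1 = eps, score
    return best_eps, best_f1

# Toy cross-validation densities and labels (made up for illustration).
p_cv = [0.08, 0.07, 0.09, 0.06, 0.001, 0.0005, 0.05]
y_cv = [0, 0, 0, 0, 1, 1, 0]
best_eps, best_f1 = select_epsilon(p_cv, y_cv)
```

Because anomalies are rare, accuracy would look high even for a model that never flags anything, which is why the scan scores each candidate with $F_1$ instead.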

### Comparison with Supervised Learning
#### Difference
![W9-AS-DIF](Plots/W9-AS-DIF.png)

#### Different Application
![W9-AS-DIFF2](Plots/W9-AS-DIFF2.png)

Note: these anomaly detection applications can also be solved with supervised learning if there are enough positive examples.

### Select Features
#### Non-gaussian Features
- Convert to Gaussian form with functions like `log(x + c)`
- Use `hist()` (histogram) to inspect the distribution before and after the transformation
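
As a numeric companion to eyeballing a histogram, the sketch below compares the sample skewness of a right-skewed feature before and after applying `log(x + c)`. The log-normal test data, the offset `c = 1.0`, and the `skewness` helper are assumptions for the example; a skewness closer to zero suggests a more symmetric, more Gaussian-like shape.

```python
import math
import random

def skewness(values):
    """Sample skewness: third central moment divided by variance^(3/2)."""
    m = len(values)
    mean = sum(values) / m
    var = sum((v - mean) ** 2 for v in values) / m
    return sum((v - mean) ** 3 for v in values) / (m * var ** 1.5)

random.seed(0)
x = [random.lognormvariate(0.0, 1.0) for _ in range(5000)]  # heavily right-skewed feature

c = 1.0  # hypothetical offset; try several values and re-check the histogram
x_log = [math.log(v + c) for v in x]

skew_before = skewness(x)
skew_after = skewness(x_log)  # expected to be much closer to zero
```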

#### Choose/Create Features that might take on unusually large/small values during an anomaly
- For example, suppose $x_1$ and $x_2$ have a linear relationship ($x_1 \approx x_2$) during normal operation, but that relationship is violated during an anomaly. We can create $x_3 = \frac{x_1}{x_2}$, which stays roughly constant under normal conditions but becomes very large or very small when there is an anomaly.
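
A tiny illustration of the ratio feature, with made-up readings. The function name `ratio_feature` and the small `guard` term that avoids division by zero are assumptions added for the sketch.

```python
def ratio_feature(x1, x2, guard=1e-12):
    """x3 = x1 / x2; `guard` is a hypothetical safeguard against x2 == 0."""
    return x1 / (x2 + guard)

# Made-up (x1, x2) readings: the two features track each other in normal operation.
normal_readings = [(10.0, 10.2), (20.0, 19.7), (15.0, 15.1)]
anomalous_reading = (30.0, 5.0)  # x1 is high while x2 stays low

x3_normal = [ratio_feature(x1, x2) for x1, x2 in normal_readings]
x3_anomalous = ratio_feature(*anomalous_reading)
```

Neither 30.0 nor 5.0 is extreme on its own scale here, but the ratio stays near 1 for the normal readings and jumps to about 6 for the anomalous one.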

## Multivariate Gaussian Distribution
Model all features $x_j$ jointly in a single $p(x)$ instead of evaluating each $p(x_j)$ separately
- All dimensions $x_j$ jointly affect the final $p(x)$

### Model
Parameters:
- $\mu \in \mathbb{R}^n$
- $\Sigma \in \mathbb{R}^{n \times n}$: covariance matrix
    - $|\Sigma|$: determinant of $\Sigma$

Model
![W9-MGD-MODEL](Plots/W9-MGD-MODEL.png)
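
For readers without access to the figure, the standard textbook density of a multivariate Gaussian with the parameters above is given below. It is assumed, not verified, that this matches the figure.

$$
p(x;\mu,\Sigma) = \frac{1}{(2\pi)^{n/2}\,|\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)\right)
$$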

### Visualization
#### Change in Variance
![W9-MGD-1](Plots/W9-MGD-1.png)

#### Change in Mean
![W9-MGD-2](Plots/W9-MGD-2.png)

### Applying Multivariate Gaussian Distribution in Anomaly Detection
![W9-MGD-ALGO](Plots/W9-MGD-ALGO.png)
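
A minimal NumPy sketch of the usual procedure: fit $\mu$ and $\Sigma$ on the training set, evaluate $p(x)$ for new examples, and flag those below $\epsilon$. It is assumed that this is what the figure outlines. The function names, the synthetic correlated data, and `epsilon = 1e-3` are illustrative choices.

```python
import numpy as np

def fit_multivariate_gaussian(X):
    """Estimate the mean vector and covariance matrix (dividing by m)."""
    m = X.shape[0]
    mu = X.mean(axis=0)
    centered = X - mu
    Sigma = centered.T @ centered / m
    return mu, Sigma

def multivariate_density(X, mu, Sigma):
    """Evaluate the multivariate Gaussian density for each row of X."""
    n = mu.shape[0]
    centered = X - mu
    # Solve Sigma z = (x - mu) rather than forming the explicit inverse.
    solved = np.linalg.solve(Sigma, centered.T).T
    mahalanobis_sq = np.sum(centered * solved, axis=1)
    norm_const = 1.0 / np.sqrt((2.0 * np.pi) ** n * np.linalg.det(Sigma))
    return norm_const * np.exp(-0.5 * mahalanobis_sq)

# Synthetic training data with two strongly correlated features (assumed).
rng = np.random.default_rng(0)
true_cov = np.array([[1.0, 0.9], [0.9, 1.0]])
X_train = rng.multivariate_normal(mean=[0.0, 0.0], cov=true_cov, size=2000)
mu, Sigma = fit_multivariate_gaussian(X_train)

epsilon = 1e-3  # hypothetical threshold
X_new = np.array([[1.5, 1.5],    # follows the correlation
                  [1.5, -1.5]])  # violates the correlation
p = multivariate_density(X_new, mu, Sigma)
is_anomaly = p < epsilon
```

Both new points are about 1.5 standard deviations from the mean in each feature taken alone, so a per-feature model would score them similarly. Only the multivariate model assigns the second point a very low density, because it breaks the correlation seen in training.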

#### Relationship to the original model
![W9-MGD-ORIGIN](Plots/W9-MGD-ORIGIN.png)

#### When to use which
![W9-COMPARE](Plots/W9-COMPARE.png)