flag example if p(X_test) < epsilon

## Density Estimation
- **Density Estimation** - a model that estimates the probability p(x) = p(x1) * p(x2) * ... * p(xn)
    - assumes the features are statistically independent, but in practice works fine even when they are dependent

## Anomaly Detection Algorithm
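1. Choose n features x_i that you think may be indicative of anomalous examples
2. Fit parameters μ_1...μ_n and σ^2_1...σ^2_n
    - μ_j = 1/m * sum_i(x_j^(i))
    - σ_j^2 = 1/m * sum_i((x_j^(i) - μ_j)^2)
3. Given a new example x, compute p(x) = product over j = 1...n of p(x_j; μ_j, σ_j^2), where p(x_j; μ_j, σ_j^2) = 1/(sqrt(2π) * σ_j) * exp(-(x_j - μ_j)^2 / (2 * σ_j^2))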
4. Flag as anomaly if p(x) < epsilon
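
The four steps above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the helper names `fit_gaussian` and `density`, the synthetic training data, and the threshold `epsilon = 1e-4` are all made up for the example and are not part of the notes.

```python
import numpy as np

def fit_gaussian(X):
    """Step 2: per-feature mean and variance (normalized by m)."""
    mu = X.mean(axis=0)
    var = ((X - mu) ** 2).mean(axis=0)
    return mu, var

def density(X, mu, var):
    """Step 3: p(x) as a product of independent univariate Gaussians."""
    coef = 1.0 / np.sqrt(2.0 * np.pi * var)
    expo = np.exp(-((X - mu) ** 2) / (2.0 * var))
    return np.prod(coef * expo, axis=1)

# Synthetic "normal" training data with two features (assumed for the demo)
rng = np.random.default_rng(0)
X_train = rng.normal(loc=[5.0, 10.0], scale=[1.0, 2.0], size=(1000, 2))
mu, var = fit_gaussian(X_train)

X_new = np.array([[5.0, 10.0],    # close to the training mean
                  [12.0, -3.0]])  # far away from the training data
epsilon = 1e-4                    # illustrative threshold, not tuned
p_new = density(X_new, mu, var)

# Step 4: flag as anomaly when the density falls below epsilon
is_anomaly = p_new < epsilon
print(is_anomaly)
```

In practice `epsilon` would be chosen on a labeled cross validation set rather than fixed by hand.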

- very useful to have a small number of anomalous examples to create a cross validation set and a test set
- train the algorithm on the training set (fit a Gaussian distribution to these examples)
- on the cross validation set, see how many anomalies are correctly flagged, then tune epsilon higher or lower depending on the rate of false positives or false negatives

## Algorithm Evaluation
- fit model p(x) on training set
- on a cross validation set, predict y = 1 if p(x) < epsilon, or y = 0 if p(x) >= epsilon
- possible metrics
    - true positive, false positive, false negative, true negative
    - precision/recall
    - F1 score
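
As a rough sketch of how these metrics could be computed for a single threshold, the snippet below counts the four outcomes and derives precision, recall, and F1. The function name `evaluate_threshold`, the toy labels, and the toy probabilities are assumptions made for illustration.

```python
import numpy as np

def evaluate_threshold(y_val, p_val, epsilon):
    """Count tp/fp/fn/tn for predictions y = 1 when p(x) < epsilon."""
    preds = p_val < epsilon
    tp = int(np.sum(preds & (y_val == 1)))
    fp = int(np.sum(preds & (y_val == 0)))
    fn = int(np.sum(~preds & (y_val == 1)))
    tn = int(np.sum(~preds & (y_val == 0)))
    # Guard against division by zero when nothing is flagged
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom > 0 else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "precision": precision, "recall": recall, "f1": f1}

# Toy cross validation set: 1 = anomaly, 0 = normal (made-up values)
y_val = np.array([0, 0, 0, 1, 0, 1, 0, 0])
p_val = np.array([0.30, 0.25, 0.02, 0.01, 0.40, 0.20, 0.35, 0.28])

metrics = evaluate_threshold(y_val, p_val, epsilon=0.05)
print(metrics)
```

Because anomalies are rare, plain accuracy would look good even for a model that never flags anything, which is why the notes list precision/recall and F1 instead.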

## Anomaly Detection vs Supervised Learning

**Anomaly Detection**
- very small number of positive examples (0-20), large number of negative examples
- many "types" of anomalies
- future anomalies may look nothing like previous anomalies

**Supervised Learning**
- large number of positive and negative examples
- future positive examples likely to be similar to ones in training set

**Features**
- ideally transform features to be gaussian
    - log(x1), log(x1+c), sqrt(x1) could all be potential transformations
    - use a histogram to see shape
- common problem: p(x) is comparable for normal and anomalous examples (i.e., large for both)
    - try to identify a new feature that makes the anomalous examples distinct from the normal ones
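
One way to compare candidate transformations numerically is to look at the skewness of each transformed feature, where a value near 0 suggests a roughly symmetric, more Gaussian-like shape. This is a sketch under assumptions: the `skewness` helper, the log-normal synthetic feature, and the choice c = 1 are illustrative only, and the skewness check is a stand-in for inspecting the histogram by eye.

```python
import numpy as np

def skewness(x):
    """Third standardized moment; 0 for a perfectly symmetric sample."""
    z = (x - x.mean()) / x.std()
    return float(np.mean(z ** 3))

# Synthetic right-skewed feature (assumed log-normal for the demo)
rng = np.random.default_rng(0)
x1 = rng.lognormal(mean=0.0, sigma=1.0, size=5000)

candidates = {
    "x1": x1,
    "sqrt(x1)": np.sqrt(x1),
    "log(x1 + 1)": np.log(x1 + 1.0),
    "log(x1)": np.log(x1),
}
skews = {name: skewness(values) for name, values in candidates.items()}

for name, value in skews.items():
    print(f"{name:12s} skewness = {value:.2f}")
```

Plotting each candidate with `plt.hist(values, bins=50)` gives the visual check the notes describe; which transformation works best depends on the actual feature.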

## Practice Example

In [None]:
import numpy as np
import matplotlib.pyplot as plt

In [3]:
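# Function for estimating mean and variance of each feature
def estimate_gaussian(X):
    # m = number of examples, n = number of features
    m, n = X.shape

    # per-feature mean and variance (normalized by m, not m - 1)
    mu = np.mean(X, axis=0)
    var = np.sum((X - mu) ** 2, axis=0) / m

    return mu, var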


* `select_threshold` - a loop that will try many different values of $\varepsilon$ and select the best $\varepsilon$ based on the $F_1$ score. 

* Calculate the $F_1$ score obtained by using `epsilon` as the threshold and store the value in `F1`.

  * Recall that if an example $x$ has a low probability $p(x) < \varepsilon$, then it is classified as an anomaly. 
        
  * Then, you can compute precision and recall by: 
   $$
   \begin{aligned}
   prec &= \frac{tp}{tp+fp} \\
   rec &= \frac{tp}{tp+fn}
   \end{aligned}
   $$
   
   where
    * $tp$ is the number of true positives: the ground truth label says it’s an anomaly and our algorithm correctly classified it as an anomaly.
    * $fp$ is the number of false positives: the ground truth label says it’s not an anomaly, but our algorithm incorrectly classified it as an anomaly.
    * $fn$ is the number of false negatives: the ground truth label says it’s an anomaly, but our algorithm incorrectly classified it as not being anomalous.

  * The $F_1$ score is computed using precision ($prec$) and recall ($rec$) as follows:
    $$F_1 = \frac{2\cdot prec \cdot rec}{prec + rec}$$ 
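
The threshold search described above could look roughly like the following. This is a hedged sketch, not the notebook's actual solution: the name `select_threshold_sketch`, the `n_steps` parameter, the evenly spaced grid between the smallest and largest validation probability, and the toy data are all assumptions.

```python
import numpy as np

def select_threshold_sketch(y_val, p_val, n_steps=1000):
    """Try evenly spaced epsilons and keep the one with the best F1."""
    best_epsilon, best_f1 = 0.0, 0.0
    for epsilon in np.linspace(p_val.min(), p_val.max(), n_steps):
        preds = p_val < epsilon          # True means "flag as anomaly"
        tp = np.sum(preds & (y_val == 1))
        fp = np.sum(preds & (y_val == 0))
        fn = np.sum(~preds & (y_val == 1))
        if tp == 0:
            continue                     # F1 is 0 or undefined, skip
        prec = tp / (tp + fp)
        rec = tp / (tp + fn)
        f1 = 2 * prec * rec / (prec + rec)
        if f1 > best_f1:
            best_epsilon, best_f1 = float(epsilon), float(f1)
    return best_epsilon, best_f1

# Toy validation set: the two anomalies have the lowest probabilities
y_val = np.array([0, 0, 1, 0, 1, 0, 0, 0])
p_val = np.array([0.30, 0.25, 0.01, 0.40, 0.02, 0.20, 0.35, 0.28])

epsilon, f1 = select_threshold_sketch(y_val, p_val)
print(epsilon, f1)
```

Skipping thresholds with no true positives avoids dividing by zero; with strict `>` the first epsilon reaching the best F1 is kept.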

In [None]:
# function for finding best threshold to use for selecting outliers based on results from validation set
# uses F1 score

def select_threshold(y_val, p_val):
