# Anomaly Detection

When to use anomaly detection vs. when to use supervised learning:

- **Anomaly detection:**
    - Very few positive examples (y = 1), commonly 0-20, alongside a large number of negative examples (y = 0).
    - Many different "types" of anomalies. It is hard for any algorithm to learn from the positive examples what anomalies look like, and future anomalies may look nothing like the anomalous examples seen so far.
    - Examples: 
        - Fraud detection.
        - Manufacturing: finding new, previously unseen defects.
        - Monitoring machines in a data center (different types of attacks or malfunctions).

- **Supervised learning:**
    - Large number of positive and negative examples.
    - Sufficient positive examples for the algorithm to grasp what positive examples entail; future positive examples are likely to resemble those in the training set.
    - Examples: 
        - Email spam classification.
        - Manufacturing: finding known, previously seen defects.
        - Weather prediction (sunny/rainy/etc.).
        - Disease classification.


## Libraries

In [1]:
import numpy as np
import matplotlib.pyplot as plt

### Gaussian distribution

To perform anomaly detection, you will first need to fit a model to the data’s distribution.

* Given a training set $\{x^{(1)}, ..., x^{(m)}\}$ you want to estimate the Gaussian distribution for each of the features $x_i$. 

* Recall that the Gaussian distribution is given by

   $$ p(x ; \mu,\sigma ^2) = \frac{1}{\sqrt{2 \pi \sigma ^2}}\exp\left( - \frac{(x - \mu)^2}{2 \sigma ^2} \right)$$

   where $\mu$ is the mean and $\sigma^2$ is the variance.
   
* For each feature $i = 1\ldots n$, you need to find parameters $\mu_i$ and $\sigma_i^2$ that fit the data in the $i$-th dimension $\{x_i^{(1)}, ..., x_i^{(m)}\}$ (the $i$-th dimension of each example).
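
As a minimal sketch of the density formula above, the snippet below evaluates $p(x ; \mu, \sigma^2)$ with NumPy. The function name `gaussian_pdf` and the sample values are illustrative assumptions, not part of the original notebook.

```python
import numpy as np

def gaussian_pdf(x, mu, var):
    """Univariate Gaussian density p(x; mu, var), evaluated elementwise.

    Hypothetical helper for illustration; `var` is the variance sigma^2.
    """
    return np.exp(-((x - mu) ** 2) / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

# The density is highest at the mean and symmetric around it.
peak = gaussian_pdf(0.0, mu=0.0, var=1.0)    # equals 1 / sqrt(2 * pi)
left = gaussian_pdf(-1.5, mu=0.0, var=1.0)
right = gaussian_pdf(1.5, mu=0.0, var=1.0)   # same value as `left`
```

Points far from $\mu$ receive a low density, which is what makes the fitted Gaussian useful for flagging unusual values.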

To estimate the parameters ($\mu_i$, $\sigma_i^2$) of the $i$-th feature, use the following equations. To estimate the mean, you will use:

$$\mu_i = \frac{1}{m} \sum_{j=1}^m x_i^{(j)}$$

and for the variance you will use:

$$\sigma_i^2 = \frac{1}{m} \sum_{j=1}^m (x_i^{(j)} - \mu_i)^2$$


In [3]:
def estimate_gaussian(X): 
    """
    Calculates mean and variance of all features 
    in the dataset
    
    Args:
        X (ndarray): (m, n) Data matrix
    
    Returns:
        mu (ndarray): (n,) Mean of all features
        var (ndarray): (n,) Variance of all features
    """

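    m, n = X.shape  # m examples (rows), n features (columns)
    
    # Per-feature mean: average over the m examples
    mu = np.sum(X, axis=0) / m
    # Per-feature variance: divides by m (not m - 1), as in the formula above
    var = np.sum((X - mu) ** 2, axis=0) / m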
    
    return mu, var
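
As a quick sanity check, the same estimates can be reproduced on a small made-up dataset and compared against NumPy's built-ins. This is only a sketch: `X_toy` and the variable names are illustrative assumptions, and the manual lines repeat the formulas so the snippet runs on its own.

```python
import numpy as np

# Toy data (made up): 4 examples (rows), 2 features (columns).
X_toy = np.array([[1.0, 10.0],
                  [2.0, 20.0],
                  [3.0, 30.0],
                  [4.0, 40.0]])

m = X_toy.shape[0]

# Manual estimates, following the mean and variance formulas.
mu_manual = np.sum(X_toy, axis=0) / m
var_manual = np.sum((X_toy - mu_manual) ** 2, axis=0) / m

# NumPy equivalents: np.var uses ddof=0 by default, i.e. it divides by m.
mu_np = np.mean(X_toy, axis=0)
var_np = np.var(X_toy, axis=0)

print(mu_manual)   # [ 2.5 25. ]
print(var_manual)  # [  1.25 125.  ]
```

Both routes agree because `np.var` defaults to the population variance (dividing by $m$). Passing `ddof=1` would instead divide by $m - 1$ and give slightly larger values.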