# Improving on Chebyshev anomaly detection using robust estimators

As discussed at the end of the notebook [*Using Chebyshev's inequality to detect accumulated precipitation data anomalies*](https://github.com/mesowx/precip_anomaly_detection/blob/master/precip_anomaly_detection_chebyshev.ipynb), we now repeat the same process using robust estimators instead of the usual sample estimators. We first briefly summarize what these estimators are and how they are defined, and then use them in our Chebyshev computation.

## Estimators
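An estimator is, loosely, a function of the data that estimates a parameter of the underlying distribution. The simplest example is the sample mean $\bar{X} = \dfrac{1}{n}\sum^{n}_{i=1}X_{i}$, which estimates the mean $\mu$ from the data $X_1, \dots, X_n$. A robust estimator resists the bias introduced by outliers. The median is an example of a robust estimator of the center of a distribution: a single extreme value can drag the sample mean arbitrarily far, but it barely moves the median.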
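A small illustration of that difference, using made-up precipitation values (not data from this project) in which the last reading is a deliberately injected outlier:

```python
import numpy as np

# Hypothetical precipitation totals; the final value mimics a bad report
# (e.g. a stuck gauge) and is far outside the rest of the data.
precip = np.array([0.10, 0.12, 0.08, 0.11, 0.09, 0.10, 5.00])

sample_mean = precip.mean()        # pulled strongly toward the outlier
sample_median = np.median(precip)  # essentially unaffected by it

print(f"mean   = {sample_mean:.3f}")
print(f"median = {sample_median:.3f}")
```

The mean lands at 0.800, eight times larger than any typical value, while the median stays at 0.100.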

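For our calculations we will use the median to estimate the expectation of each group, and the square of the median absolute deviation (MAD) to estimate its variance. The MAD is defined as $\text{MAD}(X) = \text{median}\left(|x_1 - \text{median}(X)|, |x_2 - \text{median}(X)|, \dots, |x_n - \text{median}(X)|\right)$. For normally distributed data the raw MAD is only about $0.6745\,\sigma$, so it must be rescaled before it can stand in for the standard deviation; statsmodels' `robust.mad`, used below, divides by $\Phi^{-1}(3/4) \approx 0.6745$ by default for exactly this reason.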
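The sketch below computes the raw MAD directly from the definition with NumPy, rescales it, and squares it to get a robust variance estimate. The helper name `mad` and the constant name `MAD_TO_SIGMA` are illustrative, not part of the notebook's `functions.ipynb`; the precipitation values are the same made-up sample as above.

```python
import numpy as np

def mad(x):
    """Raw median absolute deviation: median(|x_i - median(x)|)."""
    x = np.asarray(x, dtype=float)
    return np.median(np.abs(x - np.median(x)))

# 1 / Phi^{-1}(3/4): makes the MAD consistent with sigma for normal data.
MAD_TO_SIGMA = 1 / 0.6744897501960817   # ~1.4826

precip = np.array([0.10, 0.12, 0.08, 0.11, 0.09, 0.10, 5.00])

raw_mad = mad(precip)
robust_sigma = MAD_TO_SIGMA * raw_mad
robust_var = robust_sigma ** 2           # robust variance estimate
sample_var = precip.var(ddof=1)          # classic sample variance

# Sanity check on synthetic standard-normal data: the scaled MAD is close to 1.
rng = np.random.default_rng(0)
z = rng.normal(size=100_000)
normal_sigma_hat = MAD_TO_SIGMA * mad(z)

print(raw_mad, robust_var, sample_var, normal_sigma_hat)
```

On the toy sample the outlier inflates the sample variance to roughly 3.4, whereas the squared, scaled MAD stays around 0.0002, reflecting the spread of the six ordinary readings.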

## Using robust estimators
We will go through the same examples as in [*Using Chebyshev's inequality to detect accumulated precipitation data anomalies*](https://github.com/mesowx/precip_anomaly_detection/blob/master/precip_anomaly_detection_chebyshev.ipynb), except that we will use the new robust estimators instead of the usual ones. Both computations are run side by side so that the differences are easy to see.
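As a rough sketch of what "side by side" means, the hypothetical helper `chebyshev_flags` below flags points lying at least $k$ scale units from the center. Chebyshev's inequality, $P(|X - \mu| \ge k\sigma) \le 1/k^2$, bounds how often that should happen. The exact thresholds and grouping used in the earlier notebook are not reproduced here; this only swaps mean/standard deviation for median/scaled MAD.

```python
import numpy as np

MAD_TO_SIGMA = 1 / 0.6744897501960817  # scales the MAD to estimate sigma

def chebyshev_flags(x, k, use_robust=False):
    """Return a boolean mask of points with |x - center| >= k * scale.

    use_robust=False: center = sample mean, scale = sample std (ddof=1).
    use_robust=True:  center = median, scale = scaled MAD.
    """
    x = np.asarray(x, dtype=float)
    if use_robust:
        center = np.median(x)
        scale = MAD_TO_SIGMA * np.median(np.abs(x - center))
    else:
        center = x.mean()
        scale = x.std(ddof=1)
    return np.abs(x - center) >= k * scale

precip = np.array([0.10, 0.12, 0.08, 0.11, 0.09, 0.10, 5.00])
k = 3.0  # Chebyshev: at most 1/k**2 (about 11%) of points this far out

classic_flags = chebyshev_flags(precip, k)
robust_flags = chebyshev_flags(precip, k, use_robust=True)

print("classic:", classic_flags)
print("robust: ", robust_flags)
```

Here the classic version flags nothing, because the outlier inflates the standard deviation it is measured against. With seven points the largest attainable z-score is $(n-1)/\sqrt{n} \approx 2.27$ (a consequence of Samuelson's inequality), so $k = 3$ can never be reached. The robust version flags only the 5.00 reading.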


```python
import json
import os.path
from datetime import datetime

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import requests
from pandas import json_normalize  # pandas.io.json.json_normalize is deprecated
from statsmodels import robust

plt.style.use('ggplot')

# Load the helper functions shared with the Chebyshev notebook
%run ./functions.ipynb
```

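```python
def _plot_(x):
    """Plot a histogram of x, then the series with its median and a +/-1 MAD band."""
    x = np.asarray(x, dtype=float)  # positional index so the hlines span the plotted points

    # Distribution of the precipitation values
    plt.hist(x, bins=15)
    plt.xlabel('Precip')
    plt.ylabel('Count')
    plt.title('Precip frequency')
    plt.show()

    # Series with robust center (median) and spread (scaled MAD)
    plt.plot(x, '-o', label='Precip')
    med = np.median(x)
    mad = robust.mad(x)  # scaled by 1/0.6745 so it estimates sigma for normal data
    plt.hlines(med, xmin=0, xmax=x.shape[0] - 1, label='Median')
    plt.hlines([med + mad, med - mad], xmin=0, xmax=x.shape[0] - 1,
               linestyles='dashed', label='±1 MAD')
    plt.xlabel('Index')
    plt.ylabel('Precip')
    plt.legend()
    plt.show()
```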

TODO: Copy and modify computations from other notebook
