## Q1.

Add a method `__iter__` to your project's time series class to iterate over values, a method `itertimes` to iterate over times, a method `itervalues` to iterate over values, and a method `iteritems` to iterate over time-value pairs. (This is similar to the interface of Python dictionaries.) To test these, check both the types of the results and the answers you expect.
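
As a starting point, here is a minimal sketch of what such an interface could look like. The `TimeSeries` class below is hypothetical: your project's class will have its own constructor and storage, and the attribute names `_times` and `_values` are assumptions made only for this illustration.

```python
from collections.abc import Iterator


class TimeSeries:
    """Hypothetical minimal time series backed by two parallel lists."""

    def __init__(self, times, values):
        times, values = list(times), list(values)
        if len(times) != len(values):
            raise ValueError("times and values must have the same length")
        self._times = times
        self._values = values

    def __iter__(self):
        # Iterating over the series itself walks the values.
        return iter(self._values)

    def itertimes(self):
        return iter(self._times)

    def itervalues(self):
        return iter(self._values)

    def iteritems(self):
        # zip is lazy, so this yields (time, value) pairs one at a time.
        return zip(self._times, self._values)


ts = TimeSeries([1, 2, 3], [10.0, 20.0, 30.0])
pairs = list(ts.iteritems())
```

A test along the lines the question asks for would check that each method returns an iterator (for example with `isinstance(..., Iterator)`) and that materializing it gives the expected list.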

In [1]:
import math

## Q2.

An online mean and standard deviation algorithm.

Below is a function to generate a potentially infinite stream of 1-D data.

In [2]:
from random import normalvariate, random
from itertools import count
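def make_data(m, stop=None):
    # Yield noisy values around 1e9; the noise scale is redrawn for each sample.
    # With stop=None the stream never ends; otherwise indices 0..stop are
    # produced, i.e. stop + 1 values.
    for i in count():
        if stop is not None and i > stop:
            break
        yield 1.0e09 + normalvariate(0, m*random())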

Here is an implementation of an online mean algorithm. See http://www.johndcook.com/blog/standard_deviation/ and the post it links to, http://www.johndcook.com/blog/2008/09/26/comparing-three-methods-of-computing-standard-deviation/. (Convince yourselves of the formulas.)

In [3]:
def online_mean(iterator):
    n = 0
    mu = 0
    for value in iterator:
        n += 1
        delta = value - mu
        mu = mu + delta/n
        yield mu
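
One way to convince yourself of the mean update used in `online_mean` is to write the mean of the first $n$ values, $\mu_n$, in terms of the previous mean $\mu_{n-1}$ and the new value $x_n$ (these symbols are notation introduced here for the derivation):

$$
\mu_n = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{(n-1)\,\mu_{n-1} + x_n}{n} = \mu_{n-1} + \frac{x_n - \mu_{n-1}}{n}
$$

The last form is the line `mu = mu + delta/n` with `delta = value - mu`.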

We use our generator functions to implement iterators:

In [4]:
g = make_data(5, 10)
list(g)

[1000000000.1651973,
 999999996.7757941,
 999999997.258189,
 1000000000.3176253,
 999999996.5975062,
 1000000000.574569,
 1000000000.080098,
 999999997.6280578,
 999999996.7938086,
 999999998.112668,
 999999999.182761]
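
Because generators are lazy, the unbounded form of the stream (no `stop`) can still be sampled safely. The sketch below restates `make_data` so it runs on its own and uses `itertools.islice` to pull a fixed number of values; the variable names are chosen only for this illustration.

```python
from itertools import count, islice
from random import normalvariate, random


def make_data(m, stop=None):
    # Restated from the cell above so this example is self-contained.
    for i in count():
        if stop is not None and i > stop:
            break
        yield 1.0e09 + normalvariate(0, m * random())


stream = make_data(5)                 # no stop: conceptually infinite
first_five = list(islice(stream, 5))  # pulls exactly five values
next_value = next(stream)             # the stream resumes where it left off
```

Calling `list(stream)` on the unbounded generator would never return, which is why a bounded consumer such as `islice` is needed here.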

In [5]:
g = online_mean(make_data(5, 100))
print(type(g))
list(g)

<class 'generator'>


[999999999.9603157,
 999999997.8893621,
 999999998.893588,
 1000000000.6896414,
 1000000000.5568396,
 1000000001.0931848,
 1000000000.9773989,
 1000000001.2357879,
 1000000001.084495,
 1000000000.9022137,
 1000000000.6835046,
 1000000001.0621428,
 1000000001.0125499,
 1000000000.7525662,
 1000000000.953211,
 1000000001.2077235,
 1000000001.1367694,
 1000000001.1712629,
 1000000001.2522962,
 1000000001.0626787,
 1000000000.8445464,
 1000000000.7068657,
 1000000000.669386,
 1000000000.5399154,
 1000000000.4595795,
 1000000000.5181054,
 1000000000.6351154,
 1000000000.61227,
 1000000000.6649035,
 1000000000.6608326,
 1000000000.6339375,
 1000000000.6622217,
 1000000000.6410437,
 1000000000.4901657,
 1000000000.7092863,
 1000000000.6843334,
 1000000000.6679525,
 1000000000.6604654,
 1000000000.4844692,
 1000000000.5208037,
 1000000000.531698,
 1000000000.5815014,
 1000000000.5063007,
 1000000000.450988,
 1000000000.2205423,
 1000000000.322192,
 1000000000.4101021,
 1000000000.4850068,
 100

### 2.1

Implement the standard deviation algorithm as a generator function with the following structure:

```python
def online_mean_dev(iterator):
    ...  # initialize and, for each value, update n, mu, and dev_accum
    if n > 1:
        stddev = math.sqrt(dev_accum/(n-1))
        yield (n, value, mu, stddev)
```

In [6]:
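def online_mean_dev(iterator):
    n = 0
    mu = 0
    dev_accum = 0  # running sum of squared deviations from the current mean
    for value in iterator:
        n += 1
        delta = value - mu               # deviation from the previous mean
        mu = mu + delta/n                # updated mean
        dev_accum += delta*(value - mu)  # Welford update: old and new means

        if n > 1:  # the sample standard deviation needs at least two points
            stddev = math.sqrt(dev_accum/(n-1))
            yield (n, value, mu, stddev)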
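
A quick way to gain confidence in the recurrence is to compare its final result with a batch computation. The sketch below is a compact restatement of the same update (the name `running_stats` and the sample data are made up for this check) and compares it with `statistics.mean` and `statistics.stdev` on values offset by 1e9, as in `make_data`.

```python
import math
import statistics


def running_stats(values):
    """Yield (n, mean, sample stddev) using the same recurrence as above."""
    n, mu, dev_accum = 0, 0.0, 0.0
    for value in values:
        n += 1
        delta = value - mu
        mu += delta / n
        dev_accum += delta * (value - mu)
        if n > 1:
            yield n, mu, math.sqrt(dev_accum / (n - 1))


data = [1.0e9 + x for x in (0.5, -1.25, 2.0, 0.75, -0.5)]
n, mu, stddev = list(running_stats(data))[-1]

# Compare the online result with the batch results.
assert n == len(data)
assert math.isclose(mu, statistics.mean(data), rel_tol=1e-12)
assert math.isclose(stddev, statistics.stdev(data), rel_tol=1e-6)
```

The large offset matters: it is the kind of data for which a naive sum-of-squares formula tends to lose precision, which is the motivation given in the linked posts.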

Here we make a 100000-element data stream and run this iterator on it (imagine running this on a time series being slowly read from disk).

In [7]:
data_with_stats = online_mean_dev(make_data(5, 100000))

## Q3.

Let's do anomaly detection. Write a routine `is_ok`:

```python
def is_ok(level, t): ...
```

which takes a tuple like the one yielded by your code above and returns True if the value is within `level` standard deviations ($\sigma$) of the mean.

In [8]:
def is_ok(level, t):
    n, value, mu, stdev = t
    # Multiply instead of dividing so a zero standard deviation cannot raise.
    return abs(value - mu) < level*stdev

We wrap this function in a predicate and pass it to `itertools.filterfalse`, which returns an iterator over the tuples that fail the check, i.e. the anomalies.

In [9]:
from itertools import filterfalse
pred = lambda t: is_ok(5, t)
anomalies = filterfalse(pred, data_with_stats)
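
To see what the filter keeps, it helps to run it on a few hand-written tuples where the answer is obvious. The sketch below uses made-up statistics (mean 10, standard deviation 1) and, as an alternative to the lambda, builds the predicate with `functools.partial`; the names `stats`, `pred_5_sigma`, and `flagged` are illustrative only.

```python
from functools import partial
from itertools import filterfalse


def is_ok(level, t):
    n, value, mu, stddev = t
    return abs(value - mu) < level * stddev


# (n, value, mu, stddev) tuples with invented numbers for illustration.
stats = [
    (2, 10.0, 10.0, 1.0),  # exactly on the mean
    (3, 12.5, 10.0, 1.0),  # 2.5 sigma above: fine at level 5
    (4, 4.0, 10.0, 1.0),   # 6 sigma below: anomaly
    (5, 15.5, 10.0, 1.0),  # 5.5 sigma above: anomaly
]

pred_5_sigma = partial(is_ok, 5)  # fixes level=5, leaves t free
flagged = list(filterfalse(pred_5_sigma, stats))
```

Here `flagged` contains only the last two tuples, since `filterfalse` keeps the items for which the predicate returns False.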

We materialize the anomalies...

In [10]:
list(anomalies)  # materialize

[(4902, 1000000016.2932473, 1000000000.0333636, 2.85510038949931),
 (5379, 999999985.439479, 1000000000.0211384, 2.870248415107049),
 (6469, 1000000014.5391358, 1000000000.0340708, 2.8619185741441764),
 (7641, 1000000016.455661, 1000000000.0389076, 2.8726261309647114),
 (23440, 999999983.0382384, 999999999.9936624, 2.8970535392453427),
 (26862, 1000000015.509879, 999999999.9880424, 2.8890868150181213),
 (29884, 999999984.8960956, 999999999.9869593, 2.903407526712833),
 (30160, 999999985.1752106, 999999999.9849819, 2.903111242969281),
 (30662, 1000000015.6644803, 999999999.9868735, 2.906454949375207),
 (37206, 999999981.5520903, 999999999.9876082, 2.9087465060107682),
 (38828, 999999982.3285216, 999999999.9881566, 2.908474771907184),
 (40645, 999999983.5374833, 999999999.9905467, 2.903150329138423),
 (44989, 999999982.8132724, 999999999.9943061, 2.907147367157982),
 (45430, 1000000017.8590502, 999999999.9936498, 2.908451635172009),
 (45652, 999999983.611526, 999999999.9930124, 2.9098901

## To think about, but not hand in

What kinds of anomalies will this algorithm pick up? What kinds would a shorter "window" of anomaly detection, say 100 points around the time in question, pick up? How might you create an algorithm that does window-based averaging? (Hint: the window size is small compared to the time series size.)

Finally, think a bit about how you might implement all of this in a production environment. Remember that data streaming in might get backed up while you handle an anomaly.

(Some inspiration might accrue if you look at the docs for `collections.deque`).
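
One possible direction, offered as a sketch rather than a reference solution: keep the most recent points in a `collections.deque` with `maxlen` set to the window size, and compare each new value against the statistics of that buffer. The function name `windowed_mean_dev` and the choice of a trailing window (the points before the current one, not a window centered on it) are assumptions made for this illustration.

```python
from collections import deque
from statistics import mean, stdev


def windowed_mean_dev(iterator, window=100):
    """Yield (n, value, mu, stddev) using only the previous `window` points."""
    buf = deque(maxlen=window)  # old points fall off the left automatically
    for n, value in enumerate(iterator, start=1):
        if len(buf) > 1:  # stdev needs at least two points
            # Recomputing over the buffer costs O(window) per value, which is
            # acceptable because the window is small compared to the series.
            yield (n, value, mean(buf), stdev(buf))
        buf.append(value)


results = list(windowed_mean_dev([1.0, 2.0, 3.0, 4.0, 100.0], window=3))
```

The tuples have the same shape as those from `online_mean_dev`, so the same `is_ok` predicate and `filterfalse` pipeline can be applied to them unchanged.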