## Q1.

Add a method `__iter__` to your project's TimeSeries class to iterate over values, a method `itertimes` to iterate over times, a method `itervalues` to iterate over values, and a method `iteritems` to iterate over (time, value) pairs. (This mirrors the interface of Python dictionaries.) To test these, check both the types of the results and the values you expect.

```python
# your code here
import numpy as np


class TimeSeries:
    '''A class for storing temporally-ordered data.'''

    def __init__(self, init_t, init_v):
        '''
        Create a TimeSeries object from reference data.
        args:
          init_t: a sequence of numeric time points, monotonically increasing.
          init_v: a sequence of numeric values, one per time point.
            Both are *required*, even if they are empty lists.
            Data can always be added later, but this behavior makes it less likely
            that uninitialized TimeSeries objects will get nonsensically passed around.
        '''
        self._t = np.array(init_t)
        self._v = np.array(init_v)

    def __getitem__(self, t):
        '''Return the value at time t; raise KeyError if t is not a time point.'''
        where = self._t == t
        if np.any(where):
            return self._v[where][0]
        raise KeyError(t)

    def __setitem__(self, t, v):
        '''Overwrite the value at an existing time t; raise KeyError otherwise.'''
        where = self._t == t
        if not np.any(where):
            raise KeyError(t)
        self._v[where] = v

    def __contains__(self, v):
        '''Membership is tested against the values, matching __iter__.'''
        return v in self._v

    def __iter__(self):
        '''Iterate over the values of the time series.'''
        for v in self._v:
            yield v

    def values(self):
        '''Returns just the values of the time series.'''
        return self._v

    def itervalues(self):
        '''Returns an iterator over the values of the time series.'''
        for v in self._v:
            yield v

    def times(self):
        '''Returns just the time points of the time series.'''
        return self._t

    def itertimes(self):
        '''Returns an iterator over the time points of the time series.'''
        for t in self._t:
            yield t

    def items(self):
        '''Returns a list of all (time, value) pairs in the time series.'''
        return list(zip(self._t, self._v))

    def iteritems(self):
        '''Returns an iterator over the (time, value) pairs of the time series.'''
        return zip(self._t, self._v)

    def __len__(self):
        return len(self._v)
```
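The question asks for tests that check both types and values. A minimal sketch of such tests, assuming the iteration methods behave as in the class above; so that the block runs on its own it defines a stripped-down, hypothetical stand-in `TimeSeries` with only those methods (in the notebook you would test the full class instead).

```python
import types
from collections.abc import Iterator

import numpy as np


class TimeSeries:
    '''Hypothetical stand-in with only the iteration methods from Q1.'''

    def __init__(self, init_t, init_v):
        self._t = np.array(init_t)
        self._v = np.array(init_v)

    def __iter__(self):
        for v in self._v:
            yield v

    def itertimes(self):
        for t in self._t:
            yield t

    def itervalues(self):
        for v in self._v:
            yield v

    def iteritems(self):
        return zip(self._t, self._v)


ts = TimeSeries([1, 2, 3], [10.0, 20.0, 30.0])

# Types: generator functions return generators; zip returns a zip iterator.
assert isinstance(iter(ts), types.GeneratorType)
assert isinstance(ts.itertimes(), types.GeneratorType)
assert isinstance(ts.itervalues(), types.GeneratorType)
assert isinstance(ts.iteritems(), Iterator)
assert all(isinstance(v, np.floating) for v in ts)

# Answers: compare against the data we put in.
assert list(ts) == [10.0, 20.0, 30.0]
assert list(ts.itertimes()) == [1, 2, 3]
assert list(ts.itervalues()) == [10.0, 20.0, 30.0]
assert list(ts.iteritems()) == [(1, 10.0), (2, 20.0), (3, 30.0)]

# An iterator is exhausted after one pass.
it = ts.itervalues()
assert list(it) == [10.0, 20.0, 30.0]
assert list(it) == []
```

Checking that a second pass over the same iterator is empty is a cheap way to confirm the methods return iterators rather than lists.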


## Q2.

An online mean and standard deviation algorithm.

Below is a function to generate a potentially infinite stream of 1-D data.

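```python
from random import normalvariate, random
from itertools import count

def make_data(m, stop=None):
    '''Yield 1.0e09 plus Gaussian noise whose standard deviation is drawn
    uniformly from [0, m) for each point. Yields stop + 1 values
    (i = 0, ..., stop), or an endless stream if stop is None.'''
    for i in count():
        if stop is not None and i > stop:
            break
        yield 1.0e09 + normalvariate(0, m*random())
```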
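Because `make_data(m)` with no `stop` never terminates, calling `list()` on it would hang. A short sketch of consuming a finite prefix lazily with `itertools.islice`, and of the fact that `stop=10` produces 11 values. The block repeats `make_data` (with `stop is not None` as the end test) so it runs on its own; the fixed `seed(0)` is only for reproducibility.

```python
from itertools import count, islice
from random import normalvariate, random, seed


def make_data(m, stop=None):
    # Copy of make_data so this block runs on its own.
    for i in count():
        if stop is not None and i > stop:
            break
        yield 1.0e09 + normalvariate(0, m*random())


seed(0)  # fixed seed, for reproducibility only

# With stop=None the stream is infinite, so take a finite prefix lazily.
first_five = list(islice(make_data(5), 5))
assert len(first_five) == 5

# With stop given, the loop runs for i = 0, 1, ..., stop, i.e. stop + 1 values.
assert len(list(make_data(5, 10))) == 11
```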
        

Here is an implementation of an online mean algorithm; see http://www.johndcook.com/blog/standard_deviation/ and the post linked from it, http://www.johndcook.com/blog/2008/09/26/comparing-three-methods-of-computing-standard-deviation/. (Convince yourselves that the formulas are correct.)

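```python
def online_mean(iterator):
    '''Yield the running mean after each new value from iterator.'''
    n = 0
    mu = 0
    for value in iterator:
        n += 1
        delta = value - mu   # distance of the new value from the old mean
        mu = mu + delta/n    # move the mean 1/n of the way towards the value
        yield mu
```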
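To convince yourself of the update in `online_mean`, write the mean of the first $n$ values, $\mu_n$, in terms of $\mu_{n-1}$:

$$
\mu_n = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{(n-1)\,\mu_{n-1} + x_n}{n} = \mu_{n-1} + \frac{x_n - \mu_{n-1}}{n}.
$$

Here `delta` is $x_n - \mu_{n-1}$, and starting from $\mu_0 = 0$ the first step gives $\mu_1 = x_1$.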

We use our generator functions to build iterators:

```python
g = make_data(5, 10)
list(g)
```

```
[999999999.7656724,
 1000000000.6274325,
 1000000003.7758981,
 999999996.0674763,
 1000000001.9113333,
 999999999.7434896,
 999999993.2691914,
 999999999.1117319,
 1000000000.9274582,
 1000000002.23507,
 999999999.878448]
```

```python
g = online_mean(make_data(5, 100))
print(type(g))
list(g)
```

```
<class 'generator'>

[999999997.5868304,
 1000000002.0134393,
 1000000001.8733623,
 1000000001.2120469,
 1000000001.1716702,
 1000000000.4667126,
 1000000000.5998436,
 1000000000.4183294,
 1000000000.5597438,
 1000000000.5023081,
 1000000000.4029907,
 1000000000.2789339,
 1000000000.4295208,
 1000000000.1196861,
 1000000000.1279308,
 1000000000.3334513,
 1000000000.3139879,
 1000000000.3254998,
 1000000000.3092072,
 1000000000.2576585,
 1000000000.2815876,
 1000000000.331565,
 1000000000.3070619,
 1000000000.2942008,
 1000000000.2558479,
 1000000000.0969156,
 999999999.8548498,
 999999999.866622,
 999999999.7307874,
 999999999.581027,
 999999999.5903629,
 999999999.6376853,
 999999999.3061813,
 999999999.3098277,
 999999999.3396436,
 999999999.4933454,
 999999999.4945222,
 999999999.504842,
 999999999.5231825,
 999999999.4817737,
 999999999.5027447,
 999999999.4654695,
 999999999.4021502,
 999999999.2523974,
 999999999.1905079,
 999999999.3095661,
 999999999.3507677,
 999999999.2651911,
 999999999.3426483,
```
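A quick sanity check, sketched under two assumptions: a fixed seed, and a materialized list so the same data can be fed to both the online and the batch computation. The final running mean should match the batch mean from `statistics.fmean` up to floating-point rounding, and the first running mean should equal the first value. The data are drawn the same way `make_data(5, 100)` draws them, and `online_mean` is repeated so the block runs on its own.

```python
import statistics
from random import normalvariate, random, seed


def online_mean(iterator):
    # Copy of online_mean from above so this block runs on its own.
    n = 0
    mu = 0
    for value in iterator:
        n += 1
        delta = value - mu
        mu = mu + delta/n
        yield mu


seed(1)  # fixed seed, for reproducibility only
data = [1.0e09 + normalvariate(0, 5*random()) for _ in range(101)]

running = list(online_mean(iter(data)))
assert len(running) == len(data)   # one mean per input value
assert running[0] == data[0]       # mu_1 = x_1
# Values are ~1e9, where adjacent doubles are ~1e-7 apart, so allow some rounding.
assert abs(running[-1] - statistics.fmean(data)) < 1e-4
```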

### 2.1

Implement the standard deviation algorithm as a generator function with this shape:

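```python
import math
def online_mean_dev(iterator):
    n = 0
    mu = 0
    dev_accum = 0
    for value in iterator:
        n += 1
        # ... update mu and dev_accum here ...
        if n > 1:
            stddev = math.sqrt(dev_accum/(n-1))
            yield (n, value, mu, stddev)
```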
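For reference, the updates behind this algorithm are Welford's method, which the first link above describes. Write $x_n$ for the $n$-th value, $\mu_n$ for the mean of the first $n$ values and $M_n = \sum_{i=1}^{n}(x_i - \mu_n)^2$ for the accumulated squared deviation (`dev_accum`):

$$
\begin{aligned}
\mu_n &= \mu_{n-1} + \frac{x_n - \mu_{n-1}}{n}, \\
M_n &= M_{n-1} + (x_n - \mu_{n-1})(x_n - \mu_n), \qquad M_0 = 0, \\
s_n &= \sqrt{\frac{M_n}{n-1}}, \qquad n > 1.
\end{aligned}
$$

Unlike the textbook one-pass formula $s_n^2 = \frac{1}{n-1}\left(\sum_i x_i^2 - n\mu_n^2\right)$, this never subtracts two huge, nearly equal numbers, which matters here because every value from `make_data` is offset by $10^9$.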

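```python
# your code here
import math
def online_mean_dev(iterator):
    '''Yield (n, value, running mean, running sample std. dev.) for every
    value after the first, using Welford's update.'''
    n = 0
    mu = 0
    dev_accum = 0   # running sum of squared deviations from the mean
    for value in iterator:
        n += 1
        delta = value - mu                          # x_n - mu_{n-1}
        mu = mu + delta/n                           # mu_n
        dev_accum = dev_accum + delta*(value - mu)  # += (x_n - mu_{n-1})(x_n - mu_n)
        if n > 1:
            stddev = math.sqrt(dev_accum/(n-1))
            yield (n, value, mu, stddev)
```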
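A sketch of a check for the generator above, assuming a fixed seed and a materialized list: after consuming the whole stream, the last yielded standard deviation should agree with the batch `statistics.stdev`, which also uses the $n-1$ denominator. The generator is repeated so the block runs on its own.

```python
import math
import statistics
from random import normalvariate, random, seed


def online_mean_dev(iterator):
    # Copy of the generator above so this block runs on its own.
    n = 0
    mu = 0
    dev_accum = 0
    for value in iterator:
        n += 1
        delta = value - mu
        mu = mu + delta/n
        dev_accum = dev_accum + delta*(value - mu)
        if n > 1:
            yield (n, value, mu, math.sqrt(dev_accum/(n-1)))


seed(2)  # fixed seed, for reproducibility only
data = [1.0e09 + normalvariate(0, 5*random()) for _ in range(1000)]

stats = list(online_mean_dev(iter(data)))
assert len(stats) == len(data) - 1   # nothing is yielded for n == 1
n, value, mu, sigma = stats[-1]
assert n == len(data) and value == data[-1]
# statistics.stdev is the batch sample standard deviation (n - 1 denominator).
assert math.isclose(sigma, statistics.stdev(data), rel_tol=1e-6)
```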

Here we make a stream of about 100000 data points and run this iterator on it (imagine running this on a time series being slowly read from disk). Because both are generators, nothing is computed until the stream is consumed.

```python
data_with_stats = online_mean_dev(make_data(5, 100000))
```

## Q3.

Let's do anomaly detection. Write a routine `is_ok`:

```python
def is_ok(level, t): ...
```

which takes a tuple like the ones yielded by your code above and returns True if the value lies strictly within `level` standard deviations of the mean, i.e. $\mu - \text{level}\,\sigma < x < \mu + \text{level}\,\sigma$.

```python
# your code here
def is_ok(level, t):
    n, value, mu, sigma = t
    low = mu - level*sigma
    high = mu + level*sigma
    return low < value < high
```
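A worked example on hand-made tuples (hypothetical values, not from the stream): with $\mu = 0$ and $\sigma = 1$, `level=5` accepts values strictly between $-5$ and $5$. Because the comparison is strict, a value exactly on the boundary counts as an anomaly. `is_ok` is repeated so the block runs on its own.

```python
def is_ok(level, t):
    # Copy of is_ok from above so this block runs on its own.
    n, value, mu, sigma = t
    low = mu - level*sigma
    high = mu + level*sigma
    return low < value < high


# Hand-made tuples in the (n, value, mu, sigma) shape, with mu = 0, sigma = 1.
assert is_ok(5, (10, 4.9, 0.0, 1.0)) is True    # inside the 5-sigma band
assert is_ok(5, (10, -6.0, 0.0, 1.0)) is False  # below mu - 5*sigma
assert is_ok(5, (10, 5.0, 0.0, 1.0)) is False   # the boundary itself is not ok
```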


We wrap `is_ok` in a predicate and pass it to `itertools.filterfalse`, which yields only the tuples for which the predicate is false; the result is a lazy iterator over the anomalies.

```python
from itertools import filterfalse
pred = lambda t: is_ok(5, t)
anomalies = filterfalse(pred, data_with_stats)
```
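To see what `filterfalse` does without running the full stream, here is a sketch on a few hypothetical `(n, value, mu, sigma)` tuples standing in for `data_with_stats`: it keeps exactly the tuples the predicate rejects, while the built-in `filter` keeps the complementary set.

```python
from itertools import filterfalse


def is_ok(level, t):
    # Copy of is_ok from above so this block runs on its own.
    n, value, mu, sigma = t
    return mu - level*sigma < value < mu + level*sigma


# Hypothetical (n, value, mu, sigma) tuples standing in for data_with_stats.
stream = [(2, 0.5, 0.0, 1.0), (3, 7.0, 0.0, 1.0), (4, -0.2, 0.0, 1.0), (5, -9.0, 0.0, 1.0)]

pred = lambda t: is_ok(5, t)
anomalies = filterfalse(pred, stream)   # lazy: nothing is tested yet
assert [t[0] for t in anomalies] == [3, 5]

# filter keeps the tuples for which pred is true.
assert [t[0] for t in filter(pred, stream)] == [2, 4]
```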

We materialize the anomalies (this consumes the whole data stream):

```python
list(anomalies)  # materialize
```

```
[(1437, 1000000015.3969947, 999999999.9834799, 2.8720472277715783),
 (3539, 1000000015.2153754, 1000000000.0135044, 2.8450915652151894),
 (6479, 999999983.333909, 999999999.9885885, 2.879403009895133),
 (7607, 1000000014.595981, 999999999.99259, 2.8742771376746994),
 (16918, 999999984.7531024, 999999999.9869336, 2.8602778618743057),
 (19333, 1000000015.0314125, 999999999.9911495, 2.8569528008579983),
 (19438, 1000000016.1911799, 999999999.9908891, 2.8579857593666187),
 (21057, 999999984.1167829, 999999999.9905019, 2.8620946233352806),
 (30123, 1000000014.630911, 999999999.9990067, 2.865450987897285),
 (32538, 999999984.0739291, 999999999.9936726, 2.8687768516193763),
 (33989, 1000000015.9683862, 999999999.9954549, 2.8702020095875382),
 (40053, 999999985.2921027, 999999999.9900451, 2.8611434831477935),
 (40579, 1000000020.709818, 999999999.9925895, 2.8648092759234838),
 (44096, 1000000015.3838378, 1000000000.0056177, 2.865237932343893),
 (46328, 999999983.9251215, 1000000000.0111825, 2.
```

## To think of, but not hand in

What kinds of anomalies will this algorithm pick up? What kinds would a shorter "window" of anomaly detection, such as the 100 points around the time in question, pick up? How might you create an algorithm that does window-based averaging? (Hint: the window size is small compared to the length of the time series.)

Finally, think a bit about how you might implement all of this in a production environment; remember that incoming data might get backed up while you handle an anomaly.

(Some inspiration might accrue if you look at the docs for `collections.deque`).
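One possible sketch of the window-based variant, assuming the window holds the `window` values before the current one and that recomputing the statistics over the window at each step is acceptable (it costs O(window) per point, which is cheap when the window is small compared with the series). `windowed_mean_dev` is a hypothetical name; it yields tuples in the same `(n, value, mu, sigma)` shape as `online_mean_dev`, so `is_ok` and `filterfalse` can be reused unchanged. A `deque` with `maxlen` discards the oldest value automatically.

```python
import statistics
from collections import deque


def windowed_mean_dev(iterator, window=100):
    '''Hypothetical helper: yield (n, value, mu, sigma), where mu and sigma
    are computed over the previous `window` values only (not the current one).'''
    recent = deque(maxlen=window)  # appending beyond maxlen drops the oldest value
    for n, value in enumerate(iterator, start=1):
        if len(recent) > 1:        # stdev needs at least two points
            yield (n, value, statistics.fmean(recent), statistics.stdev(recent))
        recent.append(value)


def is_ok(level, t):
    # Copy of is_ok from Q3 so this block runs on its own.
    n, value, mu, sigma = t
    return mu - level*sigma < value < mu + level*sigma


# A toy series with a level shift: alternating 0/1, then alternating 10/11.
series = [0.0, 1.0] * 10 + [10.0, 11.0] * 10
out = list(windowed_mean_dev(iter(series), window=4))
flagged = [t[0] for t in out if not is_ok(5, t)]
```

On this toy series only the jump at `n == 21` is flagged; once the window has filled with the new level, values near 10 and 11 look normal again. If every value in the window were identical, `sigma` would be 0 and `is_ok` would flag everything, so a production version might need a floor under `sigma`.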