## Q1.

Add a method `__iter__` to your project Time Series class to iterate over values, a method `itertimes` to iterate over times, a method `itervalues` to iterate over values, and a method `iteritems` to iterate over time-value pairs. (This interface is similar to that of Python dictionaries.) To test these, check both the types of the results and the answers you expect.

In [1]:
#your code here
# TimeSeriesIterator is assumed to be defined elsewhere in the project.
def __iter__(self):
    # iterating over the series itself yields its values
    return TimeSeriesIterator(self._values)

def itertimes(self):
    return TimeSeriesIterator(self._times)

def itervalues(self):
    return TimeSeriesIterator(self._values)

def iteritems(self):
    # pair each time with its value
    combined = list(zip(self._times, self._values))
    return TimeSeriesIterator(combined)
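
One way to test these methods, as the question asks, is to check both the type of each result and the values it yields. The sketch below uses a hypothetical minimal `TimeSeries` stand-in: the `_times` and `_values` attributes mirror the ones used above, but the class itself and its use of built-in iterators instead of `TimeSeriesIterator` are assumptions made for illustration.

```python
from collections.abc import Iterator


class TimeSeries:
    """Hypothetical stand-in for the project class, for illustration only."""

    def __init__(self, times, values):
        self._times = list(times)
        self._values = list(values)

    def __iter__(self):
        return iter(self._values)

    def itertimes(self):
        return iter(self._times)

    def itervalues(self):
        return iter(self._values)

    def iteritems(self):
        return zip(self._times, self._values)


ts = TimeSeries([1, 2, 3], [10.0, 20.0, 30.0])

# Check the types: each method should hand back an iterator, not a list.
for result in (iter(ts), ts.itertimes(), ts.itervalues(), ts.iteritems()):
    assert isinstance(result, Iterator)

# Check the answers.
assert list(ts) == [10.0, 20.0, 30.0]
assert list(ts.itertimes()) == [1, 2, 3]
assert list(ts.itervalues()) == [10.0, 20.0, 30.0]
assert list(ts.iteritems()) == [(1, 10.0), (2, 20.0), (3, 30.0)]
```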

## Q2.

An online mean and standard deviation algorithm.

Below is a function to generate a potentially infinite stream of 1-D data.

In [2]:
from random import normalvariate, random
from itertools import count
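def make_data(m, stop=None):
    # i runs 0, 1, 2, ...; with a stop value the generator yields stop + 1 items
    for i in count():
        if stop and i > stop:
            break
        # large offset (1e9) plus noise whose scale is itself random in [0, m)
        yield 1.0e09 + normalvariate(0, m*random())
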

Here is an implementation of an online mean algorithm. See http://www.johndcook.com/blog/standard_deviation/ and the page it links to, http://www.johndcook.com/blog/2008/09/26/comparing-three-methods-of-computing-standard-deviation/. (Convince yourselves of the formulas.)

In [3]:
def online_mean(iterator):
    n = 0
    mu = 0
    for value in iterator:
        n += 1
        delta = value - mu
        mu = mu + delta/n
        yield mu
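
As a short check on the update `mu = mu + delta/n`, write $\mu_n$ for the mean of the first $n$ values $x_1, \dots, x_n$ (this notation is introduced here and is not part of the original assignment). Splitting off the last term of the sum gives the recurrence used in the loop:

$$
\mu_n = \frac{1}{n}\sum_{i=1}^{n} x_i
      = \frac{(n-1)\,\mu_{n-1} + x_n}{n}
      = \mu_{n-1} + \frac{x_n - \mu_{n-1}}{n}.
$$

Here $x_n - \mu_{n-1}$ is `delta`, so each step needs only the previous mean and the count, not the earlier values.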

We use our generator functions to implement iterators:

In [4]:
g = make_data(5, 10)
list(g)

[1000000003.4877927,
 999999996.2085866,
 999999999.6720818,
 1000000000.4171668,
 1000000001.2171743,
 1000000002.7110213,
 1000000000.3507746,
 999999999.5313233,
 1000000001.3035794,
 1000000000.5783994,
 999999993.892708]

In [5]:
g = online_mean(make_data(5, 100))
print(type(g))
list(g)

<class 'generator'>


[1000000000.3202308,
 1000000002.8950593,
 1000000003.1082267,
 1000000002.8666036,
 1000000002.4831146,
 1000000002.3112051,
 1000000002.1470407,
 1000000001.9044988,
 1000000001.5904995,
 1000000000.9491882,
 1000000000.7588848,
 1000000000.8445706,
 1000000000.8802992,
 1000000000.8521402,
 1000000001.0582367,
 1000000001.0109584,
 1000000000.8383832,
 1000000000.7369761,
 1000000000.8033888,
 1000000000.7447013,
 1000000000.7068086,
 1000000000.8120801,
 1000000000.6836,
 1000000000.9931349,
 1000000000.5766698,
 1000000000.713647,
 1000000000.7173547,
 1000000000.6804718,
 1000000000.6397326,
 1000000000.6484588,
 1000000000.6424934,
 1000000000.6091831,
 1000000000.5844717,
 1000000000.6034403,
 1000000000.6611179,
 1000000000.6286106,
 1000000000.5995481,
 1000000000.6862773,
 1000000000.7410644,
 1000000000.7551719,
 1000000000.7445625,
 1000000000.7047147,
 1000000000.730354,
 1000000000.7152361,
 1000000000.7054473,
 1000000000.7722892,
 1000000000.6836604,
 1000000000.718364

### 2.1

Implement the online standard deviation algorithm as a generator function of the following form:

```python
def online_mean_dev(iterator):
    BLA BLA
    if n > 1:
        stddev = math.sqrt(dev_accum/(n-1))
        yield (n, value, mu, stddev)
```
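
For reference, the recurrence behind `dev_accum` can be written out as follows. The symbols $S_n$ and $\mu_n$ are notation assumed here: $\mu_n$ is the running mean of the first $n$ values and $S_n = \sum_{i=1}^{n} (x_i - \mu_n)^2$ is the running sum of squared deviations, which is what `dev_accum` is meant to hold.

$$
S_1 = 0, \qquad
S_n = S_{n-1} + (x_n - \mu_{n-1})\,(x_n - \mu_n), \qquad
\sigma_n = \sqrt{\frac{S_n}{n-1}} \quad (n > 1).
$$

The product mixes the old mean $\mu_{n-1}$ and the new mean $\mu_n$, so the mean has to be updated before the accumulator, with the previous mean kept around for one step.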

In [6]:
# your code here
import math

def online_mean_dev(iterator):
    n = 0
    mu = 0
    dev_accum = 0  # running sum of squared deviations from the current mean
    for value in iterator:
        n += 1
        prev_mu = mu
        delta = value - mu
        mu = mu + delta/n
        # accumulate on every step, including n == 1 (where the term is zero)
        dev_accum = dev_accum + (value - prev_mu) * (value - mu)

        if n > 1:
            stddev = math.sqrt(dev_accum/(n-1))
            yield (n, value, mu, stddev)
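
A quick way to gain confidence in an online implementation is to compare its final output with the batch results from the standard library. The sketch below is a self-contained check under stated assumptions: `welford` is a hypothetical compact variant written for this example, and the four data points are made up, with the same large offset that `make_data` uses.

```python
import math
import statistics


def welford(iterator):
    """Yield (n, value, mu, stddev) for n > 1 from a running sum of squared deviations."""
    n = 0
    mu = 0.0
    dev_accum = 0.0
    for value in iterator:
        n += 1
        delta = value - mu             # distance from the previous mean
        mu += delta / n                # updated mean
        dev_accum += delta * (value - mu)
        if n > 1:
            yield (n, value, mu, math.sqrt(dev_accum / (n - 1)))


# Illustrative data only: small deviations on top of a large offset.
data = [1.0e9 + x for x in (4.0, 7.0, 13.0, 16.0)]
n, value, mu, stddev = list(welford(data))[-1]

assert n == 4
assert math.isclose(mu, statistics.mean(data))
assert math.isclose(stddev, statistics.stdev(data))
```

`statistics.stdev` computes the sample standard deviation (dividing by `n - 1`), which matches the `n-1` in the template above.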

Here we make a 100000-element data stream and run this iterator on it (imagine running this on a time series being slowly read from disk).

In [7]:
data_with_stats = online_mean_dev(make_data(5, 100000))

## Q3.

Let's do anomaly detection. Write a routine `is_ok`:

```python
def is_ok(level, t)
```

which takes a tuple like the one yielded by your code above and returns True if the value is within `level` $\sigma$ of the mean, that is, between $\mu - \text{level}\cdot\sigma$ and $\mu + \text{level}\cdot\sigma$.

In [8]:
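#your code here
def is_ok(level, t):
    # t is a tuple (n, value, mu, stddev) as yielded by online_mean_dev
    value = t[1]
    mu = t[2]
    stddev = t[3]
    # half-width of the acceptance band: level standard deviations
    levelsigma = level * stddev
    lower = mu - levelsigma
    upper = mu + levelsigma
    # return (not yield) a plain bool so the result works as a predicate
    return lower <= value <= upper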


We use this function to create a predicate passed through to `itertools.filterfalse` which is then used to obtain an iterator on the anomalies.

In [9]:
from itertools import filterfalse
pred = lambda t: is_ok(5, t)
anomalies = filterfalse(pred, data_with_stats)
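
To see how `filterfalse` behaves with such a predicate, here is a small self-contained sketch. The helper `within` and the hand-made tuples are hypothetical and exist only for this example. Note that the predicate has to return a plain boolean: `filterfalse` keeps exactly those items for which the predicate is falsy.

```python
from itertools import filterfalse


def within(level, t):
    """Return True when the value in t = (n, value, mu, stddev) lies within level * stddev of mu."""
    _, value, mu, stddev = t
    return abs(value - mu) <= level * stddev


# Hand-made (n, value, mu, stddev) tuples, purely illustrative.
stats = [
    (2, 10.0, 10.0, 1.0),  # on the mean
    (3, 12.5, 10.0, 1.0),  # 2.5 sigma away
    (4, 16.0, 10.0, 1.0),  # 6 sigma away
]

# Keep only the tuples that fail the 5-sigma test.
flagged = list(filterfalse(lambda t: within(5, t), stats))
assert flagged == [(4, 16.0, 10.0, 1.0)]
```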

We materialize the anomalies...

In [164]:
list(anomalies)  # materialize

## To think of, but not hand in

What kinds of anomalies will this algorithm pick up? What kinds would a shorter "window" of anomaly detection, such as 100 points around the time in question, pick up? How might you create an algorithm which does window-based averaging? (Hint: the window size is small compared to the time series size.)

Finally, think a bit about how you might implement all of this in a production environment. Remember that data streaming in might get backed up while you handle an anomaly.

(Some inspiration might accrue if you look at the docs for `collections.deque`).
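
As one possible starting point, and not a definitive design, a `deque` with `maxlen` keeps only the most recent points and drops the oldest automatically. The sketch below assumes a hypothetical generator `windowed_mean_dev` that compares each new value against the statistics of the preceding window; it recomputes the statistics at every step, which is acceptable only because the window is small.

```python
import statistics
from collections import deque


def windowed_mean_dev(iterator, window=100):
    """Yield (value, mu, stddev), with mu and stddev taken over the previous `window` values."""
    recent = deque(maxlen=window)  # the oldest value is dropped automatically
    for value in iterator:
        if len(recent) > 1:  # a sample standard deviation needs two points
            yield (value, statistics.mean(recent), statistics.stdev(recent))
        recent.append(value)


results = list(windowed_mean_dev([1.0, 2.0, 3.0, 4.0, 100.0], window=3))

# The last value, 100.0, is judged against the window [2.0, 3.0, 4.0].
assert results[-1] == (100.0, 3.0, 1.0)
```

Because the window excludes the value being tested, a sudden jump stands out against its recent neighbours even if the whole series drifts slowly over time.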