## Q1.

Add an `__iter__` method to your project Time Series class to iterate over values, a method `itertimes` to iterate over times, a method `itervalues` to iterate over values, and a method `iteritems` to iterate over time-value pairs. (This interface is similar to that of Python dictionaries.) To test these, check both the types of the results and the values you expect.

In [36]:
# done in cs207project repo
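
For readers without access to the project repo, here is a minimal sketch of what such an interface could look like. The class name `SimpleTimeSeries` and its `_times` and `_values` attributes are hypothetical stand-ins chosen for illustration, not the actual cs207project implementation.

```python
class SimpleTimeSeries:
    """Hypothetical stand-in for the project's time series class."""

    def __init__(self, times, values):
        if len(times) != len(values):
            raise ValueError("times and values must have the same length")
        self._times = list(times)
        self._values = list(values)

    def __iter__(self):
        # iterating over the series itself yields values only
        return iter(self._values)

    def itertimes(self):
        return iter(self._times)

    def itervalues(self):
        return iter(self._values)

    def iteritems(self):
        # zip lazily pairs each time with its value
        return zip(self._times, self._values)


ts = SimpleTimeSeries([0.0, 0.5, 1.0], [10, 20, 30])
print(list(ts))
print(list(ts.itertimes()))
print(list(ts.iteritems()))
```

Returning fresh iterators from each method, rather than lists, keeps the interface lazy, which matters once the series is large. A test can check `isinstance(result, collections.abc.Iterator)` for the type and compare `list(result)` against the expected contents.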


## Q2.

An online mean and standard deviation algorithm.

Below is a function to generate a potentially infinite stream of 1-D data.

In [37]:
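from random import normalvariate, random
from itertools import count

def make_data(m, stop=None):
    # Yield values scattered around 1.0e09; runs forever when stop is None.
    for i in count():
        # i starts at 0, so a truthy stop gives stop + 1 values in total
        if stop and i > stop:
            break
        yield 1.0e09 + normalvariate(0, m*random())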

Here is an implementation of an online mean algorithm. See http://www.johndcook.com/blog/standard_deviation/ and the post linked from it, http://www.johndcook.com/blog/2008/09/26/comparing-three-methods-of-computing-standard-deviation/. (Convince yourselves of the formulas.)
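
As an aid to that exercise, the update rules can be written out as follows. The notation is chosen here for illustration: $x_n$ is the $n$-th value, $\mu_n$ is the mean of the first $n$ values, and $S_n$ is the running sum of squared deviations from the mean.

Starting from $n\,\mu_n = (n-1)\,\mu_{n-1} + x_n$ and rearranging gives the mean update:

$$\mu_n = \mu_{n-1} + \frac{x_n - \mu_{n-1}}{n}$$

The sum of squared deviations can be updated in the same pass, with $S_1 = 0$:

$$S_n = S_{n-1} + (x_n - \mu_{n-1})(x_n - \mu_n)$$

The sample standard deviation then follows for $n > 1$:

$$\sigma_n = \sqrt{\frac{S_n}{n-1}}$$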

In [38]:
def online_mean(iterator):
    n = 0
    mu = 0
    for value in iterator:
        n += 1
        delta = value - mu
        mu = mu + delta/n
        yield mu

We use our generator functions to implement iterators:

In [39]:
g = make_data(5, 10)
list(g)

[999999997.730158,
 999999998.9958848,
 1000000001.8079658,
 999999999.5824769,
 999999999.6938463,
 1000000001.5728308,
 999999997.4811376,
 1000000004.7623678,
 1000000008.146267,
 1000000000.3948588,
 999999999.2613391]

In [40]:
g = online_mean(make_data(5, 100))
print(type(g))
list(g)

<class 'generator'>


[999999998.9834534,
 999999999.3807309,
 1000000001.1604174,
 1000000001.5517755,
 1000000000.4816217,
 999999998.7852046,
 999999998.3871565,
 999999998.6936857,
 999999998.54088,
 999999998.5693561,
 999999998.7344073,
 999999998.9176939,
 999999998.960564,
 999999999.0153313,
 999999999.0272496,
 999999999.8509332,
 999999999.9168798,
 999999999.8170762,
 999999999.6946275,
 999999999.5316718,
 999999999.4463699,
 999999999.6252806,
 999999999.699519,
 999999999.6416403,
 999999999.6101433,
 999999999.6492956,
 999999999.6079557,
 999999999.6337146,
 999999999.6039177,
 999999999.3120049,
 999999999.3946655,
 999999999.3466591,
 999999999.4649135,
 999999999.4760742,
 999999999.4752284,
 999999999.534982,
 999999999.5409931,
 999999999.5684206,
 999999999.4155502,
 999999999.4007127,
 999999999.4539622,
 999999999.4551326,
 999999999.4688388,
 999999999.4852523,
 999999999.5186437,
 999999999.6642011,
 999999999.6720303,
 999999999.682789,
 999999999.6711762,
 999999999.769932,
 999

### 2.1

Implement the standard deviation algorithm as a generator function as

```python
def online_mean_dev(iterator):
    # ... your code here: update n, value, mu, and dev_accum ...
    if n > 1:
        stddev = math.sqrt(dev_accum/(n-1))
        yield (n, value, mu, stddev)
```

In [41]:
import math

def online_mean_dev(iterator):
    # Yield (n, value, mu, stddev) for every value in the input.
    n = 0
    mu = 0
    dev_accum = 0  # running sum of squared deviations from the mean
    # the for loop ends the generator cleanly when the input is exhausted
    for value in iterator:
        n += 1
        mu_old = mu
        mu = mu_old + (value - mu_old) / n
        dev_accum = dev_accum + (value - mu_old) * (value - mu)
        # the sample standard deviation is undefined for n == 1; report 0
        stddev = math.sqrt(dev_accum / (n - 1)) if n > 1 else 0
        yield (n, value, mu, stddev)
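
The 1.0e09 offset in `make_data` is presumably there to stress numerical stability, which is the subject of the second linked post. The sketch below compares the textbook sum-of-squares formula with the same online update used above on a tiny shifted data set. The function names `naive_variance` and `welford_variance` are illustrative, not part of the assignment.

```python
def naive_variance(data):
    # textbook one-pass formula: (sum of squares - n * mean^2) / (n - 1)
    n = 0
    total = 0.0
    total_sq = 0.0
    for x in data:
        n += 1
        total += x
        total_sq += x * x
    mean = total / n
    return (total_sq - n * mean * mean) / (n - 1)


def welford_variance(data):
    # same online update as above, returning the final sample variance
    n = 0
    mu = 0.0
    dev_accum = 0.0
    for x in data:
        n += 1
        mu_old = mu
        mu = mu_old + (x - mu_old) / n
        dev_accum += (x - mu_old) * (x - mu)
    return dev_accum / (n - 1)


# small values whose sample variance is 30, shifted by 1.0e09
data = [1.0e09 + x for x in (4.0, 7.0, 13.0, 16.0)]
print("naive:  ", naive_variance(data))
print("welford:", welford_variance(data))
```

The naive version subtracts two numbers of order 1e18 that agree in almost all of their digits, so the small difference that carries the variance is lost to rounding. The online update only ever works with deviations from the running mean, which stay small.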

Here we generate a stream of about 100000 elements and run this iterator on it (imagine running this on a time series being slowly read from disk):

In [42]:
data_with_stats = online_mean_dev(make_data(5, 100000))

# for test purposes
#data_with_stats = online_mean_dev(make_data(5, 10))
#list(data_with_stats)

## Q3.

Let's do anomaly detection. Write a routine `is_ok`:

```python
def is_ok(level, t): ...
```

which takes a tuple like the one yielded by your code above and returns `True` if the value lies within `level` standard deviations ($\sigma$) of the mean.

In [43]:
# t is an (n, value, mu, stddev) tuple as yielded by online_mean_dev
def is_ok(level, t):
    n, value, mu, stddev = t
    return mu - level * stddev < value < mu + level * stddev

We use this function to create a predicate passed through to `itertools.filterfalse` which is then used to obtain an iterator on the anomalies.

In [44]:
from itertools import filterfalse
pred = lambda t: is_ok(5, t)
anomalies = filterfalse(pred, data_with_stats)
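
To see what `filterfalse` keeps, here is a small self-contained sketch on hand-made tuples. The `rows` data is invented for illustration, and `is_ok` is restated so the block runs on its own.

```python
from itertools import filterfalse


def is_ok(level, t):
    # t is an (n, value, mu, stddev) tuple
    n, value, mu, stddev = t
    return mu - level * stddev < value < mu + level * stddev


# hand-made tuples: (n, value, mu, stddev)
rows = [
    (10, 1.0, 0.0, 1.0),   # 1 sigma from the mean: passes is_ok
    (11, 6.0, 0.0, 1.0),   # 6 sigma away: flagged
    (12, -5.0, 0.0, 1.0),  # exactly 5 sigma: flagged, the comparison is strict
]
flagged = list(filterfalse(lambda t: is_ok(5, t), rows))
print(flagged)
```

`filterfalse` yields exactly the items for which the predicate is false, so only the second and third rows survive. Because the inequalities are strict, a tuple with a standard deviation of 0 is always flagged, which is why the very first data point (where `n` is 1) shows up among the anomalies.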

We materialize the anomalies...

In [45]:
list(anomalies)  # materialize

[(1, 999999997.9096532, 999999997.9096532, 0),
 (1962, 999999984.3391823, 999999999.9753909, 2.720234302630168),
 (5935, 999999985.6089971, 1000000000.0444479, 2.832036563298437),
 (7475, 1000000018.4895133, 1000000000.0126005, 2.8356922857835936),
 (7686, 999999984.1703161, 1000000000.0002867, 2.842863346405057),
 (8779, 1000000014.8020021, 999999999.9969289, 2.8455561435400374),
 (12767, 1000000014.4882789, 999999999.9889926, 2.855348476306259),
 (22310, 1000000014.8633506, 999999999.9813097, 2.863749815026283),
 (23785, 999999984.8625108, 999999999.9858068, 2.8692910278375514),
 (25490, 1000000018.0251814, 999999999.9897361, 2.871823207626829),
 (25829, 999999984.1552175, 999999999.9861963, 2.873551193509647),
 (27753, 1000000015.9843408, 999999999.984603, 2.877108478128618),
 (29650, 1000000014.5793669, 999999999.9875549, 2.878720624720557),
 (32385, 1000000015.0052277, 999999999.9872433, 2.884341314103084),
 (33662, 1000000015.5476899, 999999999.9872378, 2.8846696185366643),
 (343

## To think of, but not hand in

What kinds of anomalies will this algorithm pick up? What kinds would a shorter "window" of anomaly detection, say 100 points around the time in question, pick up? How might you create an algorithm that does window-based averaging? (Hint: the window size is small compared to the time series size.)

Finally, think a bit about how you might implement all of this in a production environment. Remember that data streaming in might get backed up while you handle an anomaly.

(Some inspiration might accrue if you look at the docs for `collections.deque`).
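
One possible starting point, offered as a sketch rather than a reference solution: keep the most recent values in a `collections.deque` with `maxlen` set and recompute the statistics over that window at each step. The function name `windowed_mean_dev` and the default `window=100` are assumptions made here, and the window is trailing (the last `window` points) rather than centered on the point in question.

```python
import math
from collections import deque


def windowed_mean_dev(iterable, window=100):
    """Yield (n, value, mu, stddev) over the most recent `window` values."""
    recent = deque(maxlen=window)  # the oldest value is dropped automatically
    for n, value in enumerate(iterable, start=1):
        recent.append(value)
        k = len(recent)
        mu = sum(recent) / k
        if k > 1:
            stddev = math.sqrt(sum((x - mu) ** 2 for x in recent) / (k - 1))
        else:
            stddev = 0.0
        yield (n, value, mu, stddev)


for row in windowed_mean_dev([1.0, 2.0, 3.0, 4.0, 5.0], window=3):
    print(row)
```

Recomputing over the window costs time proportional to the window size per value, which is acceptable when the window is small compared to the series. The tuples have the same shape as those from `online_mean_dev`, so they can be passed to `is_ok` unchanged.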