## Q1.

Add methods `__iter__` to your project Time Series class to iterate over values, a method `itertimes` to iterate over times, a method `itervalues` to iterate over values, and a method `iteritems` to iterate over time-value pairs. (This is a similar interface to python dictionaries). To test these, check both the types of the results and the answers you expect.

In [1]:
#your code here
def __iter__(self):
    return TimeSeriesIterator(self._values)
    
def itertimes(self):
    return TimeSeriesIterator(self._times)
    
def itervalues(self):
    return TimeSeriesIterator(self._values)
    
def iteritems(self):
    combined = []
    for i in range(len(self._times)):
        combined.append((self._times[i], self._values[i]))
    return TimeSeriesIterator(combined)

## Q2.

An online mean and standard deviation algorithm.

Below is a function to generate a potentially infinite stream of 1-D data.

In [1]:
from random import normalvariate, random
from itertools import count
def make_data(m, stop=None):
    for _ in count():
        if stop and _ > stop:
            break
        yield 1.0e09 + normalvariate(0, m*random() )
        

Here is an implementation of an online mean algorithm..see http://www.johndcook.com/blog/standard_deviation/ and the link to http://www.johndcook.com/blog/2008/09/26/comparing-three-methods-of-computing-standard-deviation/ in-between. (Convince yourselves of the formulas...)

In [2]:
def online_mean(iterator):
    n = 0
    mu = 0
    for value in iterator:
        n += 1
        delta = value - mu
        mu = mu + delta/n
        yield mu

We use out generator functions to implement iterators:

In [3]:
g = make_data(5, 10)
list(g)

[999999995.3587719,
 1000000000.8490751,
 1000000003.4852787,
 999999996.5779055,
 1000000002.4041067,
 999999999.9922206,
 999999992.2797099,
 1000000001.7151784,
 999999999.4278566,
 1000000002.9561822,
 1000000002.9114865]

In [4]:
g = online_mean(make_data(5, 100))
print(type(g))
list(g)

<class 'generator'>


[999999999.7560126,
 999999999.7208486,
 1000000000.140342,
 1000000000.1381993,
 999999999.790766,
 999999999.3783386,
 999999999.5233775,
 999999999.8303552,
 1000000000.190936,
 1000000000.4402493,
 1000000000.325816,
 1000000000.2037178,
 1000000000.2127202,
 1000000000.1694349,
 999999999.8855081,
 999999999.9587972,
 999999999.8916953,
 999999999.8613325,
 999999999.8138906,
 999999999.8347207,
 999999999.8242164,
 999999999.90628,
 999999999.913106,
 999999999.9366381,
 999999999.9703214,
 1000000000.1346506,
 1000000000.1497025,
 1000000000.1980342,
 1000000000.3731648,
 1000000000.3296163,
 1000000000.2629056,
 1000000000.2516718,
 1000000000.0221694,
 1000000000.2281001,
 1000000000.2782844,
 1000000000.2195377,
 1000000000.2980946,
 1000000000.4245039,
 1000000000.3835912,
 1000000000.3379253,
 1000000000.2913947,
 1000000000.2197529,
 1000000000.2129284,
 1000000000.146129,
 1000000000.2018661,
 1000000000.1615082,
 1000000000.2648426,
 1000000000.2758418,
 1000000000.29414

### 2.1

Implement the standard deviation algorithm as a generator function as

```python
def online_mean_dev(iterator):
    BLA BLA
    if n > 1:
        stddev = math.sqrt(dev_accum/(n-1))
        yield (n, value, mu, stddev)
```

In [12]:
import math
# your code here
def online_mean_dev(iterator):
    n = 0
    stddev = 0
    mu = 0
    for value in iterator:
        n += 1
        prev_mu = mu
        delta = value - mu
        mu = mu + delta/n
        
        if n > 1:
            dev_accum = stddev + (value - prev_mu) * (value - mu)
            stddev = math.sqrt(dev_accum/(n-1))
            yield (n, value, mu, stddev)

Here we make 100000 element data, and run this iterator on it (imagine running this on a time-series being slowly read from disk

In [13]:
data_with_stats = online_mean_dev(make_data(5, 100000))

## Q3.

Let's do Anomaly detection. Write a routine `is_ok`:

```python
def is_ok(level, t)
```

which takes a tuple like the one yielded by your code above and returns True if the value is inbetween `level`-$\sigma$ of the mean.

In [61]:
#your code here
def is_ok(level, t):
    value = t[1]
    mu = t[2]
    stddev = t[3]
    levelsigma = level - stddev
    if levelsigma < 0:
        levelsigma = 0
    lower = mu - levelsigma
    upper = mu + levelsigma
    if (value <= upper) and (value >= lower):
        yield True
    yield False

We use this function to create a predicate passed through to `itertools.filterfalse` which is then used to obtain an iterator on the anomalies.

In [62]:
from itertools import filterfalse
pred = lambda t: is_ok(5, t)
anomalies = filterfalse(pred, data_with_stats)

We materialize the anomalies...

In [63]:
list(anomalies)#materialize

[]

## To think of, but not hand in

What kinds of anomalies will this algorithm pick up? What kinds would a shorter "window" of anomaly detection, like 100 points around the time in question pick? How might you create an algorithm which does window based averaging? (hint: the window size is small compared to the time series size). 

Finally think a bit of how you might implement all of this in a production environment..remember that data streaming in might get backed up when you handle an anomaly.

(Some inspiration might accrue if you look at the docs for `collections.deque`).