## Q1.

Add methods `__iter__` to your project Time Series class to iterate over values, a method `itertimes` to iterate over times, a method `itervalues` to iterate over values, and a method `iteritems` to iterate over time-value pairs. (This is a similar interface to python dictionaries). To test these, check both the types of the results and the answers you expect.

In [1]:
#your code here
import numpy as np
from lazy import *

class TimeSeries():
    '''
    """
Help on package TimeSeries:
NAME
    TimeSeries
DESCRIPTION
    TimeSeries
    =====
    
    Provides
      1. An sequence or any iterable objects
    
    How to use the documentation
    ----------------------------
    Documentation is available in two forms: docstrings provided
    with the code, and a loose standing reference guide, available from
    `the TimeSeries homepage <https://github.com/cs207-project>`_.
    
    We recommend exploring the docstrings using
    `IPython <http://ipython.scipy.org>`_, an advanced Python shell with
    TAB-completion and introspection capabilities.  See below for further
    instructions.
    
    The docstring examples assume that `numpy` has been imported as `np`::  
      
    
    
     |  Methods inherited from builtins.RuntimeWarning:
     |  
     |  __init__(self, *args, **kwargs)
     |      Initialize self.  See help(type(self)) for accurate signature.
     |      Stors a TimeSeries in self.TimeSeries_
     |    
     |  __repr__(self, /)
     |      Return a printable sequence shown in python list format containing all values in [self].
     |  
     |  __str__(self, /)
     |      Return a printable abbreviated sequence of maximum first 100 entrees.
     |  
     |  __getitem__(self, index)
     |      Return self[index]
     |
     |  __setitem__(self, index, values)
     |      Set self[index] = values
     |
     |  __len__(self)
     |      Return len(self.TimeSeries_)
     '''
    def __init__(self, times, values):
        if (iter(times) and iter(values)):
            # reorder according to Time step
            idx = np.argsort(times)
            times = np.array(times)[idx]
            values = np.array(values)[idx]

            self._TimeSeries=np.vstack((times,values))
            self._vindex = 0
            self._values = self._TimeSeries[1]
            self._times = self._TimeSeries[0]
    
    @property
    @lazy
    def lazy(self):
        return self

    def itervalues(self):
        for v in self._values:
            yield v

    def itertimes(self):
        for t in self._times:
            yield t

    def iteritems(self):
        for t,v in zip(self._times,self._values):
            yield (t,v)
            
    def __len__(self):
        return len(self._TimeSeries[0])
    
    def __contains__(self, time):
        index = np.where(self._TimeSeries[0]==time)
        return index[0].size>0
            
    
    def __getitem__(self,time):
        if (time in self):
            index = np.where(self._TimeSeries[0]==time)
            return self._TimeSeries[1][index]
        else:
            print ("no time point at t={0}".format(time))

    def __setitem__(self,time,value):
        if (time in self):
            index = np.where(self._TimeSeries[0]==time)
            self._TimeSeries[1][index]=value
        else:
            print ("no time point at t={0}".format(time))
            
    def __iter__(self):
        return iter(self._TimeSeries[1])
    
    def __repr__(self):
        return "%r"%(self._TimeSeries)
    
    def __str__(self):
        className = type(self).__name__
        if len(self)>100:
            return "%s" %('['+(str(self._values[:99]))[1:-1]+'...'+']')
        else:
            return "%s" %(self._TimeSeries)
        
    def __eq__(self, other):
        return np.array_equal(self._TimeSeries, other._TimeSeries)
        
    def values(self):
        return self._values
    
    def times(self):
        return self._times
    
    def mean(self):        
        if(len(self._values) == 0):
            raise ValueError("cant calculate mean of length 0 list")
        return np.mean(self._values)
    
    def median(self):
        if(len(self._values) == 0):
            raise ValueError("cant calculate median of length 0 list")
        return np.median(self._values)
    
    def interpolate(self, times):
        new_values = []
        for time in times:
            if time > self._times[-1]: # over the rightest boundary
                new_values.append(self._values[-1])
            elif time < self._times[0]: # over the leftest boundary
                new_values.append(self._values[0])
            elif time in self._times:
                new_values.append(self.__getitem__(time))
            else : #within boundary
                for i in range(len(self._times)):
                    if self._times[i] > time:
                        left_value = self._values[i-1]
                        right_value = self._values[i]
                        left_time = self._times[i-1]
                        right_time = self._times[i]
                        #interpolate
                        new_values.append(left_value + (right_value - left_value)/(right_time - left_time)*(time - left_time))
                        break
        return TimeSeries(times, new_values)

## Q2.

An online mean and standard deviation algorithm.

Below is a function to generate a potentially infinite stream of 1-D data.

In [1]:
from random import normalvariate, random
from itertools import count
def make_data(m, stop=None):
    for _ in count():
        if stop and _ > stop:
            break
        yield 1.0e09 + normalvariate(0, m*random() )
        

Here is an implementation of an online mean algorithm..see http://www.johndcook.com/blog/standard_deviation/ and the link to http://www.johndcook.com/blog/2008/09/26/comparing-three-methods-of-computing-standard-deviation/ in-between. (Convince yourselves of the formulas...)

In [2]:
def online_mean(iterator):
    n = 0
    mu = 0
    for value in iterator:
        n += 1
        delta = value - mu
        mu = mu + delta/n
        yield mu

We use out generator functions to implement iterators:

In [3]:
g = make_data(5, 10)
list(g)

[1000000000.2255981,
 999999999.37897,
 999999992.9970627,
 999999991.7621284,
 1000000001.2656105,
 999999999.7760308,
 1000000001.6451735,
 999999999.5638437,
 1000000007.9584906,
 999999994.1965178,
 1000000000.7945606]

In [4]:
g = online_mean(make_data(5, 100))
print(type(g))
list(g)

<class 'generator'>


[1000000003.4781332,
 1000000001.6274164,
 1000000001.0271283,
 1000000001.0488335,
 1000000001.3686892,
 1000000001.4555401,
 1000000000.3542231,
 1000000000.6765134,
 1000000001.1047043,
 1000000001.2982854,
 1000000000.9764346,
 1000000000.7741145,
 1000000000.700361,
 1000000000.8290153,
 1000000000.7494364,
 1000000000.7504617,
 1000000000.6936609,
 1000000000.6431705,
 1000000000.5417968,
 1000000000.4897432,
 1000000000.2909225,
 1000000000.0284007,
 999999999.7729641,
 999999999.7641338,
 999999999.7608366,
 999999999.7629703,
 999999999.7641644,
 999999999.8633716,
 999999999.8164307,
 999999999.7704891,
 999999999.7416564,
 999999999.7449793,
 999999999.6530576,
 999999999.6643834,
 999999999.63237,
 999999999.6924183,
 999999999.6445591,
 999999999.5563319,
 999999999.5579025,
 999999999.5594038,
 999999999.6320161,
 999999999.7348516,
 999999999.754019,
 999999999.7981224,
 999999999.8011156,
 999999999.8591453,
 999999999.803035,
 999999999.7824304,
 999999999.8087429,
 99

### 2.1

Implement the standard deviation algorithm as a generator function as

```python
def online_mean_dev(iterator):
    BLA BLA
    if n > 1:
        stddev = math.sqrt(dev_accum/(n-1))
        yield (n, value, mu, stddev)
```

In [32]:
# your code here

import math
def online_mean_dev(iterator):
    n = 0
    dev_accum = 0
    mu = 0
    for value in iterator:
        n+=1
        if n > 1:
            mu_temp = mu + (value - mu)/n
            dev_accum = dev_accum + (value-mu)*(value - mu_temp)
            mu = mu_temp

            stddev = math.sqrt(dev_accum/(n-1))
            yield (n, value, mu, stddev)
        else:
            mu = value;
    

Here we make 100000 element data, and run this iterator on it (imagine running this on a time-series being slowly read from disk

In [47]:
data_with_stats = online_mean_dev(make_data(5, 100000))

In [43]:
list(data_with_stats)

[(2, 999999993.7870499, 999999997.575315, 5.357415895100548),
 (3, 1000000001.3191196, 999999998.8232498, 4.361533771551814),
 (4, 1000000004.3521234, 1000000000.2054682, 4.508225297561671),
 (5, 999999997.3121996, 999999999.6268145, 4.113061163981567),
 (6, 999999996.5702507, 999999999.1173872, 3.8847026496852517),
 (7, 1000000001.0935768, 999999999.3996999, 3.6240399934410825),
 (8, 999999999.171858, 999999999.3712196, 3.356175931949439),
 (9, 1000000002.1301814, 999999999.677771, 3.2713438555730456),
 (10, 999999993.6759163, 999999999.0775855, 3.621441672540621),
 (11, 999999998.8745219, 999999999.0591252, 3.43614674173446),
 (12, 1000000000.914838, 999999999.2137679, 3.319744537841188),
 (13, 999999998.3921783, 999999999.1505687, 3.186571138397324),
 (14, 999999996.1296982, 999999998.9347923, 3.166223833291278),
 (15, 999999999.4729975, 999999998.9706726, 3.0542126369025486),
 (16, 999999999.6861265, 999999999.0153885, 2.956065963630791),
 (17, 999999998.9750835, 999999999.0130177,

## Q3.

Let's do Anomaly detection. Write a routine `is_ok`:

```python
def is_ok(level, t)
```

which takes a tuple like the one yielded by your code above and returns True if the value is inbetween `level`-$\sigma$ of the mean.

In [48]:
#your code here

def is_ok(level,t):
    value = t[1]
    mean = t[2]
    if (level - mean < value) and (value < level + mean):
        return True
    else:
        return False

We use this function to create a predicate passed through to `itertools.filterfalse` which is then used to obtain an iterator on the anomalies.

In [49]:
from itertools import filterfalse
pred = lambda t: is_ok(5, t)
anomalies = filterfalse(pred, data_with_stats)

We materialize the anomalies...

In [50]:
list(anomalies)#materialize

[(35, 1000000005.0086892, 999999999.5580734, 2.746342564552243),
 (42, 1000000004.4578348, 999999999.3658597, 2.896708208357214),
 (58, 1000000005.0120345, 999999999.3035905, 2.861771272426479),
 (72, 1000000007.9269682, 999999999.4711957, 2.934698846220757),
 (91, 1000000008.3712596, 999999999.6598611, 2.8396761493769356),
 (106, 1000000006.0402663, 999999999.8719969, 2.830383803227604),
 (135, 1000000006.7283592, 999999999.7840765, 3.0004044286533804),
 (143, 1000000006.8225834, 999999999.8378514, 3.0135747685592245),
 (144, 1000000005.1214381, 999999999.874543, 3.035125710976994),
 (146, 1000000005.519069, 999999999.9147229, 3.050142378843259),
 (193, 1000000004.9867973, 999999999.8172659, 2.9358905174082413),
 (207, 1000000009.0045165, 999999999.8633248, 2.926275261048211),
 (239, 1000000005.4647708, 999999999.9131668, 2.821910495312404),
 (243, 1000000006.4478984, 999999999.9658167, 2.8399593417134046),
 (257, 1000000008.9722062, 999999999.9722664, 2.890516890379518),
 (285, 10000

## To think of, but not hand in

What kinds of anomalies will this algorithm pick up? What kinds would a shorter "window" of anomaly detection, like 100 points around the time in question pick? How might you create an algorithm which does window based averaging? (hint: the window size is small compared to the time series size). 

Finally think a bit of how you might implement all of this in a production environment..remember that data streaming in might get backed up when you handle an anomaly.

(Some inspiration might accrue if you look at the docs for `collections.deque`).