In [None]:
%matplotlib inline
import matplotlib
import seaborn as sns
matplotlib.rcParams['savefig.dpi'] = 2 * matplotlib.rcParams['savefig.dpi']

# Generators and Coroutines

Generators and coroutines allow for increased abstraction in Python.  While the ideas take some getting used to, they make code more readable overall by separating concerns.  Both concepts revolve around different uses of the `yield` keyword.

## Generators
Generators are a type of iterator.  Benefits (and one gotcha):
1. They are more powerful than just using `map` and `filter` because they can hold state in between processing entries.  In that respect they resemble `reduce`, but they are much easier to use.
1. They allow you to hold data in an "inner" context without needing to create a `class`.  This can be faster, since attribute lookups such as `self.foo` are slower in Python than local variable lookups.
1. **Gotcha**: the body of a generator does not start running until you first call `.next()`, which can be a bit counterintuitive ...

In [None]:
def Countdown(n):
    print "Counting down ..."
    while n > 0:
        yield n
        n -= 1
        
c = Countdown(5)
print "Set up Countdown"
for i in c:
    print i

Generators rely on the same `.next()` mechanism used by all iterators.  They raise a `StopIteration` exception after reaching their end.

In [None]:
c = Countdown(3)
print c.next()
print c.next()
print c.next()
# This would raise StopIteration (a for loop catches it automatically)
# print c.next()
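
To make the end-of-iteration behavior concrete, here is a minimal sketch using the built-in `next()` function, which calls `.next()` on Python 2 and `__next__()` on Python 3.  The lowercase `countdown` is a quiet stand-in for `Countdown` above (an assumption, so the snippet is self-contained), and the two-argument form of `next()` returns a default instead of raising.

```python
def countdown(n):
    # Same logic as Countdown above, without the print statement.
    while n > 0:
        yield n
        n -= 1

c = countdown(2)
first = next(c)
second = next(c)

# The generator is now exhausted, so another next() raises StopIteration.
try:
    next(c)
    exhausted = False
except StopIteration:
    exhausted = True

# Supplying a default avoids the exception altogether.
fallback = next(c, "done")

print("%s %s %s %s" % (first, second, exhausted, fallback))
```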

## Generator "pipelines"

In particular, we're going to create these generators

```
source_gen -> and_plus_one_gen -> sum_gen
```

and chain them together.  Note that for each generator input, we can yield none, one, or multiple outputs.

1. **Source:** pushes values using `yield`.
2. **Intermediate Step:** both requests previous values (`.next`) and pushes them using `yield`
3. **Sink:** iterates through previous values using `.next`.

**Question:** why is this better than dealing with a list?

In [None]:
def source_gen(n):
    for i in xrange(n):
        yield i
        
def and_plus_one_gen(gen):
    for i in gen:
        yield i
        yield i + 1
        
def sum_gen(gen):
    return sum(gen)

gen1 = source_gen(10)
gen2 = and_plus_one_gen(gen1)
result = sum_gen(gen2)

print result
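
One way to see what the lazy pipeline above buys you is to feed it an unbounded source.  The sketch below reuses `and_plus_one_gen` and swaps in `itertools.count()` as the source (a substitution made for illustration, not part of the original pipeline).  `islice` then pulls only the values it needs, which a list-based version could not do because the full list would never finish building.

```python
from itertools import count, islice

def and_plus_one_gen(gen):
    for i in gen:
        yield i
        yield i + 1

# count() yields 0, 1, 2, ... forever; nothing is computed until requested.
endless = and_plus_one_gen(count())

# Pull just the first six values through the pipeline.
first_six = list(islice(endless, 6))
print(first_six)
```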

## Generator comprehensions

In addition to list comprehensions, Python supports generator comprehensions (formally called generator expressions).  They use parentheses `()` instead of brackets `[]`.  While concise, they can only do `map`- and `filter`-like things.

In [None]:
sum(j for i in xrange(10) for j in (i, i+1))

### Not all generators can be written as generator comprehensions

It might seem from the above example that all generators can be written as generator expressions.  This is not true.  Generator expressions cannot keep track of state in between processing elements; generator functions can.  In the following example, the `total` variable holds state between generator iterations.

In [None]:
## The following generator cannot be translated as a generator comprehension.

def and_total_gen(gen):
    total = 0
    for i in gen:
        yield i
        total += i
        yield total
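
As a quick check of what `and_total_gen` produces, the sketch below runs it on a small hand-picked list (the input values are arbitrary).  Each input is followed by the running total so far, which is the state a generator expression has no place to keep.

```python
def and_total_gen(gen):
    total = 0
    for i in gen:
        yield i
        total += i
        yield total

# Each element is emitted, then the running total up to and including it.
pairs = list(and_total_gen([1, 2, 3]))
print(pairs)
```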

## Time complexity

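Because they don't have to construct an entire list, generators are much more memory efficient (typically `O(1)` extra memory rather than `O(n)`).  Both versions still take `O(n)` time, but skipping the list allocation often makes generator comprehensions faster than list comprehensions on large inputs; the two cells below time both versions.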

In [None]:
%%timeit -n3

gen1 = xrange(int(1e6))
gen2 = (j for i in gen1 for j in (i, i+1))
sum(gen2)

In [None]:
%%timeit -n3

list1 = range(int(1e6))
list2 = [j for i in list1 for j in (i, i+1)]
sum(list2)
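
The memory side of the claim above can be checked roughly with `sys.getsizeof`.  This is only a sketch: `getsizeof` reports the size of the container object itself, not of the integers it refers to, and `range` is used so the snippet runs on both Python 2 and 3 (on Python 2, `xrange` would avoid the intermediate list).

```python
import sys

n = 100000
as_list = [i * 2 for i in range(n)]
as_gen = (i * 2 for i in range(n))

list_bytes = sys.getsizeof(as_list)  # grows with n
gen_bytes = sys.getsizeof(as_gen)    # small, fixed-size generator object

print("list: %d bytes, generator: %d bytes" % (list_bytes, gen_bytes))

# Both produce the same total; the generator just never stores the values.
same_total = sum(as_gen) == sum(as_list)
```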

## Itertools in Python

Manipulating iterators requires a little more care than manipulating lists.  For example, `range`, `map`, and `filter` all have iterator equivalents in Python 2: `xrange`, `itertools.imap`, and `itertools.ifilter`.

In [None]:
from itertools import count, islice, chain, tee, ifilter, takewhile, dropwhile, combinations

print "slicing count", list(islice(count(), 10))
print "chaining two iterators", list(chain(xrange(10), xrange(10)))

it = xrange(10)
it1, it2 = tee(it, 2)
print "it1", list(it1)  # why is this dangerous?
print "it2", list(it2)
print "it1", list(it1)

print "ifilter", list(ifilter(lambda x: x < 'C', 'ABCDABCD'))
print "takewhile", list(takewhile(lambda x: x < 'C', 'ABCDABCD'))
print "dropwhile", list(dropwhile(lambda x: x < 'C', 'ABCDABCD'))

print "combinations", list(combinations(xrange(4), 2))
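
One hazard with `tee` (possibly not the only one the comment above has in mind) is that the copies share the original iterator.  The sketch below uses `iter(range(5))` as the source, a choice made here so that the source is a true one-shot iterator; advancing it directly after the `tee` call silently removes a value from both copies.

```python
from itertools import tee

source = iter(range(5))
left, right = tee(source, 2)

# Consumes 0 directly from the shared source, behind tee's back.
next(source)

left_values = list(left)    # 0 is missing
right_values = list(right)  # 0 is missing here as well
left_again = list(left)     # a tee copy is itself one-shot

print(left_values)
print(right_values)
print(left_again)
```

The usual advice is to stop using the original iterator once it has been passed to `tee`.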

In [None]:
from itertools import izip

it = xrange(10)
it1, it2 = tee(it, 2)
it2.next()
list(izip(it1, it2))

### Exercises
1. How do you group an iterator pairwise?  That is, `s -> (s0,s1), (s1,s2), (s2, s3), ...`?  This is useful in a time series for monitoring the "derivative" with respect to time.  How do you do this for general triple-wise, quadruple-wise etc ...?
1. How do you find a powerset?  That is, given an iterator, return all possible subsets?
1. How do you inspect the i-th lookahead value of an iterator?

## Coroutines

Coroutines are the "dual" of generators.  Generators return data when called with `.next`.  Coroutines take data sent to them via `.send`.  But there's a **Gotcha**: you need to call `.send(None)` to start the coroutine:

In [None]:
def grep(pattern):
    print "Looking for %s" % pattern
    while True:
        line = yield
        if pattern in line:
            print line
            
g = grep("Python")
g.send(None)  # must be "primed"
g.send("Python is great!")
g.send("Java is OK")
g.send("particularly Python generators")
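
To see why priming is needed, here is a small sketch of what happens without it.  The `echo` coroutine is a hypothetical placeholder that simply discards what it receives; sending a real value before the generator has reached its first `yield` raises a `TypeError`.

```python
def echo():
    while True:
        received = yield
        # a real coroutine would do some work with `received` here

unprimed = echo()
try:
    unprimed.send("hello")  # the body has not reached its first yield yet
    error = None
except TypeError as exc:
    error = str(exc)

print(error)

primed = echo()
primed.send(None)       # equivalently: next(primed)
primed.send("hello")    # now accepted
```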

Nobody remembers to "prime" coroutines, so let's write a decorator that calls `.send(None)` for us.

In [None]:
def coroutine(func):
    def start(*args,**kwargs):
        cr = func(*args,**kwargs)
        cr.send(None)
        return cr
    return start

# syntactic sugar for grep = coroutine(grep)
@coroutine
def grep(pattern):
    print "Looking for %s" % pattern
    while True:
        line = yield
        if pattern in line:
            print line
            
g = grep("Python")
g.send("Python is great!")
g.send("particularly Python generators")
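
A possible refinement of the decorator, sketched under the assumption that you want the wrapped function to keep its name and docstring: `functools.wraps` copies that metadata, and `next(cr)` primes the coroutine just as `cr.send(None)` does.  The `collect` coroutine and its `found` list are hypothetical, introduced here so the result can be inspected instead of printed.

```python
from functools import wraps

def coroutine(func):
    @wraps(func)  # keep func's __name__ and docstring on the wrapper
    def start(*args, **kwargs):
        cr = func(*args, **kwargs)
        next(cr)  # same effect as cr.send(None)
        return cr
    return start

@coroutine
def collect(found, pattern):
    """Append lines containing pattern to the list `found`."""
    while True:
        line = yield
        if pattern in line:
            found.append(line)

matches = []
g = collect(matches, "Python")
g.send("Python is great!")
g.send("Java is OK")

print(matches)
print(collect.__name__)
```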

In [None]:
@coroutine
def print_cr():
    try:
        while True:
            x = yield
            print x
    except GeneratorExit:
        print "Done"

x = print_cr()
x.send(1)
x.send(2)
x.close()
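
The sketch below looks at what `.close()` does in a little more detail.  It raises `GeneratorExit` inside the coroutine at the paused `yield`; once the coroutine has finished, closing it again is harmless, but sending to it raises `StopIteration`.  The `logger_cr` coroutine and the `log` list are hypothetical names used only for this illustration.

```python
log = []

def logger_cr():
    try:
        while True:
            log.append((yield))
    except GeneratorExit:
        log.append("closed")

cr = logger_cr()
next(cr)     # prime
cr.send(1)
cr.close()   # raises GeneratorExit inside the coroutine at the yield
cr.close()   # closing a finished coroutine is a no-op

try:
    cr.send(2)
    accepted = True
except StopIteration:
    accepted = False

print(log)
print(accepted)
```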

## Coroutine "pipelines"

This is the same pipeline as before, except that instead of "pulling" values from the previous generator via `.next`, it "pushes" values to the next generator via `.send`.

```
source -> and_plus_one_cr -> sum_cr
```

The 3 steps are:

1. **Source:** pushes values using `send`.
2. **Intermediate Step:** both receives values using `yield` and pushes them using `send`.
3. **Sink:** receives values using `yield` and prints their sum once the pipeline is closed.

In [None]:
def source_cr(n, cr):
    for i in xrange(n):
        cr.send(i)
    cr.close()
        
@coroutine
def and_plus_one_cr(cr):
    try:
        while True:
            i = yield
            cr.send(i)
            cr.send(i+1)
    except GeneratorExit:
        cr.close()
        
@coroutine
def sum_cr():
    total = 0
    try:
        while True:
            total += yield
    except GeneratorExit:
        print total

cr1 = sum_cr()
cr2 = and_plus_one_cr(cr1)
source_cr(10, cr2)
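
Because the sink above only prints, the caller never gets the total back.  One possible workaround, sketched here with a hypothetical `sum_into_cr` sink, is to pass in a mutable container that the sink fills in when it is closed.  The other pieces are repeated from above so that the snippet is self-contained, with `range` in place of `xrange` so it also runs on Python 3.

```python
def coroutine(func):
    def start(*args, **kwargs):
        cr = func(*args, **kwargs)
        next(cr)  # prime the coroutine
        return cr
    return start

def source_cr(n, cr):
    for i in range(n):
        cr.send(i)
    cr.close()

@coroutine
def and_plus_one_cr(cr):
    try:
        while True:
            i = yield
            cr.send(i)
            cr.send(i + 1)
    except GeneratorExit:
        cr.close()  # propagate the shutdown downstream

@coroutine
def sum_into_cr(result):
    # `result` is a dict supplied by the caller (a hypothetical convention).
    total = 0
    try:
        while True:
            total += yield
    except GeneratorExit:
        result["total"] = total

result = {}
source_cr(10, and_plus_one_cr(sum_into_cr(result)))
print(result["total"])
```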

## Broadcasting

With coroutines, we can broadcast data to multiple consumers.  For example, let's say we want to print the numbers that are divisible by 5 and, separately, the numbers that are divisible by 2.  Let's write some simple coroutines to do this.  The architecture is as follows

```
source -> broadcast() ---> divisible_cr(5) -> print_cr()
                      \
                        -> divisible_cr(2) -> print_cr()
```

**Exercise:** how would you create this architecture?

```
source -> broadcast() ---> divisible_cr(5) --+--> print_cr()
                      \                     /
                        -> divisible_cr(2) -
```

"Pushing" data using coroutines allows you to build more complex data pipelines than "pulling" data.

In [None]:
def source(n, cr):
    for i in xrange(n):
        cr.send(i)

@coroutine
def broadcast(*crs):
    while True:
        i = yield
        for cr in crs:
            cr.send(i)
    
@coroutine
def divisible_cr(n, cr):
    while True:
        i = yield
        if (i % n) == 0:
            cr.send(i)
         
@coroutine
def print_cr():
    while True:
        print (yield)
    
source(10,
    broadcast(
        divisible_cr(5, print_cr()),
        divisible_cr(2, print_cr()),
    )
)

## Coroutines as classes

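Generator-based coroutines can often replace classes.  For example, `contextlib.contextmanager` turns a generator function into a context manager.  It takes many fewer lines of code because the setup ("constructor") and teardown ("destructor") code is grouped together in one function.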

In [None]:
import datetime
import numpy as np

class Timer1:
    def __init__(self):
        pass

    def __enter__(self):
        self.t1 = datetime.datetime.now()

    def __exit__(self, exc_type, exc_value, traceback):
        # exc_type, exc_value and traceback describe the error, if one occurred
        self.t2 = datetime.datetime.now()
        print "Seconds elapsed: {}\n".format((self.t2 - self.t1).total_seconds())
            
with Timer1():
    x = np.arange(1000)
    x + x

In [None]:
from contextlib import contextmanager
import datetime
import numpy as np

@contextmanager
def Timer2():
    t1 = datetime.datetime.now()
    try:
        yield
    finally:
        # runs even if the body of the with block raises, matching Timer1
        t2 = datetime.datetime.now()
        print "Seconds elapsed: {}\n".format((t2 - t1).total_seconds())
    
with Timer2():
    x = np.arange(1000)
    x + x
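
One detail worth knowing about `contextmanager`: if the body of the `with` block raises, the exception is re-raised inside the generator at the `yield`.  The sketch below compares two hypothetical context managers that record events in a list; teardown code placed after a bare `yield` is skipped on error, while teardown in a `finally` clause still runs.

```python
from contextlib import contextmanager

events = []

@contextmanager
def no_finally():
    events.append("enter plain")
    yield
    events.append("exit plain")      # skipped if the body raises

@contextmanager
def with_finally():
    events.append("enter safe")
    try:
        yield
    finally:
        events.append("exit safe")   # runs even if the body raises

for manager in (no_finally, with_finally):
    try:
        with manager():
            raise ValueError("boom")
    except ValueError:
        pass

print(events)
```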

**Exercise:** implement the decorator `contextmanager` using function decorators, a `class` that implements `__enter__` and `__exit__`, and coroutines.

### A nifty example of a stats coroutine

We can also pass values to and get values from coroutines in the same step: `.send(x)` returns whatever the coroutine yields next.  The `stats_cr` below computes the running mean and (population) standard deviation of the values sent to it.

In [None]:
import math

@coroutine
def stats_cr():
    m0 = 0
    m1 = 0.
    m2 = 0.
    while True:
        if m0 > 0:
            mean = m1 / m0
            # clamp at 0 to guard against tiny negative values from rounding
            x = yield mean, math.sqrt(max(m2 / m0 - mean * mean, 0.))
        else:
            x = yield None, None
        m0 += 1
        m1 += x
        m2 += x * x
        
scr=stats_cr()
print scr.send(1)
print scr.send(2)
print scr.send(3)
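
The sum-of-squares formula used above can lose precision when the values are large compared with their spread.  As an alternative, here is a sketch of the same idea using Welford's online update.  This is a different technique from the one in `stats_cr`, and the name `welford_cr` is made up for this example; it tracks the running mean and the sum of squared deviations from it.

```python
import math

def welford_cr():
    count = 0
    mean = 0.0
    m2 = 0.0  # sum of squared deviations from the current mean
    result = (None, None)
    while True:
        x = yield result
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
        # population standard deviation, as in stats_cr
        result = (mean, math.sqrt(m2 / count))

w = welford_cr()
next(w)  # prime; the first yielded value is (None, None)
results = [w.send(x) for x in (1, 2, 3)]

for mean_so_far, std_so_far in results:
    print("%.4f %.4f" % (mean_so_far, std_so_far))
```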

*Copyright &copy; 2015 The Data Incubator.  All rights reserved.*