# Functional Programming in Python
## David Mertz
### dmertz@continuum.io
### 2016-04-22

# Table of Contents
* [Learning Objectives:](#Learning-Objectives:)
* [Iterables, iterators, and generators](#Iterables,-iterators,-and-generators)
	* [Iterators and iterables](#Iterators-and-iterables)
	* [Generators](#Generators)
		* [Generator expressions](#Generator-expressions)
		* [Generator functions](#Generator-functions)
	* [A digression on the `itertools` module](#A-digression-on-the-itertools-module)
* [Large sequences, even if not quite infinitely long](#Large-sequences,-even-if-not-quite-infinitely-long)
* [Chaining iterables](#Chaining-iterables)
* [Python's nomenclature for virtual sequences](#Python's-nomenclature-for-virtual-sequences)
* [Generator functions](#Generator-functions)
* [The iterator protocol](#The-iterator-protocol)
* [Generators defining iterables](#Generators-defining-iterables)
	* [Generator comprehensions](#Generator-comprehensions)
* [Exercise (factorization)](#Exercise-%28factorization%29)
* [Exercise (creating iterables)](#Exercise-%28creating-iterables%29)

# Learning Objectives:

After completion of this module, learners should be able to:

* construct & use iterators for sequential tasks
* understand the `itertools` module

# Iterables, iterators, and generators

We have seen many examples of loops over general data collections. We have also seen examples of more general objects (e.g., `range`) that can also be looped over. These are all specific examples of *iterables* in Python.

Some reading to extend the discussion here:

* [Maximize your program's laziness (slides)](http://gnosis.cx/publish/Laziness.pdf)
* [Iterables vs. Iterators vs. Generators](http://nvie.com/posts/iterators-vs-generators/) (A little pocket reference on iterables, iterators and generators.)
* [Python and lazy evaluation](http://swizec.com/blog/python-and-lazy-evaluation/swizec/5148)
* [Improve Your Python: `yield` and Generators Explained](https://www.jeffknupp.com/blog/2013/04/07/improve-your-python-yield-and-generators-explained/)

In [None]:
for k in [0,1,2,3,4]:
    print(k)

In [None]:
for k in range(5): # Equivalent in output but not in execution
    print(k)

In [None]:
for ch in "String":
    print(ch)

In [None]:
for key in {'a':1, 'b':2, 'c':3}:
    print(key)

## Iterators and iterables

In words, a Python *iterator* is

* usually an object with *state* that remembers where it paused during iteration
* any object with a `__next__` method (or `next` before Python 3) that:
    * returns the next value in the iteration
    * updates the state to point at the next value
    * signals when it is done by raising `StopIteration`
* any object that is *self-iterable* (i.e., it has an `__iter__` method that returns `self`).
* any object for which the builtin function `next` calls the `__next__` method on the object passed to it.

A Python *iterable* is

* any object that can be looped over (e.g., a string, a tuple, a list, a dictionary, a file, etc.)
* any object that can appear on the right-side of a `for` loop (i.e., `for x in object:`)
* any object that can be used within a call to `iter` (i.e., `iter(object)` returns an *iterator*)
* any object that has an `__iter__` method that returns an iterator *or* has a `__getitem__` method that permits indexed lookup.
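
The `__getitem__` fallback in the last bullet is easy to overlook, so here is a minimal sketch of it. The class `Squares` is a hypothetical name invented for this illustration; it defines no `__iter__` method at all, yet `iter` still accepts it by calling `__getitem__` with 0, 1, 2, ... until an `IndexError` is raised.

```python
class Squares:
    """Hypothetical sequence-like class with __getitem__ but no __iter__."""

    def __init__(self, n):
        self.n = n

    def __getitem__(self, index):
        # iter() calls this with 0, 1, 2, ... and stops at the first IndexError
        if not 0 <= index < self.n:
            raise IndexError(index)
        return index * index


squares = Squares(4)
squares_iter = iter(squares)      # works even though __iter__ is missing
collected = list(squares_iter)
print(collected)                  # [0, 1, 4, 9]
```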

In [None]:
x = ['a','b','c']
print('x is', type(x))
iter_x = iter(x)
print("Idempotency of iter():", iter(iter_x) is iter_x)
print('iter(x) is', type(iter_x))
print('next(iter_x) = %s' % next(iter_x))
print('next(iter_x) = %s' % next(iter_x))
print('next(iter_x) = %s' % next(iter_x))
next(iter_x)  # Raises StopIteration: the iterator is exhausted

* Most containers (e.g., `list`, `dict`, `set`, etc.) are iterables.
* A `range` object is iterable but not an iterator.
* An iterator is always an iterable. The opposite is not true.
* Iterables return iterators when `iter` is applied to them. This is what happens when a `for` loop uses an iterable.
* Iterators are consumed as they are used. That is, calling `next` produces results in sequence that cannot be retrieved without instantiating a new iterator.
* The [Python iterator protocol](https://docs.python.org/3/c-api/iter.html) specifies the behavior of an object's `__iter__` and `__next__` methods.
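
The points above about `for` loops and consumption can be sketched in a few lines. The helper `manual_for` is a hypothetical name used only for this illustration; it spells out roughly what a `for` loop does with `iter`, `next`, and `StopIteration`.

```python
def manual_for(iterable, action):
    """Hypothetical helper that mimics `for item in iterable: action(item)`."""
    iterator = iter(iterable)          # the loop first asks for an iterator
    while True:
        try:
            item = next(iterator)      # then repeatedly asks for the next value
        except StopIteration:          # exhaustion ends the loop quietly
            break
        action(item)


seen = []
manual_for("abc", seen.append)
print(seen)                            # ['a', 'b', 'c']

# Iterators are consumed as they are used: a second pass yields nothing.
letters = iter("abc")
first_pass = list(letters)
second_pass = list(letters)
print(first_pass, second_pass)         # ['a', 'b', 'c'] []
```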

The principal advantage in distinguishing iterators and iterables is the ability to use lazy evaluation to defer generating terms in sequences. For instance, it is possible to loop over the iterable `list(range(10000000))` that explicitly builds the list of ten million elements before looping. But it is also possible to loop over the iterable `range(10000000)` itself which produces the integers in sequence without requiring storage for ten million elements in memory. More abstract iterables (notably files and data streams) can be arbitrarily large, so understanding lazy evaluation is extremely useful.
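
As a rough illustration of the storage difference, the sketch below compares the size of a `range` object with that of the equivalent list. The exact byte counts are CPython implementation details, and `sys.getsizeof` counts only the list's own array of references, not the integer objects it points to; a smaller `n` than ten million is used so the cell runs quickly.

```python
import sys

n = 100_000
lazy = range(n)                      # stores only start, stop, and step
eager = list(range(n))               # stores a reference to every element

lazy_size = sys.getsizeof(lazy)
eager_size = sys.getsizeof(eager)    # the list's reference array only
print(lazy_size, eager_size)

# Both produce the same values when looped over.
same_total = sum(lazy) == sum(eager)
print(same_total)                    # True
```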

## Generators

The term *generator* is widely but imprecisely used in Python so there is a lot of confusion around this topic (even more so than the confusion around iterators and iterables). There are *generator objects* (that are iterators), *generator functions* (that return generator objects), and *generator expressions* (that evaluate as generator objects and resemble comprehensions). According to the [Python glossary](http://docs.python.org/glossary.html#term-generator), the official terminology is now that *generator* means *"generator function"*. Unfortunately, generator objects still belong to the generator class, so this terminology is still not used consistently.

We will try to be consistent:

* A *generator object* is a special kind of iterator produced either by a *generator expression* or a *generator function*. We will simply call a generator object an iterator (because that is what it is).
* A *generator expression* is a comprehension (usually delimited by parentheses) that produces an iterator.
* A *generator function* is a function that returns an iterator and uses the `yield` keyword (rather than `return`) to pass values back to the calling namespace.
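
The three terms can be told apart with the standard `inspect` and `types` modules. This is a small sketch; `gen_func`, `from_function`, and `from_expression` are names chosen only for this illustration.

```python
import inspect
import types


def gen_func():
    yield 1


from_function = gen_func()              # generator object from a generator function
from_expression = (n for n in [1])      # generator object from a generator expression

is_gen_function = inspect.isgeneratorfunction(gen_func)
both_generators = (isinstance(from_function, types.GeneratorType)
                   and isinstance(from_expression, types.GeneratorType))
is_own_iterator = iter(from_function) is from_function   # generator objects are iterators

print(is_gen_function, both_generators, is_own_iterator)  # True True True
```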

### Generator expressions

The easiest way to construct a generic iterator is to apply the function `iter` to a collection, e.g.,
```python
>>> my_string = 'This is a string'
>>> my_iter = iter(my_string)
```

An alternative is to use a *generator expression*&mdash;basically a comprehension delimited by parentheses.

In [None]:
# Construction of an iterator using the builtin *iter* function
my_string = 'This is a string'
my_iter = iter(my_string)
print('type(my_iter) = %s' % type(my_iter))
print('next(my_iter) = %s' % next(my_iter))
print('next(my_iter) = %s' % next(my_iter))
print('next(my_iter) = %s' % next(my_iter))
print('next(my_iter) = %s' % next(my_iter))
print('next(my_iter) = %s' % next(my_iter))

In [None]:
# Construction of an iterator
new_iter = (char.upper() for char in my_string)
print('type(new_iter) = %s' % type(new_iter))
print('next(new_iter) = %s' % next(new_iter))
print('next(new_iter) = %s' % next(new_iter))
print('next(new_iter) = %s' % next(new_iter))
print('next(new_iter) = %s' % next(new_iter))

In [None]:
# Generator comprehensions are similar to tuples in their syntax.
# The parentheses are not always needed, e.g., when the expression is the sole argument of a function call
sum(n**2 for n in range(10))

The cells below build the sequence of perfect squares twice: once with a list comprehension and once with a generator expression. The important difference is that the former explicitly builds the whole list in memory, while the latter uses *lazy evaluation* to produce elements only as they are required. For large values of N, the generator expression needs far less memory and is often faster as well (for example, when we time adding up the terms of the sequence).

In [None]:
N = int(1e8)

In [None]:
%%timeit 
listcomp = [k*k for k in range(N)]
sum(listcomp)
#print('listcomp = [%d, %d, %d, ... %d, %d]' % tuple(listcomp[:3]+listcomp[-2:]))
#print('listcomp is a %s' % type(listcomp))


In [None]:
%%timeit
genexpr = (k*k for k in range(N) )
# Comment this out for large values of N.
#print('genexpr = %s' % genexpr)
#print('genexpr is a %s' % type(genexpr))
sum(genexpr)


### Generator functions

A *generator function* is a function that produces an iterator. The principal difference between a generator function and a standard function is the use of the keyword `yield` rather than `return` to pass values back to the caller.

In [None]:
def first_generator_function():
    yield 'A'
    yield 'B'
    yield 'C'

In [None]:
# Invoking first_generator_function gives an iterator
for result in first_generator_function():
    print('result = %s' % result)

In [None]:
# An alternative way to use the iterator produced by first_generator_function
g = first_generator_function()
print('next(g) = %s' % next(g))
print('next(g) = %s' % next(g))
print('next(g) = %s' % next(g))
print('next(g) = %s' % next(g))  # Raises StopIteration: only three values are yielded

This next example produces a generator function `fib_generator` that gives an iterator for an infinite sequence (namely the sequence of Fibonacci numbers).

In [None]:
def fib_generator():
    prev, curr = 0, 1
    while True:
        yield curr
        prev, curr = curr, prev + curr
        
# fibs is an iterator created from the generator function fib_generator
fibs = fib_generator()
for _ in range(5):
    print(next(fibs))

We can in principle use the iterator `fibs` (instantiated by invoking `fib_generator`) as the iterator in a `for` loop.
```python
fibs = fib_generator() # fibs is an iterator created from the generator function fib_generator
for result in fibs:
    print(result)
```

*This is an infinite loop!* Do *not* use `fibs` as the iterable in a `for` loop *without specifying a `break` condition*.
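
Besides an explicit `break`, an infinite iterator can be bounded with `itertools.islice` (take a fixed number of items) or `itertools.takewhile` (take items while a condition holds). The sketch below repeats the definition of `fib_generator` so that it is self-contained.

```python
from itertools import islice, takewhile


def fib_generator():
    prev, curr = 0, 1
    while True:
        yield curr
        prev, curr = curr, prev + curr


# Take exactly five terms, then stop.
first_five = list(islice(fib_generator(), 5))
print(first_five)                 # [1, 1, 2, 3, 5]

# Take terms until the first one greater than 1000.
up_to_1000 = list(takewhile(lambda f: f <= 1000, fib_generator()))
print(up_to_1000[-1])             # 987
```

Both wrappers are themselves lazy, so nothing is computed until `list` (or a loop) asks for values.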

In [None]:
# fibs is an iterator created from the generator function fib_generator
fibs = fib_generator()
# Safer loop that will break at the first Fibonacci greater than 1000.
for k, result in enumerate(fibs):
    if result>1000:
        break
    print('%3d: %d' % (k,result))

In [None]:
next(fibs)

In [None]:
next(fibs), next(fibs)

Here is another infinite sequence, this time generating primes with a lazy [sieve of Eratosthenes](https://en.wikipedia.org/wiki/Sieve_of_Eratosthenes):

In [None]:
def sieve_generator():
    "Simple and naive lazy Sieve of Eratosthenes"
    candidate = 2
    found = []
    while True:
        if all(candidate % prime != 0 for prime in found):
            yield candidate
            found.append(candidate)
        candidate += 1

In [None]:
primes = sieve_generator()
print(next(primes))
print(next(primes))
print(next(primes))
print(next(primes))
# Notice that the numbering in the enumeration below is offset.
for k, p in enumerate(primes):
    if p>100:
        break
    print('The %dth prime is %d' % (k+5,p))

Just for fun, let's see if we can make the generator more efficient.

In [None]:
%%timeit
for p, n in zip(sieve_generator(), range(int(1e4))):
    pass

In [None]:
def sieve_generator2():
    "Less simple lazy Sieve of Eratosthenes; skip the even numbers"
    yield 2
    candidate = 3
    found = []
    while True:
        if all(candidate % prime != 0 for prime in found):
            yield candidate
            found.append(candidate)
        candidate += 2

In [None]:
%%timeit
for p, n in zip(sieve_generator2(), range(int(1e4))):
    pass

Much more significant than skipping the even numbers (or even a [wheel factorization](https://en.wikipedia.org/wiki/Wheel_factorization) to skip various multiples) is simply **not** looking higher than the square root of the candidate prime.

In [None]:
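from math import sqrt, ceil
def up_to(seq, lim):
    "Yield items of an ascending sequence up to and including lim"
    for n in seq:
        # The comparison must be inclusive: with a strict `<`, the square of
        # a prime (9, 25, 49, ...) is never tested against its own root.
        if n <= lim:
            yield n
        else:
            break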
            
def sieve_generator3():
    "Pretty good Sieve; skip the even numbers, stop at sqrt(candidate)"
    yield 2
    candidate = 3
    found = []
    while True:
        lim = int(ceil(sqrt(candidate)))
        if all(candidate % prime != 0 for prime in up_to(found, lim)):
            yield candidate
            found.append(candidate)
        candidate += 2

In [None]:
%%timeit
for p, n in zip(sieve_generator3(), range(int(1e4))):
    pass

Another more useful generator function can be used to generate all permutations of a finite string. Notice that this generator function is recursive. Combinatorial functions like this are implemented in the `itertools` module.

In [None]:
def permutations(items):
    if not items:
        yield []
    else:
        for index in range(len(items)):
            for item in permutations(items[:index]+items[index+1:]):
                yield [items[index]] + item

for p in permutations('ABC'):
    print(''.join(p))
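
For comparison, here is a short sketch that checks the recursive generator against `itertools.permutations`. The generator function is repeated from the cell above so that the example is self-contained; the alias `it_permutations` is chosen only to avoid a name clash.

```python
from itertools import permutations as it_permutations


def permutations(items):
    if not items:
        yield []
    else:
        for index in range(len(items)):
            for item in permutations(items[:index] + items[index + 1:]):
                yield [items[index]] + item


handwritten = [''.join(p) for p in permutations('ABC')]
library = [''.join(p) for p in it_permutations('ABC')]
print(handwritten)               # ['ABC', 'ACB', 'BAC', 'BCA', 'CAB', 'CBA']
print(handwritten == library)    # True
```

Note that the library version yields tuples rather than lists, which makes no difference to `''.join`.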

## A digression on the `itertools` module

The module `itertools` is a collection of very powerful—and carefully designed—functions for performing *iterator algebra*.  That is, these permit *function composition* with iterators in sophisticated ways while minimizing concrete instantiation of terms in iterable sequences. In addition to the basic functions in the module itself, the [module documentation](https://docs.python.org/3.5/library/itertools.html) provides a number of short recipes for additional functions using two or three of the basic module functions in combination. *Be aware that it is easy to get these recipes subtly wrong*. The third-party module `more_itertools` provides additional functions that are likewise designed to avoid common pitfalls and edge cases.

The basic goal of using the building blocks inside `itertools` is to avoid performing computations before they are required, to avoid the memory requirements of large collections, to avoid potentially slow I/O until strictly necessary, and so on. Iterators are lazy sequences rather than realized collections; when combined with functions or recipes in `itertools`, they retain this property.
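
A small sketch can make this laziness visible. The function `expensive` and the list `calls` are hypothetical names for this illustration; `calls` records which inputs were actually computed, showing that building the pipeline does no work and that consuming it does only as much work as is needed.

```python
from itertools import count, islice

calls = []


def expensive(n):
    calls.append(n)          # record that the computation really happened
    return n * n


# Building the pipeline computes nothing, even over an infinite source.
pipeline = filter(lambda sq: sq % 2 == 0, map(expensive, count()))
calls_before = list(calls)

# Asking for three results pulls only as many inputs as required.
first_three = list(islice(pipeline, 3))
print(calls_before)          # []
print(first_three)           # [0, 4, 16]
print(calls)                 # [0, 1, 2, 3, 4]
```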

Here is a quick example of combining a few things. Rather than the stateful `Fibonacci` class to let us keep a running sum, we might simply create a single lazy iterator to generate both the current number and this sum:

In [None]:
from itertools import count, tee
mycount = count()
next(mycount), next(mycount), next(mycount)

In [None]:
# Assume that this is code we cannot modify ourselves (3rd party, etc.)
def fibonacci_gen():
    a, b = 1, 1
    while True:
        yield a
        a, b = b, a+b
fibonacci = fibonacci_gen()
print(next(fibonacci))

In [None]:
list(zip("ABC", [1,2,3], range(100,103)))

In [None]:
from itertools import accumulate
# Iterate over both an iterable of numbers and running total of the sequence
def item_with_total(iterable):
    "Generically transform a stream of numbers into a pair of (num, running_sum)"
    s, t = tee(iterable) # unpacking tuples
    yield from zip(t, accumulate(s))
    # Equivalent to:
    # for item, total in zip(t, accumulate(s)):
    #     yield item, total

fibs = fibonacci_gen()
for n, (fib, total) in zip(range(10), item_with_total(fibs)):
    print("%3d. Item: %3d; Total: %3d" % (n+1, fib, total))

The documentation for the `itertools` module contains details on its combinatorial functions as well as a number of short recipes for combining them. Note that for practical purposes, `zip()`, `map()`, `filter()`, and `range()` (which is, in a sense, just a terminating `itertools.count()`) could well live in `itertools` if they were not built-ins.  That is, all of those functions lazily generate sequential items (mostly based on existing iterables) without creating a concrete sequence. Built-ins like `all()`, `any()`, `sum()`, `min()`, and `max()`, as well as `functools.reduce()`, also act on iterables, but all of them, in the general case, need to exhaust the iterator rather than remain lazy.
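
The qualifier "in the general case" matters: `any()` and `all()` stop as soon as the answer is determined, whereas `sum()` must always consume everything. A brief sketch of the difference:

```python
# any() stops at the first true item and leaves the rest unconsumed.
numbers = iter(range(10))
found = any(n > 3 for n in numbers)
leftover = list(numbers)
print(found, leftover)       # True [5, 6, 7, 8, 9]

# sum() has to look at every item, so nothing is left afterwards.
numbers = iter(range(10))
total = sum(numbers)
after_sum = list(numbers)
print(total, after_sum)      # 45 []
```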

# Large sequences, even if not quite infinitely long

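```python
import itertools

# Sketch only: `db`, `something`, and `something_else` are placeholders.
log1 = open('huge.log')                      # lines are read on demand
seq = itertools.count()                      # lazy and unbounded
rows = db.execute("select * from big_data")  # assumed to return a lazy cursor
z = zip(log1, seq, rows)                     # ends with the shortest input
for line, num, row in z:
    if something:
        break
    something_else(line, num, row)
```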

# Chaining iterables

The functions `itertools.chain()` and `itertools.chain.from_iterable()` combine multiple iterables.  Built-in `zip()` and `itertools.zip_longest()` also do this, but in manners that allow incremental advancement through the iterables.  A consequence of this is that while chaining infinite iterables is valid syntactically and semantically, no actual program will exhaust the earlier iterable. For example:

```python
from itertools import chain, count
thrice_to_inf = chain(count(), count(), count())
```

Conceptually, `thrice_to_inf` will count to infinity three times, but in practice once would always be enough.  However, for merely *large* iterables—not for infinite ones—chaining can be very useful and parsimonious.
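
Here is a minimal sketch of both functions on small inputs; the variable names are chosen only for illustration. The last line shows that a chain of infinite iterables never gets past the first one.

```python
from itertools import chain, count, islice

# chain() takes the iterables as separate arguments.
flat = list(chain("ab", [1, 2], range(3)))
print(flat)                      # ['a', 'b', 1, 2, 0, 1, 2]

# chain.from_iterable() takes a single iterable of iterables.
nested = [[1, 2], [3], [], [4, 5]]
flattened = list(chain.from_iterable(nested))
print(flattened)                 # [1, 2, 3, 4, 5]

# Only the first count() is ever reached.
head = list(islice(chain(count(), count(), count()), 4))
print(head)                      # [0, 1, 2, 3]
```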

In [None]:
from glob import glob
from itertools import chain, islice
def from_logs(fnames):
    yield from (open(file) for file in fnames)

# Substitute suitable directory with lots of log files...
logdir = '/Users/dmertz/Library/Logs/*.log'
logs = glob(logdir)
lines = chain.from_iterable(from_logs(logs))
for line in islice(lines, 16002, 16006):
    print(line, end='')

In [None]:
next(lines)

In [None]:
next(lines)

In [None]:
r = range(100000000)
r1, r2 = tee(r)
next(r1),next(r1),next(r1),next(r1),next(r1),next(r1)

In [None]:
next(r1)

In [None]:
next(r2)

Besides the chaining with `itertools`, we should mention `collections.ChainMap()` in the same breath. Dictionaries (or generally any `collections.abc.Mapping`) are iterable (over their keys). Just as we might want to chain multiple sequence-like iterables, we sometimes want to chain together multiple mappings without needing to create a single larger concrete one. `ChainMap()` is handy, and does not alter the underlying mappings used to construct it.
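
A brief sketch of `ChainMap` follows, using two hypothetical dictionaries named `defaults` and `overrides`. Lookups search the mappings in order, and the chain is a live view of the underlying dictionaries rather than a copy.

```python
from collections import ChainMap

defaults = {'color': 'red', 'size': 'M'}
overrides = {'size': 'L'}
settings = ChainMap(overrides, defaults)

size = settings['size']          # found in the first mapping
color = settings['color']        # falls through to the second mapping
keys = sorted(settings)          # iterating gives the combined keys
print(size, color, keys)         # L red ['color', 'size']

# Later changes to an underlying mapping are visible through the chain.
defaults['color'] = 'blue'
updated_color = settings['color']
print(updated_color)             # blue
```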

# Python's nomenclature for virtual sequences

There are several subtly different terms related to "lazy sequences" in Python.  A *generator function* is a named function that, when called, returns a *generator*.  In turn, a generator is one particular type of *iterator*.  Other iterators include open file handles, file-like objects like `http.client.HTTPResponse`, objects returned by calls to `itertools` functions, etc.  Concrete lists and views into collections are *iterables* rather than iterators: they can be looped over, but only through the iterator that `iter()` returns for them.

# Generator functions

The simplest generator function possible is:

In [None]:
def simple():
    yield

What does it do?  Not very much.  The main idea of a generator is that we yield values on demand instead of all at once.  We can yield these values from a function using the `yield` keyword instead of the `return` keyword.  You can think of the `yield` keyword as a "pause" button for the function.  It temporarily suspends execution of the function and yields control to the caller.  The caller can demand another value from the generator using the `next()` function.

Note that only one `return` statement can ever be executed within a particular function call (but a function might have multiple potential branches that return).  In contrast, we can have multiple `yield` statements inside the function where each one will be executed on subsequent resumptions of the suspended function.

In [None]:
# A generator function that will yield two values
def f():
    print("I'm going to yield 0")
    yield 0
    print("I'm going to yield 1")
    yield 1

In [None]:
# This *only* constructs the generator. Nothing in function is executed.
x = f()  
print("Calling next(x)")
# First next() statement executes up to and including first yield
print(next(x))  
print(next(x))  # Execute up to and including the next yield

What happens when a generator runs out of values? It raises an exception, of course.

Remember a slogan of Python: "Exceptions are not that exceptional." (this might take some getting used to for programmers coming from, e.g. C).

In [None]:
from traceback import print_exc
try:
    print(next(x))
except Exception:
    print_exc()

# The iterator protocol

There is a protocol for what makes something an *iterator*, and also for what makes it an *iterable* (which are not quite the same thing).  An iterable is simply an object with a `.__iter__()` method, where that method returns an iterator when called.  An iterator is itself an iterable, but one whose `.__iter__()` method generally returns itself.  The extra requirement on an iterator, beyond those on an iterable, is that it also has a `.__next__()` method.

These dunder methods might seem obscure and strange.  But most of their work happens "behind the scenes" and you do not have to think about them (except when you want to).  Basically, these magic methods are a lot like other Python magic methods, and they control how objects respond to basic syntactic constructs.  
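
To make the protocol concrete, here is a minimal sketch of a hand-written iterator. The class `Countdown` is a hypothetical example, not something from the standard library; it implements exactly the two methods described above.

```python
from collections.abc import Iterator


class Countdown:
    """Hypothetical iterator that counts down from `start` to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self                  # an iterator is its own iterable

    def __next__(self):
        if self.current <= 0:
            raise StopIteration      # signals that the values are exhausted
        value = self.current
        self.current -= 1
        return value


countdown = Countdown(3)
values = [n for n in countdown]      # loop syntax calls iter() and next() for us
again = list(countdown)              # already exhausted, so nothing remains
print(values, again)                 # [3, 2, 1] []
print(isinstance(countdown, Iterator))   # True
```

A generator function would express the same thing in three lines; writing the class out longhand is mainly useful for seeing what the generator machinery does on our behalf.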

Let's illustrate the differences among the types of things:

In [None]:
from collections.abc import Callable, Iterable, Iterator
def simple():
    yield True
inst = simple()

isinstance(simple, Callable), isinstance(inst, Iterator)

In [None]:
type(simple), type(inst)

In [None]:
inst.__next__, inst.__iter__

In [None]:
l = [1,2,3]
isinstance(l, Iterable), isinstance(l, Iterator)

In [None]:
type(l), l.__iter__

In [None]:
try:
    l.__next__
except AttributeError as e:
    print("Lists do not have a .__next__() method")

One powerful use for generators is for representing infinite sequences.  Generators allow us to work with long sequences efficiently.  We can avoid having to calculate and store the sequence all at once in memory.  Below, we represent a common alternating series whose sum converges to $\ln(2)$.

In [None]:
def ln2():
    denom = 1
    sign = 1
    while True:
        yield (1.0/denom)*sign
        denom, sign = denom + 1, sign * -1

In [None]:
from itertools import islice # very useful for slicing an iterator
from math import log

In [None]:
# we slice off the first n terms of the sequence.
sum(islice(ln2(), 100000)) 

In [None]:
log(2)

We can call other generators.  Let's create a generator that takes care of just the sign.  It will yield an unending stream of alternating 1, -1, 1, -1, ...

In [None]:
def altsign(pos=True):
    sign = 1 if pos else -1
    while True:
        yield sign
        sign *= -1
        
def ln2():
    signs = altsign()
    for denom, sign in enumerate(signs, 1):
        yield float(sign)/denom

list(islice(ln2(), 1, 10))

In [None]:
from itertools import count
altsign2 = ((n%2 * -2)+1 for n in count())
list(islice(altsign(), 1, 10)), list(islice(altsign2, 1, 10))

Generators can also receive values from the caller via the `send()` method.  This method sends a single object back into a generator, where it becomes the value of the `yield` expression at which the generator is paused.

In [None]:
def mr_postman():
    letter = None
    while True:
        # Yield, waiting for input
        letter = yield letter
        if not str(letter).isalpha():
            if len(str(letter)) > 1:  # str() so that non-string input cannot raise TypeError
                print("Those are not letters")
            else:
                print("That is not a letter")

In [None]:
f = mr_postman()  # Construct the generator object
next(f)           # Must call next to execute generator to first yield.  
                  # Equivalent to f.send(None)
f.send('g')       # Send a value into the generator.  
                  #   If our postman doesn't receive a string of only 
                  #   letter(s), he will complain
                  # Otherwise he will return the letter(s) he got.
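
As a further sketch of `send()`, the hypothetical generator function `running_average` below keeps state between calls: each value sent in updates the total, and the generator answers with the average so far.

```python
def running_average():
    """Hypothetical coroutine-style generator: send numbers, get the mean so far."""
    total = 0.0
    count = 0
    average = None
    while True:
        value = yield average    # pause here; send() supplies `value`
        total += value
        count += 1
        average = total / count


averager = running_average()
next(averager)                   # advance to the first yield before sending
a1 = averager.send(10)
a2 = averager.send(20)
a3 = averager.send(30)
averager.close()                 # finish the generator explicitly
print(a1, a2, a3)                # 10.0 15.0 20.0
```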

# Generators defining iterables

While you *can* explicitly call `next(it)` or `it.send(val)` repeatedly on the iterators returned by generator functions, the more common pattern by a large margin is to use iterators as sequences (perhaps large or infinite) that you loop through.

In Python, the `StopIteration` exception that we saw is a special signal to loops that a sequence of items is exhausted.  This allows concrete collections like lists to behave the same way as lazy generators for most purposes.

In [None]:
# A generator function to return letters of a string multiple times
def iterate_letters(s, times=2):
    for letter in s:
        for _ in range(times):
            yield letter
            
for c in iterate_letters("StopIteration", 3):
    print(c, end='_')

Or for another example, remember our `ln2()` generator function defined above.  It successively approximates `math.log(2)` in an iterative way.  We might wonder how long it takes these approximations to get "pretty close" to the true answer (that is, the nearest IEEE 754 floating point number to the true, irrational, answer).

In [None]:
import math
from itertools import accumulate
delta = .01
log2 = math.log(2)
for i, approx in enumerate(accumulate(ln2())):
    print(i+1, "-", approx)
    if abs(log2-approx) < delta:
        break

## Generator comprehensions

In the Introduction notebooks, we discussed generator comprehensions.  Whether to express a generator as a comprehension or a function is often just a choice of style and readability.  In some sense they are formally equivalent.

In [None]:
# A simple generator function
def to_upper(s):
    for c in s:
        yield c.upper()

In [None]:
for c in to_upper("Hello world!"):
    print(c, end='')

In [None]:
# The same thing as a generator comprehension (but requires name in scope)
s = "Hello world!"
as_upper = (c.upper() for c in s)
for c in as_upper:
    print(c, end='')

In [None]:
# But we are free to wrap this in a function if we want...
def to_upper2(s):
    return (c.upper() for c in s)

In [None]:
for c in to_upper2("Hello world!"):
    print(c, end='')

In [None]:
type(to_upper), type(to_upper2)

In [None]:
type(to_upper(s)), type(to_upper2(s))

# Exercise (factorization)

Write a generator that, given the number $n$, yields the prime factors of that number.

Optional: If you have time, write another generator that returns every factorization of the number.

In [None]:
# For a hint, run this cell
import codecs
print(codecs.encode('''# Hfr fbzr fcrpvny shapgvbaf qrsvarq va nabgure abgrobbx
vzcbeg flf
flf.cngu.nccraq('./fep')
vzcbeg cevzrf''', 'rot13'))

# Exercise (creating iterables)

Invent a clever iterable using the `yield` keyword to define a generator function.  In fact, invent a couple of them.  See if you can combine or utilize them in interesting ways using the tools in the `itertools` module.
