<img src='img/logo.png' />

<img src='img/title.png'>

<img src='img/py3k.png'>

# Table of Contents
* [Learning Objectives:](#Learning-Objectives:)
* [Iterables, iterators, and generators](#Iterables,-iterators,-and-generators)
	* [Iterators and iterables](#Iterators-and-iterables)
	* [Generators](#Generators)
		* [Generator expressions](#Generator-expressions)
			* [In function calls](#In-fuction-calls)
		* [Generator functions](#Generator-functions)
		* [Infinite generators](#Infinite-generators)
		* [Multiple instances](#Multiple-instances)
		* [Recursions](#Recursions)
	* [Exercise](#Exercise)


# Learning Objectives:

After completion of this module, learners should be able to:

* construct & use iterators for sequential tasks

# Iterables, iterators, and generators

We have seen many examples of loops over general data collections. We have also seen examples of more general objects (e.g., `range`) that can also be looped over. These are all specific examples of *iterables* in Python.

Some reading to extend the discussion here:

* [Maximize your program's laziness (slides)](http://gnosis.cx/publish/Laziness.pdf)
* [Iterables vs. Iterators vs. Generators](http://nvie.com/posts/iterators-vs-generators/) (A little pocket reference on iterables, iterators and generators.)
* [Python and lazy evaluation](http://swizec.com/blog/python-and-lazy-evaluation/swizec/5148)
* [Improve Your Python: `yield` and Generators Explained](https://www.jeffknupp.com/blog/2013/04/07/improve-your-python-yield-and-generators-explained/)

In [None]:
# iterate a list

In [None]:
# iterate a string

In [None]:
# iterate a dictionary

In [None]:
# iterate a range object
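
One possible way to fill in the cells above is sketched here. The sample data (`colors`, `word`, and `ages`) is made up for illustration; any list, string, or dictionary would behave the same way.

```python
# Sample data invented for this illustration only.
colors = ['red', 'green', 'blue']
word = 'spam'
ages = {'Ann': 31, 'Bob': 27}

# Looping over a list visits its elements in order.
for color in colors:
    print(color)

# Looping over a string visits its characters.
for char in word:
    print(char)

# Looping over a dictionary visits its keys.
for name in ages:
    print(name, ages[name])

# Looping over a range object visits the integers it describes.
for n in range(3):
    print(n)
```

In every case the `for` statement is the same; only the object being looped over changes.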

## Iterators and iterables

In words, a Python *iterator* is

* usually an object with *state* that remembers where it paused during iteration
* any object with a `__next__` method (or `next` before Python 3) that:
    * returns the next value in the iteration
    * updates the state to point at the next value
    * signals when it is done by raising `StopIteration`
* any object that is *self-iterable* (i.e., it has an `__iter__` method that returns `self`).
* any object that can be passed to the builtin function `next`, which calls the object's `__next__` method.
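
The properties listed above can be sketched with a small hand-written class. `Countdown` is a hypothetical name chosen for this illustration, not something provided by Python.

```python
class Countdown:
    """Hypothetical iterator that counts down from `start` to 1."""

    def __init__(self, start):
        self.current = start  # state: the next value to hand out

    def __iter__(self):
        # An iterator is self-iterable: iter(it) returns it unchanged.
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration  # signal that the iteration is finished
        value = self.current
        self.current -= 1  # advance the state to the next value
        return value


countdown = Countdown(3)
print(next(countdown))               # 3
print(list(countdown))               # [2, 1], the values that remain
print(iter(countdown) is countdown)  # True
```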

A Python *iterable* is

* any object that can be looped over (e.g., a string, a tuple, a list, a dictionary, a file, etc.)
* any object that can appear after `in` in a `for` loop (i.e., `for x in object:`)
* any object that can be used within a call to `iter` (i.e., `iter(object)` returns an *iterator*)
* any object that has an `__iter__` method that returns an iterator *or* has a `__getitem__` method that accepts integer indices starting from 0 and raises `IndexError` when the indices run out.
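
The less familiar `__getitem__` route can be sketched as follows. `Squares` is a hypothetical class written for this illustration; it defines no `__iter__` method at all, yet it can still be looped over.

```python
class Squares:
    """Hypothetical iterable that only supports indexed lookup."""

    def __init__(self, count):
        self.count = count

    def __getitem__(self, index):
        # iter() falls back to calling this with 0, 1, 2, ... until IndexError.
        if not 0 <= index < self.count:
            raise IndexError(index)
        return index * index


squares = Squares(4)
for value in squares:
    print(value)

squares_iterator = iter(squares)
print(hasattr(squares, '__next__'))           # False: an iterable, not an iterator
print(hasattr(squares_iterator, '__next__'))  # True: iter() produced an iterator
```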

In [None]:
x = ['a','b','c']
print('x is', type(x))

What happens when we apply `iter` to the list?

* Most containers (e.g., `list`, `dict`, `set`, etc.) are iterables.
* A `range` object is iterable but not an iterator.
* An iterator is always an iterable. The opposite is not true.
* Iterables return iterators when `iter` is applied to them. This is what happens when a `for` loop uses an iterable.
* Iterators are consumed as they are used. That is, calling `next` produces results in sequence that cannot be retrieved without instantiating a new iterator.
* The [Python iterator protocol](https://docs.python.org/3/c-api/iter.html) specifies the behavior of an object's `__iter__` and `__next__` methods.
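
A short sketch of these points, reusing the list `x` from the cell above (the other variable names are chosen only for this illustration):

```python
x = ['a', 'b', 'c']
x_iter = iter(x)  # the same call a for loop makes behind the scenes

print(type(x_iter))   # <class 'list_iterator'>
first = next(x_iter)  # 'a'
rest = list(x_iter)   # ['b', 'c']; this consumes what is left

# The iterator is now exhausted, so another next() raises StopIteration.
try:
    next(x_iter)
    exhausted = False
except StopIteration:
    exhausted = True
print(first, rest, exhausted)

# The list itself is untouched; a new iterator starts from the beginning.
print(next(iter(x)))  # a

# A range object is iterable, but it is not itself an iterator.
print(hasattr(range(3), '__next__'))  # False
```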

The principal advantage of distinguishing iterators from iterables is the ability to use lazy evaluation to defer generating the terms of a sequence. For instance, it is possible to loop over the iterable `list(range(10000000))`, which explicitly builds a list of ten million elements before the loop starts. But it is also possible to loop over the iterable `range(10000000)` itself, which produces the integers in sequence without storing ten million elements in memory. More abstract iterables (notably files and data streams) can be arbitrarily large, so understanding lazy evaluation is extremely useful.
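
The difference in storage can be made visible with `sys.getsizeof`. This is only a rough sketch: the sizes reported are specific to CPython, `getsizeof` does not count the integer objects a list refers to, and `N` is deliberately much smaller than ten million to keep the cell quick.

```python
import sys

N = 100_000  # smaller than ten million, to keep the example fast

as_list = list(range(N))  # every element is stored up front
as_range = range(N)       # only start, stop and step are stored

list_bytes = sys.getsizeof(as_list)    # grows with N
range_bytes = sys.getsizeof(as_range)  # small, whatever N is

print(list_bytes, range_bytes)
print(sum(as_list) == sum(as_range))   # True: both produce the same integers
```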

## Generators

The term *generator* is widely but imprecisely used in Python so there is a lot of confusion around this topic (even more so than the confusion around iterators and iterables). There are *generator objects* (that are iterators), *generator functions* (that return generator objects), and *generator expressions* (that evaluate as generator objects and resemble comprehensions). According to the [Python glossary](http://docs.python.org/glossary.html#term-generator), the official terminology is now that *generator* means *"generator function"*. Unfortunately, generator objects still belong to the generator class, so this terminology is still not used consistently.

We will try to be consistent:

* A *generator object* is a special kind of iterator produced either by a *generator expression* or a *generator function*. We will simply call a generator object an iterator (because that is what it is).
* A *generator expression* is a comprehension (usually delimited by parentheses) that produces an iterator.
* A *generator function* is a function that returns an iterator and uses the `yield` keyword (rather than `return`) to pass values back to the calling namespace.
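
The three terms can be told apart in a few lines. The function `letters` is a hypothetical example written for this sketch; `inspect.isgeneratorfunction` is part of the standard library.

```python
import inspect


def letters():
    # Hypothetical generator function used only for this illustration.
    yield 'a'
    yield 'b'


from_function = letters()              # generator object from a generator function
from_expression = (ch for ch in 'ab')  # generator object from a generator expression

print(inspect.isgeneratorfunction(letters))  # True
print(type(from_function).__name__)          # generator
print(type(from_expression).__name__)        # generator
print(iter(from_function) is from_function)  # True: a generator object is an iterator
```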

### Generator expressions

The easiest way to construct a generic iterator is to apply the function `iter` to a collection, e.g.,
```python
>>> my_string = 'This is a string'
>>> my_iter = iter(my_string)
```

An alternative is to use a *generator expression*, which is basically a comprehension delimited by parentheses.

In [None]:
parrot = """'E's a stiff! Bereft of life, 'e rests in peace! 
If you hadn't nailed 'im to the perch 'e'd be pushing up the daisies!
'Is metabolic processes are now 'istory! 'E's off the twig!
'E's kicked the bucket, 'e's shuffled off 'is mortal coil,
run down the curtain and joined the bleedin' choir invisible!!

THIS IS AN EX-PARROT!!""".split()

Construct a generator expression that transforms each word to upper case.
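
One way this might look is sketched below. To keep the example self-contained it uses a short stand-in list called `words` (an assumption made for this sketch) rather than the full `parrot` list.

```python
# A short stand-in for the parrot word list.
words = "'E's off the twig!".split()

# Parentheses instead of square brackets: nothing is computed yet.
shouted = (word.upper() for word in words)
print(shouted)  # a generator object, not a list

first = next(shouted)      # compute only the first word
remaining = list(shouted)  # compute the rest on demand
print(first, remaining)
```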

#### In function calls

In [None]:
# Generator expressions look like comprehensions wrapped in parentheses.
# The parentheses are not always needed, e.g., when the expression is the only argument of a function call
sum(n**2 for n in range(10))

The generator expression above produces a sequence of perfect squares and passes it directly to `sum`. The important difference between a list comprehension and a generator expression is that the former explicitly builds the entire list, while the latter uses *lazy evaluation* to produce elements only as they are required. For large values of `N`, the generator expression needs far less memory, which is what the cells below measure when adding up the terms of the sequence.

In [None]:
N = int(3e7)

In [None]:
%load_ext memory_profiler

In [None]:
%%memit 
listcomp = [k*k for k in range(N)]
sum(listcomp)

In [None]:
# delete the list to make a fair comparison
del listcomp

In [None]:
%%memit
genexpr = (k*k for k in range(N))
sum(genexpr)
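
If `memory_profiler` is not installed, the standard library module `tracemalloc` gives a rough alternative. The helper `peak_bytes` and the two small functions below are hypothetical names written for this sketch, and `N` is kept small so the cell runs quickly; the exact byte counts will vary between Python versions.

```python
import tracemalloc


def peak_bytes(func):
    """Return the peak number of bytes allocated while func() runs."""
    tracemalloc.start()
    func()
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak


N = 100_000


def sum_with_list():
    # Builds the whole list in memory before summing.
    return sum([k * k for k in range(N)])


def sum_with_genexpr():
    # Feeds the squares to sum() one at a time.
    return sum(k * k for k in range(N))


list_peak = peak_bytes(sum_with_list)
genexpr_peak = peak_bytes(sum_with_genexpr)
print(list_peak, genexpr_peak)
```

Both functions return the same total; only the peak memory needed to compute it differs.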

### Generator functions

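A *generator function* is a function that produces an iterator. The principal difference between a generator function and a standard function is the use of the keyword `yield` rather than `return` to hand values back: `yield` suspends the function and remembers its state, so execution resumes from that point the next time a value is requested.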

In [None]:
def first_generator_function():
    yield 'A'
    yield 'B'
    yield 'C'

In [None]:
# loop over the output of the generator function

In [None]:
# create an iterator object
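
A possible way to complete the two cells above is sketched here. The function is repeated so that the block runs on its own, and `letters_iter` is a name chosen for this illustration.

```python
def first_generator_function():
    yield 'A'
    yield 'B'
    yield 'C'


# Loop over the output of the generator function.
for letter in first_generator_function():
    print(letter)

# Create an iterator object and step through it by hand.
letters_iter = first_generator_function()
print(next(letters_iter))  # A
print(next(letters_iter))  # B
print(next(letters_iter))  # C
# One more call to next(letters_iter) would raise StopIteration.
```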

This next example defines a generator function `fib_generator` that returns an iterator over an infinite sequence (namely, the sequence of Fibonacci numbers).

In [None]:
def fib_generator():
    prev, curr = 0, 1
    while True:
        yield curr
        prev, curr = curr, prev + curr

In [None]:
# fibs is an iterator created from the generator function fib_generator
fibs = fib_generator()
for _ in range(5):
    print(next(fibs))

### Infinite generators

We can in principle use the iterator `fibs` (instantiated by invoking `fib_generator`) as the iterator in a `for` loop.
```python
fibs = fib_generator()  # fibs is an iterator created from the generator function fib_generator
for result in fibs:
    print(result)
```

*This is an infinite loop!* Do *not* use `fibs` as the iterable in a `for` loop *without specifying a `break` condition*.

In [None]:
# enumerate and break the sequence
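
Two ways of cutting the infinite sequence short are sketched below: an explicit `break`, and `itertools.islice` from the standard library. The generator function is repeated so that the block is self-contained, and the cut-off values (100 and 10) are arbitrary choices for this example.

```python
from itertools import islice


def fib_generator():
    prev, curr = 0, 1
    while True:
        yield curr
        prev, curr = curr, prev + curr


# Option 1: enumerate the terms and break once a condition is met.
below_100 = []
for index, value in enumerate(fib_generator()):
    if value >= 100:
        break
    below_100.append((index, value))
print(below_100)

# Option 2: let islice stop after a fixed number of terms.
first_ten = list(islice(fib_generator(), 10))
print(first_ten)
```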

### Multiple instances

Here is another generator function for an infinite sequence: the primes, generated with a naive, lazy take on the [sieve of Eratosthenes](https://en.wikipedia.org/wiki/Sieve_of_Eratosthenes).

In [None]:
def sieve_generator():
    "Simple and naive lazy Sieve of Eratosthenes"
    candidate = 2
    found = []
    while True:
        if all(candidate % prime != 0 for prime in found):
            yield candidate
            found.append(candidate)
        candidate += 1

In [None]:
# enumerate and break

What happens if we make multiple instances?

*Each iterator instance starts from the beginning.*
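
A small sketch of this behavior, with the generator function repeated so that the block runs on its own (`primes_a` and `primes_b` are names chosen for this illustration):

```python
def sieve_generator():
    "Simple and naive lazy Sieve of Eratosthenes"
    candidate = 2
    found = []
    while True:
        if all(candidate % prime != 0 for prime in found):
            yield candidate
            found.append(candidate)
        candidate += 1


primes_a = sieve_generator()
primes_b = sieve_generator()

# Advance the first iterator by five primes.
first_a = [next(primes_a) for _ in range(5)]
# The second iterator has its own state and still starts at 2.
first_b = [next(primes_b) for _ in range(3)]
# The first iterator carries on from where it paused.
next_a = next(primes_a)

print(first_a, first_b, next_a)
```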

### Recursions

A more useful generator function generates all permutations of a finite sequence such as a string. Notice that this generator function is recursive: it yields from calls to itself on shorter sequences. Combinatorial functions like this are already implemented in the `itertools` module.

In [None]:
def permutations(items):
    if not items:
        yield []
    else:
        for index in range(len(items)):
            for item in permutations(items[:index]+items[index+1:]):
                yield [items[index]] + item

In [None]:
for p in permutations('ABC'):
    print(''.join(p))
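
As a quick check, the recursive generator can be compared with `itertools.permutations`. The alias `std_permutations` is introduced here only to avoid a name clash with the function defined above, which is repeated so that the block is self-contained.

```python
from itertools import permutations as std_permutations


def permutations(items):
    if not items:
        yield []
    else:
        for index in range(len(items)):
            for item in permutations(items[:index] + items[index + 1:]):
                yield [items[index]] + item


ours = [''.join(p) for p in permutations('ABC')]
theirs = [''.join(p) for p in std_permutations('ABC')]

print(ours)
print(sorted(ours) == sorted(theirs))  # True: the same six permutations
```

One difference worth noting is that `itertools.permutations` yields tuples, whereas the function above yields lists; `''.join` accepts both.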

## Exercise

<img src='img/topics/Exercise.png' align='left' style='padding:10px'>
<br>
Write the `range` function as a generator.

The range function takes three input arguments, `start`, `stop` and `step`.

Use `return` to properly exit from the function when `stop` is reached.


In [None]:
def my_range(start, stop, step=1):
    # your solution here
    pass

---
<a href='adv_generators_soln.ipynb' class='btn btn-primary'>Solution</a>

<img src='img/copyright.png'>