# Generators
### Copyright Luca de Alfaro, 2020-21.  License: CC-BY-NC. 


Prepared on: Fri Sep 17 15:33:27 2021

This is a book chapter; it is not a homework assignment.  
Do not submit it as a solution to a homework assignment; you would receive no credit.


To understand what a generator is, let's start with a simple example.  
Assume we have a long list `words` of words, perhaps obtained by splitting a long document into the individual words.  The words in `words` can be either in uppercase, or in lowercase, or capitalized: we have no guarantees. 

Our task is this: given an additional word `keyword`, we want to check whether `keyword` appears in the document, regardless of case. 

One brute-force way of doing this is to create a lowercase version of `words`, and check whether `keyword`, also converted to lowercase, belongs to that list: 

In [26]:
words = "I told you I do not need a PHONE I need a TOASTER".split()

def lowercase_list(words):
    """Returns the lowercase version of the list"""
    return [w.lower() for w in words]

def check_occurrence(words, keyword):
    return keyword.lower() in lowercase_list(words)

print(check_occurrence(words, "Phone"))


True


The problem with this approach is that we are building a whole new duplicate list, just for the purpose of checking membership. 

One way of avoiding building the whole list is to intermingle the generation of the list and its use: 

In [27]:
def check_occurrence(words, keyword):
    keyword_lower = keyword.lower()
    for w in words:
        word_lower = w.lower() # Input preparation
        if keyword_lower == word_lower: # Input use
            return True
    return False


The problem with this is that it mixes the `for` loop in `check_occurrence` with the list transformation. 
In this case, the list transformation is simple (it consists in a single call to `.lower()`), but if it were complex, it would be a much cleaner design to transform the list in a separate function. 

Can we have a separate function, similar to `lowercase_list`, that avoids building a full list? 


It turns out, yes.  Consider this function, which iterates over a list of words, and prints the lowercase version of each word:

In [28]:
def print_lowercase(words):
    for w in words:
        print(w.lower())

print_lowercase(words)


i
told
you
i
do
not
need
a
phone
i
need
a
toaster


The function converts each word to lowercase on the fly, and prints it, without ever building the list of all lowercase words.  Can we just, instead of printing the word, return it?  Then we could check membership without building a new list.  Does this work? 

In [29]:
def return_lowercase(words):
    for w in words:
        return w.lower()

def check_occurrence(words, keyword):
    keyword_lower = keyword.lower()
    for w in return_lowercase(words):
        if w == keyword_lower:
            return True
    return False

print(check_occurrence(words, "Phone"))


False


What goes wrong?  The problem is that the `return` statement in `return_lowercase` terminates the function, and with it the `for` loop, on the very first iteration.  So `return_lowercase` returns the first word `"i"` and stops.  The `for` loop in `check_occurrence` then iterates over that string: if a string is used where a list is expected, Python iterates over its characters. 
The only character is `"i"`, which is not equal to `"phone"`, so the check returns `False`. 
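
The character-by-character iteration can be checked in isolation.  The short sketch below is an illustration added here; the names `first_word` and `characters` are ours and do not appear in the example above.

```python
# What return_lowercase(words) hands back: a single string, not a list.
first_word = "I".lower()

# Iterating over a string visits its characters, one at a time.
characters = [c for c in first_word]
print(characters)

# A longer word makes the effect more visible.
print(list("told"))

# No single character equals the keyword, so the membership check fails.
found = any(c == "phone" for c in characters)
print(found)
```

This prints `['i']`, then `['t', 'o', 'l', 'd']`, then `False`.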

Fundamentally, what goes wrong is that `return` does not allow us to return the lowercase words one by one, because it stops at the first word.  
We need a version of `return` that gives the result back, but "keeps going" in the loop.  This version of `return` is called `yield`.  Let's try it. 

In [30]:
# Generator
def yield_lowercase(words):
    for w in words:
        yield w.lower()

def check_occurrence(words, keyword):
    keyword_lower = keyword.lower()
    for w in yield_lowercase(words):
        print("w:", w)
        if w == keyword_lower:
            return True
    return False

print(check_occurrence(words, "Phone"))


w: i
w: told
w: you
w: i
w: do
w: not
w: need
w: a
w: phone
True


This works! 

The function `yield_lowercase` is called a generator, because it generates the elements of a list one at a time, without ever having to store the list. 

The implementation of the `yield` statement is sophisticated: it must pause execution to hand back the value, while keeping track of its place inside the function, so that execution can resume when the next value is needed.  But to understand how to use it, think of `yield` as a version of `print` that, instead of printing, returns the value and then, just like `print`, keeps going. 
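
To see the pausing and resuming in action, one can drive a generator by hand with the built-in `next` function, which is what a `for` loop does behind the scenes.  The sketch below is an illustration added here; the two-word input list is made up for brevity.

```python
def yield_lowercase(words):
    for w in words:
        yield w.lower()

# Calling the function does not run its body; it only creates
# a generator object.
gen = yield_lowercase(["A", "PHONE"])

# Each call to next() resumes the body until the next yield.
first = next(gen)
second = next(gen)
print(first, second)

# When the loop inside the generator ends, next() raises StopIteration;
# a for loop handles this exception for us and simply stops.
try:
    next(gen)
    exhausted = False
except StopIteration:
    exhausted = True
print(exhausted)
```

This prints `a phone` and then `True`.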

## Understanding generators via `print`

One way to understand generators is via the `print` function. 
Imagine you want to prepare some data, for example as before turning words to lowercase.  You could print every word once it is ready: 

In [31]:
def print_lowercase(words):
    for w in words:
        print(w.lower())


Printing the word adds it to the _output stream_, the list of characters and stuff that ends up in front of your eyes when you execute a notebook cell.  In a `print`, you add the word to the output stream and you keep going.  You don't worry about how the process works that will bring the output to your eyes. 

In a similar way, `yield` "hands back" the data, and then keeps going: 

In [32]:
def yield_lowercase(words):
    for w in words:
        yield w.lower()


Well, it doesn't quite keep going -- it waits till somebody needs one more piece to keep going, because laziness is a virtue, but the idea is quite similar. 

Remember, `yield` and `print` are very similar when you picture how the code behaves.  They both do something with the data they have, and keep going. 

If you have trouble understanding the flow of control of a generator, mentally try replacing the `yield` with `print`, and imagine the code calling the generator to be "reading its output", waiting for the next piece of output to be passed to it. 
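
The back-and-forth between a generator and its caller can be made visible by recording the order of events.  This is a minimal sketch added for illustration: the `events` list and the `yield_lowercase_traced` name are hypothetical helpers, not part of the chapter's code.

```python
events = []  # hypothetical log, used only to record the order of events

def yield_lowercase_traced(words):
    for w in words:
        events.append("generator prepares " + w.lower())
        yield w.lower()

# The caller "reads the output" of the generator, one piece at a time.
for w in yield_lowercase_traced(["A", "PHONE"]):
    events.append("caller receives " + w)

for e in events:
    print(e)
```

The log alternates between the two sides: the generator prepares `a`, the caller receives `a`, and only then does the generator prepare `phone`.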

## Generators Are Infinitely Useful

So what can generators do? 

One thing, as we discovered, is to return a variant of a list without ever fully building the variant as a list.  This is important, because often we have big data structures in memory, and we want to be able to operate on them without having to duplicate them when we need something else. 
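
For simple transformations, Python also offers generator expressions, which are not otherwise covered in this chapter: they look like list comprehensions written with parentheses, and they produce their values lazily.  The sketch below is one possible way to rewrite `check_occurrence` with a generator expression and the built-in `any`; it is an alternative added for illustration.

```python
words = "I told you I do not need a PHONE I need a TOASTER".split()

def check_occurrence(words, keyword):
    keyword_lower = keyword.lower()
    # The expression inside any() is a generator: any() pulls lowercase
    # comparisons from it one at a time and stops at the first match.
    return any(w.lower() == keyword_lower for w in words)

print(check_occurrence(words, "Phone"))
print(check_occurrence(words, "Kettle"))
```

This prints `True` and then `False`, and no lowercase copy of `words` is ever built.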

The most interesting use of generators, however, is to create on the fly -- without ever storing them -- structures that can be very large. 

In particular, one of the fun facts about generators is that they provide a _finite_ representation for _infinite_ things.  Here is an iterator on even numbers:

In [33]:
def even_numbers():
    i = 0
    while True:
        yield 2 * i
        i += 1

for n in even_numbers():
    print("This is even:", n)
    if n > 20:
        break


This is even: 0
This is even: 2
This is even: 4
This is even: 6
This is even: 8
This is even: 10
This is even: 12
This is even: 14
This is even: 16
This is even: 18
This is even: 20
This is even: 22


We stopped at 22, but `even_numbers` would have been quite happy to keep going.
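
Instead of breaking out of the loop by hand, one can also take a fixed number of items from an infinite generator.  The sketch below uses `itertools.islice` from the standard library for this; it is an illustration added here, and the name `first_five` is ours.

```python
from itertools import islice

def even_numbers():
    i = 0
    while True:
        yield 2 * i
        i += 1

# islice(iterable, n) stops after n items, so the infinite
# generator is advanced only five times.
first_five = list(islice(even_numbers(), 5))
print(first_five)
```

This prints `[0, 2, 4, 6, 8]`.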

Here's an iterator that produces all numbers that are not divisible by 2, 3, or 5.  Note how `yield` does not need to appear directly in a loop: it can appear inside if-then-else statements and anything else. 

In [34]:
def not_div_235():
    i = 0
    while True:
        if (i % 2) * (i % 3) * (i % 5) > 0:
            yield i
        i += 1

for n in not_div_235():
    print(n)
    if n > 20:
        break


1
7
11
13
17
19
23


## Generators That Know When To Stop

Sometimes, while running a generator, you realize that there's nothing more that needs to be `yield`-ed.  In that case, you can terminate the execution with a `return`.   Let's write a generator that takes a list of words, and iterates over the words until a given stop-word is reached, or until the end of the list, whichever happens first. 

In [35]:
def words_until_stop(stop_word, words):
    for w in words:
        if w == stop_word:
            # We stop the iteration.
            return
        else:
            yield w


Note that the function `words_until_stop` has two ways of terminating.  One is when the stop word is met, and the `return` executed.  The other is when the `for` loop runs to completion. 
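
Both exits can be observed by collecting the generator's output with `list`.  The sketch below is an illustration added here, and the `sample` list is a made-up input.

```python
def words_until_stop(stop_word, words):
    for w in words:
        if w == stop_word:
            # We stop the iteration.
            return
        yield w

sample = ["red", "green", "blue"]  # hypothetical input

# Exit via return: the stop word is found, later words are never produced.
print(list(words_until_stop("green", sample)))

# Exit via loop completion: the stop word never appears.
print(list(words_until_stop("purple", sample)))
```

The first call prints `['red']`, and the second prints `['red', 'green', 'blue']`.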

In [36]:
words = "I like to eat pears far more than apples".split()

for w in words_until_stop("pears", words):
    print(w)


I
like
to
eat


## Iterating Over Permutations

Let us now write an iterator that, given a list, returns all the permutations of elements in the list.  The [itertools module](https://docs.python.org/3.7/library/itertools.html) contains such an iterator, but it is rather instructive to write it ourselves.

Recall that a permutation is a reordering.  So if you have a list 

    [1, 2, 3]

its six permutations are: 

    [1, 2, 3]
    [1, 3, 2]
    [2, 1, 3]
    [2, 3, 1]
    [3, 1, 2]
    [3, 2, 1]

To generate all permutations, we decompose the problem.  For a list $l$, let $P(l)$ be its set of permutations.  For a set of lists $C$, and for a single list $s$, denote with 

$$
s +^* C = \{s + c \mid c \in C \}
$$

the set obtained by concatenating $s$ with every element of $C$. 
For elements $x_1, \ldots, x_n$, we have:

$$
\begin{align*}
P([x_1, \ldots, x_n]) = \; & [x_1] +^* P([x_2, \ldots, x_n]) \\
\cup \; & [x_2] +^* P([x_1, x_3, \ldots, x_n]) \\
\cup \; & \cdots \\
\cup \; & [x_k] +^* P([x_1, \ldots, x_{k-1}, x_{k+1}, \ldots, x_n]) \\
\cup \; & \cdots \\
\cup \; & [x_n] +^* P([x_1, \ldots, x_{n-1}]) \; .
\end{align*}
$$

In other words, given a list $l$, we can iterate on it, choosing in turn an element $x$.  We can then compute the permutations of $l$ by concatenating $x$ with the permutations of the list $l'$ that results from removing $x$ from $l$.
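
As a worked case, added here for illustration, applying the decomposition to $[1, 2, 3]$ gives:

$$
\begin{align*}
P([1, 2, 3]) & = [1] +^* P([2, 3]) \;\cup\; [2] +^* P([1, 3]) \;\cup\; [3] +^* P([1, 2]) \\
& = \{[1, 2, 3], [1, 3, 2]\} \cup \{[2, 1, 3], [2, 3, 1]\} \cup \{[3, 1, 2], [3, 2, 1]\} \; ,
\end{align*}
$$

where, for example, $P([2, 3]) = \{[2, 3], [3, 2]\}$.  The recursion bottoms out at the empty list, whose only permutation is the empty list itself: $P([\,]) = \{[\,]\}$.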

In [37]:
def permute(elements):
    """Yields all the permutations of the list elements, one by one."""
    if len(elements) == 0:
        yield []
    else:
        for i, x in enumerate(elements):
            # We separate elements into x, and into remainder, that
            # consists of all elements minus x.
            remainder = elements[:i] + elements[i+1:]
            for p in permute(remainder):
                yield [x] + p


In [38]:
l = [1, 2, 3]
for ll in permute(l):
    print(ll)


[1, 2, 3]
[1, 3, 2]
[2, 1, 3]
[2, 3, 1]
[3, 1, 2]
[3, 2, 1]
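
One way to gain confidence in `permute` is to compare its output with that of `itertools.permutations`.  The sketch below is a check added for illustration; the names `ours` and `theirs` are hypothetical.

```python
from itertools import permutations

def permute(elements):
    if len(elements) == 0:
        yield []
    else:
        for i, x in enumerate(elements):
            remainder = elements[:i] + elements[i+1:]
            for p in permute(remainder):
                yield [x] + p

l = [1, 2, 3]
ours = list(permute(l))
# itertools.permutations yields tuples, so we convert them to lists.
theirs = [list(p) for p in permutations(l)]
print(ours == theirs)
```

This prints `True`: both produce the same six permutations, in the same order.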


## Iterating Over Subsets

Given a set, we can write an iterator that returns all subsets of the set.  

As usual, the trick is to decompose the problem.  

Given a set $S$, let $x \in S$ and $S' = S \setminus \{x\}$.  Denoting with $P(S)$ (the _powerset_ of $S$) the set of subsets of $S$, we have: 

$$
P(S) = P(S') \cup \{T \cup \{x\} \mid T \in P(S') \} \; .
$$

In words, if you select an element $x$ of $S$, and let $S' = S \setminus \{x\}$, then the subsets of $S$ are the union of:

* the subsets of $S'$, 
* the subsets of $S'$, with $x$ added to them. 

This enables us to reduce the problem of computing the subsets of $S$, to that of computing the subsets of the smaller set $S'$. 
The code is as follows. 

In [39]:
def subsets(s):
    """Given a set s, yield all the subsets of s,
    including s itself and the empty set."""
    if len(s) == 0:
        yield set()
    else:
        ss = set(s)
        x = ss.pop()
        for t in subsets(ss):
            yield {x} | t
            yield t


In [40]:
s = set([1, 2, 3])
for t in subsets(s):
    print(t)


{1, 2, 3}
{2, 3}
{1, 3}
{3}
{1, 2}
{2}
{1}
set()
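
The order in which the subsets appear depends on which element `pop` happens to remove, so it is safer to check the result as a collection rather than line by line.  The sketch below is a check added for illustration: it verifies that a set of $n$ elements has $2^n$ distinct subsets.  The name `found` is ours.

```python
def subsets(s):
    if len(s) == 0:
        yield set()
    else:
        ss = set(s)
        x = ss.pop()
        for t in subsets(ss):
            yield {x} | t
            yield t

s = {1, 2, 3}
# Sets are not hashable, so we freeze each subset before collecting them.
found = {frozenset(t) for t in subsets(s)}
print(len(found) == 2 ** len(s))
print(frozenset({1, 3}) in found)
```

Both lines print `True`.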
