# Generators #

## Contents ##
1. [From iterators to generators](#iterators_to_generators)
2. [Why not lists?](#why_not_lists)
3. [Introducing generators](#intro_gen)
4. [Another example](#one_more_ex)
5. [More examples](#more_examples)
6. [Generator vs. ordinary function](#generator_vs_function)
7. [Further reading](#further_reading)

<a id='iterators_to_generators'></a>
## From iterators to generators ##

The topic of generators is closely tied to that of [Iterators](iterators.ipynb). Please read the [Iterators](iterators.ipynb) notebook first before continuing with this one.

If you recall, an *iterator* is an object implementing `__iter__` and `__next__`. It can be iterated over.

A *generator* is a special iterator that saves its execution context between values. While a hand-written iterator is implemented as a class, a generator is written as a function that contains a `yield` statement.

Generators often let us write simpler code where we used iterator classes before.
The last example in [Iterators](iterators.ipynb) used the class-based approach: we had an iterable class `Sentence` and the corresponding iterator class `SentenceIterator`:

In [3]:
import re
RE_WORD = re.compile(r'\w+')  # raw string avoids an invalid escape sequence warning

class Sentence:
    def __init__(self, text):
        self.text = text
        self.words = RE_WORD.findall(text)

    def __iter__(self):
        return SentenceIterator(self.words)

class SentenceIterator:
    def __init__(self, words):
        self.words = words
        self.index = 0

    def __next__(self):
        try:
            word = self.words[self.index]
        except IndexError:
            raise StopIteration()
        self.index += 1
        return word

    def __iter__(self):
        return self
    

And we can use it like this:

In [4]:
it = iter(Sentence('"Begin at the beginning," the King said, very gravely, "and go on till you come to the end: then stop."'))

for word in it:
    print(word)

Begin
at
the
beginning
the
King
said
very
gravely
and
go
on
till
you
come
to
the
end
then
stop


<a id='why_not_lists'></a>
## Why not lists? ##
Quite fancy. But you could ask: why do we need all this? Why not simply work with lists? Let's look at a version that uses just a list:

In [5]:
class NonIterableSentence:
    def __init__(self, text):
        self.text = text
        self.words = RE_WORD.findall(text)

    def get_words(self):
        return self.words

s = NonIterableSentence('"and go on till you come to the end: then stop."')

for word in s.get_words():
    print(word)

and
go
on
till
you
come
to
the
end
then
stop


<a id='intro_gen'></a>
## Introducing generators ##

One answer: imagine we had a huge text with zillions of words. Then we would not necessarily want to store all of them in a list in memory. With an iterator or a generator (a special iterator), we can instead generate our values lazily (on demand) and save memory. Furthermore, we do not need to wait until all the elements have been generated before we can start using them.
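
The memory argument can be made concrete with a small sketch. The generator function `squares` and the size of 100,000 are hypothetical choices for illustration; `sys.getsizeof` reports only the size of the list or generator object itself, not of the integers it refers to, so this is a rough comparison rather than a precise measurement.

```python
import sys

def squares(n):
    """Hypothetical generator function: yields 0, 1, 4, ... on demand."""
    for i in range(n):
        yield i * i

eager = [i * i for i in range(100_000)]  # every value is computed and stored now
lazy = squares(100_000)                  # nothing has been computed yet

# The list object grows with the number of items; the generator object does not.
print(sys.getsizeof(eager) > sys.getsizeof(lazy))  # True

# Values are only produced when we ask for them.
print(next(lazy), next(lazy), next(lazy))  # 0 1 4
```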

Now, *generators* are iterators that produce the values of the expressions passed to `yield`. Calling `next` on a generator runs the function body up to the next `yield` and returns that value; the generator remembers its state from the last `yield`. When the function body finishes, `StopIteration` is raised.
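
The following minimal sketch traces this behaviour step by step. The function name `first_words` and the `exhausted` flag are hypothetical, chosen only for this illustration.

```python
def first_words():
    """Hypothetical generator function with three yields."""
    print('started')
    yield 'Begin'
    print('resumed after the first yield')
    yield 'at'
    yield 'the'

gen = first_words()  # creates the generator; the body has not run yet
print(next(gen))     # prints 'started', then 'Begin'
print(next(gen))     # prints 'resumed after the first yield', then 'at'
print(next(gen))     # prints 'the'

exhausted = False
try:
    next(gen)        # the body finishes, so StopIteration is raised
except StopIteration:
    exhausted = True
print(exhausted)     # True
```

A `for` loop handles the `StopIteration` for us, which is why we rarely see it in everyday code.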



In [6]:
import re
RE_WORD = re.compile(r'\w+')

class Sentence:
    def __init__(self, text):
        self.text = text
        self.words = RE_WORD.findall(text)

    def __iter__(self):
        for word in self.words:
            yield word
            
s = Sentence('"Begin at the beginning," the King said, very gravely, "and go on till you come to the end: then stop."')
it = iter(s)
for _ in range(5):
    print(next(it))


Begin
at
the
beginning
the


Quite nice, since it is very short. But we cheated: we still build our word list eagerly, not lazily. The function `re.findall` is eager and returns the whole list of words at once. There is a lazy alternative, `re.finditer`, which returns an iterator of match objects. Using it, we no longer need a list:

In [7]:
import re

RE_WORD = re.compile(r'\w+')

class LazySentence:
    def __init__(self, text):
        self.text = text
    
    def __iter__(self):
        for match in RE_WORD.finditer(self.text):
            yield match.group()
            
s = LazySentence('"Begin at the beginning," the King said, very gravely, "and go on till you come to the end: then stop."')
it = iter(s)
for _ in range(5):
    print(next(it))


Begin
at
the
beginning
the


<a id='one_more_ex'></a>
## Another example ##

Let's repeat this with one more example.
When we read data from a file, we often do not need to read everything into memory at once. Let's first do it wrong and read the whole input into a list at once:

In [8]:
import random

data_file = 'random_numbers.txt'

def create_random_data():
    with open(data_file, 'w') as file:
        for _ in range(100):
            file.write(str(random.randrange(1, 100))+'\n')
            
def read_random_data():
    with open(data_file, 'r') as file:
        data = [int(line) for line in file.readlines()]
    print(data)
            
create_random_data()
read_random_data()

[35, 94, 56, 15, 74, 32, 53, 79, 12, 99, 23, 44, 1, 78, 13, 54, 33, 50, 52, 57, 32, 63, 51, 94, 60, 33, 92, 82, 34, 95, 10, 66, 51, 12, 59, 93, 17, 52, 46, 51, 78, 50, 6, 70, 48, 75, 77, 51, 3, 70, 73, 18, 41, 10, 53, 50, 47, 97, 56, 86, 43, 6, 89, 5, 53, 63, 39, 84, 39, 75, 76, 46, 84, 92, 71, 69, 76, 80, 33, 45, 73, 37, 55, 39, 61, 77, 75, 40, 40, 19, 53, 43, 72, 92, 56, 15, 63, 27, 67, 76]


To improve, let us write our own iterator:

In [9]:
import random

class NumberReaderIterator:
    def __init__(self, file):
        self.file = file
        self.index = 0

    def __next__(self):
        line = self.file.readline()
        if not line:  # readline() returns '' at end of file, never None
            raise StopIteration()
        self.index += 1
        return int(line)

    def __iter__(self):
        return self
    
data_file = 'random_numbers.txt'

def create_random_data():
    with open(data_file, 'w') as file:
        for _ in range(100):
            file.write(str(random.randrange(1, 100))+'\n')
            
def read_random_data():
    with open(data_file, 'r') as file:
        it = NumberReaderIterator(file)
        for _ in range(1, 20):
            data = next(it)
            print(data)
            
create_random_data()
read_random_data()

59
99
73
93
10
16
27
39
40
62
49
81
75
82
74
85
28
94
70


As we already know, we can simplify this by writing an iterable whose `__iter__` returns a generator:

In [10]:
import random

class NumberReaderIterable:
    def __init__(self, file_path):
        self.file_path = file_path

    def __iter__(self):
        with open(self.file_path, 'r') as file:
            for line in file:
                yield int(line)
    
data_file = 'random_numbers.txt'

def create_random_data():
    with open(data_file, 'w') as file:
        for _ in range(100):
            file.write(str(random.randrange(1, 100))+'\n')
            
def read_random_data():
    it = iter(NumberReaderIterable(data_file))
    for _ in range(1, 20):
        print(next(it))
            
create_random_data()
read_random_data()

36
35
33
22
3
28
89
66
96
5
44
75
30
50
41
72
72
39
91


<a id='more_examples'></a>
## More examples ##


In [11]:
import os

def get_pages(directory):
    """A generator for all html pages inside a directory"""
    for name in os.listdir(directory):
        if name.endswith('.html'):
            yield name[:-5]
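
A short usage sketch for `get_pages`, repeated here so the block runs on its own. The temporary directory and the file names `index.html`, `about.html` and `notes.txt` are made up for illustration. Since `os.listdir` returns names in arbitrary order, the result is sorted before printing.

```python
import os
import tempfile

def get_pages(directory):
    """A generator for all html pages inside a directory"""
    for name in os.listdir(directory):
        if name.endswith('.html'):
            yield name[:-5]  # strip the '.html' suffix

with tempfile.TemporaryDirectory() as directory:
    # Create three empty files; only two of them are html pages.
    for filename in ('index.html', 'about.html', 'notes.txt'):
        open(os.path.join(directory, filename), 'w').close()
    pages = sorted(get_pages(directory))

print(pages)  # ['about', 'index']
```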

<a id='generator_vs_function'></a>

## Generator vs. ordinary function ##

The difference between a generator function and an ordinary function (and the key point of the generator concept) is that a generator keeps its state between calls to `next`, while an ordinary function forgets its state once it returns.

Let's say we want to build a list of the first 10 factorials. How would we do it? We start with an ordinary recursive `factorial` function:

In [12]:
def factorial(n):
    if n < 2:
        return 1
    else:
        return n * factorial(n-1)
    
print(factorial(10))
print(factorial(5))

3628800
120


Let's first do it wrong again and write a function that builds the whole list before returning it:

In [16]:
def get_first_factorials(upper_limit):
    values = []
    for n in range(upper_limit):
        values.append(factorial(n))
    return values

print(get_first_factorials(10))

[1, 1, 2, 6, 24, 120, 720, 5040, 40320, 362880]

With a generator function, each factorial is instead produced on demand:


In [18]:
def next_factorial_gen(upper_limit):
    for n in range(upper_limit):
        yield factorial(n)

In [19]:
for val in next_factorial_gen(10):
    print(val)

1
1
2
6
24
120
720
5040
40320
362880
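
Because a generator keeps its local variables between calls to `next`, it could also carry the running product along instead of calling `factorial(n)` from scratch for every value. This is a minimal sketch of that idea, not part of the original example; the name `running_factorials` is hypothetical.

```python
def running_factorials(upper_limit):
    """Hypothetical variant: yields 0!, 1!, ..., (upper_limit - 1)!."""
    product = 1  # survives between yields, unlike a local in an ordinary function
    for n in range(upper_limit):
        if n > 0:
            product *= n  # n! = (n - 1)! * n
        yield product

print(list(running_factorials(10)))
# [1, 1, 2, 6, 24, 120, 720, 5040, 40320, 362880]
```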


<a id='further_reading'></a>
## Further reading ##
- To repeat and deepen your understanding, you can read: [Comprehensions](comprehensions.ipynb)
