<table border="0" align="left" width="700" height="144">
<tbody>
<tr>
<td width="120"><img width="100" src="https://static1.squarespace.com/static/5992c2c7a803bb8283297efe/t/59c803110abd04d34ca9a1f0/1530629279239/" /></td>
<td style="width: 600px; height: 67px;">
<h1 style="text-align: left;">A Brief Tour of Generators</h1>
<p><a href="https://colab.research.google.com/github/KenzieAcademy/python-notebooks/blob/master/demo_generators.ipynb"> <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab" align="left" width="188" height="32" /> </a></p>
</td>
</tr>
</tbody>
</table>

Generators are functions with a twist. They are very easy to implement, but a bit difficult to understand.

Generators provide a concise way to create _iterators_. A generator function returns an iterator that produces its items one at a time, on demand, instead of building the whole collection up front.


When a `for` loop starts iterating over a generator, the generator's function body starts running. When it reaches a `yield` statement, it pauses and hands the yielded value back to the `for` loop. On the next iteration, execution resumes right after that `yield`. A generator can produce as many values as it wants (possibly infinitely many), yielding each one in turn.

### Iteration Review
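Before we jump into generators, it may be helpful to review iteration in Python. There is a distinction between an _iterator_ and an object that is _iterable_.

An _iterable_ is any object you can loop over with `for`, such as a list, string, or dict. It typically implements an `__iter__()` method that returns a fresh iterator each time it is called.

What exactly is an _iterator_?

 - An iterator is the object that actually performs the _iteration_.
 - Implements an `__iter__()` method that returns the iterator itself
 - Implements a `__next__()` method that returns the next item, or raises `StopIteration` when no items remain

Iterators are **stateful**, which means they remember their position and can iterate only *once* over the items. After one full traversal, the iterator is _exhausted_.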
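To make the two protocols concrete, here is a minimal hand-written iterator. The `Countdown` class is a hypothetical example, not part of the original notebook. A list is *iterable*: it can be traversed again and again. A `Countdown` object is an *iterator*: its `__iter__()` returns itself, and once `__next__()` raises `StopIteration` it stays exhausted.

```python
class Countdown:
    """A hypothetical hand-written iterator that counts down from n to 1."""

    def __init__(self, n):
        self.current = n  # state that survives between __next__() calls

    def __iter__(self):
        # An iterator returns itself from __iter__()
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration  # signals that the iterator is exhausted
        value = self.current
        self.current -= 1
        return value


numbers = [3, 2, 1]                  # an iterable, not an iterator
print(list(numbers), list(numbers))  # [3, 2, 1] [3, 2, 1]

cd_iter = Countdown(3)
print(list(cd_iter))                 # [3, 2, 1]  (first traversal uses up the state)
print(list(cd_iter))                 # []         (exhausted: nothing left)
```

Writing an iterator by hand means managing `self.current` and raising `StopIteration` yourself, which is exactly the bookkeeping generators automate.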

### Generators
Generators arrived way back in Python 2.2 (enabled by default from Python 2.3), from [PEP-255](https://www.python.org/dev/peps/pep-0255/). This PEP introduced the idea of creating an iterator within a single function by using a new keyword: `yield`. "But wait!", you may ask, "if iterators must be stateful (so they can know when they are exhausted), how can a plain function do this?"

Well, a **generator** lets an ordinary-looking function keep iterator state *and* produce the members of a sequence one at a time, only as they are requested. This is known as **lazy evaluation**. A function becomes a *generator function* simply by containing the `yield` keyword. Calling a generator function does not run its body; it returns a *generator object*, which is the iterator that does the work.
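As a minimal sketch (the `countdown` function is a hypothetical example), here is a countdown written as a generator function. There is no class and no explicit `__iter__()` or `__next__()`: Python supplies both, and the local variable `n` is kept alive between `yield`s.

```python
def countdown(n):
    """Yield n, n-1, ..., 1, one value per request."""
    while n > 0:
        yield n   # pause here; resume on the next request
        n -= 1    # local state survives the pause

cd = countdown(3)
print(next(cd))   # 3
print(next(cd))   # 2
print(list(cd))   # [1]  (only the remaining value)
print(list(cd))   # []   (the generator is now exhausted)
```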

From the [Python Docs: Glossary](https://docs.python.org/3/glossary.html):

> **Generator**: A function which returns a generator iterator. It looks like a normal function except that it contains `yield` expressions for producing a series of values usable in a for-loop or that can be retrieved one at a time with the `next()` function.<br>
Each `yield` temporarily suspends processing, remembering the location execution state (including local variables and pending try-statements). When the generator iterator resumes, it picks up where it left off (in contrast to functions which start fresh on every invocation).

When Python sees a function with a `yield` keyword inside, it treats it differently: calling the function returns a generator object instead of running the body.

In [None]:
import random

students = ['Kevin', 'Shanquel', 'Marcel', 'Gabby', 'Vincent', 'Wes', 'Sondos', 'Jalal']
def student_picker():
    """Yield student names in a random order, one at a time, until all are used."""
    random.shuffle(students)  # shuffles the global list in place
    print('Shuffled students: ' + str(students))
    for student in students:
        print("Yielding student " + student)
        yield student  # pause here until the next name is requested
        

In [None]:
# Invoke (call) the generator function to get our generator object
gen = student_picker()
print(gen)

# Note that we have only initialized the generator.
# We have not yet generated any student names...

In [None]:
# Remember that an iterator must implement two special methods:
# __iter__() and __next__(). Does this generator object implement them?
print("Does gen have a '__next__' method? {}".format(hasattr(gen, '__next__')))
print("Does gen have a '__iter__' method? {}".format(hasattr(gen, '__iter__')))

In [None]:
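# There is a more idiomatic way to check if an object is an 'iterator'.
# The abstract base classes live in collections.abc
# (the old collections.Iterator alias was removed in Python 3.10).
import collections.abc
isinstance(gen, collections.abc.Iterator)

In [None]:
# And a similar check for whether an object is an 'iterable'
isinstance(gen, collections.abc.Iterable)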

### IMPORTANT
A generator function's body is not executed when the function is invoked, only when the resulting generator is _iterated_ over. This is an important difference between generators and regular functions. Python knows the function is a generator function, so during invocation it returns a generator object without executing any of the body.

After the function produces the generator object, you must iterate over that object according to the Python iteration protocol, either by calling `next()` on it or by looping over it.
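A quick way to confirm that calling a generator function runs none of its body is to have the body record when it starts. In this sketch, `events`, `traced`, and `traced_gen` are hypothetical names used only for illustration.

```python
events = []

def traced():
    events.append("body started")  # runs only once iteration begins
    yield "first value"
    events.append("resumed")
    yield "second value"

traced_gen = traced()
print(events)           # []  (calling traced() executed nothing)
print(next(traced_gen)) # first value
print(events)           # ['body started']
print(next(traced_gen)) # second value
print(events)           # ['body started', 'resumed']
```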

In [None]:
# How to use it? Just keep callin' next(gen)!
next(gen)

In [None]:
next(gen)

NOTICE that generators _freeze their state_ at each `yield` statement. Execution is suspended there, with all local variables intact, until the next `next()` call resumes it.
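To see that local variables survive each pause, here is a sketch of a running-total generator (`running_total` is a hypothetical name). Between `next()` calls, `total` keeps its value; nothing is recomputed from scratch.

```python
def running_total(values):
    """Yield the cumulative sum after each value."""
    total = 0            # local state, frozen at each yield
    for v in values:
        total += v
        yield total

totals = running_total([5, 10, 20])
print(next(totals))  # 5
print(next(totals))  # 15  (total remembered 5 from the previous call)
print(next(totals))  # 35
```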

In [None]:
# Let's exhaust the rest of them!
# The for-loop simply calls __next__() until the StopIteration exception is raised and then it terminates.
for s in gen:
    print(s)

A `for` loop works here because it follows the Python iteration protocol: it calls `iter()` on the object, then keeps calling the iterator's `__next__()` method until a `StopIteration` exception is raised, which it catches and uses as the signal to stop.
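Roughly speaking, a `for` loop over any iterable behaves like the `while` loop below. This is a simplified sketch: the real loop is implemented inside the interpreter, and `simple_for` and `letters` are hypothetical helpers.

```python
def simple_for(iterable, body):
    """Approximate what `for item in iterable: body(item)` does."""
    iterator = iter(iterable)      # step 1: ask the iterable for an iterator
    while True:
        try:
            item = next(iterator)  # step 2: request the next item
        except StopIteration:
            break                  # step 3: exhaustion ends the loop quietly
        body(item)

def letters():
    yield "a"
    yield "b"
    yield "c"

simple_for(letters(), print)  # prints a, b, c on separate lines
```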

Let's try it again, but without a for-loop this time, so we can see the `StopIteration` exception:

In [None]:
def simple_gen():
    seq = [1, 2, 3, 4]
    for i in seq:
        yield i

it = simple_gen()
# Looky, no for-loop!
print(next(it))
print(next(it))
print(next(it))
print(next(it))
# Wait for it ....
print(next(it))


### Generator return statement
When a generator function executes a `return` statement (or simply runs off the end of its body), the generator finishes, and the pending `next()` call raises a `StopIteration` exception.

In [None]:
def simple_gen():
    yield 'Marcel'
    yield 'Kevin'
    yield 'Sondos'
    return  # ends the generator; the next next() call raises StopIteration

it = simple_gen()
print(next(it))
print(next(it))
print(next(it))
print(next(it))
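Since Python 3.3 (PEP 380), a generator can also `return` a value. The value is not produced by a `for` loop; instead it is attached to the `StopIteration` exception as its `value` attribute. A minimal sketch, where `names_then_count` and `names_it` are hypothetical names:

```python
def names_then_count():
    yield "Marcel"
    yield "Kevin"
    return 2  # ends the generator; 2 becomes StopIteration.value

names_it = names_then_count()
print(next(names_it))  # Marcel
print(next(names_it))  # Kevin
try:
    next(names_it)
except StopIteration as exc:
    print("Generator finished, returned:", exc.value)  # Generator finished, returned: 2
```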

### What happens if we use `iter()` on a generator?

In [None]:
def simple_gen():
    yield 'PK'
    yield 'Doug'
    yield 'James'
    return
gen = simple_gen()

if iter(gen) == gen.__iter__() == gen:
    print("Same generator object instance!")
    
if gen is iter(gen):
    print("Generator is its own iterator!")

if id(gen) == id(iter(gen)):
    print("Stop me when this gets old")
    
# A generator is its own iterator!
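By contrast, a list is *iterable* but is not its own iterator: each `iter()` call returns a new, independent iterator. That is why a list can be traversed many times while a generator can be traversed only once. A short sketch (`squares` is a hypothetical helper):

```python
def squares():
    for n in range(1, 4):
        yield n * n

squares_list = [1, 4, 9]
squares_gen = squares()

print(iter(squares_list) is squares_list)  # False: a fresh iterator each time
print(iter(squares_gen) is squares_gen)    # True: the generator is its own iterator

print(sum(squares_list), sum(squares_list))  # 14 14
print(sum(squares_gen), sum(squares_gen))    # 14 0  (second pass finds it exhausted)
```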

### When should I use a generator?
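The general rule of thumb is that a generator can replace a function that builds and returns a list, provided the caller only needs to go through the results once (a generator can't be indexed, measured with `len()`, or traversed a second time). Look for the pattern of a function that appends items to a list inside a loop and then returns that list.

To use a generator instead, insert a `yield` statement at the point of accumulation, and remove the list and the final `return`.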

In [None]:
# A familiar function to all ...
def div_by_5_and_7(max_num):
    """Returns a list of numbers that are divisible by 5 AND 7"""
    result = []
    for n in range(1, max_num + 1):
        if n % 5 == 0 and n % 7 == 0:
            result.append(n)
    return result

result = div_by_5_and_7(5000000)
print(len(result), result[:5])  # the whole list lives in memory at once

In [None]:
# Presto Chango
def div_by_5_and_7(max_num):
    """Yields numbers divisible by 5 AND 7, one at a time"""
    for n in range(1, max_num + 1):
        if n % 5 == 0 and n % 7 == 0:
            yield n
            
gen = div_by_5_and_7(500)
next(gen)

In [None]:
next(gen)
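One way to see the memory difference is `sys.getsizeof()`, which reports the size of the container object itself (not the integers it refers to). The sketch below restates both versions under the hypothetical names `div_list` and `div_gen` so it runs on its own; exact byte counts vary by Python version and platform, so none are shown.

```python
import sys

def div_list(max_num):
    """List version: builds every result up front."""
    result = []
    for n in range(1, max_num + 1):
        if n % 5 == 0 and n % 7 == 0:
            result.append(n)
    return result

def div_gen(max_num):
    """Generator version: produces one result per request."""
    for n in range(1, max_num + 1):
        if n % 5 == 0 and n % 7 == 0:
            yield n

big_list = div_list(1_000_000)
big_gen = div_gen(1_000_000)

# getsizeof() measures only the container object, not the ints it refers to
print(sys.getsizeof(big_list))  # grows with the number of results
print(sys.getsizeof(big_gen))   # small, and does not grow with max_num

# sum() consumes a generator one value at a time, never building a list
print(sum(div_gen(500)) == sum(div_list(500)))  # True
```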

## Real-world example: Database Chunking
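This function acts as a wrapper around `dbcursor.fetchmany()`. A business may use very large datasets for analytics or reporting. If the dataset is larger than the available memory, it's not possible to fetch the entire set in a single database read. However, fetching the data one row at a time can impose a large network cost, since each fetch may require a round trip to the server. Fetching rows in chunks of `count` balances the two, and because the wrapper is a generator, the caller can still loop over the results one row at a time.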

In [None]:
def fetch_many_wrapper(dbcursor, count=20000):
    """Fetch data in chunks instead of one row at a time, yielding rows individually."""
    while True:
        items = dbcursor.fetchmany(count)
        if not items:  # an empty chunk means the result set is exhausted
            return
        for item in items:
            yield item
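To try the chunking wrapper without a database server, Python's built-in `sqlite3` module provides a DB-API cursor with `fetchmany()`. In this sketch the `readings` table and its contents are made up for illustration, and the wrapper is restated so the cell runs on its own.

```python
import sqlite3

def fetch_many_wrapper(dbcursor, count=20000):
    """Yield rows one at a time while fetching them in chunks of `count`."""
    while True:
        rows = dbcursor.fetchmany(count)
        if not rows:  # an empty chunk means the result set is exhausted
            return
        for row in rows:
            yield row

# Hypothetical sample data in an in-memory database
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (id INTEGER PRIMARY KEY, value REAL)")
conn.executemany("INSERT INTO readings (value) VALUES (?)",
                 [(i * 0.5,) for i in range(10)])

cursor = conn.execute("SELECT id, value FROM readings ORDER BY id")
# Rows arrive in chunks of 4 behind the scenes; the loop sees one row at a time
for row in fetch_many_wrapper(cursor, count=4):
    print(row)
conn.close()
```

A small `count` like 4 only makes the chunking visible in a toy table; for real workloads the chunk size is a trade-off between memory per chunk and the number of round trips.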

## Conclusions
 - Python generators are a powerful, often misunderstood tool. They are sometimes treated as too difficult a concept for beginning programmers, which creates the false impression that beginners should hold off on learning generators until they are ready.

 - Generators are lazy because they only give us a value when we ask for it. As a result, generators are very memory efficient, which makes them a good fit for reading and processing "Big Data" files. A generator doesn't keep the values it has already yielded, so once we're done with a value it can be garbage-collected, and memory use stays roughly constant however many items are produced. An exhausted generator is itself freed as soon as nothing references it.

 - Generators provide for **lazy evaluation**. Being lazy is (sometimes) good!
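As a sketch of the "big file" use case, the generator below reads comma-separated text one line at a time, so only the current line is held in memory. The data and the `parse_amounts` name are hypothetical; `io.StringIO` stands in for a real file opened with `open()`, which is also iterated line by line.

```python
import io

def parse_amounts(lines):
    """Yield the numeric second column from 'name,amount' lines, skipping blanks."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        name, amount = line.split(",")
        yield float(amount)

# Stand-in for a real file, e.g. one opened with open("sales.csv") (hypothetical path)
fake_file = io.StringIO("Kevin,10.5\nGabby,4.5\n\nWes,5\n")
print(sum(parse_amounts(fake_file)))  # 20.0
```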