In [1]:
ez_string = "Generators"
for s in ez_string:
    print(s)

G
e
n
e
r
a
t
o
r
s


In [5]:
ez_dict = {1 : "First", 2 : "Second"}
for key, val in ez_dict.items():
    print(key, val)

1 First
2 Second


In each of the above examples, the for loop iterates over the sequence we give it. The code above used a list, string, and dictionary, but you can iterate over tuples and sets as well. In each loop above, we print each of the items in the sequence in the order they appear. For example, you can confirm that the order of the ez_list is replicated in the order that its items are printed out.

<p><img src="https://i.imgur.com/91NoaP0.jpg" alt="For Loop Anatomy"></p>

We refer to any object that can support iteration as an iterable.

The Iterator Protocol can be thought of as a set of requirements that an object must meet to be used in a for loop. Lists, strings, and dictionaries all follow the Iterator Protocol, but not every object does. An integer, for example, does not:

In [6]:
number = 1234
for n in number:
    print(n)

TypeError: 'int' object is not iterable

An integer is just a singular number, not a sequence. You may argue that the "first" number in number is 1, but it is not the same as the first item in a sequence.
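One way to check whether an object supports iteration is to hand it to the built-in iter() function, which raises a TypeError for non-iterables. The helper name is_iterable below is hypothetical and only serves as a minimal sketch of that check.

```python
def is_iterable(obj):
    """Hypothetical helper: report whether obj can be used in a for loop."""
    try:
        iter(obj)  # iter() raises TypeError if obj does not support iteration
    except TypeError:
        return False
    return True


print(is_iterable("Generators"))  # a string is a sequence of characters
print(is_iterable(1234))          # an integer is a single value
print(is_iterable(str(1234)))     # converting to a string gives a sequence of digits
```

If you do want to loop over the digits of a number, converting it to a string first, as in the last line, is one simple option.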

Therefore, one requirement of an iterable is that it can tell the for loop which item to operate on next. For example, a list tells the for loop that the next item sits at the index one greater than the current one (index 1 comes after index 0).

Consequently, an iterable must also signal to a for loop when to stop iterating. This signal usually comes when we arrive at the end of a sequence (i.e. the end of a list or string). We will explore the specific functions that make something iterable later in this article. For now, the important thing to know is that iterables describe how a for loop should traverse their contents.
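As a preview, here is a minimal sketch of an object that meets both requirements by hand. The class name Countdown is hypothetical; __iter__ and __next__ are the special methods Python looks for, and raising StopIteration is the "stop" signal.

```python
class Countdown:
    """Hypothetical iterable that counts down from start to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # A for loop calls this once to get the object it will pull items from
        return self

    def __next__(self):
        # Requirement 2: signal when the loop should stop
        if self.current <= 0:
            raise StopIteration
        # Requirement 1: say which item comes next
        value = self.current
        self.current -= 1
        return value


for n in Countdown(3):
    print(n)  # prints 3, then 2, then 1
```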

Generators are iterables themselves. As you'll see later, for loops are one of the main ways we use a generator, so they must be able to support iteration. We'll delve into how we can create our own generators in the next section.

<ul>
<li>
<p>Iteration is the idea of repeating some process over a sequence of items. In Python, iteration is usually related to the <code>for</code> loop.</p>
</li>
<li>
<p>An iterable is an object that supports iteration.</p>
</li>
<li>
<p>To be an iterable, it must describe to a <code>for</code> loop two things:</p>
<ol>
<li>What item comes next in the iteration.</li>
<li>When the loop should stop iterating.</li>
</ol>
</li>
<li>
<p>Generators are iterables.</p>
</li>
</ul>

To truly explore generators, we'll use the Brewer's Friend Beer Recipes data set from Kaggle. You can find the data set here, if you'd like to follow along on your own computer.

The data contains important beer characteristics from brewers around the world, including style of beer, alcohol by volume (ABV), and amount of beer produced. For the purposes of this article, let's say that we are interested in brewing our own beer. Perhaps we want to sell our beer, so we would like to see what others have done to inform our brewing choices and produce more popular beer styles.

In [9]:
!wget https://raw.githubusercontent.com/thecbp/blog_data/master/recipeData.csv

--2018-06-23 21:02:52--  https://raw.githubusercontent.com/thecbp/blog_data/master/recipeData.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.128.133, 151.101.192.133, 151.101.0.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.128.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 13922690 (13M) [text/plain]
Saving to: ‘recipeData.csv.1’


2018-06-23 21:03:06 (1.16 MB/s) - ‘recipeData.csv.1’ saved [13922690/13922690]



If you've never encountered a generator before, the most common real-life example of a generator is a backup generator, which creates — generates — electricity for your house or office.

In [10]:
! mv recipeData.csv recipeData_unprocessed.csv

In [11]:
! mv recipeData.csv.1 recipeData.csv

Conceptually, Python generators generate values one at a time from a given sequence, instead of giving the entirety of the sequence at once. This one-at-a-time fashion of generators is what makes them so compatible with for loops.

There are two ways to create a generator. They differ in their syntax, but the end result is still a generator. We'll teach these concepts by covering their syntax and comparing them to a similar, but non-generator equivalent.

A generator function versus a regular function

A generator expression versus a list comprehension

#### The generator function

A generator function is just like a regular function but with a key difference: the yield keyword replaces return.

In [14]:

# Regular function
def function_a():
    return "a"

# Generator function
def generator_a():
    yield "a"



In [16]:
function_a(),generator_a()


('a', <generator object generator_a at 0x7f8fac4f0780>)

Calling a regular function tells Python to go back to where the function is defined in our code, run the code within the block, and return the result. Calling a generator function does not run its body right away; it hands back a generator object, as the output above shows. To get the generator to yield its values, you need to pass it into the next() function.

next() is a built-in function that asks, "What's the next item in the iteration?" In fact, next() is what a for loop calls behind the scenes! When you loop over a list, dictionary, or string, Python first asks the object for an iterator using iter() and then calls next() on that iterator repeatedly, which is why you can use these objects in loops in the first place.

In [18]:
next(generator_a())

'a'

Notice that we have to call the generator function with parentheses, since calling it is what creates the generator. Providing only the function name will raise a TypeError, since you'd be giving next() a function rather than a generator. As expected, the generator will yield "a" once we invoke the next() function.
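The difference between the function and the generator it creates can be seen in a short sketch. It redefines generator_a so that it runs on its own, and catches the error instead of letting it stop the program.

```python
def generator_a():
    yield "a"


# Passing the function itself (no parentheses) fails: a function is not a generator
try:
    next(generator_a)
except TypeError as error:
    message = str(error)
    print("TypeError:", message)

# Calling the function first creates a generator that next() can work with
gen = generator_a()
print(next(gen))  # a
```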



Remember that generators produce a stream of values, so yielding a single value doesn't really qualify as a stream. To produce a real stream, we can put multiple yield statements into a generator function. These yield statements form the sequence that the generator will output.

We'll create a generator and bind it to a variable mg. Then, if we keep passing mg into next(), we'll get to the next yield. If we keep going past the last yield, we'll be given a StopIteration error to tell us that the generator has no more values to give. The StopIteration error is actually how a for loop knows when to stop iterating.

In [19]:
def multi_generate():
    yield "a"
    yield "b"
    yield "c"
    
mg = multi_generate()
next(mg)

'a'

In [20]:
next(mg)

'b'

In [21]:
next(mg)

'c'

In [22]:
StopIteration

StopIteration

In [23]:
next(mg)

StopIteration: 
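To make the connection between StopIteration and the for loop concrete, here is a rough sketch of what a for loop does for us, written out with a while loop. It is an approximation for illustration, not Python's actual internal implementation, and the names iterator and collected are our own.

```python
def multi_generate():
    yield "a"
    yield "b"
    yield "c"


collected = []
iterator = iter(multi_generate())  # a for loop starts by asking for an iterator

while True:
    try:
        item = next(iterator)      # ask for the next item
    except StopIteration:
        break                      # no more items: leave the loop quietly
    collected.append(item)         # this line plays the role of the loop body

print(collected)  # ['a', 'b', 'c']
```

The for loop catches StopIteration for us, which is why we only see the error when calling next() by hand.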

In [24]:
next(multi_generate())

'a'

In [25]:
next(multi_generate())

'a'

It's easy to think of generators as a machine that waits for one command and one command only: next(). Once you call next() on the generator, it will dispense the next value in the sequence it is holding. Otherwise, you can't do much else with a generator. The image below represents our generator as a simple machine.

<p><img src="https://i.imgur.com/BalgrZY.jpg" alt="Generators 1"></p>

When we call next(multi_generate()) repeatedly, we continue to get the result of the first yield statement. The reason behind this is subtle. Each call to multi_generate() creates a brand-new generator, so passing multi_generate() directly into next() will always give you the first yield result. By binding the generator to a variable, we make sure we are acting on the same generator each time we pass it into next().



We've noted that as we keep passing mg into next(), we get the other yield results. This is possible only if the generator somehow remembers what it last did. This memory is what distinguishes generator functions from regular functions! A regular function is a one-and-done deal: once it returns its value, it forgets everything about that call. A generator will keep yielding values until it's out.
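This memory covers local variables as well as the position in the code. The sketch below uses a hypothetical generator function, running_total, whose local variable total survives between calls to next().

```python
def running_total(numbers):
    """Hypothetical generator: yield the cumulative sum after each number."""
    total = 0
    for n in numbers:
        total += n
        yield total  # pause here; total is kept until the next call to next()


rt = running_total([1, 2, 3])
print(next(rt))  # 1
print(next(rt))  # 3
print(next(rt))  # 6
```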

This brings us to another important property of generators. Once we've finished iterating through them, we can't use them anymore. Once we got through all three yield values in mg, it can't provide anything to us anymore. We'd have to create another generator by calling multi_generate() again before we could pass it into next() and start over.
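A short sketch of this exhaustion, using list() to drain the generator in one go. The variable names first_pass, second_pass, and fresh are our own.

```python
def multi_generate():
    yield "a"
    yield "b"
    yield "c"


mg = multi_generate()
first_pass = list(mg)    # consumes every value the generator has
second_pass = list(mg)   # the same generator is now empty

fresh = list(multi_generate())  # a new generator starts from the top

print(first_pass)   # ['a', 'b', 'c']
print(second_pass)  # []
print(fresh)        # ['a', 'b', 'c']
```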

In [26]:
# Creating a generator that will generate the data row by row
def beerDataGenerator():
    file = "recipeData.csv"
    # The with block closes the file once the generator is finished
    with open(file, encoding="ISO-8859-1") as f:
        for row in f:
            yield row

<ul>
<li>We've designated <code>beerDataGenerator</code> as our generator function that will dispense our CSV file row by row. The function stores the name of the file in <code>file</code>, which we pass to the <code>open()</code> function to read it.</li>
<li>While we've discussed that Python objects like lists and dictionaries can be iterated over, we can also iterate over files that we <code>open()</code> as well.</li>
<li>The <code>encoding</code> tells Python what kinds of characters it should expect to see; ISO-8859-1 specifically refers to Latin-1.</li>
<li>The <code>for</code> loop will start with the first row in the CSV file, <code>yield</code> that row, and then save its current place in reading the file until <code>next()</code> is called on the generator again.</li>
</ul>

In [27]:
beer = beerDataGenerator()

In [28]:
next(beer)

'BeerID,Name,URL,Style,StyleID,Size(L),OG,FG,ABV,IBU,Color,BoilSize,BoilTime,BoilGravity,Efficiency,MashThickness,SugarScale,BrewMethod,PitchRate,PrimaryTemp,PrimingMethod,PrimingAmount,UserId\n'

In [29]:
next(beer)

'1,Vanilla Cream Ale,/homebrew/recipe/view/1633/vanilla-cream-ale,Cream Ale,45,21.77,1.055,1.013,5.48,17.65,4.83,28.39,75,1.038,70,N/A,Specific Gravity,All Grain,N/A,17.78,corn sugar,4.5 oz,116\n'

You may be asking, "We can store the data in a list comprehension! Why jump through an extra hoop and use a generator?"

A list would hold every row in memory at once, which becomes a problem when a file is larger than the memory available. Our data file doesn't qualify as Big Data, but we can still learn a lot by imposing a restriction on ourselves to recreate this conundrum. We'll assume for now that our beer data is so large in size that we are incapable of storing all of the data in a list of lists.
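To get a rough feel for the difference, we can compare the size of a list with the size of a generator that produces the same values. This is only a sketch: the squares function and the figure of 100,000 items are made up for illustration, and sys.getsizeof() measures the container itself, not the items inside it.

```python
import sys


def squares(limit):
    """Hypothetical generator function: yield n squared for n below limit."""
    for n in range(limit):
        yield n ** 2


squares_list = list(squares(100_000))  # every value is created and stored up front
squares_gen = squares(100_000)         # nothing is computed until next() is called

list_size = sys.getsizeof(squares_list)
gen_size = sys.getsizeof(squares_gen)

print("list:", list_size, "bytes")
print("generator:", gen_size, "bytes")
```

The generator's size stays the same no matter how large limit is, since it only ever holds its current position.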

#### The generator expression

Earlier, we compared our generator function to a regular function since they have many similar aspects. For generator expressions, we'll compare against list comprehensions.

In [30]:
lc_example = [n**2 for n in [1, 2, 3, 4, 5]]

genex_example = (n**2 for n in [1, 2, 3, 4, 5])

lc_example is our list comprehension, while genex_example is our generator expression that performs almost the same task. Take note that the only difference between the two is that the generator expression is surrounded by parentheses, rather than brackets.

In [31]:
lc_example , genex_example

([1, 4, 9, 16, 25], <generator object <genexpr> at 0x7f8fac4f0f10>)
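Generator expressions are handy when another function will consume the values right away. As a small sketch, the built-in sum() accepts either form, and when a generator expression is the only argument, the extra pair of parentheses can be dropped.

```python
# The list comprehension builds the whole list first, then sums it
total_from_list = sum([n**2 for n in [1, 2, 3, 4, 5]])

# The generator expression hands sum() one value at a time
total_from_genex = sum(n**2 for n in [1, 2, 3, 4, 5])

print(total_from_list, total_from_genex)  # 55 55
```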

In [32]:
genex_example2 = (n**2 for n in [1, 2, 3, 4, 5] if n >= 3)
next(genex_example2)

9

In [33]:
next(genex_example2)

16

In [34]:
beer_data = "recipeData.csv"

# This one line performs the same action as beerDataGenerator()!
lines = (line for line in open(beer_data, encoding="ISO-8859-1"))

In [35]:
next(lines)

'BeerID,Name,URL,Style,StyleID,Size(L),OG,FG,ABV,IBU,Color,BoilSize,BoilTime,BoilGravity,Efficiency,MashThickness,SugarScale,BrewMethod,PitchRate,PrimaryTemp,PrimingMethod,PrimingAmount,UserId\n'

In [36]:
next(lines)

'1,Vanilla Cream Ale,/homebrew/recipe/view/1633/vanilla-cream-ale,Cream Ale,45,21.77,1.055,1.013,5.48,17.65,4.83,28.39,75,1.038,70,N/A,Specific Gravity,All Grain,N/A,17.78,corn sugar,4.5 oz,116\n'

#### Generators feeding generators

We currently haven't learned anything from the beer data. All we've done so far is to take the original CSV file and create a generator that will yield each line in the CSV, one at a time in the form of a string. Unless we'd like to do some crazy string manipulation, we'll need to think of a way to get our data into a readable, useable form. Below is a representation of what our code currently does: a simple read from file and output of a single line from the file.

<p><img src="https://i.imgur.com/WbjKFiC.jpg" alt="Generators 2"></p>

Generators come to the rescue again here! So far in the article, we've been passing in other structures, specifically iterables, to the generators to indicate what sequence we'd like to generate from. However, generators can be iterated over themselves, so why don't we create another generator that takes the output of another generator? Our lines generator outputs the line in its entirety, so we'll make a second generator that does some formatting for us.

In [37]:
beer_data = "recipeData.csv"
lines = (line for line in open(beer_data, encoding="ISO-8859-1"))
# Strip the trailing newline so the last column name is "UserId", not "UserId\n"
lists = (l.rstrip("\n").split(",") for l in lines)

In [38]:
columns = next(lists)

In [39]:
beerdicts = (dict(zip(columns, data)) for data in lists)
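If you don't have the CSV file at hand, the same chain can be tried on a small in-memory stand-in. The sketch below is under stated assumptions: sample_csv is hypothetical, it keeps only three of the columns, and its second row is made up for illustration.

```python
import io

# Hypothetical stand-in for recipeData.csv (three columns, second row is made up)
sample_csv = io.StringIO(
    "BeerID,Name,Style\n"
    "1,Vanilla Cream Ale,Cream Ale\n"
    "2,Sample IPA,American IPA\n"
)

lines = (line for line in sample_csv)                     # raw text, one line at a time
lists = (line.rstrip("\n").split(",") for line in lines)  # each line split into fields
columns = next(lists)                                     # the first line holds the column names
beerdicts = (dict(zip(columns, data)) for data in lists)  # pair each field with its column

first = next(beerdicts)
print(first)
```

No line is read from sample_csv until next() is called at the end of the chain; each generator pulls a single item from the one before it.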


In [40]:
beer_counts = {}
for bd in beerdicts:
    if bd["Style"] not in beer_counts:
        beer_counts[bd["Style"]] = 1
    else:
        beer_counts[bd["Style"]] += 1

most_popular = 0
most_popular_type = None
for beer, count in beer_counts.items():
    if count > most_popular:
        most_popular = count
        most_popular_type = beer

In [41]:
most_popular_type

'American IPA'
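The counting loop can also be written with collections.Counter from the standard library, which accepts a generator expression directly. This is an alternative sketch rather than the approach used above, and sample_dicts is a made-up stand-in for beerdicts so that the snippet runs on its own.

```python
from collections import Counter

# Made-up stand-in for the beerdicts generator, for illustration only
sample_styles = ["American IPA", "Cream Ale", "American IPA"]
sample_dicts = ({"Style": style} for style in sample_styles)

# Counter consumes the generator one item at a time and tallies each style
style_counts = Counter(bd["Style"] for bd in sample_dicts)

# most_common(1) returns a list holding the single most frequent (style, count) pair
most_popular_type, most_popular = style_counts.most_common(1)[0]
print(most_popular_type, most_popular)  # American IPA 2
```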