# **Python Generators**

Source URL: https://realpython.com/introduction-to-python-generators/

## **Examples of use**

### **Example 1: Reading Large Files**

In [4]:
def csv_reader(file_name):
    # Read the whole file into memory and return a list of rows
    with open(file_name) as file:
        result = file.read().split("\n")
    return result

csv_gen = csv_reader('techcrunch.csv')
row_count = 0

for row in csv_gen:
    row_count += 1

print(f'Row count is {row_count}')

Row count is 1462


Although it worked, `file.read()` loads the entire file into memory at once, so a very large file can raise a `MemoryError`. A generator version avoids this by reading one row at a time:

In [1]:
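def csv_reader(file_name):
    # Yield one row at a time; the with block closes the file when the generator finishes
    with open(file_name, "r") as file:
        for row in file:
            yield row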

csv_gen = csv_reader('techcrunch.csv')
row_count = 0

for row in csv_gen:
    row_count += 1

print(f'Row count is {row_count}')

Row count is 1461


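In this case, the function `csv_reader()` was turned into a ***generator function***. This version opens a file, loops through each line, and yields each row instead of returning the whole list at once. The row count is one lower than before (1461 versus 1462), most likely because `split("\n")` in the first version produced an extra empty string after the file's final newline.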
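
To see the laziness in action without needing `techcrunch.csv`, the following minimal sketch writes a small temporary file and pulls rows from it one at a time. The file contents and the names `path`, `first_row`, and `remaining` are made up for illustration, and `csv_reader()` is written with a `with` block so the file is closed once iteration ends.

```python
import os
import tempfile

def csv_reader(file_name):
    # The with block closes the file once iteration finishes
    with open(file_name, "r") as file:
        for row in file:
            yield row

# Hypothetical sample data standing in for techcrunch.csv
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    tmp.write("company,round\nAcme,a\nGlobex,b\n")
    path = tmp.name

csv_gen = csv_reader(path)           # nothing has been read yet
first_row = next(csv_gen)            # reads only the first line
remaining = sum(1 for _ in csv_gen)  # consumes the rest, one row at a time

print(repr(first_row))
print(remaining)
os.remove(path)
```

Calling `csv_reader(path)` only creates the generator object. The file is opened on the first `next()` call, and each further row is read only when the consumer asks for it.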

You can also define a ***generator expression*** (also called a 'generator comprehension'), whose syntax closely resembles that of a list comprehension. This lets you create the generator without defining a function:

In [None]:
file_name = 'techcrunch.csv'
csv_gen = (row for row in open(file_name))

### **Example 2: Generating an Infinite Sequence**

In [9]:
def infinite_sequence():
    num = 0
    while True:
        yield num
        num += 1

gen = infinite_sequence()

print(next(gen))
print(next(gen))
print(next(gen))
print(next(gen))

0
1
2
3


This code block is short and sweet. First, you initialize the variable `num` and start an infinite loop. Then, you immediately yield `num` so that you can capture the initial state. This mimics the action of `range()`, except that the sequence never ends.
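
Because the sequence has no end, a `for` loop over it would never finish on its own. One common way to take a bounded slice is `itertools.islice()`, and the standard library's `itertools.count()` produces the same unbounded sequence. This is a small sketch; the names `first_five` and `same_five` are introduced here for illustration.

```python
from itertools import count, islice

def infinite_sequence():
    num = 0
    while True:
        yield num
        num += 1

# islice() stops after five items, so the infinite generator is safe to consume
first_five = list(islice(infinite_sequence(), 5))
print(first_five)

# itertools.count() yields the same unbounded sequence starting at 0
same_five = list(islice(count(), 5))
print(same_five)
```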

### **Example 3: Detecting Palindromes**

In [10]:
def is_palindrome(num):
    # Skip single-digit inputs
    if num // 10 == 0:
        return False
    temp = num
    reversed_num = 0

    while temp != 0:
        reversed_num = (reversed_num * 10) + (temp % 10)
        temp = temp // 10
    
    if num == reversed_num:
        return num
    else:
        return False

In [None]:
# Using the infinite sequence generator to detect palindromes
for i in infinite_sequence():
    pal = is_palindrome(i)
    if pal:
        print(i)
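
The loop above never terminates, so it has to be interrupted by hand. As a hedged alternative, the sketch below filters the infinite sequence with a generator expression and uses `itertools.islice()` to stop after ten palindromes. The names `palindromes` and `first_ten` are introduced here for illustration.

```python
from itertools import islice

def infinite_sequence():
    num = 0
    while True:
        yield num
        num += 1

def is_palindrome(num):
    # Same digit-reversal check as above; single-digit inputs are skipped
    if num // 10 == 0:
        return False
    temp = num
    reversed_num = 0
    while temp != 0:
        reversed_num = (reversed_num * 10) + (temp % 10)
        temp = temp // 10
    return num if num == reversed_num else False

# The generator expression filters lazily; islice() bounds the search
palindromes = (i for i in infinite_sequence() if is_palindrome(i))
first_ten = list(islice(palindromes, 10))
print(first_ten)
```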

## **Understanding Generators**

Generator functions look and act just like regular functions, but with one defining characteristic. Generator functions use the Python `yield` keyword instead of `return`.

Recall the generator function `infinite_sequence()`. It looks like a typical function definition, except for the Python `yield` statement and the code that follows it. `yield` indicates where a value is sent back to the caller, but unlike `return`, you don’t exit the function afterward.

Instead, the **state** of the function is remembered. That way, when `next()` is called on a generator object (either explicitly or implicitly within a `for` loop), the previously yielded variable `num` is incremented, and then yielded again. Since generator functions look like other functions and act very similarly to them, you can assume that generator expressions are very similar to other comprehensions available in Python.
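
The pause-and-resume behavior is easier to see with a few `print()` calls inside the generator. The function `traced_sequence()` below is a hypothetical variant of `infinite_sequence()` written only for this illustration.

```python
def traced_sequence():
    num = 0
    while True:
        print(f"about to yield {num}")
        yield num
        # Execution resumes here on the next call to next()
        print(f"resumed after yielding {num}")
        num += 1

gen = traced_sequence()  # no output yet: the body has not started running
a = next(gen)            # runs up to the first yield and pauses there
b = next(gen)            # resumes after the yield, increments num, pauses again
print(a, b)
```

The local variable `num` keeps its value between the two `next()` calls, which is what "remembering the state" means in practice.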

### **Building Generators With Generator Expressions**

Like list comprehensions, generator expressions let you build an iterable in a single line of code. They’re also useful in the same cases where list comprehensions are used, with an added benefit: you can create them without building and holding the entire object in memory before iteration. In other words, the memory cost stays small no matter how many items the expression produces.

In [14]:
# Squared nums list comprehension
nums_squared_lc = [num ** 2 for num in range(5)]
print(nums_squared_lc)

# Squared nums generator comprehension
nums_squared_gc = (num ** 2 for num in range(5))
print(nums_squared_gc)

[0, 1, 4, 9, 16]
<generator object <genexpr> at 0x7ff9bc62dd90>
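
Printing the generator shows only the object, because no values have been computed yet. The short sketch below pulls values on demand and also shows that a generator expression can be passed directly to a function such as `sum()`. The names `first`, `rest`, and `total` are introduced here for illustration.

```python
nums_squared_gc = (num ** 2 for num in range(5))

# Values are computed only as they are requested
first = next(nums_squared_gc)
rest = list(nums_squared_gc)   # consumes whatever is left
print(first, rest)

# When a generator expression is the sole argument, the extra parentheses are optional
total = sum(num ** 2 for num in range(5))
print(total)
```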


### **Profiling Generator Performance**

You learned earlier that generators are a great way to optimize memory. While an infinite sequence generator is an extreme example of this optimization, let’s amp up the number squaring examples you just saw and inspect the size of the resulting objects. You can do this with a call to `sys.getsizeof()`:

In [15]:
import sys
nums_squared_lc = [i ** 2 for i in range(10000)]
print(sys.getsizeof(nums_squared_lc))
nums_squared_gc = (i ** 2 for i in range(10000))
print(sys.getsizeof(nums_squared_gc))

85176
104


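This means that the list is over 800 times larger than the generator object! Keep in mind that `sys.getsizeof()` reports only the size of the list object itself, not of the integers it refers to, so the real gap is even wider.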
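
A quick way to check how the two sizes scale is to repeat the measurement for several lengths. The sketch below assumes CPython, and the helper names `squares_list()` and `squares_gen()` are hypothetical. The exact byte counts vary between Python versions, so none are quoted here.

```python
import sys

def squares_list(n):
    return [i ** 2 for i in range(n)]

def squares_gen(n):
    return (i ** 2 for i in range(n))

# The list grows with n, while the generator object stays the same size
for n in (10, 10_000, 1_000_000):
    print(n, sys.getsizeof(squares_list(n)), sys.getsizeof(squares_gen(n)))
```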

There is one thing to keep in mind, though. If the list is smaller than the running machine’s available memory, then list comprehensions can be faster to evaluate than the equivalent generator expression.

In [2]:
import cProfile
cProfile.run('sum([i * 2 for i in range(10000)])')

         5 function calls in 0.001 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.001    0.001    0.001    0.001 <string>:1(<listcomp>)
        1    0.000    0.000    0.001    0.001 <string>:1(<module>)
        1    0.000    0.000    0.001    0.001 {built-in method builtins.exec}
        1    0.000    0.000    0.000    0.000 {built-in method builtins.sum}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}




In [3]:
cProfile.run('sum((i * 2 for i in range(10000)))')

         10005 function calls in 0.003 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    10001    0.002    0.000    0.002    0.000 <string>:1(<genexpr>)
        1    0.000    0.000    0.003    0.003 <string>:1(<module>)
        1    0.000    0.000    0.003    0.003 {built-in method builtins.exec}
        1    0.001    0.001    0.003    0.003 {built-in method builtins.sum}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}




As observed, summing across all values in the list comprehension took about a third of the time that summing across the generator took (0.001 versus 0.003 seconds). ***If speed is an issue and memory isn't, then a list comprehension is likely a better tool for the job***.
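
`cProfile` adds overhead to every function call it records, and each resumption of the generator counts as a call, so the profile can exaggerate the gap. As a cross-check, here is a minimal sketch using `timeit` from the standard library. The repetition count of 200 is an arbitrary choice, and the timings depend on the machine and Python version.

```python
import timeit

# Time each statement 200 times and report the total
list_time = timeit.timeit("sum([i * 2 for i in range(10000)])", number=200)
gen_time = timeit.timeit("sum(i * 2 for i in range(10000))", number=200)

print(f"list comprehension:   {list_time:.4f} s")
print(f"generator expression: {gen_time:.4f} s")
```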

## **Understanding the Python Yield Statement**

In [4]:
def multi_yield():
    yield_str = 'This will print the first string'
    yield yield_str
    yield_str = 'This will print the second string'
    yield yield_str

multi_obj = multi_yield()
print(next(multi_obj)) # This will print the first string
print(next(multi_obj)) # This will print the second string
print(next(multi_obj)) # Raises a StopIteration exception

This will print the first string
This will print the second string


StopIteration: 

You can see that execution has blown up with a traceback. This is because generators, like all iterators, can be exhausted. Unless your generator is infinite, you can iterate through it one time only. Once all values have been evaluated, iteration will stop and the `for` loop will exit. If you used `next()`, then instead you’ll get an explicit `StopIteration` exception.

`StopIteration` is an ordinary exception that's raised to signal the end of an iterator. `for` loops, for example, are built around `StopIteration`. You can even implement your own `for` loop by using a `while` loop:

In [5]:
letters = ['a', 'b', 'c', 'y']
it = iter(letters)
while True:
    try:
        letter = next(it)
    except StopIteration:
        break
    print(letter)

a
b
c
y
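
Two consequences of exhaustion are worth seeing side by side: a second pass over the same generator yields nothing, and `next()` accepts a default value that is returned instead of raising `StopIteration`. The shortened `multi_yield()` and the variable names below are for illustration only.

```python
def multi_yield():
    yield 'first'
    yield 'second'

multi_obj = multi_yield()
first_pass = list(multi_obj)    # consumes both values
second_pass = list(multi_obj)   # the generator is exhausted, so this is empty

# A second argument to next() is returned instead of raising StopIteration
fallback = next(multi_obj, 'no more values')

print(first_pass, second_pass, fallback)
```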


## **Using Advanced Generator Methods**

In addition to `yield`, generator objects can make use of three methods: `.send()`, `.throw()`, and `.close()`.

### **How to Use `.send()`**

For this next section, you’re going to build a program that makes use of all three methods. This program will print numeric palindromes like before, but with a few tweaks. Upon encountering a palindrome, your new program will add a digit and start a search for the next one from there. You’ll also handle exceptions with `.throw()` and stop the generator after a given number of digits with `.close()`. First, let’s recall the code for your palindrome detector, which now returns `True` or `False`, and pair it with a new generator, `infinite_palindromes()`:

In [6]:
def is_palindrome(num):
    # Skip single-digit inputs
    if num // 10 == 0:
        return False
    temp = num
    reversed_num = 0

    while temp != 0:
        reversed_num = (reversed_num * 10) + (temp % 10)
        temp = temp // 10

    if num == reversed_num:
        return True
    else:
        return False

def infinite_palindromes():
    num = 0
    while True:
        if is_palindrome(num):
            i = (yield num)
            if i is not None:
                num = i
        num += 1

pal_gen = infinite_palindromes()
for i in pal_gen:
    digits = len(str(i))
    pal_gen.send(10 ** (digits))

KeyboardInterrupt: 
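
The loop above prints nothing and never ends, which is why the cell had to be interrupted. To make the mechanics visible, the sketch below runs only three rounds and records what each call returns. Inside the generator, the value passed to `.send()` becomes the result of the `yield` expression and is assigned to `i`. The call to `.send()` itself returns the next palindrome the generator yields. The string-based `is_palindrome()` is a shortcut used here for brevity, and `from_next` and `from_send` are hypothetical names.

```python
def is_palindrome(num):
    # String-based shortcut for illustration; single-digit inputs are skipped
    text = str(num)
    return len(text) > 1 and text == text[::-1]

def infinite_palindromes():
    num = 0
    while True:
        if is_palindrome(num):
            i = (yield num)      # i receives the value passed to .send()
            if i is not None:
                num = i
        num += 1

pal_gen = infinite_palindromes()
from_next = []   # values received through next()
from_send = []   # values returned by .send()

for _ in range(3):
    value = next(pal_gen)
    from_next.append(value)
    # Jump ahead to the next power of ten; .send() returns the next palindrome found
    from_send.append(pal_gen.send(10 ** len(str(value))))

print(from_next)
print(from_send)
```

The palindromes returned by `.send()` are discarded in the original loop, which explains why the later cells print 11, 111, 1111, and so on rather than 101 or 1001.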

### **How to Use `.throw()`**

In [7]:
pal_gen = infinite_palindromes()
for i in pal_gen:
    print(i)
    digits = len(str(i))
    if digits == 5:
        pal_gen.throw(ValueError("We don't like large palindromes"))
    pal_gen.send(10 ** (digits))

11
111
1111
10101


ValueError: We don't like large palindromes
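
In the cell above, nothing inside `infinite_palindromes()` catches the `ValueError`, so it propagates back out of `.throw()` and ends the loop. A generator can also handle the exception itself and keep running. The sketch below uses a hypothetical generator, `resilient_counter()`, that resets its count whenever a `ValueError` is thrown into it.

```python
def resilient_counter():
    # Hypothetical generator that recovers from a ValueError thrown into it
    num = 0
    while True:
        try:
            yield num
        except ValueError:
            num = 0      # reset and keep going
        else:
            num += 1

gen = resilient_counter()
values = [next(gen), next(gen), next(gen)]

# .throw() raises ValueError at the paused yield; the generator handles it,
# and .throw() returns the next value the generator yields
values.append(gen.throw(ValueError("reset")))
values.append(next(gen))
print(values)
```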

### **How to Use `.close()`**

In [8]:
pal_gen = infinite_palindromes()
for i in pal_gen:
    print(i)
    digits = len(str(i))
    if digits == 5:
        pal_gen.close()
    pal_gen.send(10 ** (digits))

11
111
1111
10101


StopIteration: 
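
The `StopIteration` above comes from calling `.send()` on a generator that has already been closed. Internally, `.close()` raises `GeneratorExit` at the paused `yield`, which gives the generator a chance to clean up. The sketch below uses a hypothetical generator, `counter_with_cleanup()`, to show a `finally` block running on close, and it leaves the loop with `break` so the closed generator is not used again.

```python
def counter_with_cleanup(log):
    # Hypothetical generator that records when it is closed
    num = 0
    try:
        while True:
            yield num
            num += 1
    finally:
        log.append("cleaned up")

log = []
seen = []
gen = counter_with_cleanup(log)
for value in gen:
    seen.append(value)
    if value == 2:
        gen.close()   # raises GeneratorExit inside the generator
        break         # leave the loop instead of touching the closed generator

print(seen, log)
print(next(gen, "closed"))
```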

## **Creating Data Pipelines With Generators**

It’s time to do some processing in Python! To demonstrate how to build pipelines with generators, you’re going to analyze the `techcrunch.csv` file to get the total and average of all series A rounds in the dataset.

Let’s think of a strategy:

1. Read every line of the file.
2. Split each line into a list of values.
3. Extract the column names.
4. Use the column names and lists to create a dictionary.
5. Filter out the rounds you aren’t interested in.
6. Calculate the total and average values for the rounds you are interested in.

Normally, you can do this with a package like pandas, but you can also achieve this functionality with just a few generators.

In [9]:
file_name = 'techcrunch.csv'
lines = (line for line in open(file_name))

In [10]:
list_line = (s.rstrip().split(",") for s in lines)

In [11]:
cols = next(list_line)

In [12]:
company_dicts = (dict(zip(cols, data)) for data in list_line)

In [13]:
funding = (
    int(company_dict["raisedAmt"])
    for company_dict in company_dicts
    if company_dict["round"] == "a"
)

In [14]:
total_series_a = sum(funding)

In [15]:
print(f'Total series A fundraising: ${total_series_a}')

Total series A fundraising: $4376015000
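
The cells above compute only the total, while the strategy also calls for the average. Because `funding` can be consumed only once, calling `sum(funding)` and then counting its items would not work. One option, sketched below, is to accumulate both values in a single pass. The in-memory sample rows are hypothetical stand-ins for `techcrunch.csv`, and `count_series_a` and `average_series_a` are names introduced here.

```python
import io

# Hypothetical rows standing in for techcrunch.csv
sample = io.StringIO(
    "company,round,raisedAmt\n"
    "Acme,a,1000000\n"
    "Globex,b,5000000\n"
    "Initech,a,3000000\n"
)

# Same pipeline stages as above, reading from the in-memory sample
lines = (line for line in sample)
list_line = (s.rstrip().split(",") for s in lines)
cols = next(list_line)
company_dicts = (dict(zip(cols, data)) for data in list_line)
funding = (
    int(company_dict["raisedAmt"])
    for company_dict in company_dicts
    if company_dict["round"] == "a"
)

# Accumulate both values in one pass, since the generator can be consumed only once
total_series_a = 0
count_series_a = 0
for amount in funding:
    total_series_a += amount
    count_series_a += 1

average_series_a = total_series_a / count_series_a if count_series_a else 0
print(f"Total series A fundraising: ${total_series_a}")
print(f"Average series A fundraising: ${average_series_a:.2f}")
```

Note that splitting on `","` does not handle quoted fields that contain commas. For files like that, the standard library's `csv.reader` is a safer choice for the splitting stage.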
