#What are generators in Python?

There is a lot of overhead in building an iterator in Python: we have to implement a class with __iter__() and __next__() methods, keep track of internal state, raise StopIteration when there are no more values to return, and so on.

This is both lengthy and counterintuitive. Generators come to the rescue in such situations.

Python generators are a simple way of creating iterators. All the overhead mentioned above is handled automatically by generators.

Simply speaking, a generator function is a function that returns an object (an iterator) which we can iterate over one value at a time.

#How to create a generator in Python?

It is fairly simple to create a generator in Python. It is as easy as defining a normal function with a yield statement instead of a return statement.

If a function contains at least one yield statement (it may contain other yield or return statements), it becomes a generator function. Both yield and return will return some value from a function.

The difference is that while a return statement terminates a function entirely, a yield statement pauses the function, saving all of its state, and later resumes from that point on successive calls.
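To make the contrast concrete, here is a small sketch with made-up function names (with_return, with_yield and yield_then_return are illustrative only). The first returns once and is done, the second hands back a value on each request, and the third shows that a bare return inside a generator simply ends the iteration.

```python
def with_return():
    return 1
    return 2  # unreachable: the first return already ended the function

def with_yield():
    yield 1   # pause here and hand 1 to the caller
    yield 2   # resume here on the next request

def yield_then_return():
    yield 1
    return    # ends the generator; the next request raises StopIteration
    yield 2   # unreachable

print(with_return())              # 1
print(list(with_yield()))         # [1, 2]
print(list(yield_then_return()))  # [1]
```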

#Differences between a generator function and a normal function

Here is how a generator function differs from a normal function.

- A generator function contains one or more yield statements.
- When called, it returns an object (an iterator) but does not start execution immediately.
- Methods like __iter__() and __next__() are implemented automatically, so we can iterate through the items using next().
- Once the function yields, it is paused and control is transferred to the caller.
- Local variables and their states are remembered between successive calls.
- Finally, when the function terminates, StopIteration is raised automatically on further calls.
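Several of these points can be checked directly. The sketch below uses a hypothetical countdown generator and records in a list when its body actually starts running; inspect.getgeneratorstate() from the standard library reports whether a generator has not started yet, is paused at a yield, or has finished.

```python
import inspect

calls = []  # records when the generator body actually runs

def countdown(n):
    calls.append("started")
    while n > 0:
        yield n
        n -= 1

gen = countdown(2)
print(calls)                           # []: calling countdown() ran none of its body
print(inspect.getgeneratorstate(gen))  # GEN_CREATED

print(next(gen))                       # 2
print(calls)                           # ['started']
print(inspect.getgeneratorstate(gen))  # GEN_SUSPENDED (paused at yield, n is remembered)

print(next(gen))                       # 1
print(iter(gen) is gen)                # True: __iter__() returns the generator itself

try:
    next(gen)
except StopIteration:
    print("StopIteration")             # raised automatically once the body finishes
print(inspect.getgeneratorstate(gen))  # GEN_CLOSED
```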


In [None]:


def my_gen():
    n = 1
    print('This is printed first')
    # Generator function contains yield statements
    yield n

    n += 1
    print('This is printed second')
    yield n

    n += 1
    print('This is printed at last')
    yield n



In [4]:

# Calling the function does not run its body; it just returns a generator object.
my_gen()

<generator object my_gen at 0x...>

In [8]:
# Store the generator object so we can step through it with next() or loop over it with a for loop.
a = my_gen()

>>> # It returns an object but does not start execution immediately.
>>> a = my_gen()

>>> # We can iterate through the items using next().
>>> next(a)
This is printed first
1
>>> # Once the function yields, the function is paused and the control is transferred to the caller.

>>> # Local variables and their states are remembered between successive calls.
>>> next(a)
This is printed second
2

>>> next(a)
This is printed at last
3

>>> # Finally, when the function terminates, StopIteration is raised automatically on further calls.
>>> next(a)
Traceback (most recent call last):
...
StopIteration
>>> next(a)
Traceback (most recent call last):
...
StopIteration

In [9]:
next(a)

This is printed first
1

In [10]:
next(a)

This is printed second
2

In [11]:
next(a)

This is printed at last
3

In [12]:
# It raises StopIteration because there are no more elements.
next(a)

StopIteration: 
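If you would rather not handle the exception, the built-in next() accepts an optional default value that is returned instead of raising StopIteration once the generator is exhausted. A small sketch, using a hypothetical two-value generator named two_values:

```python
def two_values():
    yield 1
    yield 2

g = two_values()
print(next(g, None))    # 1
print(next(g, None))    # 2
print(next(g, None))    # None: exhausted, so the default is returned
print(next(g, "done"))  # done
```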

In [13]:
# More commonly, we consume a generator with a for loop, which calls next() and handles StopIteration for us.
for i in my_gen():
    print(i)

This is printed first
1
This is printed second
2
This is printed at last
3
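Roughly speaking, a for loop does the next()/StopIteration bookkeeping on our behalf. The sketch below spells out an approximately equivalent while loop; the generator numbers is a hypothetical stand-in for my_gen without the print calls.

```python
def numbers():
    yield 1
    yield 2
    yield 3

collected = []
it = iter(numbers())       # a for loop first calls iter() on the object
while True:
    try:
        item = next(it)    # ask for the next value
    except StopIteration:  # the generator is finished: leave the loop
        break
    collected.append(item) # this is where the loop body would run

print(collected)  # [1, 2, 3]
```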


In [14]:
# Normal function: builds the whole list before returning it
def func_cubes(n):
    result = []
    for i in range(n):
        result.append(i**3)
    return result
func_cubes(2)

[0, 1]

In [16]:
# func_cubes() builds the entire result list in memory before returning it;
# the for loop then iterates over that stored list.
for i in func_cubes(2):
    print(i)

0
1


In [18]:
def func_cubes_generator(n):
    for i in range(n):
        yield i**3
func_cubes_generator(2)

<generator object func_cubes_generator at 0x7f10bc1d26d0>

In [21]:
# Calling it returns a generator object instead of a list; the for loop pulls one value at a time.
for a in func_cubes_generator(2):
    print(a)

0
1


In [22]:
# Or we can pull values out one at a time with next()
a=func_cubes_generator(4)
next(a)

0

In [23]:
next(a)

1

In [24]:
# We can also collect all the values into a list.
list(func_cubes_generator(3))

[0, 1, 8]
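One thing to keep in mind: unlike the list returned by func_cubes(), a generator object can be consumed only once. A quick sketch (func_cubes_generator is repeated so the snippet runs on its own):

```python
def func_cubes_generator(n):
    for i in range(n):
        yield i**3

cubes = func_cubes_generator(3)
print(list(cubes))  # [0, 1, 8]
print(list(cubes))  # []: the generator is already exhausted

# Call the generator function again to get a fresh generator.
print(list(func_cubes_generator(3)))  # [0, 1, 8]
```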

In [25]:
s='Anuj'
for letter in s:
    print(letter)


A
n
u
j


In [26]:
next(s)

TypeError: 'str' object is not an iterator

In [27]:
s_str=iter(s)

In [28]:
next(s_str)

'A'

In [29]:
next(s_str)

'n'
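The cells above show the distinction between an iterable (a string, which a for loop can walk over) and an iterator (what iter() returns, which next() can advance). A short sketch: an iterator returns itself from iter(), runs out after one pass, and a generator object is already an iterator, so next() works on it directly.

```python
s = 'Anuj'
s_iter = iter(s)

print(iter(s_iter) is s_iter)  # True: iterators return themselves from iter()
print(list(s_iter))            # ['A', 'n', 'u', 'j']
print(list(s_iter))            # []: exhausted after one pass

gen = (ch for ch in s)         # a generator expression over the same string
print(iter(gen) is gen)        # True: generators are iterators too
print(next(gen))               # A
```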

#Why are generators used in Python?

There are several reasons that make generators an attractive choice.

1. Easy to Implement

Generators can be implemented in a clearer and more concise way than their iterator-class counterparts. Following is an example that implements a sequence of powers of 2 using an iterator class.

In [30]:

class PowTwo:
    def __init__(self, max = 0):
        self.max = max

    def __iter__(self):
        self.n = 0
        return self

    def __next__(self):
        if self.n > self.max:
            raise StopIteration

        result = 2 ** self.n
        self.n += 1
        return result

In [31]:
# Using a generator
def PowTwoGen(max = 0):
    n = 0
    while n <= max:  # include 2**max, matching the PowTwo class above
        yield 2 ** n
        n += 1
# Since generators keep track of these details automatically,
# the implementation is more concise and much cleaner.
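A middle ground worth knowing about is to keep the class but write its __iter__() as a generator function; Python then supplies __next__() and StopIteration for us. This is a sketch rather than part of the original example: the class name PowTwoIterable is hypothetical, and it yields 2**0 through 2**max like the PowTwo class above.

```python
class PowTwoIterable:
    """Powers of two from 2**0 up to 2**max (inclusive)."""

    def __init__(self, max=0):
        self.max = max

    def __iter__(self):
        # Because this method contains yield, each call returns a new
        # generator, so every for loop starts again from 2**0.
        for n in range(self.max + 1):
            yield 2 ** n

p = PowTwoIterable(4)
print(list(p))  # [1, 2, 4, 8, 16]
print(list(p))  # [1, 2, 4, 8, 16]: the object can be iterated more than once
```

A side effect of this design is that the object stays reusable, whereas the PowTwo class resets its counter only when __iter__() is called again.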

2. Memory Efficient

A normal function that returns a sequence will create the entire sequence in memory before returning the result. This is overkill if the number of items in the sequence is very large.

A generator implementation of such a sequence is memory-friendly and is preferred, since it produces only one item at a time.
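A rough way to see the difference is sys.getsizeof(), which reports the size of the container object itself (for a list, its array of references; the integer objects are not counted). The exact numbers depend on the Python version and platform, so treat this as an illustrative sketch; cubes_gen is a hypothetical helper.

```python
import sys

def cubes_gen(n):
    for i in range(n):
        yield i**3

n = 100_000
as_list = list(cubes_gen(n))  # all n values exist at once
as_gen = cubes_gen(n)         # values are produced on demand

print(sys.getsizeof(as_list))  # hundreds of kilobytes for the list object alone
print(sys.getsizeof(as_gen))   # a small, fixed number of bytes, whatever n is

# Both give the same total; the generator never holds more than one value.
print(sum(as_list) == sum(as_gen))  # True
```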

3. Represent Infinite Streams

Generators are an excellent medium for representing an infinite stream of data. Infinite streams cannot be stored in memory, but since generators produce only one item at a time, they can represent an infinite stream of data.

The following example can generate all the even numbers (at least in theory).

In [32]:
def all_even():
    n = 0
    while True:
        yield n
        n += 2
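Because all_even() never stops on its own, calling list() on it would never return. One way to take just part of the stream is with itertools.islice (stop after a count) or itertools.takewhile (stop on a condition), both from the standard library:

```python
from itertools import islice, takewhile

def all_even():
    n = 0
    while True:
        yield n
        n += 2

first_five = list(islice(all_even(), 5))  # stop after 5 items
print(first_five)  # [0, 2, 4, 6, 8]

up_to_ten = list(takewhile(lambda x: x <= 10, all_even()))  # stop at the first value > 10
print(up_to_ten)   # [0, 2, 4, 6, 8, 10]
```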

4. Pipelining Generators

Generators can be used to pipeline a series of operations. This is best illustrated using an example.

Suppose we have a log file from a famous fast food chain. The log file has a column (the 4th column) that keeps track of the number of pizzas sold every hour, and we want to sum it to find the total number of pizzas sold in 5 years.

Assume every value is stored as a string and that numbers which are not available are marked as 'N/A'. A generator implementation of this could be as follows.

In [33]:
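with open('sells.log') as file:
    # Take the 4th column (assuming whitespace-separated fields); line[3] alone would be the 4th character.
    pizza_col = (line.split()[3] for line in file)
    per_hour = (int(x) for x in pizza_col if x != 'N/A')
    print("Total pizzas sold =", sum(per_hour))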

FileNotFoundError: [Errno 2] No such file or directory: 'sells.log'
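Since sells.log is not available here, the sketch below runs the same two-stage pipeline on a few made-up lines held in an io.StringIO object, which behaves like an open text file. The line format (whitespace-separated fields with the pizza count in the 4th) and the helper name total_pizzas are assumptions for illustration.

```python
import io

# Hypothetical log lines: date, hour, store, pizzas sold.
fake_log = io.StringIO(
    "2020-01-01 10:00 store1 12\n"
    "2020-01-01 11:00 store1 N/A\n"
    "2020-01-01 12:00 store1 30\n"
)

def total_pizzas(lines):
    pizza_col = (line.split()[3] for line in lines)        # stage 1: pick the 4th field
    per_hour = (int(x) for x in pizza_col if x != 'N/A')  # stage 2: drop N/A, convert to int
    return sum(per_hour)                                   # stage 3: consume the pipeline

print("Total pizzas sold =", total_pizzas(fake_log))  # Total pizzas sold = 42
```

Each stage pulls one line at a time from the previous one, so the whole file is never loaded into memory at once.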

#Generator Expressions

Some simple generators can be coded succinctly as expressions using a syntax similar to list comprehensions but with parentheses instead of square brackets. These expressions are designed for situations where the generator is used right away by an enclosing function. Generator expressions are more compact but less versatile than full generator definitions and tend to be more memory friendly than equivalent list comprehensions.

Examples:

>>> sum(i*i for i in range(10))                 # sum of squares
285

>>> xvec = [10, 20, 30]
>>> yvec = [7, 5, 3]
>>> sum(x*y for x,y in zip(xvec, yvec))         # dot product
260

>>> from math import pi, sin
>>> sine_table = dict((x, sin(x*pi/180)) for x in range(0, 91))

>>> unique_words = set(word for line in page for word in line.split())

>>> valedictorian = max((student.gpa, student.name) for student in graduates)

>>> data = 'golf'
>>> list(data[i] for i in range(len(data)-1, -1, -1))
['f', 'l', 'o', 'g']
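Two details of the syntax are sketched below. When a generator expression is the only argument to a function call, the call's parentheses are enough (as in the examples above), but with more than one argument it needs parentheses of its own (sum(i for i in range(5), 10) is a SyntaxError). And unlike a list comprehension, it computes nothing until it is iterated. The helper square is hypothetical and only records when it is called.

```python
import types

# Sole argument: the call's parentheses are enough.
total = sum(i for i in range(5))                # 0 + 1 + 2 + 3 + 4 = 10

# Extra arguments: the generator expression needs its own parentheses.
total_from_10 = sum((i for i in range(5)), 10)  # start from 10, giving 20

# A generator expression is lazy and produces a generator object.
seen = []

def square(x):
    seen.append(x)  # record each call
    return x * x

squares = (square(i) for i in range(3))
print(isinstance(squares, types.GeneratorType))  # True
print(seen)           # []: nothing computed yet
print(list(squares))  # [0, 1, 4]
print(seen)           # [0, 1, 2]
```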

In [36]:
sum(i*i for i in range(10))   

285

In [37]:
#dot product
avec=[3,4,5]
bvec=[4,5,6]
sum(a*b for a,b in zip(avec,bvec))

62

In [39]:

from math import pi, sin
sine_table = dict((x, sin(x*pi/180)) for x in range(0, 91))


In [40]:
sine_table

{0: 0.0,
 1: 0.01745240643728351,
 2: 0.03489949670250097,
 3: 0.05233595624294383,
 4: 0.0697564737441253,
 5: 0.08715574274765817,
 ...
 30: 0.49999999999999994,
 ...}

In [42]:
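# `page` is a placeholder from the tutorial and is not defined in this notebook, hence the NameError
unique_words = set(word for line in page for word in line.split())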

NameError: name 'page' is not defined

In [43]:
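# `graduates` is a placeholder from the tutorial and is not defined in this notebook, hence the NameError
valedictorian = max((student.gpa, student.name) for student in graduates)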

NameError: name 'graduates' is not defined
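Both of these cells fail only because page and graduates are placeholders. Here is a runnable sketch with made-up sample data: page is assumed to be a list of text lines, and each graduate is represented by a hypothetical Student record built with collections.namedtuple, with name and gpa fields.

```python
from collections import namedtuple

# Hypothetical sample data standing in for the undefined names.
page = ["the quick brown fox", "jumps over the lazy dog"]
Student = namedtuple("Student", ["name", "gpa"])
graduates = [Student("Asha", 3.7), Student("Ben", 3.9), Student("Chen", 3.8)]

unique_words = set(word for line in page for word in line.split())
print(sorted(unique_words))
# ['brown', 'dog', 'fox', 'jumps', 'lazy', 'over', 'quick', 'the']

# Tuples compare element by element, so the highest gpa wins
# (a tie on gpa would be broken by comparing names).
valedictorian = max((student.gpa, student.name) for student in graduates)
print(valedictorian)  # (3.9, 'Ben')
```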

In [45]:
data = 'golf'
list(data[i] for i in range(len(data)-1, -1, -1))


['f', 'l', 'o', 'g']