
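A `generator` has some advantages over a list. Because it produces values one at a time instead of building the whole result in memory, it uses less memory and can start returning results sooner.

Here is a typical Python function that returns a list: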

In [0]:
def square_numbers(nums):
    result = []
    for i in nums:
        result.append(i*i)
    return result

In [2]:
my_nums = square_numbers([1,2,3,4,5])
print(my_nums)

[1, 4, 9, 16, 25]


Now, change the Python function into a generator by using `yield` instead of building a list and using `return`:

In [0]:
def square_numbers(nums):
    for i in nums:
        yield (i*i)

In [4]:
my_nums = square_numbers([1,2,3,4,5])
print(my_nums)

<generator object square_numbers at 0x7fdac9c54fc0>


The `my_nums` variable has become a generator object.

A generator does not hold the entire result in memory. It yields one value at a time and pauses after each `yield`, waiting for the next call to `next()` before producing the next value.

In [5]:
print(next(my_nums)) # yield 1
print(next(my_nums)) # yield 4
print(next(my_nums)) # yield 9
print(next(my_nums)) # yield 16
print(next(my_nums)) # yield 25

1
4
9
16
25
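
To make the pausing behaviour visible, here is a minimal sketch that records when the generator body actually runs. The `traced_squares` function and the `events` list are hypothetical names introduced only for this illustration; they are not part of the original notebook.

```python
events = []  # hypothetical log of when the generator body executes

def traced_squares(nums):
    # Illustrative variant of square_numbers that records each computation
    for i in nums:
        events.append(f"computing {i}*{i}")
        yield i * i

gen = traced_squares([1, 2, 3])
events_after_creation = list(events)  # the body has not started running yet

first = next(gen)
events_after_first = list(events)     # only the first value has been computed

print(events_after_creation)  # []
print(first)                  # 1
print(events_after_first)     # ['computing 1*1']
```

Calling the generator function only creates the generator object; each call to `next()` runs the body up to the next `yield` and then pauses again.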


The generator has now yielded all of its values. Calling `next()` again raises `StopIteration` because there is nothing left to yield.

In [6]:
try:
    print(next(my_nums)) # no values left, raises StopIteration
except StopIteration:    # catch only the expected exception
    print("StopIteration: out of value")

StopIteration: out of value
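
As an alternative to catching the exception, the built-in `next()` accepts a second argument that is returned when the generator is exhausted. A small sketch, with `None` chosen here as an arbitrary default:

```python
def square_numbers(nums):
    for i in nums:
        yield i * i

my_nums = square_numbers([1, 2])

first = next(my_nums, None)   # 1
second = next(my_nums, None)  # 4
third = next(my_nums, None)   # generator is exhausted, so the default is returned

print(first, second, third)   # 1 4 None
```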


Instead of getting the values one at a time with `next()`, you can iterate over the generator with a `for` loop, which stops automatically when the generator is exhausted.

In [7]:
# recreate the generator since it was exhausted in the previous steps
my_nums = square_numbers([1,2,3,4,5])

for num in my_nums:
    print(num)

1
4
9
16
25
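
A generator can only be consumed once, which is why it had to be recreated before the loop. The sketch below illustrates this by looping over the same generator twice; the names `first_pass` and `second_pass` are introduced only for this example.

```python
def square_numbers(nums):
    for i in nums:
        yield i * i

my_nums = square_numbers([1, 2, 3])

first_pass = []
for num in my_nums:
    first_pass.append(num)

second_pass = []
for num in my_nums:          # the generator is already exhausted
    second_pass.append(num)  # so this body never runs

print(first_pass)   # [1, 4, 9]
print(second_pass)  # []
```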


The list version of `square_numbers(nums)` can also be written as a list comprehension.

In [8]:
my_nums = [i*i for i in [1,2,3,4,5]]
print(f'This is a list: {my_nums}')

for num in my_nums:
    print(num)

This is a list: [1, 4, 9, 16, 25]
1
4
9
16
25


A generator can be created in the same way by using `()` instead of `[]`. This form is called a generator expression.

In [9]:
my_nums = (i*i for i in [1,2,3,4,5])
print(f'This is a generator object: {my_nums}')

for num in my_nums:
    print(num)

This is a generator object: <generator object <genexpr> at 0x7fdac9c10e60>
1
4
9
16
25
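
A generator expression can also be passed directly to a function that consumes an iterable, such as the built-ins `sum()` and `max()`, so no intermediate list is built. A brief sketch:

```python
# When the generator expression is the only argument, the extra parentheses can be omitted
total = sum(i * i for i in [1, 2, 3, 4, 5])
largest = max(i * i for i in [1, 2, 3, 4, 5])

print(total)    # 55
print(largest)  # 25
```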


Because a generator does not hold all of its values in memory, printing it shows only the generator object. To see all the values at once, convert the `generator` to a `list` with `list()`.

However, converting a `generator` to a `list` gives up the memory benefit, because every value is then stored at the same time.

A generator performs better on a huge dataset precisely because it does not hold all the values in memory.
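
As a rough illustration of the difference, the sketch below compares the size of a list and of an equivalent generator expression using `sys.getsizeof`. The exact byte counts depend on the Python version and platform, so none are shown here, and `n = 100_000` is an arbitrary size picked for the example.

```python
import sys

n = 100_000  # arbitrary size chosen for this illustration

squares_list = [i * i for i in range(n)]
squares_gen = (i * i for i in range(n))

# getsizeof reports only the container itself (for the list: its array of
# references), not the integers it refers to, so the real gap is even larger
list_size = sys.getsizeof(squares_list)
gen_size = sys.getsizeof(squares_gen)

print(f'list: {list_size} bytes, generator: {gen_size} bytes')

# Converting the generator stores every value at once, just like the list
materialized = list(squares_gen)
print(len(materialized) == n)  # True
```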

In [10]:
# install the memory_profiler extension in Colab
%pip install memory_profiler



In [0]:
# load the extension
%load_ext memory_profiler
import random
import time

names = ['John', 'Bing', 'Adam', 'Steve', 'Rick', 'Thomas']
majors = ['Math', 'Engineering', 'CompSci', 'Arts', 'Business']

In [0]:
def people_list(num_people):
    result = []
    for i in range(num_people):
        person = {
            'id':i,
            'name': random.choice(names),
            'major': random.choice(majors)
        }
        result.append(person)
    return result

def people_generator(num_people):
    for i in range(num_people): # in Python 3, range only stores its parameters and generates the numbers on demand (it replaced Python 2's xrange)
        person = {
            'id':i,
            'name': random.choice(names),
            'major': random.choice(majors)
        }
        yield person

In [13]:
%%memit
t1 = time.perf_counter()   # time the start point (time.clock() was removed in Python 3.8)
people = people_list(10000000)
t2 = time.perf_counter()   # time the end point
print(f'Took {t2-t1} Seconds')

Took 16.511546 Seconds
peak memory: 2972.76 MiB, increment: 2820.78 MiB


In [14]:
%%memit
t1 = time.perf_counter()
people = people_generator(10000000)   # only creates the generator; no person is built yet
t2 = time.perf_counter()
print(f'Took {t2-t1} Seconds')

Took 2.7201030000000017 Seconds
peak memory: 2972.83 MiB, increment: 0.00 MiB


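Comparing the two runs, the list version builds all 10,000,000 dictionaries up front, which costs both time and roughly 2.8 GiB of additional memory. Calling the `generator` function adds no measurable memory (increment: 0.00 MiB), because no person is created until the generator is iterated. Note that the second cell only measures creating the generator, not consuming it: the work is deferred rather than eliminated, but memory use stays low during iteration because only one person exists at a time.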
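
Since the generator cell above only creates the generator without consuming it, a fairer memory comparison iterates over both results. The following is a minimal sketch using the standard library's `tracemalloc` module instead of `memory_profiler`; the helper `peak_bytes` and the smaller size of 50,000 people are assumptions made for this example, not part of the original benchmark.

```python
import random
import tracemalloc

names = ['John', 'Bing', 'Adam', 'Steve', 'Rick', 'Thomas']
majors = ['Math', 'Engineering', 'CompSci', 'Arts', 'Business']

def people_list(num_people):
    result = []
    for i in range(num_people):
        result.append({'id': i,
                       'name': random.choice(names),
                       'major': random.choice(majors)})
    return result

def people_generator(num_people):
    for i in range(num_people):
        yield {'id': i,
               'name': random.choice(names),
               'major': random.choice(majors)}

def peak_bytes(make_people, num_people):
    # Hypothetical helper: iterate over every person and report the peak
    # number of bytes traced by tracemalloc while doing so
    tracemalloc.start()
    count = 0
    for person in make_people(num_people):
        count += 1
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return count, peak

list_count, list_peak = peak_bytes(people_list, 50_000)
gen_count, gen_peak = peak_bytes(people_generator, 50_000)

print(f'list:      {list_count} people, peak {list_peak} bytes')
print(f'generator: {gen_count} people, peak {gen_peak} bytes')
```

Both versions visit the same number of people, but the generator's peak stays small because each dictionary can be freed before the next one is created.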

In the future, when memory use matters and the values only need to be iterated once, prefer a `generator expression` over a `list comprehension`.