# Practicing Python Generators from [Python generators and being lazy](http://naiquevin.github.io/python-generators-and-being-lazy.html)


## A simple Example

In [6]:
def gen():
    for i in range(1, 6):
        yield i

print gen()

<generator object gen at 0x1103a4b90>


In [5]:
g = gen()
type(g)
# gen() returns a generator object even though the function has no return statement
# Any function containing `yield` automatically returns a generator object when called

generator

Calling the function does not execute any of the code inside it yet.
For that, we need to call the generator object's `next` method.

In [6]:
g.next()

1

In [7]:
print 'hello, taking break from generator'

hello, taking break from generator


In [8]:
g.next()

2

In [9]:
g.next()

3

- On the first call to `next`, the function body runs until it reaches a `yield` statement, and the yielded value is returned
- At the same time, control is handed back to the calling code
- On the next call to `next`, control goes back into the function, which resumes execution where it left off, with full access to its local variables

## [Iterator Protocol](http://docs.python.org/2/library/stdtypes.html?highlight=iterator#iterator-types) and Generator expressions

### Iterator Protocol basically means:
- The object implements the `next` (`__next__` in Python 3) and `__iter__` methods
- It raises a `StopIteration` exception when no more values can be yielded
- Because generators follow this protocol, we can use a `for` loop to get values from a generator instead of calling the `next` method manually
- The `for` loop implicitly handles `StopIteration` and ends the loop when it is raised
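The protocol can also be spelled out by hand. This sketch (the class `CountUpTo` and the helper `manual_for` are hypothetical names) implements the two methods on a plain class that yields the same values as `gen()`, and then approximates what a `for` loop does with them.

```python
class CountUpTo(object):
    """Hypothetical iterator class yielding 1..limit, like gen() above."""

    def __init__(self, limit):
        self.current = 0
        self.limit = limit

    def __iter__(self):
        # An iterator returns itself from __iter__
        return self

    def __next__(self):
        if self.current >= self.limit:
            # Signal that there are no more values
            raise StopIteration
        self.current += 1
        return self.current

    # Python 2 spells the method `next`; alias it so both versions work
    next = __next__


def manual_for(iterable):
    """Roughly what `for x in iterable:` does behind the scenes."""
    collected = []
    it = iter(iterable)        # calls __iter__
    while True:
        try:
            x = next(it)       # calls next / __next__
        except StopIteration:  # the for loop swallows this and stops
            break
        collected.append(x)
    return collected

from_manual = manual_for(CountUpTo(5))
from_loop = [x for x in CountUpTo(5)]
```

Both `from_manual` and `from_loop` end up as `[1, 2, 3, 4, 5]`. A generator function gives you all of this machinery for free.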


In [10]:
for i in g:
    print i

4
5


### Generator Expressions (the generator counterpart of list comprehensions)
- The syntax is the same except for one change: round brackets `()` instead of square brackets `[]`
- The result is an iterator (a generator object) instead of a list that holds all its elements in memory
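A generator expression is shorthand for a simple generator function. The sketch below compares the two (the function name `squares_func` is illustrative) and also shows that when a generator expression is the only argument to a call, its own parentheses can be dropped.

```python
# A generator expression...
squares_expr = (i * i for i in range(1, 11))

# ...produces the same values as this generator function
def squares_func(n):
    for i in range(1, n + 1):
        yield i * i

same = list(squares_expr) == list(squares_func(10))

# As the sole argument of a call, the extra parentheses are optional
total = sum(i * i for i in range(1, 11))
```

Here `same` is `True` and `total` is `385`; `sum` pulls values one at a time and never needs the full list.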

In [11]:
squares = [i*i for i in range(1, 11)] # list

In [12]:
type(squares)

list

In [13]:
gen_squares = (i*i for i in range(1, 11)) # generator object

In [14]:
type(gen_squares)

generator

In [15]:
iter(gen_squares) is gen_squares

True
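For contrast, a list is iterable but is not itself an iterator. A quick check with the same values as above:

```python
squares = [i * i for i in range(1, 11)]
gen_squares = (i * i for i in range(1, 11))

# iter() on a list builds a separate list iterator;
# a generator is already an iterator, so iter() returns it unchanged
list_is_own_iter = iter(squares) is squares
gen_is_own_iter = iter(gen_squares) is gen_squares

# Calling next() directly on a list fails: a list is not an iterator
try:
    next(squares)
    list_has_next = True
except TypeError:
    list_has_next = False
```

`list_is_own_iter` is `False`, `gen_is_own_iter` is `True`, and `next(squares)` raises `TypeError`. This is also why a list can be looped over many times while a generator cannot.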

## Why generators?

Key differences:
- A generator produces new values on the fly
- It doesn't keep the elements in memory
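One rough way to see the memory difference is `sys.getsizeof`, which reports the size of the container object itself (for a list, its internal array of references, not the int objects). A minimal sketch, written for Python 3; in Python 2 `range` builds a full list, so `xrange` would be needed to keep the generator's source lazy as well.

```python
import sys

n = 10 ** 6
as_list = [i * i for i in range(n)]
as_gen = (i * i for i in range(n))

# The list object grows with n (megabytes here);
# the generator object stays small because it stores no elements
list_size = sys.getsizeof(as_list)
gen_size = sys.getsizeof(as_gen)
```

Exact byte counts vary between Python versions and platforms, but the list is several orders of magnitude larger than the generator object.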

A function that yields incrementing values forever:

In [17]:
def infinitely_incr(start=0):
    n = start
    while True:
        n += 1
        yield n

In [18]:
infi_incr = infinitely_incr()

In [19]:
infi_incr.next()

1

In [20]:
infi_incr.next()

2

In [21]:
infi_incr.next()

3

In [22]:
infi_incr.next()

4

In [23]:
infi_incr.next()

5

- We can call `infi_incr.next()` as many times as we want and get the next number each time, without ever building a list in memory.
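Calling `list()` on an infinite generator would never return. A safe way to take just a few values is `itertools.islice`, and `itertools.count` is the standard library's ready-made infinite counter. A sketch (with `infinitely_incr` redefined so the block stands alone):

```python
from itertools import count, islice

def infinitely_incr(start=0):
    n = start
    while True:
        n += 1
        yield n

# islice lazily takes only the first 5 values, so the infinite loop is never run to completion
first_five = list(islice(infinitely_incr(), 5))
from_ten = list(islice(infinitely_incr(10), 3))

# count(1) yields 1, 2, 3, ... just like infinitely_incr()
same_as_count = list(islice(count(1), 5))
```

`first_five` and `same_as_count` are both `[1, 2, 3, 4, 5]`, and `from_ten` is `[11, 12, 13]`.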

Another example: what if we have a huge amount of data in a file and need to process each of its lines by calling one or more functions on them?

In [24]:
# Creating a temp file to use in gen1() function below
import tempfile

temp_file_obj = tempfile.NamedTemporaryFile(delete=False)
# Write one number per line, so that iterating over the file yields 5 lines
for n in range(1, 6):
    temp_file_obj.write("%d\n" % n)

temp_file_obj.close() # the file is not deleted on close because we
                      # used delete=False

def gen1():
    # Yield the file's lines one at a time; the file stays open
    # until the generator is exhausted
    with open(temp_file_obj.name) as f:
        for line in f:
            yield line

# `process` stands for whatever function must be applied to each line
# g = gen1()
# g2 = (process(x) for x in g)

# for x in g2:
#     print x




- In Python, a file object can be iterated over to obtain one line at a time.
- In the example above, since the `process` function is called inside a generator expression, it is not executed until the `for` loop starts consuming the generator.
- That is when the `process` function executes, once for each value.
- This way, the cost of loading a huge file into memory is avoided.
- However, this also means that the file stays open until all lines have been processed.
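A runnable version of the commented-out pipeline, assuming `process` parses each line as an integer and squares it. `process` and `read_lines` are hypothetical stand-ins, and the sample file is written with one number per line.

```python
import os
import tempfile

# Hypothetical sample data: one number per line
fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'w') as f:
    f.write('1\n2\n3\n4\n5\n')

def read_lines(path):
    # The file is opened on the first next() call and stays open
    # until the generator is exhausted
    with open(path) as f:
        for line in f:
            yield line

def process(line):
    # Hypothetical per-line work: int() ignores the trailing newline
    return int(line) ** 2

results = []
try:
    lines = read_lines(path)                 # nothing is read yet
    processed = (process(x) for x in lines)  # process() is not called yet
    for value in processed:                  # lines are read and processed one at a time
        results.append(value)
finally:
    os.remove(path)
```

At no point does the pipeline hold more than one line in memory; `results` ends up as `[1, 4, 9, 16, 25]` only because this example chooses to collect the output.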

Also:
- Not keeping elements in memory means that a generator object can be looped through, or consumed, only once.
- Hence it is obviously not a good choice if the sequence of items needs to be reused. In that case, a normal list is more suitable.

In [7]:
g = gen()
squares = (i*i for i in g)
list(squares)

[1, 4, 9, 16, 25]

In [26]:
cubes = (i*i*i for i in g)
list(cubes)

[]
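When the values are needed more than once, two simple options are to keep them in a list, or to call the generator function again to get a fresh generator object. A sketch reusing `gen()` from the first example:

```python
def gen():
    for i in range(1, 6):
        yield i

# Option 1: materialise the values once into a list, then reuse it freely
numbers = list(gen())
squares = [i * i for i in numbers]
cubes = [i * i * i for i in numbers]

# Option 2: call the generator function again for each pass
squares_again = list(i * i for i in gen())
cubes_again = list(i * i * i for i in gen())
```

Both approaches give `[1, 4, 9, 16, 25]` for the squares and `[1, 8, 27, 64, 125]` for the cubes, unlike the exhausted `g` above.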

- But if you have a series of functions that need to be executed one after another on each line of a file, the laziness of generator expressions can be tremendously useful

## Understanding the 'lazy' using a ~~concrete~~ contrived example

- Imagine our hugedata.txt is actually a tiny file of just 5 lines containing the first 5 positive integers
- Here's an example that uses list comprehensions and hence will build and keep lists in memory

In [25]:
def digit_sum(x):
    print 'Digit sum of {} ->'.format(x),
    return sum(map(int, str(x)))

def square(x):
    print 'Square of {} ->'.format(x),
    return int(x)*int(x)

numbers = gen()
squares = [square(n) for n in numbers]
dsums = [digit_sum(n) for n in squares]

for n in dsums:
    print n

Square of 1 -> Square of 2 -> Square of 3 -> Square of 4 -> Square of 5 -> Digit sum of 1 -> Digit sum of 4 -> Digit sum of 9 -> Digit sum of 16 -> Digit sum of 25 -> 1
4
9
7
7


First all the squares are calculated, then their digit sums, and only then are the results printed one by one.

Now let's see what we get with generator expressions.

See the example under the section "Now with generator expressions just see what we get" in the [original post](http://naiquevin.github.io/python-generators-and-being-lazy.html).

- It's called lazy because the numbers are consumed late, only at the time of iteration.
- The implicit call to `next` by the `for` loop asks `dsums` for the `digit_sum` of `1`, which asks `squares` for the `square` of `1`, which in turn asks `numbers` for `1`.
- This continues for as long as `numbers` can yield a value.
- In short, nothing is evaluated until it is asked for.
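The chain described above can be reproduced with the `square` and `digit_sum` functions from the list-comprehension example. In this sketch, calls are recorded in a `log` list instead of being printed (a change made here only so the order is easy to check):

```python
log = []

def digit_sum(x):
    log.append('digit_sum(%s)' % x)
    return sum(map(int, str(x)))

def square(x):
    log.append('square(%s)' % x)
    return int(x) * int(x)

def gen():
    for i in range(1, 6):
        yield i

numbers = gen()
squares = (square(n) for n in numbers)   # generator expression: nothing runs yet
dsums = (digit_sum(n) for n in squares)  # still nothing runs
before = list(log)                       # snapshot before iterating: empty

results = []
for n in dsums:
    log.append('got %s' % n)
    results.append(n)
```

The log now reads `square(1), digit_sum(1), got 1, square(2), digit_sum(4), got 4, ...`: each number flows through the whole chain before the next one is even produced, unlike the list version, which computes all the squares first.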

### Common Traps and things to watch out for

- Rule #0: Use generators wisely. Don't use a generator expression only because the syntax is slightly different from list comprehensions.

    If a sequence needs to be reused, simply use a list. Keeping things in memory is not bad after all (we do it all the time when caching values, don't we?)
    
    
- Another important thing to watch out for is the scope of the variables used by functions that execute lazily. Here's an example...
    
    Suppose we have a generator that yields letters and we need to add two suffixes to each letter. For example, the letter `a` is first suffixed with `x`, which makes it `ax`, and then with `y`, which makes it `axy`.
    
    We need to do this for multiple letters, and we choose to use a generator object to yield each letter.

In [34]:
def add_suffix(s, suffix):
    return '{}{}'.format(s, suffix)

def gen():
    for i in ['a', 'b', 'c', 'd']:
        yield i

ns = gen()
suffixes = ['x', 'y']

for s in suffixes:
    ns = (add_suffix(i, s) for i in ns)

print list(ns)

['ayy', 'byy', 'cyy', 'dyy']


- Let's try to understand what happened here:
    
    A generator expression evaluates its outermost iterable (here `ns`) immediately, when it is created, but every other name in it, such as `s`, is looked up only when the next value is requested. A `for` loop does not create a new scope, so both generator expressions refer to the same variable `s`. By the time the generators are consumed by `list(ns)`, the loop has finished and `s` is `'y'`, so each letter gets `y` twice. The value `x` (the value of `s` in the first iteration of the `for` loop) is simply lost.
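A stripped-down sketch of the same effect (the names are illustrative):

```python
s = 'x'
letters = ['a', 'b']

# The outermost iterable (`letters`) is evaluated as soon as the
# generator expression is created...
suffixed = (c + s for c in letters)

# ...but `s` is looked up only when each value is produced
s = 'y'
letters = ['z']  # rebinding this name does not affect `suffixed`

result = list(suffixed)
```

`result` is `['ay', 'by']`: the original list was captured up front, while the suffix was read late.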

- Let's fix this by defining a generator function that wraps the call to `add_suffix`. Because the suffix is passed in as a parameter, each generator object keeps its own value of it.

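In [45]:
def add_suffixes(items, suffix):
    # `suffix` is a parameter, so each generator object keeps the value it was created with
    for x in items:
        yield add_suffix(x, suffix)

ns = gen() # start again: the previous `ns` was already consumed by list(ns)
for s in suffixes:
    ns = add_suffixes(ns, s)

In [46]:
list(ns)

['axy', 'bxy', 'cxy', 'dxy']

*Note: `ns` must be re-created with `gen()` before the loop. The generator from the previous cell was already consumed by `list(ns)`, and wrapping an exhausted generator yields nothing, so without that line `list(ns)` returns `[]`.*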