# Agenda

1. Review the iterator protocol
2. Generator functions
    - How to define them
    - How they're different from regular functions
    - Keeping state across invocations
    - How do they work?
3. Generator expressions (aka generator comprehensions)
    - How to define them
    - How to use them

In [1]:
# Lots of objects in Python are iterable

for one_item in 'abcde':
    print(one_item)

a
b
c
d
e


In [2]:
for one_item in [10, 20, 30, 40, 50]:
    print(one_item)

10
20
30
40
50


In [3]:
d = {'a':1, 'b':2, 'c':3}

for one_item in d:
    print(one_item)

a
b
c


In [5]:
# d.items returns an object of type "dict_items"
# it is iterable, also!
# it returns a (key, value) tuple with each iteration

for key, value in d.items():
    print(f'{key}: {value}')

a: 1
b: 2
c: 3


In [6]:
# let's ask d if it is iterable!

i = iter(d)   # you rarely need to call "iter" yourself; a "for" loop does it for you

In [7]:
i

<dict_keyiterator at 0x10955c6d0>

In [8]:
next(i)

'a'

In [9]:
next(i)

'b'

In [10]:
next(i)

'c'

In [11]:
next(i)

StopIteration: 
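
A `for` loop performs these same steps for us. As a rough sketch (the helper name `manual_for` is made up for this illustration, and a real `for` loop does this in the interpreter rather than in Python code), the loop calls `iter` once and then calls `next` repeatedly until `StopIteration` is raised:

```python
def manual_for(iterable):
    """Collect items roughly the way a for loop visits them."""
    results = []
    iterator = iter(iterable)      # ask the object for an iterator
    while True:
        try:
            item = next(iterator)  # request the next value
        except StopIteration:      # the iterator says it has nothing left
            break
        results.append(item)
    return results


d = {'a': 1, 'b': 2, 'c': 3}
print(manual_for(d))         # ['a', 'b', 'c']
print(manual_for('abcde'))   # ['a', 'b', 'c', 'd', 'e']
```

The `StopIteration` exception is the normal "I'm done" signal, which is why a `for` loop never shows it to you.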

In [12]:
def myfunc():
    return 1
    return 2  # never reached: the first "return" exits the function
    return 3  # never reached


In [13]:
myfunc()

1

In [14]:
import dis  # disassemble our Python code

dis.dis(myfunc)

  2           0 LOAD_CONST               1 (1)
              2 RETURN_VALUE


In [17]:
# here, I define a generator function!
# Python knows it's a generator function because it uses "yield"
# the result of invoking a generator function is a generator object
# generators are iterable -- they know how to behave inside of a "for" loop

def myfunc():
    yield 1
    yield 2
    yield 3

In [18]:
myfunc()

<generator object myfunc at 0x1095f09e0>

In [19]:
myfunc()

<generator object myfunc at 0x1095f0ba0>

In [20]:
myfunc()

<generator object myfunc at 0x1095f0cf0>

In [21]:
g = myfunc()

next(g)  # g, our generator, is an iterator, so it knows how to respond to "next"

1

In [22]:
next(g)

2

In [23]:
next(g)

3

In [24]:
next(g)

StopIteration: 

# What's happening?

Running `next` on a generator object executes the generator's function body through the next `yield`.  You get the value back, and then the generator function goes to sleep just after the `yield`, waking up when you next call `next` on it.

In [25]:
def myfunc():
    print('At start')
    yield 1
    print('In the middle')
    yield 2
    print('Almost done!')
    yield 3
    print('Now I am really done')

In [26]:
g = myfunc()

In [27]:
next(g)

At start


1

In [28]:
next(g)

In the middle


2

In [30]:
next(g)

Almost done!


3

In [31]:
next(g)

Now I am really done


StopIteration: 
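
If you want to see which of these phases a generator is in, the standard library's `inspect.getgeneratorstate` reports it. A minimal sketch, using a smaller generator function (`two_values` and `g2` are names chosen just for this example):

```python
import inspect


def two_values():
    yield 1
    yield 2


g2 = two_values()
states = [inspect.getgeneratorstate(g2)]      # body has not started running

next(g2)
states.append(inspect.getgeneratorstate(g2))  # asleep just after the first yield

for _ in g2:                                  # run the body to its end
    pass
states.append(inspect.getgeneratorstate(g2))  # body has finished

print(states)   # ['GEN_CREATED', 'GEN_SUSPENDED', 'GEN_CLOSED']
```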

In [32]:
def double_numbers(numbers):
    for one_number in numbers:
        yield one_number * 2

In [33]:
double_numbers([10, 20, 30])

<generator object double_numbers at 0x1095ff900>

In [34]:
for one_item in double_numbers([10, 20, 30]):
    print(one_item)

20
40
60


In [37]:
list(double_numbers([10, 20, 30]))

[20, 40, 60]
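
One thing to keep in mind: a generator object can be consumed only once. A short sketch, repeating the `double_numbers` definition so that it runs on its own (the variable names are just for illustration):

```python
def double_numbers(numbers):
    for one_number in numbers:
        yield one_number * 2


doubled = double_numbers([10, 20, 30])
first_pass = list(doubled)    # consumes every value
second_pass = list(doubled)   # the generator is already exhausted

print(first_pass)    # [20, 40, 60]
print(second_pass)   # []
```

To iterate again, call the generator function again and get a fresh generator object.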

In [38]:
g

<generator object myfunc at 0x1095ff2e0>

In [39]:
type(g)

generator

In [40]:
# can I create a new instance of generator? ... turns out, I can't.
type(g)()

TypeError: cannot create 'generator' instances

# Exercise: Only evens

Write a generator function that takes a list (or any other iterable) of integers as an argument. It should return, with each iteration, the next *EVEN* number in that list of integers.  When we get to the end of the input list, then the generator ends.

In [42]:
def only_evens(numbers):
    for one_number in numbers:
        if one_number % 2 == 0:
            yield one_number
        
        
for one_item in only_evens(range(5, 13)):
    print(one_item)

6
8
10
12


In [46]:
from typing import Iterable, Generator

def only_evens(numbers:Iterable[int]) -> Generator[int, None, None]:
    for one_number in numbers:
        if one_number % 2 == 0:
            yield one_number
        
        
for one_item in only_evens(range(5, 13)):
    print(one_item)

6
8
10
12


In [47]:
def fib():
    first = 0
    second = 1
    
    while True:
        yield first  
        first, second = second, first+second

In [49]:
for one_item in fib():
    if one_item > 100_000_000_000:
        break
        
    print(one_item, end=' ')

0 1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987 1597 2584 4181 6765 10946 17711 28657 46368 75025 121393 196418 317811 514229 832040 1346269 2178309 3524578 5702887 9227465 14930352 24157817 39088169 63245986 102334155 165580141 267914296 433494437 701408733 1134903170 1836311903 2971215073 4807526976 7778742049 12586269025 20365011074 32951280099 53316291173 86267571272 
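
Instead of writing the `break` by hand, you can let `itertools` cut off an infinite generator. This is just one possible approach; the sketch repeats `fib` so that it is self-contained, and the limits (10 items, values below 100) are arbitrary:

```python
from itertools import islice, takewhile


def fib():
    first, second = 0, 1
    while True:
        yield first
        first, second = second, first + second


# islice stops after a fixed number of items
first_ten = list(islice(fib(), 10))

# takewhile stops at the first item that fails the test
under_100 = list(takewhile(lambda n: n < 100, fib()))

print(first_ten)   # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
print(under_100)   # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
```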

# Exercise: read_n

Define a generator function, `read_n`, which takes two arguments:

- `filename` (a string, with a filename)
- `n` (an integer)
    
Normally, when we iterate over a file, we get one line at a time. `read_n` should return `n` lines at a time, as a single string.

If you get to the end of the file and there aren't enough lines to complete `n`, then just return (or should I say `yield`?) what you have.

Hint: The `readline` method for files always returns a string with the next line. If you're already at the end of the file, it returns an empty string.

In [57]:
def read_n(filename, n):
    f = open(filename)
    
    while True:
        output = []
        for i in range(n):
            output.append(f.readline())
            
        s = ''.join(output)
            
        if s:  # if we have a non-empty string s, then yield it
            yield s

        else:  # if s is empty, then we don't have any more lines to return
            break

for one_chunk in read_n('/etc/passwd', 9):
    print(one_chunk)       

##
# User Database
# 
# Note that this file is consulted directly only when the system is running
# in single-user mode.  At other times this information is provided by
# Open Directory.
#
# See the opendirectoryd(8) man page for additional information about
# Open Directory.

##
nobody:*:-2:-2:Unprivileged User:/var/empty:/usr/bin/false
root:*:0:0:System Administrator:/var/root:/bin/sh
daemon:*:1:1:System Services:/var/root:/usr/bin/false
_uucp:*:4:4:Unix to Unix Copy Protocol:/var/spool/uucp:/usr/sbin/uucico
_taskgated:*:13:13:Task Gate Daemon:/var/empty:/usr/bin/false
_networkd:*:24:24:Network Services:/var/networkd:/usr/bin/false
_installassistant:*:25:25:Install Assistant:/var/empty:/usr/bin/false
_lp:*:26:26:Printing Services:/var/spool/cups:/usr/bin/false

_postfix:*:27:27:Postfix Mail Server:/var/spool/postfix:/usr/bin/false
_scsd:*:31:31:Service Configuration Service:/var/empty:/usr/bin/false
_ces:*:32:32:Certificate Enrollment Service:/var/empty:/usr/bin/false
_appstore:*:33:
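
For comparison, here is an alternative sketch of the same idea, not a replacement for the solution shown. It assumes `itertools.islice` is an acceptable way to grab up to `n` lines, and it opens the file in a `with` block so the file is closed when the generator finishes. The name `read_n_islice` and the temporary demo file are made up for this example:

```python
import os
import tempfile
from itertools import islice


def read_n_islice(filename, n):
    with open(filename) as f:              # closed when the generator finishes
        while True:
            chunk = ''.join(islice(f, n))  # take up to n lines from the file
            if not chunk:                  # empty string: no lines were left
                break
            yield chunk


# demo on a small temporary file with five lines
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('one\ntwo\nthree\nfour\nfive\n')

chunks = list(read_n_islice(tmp.name, 2))
os.remove(tmp.name)

print(chunks)   # ['one\ntwo\n', 'three\nfour\n', 'five\n']
```

The last chunk holds only one line, because the file ran out before a second line could be read.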

In [58]:
g = read_n('/etc/passwd', 9)

In [59]:
next(g)

'##\n# User Database\n# \n# Note that this file is consulted directly only when the system is running\n# in single-user mode.  At other times this information is provided by\n# Open Directory.\n#\n# See the opendirectoryd(8) man page for additional information about\n# Open Directory.\n'

In [60]:
g

<generator object read_n at 0x1095f0f20>

In [61]:
dir(g)

['__class__',
 '__del__',
 '__delattr__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__lt__',
 '__name__',
 '__ne__',
 '__new__',
 '__next__',
 '__qualname__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'close',
 'gi_code',
 'gi_frame',
 'gi_running',
 'gi_yieldfrom',
 'send',
 'throw']

In [62]:
g.gi_code

<code object read_n at 0x1095efbe0, file "<ipython-input-57-b39c6aa64f9d>", line 1>

In [63]:
g.gi_code.co_varnames

('filename', 'n', 'f', 'output', 'i', 's')

In [64]:
g.gi_code.co_code

b't\x00|\x00\x83\x01}\x02g\x00}\x03t\x01|\x01\x83\x01D\x00]\x12}\x04|\x03\xa0\x02|\x02\xa0\x03\xa1\x00\xa1\x01\x01\x00q\x14d\x01\xa0\x04|\x03\xa1\x01}\x05|\x05rB|\x05V\x00\x01\x00q\x08qBq\x08d\x00S\x00'

In [65]:
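# co_code is raw bytes, so dis shows only numeric indexes for names;
# dis.dis(g.gi_code) would show the names as well
dis.dis(g.gi_code.co_code)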

          0 LOAD_GLOBAL              0 (0)
          2 LOAD_FAST                0 (0)
          4 CALL_FUNCTION            1
          6 STORE_FAST               2 (2)
    >>    8 BUILD_LIST               0
         10 STORE_FAST               3 (3)
         12 LOAD_GLOBAL              1 (1)
         14 LOAD_FAST                1 (1)
         16 CALL_FUNCTION            1
         18 GET_ITER
    >>   20 FOR_ITER                18 (to 40)
         22 STORE_FAST               4 (4)
         24 LOAD_FAST                3 (3)
         26 LOAD_METHOD              2 (2)
         28 LOAD_FAST                2 (2)
         30 LOAD_METHOD              3 (3)
         32 CALL_METHOD              0
         34 CALL_METHOD              1
         36 POP_TOP
         38 JUMP_ABSOLUTE           20
    >>   40 LOAD_CONST               1 (1)
         42 LOAD_METHOD              4 (4)
         44 LOAD_FAST                3 (3)
         46 CALL_METHOD              1
         48 STORE_FAST               

In [66]:
dir(g.gi_frame)

['__class__',
 '__delattr__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__le__',
 '__lt__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'clear',
 'f_back',
 'f_builtins',
 'f_code',
 'f_globals',
 'f_lasti',
 'f_lineno',
 'f_locals',
 'f_trace',
 'f_trace_lines',
 'f_trace_opcodes']

In [67]:
g.gi_frame.f_locals

{'filename': '/etc/passwd',
 'n': 9,
 'f': <_io.TextIOWrapper name='/etc/passwd' mode='r' encoding='UTF-8'>,
 'output': ['##\n',
  '# User Database\n',
  '# \n',
  '# Note that this file is consulted directly only when the system is running\n',
  '# in single-user mode.  At other times this information is provided by\n',
  '# Open Directory.\n',
  '#\n',
  '# See the opendirectoryd(8) man page for additional information about\n',
  '# Open Directory.\n'],
 'i': 8,
 's': '##\n# User Database\n# \n# Note that this file is consulted directly only when the system is running\n# in single-user mode.  At other times this information is provided by\n# Open Directory.\n#\n# See the opendirectoryd(8) man page for additional information about\n# Open Directory.\n'}

In [69]:
g.gi_frame.f_lineno

12
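
The frame is where the generator keeps its state between calls to `next`. A smaller sketch makes that easier to see (the `countdown` generator and the variable names are invented for this example):

```python
def countdown(start):
    current = start
    while current > 0:
        yield current
        current -= 1


c = countdown(3)

next(c)
snapshot_1 = dict(c.gi_frame.f_locals)   # locals while paused at the first yield

next(c)
snapshot_2 = dict(c.gi_frame.f_locals)   # same frame, "current" has changed

for _ in c:                              # run the generator to the end
    pass
frame_after = c.gi_frame                 # the frame is released once the body finishes

print(snapshot_1)    # {'start': 3, 'current': 3}
print(snapshot_2)    # {'start': 3, 'current': 2}
print(frame_after)   # None
```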

In [70]:
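# open the same file twice for writing; each file object has its own
# buffer and its own position, and both start at position 0
f1 = open('mydata.txt', 'w')
f2 = open('mydata.txt', 'w')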

In [71]:
f1.write('aaaaaa\n')
f2.write('bbbbb\n')

6

In [72]:
f1.close()
f2.close()

In [73]:
!cat mydata.txt

bbbbb

