# Generators & Comprehensions - Part 3

This part covers more advanced topics in generators, comprehensions, and other iterators. It starts with some review and tips.

In [1]:
# Just mapping (no filtering).
[x**2 for x in range(10)]

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

In [2]:
# Mapping and filtering.
[x**2 for x in range(10) if x % 3 != 0]

[1, 4, 16, 25, 49, 64]

In [3]:
# Filtering. (Technically also mapping.)
[x for x in range(10) if x % 3 != 0]

[1, 2, 4, 5, 7, 8]

In [4]:
s = ['hello', 'my', 'name', 'is', 'David', 'Vassallo']

In [5]:
# A list of the all-lowercase words in s.
# Only filters.
[word for word in s if word.islower()]

['hello', 'my', 'name', 'is']

In [6]:
# A list of the lengths of all the words in s.
# Only maps.
[len(word) for word in s]

[5, 2, 4, 2, 5, 8]

In [7]:
# A list of the lengths of the all-lowercase words in s.
# Maps and filters. 
[len(word) for word in s if word.islower()]

[5, 2, 4, 2]

In [8]:
# A list of the lengths of the all-lowercase words in s, using map and filter.
# This does not use any comprehensions of any kind.
list(map(len, filter(str.islower, s)))

[5, 2, 4, 2]

In [9]:
# Use LBYL to show that map and filter are iterator types.
# Then likewise show, with instances of them, that they are iterator objects.
from collections.abc import Iterator, Mapping

In [10]:
issubclass(map, Iterator)

True

In [11]:
issubclass(filter, Iterator)

True

In [12]:
isinstance(map(len, s), Iterator)

True

In [13]:
isinstance(filter(str.islower, s), Iterator)

True

In [14]:
# The builtin map type should not be confused with mappings (i.e., dict-like objects).
# Show with LBYL that map is not a mapping type.
issubclass(map, Mapping)

False

**Tip 1:** Iterators are iterable. That is, you can call `next` on iterators, but you can also call `iter` on an iterator, which gives you back an equivalent iterator, almost always the very same iterator object.
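
A minimal sketch of Tip 1, using a list iterator and a generator object. The names `words`, `it`, `same`, and `squares_gen` are illustrative and do not appear elsewhere in this notebook.

```python
words = ['hello', 'my', 'name']

it = iter(words)            # A list iterator over words.
same = iter(it)             # Calling iter on an iterator...
same_object = same is it    # ...gives back the very same object here.

# Because both names refer to one iterator, advancing either advances "both".
first = next(it)
second = next(same)

# Generator objects are iterators too, so the same holds for them.
squares_gen = (x**2 for x in range(3))
gen_same_object = iter(squares_gen) is squares_gen

print(same_object, first, second, gen_same_object)
```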

**Tip 2:** Do not confuse callables with iterators. A callable gives a value when called; an iterator gives a value when passed to `next`.
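
One place where Tip 2 matters is generator functions: the function is a callable, and the object it returns is an iterator. The sketch below uses a hypothetical `countdown` generator function to show the difference.

```python
from collections.abc import Iterator


def countdown():
    """Hypothetical generator function, used only for illustration."""
    yield 3
    yield 2
    yield 1


gen_obj = countdown()  # Calling the function gives a generator object.

# The generator function is callable, but it is not an iterator.
function_is_callable = callable(countdown)
function_is_iterator = isinstance(countdown, Iterator)

# The generator object is an iterator, but it is not callable.
object_is_callable = callable(gen_obj)
object_is_iterator = isinstance(gen_obj, Iterator)

value = next(gen_obj)  # Values come from next, not from calling gen_obj.
```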

**Tip 3:** Strongly prefer high-level constructs like `for` over explicitly calling `iter` or `next`.

It is fairly rare to call `next` outside of code that implements general-purpose iterator operations, such as the functions in `itertools`.

You shouldn't be afraid to use the `iter` and `next` builtins in any context. But every time you use them, you should be able to clearly articulate why you need or want them instead of higher-level constructs like `for`.
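
As an illustration of the kind of general iterator operation where `iter` and `next` are justified, here is a rough sketch of a helper that behaves like `itertools.pairwise`. The name `my_pairwise` is hypothetical, and this is not how `itertools` implements it. Here `next` is needed to pull out the first element separately, while `for` still handles the rest.

```python
def my_pairwise(iterable):
    """Yield overlapping pairs of consecutive items (illustrative only)."""
    it = iter(iterable)
    try:
        previous = next(it)  # Take the first item before looping.
    except StopIteration:
        return  # Empty input: yield nothing.
    for current in it:  # The for loop continues from the second item.
        yield previous, current
        previous = current


pairs = list(my_pairwise('abcd'))
empty = list(my_pairwise([]))
```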

**Tip 4:** Anytime you write something like:

```python
(x for x in some_expression)
```

Or:

```python
((x, y) for x, y in some_expression)
```

Make sure you understand why you are writing that instead of just:

```python
some_expression
```

Only in rare cases should you write comprehensions of that form.
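
To make the comparison in Tip 4 concrete, here is a small sketch with the illustrative names `values` and `wrapped`. The wrapped form yields the same items. The main observable difference is that it is a one-shot iterator, which is only occasionally what you want.

```python
values = range(5)
wrapped = (x for x in values)  # Adds a layer, but no mapping or filtering.

same_items = list(wrapped) == list(values)

# The generator is now exhausted, while the range can be iterated again.
wrapped_again = list(wrapped)
values_again = list(values)
```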

**Tip 5:** Remember the `sum` builtin.
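
For instance, `sum` accepts any iterable of numbers, so it combines naturally with a generator expression and no intermediate list is needed. This sketch repeats the word list from the review cells under the illustrative name `words`.

```python
words = ['hello', 'my', 'name', 'is', 'David', 'Vassallo']

# Total length of the all-lowercase words, with no list built along the way.
total_length = sum(len(word) for word in words if word.islower())

# Counting matches is also a sum.
lowercase_count = sum(1 for word in words if word.islower())
```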

**Tip 6:** Materialize an iterable when *both*:

1. it might be consumed by iteration (in almost all cases, this is when it's an iterator) *and*

2. you need, or may need, to iterate through its values multiple times.

You should not usually perform materialization unless both of these conditions hold.

Note also that `a = list(b)` and `a = tuple(b)` materialize `b`, but `a = b` does not, because assignment just copies a reference. There is no situation in which running `a = b` and then iterating through `a` behaves any differently from just iterating through `b`.
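
A short sketch of Tip 6, assuming a hypothetical helper `squares_up_to` that returns a generator. Plain assignment leaves a single one-shot iterator, while `list` materializes the values so they can be traversed more than once.

```python
def squares_up_to(n):
    """Hypothetical helper that returns a one-shot generator."""
    return (x**2 for x in range(n))


b = squares_up_to(4)
a = b                    # No materialization: a and b are the same object.
first_pass = list(a)     # Consumes the generator.
second_pass = list(b)    # Nothing is left.

m = list(squares_up_to(4))  # Materialized, so it can be iterated repeatedly.
total = sum(m)
largest = max(m)
```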



In [15]:
import inspect
import itertools

In [16]:
print(inspect.getdoc(itertools))

Functional tools for creating and using iterators.

Infinite iterators:
count(start=0, step=1) --> start, start+step, start+2*step, ...
cycle(p) --> p0, p1, ... plast, p0, p1, ...
repeat(elem [,n]) --> elem, elem, elem, ... endlessly or up to n times

Iterators terminating on the shortest input sequence:
accumulate(p[, func]) --> p0, p0+p1, p0+p1+p2
chain(p, q, ...) --> p0, p1, ... plast, q0, q1, ...
chain.from_iterable([p, q, ...]) --> p0, p1, ... plast, q0, q1, ...
compress(data, selectors) --> (d[0] if s[0]), (d[1] if s[1]), ...
dropwhile(pred, seq) --> seq[n], seq[n+1], starting when pred fails
groupby(iterable[, keyfunc]) --> sub-iterators grouped by value of keyfunc(v)
filterfalse(pred, seq) --> elements of seq where pred(elem) is False
islice(seq, [start,] stop [, step]) --> elements from
       seq[start:stop:step]
pairwise(s) --> (s[0],s[1]), (s[1],s[2]), (s[2], s[3]), ...
starmap(fun, seq) --> fun(*seq[0]), fun(*seq[1]), ...
tee(it, n=2) --> (it1, it2 , ... itn) splits one it

In [17]:
help(inspect.getgeneratorstate)

Help on function getgeneratorstate in module inspect:

getgeneratorstate(generator)
    Get current state of a generator-iterator.
    
    Possible states are:
      GEN_CREATED: Waiting to start execution.
      GEN_RUNNING: Currently being executed by the interpreter.
      GEN_SUSPENDED: Currently suspended at a yield expression.
      GEN_CLOSED: Execution has completed.



## Generator states

In [18]:
def gen1(): 
    yield 'a'
    yield 'b'
    yield 'c'

In [19]:
g = gen1()

In [20]:
inspect.getgeneratorstate(g)

'GEN_CREATED'

In [21]:
next(g)

'a'

In [22]:
inspect.getgeneratorstate(g)

'GEN_SUSPENDED'

In [23]:
next(g)

'b'

In [24]:
inspect.getgeneratorstate(g)

'GEN_SUSPENDED'

In [25]:
next(g)

'c'

In [26]:
inspect.getgeneratorstate(g)

'GEN_SUSPENDED'

In [27]:
next(g)

StopIteration: 

In [28]:
inspect.getgeneratorstate(g)

'GEN_CLOSED'

In [29]:
c = gen1()

In [30]:
inspect.getgeneratorstate(c)

'GEN_CREATED'

In [31]:
c.close()

In [32]:
inspect.getgeneratorstate(c)

'GEN_CLOSED'

In [33]:
next(c)

StopIteration: 

In [34]:
squares = (x**2 for x in itertools.count(start=1))

In [35]:
inspect.getgeneratorstate(squares)

'GEN_CREATED'

In [36]:
next(squares)

1

In [37]:
inspect.getgeneratorstate(squares)

'GEN_SUSPENDED'

In [38]:
squares.close()

In [39]:
inspect.getgeneratorstate(squares)

'GEN_CLOSED'

In [40]:
next(squares)

StopIteration: 
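
The cells above show three of the four states listed by `inspect.getgeneratorstate`. One way to observe the remaining one, `GEN_RUNNING`, is to have a generator inspect itself while its body is executing. This is a sketch with the hypothetical name `report_own_state`. It relies on the global name `g` being bound before the generator is first advanced.

```python
import inspect


def report_own_state():
    # g is looked up when this line runs, by which time it is bound below.
    yield inspect.getgeneratorstate(g)


g = report_own_state()
before = inspect.getgeneratorstate(g)  # Not started yet.
during = next(g)                       # State as seen from inside the body.
after = inspect.getgeneratorstate(g)   # Paused at the yield.
g.close()
closed = inspect.getgeneratorstate(g)
```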

## `finally` blocks in generators

In [41]:
def gen(): 
    try: 
        yield 1
        yield 2
        yield 3
    finally: 
        print('Done.')

In [42]:
[x for x in gen()] # don't do this

Done.


[1, 2, 3]

In [43]:
list(gen()) # do this instead

Done.


[1, 2, 3]

In [44]:
def f(): 
    gen()

In [45]:
f()

In [46]:
def f2(): 
    g = gen()
    return next(g)

In [47]:
f2()

Done.


1
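
The difference between `f` and `f2` above is that a `finally` block can only run if execution entered its `try` block. A generator that was never started has not entered it. The sketch below makes this visible with explicit `close` calls rather than relying on when the generator object is destroyed. The names `log` and `gen_logged` are illustrative.

```python
log = []


def gen_logged():
    try:
        yield 1
    finally:
        log.append('Done.')


unstarted = gen_logged()
unstarted.close()                # Body never ran, so finally never runs.
log_after_unstarted = list(log)

started = gen_logged()
next(started)                    # Now suspended inside the try block.
started.close()                  # Raises GeneratorExit at the yield.
log_after_started = list(log)
```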

## `GeneratorExit` is raised when destroying suspended generators

In [48]:
print(GeneratorExit.__doc__)

Request that a generator exit.


In [49]:
GeneratorExit.__bases__

(BaseException,)

In [50]:
ValueError.__bases__

(Exception,)

In [51]:
TypeError.__bases__

(Exception,)

In [52]:
StopIteration.__bases__

(Exception,)

In [53]:
SystemExit.__bases__

(BaseException,)

In [54]:
Exception.__bases__

(BaseException,)

In [55]:
def gen2(): 
    try: 
        yield 1
        yield 2
        yield 3
    except GeneratorExit: 
        print('Interrupted.')
    finally: 
        print('Done.')

In [56]:
list(gen2())

Done.


[1, 2, 3]

In [57]:
def f3(): 
    g = gen2()
    return next(g)

In [58]:
f3()

Interrupted.
Done.


1

## It is an error to yield after `GeneratorExit`

In [59]:
def gen_bad(): 
    try: 
        yield 1
        yield 2
        yield 3
    except GeneratorExit: 
        print('Interrupted.')
        yield 4
    finally: 
        print('Done.')

In [60]:
def f4(): 
    g = gen_bad()
    return next(g)

In [61]:
f4()

Exception ignored in: <generator object gen_bad at 0x0000012CD68A9540>
Traceback (most recent call last):
  File "C:\Users\User\AppData\Local\Temp\ipykernel_9596\1205670228.py", line 1, in <cell line: 1>
RuntimeError: generator ignored GeneratorExit


Interrupted.


1
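
When the generator is destroyed implicitly, as in `f4` above, the `RuntimeError` can only be reported as an ignored exception. Calling `close` explicitly raises it in the caller, where it can be handled. A minimal sketch with the hypothetical name `gen_ignores_exit`:

```python
def gen_ignores_exit():
    try:
        yield 1
    except GeneratorExit:
        yield 2  # Bug: yielding instead of letting the generator exit.


g = gen_ignores_exit()
next(g)

try:
    g.close()
except RuntimeError as error:
    message = str(error)
else:
    message = None
```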

In [62]:
# s = [1]
# s.append(s)

## Generator objects with reference cycles

In [63]:
def gen_rcycle():     
    try: 
        s = [1]
        yield s
    finally: 
        print('Done')

In [64]:
def do_stuff(): 
    ob = gen_rcycle() 
    next(ob).append(ob)

In [65]:
do_stuff()

In [66]:
import gc

In [67]:
gc.collect()

Done


1390

## Generator objects are not context managers

In [68]:
def do_more_stuff(): 
    ob = gen_rcycle() 
    next(ob).append(ob)
    return ob

In [69]:
with do_more_stuff(): 
    print('Use resource acquired')

Done


AttributeError: __enter__

In [70]:
gc.collect()

465

## Generator objects do have a `close` method

In [71]:
ob1 = do_more_stuff()

In [72]:
type(ob1)

generator

In [73]:
dir(ob1)

['__class__',
 '__del__',
 '__delattr__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__lt__',
 '__name__',
 '__ne__',
 '__new__',
 '__next__',
 '__qualname__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'close',
 'gi_code',
 'gi_frame',
 'gi_running',
 'gi_yieldfrom',
 'send',
 'throw']

In [74]:
help(ob1.close)

Help on built-in function close:

close(...) method of builtins.generator instance
    close() -> raise GeneratorExit inside generator.



In [75]:
ob1 = do_more_stuff()
try: 
    print('If I use ob1, I would use it here')
finally: 
    ob1.close()

If I use ob1, I would use it here
Done


In [76]:
import contextlib

In [77]:
with contextlib.closing(do_more_stuff()): 
    print('Use resource acquired')

Use resource acquired
Done
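
The same pattern can be checked without relying on printed output. This sketch records events in a list and confirms both that a generator object lacks `__enter__` and that `contextlib.closing` closes it when the `with` block exits. The names `events` and `resource` are illustrative.

```python
import contextlib
import inspect

events = []


def resource():
    try:
        events.append('acquired')
        yield 'handle'
    finally:
        events.append('released')


# Generator objects do not implement the context manager protocol.
has_enter = hasattr(resource(), '__enter__')

# contextlib.closing supplies __enter__/__exit__ and calls close on exit.
with contextlib.closing(resource()) as gen:
    handle = next(gen)
    events.append('using ' + handle)

state = inspect.getgeneratorstate(gen)
```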


**TODO:** Examine materialization and acquisition/release in `itertools.tee`.

## Review of materialization

In [78]:
def product_three(one, two, three): 
    my_one = list(one)
    my_two = list(two)
    my_three = list(three)
    return ((x, y, z) for x in my_one for y in my_two for z in my_three)

In [79]:
groups = product_three(['a', 'b', 'c'], (x for x in range(4)), (100, 200, 300))

In [80]:
list(groups)

Done


[('a', 0, 100),
 ('a', 0, 200),
 ('a', 0, 300),
 ('a', 1, 100),
 ('a', 1, 200),
 ('a', 1, 300),
 ('a', 2, 100),
 ('a', 2, 200),
 ('a', 2, 300),
 ('a', 3, 100),
 ('a', 3, 200),
 ('a', 3, 300),
 ('b', 0, 100),
 ('b', 0, 200),
 ('b', 0, 300),
 ('b', 1, 100),
 ('b', 1, 200),
 ('b', 1, 300),
 ('b', 2, 100),
 ('b', 2, 200),
 ('b', 2, 300),
 ('b', 3, 100),
 ('b', 3, 200),
 ('b', 3, 300),
 ('c', 0, 100),
 ('c', 0, 200),
 ('c', 0, 300),
 ('c', 1, 100),
 ('c', 1, 200),
 ('c', 1, 300),
 ('c', 2, 100),
 ('c', 2, 200),
 ('c', 2, 300),
 ('c', 3, 100),
 ('c', 3, 200),
 ('c', 3, 300)]
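
To see why `product_three` materializes its arguments, compare a two-argument version with and without materialization. Both functions here are hypothetical simplifications written for this comparison. When the second argument is an iterator, the unmaterialized version exhausts it during the first pass of the inner loop.

```python
def product_two_lazy(one, two):
    """No materialization: breaks if two is a one-shot iterator."""
    return ((x, y) for x in one for y in two)


def product_two(one, two):
    """Materializes both arguments, like product_three above."""
    my_one = list(one)
    my_two = list(two)
    return ((x, y) for x in my_one for y in my_two)


lazy_pairs = list(product_two_lazy('ab', (n for n in range(2))))
full_pairs = list(product_two('ab', (n for n in range(2))))
```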

## Spilling the `tee`

In [81]:
help(itertools.tee)

Help on built-in function tee in module itertools:

tee(iterable, n=2, /)
    Returns a tuple of n independent iterators.



In [82]:
s = [1, 2, 3, 4]

In [83]:
i1, i2 = itertools.tee(s)

In [84]:
next(i1)

1

In [85]:
next(i1)

2

In [86]:
next(i2)

1

In [87]:
next(i2)

2

In [88]:
list(i2)

[3, 4]

In [89]:
list(i1)

[3, 4]

In [90]:
squares = (x**2 for x in s)

In [91]:
s1, s2 = itertools.tee(squares)

In [92]:
list(s1)

[1, 4, 9, 16]

In [93]:
list(s2)

[1, 4, 9, 16]

In [94]:
inspect.getgeneratorstate(squares)

'GEN_CLOSED'

In [95]:
next(squares)

StopIteration: 
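
The cells above show that both `tee` iterators see every value even though the underlying generator is consumed only once. A sketch that makes this explicit records each time the underlying generator computes a value. The names `computed` and `noisy_squares` are illustrative.

```python
import itertools

computed = []


def noisy_squares():
    for x in range(1, 4):
        computed.append(x)  # Record each time a value is actually produced.
        yield x**2


a, b = itertools.tee(noisy_squares())

from_a = list(a)                   # Drives the underlying generator.
computed_after_a = list(computed)

from_b = list(b)                   # Served from values tee has buffered.
computed_after_b = list(computed)  # No further computation happened.
```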

In [96]:
squares = (x**2 for x in itertools.count(start=1))

In [97]:
s1, s2 = itertools.tee(squares)

In [98]:
next(s1)

1

In [99]:
next(s1)

4

In [100]:
next(s1)

9

In [101]:
next(s1)

16

In [102]:
next(s1)

25

In [103]:
next(s2)

1

In [104]:
next(s2)

4

In [105]:
next(s2)

9

In [106]:
next(s2)

16

In [107]:
inspect.getgeneratorstate(squares)

'GEN_SUSPENDED'

In [108]:
squares.close()

In [109]:
next(s2)

25

In [110]:
next(s2)

StopIteration: 

In [111]:
next(s1)

StopIteration: 

In [118]:
x, y = itertools.tee(itertools.count())

In [119]:
next(y)

0

In [120]:
next(y)

1

In [121]:
next(y)

2

In [122]:
z = zip(x, y)

In [123]:
next(z)

(0, 3)

In [124]:
next(z)

(1, 4)

In [125]:
next(z)

(2, 5)

In [126]:
for _ in range(100_000_000): 
    next(z)

In [127]:
next(z)

(100000003, 100000006)

In [128]:
# Takes a long time, but still doesn't use much memory:

# for _ in range(1_000_000_000): 
#     next(z)