# Generators and Iterators

![Iterables and Iterators](./data/img/Iterable.png)

## Building your own generators with `yield`

In [1]:
def counter(start, end):
    current = start
    while current < end:
        yield current
        current += 1

In [2]:
counter(1, 10)

<generator object counter at 0x1080f11b0>

In [3]:
x = counter(1,10)
next(x)

1

In [4]:
next(x)

2

In [5]:
next(x)

3

In [6]:
x = counter(1,10)
list(x)

[1, 2, 3, 4, 5, 6, 7, 8, 9]

In [7]:
for item in counter(1, 10):
    print(item, end=' ')

1 2 3 4 5 6 7 8 9 

`yield` can also be used as an expression: it evaluates to whatever value the caller passes in through the generator's `send()` method.

In [8]:
def accumulator(start=0):
    current = start
    while True:
        current += yield current

In [9]:
x = accumulator()
next(x)

0

In [10]:
x.send(1)

1

In [11]:
x.send(1)

2

In [12]:
x.send(10)

12
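As a further illustration of `send()`, here is a minimal sketch of a generator that reports the mean of everything sent to it so far. The name `running_average` and its variables are made up for this example. Note that the generator has to be primed with `next()` before the first `send()`, because sending a non-`None` value to a generator that has not started raises a `TypeError`.

```python
def running_average():
    """Yield the mean of all values sent in so far (hypothetical example)."""
    total = 0.0
    count = 0
    average = None  # nothing to report until the first value arrives
    while True:
        # Execution pauses here; the expression evaluates to the send() argument
        value = yield average
        total += value
        count += 1
        average = total / count

avg = running_average()
next(avg)               # prime the generator: run up to the first yield
first = avg.send(10)    # mean of [10]
second = avg.send(20)   # mean of [10, 20]
third = avg.send(30)    # mean of [10, 20, 30]
print(first, second, third)
```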

## The iterator protocol

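What does `for x in sequence:` *really* do? It asks `iter()` for an iterator over the sequence, then calls `next()` on that iterator until `StopIteration` is raised.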

In [13]:
seq = range(4)
for x in seq: 
    print(x)

0
1
2
3


In [14]:
iter_seq = iter(seq)
print(iter_seq)

<range_iterator object at 0x1080d4870>


In [15]:
iter_seq = iter(seq)
try:
    while True:
        x = next(iter_seq)
        print(x)
except StopIteration:
    pass

0
1
2
3


In [17]:
lst = [1,2,3]
next(iter(lst))

1

Generators are their own iterators:

In [20]:
x = counter(0, 4)
print(x)
print(iter(x))
x is iter(x)

<generator object counter at 0x1080f12a0>
<generator object counter at 0x1080f12a0>


True
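One practical consequence, sketched below: because a generator is its own iterator, it can only be consumed once, whereas a list hands out a fresh iterator every time it is iterated. The `counter` function is repeated from earlier so the block runs on its own; the other variable names are illustrative.

```python
def counter(start, end):
    current = start
    while current < end:
        yield current
        current += 1

numbers = [0, 1, 2, 3]
gen = counter(0, 4)

list_first = list(numbers)
list_second = list(numbers)  # a list creates a new iterator on each pass

gen_first = list(gen)
gen_second = list(gen)       # the generator is already exhausted

print(list_first, list_second)
print(gen_first, gen_second)
```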

In [21]:
for item in counter(0, 4): 
    print(item)

0
1
2
3


In [22]:
x = counter(0, 4)
while True:
    next(x)

StopIteration: 

We can also define our own iterator classes (though generators are usually more readable):

In [23]:
class Counter:
    def __init__(self, start, end):
        self._start = start
        self._end = end
    def __iter__(self):
        '''This is often implemented as a generator function'''
        return CounterIterator(self._start, self._end)

class CounterIterator:
    def __init__(self, start, end):
        self._cur = start
        self._end = end
    def __iter__(self):
        # Iterators return themselves, so they can also be used in for loops
        return self
    def __next__(self):
        # Signal exhaustion once we reach the end, without advancing further
        if self._cur >= self._end:
            raise StopIteration
        result = self._cur
        self._cur += 1
        return result

ctr = Counter(0, 5)
print(list(ctr))

[0, 1, 2, 3, 4]
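For comparison, here is a sketch of the same iterable with `__iter__` written as a generator function, so no separate iterator class is needed. The class name `GeneratorCounter` is made up for this example.

```python
class GeneratorCounter:
    def __init__(self, start, end):
        self._start = start
        self._end = end

    def __iter__(self):
        # Each call creates a new generator, so the object can be iterated repeatedly
        current = self._start
        while current < self._end:
            yield current
            current += 1

ctr = GeneratorCounter(0, 5)
first_pass = list(ctr)
second_pass = list(ctr)
print(first_pass, second_pass)
```

Because `__iter__` returns a fresh generator on every call, the object behaves like a reusable container rather than a one-shot iterator.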


# Set and dict comprehensions

In [27]:
{x for x in range(4)}

{0, 1, 2, 3}

In [28]:
{x:'y' for x in range(4)}

{0: 'y', 1: 'y', 2: 'y', 3: 'y'}

## Generator expressions

In [29]:
[ x for x in range(10) if x % 2 == 0 ]

[0, 2, 4, 6, 8]

In [30]:
( x for x in range(10) if x % 2 == 0 )

<generator object <genexpr> at 0x10815d2a0>

In [31]:
gen = ( x for x in range(10) if x % 2 == 0 )

In [32]:
next(gen)

0

In [33]:
next(gen)

2

In [34]:
list(gen)

[4, 6, 8]
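The cells above show that a generator expression produces its values on demand. The sketch below makes that laziness visible by recording each call to a helper function; `traced_square` and `calls` are hypothetical names used only for this illustration.

```python
calls = []

def traced_square(x):
    """Square x and record that the call happened."""
    calls.append(x)
    return x * x

squares = (traced_square(x) for x in range(5))
calls_before = list(calls)     # no element has been computed yet

first = next(squares)          # computes only the first element
calls_after_one = list(calls)

total = sum(squares)           # consumes the remaining elements: 1 + 4 + 9 + 16
print(calls_before, first, calls_after_one, total)
```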

## The `itertools` module

`itertools` provides a number of "higher-order iterators" that allow you to combine iterators in interesting ways.
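As a small taste of that kind of composition, the sketch below uses `islice` and `takewhile` (two `itertools` functions not otherwise covered in this notebook) to take finite slices out of infinite iterators built on `count()`.

```python
from itertools import count, islice, takewhile

# An infinite stream of even numbers; nothing is computed until we ask for it
evens = (n for n in count() if n % 2 == 0)
first_five_evens = list(islice(evens, 5))

# Squares of 1, 2, 3, ... for as long as they stay below 50
squares = (n * n for n in count(1))
small_squares = list(takewhile(lambda sq: sq < 50, squares))

print(first_five_evens)
print(small_squares)
```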

In [35]:
from itertools import chain, count, groupby

In [36]:
# chain links multiple iterators end-to-end
xs = range(10)
ys = 'abcdef'
list(chain(xs, ys))


[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 'a', 'b', 'c', 'd', 'e', 'f']

In [40]:
# The Python 3 built-in "zip" lazily pairs up items from multiple iterables,
#  stopping at the shortest one. Useful when building a giant dictionary:
import string
dict(zip(string.ascii_lowercase, string.ascii_uppercase[:10]))

{'a': 'A',
 'b': 'B',
 'c': 'C',
 'd': 'D',
 'e': 'E',
 'f': 'F',
 'g': 'G',
 'h': 'H',
 'i': 'I',
 'j': 'J'}

In [43]:
# count() gives us a simple iterator of consecutive values

for i, letter in zip(count(), string.ascii_letters[:10]):
    print(i, letter)

0 a
1 b
2 c
3 d
4 e
5 f
6 g
7 h
8 i
9 j


In [44]:
for i, letter in enumerate(string.ascii_letters[:10]):
    print(i, letter)

0 a
1 b
2 c
3 d
4 e
5 f
6 g
7 h
8 i
9 j


In [45]:
# Python anti-pattern: looping over indices instead of using enumerate()
for i in range(len(string.ascii_letters[:10])):
    print(i, string.ascii_letters[i])

0 a
1 b
2 c
3 d
4 e
5 f
6 g
7 h
8 i
9 j


In [46]:
# also an anti-pattern: looking up d[key] for every key instead of using d.items()
d = dict(zip(string.ascii_lowercase, string.ascii_uppercase[:10]))
for key in d.keys():
    print(key, d[key])

a A
b B
c C
d D
e E
f F
g G
h H
i I
j J


In [47]:
for key, value in d.items():
    print(key, value)

a A
b B
c C
d D
e E
f F
g G
h H
i I
j J


`groupby()` lets us efficiently split an iterator into runs of consecutive items that share the same key, yielding each key together with an iterator over its items. For instance, we might have 
some datetime-based data that we wish to convert to date-based data:

In [48]:
from random import random
from datetime import datetime, timedelta

trades = []
dt = datetime(2016, 4, 24)
while dt < datetime(2016,4,27):
    trades.append((dt, random()))
    dt += timedelta(hours=1)
    
print(len(trades))

72


In [49]:
trades[:10]

[(datetime.datetime(2016, 4, 24, 0, 0), 0.4062262904729086),
 (datetime.datetime(2016, 4, 24, 1, 0), 0.9546287039144711),
 (datetime.datetime(2016, 4, 24, 2, 0), 0.7137118496803933),
 (datetime.datetime(2016, 4, 24, 3, 0), 0.49130302667736847),
 (datetime.datetime(2016, 4, 24, 4, 0), 0.8231214691196771),
 (datetime.datetime(2016, 4, 24, 5, 0), 0.29791319904964353),
 (datetime.datetime(2016, 4, 24, 6, 0), 0.4297033215505065),
 (datetime.datetime(2016, 4, 24, 7, 0), 0.760113400906256),
 (datetime.datetime(2016, 4, 24, 8, 0), 0.287101024045889),
 (datetime.datetime(2016, 4, 24, 9, 0), 0.2090267671315641)]

In [50]:
def day_of_trade(val):
    dt, value = val
    return dt.date()

for date, date_trades in groupby(trades, key=day_of_trade):
    print(date, len(list(date_trades)))


2016-04-24 24
2016-04-25 24
2016-04-26 24


In [51]:
for date, date_trades in groupby(trades, key=day_of_trade):
    date_trades = list(date_trades)
    print(date, sum(v for dt, v in date_trades) / len(date_trades))


2016-04-24 0.5581507674295924
2016-04-25 0.5082541052648527
2016-04-26 0.519133742640758


In [52]:
# Import only shuffle so we don't rebind the name `random` imported above
from random import shuffle
shuffle(trades)

for date, date_trades in groupby(trades, key=day_of_trade):
    date_trades = list(date_trades)
    print(date, sum(v for dt, v in date_trades) / len(date_trades))


2016-04-25 0.6965800824766801
2016-04-24 0.9546287039144711
2016-04-25 0.6048828787193409
2016-04-24 0.46158402171081026
2016-04-26 0.06477836496438405
2016-04-25 0.8654600858500318
2016-04-24 0.7104957772761222
2016-04-26 0.3851766984106576
2016-04-24 0.4062262904729086
2016-04-26 0.04023345876410289
2016-04-25 0.36426857668086365
2016-04-24 0.29791319904964353
2016-04-26 0.1948499127123523
2016-04-25 0.9181714457279584
2016-04-26 0.9061262654719939
2016-04-25 0.7507955548945594
2016-04-26 0.5629173844863353
2016-04-24 0.366657091087062
2016-04-25 0.021812700570819632
2016-04-26 0.3118535537757263
2016-04-25 0.6921835117683327
2016-04-24 0.5545709422071232
2016-04-25 0.49502926006576187
2016-04-26 0.5991308170332507
2016-04-25 0.26181212431294265
2016-04-24 0.2339033551813694
2016-04-26 0.5724252185585591
2016-04-24 0.760113400906256
2016-04-26 0.5181361784314862
2016-04-25 0.1642743771463796
2016-04-24 0.2090267671315641
2016-04-26 0.8110453389195543
2016-04-25 0.01609394997996394
...

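Note that `groupby` only groups *consecutive* items that share a key, so your data *must* already be sorted by the grouping key, as the fragmented output above shows. If you wish to group *unsorted* data, sort it by the key first or collect the groups in a `defaultdict` instead.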
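A minimal sketch of both options for unsorted data follows. It rebuilds hourly trades like the ones above with a seeded `random.Random` instance (an assumption made here so the block is reproducible and self-contained), then counts trades per day once with a `defaultdict` and once by sorting before calling `groupby`.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from itertools import groupby
import random

def day_of_trade(trade):
    dt, value = trade
    return dt.date()

# Hourly trades over three days, shuffled into a random order
rng = random.Random(0)
start = datetime(2016, 4, 24)
trades = [(start + timedelta(hours=h), rng.random()) for h in range(72)]
rng.shuffle(trades)

# Option 1: a defaultdict collects each trade under its day, in any input order
by_day = defaultdict(list)
for trade in trades:
    by_day[day_of_trade(trade)].append(trade)
counts_defaultdict = {day: len(items) for day, items in by_day.items()}

# Option 2: sort by the grouping key first, then groupby sees each day as one run
sorted_trades = sorted(trades, key=day_of_trade)
counts_groupby = {
    day: len(list(items))
    for day, items in groupby(sorted_trades, key=day_of_trade)
}

for day in sorted(counts_defaultdict):
    print(day, counts_defaultdict[day], counts_groupby[day])
```

The `defaultdict` version makes a single pass over the data, while the sorted version pays for a sort but keeps `groupby`'s streaming style; which is preferable depends on the size of the data and whether it is needed in sorted order anyway.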

# Lab

Open [Generators and Iterators Lab][iteration-lab]

[iteration-lab]: ./iteration-lab.ipynb