# Implementing Iteration

## Agenda

1. Review: Iteration
2. Details: *iterables*, *iterators*, `iter`, and `next`
3. Implementing iterators with classes
4. Implementing iterators with *generators* and `yield`

## 1. Review: Iteration

*Iteration* refers to the process of accessing the items stored in some container one at a time. The order in which items are visited, and whether every item is visited, depends on the container.

In Python, we typically perform iteration using the `for` loop.

In [1]:
# e.g., iterating over a list
l = [2**x for x in range(10)]
for n in l:
    print(n)

1
2
4
8
16
32
64
128
256
512


In [2]:
# e.g., iterating over the key-value pairs in a dictionary
d = {x:2**x for x in range(10)}
for k,v in d.items():
    print(k, '=>', v)

0 => 1
1 => 2
2 => 4
3 => 8
4 => 16
5 => 32
6 => 64
7 => 128
8 => 256
9 => 512


## 2. Details: *iterables*, *iterators*, `iter`, and `next`

We can iterate over anything that is *iterable*. Intuitively, if something can be used as the source of items in a `for` loop, it is iterable.

But how does a `for` loop really work? (Review time!)

In [3]:
l = [2**x for x in range(10)]

In [4]:
it = iter(l)
while True:
    try:
        n = next(it)
        print(n)
    except StopIteration:
        break

1
2
4
8
16
32
64
128
256
512
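The loop above relies on the difference between an *iterable* (something `iter` accepts) and an *iterator* (something `next` accepts). The following is a minimal sketch of that difference using a plain list; the variable names are illustrative only.

```python
# A list is iterable, but it is not itself an iterator.
nums = [1, 2, 4]

it1 = iter(nums)
it2 = iter(nums)

# Each call to iter() on a list returns a new, independent iterator.
independent = it1 is not it2

# Calling iter() on an iterator returns that same iterator.
same = iter(it1) is it1

first = next(it1)       # advances it1 only
also_first = next(it2)  # it2 still starts at the beginning

# next() does not work on the list itself.
try:
    next(nums)
    list_is_iterator = True
except TypeError:
    list_is_iterator = False

print(independent, same, first, also_first, list_is_iterator)
# True True 1 1 False
```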


## 3. Implementing iterators with classes

In [5]:
class MyIterator:
    def __init__(self, max):
        self.max = max
        self.curr = 0
        
    # the following methods are required for iterator objects
    
    def __next__(self):
        if self.curr < self.max:
            ret = self.curr
            self.curr += 1
            return ret
        else:
            raise StopIteration()
    
    def __iter__(self):
        return self

In [6]:
it = MyIterator(10)

In [7]:
next(it)

0

In [8]:
it = MyIterator(10)
while True:
    try:
        print(next(it))
    except StopIteration:
        break

0
1
2
3
4
5
6
7
8
9


In [10]:
it = MyIterator(5)
for i in it:
    print(i)

0
1
2
3
4


An iterator is a *one-time-use object*! That is, once we have used it to iterate over elements, we typically cannot reset or "rewind" it. Iterable objects that can be traversed repeatedly return a fresh iterator for each traversal.

In [11]:
l = ['a', 'b', 'c', 'd', 'e']
for _ in range(3):
    for x in l:
        print(x, end=' ')

a b c d e a b c d e a b c d e 

In [12]:
l = ['a', 'b', 'c', 'd', 'e']
for _ in range(3):
    it = iter(l) # we obtain and "use up" an iterator each loop!
    while True:
        try:
            x = next(it)
            print(x, end=' ')
        except StopIteration:
            break

a b c d e a b c d e a b c d e 
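By contrast with the repeated list traversals above, a single iterator yields its items only once. A small sketch (variable names are illustrative):

```python
letters = ['a', 'b', 'c']
it = iter(letters)

first_pass = list(it)    # consumes the iterator
second_pass = list(it)   # nothing is left to consume

# Once exhausted, the iterator keeps raising StopIteration.
try:
    next(it)
    exhausted = False
except StopIteration:
    exhausted = True

# Asking the list for a new iterator starts over from the beginning.
fresh_pass = list(iter(letters))

print(first_pass, second_pass, exhausted, fresh_pass)
# ['a', 'b', 'c'] [] True ['a', 'b', 'c']
```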

To make a container type iterable, we implement an `__iter__` method that returns a new iterator each time it is called.

In [13]:
class ArrayList:
    def __init__(self):
        self.data = []
        
    def append(self, val):
        self.data.append(None)
        self.data[len(self.data)-1] = val
        
    def __iter__(self):
        class ArrayListIterator:
            def __init__(self, data):
                self.data = data
                self.idx = 0

            def __next__(self):
                if self.idx < len(self.data):
                    ret = self.data[self.idx]
                    self.idx += 1
                    return ret
                else:
                    raise StopIteration()

            def __iter__(self):
                return self
                
        return ArrayListIterator(self.data)

In [14]:
l = ArrayList()
for x in range(10):
    l.append(2**x)

In [15]:
it = iter(l)

In [16]:
type(it)

__main__.ArrayList.__iter__.<locals>.ArrayListIterator

In [17]:
next(it)

1

In [18]:
for x in l:
    print(x)

1
2
4
8
16
32
64
128
256
512
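One reason for returning a new iterator from each `__iter__` call is that several traversals of the same container can then be in progress at once, as in nested loops. The sketch below uses hypothetical `Box` and `BoxIterator` classes (not part of the notebook above), with the iterator defined at module level rather than nested; either placement works.

```python
class BoxIterator:
    """Iterator over the list held by a Box (hypothetical helper)."""
    def __init__(self, data):
        self.data = data
        self.idx = 0          # each iterator tracks its own position

    def __next__(self):
        if self.idx >= len(self.data):
            raise StopIteration
        ret = self.data[self.idx]
        self.idx += 1
        return ret

    def __iter__(self):
        return self


class Box:
    """Minimal container (hypothetical) whose __iter__ returns a new iterator."""
    def __init__(self, *items):
        self.data = list(items)

    def __iter__(self):
        return BoxIterator(self.data)


box = Box(1, 2, 3)

# The inner loop gets its own iterator, so it does not disturb the outer one.
pairs = [(a, b) for a in box for b in box]
print(len(pairs), pairs[0], pairs[-1])
# 9 (1, 1) (3, 3)
```

If `Box.__iter__` instead returned a single shared iterator, the inner loop would exhaust it and the outer loop would stop after its first item.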


## 4. Implementing iterators with *generators* and `yield`

What's a "generator"?

In [19]:
l = [2*x for x in range(10)]
g = (2*x for x in range(10))

In [20]:
type(l), type(g)

(list, generator)

In [21]:
for x in l:
    print(x)

0
2
4
6
8
10
12
14
16
18


In [22]:
for x in g:
    print(x)

0
2
4
6
8
10
12
14
16
18


In [23]:
dir(g)

['__class__',
 '__del__',
 '__delattr__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__lt__',
 '__name__',
 '__ne__',
 '__new__',
 '__next__',
 '__qualname__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'close',
 'gi_code',
 'gi_frame',
 'gi_running',
 'gi_yieldfrom',
 'send',
 'throw']

In [24]:
%timeit -n 1000 [2*x for x in range(10_000)]

919 µs ± 221 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


In [25]:
%timeit -n 1000 (2*x for x in range(10_000))

873 ns ± 178 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)


In [26]:
g = (2*x for x in range(10_000))

In [27]:
g[100]

TypeError: 'generator' object is not subscriptable

In [28]:
g[:100]

TypeError: 'generator' object is not subscriptable

In [29]:
sum(g)

99990000

In [30]:
sum(g)

0

A *generator expression* syntactically resembles a list comprehension and likewise evaluates to an iterable sequence of values. However, a generator does not hold a fully built collection of values; instead, each value is computed only when it is requested through the iteration API (i.e., `next`). We refer to this as *lazy evaluation*.

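This makes a generator more memory-efficient than a list (we don't need to keep all values in the sequence around) and cheap to create, as the timings above show, since no values are computed up front. But generators can't replace lists in all scenarios, e.g., when we need to index or slice the sequence, jump around in it, or revisit values: a generator can only be consumed once, which is why the second `sum(g)` above returns 0.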
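Lazy evaluation can be observed directly by recording when each value is computed. The sketch below uses a hypothetical helper, `double`, that logs its arguments; the size comparison uses `sys.getsizeof`, which reports only the size of the list or generator object itself, so treat it as a rough indication.

```python
import sys

calls = []  # records every argument passed to double()

def double(x):
    calls.append(x)
    return 2 * x

lazy = (double(x) for x in range(5))
calls_before = list(calls)      # nothing has been computed yet

first = next(lazy)
calls_after_one = list(calls)   # only the first value has been computed

rest = list(lazy)               # consuming the generator computes the remainder

print(calls_before, first, calls_after_one, rest)
# [] 0 [0] [2, 4, 6, 8]

# The generator object stays small no matter how many values it will produce.
list_size = sys.getsizeof([2*x for x in range(10_000)])
gen_size = sys.getsizeof(2*x for x in range(10_000))
print(list_size > gen_size)
# True
```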

### Creating generator functions: `yield`

In [31]:
def foo():
    yield

In [32]:
foo()

<generator object foo at 0x0000020928A09E00>

In [33]:
type(foo())

generator

In [34]:
def foo():
    print('hello!')
    yield
    print('goodbye!')

In [35]:
foo()

<generator object foo at 0x0000020928A0A1F0>

In [36]:
g = foo()

In [38]:
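# a second call to next(g): execution resumes after the `yield`,
# prints 'goodbye!', and then the function ends
next(g)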

goodbye!


StopIteration: 

In [39]:
def foo():
    yield 1
    yield 2
    yield 3

In [45]:
g = foo()

In [44]:
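# note the cell numbers: this cell ran before `g` was recreated in the
# cell above, so it was called on an already-exhausted generator
next(g)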

StopIteration: 

In [46]:
for x in g:
    print(x)

1
2
3


In [47]:
def countdown(n):
    for x in range(n, 0, -1):
        yield x
    yield 'Boom!'

In [48]:
for x in countdown(5):
    print(x)

5
4
3
2
1
Boom!


In [49]:
list(countdown(10))

[10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 'Boom!']

A *generator function* is a function that contains one or more `yield` statements. When called, a generator function does not run its body; it returns a generator object, which lets us execute the function incrementally through the iteration API. Each call to `next` on the generator runs the function up to the next `yield` and returns the yielded value (`None` for a bare `yield`); when the function body finishes, the generator raises `StopIteration`, just like any other iterator.
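The incremental execution described above can be traced by logging what the function body has done after each call to `next`. This is a small sketch with a hypothetical generator function `steps`:

```python
events = []  # log of what the function body has executed so far

def steps():
    events.append('start')
    yield 'first'
    events.append('between yields')
    yield 'second'
    events.append('end')

g = steps()
after_call = list(events)    # calling steps() runs none of the body

v1 = next(g)                 # runs up to the first yield
after_first = list(events)

v2 = next(g)                 # resumes and runs up to the second yield
after_second = list(events)

try:
    next(g)                  # runs to the end of the body, then raises
    finished = False
except StopIteration:
    finished = True

print(after_call, v1, after_first)
# [] first ['start']
print(v2, after_second, finished, events[-1])
# second ['start', 'between yields'] True end

# A generator is its own iterator.
print(iter(g) is g)
# True
```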

### Generators as Data Structure Iterators

In [50]:
class ArrayList:
    def __init__(self):
        self.data = []
        
    def append(self, val):
        self.data.append(None)
        self.data[len(self.data)-1] = val
        
    def __iter__(self):
        for i in range(len(self.data)):
            yield self.data[i]

In [51]:
l = ArrayList()
for x in range(10):
    l.append(2**x)

In [52]:
for x in l:
    print(x)

1
2
4
8
16
32
64
128
256
512


In [53]:
class ArrayList(ArrayList):
    def __repr__(self):
        return '[' + ', '.join(repr(x) for x in self) + ']'

In [54]:
l = ArrayList()
for x in range(10):
    l.append(2**x)
l

[1, 2, 4, 8, 16, 32, 64, 128, 256, 512]
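As the `__repr__` example suggests, a generator-based `__iter__` is enough for many built-in consumers to work with a container, and each call produces a new generator, so the container can be traversed repeatedly. The sketch below uses a hypothetical `NumberBag` class to keep it self-contained.

```python
class NumberBag:
    """Minimal container (hypothetical) that iterates via a generator."""
    def __init__(self, *items):
        self.data = list(items)

    def __iter__(self):
        # Each call creates a new generator object over the stored items.
        for item in self.data:
            yield item


bag = NumberBag(3, 1, 2)

total = sum(bag)         # each built-in below requests its own iterator
largest = max(bag)
has_two = 2 in bag       # `in` falls back to iteration without __contains__
ordered = sorted(bag)
a, b, c = bag            # unpacking also uses the iteration API

# Nested traversals work because every loop gets a fresh generator.
pairs = [(x, y) for x in bag for y in bag]

print(total, largest, has_two, ordered, (a, b, c), len(pairs))
# 6 3 True [1, 2, 3] (3, 1, 2) 9
```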