Iterators: get next item, no index needed (consumable)

Iterables: collections that can hand out an iterator over their items (reusable)

### Iterating Collections

Iteration can be more general than sequential indexing. All we need is:
- a bucket of items (collection, container)
- the ability to get the *next* item, no ordering needed

__Building an "Iterable" from Scratch__

The approach below, a class that only implements `__next__`, has some issues:
- cannot iterate using for loops, comprehensions
- once the iteration starts it cannot be restarted
- once all items have been iterated over, the object becomes useless

In [6]:
class Squares:
    def __init__(self, length):
        self.i = 0
        self.length = length
        
    def __next__(self):
        if self.i >= self.length:
            raise StopIteration
        else:
            result = self.i ** 2
            self.i += 1
            return result

In [7]:
sq = Squares(5)

while True:
    try:
        item = next(sq)
        print(item)
    except StopIteration:
        break

0
1
4
9
16
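
As a minimal sketch of the first issue listed above (the class is repeated so the snippet runs on its own), a `for` loop over an object that only defines `__next__` fails, because the loop calls `iter()` on the object before it ever calls `next()`:

```python
class Squares:
    def __init__(self, length):
        self.i = 0
        self.length = length

    def __next__(self):
        if self.i >= self.length:
            raise StopIteration
        result = self.i ** 2
        self.i += 1
        return result


sq = Squares(3)

try:
    for item in sq:          # the for loop calls iter(sq) first
        print(item)
except TypeError as exc:
    error_message = str(exc)
    print(error_message)     # reports that the object is not iterable

# next() still works, because it only needs __next__
first = next(sq)
print(first)
```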


### Iterators

__The Iterator Protocol__

A __protocol__ is simply a way of saying our class implements certain functionality that Python can count on.

A class implements the iterator protocol by defining two methods:
- `__iter__`: returns the class instance itself
- `__next__`: handles returning next items and raising StopIteration

If an object is an *iterator* (implements the protocol), then it can be used with for loops and in comprehensions, etc.

In [8]:
class Squares:
    def __init__(self, length):
        self.i = 0
        self.length = length
        
    def __next__(self):
        if self.i >= self.length:
            raise StopIteration
        else:
            result = self.i ** 2
            self.i += 1
            return result
        
    def __iter__(self):
        return self

In [9]:
sq = Squares(5)

for item in sq:
    print(item)

0
1
4
9
16


This approach still has issues, because we still cannot restart the iteration. If we want to iterate again, we would have to create a new object.
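
The exhaustion problem can be seen directly. The sketch below repeats the `Squares` iterator and collects its items twice; the second pass is empty, and only a new object gives the values again:

```python
class Squares:
    def __init__(self, length):
        self.i = 0
        self.length = length

    def __iter__(self):
        return self

    def __next__(self):
        if self.i >= self.length:
            raise StopIteration
        result = self.i ** 2
        self.i += 1
        return result


sq = Squares(3)
first_pass = list(sq)       # consumes the iterator
second_pass = list(sq)      # nothing left: the same object is exhausted
fresh = list(Squares(3))    # a new object starts from the beginning

print(first_pass)   # [0, 1, 4]
print(second_pass)  # []
print(fresh)        # [0, 1, 4]
```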

### Iterables

__Separating the Collection from the Iterator__

We would prefer to separate these two functionalities:
- maintaining the data of the collection (iterable)
- iterating over the data (iterator)

The *iterable* is created once, but the *iterator* is created every time we need to start a fresh iteration.

In [10]:
class Cities:
    def __init__(self):
        self.cities = ['Paris', 'Amsterdam', 'London', 'Vienna']
        self.i = 0
        
    def __iter__(self):
        return self
    
    def __next__(self):
        if self.i >= len(self.cities):
            raise StopIteration
        else:
            result = self.cities[self.i]
            self.i += 1
            return result

The `Cities` instance is the iterator, but it also maintains the collection. Once it is exhausted, the only way to iterate again is to create a new `Cities` instance, which rebuilds the whole collection. This is wasteful.

In [12]:
class Cities:
    def __init__(self):
        self.cities = ['Paris', 'Amsterdam', 'London', 'Vienna']
        
    def __len__(self):
        return len(self.cities)

In [17]:
class CityIterator:
    def __init__(self, cities):
        self._cities = cities
        self._index = 0
        
    def __iter__(self):
        return self
    
    def __next__(self):
        if self._index >= len(self._cities):
            raise StopIteration
        else:
            # self._cities is the Cities object; its list lives in .cities
            result = self._cities.cities[self._index]
            self._index += 1
            return result

In [18]:
cities = Cities()
cities_iter = CityIterator(cities)

for city in cities_iter:
    print(city)

Paris
Amsterdam
London
Vienna


In [19]:
# Here we can restart iteration without recreating the whole collection of cities
cities_iter = CityIterator(cities)

for city in cities_iter:
    print(city)

Paris
Amsterdam
London
Vienna


Now we can iterate over the collection many times, but we have to recreate the Iterator every time. How can we iterate over the Cities object instead?

__The Iterable Protocol__

A class implements the iterable protocol by defining an `__iter__` method that returns a new *iterator* instance, rather than the object itself.

In [20]:
class Cities:
    def __init__(self):
        self.cities = ['Paris', 'Amsterdam', 'London', 'Vienna']
        
    def __len__(self):
        return len(self.cities)
    
    def __iter__(self):
        return CityIterator(self)
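
To see the iterable protocol at work, here is a self-contained sketch that pairs a `Cities` iterable with a `CityIterator`. The attribute name `_city_obj` is an illustrative choice, not something fixed by the protocol. The same `Cities` object can be looped over repeatedly because each loop asks it for a fresh iterator:

```python
class CityIterator:
    def __init__(self, city_obj):
        self._city_obj = city_obj   # reference to the collection, no copy
        self._index = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self._index >= len(self._city_obj):
            raise StopIteration
        item = self._city_obj.cities[self._index]
        self._index += 1
        return item


class Cities:
    def __init__(self):
        self.cities = ['Paris', 'Amsterdam', 'London', 'Vienna']

    def __len__(self):
        return len(self.cities)

    def __iter__(self):
        # A new iterator for every iteration
        return CityIterator(self)


cities = Cities()
first_pass = [city for city in cities]
second_pass = list(cities)            # works again, nothing was exhausted
upper = [city.upper() for city in cities]

print(first_pass)
print(second_pass)
```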

__Iterable vs. Iterator__

An __iterable__ is an object that implements:
- `__iter__`: which returns an *iterator* (a new instance)

An __iterator__ is an object that implements:
- `__iter__`: which returns itself (an iterator)
- `__next__`: returns the next element

So iterators are themselves iterables, but they are iterables that become exhausted.

Iterables that are not iterators do not become exhausted, because every call to `__iter__` returns a new iterator.
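
The distinction can be checked with the abstract base classes in `collections.abc`. This is a small sketch using a built-in list as the iterable:

```python
from collections.abc import Iterable, Iterator

numbers = [1, 2, 3]
numbers_iter = iter(numbers)

list_is_iterable = isinstance(numbers, Iterable)        # True
list_is_iterator = isinstance(numbers, Iterator)        # False: no __next__
iter_is_iterable = isinstance(numbers_iter, Iterable)   # True
iter_is_iterator = isinstance(numbers_iter, Iterator)   # True

# An iterator returns itself from iter(); an iterable returns a new iterator
same_object = iter(numbers_iter) is numbers_iter        # True
other_iter = iter(numbers)
distinct_iterators = other_iter is not numbers_iter     # True
```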

### Lazy Iterables

__Lazy Evaluation__

Often used for class properties, this is the technique of evaluating the value of a property when it is requested, rather than at instantiation.

In [22]:
# Example
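class Actor:
    def __init__(self, id):
        self.id = id
        # lookup_actor_in_db and lookup_movies_in_db stand in for real database calls
        self.bio = lookup_actor_in_db(self.id)
        # Backing attribute; it must not share the property's name
        self._movies = None
        
    @property
    def movies(self):
        # Load the movies on first access only, then reuse the stored result
        if self._movies is None:
            self._movies = lookup_movies_in_db(self.id)
        return self._movies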
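
A runnable sketch of the same idea follows. The two lookup functions and the `db_calls` list are hypothetical stand-ins for a real database layer; they only record when a lookup happens, so we can see that the movies are fetched on first access and not at instantiation:

```python
db_calls = []  # records each simulated database hit


def lookup_actor_in_db(actor_id):
    # Hypothetical stand-in for a real query
    db_calls.append(("bio", actor_id))
    return f"bio for actor {actor_id}"


def lookup_movies_in_db(actor_id):
    # Hypothetical stand-in for a real query
    db_calls.append(("movies", actor_id))
    return ["Movie A", "Movie B"]


class Actor:
    def __init__(self, actor_id):
        self.id = actor_id
        self.bio = lookup_actor_in_db(self.id)   # loaded eagerly
        self._movies = None                      # loaded lazily

    @property
    def movies(self):
        if self._movies is None:
            self._movies = lookup_movies_in_db(self.id)
        return self._movies


actor = Actor(7)
calls_after_init = list(db_calls)     # only the bio lookup so far

first = actor.movies                  # triggers the movies lookup
second = actor.movies                 # served from the stored value
calls_after_access = list(db_calls)
```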

__Application to Iterables__

An example would be a `Factorials` iterable that does not pre-compute all the factorials, but waits until the next one is requested and then calculates it (lazy evaluation).

Or an iterable of Posts, where each call to `next()` fetches more posts from the database and returns them (lazy loading).

__Beware__: Lazy evaluation can mean having infinite iterables, so using them in a for loop requires caution.

In [31]:
import math

class Factorials:        
    def __iter__(self):
        return self.FactIter()
        
    class FactIter:
        def __init__(self):
            self.i = 0
            
        def __iter__(self):
            return self
        
        def __next__(self):
            result = math.factorial(self.i)
            self.i += 1
            return result

In [32]:
facts = Factorials()

In [33]:
fact_iter = iter(facts)

In [34]:
for _ in range(5):
    print(next(fact_iter))

1
1
2
6
24
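
Because `Factorials` never raises `StopIteration`, a bare `for` loop over it would run forever. Two common ways to bound the iteration are `itertools.islice` and an explicit `break`. The class is repeated here so the sketch runs on its own, and the cut-off value of 100 is an arbitrary choice for illustration:

```python
import math
from itertools import islice


class Factorials:
    def __iter__(self):
        return self.FactIter()

    class FactIter:
        def __init__(self):
            self.i = 0

        def __iter__(self):
            return self

        def __next__(self):
            result = math.factorial(self.i)
            self.i += 1
            return result


facts = Factorials()

# Take a fixed number of items from the infinite iterable
first_five = list(islice(facts, 5))

# Or stop explicitly once a condition is met
small = []
for value in facts:
    if value > 100:
        break
    small.append(value)

print(first_five)  # [1, 1, 2, 6, 24]
print(small)       # [1, 1, 2, 6, 24]
```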


### The `iter()` Function

What happens when Python performs iteration over an iterable?

First, Python calls the `iter()` function on the object. If the object implements the `__iter__` method, that method is called and Python uses the iterator it returns.

What happens when an object does not implement the `__iter__` method?

For example, some sequence types may only implement `__getitem__`. In that case `iter()` detects the `__getitem__` method and returns an iterator object for us. That iterator calls `__getitem__` with 0, 1, 2, and so on, until an `IndexError` is raised.

`iter(obj)` is called:

    -> Python looks for an __iter__ method
        -> If it exists, use it
        -> If not:
            -> Python looks for a __getitem__ method
                -> If it exists, it creates an iterator object and returns it
                -> If not, raise a TypeError
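
The lookup order above can be tried out with a class that defines only `__getitem__`. This `Squares` variant is a hypothetical example written for this sketch; it has no `__iter__`, yet `iter()` still produces a working iterator, while a plain `object()` has neither method and is rejected:

```python
class Squares:
    def __init__(self, length):
        self.length = length

    def __getitem__(self, index):
        # Raising IndexError is what tells the fallback iterator to stop
        if index < 0 or index >= self.length:
            raise IndexError(index)
        return index ** 2


sq = Squares(4)
has_iter_method = hasattr(sq, '__iter__')   # False
values = list(iter(sq))                     # iter() falls back to __getitem__
loop_values = [item for item in sq]         # for loops and comprehensions work too

try:
    iter(object())                          # no __iter__, no __getitem__
except TypeError:
    plain_object_is_iterable = False
else:
    plain_object_is_iterable = True

print(values)  # [0, 1, 4, 9]
```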

__Iterating a Callable__

There are two forms of `iter()`:
- `iter(iterable)`
- `iter(callable, sentinel)`

The second form will return an iterator that will:
- call the callable when `next()` is called
- raise `StopIteration` if the result of the call is equal to the sentinel value

In [60]:
def counter():
    i = 0
    def inc():
        nonlocal i
        i += 1
        return i
    
    return inc

In [61]:
class CallableIter:
    def __init__(self, callable_, sentinel):
        self.callable_ = callable_
        self.sentinel = sentinel
        self.consumed = False
        
    def __iter__(self):
        return self
    
    def __next__(self):
        if self.consumed:
            raise StopIteration
        else:
            result = self.callable_()
            if result == self.sentinel:
                self.consumed = True
                raise StopIteration
            else:
                return result

In [62]:
cnt = counter()
cnt_iter = CallableIter(cnt, 5)

In [63]:
for c in cnt_iter:
    print(c)

1
2
3
4


In [64]:
# Without the consumed flag, the iterator would resume after the sentinel value.
# In this example the counter would move on to 6, which is not the sentinel, so StopIteration would no longer be raised.
next(cnt_iter)

StopIteration: 

__Using `iter` with a Callable__

In [69]:
cnt = counter()
cnt_iter = iter(cnt, 5)

In [70]:
for c in cnt_iter:
    print(c)

1
2
3
4


In [71]:
next(cnt_iter)

StopIteration: 
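
A common practical use of the two-argument form is reading a stream in fixed-size chunks until an empty read signals the end. The sketch below uses an in-memory `io.BytesIO` stream and a chunk size of 4, both chosen only for illustration:

```python
import io
from functools import partial

stream = io.BytesIO(b"abcdefghij")

# partial(stream.read, 4) is a zero-argument callable that reads 4 bytes.
# read() returns b"" at end of stream, so b"" serves as the sentinel.
chunks = list(iter(partial(stream.read, 4), b""))

print(chunks)  # [b'abcd', b'efgh', b'ij']
```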

### Reverse Iteration

__Iterating a Sequence in Reverse__

In [72]:
seq = [1, 2, 3, 4, 5]

In [73]:
# This approach may be wasteful since we are creating a copy of the sequence.
for item in seq[::-1]:
    print(item)

5
4
3
2
1


In [75]:
# This is more efficient, but the syntax is messy
for i in range(len(seq)-1, -1, -1):
    print(seq[i])

5
4
3
2
1


The approach below is the best of the three, since `reversed()` returns an iterator and does not copy the sequence. __But__ the sequence must implement both `__getitem__` and `__len__`. You can override how `reversed()` works by implementing the `__reversed__` method.

In [78]:
# Best approach
for item in reversed(seq):
    print(item)

5
4
3
2
1


__Iterating an Iterable in Reverse__

When we call `reversed()` on a custom iterable, Python will look for the `__reversed__` method. This method should return an iterator.

Just like `iter()`, when we call `reversed()`:
- Python looks for and calls `__reversed__`
- If not implemented, it will use `__getitem__` and `__len__` to create an iterator for us
- If those are not implemented either, it raises a `TypeError`
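
The three cases can be sketched as follows. `Countdown` and this `Squares` variant are hypothetical classes written for the example: the first supplies its own `__reversed__`, the second relies on the `__getitem__` and `__len__` fallback, and a set has neither, so `reversed()` rejects it:

```python
class Countdown:
    """Iterates start, start-1, ..., 1 and defines its own reverse order."""
    def __init__(self, start):
        self.start = start

    def __iter__(self):
        return iter(range(self.start, 0, -1))

    def __reversed__(self):
        # Must return an iterator
        return iter(range(1, self.start + 1))


class Squares:
    """Sequence-style class: reversed() can use __len__ and __getitem__."""
    def __init__(self, length):
        self.length = length

    def __len__(self):
        return self.length

    def __getitem__(self, index):
        if index < 0 or index >= self.length:
            raise IndexError(index)
        return index ** 2


forward = list(Countdown(3))
backward = list(reversed(Countdown(3)))     # uses __reversed__
squares_backward = list(reversed(Squares(4)))  # uses __len__ and __getitem__

try:
    reversed({1, 2, 3})                     # sets define none of these
except TypeError:
    set_is_reversible = False
else:
    set_is_reversible = True

print(forward)           # [3, 2, 1]
print(backward)          # [1, 2, 3]
print(squares_backward)  # [9, 4, 1, 0]
```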

