# Iterators

## Contents ##
1. [Intro: Iterables vs. Iterators](#intro)
2. [A sequential object is an iterable](#sequence)
3. [An iterable and an iterator](#iterable)
4. [A generator](#generator)
5. [Further reading](#further_reading)


<a href="#intro"></a>
## Intro: iterables vs. iterators ##

An iterator is a container object implementing the iterator protocol which is based on two methods:

- `__next__`:    returns the next item of the container

- `__iter__`:    returns the iterator itself

Note: we call an object that implements the `__iter__` method an *iterable*.
An object which also implements `__next__` is called an *iterator*.
I will explain the meaning of these two words below, but first some examples:

Using the built-in function `iter` we can make an iterator out of a sequence (something which implements the sequence protocol, i.e., implements `__getitem__`).

In [1]:
it = iter(('apple', 'banana'))
print(next(it))
print(next(it))
print(next(it))

apple
banana


StopIteration: 

When the iteration reaches the end of the sequence, a `StopIteration` exception is raised. A for loop catches this exception, so we can use iterators in for loops:

In [2]:
it = iter(('apple', 'banana'))
for fruit in it:
    print(fruit)

apple
banana


Of course, in Python we usually write the last code snippet as:

In [3]:
for fruit in ('apple', 'banana'):
    print(fruit)

apple
banana


What actually happens when we put something in a for loop?
Python calls `iter()` on it, which only works if this something is an iterable (note: iterators are iterables, but not the other way round).
For a plain iterable this builds a new iterator `it`; for an iterator, `iter()` simply returns the object itself. Every step of the for loop is then executed by calling `next(it)`.
When it reaches the end, `next(it)` raises a `StopIteration` exception, the loop ends, and the iterator object is released.
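
The steps above can be spelled out by hand. The snippet below is only a rough equivalent of a for loop, not what the interpreter literally executes; the name `collected` is made up for the illustration.

```python
fruits = ('apple', 'banana')

# Roughly what `for fruit in fruits: ...` does behind the scenes
it = iter(fruits)          # ask the iterable for an iterator
collected = []
while True:
    try:
        fruit = next(it)   # fetch the next item
    except StopIteration:  # the iterator is exhausted: leave the loop
        break
    collected.append(fruit)

print(collected)           # ['apple', 'banana']

# Calling iter() on an iterator returns the very same object
print(iter(it) is it)      # True
```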



<a href="#sequence"></a>
## A sequential object is an iterable ##

Enough theory, let's create a simple iterable (a sequence object implementing the `__getitem__` protocol). A sentence can be seen as a sequence of words:

In [19]:
import re
RE_WORD = re.compile(r'\w+')  # raw string, so that \w is not treated as a string escape

class Sentence:
    def __init__(self, text):
        self.text = text
        self.words = RE_WORD.findall(text)

    def __getitem__(self, index):
        return self.words[index]

    def __len__(self):
        return len(self.words)
    
    def __repr__(self):
        return 'Sentence(%s)' % self.text


quote = Sentence("I'm not crazy, my reality is just different than yours.")
print(quote)

for word in quote:
    print(word)

Sentence(I'm not crazy, my reality is just different than yours.)
I
m
not
crazy
my
reality
is
just
different
than
yours


Cool. We wrote a class with 4 methods, all of them underscore ("dunder") methods :-)

A very important and fundamental thing to know: Python operates via protocols. We work here inside the so-called Python data model: https://docs.python.org/3/reference/datamodel.html (<-- most important doc page for Python ;))

The principle is this: whenever we use a top-level function (or syntax), such as initializing an object or asking for a basic representation of an object, there is a corresponding underscore method that implements it.
Calling print(quote) ends up in `Sentence.__repr__` (print asks for `__str__`, which falls back to `__repr__` when we don't define it). If we hadn't provided this method, Python would print something like: `<__main__.Sentence object at 0x03605F90>` (under the hood, Python implements a default `__repr__`, just not as pretty as ours)
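
To make the mapping between top-level calls and underscore methods concrete, here is a small sketch. The class `Basket` is a hypothetical toy class, not part of the examples above.

```python
class Basket:
    """Hypothetical toy class showing which dunder method each call triggers."""

    def __init__(self, items):
        self.items = list(items)

    def __len__(self):
        return len(self.items)

    def __getitem__(self, index):
        return self.items[index]

    def __repr__(self):
        return 'Basket(%r)' % self.items


basket = Basket(['apple', 'banana'])
print(len(basket))    # len(basket)  -> Basket.__len__      -> 2
print(basket[0])      # basket[0]    -> Basket.__getitem__  -> apple
print(repr(basket))   # repr(basket) -> Basket.__repr__
print(str(basket))    # no __str__ defined, so str() falls back to __repr__
```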


<a href="#iterable"></a>
## An iterable and an iterator ##

Back to iterables: the most important method above is of course `__getitem__`. If the interpreter encounters an iteration (our for loop above), it first checks whether the object implements `__iter__` (a 'real' iterable). If not, it falls back to the second best: `__getitem__`, which our Sentence class above implements. It then builds an iterator that fetches the items in order (starting at index 0) until an IndexError is raised. If neither method is available, we get a TypeError.
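
A minimal sketch of this lookup order, using two hypothetical classes (`Countdown` and `Opaque`) that exist only for this illustration:

```python
class Countdown:
    """Hypothetical class with __getitem__ only, no __iter__."""

    def __getitem__(self, index):
        if index >= 3:
            raise IndexError(index)   # signals the end of the sequence
        return 3 - index


# iter() falls back to calling __getitem__ with 0, 1, 2, ...
print(list(Countdown()))   # [3, 2, 1]


class Opaque:
    """Neither __iter__ nor __getitem__: not iterable."""


try:
    iter(Opaque())
except TypeError as exc:
    print('TypeError:', exc)
```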

Let's write a 'genuine' iterable: an object implementing `__iter__`. And also an iterator: an object implementing `__iter__` and `__next__`:

In [2]:
import re
RE_WORD = re.compile(r'\w+')  # raw string, so that \w is not treated as a string escape

class Sentence:
    def __init__(self, text):
        self.text = text
        self.words = RE_WORD.findall(text)

    def __iter__(self):
        return SentenceIterator(self.words)

class SentenceIterator:
    def __init__(self, words):
        self.words = words
        self.index = 0

    def __next__(self):
        try:
            word = self.words[self.index]
        except IndexError:
            raise StopIteration()
        self.index += 1
        return word

    def __iter__(self):
        return self
    
quote = Sentence("I'm not crazy, my reality is just different than yours.")

for word in quote:
    print(word)

I
m
not
crazy
my
reality
is
just
different
than
yours


So the Sentence class implements the iterable protocol: it has an `__iter__` method which instantiates a `SentenceIterator` (handing over a reference to the words) and returns it.

`SentenceIterator` implements the iterator protocol: it has a `__next__` method returning the next element in the sequence. It also needs an `__iter__` method returning itself, so that the iterator can be used wherever an iterable is expected.
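
The practical difference between the two roles can be seen with a plain list, which is an iterable but not an iterator. This is a small sketch with made-up variable names:

```python
words = ['my', 'reality', 'is', 'different']   # an iterable, not an iterator

it1 = iter(words)
it2 = iter(words)
print(it1 is it2)            # False: every iter() call builds a fresh iterator
print(next(it1), next(it1))  # my reality
print(next(it2))             # my  (it2 keeps its own position)

# An iterator returns itself from __iter__, so it works wherever an iterable
# is expected, but it can only be consumed once.
print(list(it1))   # ['is', 'different']
print(list(it1))   # []
```

Keeping the iterator separate from the iterable is what allows several independent passes over the same `Sentence`.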


<a href="#generator"></a>
## A generator ##

We can achieve the same without writing our own iterator class, by using a generator function instead:

In [7]:
import re
RE_WORD = re.compile(r'\w+')  # raw string, so that \w is not treated as a string escape

class Sentence:
    def __init__(self, text):
        self.text = text
        self.words = RE_WORD.findall(text)

    def __iter__(self):
        for word in self.words:
            yield word
            
quote = Sentence("I'm not crazy, my reality is just different than yours.")

for word in quote:
    print(word)

I
m
not
crazy
my
reality
is
just
different
than
yours


... or even simpler:

In [8]:
def __iter__(self):
    return iter(self.words)
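
For completeness, here is a sketch of the whole class with this simpler `__iter__`. It assumes the same `Sentence` design as above and uses a shortened quote; the type name in the last comment is what CPython reports.

```python
import re

RE_WORD = re.compile(r'\w+')


class Sentence:
    def __init__(self, text):
        self.text = text
        self.words = RE_WORD.findall(text)

    def __iter__(self):
        # Delegate to the list's own iterator instead of writing a generator
        return iter(self.words)


quote = Sentence("I'm not crazy")
print(list(quote))                  # ['I', 'm', 'not', 'crazy']
print(type(iter(quote)).__name__)   # list_iterator
```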

<a href="#further_reading"></a>
## Further reading ##

- See also: [Generators](generators.ipynb)