### Iterators and Iterables

Previously we saw that we could create **iterator** objects by simply implementing:

* a `__next__` method that returns the next element in the container
* an `__iter__` method that just returns the object itself (the iterator object)

With those two methods in place, we could use the iterator object in a `for` loop, in list comprehensions, and in fact anywhere an iterable was expected (like `enumerate`, `sorted`, and so on).

However, we had two outstanding issues/questions:
* When we looped over the iterator using a `for` loop (or a comprehension, or any other function that iterates), we saw that `__iter__` was always called first. Why?
* The iterator is exhausted once we have iterated over it fully, which means we have to create a new iterator every time we want to iterate over the collection again. Can we somehow avoid having to remember to do that?

The answers to both of these questions are related.

Let's start by looking at how we might avoid having to create a new instance of the collection every time we want to iterate over it.

After all, we don't need a new instance of the elements, just some way of *resetting* the *current* item.

Let's start with a simple example that has those issues:

In [1]:
class Cities:
    def __init__(self):
        self._cities = ['Paris', 'Berlin', 'Rome', 'Madrid', 'London']
        self._index = 0
    
    def __iter__(self):
        return self
    
    def __next__(self):
        if self._index >= len(self._cities):
            raise StopIteration
        else:
            item = self._cities[self._index]
            self._index += 1
            return item

Now, we have an **iterator** object, but we need to re-create it every time we want to start the iterations from the beginning:

In [2]:
cities = Cities()
list(enumerate(cities))

[(0, 'Paris'), (1, 'Berlin'), (2, 'Rome'), (3, 'Madrid'), (4, 'London')]

In [3]:
cities = Cities()
[item.upper() for item in cities]

['PARIS', 'BERLIN', 'ROME', 'MADRID', 'LONDON']

In [4]:
cities = Cities()
sorted(cities)

['Berlin', 'London', 'Madrid', 'Paris', 'Rome']

In [5]:
sorted(cities)

[]

So, we basically have to "restart" an iterator by **creating a new one each time**.

But in this case, we are also re-creating the underlying data every time - seems wasteful!

Instead, maybe we can split the **iterator** part of our code from the **data** part of our code.

In [6]:
class Cities:
    def __init__(self):
        self._cities = ['New York', 'Newark', 'New Delhi', 'Newcastle']
        
    def __len__(self):
        return len(self._cities)

And let's create our iterator this way:

In [7]:
class CityIterator:
    def __init__(self, city_obj):
        # city_obj is an instance of Cities
        self._city_obj = city_obj
        self._index = 0
        
    def __iter__(self):
        return self
    
    def __next__(self):
        if self._index >= len(self._city_obj):
            raise StopIteration
        else:
            item = self._city_obj._cities[self._index]
            self._index += 1
            return item

So now we can create our `Cities` instance **once**:

In [8]:
cities = Cities()

and create as many iterators as we want, passing each one the same `Cities` instance every time:

In [9]:
iter_1 = CityIterator(cities)

In [10]:
for city in iter_1:
    print(city)

New York
Newark
New Delhi
Newcastle


In [12]:
iter_2 = CityIterator(cities)
[city.upper() for city in iter_2]

['NEW YORK', 'NEWARK', 'NEW DELHI', 'NEWCASTLE']

In [13]:
[city.upper() for city in iter_2]

[]

So, we're almost at a solution now. At least we can create the **iterator** objects without having to recreate the `Cities` object every time.

But we still have to remember to create a new iterator, **and** we can no longer iterate over the `cities` object itself!

In [None]:
for city in cities:
    print(city)

TypeError: 'Cities' object is not iterable

This is where the first question we asked comes into play. Whenever we iterated our iterator, the first thing Python did was call `__iter__`.

In fact, let's just check that again:

In [14]:
class CityIterator:
    def __init__(self, city_obj):
        # city_obj is an instance of Cities
        print('Calling CityIterator __init__')
        self._city_obj = city_obj
        self._index = 0
        
    def __iter__(self):
        print('Calling CityIterator instance __iter__')
        return self
    
    def __next__(self):
        print('Calling __next__')
        if self._index >= len(self._city_obj):
            raise StopIteration
        else:
            item = self._city_obj._cities[self._index]
            self._index += 1
            return item

In [15]:
iter_1 = CityIterator(cities)

Calling CityIterator __init__


In [16]:
for city in iter_1:
    print(city)

Calling CityIterator instance __iter__
Calling __next__
New York
Calling __next__
Newark
Calling __next__
New Delhi
Calling __next__
Newcastle
Calling __next__


#### Iterables

Now we finally come to how an **iterable** is defined in Python.

An **iterable** is an object that:
* implements the `__iter__` method
* and that method returns an **iterator** which can be used to iterate over the object

What would happen if we put an `__iter__` method in the `Cities` object and then try to iterate?

When we try to iterate over the `Cities` instance, Python will first call `__iter__`. The `__iter__` method should then return an **iterator** which Python will use for the iteration.

We actually have everything we need to now make `Cities` an **iterable** since we already have the `CityIterator` created:

In [None]:
class CityIterator:
    def __init__(self, city_obj):
        # city_obj is an instance of Cities
        print('Calling CityIterator __init__')
        self._city_obj = city_obj
        self._index = 0
        
    def __iter__(self):
        print('Calling CityIterator instance __iter__')
        return self
    
    def __next__(self):
        print('Calling __next__')
        if self._index >= len(self._city_obj):
            raise StopIteration
        else:
            item = self._city_obj._cities[self._index]
            self._index += 1
            return item

In [None]:
class Cities:
    def __init__(self):
        self._cities = ['New York', 'Newark', 'New Delhi', 'Newcastle']
        
    def __len__(self):
        return len(self._cities)
    
    def __iter__(self):
        print('Calling Cities instance __iter__')
        return CityIterator(self)

In [None]:
cities = Cities()

In [None]:
for city in cities:
    print(city)

Calling Cities instance __iter__
Calling CityIterator __init__
Calling __next__
New York
Calling __next__
Newark
Calling __next__
New Delhi
Calling __next__
Newcastle
Calling __next__


And watch what happens if we try to run that loop again:

In [None]:
for city in cities:
    print(city)

Calling Cities instance __iter__
Calling CityIterator __init__
Calling __next__
New York
Calling __next__
Newark
Calling __next__
New Delhi
Calling __next__
Newcastle
Calling __next__


A new **iterator** was created when the `for` loop started.

In fact, the same happens for anything that iterates over our iterable: it first calls the `__iter__` method of the iterable to get a **new** iterator, then calls `__next__` on that iterator.

In [None]:
list(enumerate(cities))

Calling Cities instance __iter__
Calling CityIterator __init__
Calling __next__
Calling __next__
Calling __next__
Calling __next__
Calling __next__


[(0, 'New York'), (1, 'Newark'), (2, 'New Delhi'), (3, 'Newcastle')]

In [None]:
sorted(cities, reverse=True)

Calling Cities instance __iter__
Calling CityIterator __init__
Calling __next__
Calling __next__
Calling __next__
Calling __next__
Calling __next__


['Newcastle', 'Newark', 'New York', 'New Delhi']
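
The sequence of calls in the outputs above can also be reproduced by hand. The following is a minimal sketch of roughly what a `for` loop does with an iterable. The `Countdown` and `CountdownIterator` classes and the `manual_for` function are hypothetical stand-ins (not part of the original example), used so the snippet runs on its own:

```python
class CountdownIterator:
    """Hypothetical iterator: counts down from start to 1."""
    def __init__(self, start):
        self._current = start

    def __iter__(self):
        return self

    def __next__(self):
        if self._current <= 0:
            raise StopIteration
        value = self._current
        self._current -= 1
        return value


class Countdown:
    """Hypothetical iterable: returns a fresh iterator on each __iter__ call."""
    def __init__(self, start):
        self._start = start

    def __iter__(self):
        return CountdownIterator(self._start)


def manual_for(iterable):
    """Approximate a for loop that collects every item, using explicit calls."""
    results = []
    iterator = iter(iterable)          # calls iterable.__iter__()
    while True:
        try:
            item = next(iterator)      # calls iterator.__next__()
        except StopIteration:
            break                      # the loop ends without an error
        results.append(item)
    return results


countdown = Countdown(3)
print(manual_for(countdown))   # [3, 2, 1]
print(manual_for(countdown))   # [3, 2, 1] again: a new iterator each time
```

Because `iter()` is called at the start of every pass, the iterable itself never gets exhausted; only the short-lived iterator does.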

Now we can put the iterator class inside the class it serves to keep the code self-contained, first with a small `Squares` example and then with our `Cities` class:

In [34]:
class Squares:
    def __init__(self, length):
        self.length = length

    def __len__(self):
        return self.length

    def __iter__(self):
        return self.SquaresIterator(self)

    class SquaresIterator:
        def __init__(self, square_obj):
            self.index = 0
            self.length = len(square_obj)

        def __iter__(self):
            return self

        def __next__(self):
            if self.index >= self.length:
                raise StopIteration
            else:
                result = self.index ** 2
                self.index += 1
                return result

In [90]:
class Cities:
    def __init__(self):
        self._cities = ['New York', 'London', 'Madrid', 'Paris']

    def __len__(self):  # note: the method is __len__, not __length__
        return len(self._cities)

    def __iter__(self):
        print('calling Cities __iter__ 1')
        return self.CityIterator(self)

    class CityIterator:
        def __init__(self, city_obj):
            print('calling CityIterator __init__ 2')
            self.city_obj = city_obj
            self.index = 0
            self.length = len(self.city_obj)

        def __iter__(self):
            return self

        def __next__(self):
            print('calling CityIterator __next__ 3')
            if self.index >= self.length:
                raise StopIteration
            else:
                result = self.city_obj._cities[self.index]
                self.index += 1
                return result



In [91]:
cities = Cities()

In [94]:
for i in range(3):
    cities.CityIterator(cities)

calling CityIterator __init__ 2
calling CityIterator __init__ 2
calling CityIterator __init__ 2


In [92]:
for item in cities:
    print(item)

calling Cities __iter__ 1
calling CityIterator __init__ 2
calling CityIterator __next__ 3
New York
calling CityIterator __next__ 3
London
calling CityIterator __next__ 3
Madrid
calling CityIterator __next__ 3
Paris
calling CityIterator __next__ 3


In [66]:
del CityIterator  # just to make sure CityIterator is not in our global scope

### The Iterator Protocol

In [35]:
class Cities:
    def __init__(self):
        self._cities = ['New York', 'Newark', 'New Delhi', 'Newcastle']
        
    def __len__(self):
        return len(self._cities)
    
    def __iter__(self):
        print('Calling Cities instance __iter__ 1')
        return self.CityIterator(self)
    
    class CityIterator:
        def __init__(self, city_obj):
            # city_obj is an instance of Cities
            print('Calling CityIterator __init__ 2')
            self._city_obj = city_obj
            self._index = 0

        def __iter__(self):
            print('Calling CityIterator instance __iter__')
            return self

        def __next__(self):
            print('Calling __next__ 3')
            if self._index >= len(self._city_obj):
                raise StopIteration
            else:
                item = self._city_obj._cities[self._index]
                self._index += 1
                return item

In [36]:
cities = Cities()

In [37]:
for i in cities:
    print(i)

Calling Cities instance __iter__ 1
Calling CityIterator __init__ 2
Calling __next__ 3
New York
Calling __next__ 3
Newark
Calling __next__ 3
New Delhi
Calling __next__ 3
Newcastle
Calling __next__ 3


In [38]:
[city for city in cities]

Calling Cities instance __iter__ 1
Calling CityIterator __init__ 2
Calling __next__ 3
Calling __next__ 3
Calling __next__ 3
Calling __next__ 3
Calling __next__ 3


['New York', 'Newark', 'New Delhi', 'Newcastle']

In [33]:
list(enumerate(cities))

Calling Cities instance __iter__ 1
Calling CityIterator __init__ 2
Calling __next__ 3
Calling __next__ 3
Calling __next__ 3
Calling __next__ 3
Calling __next__ 3


[(0, 'New York'), (1, 'Newark'), (2, 'New Delhi'), (3, 'Newcastle')]

Technically we can even get an iterator instance ourselves directly, by calling `iter()` on the `cities` object:

In [39]:
iter_1 = iter(cities)
iter_2 = iter(cities)

Calling Cities instance __iter__ 1
Calling CityIterator __init__ 2
Calling Cities instance __iter__ 1
Calling CityIterator __init__ 2


As you can see, Python created and returned two different instances of the `CityIterator` object.

In [None]:
id(iter_1), id(iter_2)

(1741231353928, 1741231354320)
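
Since the two iterators are separate objects, each keeps track of its own position. Here is a small sketch of that independence. The `Letters` class is a hypothetical example, and it takes a shortcut the `Cities` class does not: its `__iter__` simply hands back the built-in list iterator instead of a custom iterator class:

```python
class Letters:
    """Hypothetical iterable over a fixed list of letters."""
    def __init__(self):
        self._letters = ['a', 'b', 'c']

    def __iter__(self):
        # iter() on a list returns a new list iterator on every call
        return iter(self._letters)


letters = Letters()
iter_a = iter(letters)
iter_b = iter(letters)

first_a = next(iter_a)     # advances iter_a only
second_a = next(iter_a)
first_b = next(iter_b)     # iter_b still starts from the beginning

print(first_a, second_a, first_b)   # a b a
print(iter_a is iter_b)             # False
```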

And now we should also understand why **iterators** implement the `__iter__` method too (which just returns the iterator itself): it makes them **iterables** as well!
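
To see what goes wrong without it, here is a minimal sketch using two hypothetical classes (neither is from the example above): `BareCounter` defines only `__next__`, while `Counter` adds an `__iter__` that returns `self`:

```python
class BareCounter:
    """Hypothetical class with __next__ but no __iter__."""
    def __init__(self, limit):
        self._limit = limit
        self._current = 0

    def __next__(self):
        if self._current >= self._limit:
            raise StopIteration
        self._current += 1
        return self._current


class Counter(BareCounter):
    """Same behaviour, plus an __iter__ that returns the iterator itself."""
    def __iter__(self):
        return self


bare = BareCounter(3)
print(next(bare))            # 1, next() only needs __next__

try:
    for n in bare:           # a for loop needs __iter__, which is missing
        print(n)
except TypeError as ex:
    print('TypeError:', ex)

print(list(Counter(3)))      # [1, 2, 3]
```

With `__iter__` in place, the iterator can be passed to `for`, `list`, `sorted`, and anything else that expects an iterable.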

#### Mixing Iterables and Sequences

`Cities` is an iterable, but it is not a sequence type:

In [40]:
cities = Cities()

In [41]:
len(cities)

4

In [42]:
cities[1]

TypeError: 'Cities' object is not subscriptable

Since our Cities **could** also be a sequence, we could also decide to implement the `__getitem__` method to make it into a sequence:

In [43]:
class Cities:
    def __init__(self):
        self._cities = ['New York', 'Newark', 'New Delhi', 'Newcastle']
        
    def __len__(self):
        return len(self._cities)
    
    def __getitem__(self, s):
        print('getting item...')
        return self._cities[s]
    
    def __iter__(self):
        print('Calling Cities instance __iter__')
        return self.CityIterator(self)
    
    class CityIterator:
        def __init__(self, city_obj):
            # city_obj is an instance of Cities
            print('Calling CityIterator __init__')
            self._city_obj = city_obj
            self._index = 0

        def __iter__(self):
            print('Calling CityIterator instance __iter__')
            return self

        def __next__(self):
            print('Calling __next__')
            if self._index >= len(self._city_obj):
                raise StopIteration
            else:
                item = self._city_obj._cities[self._index]
                self._index += 1
                return item

In [44]:
cities = Cities()

It's a sequence:

In [45]:
cities[0]

getting item...


'New York'

It's also an iterable:

In [46]:
next(iter(cities))

Calling Cities instance __iter__
Calling CityIterator __init__
Calling __next__


'New York'

Now that Cities is both a sequence type (`__getitem__`) and an iterable (`__iter__`), when we loop over `cities`, is Python going to use `__getitem__` or `__iter__`?

In [47]:
cities = Cities()
for city in cities:
    print(city)

Calling Cities instance __iter__
Calling CityIterator __init__
Calling __next__
New York
Calling __next__
Newark
Calling __next__
New Delhi
Calling __next__
Newcastle
Calling __next__


It uses `__iter__`. Python uses the iterator returned by `__iter__` if that method is defined, and otherwise falls back to `__getitem__`. If neither is implemented, we get a `TypeError`.

Of course, for selection by index or slice, the `__getitem__` method **must** be implemented.

We'll come back to this very topic in an upcoming video, because behind the scenes, even if we only implement the `__getitem__` method, Python will auto-generate an iterator for us!
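
As a brief preview of that fallback, here is a minimal sketch with a hypothetical `Planets` class that implements `__getitem__` but no `__iter__`. Python can still iterate over it by calling `__getitem__` with 0, 1, 2, and so on until an `IndexError` is raised:

```python
class Planets:
    """Hypothetical sequence-like class with __getitem__ but no __iter__."""
    def __init__(self):
        self._planets = ['Mercury', 'Venus', 'Earth']

    def __getitem__(self, index):
        # Raises IndexError past the end, which ends the fallback iteration
        return self._planets[index]


planets = Planets()
print('__iter__' in dir(planets))    # False
print([p for p in planets])          # ['Mercury', 'Venus', 'Earth']
```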

### Python Built-In Iterables and Iterators

The way iterables and iterators work in our custom `Cities` example is exactly the way Python iterables work too.

In [95]:
l = [1, 2, 3]

Since lists are iterables, they implement the `__iter__` method and we can get an **iterator** for the list:

In [96]:
for item in l:
    print(item)

1
2
3


In [97]:
[i**2 for i in l]

[1, 4, 9]

In [98]:
'__iter__' in dir(l)

True

In [99]:
'__next__' in dir(l)

False

In [100]:
iter_l = iter(l)
# or could use iter_l = l.__iter__()

In [101]:
'__iter__' in dir(iter_l)

True

In [102]:
'__next__' in dir(iter_l)

True

In [50]:
type(iter_l)

list_iterator

In [103]:
for i in iter_l:
    print(i)

1
2
3


In [104]:
[i for i in iter_l]

[]

In [51]:
iter_l = iter(l)  # fresh iterator, since the previous one is exhausted
next(iter_l)

1

In [52]:
next(iter_l)

2

In [53]:
next(iter_l)

3

In [54]:
next(iter_l)

StopIteration: 

See? The same `StopIteration` exception is raised.
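
When calling `next` by hand, that exception has to be dealt with somehow. The sketch below shows two options under simple assumptions: catching `StopIteration` explicitly, or passing a default value as the second argument to the built-in `next`. The variable names are only illustrative:

```python
numbers = [1, 2, 3]
iter_numbers = iter(numbers)

collected = []
while True:
    try:
        collected.append(next(iter_numbers))
    except StopIteration:
        break                      # the iterator is exhausted

print(collected)                   # [1, 2, 3]

# An exhausted iterator keeps raising StopIteration;
# supplying a default value to next() avoids the exception
print(next(iter_numbers, 'no more items'))   # no more items
```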

Since `iter_l` is an iterator, it also implements the `__iter__` method, which just returns the iterator itself:

In [55]:
id(iter_l), id(iter(iter_l))

(140498645786240, 140498645786240)

In [56]:
'__next__' in dir(iter_l)

True

In [57]:
'__iter__' in dir(iter_l)

True

Since the list `l` is an iterable it also implements the `__iter__` method:

In [None]:
'__iter__' in dir(l)

True

but does not implement a `__next__` method:

In [None]:
'__next__' in dir(l)

False

Of course, since lists are also sequence types, they also implement the `__getitem__` method:

In [None]:
'__getitem__' in dir(l)

True

Sets and dictionaries on the other hand are not sequence types:

In [None]:
'__getitem__' in dir(set)

False

In [None]:
'__iter__' in dir(set)

True

In [None]:
s = {1, 2, 3}
'__next__' in dir(iter(s))

True

In [None]:
'__iter__' in dir(dict)

True
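
Instead of inspecting `dir()`, the same distinctions can be checked with the abstract base classes in the standard library's `collections.abc` module. This is just an alternative sketch, and the `samples` dictionary is an illustrative name:

```python
from collections.abc import Iterable, Iterator, Sequence

samples = {
    'list': [1, 2, 3],
    'list iterator': iter([1, 2, 3]),
    'set': {1, 2, 3},
    'dict': {'a': 1},
}

# Columns: name, is iterable?, is iterator?, is sequence?
for name, obj in samples.items():
    print(name,
          isinstance(obj, Iterable),
          isinstance(obj, Iterator),
          isinstance(obj, Sequence))

# list True False True
# list iterator True True False
# set True False False
# dict True False False
```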

But what does the iterator for a dictionary actually return? What does it iterate over? You can probably already guess the answer to that one!

In [None]:
d = dict(a=1, b=2, c=3)

In [None]:
iter_d = iter(d)

In [None]:
next(iter_d)

'a'

Dictionary iterators will iterate over the **keys** of the dictionary.

To iterate over the values, we could use the `values()` method which returns an **iterable** over the values of the dictionary:

In [None]:
iter_vals = iter(d.values())

In [None]:
next(iter_vals)

1

And to iterate over both the keys and values, dictionaries provide an `items()` iterable:

In [None]:
iter_items = iter(d.items())

In [None]:
next(iter_items)

('a', 1)

Here we get an iterator over (key, value) tuples.
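
The three forms of dictionary iteration can be compared side by side. A short sketch, using the same dictionary contents as above and relying on dictionaries preserving insertion order (Python 3.7+):

```python
d = dict(a=1, b=2, c=3)

keys = [key for key in d]                        # iterating the dict yields keys
values = [value for value in d.values()]         # values only
pairs = [(key, value) for key, value in d.items()]   # (key, value) tuples

print(keys)     # ['a', 'b', 'c']
print(values)   # [1, 2, 3]
print(pairs)    # [('a', 1), ('b', 2), ('c', 3)]
```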

We'll examine the usefulness of being able to iterate using `next` instead of a `for` loop, or comprehension, in the next video.