### Iterables and Iterators

As we discussed in the lecture, an iterable is an object (typically a collection of objects) that implements the `__iter__` method.

This method returns a new **iterator**, i.e. an object that implements `__next__`, which can then be used to iterate over the iterable. Iterators are also iterables in that they also implement `__iter__`, but they just return themselves.
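To make the protocol concrete, here is a minimal sketch of a hand-written iterable and its iterator. The `Countdown` and `CountdownIterator` classes are hypothetical names made up for this illustration, not something from the standard library:

```python
class Countdown:
    """Hypothetical iterable: counts down from `start` to 1."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        # Each call hands out a brand-new iterator
        return CountdownIterator(self.start)


class CountdownIterator:
    """Hypothetical iterator for Countdown: tracks its own position."""

    def __init__(self, current):
        self.current = current

    def __iter__(self):
        # Iterators are iterables too: they just return themselves
        return self

    def __next__(self):
        if self.current <= 0:
            # Signal that there is nothing left to hand out
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


c = Countdown(3)
first_pass = list(c)    # uses one iterator
second_pass = list(c)   # uses another, fresh iterator
it = iter(c)
```

Because `Countdown.__iter__` builds a new iterator on every call, the iterable can be traversed as many times as we like, whereas each `CountdownIterator` can only be consumed once.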

Let's see this with lists.

In [1]:
l = [1, 2, 3]

In [2]:
iterator = iter(l)

In [3]:
type(iterator)

list_iterator

Now that we have an iterator, we can use it to iterate over our list:

In [4]:
next(iterator)

1

In [5]:
next(iterator)

2

As you can see, the iterator is keeping track of where we're at in the iteration.

In [6]:
next(iterator)

3

Now, we've retrieved all the elements, so what happens when we request the next one?

In [7]:
next(iterator)

StopIteration: 

We get a `StopIteration` exception.

And we can keep calling `next()` as many times as we want - we'll still get that `StopIteration` exception:

In [8]:
next(iterator)

StopIteration: 

As you can see, we cannot iterate over that list again using that iterator - it has been **exhausted**.

If we want to iterate over the list again, we have to get a new iterator:

In [9]:
iterator = iter(l)

And now we can use `next()` on that new iterator:

In [10]:
next(iterator)

1
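As an aside, the built-in `next()` also accepts an optional second argument: a default value that is returned instead of raising `StopIteration` once the iterator is exhausted. A small sketch (the `letters` list is just made-up sample data):

```python
letters = ['a', 'b']
letters_iter = iter(letters)

# Ask for more elements than exist; the default fills in after exhaustion
values = [next(letters_iter, None) for _ in range(4)]
print(values)
```

This can be handy when we only want to peek at the first element of something and don't want to wrap the call in a `try` block.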

When we run a `for` loop such as this:

In [11]:
l = [1, 2, 3, 4, 5]

for element in l:
    print(element)

1
2
3
4
5


Python is actually requesting a new iterator from the list and calling `next` on that iterator at each iteration of the loop.

But you'll notice we don't get a `StopIteration` exception - the loop just terminates and code resumes after that.

We can mimic what's happening internally this way:

In [12]:
l = [1, 2, 3, 4, 5]

iterator = iter(l)

try:
    while True:
        element = next(iterator)
        print(element)
except StopIteration:
    # done iterating, do nothing: silence error
    pass

1
2
3
4
5


So why am I mentioning this? It may be interesting to understand how Python does iteration fundamentally, but what has that got to do with us?

Often in Python, the objects we get back from calling some functions are not re-usable iterables, but just iterators.

In other words, they are not re-usable - in the sense that they become exhausted, and we cannot re-use them to iterate from the beginning.

Why do these types of objects even exist? Basically for performance reasons.

As we'll see later when we study generators, it is possible to have an iterator that allows us to iterate over a "virtual" collection - in the sense that the next element is calculated and returned on request. No memory is wasted in holding all the elements, and the up-front computational cost of calculating all the elements is avoided - they are generated one at a time, and doled out one at a time.

This technique of calculating and doling out elements one at a time is called **lazy iteration**. Iterators are generally lazy, and some iterables can be lazy too.
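As a rough preview of lazy iteration, here is a minimal sketch using a generator function. The `squares` function and the `computed` list are hypothetical, added only so we can observe when the work actually happens:

```python
computed = []  # records which inputs have actually been processed


def squares(n):
    """Hypothetical generator: lazily yields 0**2, 1**2, ..., (n-1)**2."""
    for i in range(n):
        computed.append(i)  # work happens only when an element is requested
        yield i ** 2


sq = squares(1_000_000)   # nothing has been calculated yet
first = next(sq)
second = next(sq)
print(first, second, computed)
```

Even though we asked for a million squares, only the two we actually requested were calculated.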

A simple example of a lazy iterable is the `range` object.

When we create a range object, the integers in that range are not actually "materialized" - instead an iterator is used to keep track of where the iteration is at, calculating and returning the "next" object when requested.

In [13]:
r = range(10)

To see what is in that range object, we can iterate through it using an iterator:

In [14]:
r_iter = iter(r)

In [15]:
next(r_iter)

0

In [16]:
next(r_iter)

1

We can even call the `list()` function, passing it the iterator - this function will iterate over each element and create a list from that:

In [17]:
list(r_iter)

[2, 3, 4, 5, 6, 7, 8, 9]

But as you can see, it's missing the first two elements (`0` and `1`), because we had already consumed them from the iterator by calling `next()` twice.
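Note that it is the iterator that gets used up, not the range itself. A quick sketch, repeating the same steps and then going back to the `range` object:

```python
r = range(10)
r_iter = iter(r)

next(r_iter)               # consumes 0
next(r_iter)               # consumes 1
remaining = list(r_iter)   # whatever the iterator has left
fresh = list(r)            # list() requests a new iterator from the range

print(remaining)
print(fresh)
print(iter(r) is r)        # a range is an iterable, not its own iterator
```

So `range` is lazy, but it is still a re-usable iterable: every time we ask it for an iterator, we start again from the beginning.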

The advantage of this approach is that when we write something like this:

In [18]:
r = range(100_000_000)

The object is created almost instantly, with very little memory overhead!
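We can get a rough feel for this with `sys.getsizeof`. This is only a sketch: the exact numbers depend on the Python implementation and platform (I'm assuming CPython here), and for a list `getsizeof` only counts the list's own storage, not the integer objects it points to. The `small_list` name is just for this comparison:

```python
import sys

r = range(100_000_000)
small_list = list(range(1_000))   # a list 100,000 times shorter

range_size = sys.getsizeof(r)
list_size = sys.getsizeof(small_list)
print(range_size, list_size)

# The range can still answer questions without materializing anything
n = len(r)
middle = r[50_000_000]
has_last = 99_999_999 in r
print(n, middle, has_last)
```

Even a list of just `1,000` integers takes more space than the range object describing `100,000,000` of them, since the range only needs to store its start, stop and step.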

We save on memory space, as well as possibly unnecessary computations:

In [19]:
for i in range(100_000_000):
    print(i)
    if i > 4:
        break

0
1
2
3
4
5


Here we did a `break` after the first `6` elements - so no memory was wasted creating a collection with `100,000,000` integers, and we did not waste the time needed to calculate those integers either - just the first `6`.

Let's see some timing differences:

In [20]:
from time import perf_counter

In [21]:
start = perf_counter()
l = range(100_000_000)
end = perf_counter()
print(f'elapsed: {end - start}')

elapsed: 8.13000078778714e-05


Watch what happens when we turn that range into a list - which means Python has to allocate memory to store all `100,000,000` elements, as well as calculate them:

In [22]:
from time import perf_counter
start = perf_counter()
l = list(range(100_000_000))
end = perf_counter()
print(f'elapsed: {end - start}')

elapsed: 3.2889801999990596


This list is taking up a huge amount of memory right now, so I'm going to delete it and let Python clean up the memory. (You can probably see how much memory is being used on your system before and after doing this, using tools specific to your OS.)

In [23]:
del l

Let's look at an example of an iterator - the `enumerate()` function returns an iterator, not a re-usable iterable - so it can only be iterated over once.

In [24]:
enum = enumerate('abc')

In [25]:
list(enum)

[(0, 'a'), (1, 'b'), (2, 'c')]

This iterated over `enum`, and since it is an iterator, it is now exhausted:

In [26]:
list(enum)

[]

To re-run the iteration we would have to create a new enumeration:

In [27]:
list(enumerate('abc'))

[(0, 'a'), (1, 'b'), (2, 'c')]

Iterators support both the `iter()` and `next()` functions, as we can see with `enumerate`:

In [28]:
enum = enumerate('abc')

In [29]:
iter(enum) is enum

True

For iterators, `iter()` just returns the iterator itself (since it's already an iterator).
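We can use that property to check whether an object we were handed is a one-shot iterator or a re-usable iterable. Here is a small sketch; the `is_iterator` helper and the `samples` dictionary are hypothetical names for this illustration, while `collections.abc.Iterator` is the standard library's abstract base class for iterators:

```python
from collections.abc import Iterator


def is_iterator(obj):
    """Hypothetical helper: True if obj is an iterator (returns itself from iter())."""
    return isinstance(obj, Iterator) and iter(obj) is obj


samples = {
    'list': [1, 2, 3],
    'range': range(3),
    'str': 'abc',
    'enumerate': enumerate('abc'),
    'zip': zip('abc', 'xyz'),
    'map': map(str.upper, 'abc'),
    'reversed': reversed([1, 2, 3]),
}

results = {name: is_iterator(obj) for name, obj in samples.items()}
for name, flag in results.items():
    print(f'{name}: {flag}')
```

Anything that reports `True` here can only be iterated over once, so if we need several passes we have to either re-create it or store its elements in a list first.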

The `enum` iterator also supports the `next()` function, and raises a `StopIteration` exception once all elements have been iterated over:

In [30]:
next(enum)

(0, 'a')

In [31]:
next(enum)

(1, 'b')

In [32]:
next(enum)

(2, 'c')

In [33]:
next(enum)

StopIteration: 