# Iteration in Python


## The StopIteration exception

When Python generators are described, it's common to say that they signal when they are exhausted (meaning finished or empty) by raising a **StopIteration** exception. It might seem reasonable to assume that the **StopIteration** exception is what actually breaks Python out of a loop, and that we could raise it manually to do the same. Does this work?

### StopIteration in for loops

In [1]:
for x in range(10):
    print(x)
    if x == 7:
        raise StopIteration("Trying to manually escape a for loop")
print("This statement is outside the loop")

0
1
2
3
4
5
6
7


StopIteration: Trying to manually escape a for loop

#### Results
Here is what I see on my machine when I run the code above.
```python traceback
---------------------------------------------------------------------------
StopIteration                             Traceback (most recent call last)
Cell In[1], line 4
      2     print(x)
      3     if x == 7:
----> 4         raise StopIteration("Trying to manually escape a for loop")
      5 print("This statement is outside the loop")

StopIteration: Trying to manually escape a for loop
```

#### Why doesn't this work?

Raising a **StopIteration** exception inside the body of a **for** loop doesn't break out of the loop, because the body of the loop is not part of the iterator. The Python interpreter only treats **StopIteration** specially when it comes out of the iterator's `__next__()` call, so inside the loop body it's just a regular uncaught exception.

In the example **for** loop above, it's the **range()** object that's being iterated over, or more precisely the iterator returned by calling `iter(range(10))`. The body of the **for** loop is just regular Python code.
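
To make that division of labor concrete, here is a rough sketch of what a **for** loop does behind the scenes, written out as a **while** loop. It is an approximation for illustration, not the interpreter's actual implementation, and the names `iterator` and `seen` are made up for this example. Note that the only place **StopIteration** is caught is around the `next()` call; nothing wraps the loop body.

```python
# A rough hand-written equivalent of:
#     for x in range(3):
#         seen.append(x)
seen = []
iterator = iter(range(3))      # the for loop calls iter() once, up front
while True:
    try:
        x = next(iterator)     # StopIteration is only handled here
    except StopIteration:
        break                  # the iterator is exhausted, so the loop ends
    # Loop body: ordinary code, with no special exception handling around it
    seen.append(x)

print(seen)
```

A **StopIteration** raised where `seen.append(x)` sits would be outside the `try` block, which is why it escapes as an ordinary exception.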

### StopIteration in generators

So we can't use **StopIteration** exceptions to break out of **for** loops, because the loop body isn't an iterator. But Python generators create iterators. Can we use them there?

In [2]:
def my_generator():
    i = 0
    while True:
        yield i
        i += 1
        if i == 7:
            raise StopIteration("Trying to escape a generator")

In [3]:
for x in my_generator():
    print(x)

0
1
2
3
4
5
6


RuntimeError: generator raised StopIteration

#### Results
Here is the output of the above code on my machine.

```python traceback
---------------------------------------------------------------------------
StopIteration                             Traceback (most recent call last)
Cell In[193], line 7
      6 if i == 7:
----> 7     raise StopIteration("Trying to escape a generator")

StopIteration: Trying to escape a generator

The above exception was the direct cause of the following exception:

RuntimeError                              Traceback (most recent call last)
Cell In[194], line 1
----> 1 for x in my_generator():
      2     print(x)

RuntimeError: generator raised StopIteration
```

This **RuntimeError** is the result of [PEP 479](https://peps.python.org/pep-0479/), whose behavior has been the default since Python 3.7. Very briefly: when code inside a generator raises **StopIteration**, it is usually a bug, such as an unguarded call to `next()` on an exhausted iterator. If the generator silently ended whenever that happened, the highly visible problem of an unhandled exception would become a mostly invisible one, so Python converts the exception into a **RuntimeError** instead.
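
For comparison, the supported way to end a generator early is a plain `return` (or simply reaching the end of the function body). The sketch below reworks `my_generator` along those lines; the name `my_finite_generator` is invented for this example.

```python
def my_finite_generator():
    i = 0
    while True:
        yield i
        i += 1
        if i == 7:
            # A bare return ends the generator cleanly; Python raises
            # StopIteration on our behalf and the caller's loop just stops.
            return


print(list(my_finite_generator()))
```

This prints the values 0 through 6 and finishes without any exception reaching the caller.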

### StopIteration in custom iterable classes

So, if we can't manually use **StopIteration** exceptions inside loop bodies or generators, where can we use them?  Where do we implement the Python iterator protocol in the most manual possible way?

Classes can define special methods to participate in the iterator protocol, and that's where you'd raise **StopIteration** exceptions yourself.

In [4]:
class CustomIterable:

    def __init__(self):
        self.i = 0

    def __iter__(self):
        return self

    def __next__(self):
        current_value = self.i
        self.i += 1
        if current_value >= 8:
            raise StopIteration("Escaping custom iterable with an exception")
        return current_value

In [5]:
for x in CustomIterable():
    print(x)

0
1
2
3
4
5
6
7
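
To see the exception that the **for** loop quietly absorbs, an iterator like this can be driven by hand with `next()`. The sketch below repeats a shortened version of the class so that it runs on its own; the name `ShortIterable` and the cutoff of 2 are made up for this example.

```python
class ShortIterable:
    """Same shape as CustomIterable above, but it stops after two values."""

    def __init__(self):
        self.i = 0

    def __iter__(self):
        return self

    def __next__(self):
        current_value = self.i
        self.i += 1
        if current_value >= 2:
            raise StopIteration("Escaping custom iterable with an exception")
        return current_value


it = ShortIterable()
first = next(it)
second = next(it)
try:
    next(it)               # the third call has nothing left to return
except StopIteration as e:
    message = str(e)       # a for loop would swallow this silently

print(first, second, message)
```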


## Range objects


In [6]:
my_range = range(10)

In [7]:
print(type(my_range))

<class 'range'>


In [8]:
print(my_range)

range(0, 10)


In [9]:
print(list(my_range))

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]


In [10]:
print(my_range[2])

2


In [11]:
import collections.abc

In [12]:
print(f"Are ranges iterables? {isinstance(
    my_range, collections.abc.Iterable)}")
print(f"Are ranges iterators?  {isinstance(
    my_range, collections.abc.Iterator)}")

Are ranges iterables? True
Are ranges iterators?  False


In [13]:
print(f"Does calling iter() on a range return the same object each time? {
      iter(my_range) is iter(my_range)}")

Does calling iter() on a range return the same object each time? False


In [14]:
my_generator_expression = (x for x in range(10))

In [15]:
print(type(my_generator_expression))

<class 'generator'>


In [16]:
try:
    print(my_generator_expression[0])
except TypeError as e:
    print(e)

'generator' object is not subscriptable


In [17]:
print(f"Are generators iterables? {isinstance(
    my_generator_expression, collections.abc.Iterable)}")
print(f"Are generators iterators?  {isinstance(
    my_generator_expression, collections.abc.Iterator)}")

Are generators iterables? True
Are generators iterators?  True


In [18]:
print(f"Does calling iter() on a generator return the same object each time? {
      iter(my_generator_expression) is iter(my_generator_expression)}")

Does calling iter() on a generator return the same object each time? True


In [19]:
print(f"Does calling iter() on a generator just return itself? {
      iter(my_generator_expression) is my_generator_expression}")

Does calling iter() on a generator just return itself? True


## Where does this make a difference?


This matters because it changes what it means to pass the objects around. In particular, it matters any time you want to stop iterating at one condition and then continue under a different one, such as finding a matching value in a list or string, or finding the first warm day after a freeze in temperature data.


Imagine that you wanted to find the positions of the matching quotation marks in the following string. You could iterate until you find the first quotation mark, and then continue iterating until you find the second one.


In [20]:
some_pangrams = """Two common examples of pangrams are "The quick red fox jumps over the lazy brown dog." and "Sphynx of black quartz, judge my vow." """
print(some_pangrams)

Two common examples of pangrams are "The quick red fox jumps over the lazy brown dog." and "Sphynx of black quartz, judge my vow." 


Let us try with generator expressions and with range objects directly.

### Iterating with a generator expression

In [21]:
def find_quote_with_generator_expression(quote_string=some_pangrams, starting_position=0):
    quote_start_pos = None
    quote_end_pos = None
    pos_generator = (i for i in range(starting_position, len(quote_string)))
    for possible_start_pos in pos_generator:
        if quote_string[possible_start_pos] == '"':
            quote_start_pos = possible_start_pos
            break
    # We have found the first quotation mark.  Now let us continue iterating until we find the next one.
    for possible_end_pos in pos_generator:
        # Uncomment the following print statement if you want a step-by-step view of the results
        # print(f"checking {possible_end_pos}, {quote_string[possible_end_pos]}")
        if quote_string[possible_end_pos] == '"':
            quote_end_pos = possible_end_pos
            break
    return (quote_start_pos, quote_end_pos)

In [22]:
print(find_quote_with_generator_expression())

(36, 85)


This seems to work the way that we might expect.  The second **for** loop continues the iteration.

### Iterating with range objects

In [23]:
def find_quote_with_range(quote_string=some_pangrams, starting_position=0):
    quote_start_pos = None
    quote_end_pos = None
    pos_range = range(starting_position, len(quote_string))
    for possible_start_pos in pos_range:
        if quote_string[possible_start_pos] == '"':
            quote_start_pos = possible_start_pos
            break
    # We have found the first quotation mark.  Now let us continue iterating until we find the next one.
    for possible_end_pos in pos_range:
        # Uncomment the following print statement if you want a step-by-step view of the results
        # print(f"checking {possible_end_pos}, {quote_string[possible_end_pos]}")
        if quote_string[possible_end_pos] == '"':
            quote_end_pos = possible_end_pos
            break
    return (quote_start_pos, quote_end_pos)

In [24]:
print(find_quote_with_range())

(36, 36)


This doesn't work the way that we might expect. The function finds the first quotation mark both times, because each **for** loop starts again from the beginning of the range.

This might seem counterintuitive.  In the first version we pass the same generator to both **for** loops and it works.  In the second version we pass the same **range()** object to both **for** loops and it doesn't work.  What's the difference?

### Explaining the difference between using range and generator objects

The two functions above look similar but behave differently because Python's **for** loop doesn't iterate directly over the object you pass to the **in** clause. Instead, it calls **iter()** on that object and iterates over whatever it returns.

When **iter()** is called on a generator, or on any other iterator, it returns that same object. Each time **iter()** is called on a range object, it returns a new iterator that starts from the beginning.
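
The following minimal sketch shows that difference using `next()` directly instead of **for** loops. All variable names here are made up for the illustration.

```python
numbers = range(5)

# Each iter() call on a range builds a fresh, independent iterator.
first_iter = iter(numbers)
second_iter = iter(numbers)
a = [next(first_iter), next(first_iter)]   # advances only first_iter
b = next(second_iter)                      # still starts at the beginning

# iter() on a generator hands back the generator itself, so state is shared.
gen = (n for n in numbers)
gen_alias = iter(gen)
c = [next(gen), next(gen)]
d = next(gen_alias)                        # continues where gen left off

print(a, b, c, d)
```

Here `b` is 0 because the second range iterator is independent of the first, while `d` is 2 because `gen_alias` and `gen` are the same object.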

### Can we fix the range-based function?

To fix the range-based version of the function, we need to make it more like the generator-based one.

1.  We need some object to hold the state of the iteration.
2.  We need both **for** loops to iterate over that object.
3.  **for** loops obtain the object that they iterate over by calling **iter()** on the object in their **in** clause.
4.  Therefore, to control what **for** loops iterate over, we need to control what the **iter()** function returns.
5.  Iterators return themselves when **iter()** is called on them.
6.  Therefore, an iterator object can be used to preserve the state of iteration across loops.

In [25]:
def find_quote_with_range_iterator(quote_string=some_pangrams, starting_position=0):
    quote_start_pos = None
    quote_end_pos = None
    pos_range = range(starting_position, len(quote_string))
    pos_range_iterator = iter(pos_range)
    for possible_start_pos in pos_range_iterator:
        if quote_string[possible_start_pos] == '"':
            quote_start_pos = possible_start_pos
            break
    # We have found the first quotation mark.  Now let us continue iterating until we find the next one.
    for possible_end_pos in pos_range_iterator:
        # Uncomment the following print statement if you want a step-by-step view of the results
        # print(f"checking {possible_end_pos}, {quote_string[possible_end_pos]}")
        if quote_string[possible_end_pos] == '"':
            quote_end_pos = possible_end_pos
            break
    return (quote_start_pos, quote_end_pos)

In [26]:
print(find_quote_with_range_iterator())

(36, 85)
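
The same shared-iterator idea can be written more compactly. The version below is only a sketch of one possible alternative, not a replacement for the function above: the name `find_quote_positions` is hypothetical, and it relies on the two-argument form of `next()`, which returns the default instead of raising **StopIteration**.

```python
def find_quote_positions(text):
    """Return the positions of the first two quotation marks, or None if missing."""
    positions = iter(range(len(text)))
    # Both generator expressions pull from the same iterator, so the second
    # search resumes right after the position where the first one stopped.
    start = next((i for i in positions if text[i] == '"'), None)
    end = next((i for i in positions if text[i] == '"'), None)
    return (start, end)


print(find_quote_positions('say "hi" now'))
```

If `positions` were the range itself rather than `iter(range(...))`, both searches would start from the beginning and return the same position.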


## What about generator functions?

Each call to a generator function returns a new generator iterator.

This is more visible when using Python type hints, since a generator function that **yield**s, say, integers, doesn't return any integers.  It returns an iterator that produces them.

In [27]:
def my_generator_function():
    while True:
        yield 7

In [28]:
print(type(my_generator_function))

<class 'function'>


In [29]:
my_generator_function_result = my_generator_function()

In [30]:
type(my_generator_function_result)

generator

In [31]:
my_generator_iterator_result = next(my_generator_function_result)
print(f"The iterator produced {my_generator_iterator_result} of type {
      type(my_generator_iterator_result)}")

The iterator produced 7 of type <class 'int'>
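
As a small illustration of the type-hint point, the sketch below annotates a generator function with `Iterator[int]` from `collections.abc` (subscripting it this way assumes Python 3.9 or later). The function name `sevens` and its `count` parameter are invented for this example.

```python
from collections.abc import Iterator


def sevens(count: int) -> Iterator[int]:
    """Yield the integer 7 `count` times."""
    for _ in range(count):
        yield 7


first_call = sevens(2)
second_call = sevens(2)

# Two calls produce two separate generator objects, each with its own state.
print(first_call is second_call)
print(list(first_call), list(second_call))
```

The annotation says `Iterator[int]` rather than `int` because calling `sevens()` hands back the iterator, and the integers only appear once something iterates over it.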
