# Iterators

**Iterable** objects are Python objects that can be iterated over, i.e. whose values can be traversed one by one. Many standard Python objects are iterable:

- Sequences: lists, tuples, strings, etc.
- Dictionaries
- File objects
- Sets
- ...

Behind every iterable there is an **iterator**.

An **iterator** is the object that actually walks through an **iterable**, one value at a time.

In [1]:
list_numbers = [1, 2, 3, 4, 5]

print(list_numbers.__iter__())

<list_iterator object at 0x7f371826ec50>


The iterator decouples the object that contains the data from the object that iterates over the data. This separation relies on an **iteration protocol**.

The iteration protocol consists of two methods:

- __iter__() returns the iterator itself
- __next__() either:
    - returns the next item, or
    - raises a StopIteration exception if there are no further items

<u> Remark </u>: an iterator is also an iterable object since it has an __iter__() method. Hence every construct that iterates over an iterable (for loops, comprehensions, list(), etc.) also accepts an iterator as input. 
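As a small sketch of this remark (the variable names are only illustrative), an iterator can be handed to a loop directly, and the loop resumes from wherever the iterator currently stands:

```python
list_numbers = [1, 2, 3, 4, 5]
it = iter(list_numbers)

# An iterator returns itself from iter(); that is what makes it iterable.
same_object = iter(it) is it

first = next(it)               # consume the first item by hand
remaining = [n for n in it]    # the loop continues after the consumed item

print(same_object, first, remaining)
```

The loop does not restart from the beginning because it works on the same iterator object, not on a new one.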

<u> Example </u>:

In [2]:
list_numbers = [1, 2, 3, 4, 5]

I want to browse my list of numbers. For that, I can create an iterator:

In [3]:
it = iter(list_numbers)

In [4]:
next(it)

1

The iterator retains the position of the iteration!

Once we reach the end of the object (StopIteration), we need to define a new iterator to browse the object a second time: each iterator is a **single-use object**. 
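The single-use behaviour can be checked with a short sketch (names are illustrative). It also uses the optional default argument of next(), which is returned instead of raising StopIteration:

```python
list_numbers = [1, 2, 3]
it = iter(list_numbers)

first_pass = list(it)     # consumes every item
second_pass = list(it)    # the same iterator is now exhausted

# next() accepts a default that is returned when the iterator is exhausted.
sentinel = next(it, None)

# A fresh iterator starts again from the beginning of the list.
fresh_pass = list(iter(list_numbers))

print(first_pass, second_pass, sentinel, fresh_pass)
```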

An iterator is also **ad hoc** and **unique**: each call to __iter__() on a list creates a new, independent iterator object.

In [5]:
list_numbers.__iter__() == list_numbers.__iter__()

False

Naturally, you can use a **for** loop to do the same iteration on *list_numbers*. 

In practice, **for** loops are **iteration mechanisms** built on iterators: they call __iter__() once, then call __next__() repeatedly until StopIteration is raised. 

In [6]:
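def for_loop(iterable, action_to_do):
    iterator = iter(iterable)    # a for loop first asks for an iterator
    returns = []
    while True:
        try:
            item = next(iterator)
        except StopIteration:
            # the iterator is exhausted: the loop ends
            return returns
        # applied outside the try block so that a StopIteration raised by
        # action_to_do is not mistaken for the end of the iteration
        returns.append(action_to_do(item))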

In [7]:
list_numbers = [1, 2, 3, 4, 5]
for_loop(list_numbers, lambda x: x**2)

[1, 4, 9, 16, 25]

##### <u> Question </u>: if standard iteration mechanisms (e.g. for loops) do it for you, why is it useful to know iterators?

<u> Answer 1</u>: you want to iterate over data that is not loaded into memory.

<u> Example 1 </u>: when you **open** a file, you create an iterator that reads the lines of the file one at a time.

In [8]:
file = open('winequality-white.csv')

In [9]:
(file == file.__iter__()) and (file.__iter__() == file.__iter__())

True

The **file** object is an iterator since it is its own iterator (which is therefore unique). Not loading the contents of the file into an in-memory iterable saves memory.

In [10]:
import os
import sys

print('Size of file (bytes): {}'.format(os.path.getsize('winequality-white.csv')))
print('Size of file iterator (bytes): {}'.format(sys.getsizeof(file)))

Size of file (bytes): 264426
Size of file iterator (bytes): 224
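To make the line-by-line reading concrete, here is a minimal sketch. It writes a small throwaway file (its contents are invented for the example) so that it runs on its own, and it uses a with-block so that the file is closed afterwards:

```python
import os
import tempfile

# Create a small temporary file so the example is self-contained.
with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False) as tmp:
    tmp.write('a;b\n1;2\n3;4\n')
    path = tmp.name

line_count = 0
with open(path) as file:      # the with-block closes the file at the end
    header = next(file)       # a file object is its own iterator
    for line in file:         # continues after the header, one line at a time
        line_count += 1

os.remove(path)
print(repr(header), line_count)
```

Only one line is held at a time, whatever the size of the file.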


Producing items only when they are requested is called **laziness**:

*Iterators allow us to both work with and create lazy iterables that don't do any work until we ask them for their next item.*

Because they are lazy, iterators can deal with infinitely long iterables. When the data does not even fit in memory, an iterator can still give us the next item every time we ask for it. Iterators can therefore save a lot of memory, and CPU time when only part of the data is actually consumed.
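As an illustration of an infinitely long iterable, the sketch below uses count() and islice() from the standard itertools module; the variable names are only illustrative:

```python
from itertools import count, islice

# count(1) is an endless stream 1, 2, 3, ...; nothing is computed until requested.
naturals = count(1)

# islice lazily takes the first five items without materialising the stream.
first_five = list(islice(naturals, 5))

# The stream keeps its position, so the next request continues from there.
following = next(naturals)

print(first_five, following)
```

Calling list(naturals) directly would never terminate, which is why the stream has to be bounded before it is materialised.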

<u> Example 2 </u>: the **zip** function takes any number of iterable objects (two here) and returns an iterator.

In [11]:
list_numbers = [1, 2, 3, 4, 5, 6]
list_letters = ['a', 'b', 'c', 'd', 'e']

z = zip(list_numbers, list_letters)

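At each iteration *i*, a tuple (list_numbers[i], list_letters[i]) is created. The iteration stops as soon as the shortest input is exhausted, so the last number (6) has no partner and is dropped.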

In [12]:
for i in z:
    print(i)

(1, 'a')
(2, 'b')
(3, 'c')
(4, 'd')
(5, 'e')


In [13]:
for i in z:
    print(i)

In [14]:
(z == z.__iter__()) and (z.__iter__() == z.__iter__())

True

In [15]:
print('Size of lists (bytes): {}'.format(sys.getsizeof(list_numbers) + sys.getsizeof(list_letters)))
print('Size of zip (bytes): {}'.format(sys.getsizeof(z)))

Size of lists (bytes): 232
Size of zip (bytes): 72


Single-use:

In [16]:
for i in z:
    print(i)

The iterator is exhausted because it has already been consumed, so the loop prints nothing.
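When the pairs are needed more than once, two options are common. The sketch below shows both, under the assumption that the inputs are small enough to be stored:

```python
list_numbers = [1, 2, 3, 4, 5, 6]
list_letters = ['a', 'b', 'c', 'd', 'e']

# Option 1: materialise once; a list can be traversed repeatedly but uses more memory.
pairs = list(zip(list_numbers, list_letters))
first_pass = [number for number, _ in pairs]
second_pass = [letter for _, letter in pairs]

# Option 2: build a new zip object for each traversal and stay lazy.
count_pairs = sum(1 for _ in zip(list_numbers, list_letters))

print(first_pass, second_pass, count_pairs)
```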

<u> Remarks </u>
- Iterators are cheap and simple to create, in part because they are single-use!
- Many standard Python built-in functions return iterators: map(), enumerate(), reversed(), zip(), etc. Note that range() is different: it returns a lazy iterable that can be traversed several times, not an iterator.
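A quick way to check which built-ins actually return iterators is to test them against the abstract base classes in collections.abc. In this sketch, the sample arguments are arbitrary:

```python
from collections.abc import Iterable, Iterator

candidates = {
    "map": map(str, [1, 2]),
    "enumerate": enumerate("ab"),
    "reversed": reversed([1, 2]),
    "zip": zip([1], [2]),
    "range": range(3),
}

is_iterable = {name: isinstance(obj, Iterable) for name, obj in candidates.items()}
is_iterator = {name: isinstance(obj, Iterator) for name, obj in candidates.items()}

for name in candidates:
    print(name, is_iterable[name], is_iterator[name])
```

All five objects are iterable, but the range object is not an iterator: it has no __next__() method and hands out a new iterator each time it is looped over.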

<u> Answer 2</u>: you can create your own objects with an appropriate iteration mechanism. 

In [17]:
class increment_numbers:
    def __init__(self, start_value, stop_value):
        self.current = start_value
        self.high = stop_value

    def __iter__(self):
        return self

    def __next__(self):
        if self.current > self.high:
            raise StopIteration
        else:
            self.current += 1
            return self.current - 1

numbers = increment_numbers(1, 5)

In [18]:
print(type(numbers))

<class '__main__.increment_numbers'>


In [19]:
print(next(numbers))
print(next(numbers))
print(next(numbers))
print(next(numbers))
print(next(numbers))
print(next(numbers))

1
2
3
4
5


StopIteration: 
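The increment_numbers class above is its own iterator, so an instance can be traversed only once. A possible variant, sketched here with the hypothetical names NumberRange and NumberRangeIterator, separates the iterable from its iterator so that every traversal starts afresh:

```python
class NumberRangeIterator:
    """Iterator that holds the position of one traversal (hypothetical helper)."""

    def __init__(self, start_value, stop_value):
        self.current = start_value
        self.high = stop_value

    def __iter__(self):
        return self

    def __next__(self):
        if self.current > self.high:
            raise StopIteration
        value = self.current
        self.current += 1
        return value


class NumberRange:
    """Iterable that hands out a new iterator on every call to iter()."""

    def __init__(self, start_value, stop_value):
        self.start_value = start_value
        self.stop_value = stop_value

    def __iter__(self):
        return NumberRangeIterator(self.start_value, self.stop_value)


numbers = NumberRange(1, 5)
first_pass = list(numbers)
second_pass = list(numbers)   # works again: each traversal gets its own iterator

print(first_pass, second_pass)
```

This is the same split that exists between a list and its list_iterator.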

### Iterators and generators

So far I have presented the concept of an **iterator**, shown how to use it, and shown that it underlies many iteration mechanisms and functions in Python. 

Here I will present another interesting application of iterators: **generator functions** and **generator expressions**.

#### Generator functions

<u> Definition from Python docs</u>: a function which returns a **generator iterator**. It looks like a normal function except that it contains **yield** expressions for producing a series of values usable in a for-loop or that can be retrieved one at a time with the next() function.

#### Generator iterator

<u> Definition from Python docs</u>: an object created by a generator function.

In [20]:
def increment_numbers(start_value, stop_value):
    while start_value <= stop_value:
        yield start_value
        start_value += 1
    return 42

numbers = increment_numbers(1, 5)

In [21]:
print(type(numbers))

<class 'generator'>


In [22]:
print(next(numbers))
print(next(numbers))
print(next(numbers))
print(next(numbers))
print(next(numbers))
print(next(numbers))

1
2
3
4
5


StopIteration: 42
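The 42 shown in the traceback is the value given to return inside the generator function: it is attached to the StopIteration exception. The sketch below repeats the function so that it runs on its own and reads that value from the exception:

```python
def increment_numbers(start_value, stop_value):
    while start_value <= stop_value:
        yield start_value
        start_value += 1
    return 42

numbers = increment_numbers(1, 2)
yielded = []
returned = None
while True:
    try:
        yielded.append(next(numbers))
    except StopIteration as stop:
        # The generator's return value travels on the exception.
        returned = stop.value
        break

print(yielded, returned)
```

A for loop or list() silently discards this value, which is why it only shows up when next() is called by hand.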

In [23]:
numbers = increment_numbers(1, 5)
list(numbers)

[1, 2, 3, 4, 5]

The yield expression is what distinguishes a generator function from a normal function. It lets us take advantage of the iterator's laziness.

<u> From Python docs</u>:
    
*Each* **yield** *temporarily suspends processing, remembering the location execution state, and returns the value. When the generator iterator resumes, it picks up where it left off (in contrast to functions which start fresh on every invocation).*
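The suspension described in this quote can be observed with a small trace. The function traced_generator and its log argument are hypothetical names introduced only for this sketch:

```python
def traced_generator(log):
    """Record when each part of the generator body actually runs."""
    log.append("started")
    yield 1
    log.append("resumed after first yield")
    yield 2
    log.append("finished")

log = []
gen = traced_generator(log)   # calling the function runs none of its body
log.append("created")
first = next(gen)             # runs the body up to the first yield
log.append("got first")
second = next(gen)            # resumes right after the first yield

print(first, second)
print(log)
```

The entry "created" is logged before "started", and "finished" is never logged, because the generator is still suspended at its second yield.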

#### Generator expressions

Generator expressions are very similar to **list comprehensions**.

<u>Reminder on list comprehension</u>: [**output** for **iteration** in **iterable** if *(optional condition)*]

<u>Example</u>:

In [24]:
numbers = [1, 2, 3, 4, 5]
squares = [number**2 for number in numbers if number > 2]
print(squares)

[9, 16, 25]


Comprehensions exist for several built-in containers: lists, sets and dictionaries.

<u>Example</u>: set comprehension

In [25]:
prenoms = ['ana', 'eve', 'ALICE', 'Anne', 'bob', 'alice', 'AlIcE', 'Alice'] 
print(prenoms)

['ana', 'eve', 'ALICE', 'Anne', 'bob', 'alice', 'AlIcE', 'Alice']


In [26]:
a_prenoms = [p.lower() for p in prenoms if p.lower().startswith('a')] 
print(a_prenoms)

['ana', 'alice', 'anne', 'alice', 'alice', 'alice']


In [27]:
print(set(a_prenoms))

{'ana', 'alice', 'anne'}


In [28]:
a_prenoms = {p.lower() for p in prenoms if p.lower().startswith('a')} 
print(a_prenoms)

{'ana', 'alice', 'anne'}


<u>Example</u>: dictionary comprehension

In [29]:
ages = [('ana', 20), ('EVE', 30), ('bob', 40)] 
ages = dict(ages) 
print(ages)

{'ana': 20, 'EVE': 30, 'bob': 40}


In [30]:
ages_fix = {p.lower():a for p, a in ages.items()} 
print(ages_fix)

{'ana': 20, 'eve': 30, 'bob': 40}


In [31]:
ages_fix = {p.lower():a for p, a in ages.items() if a < 40} 
print(ages_fix)

{'ana': 20, 'eve': 30}


##### From comprehension to generation

A drawback of a comprehension is that it builds the whole data structure in memory, even when that structure is only needed temporarily.

<u>Generator expression</u>: (**output** for **iteration** in **iterable** if *(optional condition)*)


In [32]:
%%time 
square = [x**2 for x in range(1000)]

CPU times: user 522 µs, sys: 0 ns, total: 522 µs
Wall time: 551 µs


In [33]:
%%time
sum(square)

CPU times: user 14 µs, sys: 4 µs, total: 18 µs
Wall time: 22.4 µs


332833500

In [34]:
%%time 
square = (x**2 for x in range(1000))

CPU times: user 14 µs, sys: 3 µs, total: 17 µs
Wall time: 21.9 µs


In [35]:
square

<generator object <genexpr> at 0x7f37181bfa50>

In [36]:
%%time
sum(square)

CPU times: user 338 µs, sys: 0 ns, total: 338 µs
Wall time: 343 µs


332833500

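The sum over the generator expression (which is an iterator) computes each square on the fly while iterating over range(1000), and no intermediate list is built. This is why creating the generator is almost instantaneous and the cost shows up in sum() instead.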

<u>Remark</u>: we can do the same with the Python built-in function *map()*:

In [37]:
square = map(lambda x: x**2, range(1000))
sum(square)

332833500

Using iterators to avoid temporary data structures is a common Python idiom, in particular when processing large datasets. 
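The memory side of this trade-off can be sketched with sys.getsizeof. It measures only the container itself (not the integers it refers to), and the exact byte counts depend on the Python version, so only the comparison matters:

```python
import sys

n = 100_000
squares_list = [x**2 for x in range(n)]   # all values stored at once
squares_gen = (x**2 for x in range(n))    # values produced on demand

list_size = sys.getsizeof(squares_list)   # grows with n
gen_size = sys.getsizeof(squares_gen)     # stays small whatever n is

total_from_list = sum(squares_list)
total_from_gen = sum(squares_gen)         # consumes the generator

print(list_size, gen_size, total_from_list == total_from_gen)
```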

Since generator expressions have the same limitation as comprehensions (only one output expression can be defined), we can:

- chain the generator expressions

In [38]:
square = map(lambda x: x**2, range(1000))
palindrome = (x for x in square if str(x) == str(x)[::-1])

list(palindrome)

[0,
 1,
 4,
 9,
 121,
 484,
 676,
 10201,
 12321,
 14641,
 40804,
 44944,
 69696,
 94249,
 698896]
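The same chaining also works with generator expressions only. In this sketch a third stage (an arbitrary filter chosen for the example) is stacked on top of the first two, and nothing is computed until list() pulls values through the whole chain:

```python
squares = (x**2 for x in range(1000))
palindromes = (x for x in squares if str(x) == str(x)[::-1])
long_palindromes = (x for x in palindromes if x >= 1000)   # extra stage

result = list(long_palindromes)
print(result)
```

Each stage asks the previous one for a single value at a time, so no intermediate list is built.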

- use generator functions

In [39]:
square = map(lambda x: x**2, range(1000))
condition = lambda x: isinstance(x, (str, int)) and str(x) == str(x)[::-1]

In [40]:
def palindrome(iterator, condition):
    for i in iterator:
        if condition(i):
            yield i

In [41]:
p = palindrome(square, condition)

In [42]:
list(p)

[0,
 1,
 4,
 9,
 121,
 484,
 676,
 10201,
 12321,
 14641,
 40804,
 44944,
 69696,
 94249,
 698896]

In [44]:
condition = lambda x: isinstance(x, (str, int)) and str(x) == str(x)[::-1] and x%2==0

#### What will be the output of the next cell?

In [None]:
p = palindrome(square, condition)
list(p)

### Takeaways / Summary

- An iterable is something you can loop over.
- Sequences are a very common type of iterable.
- Many things in Python are iterables, but not all of them are sequences.
- An iterator is an object representing a stream of data. It does the iterating over an iterable. You can use an iterator to get the next value or to loop over it. Once you have looped over an iterator, it has no values left.
- Iterators use the lazy evaluation approach.
- Many Python built-ins (file objects, zip, map, enumerate, etc.) are iterators.
- A generator function is a function which returns an iterator.
- A generator expression is an expression that returns an iterator.