# Iterators

Often an important piece of data analysis is repeating a similar calculation, over and over, in an automated fashion.

For example, you may have a table of names that you'd like to split into first and last, or perhaps of dates that you'd like to convert to some standard format.

One of Python's answers to this is the *iterator* syntax.
We've seen this already with the ``range`` iterator:

<!--BOOK_INFORMATION-->
<img align="left" style="padding-right:10px;" src="fig/cover-small.jpg">
*This notebook contains an excerpt from the [Whirlwind Tour of Python](http://www.oreilly.com/programming/free/a-whirlwind-tour-of-python.csp) by Jake VanderPlas; the content is available [on GitHub](https://github.com/jakevdp/WhirlwindTourOfPython).*

*The text and code are released under the [CC0](https://github.com/jakevdp/WhirlwindTourOfPython/blob/master/LICENSE) license; see also the companion project, the [Python Data Science Handbook](https://github.com/jakevdp/PythonDataScienceHandbook).*


<!--NAVIGATION-->
< [Errors and Exceptions](09-Errors-and-Exceptions.ipynb) | [Contents](Index.ipynb) | [List Comprehensions](11-List-Comprehensions.ipynb) >

In [53]:
for i in range(10):
    print(i, end=' ')

0 1 2 3 4 5 6 7 8 9 

Here we're going to dig a bit deeper.

- In Python 3, ``range`` is not a list; it produces its values one at a time through something called an *iterator*

## Iterating over lists
Iterators are perhaps most easily understood in the concrete case of iterating through a list.

Consider the following:

In [1]:
for value in [2, 4, 6, 8, 10]:
    # do some operation
    print(value + 1, end=' ')

3 5 7 9 11

The ``for x in y`` syntax allows us to repeat some operation for each value in the list.
- It reads almost like plain English: "*for [each] value in [the] list*".
- Behind the scenes, the Python interpreter first asks ``y`` for an *iterator*
    - which you can do yourself with the built-in ``iter`` function:

In [62]:
iter([2, 4, 6, 8, 10])
# listi = iter([2, 4, 6, 8, 10])
# print(next(listi), next(listi), next(listi))

<list_iterator at 0x1099d2d68>

The iterator object provides the functionality required by the ``for`` loop.
- An iterator is not a container that holds all the values;
- it hands out the next value each time it is asked, for as long as values remain
    - which can be seen with the built-in function ``next``:

In [63]:
I = iter([2, 4, 6, 8, 10])

In [64]:
print(next(I))

2


In [65]:
print(next(I))

4


In [66]:
print(next(I))

6

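Calling ``next`` repeatedly eventually runs out of values. As a simplified sketch (not CPython's actual implementation), a ``for`` loop over a list behaves roughly like the following ``while`` loop, where the end of the data is signaled by a ``StopIteration`` exception; ``it`` and ``results`` are just illustrative names:

```python
L = [2, 4, 6, 8, 10]
it = iter(L)                  # ask the list for an iterator
results = []
while True:
    try:
        value = next(it)      # fetch the next value
    except StopIteration:     # raised once every value has been handed out
        break
    results.append(value + 1)
print(results)                # [3, 5, 7, 9, 11]

# next() also accepts a default that is returned instead of raising StopIteration
print(next(it, 'exhausted'))  # exhausted
```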

This is incredibly useful
- because it allows Python to treat things as lists that are *not actually lists*.

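To illustrate, here is a minimal sketch of a hypothetical ``Countdown`` class (not part of Python) that is not a list at all, yet works in a ``for`` loop because it implements the iterator protocol: ``__iter__`` returns the iterator, and ``__next__`` returns the next value or raises ``StopIteration``:

```python
class Countdown:
    """Hypothetical iterator that counts down from `start` to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # An iterator returns itself, so it can be used directly in a for loop
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration  # tells the for loop there are no more values
        value = self.current
        self.current -= 1
        return value


for n in Countdown(5):
    print(n, end=' ')  # 5 4 3 2 1
```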
## ``range()``
#### The most common example of indirect iteration 
- ``range()`` function in Python 3 
- ``xrange()`` in Python 2

``range()`` returns not a list, but a special ``range`` object:

In [7]:
range(10)

range(0, 10)

``range``, like a list, exposes an iterator:

In [8]:
iter(range(10))

<range_iterator at 0x109488690>

So Python knows to treat it *as if* it's a list:

In [10]:
for i in range(10):
    print(i, end=' ')

0 1 2 3 4 5 6 7 8 9 

The benefit of the iterator indirection:
- in Python 3, ``range()`` does **NOT** explicitly create *the full list!*
- in Python 2, ``range`` creates a list
    - so in Python 2, looping over ``range`` of a huge number could overwhelm system memory (``xrange`` avoids this)

In [67]:
N = 10 ** 12
for i in range(N):
    if i >= 10: break
    print(i, end=', ')

0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 

If we actually created a list of one trillion values
- it would occupy tens of terabytes of machine memory
- and that would be a waste, because our calculation uses only the first ten values.

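To make the difference concrete, the sketch below compares the memory footprint of a ``range`` with that of a materialized list. It assumes CPython on a 64-bit machine; ``sys.getsizeof`` reports only an object's own size (for a list, the array of pointers, not the integers themselves), so exact numbers will vary:

```python
import sys

N = 10 ** 12
r = range(N)

# A range stores only start, stop and step, so its size does not grow with N
print(sys.getsizeof(r))                     # a few dozen bytes
print(sys.getsizeof(list(range(10 ** 6))))  # megabytes, for just a million items

# It still answers sequence questions without building the list
print(len(r))          # 1000000000000
print(r[-1])           # 999999999999
print(123456789 in r)  # True
```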
In fact, iterators do NOT have to end at all!
- Python's ``itertools`` module contains a ``count`` function that acts as an infinite range:

In [68]:
from itertools import count

for i in count():
    if i >= 10:
        break 
    print(i, end=', ')

0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 

Without a loop break here, it would go on happily counting until the process is manually interrupted or killed (using, for example, ``ctrl-C``).

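Instead of an explicit ``break``, the first few values of an infinite iterator can also be taken with ``itertools.islice``, which is itself lazy. A minimal sketch:

```python
from itertools import count, islice

# islice(iterable, stop) lazily yields only the first `stop` items,
# so it is safe to use on an infinite iterator such as count()
first_ten = list(islice(count(), 10))
print(first_ten)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# count(start, step) can start somewhere other than 0 and use a different step
evens = list(islice(count(10, 2), 5))
print(evens)      # [10, 12, 14, 16, 18]
```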
## Some Useful Iterators

Here we'll cover some of the more useful iterators in the Python language:

### ``enumerate``

Often you need to
- iterate not only over the values in a list,
- but also keep track of the index.

You might be tempted to do things this way:

In [14]:
L = [2, 4, 6, 8, 10]
for k in range(len(L)):
    print(k, L[k])

0 2
1 4
2 6
3 8
4 10


Although this does work, Python provides a cleaner syntax using the ``enumerate`` iterator:

In [71]:
for k, val in enumerate(L):
    print(k, val)

0 2
1 4
2 6
3 8
4 10

This is the more "Pythonic" way to enumerate the indices and values in a list.

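``enumerate`` also takes an optional ``start`` argument, which is handy when the numbering should begin at 1. A short sketch with a made-up list of names:

```python
# Hypothetical list of names, numbered from 1 instead of 0
names = ['ada', 'grace', 'linus']
for rank, name in enumerate(names, start=1):
    print(rank, name)
# 1 ada
# 2 grace
# 3 linus

# enumerate is itself an iterator of (index, value) tuples
print(list(enumerate(names)))  # [(0, 'ada'), (1, 'grace'), (2, 'linus')]
```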
### ``zip``

Other times, you may need to iterate over multiple lists simultaneously.

- Rather than indexing into each list, it is better to use the ``zip`` iterator
    - which zips together iterables:

In [15]:
L = [2, 4, 6, 8, 10]
R = [3, 6, 9, 12, 15]
for li, ri in zip(L, R):
    print(li, ri)

2 3
4 6
6 9
8 12
10 15


In [19]:
list(zip(L, R))

[(2, 3), (4, 6), (6, 9), (8, 12), (10, 15)]

Any number of iterables can be zipped together, 
- and if they are different lengths, the shortest will determine the length of the ``zip``.

In [20]:
L = [2, 4, 6]
R = [3, 6, 9, 12, 15]
for li, ri in zip(L, R):
    print(li, ri)

2 3
4 6
6 9

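If you instead want to keep going until the *longest* iterable is exhausted, ``itertools.zip_longest`` pads the missing values with a ``fillvalue``. A brief sketch using the same short ``L`` and longer ``R`` (the choice of ``0`` as the fill value is arbitrary):

```python
from itertools import zip_longest

L = [2, 4, 6]
R = [3, 6, 9, 12, 15]

# zip_longest pads the shorter iterable with fillvalue instead of stopping early
padded = list(zip_longest(L, R, fillvalue=0))
print(padded)   # [(2, 3), (4, 6), (6, 9), (0, 12), (0, 15)]

# zip itself accepts any number of iterables
triples = list(zip(L, R, 'abc'))
print(triples)  # [(2, 3, 'a'), (4, 6, 'b'), (6, 9, 'c')]
```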

### ``map`` and ``filter``
The ``map`` iterator
- takes a function
- and applies that function to each value in an iterable:

In [21]:
# find the first 10 square numbers
square = lambda x: x ** 2
for val in map(square, range(10)):
    print(val, end=' ')

0 1 4 9 16 25 36 49 64 81 

In [23]:
list(map(square, range(10)))

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

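``map`` can also take more than one iterable, in which case the function receives one item from each, much like pairing them with ``zip``. A sketch using the built-in ``pow`` (the ``bases`` and ``exponents`` lists are illustrative):

```python
# With several iterables, map passes one item from each to the function and,
# like zip, stops as soon as the shortest iterable is exhausted
bases = [2, 3, 4]
exponents = [3, 2, 1, 0]
powers = list(map(pow, bases, exponents))
print(powers)  # [8, 9, 4]
```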
The ``filter`` iterator looks similar, except it only passes through the values for which the filter function evaluates to ``True``:

In [24]:
# find values up to 10 for which x % 2 is zero
is_even = lambda x: x % 2 == 0
for val in filter(is_even, range(10)):
    print(val, end=' ')

0 2 4 6 8 

The ``map`` and ``filter`` functions, along with the ``reduce`` function (which lives in Python's ``functools`` module), are fundamental components of the *functional programming* style.

For a library built around this style, see, for example, the [pytoolz](https://toolz.readthedocs.org/en/latest/) library.

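As a rough illustration of how ``reduce`` fits alongside ``map`` and ``filter``, the sketch below sums a range and then sums the squares of the even numbers below 10 (the final ``0`` passed to ``reduce`` is its optional initial value):

```python
from functools import reduce

# reduce folds an iterable into one value by applying a two-argument
# function cumulatively: ((((1 + 2) + 3) + 4) + 5)
total = reduce(lambda acc, x: acc + x, range(1, 6))
print(total)  # 15

# filter -> map -> reduce: sum of the squares of the even numbers below 10
evens = filter(lambda x: x % 2 == 0, range(10))
squares = map(lambda x: x ** 2, evens)
even_squares_sum = reduce(lambda acc, x: acc + x, squares, 0)
print(even_squares_sum)  # 0 + 4 + 16 + 36 + 64 = 120
```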
### Iterators as function arguments

We saw in [``*args`` and ``**kwargs``: Flexible Arguments](#*args-and-**kwargs:-Flexible-Arguments) that ``*args`` and ``**kwargs`` can be used to pass sequences and dictionaries to functions.

It turns out that the ``*args`` syntax works not just with sequences, but with any iterator: in a function call, ``*`` unpacks the iterator's values into separate positional arguments.

In [73]:
print(*range(10))

0 1 2 3 4 5 6 7 8 9

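Unpacking works the same way for your own functions. In this sketch, ``describe`` is a hypothetical function with exactly three parameters, fed from a ``range`` and from a lazy ``map`` object:

```python
def describe(a, b, c):
    """Hypothetical function taking exactly three positional arguments."""
    return f"a={a}, b={b}, c={c}"


# *-unpacking works with any iterable, including a lazy map object
print(describe(*range(3)))               # a=0, b=1, c=2
print(describe(*map(str.upper, 'xyz')))  # a=X, b=Y, c=Z
```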
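Unpacking only works where multiple values are accepted, such as inside a function call; a bare starred expression is a syntax error:

In [33]: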
*range(10)

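SyntaxError: can't use starred expression here (<ipython-input-33-822f3c7b2e82>, line 1)

Without the star, ``print`` receives the ``range`` object itself; calling ``list`` on it materializes the values:

In [31]: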
print(range(10))

range(0, 10)


In [32]:
print(list(range(10)))

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]


So, for example, we can get tricky and compress the ``map`` example from before into the following:

In [82]:
print(*map(lambda x: x ** 2, range(10)))

# Equivalent, step by step:
# square = lambda x: x ** 2
# map_object = map(square, range(10))
# print(*map_object)

0 1 4 9 16 25 36 49 64 81


In [30]:
map_object = map(lambda x: x ** 2, range(10))
# map() returns a lazy map object; list() consumes it
list(map_object)

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

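One consequence of this laziness worth keeping in mind: an iterator such as a ``map`` object can be consumed only once, whereas a ``range`` creates a fresh iterator every time it is looped over. A minimal sketch:

```python
squares = map(lambda x: x ** 2, range(5))
first = list(squares)   # consumes the iterator
second = list(squares)  # nothing is left
print(first, second)    # [0, 1, 4, 9, 16] []

# A range is re-iterable: each loop asks it for a brand-new iterator
r = range(5)
print(list(r), list(r))  # [0, 1, 2, 3, 4] [0, 1, 2, 3, 4]
```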
Why is there no ``unzip()`` function that does the opposite of ``zip()``?
- The opposite of ``zip()`` is... ``zip()``!
- The key is that ``zip()`` can zip together any number of iterators or sequences. Observe:

In [20]:
L1 = (1, 2, 3, 4)
L2 = ('a', 'b', 'c', 'd')

In [21]:
z = zip(L1, L2)
print(*z)

(1, 'a') (2, 'b') (3, 'c') (4, 'd')


In [38]:
new_L1, new_L2 = zip((1, 'a'), (2, 'b'), (3, 'c'), (4, 'd'))
print(new_L1, new_L2)

(1, 2, 3, 4) ('a', 'b', 'c', 'd')


In [22]:
L1 = (1, 2, 3, 4)
L2 = ('a', 'b', 'c', 'd')
z = zip(L1, L2)
new_L1, new_L2 = zip(*z)
print(new_L1, new_L2)

(1, 2, 3, 4) ('a', 'b', 'c', 'd')


Ponder this for a while. 
- If you understand why it works, you'll have come a long way in understanding Python iterators!

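A common practical use of this trick, sketched here on a small made-up matrix stored as a list of rows, is transposing rows into columns:

```python
# Hypothetical 2x3 matrix stored as a list of rows
matrix = [[1, 2, 3],
          [4, 5, 6]]

# zip(*matrix) is zip([1, 2, 3], [4, 5, 6]): it groups the i-th item of every row
transposed = list(map(list, zip(*matrix)))
print(transposed)  # [[1, 4], [2, 5], [3, 6]]
```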
## Specialized Iterators: ``itertools``


The ``itertools`` module contains a whole host of useful iterators.
- We have already seen ``itertools.count``, an infinite counterpart of ``range``.
- Another is the ``itertools.permutations`` function,
    - which iterates over all permutations of a sequence:

In [39]:
from itertools import permutations
p = permutations(range(3))
print(*p)

(0, 1, 2) (0, 2, 1) (1, 0, 2) (1, 2, 0) (2, 0, 1) (2, 1, 0)


In [41]:
p

<itertools.permutations at 0x109485bf8>

Similarly, the ``itertools.combinations`` function iterates over all unique combinations of ``N`` values within a list:

In [43]:
from itertools import combinations
c = combinations(range(4), 2)
print(*c)

(0, 1) (0, 2) (0, 3) (1, 2) (1, 3) (2, 3)

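As a quick sanity check, the number of items these iterators yield matches the familiar counting formulas: 3! = 6 permutations of three items, and "4 choose 2" = 6 pairs. This sketch assumes Python 3.8 or later for ``math.comb``:

```python
import math
from itertools import combinations, permutations

# Count the items each iterator yields and compare with the formulas
n_perms = len(list(permutations(range(3))))
n_combs = len(list(combinations(range(4), 2)))
print(n_perms, math.factorial(3))  # 6 6
print(n_combs, math.comb(4, 2))    # 6 6
```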

Somewhat related is the ``product`` iterator, which iterates over the Cartesian product of two or more iterables, that is, every tuple formed by taking one element from each:

In [44]:
from itertools import product
p = product('ab', range(3))
print(*p)

('a', 0) ('a', 1) ('a', 2) ('b', 0) ('b', 1) ('b', 2)

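``product`` also accepts a ``repeat`` argument for taking the product of an iterable with itself. The sketch below shows that, along with ``itertools.chain`` (another ``itertools`` function not covered above), which walks through several iterables one after another:

```python
from itertools import chain, product

# product(..., repeat=k) is the product of an iterable with itself k times
pairs = list(product('ab', repeat=2))
print(pairs)   # [('a', 'a'), ('a', 'b'), ('b', 'a'), ('b', 'b')]

# chain lazily yields everything from the first iterable, then the next, ...
joined = list(chain(range(3), 'xy'))
print(joined)  # [0, 1, 2, 'x', 'y']
```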

Many more useful iterators exist in ``itertools``: the full list can be found, along with some examples, in Python's [online documentation](https://docs.python.org/3.5/library/itertools.html).
