# <font color='red'> III. Useful tools</font>

*The text and code are released under the [CC0](https://github.com/jakevdp/WhirlwindTourOfPython/blob/master/LICENSE) license; see also the companion project, the [Python Data Science Handbook](https://github.com/jakevdp/PythonDataScienceHandbook).*

# Iterators

Often an important piece of data analysis is repeating a similar calculation, over and over, in an automated fashion.
For example, you may have a table of names that you'd like to split into first and last, or perhaps a table of dates that you'd like to convert to some standard format.
One of Python's answers to this is the *iterator* syntax.
We've seen this already with the ``range`` iterator:

In [None]:
for i in range(10):
    print(i, end=' ')

## Iterating over lists
Iterators are perhaps most easily understood in the concrete case of iterating through a list.
Consider the following:

In [None]:
for value in [2, 4, 6, 8, 10]:
    # do some operation
    print(value + 1, end=' ')
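What the ``for`` loop does behind the scenes can be sketched with the built-in ``iter`` and ``next`` functions. The following is only a rough illustration of the mechanism (the names ``it`` and ``collected`` are placeholders chosen for this example), not a description of how the interpreter is implemented:

```python
values = [2, 4, 6, 8, 10]

# iter() asks the list for an iterator object
it = iter(values)

# next() pulls one value at a time from the iterator
first = next(it)
second = next(it)
print(first, second)

# Drain the remaining values by hand; StopIteration is the signal
# that tells a for loop the iterator is exhausted
collected = []
while True:
    try:
        collected.append(next(it))
    except StopIteration:
        break
print(collected)
```

Writing the loop out this way makes it clear that a ``for`` loop needs only an object that can hand out values one at a time, not an actual list.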

## ``range()``: A List Is Not Always a List
Perhaps the most common example of indirect iteration, in which Python loops over something that is not literally a list, is the ``range()`` function in Python 3 (named ``xrange()`` in Python 2), which returns not a list, but a special ``range`` object:

In [None]:
range(10)

``range``, like a list, exposes an iterator, so Python knows to treat it *as if* it's a list:

In [None]:
for i in range(10):
    print(i, end=' ')
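To see that a ``range`` object really does expose an iterator, we can request one explicitly with ``iter``. This is a small sketch; the variable names ``r`` and ``it`` are arbitrary:

```python
r = range(10)

# Ask the range object for its iterator, just as a for loop would
it = iter(r)
print(type(it).__name__)

# Pull the first two values by hand
a = next(it)
b = next(it)
print(a, b)

# A range also supports len() and indexing without building a list
print(len(r), r[3])
```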

The benefit of the iterator indirection is that *the full list is never explicitly created!*
We can see this by doing a range calculation that would overwhelm our system memory if we actually instantiated it (note that in Python 2, ``range`` creates a list, so running the following will not lead to good things!):

In [None]:
N = 10 ** 12
for i in range(N):
    if i >= 10: break
    print(i, end=', ')

If ``range`` were to actually create that list of one trillion values, it would occupy tens of terabytes of machine memory: a waste, given the fact that we're ignoring all but the first 10 values!
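One rough way to check that the trillion values are never stored is to compare object sizes with ``sys.getsizeof``. The exact byte counts depend on the Python build, so treat the numbers as indicative only:

```python
import sys

big = range(10 ** 12)

# The range object only records start, stop, and step, so it stays tiny
print(sys.getsizeof(big), "bytes for the range object")

# Indexing is computed arithmetically, with no list behind it
print(big[-1])

# For comparison, an actual list of only one thousand integers
# (getsizeof counts the list's pointer array, not the integers themselves)
small_list = list(range(1000))
print(sys.getsizeof(small_list), "bytes for a 1000-element list")
```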

In fact, there's no reason that iterators ever have to end at all!
Python's ``itertools`` library contains a ``count`` function that acts as an infinite range:

In [None]:
from itertools import count

for i in count():
    if i >= 10:
        break
    print(i, end=', ')

Had we not thrown in a loop break here, it would go on happily counting until the process is manually interrupted or killed (using, for example, ``ctrl-C``).
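An explicit ``break`` is not the only way to stop an infinite iterator. As an alternative sketch, ``itertools.islice`` can cap the number of values that are drawn from it:

```python
from itertools import count, islice

# islice(iterable, stop) yields at most `stop` items, then ends
first_ten = list(islice(count(), 10))
print(first_ten)

# count() also accepts a start value and a step
evens = list(islice(count(0, 2), 5))
print(evens)
```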

## Useful Iterators
This iterator syntax is used nearly universally in Python built-in types as well as the more data science-specific objects we'll explore in later sections.
Here we'll cover some of the more useful iterators in the Python language:

### ``enumerate``
Often you need to iterate not only over the values in a list, but also keep track of the index.
You might be tempted to do things this way:

In [None]:
L = [2, 4, 6, 8, 10]
for i in range(len(L)):
    print(i, L[i])

Although this does work, Python provides a cleaner syntax using the ``enumerate`` iterator:

In [None]:
for i, val in enumerate(L):
    print(i, val)

This is the more "Pythonic" way to enumerate the indices and values in a list.
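``enumerate`` also accepts an optional ``start`` argument, which is handy when the count should begin at 1 rather than 0. A short sketch (the name ``position`` is just illustrative):

```python
L = [2, 4, 6, 8, 10]

# start=1 shifts the counter without changing the values
for position, val in enumerate(L, start=1):
    print(position, val)

# enumerate is lazy; wrap it in list() to see the (index, value) pairs
pairs = list(enumerate(L))
print(pairs)
```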

### ``zip``
Other times, you may have multiple lists that you want to iterate over simultaneously.
You could certainly iterate over the index as in the non-Pythonic example we looked at previously, but it is better to use the ``zip`` iterator, which zips together iterables:

In [None]:
L = [2, 4, 6, 8, 10]
R = [3, 6, 9, 12, 15]
for lval, rval in zip(L, R):
    print(lval, rval)

Any number of iterables can be zipped together, and if they are different lengths, the shortest will determine the length of the ``zip``.
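The following sketch illustrates the truncation behavior with three inputs of different lengths (the lists here are made up for the example). If you would rather keep going until the longest input runs out, ``itertools.zip_longest`` pads the shorter ones:

```python
from itertools import zip_longest

L = [2, 4, 6, 8, 10]
R = [3, 6, 9]
names = ['a', 'b', 'c', 'd']

# zip stops as soon as the shortest input is exhausted
triples = list(zip(L, R, names))
print(triples)

# zip_longest fills the gaps left by shorter inputs with fillvalue
padded = list(zip_longest(L, R, fillvalue=None))
print(padded)
```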

## Defining Functions
Functions become even more useful when we begin to define our own, organizing functionality to be used in multiple places.
In Python, functions are defined with the ``def`` statement.
For example, we can encapsulate a version of our Fibonacci sequence code from the previous section as follows:

In [None]:
def fibonacci(N):
    L = []
    a, b = 0, 1
    while len(L) < N:
        a, b = b, a + b
        L.append(a)
    return L

Now we have a function named ``fibonacci`` which takes a single argument ``N``, does something with this argument, and ``return``s a value; in this case, a list of the first ``N`` Fibonacci numbers:

In [None]:
fibonacci(10)

## ``*args`` and ``**kwargs``: Flexible Arguments
Sometimes you might wish to write a function in which you don't initially know how many arguments the user will pass.
In this case, you can use the special forms ``*args`` and ``**kwargs`` to catch all arguments that are passed.
Here is an example:

In [None]:
def catch_all(*args, **kwargs):
    print("args =", args)
    print("kwargs =", kwargs)

In [None]:
catch_all(1, 2, 3, a=4, b=5)

In [None]:
catch_all('a', keyword=2)

Here it is not the names ``args`` and ``kwargs`` that are important, but the ``*`` characters preceding them.
``args`` and ``kwargs`` are just the variable names often used by convention, short for "arguments" and "keyword arguments".
The operative difference is the asterisk characters: a single ``*`` before a variable means "expand this as a sequence", while a double ``**`` before a variable means "expand this as a dictionary".
In fact, this syntax can be used not only with the function definition, but with the function call as well!
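Here is a minimal sketch of the call-side use. The ``inputs`` and ``keywords`` names are arbitrary, and ``catch_all`` is redefined with an added ``return`` so that the result can be inspected:

```python
def catch_all(*args, **kwargs):
    print("args =", args)
    print("kwargs =", kwargs)
    return args, kwargs  # added for this example only

inputs = (1, 2, 3)
keywords = {'pi': 3.14}

# * unpacks the tuple into positional arguments,
# ** unpacks the dictionary into keyword arguments
args, kwargs = catch_all(*inputs, **keywords)
```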

## Anonymous (``lambda``) Functions
Earlier we quickly covered the most common way of defining functions, the ``def`` statement.
You'll likely come across another way of defining short, one-off functions: the ``lambda`` expression.
It looks something like this:

In [None]:
add = lambda x, y: x + y
add(1, 2)

This lambda function is roughly equivalent to:

In [None]:
def add(x, y):
    return x + y
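A typical reason to reach for ``lambda`` is that a function has to be passed as an argument and is too small to deserve a name. As one sketch, the ``key`` argument of the built-in ``sorted`` accepts such a function; the records here are hypothetical sample data:

```python
# Hypothetical records, used only for illustration
data = [
    {'first': 'Guido', 'last': 'Van Rossum', 'YOB': 1956},
    {'first': 'Grace', 'last': 'Hopper', 'YOB': 1906},
    {'first': 'Alan', 'last': 'Turing', 'YOB': 1912},
]

# Sort alphabetically by first name
by_name = sorted(data, key=lambda item: item['first'])

# Sort by year of birth
by_year = sorted(data, key=lambda item: item['YOB'])

print([item['first'] for item in by_name])
print([item['first'] for item in by_year])
```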

### ``map`` and ``filter``
The ``map`` iterator takes a function and applies it to the values in an iterator:

In [None]:
# find the first 10 square numbers
square = lambda x: x ** 2
for val in map(square, range(10)):
    print(val, end=' ')

The ``filter`` iterator looks similar, except it only passes through values for which the filter function evaluates to ``True``:

In [None]:
# find values up to 10 for which x % 2 is zero
is_even = lambda x: x % 2 == 0
for val in filter(is_even, range(10)):
    print(val, end=' ')
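Both ``map`` and ``filter`` are lazy: they compute values only as they are requested, and each value is produced once. The sketch below illustrates this and shows that the two can be chained:

```python
square = lambda x: x ** 2
is_even = lambda x: x % 2 == 0

# Nothing is computed until values are requested
squares = map(square, range(10))
first_three = [next(squares) for _ in range(3)]

# The remaining values are still waiting in the same iterator
rest = list(squares)

# Chaining: squares of the even numbers below 10
even_squares = list(map(square, filter(is_even, range(10))))

print(first_three)
print(rest)
print(even_squares)
```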

## Specialized Iterators: ``itertools``

We briefly looked at the infinite ``range`` iterator, ``itertools.count``.
The ``itertools`` module contains a whole host of useful iterators; it's well worth your while to explore the module to see what's available.
As an example, consider the ``itertools.permutations`` function, which iterates over all permutations of a sequence:

In [None]:
from itertools import permutations
p = permutations(range(3))
print(*p)

Similarly, the ``itertools.combinations`` function iterates over all unique combinations of ``N`` values within a list:

In [None]:
from itertools import combinations
c = combinations(range(4), 2)
print(*c)

Somewhat related is the ``product`` iterator, which iterates over all tuples formed by taking one element from each of two or more iterables:

In [None]:
from itertools import product
p = product('ab', range(3))
print(*p)

Many more useful iterators exist in ``itertools``: the full list can be found, along with some examples, in Python's [online documentation](https://docs.python.org/3.5/library/itertools.html).
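As a quick sanity check on the three iterators shown above, we can materialize each one with ``list`` and count the results. This is just a sketch for verifying our understanding:

```python
from itertools import permutations, combinations, product
from math import factorial

perms = list(permutations(range(3)))
combs = list(combinations(range(4), 2))
pairs = list(product('ab', range(3)))

# 3! orderings of three items, "4 choose 2" pairs, and 2 * 3 products
print(len(perms), len(combs), len(pairs))

# Each result is a tuple
print(perms[0], combs[0], pairs[0])
```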

## List Comprehensions

If you read enough Python code, you'll eventually come across the terse and efficient construction known as a *list comprehension*.
This is one feature of Python I expect you will fall in love with if you've not used it before; it looks something like this:

In [None]:
[i for i in range(20) if i % 3 > 0]

### Basic List Comprehensions
List comprehensions are simply a way to compress a list-building for-loop into a single short, readable line.
For example, here is a loop that constructs a list of the first 12 square integers, followed by the equivalent list comprehension:

In [None]:
L = []
for n in range(12):
    L.append(n ** 2)
L

In [None]:
[n ** 2 for n in range(12)]

### Multiple Iteration
Sometimes you want to build a list not just from one value, but from two. To do this, simply add another ``for`` expression in the comprehension:

In [None]:
[(i, j) for i in range(2) for j in range(3)]
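It can help to see the same construction written as nested loops: the first ``for`` in the comprehension corresponds to the outer loop and the second to the inner loop. A sketch of the equivalent loop form:

```python
pairs = []
for i in range(2):        # first `for` in the comprehension: outer loop
    for j in range(3):    # second `for` in the comprehension: inner loop
        pairs.append((i, j))
print(pairs)
```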

### Conditionals on the Iterator
You can further control the iteration by adding a conditional to the end of the expression.
In the first example of the section, we iterated over all numbers from 0 to 19, but left out multiples of 3.
Look at this again, and notice the construction, which is followed by the equivalent loop:

In [None]:
[val for val in range(20) if val % 3 > 0]

In [None]:
L = []
for val in range(20):
    if val % 3:
        L.append(val)
L

In [None]:
midpoint = 5 # set the midpoint

lower = []; upper = [] # make two empty lists

for i in range(10): # split the numbers into lower and upper
    if (i < midpoint):
        lower.append(i)
    else:
        upper.append(i)
        
print("lower:", lower)
print("upper:", upper)
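The loop above splits a range into two lists with an explicit conditional. As a sketch tying it back to the comprehension syntax of this section, the same split can be written with one conditional comprehension per list, at the cost of iterating over the range twice:

```python
midpoint = 5  # same midpoint as in the loop above

# One conditional comprehension per output list
lower = [i for i in range(10) if i < midpoint]
upper = [i for i in range(10) if i >= midpoint]

print("lower:", lower)
print("upper:", upper)
```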