# **Itertools**

Source URL: https://realpython.com/python-itertools/

In [2]:
list(zip([1, 2, 3], ['a', 'b', 'c']))

[(1, 'a'), (2, 'b'), (3, 'c')]

Under the hood, the `zip()` function works, in essence, by calling `iter()` on each of its arguments, then advancing each iterator returned by `iter()` with `next()` and aggregating the results into tuples. The iterator returned by `zip()` iterates over these tuples.
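To make that description concrete, here is a minimal pure-Python sketch of the same mechanism. `manual_zip` is a hypothetical name used only for illustration. The real `zip()` is implemented in C. Like `zip()`, this sketch stops as soon as the shortest input is exhausted, which it does by returning when any `next()` call raises `StopIteration`.

```python
def manual_zip(*iterables):
    """Hypothetical pure-Python approximation of zip(), for illustration only."""
    # Step 1: call iter() on every argument.
    iterators = [iter(it) for it in iterables]
    if not iterators:
        # zip() with no arguments yields nothing.
        return
    while True:
        result = []
        for it in iterators:
            try:
                # Step 2: advance each iterator with next().
                result.append(next(it))
            except StopIteration:
                # Stop as soon as the shortest input runs out.
                return
        # Step 3: aggregate one value from each iterator into a tuple.
        yield tuple(result)


print(list(manual_zip([1, 2, 3], ['a', 'b', 'c'])))
# [(1, 'a'), (2, 'b'), (3, 'c')]
```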

The `map()` built-in function is another “iterator operator” that, in its simplest form, applies a single-parameter function to each element of an iterable one element at a time:

In [1]:
list(map(len, ['abc', 'de', 'fghi']))

[3, 2, 4]

The `map()` function works by calling `iter()` on its second argument, advancing this iterator with `next()` until the iterator is exhausted, and applying the function passed to its first argument to the value returned by `next()` at each step. In the above example, `len()` is called on each element of `['abc', 'de', 'fghi']` to return an iterator over the lengths of each string in the list.
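The same steps can be sketched in pure Python for the single-iterable case. `manual_map` is a hypothetical name for illustration. It is not how CPython implements `map()`, and it ignores the multi-iterable form of `map()`.

```python
def manual_map(func, iterable):
    """Hypothetical pure-Python approximation of single-iterable map(), for illustration only."""
    # Call iter() on the second argument.
    iterator = iter(iterable)
    while True:
        try:
            # Advance the iterator with next() ...
            value = next(iterator)
        except StopIteration:
            # ... until it is exhausted.
            return
        # Apply func to each value as it is produced.
        yield func(value)


lengths = manual_map(len, ['abc', 'de', 'fghi'])
print(lengths)        # a generator object: nothing has been computed yet
print(list(lengths))  # [3, 2, 4]
```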

In [3]:
# Compose map() and zip(): sum the corresponding elements of two iterables
list(map(sum, zip([1, 2, 3], [4, 5, 6])))

[5, 7, 9]

There are two main reasons why such an “iterator algebra” is useful: improved memory efficiency (via lazy evaluation) and faster execution time. To see this, consider the following problem:

```
Given a list of values inputs and a positive integer n, write a function that splits inputs into groups of length n. For simplicity, assume that the length of the input list is divisible by n. For example, if inputs = [1, 2, 3, 4, 5, 6] and n = 2, your function should return [(1, 2), (3, 4), (5, 6)].
```

In [5]:
# A naive approach: slice the input into consecutive chunks of length n
def naive_grouper(inputs, n):
    num_groups = len(inputs) // n
    return [tuple(inputs[i * n: (i + 1) * n]) for i in range(num_groups)]

nums = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
naive_grouper(nums, 2)

[(1, 2), (3, 4), (5, 6), (7, 8), (9, 10)]

What happens when you pass it a list with, say, 100 million elements? Because `naive_grouper()` builds every tuple and the entire output list before returning anything, you will need a lot of available memory. Even if you have enough, your program will hang until the output list is fully populated.

To see this, store the following in a script called `naive.py`:

In [None]:
## DO NOT RUN THIS CELL !!!
def naive_grouper(inputs, n):
    num_groups = len(inputs) // n
    return [tuple(inputs[i*n:(i+1)*n]) for i in range(num_groups)]


for _ in naive_grouper(range(100000000), 10):
    pass

From the console, you can use the `time` command (on UNIX systems) to measure memory usage and CPU user time. ***Make sure you have at least 5GB of free memory before executing the following:***

```
time -f "Memory used (kB): %M\nUser time (seconds): %U" python3 naive.py
```
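*(On Ubuntu and other Linux distributions, the shell's built-in `time` does not accept the `-f` option, so you may need to run the GNU `/usr/bin/time` program instead.)*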
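You can also see the effect without leaving Python or needing several gigabytes of free memory. The standard-library `tracemalloc` module reports the peak memory allocated by Python code. The sketch below is an illustrative alternative to the `time` measurement above, not the method used to obtain the figures in this section. It runs `naive_grouper()` on a much smaller input of one million elements (an arbitrary choice), and the exact numbers will vary by Python version and platform. `peak_bytes` is a hypothetical helper.

```python
import tracemalloc


def naive_grouper(inputs, n):
    num_groups = len(inputs) // n
    return [tuple(inputs[i * n:(i + 1) * n]) for i in range(num_groups)]


def peak_bytes(func, *args):
    """Hypothetical helper: run func(*args), consume its result, return peak traced bytes."""
    tracemalloc.start()
    try:
        for _ in func(*args):
            pass
        # get_traced_memory() returns (current, peak) in bytes.
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


# One million elements instead of 100 million, so this is safe to run anywhere.
peak = peak_bytes(naive_grouper, range(1_000_000), 10)
print(f"naive_grouper peak: {peak / 1_000_000:.1f} MB")
```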

The list-and-tuple implementation in `naive_grouper()` requires approximately 4.5 GB of memory to process `range(100000000)`. Working with iterators drastically improves this situation. Consider the following:

In [8]:
def better_grouper(inputs, n):
    iters = [iter(inputs)] * n
    return zip(*iters)

nums = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
list(better_grouper(nums, 2))

[(1, 2), (3, 4), (5, 6), (7, 8), (9, 10)]
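The trick in `better_grouper()` is that `[iter(inputs)] * n` does not create `n` independent iterators. Instead, it creates a list containing `n` references to the *same* iterator. Each time `zip()` builds a tuple, it calls `next()` on that one iterator `n` times, so consecutive elements land in the same group. The check below makes the sharing explicit.

```python
nums = [1, 2, 3, 4, 5, 6]
iters = [iter(nums)] * 2

# Both list entries are the very same iterator object.
print(iters[0] is iters[1])  # True

# Advancing one entry advances the other, because there is only one iterator.
print(next(iters[0]))  # 1
print(next(iters[1]))  # 2

# Contrast: independent iterators pair each element with itself instead of grouping.
independent = [iter(nums) for _ in range(2)]
print(list(zip(*independent)))  # [(1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6)]
```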

`better_grouper()` improves on `naive_grouper()` in two ways. First, because it never calls the `len()` built-in or slices its input, it accepts any iterable as an argument, even an infinite iterator. Second, because it returns an iterator rather than a list, it builds each group only when that group is requested, so it can process enormous iterables while using far less memory.
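To see the difference in practice, the sketch below feeds both functions a generator, which has no length and cannot be sliced. It then passes `better_grouper()` an infinite `itertools.count()` iterator, taking only the first few groups with `itertools.islice()`. One caveat: because `zip()` stops at the shortest input, `better_grouper()` silently drops a trailing partial group when the input length is not a multiple of `n`. The problem statement above rules out that case.

```python
from itertools import count, islice


def naive_grouper(inputs, n):
    num_groups = len(inputs) // n
    return [tuple(inputs[i * n:(i + 1) * n]) for i in range(num_groups)]


def better_grouper(inputs, n):
    iters = [iter(inputs)] * n
    return zip(*iters)


squares = (x * x for x in range(6))  # a generator: no len(), no slicing
try:
    naive_grouper(squares, 2)
except TypeError as exc:
    print("naive_grouper failed:", exc)

squares = (x * x for x in range(6))
print(list(better_grouper(squares, 2)))  # [(0, 1), (4, 9), (16, 25)]

# An infinite input: only the groups we ask for are ever built.
print(list(islice(better_grouper(count(), 3), 2)))  # [(0, 1, 2), (3, 4, 5)]

# Caveat: a trailing partial group is dropped.
print(list(better_grouper([1, 2, 3, 4, 5], 2)))  # [(1, 2), (3, 4)]
```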