# Itertools and Advanced Classes

`itertools` is a core part of the Python 3 standard library, and many tasks can be accomplished with very little code by using it. As you may have experienced, one of the primary drawbacks of Python is how slow **looping** can be, since the Python interpreter is not optimized for running complex code inside a `for` or `while` loop. This aspect is covered in more detail in the *Performance* notebook.

One way around this is to use the `itertools` module. Its functions are intuitively named, fast, elegant and memory-efficient. `itertools` provides building-block functions inspired by constructs from APL, Haskell and SML. Together they form a kind of "iterator algebra" that makes it possible to construct specialized tools succinctly and efficiently in Python. For more on this, see the `itertools` [documentation](https://docs.python.org/3/library/itertools.html) and this very [helpful guide](https://realpython.com/python-itertools/), from which this teaching material is primarily drawn.

Loosely speaking, this means that functions in `itertools` operate on iterators to produce more complex iterators. For example, `zip()` is a *built-in* Python function that takes an arbitrary number of iterables as arguments and returns an iterator over tuples of their corresponding elements:

In [1]:
list(zip([1, 2, 3], ["a","b","c"]))

[(1, 'a'), (2, 'b'), (3, 'c')]

In this case, `[1, 2, 3]` and `["a", "b", "c"]` are lists, and lists are iterable, which means they can return their elements one at a time. More generally, **any Python object** that implements the `.__iter__()` method, or a `.__getitem__()` method that accepts integer indexes starting at 0, is *iterable*.
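As a small illustration of the `.__getitem__()` route, here is a minimal sketch. The `Countdown` class is hypothetical and exists only for this example; it defines no `.__iter__()` method, yet `list()` can still iterate over it:

```python
class Countdown:
    """Hypothetical sequence-like class that defines only __getitem__."""

    def __init__(self, start):
        self.start = start

    def __getitem__(self, index):
        # Without __iter__, Python falls back to calling __getitem__
        # with 0, 1, 2, ... until IndexError is raised.
        if index < 0 or index > self.start:
            raise IndexError(index)
        return self.start - index


countdown_values = list(Countdown(3))
print(countdown_values)  # [3, 2, 1, 0]
```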

The `iter()` built-in function, when called on an iterable, returns an *iterator* object for that iterable:

In [2]:
iter([1, 2, 3, 4])

<list_iterator at 0x18e9536b320>

Under the hood, `zip()` works by calling `iter()` on each of its arguments, then advancing each iterator returned by `iter()` with `next()` and aggregating the results into tuples.
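The description above can be sketched in pure Python. This is an illustration only: `simple_zip` is a hypothetical name, and the real `zip()` is implemented in C, so treat this as a model of the behaviour rather than the actual implementation.

```python
def simple_zip(*iterables):
    """Illustrative sketch of zip(); not the real implementation."""
    # Step 1: call iter() on every argument.
    iterators = [iter(iterable) for iterable in iterables]
    while iterators:
        row = []
        for iterator in iterators:
            try:
                # Step 2: advance each iterator with next().
                row.append(next(iterator))
            except StopIteration:
                # Stop as soon as any input runs out.
                return
        # Step 3: aggregate the values into a tuple.
        yield tuple(row)


zipped = list(simple_zip([1, 2, 3], ["a", "b", "c"]))
print(zipped)  # [(1, 'a'), (2, 'b'), (3, 'c')]
```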

The `map()` built-in function is another such tool: it applies a single-parameter function to each element of an iterable, one element at a time:

In [3]:
list(map(len, ["abc", "de", "fghi"]))

[3, 2, 4]

Like `zip()`, `map()` calls `iter()` on its argument, advances the resulting iterator with `next()` until it is exhausted, and applies the function (here `len`) to the value returned by `next()` at each step.

Since iterators are *iterable*, you can compose `zip()` and `map()` together to produce an iterator over combinations of elements in more than one iterable. Take the following example:

In [4]:
list(map(sum, zip([1,2,3], [4,5,6])))

[5, 7, 9]

This is what is meant by an iterator algebra: simple iterator functions, whether built-ins such as `zip()` and `map()` or the functions in `itertools`, compose into specialized data pipelines.

There are two reasons why such an iterator algebra is useful: it improves memory efficiency, and it often gives faster execution times. Consider the following problem:

    Given a list of values inputs and a positive integer n, write a function that splits inputs into groups of length n. For simplicity, assume that the length of the input list is divisible by n. For example, if inputs = [1, 2, 3, 4, 5, 6] and n = 2, your function should return [(1, 2), (3, 4), (5, 6)].
    
With a naive approach, we may write something like:

In [5]:
def naive_grouper(inputs, n):
    # integer division
    n_groups = len(inputs) // n
    return [tuple(inputs[i*n:(i+1)*n]) for i in range(n_groups)]

Testing this works as expected:

In [6]:
naive_grouper([1,2,3,4,5,6], 2)

[(1, 2), (3, 4), (5, 6)]

But what if you try to pass a list with 100 million elements? You will need a lot of memory! Even if you have the memory, the program will hang until the output list is populated. Try the cells below at your peril (if you have less than 5GB of RAM, enjoy):

In [9]:
def call_naive():
    for _ in naive_grouper(range(100000000), 10):
        pass

In [10]:
%timeit call_naive()

12.2 s ± 130 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


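The primary issue is that `naive_grouper` builds the entire output list (10 million tuples holding 100 million integers) in memory before the loop can consume the first group. The `range` object itself is lazy and does not store its numbers. Working with an iterator changes the game considerably: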

In [11]:
def better_grouper(inputs, n):
    iters = [iter(inputs)]*n
    return zip(*iters)

In the above function, there is a lot going on, so let's break it down piece-by-piece:

1. The expression `[iter(inputs)] * n` creates a list of `n` references to the same iterator.
2. Next, `zip(*iters)` returns an iterator over tuples of the corresponding elements of each iterator in `iters`.

In [14]:
nums = [1, 2, 3, 4, 5, 6]
iters = [iter(nums)] * 2
list(id(i) for i in iters)

[1711936708792, 1711936708792]

In this case, `iters` holds two references to one and the same iterator. When `zip()` takes the first element, 1, through the first reference, the shared iterator advances, so the value taken through the second reference is 2. The first tuple produced by `zip()` is therefore `(1, 2)`, the second is `(3, 4)`, and so on.
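To see the shared iterator advancing one step at a time, we can call `next()` by hand. This is a minimal sketch; the names `shared`, `first` and `second` are chosen only for this illustration:

```python
nums = [1, 2, 3, 4, 5, 6]
shared = iter(nums)
first, second = [shared] * 2  # two names, one iterator object

a = next(first)   # consumes 1
b = next(second)  # consumes 2, because `second` is the same iterator
c = next(first)   # consumes 3

print(first is second)  # True
print(a, b, c)          # 1 2 3
```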

In [15]:
list(better_grouper(nums, 2))

[(1, 2), (3, 4), (5, 6)]

Now let's check the performance:

In [16]:
def call_better():
    for _ in better_grouper(range(100000000), 10):
        pass

In [17]:
%timeit call_better()

2.18 s ± 45 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
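Because `better_grouper` returns a lazy `zip` iterator, it never needs the whole input at once. One way to convince ourselves of this is to feed it an endless iterator and take only the first few groups. The sketch below repeats the function definition so that it runs on its own, and uses `itertools.count()` and `itertools.islice()` from the standard library:

```python
import itertools as it


def better_grouper(inputs, n):
    # Same definition as above, repeated so the snippet is self-contained.
    iters = [iter(inputs)] * n
    return zip(*iters)


# it.count() never ends, so an approach that builds the full list
# up front could never finish.
groups = better_grouper(it.count(), 3)
first_two = list(it.islice(groups, 2))
print(first_two)  # [(0, 1, 2), (3, 4, 5)]
```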


## The grouper recipe

One of the problems with our implementation is that it doesn't handle situations where the value passed to the second argument isn't a factor of the length of the iterable in the first argument:

In [19]:
nums = [i+1 for i in range(10)]
list(better_grouper(nums, 4))

[(1, 2, 3, 4), (5, 6, 7, 8)]

The elements 9 and 10 are missing from the output because `zip()` stops aggregating once the shortest iterable passed to it is exhausted. `itertools` provides `zip_longest()`, which instead continues until the longest iterable is exhausted and fills missing values with a `fillvalue` of your choice (`None` by default):

In [20]:
import itertools as it

In [21]:
x = [i+1 for i in range(5)]
y = ["a", "b", "c"]
list(zip(x, y))

[(1, 'a'), (2, 'b'), (3, 'c')]

In [22]:
list(it.zip_longest(x, y))

[(1, 'a'), (2, 'b'), (3, 'c'), (4, None), (5, None)]
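Putting the two ideas together, a grouper that keeps the leftover elements might look like the following. This is a sketch in the spirit of the `grouper` recipe from the `itertools` documentation, not a copy of it; the `fillvalue` parameter is simply passed through to `zip_longest()`:

```python
import itertools as it


def grouper(inputs, n, fillvalue=None):
    # n references to one iterator, as in better_grouper ...
    iters = [iter(inputs)] * n
    # ... but zip_longest pads the last group instead of dropping it.
    return it.zip_longest(*iters, fillvalue=fillvalue)


nums = [i + 1 for i in range(10)]
grouped = list(grouper(nums, 4))
print(grouped)
# [(1, 2, 3, 4), (5, 6, 7, 8), (9, 10, None, None)]
```

Whether padding with `None` is acceptable depends on the data: if `None` can be a legitimate value in `inputs`, a different `fillvalue` is the safer choice.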

## Brute force?

Let's take the following problem:

    You have three $20 bills, five $10 bills, two $5 bills, and five $1 bills. How many ways can you make change for a $100 bill?
    
A standard way to solve this would be a *brute force* approach. Start listing off the ways there are to choose one bill from a wallet, check whether any of these make change for \$100, then list the ways to pick two bills from your wallet, check again, and repeat.

But as programmers, we find this arduous work, and we are lazy.

In [24]:
bills = [20, 20, 20, 10, 10, 10, 10, 10, 5, 5, 1, 1, 1, 1, 1]

A choice of $k$ things from a set of $n$ things is called a **combination**, and this is one area where `itertools` shines. The `it.combinations()` function takes two arguments:

1. An iterable `inputs`
2. A positive integer $n$

and produces an iterator over tuples of all combinations of $n$ elements in `inputs`.

For instance, every 3-bill combination can be found simply as:

In [26]:
list(it.combinations(bills, 3))

[(20, 20, 20),
 (20, 20, 10),
 (20, 20, 10),
 (20, 20, 10),
 (20, 20, 10),
 (20, 20, 10),
 (20, 20, 5),
 (20, 20, 5),
 (20, 20, 1),
 (20, 20, 1),
 (20, 20, 1),
 (20, 20, 1),
 (20, 20, 1),
 (20, 20, 10),
 (20, 20, 10),
 (20, 20, 10),
 (20, 20, 10),
 (20, 20, 10),
 (20, 20, 5),
 (20, 20, 5),
 (20, 20, 1),
 (20, 20, 1),
 (20, 20, 1),
 (20, 20, 1),
 (20, 20, 1),
 (20, 10, 10),
 (20, 10, 10),
 (20, 10, 10),
 (20, 10, 10),
 (20, 10, 5),
 (20, 10, 5),
 (20, 10, 1),
 (20, 10, 1),
 (20, 10, 1),
 (20, 10, 1),
 (20, 10, 1),
 (20, 10, 10),
 (20, 10, 10),
 (20, 10, 10),
 (20, 10, 5),
 (20, 10, 5),
 (20, 10, 1),
 (20, 10, 1),
 (20, 10, 1),
 (20, 10, 1),
 (20, 10, 1),
 (20, 10, 10),
 (20, 10, 10),
 (20, 10, 5),
 (20, 10, 5),
 (20, 10, 1),
 (20, 10, 1),
 (20, 10, 1),
 (20, 10, 1),
 (20, 10, 1),
 (20, 10, 10),
 (20, 10, 5),
 (20, 10, 5),
 (20, 10, 1),
 (20, 10, 1),
 (20, 10, 1),
 (20, 10, 1),
 (20, 10, 1),
 (20, 10, 5),
 (20, 10, 5),
 (20, 10, 1),
 (20, 10, 1),
 (20, 10, 1),
 (20, 10, 1),
 (20, 10, 1),

To solve the problem, we loop over the positive integers from 1 to `len(bills)` and collect the combinations of each size that add up to \$100:

In [28]:
makes_100 = []
for n in range(1, len(bills) + 1):
    for comb in it.combinations(bills, n):
        if sum(comb) == 100:
            makes_100.append(comb)

makes_100

[(20, 20, 20, 10, 10, 10, 10),
 (20, 20, 20, 10, 10, 10, 10),
 (20, 20, 20, 10, 10, 10, 10),
 (20, 20, 20, 10, 10, 10, 10),
 (20, 20, 20, 10, 10, 10, 10),
 (20, 20, 20, 10, 10, 10, 5, 5),
 (20, 20, 20, 10, 10, 10, 5, 5),
 (20, 20, 20, 10, 10, 10, 5, 5),
 (20, 20, 20, 10, 10, 10, 5, 5),
 (20, 20, 20, 10, 10, 10, 5, 5),
 (20, 20, 20, 10, 10, 10, 5, 5),
 (20, 20, 20, 10, 10, 10, 5, 5),
 (20, 20, 20, 10, 10, 10, 5, 5),
 (20, 20, 20, 10, 10, 10, 5, 5),
 (20, 20, 20, 10, 10, 10, 5, 5),
 (20, 20, 10, 10, 10, 10, 10, 5, 5),
 (20, 20, 10, 10, 10, 10, 10, 5, 5),
 (20, 20, 10, 10, 10, 10, 10, 5, 5),
 (20, 20, 20, 10, 10, 10, 5, 1, 1, 1, 1, 1),
 (20, 20, 20, 10, 10, 10, 5, 1, 1, 1, 1, 1),
 (20, 20, 20, 10, 10, 10, 5, 1, 1, 1, 1, 1),
 (20, 20, 20, 10, 10, 10, 5, 1, 1, 1, 1, 1),
 (20, 20, 20, 10, 10, 10, 5, 1, 1, 1, 1, 1),
 (20, 20, 20, 10, 10, 10, 5, 1, 1, 1, 1, 1),
 (20, 20, 20, 10, 10, 10, 5, 1, 1, 1, 1, 1),
 (20, 20, 20, 10, 10, 10, 5, 1, 1, 1, 1, 1),
 (20, 20, 20, 10, 10, 10, 5, 1, 1, 1, 1, 1),

Notice that there are a number of duplicate combinations. To eliminate these, we can convert to a set:

In [29]:
set(makes_100)

{(20, 20, 10, 10, 10, 10, 10, 5, 1, 1, 1, 1, 1),
 (20, 20, 10, 10, 10, 10, 10, 5, 5),
 (20, 20, 20, 10, 10, 10, 5, 1, 1, 1, 1, 1),
 (20, 20, 20, 10, 10, 10, 5, 5),
 (20, 20, 20, 10, 10, 10, 10)}

If we were instead allowed to use any number of `$50`, `$20`, `$10`, `$5` and `$1` bills, this method would break down. For example:

In [30]:
list(it.combinations([1, 2], 2))

[(1, 2)]

`combinations()` does not allow an element to be repeated in the tuples it returns, so `(1, 1)` and `(2, 2)` are missing. We can instead use `combinations_with_replacement()`:

In [31]:
list(it.combinations_with_replacement([1, 2], 2))

[(1, 1), (1, 2), (2, 2)]
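The same brute-force loop then works when each denomination may be used any number of times. To keep the run time short, the sketch below uses a deliberately smaller, hypothetical problem (making \$20 from \$10, \$5 and \$1 bills) rather than the \$100 problem; the variable names are chosen for this example only:

```python
import itertools as it

denominations = [10, 5, 1]  # hypothetical, smaller than the problem above
target = 20

ways = []
# At most `target` bills are needed, since the smallest bill is $1.
for n in range(1, target + 1):
    for comb in it.combinations_with_replacement(denominations, n):
        if sum(comb) == target:
            ways.append(comb)

print(len(ways))  # 9
print(ways[0])    # (10, 10)
```

Since `denominations` contains no repeated values, each way of making change appears exactly once, so no conversion to a `set` is needed here.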

Another **brute force** `itertools` function is `permutations()`, which accepts a single iterable and produces all possible permutations (rearrangements) of its elements:

In [32]:
list(it.permutations(["a","b","c"]))

[('a', 'b', 'c'),
 ('a', 'c', 'b'),
 ('b', 'a', 'c'),
 ('b', 'c', 'a'),
 ('c', 'a', 'b'),
 ('c', 'b', 'a')]

The number of permutations grows extremely fast with the length of the iterable, since it is given by the factorial:

$$
n!=n(n-1)(n-2)(n-3)\dots(2)(1)
$$

An iterable of $n$ elements yields $n!$ ($n$ factorial) permutations.
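We can check this growth empirically for small $n$ by counting the tuples that `permutations()` yields and comparing with `math.factorial()`. This is a quick sketch; the upper limit of 7 is an arbitrary choice that keeps it fast:

```python
import itertools as it
import math

counts = []
for n in range(1, 8):
    # Count lazily instead of building the full list of permutations.
    n_perms = sum(1 for _ in it.permutations(range(n)))
    counts.append((n, n_perms, math.factorial(n)))

print(counts[-1])  # (7, 5040, 5040)
```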