# Itertools and Advanced Classes

`itertools` is a core part of the Python 3 standard library, and many tasks can be achieved effortlessly with it. As you may have experienced, one of the primary drawbacks of Python is the slow nature of **looping**: every iteration of a `for` or `while` loop carries interpreter overhead, which adds up quickly when a loop runs many times. This aspect will be covered in more detail in the *Performance* notebook.

One of the ways around this is to use the `itertools` library. Its functions are intuitively named, fast, elegant and memory-efficient. `itertools` provides building-block functions inspired by constructs from APL, Haskell and SML. Together they form a kind of "iterator algebra", making it possible to construct specialized tools succinctly and efficiently in Python. For more on this, see the `itertools` [documentation](https://docs.python.org/3/library/itertools.html) and this very [helpful guide](https://realpython.com/python-itertools/), from which this teaching material is primarily drawn.

Loosely speaking, this means that functions in `itertools` build on top of iterators to produce more complex ones. For example, `zip()` is a *built-in* Python function that takes an arbitrary number of iterables as arguments and returns an iterator over tuples of their corresponding elements:

In [2]:
list(zip([1, 2, 3], ["a","b","c"]))

[(1, 'a'), (2, 'b'), (3, 'c')]

In this case, `[1, 2, 3]` and `["a", "b", "c"]` are lists. Lists are iterable, which means they can return their elements one at a time. More generally, **any Python object** that implements the `.__iter__()` method, or a `.__getitem__()` method that accepts integer indices starting from 0, is *iterable*.

The built-in `iter()` function, when called on an iterable, returns an *iterator* over that iterable:

In [3]:
iter([1, 2, 3, 4])

<list_iterator at 0x7f6fd02c25c0>

Under the hood, `zip()` works by calling `iter()` on each of its arguments, then advancing each iterator returned by `iter()` with `next()` and aggregating the results into tuples.
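
To make this concrete, here is a simplified, pure-Python sketch of the same idea. The name `my_zip` is hypothetical, and the real `zip()` is implemented in C and supports extra options (such as `strict=` in Python 3.10+), so treat this only as an illustration of the `iter()`/`next()` mechanism:

```python
def my_zip(*iterables):
    """Simplified sketch of zip(): stops as soon as any iterator is exhausted."""
    # Call iter() on every argument to get one iterator per iterable
    iterators = [iter(obj) for obj in iterables]
    if not iterators:
        return
    while True:
        result = []
        for iterator in iterators:
            try:
                # Advance each iterator by exactly one step
                result.append(next(iterator))
            except StopIteration:
                # The shortest iterable is exhausted, so stop
                return
        yield tuple(result)

print(list(my_zip([1, 2, 3], ["a", "b", "c"])))  # [(1, 'a'), (2, 'b'), (3, 'c')]
```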

The built-in `map()` function is another such tool: it applies a single-parameter function to each element of an iterable, one element at a time:

In [4]:
list(map(len, ["abc", "de", "fghi"]))

[3, 2, 4]

Like `zip()`, `map()` calls `iter()` on its argument, advances the resulting iterator with `next()` until it is exhausted, and applies `len` (or any other function) to the value returned by `next()` at each step.
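
A similarly simplified sketch for a single iterable; `my_map` is a hypothetical name, and the real `map()` also accepts several iterables for functions of several arguments:

```python
def my_map(func, iterable):
    """Simplified, single-iterable sketch of map()."""
    iterator = iter(iterable)
    while True:
        try:
            value = next(iterator)
        except StopIteration:
            # Input exhausted: end the generator
            return
        # Apply the function to each value as it is pulled
        yield func(value)

print(list(my_map(len, ["abc", "de", "fghi"])))  # [3, 2, 4]
```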

Since iterators are themselves *iterable*, you can compose `zip()` and `map()` to build an iterator that combines elements from more than one iterable. Take the following example:

In [5]:
list(map(sum, zip([1,2,3], [4,5,6])))

[5, 7, 9]

This is what is meant by an iterator algebra: simple iterator-producing functions can be composed into specialized data pipelines.

There are two main reasons why such an iterator algebra is useful: improved memory efficiency and faster execution. Consider the following problem:

    Given a list of values inputs and a positive integer n, write a function that splits inputs into groups of length n. For simplicity, assume that the length of the input list is divisible by n. For example, if inputs = [1, 2, 3, 4, 5, 6] and n = 2, your function should return [(1, 2), (3, 4), (5, 6)].
    
With a naive approach, we may write something like:

In [6]:
def naive_grouper(inputs, n):
    # integer division
    n_groups = len(inputs) // n
    return [tuple(inputs[i*n:(i+1)*n]) for i in range(n_groups)]

Testing this works as expected:

In [7]:
naive_grouper([1,2,3,4,5,6], 2)

[(1, 2), (3, 4), (5, 6)]

But what if you pass a list with 100 million elements? You will need a lot of memory! Even if you have the memory, the program will hang until the output list is populated. Try the cells below at your peril (if you have less than 5 GB of RAM, enjoy):

In [8]:
def call_naive():
    for _ in naive_grouper(range(100000000), 10):
        pass

In [9]:
%timeit call_naive()

8.14 s ± 334 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


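The primary issue is that `naive_grouper()` builds the entire output list, here 10 million tuples, in memory before the loop can process a single group. (`range` itself is lazy, so it is the list of tuples, not the range, that costs memory.) Working with an iterator considerably changes the game: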

In [10]:
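def better_grouper(inputs, n):
    # n references to one and the same iterator
    iters = [iter(inputs)] * n
    # zip() pulls from that iterator n times per tuple, so each tuple is the next group
    return zip(*iters)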

In the above function, there is a lot going on, so let's break it down piece-by-piece:

1. The expression `[iter(inputs)] * n` creates a list of `n` references to the same iterator.
2. Next, `zip(*iters)` returns an iterator over tuples of corresponding elements from each iterator in `iters`.

In [11]:
nums = [1, 2, 3, 4, 5, 6]
iters = [iter(nums)] * 2
list(id(i) for i in iters)

[140118210295064, 140118210295064]

In this case, both entries of `iters` are the same iterator object. When `zip()` builds its first tuple, it takes 1 from the first entry; the second entry is the same iterator, which has already advanced one step, so it yields 2. The first tuple produced by `zip()` is therefore `(1, 2)`, the next is `(3, 4)`, and so on.

In [12]:
list(better_grouper(nums, 2))

[(1, 2), (3, 4), (5, 6)]
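
To see why the shared reference matters, compare it with a list of two *separate* iterators over the same list. The variable names below are only for illustration:

```python
nums = [1, 2, 3, 4, 5, 6]

# One iterator referenced twice: zip() takes consecutive items for each tuple
shared = [iter(nums)] * 2
grouped = list(zip(*shared))
print(grouped)  # [(1, 2), (3, 4), (5, 6)]

# Two distinct iterators: each starts from the beginning independently
independent = [iter(nums), iter(nums)]
paired = list(zip(*independent))
print(paired)  # [(1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6)]
```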

Now let's check the performance:

In [13]:
def call_better():
    for _ in better_grouper(range(100000000), 10):
        pass

In [14]:
%timeit call_better()

1.16 s ± 23.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
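
Part of the difference is that `better_grouper()` returns a lazy `zip` object, so no group exists until something asks for it. As a quick illustration (a sketch reusing the same definition), pulling just the first two groups from a 100-million-element `range` is practically instant:

```python
def better_grouper(inputs, n):
    iters = [iter(inputs)] * n
    return zip(*iters)

# zip() is lazy: creating `groups` does not build any tuples yet
groups = better_grouper(range(100_000_000), 10)

# Only the groups we ask for are ever created
first = next(groups)
second = next(groups)
print(first)   # (0, 1, 2, 3, 4, 5, 6, 7, 8, 9)
print(second)  # (10, 11, 12, 13, 14, 15, 16, 17, 18, 19)
```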


## The grouper recipe

One problem with our implementation is that it doesn't handle situations where the value passed as the second argument isn't a factor of the length of the iterable passed as the first argument:

In [15]:
nums = [i+1 for i in range(10)]
list(better_grouper(nums, 4))

[(1, 2, 3, 4), (5, 6, 7, 8)]

The elements 9 and 10 are missing from the output because `zip()` stops aggregating as soon as the shortest iterable is exhausted. `itertools.zip_longest()` instead continues until the *longest* iterable is exhausted and fills the missing values with a `fillvalue` of your choice (`None` by default):

In [16]:
import itertools as it

In [17]:
x = [i+1 for i in range(5)]
y = ["a", "b", "c"]
list(zip(x, y))

[(1, 'a'), (2, 'b'), (3, 'c')]

In [18]:
list(it.zip_longest(x, y))

[(1, 'a'), (2, 'b'), (3, 'c'), (4, None), (5, None)]
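
Combining the shared-iterator trick from `better_grouper()` with `zip_longest()` gives a grouper that keeps the trailing elements. This mirrors the *grouper* recipe from the `itertools` documentation, although the recipe's exact signature differs between Python versions, so treat the function below as a sketch:

```python
import itertools as it

def grouper(inputs, n, fillvalue=None):
    """Group inputs into n-tuples, padding the last group with fillvalue."""
    # Same trick as better_grouper(): n references to one iterator
    iters = [iter(inputs)] * n
    # zip_longest() keeps going until the iterator is exhausted, padding the end
    return it.zip_longest(*iters, fillvalue=fillvalue)

nums = list(range(1, 11))
print(list(grouper(nums, 4)))
# [(1, 2, 3, 4), (5, 6, 7, 8), (9, 10, None, None)]
print(list(grouper(nums, 4, fillvalue=0)))
# [(1, 2, 3, 4), (5, 6, 7, 8), (9, 10, 0, 0)]
```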

## Brute force?: combinations

Let's take the following problem:

    You have three $20 bills, five $10 bills, two $5 bills, and five $1 bills. How many ways can you make change for a $100 bill?
    
A standard way to solve this would be a *brute force* approach: list the ways to choose one bill from your wallet and check whether any of them make change for \$100, then list the ways to pick two bills, check again, and so on.

But for a programmer this is arduous work, and we are lazy.

In [19]:
bills = [20, 20, 20, 10, 10, 10, 10, 10, 5, 5, 1, 1, 1, 1, 1]

A choice of $k$ things from a set of $n$ things is called a **combination**, and this is one area where `itertools` shines. The `it.combinations()` function takes two arguments:

1. An iterable `inputs`
2. A non-negative integer $k$ (named `r` in the documentation)

and produces an iterator over tuples of all combinations of $k$ elements in `inputs`.

For instance, every 3-bill combination can be found simply as:

In [20]:
list(it.combinations(bills, 3))

[(20, 20, 20),
 (20, 20, 10),
 (20, 20, 10),
 (20, 20, 10),
 (20, 20, 10),
 (20, 20, 10),
 (20, 20, 5),
 (20, 20, 5),
 (20, 20, 1),
 (20, 20, 1),
 (20, 20, 1),
 (20, 20, 1),
 (20, 20, 1),
 (20, 20, 10),
 (20, 20, 10),
 (20, 20, 10),
 (20, 20, 10),
 (20, 20, 10),
 (20, 20, 5),
 (20, 20, 5),
 (20, 20, 1),
 (20, 20, 1),
 (20, 20, 1),
 (20, 20, 1),
 (20, 20, 1),
 (20, 10, 10),
 (20, 10, 10),
 (20, 10, 10),
 (20, 10, 10),
 (20, 10, 5),
 (20, 10, 5),
 (20, 10, 1),
 (20, 10, 1),
 (20, 10, 1),
 (20, 10, 1),
 (20, 10, 1),
 (20, 10, 10),
 (20, 10, 10),
 (20, 10, 10),
 (20, 10, 5),
 (20, 10, 5),
 (20, 10, 1),
 (20, 10, 1),
 (20, 10, 1),
 (20, 10, 1),
 (20, 10, 1),
 (20, 10, 10),
 (20, 10, 10),
 (20, 10, 5),
 (20, 10, 5),
 (20, 10, 1),
 (20, 10, 1),
 (20, 10, 1),
 (20, 10, 1),
 (20, 10, 1),
 (20, 10, 10),
 (20, 10, 5),
 (20, 10, 5),
 (20, 10, 1),
 (20, 10, 1),
 (20, 10, 1),
 (20, 10, 1),
 (20, 10, 1),
 (20, 10, 5),
 (20, 10, 5),
 (20, 10, 1),
 (20, 10, 1),
 (20, 10, 1),
 (20, 10, 1),
 (20, 10, 1),
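
Because `combinations()` chooses *positions* rather than values, the number of 3-bill tuples is the binomial coefficient $\binom{15}{3}$, even though many of them contain the same values. A quick check, assuming Python 3.8+ for `math.comb()`:

```python
import itertools as it
import math

bills = [20, 20, 20, 10, 10, 10, 10, 10, 5, 5, 1, 1, 1, 1, 1]

# Every way of picking 3 of the 15 positions
n_triples = sum(1 for _ in it.combinations(bills, 3))
print(n_triples, math.comb(len(bills), 3))  # 455 455

# Collapsing tuples with equal values leaves far fewer distinct triples
n_distinct = len(set(it.combinations(bills, 3)))
print(n_distinct)  # 19
```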

To solve the problem, we loop over positive integers from 1 to `len(bills)`, then check the combinations of each size that add to \$100:

In [21]:
makes_100 = []
for n in range(1, len(bills) + 1):
    for comb in it.combinations(bills, n):
        if sum(comb) == 100:
            makes_100.append(comb)

makes_100

[(20, 20, 20, 10, 10, 10, 10),
 (20, 20, 20, 10, 10, 10, 10),
 (20, 20, 20, 10, 10, 10, 10),
 (20, 20, 20, 10, 10, 10, 10),
 (20, 20, 20, 10, 10, 10, 10),
 (20, 20, 20, 10, 10, 10, 5, 5),
 (20, 20, 20, 10, 10, 10, 5, 5),
 (20, 20, 20, 10, 10, 10, 5, 5),
 (20, 20, 20, 10, 10, 10, 5, 5),
 (20, 20, 20, 10, 10, 10, 5, 5),
 (20, 20, 20, 10, 10, 10, 5, 5),
 (20, 20, 20, 10, 10, 10, 5, 5),
 (20, 20, 20, 10, 10, 10, 5, 5),
 (20, 20, 20, 10, 10, 10, 5, 5),
 (20, 20, 20, 10, 10, 10, 5, 5),
 (20, 20, 10, 10, 10, 10, 10, 5, 5),
 (20, 20, 10, 10, 10, 10, 10, 5, 5),
 (20, 20, 10, 10, 10, 10, 10, 5, 5),
 (20, 20, 20, 10, 10, 10, 5, 1, 1, 1, 1, 1),
 (20, 20, 20, 10, 10, 10, 5, 1, 1, 1, 1, 1),
 (20, 20, 20, 10, 10, 10, 5, 1, 1, 1, 1, 1),
 (20, 20, 20, 10, 10, 10, 5, 1, 1, 1, 1, 1),
 (20, 20, 20, 10, 10, 10, 5, 1, 1, 1, 1, 1),
 (20, 20, 20, 10, 10, 10, 5, 1, 1, 1, 1, 1),
 (20, 20, 20, 10, 10, 10, 5, 1, 1, 1, 1, 1),
 (20, 20, 20, 10, 10, 10, 5, 1, 1, 1, 1, 1),
 (20, 20, 20, 10, 10, 10, 5, 1, 1, 1, 1, 1),

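Notice that there are a number of duplicate combinations. `combinations()` treats elements by their position, not their value, so swapping one \$20 bill for another gives a "different" combination with the same values. To eliminate these duplicates, we can convert the list to a set: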

In [22]:
set(makes_100)

{(20, 20, 10, 10, 10, 10, 10, 5, 1, 1, 1, 1, 1),
 (20, 20, 10, 10, 10, 10, 10, 5, 5),
 (20, 20, 20, 10, 10, 10, 5, 1, 1, 1, 1, 1),
 (20, 20, 20, 10, 10, 10, 5, 5),
 (20, 20, 20, 10, 10, 10, 10)}

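So there are five distinct ways to make change for a \$100 bill. If instead you had an unlimited supply of \$50, \$20, \$10, \$5 and \$1 bills, this method would break down, because `combinations()` never selects the same element twice. For example: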

In [23]:
list(it.combinations([1, 2], 2))

[(1, 2)]

`combinations()` does not reuse an element within the tuples it returns. To allow repeats, use `combinations_with_replacement()` instead:

In [24]:
list(it.combinations_with_replacement([1, 2], 2))

[(1, 1), (1, 2), (2, 2)]
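
As a sketch of how `combinations_with_replacement()` handles an unlimited supply of bills, here is a deliberately small, hypothetical version of the problem: making change for \$10 from unlimited \$10, \$5 and \$1 bills (the full \$100 version works the same way but is far slower to brute force):

```python
import itertools as it

# Hypothetical small instance: unlimited bills of each denomination
denominations = [10, 5, 1]
target = 10

makes_target = []
# At most `target` bills are needed, since the smallest bill is $1
for n in range(1, target + 1):
    for comb in it.combinations_with_replacement(denominations, n):
        if sum(comb) == target:
            makes_target.append(comb)

print(makes_target)
# [(10,), (5, 5), (5, 1, 1, 1, 1, 1), (1, 1, 1, 1, 1, 1, 1, 1, 1, 1)]
```

Since each denomination appears only once in `denominations`, no duplicate tuples are produced and no `set()` is needed.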

Another **brute force** `itertools` function is `permutations()`, which accepts a single iterable and produces all possible permutations (rearrangements) of its elements:

In [25]:
list(it.permutations(["a","b","c"]))

[('a', 'b', 'c'),
 ('a', 'c', 'b'),
 ('b', 'a', 'c'),
 ('b', 'c', 'a'),
 ('c', 'a', 'b'),
 ('c', 'b', 'a')]

The number of permutations grows extremely fast with the length of the iterable: an iterable of $n$ elements yields $n!$ ($n$ factorial) permutations, where

$$
n!=n(n-1)(n-2)(n-3)\dots(2)(1)
$$
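
A quick check of this growth rate, assuming Python 3.8+ for `math.perm()`. The optional second argument `r` of `permutations()` limits each tuple to length `r`, giving $n!/(n-r)!$ tuples:

```python
import itertools as it
import math

# Count the permutations of range(n) for small n and compare with n!
counts = [sum(1 for _ in it.permutations(range(n))) for n in range(1, 7)]
factorials = [math.factorial(n) for n in range(1, 7)]
print(counts)      # [1, 2, 6, 24, 120, 720]
print(factorials)  # [1, 2, 6, 24, 120, 720]

# With r=2, each tuple holds 2 of the 4 letters: 4!/(4-2)! = 12 tuples
pairs = list(it.permutations("abcd", 2))
print(len(pairs), math.perm(4, 2))  # 12 12
```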

## Sequences of Numbers: count

With `itertools`, we can easily generate iterators over infinite sequences. In this section, we'll explore sequence generation.

### Evens and Odds

In this first example, we'll create a pair of iterators over even and odd integers *without explicitly doing any arithmetic*. Below is an arithmetic solution using generators:

In [26]:
def evens():
    """Generate even integers, starting with 0."""
    n = 0
    while True:
        yield n
        n += 2

def odds():
    """Generate odd integers, starting with 1."""
    n = 1
    while True:
        yield n
        n += 2

evens_gen = evens()
list(next(evens_gen) for _ in range(5))

[0, 2, 4, 6, 8]

In [27]:
odds_gen = odds()
list(next(odds_gen) for _ in range(6))

[1, 3, 5, 7, 9, 11]

Because `evens()` and `odds()` contain a `yield` statement, calling them returns a **generator** rather than a computed value. Each value is only computed when `next()` is called on the generator, here inside `list()`.

With `itertools`, this can be achieved more compactly using `itertools.count()`, which by default counts up from 0:

In [28]:
counter = it.count()
list(next(counter) for _ in range(5))

[0, 1, 2, 3, 4]

We can start counting from any number by setting the `start` argument (default 0), and control the interval between values with the `step` argument (default 1):

In [29]:
evens = it.count(step=2)
list(next(evens) for _ in range(5))

[0, 2, 4, 6, 8]

In [30]:
odds = it.count(start=1, step=2)
list(next(odds) for _ in range(5))

[1, 3, 5, 7, 9]

In [31]:
c_floats = it.count(start=0.5, step=0.5)
list(next(c_floats) for _ in range(5))

[0.5, 1.0, 1.5, 2.0, 2.5]

In [32]:
neg_count = it.count(start=1, step=-.5)
list(next(neg_count) for _ in range(5))

[1, 0.5, 0.0, -0.5, -1.0]

`count()` behaves in many ways like the built-in `range()` function, but it returns an infinite sequence. You might wonder what the point of that is; one nice feature is that, since its length is not fixed, it can easily be paired with any other iterable, such as a `list`:

In [33]:
list(zip(it.count(), ["a", "b", "c"]))

[(0, 'a'), (1, 'b'), (2, 'c')]

The example above enumerates a list without a `for` loop and without knowing the length of the list ahead of time.
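
This particular pattern is exactly what the built-in `enumerate()` does. Where `count()` adds something is its `step` argument, which `enumerate()` does not have:

```python
import itertools as it

letters = ["a", "b", "c"]

# zip() with count() and enumerate() produce the same pairs
via_count = list(zip(it.count(), letters))
via_enumerate = list(enumerate(letters))
print(via_count == via_enumerate)  # True

# count() can also use a step, which enumerate() cannot
stepped = list(zip(it.count(0, 10), letters))
print(stepped)  # [(0, 'a'), (10, 'b'), (20, 'c')]
```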

## Recurrence Relations: repeat, cycle, accumulate

A recurrence relation can describe a sequence of numbers with a recursive formula. One of the most famous recurrence relations is the **Fibonacci sequence**:

$$
F_n=F_{n-1}+F_{n-2}, \qquad F_0=0, F_1=1
$$

Since the sequence is *infinite*, it makes sense to produce it with a generator:

In [34]:
def fibs():
    a,b = 0,1
    while True:
        yield a
        a,b = b, a+b
        
fib_gen = fibs()

The Fibonacci sequence is a *second-order* recurrence relation, since each new value depends on the two values before it.
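
Using the `next()` pattern from earlier, the first ten values of the generator are shown below (the definition is repeated so the cell runs on its own):

```python
def fibs():
    """Generate the Fibonacci sequence 0, 1, 1, 2, 3, ..."""
    a, b = 0, 1
    while True:
        yield a
        # Each new term depends on the two previous ones
        a, b = b, a + b

fib_gen = fibs()
first_ten = list(next(fib_gen) for _ in range(10))
print(first_ten)  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```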

You can think of the previous example, `count()`, as a *first-order* recurrence relation, where the `step` parameter is the amount added at each step and `start` is the initial value:

$$
F_n=F_{n-1}+z, \qquad F_0=c
$$

where $z$ is the step, $c$ is the start. Another example of a *first-order* recurrence relation is a constant sequence $n, n, \dots, n$, where $n$ is a chosen value. This is achieved with the `repeat` function:

In [35]:
all_ones = it.repeat(1)
list(next(all_ones) for _ in range(5))

[1, 1, 1, 1, 1]

In [36]:
twos = it.repeat(2)
list(next(twos) for _ in range(3))

[2, 2, 2]

Alternatively, the optional second argument, `times`, specifies how many values to produce; without it, `repeat()` runs indefinitely:

In [40]:
list(it.repeat(5,4))

[5, 5, 5, 5]
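
A common use of `repeat()`, mentioned in the `itertools` documentation, is to supply a stream of constant values to `map()` or `zip()`. For example, squaring numbers with the built-in `pow()`:

```python
import itertools as it

# map() stops at the shortest input, so the infinite repeat(2) is safe here
squares = list(map(pow, range(5), it.repeat(2)))
print(squares)  # [0, 1, 4, 9, 16]
```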

Another *first-order* recurrence is a **cycle** of alternating numbers, for example $[-1,1,-1,1,\dots]$. This is implemented with `cycle()`, given a list of elements to cycle over:

In [42]:
alt_ones = it.cycle([-1, 1])
list(next(alt_ones) for _ in range(6))

[-1, 1, -1, 1, -1, 1]

To generate an *arbitrary* first-order recurrence relation, we can use `accumulate()`. This function takes two arguments: an iterable of input values, and `func`, a function of exactly two arguments (if `func` is omitted, addition is used):

In [48]:
import operator
list(it.accumulate([1,2,3,4], operator.add))

[1, 3, 6, 10]

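The first value returned by `accumulate()` is the first value in the input sequence; each subsequent value is `func` applied to the previous result and the next input. With `operator.add`, this is similar to NumPy's `cumsum()` function, i.e. the cumulative sum of the values in an array. The difference is that `accumulate()` works lazily on any iterable, including infinite ones. Any two-argument function can be used, for example: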

In [54]:
list(it.accumulate([1,2,3,4], lambda x,y: x * y / 2))

[1, 1.0, 1.5, 3.0]
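
The two-argument function always receives the accumulated value first and the next input second. Two further examples: a running maximum using the built-in `max()`, and the `initial` keyword argument (available since Python 3.8), which emits a starting value before the inputs:

```python
import itertools as it

data = [3, 1, 4, 1, 5, 9, 2, 6]

# Running maximum: max() receives (accumulated value, next input)
running_max = list(it.accumulate(data, max))
print(running_max)  # [3, 3, 4, 4, 5, 9, 9, 9]

# `initial` (Python 3.8+) is emitted first, then the running sum continues from it
with_initial = list(it.accumulate([1, 2, 3], initial=100))
print(with_initial)  # [100, 101, 103, 106]
```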

## A Deck of Cards: product, islice, tee

Let's imagine we're building a poker app. We need a deck of cards, so you might start by defining a list of ranks (King, Queen, Jack, etc.) and a list of suits (hearts, diamonds, clubs, etc.):

In [55]:
ranks = ["2","3","4","5","6","7","8","9","10","J","Q","K","A"]
suits = ["H","D","C","S"]

You could represent a card as a tuple whose first element is a rank and whose second element is a suit. A deck of cards would be a collection of such tuples. The deck should act like the real thing, so it makes sense to define a generator that yields cards one at a time and becomes exhausted once all the cards are dealt.

One way to achieve this is to write a generator with a nested `for` loop over `ranks` and `suits`:

In [56]:
def cards():
    for rank in ranks:
        for suit in suits:
            yield rank, suit

Or alternatively:

In [57]:
cards = ((rank, suit) for rank in ranks for suit in suits)

This is the **Cartesian product** of the two iterables. In mathematics, the Cartesian product of two sets $A$ and $B$ is the set of all tuples of the form $(a,b)$ where $a$ is an element of $A$ and $b$ is an element of $B$. `itertools.product()` computes it directly:

In [59]:
list(it.product([1,2], ["a","b"]))

[(1, 'a'), (1, 'b'), (2, 'a'), (2, 'b')]

The `product()` function is not limited to two iterables: you can pass it as many as you like, and they don't need to be the same size!

In [61]:
list(it.product([1,2,3], ["a","b"], ["e"]))

[(1, 'a', 'e'),
 (1, 'b', 'e'),
 (2, 'a', 'e'),
 (2, 'b', 'e'),
 (3, 'a', 'e'),
 (3, 'b', 'e')]
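
The `repeat` keyword argument takes the product of an iterable with itself, and the number of output tuples is the product of the input lengths. The small check below (which repeats the `ranks` and `suits` lists defined earlier so it runs on its own) shows both how a full deck arises and how quickly the output grows:

```python
import itertools as it

ranks = ["2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A"]
suits = ["H", "D", "C", "S"]

# 13 ranks x 4 suits = 52 cards
deck_size = sum(1 for _ in it.product(ranks, suits))
print(deck_size)  # 52

# repeat=k is shorthand for product(xs, xs, ..., xs) with k copies of xs
bits = list(it.product([0, 1], repeat=2))
print(bits)  # [(0, 0), (0, 1), (1, 0), (1, 1)]

# The number of tuples grows as 10**k
sizes = [sum(1 for _ in it.product(range(10), repeat=k)) for k in range(1, 6)]
print(sizes)  # [10, 100, 1000, 10000, 100000]
```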

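**Fair Warning**: The `product()` function is another *brute force* function and can lead to combinatorial explosion.

Using `product()`, we can now build the deck and shuffle it. Since `random.shuffle()` shuffles a list in place, the `shuffle()` function below first converts the deck to a list and then returns a fresh iterator over the shuffled cards: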

In [64]:
import random

cards = it.product(ranks, suits)

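def shuffle(deck):
    # random.shuffle() works in place, so materialise the deck as a list first
    deck = list(deck)
    random.shuffle(deck)
    # Return an iterator so the deck is "dealt" one card at a time
    return iter(tuple(deck))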

cards = shuffle(cards)

Cutting the deck is pretty important when playing poker. Imagine the cards neatly stacked on a table: the user picks a number $n$, and the first $n$ cards are removed from the top of the stack and moved to the bottom. With a list, this is naturally done with **slicing**:

In [66]:
def cut(deck, n):
    if n < 0:
        raise ValueError("n must be a non-negative integer")
    deck = list(deck)
    return iter(deck[n:] + deck[:n])

cards = cut(cards, 26)

The `cut()` function above is nice and simple, but it has a drawback: slicing a list makes a copy of the selected elements and returns a new list. With a deck of 52 cards this extra memory is trivial, but we could reduce the memory overhead using `itertools`. To do this, we need three functions: `tee()`, `islice()` and `chain()`.

Let's explore these functions.

The `tee()` function creates any number of independent iterators from a single iterable. It takes two arguments: an iterable `inputs` and the number `n` of independent iterators over `inputs` to return (2 by default).

In [80]:
it1, it2 = it.tee([1,2,3,4,5], 2)
list(it1)

[1, 2, 3, 4, 5]

In [81]:
# it1 is now exhausted
list(it1)

[]

In [82]:
# it2 works independently of it1
list(it2)

[1, 2, 3, 4, 5]

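`tee()` returns $n$ independent iterators. Internally, values pulled from the source are buffered until every returned iterator has consumed them, so each iterator effectively works through its own FIFO queue. Once `tee()` has been called, the original iterator should not be used elsewhere; otherwise the returned iterators may miss values.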
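
A short sketch of this independence: advancing one `tee()` iterator leaves the other untouched, because the values it consumed are still held in the buffer:

```python
import itertools as it

source = iter([1, 2, 3, 4, 5])
a, b = it.tee(source)  # n defaults to 2; don't use `source` directly from now on

first_two = [next(a), next(a)]
all_b = list(b)   # b still starts at 1: a's progress does not affect it
rest_a = list(a)  # a resumes where it left off
print(first_two, all_b, rest_a)  # [1, 2] [1, 2, 3, 4, 5] [3, 4, 5]
```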

The `islice()` function works similarly to slicing a list or tuple. You pass it an iterable, a start position and a stop position (and optionally a step), and the slice stops just before the stop position. The main difference is that `islice()` returns an iterator:

In [84]:
list(it.islice("ABCDEFG",2,5))

['C', 'D', 'E']

In [85]:
list(it.islice([1,2,3,4,5], 0, 5, 2))

[1, 3, 5]

In [86]:
list(it.islice(range(10), 3, None))

[3, 4, 5, 6, 7, 8, 9]

In [88]:
list(it.islice("ABCDE", 4))

['A', 'B', 'C', 'D']

The last two examples show how `islice()` can truncate an iterable from the front or the back. You can use this to replace the list slicing used in `cut()` to select the top and bottom of the deck. As an added bonus, `islice()` raises a `ValueError` for negative positions, so the explicit check for negative `n` is no longer necessary.
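
A minimal sketch of that check; `islice_rejects` is a hypothetical helper, and the exact error message is not relied upon:

```python
import itertools as it

def islice_rejects(start):
    """Hypothetical helper: True if islice() raises ValueError for this start."""
    try:
        # The arguments are validated when the islice object is created
        it.islice(range(10), start, None)
    except ValueError:
        return True
    return False

print(islice_rejects(-3))  # True
print(islice_rejects(3))   # False
```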

The last function we need is `chain()`, which simply concatenates iterables. For example:

In [89]:
list(it.chain("ABC","DEF"))

['A', 'B', 'C', 'D', 'E', 'F']

In [90]:
list(it.chain([1,2], [3,4,5], [6,7,8,9]))

[1, 2, 3, 4, 5, 6, 7, 8, 9]
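
Putting the three functions together, one way to write a lazy version of `cut()` is sketched below. It is not the only possible design:

```python
import itertools as it

def cut(deck, n):
    """Move the first n cards to the bottom of the deck, without list slicing."""
    # Two independent iterators over the same deck
    deck1, deck2 = it.tee(deck, 2)
    # islice() raises ValueError here if n is negative
    top = it.islice(deck1, n)           # the first n cards
    bottom = it.islice(deck2, n, None)  # every card after the first n
    return it.chain(bottom, top)

print(list(cut(range(10), 3)))  # [3, 4, 5, 6, 7, 8, 9, 0, 1, 2]
```

One caveat: `bottom` is consumed completely before `top` starts, so `tee()` has to buffer the cards in between, roughly one copy of the deck. This version avoids the extra slice and concatenation copies of the list-based `cut()`, but the `itertools` documentation notes that when one `tee()` iterator uses most of the data before the other starts, `list()` is usually faster.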

## Flattening A List of Lists

As a nice addendum to `chain()`, the `chain.from_iterable()` class method makes it trivial to flatten a list of lists. It takes a single iterable whose elements must themselves be iterable and chains those elements together, so the net effect is flattening:

In [91]:
list(it.chain.from_iterable([[1,2,3],[4,5,6]]))

[1, 2, 3, 4, 5, 6]

Because `chain.from_iterable()` consumes its argument lazily, this also works with infinite iterables; for instance, you can emulate the behavior of `cycle()`:

In [93]:
cyc = it.chain.from_iterable(it.repeat("abc"))
list(it.islice(cyc, 8))

['a', 'b', 'c', 'a', 'b', 'c', 'a', 'b']

This is very useful when you need to build an iterator over data that has been 'chunked'.
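
As an illustration of that use case, the sketch below splits a sequence into chunks and flattens them back. `chunked` is a hypothetical helper written for this example (Python 3.12 adds a similar `itertools.batched()`, which yields tuples). Note also that `from_iterable()` removes only *one* level of nesting:

```python
import itertools as it

def chunked(iterable, size):
    """Hypothetical helper: yield lists of up to `size` items from iterable."""
    iterator = iter(iterable)
    while True:
        chunk = list(it.islice(iterator, size))
        if not chunk:
            return
        yield chunk

chunks = list(chunked(range(10), 4))
print(chunks)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]

# Flattening the chunks restores the original sequence
flat = list(it.chain.from_iterable(chunks))
print(flat)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# Only one level of nesting is removed
one_level = list(it.chain.from_iterable([[1, [2, 3]], [4]]))
print(one_level)  # [1, [2, 3], 4]
```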

# Tasks

## Task 1

Write a generic first-order recurrence function `first_order` that accepts three arguments: `p`, `q` and `initial_val`, and returns a sequence defined as:

$$
F_n=pF_{n-1}+q
$$

using `itertools` functions that you know. Test your function on previous first-order recurrence examples above:

In [49]:
# your code here
def first_order(p, q, initial_val):
    return it.accumulate(it.repeat(initial_val), lambda s,_: p*s+q)
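
One way to test this solution against the earlier first-order examples; `take` is a small hypothetical helper, and the definition of `first_order` is repeated so the cell runs on its own:

```python
import itertools as it

def first_order(p, q, initial_val):
    # accumulate() emits initial_val first, then applies s -> p*s + q repeatedly
    return it.accumulate(it.repeat(initial_val), lambda s, _: p * s + q)

def take(iterator, n):
    """Hypothetical helper: return the first n values of an iterator as a list."""
    return list(it.islice(iterator, n))

evens = take(first_order(1, 2, 0), 5)          # like count(step=2)
odds = take(first_order(1, 2, 1), 5)           # like count(start=1, step=2)
ones = take(first_order(1, 0, 1), 5)           # like repeat(1)
alternating = take(first_order(-1, 0, -1), 6)  # like cycle([-1, 1])
print(evens, odds, ones, alternating)
# [0, 2, 4, 6, 8] [1, 3, 5, 7, 9] [1, 1, 1, 1, 1] [-1, 1, -1, 1, -1, 1]
```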

## Task 2

Write a generic second-order recurrence function `second_order` that accepts four arguments: `p`, `q`, `r` and `initial_values`, and returns a sequence defined as:

$$
F_n=pF_{n-1}+qF_{n-2}+r
$$

using `itertools` functions and built-in Python functions. Test your function on the Fibonacci sequence.

In [50]:
# your code here
def second_order(p, q, r, initial_values):
    intermediate = it.accumulate(it.repeat(initial_values), lambda s,_: (s[1], p*s[1] + q*s[0] + r))
    return map(lambda x: x[0], intermediate)
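
Testing this solution on the Fibonacci sequence, with $(F_0, F_1) = (0, 1)$, and on the Pell numbers ($p=2$, $q=1$, $r=0$) as a second check; the definition is repeated so the cell runs on its own:

```python
import itertools as it

def second_order(p, q, r, initial_values):
    # Each state is the pair (F_{n}, F_{n+1}); the lambda shifts it forward one step
    intermediate = it.accumulate(
        it.repeat(initial_values), lambda s, _: (s[1], p * s[1] + q * s[0] + r)
    )
    # Emit the first element of each state
    return map(lambda x: x[0], intermediate)

# Fibonacci: F_n = F_{n-1} + F_{n-2}, with (F_0, F_1) = (0, 1)
fib = list(it.islice(second_order(1, 1, 0, (0, 1)), 10))
print(fib)  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]

# Pell numbers: P_n = 2 P_{n-1} + P_{n-2}, with (P_0, P_1) = (0, 1)
pell = list(it.islice(second_order(2, 1, 0, (0, 1)), 6))
print(pell)  # [0, 1, 2, 5, 12, 29]
```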