<img src='img/logo.png' />

<img src='img/title.png'>

<img src='img/py3k.png'>

# Table of Contents
* [Learning Objectives:](#Learning-Objectives:)
	* [A digression on the `itertools` module](#A-digression-on-the-itertools-module)
* [Large sequences, even if not quite infinitely long](#Large-sequences,-even-if-not-quite-infinitely-long)
* [Chaining iterables](#Chaining-iterables)

# Learning Objectives:

After completion of this module, learners should be able to:

* understand the `itertools` module

## A digression on the `itertools` module

The module `itertools` is a collection of very powerful—and carefully designed—functions for performing *iterator algebra*.  That is, these permit *function composition* with iterators in sophisticated ways while minimizing concrete instantiation of terms in iterable sequences. In addition to the basic functions in the module itself, the [module documentation](https://docs.python.org/3.5/library/itertools.html) provides a number of short recipes for additional functions using two or three of the basic module functions in combination. *Be aware that it is easy to get these recipes subtly wrong*. The third-party module `more_itertools` provides additional functions that are likewise designed to avoid common pitfalls and edge cases.

The basic goal of using the building blocks inside `itertools` is to avoid performing computations before they are required, to avoid the memory requirements of large collections, to avoid potentially slow I/O until strictly necessary, and so on. Iterators are lazy sequences rather than realized collections; when combined with functions or recipes in `itertools`, they retain this property.
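As a minimal sketch of this laziness (the names `squares`, `even_squares`, and `first_five` are illustrative only), the pipeline below composes an infinite counter with the built-in `map()` and `filter()`. No work happens until `islice()` asks for items.

```python
from itertools import count, islice

# An infinite stream of squares; nothing is computed when this line runs.
squares = map(lambda n: n * n, count(1))

# Keep only the even squares; still lazy, still infinite.
even_squares = filter(lambda n: n % 2 == 0, squares)

# islice() pulls just the first five matches, so only the squares
# of 1 through 10 are ever computed.
first_five = list(islice(even_squares, 5))
print(first_five)  # [4, 16, 36, 64, 100]
```

The iterator keeps its position, so a later `next(even_squares)` would resume from where `islice()` stopped.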

Here is a quick example of combining a few of these building blocks. Rather than writing a stateful `Fibonacci` class to keep a running sum, we might simply create a single lazy iterator that generates both the current number and the running sum:

In [None]:
from itertools import count, tee
mycount = count()
next(mycount), next(mycount), next(mycount)

In [None]:
# Assume that this is code we cannot modify ourselves (3rd party, etc.)
def fibonacci_gen():
    a, b = 1, 1
    while True:
        yield a
        a, b = b, a+b
fibonacci = fibonacci_gen()
print(next(fibonacci))

In [None]:
list(zip("ABC", [1,2,3], range(100,103)))

In [None]:
from itertools import accumulate
# Iterate over both an iterable of numbers and running total of the sequence
def item_with_total(iterable):
    "Generically transform a stream of numbers into a pair of (num, running_sum)"
    s, t = tee(iterable) # unpacking tuples
    yield from zip(t, accumulate(s))
    # Equivalent to:
    # for item, total in zip(t, accumulate(s)):
    #     yield item, total

fibs = fibonacci_gen()
for n, (fib, total) in zip(range(10), item_with_total(fibs)):
    print("%3d. Item: %3d; Total: %3d" % (n+1, fib, total))

The documentation for the `itertools` module contains details on its combinatorial functions as well as a number of short recipes for combining them. Note that for practical purposes, `zip()`, `map()`, `filter()`, and `range()` (which is, in a sense, just a terminating `itertools.count()`) could well live in `itertools` if they were not built-ins. That is, all of those functions lazily generate sequential items (mostly based on existing iterables) without creating a concrete sequence. Reducing functions such as the built-ins `all()`, `any()`, `sum()`, `min()`, and `max()`, along with `functools.reduce()`, also act on iterables, but all of them, in the general case, need to exhaust the iterator rather than remain lazy.
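The difference between lazy transformers and exhausting reducers can be observed directly. The following is a small sketch, assuming a hypothetical helper `noisy_double()` that records each call so we can see when work actually happens.

```python
from itertools import islice

calls = []

def noisy_double(n):
    "Hypothetical helper: record each call, then return twice the input"
    calls.append(n)
    return 2 * n

lazy = map(noisy_double, range(5))
calls_before = len(calls)            # 0: map() has not called the function yet

first_two = list(islice(lazy, 2))    # [0, 2]: only two items are pulled
calls_after_slice = len(calls)       # 2

total = sum(lazy)                    # 18: sum() consumes the remaining 4 + 6 + 8
calls_after_sum = len(calls)         # 5
leftover = list(lazy)                # []: the iterator is now exhausted

# any() and all() may stop early, which is why "in the general case" matters
numbers = iter(range(10))
found = any(n > 3 for n in numbers)  # True, found once n reaches 4
next_unconsumed = next(numbers)      # 5: the items after 4 were never touched
```

Note that `any()` and `all()` short-circuit when they can, while `sum()`, `min()`, and `max()` must always see every item.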

# Large sequences, even if not quite infinitely long

```python
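import itertools

# `db`, `something`, and `something_else` are placeholders for illustration
log1 = open('huge.log')    # file objects iterate lazily, one line at a time
seq = itertools.count()    # unbounded counter
rows = db.execute("select * from big_data")  # assumed to return a lazy cursor
z = zip(log1, seq, rows)   # stops as soon as the shortest input runs out
for line, num, row in z:
    if something:
        break
    something_else(line, num, row)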
```

# Chaining iterables

The functions `itertools.chain()` and `itertools.chain.from_iterable()` combine multiple iterables by running through them one after another. Built-in `zip()` and `itertools.zip_longest()` also combine iterables, but they advance through all of them in step. A consequence is that while chaining infinite iterables is valid syntactically and semantically, no actual program will ever exhaust the first one and so reach the later ones. For example:

```python
from itertools import chain, count
thrice_to_inf = chain(count(), count(), count())
```

Conceptually, `thrice_to_inf` will count to infinity three times, but in practice once would always be enough.  However, for merely *large* iterables—not for infinite ones—chaining can be very useful and parsimonious.
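A short sketch makes both points concrete. The variable names and sample data here are illustrative assumptions, not part of any real dataset.

```python
from itertools import chain, count, islice

# Chaining infinite iterables: we never leave the first count()
thrice_to_inf = chain(count(), count(), count())
window = list(islice(thrice_to_inf, 100_000, 100_003))
print(window)  # [100000, 100001, 100002]

# Chaining finite iterables of different kinds works as expected
mixed = list(chain(range(3), "ab", [10.5]))
print(mixed)   # [0, 1, 2, 'a', 'b', 10.5]

# chain.from_iterable() takes one iterable of iterables, which may itself be lazy
nested = [[1, 2], [3], [], [4, 5]]
flat = list(chain.from_iterable(nested))
print(flat)    # [1, 2, 3, 4, 5]
```

The `from_iterable()` form is the one to reach for when the collection of iterables is itself produced lazily, since `chain(*iterables)` would have to unpack all of them up front.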

In [None]:
from glob import glob
from itertools import chain, islice
def from_logs(fnames):
    yield from (open(file) for file in fnames)

logdir = 'data/babynames/*'
logs = glob(logdir)
lines = chain.from_iterable(from_logs(logs))
for line in islice(lines, 16002, 16006):
    print(line, end='')

In [None]:
next(lines)

In [None]:
next(lines)

In [None]:
from itertools import tee
r = range(100000000)
r1, r2 = tee(r)  # two independent iterators over the same underlying range
next(r1), next(r1), next(r1), next(r1), next(r1), next(r1)

In [None]:
next(r1)

In [None]:
next(r2)

Besides chaining with `itertools`, we should mention `collections.ChainMap()` in the same breath. Dictionaries (or generally any `collections.abc.Mapping`) are iterable (over their keys). Just as we might want to chain multiple sequence-like iterables, we sometimes want to chain together multiple mappings without needing to create a single larger concrete one. `ChainMap()` is handy for this: constructing one neither copies nor alters the underlying mappings, and lookups search them in order. Writes made through the `ChainMap` itself go only to the first mapping.
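As a brief sketch of `ChainMap()` in use, with made-up `defaults` and `overrides` dictionaries standing in for real configuration:

```python
from collections import ChainMap

# Hypothetical configuration layers, for illustration only
defaults = {"color": "red", "user": "guest"}
overrides = {"user": "alice"}

# Lookups search the mappings left to right; nothing is copied
settings = ChainMap(overrides, defaults)
print(settings["user"])    # alice (found in overrides)
print(settings["color"])   # red   (falls through to defaults)

# The ChainMap is a live view: later changes to a layer are visible through it
defaults["size"] = "L"
print(settings["size"])    # L

# Writes through the ChainMap land in the first mapping only
settings["color"] = "blue"
print(overrides)           # {'user': 'alice', 'color': 'blue'}
print(defaults["color"])   # red

# Iteration yields each key once, even if several layers define it
keys = sorted(settings)
print(keys)                # ['color', 'size', 'user']
```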

<img src='img/copyright.png'>