title: A wild itertools appeared!
description: A walkthrough and deployment use cases of the itertools standard library
thumbnail: https://upload.wikimedia.org/wikipedia/commons/c/c3/Python-logo-notext.svg
tags: [wild python, functional programming, tutorials]

# A wild _itertools_ appears!

## About this series
This blog post is the first of a new series I'm starting called "Wild Python," aka use cases of selected Python libraries in deployment. Posts in this series will generally consist of a breakdown of the library and its intended use cases, followed by examples of how it is used in several popular GitHub repositories. The series will be ongoing, partially to act as a personal refresher course.

## What is _itertools_?
Itertools is Python's implementation of a common design pattern for streaming data in [functional programming](https://www.dataquest.io/blog/introduction-functional-programming-python/). Effectively, it gives you a succinct way to take an iterable (```list, tuple, dict, string```, etc.) and lazily iterate through it using several commonly used pieces of logic. Let's take a toy example, one of the [most overused interview questions](https://www.tomdalling.com/blog/software-design/fizzbuzz-in-too-much-detail/):


In [7]:
from itertools import cycle, count, islice

fizzes = cycle(['', '', 'fizz'])
buzzes = cycle(['', '', '', '', 'buzz'])
numbers = count(1)
fizzbuzz = (f'{fizz}{buzz}' or n
            for fizz, buzz, n in zip(fizzes, buzzes, numbers))
for result in islice(fizzbuzz, 16):
    print(result)

1
2
fizz
4
buzz
fizz
7
8
fizz
buzz
11
fizz
13
14
fizzbuzz
16


Let's break this down: ```itertools.cycle``` consumes an iterable, saving a copy of each item, then loops back over the saved items indefinitely, or until the consumer stops asking for more. As the name implies, this is very useful for cyclic or periodic data. ```count``` is another infinite iterator that accepts "start" and "step" arguments, similar to ```range()``` without a stop value. Finally, we have ```islice```, which lazily slices any iterable. Here it is roughly equivalent to ```for i in range(number): do_something(iterable[i])```, except that it also works on iterables that cannot be indexed, such as generators.
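
To see the three building blocks in isolation, here is a minimal sketch. The sample values (the color names and the numeric ranges) are made up for illustration and are not part of the fizzbuzz example:

```python
from itertools import count, cycle, islice

# cycle repeats a finite iterable forever, so cap it with islice.
colors = cycle(['red', 'green', 'blue'])
first_five_colors = list(islice(colors, 5))
print(first_five_colors)  # ['red', 'green', 'blue', 'red', 'green']

# count(start, step) behaves like a range() that never stops.
evens = count(0, 2)
first_four_evens = list(islice(evens, 4))
print(first_four_evens)  # [0, 2, 4, 6]

# islice also accepts start, stop, and step, much like a regular slice.
every_third = list(islice(count(1), 2, 12, 3))
print(every_third)  # [3, 6, 9, 12]
```

Wrapping each call in ```list()``` is only there to make the results printable; in real code you would usually loop over the ```islice``` object directly.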

However, in addition to being more readable, this has the advantage of memory efficiency. The ```for``` loop using ```range``` needs an indexable sequence, which usually means a list that is already held in RAM. The ```islice``` version stores no values itself; each item is produced only when it is requested. Note that an equivalent alternative to the ```islice``` loop above is ```for i in range(16): print(next(fizzbuzz))```. The built-in ```next``` is an important function to keep in mind when working with generators.
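
A small sketch of that laziness, using a hypothetical generator of square numbers rather than the fizzbuzz stream. It shows that ```islice``` only pulls the items it needs and that ```next``` picks up where the previous consumer left off:

```python
from itertools import count, islice

# A generator expression over an infinite counter: nothing is computed yet.
squares = (n * n for n in count(1))

# islice pulls exactly three items from the generator.
first_three = list(islice(squares, 3))
print(first_three)  # [1, 4, 9]

# The generator remembers its position, so next() continues from there.
fourth = next(squares)
print(fourth)  # 16

# Passing a default to next() avoids StopIteration on an exhausted iterator.
empty = iter([])
fallback = next(empty, 'done')
print(fallback)  # done
```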

Let's do another example with another favorite interview question, the [Knapsack Problem](https://en.wikipedia.org/wiki/Knapsack_problem), tackled with a brute-force approach:

In [15]:
from collections import namedtuple
from itertools import combinations

item = namedtuple('Item', 'name value mass')
pants = item('pants', 40, 5)
shirt = item('shirt', 10, 2)
shoes = item('shoes', 50, 4)
stove = item('stove', 30, 6)
socks = item('socks', 50, 1)
water = item('water', 70, 7)
tent = item('tent', 20, 6)
hammock = item('hammock', 40, 1)
headlamp = item('headlamp', 50, 3)
possibilities = [pants, shirt, shoes, stove, socks, 
                 water, tent, hammock, headlamp]

# Assuming a knapsack that can carry 10 kg:
max_weight = dict()
for n in range(1, len(possibilities) + 1):
    # combinations() lazily yields every n-item subset
    for combination in combinations(possibilities, n):
        # keep only the subsets that weigh exactly 10 kg
        if sum(thing.mass for thing in combination) == 10:
            answer = '+'.join(thing.name for thing in combination)
            max_weight[answer] = sum(thing.value for thing in combination)

# Sort the candidates by total value, highest first
print(sorted(max_weight.items(), key=lambda d: d[1], reverse=True))



[('pants+socks+hammock+headlamp', 180), ('shirt+shoes+socks+headlamp', 160), ('shirt+shoes+hammock+headlamp', 150), ('pants+shoes+socks', 140), ('pants+shoes+hammock', 130), ('shirt+socks+water', 130), ('stove+socks+headlamp', 130), ('shirt+stove+socks+hammock', 130), ('water+headlamp', 120), ('shirt+water+hammock', 120), ('stove+hammock+headlamp', 120), ('socks+tent+headlamp', 120), ('shirt+socks+tent+hammock', 120), ('tent+hammock+headlamp', 110), ('pants+shirt+headlamp', 100), ('shoes+stove', 80), ('shoes+tent', 70)]


_Note: This is neither [the most efficient solution](http://www.es.ele.tue.nl/education/5MC10/Solutions/knapsack.pdf), nor is it recommended to go out into the wilderness without a shirt or shoes._

Let's break this down: we have several items with a value and a mass associated with them. There are many data structures that can represent this, but I decided to go with named tuples for clarity's sake (this will probably warrant another Wild Python article). We then iterate through all possible combinations of items by generating combinations of each length inside a nested for loop. We add a combination to a dictionary if it exactly fills the knapsack's carrying capacity. Finally, we find the highest-value combination by sorting the dictionary's items by value rather than by key, in descending order. Note that requiring an exact 10 kg match skips lighter combinations that may be worth more; a complete brute-force search would test ```<= 10``` instead. The result is a compact brute-force solution that is memory-efficient because ```combinations``` yields one candidate at a time, although the number of candidates still grows exponentially with the number of items.

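
As a variation on the cell above, here is a sketch that accepts any combination at or under the weight limit and picks the single best one with ```max```. The item data is copied from the example; the names ```Item```, ```CAPACITY```, ```all_subsets```, ```feasible```, and ```best``` are my own for this illustration:

```python
from collections import namedtuple
from itertools import combinations

Item = namedtuple('Item', 'name value mass')
items = [
    Item('pants', 40, 5), Item('shirt', 10, 2), Item('shoes', 50, 4),
    Item('stove', 30, 6), Item('socks', 50, 1), Item('water', 70, 7),
    Item('tent', 20, 6), Item('hammock', 40, 1), Item('headlamp', 50, 3),
]
CAPACITY = 10  # kg, the same limit as in the example above

# Lazily generate every non-empty subset, shortest first.
all_subsets = (combo
               for n in range(1, len(items) + 1)
               for combo in combinations(items, n))

# Keep subsets at or under capacity, then pick the most valuable one.
feasible = (s for s in all_subsets if sum(i.mass for i in s) <= CAPACITY)
best = max(feasible, key=lambda s: sum(i.value for i in s))

best_names = [i.name for i in best]
best_value = sum(i.value for i in best)
best_mass = sum(i.mass for i in best)
print(best_names, best_value, best_mass)
# ['shoes', 'socks', 'hammock', 'headlamp'] 190 9
```

Because everything upstream of ```max``` is a generator, only one candidate subset is held in memory at a time. With the relaxed weight check, the best pack comes in at 9 kg and beats the best exact-10 kg combination.
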
### Chain chain chain
One function that deserves some additional explanation is ```chain``` vs. ```chain.from_iterable```. ```chain``` takes two or more iterables as separate arguments and chains them together, consuming them in the order passed. ```chain.from_iterable``` takes a _single_ iterable of iterables as its argument, effectively flattening it by one level (think smushing a matrix into an array).
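
A quick side-by-side sketch of the two, with toy inputs made up for illustration:

```python
from itertools import chain

letters = ['a', 'b']
numbers = (1, 2)

# chain takes each iterable as a separate argument.
chained = list(chain(letters, numbers, 'xy'))
print(chained)  # ['a', 'b', 1, 2, 'x', 'y']

# chain.from_iterable takes one iterable of iterables and flattens one level.
matrix = [[1, 2], [3, 4], [5, 6]]
flat = list(chain.from_iterable(matrix))
print(flat)  # [1, 2, 3, 4, 5, 6]

# Only one level is flattened; deeper nesting is left intact.
nested = list(chain.from_iterable([[1, [2, 3]], [4]]))
print(nested)  # [1, [2, 3], 4]
```

```chain(*matrix)``` would give the same result as ```chain.from_iterable(matrix)``` here, but unpacking has to consume the outer iterable up front, so ```from_iterable``` is the better fit when the outer iterable is itself lazy or very large.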

## Production use cases
[Keras]() utilizes ```itertools.compress``` to check whether all functions have been properly documented:


In [18]:
from itertools import compress
def assert_args_presence(args, doc, member, name):
    args_not_in_doc = [arg not in doc for arg in args]
    if any(args_not_in_doc):
        raise ValueError(
            "{} {} arguments are not present in documentation ".format(name, list(
                compress(args, args_not_in_doc))), member.__module__)

[False, False, True, True]


```compress``` filters one iterable using a second, parallel iterable of selectors, keeping only the elements whose selector is truthy. The list comprehension above builds those selectors: one boolean per argument, ```True``` when the argument is missing from ```doc```. During string formatting, ```compress``` maps the booleans back onto the function arguments so that only the missing ones appear in the error message. This tells anyone submitting a pull request exactly which function arguments still need to be documented.
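
To make the mask-then-filter step concrete, here is a standalone sketch. The argument names and the docstring are invented for illustration and do not come from Keras:

```python
from itertools import compress

# Hypothetical function arguments and docstring text.
args = ['x', 'y', 'units', 'activation']
doc = 'x: input tensor. y: target tensor.'

# One boolean per argument: True when the argument is absent from the doc.
missing_mask = [arg not in doc for arg in args]
print(missing_mask)  # [False, False, True, True]

# compress keeps only the arguments whose mask entry is truthy.
missing_args = list(compress(args, missing_mask))
print(missing_args)  # ['units', 'activation']
```

Note that ```arg not in doc``` is a plain substring test when ```doc``` is a string, so a short argument name such as ```x``` counts as documented whenever that letter appears anywhere in the text.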

[Glom]() is possibly Python's best answer for nested JSON data. It uses ```chain.from_iterable```.