# Itertools and Collections

In [1]:
import itertools

letters = ['a', 'b', 'c', 'd', 'e', 'f']
booleans = [1, 0, 1, 0, 0, 1]
numbers = [23, 20, 44, 32, 7, 12]
decimals = [0.1, 0.7, 0.4, 0.4, 0.5]

## chain()

chain() does exactly what you’d expect it to do: give it any number of lists, tuples, or other iterables as arguments and it chains them together, one after another. Remember making links of paper with tape as a kid? This is that, but in Python.

In [3]:
print(itertools.chain(letters, booleans, decimals))

<itertools.chain object at 0x000001C593144128>


Relax. The iter in itertools stands for iterable, which is hopefully a term you’ve run into before. chain() doesn't build a new list; like most itertools functions, it hands back a lazy iterator that produces items only when asked. To see them all at once, just pass it to list():

In [4]:
print(list(itertools.chain(letters, booleans, decimals)))

['a', 'b', 'c', 'd', 'e', 'f', 1, 0, 1, 0, 0, 1, 0.1, 0.7, 0.4, 0.4, 0.5]
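If your iterables are already collected in one list (the "list of lists" case), chain.from_iterable() accepts that single iterable instead of separate arguments. A small sketch reusing the lists from the first cell; it also shows that a chain object is a one-shot iterator, so a second pass over it comes back empty:

```python
import itertools

letters = ['a', 'b', 'c', 'd', 'e', 'f']
booleans = [1, 0, 1, 0, 0, 1]
nested = [letters, booleans]  # a list of lists

# chain() takes each iterable as a separate argument...
flat_args = list(itertools.chain(letters, booleans))
# ...while chain.from_iterable() takes one iterable of iterables
flat_nested = list(itertools.chain.from_iterable(nested))
print(flat_args == flat_nested)  # True

# A chain object is used up as you iterate over it
c = itertools.chain(letters, booleans)
first_pass = list(c)
second_pass = list(c)
print(len(first_pass), second_pass)  # 12 []
```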


## count()

Let’s say you’re trying to do a sensitivity analysis of a super important business simulation. Your entire super important business simulation hinges on the hope that the average cost of a widget is $10, but demand for that widget might explode over the next few months, and you want to make sure you won’t hemorrhage money if it ends up costing more. So you want a list of theoretical widget costs to pass to magic_business_simulation().

With list comprehensions, that might look something like:

In [None]:
[(i * 0.25) + 10 for i in range(100)]

With itertools, count(start, step) describes the same sequence without needing an upper bound:

In [None]:
itertools.count(10, 0.25)

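It never stops. count(), like cycle() and repeat(), is an infinite iterator: it keeps yielding values for as long as something keeps asking for them, so you have to stop consuming it yourself (via, say, break). Evaluating it on its own, as in the cell above, just creates the iterator without producing anything. No, really: itertools is all about iterables, and infinite iterables might be scary right now, but they are incredibly helpful down the road.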

So let’s say we only want the values of the above method up until $20 (this widget has very elastic demand, apparently). How do we cut off count() like a stern mother scolding a sugar-addled child?

In [7]:
list_test = []
for i in itertools.count(10, 0.25):
    if i < 20:
        list_test.append(i)
    else:
        break
print(list_test)

[10, 10.25, 10.5, 10.75, 11.0, 11.25, 11.5, 11.75, 12.0, 12.25, 12.5, 12.75, 13.0, 13.25, 13.5, 13.75, 14.0, 14.25, 14.5, 14.75, 15.0, 15.25, 15.5, 15.75, 16.0, 16.25, 16.5, 16.75, 17.0, 17.25, 17.5, 17.75, 18.0, 18.25, 18.5, 18.75, 19.0, 19.25, 19.5, 19.75]
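The if/else/break loop works, but itertools already packages the "keep going while a condition holds" pattern as takewhile(). Here is a sketch of the same $20 cutoff, plus islice() for when you want a fixed number of items instead of a value limit:

```python
import itertools

# takewhile() yields items until the predicate first returns False,
# replacing the manual if/else/break loop
prices = list(itertools.takewhile(lambda p: p < 20, itertools.count(10, 0.25)))
print(prices[0], prices[-1], len(prices))  # 10 19.75 40

# islice() caps the infinite iterator by item count instead of by value
first_five = list(itertools.islice(itertools.count(10, 0.25), 5))
print(first_five)  # [10, 10.25, 10.5, 10.75, 11.0]
```

A step of 0.25 is exactly representable in binary, so the repeated additions here don't drift. With a step like 0.1, the itertools documentation suggests generating start + step * i for i in count() instead, to avoid accumulating rounding error.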


## compress()

compress() is by far the one I use most. It’s perfect: given two iterables, data and selectors, it returns the elements of data whose corresponding element in selectors is truthy (so 1 and 0 work just as well as True and False).

In [8]:
print(list(itertools.compress(letters, booleans)))

['a', 'c', 'f']
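The selectors don't have to be typed out by hand. A sketch that builds them from a condition on the numbers list from the first cell, keeping the letters whose matching number is above 20, alongside the equivalent list comprehension:

```python
import itertools

letters = ['a', 'b', 'c', 'd', 'e', 'f']
numbers = [23, 20, 44, 32, 7, 12]

# Selectors computed from a parallel list: [True, False, True, True, False, False]
above_20 = [n > 20 for n in numbers]
picked = list(itertools.compress(letters, above_20))
print(picked)  # ['a', 'c', 'd']

# compress() behaves like this comprehension over zip()
same = [x for x, keep in zip(letters, above_20) if keep]
print(same == picked)  # True
```

Like zip(), compress() stops as soon as either input runs out.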


## imap() and map()

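The final method I’m going to go over should be a simple addition for readers well-versed in the functional programming staples map and filter: imap() is a version of map() that returns a lazy iterator instead of a list. You pass it a function and one or more iterables, and it pulls one element from each iterable at a time, calls the function on them, and yields the result. One caveat: imap() only exists in Python 2. Python 3 removed it from itertools because the built-in map() is now lazy itself, while itertools.starmap() remains for arguments that come pre-packed in tuples.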
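Here is a minimal sketch in Python 3, where the built-in map() plays imap()'s role and itertools.starmap() handles arguments that are already grouped into tuples. The pow examples are purely illustrative:

```python
import itertools

# map() pairs up elements from each iterable: pow(2, 2), pow(3, 2), pow(4, 2)
lazy = map(pow, [2, 3, 4], [2, 2, 2])
print(lazy)  # a map object: nothing has been computed yet
squares = list(lazy)
print(squares)  # [4, 9, 16]

# starmap() unpacks each tuple into arguments: pow(2, 5), pow(3, 2), pow(10, 3)
powers = list(itertools.starmap(pow, [(2, 5), (3, 2), (10, 3)]))
print(powers)  # [32, 9, 1000]
```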

# Collections

Much of what you need to do with Python can be done using built-in containers like dict, list, set, and tuple. But these aren't always the best tool for the job. In this guide, I'll cover why and when to use the containers in the collections module and provide interesting examples of each. This is designed to supplement the documentation with examples and explanation, not replace it.

## Counter



A Counter is a dictionary-like object designed to keep tallies: the key is the item being counted and the value is its count. You could certainly use a regular dictionary to keep a count, but a Counter handles the bookkeeping for you: missing keys count as 0 instead of raising a KeyError, and methods like most_common() and update() are built in.

Because Counter is a subclass of dict, a counter object looks just like a dictionary and supports the full dictionary interface. You can even build one directly from a dictionary of counts:

In [9]:
from collections import Counter

ctr = Counter({'birds': 200, 'lizards': 340, 'hamsters': 120})
ctr['hamsters'] # 120

120
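A short sketch of what a Counter does that a plain dict doesn't, using the same ctr as above: reading a missing key, adding counts with update(), and ranking with most_common().

```python
from collections import Counter

ctr = Counter({'birds': 200, 'lizards': 340, 'hamsters': 120})

# Missing keys read as 0 instead of raising KeyError, and are not inserted
print(ctr['snakes'])  # 0
snakes_inserted = 'snakes' in ctr
print(snakes_inserted)  # False

# update() adds to existing counts rather than replacing them
ctr.update({'birds': 5, 'snakes': 2})
print(ctr['birds'], ctr['snakes'])  # 205 2

# most_common(n) returns the n highest counts, largest first
top_two = ctr.most_common(2)
print(top_two)  # [('lizards', 340), ('birds', 205)]
```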

In [None]:
# Get the most common word in a text file
import re

with open('ipencil.txt') as f:
    words = re.findall(r'\w+', f.read().lower())
Counter(words).most_common(1)  # [('the', 148)]
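ipencil.txt isn't included here, so this is a sketch of the same findall-then-count pipeline run on a short, made-up sample sentence instead of a file:

```python
import re
from collections import Counter

# Made-up sample text standing in for the contents of ipencil.txt
text = "The pencil and the eraser sat on the desk."
words = re.findall(r'\w+', text.lower())  # lowercase, then split into word tokens
top = Counter(words).most_common(1)
print(top)  # [('the', 3)]
```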

In [10]:
numbers = """
73167176531330624919225119674426574742355349194934
96983520312774506326239578318016984801869478851843
85861560789112949495459501737958331952853208805511
12540698747158523863050715693290963295227443043557
66896648950445244523161731856403098711121722383113
62229893423380308135336276614282806444486645238749
30358907296290491560440772390713810515859307960866
70172427121883998797908792274921901699720888093776
65727333001053367881220235421809751254540594752243
52584907711670556013604839586446706324415722155397
53697817977846174064955149290862569321978468622482
83972241375657056057490261407972968652414535100474
82166370484403199890008895243450658541227588666881
16427171479924442928230863465674813919123162824586
17866458359124566529476545682848912883142607690042
24219022671055626321111109370544217506941658960408
07198403850962455444362981230987879927244284909188
84580156166097919133875499200524063689912560717606
05886116467109405077541002256983155200055935729725
71636269561882670428252483600823257530420752963450
"""

In [12]:
import re
numbers = re.sub("\n", "", numbers)
Counter(numbers).most_common()

[('2', 112),
 ('5', 107),
 ('4', 107),
 ('6', 103),
 ('9', 100),
 ('8', 100),
 ('1', 99),
 ('0', 97),
 ('7', 91),
 ('3', 84)]

## defaultdict


Suppose you have a sequence of key-value pairs. Perhaps you are keeping track of how many miles you run each day, and you want to know which day of the week you are most active.

In [15]:
from collections import defaultdict

days = [('monday', 2.5), ('wednesday', 2), ('friday', 1.5), ('monday', 3), ('tuesday', 3.5), ('thursday', 2), ('friday', 2.5)]
active_days = defaultdict(float)
for k, v in days:
    active_days[k] += v
print(active_days)

defaultdict(<class 'float'>, {'friday': 4.0, 'wednesday': 2.0, 'monday': 5.5, 'thursday': 2.0, 'tuesday': 3.5})
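The loop above builds the totals but stops short of answering the original question. A sketch that finds the most active day, and that rebuilds the same tally with a plain dict and dict.get() so you can see what defaultdict saves you:

```python
from collections import defaultdict

days = [('monday', 2.5), ('wednesday', 2), ('friday', 1.5), ('monday', 3),
        ('tuesday', 3.5), ('thursday', 2), ('friday', 2.5)]

active_days = defaultdict(float)
for day, miles in days:
    active_days[day] += miles  # missing days start at float() == 0.0

# The day with the highest total answers "when am I most active?"
most_active = max(active_days, key=active_days.get)
print(most_active, active_days[most_active])  # monday 5.5

# Plain-dict equivalent: every update has to supply its own default
plain = {}
for day, miles in days:
    plain[day] = plain.get(day, 0.0) + miles
print(plain == active_days)  # True
```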


This can be accomplished using many other data types, but defaultdict lets us specify a default factory: a callable, such as float, int, or list, that is called to create the value for any key that doesn't exist yet. This is simpler and faster than using a regular dict with dict.setdefault().

You pass in the default factory upon instantiation. Then you can immediately update a key's value (with += or .append(), for example) even if the key has never been set. With a normal dictionary, active_days[k] += v would raise a KeyError the first time each day appeared.

Here is an example using list as the default factory. We have a list of tuples, each pairing a letter with a number, and the letters appear in both uppercase and lowercase. Suppose we want to group the values by letter, ignoring case.

In [16]:
letters = [('A', 10), ('B', 3), ('C', 4), ('a', 36), ('b', 8), ('c', 10)]
grouped_letters = defaultdict(list)
for k, v in letters:
    grouped_letters[k.lower()].append(v)
print(grouped_letters)

defaultdict(<class 'list'>, {'c': [4, 10], 'a': [10, 36], 'b': [3, 8]})
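For comparison, here is a sketch of the same grouping with a plain dict and setdefault(), the technique defaultdict is meant to simplify. It also shows one defaultdict gotcha: even reading a missing key inserts it with a default value.

```python
from collections import defaultdict

letters = [('A', 10), ('B', 3), ('C', 4), ('a', 36), ('b', 8), ('c', 10)]

# setdefault() returns the existing list, or inserts and returns a new one.
# Note that the [] argument is built on every call, even when the key exists.
grouped = {}
for k, v in letters:
    grouped.setdefault(k.lower(), []).append(v)
print(grouped)  # {'a': [10, 36], 'b': [3, 8], 'c': [4, 10]}

# Gotcha: a plain lookup on a defaultdict creates the key
grouped_letters = defaultdict(list)
grouped_letters['z']  # lookup only, no assignment
print('z' in grouped_letters)  # True
```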


## OrderedDict

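OrderedDicts act just like regular dictionaries except they remember the order in which items were added. This matters primarily when you iterate over an OrderedDict, since the iteration order reflects the order in which the keys were added. Since Python 3.7, plain dicts guarantee insertion order as well, so today OrderedDict is mostly useful for its order-aware methods and comparisons.

Older interpreters made no such promise for a regular dictionary, as the output below shows (it was produced by one of them; on Python 3.7+ the same loop prints a, b, c):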

In [20]:
from collections import OrderedDict

In [21]:
d = {}
d['a'] = 1
d['b'] = 10
d['c'] = 8
for letter in d:
    print(letter)

c
a
b


In [22]:
d = OrderedDict()
d['a'] = 1
d['b'] = 10
d['c'] = 8
for letter in d:
    print(letter)

a
b
c
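Since plain dicts keep insertion order in Python 3.7+, the differences that remain are OrderedDict's order-aware operations. A sketch of three of them: move_to_end(), popitem(last=False), and order-sensitive equality:

```python
from collections import OrderedDict

d = OrderedDict([('a', 1), ('b', 10), ('c', 8)])

# move_to_end() moves a key to the back, or to the front with last=False
d.move_to_end('a')
print(list(d))  # ['b', 'c', 'a']
d.move_to_end('a', last=False)
print(list(d))  # ['a', 'b', 'c']

# popitem(last=False) removes from the front, giving FIFO behavior
first = d.popitem(last=False)
print(first)  # ('a', 1)

# Equality between two OrderedDicts depends on order; between plain dicts it doesn't
ordered_equal = OrderedDict(a=1, b=2) == OrderedDict(b=2, a=1)
plain_equal = dict(a=1, b=2) == dict(b=2, a=1)
print(ordered_equal, plain_equal)  # False True
```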
