Python’s collections module has specialized container datatypes that can be used to replace Python’s general purpose containers (dict, tuple, list, and set).

1. ChainMap
2. Counter
3. defaultdict
4. deque
5. namedtuple
6. OrderedDict

There is also a sub-module of collections called abc, which holds the Abstract Base Classes. These will not be covered in this chapter.



## ChainMap

A ChainMap is a class that provides the ability to link multiple mappings together such that they end up being a single unit. If you look at the documentation, you will notice that it accepts \*maps, which means that a ChainMap will accept any number of mappings or dictionaries and turn them into a single view that you can update.

In [1]:
from collections import ChainMap

car_parts = {'hood': 500, 'engine': 5000, 'front_door': 750}
car_options = {'A/C': 1000, 'Turbo': 2500, 'rollbar': 300}
car_accessories = {'cover': 100, 'hood_ornament': 150, 'seat_cover': 99}

chainmap = ChainMap(car_accessories, car_options, car_parts)
print(chainmap["hood"])

500


Here we import ChainMap, create three dictionaries and pass them to ChainMap. When we look up a key, the ChainMap will go through each map in order to see if that key exists. If it does, then the ChainMap will return the first value it finds that matches that key.
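To make the lookup order concrete, here is a minimal sketch using two small dictionaries. The names defaults, overrides and settings are made up for illustration. It also shows that writes only ever touch the first mapping in the chain:

```python
from collections import ChainMap

# Hypothetical dictionaries, used only for illustration
defaults = {'color': 'red', 'user': 'guest'}
overrides = {'user': 'admin'}

settings = ChainMap(overrides, defaults)

# 'user' exists in both maps, so the first map in the chain wins
print(settings['user'])   # admin

# 'color' only exists in the second map, so the lookup falls through
print(settings['color'])  # red

# Writes always go to the first mapping; the others are left alone
settings['color'] = 'blue'
print(overrides)  # {'user': 'admin', 'color': 'blue'}
print(defaults)   # {'color': 'red', 'user': 'guest'}
```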

This is especially useful if you want to set up defaults. Let’s pretend that we want to create an application that has some defaults. The application will also be aware of the operating system’s environment variables. If there is an environment variable that matches one of the keys that we are defaulting to in our application, the environment will override our default. Let’s further pretend that we can pass arguments to our application. These arguments take precedence over the environment and the defaults. This is one place where a ChainMap can really shine.

```
import argparse
import os

from collections import ChainMap


def main():
    app_defaults = {'username':'admin', 'password':'admin'}

    parser = argparse.ArgumentParser()
    parser.add_argument('-u', '--username')
    parser.add_argument('-p', '--password')
    args = parser.parse_args()
    command_line_arguments = {key:value for key, value 
                              in vars(args).items() if value}

    chain = ChainMap(command_line_arguments, os.environ, 
                     app_defaults)
    print(chain['username'])

    
if __name__ == '__main__':
    main()
    os.environ['username'] = 'test'
    main()
    
```    

You will notice that argparse doesn’t hand us a dictionary of its arguments directly, so we use a dict comprehension to extract what we need. The other cool piece here is the use of Python’s built-in vars. If you were to call it without an argument, vars would behave like Python’s built-in locals. But if you do pass in an object, then vars is equivalent to that object’s __dict__ attribute.

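In other words, vars(args) equals args.__dict__. Finally, we create our ChainMap by passing in our command line arguments (if there are any), then the environment variables and finally the defaults. At the end of the code, we try calling our function, then setting an environment variable and calling it again. Give it a try and you’ll see that it prints out admin and then test, provided your environment does not already define a username variable (on Windows, where variable names are case-insensitive, the built-in USERNAME variable will match). If you call the script with a command line argument such as -u mike, it prints mike both times, because command line arguments take precedence over both the environment and the defaults.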
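If you would rather see all three cases without touching your real command line or environment, here is a small sketch of the same idea. The resolve_username helper is hypothetical: it takes the argument list and the environment as parameters (a plain list and a plain dict standing in for sys.argv and os.environ) so the result is repeatable:

```python
import argparse
from collections import ChainMap


def resolve_username(argv, environ):
    """Hypothetical helper: return the username after applying
    command line > environment > defaults precedence."""
    app_defaults = {'username': 'admin', 'password': 'admin'}

    parser = argparse.ArgumentParser()
    parser.add_argument('-u', '--username')
    parser.add_argument('-p', '--password')
    args = parser.parse_args(argv)

    # Keep only the options that were actually supplied
    command_line_arguments = {key: value for key, value
                              in vars(args).items() if value}

    chain = ChainMap(command_line_arguments, environ, app_defaults)
    return chain['username']


# No arguments, empty environment: the default is used
print(resolve_username([], {}))                                # admin
# The environment overrides the default
print(resolve_username([], {'username': 'test'}))              # test
# The command line overrides both
print(resolve_username(['-u', 'mike'], {'username': 'test'}))  # mike
```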


## Counter

The collections module also provides us with a neat little tool that supports convenient and fast tallies. This tool is called Counter. You can run it against most iterables. 

In [2]:
from collections import Counter

print(Counter('superfluous'))

# Counter() returns a Counter object, which is a subclass of Python's dict

Counter({'u': 3, 's': 2, 'p': 1, 'e': 1, 'r': 1, 'f': 1, 'l': 1, 'o': 1})


In [3]:
counter = Counter('superfluous')
print(counter['u'])

3


The Counter provides a few methods that might interest you. For example, you can call elements, which will return an iterator over the elements in the Counter, repeating each one as many times as its count. The elements come back in the order they were first encountered (in older versions of Python the order was arbitrary), so the output in this case looks like a regrouped version of the string.



In [4]:
print(list(counter.elements()))

['s', 's', 'u', 'u', 'u', 'p', 'e', 'r', 'f', 'l', 'o']


Another useful method is most_common. If you pass in a number n, the Counter returns the n most common items along with their counts. If you leave the number out, you get all of the items, ordered from most to least common:



In [6]:
print(counter.most_common())
print(counter.most_common(2))

[('u', 3), ('s', 2), ('p', 1), ('e', 1), ('r', 1), ('f', 1), ('l', 1), ('o', 1)]
[('u', 3), ('s', 2)]


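The other method that I want to cover is the subtract method. The subtract method accepts an iterable or a mapping and then uses that argument to subtract counts from the Counter. The subtraction happens in place, so the method itself returns None, and counts that drop to zero stay in the Counter: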

In [8]:
from collections import Counter

counter_one = Counter('superfluous')
print(counter_one)

counter_two = Counter('super')
# subtract() modifies counter_one in place and returns None
print(counter_one.subtract(counter_two))

print(counter_one)

Counter({'u': 3, 's': 2, 'p': 1, 'e': 1, 'r': 1, 'f': 1, 'l': 1, 'o': 1})
None
Counter({'u': 2, 's': 1, 'f': 1, 'l': 1, 'o': 1, 'p': 0, 'e': 0, 'r': 0})
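As a side note, Counter also supports the - operator, which is a related but different tool from subtract. The sketch below compares the two. The operator builds a new Counter and keeps only the positive counts, while subtract changes the original and leaves the zeros in place:

```python
from collections import Counter

counter_one = Counter('superfluous')
counter_two = Counter('super')

# The - operator returns a new Counter that only keeps positive counts
difference = counter_one - counter_two
print(difference['u'])      # 2
print('p' in difference)    # False

# counter_one is untouched by the operator
print(counter_one['u'])     # 3

# subtract() works in place and keeps keys whose count reaches zero
counter_one.subtract(counter_two)
print(counter_one['p'])     # 0
print('p' in counter_one)   # True
```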


Remember that you can use the Counter against any iterable or mapping, so you don’t have to just use strings. You can also pass it tuples, dictionaries and lists! Give it a try on your own to see how it works with those other data types.
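Here is a short sketch of what that looks like with a list, a tuple and a dictionary. The sample data is made up for illustration:

```python
from collections import Counter

# A list of strings
colors = Counter(['red', 'blue', 'red', 'green', 'blue', 'red'])
print(colors.most_common(1))  # [('red', 3)]

# A tuple of numbers
rolls = Counter((1, 2, 2, 3, 3, 3))
print(rolls[3])               # 3

# A dictionary is treated as a mapping of item -> count
inventory = Counter({'apples': 3, 'pears': 1})
print(inventory['apples'])    # 3

# Unlike a regular dict, a missing key gives 0 instead of a KeyError
print(inventory['kiwis'])     # 0
```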

## defaultdict

The collections module has a handy tool called defaultdict. The defaultdict is a subclass of Python’s dict that accepts a default_factory as its primary argument. The default_factory is usually a Python type, such as int or list, but you can also use a function or a lambda. Let’s count the words in a sentence, first with a regular dictionary and then with a defaultdict:

In [1]:
sentence = "The red for jumped over the fence and ran to the zoo for food"
words = sentence.split(' ')

reg_dict={}
for word in words:
    if word in reg_dict:
        reg_dict[word] += 1
    else:
        reg_dict[word] = 1
print(reg_dict)

{'The': 1, 'red': 1, 'for': 2, 'jumped': 1, 'over': 1, 'the': 2, 'fence': 1, 'and': 1, 'ran': 1, 'to': 1, 'zoo': 1, 'food': 1}


In [3]:
from collections import defaultdict
    
sentence = "The red for jumped over the fence and ran to the zoo for food"
words = sentence.split(' ')

d = defaultdict(int)
for word in words:
    d[word] += 1
    
print(d)

defaultdict(<class 'int'>, {'The': 1, 'red': 1, 'for': 2, 'jumped': 1, 'over': 1, 'the': 2, 'fence': 1, 'and': 1, 'ran': 1, 'to': 1, 'zoo': 1, 'food': 1})


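You will notice right away that the code is much simpler. Because the default_factory is int, the defaultdict automatically assigns zero to any key it doesn’t already have. We then add one to that value, so each word’s count starts at one and is incremented every time the word appears again in the sentence.

Now let’s look at a slightly more complex example. Here we have a list of tuples, each holding an account number and an amount, and we want to collect the amounts for each account. First, with a regular dictionary: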

In [5]:
my_list = [(1234, 100.23), (345, 10.45), (1234, 75.00),
           (345, 222.66), (678, 300.25), (1234, 35.67)]

reg_dict = {}
for acct_num, value in my_list:
    if acct_num in reg_dict:
        reg_dict[acct_num].append(value)
    else:
        reg_dict[acct_num] = [value]
        
print(reg_dict)
        

{1234: [100.23, 75.0, 35.67], 345: [10.45, 222.66], 678: [300.25]}


Now let’s re-implement this code using defaultdict:

In [7]:
from collections import defaultdict

my_list = [(1234, 100.23), (345, 10.45), (1234, 75.00),
           (345, 222.66), (678, 300.25), (1234, 35.67)]

d = defaultdict(list)
for acct_num, value in my_list:
    d[acct_num].append(value)
    
print(d)

defaultdict(<class 'list'>, {1234: [100.23, 75.0, 35.67], 345: [10.45, 222.66], 678: [300.25]})


This is some pretty cool stuff! Let’s go ahead and try using a lambda too as our default_factory!



In [13]:
from collections import defaultdict
animal = defaultdict(lambda: "Monkey")
animal['Sam'] = 'Tiger'
print(animal)
print(animal['Nick'])

print(animal)

defaultdict(<function <lambda> at 0x0000018B3403A168>, {'Sam': 'Tiger'})
Monkey
defaultdict(<function <lambda> at 0x0000018B3403A168>, {'Sam': 'Tiger', 'Nick': 'Monkey'})


Here we create a defaultdict that will assign ‘Monkey’ as the default value to any key. The first key we set to ‘Tiger’, then the next key we don’t set at all. If you print the second key, you will see that it got assigned ‘Monkey’. In case you haven’t noticed yet, it’s basically impossible to cause a KeyError to happen as long as you set the default_factory to something that makes sense. The documentation does mention that if you happen to set the default_factory to None, then you will receive a KeyError. Let’s see how that works:

In [15]:
from collections import defaultdict
x = defaultdict(None)
x['Mike']

KeyError: 'Mike'

In this case, we just created a very broken defaultdict. It can no longer assign a default to our key, so it throws a KeyError instead. Of course, since it is a subclass of dict, we can just set the key to some value and it will work. But that kind of defeats the purpose of the defaultdict.
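One detail worth knowing is when the default_factory actually runs. The sketch below, with made-up sample data, shows that only indexing a missing key triggers it, while membership tests and get do not. It also uses set as a factory, which is handy for grouping without duplicates:

```python
from collections import defaultdict

d = defaultdict(list)

# Membership tests and .get() do not call the default_factory
print('a' in d)     # False
print(d.get('a'))   # None
print(len(d))       # 0

# Indexing a missing key calls the factory and stores the result
d['a']
print(dict(d))      # {'a': []}

# set works as a factory too: group words by their first letter
groups = defaultdict(set)
for word in ['apple', 'avocado', 'banana', 'apple']:
    groups[word[0]].add(word)

print(sorted(groups['a']))  # ['apple', 'avocado']
print(sorted(groups['b']))  # ['banana']
```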

## deque

According to the Python documentation, deques “are a generalization of stacks and queues”. The name is pronounced “deck” and is short for “double-ended queue”. They are a replacement container for the Python list. Deques are thread-safe and support memory efficient appends and pops from either side of the deque. A list is optimized for fast fixed-length operations. You can get all the gory details in the Python documentation. A deque accepts a **maxlen** argument which sets the bounds for the deque. Otherwise the deque will grow to an arbitrary size. When a bounded deque is full, any new items added will cause the same number of items to be popped off the other end.
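The bounded behavior is easier to see than to describe, so here is a minimal sketch. The name recent is just an illustrative choice:

```python
from collections import deque

# Only the last three items fit, so 0 and 1 are discarded right away
recent = deque(range(5), maxlen=3)
print(recent)  # deque([2, 3, 4], maxlen=3)

# Appending on the right pushes an item off the left end
recent.append(5)
print(recent)  # deque([3, 4, 5], maxlen=3)

# Appending on the left pushes an item off the right end
recent.appendleft(0)
print(recent)  # deque([0, 3, 4], maxlen=3)
```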

As a general rule, if you need fast appends or fast pops, use a deque. If you need fast random access, use a list. Let’s take a few moments to look at how you might create and use a deque.

In [24]:
from collections import deque
import string 
d = deque(string.ascii_lowercase)
for letter in d:
    print(letter)

a
b
c
d
e
f
g
h
i
j
k
l
m
n
o
p
q
r
s
t
u
v
w
x
y
z


Here we import the deque from our collections module and we also import the string module. To create our deque, we pass it an iterable. In this case, we passed it string.ascii_lowercase, which is a string containing all the lower case letters in the alphabet, so each letter becomes one item in the deque. Finally, we loop over our deque and print out each item. Now let’s look at a few of the methods that deque possesses.

In [25]:
d.append('bork')
print(d)

deque(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', 'bork'])


In [26]:
d.appendleft('test')
print(d)

deque(['test', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', 'bork'])


In [42]:
d.rotate(1)
print(d)

deque(['bork', 'test', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z'])


In [43]:
# negative number
d.rotate(-2)
print(d)

deque(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', 'bork', 'test'])


Let’s break this down a bit. First we append a string to the right end of the deque. Then we append another string to the left side of the deque. Next, we call rotate on our deque and pass it a one, which causes it to rotate one time to the right. In other words, it causes one item to rotate off the right end and onto the front. You can pass it a negative number to make the deque rotate to the left instead, which is what the last cell does with -2.
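If rotate still feels a bit abstract, it may help to see it next to the pops and appends it is equivalent to. This is just a sketch on a four-item deque:

```python
from collections import deque

d = deque(['a', 'b', 'c', 'd'])
manual = deque(['a', 'b', 'c', 'd'])

# Rotating right by one is the same as moving the last item to the front
d.rotate(1)
manual.appendleft(manual.pop())
print(d)            # deque(['d', 'a', 'b', 'c'])
print(d == manual)  # True

# Rotating left by one is the same as moving the first item to the back
d.rotate(-1)
manual.append(manual.popleft())
print(d)            # deque(['a', 'b', 'c', 'd'])
print(d == manual)  # True
```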



In [44]:
from collections import deque

def get_list(filename, n=5):
    """
    Returns the last n lines from the file
    """
    try:
        with open(filename) as f:
            # A deque bounded to n keeps only the last n lines it is fed
            return deque(f, n)
    except OSError:
        print("Error opening file: {}".format(filename))
        raise

This code works in much the same way as Linux’s tail program does. Here we pass in a filename to our function along with the number of lines, n, that we want returned. The deque is bounded to whatever number we pass in as n. This means that once the deque is full, when new lines are read in and added to the deque, older lines are popped off the other end and discarded. I also wrapped the with statement that opens the file in a simple exception handler because it’s really easy to pass in a malformed path. This will catch files that don’t exist, for example.



In [49]:
print(get_list("C:\\abc.txt"))

deque(['third line\n', 'fourth line\n', 'fifth line\n', 'sixth line\n', 'seventh line\n'], maxlen=5)


In [50]:
print(get_list("abc.txt"))

deque(['third line\n', 'fourth line\n', 'fifth line\n', 'sixth line\n', 'seventh line\n'], maxlen=5)
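If you don’t have a text file handy, you can try the same trick on an in-memory file. This sketch uses io.StringIO as a stand-in for a real file, with made-up contents, and relies on the fact that a deque will consume any iterable of lines:

```python
import io
from collections import deque

# Build seven lines of sample text: "line 1" through "line 7"
text = "".join("line {}\n".format(i) for i in range(1, 8))
fake_file = io.StringIO(text)

# Iterating over the file object yields one line at a time,
# and the bounded deque keeps only the last three of them
last_three = deque(fake_file, 3)

print([line.strip() for line in last_three])  # ['line 5', 'line 6', 'line 7']
print(last_three.maxlen)                      # 3
```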


## namedtuple

The one that we’ll be focusing on in this section is the namedtuple which you can use to replace Python’s tuple. Of course, the namedtuple is not a drop-in replacement as you will soon see. I have seen some programmers use it like a struct. If you haven’t used a language with a struct in it, then that needs a little explanation. A struct is basically a complex data type that groups a list of variables under one name.

In [53]:
from collections import namedtuple

Parts = namedtuple('Parts', 'id_num desc cost amount')
auto_parts = Parts(id_num='1234',
                   desc='Ford Engine',
                   cost=120.0,
                   amount=10)
print(auto_parts)

Parts(id_num='1234', desc='Ford Engine', cost=120.0, amount=10)


Here we import **namedtuple** from the collections module. Then we call namedtuple, which returns a new subclass of tuple with named fields. So basically we just created a new tuple class. You will note that we have a strange string as our second argument. This is a space-delimited list of the field names that we want to create.

Now that we have our shiny new class, let’s create an instance of it! As you can see above, we do that as our very next step when we create the **auto_parts** object. Now we can access the various items in our auto_parts using dot notation because they are now properties of our Parts class.
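Here is a short sketch of that dot notation in action, reusing the Parts class from above. It also shows that a namedtuple still behaves like a tuple: you can index it, and you cannot change it in place:

```python
from collections import namedtuple

Parts = namedtuple('Parts', 'id_num desc cost amount')
auto_parts = Parts(id_num='1234', desc='Ford Engine', cost=120.0, amount=10)

# Access by name or by position
print(auto_parts.cost)  # 120.0
print(auto_parts[2])    # 120.0
print(Parts._fields)    # ('id_num', 'desc', 'cost', 'amount')

# Like a regular tuple, a namedtuple is immutable
try:
    auto_parts.cost = 99.0
except AttributeError as err:
    print("Cannot assign:", err)

# _replace returns a new instance with the changed field
cheaper = auto_parts._replace(cost=99.0)
print(cheaper.cost)     # 99.0
print(auto_parts.cost)  # 120.0
```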

One of the benefits of using a namedtuple over a regular tuple is that you no longer have to keep track of each item’s index because now each item is named and accessed via a class property.

In [54]:
auto_parts = ('1234', 'FordEngine', 12000.0, 10)
print(auto_parts[2])



12000.0


In [56]:
id_num, desc, cost, total_amount = auto_parts
print(total_amount)

10


In the code above, we create a regular tuple and access the cost of the vehicle engine by telling Python the appropriate index we want. Alternatively, we can also extract everything from the tuple using multiple assignment. Personally, I prefer the namedtuple approach just because it is easier to keep in your head and you can use Python’s **dir()** function to inspect the tuple and find out its properties. Give that a try and see what happens!

The other day I was looking for a way to convert a Python dictionary into an object and I came across some code that did something like this:

In [57]:
from collections import namedtuple

Parts = {'id_num': '1234', 
        'desc': 'FordEngine',
        'cost': 12000.00,
        'amount': 10}
parts = namedtuple('Parts', Parts.keys())(**Parts)
print(parts)

Parts(id_num='1234', desc='FordEngine', cost=12000.0, amount=10)


This is some weird code, so let’s take it a piece at a time. On the first line we import namedtuple as before. Next we create a Parts dictionary. So far, so good. Now we’re ready for the weird part. Here we create our namedtuple class and name it ‘Parts’. The second argument is the keys from our dictionary. The last piece is this strange piece of code: (\**Parts). The double asterisk means that we are calling our class using keyword arguments, which in this case is our dictionary. We could split this line into two parts to make it a little clearer:

In [59]:
parts = namedtuple('Parts', Parts.keys())
print(parts)

<class '__main__.Parts'>


In [60]:
auto_parts = parts(**Parts)
print(auto_parts)

Parts(id_num='1234', desc='FordEngine', cost=12000.0, amount=10)


So here we do the same thing as before, except that we create the class first, then we call the class with our dictionary to create an object. The only other piece I want to mention is that namedtuple also accepts a rename argument and, in older versions of Python, a verbose argument. The **verbose** argument was a flag that printed out the class definition right before it was built if you set it to True; it was removed in Python 3.7. The **rename** argument is useful if you’re creating your namedtuple from a database or some other system that your program doesn’t control, as it will automatically replace invalid or duplicate field names with positional names such as _1.
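To see what rename does, here is a small sketch with some deliberately awkward field names. The column list is made up, standing in for something like the headers of a database query:

```python
from collections import namedtuple

# Hypothetical column names: a keyword, a duplicate and an invalid identifier
columns = ['id', 'class', 'id', '2nd']

# With rename=True, each bad name is replaced by an underscore plus its position
Row = namedtuple('Row', columns, rename=True)
print(Row._fields)  # ('id', '_1', '_2', '_3')

row = Row(1, 'math', 2, 'x')
print(row.id)       # 1
print(row._1)       # math

# Without rename, the same field names are rejected
try:
    namedtuple('Row', columns)
except ValueError as err:
    print("Invalid field names:", err)
```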

## OrderedDict

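As the name implies, this dictionary keeps track of the order of the keys as they are added. Since Python 3.7, a regular dict also preserves insertion order, but it does not keep its keys sorted. If you want to walk a regular dict in sorted order, you have to sort the keys yourself: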

In [2]:
d = {'banana': 3, 'orange': 2, 'apple': 4, 'pear':1}
print(d)

{'banana': 3, 'orange': 2, 'apple': 4, 'pear': 1}


In [3]:
keys = d.keys()
print(keys)

dict_keys(['banana', 'orange', 'apple', 'pear'])


In [4]:
keys = sorted(keys)
print(keys)

['apple', 'banana', 'orange', 'pear']


In [7]:
for key in keys:
    print(key, d[key])

apple 4
banana 3
orange 2
pear 1


Let’s create an instance of an OrderedDict using our original dict, but during the creation, we’ll sort the dictionary’s keys:

In [10]:
from collections import OrderedDict
d = {'banana':3,'orange':2, 'apple': 4, 'pear':1}
print(d)

new_d = OrderedDict(sorted(d.items()))
print(new_d)

for key in new_d:
    print(key, new_d[key])

{'banana': 3, 'orange': 2, 'apple': 4, 'pear': 1}
OrderedDict([('apple', 4), ('banana', 3), ('orange', 2), ('pear', 1)])
apple 4
banana 3
orange 2
pear 1


Here we create our OrderedDict by sorting it on the fly using Python’s sorted built-in function. The sorted function takes in the dictionary’s items, which is a view of (key, value) tuples. It sorts them and returns a list, which we pass into the OrderedDict, which will retain that order. Thus when we go to print out the keys and values, they are in the order we expect. If you were to loop over the original dictionary instead, you would get the keys in insertion order (banana, orange, apple, pear) rather than sorted order. In versions of Python before 3.7, the order of a regular dictionary was not guaranteed at all.

Note that if you add new keys, they will be added to the end of the OrderedDict instead of being automatically sorted.

Something else to note about OrderedDicts is that when you go to compare two OrderedDicts, they will not only test the items for equality, but also that the order is correct. A regular dictionary only looks at the contents of the dictionary and doesn’t care about its order.
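A quick sketch makes the difference obvious. The two OrderedDicts below hold the same items in a different order:

```python
from collections import OrderedDict

first = OrderedDict([('a', 1), ('b', 2)])
second = OrderedDict([('b', 2), ('a', 1)])

# Two OrderedDicts are only equal if the order matches too
print(first == second)              # False

# Regular dicts with the same items are equal regardless of order
print(dict(first) == dict(second))  # True

# Comparing an OrderedDict with a regular dict also ignores order
print(first == {'b': 2, 'a': 1})    # True
```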

Finally, OrderedDicts have two methods worth knowing about in Python 3: **popitem** and **move_to_end**. The popitem method will return and remove a (key, value) pair, taken from the right end by default or from the left end if its last argument is False. The move_to_end method will move an existing key to either end of the OrderedDict. The item will be moved to the right end if the last argument of move_to_end is set to True (which is the default), or to the beginning if it is False.
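Here is a small sketch of both methods. It builds an OrderedDict with the keys a through e (the values are all None, since only the order matters here):

```python
from collections import OrderedDict

d = OrderedDict.fromkeys('abcde')

# Move 'b' to the right end (last=True is the default)
d.move_to_end('b')
print(''.join(d))             # acdeb

# Move 'b' to the beginning instead
d.move_to_end('b', last=False)
print(''.join(d))             # bacde

# popitem removes from the right end by default...
print(d.popitem())            # ('e', None)

# ...or from the left end when last is False
print(d.popitem(last=False))  # ('b', None)
print(''.join(d))             # acd
```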

Interestingly, OrderedDicts support reverse iteration using Python’s reversed built-in function:

In [12]:
for key in reversed(new_d):
    print(key, new_d[key])

pear 1
orange 2
banana 3
apple 4
