# The Collections Module

Python’s collections module provides specialized container datatypes that can stand in for Python’s general-purpose containers (dict, tuple, list, and set). We will study the following parts of this fun module:

* ChainMap
* Counter
* defaultdict
* deque
* namedtuple
* OrderedDict

## ChainMap

In [5]:
# Basic Usage
from collections import ChainMap

car_parts = {'hood': 500, 'engine': 5000, 'front_door': 750}
car_options = {'A/C': 1000, 'Turbo': 2500, 'rollbar': 300}
car_accessories = {'cover': 100, 'hood_ornament': 150, 'seat_cover': 99}
car_pricing = ChainMap(car_accessories, car_options, car_parts)

print(car_pricing['hood'])

500


In [7]:
# Using ChainMap to override defaults in a simple application
# chain_map.py
import argparse
import os

from collections import ChainMap


def main():
    app_defaults = {'username':'admin', 'password':'admin'}

    parser = argparse.ArgumentParser()
    parser.add_argument('-u', '--username')
    parser.add_argument('-p', '--password')
    args = parser.parse_args()
    command_line_arguments = {key:value for key, value 
                              in vars(args).items() if value}

    chain = ChainMap(command_line_arguments, os.environ, 
                     app_defaults)
    print(chain['username'])

if __name__ == '__main__':
    main()
    os.environ['username'] = 'test'
    main()

In [8]:
# Testing the above code
!python 2_collections_demos/chain_map.py -u mike

mike
mike


## Counter

In [13]:
# Counter can be used for easy and fast tallies
from collections import Counter

print(Counter('superfluous'))

counter = Counter('superfluous')
print(counter['u'])

Counter({'u': 3, 's': 2, 'p': 1, 'e': 1, 'r': 1, 'f': 1, 'l': 1, 'o': 1})
3


In [12]:
# Listing all elements represented in the dictionary

print(list(counter.elements()))

['s', 's', 'u', 'u', 'u', 'p', 'e', 'r', 'f', 'l', 'o']


In [14]:
# Printing the most common elements in the dictionary

print(counter.most_common(2))

[('u', 3), ('s', 2)]


In [17]:
# Subtracting one counter from another
# (subtract works in place and returns None)

from collections import Counter

counter_one = Counter('superfluous')
print(counter_one)


counter_two = Counter('super')
print(counter_one.subtract(counter_two))


print(counter_one)

Counter({'u': 3, 's': 2, 'p': 1, 'e': 1, 'r': 1, 'f': 1, 'l': 1, 'o': 1})
None
Counter({'u': 2, 's': 1, 'f': 1, 'l': 1, 'o': 1, 'p': 0, 'e': 0, 'r': 0})


## defaultdict

The defaultdict is a subclass of Python’s dict that accepts a default_factory as its primary argument. The default_factory is usually a Python type, such as int or list, but you can also use a function or a lambda.
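As a minimal sketch of the function case, a named function can serve as the default_factory. The function `default_price` and the part names below are hypothetical and exist only for illustration:

```python
from collections import defaultdict


def default_price():
    """Hypothetical fallback for any part without a listed price."""
    return 0.0


prices = defaultdict(default_price)
prices['hood'] = 500.0

print(prices['hood'])     # 500.0
print(prices['spoiler'])  # 0.0, and 'spoiler' is now stored as a key
print(sorted(prices))     # ['hood', 'spoiler']
```

Note that merely looking up a missing key inserts it, which is worth keeping in mind when the dictionary is later iterated or serialized.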

In [18]:
# Count occurrences of words in a sentence
# Regular Python dictionary

sentence = "The red for jumped over the fence and ran to the zoo for food"
words = sentence.split(' ')

reg_dict = {}
for word in words:
    if word in reg_dict:
        reg_dict[word] += 1
    else:
        reg_dict[word] = 1

print(reg_dict)

{'The': 1, 'red': 1, 'for': 2, 'jumped': 1, 'over': 1, 'the': 2, 'fence': 1, 'and': 1, 'ran': 1, 'to': 1, 'zoo': 1, 'food': 1}


In [19]:
# Same task, but utilizing defaultdict

from collections import defaultdict


sentence = "The red for jumped over the fence and ran to the zoo for food"
words = sentence.split(' ')

d = defaultdict(int)
for word in words:
    d[word] += 1

print(d)

defaultdict(<class 'int'>, {'The': 1, 'red': 1, 'for': 2, 'jumped': 1, 'over': 1, 'the': 2, 'fence': 1, 'and': 1, 'ran': 1, 'to': 1, 'zoo': 1, 'food': 1})


In [20]:
# Building a dictionary from a list of tuples
# Group the second value of each tuple under its first value (the account number)
# Regular Python dictionary

my_list = [(1234, 100.23), (345, 10.45), (1234, 75.00),
           (345, 222.66), (678, 300.25), (1234, 35.67)]

reg_dict = {}
for acct_num, value in my_list:
    if acct_num in reg_dict:
        reg_dict[acct_num].append(value)
    else:
        reg_dict[acct_num] = [value]

print(reg_dict)

{1234: [100.23, 75.0, 35.67], 345: [10.45, 222.66], 678: [300.25]}


In [22]:
# Building a dictionary from a list of tuples
# Group the second value of each tuple under its first value (the account number)
# Using defaultdict

from collections import defaultdict


my_list = [(1234, 100.23), (345, 10.45), (1234, 75.00),
           (345, 222.66), (678, 300.25), (1234, 35.67)]

d = defaultdict(list)
for acct_num, value in my_list:
    d[acct_num].append(value)
    # No need to add the key first if it doesn't exist

print(d)

defaultdict(<class 'list'>, {1234: [100.23, 75.0, 35.67], 345: [10.45, 222.66], 678: [300.25]})


In [25]:
# Using Lambda as the default_factory

from collections import defaultdict
animal = defaultdict(lambda: "Monkey")
animal['Sam'] = 'Tiger'
print(animal['Jacob'])
# Monkey

print(animal)
# defaultdict(<function <lambda> at 0x10b4ec280>, {'Sam': 'Tiger', 'Jacob': 'Monkey'})

Monkey
defaultdict(<function <lambda> at 0x10b4ec280>, {'Sam': 'Tiger', 'Jacob': 'Monkey'})


In [26]:
# With default_factory set to None, looking up a missing key raises KeyError

from collections import defaultdict
x = defaultdict(None)
x['Mike']
#Traceback (most recent call last):
#  File "/usercode/__ed_file.py", line 3, in <module>
# x['Mike']
#KeyError: 'Mike'

KeyError: 'Mike'

## deque

According to the Python documentation, deques “are a generalization of stacks and queues”. The name is pronounced “deck” and is short for "double-ended queue". A deque can stand in for a Python list when you need fast appends and pops at both ends. Deques are thread-safe and support memory-efficient appends and pops from either side of the deque.

A deque accepts an optional maxlen argument that bounds its size. Without it, the deque can grow to an arbitrary length. Once a bounded deque is full, adding new items discards the same number of items from the opposite end.
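A small sketch of the bounded behaviour described above. The variable name `recent` and the sample values are only illustrative:

```python
from collections import deque

# A bounded deque that keeps only the three most recent items
recent = deque(maxlen=3)
for number in range(5):
    recent.append(number)

print(recent)  # deque([2, 3, 4], maxlen=3)

# Appending on the left discards an item from the right instead
recent.appendleft(99)
print(recent)  # deque([99, 2, 3], maxlen=3)
```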

In [34]:
# Creating a deque
from collections import deque
import string

d = deque(string.ascii_lowercase)
for letter in d:
    print(letter)

a
b
c
d
e
f
g
h
i
j
k
l
m
n
o
p
q
r
s
t
u
v
w
x
y
z


In [35]:
d.append('mung')
print(d)
print('\n')

d.appendleft('test')
print(d)
print('\n')

d.rotate(2)
print(d)
print('\n')

deque(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', 'mung'])


deque(['test', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', 'mung'])


deque(['z', 'mung', 'test', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y'])




In [40]:
from collections import deque


def get_last(filename, n=5):
    """
    Returns the last n lines from the file
    """
    try:
        with open(filename) as f:
            return deque(f, n)
    except OSError:
        print("Error opening file: {}".format(filename))
        raise

In [43]:
get_last('./2_collections_demos/collections_deque_data.csv', n=5)

deque(['Cleo\n', 'Dave\n', 'Anna\n', 'Leonardo\n', 'Einstein'])

## namedtuple

In [2]:
# Creating a named tuple

from collections import namedtuple

Parts = namedtuple('Parts', 'id_num desc cost amount')
auto_parts = Parts(id_num='1234', desc='Ford Engine',
               cost=1200.00, amount=10)

print(auto_parts.id_num)

1234


In [4]:
# Getting data via index (a namedtuple still supports tuple indexing)
auto_parts = Parts('1234', 'Ford Engine', 1200.00, 10)
print(auto_parts[2]) # access the cost

# Assigning named tuple elements to their own variables
id_num, desc, cost, amount = auto_parts
print(id_num)


1200.0
1234


In [6]:
# Using a dictionary to create namedtuple
from collections import namedtuple

Parts = {'id_num':'1234', 'desc':'Ford Engine',
     'cost':1200.00, 'amount':10}
parts = namedtuple('Parts', Parts.keys())(**Parts)

print(parts)

Parts(id_num='1234', desc='Ford Engine', cost=1200.0, amount=10)


In [7]:
# Explanation for the above code

# First we create the named tuple keys
parts = namedtuple('Parts', Parts.keys())
print(parts)
#<class '__main__.Parts'>

# Next we assign the values from our dict
auto_parts = parts(**Parts)
print(auto_parts)
#Parts(id_num='1234', desc='Ford Engine', cost=1200.0, amount=10)

<class '__main__.Parts'>
Parts(id_num='1234', desc='Ford Engine', cost=1200.0, amount=10)


## OrderedDict

An OrderedDict is essentially a dictionary that remembers the order in which its keys were inserted.

Note that if you add new keys, they will be added to the end of the OrderedDict instead of being automatically sorted.

Something else to note about OrderedDicts is that comparing two of them tests not only that the items are equal, but also that they are in the same order. A regular dictionary only looks at the contents of the dictionary and doesn’t care about its order.
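The difference in comparison can be sketched as follows. The names `first` and `second` are illustrative only:

```python
from collections import OrderedDict

first = OrderedDict([('apple', 4), ('banana', 3)])
second = OrderedDict([('banana', 3), ('apple', 4)])

# Two OrderedDicts with the same items in a different order are not equal
print(first == second)              # False

# Regular dictionaries ignore order when comparing
print(dict(first) == dict(second))  # True

# Comparing an OrderedDict with a regular dict is also order-insensitive
print(first == dict(second))        # True
```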

Finally, OrderedDicts have two order-aware methods: popitem and move_to_end (the latter was added in Python 3.2). The popitem method removes and returns a (key, value) pair, taken from the end by default or from the beginning if its last argument is False. The move_to_end method moves an existing key to either end of the OrderedDict: to the right end if its last argument is True (the default), or to the beginning if it is False.
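A short sketch of both methods, using an illustrative `fruit` dictionary:

```python
from collections import OrderedDict

fruit = OrderedDict([('apple', 4), ('banana', 3), ('orange', 2), ('pear', 1)])

# Move an existing key to the right end (last=True is the default)
fruit.move_to_end('apple')
print(list(fruit))  # ['banana', 'orange', 'pear', 'apple']

# Move an existing key to the beginning
fruit.move_to_end('pear', last=False)
print(list(fruit))  # ['pear', 'banana', 'orange', 'apple']

# popitem removes from the right end by default, or the left with last=False
print(fruit.popitem())            # ('apple', 4)
print(fruit.popitem(last=False))  # ('pear', 1)
print(list(fruit))                # ['banana', 'orange']
```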

In [9]:
# A regular dictionary keeps insertion order (guaranteed since Python 3.7),
# but its keys are not sorted

d = {'banana': 3, 'apple':4, 'pear': 1, 'orange': 2}

print(d)

{'banana': 3, 'apple': 4, 'pear': 1, 'orange': 2}


In [10]:
# In order to print the items sorted by key, we must take the following steps

# Assign keys to a variable
keys = d.keys()
print(keys)

# Sort keys
keys = sorted(keys)
print(keys)

# Print values associated with sorted keys
for key in keys:
    print(key, d[key])

dict_keys(['banana', 'apple', 'pear', 'orange'])
['apple', 'banana', 'orange', 'pear']
apple 4
banana 3
orange 2
pear 1


In [15]:
# Using OrderedDict
from collections import OrderedDict

d = {'banana': 3, 'apple':4, 'pear': 1, 'orange': 2}
new_d = OrderedDict(sorted(d.items()))
print(new_d)

for key in new_d:
    print(key, new_d[key])


OrderedDict([('apple', 4), ('banana', 3), ('orange', 2), ('pear', 1)])
apple 4
banana 3
orange 2
pear 1


In [16]:
# Printing OrderedDict in reverse!
for key in reversed(new_d):
    print(key, new_d[key])

pear 1
orange 2
banana 3
apple 4
