# `collections`

Collections in Python are containers that store groups of data, for example `list`, `dict`, `set`, and `tuple`. These are the built-in collections. Several modules provide additional data structures for storing collections of data. One such module is the Python [`collections`](https://docs.python.org/2/library/collections.html) module.

The `collections` module was introduced to extend the functionality of the built-in containers. It first appeared in the Python 2.4 release.

Sequences in Python:
- Flat: string, numpy array
- Container: list, collections.*

## Collections Module

In this tutorial we will discuss 6 of the most commonly used data structures from the Python collections module. They are as follows:

- `Counter`
- `defaultdict`
- `OrderedDict`
- `deque`
- `ChainMap`
- `namedtuple()`

## `collections.Counter`

In [15]:
from collections import Counter

`Counter` is a subclass of `dict`. Its constructor takes an iterable or a mapping as the argument and returns a `Counter` object. When given an iterable, each key is an element of the iterable and each value is the number of times that element occurs; when given a mapping, its values are taken as the counts.

There are multiple ways to create counter objects. The simplest way is to call `Counter()` without any arguments, which creates an empty counter.

In [52]:
cnt = Counter()

You can pass an iterable (for example a `list`) to `Counter()` to create a counter object.

In [53]:
mylist = [1,2,3,4,1,2,6,7,3,8,1]
Counter(mylist)

Counter({1: 3, 2: 2, 3: 2, 4: 1, 6: 1, 7: 1, 8: 1})

Finally, `Counter()` can take a dictionary as an argument. In this dictionary, the value of each key should be the count of that key.
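As a minimal sketch of the dictionary form, the snippet below builds the same counter from a mapping and from keyword arguments. The names `stock` and `same_stock` are made up for illustration.

```python
from collections import Counter

# Build a Counter from a dictionary whose values are already counts.
stock = Counter({"apple": 3, "pear": 1})

# Keyword arguments are interpreted the same way: name=count.
same_stock = Counter(apple=3, pear=1)

print(stock["apple"])        # 3
print(stock == same_stock)   # True
```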

You can access any counter item with its key as shown below:

In [6]:
mylist = [1,2,3,4,1,2,6,7,3,8,1]
cnt = Counter(mylist)
print(cnt[1])

3


In the above examples, `cnt` is an instance of the `Counter` class, which is a subclass of `dict`, so it has all the methods of the `dict` class.
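The short sketch below shows a few inherited `dict` methods on a counter. It also shows one difference from a plain dictionary: looking up a key that is not present returns `0` instead of raising `KeyError`, and does not add the key.

```python
from collections import Counter

cnt = Counter([1, 2, 3, 4, 1, 2, 6, 7, 3, 8, 1])

# Ordinary dict methods are available.
print(sorted(cnt.keys()))   # [1, 2, 3, 4, 6, 7, 8]
print(sum(cnt.values()))    # 11

# A missing key has a count of zero and is not inserted by the lookup.
print(cnt[5])               # 0
print(5 in cnt)             # False
```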

Apart from that, `Counter` has three additional methods:

- `elements()`
- `most_common([n])`
- `subtract([iterable-or-mapping])`

In [68]:
cnt = Counter({1:3,2:4})
print(list(cnt.elements()))

[1, 1, 1, 2, 2, 2, 2]


In [69]:
mylist = [1,2,3,4,1,2,6,7,3,8,1]
cnt = Counter(mylist)

A `Counter` does not keep its keys sorted by count. To get the elements ordered from the most common to the least common, use the `most_common()` method of the Counter object. Passing `n` returns only the `n` most common elements.

In [74]:
cnt.most_common()

[(1, 3), (2, 2), (3, 2), (4, 1), (6, 1), (7, 1), (8, 1)]

In [16]:
cnt.most_common(3)

[(1, 3), (2, 2), (3, 2)]

The `subtract()` method takes an iterable (such as a list) or a mapping (such as a dictionary) as an argument and subtracts its element counts from the counter in place. Check the following example:

In [17]:
cnt = Counter({1:3,2:4})
deduct = {1:1, 2:2}
cnt.subtract(deduct)

In [19]:
cnt

Counter({1: 2, 2: 2})
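`subtract()` also accepts a plain iterable, in which case each occurrence lowers the count by one. The sketch below additionally shows that counts can drop below zero, including for keys that were not in the counter before.

```python
from collections import Counter

cnt = Counter({1: 3, 2: 4})

# Each element of the iterable subtracts one from its count.
cnt.subtract([1, 1, 2, 5])

print(cnt[1], cnt[2], cnt[5])   # 1 3 -1
```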

## `collections.defaultdict`

In [1]:
from collections import defaultdict

A `defaultdict` works exactly like a Python dictionary, except that it does not raise a `KeyError` when you try to access a non-existent key.

Instead, it initializes the key with a value produced by the callable that you pass when creating the `defaultdict`. This callable, typically a type such as `int` or `list`, is called the `default_factory`.

You can create a defaultdict with the `defaultdict()` constructor, passing the `default_factory` as the first argument. Check the following code:

In [115]:
nums = defaultdict(int)
nums['one'] = 1
nums['two'] = 2

In [22]:
nums['three']

0

In this example, `int` is passed as the `default_factory`. Notice that you pass only `int`, not `int()`. Next, values are defined for two keys, `'one'` and `'two'`, and then we access a key that has not been defined yet.

In a normal dictionary, this raises a `KeyError`. A defaultdict instead initializes the new key with the value returned by `default_factory`, which is `0` for `int`. Hence, `0` is printed when the program is executed. This feature of initializing non-existent keys can be exploited in various situations.

For example, let's say you want the count of each name in a list of names given as `"Mike John Mike Anna Mike John John Mike Mike Britney Smith Anna Smith"`.

In [124]:
from collections import defaultdict

count = defaultdict(int)
names_list = "Mike John Mike Anna Mike John John Mike Mike Britney Smith Anna Smith".split()
for name in names_list:
    count[name] += 1
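The sketch below reruns the counting loop and prints the result, then uses `list` as the `default_factory` to group values instead of counting them. The grouping by first letter and the name `by_initial` are illustrative assumptions, not part of the original example.

```python
from collections import defaultdict

names_list = "Mike John Mike Anna Mike John John Mike Mike Britney Smith Anna Smith".split()

# Counting: missing keys start at int() == 0.
count = defaultdict(int)
for name in names_list:
    count[name] += 1
print(dict(count))
# {'Mike': 5, 'John': 3, 'Anna': 2, 'Britney': 1, 'Smith': 2}

# Grouping: missing keys start as an empty list.
by_initial = defaultdict(list)
for name in sorted(set(names_list)):
    by_initial[name[0]].append(name)
print(dict(by_initial))
# {'A': ['Anna'], 'B': ['Britney'], 'J': ['John'], 'M': ['Mike'], 'S': ['Smith']}
```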

## `collections.OrderedDict`

In [147]:
from collections import OrderedDict

`OrderedDict` is a dictionary that remembers the order in which keys were inserted. Changing the value of an existing key later does not change the position of the key. (Since Python 3.7, regular `dict` objects also preserve insertion order.)

In [160]:
od = OrderedDict()
od['a'] = 1
od['b'] = 2
od['c'] = 3

In [161]:
od

OrderedDict([('a', 1), ('b', 2), ('c', 3)])

You can access each element using a loop as well. Take a look at the following code:

In [28]:
for key, value in od.items():
    print(key, value)

a 1
b 2
c 3
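As a small sketch of the ordering behaviour, the snippet below updates an existing key and confirms that its position is unchanged. It then uses `move_to_end()` and `popitem(last=False)`, two `OrderedDict` methods for reordering and for removing the oldest entry.

```python
from collections import OrderedDict

od = OrderedDict([("a", 1), ("b", 2), ("c", 3)])

# Updating a value keeps the key in its original position.
od["a"] = 10
print(list(od.items()))      # [('a', 10), ('b', 2), ('c', 3)]

# Explicitly move a key to the end.
od.move_to_end("a")
print(list(od.keys()))       # ['b', 'c', 'a']

# Remove and return the first (oldest) item.
first = od.popitem(last=False)
print(first)                 # ('b', 2)
```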


The following example is an interesting use of `OrderedDict` with `Counter`. Here, we create a `Counter` from a `list` and insert its elements into an `OrderedDict` in order of their count.

In [162]:
mylist = ["a","c","c","a","b","a","a","b","c"]
cnt = Counter(mylist)
od = OrderedDict(cnt.most_common())
for key, value in od.items():
    print(key, value)

a 4
c 3
b 2


## `collections.deque`

In [163]:
from collections import deque

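A `deque` (double-ended queue) is a list-like container optimized for inserting and removing items at either end. You can create a deque with the `deque()` constructor, optionally passing an iterable such as a `list` to initialize it:

In [180]:
mylist = ["a","b","c"]  # avoid naming a variable `list`, which shadows the built-in
deq = deque(mylist)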

In [181]:
deq

deque(['a', 'b', 'c'])
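The constructor is not limited to lists. The sketch below builds a deque from a string and also uses the optional `maxlen` argument, which makes the deque discard items from the opposite end once it is full. The names `letters` and `recent` are illustrative.

```python
from collections import deque

# Any iterable can seed a deque, here a string.
letters = deque("abc")
print(letters)    # deque(['a', 'b', 'c'])

# A bounded deque keeps only the last `maxlen` items appended.
recent = deque(maxlen=3)
for n in range(5):
    recent.append(n)
print(recent)     # deque([2, 3, 4], maxlen=3)
```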

### Inserting Elements to deque
You can easily insert an element at either end of the `deq` we created. To add an element to the right end of the deque, use the `append()` method.

To add an element to the left end (the start) of the deque, use the `appendleft()` method.

In [182]:
deq.append("d")
deq.appendleft("e")

In [183]:
deq

deque(['e', 'a', 'b', 'c', 'd'])
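To add several elements at once, `deque` also provides `extend()` and `extendleft()`. A minimal sketch follows; note that `extendleft()` inserts the items one by one on the left, so they end up in reverse order.

```python
from collections import deque

deq = deque(["a", "b", "c"])

# Append several items to the right end.
deq.extend(["d", "e"])

# Insert several items on the left; their order is reversed.
deq.extendleft(["y", "z"])

print(deq)   # deque(['z', 'y', 'a', 'b', 'c', 'd', 'e'])
```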

### Removing Elements from the deque

Removing elements mirrors inserting them. To remove an element from the right end, use the `pop()` method; to remove an element from the left end, use `popleft()`. Both return the removed element.

In [184]:
deq.pop()

'd'

In [185]:
deq.popleft()

'e'

In [186]:
deq

deque(['a', 'b', 'c'])

### Counting Elements in a deque

To count the occurrences of a specific element, use the `count(x)` method, passing the element as the argument.

In [187]:
deq.count("a")

1

### Clearing a deque

If you want to remove all elements from a deque, you can use `clear()` function.

In [53]:
mylist = ["a","b","c"]
deq = deque(mylist)

In [54]:
deq.clear()

In [58]:
deq

deque([])

## `collections.namedtuple()`

In [188]:
from collections import namedtuple

`namedtuple()` is a factory function that returns a new tuple subclass whose positions also have names. One of the biggest problems with ordinary tuples is that you have to remember the index of each field of a tuple object, which is error-prone. The `namedtuple` was introduced to solve this problem.

In [262]:
Student = namedtuple('Student', 'fname lname age')

# identical to
# Student = namedtuple('Student', ['fname', 'lname', 'age'])

In [251]:
s1 = Student('John', 'Clarke', '13')

In [252]:
s1.fname

'John'

In [253]:
s1.lname

'Clarke'

In [254]:
s1.age

'13'

In this example, a namedtuple class `Student` has been declared. You can access the fields of any instance of the `Student` class by the defined field name.
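A namedtuple instance is still a regular tuple, so indexing and unpacking keep working alongside attribute access. A brief sketch using the same `Student` definition:

```python
from collections import namedtuple

Student = namedtuple("Student", "fname lname age")
s1 = Student("John", "Clarke", "13")

# Positional access and unpacking work as with any tuple.
print(s1[0])                   # John
fname, lname, age = s1
print(lname)                   # Clarke

print(isinstance(s1, tuple))   # True
print(Student._fields)         # ('fname', 'lname', 'age')
```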

### Creating a namedtuple Using List

The constructor of a namedtuple class such as `Student` requires each value to be passed as a separate argument. Instead, you can use `_make()` to create a namedtuple instance from a list or any other iterable. Check the following code:

In [263]:
mylist = ['Adam','joe','18']

In [266]:
# identical to: Student(*['Adam','joe','18'])
s2 = Student._make(['Adam','joe','18'])

In [267]:
s2.fname

'Adam'

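### Converting an Instance to a Dictionary

The `_asdict()` method returns a dictionary that maps the field names to the values of an existing instance. It returns an `OrderedDict` in Python 3.1 through 3.7 and a regular `dict` from Python 3.8 onward.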

In [273]:
s1._asdict()

OrderedDict([('fname', 'John'), ('lname', 'Clarke'), ('age', '13')])

### Changing Field Values with `_replace()` Function

To change the value of a field of an instance, use the `_replace()` method. Remember that `_replace()` creates a new instance; it does not change the existing one.

In [281]:
s1 = s1._replace(age='14')

In [282]:
s1

Student(fname='John', lname='Clarke', age='14')
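The sketch below makes the immutability explicit: the original instance keeps its old value after `_replace()`, and assigning to a field directly raises `AttributeError`. The names `original` and `updated` are illustrative.

```python
from collections import namedtuple

Student = namedtuple("Student", "fname lname age")
original = Student("John", "Clarke", "13")

# _replace() returns a new instance and leaves the original alone.
updated = original._replace(age="14")
print(original.age)   # 13
print(updated.age)    # 14

# Fields cannot be assigned in place.
try:
    original.age = "15"
except AttributeError as exc:
    print("cannot assign:", exc)
```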

### Have spaces in field names?

The second argument of `namedtuple()` can be either a space-separated string or a list of strings, like:

In [83]:
columns = namedtuple(
    'columns', 
    ['What is typically the main dish at your Thanksgiving dinner?', 'other column']
)

However, doing so fails with a `ValueError`:

ValueError: Type names and field names must be valid identifiers: 'What is typically the main dish at your Thanksgiving dinner?'

This is because each field name becomes an attribute of `columns` (which you should capitalize as `Columns`), so it must be a valid identifier, and identifiers can't contain spaces.

You can, however, ask `namedtuple` to rename invalid identifiers:

In [296]:
Columns = namedtuple('columns', ['what is this?', 'this is a aftabe pelastiki?', 'this'], rename=True)

In [297]:
Columns._fields

('_0', '_1', 'this')

In [299]:
col = Columns('field 0', 'field 1', 'field 2')

In [300]:
col._0  # ugly but valid

'field 0'

In [301]:
col.this

'field 2'

## `collections.ChainMap`

In [312]:
from collections import ChainMap

`ChainMap` is used to combine several dictionaries or other mappings into a single view. The underlying mappings are stored as a list in its `maps` attribute.

In [313]:
dict1 = { 'a' : 1, 'b' : 2 }
dict2 = { 'c' : 3, 'b' : 4 }

In [314]:
chain_map = ChainMap(dict1, dict2)

In [315]:
chain_map.maps

[{'a': 1, 'b': 2}, {'c': 3, 'b': 4}]

In [316]:
chain_map['a']

1
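When the same key appears in more than one mapping, lookups return the value from the first mapping that contains it. Writes always go to the first mapping. The sketch below shows both behaviours with the same `dict1` and `dict2`.

```python
from collections import ChainMap

dict1 = {"a": 1, "b": 2}
dict2 = {"c": 3, "b": 4}
chain_map = ChainMap(dict1, dict2)

# 'b' exists in both; the first mapping wins.
print(chain_map["b"])   # 2
# 'c' is only in the second mapping.
print(chain_map["c"])   # 3

# Assignment writes to the first mapping only.
chain_map["c"] = 30
print(dict1)            # {'a': 1, 'b': 2, 'c': 30}
print(dict2)            # {'c': 3, 'b': 4}
print(chain_map["c"])   # 30
```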

### Adding a New Dictionary to ChainMap

If you want to add a new dictionary to an existing `ChainMap`, use the `new_child()` method. It returns a new `ChainMap` with the new dictionary added; the existing `ChainMap` is left unchanged.

In [317]:
dict3 = {'e' : 5, 'f' : 6}
new_chain_map = chain_map.new_child(dict3)
print(new_chain_map)

ChainMap({'e': 5, 'f': 6}, {'a': 1, 'b': 2}, {'c': 3, 'b': 4})


Notice that the new dictionary is added to the beginning of the `ChainMap`'s list of mappings.
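Because the child mapping comes first, it can shadow keys from the older mappings without touching them. The following sketch, which uses made-up values for the child dictionary, also shows the `parents` property, which gives back the chain without the first mapping.

```python
from collections import ChainMap

chain_map = ChainMap({"a": 1, "b": 2}, {"c": 3, "b": 4})
new_chain_map = chain_map.new_child({"b": 20, "e": 5})

# The child shadows 'b'; the original chain is unchanged.
print(new_chain_map["b"])     # 20
print(chain_map["b"])         # 2
print(len(chain_map.maps))    # 2

# parents skips the first mapping and exposes the rest.
print(new_chain_map.parents.maps == chain_map.maps)   # True
```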

> **Note:** For practical usage of `ChainMap` in Python see [here](https://florimond.dev/en/posts/2018/07/a-practical-usage-of-chainmap-in-python/)