# Collections Module

The collections module is part of the Python standard library. It implements specialized container types that provide alternatives to Python’s general-purpose built-in containers: dict, list, set, and tuple.

Now we'll learn about the alternatives that the collections module provides.

## collections.Counter class

**Counter** is a **dict subclass** which helps count **hashable objects**.

The Counter() constructor takes an iterable or a mapping as its argument and returns a Counter object. Each key in the result is a unique element of the iterable or mapping, and its value is the number of times that element occurs.

Let's see how it can be used:

In [18]:
from collections import Counter

**Counter() with lists**

In [11]:
lst = [1,2,2,2,2,3,3,3,1,2,1,12,3,2,32,1,21,1,223,1] # note that all the elements are hashable

Counter(lst)

Counter({1: 6, 2: 6, 3: 4, 12: 1, 32: 1, 21: 1, 223: 1})

**Counter with strings**

In [20]:
Counter('aabsbsbsbhshhbbsbs')

Counter({'a': 2, 'b': 7, 's': 6, 'h': 3})

**Counter with words in a sentence**

In [21]:
s = 'How many times does each word show up in this sentence word times each each word'

words = s.split()

Counter(words)

Counter({'How': 1,
         'many': 1,
         'times': 2,
         'does': 1,
         'each': 3,
         'word': 3,
         'show': 1,
         'up': 1,
         'in': 1,
         'this': 1,
         'sentence': 1})

### Counter initialized with a dictionary/mapping

The Counter() constructor can also take a dictionary as its argument. In this dictionary, the value of each key should be the count of that key.

In [22]:
Counter({1:3,2:4})

Counter({1: 3, 2: 4})

### Accessing any item with its key

You can access any counter item with its key as shown below:

In [23]:
lst = [1,2,3,4,1,2,6,7,3,8,1]
cnt = Counter(lst)
print(cnt[1])  # number of occurrences of 1

3
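Unlike a plain dict, a Counter returns 0 for a key it has never seen instead of raising a KeyError, and the lookup does not insert the key. A minimal sketch (the variable names `missing` and `still_absent` are only illustrative):

```python
from collections import Counter

cnt = Counter([1, 2, 3, 4, 1, 2, 6, 7, 3, 8, 1])

missing = cnt[99]               # 99 never appears, so the count is 0 (no KeyError)
still_absent = 99 not in cnt    # looking up a missing key does not insert it

print(missing)        # 0
print(still_absent)   # True
```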


## The methods of the Counter class

The **Counter** class is a subclass of **dict**, so it has all the methods of the **dict** class.

Apart from that, **Counter** provides additional methods, including:

    elements()
    most_common([n])
    subtract([iterable-or-mapping])


### Counter.elements() method

The Counter.elements() method returns an iterator that yields each **key** in the counter object **Counter[key]** times.

In [2]:
from collections import Counter
c = Counter({2:4,5:3})
print(list(c.elements()))

[2, 2, 2, 2, 5, 5, 5]


Here, we create a Counter object from a dictionary, so the count of 2 is 4 and the count of 5 is 3. Calling elements() on c returns an iterator, which we pass to list().

The iterator yields 2 four times and then 5 three times, which produces the list shown above.
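One detail worth knowing is that elements() skips any key whose count is less than one. A small sketch, using a made-up counter with a zero and a negative count:

```python
from collections import Counter

c = Counter({'a': 2, 'b': 0, 'c': -1, 'd': 1})

# Only keys with a positive count are yielded, in the order they were inserted.
expanded = list(c.elements())
print(expanded)   # ['a', 'a', 'd']
```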

### Counter.most_common() method

A Counter stores its keys in the order they were first encountered, not sorted by count (although print() displays them by count). To get the elements sorted by count, use the most_common() method of the Counter object:

In [14]:
lst = [2,2,2,2,3,4,1,2,6,7,3,8,1,8,8]
cnt = Counter(lst)
print(cnt)
print(cnt.most_common())

Counter({2: 5, 8: 3, 3: 2, 1: 2, 4: 1, 6: 1, 7: 1})
[(2, 5), (8, 3), (3, 2), (1, 2), (4, 1), (6, 1), (7, 1)]


The most_common() method returns a list of tuples. Each tuple is a pair of (key, num_of_occurrences), and the tuples are sorted by num_of_occurrences in descending order. Since 2 has the highest count (five), it is the first element of the list.
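most_common() also accepts an optional argument n that limits the result to the n most frequent elements. Elements with equal counts keep the order in which they were first encountered. A short sketch with an arbitrary example string:

```python
from collections import Counter

cnt = Counter('abracadabra')      # a: 5, b: 2, r: 2, c: 1, d: 1

top_three = cnt.most_common(3)    # 'b' comes before 'r' because it was seen first
print(top_three)                  # [('a', 5), ('b', 2), ('r', 2)]
```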

### Counter.subtract() method

The subtract() method takes an iterable (such as a list) or a mapping (such as a dictionary) as its argument and subtracts the counts it contains from the counter, in place. Check the following example:

In [10]:
cnt = Counter({1:3,2:4})
deduct = {1:1, 2:2, 7:7}
cnt.subtract(deduct)
print(cnt)

Counter({1: 2, 2: 2, 7: -7})


Notice that the **cnt** object we first created has a count of 3 for key 1 and a count of 4 for key 2. The **deduct** dictionary has a count of 1 for key 1, a count of 2 for key 2, and a count of 7 for key 7. The subtract() method deducted 1 from 3 for key 1 and 2 from 4 for key 2.

Note that key 7 did not exist in **cnt**, so its count was treated as 0 and subtract() deducted 7 from it, leaving -7.
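For comparison, the `-` operator behaves differently from subtract(): it returns a new Counter and keeps only positive counts. The sketch below uses hypothetical `stock` and `sold` counters to show both side by side:

```python
from collections import Counter

stock = Counter({'apple': 3, 'pear': 1})
sold = Counter({'apple': 1, 'pear': 2, 'kiwi': 1})

# The - operator returns a new Counter and keeps only positive counts.
remaining = stock - sold
print(remaining)     # Counter({'apple': 2})

# subtract() works in place and keeps zero and negative counts.
stock.subtract(sold)
print(stock)         # Counter({'apple': 2, 'pear': -1, 'kiwi': -1})
```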

## Common patterns when using the Counter() object

    sum(counter.values())                 # total of all counts
    counter.clear()                       # reset all counts
    list(counter)                         # list unique elements
    set(counter)                          # convert to a set
    dict(counter)                         # convert to a regular dictionary
    list(counter.items())                 # convert to a list of (elem, cnt) pairs
    Counter(dict(list_of_pairs))          # convert from a list of (elem, cnt) pairs
    counter.most_common()[:-n-1:-1]       # n least common elements
    counter.most_common(n)                # n most common elements
    counter += Counter()                  # remove zero and negative counts

#### list(counter) :  a list of unique elements

In [2]:
from collections import Counter
lst = [2,2,2,2,3,4,1,2,6,7,3,8,1,8,8]
counter = Counter(lst)
print(list(counter))   # list unique elements
print(list(counter.keys()))

[2, 3, 4, 1, 6, 7, 8]
[2, 3, 4, 1, 6, 7, 8]


#### set(counter) : a set of unique elements

In [18]:
lst = [2,2,2,2,3,4,1,2,6,7,3,8,1,8,8]
counter = Counter(lst)
set(counter)

{1, 2, 3, 4, 6, 7, 8}

#### dict(counter) : convert to a regular dictionary

In [19]:
lst = [2,2,2,2,3,4,1,2,6,7,3,8,1,8,8]
counter = Counter(lst)
dict(counter)

{2: 5, 3: 2, 4: 1, 1: 2, 6: 1, 7: 1, 8: 3}

#### counter.items()

In [4]:
lst = [2,2,2,2,3,4,1,2,6,7,3,8,1,8,8]  # a list containing elements
counter = Counter(lst)       # container
print(counter)

print('****************** Using Counter.items() **********************************')
iterable = counter.items()   # Counter.items() returns an iterable
print(type(iterable))        # dict_items view object
print(hasattr(iterable, '__iter__')) # True
print(hasattr(iterable, '__next__')) # False
print(list(iterable))        # convert to a list of (element, cnt) pairs

print('****************** Do you see the difference between Counter.items() and iter(counter) ? ******')

iterator = iter(counter)    # counter is a dict, so it is an iterable
print(list(iterator))       # iterator yields keys in the counter (i.e. unique elements in lst)

Counter({2: 5, 8: 3, 3: 2, 1: 2, 4: 1, 6: 1, 7: 1})
****************** Using Counter.items() **********************************
<class 'dict_items'>
True
False
[(2, 5), (3, 2), (4, 1), (1, 2), (6, 1), (7, 1), (8, 3)]
****************** Do you see the difference between Counter.items() and iter(counter) ? ******
[2, 3, 4, 1, 6, 7, 8]


#### Counter(dict(list_of_pairs))

In [45]:
list_of_pairs = [(2, 5), (3, 2), (4, 1), (1, 2), (6, 1), (7, 1), (8, 3)]
print(dict(list_of_pairs))
counter = Counter(dict(list_of_pairs))  # convert from a list of (elem, cnt) pairs
print(counter)

{2: 5, 3: 2, 4: 1, 1: 2, 6: 1, 7: 1, 8: 3}
Counter({2: 5, 8: 3, 3: 2, 1: 2, 4: 1, 6: 1, 7: 1})


#### n least common elements

In [47]:
n = 3
counter.most_common()[:-n-1:-1]

[(7, 1), (6, 1), (4, 1)]

#### n most common elements

In [50]:
n = 2
counter.most_common(n)

[(2, 5), (8, 3)]

#### remove zero and negative counts

In [5]:
counter = Counter({1:3,2:4,3:0})
deduct = {1:1, 2:2, 7:7}
counter.subtract(deduct)
print(counter)
counter += Counter()  # remove zero and negative counts
print(counter)

Counter({1: 2, 2: 2, 3: 0, 7: -7})
Counter({1: 2, 2: 2})
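The `counter += Counter()` idiom works because Counter's arithmetic operators discard results that are zero or negative. A brief sketch of the related operators, with made-up counters `a`, `b` and `c`:

```python
from collections import Counter

a = Counter({'x': 3, 'y': 1})
b = Counter({'x': 1, 'y': 2})

added = a + b        # counts are added
common = a & b       # minimum of each count
largest = a | b      # maximum of each count
print(added)         # Counter({'x': 4, 'y': 3})
print(common)        # Counter({'x': 1, 'y': 1})
print(largest)       # Counter({'x': 3, 'y': 2})

c = Counter({'x': 2, 'y': 0, 'z': -3})
positive = +c        # unary plus returns a new Counter without counts <= 0
print(positive)      # Counter({'x': 2})
```

Unlike `counter += Counter()`, the unary plus leaves the original counter untouched.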


## collections.defaultdict

defaultdict is a dictionary-like object that provides all the methods of a dictionary but takes an extra first argument (called **default_factory**): a callable that is invoked with no arguments to produce the value for a missing key. Using **defaultdict** is faster than doing the same using the **dict.setdefault()** method:

In [6]:
# using dict.setdefault()
d = {1:1, 2:2, 7:7}
d.setdefault(9, 5)  # inserts key 9 with a value of 5
d.setdefault(2, 0)  # does nothing, since key 2 already exists in d
d.setdefault(3)     # key 3 does not exist, create an entry d[3] = None
d

{1: 1, 2: 2, 7: 7, 9: 5, 3: None}

In [7]:
d = {}  # a normal dict instance, which would raise a KeyError if a key does not exist in it

In [58]:
d['one']  # builtin dict d will raise a KeyError

KeyError: 'one'

**Looking up a missing key in a defaultdict with d[key] does not raise a KeyError (as long as a default factory is set). Instead, the key is inserted with the value returned by the default factory.**

In [60]:
from collections import defaultdict
d  = defaultdict(object)

In [10]:
d['one'] 

<object at 0x1792df202e0>

In [11]:
for item in d:
    print(item)

one


You can also use a factory that returns a specific default value:

In [12]:
d = defaultdict(lambda: 0)

In [13]:
d['one']

0
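A common use of a zero default is counting. The sketch below assumes `int` as the factory (calling int() returns 0, the same effect as `lambda: 0`) and also shows that get() and `in` never trigger the factory:

```python
from collections import defaultdict

letter_counts = defaultdict(int)   # int() returns 0
for letter in 'mississippi':
    letter_counts[letter] += 1     # a missing key starts at 0

print(dict(letter_counts))         # {'m': 1, 'i': 4, 's': 4, 'p': 2}

# get() and the in operator do not call the default factory,
# so they never insert a key.
print(letter_counts.get('z'))      # None
print('z' in letter_counts)        # False
```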

### A more complicated example with defaultdict [3]

In this example, we start with a list of (state, city) tuples.

We want to build a dictionary where the keys are the states and the values are lists of all cities for that state. 
To build this dictionary of lists, we use a **defaultdict** with **a default factory of list**. A new list is created for each new key.

In [73]:
from collections import defaultdict
master_list = [('TX','Austin'), ('TX','Houston'), ('NY','Albany'), ('NY', 'Syracuse'), ('NY', 'Buffalo'), 
               ('NY', 'Rochester'), ('TX', 'Dallas'), ('CA','Sacramento'), ('CA', 'Palo Alto'), ('GA', 'Atlanta')]

state_to_cities = defaultdict(list)
for state, city in master_list:
    state_to_cities[state].append(city)

print(state_to_cities, '\n')
for state in state_to_cities:
    city_list = state_to_cities[state]
    print(f'{state}: {", ".join(city_list)}')

defaultdict(<class 'list'>, {'TX': ['Austin', 'Houston', 'Dallas'], 'NY': ['Albany', 'Syracuse', 'Buffalo', 'Rochester'], 'CA': ['Sacramento', 'Palo Alto'], 'GA': ['Atlanta']}) 

TX: Austin, Houston, Dallas
NY: Albany, Syracuse, Buffalo, Rochester
CA: Sacramento, Palo Alto
GA: Atlanta


In conclusion, whenever you need a dictionary, and each element’s value should start with a default value, use a defaultdict.
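For reference, the same grouping can be written with a plain dict and setdefault(). This is only a sketch on a shortened, made-up list of pairs, meant to show what defaultdict(list) saves you from writing:

```python
from collections import defaultdict

pairs = [('TX', 'Austin'), ('NY', 'Albany'), ('TX', 'Dallas')]

# Plain dict: setdefault returns the existing list, or inserts and returns a new one.
with_setdefault = {}
for state, city in pairs:
    with_setdefault.setdefault(state, []).append(city)

# defaultdict: the empty list is created automatically for a new key.
with_defaultdict = defaultdict(list)
for state, city in pairs:
    with_defaultdict[state].append(city)

print(with_setdefault)                       # {'TX': ['Austin', 'Dallas'], 'NY': ['Albany']}
print(with_setdefault == with_defaultdict)   # True
```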

## collections.OrderedDict

An OrderedDict is a dictionary subclass that remembers the order in which its contents are added.

In CPython 3.6, a normal dictionary ALSO remembers the order in which its contents are added, and from Python 3.7 onwards this behavior is guaranteed by the language [4]:

In [87]:
print('Normal dictionary:')  # no matter how many times you run this cell, it will always print an ordered d

d = {}

d['a'] = 'A'
d['b'] = 'B'
d['c'] = 'C'
d['d'] = 'D'
d['e'] = 'E'

for k, v in d.items():
    print(k, v)

Normal dictionary:
a A
b B
c C
d D
e E


An Ordered Dictionary is **almost** made obsolete by the normal dictionary from Python 3.6 onwards [4].

In [15]:
from collections import OrderedDict

print('OrderedDict:')

d = OrderedDict()

d['a'] = 'A'
d['b'] = 'B'
d['c'] = 'C'
d['d'] = 'D'
d['e'] = 'E'

for k, v in d.items():
    print(k, v)

OrderedDict:
a A
b B
c C
d D
e E


## Equality with an Ordered Dictionary
The **almost** part: a regular dict looks only at its contents when testing for equality. An OrderedDict also considers the order in which the items were added when it is compared with another OrderedDict.

A normal Dictionary:

In [90]:
print('Dictionaries are equal?')

d1 = {}
d1['a'] = 'A'
d1['b'] = 'B'

d2 = {}
d2['b'] = 'B'
d2['a'] = 'A'

print('d1=', d1)
print('d2=', d2)
print(d1==d2)

Dictionaries are equal?
d1= {'a': 'A', 'b': 'B'}
d2= {'b': 'B', 'a': 'A'}
True


An Ordered Dictionary:

In [93]:
from collections import OrderedDict
print('Dictionaries are equal?')

d1 = OrderedDict()
d1['a'] = 'A'
d1['b'] = 'B'


d2 = OrderedDict()

d2['b'] = 'B'
d2['a'] = 'A'

print('d1=', d1)
print('d2=', d2)
print(d1==d2)

Dictionaries are equal?
d1= OrderedDict([('a', 'A'), ('b', 'B')])
d2= OrderedDict([('b', 'B'), ('a', 'A')])
False
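Two further details, sketched below with illustrative names: comparing an OrderedDict with a regular dict ignores order, and OrderedDict offers reordering methods (move_to_end() and popitem(last=False)) that a regular dict does not have.

```python
from collections import OrderedDict

od = OrderedDict([('a', 'A'), ('b', 'B')])
plain = {'b': 'B', 'a': 'A'}

# OrderedDict vs. regular dict: the comparison is order-insensitive.
same = (od == plain)
print(same)                       # True

od.move_to_end('a')               # move key 'a' to the last position
order_after_move = list(od)
print(order_after_move)           # ['b', 'a']

first = od.popitem(last=False)    # remove and return the first item
print(first)                      # ('b', 'B')
```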


## collections.deque() : A list-like class optimized for left/right insertions/deletions

A deque (double-ended queue) is a list-like container optimized for inserting and removing items at both the left and the right end.

In [108]:
from collections import deque

# You can create a deque with the deque() constructor. It accepts any iterable (here, a list).
items = ["a","b","c"]   # not named `list`, which would shadow the built-in list type
deq = deque(items)
print(deq)

deque(['a', 'b', 'c'])


In [109]:
deq.append("d")     # appends to the right of the deque obj
deq.appendleft(1)   # appends to the left of the deque obj
print(deq)

deque([1, 'a', 'b', 'c', 'd'])


In [110]:
deq.pop()           # removes from the right of the deque object  
deq.popleft()       # removes from the left of the deque object
print(deq)

deque(['a', 'b', 'c'])


In [111]:
# counting a specific element in the deque object
print(deq.count('b'))

1


In [107]:
# clearing all the elements away from the deque object
deq.clear()
print(deq)

deque([])
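A deque also supports a maximum length and rotation. The sketch below is only an illustration of these features (the names `recent` and `d` are made up): with maxlen set, appending to a full deque silently drops an item from the opposite end.

```python
from collections import deque

recent = deque(maxlen=3)      # keeps only the 3 most recent items
for n in range(1, 6):
    recent.append(n)          # once full, the oldest item falls off the left
print(recent)                 # deque([3, 4, 5], maxlen=3)

d = deque('abcde')            # built from a string, one item per character
d.rotate(2)                   # rotate two steps to the right
print(d)                      # deque(['d', 'e', 'a', 'b', 'c'])

d.extendleft('xy')            # each item is pushed on the left in turn
print(d)                      # deque(['y', 'x', 'd', 'e', 'a', 'b', 'c'])
```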


## collections.ChainMap()

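ChainMap is used to combine several dictionaries (aka mappings) into a single view that can be treated as one mapping. The underlying dictionaries are kept in a list, which is available through the maps attribute.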

In [14]:
from collections import ChainMap
# To create a ChainMap, use the ChainMap() constructor.
# Pass the dictionaries you want to combine as separate arguments.
dict1 = { 'a' : 1, 'b' : 2 }
dict2 = { 'c' : 3, 'b' : 4 }
chain_map = ChainMap(dict1, dict2)
print(chain_map.maps)

# You can access chain map values by key name
print(chain_map['a'])
print(chain_map['b'])

# An important point is ChainMap updates its values when its associated dictionaries are updated. 
# For example, if you change the value of 'c' in dict2 to '5', you will notice the change in ChainMap as well.
# This is the updatable aspect of a ChainMap
dict2['c'] = 5
print(chain_map.maps)

[{'a': 1, 'b': 2}, {'c': 3, 'b': 4}]
1
2
[{'a': 1, 'b': 2}, {'c': 5, 'b': 4}]


#### Getting Keys and Values from ChainMap

You can access the keys of a ChainMap with the keys() method. 
Similarly, you can access the values with the values() method, as shown below:

In [11]:
dict1 = { 'a' : 1, 'b' : 2 }
dict2 = { 'c' : 3, 'b' : 4 }
chain_map = ChainMap(dict1, dict2)
print(list(chain_map.keys()))
print(list(chain_map.values()))

['c', 'b', 'a']
[3, 2, 1]


As a rule, when a key appears in more than one of the associated dictionaries (e.g. 'b'), 
ChainMap takes the value for that key from the first dictionary that contains it (here, dict1).
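This lookup order makes ChainMap a natural fit for layering overrides on top of defaults. The following is a sketch with hypothetical `defaults` and `overrides` dictionaries. It also shows that writes through the ChainMap only touch the first mapping:

```python
from collections import ChainMap

defaults = {'color': 'red', 'user': 'guest'}
overrides = {'user': 'admin'}
settings = ChainMap(overrides, defaults)   # overrides is searched first

print(settings['user'])     # admin  (found in overrides)
print(settings['color'])    # red    (falls back to defaults)

# Writes and deletions only touch the first mapping.
settings['color'] = 'blue'
print(overrides)            # {'user': 'admin', 'color': 'blue'}
print(defaults)             # {'color': 'red', 'user': 'guest'}
```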

#### Adding a New Dictionary to ChainMap

If you want to add a new dictionary to an existing ChainMap, use the **new_child()** method. It returns a new ChainMap with the new dictionary placed in front of the existing ones.

In [13]:
dict3 = {'e' : 5, 'f' : 6}
new_chain_map = chain_map.new_child(dict3)
print(new_chain_map)

ChainMap({'e': 5, 'f': 6}, {'a': 1, 'b': 2}, {'c': 3, 'b': 4})


Notice that the new dictionary (i.e. dict3) is added to the beginning of the ChainMap's list of mappings.
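When new_child() is called without an argument, it puts a new empty dictionary in front, which is handy for modelling nested scopes. A small sketch (the names `base` and `child` are illustrative). The parents property gives a view that skips the first mapping:

```python
from collections import ChainMap

base = ChainMap({'x': 1})
child = base.new_child()     # no argument: an empty dict is placed in front
child['x'] = 2               # stored in the new front dict, shadowing base

print(child['x'])            # 2
print(base['x'])             # 1
print(child.parents['x'])    # 1
print(child.maps)            # [{'x': 2}, {'x': 1}]
```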

## collections.namedtuple
The standard tuple uses numerical indexes to access its members, for example:

In [15]:
sam = (2, 'Lab','Sammy')  # a dog tuple (age, breed, name)

In [16]:
sam[0]

2

One of the biggest problems with ordinary tuples is that you have to remember the index of each field of a tuple object (i.e. age has index 0, breed has index 1 and name has index 2).

For simple use cases, this is usually enough. On the other hand, remembering which index should be used for each value can lead to errors, especially if the tuple has a lot of fields. A namedtuple assigns names, as well as the numerical index, to each field. 

Each kind of namedtuple is represented by its own class, created by using the **namedtuple() factory function**. The arguments are the name of the new class and a string containing the names of the elements.

You can basically think of namedtuples as a very quick way of creating a new class type with some attribute fields. For example:

In [20]:
from collections import namedtuple

In [21]:
Dog = namedtuple('Dog','age breed name')

sam = Dog(age=2,breed='Lab',name='Sammy')

frank = Dog(age=2,breed='Shepard',name="Frankie")

We construct the namedtuple class by first passing the type name (Dog) and then a single string that contains the field names separated by spaces. We can then access the various attributes of an instance:

In [22]:
sam

Dog(age=2, breed='Lab', name='Sammy')

In [23]:
sam.age

2

In [24]:
sam.breed

'Lab'

In [25]:
sam[0]

2
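A namedtuple instance is still an immutable tuple, and it comes with a few helper methods. The sketch below reuses the Dog type from above, and the variable `older_sam` is only illustrative:

```python
from collections import namedtuple

Dog = namedtuple('Dog', 'age breed name')
sam = Dog(age=2, breed='Lab', name='Sammy')

print(Dog._fields)                # ('age', 'breed', 'name')
print(dict(sam._asdict()))        # {'age': 2, 'breed': 'Lab', 'name': 'Sammy'}

older_sam = sam._replace(age=3)   # returns a new tuple; sam is unchanged
print(older_sam)                  # Dog(age=3, breed='Lab', name='Sammy')

age, breed, name = sam            # unpacks like a regular tuple

try:
    sam.age = 5                   # fields cannot be reassigned
except AttributeError as exc:
    print('cannot modify:', exc)
```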

## Conclusion

Hopefully you now see how useful the collections module is in Python. It should be your go-to module for a variety of common tasks!

## REFERENCES

[1] https://stackabuse.com/introduction-to-pythons-collections-module/

[2] https://www.w3schools.com/python/ref_dictionary_setdefault.asp

[3] https://www.accelebrate.com/blog/using-defaultdict-python

[4] https://stackoverflow.com/questions/39980323/are-dictionaries-ordered-in-python-3-6

[5] https://blog.florimond.dev/a-practical-usage-of-chainmap-in-python (about the use of ChainMap)