# Python Collections Module

The Python collections module provides specialized container data types that extend the capabilities of Python's built-in containers, such as lists, tuples, sets, and dictionaries. This notebook walks through three of them: Counter, defaultdict, and namedtuple.

In [1]:
# example

from collections import Counter

In [4]:
# say there's a list with some unique values and some repeats
mylist = [1, 1, 1, 1, 2, 2, 2, 3, 3, 3]

Say we want to count how many times each unique element appears in the list. Typically, we could use a dictionary with a for loop that keeps track of what has been seen. However, Counter can do it automatically with a single call.
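
To make the comparison concrete, here is a minimal sketch of the manual dictionary approach next to Counter. The names `values`, `manual_counts`, and `auto_counts` are illustrative and not part of the original notebook.

```python
from collections import Counter

values = [1, 1, 1, 1, 2, 2, 2, 3, 3, 3]

# Manual approach: track what has been seen in a plain dictionary.
manual_counts = {}
for item in values:
    manual_counts[item] = manual_counts.get(item, 0) + 1

# Counter does the same bookkeeping in one call.
auto_counts = Counter(values)

print(manual_counts)                       # {1: 4, 2: 3, 3: 3}
print(dict(auto_counts) == manual_counts)  # True
```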



In [6]:
# we can just call Counter and it will count how many times each element appears
Counter(mylist)

Counter({1: 4, 2: 3, 3: 3})

In [8]:
# this also works on a list that mixes strings and numbers

other_list = ['a', 'a', 'a', 10, 10, 20, 20, 20, 5]

In [10]:
Counter(other_list)

Counter({'a': 3, 20: 3, 10: 2, 5: 1})

Notice that the output looks similar to a dictionary. Counter is a dictionary subclass for counting hashable objects. Internally, each element is stored as a key and its count is stored as the value: Key = Object, Value = Count.
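
Because Counter is a dictionary subclass, the usual dictionary operations apply. The short sketch below illustrates this, along with one difference worth knowing: asking a Counter for a key it has never seen returns 0 instead of raising a KeyError. The variable name `counts` is illustrative.

```python
from collections import Counter

counts = Counter(['a', 'a', 'a', 10, 10, 20, 20, 20, 5])

print(isinstance(counts, dict))  # True
print(counts['a'])               # 3
print(counts['missing'])         # 0, no KeyError
print('missing' in counts)       # False, the lookup did not add the key
```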

In [13]:
# it also works on a string directly, counting the individual characters

Counter('aabbbcddddddddeeeeffggggggggghhhhhhhhhhhhiiiiiiiijjjk')

Counter({'h': 12,
         'g': 9,
         'd': 8,
         'i': 8,
         'e': 4,
         'b': 3,
         'j': 3,
         'a': 2,
         'f': 2,
         'c': 1,
         'k': 1})

In [15]:
# you can also do it with a sentence

sentence = 'How many times does each word show up in this sentence with a word'

In [17]:
sentence.split()  # split the sentence into a list of words

['How',
 'many',
 'times',
 'does',
 'each',
 'word',
 'show',
 'up',
 'in',
 'this',
 'sentence',
 'with',
 'a',
 'word']

In [18]:
Counter(sentence.split())

Counter({'word': 2,
         'How': 1,
         'many': 1,
         'times': 1,
         'does': 1,
         'each': 1,
         'show': 1,
         'up': 1,
         'in': 1,
         'this': 1,
         'sentence': 1,
         'with': 1,
         'a': 1})

In [19]:
# you can also lowercase the sentence before splitting, so 'How' and 'how' count as the same word

Counter(sentence.lower().split())

Counter({'word': 2,
         'how': 1,
         'many': 1,
         'times': 1,
         'does': 1,
         'each': 1,
         'show': 1,
         'up': 1,
         'in': 1,
         'this': 1,
         'sentence': 1,
         'with': 1,
         'a': 1})

In [20]:
# common patterns when working with a Counter object

letters = 'aaaaaabbbcccccccccccccddddddd'

In [21]:
c = Counter(letters)

In [22]:
c

Counter({'c': 13, 'd': 7, 'a': 6, 'b': 3})

In [25]:
# if you type c. and press Tab, you can see which other methods are available

c.most_common() # returns (element, count) pairs from most common to least common

[('c', 13), ('d', 7), ('a', 6), ('b', 3)]

In [26]:
# you can also pass a number to limit how many results are returned.
# say you only want to see the 2 most common letters and their counts.

c.most_common(2)

[('c', 13), ('d', 7)]

In [27]:
# now say I want the 3 most common
c.most_common(3)

[('c', 13), ('d', 7), ('a', 6)]

In [30]:
# say we want a list of the unique keys (the counted objects) in the Counter

list(c)

['a', 'b', 'c', 'd']
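
A few more patterns that tend to come up alongside `most_common()` are sketched below: getting the total of all counts, finding the least common entry, and adding new observations with `update()`. The variable names `total` and `least_common` are illustrative.

```python
from collections import Counter

c = Counter('aaaaaabbbcccccccccccccddddddd')

# Total number of items counted.
total = sum(c.values())

# most_common() is sorted from highest to lowest, so the last entry is the rarest.
least_common = c.most_common()[-1]

# update() adds to the existing counts rather than replacing them.
c.update('ab')

print(total)           # 29
print(least_common)    # ('b', 3)
print(c['a'], c['b'])  # 7 4
```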

# Default Dictionary

This is a comparison between a normal dictionary vs a default dictionary.

In [46]:
from collections import defaultdict

In [47]:
# normal

d = {'a': 10}

In [48]:
d

{'a': 10}

In [49]:
d['a']

10

In [50]:
# everything is normal so far. now say we ask a normal dict for a key that does not exist

d['wrong']

# we will get an error

KeyError: 'wrong'

In certain situations, especially inside a for loop where you want to quickly add keys that are not already present in your dictionary, you can use a default dictionary. It assigns a default value wherever a KeyError would otherwise have occurred. In other words, if you ask for a key that does not exist, the key is created and given a default value.

In [60]:
# let's create one; instead of raising a KeyError, it will assign a default value to the missing key

d = defaultdict(lambda: 0)

In [61]:
d['correct'] = 100

In [62]:
d['correct']

100

In [63]:
d['WRONG KEY!']
# in a normal dictionary this would raise a KeyError; with defaultdict the script keeps
# running and the missing key is assigned the default value, here 0.

0

As you can see, we assigned 100 to the key 'correct', looked it up, and got 100 back. We never assigned a value to the key 'WRONG KEY!' (named that way on purpose, as an example of a key that may not exist), yet looking it up returned 0. This is how defaultdict works. We wrote defaultdict(lambda: 0), so 0 is the default value for any key that is looked up but does not exist, which avoids the error and lets the script keep running. Note that the lookup also stores the missing key with its default value, which is why 'WRONG KEY!' appears when we display the dictionary next.

In [64]:
d

defaultdict(<function __main__.<lambda>()>, {'correct': 100, 'WRONG KEY!': 0})
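
The for loop use case mentioned earlier can be sketched as follows. This example assumes `int` and `list` as the default factories: calling `int()` returns 0, so `defaultdict(int)` behaves like `defaultdict(lambda: 0)`, and `list()` returns an empty list, which is convenient for grouping. The names `words`, `word_counts`, and `by_length` are illustrative.

```python
from collections import defaultdict

words = 'how many times does each word show up with a word'.split()

# Count words without checking whether the key already exists.
word_counts = defaultdict(int)
for word in words:
    word_counts[word] += 1

# Group words by their length; missing keys start as an empty list.
by_length = defaultdict(list)
for word in words:
    by_length[len(word)].append(word)

print(word_counts['word'])  # 2
print(by_length[2])         # ['up']

# Use `in` or .get() to check for a key without creating it.
print('absent' in word_counts)    # False
print(word_counts.get('absent'))  # None
```

Reading a missing key with square brackets inserts it, so prefer `in` or `.get()` when you only want to check whether a key exists.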

# Last Specialised Container Object From Collections Module: Named Tuple

A named tuple expands on a normal tuple by giving each position a name in addition to its numeric index.

In [65]:
# example

mytuple = (10, 20, 30)

In [68]:
mytuple[0]

# if we want 10

10

In some situations you may have a large tuple, or you may not remember which value sits at which index. A named tuple keeps the numeric index for each value but also gives it a name. So instead of accessing a value by 0, we can access it by a descriptive field name.

In [69]:
from collections import namedtuple

In [70]:
Dog = namedtuple('Dog', ['age', 'breed', 'name'])

In [71]:
Dog

__main__.Dog

In [72]:
buddy = Dog(age=4, breed='Golden Retriever', name='Buddy')

In [73]:
type(buddy)

__main__.Dog

In [74]:
buddy

Dog(age=4, breed='Golden Retriever', name='Buddy')

In [76]:
buddy.age

4

In [77]:
buddy.breed

'Golden Retriever'

In [78]:
buddy.name

'Buddy'

In [79]:
buddy[0]

4

In [83]:
buddy[2]

'Buddy'
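
Since a named tuple is still a tuple, it can be unpacked and it is immutable. The sketch below shows unpacking plus three helpers that namedtuple provides: `_fields`, `_asdict()`, and `_replace()`. The variable name `older` is illustrative.

```python
from collections import namedtuple

Dog = namedtuple('Dog', ['age', 'breed', 'name'])
buddy = Dog(age=4, breed='Golden Retriever', name='Buddy')

# Unpack it like a normal tuple.
age, breed, name = buddy

# Named tuples are immutable; _replace returns a new object with the change.
older = buddy._replace(age=5)

print(Dog._fields)           # ('age', 'breed', 'name')
print(older.age, buddy.age)  # 5 4
print(buddy._asdict())       # a mapping of field names to values
```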