The collections module is a built-in module that implements specialized container data types, which serve as alternatives to Python's general-purpose built-in containers.

We've already gone over the basics: list, tuple, set, and dict.

## Counter

Counter is a dict subclass for counting hashable objects.

Inside it, elements are stored as dictionary keys and their counts are stored as the values.

In [1]:
# counter

from collections import Counter

In [2]:
l = [1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,5,5,5]

In [3]:
Counter(l) # counts how many times each value shows up

Counter({1: 4, 2: 4, 3: 4, 4: 4, 5: 3})

In [4]:
s = 'aaaaaaaaaasssssssssssssggggggggggggggggghhhhhhheeeeee'

In [5]:
Counter(s) # counts how many times each letter shows up

Counter({'a': 10, 's': 13, 'g': 17, 'h': 7, 'e': 6})
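Because a Counter stores elements as keys and counts as values, it can be read and updated with ordinary dictionary syntax. The short sketch below illustrates this with a made-up string ("banana" is just sample data, not part of the notes above); it also shows that looking up an element that was never counted gives 0 rather than a KeyError.

```python
from collections import Counter

counts = Counter("banana")

# Elements are keys and counts are values, so normal dict lookups work.
print(counts["a"])   # 3
print(counts["n"])   # 2

# A missing element reports a count of 0 instead of raising KeyError,
# and the lookup does not add the element to the Counter.
print(counts["z"])   # 0
print("z" in counts) # False

# Counts can be changed like any other dictionary value.
counts["b"] += 4
print(counts["b"])   # 5

# update() adds counts from another iterable rather than replacing them.
counts.update("an")
print(counts["a"], counts["n"])   # 4 3
```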

In [9]:
# how many times does each word show up in a sentence?

s = 'How many times does does does each word word show up in this this this sentence?'

In [10]:
words = s.split()

Counter(words)

Counter({'How': 1,
         'many': 1,
         'times': 1,
         'does': 3,
         'each': 1,
         'word': 2,
         'show': 1,
         'up': 1,
         'in': 1,
         'this': 3,
         'sentence?': 1})

In [11]:
c = Counter(words) # we can run various methods off of a Counter object

In [13]:
c.most_common(3) # 3 most common words

[('does', 3), ('this', 3), ('word', 2)]

Common patterns for working with a Counter object c:

-- sum(c.values()) - total of all counts

-- c.clear() - remove all elements and their counts

-- list(c) - list unique elements

-- set(c) - convert to a set

-- dict(c) - convert to a regular dictionary

-- c.items() - view of (element, count) pairs; use list(c.items()) to get a list

-- Counter(dict(listofpairs)) - convert from a list of (element, count) pairs

-- c.most_common()[:-n-1:-1] - find the n least common elements

-- c += Counter() - remove zero and negative counts

In [14]:
sum(c.values())

16

In [16]:
list(c)

['How',
 'many',
 'times',
 'does',
 'each',
 'word',
 'show',
 'up',
 'in',
 'this',
 'sentence?']

In [17]:
set(c)

{'How',
 'does',
 'each',
 'in',
 'many',
 'sentence?',
 'show',
 'this',
 'times',
 'up',
 'word'}

In [18]:
dict(c)

{'How': 1,
 'many': 1,
 'times': 1,
 'does': 3,
 'each': 1,
 'word': 2,
 'show': 1,
 'up': 1,
 'in': 1,
 'this': 3,
 'sentence?': 1}

In [19]:
c.items()

dict_items([('How', 1), ('many', 1), ('times', 1), ('does', 3), ('each', 1), ('word', 2), ('show', 1), ('up', 1), ('in', 1), ('this', 3), ('sentence?', 1)])

In [20]:
c.most_common()[:-3-1:-1] 

[('sentence?', 1), ('in', 1), ('up', 1)]
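The patterns from the list that are not run above (building a Counter from pairs, dropping non-positive counts, and clearing) can be sketched as follows. The `pairs` data and the names `stock` and `remaining` are invented for illustration only.

```python
from collections import Counter

# Build a Counter from a list of (element, count) pairs.
pairs = [("apple", 3), ("pear", 0), ("plum", -2), ("fig", 1)]
stock = Counter(dict(pairs))
print(stock["plum"])   # -2

# In-place addition with an empty Counter drops zero and negative counts.
stock += Counter()
remaining = dict(stock)
print(remaining)       # {'apple': 3, 'fig': 1}

# clear() empties the Counter entirely.
stock.clear()
print(len(stock))      # 0
```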

## defaultdict()

defaultdict is a dictionary-like object (a dict subclass) that provides all the methods of a regular dictionary but takes a default_factory as its first argument: a callable that supplies the default value for missing keys.

Using defaultdict is simpler and usually faster than achieving the same result with the dict.setdefault method.

When a default_factory is set, looking up a missing key with d[key] does not raise a KeyError. Instead, the key is added with the value returned by the default factory.

In [21]:
from collections import defaultdict

In [23]:
d = {'k1':1}

In [24]:
d['k1']

1

In [25]:
d['k2'] # error because the key is not in the dictionary

KeyError: 'k2'

In [26]:
d2 = defaultdict(object)

In [27]:
d2['one'] # 'one' does not exist in the defaultdict yet, so it stores and returns a new object()

<object at 0x20b058ebc40>

In [28]:
for item in d2:
    print(item) # now it is in the defaultdict

one


In [29]:
d3 = defaultdict(lambda: 0)

In [30]:
d3['one'] # automatic assignment

0

In [31]:
d3['two'] = 2 # manual assignment

In [33]:
print(d3)

defaultdict(<function <lambda> at 0x0000020B09C83598>, {'one': 0, 'two': 2})
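A typical use of a default factory is grouping or counting without first checking whether a key exists. The sketch below compares dict.setdefault with defaultdict(list) and defaultdict(int); the word list and variable names are made up for the example. It also shows that only d[key] calls the factory, while `in` and get() leave the dictionary unchanged.

```python
from collections import defaultdict

words = ["apple", "avocado", "banana", "blueberry", "cherry"]

# Grouping with a plain dict needs setdefault() on every insertion.
plain = {}
for word in words:
    plain.setdefault(word[0], []).append(word)

# With defaultdict(list), a missing key starts out as an empty list.
grouped = defaultdict(list)
for word in words:
    grouped[word[0]].append(word)

print(dict(grouped))
# {'a': ['apple', 'avocado'], 'b': ['banana', 'blueberry'], 'c': ['cherry']}

# Only d[key] triggers the default factory; `in` and get() do not add keys.
print("z" in grouped)     # False
print(grouped.get("z"))   # None
print(len(grouped))       # 3

# int() returns 0, so defaultdict(int) is a common way to count.
tally = defaultdict(int)
for word in words:
    tally[word[0]] += 1
print(dict(tally))        # {'a': 2, 'b': 2, 'c': 1}
```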


## OrderedDict

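An OrderedDict is a dictionary subclass that remembers the order in which its contents are added. Since Python 3.7, regular dictionaries also preserve insertion order, so the main remaining differences are that two OrderedDicts compare as equal only when their items are in the same order, and that OrderedDict provides extra order-related methods.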


In [34]:
# normal dictionary

d = {}

d['a'] = 1
d['b'] = 2
d['c'] = 3
d['d'] = 4
d['e'] = 5

In [35]:
d

{'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5}

In [38]:
for k,v in d.items():
    print(k,v)

a 1
b 2
c 3
d 4
e 5


In [39]:
from collections import OrderedDict

In [40]:
# same thing as before but with OrderedDict

d = OrderedDict()

d['a'] = 1
d['b'] = 2
d['c'] = 3
d['d'] = 4
d['e'] = 5

In [41]:
for k,v in d.items():
    print(k,v)

a 1
b 2
c 3
d 4
e 5


In [46]:
d1 = {}

d1['a'] = 1
d1['b'] = 2

d2 = {}

d2['b'] = 2
d2['a'] = 1

In [47]:
d1

{'a': 1, 'b': 2}

In [48]:
d2

{'b': 2, 'a': 1}

In [49]:
d1 == d2 # returns True because the contents are the same even though they were added in a different order

True

In [50]:
d1 = OrderedDict()

d1['a'] = 1
d1['b'] = 2

d2 = OrderedDict()

d2['b'] = 2
d2['a'] = 1

In [51]:
d1 == d2 # now it is false because they are not in the same order

False
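Besides order-sensitive equality, OrderedDict offers methods for rearranging entries that a regular dict does not have. The following is a minimal sketch of move_to_end() and popitem(last=False), using the same kind of sample keys as above; the variable names are illustrative.

```python
from collections import OrderedDict

od = OrderedDict()
od["a"] = 1
od["b"] = 2
od["c"] = 3

# move_to_end() moves an existing key to the end (or to the front with last=False).
od.move_to_end("a")
print(list(od))   # ['b', 'c', 'a']

od.move_to_end("c", last=False)
print(list(od))   # ['c', 'b', 'a']

# popitem(last=False) removes and returns the first item instead of the last one.
first = od.popitem(last=False)
print(first)      # ('c', 3)
print(list(od))   # ['b', 'a']

# Comparing an OrderedDict with a regular dict ignores order.
print(od == {"a": 1, "b": 2})   # True
```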

## namedtuple

A standard tuple uses numerical indexing to access its members.

This is usually fine, but remembering which index holds each value can lead to errors.

A namedtuple assigns a name, as well as a numerical index, to each member.

Each kind of namedtuple is represented by its own class, created by calling namedtuple().

The arguments are the name of the new class and a string containing the field names, separated by spaces.

namedtuples are an easy way to create simple new object / class types.

In [52]:
# standard tuple 

t = (1,2,3)

In [53]:
t[0] # values are pulled out from numerical indexing

1

In [54]:
t[1]

2

In [55]:
from collections import namedtuple

In [56]:
# each type of namedtuple is like creating a new class very quickly

# first argument is the name of the class, second argument is a string of field names separated by spaces

Dog = namedtuple('Dog', 'age breed name')

In [57]:
# now we can enter info for a dog

harley = Dog(age = 15, breed = 'Golden Doodle', name = 'Harley')

In [58]:
harley

Dog(age=15, breed='Golden Doodle', name='Harley')

In [59]:
harley.age

15

In [60]:
harley.breed

'Golden Doodle'

In [61]:
harley.name

'Harley'

In [62]:
# another example

Cat = namedtuple('Cat', 'fur claws name')

In [63]:
c = Cat(fur = 'striped', claws = False, name = 'Kitty')

In [64]:
c.name

'Kitty'

In [65]:
c.claws

False

In [66]:
c.fur

'striped'

In [67]:
c

Cat(fur='striped', claws=False, name='Kitty')
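Since a namedtuple is still a tuple, each member can be reached by index as well as by name, and instances are immutable. The sketch below reuses the Dog definition with an invented instance (`rex` is sample data) to show positional access, unpacking, and the helper methods _replace(), _fields, and _asdict().

```python
from collections import namedtuple

Dog = namedtuple("Dog", "age breed name")
rex = Dog(age=3, breed="Beagle", name="Rex")

# Fields are reachable by name and by position.
print(rex.name)   # Rex
print(rex[2])     # Rex

# Like any tuple, it can be unpacked.
age, breed, name = rex

# namedtuples are immutable; _replace() returns a modified copy.
older = rex._replace(age=4)
print(older)      # Dog(age=4, breed='Beagle', name='Rex')
print(rex.age)    # 3

# _fields lists the field names, and _asdict() gives a name-to-value mapping.
print(Dog._fields)          # ('age', 'breed', 'name')
print(dict(rex._asdict()))  # {'age': 3, 'breed': 'Beagle', 'name': 'Rex'}
```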