# Worksheet 6A: `collections` Module

The `collections` module is part of Python’s standard library. It implements specialised container data types that provide alternatives to Python’s general-purpose built-in containers. We've already gone over the basics: `dict`, `list`, `set`, & `tuple`.

Now we'll learn about the alternatives that the `collections` module provides.

---
## Q1: `Counter`

`Counter` is a `dict` subclass for counting hashable objects. Elements are stored as dictionary keys and their counts are stored as the corresponding values.

To start using this class, we first need to import it:

In [13]:
from collections import Counter

Then we can retrieve counts as follows:

In [14]:
lst = [1, 2, 2, 2, 2, 3, 3, 3, 1, 2, 1, 12, 3, 2, 32, 1, 21, 1, 223, 1]

c = Counter(lst)
c

Counter({1: 6, 2: 6, 3: 4, 12: 1, 32: 1, 21: 1, 223: 1})
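Because `Counter` is a `dict` subclass, the usual dictionary operations work on it. The short sketch below uses a small made-up list (not the one above) to show this, along with one difference from a plain `dict`: looking up a missing key gives a count of `0` instead of raising a `KeyError`.

```python
from collections import Counter

# A small illustrative list, chosen only for this example.
c = Counter([1, 2, 2, 3, 3, 3])

# A Counter supports ordinary dict operations.
print(isinstance(c, dict))   # True
print(c[3])                  # 3
print(sorted(c.keys()))      # [1, 2, 3]

# Unlike a plain dict, a missing key reports a count of 0
# and is not added to the Counter.
print(c[99])                 # 0
print(99 in c)               # False

# Counts can be increased from another iterable.
c.update([1, 1])
print(c[1])                  # 3
```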

### Q1 a

Given the following:

In [15]:
text = "dkjhskjfhkjfhsk"

Use `Counter` to count the characters in `text`.

In [16]:
# answer:
t = Counter(text)
t

Counter({'d': 1, 'k': 4, 'j': 3, 'h': 3, 's': 2, 'f': 2})

### Q1 b

Given the following:

In [17]:
text = "This is a very long sentence because is has several words"

Use `Counter` to count the **words** in `text`.

In [18]:
# answer:
words = text.split(" ")
Counter(words)


Counter({'This': 1,
         'is': 2,
         'a': 1,
         'very': 1,
         'long': 1,
         'sentence': 1,
         'because': 1,
         'has': 1,
         'several': 1,
         'words': 1})
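Note that the count above is case-sensitive, and `split(" ")` splits on every single space. As a rough sketch of one way to handle this, the example below lowercases a made-up sentence and uses `split()` with no argument, which splits on any run of whitespace. The sentence is hypothetical and chosen to contain a double space.

```python
from collections import Counter

# Hypothetical sentence with mixed case and a double space.
sample = "The cat saw the  other cat"

# Splitting on a single space leaves an empty string where the double space was.
print(sample.split(" "))
# ['The', 'cat', 'saw', 'the', '', 'other', 'cat']

# Lowercasing first and splitting on any whitespace avoids both issues.
normalised = sample.lower().split()
counts = Counter(normalised)
print(counts["the"])   # 2
print(counts["cat"])   # 2
print("" in counts)    # False
```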

### Q1 c

Using the `most_common` method of `c`, get the most common value & its count.

In [19]:
# answer:
for value, count in c.most_common(1):
    print('The most common number is %d, was found %d time/(s) ' % (value, count))


The most common number is 1, was found 6 time/(s) 
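In this list, `1` and `2` both occur 6 times, so the result depends on how ties are ordered. The sketch below rebuilds `c` and asks for the top two entries. It assumes Python 3.7 or later, where elements with equal counts are returned in the order they were first encountered.

```python
from collections import Counter

lst = [1, 2, 2, 2, 2, 3, 3, 3, 1, 2, 1, 12, 3, 2, 32, 1, 21, 1, 223, 1]
c = Counter(lst)

# Ask for the two most common entries as (value, count) pairs.
top_two = c.most_common(2)
print(top_two)   # [(1, 6), (2, 6)]

# Collect every value that shares the highest count.
highest = top_two[0][1]
tied = [value for value, count in c.items() if count == highest]
print(tied)      # [1, 2]
```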


---
## Q2: `defaultdict`

`defaultdict` is a subclass of `dict` that supplies a default value for missing keys. It behaves almost exactly like `dict`, except that indexing with a key that doesn't exist (`d[key]`) does not raise a `KeyError`. Instead, it calls the default factory given when the `defaultdict` was created, inserts the result under that key, and returns it. Other lookups, such as `in` and `.get()`, do not insert anything.

To use it we have to import it first:

In [10]:
from collections import defaultdict

Querying a key which isn't in the dictionary results in an error with a standard `dict`:

In [11]:
d1 = {}
d1["one"]

KeyError: 'one'

But a default value is returned if we try this with a `defaultdict`:

In [12]:
d2 = defaultdict(int)
d2["one"]

0

Note that the lookup also modified `d2`: it now contains the new key-value pair, while `d1` is still empty:

In [20]:
print(d1)
print(d2)

{}
defaultdict(<class 'int'>, {'one': 0})


Notice that we pass `int` to `defaultdict`. `int` is a built-in callable, and calling it with no arguments returns `0`, which is where the default value comes from. Any other callable that takes no arguments can be used instead to supply a default value of your choice:

In [9]:
d3 = defaultdict(lambda: 123)
d3["one"]

123
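The default factory is only called when a missing key is accessed with square brackets. The sketch below illustrates this, and also shows that a `defaultdict` created without a factory behaves like a plain `dict` for missing keys. The variable names are chosen only for this example.

```python
from collections import defaultdict

d = defaultdict(int)

# Neither a membership test nor .get() calls the default factory.
print("one" in d)      # False
print(d.get("one"))    # None
print(len(d))          # 0

# Indexing with [] does call it, and stores the result.
print(d["one"])        # 0
print(len(d))          # 1

# Without a factory, a missing key raises KeyError as usual.
plain = defaultdict()
try:
    plain["missing"]
except KeyError as exc:
    print("KeyError:", exc)   # KeyError: 'missing'
```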

### Q2 a

Suppose we have the following list:

In [6]:
words = ["code", "worksheet", "computer", "class", "glass", "vase", "base", "wise"]

Using a standard `dict`, we can count how many words start with each letter as follows:

In [7]:
d = {}
for word in words:
    character = word[0]
    if character not in d:
        d[character] = 0
    d[character] += 1
d

{'c': 3, 'w': 2, 'g': 1, 'v': 1, 'b': 1}

Notice that we need an `if` statement to add a dictionary item whenever we encounter a new character; without it, we would get a `KeyError`. This sort of mechanical code can be removed by using a `defaultdict`. Provide the code.

In [11]:
# answer:
d = defaultdict(int)
for word in words:
    character = word[0]
    d[character] += 1
d


defaultdict(int, {'c': 3, 'w': 2, 'g': 1, 'v': 1, 'b': 1})
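For pure counting, `Counter` from Q1 gives the same tally. The sketch below is an alternative and not a replacement for the answer above: it passes a generator of first letters to `Counter` and compares the result with the `defaultdict` version.

```python
from collections import Counter, defaultdict

words = ["code", "worksheet", "computer", "class", "glass", "vase", "base", "wise"]

# The defaultdict approach from the answer above.
by_defaultdict = defaultdict(int)
for word in words:
    by_defaultdict[word[0]] += 1

# The same tally with Counter, fed the first letter of each word.
by_counter = Counter(word[0] for word in words)

# Compare as plain dicts so the container type does not matter.
print(dict(by_counter) == dict(by_defaultdict))   # True
print(by_counter["c"])                            # 3
```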

### Q2 b

Now, provide the code to give the words starting with each letter instead of the counts (still using `defaultdict`).

In [44]:
# answer:
d = defaultdict(list)
for word in words:
    character = word[0]
    d[character].append(word)
d

defaultdict(list,
            {'c': ['code', 'computer', 'class'],
             'w': ['worksheet', 'wise'],
             'g': ['glass'],
             'v': ['vase'],
             'b': ['base']})

---
## Q3: `namedtuple`

The standard tuple uses numerical indexes to access its members, for example:

In [6]:
t1 = (2, "Lab", "Sammy")

In [7]:
t1[0]

2

For simple use cases, this is usually enough. On the other hand, remembering which index should be used for each value can lead to errors, especially if the tuple has a lot of fields and is constructed far from where it is used. A `namedtuple` assigns names, as well as the numerical index, to each member. 

Each kind of `namedtuple` is represented by its own class, created by using the `namedtuple` factory function. The arguments are the name of the new class and the field names, given either as a sequence of strings or as a single string with the names separated by spaces or commas.

You can think of a `namedtuple` as a very quick way of creating a new lightweight class with named, read-only fields.
For example:

In [2]:
from collections import namedtuple

In [3]:
Dog = namedtuple("Dog", ["age", "breed", "name"])

sam = Dog(age=2, breed="Lab", name="Sammy")
sam

Dog(age=2, breed='Lab', name='Sammy')
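The field names can be given as a list or as a single string. The sketch below builds the class both ways (the name `DogFromString` is made up for this example) and shows that instances are still ordinary tuples.

```python
from collections import namedtuple

# Field names as a list of strings, as in the cell above.
Dog = namedtuple("Dog", ["age", "breed", "name"])

# Field names as one space-separated string (hypothetical variable name).
DogFromString = namedtuple("Dog", "age breed name")

print(Dog._fields)             # ('age', 'breed', 'name')
print(DogFromString._fields)   # ('age', 'breed', 'name')

# Instances can also be created with positional arguments.
sam = Dog(2, "Lab", "Sammy")
print(sam)                     # Dog(age=2, breed='Lab', name='Sammy')

# A namedtuple instance is still a tuple and compares equal to one.
print(isinstance(sam, tuple))        # True
print(sam == (2, "Lab", "Sammy"))    # True
```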

### Q3 a

This allows us to use the field names we provided to access the individual members of the tuple. Provide the code to access the `breed` value of `sam` using the field name.

In [4]:
# answer:
sam.breed

'Lab'

### Q3 b

We can still access the values using indexes. Provide the code to access the `breed` value of `sam` using the index.

In [5]:
# answer:
sam[1]

'Lab'
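Since a `namedtuple` is still a tuple, it can be unpacked, and like any tuple it cannot be changed in place. The sketch below shows both, plus the `_replace` and `_asdict` helper methods. The name `older_sam` is made up for this example.

```python
from collections import namedtuple

Dog = namedtuple("Dog", ["age", "breed", "name"])
sam = Dog(age=2, breed="Lab", name="Sammy")

# Unpacking works exactly as with a regular tuple.
age, breed, name = sam
print(breed)   # Lab

# Fields are read-only: assigning to one raises AttributeError.
try:
    sam.age = 3
except AttributeError:
    print("namedtuple fields are read-only")

# _replace returns a new instance and leaves the original unchanged.
older_sam = sam._replace(age=3)
print(older_sam)   # Dog(age=3, breed='Lab', name='Sammy')
print(sam.age)     # 2

# _asdict maps field names to values; dict() gives a plain dict on any version.
print(dict(sam._asdict()))   # {'age': 2, 'breed': 'Lab', 'name': 'Sammy'}
```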