# Dictionaries

## *"If a word in the dictionary were misspelled, how would we know?"*
*(–Noah Webster)*

## But first: One more data structure, Tuples

- are similar to lists (**ordered**, indexed), but they are **immutable**, i.e., once they are initialized, they cannot be changed.
- have no `append()`, `remove()`, or other methods that modify them.
- are enclosed by round brackets `()`, and can otherwise be accessed like lists, e.g. with slicing.

In [4]:
x = (1,2,3)
# ordered and indexed
print(x[:2])
# immutable
# x.append(4)
x[1] = 5

(1, 2)


TypeError: 'tuple' object does not support item assignment

## Numbered lists

Many functions return tuples, for example `enumerate()`:

In [5]:
names = ["Lana Kane", "Pam Poovey", "Sterling Archer", "Algernop Krieger", "Cheryl/Karol/Crystal Tunt"]

print(list(enumerate(names)))

[(0, 'Lana Kane'), (1, 'Pam Poovey'), (2, 'Sterling Archer'), (3, 'Algernop Krieger'), (4, 'Cheryl/Karol/Crystal Tunt')]
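
Because each item produced by `enumerate()` is a tuple, it can be unpacked into two variables directly in a `for` loop. A minimal sketch of that pattern; the `crew` list and the `start=1` argument are illustrative choices and not part of the example above:

```python
# Hypothetical list, used only for this illustration
crew = ["Lana", "Pam", "Sterling"]

numbered = []
# Each item is an (index, value) tuple, unpacked into two variables;
# start=1 makes the numbering begin at 1 instead of 0
for position, name in enumerate(crew, start=1):
    numbered.append("{}. {}".format(position, name))

print(numbered)
```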


## The `zip()` function

We can *zip* together two or more lists to create a list of tuples. The result is as long as the shortest input list; surplus items in the longer lists are ignored:

In [8]:
skills = ['guns', 'puns', 'drinks']
ages = [32, 34, 35, 45, 28, 99]
zippy = list(zip(names, ages, skills))
print(zippy)
print("Length: names={}, ages={}, zippy={}".format(len(names), len(ages), len(zippy)))

[('Lana Kane', 32, 'guns'), ('Pam Poovey', 34, 'puns'), ('Sterling Archer', 35, 'drinks')]
Length: names=5, ages=6, zippy=3


## Unzipping

To separate a list of tuples into several sequences, pass the list to `zip()` with a leading `*`, as in `zip(*status)`. Each resulting sequence is a tuple:

In [17]:
status = [("head", "hurts"), 
          ('feet', 'sore'), 
          ('arms', 'spaghetti')]
elements = list(zip(*status))
parts = elements[0]
labor = elements[1]
print(parts)
print(list(labor))

('head', 'feet', 'arms')
['hurts', 'sore', 'spaghetti']
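
The star in front of the argument unpacks the list, so every tuple in it becomes a separate argument to `zip()`. A small sketch to make that visible; the `pairs` list is a made-up example:

```python
# Hypothetical pairs, used only for this illustration
pairs = [("a", 1), ("b", 2), ("c", 3)]

# The star unpacks the list, so both calls receive the same three arguments
starred = list(zip(*pairs))
spelled_out = list(zip(pairs[0], pairs[1], pairs[2]))

print(starred)
print(starred == spelled_out)
```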


## Activity

* create a list `month_numbers` ranging from 1 to 12 (including both)
* create a list `month_names` with the names of the months
* create a list of tuples, `months`, that matches numbers to names, in reversed order
* enumerate `months`

In [36]:
# your code here
month_numbers = list(range(1,13))
# month_names = list('JFMAMJJASOND')
# month_names = 'Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec'.split()
month_names = ['Jan',
               'Feb',
               'Mar',
               'Apr',
               'May',
               'Jun',
               'Jul',
               'Aug',
               'Sep',
               'Oct',
               'Nov',
               'Dec']
month_names.reverse()
months = list(zip(month_numbers, month_names))
print(list(enumerate(months)))
print(month_names.reverse())  # reverse() works in place and returns None

[(0, (1, 'Dec')), (1, (2, 'Nov')), (2, (3, 'Oct')), (3, (4, 'Sep')), (4, (5, 'Aug')), (5, (6, 'Jul')), (6, (7, 'Jun')), (7, (8, 'May')), (8, (9, 'Apr')), (9, (10, 'Mar')), (10, (11, 'Feb')), (11, (12, 'Jan'))]
None


# Dictionaries

Dictionaries associate keys with values. In other languages they are also known as associative arrays, hash tables, or hash maps.

They are named after ordinary paper dictionaries, because they work analogously. A key (the word you want to look up) is associated with a value (the definition of a word).

Dictionaries are related to sets and lists, which we saw in the last lecture.

## From tuples to dictionaries

At a high level, dictionaries associate a unique **key** with a specific **value**. One way to look at dictionaries is therefore as a list of `(key, value)` tuples.

In fact, this is one way to **initialize** a dictionary, using the `dict` type:

In [37]:
long_list_of_stuff_kajs_done = [('Eaten a squirrel', 1975), 
                                ('Eaten a squirrel', 1980), 
                                ('Gone fishing', 1985), 
                                ('Married rich', 1987), 
                                ('Became a skilled mason', 1990), 
                                ('Learned Python 1.0', 1994), 
                                ('Found God', 1994)]

dict_of_kajs_achievments = dict(long_list_of_stuff_kajs_done)
print(dict_of_kajs_achievments)

{'Eaten a squirrel': 1980, 'Gone fishing': 1985, 'Married rich': 1987, 'Became a skilled mason': 1990, 'Learned Python 1.0': 1994, 'Found God': 1994}


A dictionary in Python uses a colon `:` to map a **key** to a **value**.

You might notice that the dictionary is enclosed by curly brackets, the same as sets.

Why do you think that is? Hint: look at Kaj's history of eating squirrels.

### Initialization

Apart from the method above, we can initialize dictionaries in two ways: as an empty dictionary with `dict()`, or by writing out the `key: value` pairs in curly brackets:


In [44]:
empty_dict = dict()
print(empty_dict)

{}


In [48]:
squares = {1: 1,
           2: 4,
           3: 9,
           4: 16}
print(squares)
prices = {"eggs": 2.5, "milk": 1.2}
print(prices)

grades = {('John Smith', 235234): 31,
          ('John Smith', 345984): 15,
         }
print(grades)

{1: 1, 2: 4, 3: 9, 4: 16}
{'eggs': 2.5, 'milk': 1.2}
{('John Smith', 235234): 31, ('John Smith', 345984): 15}


NOTE: keys can be almost anything, but **not lists** (or other mutable types such as sets and dictionaries), because they can be changed after their creation.
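
The restriction on keys can be tried out directly. A minimal sketch, assuming a made-up `locations` dictionary: a tuple is accepted as a key, while a list raises a `TypeError`.

```python
# Hypothetical coordinates, used only for this illustration
locations = {}

# A tuple is immutable and therefore a valid key
locations[(55.68, 12.57)] = "Copenhagen"

# A list is mutable, so Python refuses it as a key
try:
    locations[[45.46, 9.19]] = "Milan"
    list_key_error = None
except TypeError as error:
    list_key_error = str(error)

print(locations)
print(list_key_error)
```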

## Activity

* Turn the two lists of people and their pets into a dictionary called `pet_lookup`.
* How many entries does it have? Why?

In [51]:
people = ['Babette', 'Karen', 'Janne', 'Linda', 'Janne', 'Linda']
pets = ['dog', 'cat', 'dog', 'dog', 'ozelot', 'anaconda']
# your code here
pet_lookup = dict(zip(people, pets))
pet_lookup

{'Babette': 'dog', 'Karen': 'cat', 'Janne': 'ozelot', 'Linda': 'anaconda'}

## Dictionary operations

### Retrieving a value
Accessing a value works similarly to indexing a list, but instead of the index (`int`), we use the key (i.e., almost anything).

In [53]:
print(prices)
print(prices["eggs"])

{'eggs': 2.5, 'milk': 1.2}
2.5


This fails with a `KeyError` if the key does not exist:

In [54]:
print(prices["honey"])

KeyError: 'honey'

A safer way to retrieve values is the `get()` method, which returns `None` for missing keys:

In [56]:
print(prices)
print(prices.get("eggs"))
print(prices.get("honey"))

{'eggs': 2.5, 'milk': 1.2}
2.5
None


We can even define a **default value** for missing items:

In [60]:
print(prices)
print(prices.get("eggs", 0.0))
print(prices.get("honey", "we don't have that"))

{'eggs': 2.5, 'milk': 1.2}
2.5
we don't have that
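
One common use of a default value is counting: `get(key, 0)` gives a starting value for keys that have not been seen yet. A sketch of that idea, using a made-up `basket` list:

```python
# Hypothetical shopping basket, used only for this illustration
basket = ["eggs", "milk", "eggs", "honey", "eggs"]

tally = {}
for item in basket:
    # get() returns 0 the first time an item is seen, so the addition never fails
    tally[item] = tally.get(item, 0) + 1

print(tally)
```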


## Setting values

In [62]:
print(prices)
prices["butter"] = 3.1 # add a new entry
print(prices)
prices["eggs"] = 2.0 # change existing entry
print(prices)

{'eggs': 2.5, 'milk': 1.2}
{'eggs': 2.5, 'milk': 1.2, 'butter': 3.1}
{'eggs': 2.0, 'milk': 1.2, 'butter': 3.1}


## Merging dictionaries
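We can combine dictionaries with `update()`. If a key exists in both dictionaries, the value from the dictionary passed to `update()` replaces the old one (see `'rot'` in the example):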

In [63]:
translations_de_it = {'blau': 'azzurro', 
                      'gelb': 'giallo', 
                      'rot': 'rosso', 
                      'braun': 'marrone'
                     }
translations_de_it_food = {'Pizza': 'pizza', 
                           'Nudeln': 'pasta',
                           'Kaffee': 'caffè',
                           'Espresso': 'caffè',
                           'rot': 'rossa'
                          }

translations_de_it.update(translations_de_it_food)
print(translations_de_it)

{'blau': 'azzurro', 'gelb': 'giallo', 'rot': 'rossa', 'braun': 'marrone', 'Pizza': 'pizza', 'Nudeln': 'pasta', 'Kaffee': 'caffè', 'Espresso': 'caffè'}
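
Note that `update()` changes the dictionary it is called on and returns `None`. If the original should stay as it is, one option is to update a copy instead. A minimal sketch with made-up dictionaries `base` and `extra`:

```python
# Hypothetical dictionaries, used only for this illustration
base = {"rot": "rosso", "blau": "azzurro"}
extra = {"rot": "rossa", "Pizza": "pizza"}

# copy() makes a new dictionary, so `base` stays untouched
merged = base.copy()
result = merged.update(extra)  # update() works in place and returns None

print(result)
print(base)
print(merged)
```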


## Activity

* change `pet_lookup` to give `Karen` a pony
* create a list `owners` with 3 new names, and `moar_pets` with 3 pets and make them into a dictionary `pet_lookup2`
* add the entries from `pet_lookup2` to `pet_lookup`

In [69]:
# your code here
print(pet_lookup)
pet_lookup['Karen'] = 'pony'
print(pet_lookup)
owners = 'Daniel Rune Anna'.split()
moar_pets = 'chinchilla rat unicorn'.split()
pet_lookup2 = dict(zip(owners, moar_pets))
print(pet_lookup2)
pet_lookup.update(pet_lookup2)
print(pet_lookup)

{'Babette': 'dog', 'Karen': 'cat', 'Janne': 'ozelot', 'Linda': 'anaconda'}
{'Babette': 'dog', 'Karen': 'pony', 'Janne': 'ozelot', 'Linda': 'anaconda'}
{'Daniel': 'chinchilla', 'Rune': 'rat', 'Anna': 'unicorn'}
{'Babette': 'dog', 'Karen': 'pony', 'Janne': 'ozelot', 'Linda': 'anaconda', 'Daniel': 'chinchilla', 'Rune': 'rat', 'Anna': 'unicorn'}


## Membership
To check whether a key is in the dictionary, use `in`:

In [70]:
print("cereal" in prices)

False
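
The `in` operator only looks at the keys. To search the values instead, apply it to `values()`. A short sketch, using a made-up `stock` dictionary:

```python
# Hypothetical price list, used only for this illustration
stock = {"eggs": 2.5, "milk": 1.2}

key_found = "eggs" in stock          # looks at the keys
value_as_key = 2.5 in stock          # 2.5 is a value, not a key
value_found = 2.5 in stock.values()  # search the values explicitly

print(key_found, value_as_key, value_found)
```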


## Removing values

In [72]:
print(prices)
del prices['butter']
print(prices)

{'eggs': 2.0, 'milk': 1.2, 'butter': 3.1}
{'eggs': 2.0, 'milk': 1.2}

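
`del` raises a `KeyError` when the key is not (or no longer) in the dictionary. The `pop()` method is an alternative: it returns the removed value and accepts a default for missing keys. A sketch with a made-up `stock` dictionary:

```python
# Hypothetical price list, used only for this illustration
stock = {"eggs": 2.0, "milk": 1.2, "butter": 3.1}

removed = stock.pop("butter")        # removes the key and returns its value
missing = stock.pop("butter", None)  # key is already gone: returns the default

print(removed, missing, stock)
```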

## Cleaning up
If we want to remove all items from a dictionary, we can use `clear()`:

In [76]:
print(len(translations_de_it))
translations_de_it.clear()
print('after clearing:', len(translations_de_it))

8
after clearing: 0


## Retrieving keys and values

We can retrieve the keys, the values, or the `(key, value)` pairs separately:

In [77]:
print(prices.keys())

dict_keys(['eggs', 'milk'])


In [78]:
print(prices.values())

dict_values([2.0, 1.2])


In [81]:
print(prices.items())

dict_items([('eggs', 2.0), ('milk', 1.2)])
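
Since `items()` yields `(key, value)` tuples, it is convenient for looping over a dictionary with two loop variables. A minimal sketch, using a made-up `stock` dictionary:

```python
# Hypothetical price list, used only for this illustration
stock = {"eggs": 2.0, "milk": 1.2}

lines = []
# Each item is a (key, value) tuple, unpacked into two variables
for item, price in stock.items():
    lines.append("{}: {:.2f}".format(item, price))

print(lines)
```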


## Operations on dictionaries

Similar to lists, we can call a variety of functions on dictionaries (`sorted()` and `enumerate()` work on the keys), like
* `sorted()`
* `len()`
* `enumerate()`

In [83]:
print(sorted(prices))
print(len(prices))
print(list(enumerate(prices)))

['eggs', 'milk']
2
[(0, 'eggs'), (1, 'milk')]
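
Because `sorted()` works on the keys, sorting by value needs a `key` function. One way, sketched here with a made-up `stock` dictionary, is to pass the dictionary's own `get` method:

```python
# Hypothetical price list, used only for this illustration
stock = {"milk": 1.2, "butter": 3.1, "eggs": 2.0}

by_name = sorted(stock)                  # keys in alphabetical order
by_price = sorted(stock, key=stock.get)  # keys ordered by their value

print(by_name)
print(by_price)
```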


## Activity

* anonymize the `employee` dictionary by enumerating the values and creating a new dictionary `anonymous_employees`

In [None]:
# your code here


# Special Dictionaries

Python has two special dictionary types that serve very specific purposes. They live in a separate module, `collections`, so we need to `import` them first:

In [86]:
from collections import defaultdict, Counter

## `defaultdict`

`defaultdict` solves some of the problems we have addressed with `get()`: if a key is not in the dictionary, it does two things:
1. it automatically adds the key with the default value to the dictionary
2. it returns the default value

We need to specify the default type when we create the `defaultdict`:
* `int` returns `0`
* `float` returns `0.0`
* `list` returns `[]`
* `set` returns `set()`
* `bool` returns `False`
* `str` returns `''`

In [89]:
word_counts = defaultdict(int)
print(word_counts)
word_counts['platypus'] += 1
word_counts['platypus'] = word_counts['platypus'] + 1
print(word_counts['dingo'])
print(word_counts)

defaultdict(<class 'int'>, {})
0
defaultdict(<class 'int'>, {'platypus': 2, 'dingo': 0})


In [92]:
achievements = defaultdict(set)
achievements['Lana'].add('driving course')
achievements['Lana'].add('snorkeling course')
print(achievements)
print(achievements['Cyrill'])
print(achievements)

defaultdict(<class 'set'>, {'Lana': {'driving course', 'snorkeling course'}})
set()
defaultdict(<class 'set'>, {'Lana': {'driving course', 'snorkeling course'}, 'Cyrill': set()})
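
A typical use of `defaultdict(list)` is grouping items under a shared key without first checking whether the key exists. A sketch of that pattern; the `animals` list and the grouping by first letter are made up for illustration:

```python
from collections import defaultdict

# Hypothetical word list, used only for this illustration
animals = ["platypus", "dingo", "possum", "dugong", "koala"]

by_letter = defaultdict(list)
for animal in animals:
    # A missing key starts out as an empty list, so append() always works
    by_letter[animal[0]].append(animal)

print(dict(by_letter))
```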


## `Counter`

`Counter` is a specialized dictionary for integer counts, which is very handy. Its input is usually a list:

In [94]:
ages = [86, 21, 28, 71, 83, 79, 41, 69, 58, 30, 79, 43, 77, 70, 79, 30, 79, 68, 56, 46, 73, 66, 54, 47, 75, 57, 65, 27, 19, 84, 56, 39, 78, 73, 49, 44, 86, 61, 74, 49, 62, 52, 61, 59, 74, 73, 58, 55, 56, 80, 57, 62, 19, 42, 49, 45, 22, 37, 42, 32, 28, 67, 65, 78, 53, 42, 49, 63, 55, 29, 57, 75, 27, 42, 84, 71, 83, 66, 20, 54, 71, 32, 24, 22, 64, 60, 45, 18, 37, 19, 31, 65, 65, 39, 74, 64, 66, 27, 42, 83]

histogram1 = Counter()
histogram1.update(ages)
# alternative syntax
histogram2 = Counter(ages)

print(histogram1)
print(histogram2)

Counter({42: 5, 79: 4, 65: 4, 49: 4, 71: 3, 83: 3, 56: 3, 73: 3, 66: 3, 57: 3, 27: 3, 19: 3, 74: 3, 86: 2, 28: 2, 58: 2, 30: 2, 54: 2, 75: 2, 84: 2, 39: 2, 78: 2, 61: 2, 62: 2, 55: 2, 45: 2, 22: 2, 37: 2, 32: 2, 64: 2, 21: 1, 41: 1, 69: 1, 43: 1, 77: 1, 70: 1, 68: 1, 46: 1, 47: 1, 44: 1, 52: 1, 59: 1, 80: 1, 67: 1, 53: 1, 63: 1, 29: 1, 20: 1, 24: 1, 60: 1, 18: 1, 31: 1})
Counter({42: 5, 79: 4, 65: 4, 49: 4, 71: 3, 83: 3, 56: 3, 73: 3, 66: 3, 57: 3, 27: 3, 19: 3, 74: 3, 86: 2, 28: 2, 58: 2, 30: 2, 54: 2, 75: 2, 84: 2, 39: 2, 78: 2, 61: 2, 62: 2, 55: 2, 45: 2, 22: 2, 37: 2, 32: 2, 64: 2, 21: 1, 41: 1, 69: 1, 43: 1, 77: 1, 70: 1, 68: 1, 46: 1, 47: 1, 44: 1, 52: 1, 59: 1, 80: 1, 67: 1, 53: 1, 63: 1, 29: 1, 20: 1, 24: 1, 60: 1, 18: 1, 31: 1})


Apart from the regular dictionary functions, `Counter` has two useful methods.

* `most_common(n)` returns the `n` most frequent keys and their counts; without an argument, it returns all of them, from most to least frequent

In [96]:
print(histogram1.most_common())

[(42, 5), (79, 4), (65, 4), (49, 4), (71, 3), (83, 3), (56, 3), (73, 3), (66, 3), (57, 3), (27, 3), (19, 3), (74, 3), (86, 2), (28, 2), (58, 2), (30, 2), (54, 2), (75, 2), (84, 2), (39, 2), (78, 2), (61, 2), (62, 2), (55, 2), (45, 2), (22, 2), (37, 2), (32, 2), (64, 2), (21, 1), (41, 1), (69, 1), (43, 1), (77, 1), (70, 1), (68, 1), (46, 1), (47, 1), (44, 1), (52, 1), (59, 1), (80, 1), (67, 1), (53, 1), (63, 1), (29, 1), (20, 1), (24, 1), (60, 1), (18, 1), (31, 1)]


* Another function, `subtract()`, allows us to reduce the counts of one or more keys:

In [98]:
histogram1.subtract([42, 49, 65, 65, 65])
print(histogram1.most_common(3))

[(79, 4), (42, 4), (71, 3)]
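
Two details of `Counter` are easy to miss: looking up a missing key returns `0` without adding the key, and `subtract()` can push counts below zero. A small sketch with a made-up `votes` counter:

```python
from collections import Counter

# Hypothetical votes, used only for this illustration
votes = Counter(["yes", "no", "yes", "yes"])

absent = votes["maybe"]          # missing keys count as 0 ...
still_absent = "maybe" in votes  # ... and are not added to the Counter

votes.subtract(["no", "no"])     # 'no' drops from 1 to -1
print(votes["no"])
print(votes.most_common(1))
```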


You can convert `dict` or `Counter` objects into `defaultdict`s by using `update()`:

In [None]:
default_counts = defaultdict(int)
default_counts.update(histogram1)
print(default_counts)

## Activity

* Get the 20 most frequent counts from `histogram1` and store them in a new `dict` called `top20`

In [None]:
# your code here


# Multi-level dictionaries

If the values of a dictionary are dictionaries themselves, we have a multi-level dictionary. This can be helpful for complex lookups:

In [99]:
employees = {'Algernop Krieger': {'age': 45, 'skill': "'science'"},
             'Cheryl/Karol/Crystal Tunt': {'age': 28, 'skill': 'supervision'},
             'Lana Kane': {'age': 32, 'skill': 'shooting'},
             'Pam Poovey': {'age': 34, 'skill': 'mixed martial arts'},
             'Sterling Archer': {'age': 35, 'skill': 'drinking'}
            }
first_level = employees['Pam Poovey']
type(first_level['age'])

int
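
The two lookups can also be chained in a single expression, and inner values can be changed the same way. A minimal sketch on a made-up `staff` dictionary; the chained `get()` calls are one possible way to avoid a `KeyError` for unknown names:

```python
# Hypothetical nested dictionary, used only for this illustration
staff = {"Lana": {"age": 32, "skill": "shooting"}}

lana_skill = staff["Lana"]["skill"]  # outer key first, then inner key
staff["Lana"]["age"] = 33            # change a value in the inner dictionary

# An unknown name falls back to an empty dict, whose get() returns None
unknown_age = staff.get("Cyril", {}).get("age")

print(lana_skill, staff["Lana"]["age"], unknown_age)
```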

## Activity

* add entries for `Ray Gilette (43, planes)`, `Cyril Figgis (44, numbers)` and `Mallory Archer (58, childcare)`

In [None]:
# your code here


## Activity

* Separate the `employees` dictionary into two lists, `employee_name`, which contains only the names, and `employee_properties`, which contains only the dictionaries with `age` and `skill` as keys.

In [None]:
# your code here