# 7. Dictionaries

Here is the table of contents for this notebook:

- 7.1 Introduction
- 7.2 Dictionary as a set of counters
- 7.3 Looping and dictionaries
- 7.4 Debugging
- 7.5 Exercises

## 7.1 Introduction

A _dictionary_ is like a list, but more general. In a list, the index positions have to be integers; in a dictionary, the indices can be (almost) any type.

You can think of a dictionary as a mapping between a set of indices (which are called _keys_) and a set of values. Each key maps to a value. The association of a key and a value is called a _key-value pair_ or sometimes an _item_.

As an example, we’ll build a dictionary that maps from English to Spanish words, so the keys and the values are all strings.

The function `dict` creates a new dictionary with no items. Because `dict` is the name of a built-in function, you should avoid using it as a variable name.

In [1]:
eng2sp = dict()
print(eng2sp)

{}


The curly brackets, `{}`, represent an empty dictionary. To add items to the dictionary, you can use square brackets:

In [2]:
eng2sp['one'] = 'uno'

This line creates an item that maps from the key `'one'` to the value “uno”. If we print the dictionary again, we see a key-value pair with a colon between the key and value:

In [3]:
print(eng2sp)

{'one': 'uno'}


This output format is also an input format. For example, you can create a new dictionary with three items.

In [4]:
eng2sp = {'one': 'uno', 'two': 'dos', 'three': 'tres'}
eng2sp

{'one': 'uno', 'two': 'dos', 'three': 'tres'}

❗Note❗

Since Python 3.7, dictionaries preserve insertion order: key-value pairs are stored and displayed in the order in which they were added.
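As a small sketch of what insertion order means in practice (assuming Python 3.7 or later; the dictionary `sp2eng` is made up for this illustration), note that keys are neither sorted nor moved when their value changes:

```python
# Hypothetical dictionary, filled in a deliberately non-alphabetical order.
sp2eng = dict()
sp2eng['tres'] = 'three'
sp2eng['uno'] = 'one'
sp2eng['dos'] = 'two'

# Keys come back in the order they were inserted, not sorted.
print(list(sp2eng))   # ['tres', 'uno', 'dos']

# Reassigning an existing key changes its value but keeps its position.
sp2eng['tres'] = 'THREE'
print(sp2eng)         # {'tres': 'THREE', 'uno': 'one', 'dos': 'two'}
```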

Elements of a dictionary are never indexed with integer indices. Instead, you use the keys to look up the corresponding values:

In [5]:
print(eng2sp['two'])

dos


The key `'two'` always maps to the value “dos”.

If the key isn’t in the dictionary, you get an exception:

In [6]:
print(eng2sp['four'])

KeyError: 'four'

The `len` function works on dictionaries; it returns the number of key-value pairs:

In [7]:
len(eng2sp)

3

The `in` operator works on dictionaries; it tells you whether something appears as a _key_ in the dictionary (appearing as a value is not good enough).

In [8]:
'one' in eng2sp

True

In [9]:
'uno' in eng2sp

False

To see whether something appears as a value in a dictionary, you can use the method `values`, which returns the values as a type that can be converted to a list, and then use the `in` operator:

In [12]:
vals = list(eng2sp.values())
'uno' in vals

True

The `in` operator uses different algorithms for lists and dictionaries. For lists, it uses a linear search algorithm. As the list gets longer, the search time gets longer in direct proportion to the length of the list. For dictionaries, Python uses a data structure called a _hash table_ that has a remarkable property: the `in` operator takes about the same amount of time no matter how many items there are in a dictionary. I won’t explain why hash functions are so magical, but you can read more about it at: https://en.wikipedia.org/wiki/Hash_table
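If you want to see the difference yourself, the following rough sketch times a membership test on a list and on a dictionary with the same keys. The size `n` and the variable names are assumptions chosen for illustration, and the exact timings depend on your machine, so no numbers are shown here.

```python
import timeit

n = 100_000  # hypothetical size, chosen only for illustration
numbers_list = list(range(n))
numbers_dict = dict.fromkeys(numbers_list, True)  # same keys, dummy values

target = n - 1  # worst case for the list: the last element

# Both containers give the same answer ...
print(target in numbers_list, target in numbers_dict)

# ... but the list is searched element by element, the dictionary is not.
list_time = timeit.timeit(lambda: target in numbers_list, number=100)
dict_time = timeit.timeit(lambda: target in numbers_dict, number=100)
print(f'list: {list_time:.6f} s, dict: {dict_time:.6f} s')
```

Try increasing `n` by a factor of ten: the list time should grow roughly in proportion, while the dictionary time should stay about the same.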

**Exercise 7.1**

Create a dictionary called `roman` where keys are the Roman numerals 'I', 'II', 'III', 'IV', 'V' and values are the corresponding integers.

In [15]:
# YOUR CODE HERE
roman = {'I': 1, 'II': 2, 'III': 3, 'IV': 4, 'V': 5}
roman['III']

3

**Exercise 7.2**

Write a function called `country_dictionary_maker` that takes a list of strings in the following format:

`['country1 number1', 'country2 number2']`

and returns a dictionary in the following format:

`{'country1': 'number1', 'country2': 'number2'}`

Example:

country_dictionary_maker(['Germany 20', 'Netherlands 80', 'Belgium 50']) -> {'Germany': '20', 'Netherlands': '80', 'Belgium': '50'}

In [3]:
def country_dictionary_maker(entries):
    new_dict = dict()
    for entry in entries:
        # Each entry looks like 'country number'.
        country, number = entry.split()
        new_dict[country] = number
    return new_dict

In [4]:
country_dictionary_maker(['Germany 20', 'Netherlands 80', 'Belgium 50'])

{'Germany': '20', 'Netherlands': '80', 'Belgium': '50'}

## 7.2 Dictionary as a set of counters


Suppose you are given a string and you want to count how many times each letter appears. There are several ways you could do it:

1. You could create 26 variables, one for each letter of the alphabet. Then you could traverse the string and, for each character, increment the corresponding counter, probably using a chained conditional.

2. You could create a list with 26 zeros, loop through the string and write 26 conditional statements:

```
letter_counts = [0] * 26
for letter in string:
    if letter == 'a':
        letter_counts[0] += 1
    elif letter == 'b':
        letter_counts[1] += 1
    ...
```

which is tedious and error-prone.

3. You could create a dictionary with characters as keys and counters as the corresponding values. The first time you see a character, you would add an item to the dictionary. After that you would increment the value of an existing item.

Each of these options performs the same computation, but each of them implements that computation in a different way.

An _implementation_ is a way of performing a computation; some implementations are better than others. For example, an advantage of the dictionary implementation is that we don’t have to know ahead of time which letters appear in the string and we only have to make room for the letters that do appear.

Here is what the code might look like:

In [29]:
word = 'brontosaurus'
d = dict()
for c in word:
    if c not in d:
        d[c] = 1
    else:
        d[c] = d[c] + 1

We are effectively computing a _histogram_, which is a statistical term for a set of counters (or frequencies).

The `for` loop traverses the string. Each time through the loop, if the character `c` is not in the dictionary, we create a new item with key `c` and the initial value 1 (since we have seen this letter once). If `c` is already in the dictionary we increment `d[c]`.

Here’s the output of the program:

In [30]:
print(d)

{'b': 1, 'r': 2, 'o': 2, 'n': 1, 't': 1, 's': 2, 'a': 1, 'u': 2}


The histogram indicates that the letters “a” and “b” appear once; “o” appears twice, and so on.

Dictionaries have a method called `get` that takes a key and a default value. If the key appears in the dictionary, `get` returns the corresponding value; otherwise it returns the default value. For example:

In [32]:
counts = { 'chuck' : 1 , 'annie' : 42, 'jan': 100}

In [33]:
print(counts.get('jan', 0))

100


In [34]:
print(counts.get('tim', 0))

0


We can use `get` to write our histogram loop more concisely. Because the `get` method automatically handles the case where a key is not in a dictionary, we can reduce four lines down to one and eliminate the `if` statement.

In [37]:
word = 'brontosaurus'
d = dict()
for c in word:
    d[c] = d.get(c, 0) + 1
print(d)

{'b': 1, 'r': 2, 'o': 2, 'n': 1, 't': 1, 's': 2, 'a': 1, 'u': 2}


Using `get` to simplify this counting loop is a very common “idiom” in Python, and you can use the same pattern whenever you build a dictionary for counting things. Take a moment to compare the loop using the `if` statement and `in` operator with the loop using the `get` method. They do exactly the same thing, but one is more succinct.
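One way to convince yourself that the two loops are equivalent is to wrap each in a function and compare the results. This is only a sketch; the function names `histogram_if` and `histogram_get` are made up for this comparison.

```python
def histogram_if(text):
    """Count characters using an explicit membership test."""
    d = dict()
    for c in text:
        if c not in d:
            d[c] = 1
        else:
            d[c] = d[c] + 1
    return d


def histogram_get(text):
    """Count characters using the get idiom."""
    d = dict()
    for c in text:
        d[c] = d.get(c, 0) + 1
    return d


# Both versions build the same dictionary.
print(histogram_if('brontosaurus') == histogram_get('brontosaurus'))  # True
```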

**Exercise 7.3**

Write a function that counts how many times each number appears in a given list.

Example input: [1, 2, 3, 3, 0, -5, -5, -5]

Example output: {1: 1, 2: 1, 3: 2, 0: 1, -5: 3}

In [42]:
# YOUR CODE HERE
def counter_function(arr):
    new_dict = dict()
    for element in arr:
        new_dict[element] = new_dict.get(element, 0) + 1
    # Return the counts so the caller can use them, instead of only printing.
    return new_dict

In [43]:
counter_function([1, 2, 3, 3, 0, -5, -5, -5])

{1: 1, 2: 1, 3: 2, 0: 1, -5: 3}


## 7.3 Looping and dictionaries

If you use a dictionary as the sequence in a `for` statement, it traverses the keys of the dictionary. This loop prints each key and the corresponding value:

In [44]:
counts = { 'chuck' : 1 , 'annie' : 42, 'jan': 100}
for key in counts:
    print(key, counts[key])

chuck 1
annie 42
jan 100


We can use this pattern to implement the various loop idioms that we have described earlier. For example, if we want to find all the entries in a dictionary with a value above ten, we can write the following code:

In [45]:
counts = { 'chuck' : 1 , 'annie' : 42, 'jan': 100}
for key in counts:
    if counts[key] > 10 :
        print(key, counts[key])

annie 42
jan 100


We see only the entries with a value above 10.

The `for` loop iterates through the keys of the dictionary, so we must use the index operator to retrieve the corresponding value for each key.
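As an alternative sketch, the dictionary method `items` yields each key together with its value, so the lookup inside the loop is not needed. The filter below is the same "above ten" test as before:

```python
counts = {'chuck': 1, 'annie': 42, 'jan': 100}

# items() yields (key, value) pairs, so no separate lookup is needed.
for key, value in counts.items():
    if value > 10:
        print(key, value)
```

This prints `annie 42` and `jan 100`, just like the loop that uses the index operator.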

If you want to print the keys in alphabetical order, first make a list of the keys using the `keys` method available on dictionary objects. Then sort that list and loop through it, looking up each key and printing the key-value pairs in sorted order:

In [46]:
counts = { 'chuck' : 1 , 'annie' : 42, 'jan': 100}
lst = list(counts.keys())
print(lst)
lst.sort()
for key in lst:
    print(key, counts[key])

['chuck', 'annie', 'jan']
annie 42
chuck 1
jan 100


First we see the list of keys in non-alphabetical order that we get from the `keys` method. Then we see the key-value pairs in alphabetical order from the `for` loop.
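A shorter variant, shown here only as a sketch, relies on the built-in function `sorted`, which accepts any iterable and returns a new sorted list. Because iterating over a dictionary yields its keys, the explicit list and the call to `sort` can be skipped:

```python
counts = {'chuck': 1, 'annie': 42, 'jan': 100}

# sorted() returns a new list of the keys in alphabetical order.
for key in sorted(counts):
    print(key, counts[key])
```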

## 7.4 Debugging

As you work with bigger datasets it can become unwieldy to debug by printing and checking data by hand. Here are some suggestions for debugging large datasets:

**Scale down the input**

If possible, reduce the size of the dataset. For example, if the program reads a text file, start with just the first 10 lines, or with the smallest example you can find. You can either edit the files themselves, or (better) modify the program so it reads only the first `n` lines.

If there is an error, you can reduce `n` to the smallest value that manifests the error, and then increase it gradually as you find and correct errors.
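One possible way to read only the first `n` lines is sketched below. The helper name `read_first_lines` is hypothetical, and `io.StringIO` stands in for a real file so that the example runs on its own.

```python
import io
from itertools import islice


def read_first_lines(file_obj, n):
    """Return at most the first n lines of an open text file, without newlines."""
    return [line.rstrip('\n') for line in islice(file_obj, n)]


# io.StringIO stands in for a real file here; with a file on disk you
# would pass the object returned by open(...) instead.
fake_file = io.StringIO('line 1\nline 2\nline 3\nline 4\n')
print(read_first_lines(fake_file, 2))   # ['line 1', 'line 2']
```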

**Check summaries and types**

Instead of printing and checking the entire dataset, consider printing summaries of the data: for example, the number of items in a dictionary or the total of a list of numbers.

A common cause of runtime errors is a value that is not the right type. For debugging this kind of error, it is often enough to print the type of a value.
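The sketch below shows both ideas on a small made-up dictionary (`country_numbers` is an assumption for this example). The values were read from text, so they are strings rather than numbers, which printing the type makes obvious.

```python
# Hypothetical data: the numbers were read from text, so they are strings.
country_numbers = {'Germany': '20', 'Netherlands': '80', 'Belgium': '50'}

# Summaries instead of the full dataset.
print(len(country_numbers))        # 3
print(list(country_numbers)[:2])   # ['Germany', 'Netherlands']

# Printing the type reveals why arithmetic on the values would fail.
value = country_numbers['Germany']
print(type(value))                 # <class 'str'>

# Converting to the intended type makes a numeric summary possible.
total = sum(int(v) for v in country_numbers.values())
print(total)                       # 150
```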

**Write self-checks**

Sometimes you can write code to check for errors automatically. For example, if you are computing the average of a list of numbers, you could check that the result is not greater than the largest element in the list or less than the smallest. This is called a “sanity check” because it detects results that are “completely illogical”.

Another kind of check compares the results of two different computations to see if they are consistent. This is called a “consistency check”.
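Both kinds of check can be written with `assert` statements. The following is a minimal sketch; the function name `average` and the tolerance `1e-9` are assumptions, with the tolerance included because floating-point rounding can push a result very slightly outside the exact range.

```python
def average(numbers):
    """Return the mean of a non-empty list, with a built-in sanity check."""
    result = sum(numbers) / len(numbers)
    # Sanity check: a mean cannot lie outside the range of the data
    # (a tiny tolerance allows for floating-point rounding).
    assert min(numbers) - 1e-9 <= result <= max(numbers) + 1e-9, 'average out of range'
    return result


temps = [4.8, 4.71, 7.06, 10.45]
mean = average(temps)

# Consistency check: compute the same quantity a second way and compare.
running_total = 0
for t in temps:
    running_total += t
assert abs(mean - running_total / len(temps)) < 1e-9, 'inconsistent results'
print('checks passed')
```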

**Pretty print the output**

Formatting debugging output can make it easier to spot an error.
Again, time you spend building scaffolding can reduce the time you spend debugging.
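For nested data, the standard library module `pprint` is one option. The sketch below uses a small made-up dictionary (`data_by_city` is an assumption); the exact line breaks depend on the `width` argument, so the output is not reproduced here.

```python
from pprint import pprint

# Hypothetical nested data, similar in shape to the exercises in this chapter.
data_by_city = {
    'Eindhoven': ['3.6', '4.27', '7.8'],
    'Amsterdam': ['4.8', '4.71', '7.06'],
    'Breda': ['3.8', '4.5', '8.22'],
}

# A narrow width makes pprint break the dictionary across several lines,
# which is easier to scan than one long line. Keys are sorted by default.
pprint(data_by_city, width=40)
```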

## 7.5 Exercises

You are given the following dataset, which contains monthly average temperatures for 3 Dutch cities.

|City|Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec|
|--|--|--|--|--|--|--|--|--|--|--|--|--|
|Amsterdam|4.8|4.71|7.06|10.45|13.49|16.71|18.85|19.23|16.54|13.04|8.94|6.46|
|Breda|3.8|4.5|8.22|12.92|16.14|19.73|21.72|21.33|17.51|13.06|8.25|5.21|
|Eindhoven|3.6|4.27|7.8|12.26|15.31|18.72|20.61|20.24|16.62|12.39|7.83|4.94|

The data is stored in a list as follows:

In [None]:
data = [
    'Amsterdam,4.8,4.71,7.06,10.45,13.49,16.71,18.85,19.23,16.54,13.04,8.94,6.46',
    'Breda,3.8,4.5,8.22,12.92,16.14,19.73,21.72,21.33,17.51,13.06,8.25,5.21',
    'Eindhoven,3.6,4.27,7.8,12.26,15.31,18.72,20.61,20.24,16.62,12.39,7.83,4.94'
    ]

data

**Exercise 7.4**

Write a function called `list_to_dict` that converts `data` into a dictionary called `data_dict` where keys are the cities and values are lists containing the temperatures.


{
'Amsterdam': ['4.8', '4.71', '7.06', '10.45', '13.49', '16.71', '18.85', '19.23', '16.54', '13.04', '8.94', '6.46'], 

'Breda': ['3.8', '4.5', '8.22', '12.92', '16.14', '19.73', '21.72', '21.33', '17.51', '13.06', '8.25', '5.21'],

'Eindhoven': ['3.6', '4.27', '7.8', '12.26', '15.31', '18.72', '20.61', '20.24', '16.62', '12.39', '7.83', '4.94']
}


The function should work with any number of cities.

`data_dict = list_to_dict(data)`

In [13]:
# YOUR CODE HERE
def list_to_dict(data):
    data_dict = dict()
    for element in data:
        # The first field is the city; the remaining fields are the temperatures.
        city, *temperatures = element.split(',')
        data_dict[city] = temperatures
    return data_dict

list_to_dict([
    'Amsterdam,4.8,4.71,7.06,10.45,13.49,16.71,18.85,19.23,16.54,13.04,8.94,6.46',
    'Breda,3.8,4.5,8.22,12.92,16.14,19.73,21.72,21.33,17.51,13.06,8.25,5.21',
    'Eindhoven,3.6,4.27,7.8,12.26,15.31,18.72,20.61,20.24,16.62,12.39,7.83,4.94'
    ])

{'Amsterdam': ['4.8', '4.71', '7.06', '10.45', '13.49', '16.71', '18.85', '19.23', '16.54', '13.04', '8.94', '6.46'],
 'Breda': ['3.8', '4.5', '8.22', '12.92', '16.14', '19.73', '21.72', '21.33', '17.51', '13.06', '8.25', '5.21'],
 'Eindhoven': ['3.6', '4.27', '7.8', '12.26', '15.31', '18.72', '20.61', '20.24', '16.62', '12.39', '7.83', '4.94']}

**Exercise 7.5**

Write a function called `mean_t_city` that accepts `city_name` as a string and `data_dict` as parameters, and returns the average yearly temperature for that city.

- mean_t_city('Amsterdam', data_dict) -> 11.69
- mean_t_city('Eindhoven', data_dict) -> 12.05
- mean_t_city('Breda', data_dict) -> 12.70

In [10]:
# YOUR CODE HERE
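def mean_t_city(city_name, data_dict):
    temps = data_dict[city_name]
    total = 0
    for t in temps:
        total += float(t)  # temperatures are stored as strings
    # Compute the average once, after the loop, rounded as in the examples.
    return round(total / len(temps), 2)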

In [11]:
mean_t_city('Amsterdam', {'Amsterdam': ['4.8', '4.71', '7.06', '10.45', '13.49', '16.71', '18.85', '19.23', '16.54', '13.04', '8.94', '6.46']})

11.69

**Exercise 7.6**

Write a function called `mean_t_month` that accepts `month_name` as a string and `data_dict` as parameters, and returns the average temperature across all cities for that month.

- mean_t_month('Jul', data_dict) -> 20.39
- mean_t_month('Feb', data_dict) -> 4.49

In [6]:
# YOUR CODE HERE
def mean_t_month(month_name, data_dict):
    # Month abbreviations in the same order as the temperature columns.
    months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
              'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
    month_index = months.index(month_name)
    t = []
    for city_name in data_dict:
        temps = data_dict[city_name]
        t.append(float(temps[month_index]))
    # Average over all cities, rounded as in the examples.
    return round(sum(t) / len(t), 2)

In [12]:
mean_t_month('Jul', {'Amsterdam': ['4.8', '4.71', '7.06', '10.45', '13.49', '16.71', '18.85', '19.23', '16.54', '13.04', '8.94', '6.46']})

18.85

**Exercise 7.7**

Write a function called `coldest` that finds the coldest city by average yearly temperature, given `data_dict`. Use `mean_t_city` in the function `coldest`.

coldest(data_dict) -> 'Amsterdam'

In [73]:
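# YOUR CODE HERE
def coldest(data_dict):
    coldest_city = None
    lowest_avg = None
    for city in data_dict:
        avg = mean_t_city(city, data_dict)
        # Keep the city with the lowest average seen so far.
        if lowest_avg is None or avg < lowest_avg:
            coldest_city, lowest_avg = city, avg
    return coldest_city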