# Dictionaries

Python offers a number of ways of organizing data. We have already seen *lists*, which offer one way of organizing data in a sequence. We will now discuss *dictionaries*.

After lists, dictionaries are the most commonly used data structures in Python. You can think of a list `x` with $n$ items as a map (or function) from the indices $0, ..., n-1$ to arbitrary values (i.e., with the index $i$ mapping to the value `x[i]`). (Note that this map is not invertible: a value may be associated with many different indices, as with the list `["a", "a", "a"]` where the value `"a"` is associated with each of the indices $0,1,2$.)

A dictionary is also a (non-invertible) map, but of a much more general variety than a list. Instead of mapping consecutive integer indices to arbitrary values, a dictionary maps (nearly) arbitrary *keys* to arbitrary values. For example, suppose we wanted to map countries to their populations. We can define such a dictionary as follows:

In [22]:
population = {"China": 1439000000,
              "India": 1380000000,
              "United States": 330000000}

This dictionary has three items in it: the *keys* `"China"`, `"India"`, and `"United States"` (all strings), and their associated values, the approximate population of each country as an integer.

More generally, the Python syntax for defining a dictionary `d` with keys `k1, ..., kn` and corresponding values `v1, ..., vn` is
```python
d = {k1: v1, k2: v2, ..., kn: vn}
```
(with the ellipsis replaced with each of the explicit key-value pairs). So we surround the key-value pairs with curly brackets `{}`, separate the key-value pairs with commas, and within each key-value pair we first give the key and then the value with a colon `:` in between. This syntax is similar to the syntax for defining a list `x` with values `v1, ..., vn`
```python
x = [v1, v2, ..., vn]
```
(with the ellipsis replaced with each of the explicit values). Of course, a list uses consecutive integer indices rather than arbitrary keys, so we don't need the keys `k1, ..., kn` when we define the list.

To access the value associated with a key, we use the same square-bracket syntax as for a list, simply replacing the integer index with the key:

In [23]:
print(population["China"])

1439000000


We can also use similar syntax to *assign* dictionary values, whether or not the corresponding key is already present in the dictionary. If the key is not present it will be created, and if it is present, the corresponding value will be overwritten.

In [24]:
population["United States"] = 331000000 # Update the value associated with the key "United States"
population["Indonesia"] = 273000000 # Create a new key, "Indonesia", with specified value

So accessing individual items in a dictionary, or changing the value of individual items in a dictionary, uses extremely similar syntax to lists. We just replace the integer index with whatever keys we are using.

Recall that when you use the `in` operator with a list, you check if a particular item value is present in the list, i.e.
```python
("a" in ["a", "b", "c"]) == True
```
You can also use the `in` operator with a dictionary, but it now checks for the presence of a given *key*, rather than a given *value*.

In [25]:
print("China" in population) # True
print(population["China"] in population) # False

True
False
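
Because reading a key that is not in the dictionary raises a `KeyError`, a common pattern is to check with `in` before reading. The following is a minimal sketch of that pattern; the variable names `china` and `indonesia` are chosen here for illustration, and `None` is used as an arbitrary placeholder for "unknown".

```python
# Rounded population figures, as used in the text above
population = {"China": 1439000000, "India": 1380000000}

# Reading a key that is present returns its value
china = population["China"]

# Reading an absent key with population["Indonesia"] would raise KeyError,
# so we check for the key with `in` first
if "Indonesia" in population:
    indonesia = population["Indonesia"]
else:
    indonesia = None  # placeholder meaning "not known"

# Assigning to an absent key needs no check: the key is simply created
population["Indonesia"] = 273000000

print(china)            # 1439000000
print(indonesia)        # None
print(len(population))  # 3
```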


## Mutable and immutable types

We said above that keys to a dictionary can be *nearly* anything. The precise statement is that dictionary keys can be of any "immutable" Python type. An object in Python is *immutable* if (roughly speaking) nothing about it can be changed; otherwise it is *mutable*. Lists and dictionaries are mutable types, because you can change their entries. By contrast, floats, integers, and strings are immutable and so can be used as dictionary keys. (Note that you should almost never use floats as dictionary keys, for the same reason that you should almost never check if two floating point values are exactly equal: floating point arithmetic is inherently inaccurate.)
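
To make the restriction on keys concrete, here is a small sketch that tries a string, an integer, and a list as keys. It uses a `try`/`except` block (which may not have been covered yet) purely so that the example runs to completion instead of stopping at the error; the names `lookup` and `list_key_allowed` are made up for this illustration.

```python
lookup = {}
lookup["id-7"] = "string key"  # strings are immutable, so this is allowed
lookup[7] = "integer key"      # integers are immutable, so this is allowed

try:
    lookup[["a", "b"]] = "list key"  # lists are mutable, so Python rejects this
    list_key_allowed = True
except TypeError:
    # Python raises TypeError when a list is used as a dictionary key
    list_key_allowed = False

print(list_key_allowed)  # False
print(len(lookup))       # 2
```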

We saw last week that we can change individual entries of a list using square bracket notation, but cannot change individual characters of a string without constructing a new string:

In [26]:
x = ["a", "b", "c"]
x[1] = "B" # we can change individual entries of a list

word = "cat"
# word[0] = "h" # this will throw an error when you run it, since you can't modify a string

This difference between lists and strings illustrates that lists are mutable while strings are immutable. However, it might be a little more confusing that floats and integers are immutable. After all, you can always change the value of an integer variable:
```python
x = 1
x += 1 # we changed the value associated with the variable x, but this is not the same as changing the integer itself
```
But there is a difference between changing which value is stored in a particular variable, and changing the value itself. Consider the following contrast between the behavior of integers and lists:

In [13]:
x = 1
y = 1
z = y # z now refers to the same integer value as y
z += 1 # replaces z with a new value. It does not affect y
print("x is " + str(x) + " and y is " + str(y) + " and z is " + str(z))

x = []
y = [] # x and y are lists with same values (none), but they do not refer to the same list
z = y # z and y refer to exactly the same list! z is *not* a copy of y, it is y itself
z.append(1) # we have modified z, and hence y, because z and y are exactly the same list
print("x is " + str(x) + " and y is " + str(y) + " and z is " + str(z))

x is 1 and y is 1 and z is 2
x is [] and y is [1] and z is [1]


When we change an integer variable, we effectively forget the previous integer value and replace it with a new integer value. By contrast, when we change a list, the same change occurs in all variables that point to the same list.
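
If you want an independent list rather than a second name for the same list, you have to build a new list explicitly. One way to do this, sketched below, is to pass the existing list to the `list` constructor. The `is` operator used here checks whether two variables refer to the very same object; the variable names are illustrative only.

```python
original = [1, 2, 3]
alias = original           # a second name for the same list
separate = list(original)  # a new list containing the same entries

alias.append(4)  # changes the one list shared by `original` and `alias`

print(original)              # [1, 2, 3, 4]
print(separate)              # [1, 2, 3]
print(alias is original)     # True: same list
print(separate is original)  # False: different lists
```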

This distinction is subtle and takes some time to get used to. It is also a common source of errors in more complicated code: it can be very confusing to understand what is happening if you accidentally construct multiple variables referring to the same list (or dictionary). On the other hand, it allows you to do amusing things like this:

In [14]:
x = []
x.append(x)
print(x[0][0][0][0][0])

[[...]]


## Numerical keys

You should avoid using `float`s as dictionary keys, for the same reason you should avoid checking for exact equality of floating point numbers: floating point arithmetic is inherently inaccurate, and the "same" number computed in two different ways might give you slightly different floating point values.
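
The classic illustration of this problem is that `0.1 + 0.2` does not produce exactly the same float as `0.3`. The sketch below shows how that breaks a dictionary lookup; the `prices` dictionary is a made-up example, and storing amounts as integer cents is just one possible workaround.

```python
prices = {0.3: "thirty cents"}

computed = 0.1 + 0.2
print(computed)            # 0.30000000000000004
print(computed in prices)  # False: the key 0.3 is a slightly different float

# One possible workaround: use integer keys (here, cents instead of dollars)
prices_in_cents = {30: "thirty cents"}
print((10 + 20) in prices_in_cents)  # True: integer arithmetic is exact
```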

By contrast, integers make perfectly good dictionary keys. Why might you want to use integer keys in a dictionary, when you can already index a list with integer indices? One reason is that in a list, the indices always start from $0$ and continue consecutively. With a dictionary, you can use whatever keys you like.

In [15]:
emails_by_id = {} # create an empty dictionary, that we will use to associate email addresses with ID numbers
emails_by_id[100938] = "someone@brandeis.edu" # key 100938 gets an email address as its value...
emails_by_id[101076] = "someone.else@brandeis.edu" # ...the dictionary now contains 2 items...

## Essential dictionary operations

Recall that when you use the `in` operator with a list, you check if a particular item value is present in the list, i.e.
```python
("a" in ["a", "b", "c"]) == True
```
You can also use the `in` operator with a dictionary, but it now checks for the presence of a given *key*, and **not** for a given *value*.

In [16]:
population = {"China": 1439000000,
              "India": 1380000000,
              "United States": 330000000}

print("China" in population) # True
print(1439000000 in population) # False

True
False


Just as with a list, you can iterate over a dictionary using a `for` loop. But whereas iteration over a list yields the *values* of items in the list, iteration over a dictionary yields the *keys*:

In [17]:
primes = [2,3,5,7]
for p in primes:
    # iterating over a list gives the values in that list
    print(p)

for country in population:
    # iterating over a dictionary gives the keys in that dictionary
    # you can then access the corresponding value using the square-bracket syntax
    print(country + " has a population of " + str(population[country]))

2
3
5
7
China has a population of 1439000000
India has a population of 1380000000
United States has a population of 330000000


It is nevertheless possible to iterate over the values stored in a dictionary. To do so, you call the `values` method of the dictionary:

In [18]:
total = 0 # total population of the countries in the dictionary
for pop in population.values():
    # pop is the population *value* of one of the countries in the dictionary
    total += pop # add the population of this country to the total
    
print("Total population of all countries in dictionary: " + str(total))

Total population of all countries in dictionary: 3149000000
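
Since `values()` can be used anywhere a sequence of values is expected, the loop above can also be written with Python's built-in `sum` function. This is only a shorthand sketch of the same computation; `max` is included as a second example of a built-in that accepts the values directly.

```python
population = {"China": 1439000000,
              "India": 1380000000,
              "United States": 330000000}

total = sum(population.values())    # adds up all the values
largest = max(population.values())  # the largest single value

print(total)    # 3149000000
print(largest)  # 1439000000
```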


There is also an `items` method for iterating over key/value pairs. You use it like this:

In [19]:
for country, pop in population.items():
    # country is the key (country name, a string)
    # pop is the value (population value, an integer)
    print(country + " has population " + str(pop))

China has population 1439000000
India has population 1380000000
United States has population 330000000


If you want to *remove* an item from a dictionary, you can use the `del` keyword and specify the key you want to delete using square-bracket notation. (This will give an error if the key is not present in the dictionary.)

In [20]:
print("China" in population)
del population["China"]
print("China" in population)

True
False
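
To avoid the error when the key might be missing, you can check for the key with `in` before deleting it. A minimal sketch, with `"France"` standing in for a key that is not in the dictionary:

```python
population = {"China": 1439000000, "India": 1380000000}

for country in ["China", "France"]:
    if country in population:
        del population[country]  # safe: the key is known to be present
    else:
        print(country + " is not in the dictionary, nothing to delete")

print(population)  # {'India': 1380000000}
```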


## Lists versus dictionaries

Dictionaries and lists can both be used to store an arbitrary number of arbitrary kinds of values in Python. They have many similarities:
- We use square-bracket notation for reading and writing to specific entries
- We use the `in` keyword to check if a particular entry exists in the structure
- We can use a `for` loop to iterate over all the entries
- They are mutable: you can change their entries

They also have important differences:
- Dictionaries can have any immutable type as their keys (including, e.g., strings and integers), whereas the indices of a list are always the consecutive integers starting from $0$
- Using the `in` keyword, or iterating with a `for` loop, accesses the values stored in a list, but accesses the keys stored in a dictionary

Another important difference concerns order. The items in a list have a clear and unambiguous order, and when you iterate over a list using a `for` loop you will access the entries in that order. A dictionary, by contrast, has no positional index. Since Python 3.7, iterating over a dictionary with a `for` loop yields the keys in the order in which they were inserted, but older versions of Python made no such guarantee, and insertion order is often not the order you actually care about. It is therefore safest not to let the correctness of your code depend on the iteration order of a dictionary; if you need a particular order (alphabetical, say), sort the keys explicitly.
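
When the order of iteration matters, one option is to sort the keys yourself rather than depend on the dictionary's own order. The sketch below uses the built-in `sorted` function, which has not been covered in these notes; applied to a dictionary, it returns a sorted list of the keys.

```python
population = {"India": 1380000000,
              "United States": 330000000,
              "China": 1439000000}

ordered_names = sorted(population)  # a list of the keys, in alphabetical order
print(ordered_names)  # ['China', 'India', 'United States']

for country in ordered_names:
    # visits the countries alphabetically, however the dictionary was built
    print(country + " has a population of " + str(population[country]))
```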

If you aren't sure whether to use a list or a dictionary in a particular case, here are some tips:
- If you need to look up values that are associated with strings or other complex keys, use a dictionary
- If you have some values that appear in a particular order, and you want to be able to access the $i$th value in that order, use a list
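
As a rough illustration of these two tips, the sketch below stores race results in a list (because position matters) and runners' ages in a dictionary (because we look them up by name). All names and numbers are invented for the example.

```python
# Hypothetical data, invented for illustration
finishers = ["Ana", "Ben", "Chloe"]         # finishing order matters, so a list
ages = {"Ana": 31, "Ben": 27, "Chloe": 35}  # looked up by name, so a dictionary

third_place = finishers[2]          # access by position
age_of_winner = ages[finishers[0]]  # access by key

print(third_place)    # Chloe
print(age_of_winner)  # 31
```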

There are also substantial differences in the efficiency of various operations on lists and dictionaries, and for sophisticated Python programmers these efficiency differences can be very important. But that is beyond the scope of this course.

## Quiz

Here are some exercises to check your understanding of the above material. This quiz will not be graded and does not need to be turned in, but might be a useful way for you to review. The first three questions use the following dictionary.

In [21]:
median_household_income = {"Massachusetts": 79835,
                           "New York": 67844,
                           "California": 75277,
                           "New Jersey": 81740}

How would you look up the median household income in California and print it, using the dictionary?

How would you print all the names of all the states that appear in the dictionary?

Mississippi has a median household income of \$44,717. How can you add it to the dictionary?

How would you add up the total area in square feet of all rooms in the house represented by the following dictionary?

In [None]:
# keys are room names, values are square footage
rooms = {"kitchen": 300, "bedroom 1": 300, "bedroom 2": 200, "living room": 400, "bathroom": 80}

## Optional material for advanced students

We now discuss two convenience methods for common tasks involving dictionaries, and an additional Python data structure, the `tuple`.

A very common pattern when using a dictionary is that you want to get the value associated with a particular key in the dictionary *if it exists*. You can use the `get` method to return either the value associated with a particular key, or a default value if the key is not present. The syntax is
```python
    result = dictionary.get(key_name, default_value)
```
and it is equivalent in effect to
```python
    if key_name in dictionary:
        result = dictionary[key_name]
    else:
        result = default_value
```

In [None]:
cities = {"Massachusetts": ["Boston", "Waltham", "Amherst"], "Colorado": ["Denver"]}

print(cities.get("Massachusetts", [])) # "Massachusetts" is already present in the dictionary
print(cities.get("Rhode Island", [])) # "Rhode Island" is not present, but we specified a sensible default value (empty list)
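
One frequent use of `get` is counting: a default of `0` lets you increment a count without first checking whether the key exists. Here is a short sketch that counts the letters in a word (the word and variable names are arbitrary choices for this example):

```python
word = "mississippi"
counts = {}  # will map each letter to the number of times it occurs

for letter in word:
    # counts.get(letter, 0) is the count so far, or 0 for a letter not yet seen
    counts[letter] = counts.get(letter, 0) + 1

print(counts["s"])  # 4
print(counts["p"])  # 2
print(len(counts))  # 4 distinct letters: m, i, s, p
```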

In a similar vein, the `setdefault` method assigns a value to a dictionary key only if the key is not already present in the dictionary. Whether or not the key was already present, it returns the value associated with the key afterwards. The syntax is
```python
    result = dictionary.setdefault(key_name, default_value)
```
and it is equivalent in effect to
```python
    if key_name not in dictionary:
        dictionary[key_name] = default_value
    result = dictionary[key_name]
```

In [None]:
RI_cities = cities.setdefault("Rhode Island", []) # "Rhode Island" wasn't already in the dictionary, but now it is
RI_cities.append("Providence")
print(cities["Rhode Island"])
MA_cities = cities.setdefault("Massachusetts", []) # "Massachusetts" is already in the dictionary, so it's returned unchanged
MA_cities.append("Salem")
print(cities["Massachusetts"])
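
Because `setdefault` returns the stored value, the lookup and the `append` can be combined into one line. This makes it convenient for grouping items, as in the following sketch, which groups words by their first letter (the word list is made up for the example):

```python
words = ["apple", "avocado", "banana", "blueberry", "cherry"]
by_letter = {}  # will map a first letter to the list of words starting with it

for word in words:
    # create an empty list for this letter if needed, then append to it
    by_letter.setdefault(word[0], []).append(word)

print(by_letter["a"])  # ['apple', 'avocado']
print(by_letter["c"])  # ['cherry']
```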

### Tuples

*Tuples* are Python data structures that behave almost identically to lists, with two differences:
1. They are immutable
2. They are specified using parentheses `()` instead of square brackets `[]`

In [None]:
full_name = ("Carl", "Friedrich", "Gauss") # define a tuple with three items, all strings
fore_name = full_name[0] # "Carl" - indexing works identically as with lists
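
The sketch below illustrates the first difference: assigning to an entry of a tuple fails, so to "change" a tuple you build a new one instead. The `try`/`except` block is there only so the example runs to completion, and the names `changed` and `new_name` are invented for this illustration.

```python
full_name = ("Carl", "Friedrich", "Gauss")

try:
    full_name[0] = "Karl"  # tuples are immutable, so this raises TypeError
    changed = True
except TypeError:
    changed = False

# Build a new tuple from the pieces of the old one instead
new_name = ("Karl", full_name[1], full_name[2])

print(changed)    # False
print(full_name)  # ('Carl', 'Friedrich', 'Gauss')
print(new_name)   # ('Karl', 'Friedrich', 'Gauss')
```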

Since they are immutable, tuples can be used as keys in dictionaries. For example, we could use a tuple of three integers to describe a date, and use that as a dictionary key. (In practice, there are other Python data structures specifically for working with dates, but it's a reasonably illustrative example.)

In [None]:
calendar = {} # empty dictionary
calendar[(1958, 10, 1)] = "NASA begins operations" # add some items
calendar[(1961, 5, 5)] = "Second human in space"
calendar[(1961, 5, 5)] = "First American in space" # change the value associated with a given key
print(calendar)

There is a small quirk of notation one should be aware of with tuples. Parentheses are used to clarify the order of operations in arithmetic, as well as to define tuples. As a result, **tuples with a single item are specified with an extra comma**.

In [None]:
print(5*(1+2)) # A single value in parentheses, with no comma, is not a tuple, so this prints 15
print(5*(1+2,)) # Add a comma to the end, and it becomes a tuple with one element, which is then repeated five times: (3, 3, 3, 3, 3)
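
You can confirm which of these is a tuple with the built-in `type` function. A brief sketch (the variable names are illustrative):

```python
not_a_tuple = (3)  # just the integer 3 in parentheses
one_item = (3,)    # a tuple containing the single item 3
empty = ()         # an empty tuple needs no comma

print(type(not_a_tuple))  # <class 'int'>
print(type(one_item))     # <class 'tuple'>
print(len(one_item))      # 1
print(len(empty))         # 0
```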

A common use of tuples is to return multiple values from a function. If you specify multiple comma-separated values in a `return` statement, a tuple of the values is created automatically.

In [None]:
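def longest(strings):
    """Returns the longest string in the list strings, along with its length"""
    longest_so_far = None # longest string seen so far (stays None if the list is empty)
    length = -1 # length of longest_so_far; starting at -1 means any string is longer
    for s in strings:
        if len(s) > length: # strictly longer, so ties keep the earliest string
            length = len(s)
            longest_so_far = s
    return longest_so_far, length # implicitly creates a tuple with two items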

result = longest(["red", "orange", "blue"])
print(result)

In situations such as a function that returns multiple values, one typically wants to use each returned value in a separate way. For example, above we have a function that returns both a string and an integer. In such cases, *tuple unpacking* is helpful: this is a shortcut for assigning the items in a tuple to separate variables, without needing to explicitly give the indices. We give an example below:

In [None]:
# Here's one way of getting the individual items in a tuple
longest_string = result[0]
length = result[1]
# Here's a more convenient way: tuple unpacking
longest_string, length = result # same effect as the two lines above
# We can also achieve the same effect when directly calling the function
# (There's no need for the intermediate variable "result")
longest_string, length = longest(["brown", "taupe", "beige"])

Actually, we already saw an example of tuple unpacking in the code above. The `items` method of a dictionary yields tuples of `(key, value)` pairs in the dictionary, and we unpacked them in our iteration example using the `items` method.

In [None]:
zoo = {"gorillas": 12, "pandas": 3, "zebras": 4}

for animal, count in zoo.items():
    # zoo.items() yields key/value pairs
    # we directly unpack them in the `for` statement, into the animal key and count value
    print("The zoo has {} {}.".format(count, animal))

In general, to unpack a tuple, just give a comma-separated list of variable names on the left-hand side of the assignment operator `=`, and the tuple to be unpacked on the right-hand side. The number of variables on the left must match the length of the tuple on the right. (Lists, and indeed any iterable, can be unpacked in exactly the same way.)
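
The sketch below shows both sides of this rule: a matching unpacking, a swap of two variables (which is an unpacking of an implicitly created tuple), and a mismatched unpacking, which raises a `ValueError`. The `try`/`except` block is used only to keep the example running, and the names are invented for illustration.

```python
point = (3, 4)

x, y = point  # two variables, two items: x is 3 and y is 4
x, y = y, x   # the right-hand side is the tuple (4, 3), so this swaps x and y

try:
    a, b, c = point  # three variables but only two items
    unpacked = True
except ValueError:
    unpacked = False

print(x, y)      # 4 3
print(unpacked)  # False
```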

To create a tuple from a list, or other iterable, use the `tuple` constructor.

In [None]:
tuple(["yellow", "green", "violet"])
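
One situation where this conversion is useful is when you have a list but need a dictionary key: the list itself cannot be a key, but a tuple built from it can. A minimal sketch, with a made-up `palettes` dictionary:

```python
colors = ["yellow", "green", "violet"]  # a list cannot be used as a key

key = tuple(colors)          # ('yellow', 'green', 'violet')
palettes = {key: "spring"}   # the tuple works as a dictionary key

# Any tuple with the same items, in the same order, finds the same entry
print(palettes[("yellow", "green", "violet")])  # spring
```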