# NB10: Dictionaries

## Programming Fundamentals

## L.EIC/2023-24

#### João Correia Lopes$^{1}$, Pedro Vasconcelos$^{2}$
$^{1}$FEUP/DEI & INESC TEC\
$^{2}$FCUP/DCC & LIACC

> “Programs are meant to be read by humans and only incidentally for computers to execute.”

Donald Knuth

## Goals

By the end of this class, the student should be able to:

- Use the main operations and methods available to work with dictionaries
- Describe the differences between dictionary aliasing and shallow copying


## Bibliography

- Peter Wentworth, Jeffrey Elkner, Allen B. Downey, and Chris Meyers, *How to Think Like a Computer Scientist — Learning with Python 3* (Section 5.4) [[PDF](https://media.readthedocs.org/pdf/howtothink/latest/howtothink.pdf)]
[[HTML](http://openbookproject.net/thinkcs/python/english3e/)]

- Brad Miller and David Ranum, *Learning with Python: Interactive Edition*. Based on material by Jeffrey Elkner, Allen B. Downey, and Chris Meyers (Chapter 12) [[HTML](https://runestone.academy/runestone/books/published/thinkcspy/index.html)]


# 10 Data type: Dictionaries

### A compound data type (recap)

- So far we have seen built-in types like `int`, `float`, `bool`, `str` and also lists, pairs and tuples.

- Strings, lists, and tuples are qualitatively different from the others because they are made up of smaller pieces.

- Lists, tuples, and strings are called *sequences*, because their items occur in a fixed order.

- All of those use integers as indices to access the values they contain within them.

## 10.1 Dictionaries

- Dictionaries are yet another kind of compound type.

- They are Python’s built-in **mapping type**.

- They map **keys**, which can be any *immutable type* (more precisely, any *hashable* object), to **values**, which can be any type (heterogeneous).<sup>1</sup>
  - This means that tuples can be used as keys (as long as they contain only immutable items), but lists cannot.

- In other languages, they are called *associative arrays* since they associate a key with a value.

<sup>1</sup>
Just like the elements of a list or tuple
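
As a minimal sketch of the restriction on keys, the following uses a hypothetical `locations` dictionary (the name and the data are illustrative assumptions): a tuple is accepted as a key, while a list raises a `TypeError`.

```python
# Hypothetical example: the dictionary name and its contents are illustrative only
locations = {}
locations[(41.18, -8.60)] = "Porto"   # a tuple is immutable, so it can be a key

try:
    locations[[41.18, -8.60]] = "Porto"   # a list is mutable, so it is rejected
    list_key_accepted = True
except TypeError as error:
    list_key_accepted = False
    print("Cannot use a list as a key:", error)

print(locations)
```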

### Creating dictionaries

- One way to create a dictionary is to start with the empty dictionary and add **key:value** pairs.

```python
    >>> english_spanish = {}
    >>> english_spanish['one'] = "uno"
    >>> english_spanish["two"] = 'dos'
    >>> print(english_spanish)
    {'one': 'uno', 'two': 'dos'}
```

Let's try it here:

In [None]:
english_portuguese = {}
print(english_portuguese)

In [None]:
english_portuguese['one'] = "um"
english_portuguese["two"] = 'dois'
print(english_portuguese)

- Another way to create a dictionary is to provide a list of **key:value** pairs using the same syntax as the previous output.

- It doesn’t matter in what order we write the pairs: values are looked up by key, not by position (there’s no indexing!).



```python
  >>> english_spanish = {"one": "uno", "three": "tres", "two": "dos"}
  >>> english_spanish
  {'one': 'uno', 'three': 'tres', 'two': 'dos'}

  >>> print(english_spanish["two"])
  dos
```
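
A small sketch of what "order doesn't matter" means in practice, assuming two illustrative dictionaries `a` and `b` written with the same pairs in a different order: they compare equal and give the same lookups.

```python
# Same key:value pairs, written in a different order (names are illustrative)
a = {"one": "uno", "two": "dos", "three": "tres"}
b = {"three": "tres", "one": "uno", "two": "dos"}

print(a == b)                 # dictionaries compare by content, not by order
print(a["two"] == b["two"])   # lookups by key give the same value
```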


### Look up a value

- The dictionary is the first compound type that we've seen that is not a sequence, so we can't index or slice a dictionary.
- The syntax for looking up a value is, however, the same: square brackets around the key.



In [None]:
english_portuguese = {"one": "um", "three": "tres", "two": "dois"}
print(english_portuguese["three"])

What do you get here?

In [None]:
# silly dictionary (named so that it does not shadow the built-in dict type)
silly = {1: 10, 2: 20, 3: 30, 4: 40, 5: 50}
print(silly[3])

That was lookup, not indexing!
It just happens that the keys of this dictionary are integers...


![mafalda](https://raw.githubusercontent.com/fp-leic/public/main/notebooks/10/mafalda.png)

$\Rightarrow$
<https://en.wikipedia.org/wiki/Mafalda>

### Hashing

- Python uses complex algorithms, designed for very fast access, to determine where the **key:value** pairs are stored in a dictionary.

- The implementation uses a technique called **hashing** [[wiki]](https://en.wikipedia.org/wiki/Hash_function).
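
To give an intuition for hashing, here is a toy sketch that is **not** how Python's dictionary is actually implemented. It assumes a hypothetical fixed number of buckets and uses the built-in `hash()` function to decide in which bucket a pair is stored, so a lookup only has to search one bucket.

```python
# Toy illustration only: Python's real dictionary is considerably more sophisticated
NUM_BUCKETS = 8   # hypothetical table size


def bucket_index(key):
    """Map a key to a bucket number using the built-in hash() function."""
    return hash(key) % NUM_BUCKETS


buckets = [[] for _ in range(NUM_BUCKETS)]
for key, value in [("one", "uno"), ("two", "dos"), ("three", "tres")]:
    buckets[bucket_index(key)].append((key, value))


def lookup(key):
    """Search only the single bucket where the key must be, if present."""
    for stored_key, stored_value in buckets[bucket_index(key)]:
        if stored_key == key:
            return stored_value
    return None


print(lookup("two"))
print(lookup("four"))
```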




### Efficiency

- The same concept of mapping a key to a value could be implemented using a list of tuples...

```python
  >>> {"apples": 430, "bananas": 312, "oranges": 525, "pears": 217}
  {'apples': 430, 'bananas': 312, 'oranges': 525, 'pears': 217}

  >>> [("apples", 430), ("bananas", 312), ("oranges", 525), ("pears", 217)]
  [('apples', 430), ('bananas', 312), ('oranges', 525), ('pears', 217)]
```

- The reason to choose this new data type is that dictionary lookups are **very fast**.
- Hashing allows us to access a value very quickly.
- By contrast, the list of tuples implementation is slow:
  - If we wanted to find a value associated with a key, we would have to iterate over every tuple, checking the 0th element.
  - What if the key wasn’t even in the list?
  - We would have to get to the end of it to find out.
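
The search described above could be sketched as follows, assuming a hypothetical helper `find_in_list()` (not part of Python). It has to walk through the tuples one by one, whereas the dictionary goes straight to the value.

```python
inventory_dict = {"apples": 430, "bananas": 312, "oranges": 525, "pears": 217}
inventory_list = [("apples", 430), ("bananas", 312), ("oranges", 525), ("pears", 217)]


def find_in_list(pairs, wanted_key):
    """Linear search: check the 0th element of each tuple in turn."""
    for key, value in pairs:
        if key == wanted_key:
            return value
    return None   # reached the end without finding the key


print(find_in_list(inventory_list, "pears"))   # has to visit every tuple
print(inventory_dict["pears"])                 # a single hashed lookup
print(find_in_list(inventory_list, "kiwis"))   # visits every tuple and finds nothing
```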

## 10.2 Dictionary operations

- The `del` statement removes a *key:value* pair from a dictionary.

- The `len()` function also works on dictionaries; it returns the number of *key:value* pairs.

```python
   >>> inventory = {"apples": 430, "bananas": 312, "quinces": 217}
   >>> del inventory["bananas"]
   >>> len(inventory)
   2
```

$\Rightarrow$
<https://github.com/fp-leic/public/tree/main/lectures/10/operations.py>

Let's try some operations:

In [None]:
inventory = {"apples": 430, "bananas": 312, "oranges": 525, "pears": 217}
print(inventory)
print(len(inventory))

In [None]:
del inventory["pears"]
print(inventory)

In [None]:
inventory["bananas"] = 0
print(inventory)

In [None]:
inventory["bananas"] += 200
print(inventory)

## 10.3 Dictionary methods

- Dictionaries have a number of useful built-in methods.

- The `keys()` method returns what Python 3 calls a *view* of its underlying keys.

    - A *view* object has some similarities to the *range* object we saw earlier — it is a **lazy promise**, to deliver its elements when they're needed by the rest of the program.

    - We can *iterate over the view*, or turn the view into a list.

- The `values()` method is similar, but for values rather than keys.

- The `items()` method also returns a *view*, which *promises* a sequence of `(key, value)` tuples.

- From Python 3.7 onwards, dictionaries keep their *key:value* pairs in insertion order (but this doesn't apply to all programming languages!).

```python
  for key in english_spanish.keys():
      print("Got key", key, "which maps to value", english_spanish[key])
```

$\Rightarrow$
<https://github.com/fp-leic/public/tree/main/lectures/10/methods.py>

Use `keys()`:

In [None]:
english_portuguese = {"one": "um", "two": "dois", "three": "tres"}

for key in english_portuguese.keys():
    print("Got key", key, "which maps to value", english_portuguese[key])

In [None]:
keys = list(english_portuguese.keys())
print(keys)

Iterating over a dictionary implicitly iterates over its keys:

In [None]:
for key in english_portuguese:
    print("Got key", key)

Use `values()`:

In [None]:
values = list(english_portuguese.values())
print(values)

Use `items()`:

In [None]:
print(list(english_portuguese.items()))

In [None]:
for (key, value) in english_portuguese.items():
    print("Got key", key, "which maps to value", value)
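
One consequence of views being lazy is that they stay connected to the dictionary. The sketch below, using illustrative variable names, compares a view with a list built from the keys: after a new pair is added, the view shows it but the list does not.

```python
english_portuguese = {"one": "um", "two": "dois"}

keys_view = english_portuguese.keys()      # a view, not a list
keys_snapshot = list(english_portuguese)   # a list built right now

english_portuguese["three"] = "tres"       # modify the dictionary afterwards

print(list(keys_view))   # the view reflects the new key
print(keys_snapshot)     # the list is unchanged
```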

### Dictionary membership

In [None]:
print("one" in english_portuguese)
print("six" in english_portuguese)

Note that `in` tests *keys*, not *values*.

In [None]:
print("tres" in english_portuguese)

What's the result of looking up a non-existent key in a dictionary?

In [None]:
print(english_portuguese["dog"])

Looking up a missing key raises a `KeyError`, so you'd better check that the key exists first:

In [None]:
if "three" in english_portuguese:
    print(english_portuguese["three"])
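
An alternative to testing with `in` is to attempt the lookup and handle the `KeyError` if it happens. This is only a sketch, assuming a hypothetical helper called `translate()`, and it relies on `try`/`except`, which may not have been covered yet.

```python
english_portuguese = {"one": "um", "two": "dois", "three": "tres"}


def translate(word):
    """Hypothetical helper: return the translation, or a message if the key is missing."""
    try:
        return english_portuguese[word]
    except KeyError:
        return "no translation for " + repr(word)


print(translate("three"))
print(translate("dog"))
```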

### Dictionary method `get()`

- You can also use the `get()` method to avoid the runtime error.
- It returns a default value (the second argument, or `None` if it is omitted) if the key doesn't exist.

In [None]:
print(english_portuguese.get("dog", "NOT FOUND"))
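
A short sketch of the three cases of `get()`, reusing the same dictionary: key present, key missing without a default, and key missing with a default. Note that `get()` never adds the missing key to the dictionary.

```python
english_portuguese = {"one": "um", "two": "dois", "three": "tres"}

print(english_portuguese.get("two"))                # key present: its value
print(english_portuguese.get("dog"))                # key missing, no default: None
print(english_portuguese.get("dog", "NOT FOUND"))   # key missing: the given default
print("dog" in english_portuguese)                  # get() did not insert the key
```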

## 10.4 Aliasing and copying

- As in the case of lists, because **dictionaries are mutable**, we need to be aware of *aliasing*.

- Whenever two variables refer to the same object, changes to one affect the other.

- To modify a dictionary and keep a copy of the original, use the `copy()` method.

```python
  >>> opposites = {"up": "down", "right": "wrong", "yes": "no"}
  >>> alias = opposites
  >>> copy = opposites.copy()  # Shallow copy
```

- A *shallow copy* constructs a new compound object and then (to the extent possible) inserts references into it to the objects found in the original.

  - This means that if the values are mutable objects (such as lists), they are not copied themselves: the original and the copy share them.

  - By the way, this also applies to the `copy()` method of lists.

$\Rightarrow$
<https://github.com/fp-leic/public/tree/main/lectures/10/aliases.py>


These are aliases:

In [None]:
opposites = {"up": "down", "right": "wrong", "yes": "no"}

alias = opposites

But this is a shallow copy:

In [None]:
copy = opposites.copy()

What now?



In [None]:
alias["right"] = "left"
print(opposites["right"])

In [None]:
copy["right"] = "Guinness"
print(opposites["right"])

But remember that shallow copies are shallow! They just copy the references to the values.

In [None]:
d1 = {"ref": [1, 2, 3, 4]}
d2 = d1.copy()
d1["ref"].append(5)
d2["ref"].append(5)
print(d1)
print(d2)
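
The difference between an alias, a shallow copy and a fully independent copy can be checked with the `is` operator, which tests whether two names refer to the same object. The sketch below uses illustrative variable names and `copy.deepcopy()` from the standard library, which goes beyond the methods presented in this class.

```python
import copy

d1 = {"ref": [1, 2, 3, 4]}
alias = d1                  # same dictionary object
shallow = d1.copy()         # new dictionary, same inner list
deep = copy.deepcopy(d1)    # new dictionary, new inner list

print(alias is d1)                   # True: two names, one dictionary
print(shallow is d1)                 # False: a different dictionary...
print(shallow["ref"] is d1["ref"])   # True: ...sharing the same list
print(deep["ref"] is d1["ref"])      # False: the list was copied as well

d1["ref"].append(5)
print(shallow["ref"])   # the shared list shows the change
print(deep["ref"])      # the deep copy is unaffected
```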

## 10.5 Sparse matrices

- We previously used a list of lists to represent a matrix.

- That is a good choice for a matrix with mostly nonzero values, but consider a sparse matrix like this one:

$$\left[
    \begin{array}{ccccc}
    0 & 0 & 0 & 1 & 0 \\
    0 & 0 & 0 & 0 & 0 \\
    0 & 2 & 0 & 0 & 0 \\
    0 & 0 & 0 & 0 & 0 \\
    0 & 0 & 0 & 3 & 0 \\
    \end{array}
  \right]$$

- The list representation contains a lot of zeroes.

- An alternative, more compact representation, is to use a dictionary and the `get()` method (as you'll see next).

- Beware that there’s a trade-off here, as the access may take more time.

$\Rightarrow$
<https://github.com/fp-leic/public/tree/main/lectures/10/matrix.py>

The sparse matrix as a list:

In [None]:
matrix = [[0, 0, 0, 1, 0],
          [0, 0, 0, 0, 0],
          [0, 2, 0, 0, 0],
          [0, 0, 0, 0, 0],
          [0, 0, 0, 3, 0]]

print(matrix)

The sparse matrix as a dictionary:

In [None]:
# For the keys, we can use tuples that contain the row and column numbers
matrix = {(0, 3): 1, (2, 1): 2, (4, 3): 3}

print(matrix)

Accessing matrix elements:

In [None]:
print(matrix[(0, 3)])

What happens if we try:

In [None]:
print(matrix[(1, 3)])

The `get()` method with a default value of 0 solves this problem:

In [None]:
# The first argument is the key; the second argument is the default value
# that get() returns when the key is not in the dictionary

print(matrix.get((0, 3), 0))
print(matrix.get((1, 3), 0))
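
As a sketch of how the two representations relate, the hypothetical function `to_sparse()` below (an assumption, not part of the lecture files) builds the dictionary from the list of lists, and `get()` with a default of 0 is then used to rebuild the rows.

```python
def to_sparse(dense):
    """Build a dictionary holding only the nonzero entries, keyed by (row, column)."""
    sparse = {}
    for row, values in enumerate(dense):
        for column, value in enumerate(values):
            if value != 0:
                sparse[(row, column)] = value
    return sparse


dense = [[0, 0, 0, 1, 0],
         [0, 0, 0, 0, 0],
         [0, 2, 0, 0, 0],
         [0, 0, 0, 0, 0],
         [0, 0, 0, 3, 0]]

sparse = to_sparse(dense)
print(sparse)

# Rebuild the rows using get() with a default of 0
# (the 5x5 size must be known separately: the dictionary does not record it)
rebuilt = [[sparse.get((row, column), 0) for column in range(5)] for row in range(5)]
print(rebuilt == dense)
```

Only three pairs are stored instead of 25 numbers, at the cost of having to remember the matrix dimensions elsewhere.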

## 10.6 Counting letters

### Generate a frequency table

- Let's try to write code that counts the number of occurrences of each letter in a string.

- Dictionaries provide an elegant way to generate a frequency table.

- An algorithm:

```
     start with an empty dictionary
     for each letter in the string:
        find the current count (possibly zero) and increment it
     the dictionary contains pairs of letters and their frequencies
```

$\Rightarrow$
<https://github.com/fp-leic/public/tree/main/lectures/10/frequency-table.py>

Consider the text:

In [None]:
s = """
This parrot is no more! It has ceased to be!
It’s expired and gone to meet its maker!
This is a late parrot! It’s a stiff!
Bereft of life, it rests in peace!
If you hadn’t nailed it to the perch, it would be pushing up the daisies!
It’s run down the curtain and joined the choir invisible!
This is an ex-parrot!
"""

Let's count the letters of the text (any idea where it came from?):

In [None]:
letter_counts = {}
for letter in s:
    letter_counts[letter] = letter_counts.get(letter, 0) + 1
print(letter_counts)
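
The loop above counts every character, including spaces, punctuation and newlines. A possible variation, sketched here with a hypothetical function `letter_frequencies()`, counts only alphabetic characters, ignores case, and prints the table in alphabetical order. The last line compares the result with `collections.Counter` from the standard library, which is an alternative not covered in this class.

```python
from collections import Counter


def letter_frequencies(text):
    """Count alphabetic characters only, ignoring case."""
    counts = {}
    for character in text.lower():
        if character.isalpha():
            counts[character] = counts.get(character, 0) + 1
    return counts


sample = "This is an ex-parrot!"
frequencies = letter_frequencies(sample)

for letter in sorted(frequencies):
    print(letter, frequencies[letter])

# Counter builds the same table in a single call
print(frequencies == dict(Counter(c for c in sample.lower() if c.isalpha())))
```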

# Further reading

### Python Dictionaries

Python Tutorial || Learn Python Programming -- Socratica

In [None]:
from IPython.display import YouTubeVideo
YouTubeVideo('BfS2H1y6tzQ')

### Memoisation

- Consider the sequence of [Fibonacci numbers](https://en.wikipedia.org/wiki/Fibonacci_number)

- and the function call graph for `fib()` with n = 4:

![fib](https://raw.githubusercontent.com/fp-leic/public/main/notebooks/10/fib.png)


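- A naive recursive `fib()` recomputes the same values many times (for n = 4, `fib(2)` is already computed twice); a good solution is to keep track of values that have already been computed by **storing them in a dictionary**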

- A previously computed value that is stored for later use is called a **memo**
    
$\Rightarrow$
<https://github.com/fp-leic/public/tree/main/lectures/10/fib.py>

Let's have a look at a `fib()` implementation (we'll see **recursion**!):

In [None]:
# This is a particularly inefficient algorithm; the problem could be solved
# far more efficiently iteratively or using memoisation

def fib(n):
    if n <= 1:
        return n
    return fib(n-1) + fib(n-2)

print(fib(10))

Now, using **memoisation**:

In [None]:
already_known = {0: 0, 1: 1}

def fib(n):
    if n not in already_known:
        new_value = fib(n-1) + fib(n-2)
        already_known[n] = new_value
    return already_known[n]

print(fib(10))
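
To get a feel for how much work memoisation saves, the sketch below counts the function calls made by each version. The names `fib_naive()`, `fib_memo()`, `calls` and `known` are illustrative assumptions; the counters are themselves kept in a dictionary.

```python
calls = {"naive": 0, "memo": 0}   # number of calls made by each version


def fib_naive(n):
    calls["naive"] += 1
    if n <= 1:
        return n
    return fib_naive(n - 1) + fib_naive(n - 2)


known = {0: 0, 1: 1}   # memos: values computed so far


def fib_memo(n):
    calls["memo"] += 1
    if n not in known:
        known[n] = fib_memo(n - 1) + fib_memo(n - 2)
    return known[n]


print(fib_naive(20), "computed with", calls["naive"], "calls")
print(fib_memo(20), "computed with", calls["memo"], "calls")
```

Both versions return the same value, but the memoised one needs only a few dozen calls where the naive one needs many thousands.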

-- João Correia Lopes & Pedro Vasconcelos, 16 Oct 2022 15:24:07 WEST