# Dictionaries

In the previous lessons you used **lists** in your Python code.
An object of **class list** is a container of ordered values.

The fact that a list is *ordered* is a fundamental concept: **lists** work best when the items stored in them can be ordered in a natural and meaningful way.

However, that's not always the case.

In [None]:
# This list contains how much you earned each month of the last year (from January to December)
# The items have a meaningful order, so working with them is easy
payslips = [100, 120, 100, 90, 100, 110, 100, 120, 95, 100, 90, 120]
june_index = 5
print("The pay in January is", payslips[0], "and the pay in July is", payslips[june_index + 1])

# This list contains consecutive observations of an experiment
# The order of the elements is important: if we shuffled them, we would get a different result
obs = [1, 5, 10]
last = 0
for x in obs:
    if x > last:
        print("The observed value increased to", x)
    last = x  # remember this observation for the next comparison

# This list contains the price of the ticket in different museums
# These items DO NOT have a meaningful natural order.
# They could be stored in any order without making any difference:
# you would still need a "map" that tells you which element is in which position.
museum_tickets = [5, 2, 8]
science_museum_index = 0
history_museum_index = 1
art_museum_index = 2
print("The ticket for the science museum costs", museum_tickets[science_museum_index])

### Dictionaries

A **list** is not the only container **class** in Python. There are also **dictionaries**: a container **class** that is meant to solve the problem described above.

A **dictionary** is a container of multiple **key-value** pairs.

Think about what a dictionary is in your everyday experience: a dictionary is a book where, for each word (the **key**), there is a description (the **value**). Each word is defined only once, but multiple words may have the same definition (e.g. if they are synonyms). Even though the dictionary is ordered (alphabetically), when you use it you don't say things like "I need the definition of the third word after this one", but rather "I need the definition of the word *cephalopod*".

A Python **dictionary** has all the properties described above.

In [None]:
museum_tickets = {
    "science_museum" : 5,
    "history_museum" : 2,
    "art_museum" : 8
}

x = museum_tickets["science_museum"]
print("The science ticket costs", x)
print("The art ticket costs", museum_tickets["art_museum"])

Let's analyze how to create and use a **dictionary**.

A **dictionary** is initialized using multiple key-value pairs between curly brackets `{`, `}`.
First you have a **key**, then its corresponding **value**; between the key and the value there is a colon `:`.
Key-value pairs are separated by commas `,`.

In order to access a value in the dictionary, you use an operator very similar to the indexing operator used for lists. The only difference is that you provide a key between square brackets instead of a position.

Note that keys and values can belong to different **classes**. Keys must be of an immutable **class**, e.g. **int**, **float** or **string**: you can't use **lists** as keys. Values, on the other hand, can be of any **class**, both mutable and immutable.
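A short sketch of the rules above, with keys of three different immutable **classes** and a **list** used as a value. The names `mixed_keys` and `shopping` are only illustrative. The line that uses a **list** as a key is left commented out because it would stop the cell with a `TypeError`; Python reports the list as an *unhashable* type, which is the technical name for the requirement on keys.

```python
# Keys of three different immutable classes: int, float and str
mixed_keys = {
    1: "an int key",
    2.5: "a float key",
    "name": "a string key",
}
print(mixed_keys[1])       # an int key
print(mixed_keys[2.5])     # a float key
print(mixed_keys["name"])  # a string key

# Values can be of any class, including mutable ones such as lists
shopping = {"shop": "market", "fruit": ["apple", "pear"]}
shopping["fruit"].append("banana")  # modifies the list stored as a value
print(shopping)  # {'shop': 'market', 'fruit': ['apple', 'pear', 'banana']}

# A list can't be a key: uncommenting the next line raises
# TypeError: unhashable type: 'list'
# bad_dict = {[1, 2]: "list key"}
```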

### Exercise

Write a dictionary that converts some numbers into their English text representation (e.g. `1` to `"one"`). Try to access some elements.

### Checking for elements existence

If you try to access an element of a **list** using a non-existing index, you get an out-of-bounds error.
You should always check that an index is valid before using it to access an element of a container.

For **lists** (and non-negative indices), this check consists of making sure that the index is less than the length of the **list**.

In [None]:
my_list = [1, 10, 100]

my_indices = [1, 6]
for index in my_indices:
    if index < len(my_list):
        print("The index", index, "is valid and the element is", my_list[index])
    else:
        print("The index", index, "is out of bounds, I can't use it")

The same problem also applies to **dictionaries**.
If you try to **read** the value for a non-existing key, you will get an error.

In [None]:
my_dict = { "a" : 10 }

# KeyError: key "b" does not exist in the dictionary
print(my_dict["b"])

Similarly to **lists**, **dictionaries** provide a way to check whether a key is valid: the `in` operator.

In [None]:
my_dict = { "a" : 10 }

my_key = "a"
if my_key in my_dict:
    print("The key", my_key, "has been found and the value is", my_dict[my_key])
    
# The expression on the right-hand side of the assignment operator evaluates to a boolean value
# That's why we have been able to use it in an `if` statement above
# `in` is a boolean operator, exactly like `>` or `!=`
found = "my_fancy_key" in my_dict
print("Is the key found?", found)
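Keep in mind that `in` only searches among the *keys* of a **dictionary**, never among its values. A minimal example; `not in` is simply the negated form of the same operator:

```python
my_dict = {"a": 10}

# `in` looks among the keys...
print("a" in my_dict)  # True
# ...not among the values: 10 is a value here, not a key
print(10 in my_dict)   # False

# `not in` is handy to skip keys that would cause a KeyError
my_key = "b"
if my_key not in my_dict:
    print("The key", my_key, "is missing, so I can't read its value")
```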

### Exercise

Encoding is an invertible operation that takes some data and represents it in a different format.

Use the provided encoding dictionary to convert a sequence of characters into their encoded form.
Note that not all values in the list have an encoding in the dictionary: encode any missing value as `0`.

In [None]:
encoding_dict = {
    "_" : 0,
    "a" : 1,
    "b" : 2,
    "c" : 3,
    "d" : 4
}

x = ["a", "d", "h", "b", "b", "_"] # This should encode to `[1, 4, 0, 2, 2, 0]`

### Iterating over a dictionary

In a **list** each element has an index and a value.
The `enumerate()` function lets you write a `for` loop that uses both of them.

On the other hand, a **dictionary** is made of keys (instead of indices) and values.

It is possible to iterate through the elements of a **dictionary** using standard `for` loops. In this case the loop variable is assigned each of the keys in turn, not the values as happens with **lists**.

In [None]:
my_dict = {
    1 : "uno",
    2 : "dos",
    5 : "cinco",
    10 : "diez"
}

for k in my_dict:
    print(k, "corresponds to", my_dict[k])
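The loop above needs `my_dict[k]` to get each value. As a sketch that goes slightly beyond this lesson, **dictionaries** also have an `items()` **method** that, much like `enumerate()` does for **lists**, gives you the key and the value together at each iteration. The key order shown in the comment assumes Python 3.7 or later, where **dictionaries** keep their insertion order.

```python
my_dict = {
    1: "uno",
    2: "dos",
    5: "cinco",
    10: "diez",
}

# Iterating directly over a dictionary gives the keys only
keys_seen = []
for k in my_dict:
    keys_seen.append(k)
print(keys_seen)  # [1, 2, 5, 10]

# items() gives key and value together, similar to enumerate() for lists
for k, v in my_dict.items():
    print(k, "corresponds to", v)
```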

### Modifying a dictionary

The **dictionary** is a mutable **class**, like the **list**.
This means that you can modify the value for an existing key or add new key-value pairs.

Using the **indexing** operator on the left hand side of an assignment, you can modify the value in a **list** at the position indicated by the index.
The same applies to **dictionaries**, but you have to specify the key instead of the index.
Note that you can't modify an existing key, but only its associated value.

In a **list** you can add new elements at the end using the `append()` **method**.
This does not exist for **dictionaries** because key-value pairs do not have a meaningful order.
The same operator described above for modifying existing values can also be used to add new key-value pairs to a **dictionary**: if the provided key does not exist, it is automatically created.
This is different from **lists**, where the **indexing** operator gives you an out-of-bounds error regardless of whether you are trying to **read** or **write** values.

Remember that keys are unique in a **dictionary**.

In [None]:
my_dict = {
    "a" : 10
}

print(my_dict["a"])
my_dict["a"] = 20 # key "a" already exists in the dictionary, so modify its value
print(my_dict["a"])
my_dict["b"] = 40 # key "b" does not exist in the dictionary, so create a new key-value pair

print(my_dict)

Note that in the last example you are writing the **dictionary** entry for key `b`: `my_dict["b"]` is on the left-hand side of the assignment operator.

You can't **read** the value of a non-existing key (i.e. use it on the right-hand side of an assignment operator), but you can use a non-existing key to **write** a new key-value pair into the **dictionary**.
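Because keys are unique, writing to the same key twice never creates a second entry. The same holds inside a **dictionary** literal: if a key is listed twice, Python keeps a single entry holding the last value written. A small check (the `prices` name is only illustrative):

```python
# The key "a" appears twice in this literal: the later value replaces the earlier one
prices = {
    "a": 1,
    "b": 2,
    "a": 3,
}
print(prices)       # {'a': 3, 'b': 2}
print(len(prices))  # 2
```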

Note that the following two notations result in exactly the same **dictionary**.

In [None]:
# This syntax is preferred when you already know all the elements
a = {
    "a" : 10,
    "b" : 20
}
print("Dictionary a:", a)

# This syntax is preferred when you have to add key-value pairs according to some conditions
b = {}
b["a"] = 10
b["b"] = 20
print("Dictionary b:", b)

When creating **dictionaries**, you often need a different behavior depending on whether a certain key (and thus also its corresponding value) is already present or not.

In [None]:
my_dict = {}

my_keys = ["a", "b", "a", "a"]
for k in my_keys:
    if k in my_dict:
        my_dict[k] = my_dict[k] * 10
    else:
        my_dict[k] = 1

print(my_dict)

Remember that mutable **classes** in Python are copied by reference.

This means that if you copy a **dictionary** with the assignment operator and then modify one of the two, the change is visible through both names, as happens with **lists**.

You can avoid copying by reference with the same mechanism used for **lists**, i.e. by creating a new empty **dictionary** and adding all the key-value pairs of the original one to it.

Note that the **slicing** operator is not available for **dictionaries**.

In [None]:
a = {}
b = a

b["a"] = 1

print("a is:", a)
print("b is:", b)
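Both names above refer to the same **dictionary**. A minimal sketch of the mechanism described earlier, which builds an independent copy by adding each key-value pair of the original to a new empty **dictionary**:

```python
a = {"x": 1, "y": 2}

# Build an independent copy: start from an empty dictionary and add every pair
b = {}
for k in a:
    b[k] = a[k]

b["z"] = 3  # modify only the copy

print("a is:", a)  # a is: {'x': 1, 'y': 2}
print("b is:", b)  # b is: {'x': 1, 'y': 2, 'z': 3}
```

Only the outer **dictionary** is new: if some values are themselves mutable (e.g. **lists**), both **dictionaries** still refer to the same value objects.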

### Exercise

Define a function that, given an input number, returns a dictionary whose keys are all the integers from `1` to the input number (included) and whose values are their squares.

Hint: you should use the `range()` function in your loop.

In [None]:
# Input values
x = 2
y = 5

### Exercise

Define a function that takes as input a list of strings and returns a dictionary where each key is a character present in at least one of the strings and the value is its total number of occurrences across the whole list.

In [None]:
# Input lists
x = ["hello", "world", "test"]
y = ["what you doing?"]

### Exercise

Decoding is the inverse of the encoding operation: it converts encoded information back to its original form.
Use the encoding dictionary to create a decoding dictionary, i.e. its inverse, where each encoded value is a key and the corresponding original character is the value.

In [None]:
encoding_dict = {
    "_" : 0,
    "a" : 1,
    "b" : 2,
    "c" : 3,
    "d" : 4
}

z = [1, 4, 0, 2, 2, 0]

### Exercise

Define a function that takes as input a string and a list of strings representing a database.
The function should compare the input string with every string in the database and find the two database strings that are most similar to it.
The function should return a list of 2 elements containing these 2 strings, ordered from most to least similar.

Similarity is measured only as the number of positions at which the two strings have exactly the same character.

In [None]:
# The database
db = [
    "ATATATATATAT",
    "AGCTAGCTAGCT",
    "GCGCGCATATAT",
    "TGCAATGACGTA"
]

# Input strings
x = "AAAAAAAAAAAA"
y = "GAGAGACTCTCT"