# Dictionaries (mappings) and JSON

What we just saw was the [JSON](https://www.youtube.com/watch?v=b4QDxoWlPFw) format, which is commonly used online. Next semester, when we finally start using real data, it will most likely be in this format. We will avoid tabular data in _Python_, not because it is impossible to work with, but because for our purposes _R_ is usually better suited to the task.

In _Python_, the natural representation of `JSON` is a dictionary (often also referred to as a mapping). Any hashable object can be used as a key, for example an `int`, `float`, `bool`, or `str`, but also a function or a tuple of hashable objects. However, for now, we are going to use mostly strings or integers. In a way, a dictionary is similar to a list, except that we index it using **keys** rather than integer positions. You can think of it as a collection of key-value pairs. Like `JSON` objects, dictionaries are enclosed in curly braces, and each element is written as a key followed by a colon followed by a value. For example, consider the code below.

In [None]:
## Let's define an empty dictionary
empty_dct = {}

## Let's define a dictionary with some elements
nike = {
    "Dorota Masłowska": "Paw królowej",
    "Piotr Matywiecki": "Ta chmura powraca",
    "Zbigniew Mentzla": "Wszystkie języki świata",
    "Eustachy Rylski": "Warunek",
    "Wisława Szymborska": "Dwukropek",
    "Mariusz Wilk": "Wołoka",
    "Michał Witkowski": "Lubiewo",
}
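
The remark about hashable keys can be illustrated with a minimal sketch. The `board` dictionary and its contents are made up for this example; the point is only that a tuple works as a key while a list does not.

```python
# A tuple of immutable values is hashable, so it can serve as a key.
# The board below is a made-up example, not part of the course data.
board = {(0, 0): "rook", (0, 1): "knight"}
tuple_lookup = board[(0, 0)]
print(tuple_lookup)

# A list is mutable and therefore not hashable, so it cannot be a key.
try:
    bad = {[0, 0]: "rook"}
    list_key_worked = True
except TypeError as error:
    list_key_worked = False
    print(f"A list cannot be a key: {error}")
```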

Let's now see what the most basic operations on dictionaries are.

In [None]:
## Return the number of items (key-value pairs) of a dictionary
len(nike)

In [None]:
## Return a view of the keys in a dictionary
nike.keys()

In [None]:
## Return a view of the values in a dictionary
nike.values()

In [None]:
## Return a view of the (key, value) pairs in a dictionary
nike.items()

In [None]:
## Update a dictionary with the (key, value) pair, overwriting existing keys.
nike.update({"Dorota Masłowska": "Motyle"})
nike

In [None]:
## Return True if a key is in a dictionary
"Dorota Masłowska" in nike

In [None]:
## Return the value associated with the key (raises KeyError if the key is missing)
nike["Dorota Masłowska"]

In [None]:
## Return d[k] if k is in d, and v otherwise
nike.get("Dorota Masłowska", "Nie ma")

In [None]:
## Return d[k] if k is in d, and v otherwise
nike.get("Czesław Miłosz", "Nie ma")
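
The difference between square brackets and `get` can be sketched as follows. The `capitals` dictionary is a hypothetical example; the behaviour shown (a `KeyError` for a missing key, `None` as the default result of `get`) is standard for Python dictionaries.

```python
# Hypothetical data, used only for this illustration
capitals = {"Poland": "Warsaw", "France": "Paris"}

# Square brackets raise KeyError when the key is missing
try:
    capitals["Spain"]
except KeyError:
    print("No entry for Spain")

# get() returns None when no default is given, and the default otherwise
missing = capitals.get("Spain")
fallback = capitals.get("Spain", "unknown")
print(missing, fallback)
```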

In [None]:
## Associate the value with the key. If there is already a value associated with the key it is replaced
nike["Zbigniew Rokita"] = "Kajś"
nike

In [None]:
## Remove the key from the dictionary
del nike["Zbigniew Rokita"]

In [None]:
## Remove the key from a dictionary and return its value
title = nike.pop("Dorota Masłowska")
title

In [None]:
## Can we find a key by its value?
nike["Dwukropek"]

No, the lookup above raises a `KeyError`. Why is that? Because dictionaries (mappings) are meant to represent functions, which in mathematics are maps from one set to another ($f: A \to B$). Every element of $A$ (a key) is mapped to exactly one element of $B$ (a value), but multiple keys can be mapped to the same value. In other words, keys have to be unique, while values may repeat within the same dictionary (mapping), so a value does not identify a single key. For example, consider the code below.

In [None]:
ChL = {
    2010: "Turbine Potsdam",
    2011: "Olympique Lyon",
    2012: "Olympique Lyon",
    2013: "VfL Wolfsburg",
    2014: "VfL Wolfsburg",
    2015: "FFC Frankfurt",
    2016: "Olympique Lyon",
    2017: "Olympique Lyon",
    2018: "Olympique Lyon",
    2019: "Olympique Lyon",
    2020: "Olympique Lyon",
    2021: "FC Barcelona",
    2022: "Olympique Lyon",
    2023: "FC Barcelona",
}

ChL

Importantly, a dictionary keeps its keys in the order in which they were inserted (this is guaranteed since Python 3.7). Let's now iterate over the dictionary. There are multiple ways to use a `for` loop over the entries of a dictionary. The simplest is to just go over the keys. For example, consider the code below.

In [None]:
## Iterate over keys
for key in ChL:
    print(f"{ChL[key]} won Champions League in {key}.")

In [None]:
## Or by using method keys
for key in ChL.keys():
    print(f"{ChL[key]} won Champions League in {key}.")

We can also iterate over the values of a dictionary. Instead of the `dict.keys()` method we use the `dict.values()` method, which works very similarly. The values are again returned in insertion order.

In [None]:
## We can iterate over values
clubs = []
for value in ChL.values():
    clubs.append(value)

print(f"Champions League was won only by the following clubs {clubs}")

This is cool, but what we print does not make much sense because the names of the teams repeat. Can you try to fix it?

In [None]:
## Fix the code so that clubs contains only unique names
clubs = []
for value in ChL.values():
    if value not in clubs:
        clubs.append(value)

print(f"Champions League was won only by the following clubs {clubs}")
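
As an alternative to the loop above, duplicates can also be dropped with built-in types. This is a sketch on a made-up list called `names`, not the only way to do it: a `set` removes duplicates without promising any order, while `dict.fromkeys` relies on the fact that dictionary keys are unique and keep their insertion order.

```python
# Hypothetical list with repeated names
names = ["Lyon", "Lyon", "Wolfsburg", "Lyon", "Barcelona"]

# A set drops duplicates but does not promise any particular order
unique_unordered = set(names)

# dict.fromkeys keeps the first occurrence of each name, in order
unique_ordered = list(dict.fromkeys(names))
print(unique_ordered)
```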

Let's now use a dictionary to solve Task 1 from the homework assignment. Write a function that counts occurrences of integers in a list. It should take a list of integers as its only argument and return a dictionary with frequencies, i.e.

```python
input_list = [1, 2, 3, 3, 2, 4]
output_dict = {1 : 1, 2 : 2, 3 : 2, 4 : 1}
```
This time try to iterate over every single element of a list.

In [None]:
def dict_count(L1):
    """
    Return a dictionary with the frequencies of the elements of L1.

    Args:
        L1 (list): a list of values

    Returns:
        dict: dictionary with frequencies of elements of L1
    """
    output = {}
    for item in L1:
        if item in output:
            output[item] += 1
        else:
            output[item] = 1
    return output

In [None]:
import numpy as np

np.random.seed(1987)
rand_numbers = np.random.randint(0, 20, (100000,)).tolist()
dict_count(rand_numbers)
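
For comparison, the standard library already offers a counting tool, `collections.Counter`. The sketch below applies it to the small list from the task description; it is meant as a cross-check for a hand-written function, not as a replacement for the exercise.

```python
from collections import Counter

input_list = [1, 2, 3, 3, 2, 4]

# Counter is a dict subclass that counts hashable items
counts = Counter(input_list)
print(dict(counts))

# Unlike a plain dict, a Counter returns 0 for items it has not seen
unseen = counts[5]
print(unseen)
```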

What is more, we can iterate over both keys and values at the same time. To do so, we use the `dict.items()` method. In each iteration it yields a tuple of a key and its associated value, which we can unpack into two variables.

In [None]:
for key, value in ChL.items():
    print(f"{value} won the Champions League in {key}")
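
Earlier we saw that a value cannot be used directly to look up a key. With `dict.items()` we can build such a reverse lookup ourselves. Since several keys may share one value, the sketch below maps each value to a list of keys. The `winners` dictionary and the team names are placeholders invented for this example.

```python
# Hypothetical mapping in which two keys share one value
winners = {2019: "Team A", 2020: "Team A", 2021: "Team B"}

# Group the keys by value: each value maps to the list of its keys
years_by_team = {}
for year, team in winners.items():
    # setdefault returns the existing list, or inserts a new empty one
    years_by_team.setdefault(team, []).append(year)

print(years_by_team)
```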

Another important point is that we can store mappings in a list. For example, let's come back to Marian and Marianna.

In [None]:
## Let's define the mapping for Marianna
marianna_dict = {
    "name": "Marianna",
    "age": 17,
    "interests": [
        {"name": "physics", "field": ["quantum physics", "string theory"]},
        {"name": "sport", "field": ["fishing", "football"]},
    ],
}

## Let's define the mapping for Marian
marian_dict = {
    "name": "Marian",
    "age": 15,
    "interests": [{"name": "literature", "genre": ["poems"]}],
}

In [None]:
## We can store them in one list
mm_list = [marianna_dict, marian_dict]
mm_list

This looks more like the `JSON` lines file I showed you before, right?

In [None]:
## Access Marianna's name
mm_list[0]["name"]
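
Access to deeper levels works by chaining indices and keys. The sketch below uses `people`, a shortened copy of the structure above, so that it runs on its own; the variable names are chosen only for this example.

```python
# A shortened copy of the nested structure, so the example is self-contained
people = [
    {
        "name": "Marianna",
        "interests": [
            {"name": "physics", "field": ["quantum physics", "string theory"]}
        ],
    },
    {"name": "Marian", "interests": [{"name": "literature", "genre": ["poems"]}]},
]

# list index -> dict key -> list index -> dict key -> list index
second_field = people[0]["interests"][0]["field"][1]
print(second_field)

# Marian's interest has "genre" instead of "field", so get() avoids a KeyError
marian_fields = people[1]["interests"][0].get("field", [])
print(marian_fields)
```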

## Read a file

This is all very good. However, so far we have mainly played with data that we produced ourselves or typed into _Python_ from the keyboard. This is fun, but what we would like to do is load these famous `JSON` files into _Python_ and start playing with them. This is not that hard. We will now use the files I sent you via email; you can also download them from [here](https://classroom.google.com/c/NTQxNjE1OTc3MjYw/m/NjE0NDA3OTEwMTY3/details). In Colab, you need an extra step to load data into _Python_, but it is fairly straightforward. It is almost the same as uploading a file to Google Drive.

Once we have the file uploaded to our workspace, we can load it into _Python_ (you can think about it as opening the uploaded file in Google Docs).

In [None]:
## Open a connection (a bit like opening the file)
file = open("m.json", "r")
## Read the content of the file into a variable
marian = file.read()
## Close the connection (a bit like closing the file)
file.close()

In [None]:
## Let's see what it looks like
marian

In [None]:
## It does not look like a dict but rather a string
type(marian)

To turn this string, which contains the text of a dictionary, into an actual dictionary, we have to parse it. In other words, we convert it so that _Python_ recognizes it as a dictionary. Fortunately, we do not have to do it on our own. We are going to use an existing function from the `json` module. What _R_ calls a `library` is called a module in _Python_. To use a function from a specific module, we need to import it first. We have already seen an example of this with `numpy`.

In [None]:
## Import json module
import json

## Use loads function
json.loads(marian)

This is all very good, but it takes a lot of lines: we open the file, read it, close it, and only then use the new module. Is there a way to make it more concise? Yes, of course. Consider the code below, where the `with` statement closes the file for us automatically.

In [None]:
import json

with open("m.json", "r") as file:
    marian = json.loads(file.read())

marian
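
The `json` module also has `json.load` (without the `s`), which parses directly from an open file, so the separate `file.read()` call is not needed. The sketch below first writes a small made-up file to a temporary directory so that it runs on its own; the file name `example.json` and its content are assumptions for this example.

```python
import json
import os
import tempfile

# Write a small, hypothetical JSON file so the example is self-contained
path = os.path.join(tempfile.mkdtemp(), "example.json")
with open(path, "w", encoding="utf-8") as file:
    file.write('{"name": "Marian", "age": 15}')

# json.load reads and parses an open file in one step
with open(path, "r", encoding="utf-8") as file:
    person = json.load(file)

print(type(person), person["age"])
```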

That is fine, but what if we wanted to read a `JSON` lines file into _Python_? Our approach will be similar, but instead of parsing everything at once, we will do it line by line. We will use the method `file.readlines()`, which returns a list of lines, so we can convert every single line to a dictionary.

In [None]:
import json

with open("mm.jl", "r") as file:
    mm = [json.loads(item) for item in file.readlines()]

mm
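
A file object can also be iterated directly, one line at a time, which avoids building the full list of lines first. The sketch below additionally skips blank lines, since `json.loads` would fail on an empty string. The file `example.jl` and its content are made up so that the code runs on its own.

```python
import json
import os
import tempfile

# Create a small, hypothetical JSON lines file with a trailing blank line
path = os.path.join(tempfile.mkdtemp(), "example.jl")
with open(path, "w", encoding="utf-8") as file:
    file.write('{"name": "Marianna"}\n{"name": "Marian"}\n\n')

records = []
with open(path, "r", encoding="utf-8") as file:
    # Iterating over the file yields one line at a time
    for line in file:
        # Skip lines that contain only whitespace
        if line.strip():
            records.append(json.loads(line))

print(records)
```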

## Write out the data

This is all very good. But let's now imagine that we want to save data we produced in _Python_ in the form of `JSON`. Suppose we promised to help a friend who happens to be a librarian. We got the following variables from somewhere, and we want to create a dictionary with the record.

In [None]:
title = "Gone with the wind"
author = "Margaret Mitchell"
blurb = """
Scarlett O'Hara, the beautiful, spoiled daughter of a well-to-do
Georgia plantation owner, must use every means at her disposal to
claw her way out of the poverty she finds herself in after Sherman's
March to the Sea.
"""
pages = 1037
published = 1936
lang = "en"

In [None]:
## Create the dictionary with data from the previous chunk
record = {}

So, how do we save it now? It is very similar to reading the data.

In [None]:
import json

with open("gone_with_the_wind.json", "w") as file:
    file.write(json.dumps(record))
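
Two optional arguments of `json.dumps` are worth knowing when the data contain non-English characters. By default, non-ASCII characters are written as escape sequences; `ensure_ascii=False` keeps them readable, and `indent` spreads the output over several lines. The `book` dictionary below is a stand-in used only for this sketch.

```python
import json

# Hypothetical record containing a non-ASCII character
book = {"author": "Wisława Szymborska", "title": "Dwukropek"}

# Default: non-ASCII characters are escaped, everything is on one line
escaped = json.dumps(book)
print(escaped)

# Keep the characters as they are and indent nested levels by two spaces
readable = json.dumps(book, ensure_ascii=False, indent=2)
print(readable)
```

Both strings parse back to the same dictionary, so the choice only affects how the file looks to a human reader. When writing the readable version to a file, it is safer to open the file with `encoding="utf-8"`.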

Let's now try to read the file and check whether it is the same as the record.

In [None]:
## YOUR CODE

OK, but how do we save a list of dictionaries? In that case, we produce a `JSON` lines file. It is similar to how we read one. This time, however, we iterate over our list in _Python_ and write every element as a separate line, that is, a separate `JSON` document, in the file.

In [None]:
import json

with open("mm2.jl", "w") as file:
    for line in mm_list:
        file.write(json.dumps(line) + "\n")
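
To convince ourselves that nothing is lost on the way, we can write a list of dictionaries to a `JSON` lines file and read it back. The sketch below does this in a temporary directory with a made-up list called `people`, so it does not touch the course files.

```python
import json
import os
import tempfile

# Hypothetical list of records
people = [{"name": "Marianna", "age": 17}, {"name": "Marian", "age": 15}]

path = os.path.join(tempfile.mkdtemp(), "people.jl")

# Write one JSON document per line
with open(path, "w", encoding="utf-8") as file:
    for person in people:
        file.write(json.dumps(person) + "\n")

# Read the file back, parsing each line separately
with open(path, "r", encoding="utf-8") as file:
    restored = [json.loads(line) for line in file]

print(restored == people)
```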