# Session 10: JSON files

What is JSON? It's a lightweight, text-based way of representing structured data, independent of the platform or programming language you're using.

In JSON, data is organized as key-value pairs (objects) and ordered lists (arrays). Everything is stored as plain text, so a whole JSON document can be held in a single string.

The Python module for working with JSON is `json`, which is part of the standard library. We import it with:

```Python
import json
```

### JSON summary
* **JSON stands for JavaScript Object Notation**
* JSON is a lightweight data-interchange format
* JSON is plain text written in JavaScript object notation
* JSON is used to send data between computers
* JSON is language independent
* JSON is "self-describing" and easy to understand

In [1]:
import json

We can read JSON files with `json.load()` and write to JSON files with `json.dump()`.
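Besides the file-based `json.dump()` and `json.load()`, the `json` module has string counterparts, `json.dumps()` and `json.loads()` (the trailing "s" stands for "string"). A minimal sketch of the round trip, using a small dictionary made up for illustration:

```python
import json

# a small dictionary, made up for illustration
person = {"name": "Ada", "age": 36, "languages": ["English", "French"]}

# serialize the Python object into a JSON-formatted string
text = json.dumps(person)
print(text)  # {"name": "Ada", "age": 36, "languages": ["English", "French"]}

# parse the string back into an equivalent Python dictionary
restored = json.loads(text)
print(restored == person)  # True
```

The string versions are handy when the JSON does not live in a file, for example when it arrives as the body of an API response.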

In [2]:
# let's create a dictionary to store in a json file

dani_info = {
    "name": "Daniel",
    "last_names": ["Garcia", "Hernandez"],
    "email": "dgarciah@faculty.ie.edu",
    "age": 33,
    "vehicles": [
        {
            "type": "car",
            "brand": "Nissan",
            "model": "Pulsar",
            "age": 6
        },
        {
            "type": "motorbike",
            "brand": "BMW",
            "model": "F800GT",
            "age": 3
        }
    ],
    "pets":[
        {
            "name": "Churro",
            "species": "dog",
            "age": 8
        }
    ]
        
}

dani_info

{'name': 'Daniel',
 'last_names': ['Garcia', 'Hernandez'],
 'email': 'dgarciah@faculty.ie.edu',
 'age': 33,
 'vehicles': [{'type': 'car', 'brand': 'Nissan', 'model': 'Pulsar', 'age': 6},
  {'type': 'motorbike', 'brand': 'BMW', 'model': 'F800GT', 'age': 3}],
 'pets': [{'name': 'Churro', 'species': 'dog', 'age': 8}]}

In [4]:
# we dump info into a json file with `json.dump`

with open("../files/dani_info.json", "w") as f:
    json.dump(dani_info, f)

In [6]:
# read json file we just created

with open("../files/dani_info.json") as json_file:
    dani_dictionary = json.load(json_file)

dani_dictionary

{'name': 'Daniel',
 'last_names': ['Garcia', 'Hernandez'],
 'email': 'dgarciah@faculty.ie.edu',
 'age': 33,
 'vehicles': [{'type': 'car', 'brand': 'Nissan', 'model': 'Pulsar', 'age': 6},
  {'type': 'motorbike', 'brand': 'BMW', 'model': 'F800GT', 'age': 3}],
 'pets': [{'name': 'Churro', 'species': 'dog', 'age': 8}]}

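### Types of data we can include in JSON:
* Strings
* Numbers: integers and floats
* Booleans: `True` and `False`
  * The `json` library converts `True` to `true` and `False` to `false`
* `None`, which is written as `null`
* Lists (JSON arrays) and dictionaries (JSON objects, whose keys are always stored as strings), which can be nested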

In [9]:
data_types = [
    {"type": "string", "examples": "hello!"},
    {"type": "numeric", "examples": [145, 0.0052]},
    {"type": "boolean", "examples": [True, False]},
]

with open("../files/data_types.json", "w") as json_file:
    json.dump(data_types, json_file)
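To see exactly how Python values are translated, we can serialize one value of each type to a string with `json.dumps()` and read it back. A sketch with made-up values; note that a tuple is written as a JSON array and therefore comes back as a list, and that types with no JSON equivalent, such as sets, raise a `TypeError`:

```python
import json

# one value of each JSON-compatible Python type (values made up for illustration)
sample = {
    "text": "hello!",
    "integer": 145,
    "float": 0.0052,
    "flag": True,
    "missing": None,  # Python None becomes JSON null
    "pair": (1, 2),   # tuples are written as JSON arrays
}

text = json.dumps(sample)
print(text)
# {"text": "hello!", "integer": 145, "float": 0.0052, "flag": true, "missing": null, "pair": [1, 2]}

# reading it back: true/false/null map to True/False/None, but the tuple comes back as a list
restored = json.loads(text)
print(restored["pair"])  # [1, 2]

# sets have no JSON equivalent, so json.dumps refuses them
try:
    json.dumps({"tags": {"a", "b"}})
except TypeError as err:
    print("Not JSON serializable:", err)
```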

We can update an existing JSON file by adding new key-value pairs to it or by changing existing ones:
* Let's add a new key called `favorites` including my favorite food, drink, band, and videogame
* Let's change my age to 34 --getting ready for my birthday in December :D

How to do that:
1. Read the JSON file with `json.load()` into a Python object we can modify
2. Make the changes we need
3. Use `json.dump()` to save the modified object under the same file name, overwriting the original file
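The three steps above can be wrapped in a small reusable function. This is a sketch, not a standard library feature: `update_json_file` is a hypothetical name, and it assumes the file holds a JSON object (a dictionary) at the top level.

```python
import json

def update_json_file(path, changes):
    """Hypothetical helper: read a JSON object from `path`, merge `changes` into it, write it back.

    Assumes the top-level JSON value is an object (a Python dict), not a list.
    """
    # 1. read the current content into a Python dict
    with open(path, encoding="utf-8") as json_file:
        data = json.load(json_file)

    # 2. apply the changes: new keys are added, existing keys are overwritten
    data.update(changes)

    # 3. write the whole object back under the same file name
    with open(path, "w", encoding="utf-8") as json_file:
        json.dump(data, json_file)

    return data
```

For example, `update_json_file("../files/dani_info.json", {"age": 34})` would perform the age change. Keep in mind that `dict.update()` is shallow: passing a new `"favorites"` dictionary replaces the old one entirely instead of merging the two.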

In [10]:
# first we read the JSON file
with open("../files/dani_info.json") as json_file:
    dani_info = json.load(json_file)
    
# update the content
# new key: favorites
dani_info["favorites"] = {
    "food": "döner kebab",
    "drink": "sparkling water",
    "band": "Tame Impala",
    "videogame": "Metal Gear Solid"
}

# changing the `age` key's value from 33 to 34
dani_info["age"] = 34

# now we store the new information as JSON
with open("../files/dani_info.json", "w") as json_file:
    json.dump(dani_info, json_file)

In [12]:
# we can see now that the info has been updated ;)
with open("../files/dani_info.json") as json_file:
    dani_info = json.load(json_file)
    
dani_info

{'name': 'Daniel',
 'last_names': ['Garcia', 'Hernandez'],
 'email': 'dgarciah@faculty.ie.edu',
 'age': 34,
 'vehicles': [{'type': 'car', 'brand': 'Nissan', 'model': 'Pulsar', 'age': 6},
  {'type': 'motorbike', 'brand': 'BMW', 'model': 'F800GT', 'age': 3}],
 'pets': [{'name': 'Churro', 'species': 'dog', 'age': 8}],
 'favorites': {'food': 'döner kebab',
  'drink': 'sparkling water',
  'band': 'Tame Impala',
  'videogame': 'Metal Gear Solid'}}
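By default, `json.dump()` writes everything on a single line and escapes non-ASCII characters, so "döner kebab" is stored on disk as `"d\u00f6ner kebab"` (it still loads back correctly). For files meant to be read by humans, the `indent` and `ensure_ascii` arguments help. A sketch using `json.dumps()` on a small made-up dictionary so the result is easy to inspect:

```python
import json

favorites = {"food": "döner kebab", "drink": "sparkling water"}

# default: compact single line, non-ASCII characters escaped as \uXXXX
print(json.dumps(favorites))
# {"food": "d\u00f6ner kebab", "drink": "sparkling water"}

# indent=4 adds line breaks and indentation; ensure_ascii=False keeps "ö" as is
pretty = json.dumps(favorites, indent=4, ensure_ascii=False)
print(pretty)
# {
#     "food": "döner kebab",
#     "drink": "sparkling water"
# }
```

The same keyword arguments work with `json.dump()`. When writing non-ASCII text to a file this way, open the file with `encoding="utf-8"` so the result does not depend on the operating system's default encoding.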

## When to use JSON files IRL

JSON is used almost everywhere, especially when information needs to flow between systems, for example in requests to and responses from web APIs.

It's also a very useful format to store configurations of Data Science models and projects.

For example, we can store the initial parameters of the algorithms, the resources allocated to the machines, and useful variables that we don't want to hard-code in our codebase, and read them from a `config.json` file instead.

In [15]:
# example of information to store in a JSON file for Machine Learning
config_dict = {
    "number_of_trees": 50,
    "n_of_branches": 5,
    "nodes": 4,
}
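The cell above only builds the dictionary. A minimal sketch of the full round trip, writing it to a `config.json` file and reading it back where it is needed, could look like this. The temporary directory, the `load_config` helper, its `defaults` argument and the `random_seed` key are all hypothetical additions for illustration:

```python
import json
import os
import tempfile

config_dict = {
    "number_of_trees": 50,
    "n_of_branches": 5,
    "nodes": 4,
}

def load_config(path, defaults):
    """Hypothetical helper: read a JSON config file and fill in missing keys from `defaults`."""
    with open(path, encoding="utf-8") as config_file:
        loaded = json.load(config_file)
    # values from the file take precedence over the defaults
    return {**defaults, **loaded}

# a temporary folder stands in for the project directory
with tempfile.TemporaryDirectory() as project_dir:
    config_path = os.path.join(project_dir, "config.json")

    # store the configuration once, outside the code
    with open(config_path, "w", encoding="utf-8") as config_file:
        json.dump(config_dict, config_file, indent=4)

    # later, read it back instead of hard-coding the values
    config = load_config(config_path, defaults={"number_of_trees": 100, "random_seed": 42})

print(config)
# {'number_of_trees': 50, 'random_seed': 42, 'n_of_branches': 5, 'nodes': 4}
```

Changing a parameter then means editing `config.json`, not the code.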

## Practice: JSON data

In [16]:
with open("../files/bike_accidents_2021.json", encoding="utf-8") as f:
    accidents = json.load(f)

In [17]:
accidents[0]

{'date': '2021-01-01',
 'time': '11:38',
 'street': 'CALL. JOSE BERGAMIN / CALL. FLORENCIO CANO CRISTOBAL',
 'number': '62',
 'district': 'MORATALAZ',
 'weather': 'clear',
 'sex': 'male',
 'alcohol_positive': 'N'}

### 1. Include the following new keys in each item in the JSON file:
* Hour of the accident
* Day of the week
* Weekend or not

In [18]:
# working with date and time with the module `datetime`
from datetime import datetime

dt_to_convert = "2021-01-01"

datetime.strptime(dt_to_convert, "%Y-%m-%d").weekday()

4
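`weekday()` counts from Monday = 0 to Sunday = 6, so the `4` above means 1 January 2021 was a Friday. This convention is why the next cell treats `day_of_week > 4` (Saturday and Sunday) as the weekend. A short sketch comparing it with `isoweekday()`, which counts from Monday = 1 to Sunday = 7:

```python
from datetime import datetime

dt_object = datetime.strptime("2021-01-01", "%Y-%m-%d")

print(dt_object.weekday())     # 4  (Monday=0 ... Sunday=6)
print(dt_object.isoweekday())  # 5  (Monday=1 ... Sunday=7)

# Saturday 2 January 2021 counts as weekend under the weekday() convention
saturday = datetime.strptime("2021-01-02", "%Y-%m-%d")
print(saturday.weekday() > 4)  # True
```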

In [19]:
# let's create the new keys for the accidents

for accident in accidents:
    accident["hour"] = accident["time"].split(":")[0]
    dt_object = datetime.strptime(accident["date"], "%Y-%m-%d")
    day_of_week = dt_object.weekday()
    accident["day_of_week"] = day_of_week
    # weekday(): Monday=0 ... Sunday=6, so 5 and 6 are the weekend
    accident["weekend"] = accident["day_of_week"] > 4
    
accidents[0]

{'date': '2021-01-01',
 'time': '11:38',
 'street': 'CALL. JOSE BERGAMIN / CALL. FLORENCIO CANO CRISTOBAL',
 'number': '62',
 'district': 'MORATALAZ',
 'weather': 'clear',
 'sex': 'male',
 'alcohol_positive': 'N',
 'hour': '11',
 'day_of_week': 4,
 'weekend': False}

### 2. Build a dictionary with the district as key and the total number of accidents as value.

Save the results as a JSON file called `accidents_per_district.json`

In [20]:
new_dict = {}

districts = {
    accident["district"] for accident in accidents
}

for district in districts:
    number_of_accidents = 0
    for accident in accidents:
        if accident["district"] == district:
            number_of_accidents += 1
            
    new_dict[district] = number_of_accidents
    
with open("accidents_per_district.json", "w") as f:
    json.dump(new_dict, f)
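The nested loop above scans the whole `accidents` list once per district. The standard library's `collections.Counter` produces the same counts in a single pass. A sketch using a few made-up records in place of the real dataset (only the `district` key matters here):

```python
import json
from collections import Counter

# made-up records for illustration; the real list comes from bike_accidents_2021.json
accidents = [
    {"district": "MORATALAZ"},
    {"district": "CENTRO"},
    {"district": "MORATALAZ"},
]

# count how many times each district appears, in one pass over the list
accidents_per_district = Counter(accident["district"] for accident in accidents)
print(accidents_per_district)  # Counter({'MORATALAZ': 2, 'CENTRO': 1})

# Counter is a dict subclass, so json.dump/json.dumps can serialize it directly
print(json.dumps(accidents_per_district))  # {"MORATALAZ": 2, "CENTRO": 1}
```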

### 3. What's the proportion of accidents with a `sex=male` involved? And `sex=female`?

In [21]:
sexes = {
    accident["sex"] for accident in accidents
}

accidents_per_sex = {}

for sex in sexes:
    counter = 0
    for accident in accidents:
        if accident["sex"] == sex:
            counter += 1
            
    accidents_per_sex[sex] = counter
    
total_accidents = sum(accidents_per_sex.values())

prop_male = accidents_per_sex["male"] / total_accidents
prop_female = accidents_per_sex["female"] / total_accidents

print(f"Proportion of accidents involving men: {prop_male}")
print(f"Proportion of accidents involving women: {prop_female}")

Proportion of accidents involving men: 0.7680851063829788
Proportion of accidents involving women: 0.2127659574468085
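The two proportions add up to about 0.98, not 1, so some accidents in the dataset have a `sex` value other than `male` or `female`. Computing the proportion of every category at once avoids silently leaving any value out. A sketch with made-up records, where `"unknown"` is an illustrative placeholder rather than a value confirmed in the real file:

```python
from collections import Counter

# made-up records for illustration; only the "sex" key is used
accidents = [
    {"sex": "male"},
    {"sex": "male"},
    {"sex": "female"},
    {"sex": "unknown"},
]

counts = Counter(accident["sex"] for accident in accidents)
total = sum(counts.values())

# proportion of every category, so that no value is silently left out
proportions = {sex: count / total for sex, count in counts.items()}
print(proportions)  # {'male': 0.5, 'female': 0.25, 'unknown': 0.25}
```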


### 4. Hour of the day with the most accidents

In [22]:
new_dict = {}

hours = {
    accident["hour"] for accident in accidents
}

for h in hours:
    number_of_accidents = 0
    for accident in accidents:
        if accident["hour"] == h:
            number_of_accidents += 1
            
    new_dict[h] = number_of_accidents
    
sorted_dict = sorted(new_dict.items(), key=lambda x: x[1])

hour_with_more_accidents = sorted_dict[-1]

hour_with_more_accidents

('19', 39)
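Sorting the whole dictionary only to take its last item works, but `max()` with a `key` function, or `Counter.most_common()`, returns the top entry directly. A sketch with a handful of made-up hour values, stored as strings like the `hour` key created earlier:

```python
from collections import Counter

# made-up hours for illustration
hours = ["19", "08", "19", "14", "08", "19"]

accidents_per_hour = Counter(hours)

# most_common(1) returns a list holding the single (hour, count) pair with the highest count
print(accidents_per_hour.most_common(1))  # [('19', 3)]

# the same result from a plain dict with max() and a key function
as_dict = dict(accidents_per_hour)
print(max(as_dict.items(), key=lambda item: item[1]))  # ('19', 3)
```

If several hours tie for the highest count, these approaches return the first one encountered, while the `sorted(...)[-1]` version above returns the last one, so the reported hour can differ.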

### 5. Is bad weather correlated with more accidents?

In [23]:
weathers = {
    accident["weather"] for accident in accidents
}

accidents_per_weather = {}

for weather in weathers:
    counter = 0
    for accident in accidents:
        if accident["weather"] == weather:
            counter += 1
            
    accidents_per_weather[weather] = counter
    
accidents_per_weather

{'clear': 407, 'heavy rain': 3, 'light rain': 24, 'cloudy': 14, 'unknown': 22}
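Raw counts alone can't answer the question: clear weather likely dominates simply because most of the year is clear. A fairer comparison divides each count by how much time was spent in each condition, for example the number of days with that weather in 2021. That exposure data is not part of the accidents file, so the day counts in this sketch are hypothetical placeholders, not real figures:

```python
# accident counts per weather condition, as computed above
accidents_per_weather = {"clear": 407, "heavy rain": 3, "light rain": 24, "cloudy": 14, "unknown": 22}

# HYPOTHETICAL number of days with each condition; replace with real weather records
days_per_weather = {"clear": 300, "heavy rain": 5, "light rain": 30, "cloudy": 30}

# accidents per day under each known condition ("unknown" has no exposure figure, so it is skipped)
accidents_per_day = {
    weather: accidents_per_weather[weather] / days
    for weather, days in days_per_weather.items()
}

# print conditions from highest to lowest rate
for weather, rate in sorted(accidents_per_day.items(), key=lambda item: item[1], reverse=True):
    print(f"{weather}: {rate:.2f} accidents per day")
```

Even with real figures, fewer people may cycle in the rain, so a rate per day still mixes the risk of the weather itself with how many cyclists are out.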