# Module 1: JSON files

There are many different file types. Some are more common than others, and some require more work than others. JSON is one of the most important file types.

During this training we will go into detail about JSON files and how to work with them. If you are familiar with Python dictionaries, JSON will feel quite similar. The training follows this outline:
1. JSON file basics
2. The json library
3. Navigating a JSON file structure
4. JSON to information
5. Nested JSON files

Enjoy!

Run the following cell to import all necessary libraries.

In [1]:
import json
import requests
from pprint import pprint
import pandas as pd

## Section 1: JSON file basics

JSON, or **J**ava**S**cript **O**bject **N**otation, is one of the gold standards for information exchange in the world of data. When transporting data, for example through an API, JSON is often the way to go. JSON was designed to be readable by many programming languages, and that includes Python.

Because of its structure and its widespread availability and accessibility, knowledge of JSON is essential for an aspiring data engineer. So let's have a look at these so-called JSON files. In the example below we will retrieve a JSON file through an API. The example is about Game of Thrones.

In [2]:
URL = "https://anapioficeandfire.com/api/characters/583"
jon_snow_json = requests.get(URL).json()

pprint(jon_snow_json)

{'aliases': ['Lord Snow',
             "Ned Stark's Bastard",
             'The Snow of Winterfell',
             'The Crow-Come-Over',
             "The 998th Lord Commander of the Night's Watch",
             'The Bastard of Winterfell',
             'The Black Bastard of the Wall',
             'Lord Crow'],
 'allegiances': ['https://anapioficeandfire.com/api/houses/362'],
 'books': ['https://anapioficeandfire.com/api/books/5'],
 'born': 'In 283 AC',
 'culture': 'Northmen',
 'died': '',
 'father': '',
 'gender': 'Male',
 'mother': '',
 'name': 'Jon Snow',
 'playedBy': ['Kit Harington'],
 'povBooks': ['https://anapioficeandfire.com/api/books/1',
              'https://anapioficeandfire.com/api/books/2',
              'https://anapioficeandfire.com/api/books/3',
              'https://anapioficeandfire.com/api/books/8'],
 'spouse': '',
 'titles': ["Lord Commander of the Night's Watch"],
 'tvSeries': ['Season 1',
              'Season 2',
              'Season 3',
              'Season

The example above shows the structure of a JSON file. As you can see, a JSON file is very structured.
JSON files are based on key-value pairs, with each key corresponding to a specific value. The keys can be used to navigate around the JSON file.
One important thing to note is that JSON files are extremely flexible. Almost any string can be used as a key.
The values are flexible too: a value can be a string, a number, a boolean, null, a list, or another JSON object.

The JSON example above is retrieved from a service about the world of Game of Thrones (spoilers). We retrieved information about Jon Snow, an important character in the books and show. The JSON file contains information about this character. Each key contains values with information about what is known. For example, you can see the following things in the JSON file.
- The name, which is of course Jon Snow.
- There is also information about the father, and the mother (both are empty in this case).
- We can find which seasons and which books the character appears in.
- And, interestingly, a list of aliases for the character.

If you look closely at the JSON, you can find these pieces of information, as well as others.

One thing to remember is that you can think of a JSON document as one long string. Using Python's json library we can decode such a string into Python objects and work with them. We can also go the other way and encode Python objects as JSON strings.

For those with experience in Python: a JSON structure is very similar to the structure of a Python dictionary.
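
To make the similarity concrete, here is a minimal sketch that decodes a small hand-written JSON string and prints the Python type of each value. The string `raw` and its contents are made up for illustration and do not come from the API.

```python
import json

# A small hand-written JSON document (illustrative values, not API output).
raw = (
    '{"name": "Jon Snow", "aliases": ["Lord Snow"], "age": 16, '
    '"height": 1.73, "alive": true, "spouse": null}'
)

decoded = json.loads(raw)

# Print each key together with the Python type of its decoded value.
for key, value in decoded.items():
    print(f"{key}: {type(value).__name__}")
```

JSON objects become dictionaries, arrays become lists, `true` and `false` become `True` and `False`, and `null` becomes `None`.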

## Section 2: The json library

Now that we have had a look at the structure of a JSON file, we want to work with it, and we want to do so in Python!
For working with JSON files we can use the appropriately named json library.

With the json library we can do many things with JSON files, including:
- Decode JSON files so that we can use them within Python.
- Encode Python objects so that we can store them as JSON files.

The json library is essential in working with JSON files within Python. As with (almost) all decent libraries, there is an extensive amount of documentation that can help you understand the functionalities of the library. 

While working with Python it is essential that you learn how to read documentation. This will help speed your work up, and improve your understanding of the library. So, have a look: https://docs.python.org/3/library/json.html. 

Let's see what we can do with the json library. We're going to have a look at four of its functions:
- json.dumps
- json.dump
- json.loads
- json.load

Let's first create a Python dictionary that we can use as a basis for our examples. See below.

In [3]:
json_structure_example = {
    "name": "Roger Federer",
    "age": 40,
    "occupation": "Professional tennis player",
}

Using the 'json.dumps' function we can convert the Python dictionary to a JSON string. Python then treats the result as an ordinary string.

In [4]:
# Use json.dumps
json_string = json.dumps(json_structure_example)

print(json_string)
print(type(json_string))

{"name": "Roger Federer", "age": 40, "occupation": "Professional tennis player"}
<class 'str'>


We can also save our dictionary as a JSON file. For that we can use the 'json.dump' function.

In [5]:
# Use json.dump
file_name = "my_first_json.json"

with open(file_name, "w") as file:
    json.dump(json_structure_example, file, indent=4)

So, we can use 'json.dumps' to create a JSON string, and we can use 'json.dump' to create a JSON string and save it as a JSON file.
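
The relationship between the two can be sketched as follows. The example assumes a small dictionary named `player` and uses `io.StringIO` as a stand-in for a real file, so nothing is written to disk.

```python
import io
import json

player = {"name": "Roger Federer", "age": 40}

# json.dumps returns the JSON text as a str.
as_string = json.dumps(player, indent=4)

# json.dump writes the same text to a file-like object instead of returning it.
buffer = io.StringIO()
json.dump(player, buffer, indent=4)

print(as_string)
print(buffer.getvalue() == as_string)
```

The `indent` argument only changes the whitespace of the output, which makes the file easier to read for humans. The data itself stays the same.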

Now let's look at reading JSON. The 'json.loads' function reads a JSON string and converts it to a Python dictionary.

In [6]:
print(json_string)
print(type(json_string))

converted_json_string = json.loads(json_string)
print(converted_json_string)
print(type(converted_json_string))

{"name": "Roger Federer", "age": 40, "occupation": "Professional tennis player"}
<class 'str'>
{'name': 'Roger Federer', 'age': 40, 'occupation': 'Professional tennis player'}
<class 'dict'>


And using the 'json.load' function, we can read JSON files and load them as Python dictionaries.

In [7]:
file_name = "my_first_json.json"

with open(file_name, "r") as file:
    loaded_json = json.load(file)

print(loaded_json)

{'name': 'Roger Federer', 'age': 40, 'occupation': 'Professional tennis player'}
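
Both 'json.loads' and 'json.load' raise a `json.JSONDecodeError` when the input is not valid JSON. The sketch below shows one possible way to handle that. The helper name `parse_or_none` is made up for this example.

```python
import json


def parse_or_none(text):
    """Return the decoded object, or None when text is not valid JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as error:
        # The exception carries a message and the position of the problem.
        print(f"Invalid JSON: {error.msg} (line {error.lineno}, column {error.colno})")
        return None


good = parse_or_none('{"name": "Roger Federer"}')
# JSON requires double quotes around strings, so this input is rejected.
bad = parse_or_none("{'name': 'Roger Federer'}")

print(good)
print(bad)
```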


Now that we have seen the most important functions of the json library, it's your turn to try them out. Please complete the following assignments.

#### Assignment 1: The json library 1

Create your own Python dictionary, with your name, age and occupation.

In [8]:
my_dict: dict = {"name": "Chris", "age": 27, "occupation": "Software Engineer"}

#### Assignment 2: The json library 2

Convert your dictionary to a JSON string, and print it.
Use the 'json.dumps' function.

In [9]:
json.dumps(my_dict)

'{"name": "Chris", "age": 27, "occupation": "Software Engineer"}'

#### Assignment 3: The json library 3

Save your dictionary as a JSON file with the name "my_second_json.json".
Use the 'json.dump' function.

In [10]:
filename = "my_second_json.json"
with open(filename, "w") as file:
    json.dump(my_dict, file)

#### Assignment 4: The json library 4

Read your saved JSON file and print it. It should have the name: "my_second_json.json".
Use the 'json.load' function.

In [11]:
with open(filename, "r") as file:
    pprint(json.load(file))

{'age': 27, 'name': 'Chris', 'occupation': 'Software Engineer'}


#### Assignment 5: The json library 5

Create a JSON string from your Python dictionary, and then convert it back to a Python dictionary.
First use the 'json.dumps' function, and then use the 'json.loads' function.

In [12]:
json.loads(json.dumps(my_dict))

{'name': 'Chris', 'age': 27, 'occupation': 'Software Engineer'}

Good job! These steps should give you some insight into how JSON files are structured, and how we can read, load and save them within Python.

## Section 3: Navigating a JSON file structure

We have had a first taste of the json library: saving a JSON file and loading one.
Let's move on to the most important part of working with JSON files: navigating them.

The json library can convert a JSON file to a Python dictionary, and that's how we can navigate it. This means that you can index JSON data by its keys.

Let's have a look at an example. 

In [13]:
# You can retrieve the value for any key by indexing the dictionary with that key.
json_structure_example = {
    "name": "Roger Federer",
    "age": 40,
    "occupation": "Professional tennis player",
}

print(json_structure_example["name"])

Roger Federer


In [14]:
for key in json_structure_example:
    print(key)
    print(json_structure_example[key])

name
Roger Federer
age
40
occupation
Professional tennis player
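
Indexing with square brackets only works for keys that exist. The following sketch, using a small dictionary named `player` as an assumption, shows what happens with a missing key and two common ways to deal with it.

```python
player = {"name": "Roger Federer", "age": 40}

# Square-bracket indexing raises a KeyError for a key that does not exist.
try:
    player["nationality"]
except KeyError as error:
    print(f"Missing key: {error}")

# dict.get returns None, or a default of your choice, instead of raising.
nationality = player.get("nationality", "unknown")
print(nationality)

# The 'in' operator checks whether a key is present.
print("age" in player)
```

Using `get` is convenient for API responses where some keys may be absent for some entries.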


This is how you can access information in JSON files. Now it's your turn to try it. Let's use the example about Jon Snow.

In [15]:
URL = "https://anapioficeandfire.com/api/characters/583"
jon_snow_json = requests.get(URL).json()

pprint(jon_snow_json)

{'aliases': ['Lord Snow',
             "Ned Stark's Bastard",
             'The Snow of Winterfell',
             'The Crow-Come-Over',
             "The 998th Lord Commander of the Night's Watch",
             'The Bastard of Winterfell',
             'The Black Bastard of the Wall',
             'Lord Crow'],
 'allegiances': ['https://anapioficeandfire.com/api/houses/362'],
 'books': ['https://anapioficeandfire.com/api/books/5'],
 'born': 'In 283 AC',
 'culture': 'Northmen',
 'died': '',
 'father': '',
 'gender': 'Male',
 'mother': '',
 'name': 'Jon Snow',
 'playedBy': ['Kit Harington'],
 'povBooks': ['https://anapioficeandfire.com/api/books/1',
              'https://anapioficeandfire.com/api/books/2',
              'https://anapioficeandfire.com/api/books/3',
              'https://anapioficeandfire.com/api/books/8'],
 'spouse': '',
 'titles': ["Lord Commander of the Night's Watch"],
 'tvSeries': ['Season 1',
              'Season 2',
              'Season 3',
              'Season

In the following assignments you will try to navigate the JSON file.

#### Assignment 6: Navigating a JSON file structure 1

From the 'jon_snow_json', print all keys.
Use a for loop.

In [16]:
for key in jon_snow_json:
    print(key)

url
name
gender
culture
born
died
titles
aliases
father
mother
spouse
allegiances
books
povBooks
tvSeries
playedBy


#### Assignment 7: Navigating a JSON file structure 2

From the 'jon_snow_json', print all keys that have empty values.
Use a for loop.

In [17]:
for key, value in jon_snow_json.items():
    if value == "":
        print(key)

died
father
mother
spouse


#### Assignment 8: Navigating a JSON file structure 3

From the 'jon_snow_json', print every value that is part of a list.
Use a for loop.

In [18]:
for key, value in jon_snow_json.items():
    if isinstance(value, list):
        print(value)

["Lord Commander of the Night's Watch"]
['Lord Snow', "Ned Stark's Bastard", 'The Snow of Winterfell', 'The Crow-Come-Over', "The 998th Lord Commander of the Night's Watch", 'The Bastard of Winterfell', 'The Black Bastard of the Wall', 'Lord Crow']
['https://anapioficeandfire.com/api/houses/362']
['https://anapioficeandfire.com/api/books/5']
['https://anapioficeandfire.com/api/books/1', 'https://anapioficeandfire.com/api/books/2', 'https://anapioficeandfire.com/api/books/3', 'https://anapioficeandfire.com/api/books/8']
['Season 1', 'Season 2', 'Season 3', 'Season 4', 'Season 5', 'Season 6']
['Kit Harington']


JSON data can also be manipulated. Once it is loaded as a dictionary, you can add or adjust information as needed. In the following assignments we will have a small taste of that.
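
As a minimal sketch of the kinds of changes involved, the example below edits a small dictionary with placeholder values. The record `character` and everything in it is made up for illustration.

```python
import json

# Illustrative record; the values are placeholders, not API output.
character = {"name": "Example Character", "father": "", "aliases": ["First Alias"]}

character["father"] = "Example Father"       # overwrite an existing value
character["culture"] = "Northmen"            # add a key that was not there yet
character["aliases"].append("Second Alias")  # grow a list stored under a key

# The changes only exist in the Python dictionary until you encode it again.
updated_text = json.dumps(character, indent=4)
print(updated_text)
```

To persist such changes you would write the dictionary back to a file with 'json.dump'.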

#### Assignment 9: Navigating a JSON file structure 4

In the loaded JSON, set the following values for the following keys:
'mother': 'Lyanna Stark'
'father': 'Rhaeger Targaryen'

In [19]:
jon_snow_json["mother"] = "Lyanna Stark"
jon_snow_json["father"] = "Rhaeger Targaryen"

jon_snow_json

{'url': 'https://anapioficeandfire.com/api/characters/583',
 'name': 'Jon Snow',
 'gender': 'Male',
 'culture': 'Northmen',
 'born': 'In 283 AC',
 'died': '',
 'titles': ["Lord Commander of the Night's Watch"],
 'aliases': ['Lord Snow',
  "Ned Stark's Bastard",
  'The Snow of Winterfell',
  'The Crow-Come-Over',
  "The 998th Lord Commander of the Night's Watch",
  'The Bastard of Winterfell',
  'The Black Bastard of the Wall',
  'Lord Crow'],
 'father': 'Rhaeger Targaryen',
 'mother': 'Lyanna Stark',
 'spouse': '',
 'allegiances': ['https://anapioficeandfire.com/api/houses/362'],
 'books': ['https://anapioficeandfire.com/api/books/5'],
 'povBooks': ['https://anapioficeandfire.com/api/books/1',
  'https://anapioficeandfire.com/api/books/2',
  'https://anapioficeandfire.com/api/books/3',
  'https://anapioficeandfire.com/api/books/8'],
 'tvSeries': ['Season 1',
  'Season 2',
  'Season 3',
  'Season 4',
  'Season 5',
  'Season 6'],
 'playedBy': ['Kit Harington']}

#### Assignment 10: Navigating a JSON file structure 5

In the loaded JSON, add the following values to the 'tvSeries' key: 'Season 7' and 'Season 8'.

In [20]:
jon_snow_json["tvSeries"].extend(["Season 7", "Season 8"])
jon_snow_json

{'url': 'https://anapioficeandfire.com/api/characters/583',
 'name': 'Jon Snow',
 'gender': 'Male',
 'culture': 'Northmen',
 'born': 'In 283 AC',
 'died': '',
 'titles': ["Lord Commander of the Night's Watch"],
 'aliases': ['Lord Snow',
  "Ned Stark's Bastard",
  'The Snow of Winterfell',
  'The Crow-Come-Over',
  "The 998th Lord Commander of the Night's Watch",
  'The Bastard of Winterfell',
  'The Black Bastard of the Wall',
  'Lord Crow'],
 'father': 'Rhaeger Targaryen',
 'mother': 'Lyanna Stark',
 'spouse': '',
 'allegiances': ['https://anapioficeandfire.com/api/houses/362'],
 'books': ['https://anapioficeandfire.com/api/books/5'],
 'povBooks': ['https://anapioficeandfire.com/api/books/1',
  'https://anapioficeandfire.com/api/books/2',
  'https://anapioficeandfire.com/api/books/3',
  'https://anapioficeandfire.com/api/books/8'],
 'tvSeries': ['Season 1',
  'Season 2',
  'Season 3',
  'Season 4',
  'Season 5',
  'Season 6',
  'Season 7',
  'Season 8'],
 'playedBy': ['Kit Harington']}

Great job! Keep it going! :)

## Section 4: JSON to information

Manipulating a JSON file through a dictionary is quite straightforward, but the complexity grows with larger JSON files and dictionaries. The most important skill to develop is retrieving and storing information from JSON files and dictionaries.

JSON files are readable by computers and by humans. But sometimes you'll want to convert a JSON file to another structure, such as lists or pandas DataFrames. In this section we'll have a look at extracting information from JSON files.

First we'll download a larger JSON file. And then you'll get to work on some larger assignments.

In [21]:
URL = "https://anapioficeandfire.com/api/houses"
larger_json_file_1 = requests.get(URL, params={"region": "The North", "page": 5}).json()

# Print the retrieved JSON, which is a list of houses.
pprint(larger_json_file_1)

[{'ancestralWeapons': [],
  'cadetBranches': [],
  'coatOfArms': 'A bend orange on vairy grey and green',
  'currentLord': '',
  'diedOut': '',
  'founded': '',
  'founder': '',
  'heir': '',
  'name': 'House Moss',
  'overlord': 'https://anapioficeandfire.com/api/houses/34',
  'region': 'The North',
  'seats': [],
  'swornMembers': [],
  'titles': [],
  'url': 'https://anapioficeandfire.com/api/houses/273',
  'words': ''},
 {'ancestralWeapons': [],
  'cadetBranches': [],
  'coatOfArms': 'Or,six thistles slipped vert',
  'currentLord': 'https://anapioficeandfire.com/api/characters/205',
  'diedOut': '',
  'founded': '',
  'founder': '',
  'heir': '',
  'name': 'House Norrey',
  'overlord': 'https://anapioficeandfire.com/api/houses/362',
  'region': 'The North',
  'seats': [],
  'swornMembers': ['https://anapioficeandfire.com/api/characters/136',
                   'https://anapioficeandfire.com/api/characters/205',
                   'https://anapioficeandfire.com/api/characters/811'],

We retrieved another JSON file. As you can see, this is a larger file. This file contains information on some houses present in the world of Game of Thrones. There are 10 entries in total.

#### Assignment 11: JSON to information 1

Retrieve all values for the following keys: 'name', 'region', 'coatOfArms', 'words', and put them in separate lists.
Use a for loop to fill the lists.

In [22]:
len(larger_json_file_1)

10

In [23]:
name_list = []
region_list = []
coat_of_arms_list = []
words_list = []


for house_dict in larger_json_file_1:
    name_list.append(house_dict["name"])
    region_list.append(house_dict["region"])
    coat_of_arms_list.append(house_dict["coatOfArms"])
    words_list.append(house_dict["words"])

pprint(name_list)
pprint(region_list)
pprint(coat_of_arms_list)
pprint(words_list)

['House Moss',
 'House Norrey',
 'House Overton',
 'House Poole',
 'House Ryder of the Rills',
 'House Ryswell of the Rills',
 'House Slate of Blackpool',
 'House Stane of Driftwood Hall',
 'House Stark of Winterfell',
 'House Stout of Goldgrass']
['The North',
 'The North',
 'The North',
 'The North',
 'The North',
 'The North',
 'The North',
 'The North',
 'The North',
 'The North']
['A bend orange on vairy grey and green',
 'Or,six thistles slipped vert',
 'Sable, a fess chequy argent and Or',
 'White, a roundel azure, a tressure cendrée',
 '',
 "Bronze, a horse's head sable orbed and maned gules within a bordure "
 'engrailed sable',
 'Pale grey, a double tressure white',
 'A driftwood tree, bare and brown, on pale green',
 'A running grey direwolf, on an ice-white field',
 'Chevronny tenné and Or']
['', '', '', '', '', '', '', '', 'Winter is Coming', '']


#### Assignment 12: JSON to information 2

Create a pandas dataframe with information for each house from the JSON file. The following pieces of information should be present in the columns: 'name', 'region', 'coatOfArms', 'words', 'seats', 'titles', 'currentLord'.
Use a for loop to fill the pandas dataframe.

In [24]:
df = pd.DataFrame(
    list(zip(name_list, region_list, coat_of_arms_list, words_list)),
    columns=["name", "region", "coat of arms", "words"],
)
df

Unnamed: 0,name,region,coat of arms,words
0,House Moss,The North,A bend orange on vairy grey and green,
1,House Norrey,The North,"Or,six thistles slipped vert",
2,House Overton,The North,"Sable, a fess chequy argent and Or",
3,House Poole,The North,"White, a roundel azure, a tressure cendrée",
4,House Ryder of the Rills,The North,,
5,House Ryswell of the Rills,The North,"Bronze, a horse's head sable orbed and maned g...",
6,House Slate of Blackpool,The North,"Pale grey, a double tressure white",
7,House Stane of Driftwood Hall,The North,"A driftwood tree, bare and brown, on pale green",
8,House Stark of Winterfell,The North,"A running grey direwolf, on an ice-white field",Winter is Coming
9,House Stout of Goldgrass,The North,Chevronny tenné and Or,


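The cells below show two alternative ways to build the same DataFrame with a for loop. After that we will retrieve another JSON file. It's important to work with a few different files with a bit of variation, because this will help you understand the structure of JSON files even better.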

In [25]:
# Or...

df = pd.DataFrame(columns=["name", "region", "coat of arms", "words"])

for i, house in enumerate(larger_json_file_1):
    df.loc[i] = [house["name"], house["region"], house["coatOfArms"], house["words"]]

display(df)

Unnamed: 0,name,region,coat of arms,words
0,House Moss,The North,A bend orange on vairy grey and green,
1,House Norrey,The North,"Or,six thistles slipped vert",
2,House Overton,The North,"Sable, a fess chequy argent and Or",
3,House Poole,The North,"White, a roundel azure, a tressure cendrée",
4,House Ryder of the Rills,The North,,
5,House Ryswell of the Rills,The North,"Bronze, a horse's head sable orbed and maned g...",
6,House Slate of Blackpool,The North,"Pale grey, a double tressure white",
7,House Stane of Driftwood Hall,The North,"A driftwood tree, bare and brown, on pale green",
8,House Stark of Winterfell,The North,"A running grey direwolf, on an ice-white field",Winter is Coming
9,House Stout of Goldgrass,The North,Chevronny tenné and Or,


In [26]:
# Or...

df = pd.DataFrame(columns=["name", "region", "coat of arms", "words"])

for i in range(len(name_list)):
    df.loc[i] = [name_list[i], region_list[i], coat_of_arms_list[i], words_list[i]]

display(df)

Unnamed: 0,name,region,coat of arms,words
0,House Moss,The North,A bend orange on vairy grey and green,
1,House Norrey,The North,"Or,six thistles slipped vert",
2,House Overton,The North,"Sable, a fess chequy argent and Or",
3,House Poole,The North,"White, a roundel azure, a tressure cendrée",
4,House Ryder of the Rills,The North,,
5,House Ryswell of the Rills,The North,"Bronze, a horse's head sable orbed and maned g...",
6,House Slate of Blackpool,The North,"Pale grey, a double tressure white",
7,House Stane of Driftwood Hall,The North,"A driftwood tree, bare and brown, on pale green",
8,House Stark of Winterfell,The North,"A running grey direwolf, on an ice-white field",Winter is Coming
9,House Stout of Goldgrass,The North,Chevronny tenné and Or,
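
Assignment 12 asks for seven columns, while the cells above build four. A minimal sketch of one way to include all seven is shown below. It uses two hand-written records in place of the API response: the first mirrors House Moss from the output above, the second is a made-up placeholder.

```python
import pandas as pd

# Stand-in for larger_json_file_1. The second record is hypothetical.
houses = [
    {
        "name": "House Moss",
        "region": "The North",
        "coatOfArms": "A bend orange on vairy grey and green",
        "words": "",
        "seats": [],
        "titles": [],
        "currentLord": "",
    },
    {
        "name": "House Example",
        "region": "The North",
        "coatOfArms": "A placeholder coat of arms",
        "words": "Placeholder words",
        "seats": ["Example Keep"],
        "titles": ["Lord of Example Keep"],
        "currentLord": "",
    },
]

columns = ["name", "region", "coatOfArms", "words", "seats", "titles", "currentLord"]

# Collect one dictionary per house, keeping only the requested keys.
rows = []
for house in houses:
    rows.append({column: house[column] for column in columns})

df = pd.DataFrame(rows, columns=columns)
print(df)
```

Collecting rows in a list and building the DataFrame once is usually faster than assigning to `df.loc[i]` inside the loop, because each such assignment may enlarge the DataFrame.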


In [27]:
URL = "https://hp-api.onrender.com/api/characters"
potter_json = requests.get(URL).json()

# Print the retrieved JSON, which is a list of characters.
pprint(potter_json)

[{'actor': 'Daniel Radcliffe',
  'alive': True,
  'alternate_actors': [],
  'alternate_names': ['The Boy Who Lived',
                      'The Chosen One',
                      'Undesirable No. 1',
                      'Potty'],
  'ancestry': 'half-blood',
  'dateOfBirth': '31-07-1980',
  'eyeColour': 'green',
  'gender': 'male',
  'hairColour': 'black',
  'hogwartsStaff': False,
  'hogwartsStudent': True,
  'house': 'Gryffindor',
  'id': '9e3f7ce4-b9a7-4244-b709-dae5c1f1d4a8',
  'image': 'https://ik.imagekit.io/hpapi/harry.jpg',
  'name': 'Harry Potter',
  'patronus': 'stag',
  'species': 'human',
  'wand': {'core': 'phoenix tail feather', 'length': 11, 'wood': 'holly'},
  'wizard': True,
  'yearOfBirth': 1980},
 {'actor': 'Emma Watson',
  'alive': True,
  'alternate_actors': [],
  'alternate_names': ['Hermy', 'Know-it-all', 'Miss Grant', 'Herm-own-ninny'],
  'ancestry': 'muggleborn',
  'dateOfBirth': '19-09-1979',
  'eyeColour': 'brown',
  'gender': 'female',
  'hairColour': 'brow

We just retrieved an even larger JSON file on Harry Potter characters. Now it's your turn to retrieve information from this JSON file and convert it to a pandas DataFrame.

#### Assignment 13: JSON to information 3

Fill lists with information on each character from the JSON file. The following pieces of information should be present in the separate lists: 'name', 'actor', 'dateOfBirth', 'gender', 'eyeColour'.
Use a for loop to fill the lists.

In [28]:
name_list = []
actor_list = []
birthday_list = []
gender_list = []
eye_list = []

for person in potter_json:
    name_list.append(person["name"])
    actor_list.append(person["actor"])
    birthday_list.append(person["dateOfBirth"])
    gender_list.append(person["gender"])
    eye_list.append(person["eyeColour"])

pprint(name_list)
pprint(actor_list)
pprint(birthday_list)
pprint(gender_list)
pprint(eye_list)

['Harry Potter',
 'Hermione Granger',
 'Ron Weasley',
 'Draco Malfoy',
 'Minerva McGonagall',
 'Cedric Diggory',
 'Cho Chang',
 'Severus Snape',
 'Rubeus Hagrid',
 'Neville Longbottom',
 'Luna Lovegood',
 'Ginny Weasley',
 'Sirius Black',
 'Remus Lupin',
 'Arthur Weasley',
 'Bellatrix Lestrange',
 'Lord Voldemort',
 'Horace Slughorn',
 'Kingsley Shacklebolt',
 'Dolores Umbridge',
 'Lucius Malfoy',
 'Vincent Crabbe',
 'Mr Crabbe',
 'Gregory Goyle',
 'Mr Goyle',
 'Mrs Norris',
 'Argus Filch',
 'Vernon Dursley',
 'Petunia Dursley',
 'Dudley Dursley',
 'Boa constrictor',
 'Lily Potter',
 'James Potter',
 'Albus Dumbledore',
 'Fawkes',
 'Madam Pomfrey',
 'Mrs Figg',
 'Marge Dursley',
 'Ripper',
 'Yvonne',
 'Piers Polkiss',
 'Dennis',
 'Malcolm',
 'Gordon',
 'Miranda Gaushawk',
 'Bathilda Bagshot',
 'Adalbert Waffling',
 'Emeric Switch',
 'Phyllida Spore',
 'Arsenius Jigger',
 'Newt Scamander',
 'Quentin Trimble',
 'Tom',
 'Doris Crockford',
 'Quirinus Quirrel',
 'Griphook',
 'Madam Malkin',

In [29]:
df = pd.DataFrame(
    list(zip(name_list, actor_list, birthday_list, gender_list, eye_list)),
    columns=["name", "actor", "date of birth", "gender", "eye color"],
)
df

Unnamed: 0,name,actor,date of birth,gender,eye color
0,Harry Potter,Daniel Radcliffe,31-07-1980,male,green
1,Hermione Granger,Emma Watson,19-09-1979,female,brown
2,Ron Weasley,Rupert Grint,01-03-1980,male,blue
3,Draco Malfoy,Tom Felton,05-06-1980,male,grey
4,Minerva McGonagall,Dame Maggie Smith,04-10-1925,female,
...,...,...,...,...,...
432,Albus Severus Potter,Arthur Bowen,,male,green
433,Rose Weasley,Helena Barlow,,female,
434,Hugo Weasley,Ryan Turner,,male,
435,Scorpius Malfoy,Bertie Gilbert,,male,grey


#### Assignment 14: JSON to information 4

Create a pandas dataframe with the information for characters from the JSON file, but only for characters that are wizards. Use the 'wizard' key in order to determine whether a character is a wizard.

The following pieces of information should be present in the columns: 'name', 'house', 'wand', 'patronus'.
Use a for loop to fill the pandas dataframe.

In [30]:
# Note: this solution keeps the columns from Assignment 13 and adds the 'wizard' filter.

df = pd.DataFrame(
    columns=["name", "actor", "date of birth", "gender", "eye color"],
)
i = 0
for person in potter_json:
    if person["wizard"]:
        df.loc[i] = [
            person["name"],
            person["actor"],
            person["dateOfBirth"],
            person["gender"],
            person["eyeColour"],
        ]
        i += 1

display(df)

Unnamed: 0,name,actor,date of birth,gender,eye color
0,Harry Potter,Daniel Radcliffe,31-07-1980,male,green
1,Hermione Granger,Emma Watson,19-09-1979,female,brown
2,Ron Weasley,Rupert Grint,01-03-1980,male,blue
3,Draco Malfoy,Tom Felton,05-06-1980,male,grey
4,Minerva McGonagall,Dame Maggie Smith,04-10-1925,female,
...,...,...,...,...,...
361,Albus Severus Potter,Arthur Bowen,,male,green
362,Rose Weasley,Helena Barlow,,female,
363,Hugo Weasley,Ryan Turner,,male,
364,Scorpius Malfoy,Bertie Gilbert,,male,grey
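
The assignment asks for the columns 'name', 'house', 'wand' and 'patronus', which differ from the columns used in the cell above. A minimal sketch of the filtering step with those keys follows. It uses two hand-written records instead of the API response: the first mirrors Harry Potter from the output above, the second is a made-up non-wizard.

```python
# Stand-in for potter_json. The second record is hypothetical.
characters = [
    {
        "name": "Harry Potter",
        "house": "Gryffindor",
        "wand": {"wood": "holly", "core": "phoenix tail feather", "length": 11},
        "patronus": "stag",
        "wizard": True,
    },
    {
        "name": "Example Muggle",
        "house": "",
        "wand": {"wood": "", "core": "", "length": None},
        "patronus": "",
        "wizard": False,
    },
]

columns = ["name", "house", "wand", "patronus"]

# Keep only wizards, and only the requested keys for each of them.
rows = []
for person in characters:
    if person["wizard"]:
        rows.append({column: person[column] for column in columns})

print(rows)
# Passing rows to pd.DataFrame(rows, columns=columns) would turn them into a table.
```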


#### Assignment 15: JSON to information 5

Create a pandas dataframe with information on the characters and their wands in the JSON file, but only for characters that have wands. Have a closer look at the 'wand' key.

The following pieces of information should be present in the columns: 'name', 'core', 'length', 'wood'.
Use a for loop to fill the pandas dataframe.

In [31]:
potter_json[0].get("wand")

{'wood': 'holly', 'core': 'phoenix tail feather', 'length': 11}

In [32]:
df = pd.DataFrame(columns=["name", "core", "length", "wood"])
i = 0

for person in potter_json:
    wand = person.get("wand")
    # The wand keys are always present, but their values can be empty, so check all three
    if any([wand.get("core"), wand.get("length"), wand.get("wood")]):
        df.loc[i] = [
            person.get("name"),
            wand.get("core"),
            wand.get("length"),
            wand.get("wood"),
        ]
        i += 1
df


Unnamed: 0,name,core,length,wood
0,Harry Potter,phoenix tail feather,11.0,holly
1,Hermione Granger,dragon heartstring,10.75,vine
2,Ron Weasley,unicorn tail hair,14.0,willow
3,Draco Malfoy,unicorn tail hair,10.0,hawthorn
4,Minerva McGonagall,dragon heartstring,9.5,fir
5,Cedric Diggory,unicorn hair,12.25,ash
6,Rubeus Hagrid,,16.0,oak
7,Neville Longbottom,unicorn tail hair,13.0,cherry
8,Ginny Weasley,,,yew
9,Remus Lupin,unicorn tail hair,10.25,cypress


#### Assignment 16: JSON to information 6

This is a smaller, easier assignment. Use the powerful pandas library to read the Potter dictionary and convert it to a pandas DataFrame in one step.
Use the pandas.DataFrame.from_dict (https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.from_dict.html) method.

In [33]:
df = pd.DataFrame.from_dict(potter_json)
df

Unnamed: 0,id,name,alternate_names,species,gender,house,dateOfBirth,yearOfBirth,wizard,ancestry,eyeColour,hairColour,wand,patronus,hogwartsStudent,hogwartsStaff,actor,alternate_actors,alive,image
0,9e3f7ce4-b9a7-4244-b709-dae5c1f1d4a8,Harry Potter,"[The Boy Who Lived, The Chosen One, Undesirabl...",human,male,Gryffindor,31-07-1980,1980.0,True,half-blood,green,black,"{'wood': 'holly', 'core': 'phoenix tail feathe...",stag,True,False,Daniel Radcliffe,[],True,https://ik.imagekit.io/hpapi/harry.jpg
1,4c7e6819-a91a-45b2-a454-f931e4a7cce3,Hermione Granger,"[Hermy, Know-it-all, Miss Grant, Herm-own-ninny]",human,female,Gryffindor,19-09-1979,1979.0,True,muggleborn,brown,brown,"{'wood': 'vine', 'core': 'dragon heartstring',...",otter,True,False,Emma Watson,[],True,https://ik.imagekit.io/hpapi/hermione.jpeg
2,c3b1f9a5-b87b-48bf-b00d-95b093ea6390,Ron Weasley,"[Dragomir Despard, Ronald, Ickle Ronniekins, R...",human,male,Gryffindor,01-03-1980,1980.0,True,pure-blood,blue,red,"{'wood': 'willow', 'core': 'unicorn tail hair'...",Jack Russell terrier,True,False,Rupert Grint,[],True,https://ik.imagekit.io/hpapi/ron.jpg
3,af95bd8a-dfae-45bb-bc69-533860d34129,Draco Malfoy,[],human,male,Slytherin,05-06-1980,1980.0,True,pure-blood,grey,blonde,"{'wood': 'hawthorn', 'core': 'unicorn tail hai...",,True,False,Tom Felton,[],True,https://ik.imagekit.io/hpapi/draco.jpg
4,ca3827f0-375a-4891-aaa5-f5e8a5bad225,Minerva McGonagall,[],human,female,Gryffindor,04-10-1925,1925.0,True,half-blood,,black,"{'wood': 'fir', 'core': 'dragon heartstring', ...",tabby cat,False,True,Dame Maggie Smith,[],True,https://ik.imagekit.io/hpapi/mcgonagall.jpg
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
432,dcdc063e-cf3e-48fc-b777-65922e899b38,Albus Severus Potter,[Al],human,male,Slytherin,,,True,half-blood,green,black,"{'wood': '', 'core': '', 'length': None}",,True,False,Arthur Bowen,[],True,
433,e4653b01-76a5-4769-a6a2-1f2efaf89cbb,Rose Weasley,[],human,female,Gryffindor,,,True,half-blood,,red,"{'wood': '', 'core': '', 'length': None}",,True,False,Helena Barlow,[],True,
434,2b203c7e-7b3d-4f27-8b3c-11473904da73,Hugo Weasley,[],human,male,,,,True,half-blood,,brown,"{'wood': '', 'core': '', 'length': None}",,True,False,Ryan Turner,[],True,
435,7f2f6207-8998-4f98-92c2-8d02898a82eb,Scorpius Malfoy,[Scorpius Hyperion Malfoy],human,male,Slytherin,,,True,pure-blood,grey,blond,"{'wood': '', 'core': '', 'length': None}",,True,False,Bertie Gilbert,[],True,
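
In the DataFrame above the 'wand' column still holds a dictionary per row. As an alternative that is not part of the assignment, `pandas.json_normalize` can flatten such nested dictionaries into separate columns. The sketch below uses two trimmed records whose wand values are copied from the outputs above.

```python
import pandas as pd

# Trimmed stand-in for potter_json, keeping only the name and the wand.
characters = [
    {
        "name": "Harry Potter",
        "wand": {"wood": "holly", "core": "phoenix tail feather", "length": 11},
    },
    {
        "name": "Scorpius Malfoy",
        "wand": {"wood": "", "core": "", "length": None},
    },
]

# Nested keys become columns named '<parent>.<child>', such as 'wand.wood'.
flat = pd.json_normalize(characters)
print(flat.columns.tolist())
print(flat)
```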


## Section 5: Nested JSON files

Up until now the JSON files were quite clear and structured. But sometimes the structure of JSON files can become more and more complex when objects are nested within other objects. For those familiar with Python, you can see it as dictionaries within dictionaries. Depending on the size it can be quite difficult to work with, but with enough knowledge of the structure you should be able to quickly work with it.

Let's have a look at an example.

In [34]:
# Nested JSON example.
nested_json = {
    "species": "Tiger",
    "binomial_name": "Panthera tigris",
    "status": "Endangered",
    "subspecies": [
        {
            "name": "Amur tiger",
            "binomial_name": "Panthera tigris altaica",
            "status": "Endangered",
        },
        {
            "name": "Bengal tiger",
            "binomial_name": "Panthera tigris tigris",
            "status": "Endangered",
        },
        {
            "name": "South China tiger",
            "binomial_name": "Panthera tigris amoyensis",
            "status": "Critically Endangered",
        },
        {
            "name": "Malayan tiger",
            "binomial_name": "Panthera tigris jacksoni",
            "status": "",
        },
        {
            "name": "Indo-Chinese tiger",
            "binomial_name": "Panthera tigris corbetti",
            "status": "",
        },
        {
            "name": "Sumatran tiger",
            "binomial_name": "Panthera tigris sumatrae",
            "status": "Critically endangered",
        },
        {
            "name": "Bali tiger",
            "binomial_name": "Panthera tigris balica",
            "status": "Extinct",
        },
        {
            "name": "Javan tiger",
            "binomial_name": "Panthera tigris sondaica",
            "status": "Extinct",
        },
        {
            "name": "Caspian tiger",
            "binomial_name": "Panthera tigris virgata",
            "status": "Extinct",
        },
    ],
}

print(json.dumps(nested_json, indent=4))

{
    "species": "Tiger",
    "binomial_name": "Panthera tigris",
    "status": "Endangered",
    "subspecies": [
        {
            "name": "Amur tiger",
            "binomial_name": "Panthera tigris altaica",
            "status": "Endangered"
        },
        {
            "name": "Bengal tiger",
            "binomial_name": "Panthera tigris tigris",
            "status": "Endangered"
        },
        {
            "name": "South China tiger",
            "binomial_name": "Panthera tigris amoyensis",
            "status": "Critically Endangered"
        },
        {
            "name": "Malayan tiger",
            "binomial_name": "Panthera tigris jacksoni",
            "status": ""
        },
        {
            "name": "Indo-Chinese tiger",
            "binomial_name": "Panthera tigris corbetti",
            "status": ""
        },
        {
            "name": "Sumatran tiger",
            "binomial_name": "Panthera tigris sumatrae",
            "status": "Critically end

As you can see, the structure remains visible. The most important aspect to note is that nested JSON objects can have keys with the same name as keys at other levels of the JSON object. See below for an example illustrating duplicate key names.

In [35]:
# EXAMPLE
print(nested_json["binomial_name"])
print(nested_json["subspecies"][0]["binomial_name"])

Panthera tigris
Panthera tigris altaica
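
When the same key name appears at several levels, it can help to list every place it occurs. The block below is a minimal sketch of that idea, not part of the original training material: `find_key_paths` is a hypothetical helper, and `tiger` is a shortened copy of the tiger example so the code runs on its own.

```python
# Hypothetical helper: collect every path at which a given key occurs.
def find_key_paths(obj, target_key, path=()):
    """Yield (path, value) pairs for every occurrence of target_key."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            current_path = path + (key,)
            if key == target_key:
                yield current_path, value
            # Keep descending: the value may itself contain the key.
            yield from find_key_paths(value, target_key, current_path)
    elif isinstance(obj, list):
        for index, item in enumerate(obj):
            yield from find_key_paths(item, target_key, path + (index,))


# Shortened copy of the tiger example, so the block runs on its own.
tiger = {
    "species": "Tiger",
    "binomial_name": "Panthera tigris",
    "subspecies": [
        {"name": "Amur tiger", "binomial_name": "Panthera tigris altaica"},
        {"name": "Bengal tiger", "binomial_name": "Panthera tigris tigris"},
    ],
}

matches = list(find_key_paths(tiger, "binomial_name"))
for key_path, value in matches:
    print(key_path, "->", value)
```

Each path is a tuple of keys and list indices, so `("subspecies", 0, "binomial_name")` corresponds to `tiger["subspecies"][0]["binomial_name"]`.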


The example above illustrates that a nested JSON is not too different from a simpler JSON. It only requires more attention to the structure and to navigating within it. Let's try it out with some assignments. First we'll retrieve a nested JSON: below you'll see a large one about Pikachu.

In [36]:
URL = "https://pokeapi.co/api/v2/pokemon/pikachu"
nested_json = requests.get(URL).json()

# Print the full JSON response.
pprint(nested_json)

{'abilities': [{'ability': {'name': 'static',
                            'url': 'https://pokeapi.co/api/v2/ability/9/'},
                'is_hidden': False,
                'slot': 1},
               {'ability': {'name': 'lightning-rod',
                            'url': 'https://pokeapi.co/api/v2/ability/31/'},
                'is_hidden': True,
                'slot': 3}],
 'base_experience': 112,
 'cries': {'latest': 'https://raw.githubusercontent.com/PokeAPI/cries/main/cries/pokemon/latest/25.ogg',
           'legacy': 'https://raw.githubusercontent.com/PokeAPI/cries/main/cries/pokemon/legacy/25.ogg'},
 'forms': [{'name': 'pikachu',
            'url': 'https://pokeapi.co/api/v2/pokemon-form/25/'}],
 'game_indices': [{'game_index': 84,
                   'version': {'name': 'red',
                               'url': 'https://pokeapi.co/api/v2/version/1/'}},
                  {'game_index': 84,
                   'version': {'name': 'blue',
                               'url': '
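
A response this large is hard to read in full, so a compact outline of the keys and value types can be a useful first step. The block below is one possible sketch: `outline` is a hypothetical helper, and `sample` is a small hand-written dictionary that mimics part of the Pikachu response, so no API call is needed. It assumes that all items in a list share the same shape and therefore only inspects the first one.

```python
def outline(obj, depth=0, max_depth=2):
    """Return indented lines describing keys and value types up to max_depth."""
    lines = []
    indent = "  " * depth
    if isinstance(obj, dict):
        for key, value in obj.items():
            lines.append(f"{indent}{key}: {type(value).__name__}")
            if depth < max_depth:
                lines.extend(outline(value, depth + 1, max_depth))
    elif isinstance(obj, list) and obj:
        # Only inspect the first element, assuming the list items share a shape.
        lines.append(f"{indent}[0]: {type(obj[0]).__name__} ({len(obj)} items)")
        if depth < max_depth:
            lines.extend(outline(obj[0], depth + 1, max_depth))
    return lines


# Small hand-written sample that mimics part of the Pikachu response.
sample = {
    "base_experience": 112,
    "abilities": [
        {"ability": {"name": "static"}, "is_hidden": False, "slot": 1},
        {"ability": {"name": "lightning-rod"}, "is_hidden": True, "slot": 3},
    ],
}

structure = outline(sample)
print("\n".join(structure))
```

Calling `outline(nested_json)` on the real response should give a similar overview, and raising `max_depth` shows deeper levels.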

#### Assignment 17: Nested JSON files 1

Create a pandas DataFrame with each move Pikachu can perform from the game/version_group gold-silver.

Information that needs to be present in the DataFrame: 'name', 'level_learned_at', 'move_learn_method'.

In [37]:
df = pd.DataFrame(columns=["name", "level_learned_at", "move_learn_method"])

moves = nested_json.get("moves")
i = 0

for move in moves:
    for version_group_detail in move.get("version_group_details"):
        version_group_name = version_group_detail.get("version_group").get("name")
        if version_group_name == "gold-silver":
            # print(
            #     f"{move.get('move').get('name')} learned at level {version_group_detail.get('level_learned_at')} by {version_group_detail.get('move_learn_method').get('name')}"
            # )
            df.loc[i] = [
                move.get("move").get("name"),
                version_group_detail.get("level_learned_at"),
                version_group_detail.get("move_learn_method").get("name"),
            ]
            i += 1

df

,name,level_learned_at,move_learn_method
0,thunder-punch,0,machine
1,slam,20,level-up
2,headbutt,0,machine
3,tail-whip,6,level-up
4,growl,1,level-up
5,strength,0,machine
6,thunder-shock,1,level-up
7,thunderbolt,26,level-up
8,thunder-wave,8,level-up
9,thunder,41,level-up
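
An alternative to filling the DataFrame row by row is to collect the matching rows in a list first. The block below is a sketch of that variant in plain Python. The `moves` list here is a hand-written sample in the shape of the API response; the `emerald` entry in particular is illustrative and not taken from the real data.

```python
# Hand-written sample in the shape of the 'moves' list (values are illustrative).
moves = [
    {
        "move": {"name": "thunder-punch"},
        "version_group_details": [
            {
                "level_learned_at": 0,
                "move_learn_method": {"name": "machine"},
                "version_group": {"name": "gold-silver"},
            },
            {
                "level_learned_at": 0,
                "move_learn_method": {"name": "tutor"},
                "version_group": {"name": "emerald"},
            },
        ],
    },
    {
        "move": {"name": "slam"},
        "version_group_details": [
            {
                "level_learned_at": 20,
                "move_learn_method": {"name": "level-up"},
                "version_group": {"name": "gold-silver"},
            },
        ],
    },
]

# One dictionary per (move, version group detail) pair that matches gold-silver.
rows = [
    {
        "name": move["move"]["name"],
        "level_learned_at": detail["level_learned_at"],
        "move_learn_method": detail["move_learn_method"]["name"],
    }
    for move in moves
    for detail in move["version_group_details"]
    if detail["version_group"]["name"] == "gold-silver"
]
print(rows)
```

Passing the result to `pd.DataFrame(rows)` builds the table in one step, which is usually faster than growing a DataFrame with `df.loc[i]` inside a loop.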


#### Assignment 18: Nested JSON files 2

Create your own dictionary with each game/version-group as a key. As the value, store the number of occurrences of that game/version-group within the 'moves' section. Lastly, save the dictionary as a JSON file.

In [39]:
version_dict = {}

for move in nested_json.get("moves"):
    for version in move.get("version_group_details"):
        version_name = version.get("version_group").get("name")
        # Start at 0 for unseen version groups, then count this occurrence.
        version_dict[version_name] = version_dict.get(version_name, 0) + 1

pprint(version_dict)

# Use json.dump
file_name = "versions.json"

with open(file_name, "w") as file:
    json.dump(version_dict, file, indent=4)

{'black-2-white-2': 51,
 'black-white': 42,
 'brilliant-diamond-and-shining-pearl': 50,
 'colosseum': 33,
 'crystal': 39,
 'diamond-pearl': 45,
 'emerald': 52,
 'firered-leafgreen': 42,
 'gold-silver': 38,
 'heartgold-soulsilver': 55,
 'legends-arceus': 8,
 'lets-go-pikachu-lets-go-eevee': 31,
 'omega-ruby-alpha-sapphire': 60,
 'platinum': 54,
 'red-blue': 30,
 'ruby-sapphire': 33,
 'scarlet-violet': 67,
 'sun-moon': 45,
 'sword-shield': 63,
 'ultra-sun-ultra-moon': 57,
 'x-y': 46,
 'xd': 40,
 'yellow': 34}
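
The counting step can also be written with `collections.Counter` from the standard library, and reading the file back with `json.load` is a quick way to check that the save worked. The block below is a sketch under these assumptions: `moves` is a small hand-written sample (not real API output), and the file is written to a temporary directory instead of the working directory.

```python
import json
import os
import tempfile
from collections import Counter

# Hand-written sample shaped like the 'moves' list (not real API output).
moves = [
    {
        "version_group_details": [
            {"version_group": {"name": "red-blue"}},
            {"version_group": {"name": "yellow"}},
        ]
    },
    {"version_group_details": [{"version_group": {"name": "red-blue"}}]},
]

# Counter tallies each version group name produced by the generator.
version_counts = Counter(
    detail["version_group"]["name"]
    for move in moves
    for detail in move["version_group_details"]
)

# Write to a temporary directory so the sketch leaves no files behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    file_path = os.path.join(tmp_dir, "versions.json")
    with open(file_path, "w", encoding="utf-8") as file:
        json.dump(version_counts, file, indent=4)
    # Read the file back to confirm the round trip.
    with open(file_path, encoding="utf-8") as file:
        loaded = json.load(file)

print(loaded)
```

A `Counter` is a subclass of `dict`, so `json.dump` writes it as a regular JSON object.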
