In [None]:
%%HTML
<style>
div.heading{
    padding: 0 10%;
    text-align:center;
    }

p.text{
    text-align:center;
    padding: 0 10%;

}
</style>

# <p class="text">Python for Automation - Lesson 7</p> 

<div class="heading">
    <ul style="list-style-type:none">
        <li><b>Lesson 7 Structure:</b></li>
        <li>Parsing JSON</li>
    </ul>
</div>

## <p class="text">Parsing JSON</p>

<p class="text">JavaScript Object Notation (JSON) is a text format for exchanging data between different systems, for example when transferring data from a backend server to the frontend of an application or vice versa.
    
A JSON object is enclosed in curly braces {} and holds key and value pairs. Keys are strings wrapped in double quotes (""), a colon (:) separates each key from its value, and a comma (,) separates one pair from the next. This makes JSON easy for humans to read and write.</p> 

### <p class="text">Example JSON</p>

In [None]:
{
  "squadName": "Super hero squad",
  "homeTown": "Metro City",
  "formed": 2016,
  "secretBase": "Super tower",
  "active": True,
  "members": [
    {
      "name": "Molecule Man",
      "age": 29,
      "secretIdentity": "Dan Jukes",
      "powers": ["Radiation resistance", "Turning tiny", "Radiation blast"]
    },
    {
      "name": "Madame Uppercut",
      "age": 39,
      "secretIdentity": "Jane Wilson",
      "powers": [
        "Million tonne punch",
        "Damage resistance",
        "Superhuman reflexes"
      ]
    },
    {
      "name": "Eternal Flame",
      "age": 1000000,
      "secretIdentity": "Unknown",
      "powers": [
        "Immortality",
        "Heat Immunity",
        "Inferno",
        "Teleportation",
        "Interdimensional travel"
      ]
    }
  ]
}

<p class="text">The process of encoding JSON is usually called <b>serialization</b>. This term refers to the transformation of data into a series of bytes (hence serial) to be stored or transmitted across a network. Naturally, <b>deserialization</b> is the reciprocal process of decoding data that has been stored or delivered in the JSON standard. Lucky for us, Python supports JSON natively - we can use the <code>json</code> module from Python's standard library.</p>
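
<p class="text">As a minimal sketch of the two directions described above, the snippet below serializes a small dictionary to JSON text and decodes it again. The <code>hero</code> record is a made-up value used only for illustration.</p>

```python
import json

# Hypothetical record used only for this illustration
hero = {"name": "Molecule Man", "age": 29, "active": True}

# Serialization: Python dict -> JSON text (a str)
encoded = json.dumps(hero)

# Deserialization: JSON text -> Python dict
decoded = json.loads(encoded)

print(encoded)
print(decoded == hero)
```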

## <p class="text">Serialize JSON - Dump/s</p>
<p class="text">What happens after a computer processes lots of information? It needs to take a data dump. Accordingly, the <code>json</code> module exposes the <code>dump()</code> function for writing data to files. There is also a <code>dumps()</code> function (pronounced “dump-s”) for writing to a Python string. Simple Python objects are translated to JSON according to a fairly intuitive conversion:</p>

| Python      | JSON |
| ----------- | ----------- |
| dict      | object       |
| list, tuple   | array        |
| str   | string        |
| int, float   | number        |
| True   | true        |
| False   | false        |
| None   | null        |
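
<p class="text">The conversion table can be checked with a short experiment. This is only a sketch: the keys in <code>sample</code> are invented names, one per Python type from the table.</p>

```python
import json

# One made-up entry per Python type listed in the table
sample = {
    "a_dict": {"nested": 1},
    "a_list": [1, 2],
    "a_tuple": (3, 4),
    "a_str": "text",
    "an_int": 7,
    "a_float": 1.5,
    "yes": True,
    "no": False,
    "nothing": None,
}

encoded = json.dumps(sample)
print(encoded)

# A tuple is written as a JSON array, so it comes back as a list
round_tripped = json.loads(encoded)
print(type(round_tripped["a_tuple"]))
```

<p class="text">Note that the tuple does not survive a round trip: JSON has only one array type, so it is decoded as a list.</p>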

In [None]:
example_json_as_dict = {
  "squadName": "Super hero squad",
  "homeTown": "Metro City",
  "formed": 2016,
  "secretBase": "Super tower",
  "active": True,
  "members": [
    {
      "name": "Molecule Man",
      "age": 29,
      "secretIdentity": "Dan Jukes",
      "powers": ["Radiation resistance", "Turning tiny", "Radiation blast"]
    },
    {
      "name": "Madame Uppercut",
      "age": 39,
      "secretIdentity": "Jane Wilson",
      "powers": [
        "Million tonne punch",
        "Damage resistance",
        "Superhuman reflexes"
      ]
    },
    {
      "name": "Eternal Flame",
      "age": 1000000,
      "secretIdentity": "Unknown",
      "powers": [
        "Immortality",
        "Heat Immunity",
        "Inferno",
        "Teleportation",
        "Interdimensional travel"
      ]
    }
  ]
}

In [None]:
# json.dump is used when you want to write an object to a file in JSON notation
import json

with open("data_file.json", "w") as write_file:
    json.dump(example_json_as_dict, write_file, indent=4)

<p class="text">Note that <code>dump()</code> takes two positional arguments: (1) the data object to be serialized, and (2) the file-like object to which the JSON text will be written. The optional <code>indent</code> argument only controls how the output is pretty-printed.</p>
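
<p class="text">"File-like" means any object with a <code>write()</code> method, not only a file on disk. The sketch below assumes an in-memory <code>io.StringIO</code> buffer as the target, and <code>squad</code> is a shortened, made-up version of the example data.</p>

```python
import io
import json

# Shortened, made-up version of the squad data
squad = {"squadName": "Super hero squad", "formed": 2016}

# An in-memory text buffer stands in for a file opened with open(..., "w")
buffer = io.StringIO()
json.dump(squad, buffer, indent=4)

text = buffer.getvalue()
print(text)
```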

In [None]:
# json.dumps, on the other hand, converts your object to a JSON-formatted Python string (dump + s for string)

json_string = json.dumps(example_json_as_dict)
print(json_string)
print(type(json_string))
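
<p class="text">The same formatting options work with <code>dumps()</code>. As a small sketch, the snippet below produces a compact string and a pretty-printed one with sorted keys. The <code>squad</code> dictionary is a made-up, shortened example.</p>

```python
import json

# Made-up, shortened example data
squad = {"squadName": "Super hero squad", "active": True, "formed": 2016}

# Compact output: no spaces after the separators
compact = json.dumps(squad, separators=(",", ":"))

# Human-friendly output: indented, with keys in alphabetical order
pretty = json.dumps(squad, indent=2, sort_keys=True)

print(compact)
print(pretty)
```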

## <p class="text">Deserializing JSON - Load/s</p>
<p class="text">This is the reverse operation of serialization - we take JSON text (a string or the contents of a .json file) and convert it back into standard Python types, most often a dictionary. This is how the deserialization process maps JSON structures to Python ones:</p>

| JSON      | Python |
| ----------- | ----------- |
| object      | dict       |
| array   | list        |
| string   | str        |
| number (int)   | int        |
| number (real)   | float        |
| true   | True        |
| false   | False        |
| null   | None        |
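
<p class="text">To see this mapping in action, the sketch below decodes a small JSON string and prints the Python type of each value. The JSON text in <code>raw</code> is invented for this illustration.</p>

```python
import json

# Invented JSON text covering each row of the table
raw = (
    '{"name": "Eternal Flame", "age": 1000000, "ratio": 0.5, '
    '"powers": ["Inferno"], "active": true, "retired": false, "base": null}'
)

data = json.loads(raw)

# Show which Python type each JSON value was decoded into
for key, value in data.items():
    print(f"{key}: {type(value).__name__}")
```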

In [None]:
# Create a Python dict from a JSON string read from a file

with open("data_file.json", "r") as file:
    json_file = file.read()
    json_loads = json.loads(json_file)

In [None]:
print(json_file) # Print raw .json file (string)

In [None]:
print(json_loads) # Print dictionary converted from JSON string
print(type(json_loads))

In [None]:
# Create a Python dictionary from a JSON file

with open("data_file.json", "r") as file:
    json_load = json.load(file)
print(json_load) # Print dictionary created straight from file (no read beforehand)
print(type(json_load))

## <p class="text">Working with JSON - frankly, same as dicts</p>
<p class="text">Working with a dictionary converted from a .json file is really easy - you just treat it like a standard dictionary - you can loop, find and access elements.</p>
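
<p class="text">As a sketch of what looping, finding and accessing can look like, the snippet below works on a shortened copy of the squad data. The names <code>raw</code>, <code>squad</code>, <code>member_names</code>, <code>oldest</code> and <code>home_town</code> are chosen for this illustration only.</p>

```python
import json

# Shortened copy of the squad example, kept as JSON text
raw = """
{
  "squadName": "Super hero squad",
  "members": [
    {"name": "Molecule Man", "age": 29, "powers": ["Radiation resistance", "Turning tiny"]},
    {"name": "Madame Uppercut", "age": 39, "powers": ["Million tonne punch"]}
  ]
}
"""
squad = json.loads(raw)

# Loop: collect the name of every member
member_names = [member["name"] for member in squad["members"]]

# Find: pick the member with the highest age
oldest = max(squad["members"], key=lambda member: member["age"])

# Access: .get() returns a default instead of raising KeyError for a missing key
home_town = squad.get("homeTown", "unknown")

print(member_names)
print(oldest["name"])
print(home_town)
```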

In [None]:
# Access squad name
print(f"Squad name is: {json_load['squadName']}")

## <p class="text">Real life example</p>
<p class="text">There is a free REST service named SWAPI (Star Wars API) that you can use to practice working with JSON.</p>

[Star Wars API Link](https://swapi.dev/)

In [None]:
# We import requests - a widely used third-party library for accessing web APIs (not part of the standard library)
import requests

## Available topics in SWAPI
# root
# people
# films
# starships
# vehicles
# species
# planets

# We create a fetch function
def fetch_data_from_swapi(topic: str, page: int = 1) -> dict:
    """
    A simple function to retrieve data from SWAPI
    :param topic: What kind of data we want retrieved (e.g. 'people' or 'films')
    :param page: Which page of the paginated results to request
    :return: Result in dictionary format
    """
    # We keep the base url separate - it's good practice as it's static and only the dynamic parts change
    base_url = "https://swapi.dev/api/"

    # Besides the records themselves, the response carries metadata such as the total count and the next page link
    response = requests.get(f"{base_url}{topic}?page={page}")
    response.raise_for_status()  # Fail early on HTTP errors instead of trying to parse an error response
    results = response.json()
    return results


In [None]:
# This service always returns the first page if no page is added to the URL
people = fetch_data_from_swapi('people')

In [None]:
# Let's iterate over all pages containing people and get the complete list, by checking if there is a next page
star_wars_characters = {}
page = 1
incremental_id = 1
while True:
    people = fetch_data_from_swapi('people', page)
    for person in people['results']:
        star_wars_characters[incremental_id] = person
        incremental_id += 1
    print(f'Currently available characters: {len(star_wars_characters)}')
    if people['next']:
        page += 1
    else:
        break
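
<p class="text">The same "follow <code>next</code> until it is empty" pattern can be tried without touching the network. The sketch below assumes a made-up in-memory stand-in (<code>FAKE_PAGES</code> and <code>fetch_fake_page</code>) that mimics only the <code>results</code> and <code>next</code> keys used above. It is not the real SWAPI response.</p>

```python
# Hypothetical in-memory stand-in for a paginated API (not real SWAPI data)
FAKE_PAGES = {
    1: {"results": [{"name": "Luke Skywalker"}, {"name": "C-3PO"}], "next": "page 2"},
    2: {"results": [{"name": "R2-D2"}], "next": None},
}


def fetch_fake_page(page_number: int) -> dict:
    """Return the fake payload for the requested page."""
    return FAKE_PAGES[page_number]


def collect_all_results(fetch_page) -> list:
    """Request page after page until the payload reports no next page."""
    collected = []
    page_number = 1
    while True:
        payload = fetch_page(page_number)
        collected.extend(payload["results"])
        if not payload["next"]:
            break
        page_number += 1
    return collected


all_people = collect_all_results(fetch_fake_page)
print([person["name"] for person in all_people])
```

<p class="text">Passing the fetch function in as an argument keeps the loop independent of where the pages come from, which makes it easy to test.</p>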

In [None]:
# Now we have the complete list. To avoid running this multiple times and putting extra load on the API, I will save it to a file
with open("star_wars_characters.json", "w") as write_file:
    json.dump(star_wars_characters, write_file, indent=4)

In [None]:
# Now for example, we can take only the character names
star_wars_character_names = [person['name'] for person in star_wars_characters.values()]

In [None]:
print(star_wars_character_names)

In [None]:
# Or we can see how many characters each movie featured and, on the flip side,
# which characters appeared in the most movies

# Get all movie names and id's
films = {episode['episode_id']:{"title": episode['title'], 'number_of_characters': len(episode['characters'])} for episode in fetch_data_from_swapi('films')['results']}

In [None]:
films

In [None]:
films_sorted = {key: value for key, value in sorted(films.items(), key=lambda x: x[1]['number_of_characters'], reverse=True)}

In [None]:
films_sorted
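
<p class="text">Sorting a dictionary by a value nested inside each entry may be easier to follow on a tiny example. The sketch below uses the separate names <code>demo_films</code> and <code>demo_sorted</code>, and the character counts are made up, not taken from SWAPI.</p>

```python
# Made-up character counts, used only to demonstrate the sorting step
demo_films = {
    4: {"title": "A New Hope", "number_of_characters": 3},
    5: {"title": "The Empire Strikes Back", "number_of_characters": 5},
    6: {"title": "Return of the Jedi", "number_of_characters": 4},
}

# sorted() yields (key, value) pairs ordered by the nested count, highest first.
# dict() keeps that order because dictionaries preserve insertion order.
demo_sorted = dict(
    sorted(
        demo_films.items(),
        key=lambda item: item[1]["number_of_characters"],
        reverse=True,
    )
)

print(list(demo_sorted))
```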

In [None]:
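# Map each film URL to its title. The film id at the end of the URL is not the same number as episode_id,
# so we look titles up by the full URL instead of by a digit taken from the link.
film_titles_by_url = {film['url']: film['title'] for film in fetch_data_from_swapi('films')['results']}

sw_character_appearances = {}
for character in star_wars_characters.values():
    movie_names = [film_titles_by_url[film_url] for film_url in character['films']]
    sw_character_appearances[character['name']] = {'total_movies': len(character['films']), 'movie_names': movie_names}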

In [None]:
list(sw_character_appearances.items())[:5]

In [None]:
sw_character_appearances_sorted = {key: value for key, value in sorted(sw_character_appearances.items(), key=lambda x: x[1]['total_movies'], reverse=True)}

In [None]:
sw_character_appearances_sorted

In [None]:
# Import libraries to visualize results
import pandas 
import plotly.express as px

In [None]:
list(films_sorted.items())[:5]

In [None]:
# Build a DataFrame from the movies dictionary (one row per movie)

films_df = pandas.DataFrame(list(films_sorted.values()))

In [None]:
films_df

In [None]:
# Visualize movie dataframe
fig = px.bar(films_df, x='title', y='number_of_characters', title='Appearing Characters Per Movie', labels={
                     "number_of_characters": "Number of Appearing Characters"})
fig.show()

In [None]:
list(sw_character_appearances_sorted.items())[:3]

In [None]:
# Build a DataFrame from the character appearances dictionary (one row per character and movie)
characters_list = [[name, *attributes.values()] for name, attributes in sw_character_appearances_sorted.items()]
characters_df = pandas.DataFrame(characters_list, columns=['name', 'Number Of Appearances', 'Movie Name'])
characters_df = characters_df.explode('Movie Name', ignore_index=False)

In [None]:
characters_df

In [None]:
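# Visualize character dataframe
# Each row is one (character, movie) pair, so every segment counts as 1 and the stacked height equals the total appearances
fig = px.bar(characters_df.assign(appearance=1), x='name', y='appearance', color='Movie Name', height=600, width=2000,
             labels={'appearance': 'Number Of Appearances'},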
             category_orders={"Movie Name": ['The Phantom Menace',
                                             'Attack of the Clones',
                                             'Revenge of the Sith',
                                             'A New Hope',
                                             'The Empire Strikes Back',
                                             'Return of the Jedi']})
fig.update_layout(barmode='stack', xaxis={'categoryorder': 'total descending'})
fig.show()

# <p class="text">Thank you for your time!</p>