## About this Notebook

This notebook will take us through:
* working with JSON files
* representing JSON files in different formats

## JSON objects and files

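JSON can be used to store certain kinds of Python data structures, such as lists and dictionaries, in a file. Python's built-in `json` module handles this. `json.dumps()` converts a Python object into a JSON-formatted string, while `json.dump()` writes the object directly to an open file. Going the other way, `json.loads()` parses a JSON string and `json.load()` reads from an open file.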

In [47]:
import json

myList = ['1', '2', '3']
myDict = {'Woodbine':57, 'Whitehorn':123}


# open a file for writing, and write both myList and myDict to the file. Then, close the file.


print(json.dumps(myList))
print(json.dumps(myDict))

# serialize myDict to a JSON string with dumps(), then write that string to a new file
json_string = json.dumps(myDict)
with open('data1.json', 'w') as file:
    file.write(json_string)

# equivalent shortcut: dump() serializes myDict and writes it to the file in one step
with open('data1.json', 'w') as file:
    json.dump(myDict, file)

# read the file back as text and parse it into a Python dictionary with loads()
with open('data1.json', 'r') as file:
    json_string = file.read()

my_data = json.loads(json_string)
my_data

["1", "2", "3"]
{"Woodbine": 57, "Whitehorn": 123}


{'Woodbine': 57, 'Whitehorn': 123}
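The phrase "certain kinds" matters: JSON only has objects, arrays, strings, numbers, booleans, and null, so some Python values change type on a round trip. Here is a minimal sketch using only the standard `json` module. The `record` variable is illustrative and not part of the notebook's data.

```python
import json

# Illustrative data, not from the notebook: a tuple value and an integer key
record = {"scores": (57, 123), 1: "one"}

encoded = json.dumps(record)   # Python object -> JSON string
decoded = json.loads(encoded)  # JSON string -> Python object

print(encoded)
print(decoded)

# The tuple comes back as a list, and the integer key comes back as a string
print(type(decoded["scores"]), list(decoded.keys()))
```

So a round trip through JSON preserves the content but not necessarily the exact Python types.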

Now let's work with a pre-existing dataset. We will use results from the [Government of Canada's Algorithmic Impact Assessment for the ATIP Online Request Service](https://open.canada.ca/data/dataset/cea9985f-5e0f-425e-9b7e-e1d122272c56/resource/5678a163-bfaa-4006-b655-75c5fe421d58/download/atip-digital-services-aia.json). 

In [48]:
import json

with open("atip-digital-services-aia.json") as my_json:
    %time aitp_info = json.load(my_json)
    print(aitp_info)

CPU times: user 555 µs, sys: 1.59 ms, total: 2.15 ms
Wall time: 1.85 ms
{'version': 'v0.8', 'currentPage': 12, 'data': {'projectDetailsRespondent': 'W Herbert', 'projectDetailsJob': 'Senior Analyst', 'projectDetailsDepartment-NS': '056', 'projectDetailsBranch': 'CIOB', 'projectDetailsTitle': 'ATIP Digital Services', 'projectDetailsPhase': 'item2', 'projectDetailsDescription': 'Simple central website for Canadians to submit ATIP requests', 'businessDrivers1': ['item2', 'item5'], 'riskProfile1': 'item1-3', 'riskProfile2': 'item2-0', 'riskProfile3': 'item2-0', 'riskProfile4': 'item2-0', 'projectAuthority1': 'item1-2', 'aboutSystem1': ['item2', 'item4'], 'aboutAlgorithm1': 'item1-3', 'aboutAlgorithm2': 'item1-3', 'impact1': 'item1-1', 'impact2': 'item1-3', 'impact3': 'item2-0', 'impact5': 'item1-4', 'impact6': 'item1-1', 'impact7': 'item1-1', 'impact8': 'a misdirected request would immediately be redirected to the appropriate GoC institution', 'impact9': 'item1-1', 'impact10': 'does not pr

What are the equivalent methods you would use in pandas to work with JSON objects? Give them a try in the cell below. 

In [50]:
import pandas as pd

%time json_data = pd.read_json("atip-digital-services-aia.json")

json_data

CPU times: user 7.31 ms, sys: 1.76 ms, total: 9.08 ms
Wall time: 8.17 ms


Unnamed: 0,version,currentPage,data
aboutAlgorithm1,v0.8,12,item1-3
aboutAlgorithm2,v0.8,12,item1-3
aboutDataSource1,v0.8,12,item2-0
aboutDataSource2,v0.8,12,item1-0
aboutDataSource3,v0.8,12,item2-1
...,...,...,...
projectDetailsTitle,v0.8,12,ATIP Digital Services
riskProfile1,v0.8,12,item1-3
riskProfile2,v0.8,12,item2-0
riskProfile3,v0.8,12,item2-0
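In the table above, `read_json()` repeats `version` and `currentPage` on every row because the keys of the nested `data` object become the index. One possible alternative, sketched here on a small stand-in dictionary with the same shape as the AIA file (the `aia` variable and its three answers are a trimmed illustration, not the full file), is to pull out the nested part explicitly:

```python
import pandas as pd

# Small stand-in with the same shape as the AIA file (values copied from the output above)
aia = {
    "version": "v0.8",
    "currentPage": 12,
    "data": {
        "projectDetailsTitle": "ATIP Digital Services",
        "riskProfile1": "item1-3",
        "riskProfile2": "item2-0",
    },
}

# The answers live under the nested 'data' key; a Series keeps one entry per question
answers = pd.Series(aia["data"], name="answer")
print(answers)

# json_normalize flattens the whole document into a single row instead,
# joining nested keys with a dot (for example 'data.riskProfile1')
flat = pd.json_normalize(aia)
print(flat.columns.tolist())
```

Which shape is more useful probably depends on whether you want one row per question or one row per assessment.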


What happens when you use a JSON file that has a different structure? Try downloading some data from the JSON Generator (https://www.json-generator.com/) and working with it in Python.

In [6]:
# with open("generated_data.json") as gen_json:
#     generated_info = json.load(gen_json)
#     print(generated_info)


generated_data = pd.read_json("generated_data.json")
generated_data

Unnamed: 0,_id,index,guid,isActive,balance,picture,age,eyeColor,name,gender,...,phone,address,about,registered,latitude,longitude,tags,friends,greeting,favoriteFruit
0,6179815d6e8e5559873076e3,0,3a509b67-885d-4bce-b01d-e96aefee0bf7,True,"$1,157.12",http://placehold.it/32x32,35,brown,Susanna Reynolds,female,...,+1 (811) 555-3654,"482 Danforth Street, Albany, Connecticut, 3698",Do duis dolor ex exercitation esse velit do ex...,2020-09-18T02:20:58 +06:00,3.847462,-3.795569,"[cupidatat, proident, aliqua, irure, ut, id, e...","[{'id': 0, 'name': 'Edna Herman'}, {'id': 1, '...","Hello, Susanna Reynolds! You have 3 unread mes...",strawberry
1,6179815d89fc077e9ced7558,1,5307801a-f651-436c-9dfe-b11ce838e736,True,"$3,127.70",http://placehold.it/32x32,39,brown,Chen West,male,...,+1 (978) 488-2004,"913 Lorimer Street, Grandview, Hawaii, 8756",Duis enim ex labore esse do. In reprehenderit ...,2016-12-26T08:15:32 +07:00,38.252197,120.615941,"[nostrud, dolor, nostrud, ad, voluptate, exerc...","[{'id': 0, 'name': 'Smith Hall'}, {'id': 1, 'n...","Hello, Chen West! You have 10 unread messages.",strawberry
2,6179815eb522c464e1930ef1,2,014cce4d-906d-42af-bd16-8bb1edcccdd7,True,"$2,215.78",http://placehold.it/32x32,36,brown,Nolan Smith,male,...,+1 (814) 491-3162,"337 Evans Street, Canoochee, Puerto Rico, 2864",Duis eiusmod officia non ex amet. Laboris et d...,2015-03-21T11:58:12 +06:00,22.379612,-147.970211,"[tempor, aliqua, irure, velit, officia, sunt, ...","[{'id': 0, 'name': 'Knight Ward'}, {'id': 1, '...","Hello, Nolan Smith! You have 9 unread messages.",strawberry
3,6179815ec1d2142de641395f,3,577577a5-fa80-465a-b549-23e805f8e193,False,"$3,758.68",http://placehold.it/32x32,33,brown,Morris Sanford,male,...,+1 (960) 427-3102,"442 Liberty Avenue, Logan, Illinois, 8643",Non dolor ullamco consectetur qui magna ipsum ...,2018-05-22T06:07:01 +06:00,79.649244,10.84846,"[magna, Lorem, ut, consequat, in, magna, eu]","[{'id': 0, 'name': 'Stephenson Charles'}, {'id...","Hello, Morris Sanford! You have 2 unread messa...",apple
4,6179815e5af1a10268c58f99,4,fe19407d-2e66-4909-9c6e-2327f343ce20,True,"$2,454.98",http://placehold.it/32x32,34,green,Jan Rivers,female,...,+1 (990) 478-3221,"966 Chase Court, Hartsville/Hartley, Palau, 139",Non aliquip ad magna anim aliquip consequat. C...,2020-10-25T03:39:18 +06:00,21.108885,22.746014,"[nulla, Lorem, anim, aliquip, ex, veniam, magna]","[{'id': 0, 'name': 'Katy Wright'}, {'id': 1, '...","Hello, Jan Rivers! You have 9 unread messages.",strawberry
5,6179815e2aa35c5cc8c6cb59,5,4909fb18-6819-492d-9a09-a124415bea4b,False,"$1,723.20",http://placehold.it/32x32,25,green,Terry Hines,female,...,+1 (844) 579-2960,"549 Lawrence Avenue, Eden, Wyoming, 5578",Ipsum et eiusmod pariatur pariatur ipsum conse...,2018-07-16T03:44:19 +06:00,-55.20682,-15.667406,"[adipisicing, laborum, sit, incididunt, volupt...","[{'id': 0, 'name': 'Adams Jensen'}, {'id': 1, ...","Hello, Terry Hines! You have 1 unread messages.",strawberry


Each person can have a different number of friends, so the `friends` column holds a list of dictionaries rather than a single value. Instead, we want to create a separate table where one can look up a person's friends.

In [12]:
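# Pull out the 'friends' column (assumed source of generated_friends):
# each entry is a list of {'id', 'name'} dictionaries
generated_friends = generated_data['friends']

# Initialize an empty list to store the extracted data
data_list = []

# Iterate through the column and add each person's friends to the list
for row in generated_friends:
    data_list.extend(row)

# Create a DataFrame from the list of dictionaries
new_df = pd.DataFrame(data_list)

# Display the resulting DataFrame
print(new_df)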

    id                name
0    0         Edna Herman
1    1       Farrell Haley
2    2        Joanne House
3    0          Smith Hall
4    1   Brooks Cunningham
5    2       Jewel Chapman
6    0         Knight Ward
7    1      Herminia Noble
8    2     Watkins Langley
9    0  Stephenson Charles
10   1    Christina French
11   2      Doreen Gilliam
12   0         Katy Wright
13   1      Alvarado Banks
14   2      Melanie Lowery
15   0        Adams Jensen
16   1        Nielsen Kidd
17   2      Gillespie Sims


Is there still a potential issue with this table?
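One possible issue is that the table no longer records whose friend each row is, and the `id` values repeat for every person. A minimal sketch of one way to keep that link follows. The `people` list is a trimmed stand-in shaped like the generated data, and the column names `person`, `friend_id`, and `friend_name` are illustrative choices.

```python
import pandas as pd

# Stand-in records shaped like the generated data (names copied from the output above)
people = [
    {"name": "Susanna Reynolds",
     "friends": [{"id": 0, "name": "Edna Herman"}, {"id": 1, "name": "Farrell Haley"}]},
    {"name": "Chen West",
     "friends": [{"id": 0, "name": "Smith Hall"}]},
]

rows = []
for person in people:
    for friend in person["friends"]:
        # Keep the owner's name next to each friend so the link is not lost
        rows.append({
            "person": person["name"],
            "friend_id": friend["id"],
            "friend_name": friend["name"],
        })

friends_df = pd.DataFrame(rows)
print(friends_df)
```

In the real dataset, a unique field such as `_id` would likely be a safer key than the person's name.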

The pandas library provides `read_json()` for reading JSON and `to_json()` for writing it. Is this any different from reading from or writing to a CSV file? If there are differences, when do you think they would show up?
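One difference shows up with nested values. The sketch below round-trips a small, made-up DataFrame (the names and tags are illustrative) through both formats in memory, using `io.StringIO` in place of real files:

```python
import io
import pandas as pd

# Made-up data: the 'tags' column holds a list in each cell
df = pd.DataFrame({"name": ["Ana", "Ben"], "tags": [["a", "b"], ["c"]]})

# Round trip through JSON
json_text = df.to_json(orient="records")
from_json = pd.read_json(io.StringIO(json_text), orient="records")

# Round trip through CSV
csv_text = df.to_csv(index=False)
from_csv = pd.read_csv(io.StringIO(csv_text))

# JSON keeps the list structure; CSV turns each list into plain text
print(type(from_json.loc[0, "tags"]))
print(type(from_csv.loc[0, "tags"]))
```

For flat tables of numbers and strings the two formats tend to behave much the same; the gap appears once cells contain lists or dictionaries.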

## Representing JSON Files as CSV

How could we represent this data in a CSV file or table?

In [27]:
json_data = {
    "name": "John",
    "age": 30,
    "address": {
        "street": "123 Main St",
        "city": "Exampleville"
    },
    "emails": ["john@example.com", "john@gmail.com"]
}
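One possible answer, offered as a sketch rather than the only approach: flatten the nested `address` object into dotted column names with `pd.json_normalize()`, then give each email its own row with `explode()`. The variable name `person` is used here so the example stands on its own.

```python
import pandas as pd

person = {
    "name": "John",
    "age": 30,
    "address": {
        "street": "123 Main St",
        "city": "Exampleville"
    },
    "emails": ["john@example.com", "john@gmail.com"]
}

# Nested dictionaries become columns such as 'address.street' and 'address.city'
flat = pd.json_normalize(person)

# The 'emails' list is still a single cell; explode() produces one row per email
flat = flat.explode("emails", ignore_index=True)

print(flat.to_csv(index=False))
```

Repeating the name and address on every row is the cost of this layout. Joining the emails into one delimited string, or moving them to a separate table, are other options.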



Here is a slightly more challenging example:

In [8]:
import csv

json_data = {
    "theaters": [
        {
            "theater_id": 1,
            "theater_name": "Cineplex Odeon Westhills",
            "location": "Toronto, Canada",
            "showtimes": [
                {
                    "showtime_id": 1,
                    "movie": {
                        "movie_id": 1,
                        "title": "The Matrix",
                        "release_year": 1999,
                        "rating": 8.7
                    },
                    "showtime": "2023-10-14 18:30:00",
                    "bookings": [
                        {
                            "booking_id": 1,
                            "seat_number": "A-1",
                            "customer_name": "John Doe"
                        }
                    ]
                },
                {
                    "showtime_id": 2,
                    "movie": {
                        "movie_id": 2,
                        "title": "Inception",
                        "release_year": 2010,
                        "rating": 8.8
                    },
                    "showtime": "2023-10-14 21:00:00",
                    "bookings": [
                        {
                            "booking_id": 2,
                            "seat_number": "B-5",
                            "customer_name": "Jane Smith"
                        }
                    ]
                }
            ]
        },
        {
            "theater_id": 2,
            "theater_name": "AMC Empire 25",
            "location": "Vancouver, Canada",
            "showtimes": [
                {
                    "showtime_id": 3,
                    "movie": {
                        "movie_id": 3,
                        "title": "The Shawshank Redemption",
                        "release_year": 1994,
                        "rating": 9.3
                    },
                    "showtime": "2023-10-15 14:00:00",
                    "bookings": [
                        {
                            "booking_id": 3,
                            "seat_number": "C-3",
                            "customer_name": "Mike Johnson"
                        }
                    ]
                },
                {
                    "showtime_id": 4,
                    "movie": {
                        "movie_id": 4,
                        "title": "Pulp Fiction",
                        "release_year": 1994,
                        "rating": 8.9
                    },
                    "showtime": "2023-10-15 16:30:00",
                    "bookings": [
                        {
                            "booking_id": 4,
                            "seat_number": "D-2",
                            "customer_name": "Emily Wilson"
                        }
                    ]
                }
            ]
        }
    ]
}




In [10]:
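# Approach 1: walk the known structure and collect one row per movie
movies = []
for theater in json_data['theaters']:
    for showtime in theater['showtimes']:
        movie = showtime['movie']
        movies.append([movie['movie_id'], movie['title'], movie['release_year'], movie['rating']])

# Approach 2: visit every value with an explicit stack, without knowing the structure in advance.
# The stack is last-in, first-out, so the most recently added values print first.
stack = [json_data]
while stack:
    current = stack.pop()
    if isinstance(current, dict):
        # push every value of the dictionary (keys are ignored)
        for value in current.values():
            stack.append(value)
    elif isinstance(current, list):
        # push every element of the list
        for item in current:
            stack.append(item)
    else:
        # a plain value (string or number): print it
        print(current)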

Emily Wilson
D-2
4
2023-10-15 16:30:00
8.9
1994
Pulp Fiction
4
4
Mike Johnson
C-3
3
2023-10-15 14:00:00
9.3
1994
The Shawshank Redemption
3
3
Vancouver, Canada
AMC Empire 25
2
Jane Smith
B-5
2
2023-10-14 21:00:00
8.8
2010
Inception
2
2
John Doe
A-1
1
2023-10-14 18:30:00
8.7
1999
The Matrix
1
1
Toronto, Canada
Cineplex Odeon Westhills
1


In [5]:
movies

[[1, 'The Matrix', 1999, 8.7],
 [2, 'Inception', 2010, 8.8],
 [3, 'The Shawshank Redemption', 1994, 9.3],
 [4, 'Pulp Fiction', 1994, 8.9]]
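Since `movies` is already a list of rows, it can be written out with the standard `csv` module. This is a minimal sketch: the header names mirror the JSON keys, and an in-memory `io.StringIO` buffer stands in for a real file.

```python
import csv
import io

# Same rows as the `movies` list shown above
movies = [
    [1, 'The Matrix', 1999, 8.7],
    [2, 'Inception', 2010, 8.8],
    [3, 'The Shawshank Redemption', 1994, 9.3],
    [4, 'Pulp Fiction', 1994, 8.9],
]

# In-memory buffer; swap for open('movies.csv', 'w', newline='') to write a real file
buffer = io.StringIO()
writer = csv.writer(buffer)

writer.writerow(['movie_id', 'title', 'release_year', 'rating'])  # header row
writer.writerows(movies)                                          # one line per movie

csv_text = buffer.getvalue()
print(csv_text)
```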

Try extracting theater information in different ways.

What if we have multiple showtimes?
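One way to handle multiple showtimes, sketched under the assumption that one row per showtime is the desired shape, is `pd.json_normalize()` with `record_path` and `meta`. The `theaters` dictionary below is a trimmed stand-in for `json_data`, keeping only a few fields.

```python
import pandas as pd

# Trimmed stand-in for json_data: the first theater has two showtimes, the second has one
theaters = {
    "theaters": [
        {"theater_id": 1, "theater_name": "Cineplex Odeon Westhills",
         "showtimes": [
             {"showtime_id": 1, "movie": {"title": "The Matrix"},
              "showtime": "2023-10-14 18:30:00"},
             {"showtime_id": 2, "movie": {"title": "Inception"},
              "showtime": "2023-10-14 21:00:00"},
         ]},
        {"theater_id": 2, "theater_name": "AMC Empire 25",
         "showtimes": [
             {"showtime_id": 3, "movie": {"title": "The Shawshank Redemption"},
              "showtime": "2023-10-15 14:00:00"},
         ]},
    ]
}

# record_path picks the list to expand into rows;
# meta copies the parent theater's fields onto each of those rows
showtimes_df = pd.json_normalize(
    theaters["theaters"],
    record_path="showtimes",
    meta=["theater_id", "theater_name"],
)
print(showtimes_df)
```

Each showtime becomes its own row, with the theater details repeated alongside it and the nested movie title exposed as a `movie.title` column.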