# Working with Known JSON Schemas - Lab

## Introduction
In this lab, you'll practice working with JSON files whose schema you know beforehand.

## Objectives

You will be able to:

* Use the `json` module to load and parse JSON documents
* Extract data using predefined JSON schemas
* Convert JSON to a pandas dataframe

## Reading a JSON Schema

Here's the JSON schema provided for a section of the NY Times API:
<img src="images/nytimes_movie_schema.png" width=500>

or a fully expanded view:

<img src="images/nytimes_movie_schema_detailed.png" width=500>

You can read more in the documentation [here](https://developer.nytimes.com/docs/movie-reviews-api/1/routes/reviews/%7Btype%7D.json/get).



## Loading the JSON Data

Open the JSON file located at `ny_times_movies.json`, and use the `json` module to load the data into a variable called `data`.

In [3]:
# Your code here
import json

# Open the JSON file and load the data
with open('ny_times_movies.json', 'r') as file:
    data = json.load(file)

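The review text contains non-ASCII characters such as curly quotes, so it can help to state the file encoding explicitly instead of relying on the platform default. The sketch below assumes the file is UTF-8 encoded, which the lab does not state. It uses a small temporary file with made-up content (`sample` and `sample.json` are hypothetical) so that it runs on its own.

```python
import json
import os
import tempfile

# Hypothetical sample payload; the curly quotes are written as escapes
sample = {"status": "OK", "results": [{"display_title": "\u2018Example\u2019 Title"}]}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample.json")

    # ensure_ascii=False stores the curly quotes as real characters in the file
    with open(path, "w", encoding="utf-8") as file:
        json.dump(sample, file, ensure_ascii=False)

    # Assumption: the file is UTF-8; passing encoding avoids platform defaults
    with open(path, "r", encoding="utf-8") as file:
        loaded = json.load(file)

# The data survives the round trip unchanged
round_trip_ok = loaded == sample
print(round_trip_ok)
```

If a file written in UTF-8 is read with a different default encoding, curly quotes and accented letters can come back as garbled character sequences.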

Run the code below to investigate its contents:

In [4]:
# Run this cell without changes
print("`data` has type", type(data))
print("The keys are", list(data.keys()))

`data` has type <class 'dict'>
The keys are ['status', 'copyright', 'has_more', 'num_results', 'results']
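Because the schema is known in advance, one option is to confirm that the expected top-level keys are present before going further. The following is a minimal sketch: the key names come from the output above, while `missing_keys` and the two sample payloads are hypothetical names introduced for illustration.

```python
# Top-level keys taken from the output above
EXPECTED_KEYS = {"status", "copyright", "has_more", "num_results", "results"}


def missing_keys(payload, expected=EXPECTED_KEYS):
    """Return the expected top-level keys that are absent from payload."""
    return sorted(expected - set(payload))


# Hypothetical payloads for illustration
complete = {"status": "OK", "copyright": "", "has_more": False,
            "num_results": 0, "results": []}
partial = {"status": "OK", "results": []}

print(missing_keys(complete))  # []
print(missing_keys(partial))   # ['copyright', 'has_more', 'num_results']
```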


## Loading Results

Create a variable `results` that contains the value associated with the `'results'` key.

In [5]:
# Your code here
results = data['results']

Below we display this variable as a table using pandas:

In [6]:
# Run this cell without changes
import pandas as pd
df = pd.DataFrame(results)
df

Unnamed: 0,display_title,mpaa_rating,critics_pick,byline,headline,summary_short,publication_date,opening_date,date_updated,link,multimedia
0,Can You Ever Forgive Me,R,1,A.O. SCOTT,Review: Melissa McCarthy Is Criminally Good in...,Marielle Heller directs a true story of litera...,2018-10-16,2018-10-19,2018-10-17 02:44:23,"{'type': 'article', 'url': 'http://www.nytimes...","{'type': 'mediumThreeByTwo210', 'src': 'https:..."
1,Charm City,,1,BEN KENIGSBERG,Review: ‘Charm City’ Vividly Captures the ...,Marilyn Ness’s documentary is dedicated to t...,2018-10-16,2018-04-22,2018-10-16 11:04:03,"{'type': 'article', 'url': 'http://www.nytimes...","{'type': 'mediumThreeByTwo210', 'src': 'https:..."
2,Horn from the Heart: The Paul Butterfield Story,,1,GLENN KENNY,Review: Paul Butterfield’s Story Is Told in ...,A documentary explores the life of the blues m...,2018-10-16,2018-10-19,2018-10-16 11:04:04,"{'type': 'article', 'url': 'http://www.nytimes...","{'type': 'mediumThreeByTwo210', 'src': 'https:..."
3,The Price of Everything,,0,A.O. SCOTT,Review: ‘The Price of Everything’ Asks $56...,This documentary examines the global art marke...,2018-10-16,2018-10-19,2018-10-16 16:08:03,"{'type': 'article', 'url': 'http://www.nytimes...","{'type': 'mediumThreeByTwo210', 'src': 'https:..."
4,Impulso,,0,BEN KENIGSBERG,Review: ‘Impulso’ Goes Backstage With a Fl...,"This documentary follows Rocío Molina, a cutt...",2018-10-16,,2018-10-16 11:04:03,"{'type': 'article', 'url': 'http://www.nytimes...","{'type': 'mediumThreeByTwo210', 'src': 'https:..."
5,Watergate,,1,A.O. SCOTT,Review: ‘Watergate’ Shocks Anew With Its T...,Charles Ferguson delivers a comprehensive docu...,2018-10-11,2018-10-12,2018-10-17 02:44:21,"{'type': 'article', 'url': 'http://www.nytimes...","{'type': 'mediumThreeByTwo210', 'src': 'https:..."
6,Barbara,,1,GLENN KENNY,"Review: In ‘Barbara,’ a Fictional Biopic o...",It’s a film of scenes rather than of one uni...,2018-10-11,,2018-10-17 02:44:21,"{'type': 'article', 'url': 'http://www.nytimes...","{'type': 'mediumThreeByTwo210', 'src': 'https:..."
7,Over the Limit,,1,JEANNETTE CATSOULIS,Review: A Russian Gymnast Goes ‘Over the Lim...,Margarita Mamun endures injury and abuse in Ma...,2018-10-11,2018-10-05,2018-10-17 02:44:20,"{'type': 'article', 'url': 'http://www.nytimes...","{'type': 'mediumThreeByTwo210', 'src': 'https:..."
8,The Kindergarten Teacher,R,1,JEANNETTE CATSOULIS,Review: The Disturbing Obsession of ‘The Kin...,Maggie Gyllenhaal is riveting as a dissatisfie...,2018-10-11,2018-10-12,2018-10-17 02:44:19,"{'type': 'article', 'url': 'http://www.nytimes...","{'type': 'mediumThreeByTwo210', 'src': 'https:..."
9,Classical Period,,1,BEN KENIGSBERG,"Review: In ‘Classical Period,’ a Deep Dive...",This highly original feature is technically in...,2018-10-11,,2018-10-17 02:44:18,"{'type': 'article', 'url': 'http://www.nytimes...","{'type': 'mediumThreeByTwo210', 'src': 'https:..."
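In the table above, the `link` and `multimedia` columns each hold a nested dictionary. One way to spread such fields into their own columns is `pd.json_normalize`, sketched below on two hypothetical records shaped like the real ones (the titles and URLs are placeholders, not data from the file).

```python
import pandas as pd

# Hypothetical records shaped like the rows above (values are placeholders)
sample_results = [
    {"display_title": "Film A",
     "link": {"type": "article", "url": "https://example.com/a"}},
    {"display_title": "Film B",
     "link": {"type": "article", "url": "https://example.com/b"}},
]

# json_normalize expands nested dicts into dotted column names
flat_df = pd.json_normalize(sample_results)

# The nested 'url' value is now reachable as a regular column
print(flat_df["link.url"].tolist())
```

The nested keys become columns named `link.type` and `link.url`, which can then be filtered or sorted like any other column.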


## Data Analysis

Now that you have a general sense of the data, answer some questions about it.

### How many results are in the file?

The metadata says this:

In [7]:
# Run this cell without changes
data['num_results']

20

Double-check that by looking at `results`. Does it line up?

In [9]:
# Your code here
num_results = len(results)
num_results

20
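The same comparison can be wrapped in a small helper so it is easy to rerun on other files. This is only a sketch: `counts_match` and the sample payloads are hypothetical, and it assumes the payload has both a `num_results` key and a `results` list, as in the schema above.

```python
def counts_match(payload):
    """Return True when the num_results metadata equals the number of records."""
    return payload["num_results"] == len(payload["results"])


# Hypothetical payloads for illustration
consistent = {"num_results": 2, "results": [{}, {}]}
inconsistent = {"num_results": 3, "results": [{}, {}]}

print(counts_match(consistent))    # True
print(counts_match(inconsistent))  # False
```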

In [10]:
# Report the count with an f-string
f"There are {num_results} results in the file."

'There are 20 results in the file.'


### How many unique critics are there?

A critic's name can be identified using the `'byline'` key. Assign your answer to the variable `unique_critics`.

In [13]:
# Your code here
# Collect the distinct critic names in a set
critic_names = {record['byline'] for record in results if 'byline' in record}
print(critic_names)

# The answer is the number of distinct names
unique_critics = len(critic_names)
unique_critics

{'GLENN KENNY', 'TEO BUGBEE', 'BEN KENIGSBERG', 'KEN JAWOROWSKI', 'MANOHLA DARGIS', 'JEANNETTE CATSOULIS', 'A.O. SCOTT'}
7
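If you also want to know how many reviews each critic wrote, `collections.Counter` is one option. The sketch below uses hypothetical records (the critic names are placeholders) and skips any record without a `'byline'` key.

```python
from collections import Counter

# Hypothetical records for illustration; one has no byline
sample_results = [
    {"byline": "CRITIC A"},
    {"byline": "CRITIC B"},
    {"byline": "CRITIC A"},
    {"display_title": "No byline here"},
]

# Count reviews per critic, ignoring records that lack a byline
reviews_per_critic = Counter(
    record["byline"] for record in sample_results if "byline" in record
)

print(len(reviews_per_critic))            # 2
print(reviews_per_critic.most_common(1))  # [('CRITIC A', 2)]
```

The number of keys in the counter is the number of unique critics, so this gives the same count as the set-based approach along with the per-critic totals.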


This code checks your answer.

In [14]:
# Run this cell without changes
assert unique_critics == 7

## Flattening Data

Create a list `review_urls` that contains the URL for each review. This can be found using the `'url'` key nested under `'link'`.

In [15]:
# Your code here (create more cells as needed)
# Create a list to hold the review URLs
review_urls = [record['link']['url'] for record in results if 'link' in record and 'url' in record['link']]
# Print the list of review URLs
print(review_urls)


['http://www.nytimes.com/2018/10/16/movies/can-you-ever-forgive-me-review-melissa-mccarthy.html', 'http://www.nytimes.com/2018/10/16/movies/charm-city-review-baltimore.html', 'http://www.nytimes.com/2018/10/16/movies/horn-from-the-heart-review-paul-butterfield.html', 'http://www.nytimes.com/2018/10/16/movies/the-price-of-everything-review-documentary.html', 'http://www.nytimes.com/2018/10/16/movies/impulso-review-documentary.html', 'http://www.nytimes.com/2018/10/11/movies/watergate-review-documentary.html', 'http://www.nytimes.com/2018/10/11/movies/barbara-review.html', 'http://www.nytimes.com/2018/10/11/movies/over-the-limit-review.html', 'http://www.nytimes.com/2018/10/11/movies/the-kindergarten-teacher-review.html', 'http://www.nytimes.com/2018/10/11/movies/classical-period-review.html', 'http://www.nytimes.com/2018/10/11/movies/bad-times-at-the-el-royale-review.html', 'http://www.nytimes.com/2018/10/11/movies/beautiful-boy-review-steve-carell.html', 'http://www.nytimes.com/2018/10
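When a nested key might be missing, a small helper can walk the path and fall back to a default instead of raising `KeyError`. This is a sketch under assumptions: `get_nested` and the sample records are hypothetical and are not part of the lab or of the `json` module.

```python
def get_nested(record, *keys, default=None):
    """Follow keys through nested dicts, returning default if any step is missing."""
    current = record
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


# Hypothetical records for illustration
with_url = {"link": {"type": "article", "url": "https://example.com/review"}}
without_link = {"display_title": "No link"}

print(get_nested(with_url, "link", "url"))      # https://example.com/review
print(get_nested(without_link, "link", "url"))  # None
```

For data that matches the schema exactly, direct indexing such as `record['link']['url']` is simpler. A helper like this is mainly useful when some records may be incomplete.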

The following code will check your answer:

In [16]:
# Run this cell without changes

# review_urls should be a list
assert type(review_urls) == list

# The length should be 20, same as the length of reviews
assert len(review_urls) == 20

# The data type contained should be string
assert type(review_urls[0]) == str and type(review_urls[-1]) == str

# Spot checking a specific value
assert review_urls[6] == 'http://www.nytimes.com/2018/10/11/movies/barbara-review.html'

## Summary
In this lab you practiced extracting and transforming data from JSON files with known schemas.