# Working with Known JSON Schemas - Lab

## Introduction
In this lab, you'll practice working with JSON files whose schema you know beforehand.

## Objectives

You will be able to:

* Use the `json` module to load and parse JSON documents
* Extract data using predefined JSON schemas
* Convert JSON to a pandas dataframe

## Reading a JSON Schema

Here's the JSON schema provided for a section of the NY Times API:
<img src="images/nytimes_movie_schema.png" width=500>

or a fully expanded view:

<img src="images/nytimes_movie_schema_detailed.png" width=500>

You can read more in the documentation [here](https://developer.nytimes.com/docs/movie-reviews-api/1/routes/reviews/%7Btype%7D.json/get).



## Loading the JSON Data

Open the JSON file located at `ny_times_movies.json`, and use the `json` module to load the data into a variable called `data`.

In [21]:
import json
with open('ny_times_movies.json') as file:
    data = json.load(file)
print(data)

{'status': 'OK', 'copyright': 'Copyright (c) 2018 The New York Times Company. All Rights Reserved.', 'has_more': True, 'num_results': 20, 'results': [{'display_title': 'Can You Ever Forgive Me', 'mpaa_rating': 'R', 'critics_pick': 1, 'byline': 'A.O. SCOTT', 'headline': 'Review: Melissa McCarthy Is Criminally Good in ‘Can You Ever Forgive Me?’', 'summary_short': 'Marielle Heller directs a true story of literary fraud, set amid the bookstores and gay bars of early ’90s Manhattan.', 'publication_date': '2018-10-16', 'opening_date': '2018-10-19', 'date_updated': '2018-10-17 02:44:23', 'link': {'type': 'article', 'url': 'http://www.nytimes.com/2018/10/16/movies/can-you-ever-forgive-me-review-melissa-mccarthy.html', 'suggested_link_text': 'Read the New York Times Review of Can You Ever Forgive Me'}, 'multimedia': {'type': 'mediumThreeByTwo210', 'src': 'https://static01.nyt.com/images/2018/10/19/arts/19CANYOUEVER-1/19CANYOUEVER-1-mediumThreeByTwo210.jpg', 'width': 210, 'height': 140}}, {'disp

Run the code below to investigate its contents:

In [22]:
# Run this cell without changes
print("`data` has type", type(data))
print("The keys are", list(data.keys()))

`data` has type <class 'dict'>
The keys are ['status', 'copyright', 'has_more', 'num_results', 'results']
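
To read a schema like the one above against actual data, it can help to walk the structure one level at a time. The following is a minimal sketch on a hypothetical, trimmed-down document (the `sample_text` string and the variable names are assumptions for illustration, not the contents of `ny_times_movies.json`):

```python
import json

# Hypothetical, trimmed-down document that mimics the top-level layout of
# ny_times_movies.json; the real file holds many more fields and records.
sample_text = """
{
  "status": "OK",
  "has_more": true,
  "num_results": 1,
  "results": [
    {
      "display_title": "Can You Ever Forgive Me",
      "byline": "A.O. SCOTT",
      "link": {"type": "article", "url": "http://www.nytimes.com/2018/10/16/movies/can-you-ever-forgive-me-review-melissa-mccarthy.html"}
    }
  ]
}
"""

# json.loads() parses a string; json.load() reads from an open file object
sample = json.loads(sample_text)

# Walk down one level at a time: top-level dict -> list of reviews -> one review
top_level_keys = list(sample.keys())
results_type = type(sample["results"])
first_review_keys = list(sample["results"][0].keys())
nested_link_keys = list(sample["results"][0]["link"].keys())

print(top_level_keys)
print(results_type)
print(first_review_keys)
print(nested_link_keys)
```

The same sequence of checks (keys of the outer dictionary, type of each value, keys of the first nested record) applies to `data` once the real file is loaded.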


## Loading Results

Create a variable `results` that contains the value associated with the `'results'` key.

In [23]:
results = data['results']

Below we display this variable as a table using pandas:

In [24]:
# Run this cell without changes
import pandas as pd
df = pd.DataFrame(results)
df
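
A list of dictionaries maps naturally onto a table: each dictionary becomes a row and each key becomes a column. The sketch below illustrates this on two hypothetical records (the second record and the `sample_results` name are made up for illustration). It also shows `pd.json_normalize`, which is one way to expand nested dictionaries such as `'link'` into their own columns:

```python
import pandas as pd

# Hypothetical records shaped like entries under the 'results' key (fields trimmed)
sample_results = [
    {
        "display_title": "Can You Ever Forgive Me",
        "byline": "A.O. SCOTT",
        "link": {"type": "article", "url": "http://www.nytimes.com/2018/10/16/movies/can-you-ever-forgive-me-review-melissa-mccarthy.html"},
    },
    {
        "display_title": "Example Title",
        "byline": "EXAMPLE CRITIC",
        "link": {"type": "article", "url": "http://example.com/review.html"},
    },
]

# One row per dictionary; the nested 'link' dict stays in a single column
df_plain = pd.DataFrame(sample_results)

# json_normalize expands nested dicts into columns named like 'link.url'
df_flat = pd.json_normalize(sample_results)

print(df_plain.columns.tolist())
print(df_flat.columns.tolist())
```

Whether to keep nested values in one column or flatten them depends on the analysis; flattening is convenient when you need to filter or sort on a nested field.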


## Data Analysis

Now that you have a general sense of the data, answer some questions about it.

### How many results are in the file?

The metadata says this:

In [25]:
# Run this cell without changes
data['num_results']

20

Double-check that by looking at `results`. Does it line up?

In [26]:
len(data['results'])

20

### How many unique critics are there?

A critic's name can be identified using the `'byline'` key. Assign your answer to the variable `unique_critics`.

In [28]:
unique_critics = len({review['byline'] for review in data['results']})
unique_critics

7

This code checks your answer.

In [19]:
# Run this cell without changes
assert unique_critics == 7
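
Note that iterating over the top-level dictionary yields its keys (strings), so any loop over reviews has to run over the list stored under `'results'`. As a minimal sketch, assuming a few hypothetical reviews (titles and the second critic name are invented), a set gives the number of distinct critics and a `Counter` additionally shows how many reviews each one wrote:

```python
from collections import Counter

# Hypothetical reviews; only the 'byline' field matters for this question
sample_results = [
    {"display_title": "Film A", "byline": "A.O. SCOTT"},
    {"display_title": "Film B", "byline": "EXAMPLE CRITIC"},
    {"display_title": "Film C", "byline": "A.O. SCOTT"},
]

# A set keeps one copy of each name, so its length is the number of distinct critics
unique_sample_critics = len({review["byline"] for review in sample_results})

# Counter records how many reviews carry each byline
reviews_per_critic = Counter(review["byline"] for review in sample_results)

print(unique_sample_critics)
print(reviews_per_critic.most_common())
```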

## Flattening Data

Create a list `review_urls` that contains the URL for each review. This can be found using the `'url'` key nested under `'link'`.

In [None]:
# Your code here (create more cells as needed)
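
One possible approach is a list comprehension that indexes into the nested dictionary for each review. The sketch below uses two hypothetical records (the second record and the names `sample_results` and `sample_urls` are assumptions for illustration) rather than the lab's data file:

```python
# Hypothetical records shaped like entries under the 'results' key (fields trimmed)
sample_results = [
    {
        "display_title": "Can You Ever Forgive Me",
        "link": {"type": "article", "url": "http://www.nytimes.com/2018/10/16/movies/can-you-ever-forgive-me-review-melissa-mccarthy.html"},
    },
    {
        "display_title": "Example Title",
        "link": {"type": "article", "url": "http://example.com/review.html"},
    },
]

# For each review, step into 'link' first, then pull out 'url'
sample_urls = [review["link"]["url"] for review in sample_results]

print(sample_urls)
```

Chained indexing like `review["link"]["url"]` raises a `KeyError` if a record lacks either key, which is acceptable when the schema guarantees both are present.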

The following code will check your answer:

In [None]:
# Run this cell without changes

# review_urls should be a list
assert type(review_urls) == list

# The length should be 20, same as the length of reviews
assert len(review_urls) == 20

# The data type contained should be string
assert type(review_urls[0]) == str and type(review_urls[-1]) == str

# Spot checking a specific value
assert review_urls[6] == 'http://www.nytimes.com/2018/10/11/movies/barbara-review.html'

## Summary
In this lab you practiced extracting and transforming data from JSON files with known schemas.