# Working with Known JSON Schemas - Lab

## Introduction
In this lab, you'll practice working with JSON files whose schema you know beforehand.

## Objectives
You will be able to:
* Use the `json` module to load and parse JSON documents
* Extract data using predefined JSON schemas
* Convert JSON to a pandas DataFrame

## Reading a JSON Schema

Here's the JSON schema provided for a section of the NY Times API:
<img src="images/nytimes_movie_schema.png" width=500>

or a fully expanded view:

<img src="images/nytimes_movie_schema_detailed.png" width=500>

You can read more in the documentation [here](https://developer.nytimes.com/docs/movie-reviews-api/1/routes/reviews/%7Btype%7D.json/get).

Note that **this is a different schema than the schema used in the previous lesson**, although both come from the New York Times.

## Loading the JSON Data

Open the JSON file located at `ny_times_movies.json`, and use the `json` module to load the data into a variable called `data`.

In [1]:
# Your code here
import json

with open('ny_times_movies.json') as f:
    data = json.load(f)

Run the code below to investigate its contents:

In [2]:
# Run this cell without changes
print("`data` has type", type(data))
print("The keys are", list(data.keys()))

`data` has type <class 'dict'>
The keys are ['status', 'copyright', 'has_more', 'num_results', 'results']
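
When the schema is known in advance, one quick sanity check is to compare the keys you loaded against the keys you expect. The sketch below is only an illustration: `sample_text` is a made-up, heavily shortened stand-in for `ny_times_movies.json`, and its values are placeholders rather than real API data. Only the five top-level key names come from the output above.

```python
import json

# Hypothetical, shortened stand-in for ny_times_movies.json (placeholder values)
sample_text = """
{
  "status": "placeholder",
  "copyright": "placeholder",
  "has_more": false,
  "num_results": 1,
  "results": [{"display_title": "Example Movie", "byline": "CRITIC A"}]
}
"""

# json.loads parses a string; json.load (used above) parses an open file
sample_data = json.loads(sample_text)

# Top-level keys taken from the printed output above
expected_keys = {"status", "copyright", "has_more", "num_results", "results"}
missing_keys = expected_keys - set(sample_data)

print("Missing keys:", missing_keys)
print("`results` has type", type(sample_data["results"]))
```

An empty `missing_keys` set suggests the document has the top-level shape the schema describes.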


## Loading Results

Create a variable `results` that contains the value associated with the `'results'` key.

In [45]:
# Your code here
results = [data['results']]

Below we display this variable as a table using pandas:

In [46]:
# Run this cell without changes
import pandas as pd
df = pd.DataFrame(results)
df

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19
0,"{'display_title': 'Can You Ever Forgive Me', '...","{'display_title': 'Charm City', 'mpaa_rating':...",{'display_title': 'Horn from the Heart: The Pa...,"{'display_title': 'The Price of Everything', '...","{'display_title': 'Impulso', 'mpaa_rating': ''...","{'display_title': 'Watergate', 'mpaa_rating': ...","{'display_title': 'Barbara', 'mpaa_rating': ''...","{'display_title': 'Over the Limit', 'mpaa_rati...","{'display_title': 'The Kindergarten Teacher', ...","{'display_title': 'Classical Period', 'mpaa_ra...",{'display_title': 'Bad Times at the El Royale'...,"{'display_title': 'Beautiful Boy', 'mpaa_ratin...","{'display_title': 'The Oath', 'mpaa_rating': '...","{'display_title': 'Bikini Moon', 'mpaa_rating'...",{'display_title': 'Goosebumps 2: Haunted Hallo...,"{'display_title': 'The Sentence', 'mpaa_rating...","{'display_title': 'All Square', 'mpaa_rating':...","{'display_title': 'Sadie', 'mpaa_rating': '', ...","{'display_title': 'After Everything', 'mpaa_ra...","{'display_title': 'First Man', 'mpaa_rating': ..."
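
The table above has a single row and 20 columns because `results` was built as a list containing the list of reviews. As a rough comparison, the sketch below uses two made-up reviews (`sample_reviews` is hypothetical, with only a subset of the real keys) to show how pandas treats a wrapped list versus a plain list of dictionaries.

```python
import pandas as pd

# Hypothetical sample: two made-up reviews with a subset of the real keys
sample_reviews = [
    {"display_title": "Example One", "byline": "CRITIC A"},
    {"display_title": "Example Two", "byline": "CRITIC B"},
]

# A list inside a list: one row, with each review dict stored in its own column
wrapped_df = pd.DataFrame([sample_reviews])

# A plain list of dicts: one row per review, one column per key
direct_df = pd.DataFrame(sample_reviews)

print(wrapped_df.shape)
print(direct_df.shape)
print(list(direct_df.columns))
```

The second layout is usually easier to analyze, since each key becomes a named column.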


## Data Analysis

Now that you have a general sense of the data, answer some questions about it.

### How many results are in the file?

The metadata says this:

In [47]:
# Run this cell without changes
data['num_results']

20

Double-check that by looking at `results`. Does it line up?

In [48]:
# Your code here
results

[[{'display_title': 'Can You Ever Forgive Me',
   'mpaa_rating': 'R',
   'critics_pick': 1,
   'byline': 'A.O. SCOTT',
   'headline': 'Review: Melissa McCarthy Is Criminally Good in ‘Can You Ever Forgive Me?’',
   'summary_short': 'Marielle Heller directs a true story of literary fraud, set amid the bookstores and gay bars of early ’90s Manhattan.',
   'publication_date': '2018-10-16',
   'opening_date': '2018-10-19',
   'date_updated': '2018-10-17 02:44:23',
   'link': {'type': 'article',
    'url': 'http://www.nytimes.com/2018/10/16/movies/can-you-ever-forgive-me-review-melissa-mccarthy.html',
    'suggested_link_text': 'Read the New York Times Review of Can You Ever Forgive Me'},
   'multimedia': {'type': 'mediumThreeByTwo210',
    'src': 'https://static01.nyt.com/images/2018/10/19/arts/19CANYOUEVER-1/19CANYOUEVER-1-mediumThreeByTwo210.jpg',
    'width': 210,
    'height': 140}},
  {'display_title': 'Charm City',
   'mpaa_rating': '',
   'critics_pick': 1,
   'byline': 'BEN KE

In [None]:
"""
Yes 
"""
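
Rather than reading through the printed output, the count can also be checked programmatically by comparing the metadata against the length of the list of reviews. This is a minimal sketch on a made-up dictionary (`sample_data` is hypothetical). With `results` as defined in this lab, the reviews sit at `results[0]`, so the equivalent check would use `len(results[0])`.

```python
# Hypothetical sample with the same two keys used in this section
sample_data = {
    "num_results": 2,
    "results": [
        {"display_title": "Example One"},
        {"display_title": "Example Two"},
    ],
}

reported_count = sample_data["num_results"]  # what the metadata claims
actual_count = len(sample_data["results"])   # what the list actually contains
counts_match = reported_count == actual_count

print(reported_count, actual_count, counts_match)
```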

### How many unique critics are there?

A critic's name can be identified using the `'byline'` key. Assign your answer to the variable `unique_critics`.

In [64]:
# Your code here
# Each review dict stores the critic's name under 'byline'; a set drops duplicates
unique_critics = len({critic['byline'] for critic in results[0]})
unique_critics

7

This code checks your answer.

In [65]:
# Run this cell without changes
assert unique_critics == 7
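
Counting unique values with a set discards how often each value appears. If that is also of interest, `collections.Counter` from the standard library keeps both. The sketch below uses made-up bylines (`sample_reviews` is hypothetical), not the critics in the file.

```python
from collections import Counter

# Hypothetical sample: three made-up reviews by two made-up critics
sample_reviews = [
    {"display_title": "Example One", "byline": "CRITIC A"},
    {"display_title": "Example Two", "byline": "CRITIC B"},
    {"display_title": "Example Three", "byline": "CRITIC A"},
]

bylines = [review["byline"] for review in sample_reviews]

unique_count = len(set(bylines))       # number of distinct critics
reviews_per_critic = Counter(bylines)  # how many reviews each critic wrote

print(unique_count)
print(reviews_per_critic.most_common())
```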

## Flattening Data

Create a list `review_urls` that contains the URL for each review. This can be found using the `'url'` key nested under `'link'`.

In [71]:
def get_links(nested_results):
    # `results` wraps the list of reviews in an outer list, so the reviews are at index 0
    return [review['link'] for review in nested_results[0]]

In [86]:
# Your code here (create more cells as needed)
links_list = get_links(results)
review_urls = [url['url'] for url in links_list]

The following code will check your answer:

In [87]:
# Run this cell without changes

# review_urls should be a list
assert type(review_urls) == list

# The length should be 20, same as the length of reviews
assert len(review_urls) == 20

# The data type contained should be string
assert type(review_urls[0]) == str and type(review_urls[-1]) == str

# Spot checking a specific value
assert review_urls[6] == 'http://www.nytimes.com/2018/10/11/movies/barbara-review.html'
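
As a point of comparison, nested keys such as `'url'` under `'link'` can be reached either with a single list comprehension or by letting pandas flatten the records with `pd.json_normalize`, which names nested columns with a dot (for example `link.url`). The sketch below runs on two made-up reviews (`sample_reviews` and the `example.com` URLs are hypothetical) and assumes a plain list of review dictionaries.

```python
import pandas as pd

# Hypothetical sample: two made-up reviews with a nested 'link' dictionary
sample_reviews = [
    {"display_title": "Example One",
     "link": {"type": "article", "url": "http://example.com/one.html"}},
    {"display_title": "Example Two",
     "link": {"type": "article", "url": "http://example.com/two.html"}},
]

# Option 1: index into the nested dictionary directly
urls_from_loop = [review["link"]["url"] for review in sample_reviews]

# Option 2: flatten every nested key into its own column
flat_df = pd.json_normalize(sample_reviews)
urls_from_df = flat_df["link.url"].tolist()

print(list(flat_df.columns))
print(urls_from_loop == urls_from_df)
```

The comprehension is enough when only one nested field is needed, while the flattened DataFrame is convenient when several nested fields are analyzed together.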

## Summary
Well done! In this lab you continued to practice extracting and transforming data from JSON files with known schemas.