# Working with Known JSON Schemas - Lab

## Introduction
In this lab you'll practice working with JSON files whose schema you know beforehand.

## Objectives
You will be able to:
* Read JSON documentation schemas and translate them into code
* Extract data from known JSON schemas
* Write data to predefined JSON schemas

## Reading a JSON Schema

Here's the JSON schema provided for a section of the NY Times API:
<img src="nytimes_movie_schema.png" width=500>

or a fully expanded view:

<img src="nytimes_movie_schema_detailed.png" width=500>

You can see this yourself here:
https://developer.nytimes.com/movie_reviews_v2.json#/Documentation/GET/critics/%7Bresource-type%7D.json

You can see that the top-level structure is a dictionary with a key named 'response'. Its value is also a dictionary, with two keys: 'data' and 'meta'. As you continue down the schema hierarchy, you'll notice that the vast majority of the values in this case are dictionaries.
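
Reading a hierarchy like this by eye can be checked against code. The following is a minimal sketch of a helper that lists each key and the type of its value, level by level. The function name `describe_structure` and the sample document are hypothetical; only the 'response', 'data' and 'meta' keys come from the description above, and the inner keys are invented for illustration.

```python
import json

def describe_structure(obj, indent=0, max_depth=3):
    """Return lines describing the keys and value types of nested JSON data."""
    lines = []
    pad = "  " * indent
    if isinstance(obj, dict):
        for key, value in obj.items():
            lines.append(f"{pad}{key}: {type(value).__name__}")
            if indent < max_depth:
                lines.extend(describe_structure(value, indent + 1, max_depth))
    elif isinstance(obj, list) and obj:
        # Only inspect the first element; this assumes list items share one shape.
        lines.append(f"{pad}[0]: {type(obj[0]).__name__}")
        if indent < max_depth:
            lines.extend(describe_structure(obj[0], indent + 1, max_depth))
    return lines

# Invented sample shaped like the description above (not real API output).
sample = json.loads(
    '{"response": {"data": {"items": [{"id": 1}]}, "meta": {"count": 1}}}'
)
structure_lines = describe_structure(sample)
print("\n".join(structure_lines))
```

The same helper could be pointed at any object returned by `json.load` to compare the actual file against the documented schema.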

## Loading the Data File

Start by loading the JSON file. The sample response from the API is stored in the file **ny_times_movies.json**.

In [1]:
#Your code here
import json
#Open the file with a context manager so it is closed automatically
with open('ny_times_movies.json', 'r') as f:
    data = json.load(f)
print(type(data))
print(data.keys())


<class 'dict'>
dict_keys(['status', 'copyright', 'has_more', 'num_results', 'results'])


## Loading Specific Data

Create a DataFrame from the main data container within the JSON file, listed under the 'results' heading in the schema above.

In [2]:
#Your code here
import pandas as pd
df = pd.DataFrame(data['results'])
df.head(3)


Unnamed: 0,byline,critics_pick,date_updated,display_title,headline,link,mpaa_rating,multimedia,opening_date,publication_date,summary_short
0,A.O. SCOTT,1,2018-10-17 02:44:23,Can You Ever Forgive Me,Review: Melissa McCarthy Is Criminally Good in...,"{'type': 'article', 'url': 'http://www.nytimes...",R,"{'type': 'mediumThreeByTwo210', 'src': 'https:...",2018-10-19,2018-10-16,Marielle Heller directs a true story of litera...
1,BEN KENIGSBERG,1,2018-10-16 11:04:03,Charm City,Review: ‘Charm City’ Vividly Captures the ...,"{'type': 'article', 'url': 'http://www.nytimes...",,"{'type': 'mediumThreeByTwo210', 'src': 'https:...",2018-04-22,2018-10-16,Marilyn Ness’s documentary is dedicated to t...
2,GLENN KENNY,1,2018-10-16 11:04:04,Horn from the Heart: The Paul Butterfield Story,Review: Paul Butterfield’s Story Is Told in ...,"{'type': 'article', 'url': 'http://www.nytimes...",,"{'type': 'mediumThreeByTwo210', 'src': 'https:...",2018-10-19,2018-10-16,A documentary explores the life of the blues m...


## How many unique critics are there?

In [6]:
#Your code here
#df.info() prints its summary directly and returns None, so it is not wrapped in print()
df.info()
#Count the distinct values in the byline column
print(df.byline.nunique())


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20 entries, 0 to 19
Data columns (total 11 columns):
byline              20 non-null object
critics_pick        20 non-null int64
date_updated        20 non-null object
display_title       20 non-null object
headline            20 non-null object
link                20 non-null object
mpaa_rating         20 non-null object
multimedia          20 non-null object
opening_date        16 non-null object
publication_date    20 non-null object
summary_short       20 non-null object
dtypes: int64(1), object(10)
memory usage: 1.8+ KB
8
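
As a cross-check that does not depend on pandas, the same question can be answered with the standard library. This is a small sketch using `collections.Counter`; the list of bylines is invented for illustration (the names appear in the preview above, but the repetition is made up), so the counts here say nothing about the real file.

```python
from collections import Counter

# Invented bylines for illustration; in the lab the values would come from df.byline.
bylines = ["A.O. SCOTT", "BEN KENIGSBERG", "A.O. SCOTT", "GLENN KENNY"]

# Counter maps each distinct byline to the number of times it appears.
reviews_per_critic = Counter(bylines)
unique_critics = len(reviews_per_critic)

print(unique_critics)
print(reviews_per_critic.most_common(1))
```

Besides the number of distinct critics, the counter also shows how many reviews each critic wrote, which `nunique()` alone does not report.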


## Create a new column for the review's URL. Title the column 'review_url'

In [14]:
#Your code here
#Get the keys of the nested 'link' dictionary from the first row
keys = df.link.iloc[0].keys()
#Keep track of the columns we make for the subsequent preview
new_cols = []
#Create a new feature for each of these keys
for key in keys:
    #Create the new column name
    new_col = 'review_{}'.format(key)
    #Create the new column by pulling this key out of each row's dictionary
    df[new_col] = df.link.map(lambda x: x[key])
    new_cols.append(new_col)
df[new_cols].head()


Unnamed: 0,review_type,review_url,review_suggested_link_text
0,article,http://www.nytimes.com/2018/10/16/movies/can-y...,Read the New York Times Review of Can You Ever...
1,article,http://www.nytimes.com/2018/10/16/movies/charm...,Read the New York Times Review of Charm City
2,article,http://www.nytimes.com/2018/10/16/movies/horn-...,Read the New York Times Review of Horn from th...
3,article,http://www.nytimes.com/2018/10/16/movies/the-p...,Read the New York Times Review of The Price of...
4,article,http://www.nytimes.com/2018/10/16/movies/impul...,Read the New York Times Review of Impulso
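
The same unpacking can be done on the raw records before they reach a DataFrame. The sketch below is one possible approach, not the lab's required solution: `flatten_record` is a hypothetical helper, and the record is invented, reusing only the field names shown in the output above.

```python
def flatten_record(record, prefixes):
    """Copy a record, expanding each nested dict named in `prefixes`.

    `prefixes` maps a nested field name to the prefix used for its new keys.
    """
    flat = {}
    for field, value in record.items():
        if field in prefixes and isinstance(value, dict):
            # Promote each nested key to a top-level key such as 'review_url'.
            for sub_key, sub_value in value.items():
                flat[f"{prefixes[field]}_{sub_key}"] = sub_value
        else:
            flat[field] = value
    return flat

# Invented record for illustration; only the field names match the lab data.
record = {
    "display_title": "Example Title",
    "link": {
        "type": "article",
        "url": "http://example.com/review",
        "suggested_link_text": "Read the review",
    },
}
flat_record = flatten_record(record, {"link": "review"})
print(flat_record)
```

Unlike the loop in the cell above, this sketch drops the original nested 'link' field rather than keeping it alongside the new keys.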


## How many results are in the file?

In [16]:
len(df.review_url)


20
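
The top-level keys printed after loading the file include 'num_results', which suggests a way to sanity-check the count. The sketch below assumes that 'num_results' reports the length of the 'results' list, which the lab does not state explicitly; the miniature response is invented and only its key names come from the lab output.

```python
import json

# Invented miniature response; only the key names come from the lab's output.
raw = (
    '{"status": "OK", "has_more": false, "num_results": 2, '
    '"results": [{"byline": "A"}, {"byline": "B"}]}'
)
response = json.loads(raw)

# Compare the count reported in the metadata with the actual number of records.
reported = response["num_results"]
actual = len(response["results"])
counts_match = reported == actual

print(f"reported={reported}, actual={actual}, match={counts_match}")
```

A mismatch between the two numbers would be a hint that the file was truncated or that the field means something other than assumed here.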

## Summary
Well done! In this lab you got more practice extracting data from JSON files and transforming it into pandas DataFrames.