# JSON Tutorial

In [4]:
import pandas as pd

Pandas provides `read_json()`, which loads JSON from a file path or a URL into a DataFrame.
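`read_json()` also accepts a file-like object, which is handy for trying it out without a file or network access. Here is a minimal sketch that uses a small hypothetical JSON string in place of the dataset below. The string is wrapped in `io.StringIO` because recent pandas versions deprecate passing literal JSON text directly:

```python
from io import StringIO

import pandas as pd

# Hypothetical sample data in "records" layout: a list of row objects.
raw = '[{"integer": 5, "category": 0}, {"integer": 9, "category": 1}]'

# read_json accepts a path, a URL, or any file-like object such as StringIO.
df = pd.read_json(StringIO(raw))
print(df)
```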

In [5]:
url = 'https://raw.githubusercontent.com/chrisalbon/simulated_datasets/master/data.json'
first_json = pd.read_json(url)
first_json.head()

|   | integer | datetime | category |
|---|---------|----------|----------|
| 0 | 5 | 2015-01-01 00:00:00 | 0 |
| 1 | 5 | 2015-01-01 00:00:01 | 0 |
| 2 | 9 | 2015-01-01 00:00:02 | 0 |
| 3 | 6 | 2015-01-01 00:00:03 | 0 |
| 4 | 6 | 2015-01-01 00:00:04 | 0 |


Use `to_json()` to write a DataFrame to a JSON file. The `orient` argument controls how rows and columns are nested in the output.

In [6]:
first_json.to_json('data/json_columns.json', orient='columns')
first_json.to_json('data/json_index.json', orient='index')
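The following sketch shows what the two `orient` values actually produce, using a hypothetical two-row frame. When `to_json()` is called without a path, it returns the JSON as a string instead of writing a file. Reading that string back works when you pass the same `orient`:

```python
import json
from io import StringIO

import pandas as pd

# Hypothetical two-row frame used only to compare the layouts.
df = pd.DataFrame({"integer": [5, 9], "category": [0, 1]})

# orient="columns": {column -> {index -> value}}
as_columns = df.to_json(orient="columns")
# orient="index": {index -> {column -> value}}
as_index = df.to_json(orient="index")

print(json.loads(as_columns))
print(json.loads(as_index))

# Read the "index" layout back by passing the same orient.
round_trip = pd.read_json(StringIO(as_index), orient="index")
print(round_trip)
```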

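If you pass a bare filename, `to_json()` writes it relative to the current working directory, which for a notebook is usually the notebook's own folder. The paths above assume a `data/` folder already exists there. Note that `read_json()` only handles simple, flat JSON well. It turns each top-level key into a column, so when those values are arrays, they must all have the same length.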

In [8]:
df = pd.read_json('data/nested.json') 

ValueError: All arrays must be of the same length

`read_json()` cannot load this nested file, because its top-level lists (`article` and `blog`, printed below) have different lengths. Python's built-in `json` module can read it instead.
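The failure can be reproduced in memory. This sketch assumes `nested.json` contains exactly the structure printed further below (two `article` records and one `blog` record) and rebuilds it as a string:

```python
import json
from io import StringIO

import pandas as pd

# Rebuild the structure shown by pprint: two "article" records, one "blog" record.
raw = json.dumps({
    "article": [
        {"id": "01", "language": "JSON", "edition": "first", "author": "Allen"},
        {"id": "02", "language": "Python", "edition": "second", "author": "Aditya Sharma"},
    ],
    "blog": [{"name": "Datacamp", "URL": "datacamp.com"}],
})

# read_json makes each top-level key a column, so lists of length 2 and 1 clash.
try:
    pd.read_json(StringIO(raw))
    error = None
except ValueError as exc:
    error = str(exc)
print(error)

# json.loads has no such constraint: it simply returns a nested dict.
nested = json.loads(raw)
lengths = {key: len(value) for key, value in nested.items()}
print(lengths)
```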

In [13]:
from pprint import pprint
import json

In [17]:
with open('data/nested.json') as f:
    nested_json = json.load(f)  # parse the JSON file into a Python dict

pprint(nested_json)  # pprint lays out nested structures more readably
print(type(nested_json))

{'article': [{'author': 'Allen',
              'edition': 'first',
              'id': '01',
              'language': 'JSON'},
             {'author': 'Aditya Sharma',
              'edition': 'second',
              'id': '02',
              'language': 'Python'}],
 'blog': [{'URL': 'datacamp.com', 'name': 'Datacamp'}]}
<class 'dict'>


In [18]:
pd.json_normalize(nested_json)

|   | article | blog |
|---|---------|------|
| 0 | [{'id': '01', 'language': 'JSON', 'edition': '... | [{'name': 'Datacamp', 'URL': 'datacamp.com'}] |


The top-level keys become the columns of the DataFrame, but their values stay as unexpanded lists of dictionaries. Passing `record_path` to `json_normalize()` tells it which key holds the list of records to expand into rows.
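`record_path` also accepts a list of keys when the records are nested more than one level down. Pandas follows the keys in order. Here is a sketch with hypothetical `sites`/`posts` data:

```python
import pandas as pd

# Hypothetical data: the records we want sit under sites -> posts.
data = {
    "sites": [
        {"posts": [
            {"title": "Intro", "words": 300},
            {"title": "Follow-up", "words": 450},
        ]}
    ]
}

# data["sites"] is a list; the "posts" list inside each of its items becomes the rows.
posts = pd.json_normalize(data, record_path=["sites", "posts"])
print(posts)
```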

In [19]:
blog = pd.json_normalize(nested_json, record_path='blog')
blog.head()

|   | name | URL |
|---|------|-----|
| 0 | Datacamp | datacamp.com |


In [20]:
article = pd.json_normalize(nested_json, record_path='article')
article.head()

|   | id | language | edition | author |
|---|----|----------|---------|--------|
| 0 | 1 | JSON | first | Allen |
| 1 | 2 | Python | second | Aditya Sharma |


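`json_normalize()` has three main parameters:
1. `data`: the parsed JSON (a dict or a list of dicts).
2. `record_path`: the key, or list of keys, leading to the nested list whose items become rows.
3. `meta`: fields from the enclosing record to copy onto every row. Nested fields are given as a list of keys, such as `['info', 'governor']`.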
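The three parameters can be seen working together on a tiny in-memory example, which runs without `states.json`. The `team`/`coach`/`players` data is hypothetical:

```python
import pandas as pd

# Hypothetical parent records, each with a nested list of children.
data = [
    {"team": "A", "coach": {"name": "Kim"},
     "players": [{"player": "Ann"}, {"player": "Bo"}]},
    {"team": "B", "coach": {"name": "Lee"},
     "players": [{"player": "Cy"}]},
]

# record_path picks the list to expand into rows; meta copies parent fields
# onto each of those rows. The nested meta path becomes the column "coach.name".
df = pd.json_normalize(
    data,
    record_path="players",
    meta=["team", ["coach", "name"]],
)
print(df)
```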

In [22]:
with open('data/states.json') as f:
    data = json.load(f)

In [23]:
pd.json_normalize(data)

|   | state | shortname | counties | info.governor |
|---|-------|-----------|----------|---------------|
| 0 | Florida | FL | [{'name': 'Dade', 'population': 12345}, {'name... | Rick Scott |
| 1 | Ohio | OH | [{'name': 'Summit', 'population': 1234}, {'nam... | John Kasich |
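In the output above, the nested `info` dictionary was flattened into an `info.governor` column automatically, while the `counties` list was left alone. The sketch below uses a hypothetical record to show the two parameters that control dict flattening: `sep` changes the joining character, and `max_level` limits how deep the flattening goes:

```python
import pandas as pd

# Hypothetical record with a dict nested two levels deep.
record = {
    "state": "Ohio",
    "info": {"governor": "John Kasich", "office": {"city": "Columbus"}},
}

# Default: every nested dict is flattened and keys are joined with ".".
flat = pd.json_normalize(record)
# sep="_" joins the keys with an underscore instead.
underscored = pd.json_normalize(record, sep="_")
# max_level=1 flattens one level only; "info.office" stays a dict.
shallow = pd.json_normalize(record, max_level=1)

print(flat.columns.tolist())
print(underscored.columns.tolist())
print(shallow.columns.tolist())
```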


In [25]:
pd.json_normalize(
    data=data,
    record_path='counties',
    meta=['state', 'shortname', ['info', 'governor']]
)

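|   | name | population | state | shortname | info.governor |
|---|------|------------|-------|-----------|---------------|
| 0 | Dade | 12345 | Florida | FL | Rick Scott |
| 1 | Broward | 40000 | Florida | FL | Rick Scott |
| 2 | Palm Beach | 60000 | Florida | FL | Rick Scott |
| 3 | Summit | 1234 | Ohio | OH | John Kasich |
| 4 | Cuyahoga | 1337 | Ohio | OH | John Kasich |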
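Each `meta` path is expected to exist in every parent record. By default, `json_normalize()` raises a `KeyError` when one is missing. The sketch below uses hypothetical data in which the second state has no `info` block, and shows how `errors='ignore'` fills the gap with NaN instead:

```python
import pandas as pd

# Hypothetical data: "Ohio" is missing the "info" block.
data = [
    {"state": "Florida", "info": {"governor": "Rick Scott"},
     "counties": [{"name": "Dade"}]},
    {"state": "Ohio",
     "counties": [{"name": "Summit"}]},
]
meta = ["state", ["info", "governor"]]

# Default errors="raise": the missing meta key raises KeyError.
try:
    pd.json_normalize(data, record_path="counties", meta=meta)
    raised = False
except KeyError:
    raised = True

# errors="ignore": the missing meta value becomes NaN.
df = pd.json_normalize(data, record_path="counties", meta=meta, errors="ignore")
print(df)
```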
