# JSON Tutorial

In [1]:
#start by importing pandas
import pandas as pd

In [2]:
#pandas has a function, read_json(), that can load JSON from either a
#file or a URL
url = "https://raw.githubusercontent.com/chrisalbon/simulated_datasets/master/data.json"
first_json = pd.read_json(url)
first_json.head()

Unnamed: 0,integer,datetime,category
0,5,2015-01-01 00:00:00,0
1,5,2015-01-01 00:00:01,0
2,9,2015-01-01 00:00:02,0
3,6,2015-01-01 00:00:03,0
4,6,2015-01-01 00:00:04,0


Writing JSON data is as simple as reading, and can be done in one line. Instead of **read_json()** you would use **to_json()** with a filename:

In [3]:
first_json.to_json('json_columns.json', orient='columns')
first_json.to_json('json_index.json', orient='index')

If no output directory is specified, **to_json()** writes the file to the current working directory, which is normally the *same directory* as the notebook the code is executed in. These two functions are the most convenient way to deal with JSON in pandas. However, they don't always work.
<br>
<br>*Note: **read_json()** and **to_json()** work only with simple JSON. All arrays inside need to be the same length.*
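
To make the difference between the two `orient` values used above concrete, here is a minimal sketch. The two-row DataFrame is made up for illustration, and it relies on **to_json()** returning the JSON text as a string when no filename is passed:

```python
import json
import pandas as pd

# Hypothetical two-row frame, used only to illustrate the two layouts.
df = pd.DataFrame({"integer": [5, 9], "category": [0, 1]})

# With no path argument, to_json() returns the JSON text instead of writing a file.
as_columns = df.to_json(orient="columns")
as_index = df.to_json(orient="index")

print(as_columns)  # {"integer":{"0":5,"1":9},"category":{"0":0,"1":1}}
print(as_index)    # {"0":{"integer":5,"category":0},"1":{"integer":9,"category":1}}

# Both strings are valid JSON and can be parsed back with the json module.
parsed_columns = json.loads(as_columns)
parsed_index = json.loads(as_index)
```

With `orient='columns'` the outer keys are the column names. With `orient='index'` the outer keys are the row labels.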

In [4]:
df = pd.read_json('nested.json')

ValueError: Unexpected character found when decoding 'null'

As shown, when we try to load a nested JSON file this way, we get a *ValueError: Unexpected character found when decoding 'null'*. Fortunately, there is another approach. It is not a **pandas** function, but the **json** module, which ships with Python's standard library:

In [7]:
import json

In [8]:
#load json object
with open('nested.json') as f:
    nested_json = json.load(f)
print(nested_json)
print(type(nested_json))

{'article': [{'id': '01', 'language': 'JSON', 'edition': 'first', 'author': 'Allen'}, {'id': '02', 'language': 'Python', 'edition': 'second', 'author': 'Aditya Sharma'}], 'blog': [{'name': 'Datacamp', 'URL': 'datacamp.com'}]}
<class 'dict'>


As seen, **json.load()** returns the contents of the file as a Python dict.
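
The file `nested.json` itself is not included here, so the following sketch rebuilds the same structure from a string. The string content is an assumption, reconstructed from the printed dict above. Note that it uses **json.loads()**, which parses a string, where the cell above uses **json.load()**, which reads from a file object:

```python
import json

# Assumed content of nested.json, reconstructed from the printed output above.
raw = """
{"article": [{"id": "01", "language": "JSON", "edition": "first", "author": "Allen"},
             {"id": "02", "language": "Python", "edition": "second", "author": "Aditya Sharma"}],
 "blog": [{"name": "Datacamp", "URL": "datacamp.com"}]}
"""

nested = json.loads(raw)

print(type(nested))                    # <class 'dict'>
print(type(nested["article"]))         # <class 'list'>
print(nested["article"][1]["author"])  # Aditya Sharma
```

JSON objects become dicts and JSON arrays become lists, which is why each top-level key here maps to a list of dicts.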

In [11]:
#use json_normalize()
from pandas import json_normalize
json_normalize(nested_json)

Unnamed: 0,article,blog
0,"[{'id': '01', 'language': 'JSON', 'edition': '...","[{'name': 'Datacamp', 'URL': 'datacamp.com'}]"


We can see from above that the top-level keys become the columns of the DataFrame. We were able to load it as a pandas DataFrame, but each cell still holds a whole list of records instead of flat values.
<br>
<br> We are going to pass the **record_path** parameter to **json_normalize()** to tell it which key holds the list of records to expand into rows:

In [12]:
blog = json_normalize(nested_json, record_path='blog')
blog.head()

Unnamed: 0,name,URL
0,Datacamp,datacamp.com


In [13]:
#and the same for the 'article' key
article = json_normalize(nested_json, record_path='article')
article.head()

Unnamed: 0,id,language,edition,author
0,01,JSON,first,Allen
1,02,Python,second,Aditya Sharma


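Three main parameters of **json_normalize()**:
* **data** - the input data, a dict or a list of dicts
* **record_path** - the key (or path of keys) to the nested list whose elements become the rows
* **meta** - fields from the enclosing record that are kept as they are and repeated on every row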
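
As a minimal sketch of how the three parameters fit together, consider the made-up `orders` data below (the names `orders`, `customer`, `items`, `sku` and `qty` are hypothetical and used only for illustration):

```python
import pandas as pd

# Hypothetical data: one parent record with a nested list of child records.
orders = [
    {"customer": "Ann",
     "items": [{"sku": "A1", "qty": 2},
               {"sku": "B7", "qty": 1}]}
]

# data: the parsed JSON; record_path: the nested list to expand into rows;
# meta: parent-level fields to repeat on every expanded row.
flat = pd.json_normalize(data=orders, record_path="items", meta=["customer"])

print(flat.columns.tolist())     # ['sku', 'qty', 'customer']
print(len(flat))                 # 2
print(flat["customer"].tolist()) # ['Ann', 'Ann']
```

Each element of the `items` list becomes one row, and the `customer` value from the parent record is copied onto both rows.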

In [15]:
#a bit more practice. Let's make some new data.
#define a list of nested dicts (already-parsed JSON, not a JSON string)
data = [{"state": "Florida", 
        "shortname": "FL",
        "info": {"governor": "Rick Scott"},
        "counties": [{"name": "Dade", "population": 12345},
                     {"name": "Broward", "population": 40000},
                     {"name": "Palm Beach", "population": 60000}]},
       {"state": "Ohio",
        "shortname": "OH",
        "info": {"governor": "John Kasich"},
        "counties": [{"name": "Summit", "population": 1234},
                     {"name": "Cuyahoga", "population": 1337}]}]

In [16]:
json_normalize(data)

Unnamed: 0,state,shortname,counties,info.governor
0,Florida,FL,"[{'name': 'Dade', 'population': 12345}, {'name...",Rick Scott
1,Ohio,OH,"[{'name': 'Summit', 'population': 1234}, {'nam...",John Kasich


In [17]:
json_normalize(data=data, record_path='counties', meta=['state', 'shortname', ['info', 'governor']])

Unnamed: 0,name,population,state,shortname,info.governor
0,Dade,12345,Florida,FL,Rick Scott
1,Broward,40000,Florida,FL,Rick Scott
2,Palm Beach,60000,Florida,FL,Rick Scott
3,Summit,1234,Ohio,OH,John Kasich
4,Cuyahoga,1337,Ohio,OH,John Kasich
