# JSON Tutorial

In this activity, we will improve our skills in working with JSON files.

Let's start by importing pandas.


```python
import pandas as pd
```

pandas has a function, `read_json()`, that can load JSON either from a file or from a URL.

```python
url = "http://api.open-notify.org/astros.json"
first_json = pd.read_json(url)
first_json.head()
```

| | message | people | number |
|---|---|---|---|
| 0 | success | {'name': 'Jasmin Moghbeli', 'craft': 'ISS'} | 7 |
| 1 | success | {'name': 'Andreas Mogensen', 'craft': 'ISS'} | 7 |
| 2 | success | {'name': 'Satoshi Furukawa', 'craft': 'ISS'} | 7 |
| 3 | success | {'name': 'Konstantin Borisov', 'craft': 'ISS'} | 7 |
| 4 | success | {'name': 'Oleg Kononenko', 'craft': 'ISS'} | 7 |
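The cell above needs network access, and its output depends on who is in orbit when it runs. A minimal offline sketch follows, assuming a hand-written, trimmed copy of the `astros.json` response (only the first two people from the output above, with `number` set to 2 to match). The JSON string is wrapped in `io.StringIO` because recent pandas versions deprecate passing literal JSON strings to `read_json()`.

```python
from io import StringIO

import pandas as pd

# Hypothetical, trimmed copy of the astros.json response (two people only)
raw = """
{"message": "success",
 "people": [{"name": "Jasmin Moghbeli", "craft": "ISS"},
            {"name": "Andreas Mogensen", "craft": "ISS"}],
 "number": 2}
"""

# StringIO makes the string behave like a file for read_json
sample = pd.read_json(StringIO(raw))

# One row per element of "people"; the scalar keys are repeated on every row
print(sample.shape)                       # (2, 3)
print(sample.loc[0, "people"]["name"])    # Jasmin Moghbeli
```

This mirrors the live result: the list under `people` sets the number of rows, and each cell of that column still holds a whole dictionary.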


Writing JSON is as simple as reading it and takes one line of code: instead of `read_json()`, call the DataFrame's `to_json()` method with a filename, and that's all!

```python
first_json.to_json('json_columns.json', orient="columns")
first_json.to_json('json_index.json', orient="index")
```

If you pass only a filename, `to_json()` writes the file to the current working directory, which in Jupyter is usually the notebook's own directory. Open the two files there and compare them: `orient="columns"` groups the values by column, while `orient="index"` groups them by row. These two functions are the simplest way to handle JSON in pandas. However, they don't always work.
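To compare the two `orient` values without opening any files, you can call `to_json()` without a path, in which case it returns the JSON as a string. A minimal sketch on a small, made-up DataFrame:

```python
import pandas as pd

# Small, made-up DataFrame with the default 0..n-1 index
df = pd.DataFrame({"message": ["success", "success"], "number": [7, 7]})

# orient="columns": {column -> {index -> value}}
columns_json = df.to_json(orient="columns")
# orient="index": {index -> {column -> value}}
index_json = df.to_json(orient="index")

print(columns_json)  # {"message":{"0":"success","1":"success"},"number":{"0":7,"1":7}}
print(index_json)    # {"0":{"message":"success","number":7},"1":{"message":"success","number":7}}
```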

`read_json()` works well only with simple, flat JSON. When the top-level keys hold arrays, each array becomes a column of the same DataFrame, so all of those arrays must have the same length.

So what about nested JSON files?

Open the file `nested.json`, see what it looks like, and try to load it into pandas with `pd.read_json()`:

```python
df = pd.read_json("nested.json")
```


Running this cell fails with:

```
ValueError: All arrays must be of the same length
```
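If you don't have `nested.json` at hand, a minimal sketch can reproduce the same error from an inline string. It assumes the file holds the structure that is printed further down: `article` contains two records and `blog` only one, so the two would-be columns differ in length.

```python
from io import StringIO

import pandas as pd

# Assumed contents of nested.json: two "article" records but only one "blog" record
raw = """
{"article": [{"id": "01", "language": "JSON", "edition": "first", "author": "Allen"},
             {"id": "02", "language": "Python", "edition": "second", "author": "Aditya Sharma"}],
 "blog": [{"name": "Datacamp", "URL": "datacamp.com"}]}
"""

try:
    pd.read_json(StringIO(raw))
    error_message = None
except ValueError as exc:
    # Each top-level key would become a column, but the lists have lengths 2 and 1
    error_message = str(exc)

print(error_message)  # All arrays must be of the same length
```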

It doesn't work. Fortunately, there is another way: instead of a pandas function, we use `json.load()` from the `json` module, which is part of Python's standard library.

```python
import json

# Parse the JSON file into a Python object
with open('nested.json') as f:
    nested_json = json.load(f)
print(nested_json)
print(type(nested_json))
```

```
{'article': [{'id': '01', 'language': 'JSON', 'edition': 'first', 'author': 'Allen'}, {'id': '02', 'language': 'Python', 'edition': 'second', 'author': 'Aditya Sharma'}], 'blog': [{'name': 'Datacamp', 'URL': 'datacamp.com'}]}
<class 'dict'>
```


We can see that the file is automatically loaded as a Python dictionary.
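Because the result is an ordinary dictionary, you can reach any value with keys and list indices, and you can also see why `read_json()` failed. A minimal sketch, assuming the same contents as the printed output above, uses `json.loads()` on a string instead of `json.load()` on a file:

```python
import json

# Same structure as nested.json, as printed above (assumed file contents)
raw = """
{"article": [{"id": "01", "language": "JSON", "edition": "first", "author": "Allen"},
             {"id": "02", "language": "Python", "edition": "second", "author": "Aditya Sharma"}],
 "blog": [{"name": "Datacamp", "URL": "datacamp.com"}]}
"""

# json.loads parses a string; json.load parses an open file
nested = json.loads(raw)

# Dictionary keys and list indices lead to any nested value
second_author = nested["article"][1]["author"]
print(second_author)  # Aditya Sharma

# The lists under the top-level keys have different lengths
lengths = {key: len(value) for key, value in nested.items()}
print(lengths)  # {'article': 2, 'blog': 1}
```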
Note: the standard-library `pprint` module pretty-prints dictionaries, which makes nested JSON responses much easier for humans to read.
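A short sketch with `pprint`, using an assumed, shortened copy of `nested_json`. Passing `sort_dicts=False` (available since Python 3.8) keeps the keys in their original order instead of sorting them.

```python
from pprint import pformat, pprint

# Shortened copy of nested_json (assumed contents: one article, one blog)
nested_json = {
    "article": [{"id": "01", "language": "JSON", "edition": "first", "author": "Allen"}],
    "blog": [{"name": "Datacamp", "URL": "datacamp.com"}],
}

# Print the structure across several lines, indented by nesting depth
pprint(nested_json, sort_dicts=False)

# pformat returns the same layout as a string instead of printing it
text = pformat(nested_json, sort_dicts=False)
```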

To turn the dictionary into a DataFrame, we will use the pandas function `json_normalize()`:

```python
pd.json_normalize(nested_json)
```

| | article | blog |
|---|---|---|
| 0 | [{'id': '01', 'language': 'JSON', 'edition': '... | [{'name': 'Datacamp', 'URL': 'datacamp.com'}] |


The top-level keys have become the columns of the DataFrame. The data now loads into pandas, but it still looks odd: each cell holds an entire list of records.
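`json_normalize()` does flatten nested dictionaries, joining their keys with a dot; what it leaves untouched are lists, like the ones under `article` and `blog`. A minimal sketch on a small, made-up record:

```python
import pandas as pd

# Made-up record: "info" is a nested dict, "tags" is a list
record = {"name": "Datacamp",
          "info": {"url": "datacamp.com", "type": "blog"},
          "tags": ["python", "json"]}

flat = pd.json_normalize(record)

# Nested dict keys become dotted column names; the list stays in a single cell
print(list(flat.columns))   # ['name', 'tags', 'info.url', 'info.type']
print(flat.loc[0, "tags"])  # ['python', 'json']
```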

To focus on one specific key, we add the `record_path` parameter to `json_normalize()`. Each record in the list under that key becomes a row:

```python
blog = pd.json_normalize(nested_json, record_path='blog')
blog.head()
```

| | name | URL |
|---|---|---|
| 0 | Datacamp | datacamp.com |


```python
article = pd.json_normalize(nested_json, record_path='article')
article.head()
```

| | id | language | edition | author |
|---|---|---|---|---|
| 0 | 01 | JSON | first | Allen |
| 1 | 02 | Python | second | Aditya Sharma |
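`record_path` selects the list stored under the given key. As a minimal sketch (using an assumed copy of `nested_json` that matches the printed output), passing that list to `json_normalize()` directly produces the same DataFrame:

```python
import pandas as pd

# Copy of nested_json (assumed to match the output printed earlier)
nested_json = {
    "article": [
        {"id": "01", "language": "JSON", "edition": "first", "author": "Allen"},
        {"id": "02", "language": "Python", "edition": "second", "author": "Aditya Sharma"},
    ],
    "blog": [{"name": "Datacamp", "URL": "datacamp.com"}],
}

via_record_path = pd.json_normalize(nested_json, record_path="article")
via_indexing = pd.json_normalize(nested_json["article"])

# Both approaches give one row per article record
print(via_record_path.equals(via_indexing))  # True
# The ids remain strings, so their leading zeros are kept
print(via_record_path["id"].tolist())        # ['01', '02']
```

`record_path` becomes more useful when the list sits deeper in the structure, or when you combine it with `meta`.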


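`json_normalize()` has three main parameters:

- `data`: the input data, a dictionary or a list of dictionaries
- `record_path`: the path to the nested list of records; each record becomes a row
- `meta`: fields outside those records to keep as they are, repeated on every row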


Let's practice a bit more with `json_normalize()` on the data defined below:

```python
# A list of records (Python dicts) shaped like parsed JSON
data = [{"state": "Florida",
         "shortname": "FL",
         "info": {"governor": "Rick Scott"},
         "counties": [{"name": "Dade", "population": 12345},
                      {"name": "Broward", "population": 40000},
                      {"name": "Palm Beach", "population": 60000}]},
        {"state": "Ohio",
         "shortname": "OH",
         "info": {"governor": "John Kasich"},
         "counties": [{"name": "Summit", "population": 1234},
                      {"name": "Cuyahoga", "population": 1337}]}]
```


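```python
# Without record_path: one row per state (a notebook only displays a cell's last expression)
pd.json_normalize(data)
# With record_path and meta: one row per county, with the state-level fields repeated
pd.json_normalize(data=data, record_path='counties', meta=['state', 'shortname', ['info', 'governor']])
```

| | name | population | state | shortname | info.governor |
|---|---|---|---|---|---|
| 0 | Dade | 12345 | Florida | FL | Rick Scott |
| 1 | Broward | 40000 | Florida | FL | Rick Scott |
| 2 | Palm Beach | 60000 | Florida | FL | Rick Scott |
| 3 | Summit | 1234 | Ohio | OH | John Kasich |
| 4 | Cuyahoga | 1337 | Ohio | OH | John Kasich |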
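The result of `pd.json_normalize(data)` in the cell above is never displayed, because a notebook only shows a cell's last expression. For comparison, here is a sketch of what it produces, reusing a shortened copy of `data` (Ohio only, an assumption to keep the example small): `info` is flattened into `info.governor`, but `counties` stays a single cell holding a list.

```python
import pandas as pd

# Shortened copy of `data` (Ohio only)
data = [{"state": "Ohio",
         "shortname": "OH",
         "info": {"governor": "John Kasich"},
         "counties": [{"name": "Summit", "population": 1234},
                      {"name": "Cuyahoga", "population": 1337}]}]

flat = pd.json_normalize(data)

# One row per state: the nested dict is flattened, the list of counties is not
print(flat.shape)                     # (1, 4)
print(flat.loc[0, "info.governor"])   # John Kasich
print(len(flat.loc[0, "counties"]))   # 2
```

That leftover list column is exactly what `record_path='counties'` unpacks into separate rows, while `meta` carries `state`, `shortname`, and `info.governor` along to each of them.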
