# Convert JSON into a pandas DataFrame

- Reading simple JSON from a local file
- Reading simple JSON from a URL
- Flattening nested list from JSON object
- Flattening nested list and dict from JSON object
- Extracting a single value from deeply nested JSON

**Reading simple JSON from a local file**

In [5]:
# %load command1.py
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity='all'

%config InlineBackend.figure_format='svg'
plt.rcParams['figure.dpi']=120

pd.options.display.float_format='{:,.2f}'.format
pd.set_option('display.max_colwidth', None)

In [6]:
[
  {
    "id": "A001",
    "name": "Tom",
    "math": 60,
    "physics": 66,
    "chemistry": 61
  },
  {
    "id": "A002",
    "name": "James",
    "math": 89,
    "physics": 76,
    "chemistry": 51
  },
  {
    "id": "A003",
    "name": "Jenny",
    "math": 79,
    "physics": 90,
    "chemistry": 78
  }
]




[{'id': 'A001', 'name': 'Tom', 'math': 60, 'physics': 66, 'chemistry': 61},
 {'id': 'A002', 'name': 'James', 'math': 89, 'physics': 76, 'chemistry': 51},
 {'id': 'A003', 'name': 'Jenny', 'math': 79, 'physics': 90, 'chemistry': 78}]

In [11]:
df=pd.read_json('./pandasData/simple.json')
df
print()
df.info()

| | id | name | math | physics | chemistry |
|---|---|---|---|---|---|
| 0 | A001 | Tom | 60 | 66 | 61 |
| 1 | A002 | James | 89 | 76 | 51 |
| 2 | A003 | Jenny | 79 | 90 | 78 |



<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 5 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   id         3 non-null      object
 1   name       3 non-null      object
 2   math       3 non-null      int64 
 3   physics    3 non-null      int64 
 4   chemistry  3 non-null      int64 
dtypes: int64(3), object(2)
memory usage: 248.0+ bytes


**Reading simple JSON from a URL**

In [13]:
URL = 'http://raw.githubusercontent.com/BindiChen/machine-learning/master/data-analysis/027-pandas-convert-json/data/simple.json'
df = pd.read_json(URL)
df

| | id | name | math | physics | chemistry |
|---|---|---|---|---|---|
| 0 | A001 | Tom | 60 | 66 | 61 |
| 1 | A002 | James | 89 | 76 | 51 |
| 2 | A003 | Jenny | 79 | 90 | 78 |
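
`pd.read_json` also accepts a file-like object, which helps when the JSON text is already in memory (for example, the body of an API response). The sketch below assumes the same three records as `simple.json`; the names `json_text` and `df_from_text` are illustrative only.

```python
from io import StringIO

import pandas as pd

# Same records as simple.json, held in a string instead of a file (illustrative data)
json_text = """
[
  {"id": "A001", "name": "Tom", "math": 60, "physics": 66, "chemistry": 61},
  {"id": "A002", "name": "James", "math": 89, "physics": 76, "chemistry": 51},
  {"id": "A003", "name": "Jenny", "math": 79, "physics": 90, "chemistry": 78}
]
"""

# Wrap the string in StringIO so read_json treats it as a file-like object
df_from_text = pd.read_json(StringIO(json_text))
print(df_from_text)
```

Wrapping the text in `StringIO` keeps the call unambiguous, because a bare string can also be read as a path or URL.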


**Flattening nested list from JSON object**

In [16]:
%%writefile ./pandasData/nested_list.json
{
    "school_name": "ABC primary school",
    "class": "Year 1",
    "students": [
    {
        "id": "A001",
        "name": "Tom",
        "math": 60,
        "physics": 66,
        "chemistry": 61
    },
    {
        "id": "A002",
        "name": "James",
        "math": 89,
        "physics": 76,
        "chemistry": 51
    },
    {
        "id": "A003",
        "name": "Jenny",
        "math": 79,
        "physics": 90,
        "chemistry": 78
    }]
}

Writing ./pandasData/nested_list.json


In [17]:
# read_json does not flatten the nested list: each student stays a dict in the students column
df = pd.read_json('./pandasData/nested_list.json')
df

| | school_name | class | students |
|---|---|---|---|
| 0 | ABC primary school | Year 1 | {'id': 'A001', 'name': 'Tom', 'math': 60, 'physics': 66, 'chemistry': 61} |
| 1 | ABC primary school | Year 1 | {'id': 'A002', 'name': 'James', 'math': 89, 'physics': 76, 'chemistry': 51} |
| 2 | ABC primary school | Year 1 | {'id': 'A003', 'name': 'Jenny', 'math': 79, 'physics': 90, 'chemistry': 78} |


In [20]:
import json

# Load the data with Python's json module
with open('./pandasData/nested_list.json', 'r') as f:
    data = json.load(f)

# Flatten the list stored under the "students" key
df_nested_list = pd.json_normalize(data, record_path=['students'])
df_nested_list

| | id | name | math | physics | chemistry |
|---|---|---|---|---|---|
| 0 | A001 | Tom | 60 | 66 | 61 |
| 1 | A002 | James | 89 | 76 | 51 |
| 2 | A003 | Jenny | 79 | 90 | 78 |


In [22]:
# To include school_name and class as extra columns, list them in the meta argument

df_nested_list = pd.json_normalize(
    data, 
    record_path =['students'], 
    meta=['school_name', 'class']
)

df_nested_list

| | id | name | math | physics | chemistry | school_name | class |
|---|---|---|---|---|---|---|---|
| 0 | A001 | Tom | 60 | 66 | 61 | ABC primary school | Year 1 |
| 1 | A002 | James | 89 | 76 | 51 | ABC primary school | Year 1 |
| 2 | A003 | Jenny | 79 | 90 | 78 | ABC primary school | Year 1 |
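
The example above works without extra options because the top-level key is `school_name`, which does not clash with any student field. If a top-level key shared a name with a record field, `json_normalize` would be expected to raise a `ValueError` about a conflicting metadata name. The sketch below uses hypothetical data, with the school stored under `name`, and resolves the clash with the `meta_prefix` argument.

```python
import pandas as pd

# Hypothetical variant of the data: the school is stored under "name",
# the same key the student records use
data = {
    "name": "ABC primary school",
    "students": [
        {"id": "A001", "name": "Tom"},
        {"id": "A002", "name": "James"},
    ],
}

# meta_prefix is prepended to every meta column, so "name" becomes "school_name"
df_prefixed = pd.json_normalize(
    data,
    record_path=["students"],
    meta=["name"],
    meta_prefix="school_",
)
print(df_prefixed)
```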


**Flattening nested list and dict from JSON object**

In [23]:
%%writefile ./pandasData/nested_mix.json

{
    "school_name": "local primary school",
    "class": "Year 1",
    "info": {
      "president": "John Kasich",
      "address": "ABC road, London, UK",
      "contacts": {
        "email": "admin@e.com",
        "tel": "123456789"
      }
    },
    "students": [
    {
        "id": "A001",
        "name": "Tom",
        "math": 60,
        "physics": 66,
        "chemistry": 61
    },
    {
        "id": "A002",
        "name": "James",
        "math": 89,
        "physics": 76,
        "chemistry": 51
    },
    {
        "id": "A003",
        "name": "Jenny",
        "math": 79,
        "physics": 90,
        "chemistry": 78
    }]
}

Writing ./pandasData/nested_mix.json


In [25]:
import json
# Load the data with Python's json module
with open('./pandasData/nested_mix.json', 'r') as f:
    data = json.load(f)

# Normalize the data
df = pd.json_normalize(data, record_path=['students'])
df

| | id | name | math | physics | chemistry |
|---|---|---|---|---|---|
| 0 | A001 | Tom | 60 | 66 | 61 |
| 1 | A002 | James | 89 | 76 | 51 |
| 2 | A003 | Jenny | 79 | 90 | 78 |


In [27]:
# To include class, president (a property of info), and tel (a property of info.contacts),
# use the meta argument to specify the path to each property.

df = pd.json_normalize(
    data, 
    record_path =['students'], 
    meta=[
        'class',
        ['info', 'president'], 
        ['info', 'contacts', 'tel']
    ]
)

df

| | id | name | math | physics | chemistry | class | info.president | info.contacts.tel |
|---|---|---|---|---|---|---|---|---|
| 0 | A001 | Tom | 60 | 66 | 61 | Year 1 | John Kasich | 123456789 |
| 1 | A002 | James | 89 | 76 | 51 | Year 1 | John Kasich | 123456789 |
| 2 | A003 | Jenny | 79 | 90 | 78 | Year 1 | John Kasich | 123456789 |


**Extracting a single value from deeply nested JSON**

- glom is a third-party Python library that lets us use dot notation to access a property inside a deeply nested object.

In [28]:
%%writefile ./pandasData/nested_deep.json
{
    "school_name": "local primary school",
    "class": "Year 1",
    "students": [
    {
        "id": "A001",
        "name": "Tom",
        "grade": {
            "math": 60,
            "physics": 66,
            "chemistry": 61
        }
  
    },
    {
        "id": "A002",
        "name": "James",
        "grade": {
            "math": 89,
            "physics": 76,
            "chemistry": 51
        }
        
    },
    {
        "id": "A003",
        "name": "Jenny",
        "grade": {
            "math": 79,
            "physics": 90,
            "chemistry": 78
        }
    }]
}

Writing ./pandasData/nested_deep.json


In [32]:
from glom import glom

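# read_json keeps each student as a dict in the students column
df = pd.read_json('./pandasData/nested_deep.json')
# Use glom's dot-separated path to pull grade.math out of each student dict
df['students'].apply(lambda x: glom(x, 'grade.math'))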

0    60
1    89
2    79
Name: students, dtype: int64
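
For comparison, the same values can be reached without glom, since `json_normalize` also flattens dicts nested inside each record. This is an alternative sketch rather than the approach used above; the data is inlined from `nested_deep.json`, and `df_grades` and `math_scores` are illustrative names.

```python
import pandas as pd

# Same structure as nested_deep.json, inlined so the example is self-contained
data = {
    "school_name": "local primary school",
    "class": "Year 1",
    "students": [
        {"id": "A001", "name": "Tom",
         "grade": {"math": 60, "physics": 66, "chemistry": 61}},
        {"id": "A002", "name": "James",
         "grade": {"math": 89, "physics": 76, "chemistry": 51}},
        {"id": "A003", "name": "Jenny",
         "grade": {"math": 79, "physics": 90, "chemistry": 78}},
    ],
}

# Each student's nested "grade" dict is expanded into grade.<subject> columns
df_grades = pd.json_normalize(data, record_path=["students"])

# Select the single column of interest
math_scores = df_grades["grade.math"]
print(math_scores)
```

glom stays convenient when only one value is needed from an already loaded column, while `json_normalize` suits cases where all nested fields should become columns.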