**This notebook shows how to read JSON files, the challenges that come up while reading them, and how to tackle each one.**

**READ JSON FILES**

In [1]:
# import pandas library 
import pandas as pd

In [2]:
pd.__version__

'1.5.2'

In [3]:
# read the json file
data = pd.read_json('datasets/simple.json')

In [4]:
# print the top rows of the dataframe
data.head()

| | name | age | grade |
|---|---|---|---|
| 0 | Andew | 12 | A |
| 1 | Bhuvan | 18 | B |
| 2 | Clinton | 11 | A |
| 3 | Drake | 12 | C |
| 4 | Eisha | 13 | B |


***CHALLENGES WITH READING JSON FILES***

- Reading JSON files written as records.

**Some JSON files are written as records, i.e. each JSON object is written on its own line.**

**For example:**

{"name": "Akshay", "roll_no": "100"} # line 1

{"name": "Sanad", "roll_no": "101"} # line 2

...

{"name": "Arvind", "roll_no": "200"} # line 101

In [5]:
# read the data
data_with_records = pd.read_json('datasets/simple_records.json')

ValueError: Trailing data

**If you try to read this type of file directly, you will get an error. To resolve it, pass the parameter <span style='background:yellow'>lines=True</span>.**

In [6]:
# read json files with records 
data_with_records = pd.read_json('datasets/simple_records.json', lines=True)

In [7]:
data_with_records.head()

| | name | age | grade |
|---|---|---|---|
| 0 | Andew | 12 | A |
| 1 | Bhuvan | 18 | B |
| 2 | Clinton | 11 | A |
| 3 | Drake | 12 | C |
| 4 | Eisha | 13 | B |
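To see why the plain call fails, it can help to reproduce the behaviour with the standard library alone. The sketch below assumes a small in-memory string (`records_text`, a made-up stand-in for `simple_records.json`). It shows that parsing the whole text as one JSON document fails, while parsing it line by line succeeds, which is roughly what `lines=True` asks pandas to do.

```python
import json

# Hypothetical stand-in for a records-style file: one JSON object per line.
records_text = (
    '{"name": "Akshay", "roll_no": "100"}\n'
    '{"name": "Sanad", "roll_no": "101"}\n'
)

# Parsing the whole text as a single JSON document fails, because a second
# object follows the first one.
try:
    json.loads(records_text)
    whole_file_ok = True
except json.JSONDecodeError:
    whole_file_ok = False

# Parsing each non-empty line separately works.
records = [json.loads(line) for line in records_text.splitlines() if line.strip()]

print(whole_file_ok)
print(records)
```

Each line is a valid JSON document on its own, so the reader only has to be told to treat the file as a sequence of documents instead of a single one.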


**JSON Module of Standard Library**

Many JSON files are nested, and pandas cannot flatten them directly into a tabular dataframe: the nested objects end up as dictionaries inside a single column. We first need to clean and filter the JSON data in order to convert it into a dataframe.

**READING A JSON FILE**

In [8]:
# pandas reads the nested json, but leaves the nested objects as dicts in one column
pd.read_json('datasets/nested.json')

| | student_roll_no | details |
|---|---|---|
| 0 | 101 | {'name': 'Andew', 'age': 12, 'grade': 'A'} |
| 1 | 102 | {'name': 'Bhuvan', 'age': 18, 'grade': 'B'} |
| 2 | 103 | {'name': 'Clinton', 'age': 11, 'grade': 'A'} |
| 3 | 104 | {'name': 'Drake', 'age': 12, 'grade': 'C'} |
| 4 | 105 | {'name': 'Eisha', 'age': 13, 'grade': 'B'} |
| 5 | 106 | {'name': 'Farhan', 'age': 22, 'grade': 'C'} |
| 6 | 107 | {'name': 'Garima', 'age': 11, 'grade': 'A'} |
| 7 | 108 | {'name': 'Himanshu', 'age': 19, 'grade': 'A'} |
| 8 | 109 | {'name': 'Ishaan', 'age': 10, 'grade': 'D'} |
| 9 | 110 | {'name': 'Jason', 'age': 9, 'grade': 'B'} |
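As an aside, pandas also ships `pd.json_normalize`, which can flatten nested records like these into separate columns. The sketch below is an alternative to the manual approach used in the rest of this section, not a replacement for it. The two records are taken from the rows above, and `nested_records` and `flat` are illustrative names only.

```python
import pandas as pd

# Two records with the same shape as the rows shown above.
nested_records = [
    {"student_roll_no": 101, "details": {"name": "Andew", "age": 12, "grade": "A"}},
    {"student_roll_no": 102, "details": {"name": "Bhuvan", "age": 18, "grade": "B"}},
]

# Nested keys become dotted column names such as "details.age".
flat = pd.json_normalize(nested_records)

print(flat.columns.tolist())
print(flat)
```

This works well when every record has the same nested shape. For irregular data, or when only a few fields are needed, filtering with the `json` module first may be simpler.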


In [9]:
# importing the json library from the standard library
import json

In [12]:
# open and load the data in json file 
with open('datasets/nested.json') as f:
    my_json_data = json.load(f)
    
print(my_json_data)

[{'student_roll_no': 101, 'details': {'name': 'Andew', 'age': 12, 'grade': 'A'}}, {'student_roll_no': 102, 'details': {'name': 'Bhuvan', 'age': 18, 'grade': 'B'}}, {'student_roll_no': 103, 'details': {'name': 'Clinton', 'age': 11, 'grade': 'A'}}, {'student_roll_no': 104, 'details': {'name': 'Drake', 'age': 12, 'grade': 'C'}}, {'student_roll_no': 105, 'details': {'name': 'Eisha', 'age': 13, 'grade': 'B'}}, {'student_roll_no': 106, 'details': {'name': 'Farhan', 'age': 22, 'grade': 'C'}}, {'student_roll_no': 107, 'details': {'name': 'Garima', 'age': 11, 'grade': 'A'}}, {'student_roll_no': 108, 'details': {'name': 'Himanshu', 'age': 19, 'grade': 'A'}}, {'student_roll_no': 109, 'details': {'name': 'Ishaan', 'age': 10, 'grade': 'D'}}, {'student_roll_no': 110, 'details': {'name': 'Jason', 'age': 9, 'grade': 'B'}}]


**Pretty Print:** https://docs.python.org/3/library/pprint.html

- To view the data in a structured way


In [13]:
# use pprint (pretty print) to print the data in a structured format

from pprint import pprint
pprint(my_json_data)

[{'details': {'age': 12, 'grade': 'A', 'name': 'Andew'},
  'student_roll_no': 101},
 {'details': {'age': 18, 'grade': 'B', 'name': 'Bhuvan'},
  'student_roll_no': 102},
 {'details': {'age': 11, 'grade': 'A', 'name': 'Clinton'},
  'student_roll_no': 103},
 {'details': {'age': 12, 'grade': 'C', 'name': 'Drake'},
  'student_roll_no': 104},
 {'details': {'age': 13, 'grade': 'B', 'name': 'Eisha'},
  'student_roll_no': 105},
 {'details': {'age': 22, 'grade': 'C', 'name': 'Farhan'},
  'student_roll_no': 106},
 {'details': {'age': 11, 'grade': 'A', 'name': 'Garima'},
  'student_roll_no': 107},
 {'details': {'age': 19, 'grade': 'A', 'name': 'Himanshu'},
  'student_roll_no': 108},
 {'details': {'age': 10, 'grade': 'D', 'name': 'Ishaan'},
  'student_roll_no': 109},
 {'details': {'age': 9, 'grade': 'B', 'name': 'Jason'}, 'student_roll_no': 110}]
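Note that `pprint` sorts dictionary keys by default, which is why `details` appears before `student_roll_no` in the output above even though the file stores them the other way round. On Python 3.8 or newer, passing `sort_dicts=False` keeps the original key order. A small sketch using one record from the data (`record`, `sorted_text` and `ordered_text` are illustrative names):

```python
from pprint import pformat

record = {"student_roll_no": 101, "details": {"name": "Andew", "age": 12, "grade": "A"}}

# Default behaviour: dictionary keys are sorted before formatting.
sorted_text = pformat(record)

# sort_dicts=False (Python 3.8+) keeps the insertion order of the keys.
ordered_text = pformat(record, sort_dicts=False)

print(sorted_text)
print(ordered_text)
```

`pformat` returns the formatted string instead of printing it; `pprint` accepts the same `sort_dicts` argument.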


**PROBLEM**

Create a new JSON file that contains the age and name of the people whose age is greater than 15.

In [14]:
# we saw that the data in the file is a JSON list of objects.
# inspect the first element before iterating over the list

data_0 = my_json_data[0]
data_0

{'student_roll_no': 101, 'details': {'name': 'Andew', 'age': 12, 'grade': 'A'}}

In [15]:
data_0['details']

{'name': 'Andew', 'age': 12, 'grade': 'A'}

In [16]:
data_0['details']['age']

12

In [17]:
# iterate through json data
for data in my_json_data:
    print(data['details']['age'])

12
18
11
12
13
22
11
19
10
9


In [18]:
# create a new empty list to store the filtered data
filtered_data = []

# iterate through the json data
for data in my_json_data:
    details = data['details']

    # check for the condition
    if details['age'] > 15:
        # if the condition is satisfied, store the age and name
        filtered_data.append({'age': details['age'], 'name': details['name']})

In [19]:
# check the filtered data
filtered_data

[{'age': 18, 'name': 'Bhuvan'},
 {'age': 22, 'name': 'Farhan'},
 {'age': 19, 'name': 'Himanshu'}]
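The same filter can also be written as a list comprehension. The sketch below is one possible equivalent, not the only way to do it. To stay self-contained it uses a three-record sample of the data, so its result is shorter than the output above; `item` is an illustrative loop variable name.

```python
# A small sample with the same shape as the data loaded from nested.json.
my_json_data = [
    {"student_roll_no": 101, "details": {"name": "Andew", "age": 12, "grade": "A"}},
    {"student_roll_no": 102, "details": {"name": "Bhuvan", "age": 18, "grade": "B"}},
    {"student_roll_no": 106, "details": {"name": "Farhan", "age": 22, "grade": "C"}},
]

# Keep only age and name for records whose age is greater than 15.
filtered_data = [
    {"age": item["details"]["age"], "name": item["details"]["name"]}
    for item in my_json_data
    if item["details"]["age"] > 15
]

print(filtered_data)
```

The explicit loop is easier to extend with extra conditions or logging, while the comprehension is more compact for a simple filter like this one.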

**WRITING A JSON FILE**

In [20]:
# put the filtered data into the new json file
with open('datasets/filtered_v2.json', 'w') as f:
    json.dump(filtered_data, f, indent=4)
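A quick way to check the written file is to load it back and compare it with the original list. The sketch below writes to a temporary directory so that it does not touch `datasets/`; the file name `filtered_check.json` is hypothetical and used only for illustration.

```python
import json
import os
import tempfile

filtered_data = [
    {"age": 18, "name": "Bhuvan"},
    {"age": 22, "name": "Farhan"},
    {"age": 19, "name": "Himanshu"},
]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "filtered_check.json")  # hypothetical file name

    # write the list as indented JSON
    with open(path, "w") as f:
        json.dump(filtered_data, f, indent=4)

    # read it back with json.load
    with open(path) as f:
        reloaded = json.load(f)

# True when the file round-trips without changes
print(reloaded == filtered_data)
```

Because the data contains only strings and integers, the reloaded list should compare equal to the original.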