---   
 <img align="left" width="75" height="75"  src="https://upload.wikimedia.org/wikipedia/en/c/c8/University_of_the_Punjab_logo.png"> 

<h1 align="center">Department of Data Science</h1>
<h1 align="center">Course: Tools and Techniques for Data Science</h1>

---
<h3><div align="right">Instructor: Muhammad Arif Butt, Ph.D.</div></h3>    

<h1 align="center">Lecture 3.13 (Pandas-05)</h1>

## _IO with JSON Files.ipynb_

#### Read Pandas Documentation:
- General Info: https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html


- For `read_json`: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.io.json.read_json.html?highlight=pandas%20read_json#pandas.io.json.read_json


- For `to_json`: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_json.html?highlight=to_json

### Learning agenda of this notebook
1. Reading the JSON file.
2. Challenges with reading JSON files.
 - Reading JSON files written as records.
3. JSON Library
 - Reading Nested JSON
 - Filter JSON
 - Writing JSON files

## 1. Read a JSON File into a Pandas DataFrame
>- Pandas is a popular Python library for working with tabular data (similar to the data stored in a spreadsheet). Pandas provides helper functions to read data from various file formats such as CSV, Excel spreadsheets, HTML tables, JSON, SQL, and more.
### What is a JSON File
- JavaScript Object Notation (JSON) is an open standard file format that uses human-readable text consisting of attribute–value pairs and arrays.
- It is a data interchange format used to store data and transfer it over the Internet, primarily between a web client and a server.
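To make "attribute–value pairs and arrays" concrete, here is a minimal sketch using Python's built-in `json` module: `json.loads` parses a JSON string into Python objects (a JSON object becomes a `dict`, a JSON array becomes a `list`), and `json.dumps` serializes them back. The student record is a made-up example, not one of the lecture's datasets.

```python
import json

# A hypothetical JSON document: an object with attribute-value pairs and one array
text = '{"name": "Kamal", "age": 12, "subjects": ["Math", "Physics"]}'

record = json.loads(text)                  # JSON object -> Python dict
print(type(record).__name__)               # dict
print(type(record["subjects"]).__name__)   # list
print(record["age"] + 1)                   # 13 (JSON numbers become int/float)

# Serialize the dict back into a JSON string
print(json.dumps(record))
```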

In [None]:
# To install this library in Jupyter notebook
import sys
!{sys.executable} -m pip install pandas --quiet

In [2]:
import pandas as pd
pd.__version__ , pd.__path__

('1.3.4',
 ['/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas'])

In [3]:
# read the json file using read_json method of pandas library
data = pd.read_json('datasets/simple.json')

data

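|   | name | age | grade |
|---|------|-----|-------|
| 0 | Kamal | 12 | A |
| 1 | Hashim | 18 | B |
| 2 | Salman | 11 | A |
| 3 | Mazhar | 12 | C |
| 4 | Eisha | 13 | B |
| 5 | Farhan | 22 | C |
| 6 | Mohsin | 11 | A |
| 7 | Bilal | 19 | A |
| 8 | Ishaan | 10 | D |
| 9 | Zalaid | 9 | B |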
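The cell above depends on `datasets/simple.json`, which is not reproduced here. A self-contained sketch, assuming that file holds a JSON array of objects with `name`, `age`, and `grade` keys (inferred from the output above), wraps an inline string in `io.StringIO` so that `read_json` treats it as a file-like object; recent pandas versions warn when a literal JSON string is passed directly. Passing `orient="records"` states the expected layout explicitly instead of relying on pandas' default.

```python
from io import StringIO

import pandas as pd

# Inline stand-in for datasets/simple.json (first three rows only)
json_text = """
[
    {"name": "Kamal",  "age": 12, "grade": "A"},
    {"name": "Hashim", "age": 18, "grade": "B"},
    {"name": "Salman", "age": 11, "grade": "A"}
]
"""

# Each object in the array becomes one row; its keys become the columns
df = pd.read_json(StringIO(json_text), orient="records")
print(df)
print(df.dtypes)
```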


## 2. Reading a JSON File having each record in a separate line
- Some JSON files are written as records, i.e., each line is a separate JSON object (a layout often called JSON Lines). For example:
```
{"name": "Ahsan", "roll_no": "100"}
{"name": "Ayesha", "roll_no": "101"}
```

In [4]:
# If you try to read this type of file (JSON records) directly, you get an error

data_with_records = pd.read_json('datasets/simple_records.json')

ValueError: Trailing data

In [5]:
# to resolve this error, you need to pass the parameter lines=True

# read json files with records 
data_with_records = pd.read_json('datasets/simple_records.json',lines=True)

# view record
data_with_records

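|   | name | age | grade |
|---|------|-----|-------|
| 0 | Kamal | 12 | A |
| 1 | Hashim | 18 | B |
| 2 | Salman | 11 | A |
| 3 | Mazhar | 12 | C |
| 4 | Eisha | 13 | B |
| 5 | Farhan | 22 | C |
| 6 | Mohsin | 11 | A |
| 7 | Bilal | 19 | A |
| 8 | Ishaan | 10 | D |
| 9 | Zalaid | 9 | B |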
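Pandas can also produce this record-per-line layout. A minimal round-trip sketch, using two hypothetical students with integer roll numbers: `to_json(orient="records", lines=True)` writes one JSON object per line, and reading the text back needs `lines=True`, just as with `simple_records.json`.

```python
from io import StringIO

import pandas as pd

# Two hypothetical student records
df = pd.DataFrame({"name": ["Ahsan", "Ayesha"], "roll_no": [100, 101]})

# orient="records" together with lines=True writes one JSON object per line
jsonl_text = df.to_json(orient="records", lines=True)
print(jsonl_text)

# Reading the records back requires lines=True
round_trip = pd.read_json(StringIO(jsonl_text), lines=True)
print(round_trip)
```

Note that `lines=True` is only accepted by `to_json` together with `orient="records"`.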


## 3. Reading a JSON File having nested JSON records
- JSON files often contain nested objects, and `read_json` does not flatten them: each nested object ends up as a Python `dict` inside a single column. To get one column per field, we first need to parse and flatten the data before converting it into a DataFrame. For example:
```
[
    {
        "student_roll_no": 101,
        "details": {
            "name": "Kamal",
            "age": 12,
            "grade": "A"
        }
    },
    {
        "student_roll_no": 102,
        "details": {
            "name": "Hashim",
            "age": 18,
            "grade": "B"
        }
    }
]
```

In [6]:
# Let us read a json file having nested records
data = pd.read_json('datasets/nested.json')
data

|   | student_roll_no | details |
|---|-----------------|---------|
| 0 | 101 | {'name': 'Kamal', 'age': 12, 'grade': 'A'} |
| 1 | 102 | {'name': 'Hashim', 'age': 18, 'grade': 'B'} |
| 2 | 103 | {'name': 'Salman', 'age': 11, 'grade': 'A'} |
| 3 | 104 | {'name': 'Mazhar', 'age': 12, 'grade': 'C'} |
| 4 | 105 | {'name': 'Eisha', 'age': 13, 'grade': 'B'} |
| 5 | 106 | {'name': 'Farhan', 'age': 22, 'grade': 'C'} |
| 6 | 107 | {'name': 'Mohsin', 'age': 11, 'grade': 'A'} |
| 7 | 108 | {'name': 'Bilal', 'age': 19, 'grade': 'A'} |
| 8 | 109 | {'name': 'Ishaan', 'age': 10, 'grade': 'D'} |
| 9 | 110 | {'name': 'Zalaid', 'age': 9, 'grade': 'B'} |
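As an alternative to the manual `json`-module approach used in this lecture, pandas provides `pd.json_normalize` (a top-level function since pandas 1.0), which flattens nested dictionaries into dotted column names such as `details.age`. A minimal sketch on the first two records of `nested.json`, typed inline; stripping the `details.` prefix afterwards is an optional, illustrative step.

```python
import pandas as pd

# The first two records of datasets/nested.json, as Python objects
records = [
    {"student_roll_no": 101, "details": {"name": "Kamal", "age": 12, "grade": "A"}},
    {"student_roll_no": 102, "details": {"name": "Hashim", "age": 18, "grade": "B"}},
]

# Nested keys become dotted column names: details.name, details.age, details.grade
flat = pd.json_normalize(records)
print(flat)

# Optional: keep only the last part of each dotted name
flat.columns = [col.split(".")[-1] for col in flat.columns]
print(flat)
```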


In [7]:
# to read a nested JSON file with full control over its structure, import the built-in json module
import json

# Open the file like any text file, using a with statement and the open function,
# then, inside the with block, call json.load to parse the whole file into Python objects
with open('datasets/nested.json', 'r') as f:
    my_json_data = json.load(f)

my_json_data

[{'student_roll_no': 101,
  'details': {'name': 'Kamal', 'age': 12, 'grade': 'A'}},
 {'student_roll_no': 102,
  'details': {'name': 'Hashim', 'age': 18, 'grade': 'B'}},
 {'student_roll_no': 103,
  'details': {'name': 'Salman', 'age': 11, 'grade': 'A'}},
 {'student_roll_no': 104,
  'details': {'name': 'Mazhar', 'age': 12, 'grade': 'C'}},
 {'student_roll_no': 105,
  'details': {'name': 'Eisha', 'age': 13, 'grade': 'B'}},
 {'student_roll_no': 106,
  'details': {'name': 'Farhan', 'age': 22, 'grade': 'C'}},
 {'student_roll_no': 107,
  'details': {'name': 'Mohsin', 'age': 11, 'grade': 'A'}},
 {'student_roll_no': 108,
  'details': {'name': 'Bilal', 'age': 19, 'grade': 'A'}},
 {'student_roll_no': 109,
  'details': {'name': 'Ishaan', 'age': 10, 'grade': 'D'}},
 {'student_roll_no': 110,
  'details': {'name': 'Zalaid', 'age': 9, 'grade': 'B'}}]

## 4. Iterate Through JSON File

In [8]:
# Note that my_json_data is a Python list of dictionaries, one per JSON record.
# Let's access the first element of the list using index 0

first_record = my_json_data[0]
first_record
# this is what the first element of the list looks like

{'student_roll_no': 101, 'details': {'name': 'Kamal', 'age': 12, 'grade': 'A'}}

In [9]:
# to get the age, first access the nested 'details' dictionary
first_record['details']

{'name': 'Kamal', 'age': 12, 'grade': 'A'}

In [10]:
# then index into 'details' with the 'age' key
first_record['details']['age']

12
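Chained square brackets raise `KeyError` as soon as one key is missing, which is common in real-world JSON. A minimal sketch of a more defensive lookup with `dict.get`, which returns a default (here `None`, or an empty dict for the intermediate level) instead of raising; the record below is hypothetical and deliberately lacks an `age`.

```python
# A hypothetical record whose "details" object has no "age" key
record = {"student_roll_no": 111, "details": {"name": "Ayesha", "grade": "A"}}

# Square brackets raise KeyError when a key is missing
try:
    record["details"]["age"]
except KeyError as err:
    print("missing key:", err)

# dict.get returns a default instead of raising
age = record.get("details", {}).get("age")
print(age)  # None
```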

In [11]:
# using the same indexing, you can iterate through all records in the parsed JSON data

for record in my_json_data:
    details = record['details']
    print(f"Rollno: {record['student_roll_no']}, Name: {details['name']}, "
          f"Age: {details['age']}, Grade: {details['grade']}")

Rollno: 101, Name: Kamal, Age: 12, Grade: A
Rollno: 102, Name: Hashim, Age: 18, Grade: B
Rollno: 103, Name: Salman, Age: 11, Grade: A
Rollno: 104, Name: Mazhar, Age: 12, Grade: C
Rollno: 105, Name: Eisha, Age: 13, Grade: B
Rollno: 106, Name: Farhan, Age: 22, Grade: C
Rollno: 107, Name: Mohsin, Age: 11, Grade: A
Rollno: 108, Name: Bilal, Age: 19, Grade: A
Rollno: 109, Name: Ishaan, Age: 10, Grade: D
Rollno: 110, Name: Zalaid, Age: 9, Grade: B
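Instead of printing each record, the same loop can collect flattened rows and build the clean, one-column-per-field DataFrame that `read_json` could not produce directly from the nested file. A minimal sketch on the first three records of `nested.json`, typed inline; the output column name `roll_no` is an illustrative choice.

```python
import pandas as pd

# Stand-in for my_json_data (first three records of datasets/nested.json)
my_json_data = [
    {"student_roll_no": 101, "details": {"name": "Kamal", "age": 12, "grade": "A"}},
    {"student_roll_no": 102, "details": {"name": "Hashim", "age": 18, "grade": "B"}},
    {"student_roll_no": 103, "details": {"name": "Salman", "age": 11, "grade": "A"}},
]

# Flatten each nested record into a single-level dict
rows = [
    {
        "roll_no": rec["student_roll_no"],
        "name": rec["details"]["name"],
        "age": rec["details"]["age"],
        "grade": rec["details"]["grade"],
    }
    for rec in my_json_data
]

# A list of flat dicts converts directly into a DataFrame
students = pd.DataFrame(rows)
print(students)
```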


## 5. Iterate Through JSON File and Filter Data
Suppose you want to create a new JSON file containing the name and age of the students whose age is greater than 15.

In [12]:
# first, create an empty list to store the filtered data;
# each matching record will be stored in it as a dictionary
filtered_data = []

# iterate through the parsed JSON data
for data in my_json_data:

    # check the condition
    if data['details']['age'] > 15:
        # if the condition holds, store the age and name in a new dictionary
        filtered_record = {
            'age': data['details']['age'],
            'name': data['details']['name'],
        }
        filtered_data.append(filtered_record)

filtered_data

[{'age': 18, 'name': 'Hashim'},
 {'age': 22, 'name': 'Farhan'},
 {'age': 19, 'name': 'Bilal'}]
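The same filter can be written more compactly as a list comprehension. A minimal sketch on a three-record subset of `nested.json`, typed inline so it runs on its own:

```python
# Stand-in for my_json_data (subset of datasets/nested.json)
my_json_data = [
    {"student_roll_no": 101, "details": {"name": "Kamal", "age": 12, "grade": "A"}},
    {"student_roll_no": 102, "details": {"name": "Hashim", "age": 18, "grade": "B"}},
    {"student_roll_no": 106, "details": {"name": "Farhan", "age": 22, "grade": "C"}},
]

# Keep only records with age > 15, projecting just the age and name fields
filtered_data = [
    {"age": rec["details"]["age"], "name": rec["details"]["name"]}
    for rec in my_json_data
    if rec["details"]["age"] > 15
]
print(filtered_data)  # [{'age': 18, 'name': 'Hashim'}, {'age': 22, 'name': 'Farhan'}]
```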

## 6. Write the Filtered Data to a New JSON File

In [13]:
# write the filtered data to a new JSON file:
# open the file in write mode using a with statement, then call json.dump
# (indent=4 pretty-prints the output with 4-space indentation)
with open('datasets/filtered.json', 'w') as f:
    json.dump(filtered_data, f, indent=4)
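A self-contained sketch of the same write step that checks the result: it writes to a temporary directory (so it does not touch `datasets/`), reads the file back with `json.load`, and confirms the data survives the round trip. The explicit `encoding="utf-8"` is an added precaution, not part of the original cell.

```python
import json
import os
import tempfile

filtered_data = [
    {"age": 18, "name": "Hashim"},
    {"age": 22, "name": "Farhan"},
    {"age": 19, "name": "Bilal"},
]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "filtered.json")

    # Write the list of dicts as pretty-printed JSON
    with open(path, "w", encoding="utf-8") as f:
        json.dump(filtered_data, f, indent=4)

    # Read it back to verify the content
    with open(path, encoding="utf-8") as f:
        reloaded = json.load(f)

print(reloaded == filtered_data)  # True
print(json.dumps(filtered_data[0], indent=4))
```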

In [14]:
# Verify by reading the new file back with pandas
df = pd.read_json('datasets/filtered.json')
df

|   | age | name |
|---|-----|------|
| 0 | 18 | Hashim |
| 1 | 22 | Farhan |
| 2 | 19 | Bilal |
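When the data is already in a DataFrame, pandas can write the same list-of-objects layout itself with `DataFrame.to_json(orient="records", indent=4)` (the `indent` parameter exists since pandas 1.0). Passing a path as the first argument, e.g. `df.to_json('datasets/filtered.json', orient='records', indent=4)`, writes a file; the sketch below returns a string instead so it stays self-contained, then reads it back to check the round trip.

```python
from io import StringIO

import pandas as pd

df = pd.DataFrame({"age": [18, 22, 19], "name": ["Hashim", "Farhan", "Bilal"]})

# orient="records" gives the same list-of-objects layout that json.dump wrote
json_text = df.to_json(orient="records", indent=4)
print(json_text)

# Read the JSON text back into a DataFrame
back = pd.read_json(StringIO(json_text), orient="records")
print(back)
```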
