# **Working with JSON Data**

### **What's covered in this notebook?**

1. Introduction to JSON
    - Why JSON?
    - JSON vs CSV
    - Data Types supported by JSON
    - Example of a Simple JSON Document
    - Reading a JSON - json.load() vs json.loads()
    - Accessing JSON Data
    - Modifying JSON Data
    - Writing a JSON - json.dump() vs json.dumps()
2. Understanding Nested JSON Structures
    - Reading a Nested JSON File
    - Iterating over JSON Data
    - Extracting Data From Nested JSON
    - Adding and Updating Fields
    - Handling Missing Keys using .get() Method
    - Filtering JSON Data
    - Writing JSON Data Back to a File
3. Advanced JSON Operations
    - Sorting JSON Data
    - Merging Two JSON Objects
4. Handling JSON Data from APIs
    - Fetching JSON from a REST API
    - Handling API Errors

## **Introduction to JSON**
JSON stands for **JavaScript Object Notation.** 

It is a lightweight data-interchange format that is easy to read and write for humans and efficient for machines to parse and generate. 

JSON is **commonly used in APIs and real-time data processing.**

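Even though JSON originated from JavaScript, it is language-independent. Today, almost every major programming language, including Python, Java, C++, and Go, can read and write JSON through its standard library or widely used third-party libraries, and many SQL databases provide JSON functions as well.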

### **Why JSON?**

- **Human-readable** – Easy to understand and edit
- **Lightweight** – Less data overhead compared to XML
- **Universal** – Works across different programming languages
- **Structured** – Well-defined format (key-value pairs)
- **API Friendly** – Most web services return data in JSON

### **JSON vs CSV**
JSON is more flexible than CSV because it can represent typed values and nested structures, which makes it a common choice for **APIs**, **real-time data**, and **analytics**.

| Feature | JSON | CSV | 
|---------|------|-----|
| **Format** | Key-Value Pairs | Rows & Columns | 
| **Readability** | High | Moderate | 
| **Nested Structure Support** | ✅ Yes | ❌ No | 
| **Data Types** | Strings, Numbers, Boolean, Null, Arrays, Objects | None built in; every value is read as text | 
| **APIs Use Case** | ✅ Yes | Rare | 
| **File Size** | Moderate | Small | 

### **Data Types supported by JSON**

JSON supports **six** data types:

| Type | Example |
|------|---------|
| **String** | `"name": "John"` |
| **Number** | `"age": 25` |
| **Boolean** | `"is_active": true` |
| **Null** | `"address": null` |
| **Array** | `"skills": ["Python", "SQL", "Pandas"]` |
| **Object** (Nested JSON) | `"location": {"city": "New York", "country": "USA"}` |
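
When Python parses JSON, each of these types is converted to a built-in Python type. The sketch below reuses the examples from the table; the `height` field is an assumption added here only to show that JSON numbers with a decimal point become `float`.

```python
import json

# One value of each JSON type, based on the examples in the table above.
# "height" is a made-up field that illustrates a non-integer number.
document = """
{
  "name": "John",
  "age": 25,
  "height": 1.75,
  "is_active": true,
  "address": null,
  "skills": ["Python", "SQL", "Pandas"],
  "location": {"city": "New York", "country": "USA"}
}
"""

parsed = json.loads(document)

# Show the Python type each JSON value was converted to
for key, value in parsed.items():
    print(f"{key}: {type(value).__name__}")
```

This should print `str`, `int`, `float`, `bool`, `NoneType`, `list`, and `dict` for the seven fields, in that order.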

### **Example of a Simple JSON Document**
```json
{
  "name": "Alice",
  "age": 30,
  "is_student": false,
  "courses": ["Data Science", "Machine Learning"],
  "address": {
    "city": "San Francisco",
    "state": "California",
    "zip": 94107
  }
}
```
**Key Observations:**  
1. JSON consists of **key-value pairs**  
2. Keys and string values must be in **double quotes** (`" "`); single quotes are not valid JSON  
3. JSON supports **arrays** (`[]`) and **nested objects** (`{}`)  
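
The double-quote rule is easy to trip over because Python itself accepts single quotes. Here is a minimal sketch of what happens when the rule is broken. The helper `try_parse` is a hypothetical name used for illustration, not part of the `json` module.

```python
import json

valid = '{"name": "Alice", "courses": ["Data Science", "Machine Learning"]}'
invalid = "{'name': 'Alice'}"  # single quotes are fine in Python, but not in JSON


def try_parse(text):
    """Return the parsed value, or None if the text is not valid JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as exc:
        # exc.msg holds a short description of what the parser expected
        print("Invalid JSON:", exc.msg)
        return None


print(try_parse(valid))
print(try_parse(invalid))
```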

### **Reading a JSON - json.load() vs json.loads()**

- **json.load(file)**: Reads JSON from an open file and **converts it into a Python object** (a dictionary when the top-level value is a JSON object, a list when it is an array)
- **json.loads(string)**: Reads JSON from a string and **converts it into a Python object** in the same way
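
The two functions differ only in where the text comes from. The sketch below uses `io.StringIO` as a stand-in for an open file, so it runs without any file on disk. The sample records are illustrative.

```python
import io
import json

# json.loads parses text that is already in memory
record = json.loads('{"name": "Alice", "age": 25}')

# json.load reads from any object with a .read() method;
# io.StringIO stands in for an open file here
fake_file = io.StringIO('[{"name": "Alice"}, {"name": "Bob"}]')
records = json.load(fake_file)

print(type(record).__name__)   # dict, because the top-level value is an object
print(type(records).__name__)  # list, because the top-level value is an array
```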

In [1]:
import json

json_string = '{"name": "Alice", "age": 25, "city": "New York"}'

# Convert JSON string to dictionary
data = json.loads(json_string)

# Print JSON data
print(data)

{'name': 'Alice', 'age': 25, 'city': 'New York'}


In [2]:
import json

# Open the JSON file
with open("data/data.json", "r") as file:
    data = json.load(file)  # Parse JSON into a Python dictionary

# Print JSON data
print(data)

{'employee': {'name': 'John Doe', 'age': 28, 'department': 'Data Science', 'skills': ['Python', 'SQL', 'Pandas'], 'location': {'city': 'Bangalore', 'country': 'India'}}}


### **Accessing JSON Data**
Once JSON is loaded into Python, you can access its elements with dictionary keys (for objects) and list indexes (for arrays).

In [3]:
# Access employee name
print("Name:", data["employee"]["name"])

# Access employee department
print("Department:", data["employee"]["department"])

# Access skills (Array)
print("Skills:", data["employee"]["skills"])

# Access city (Nested Object)
print("City:", data["employee"]["location"]["city"])

Name: John Doe
Department: Data Science
Skills: ['Python', 'SQL', 'Pandas']
City: Bangalore


### **Modifying JSON Data**
You can **add, update, or delete** fields dynamically.

In [4]:
# Add a new field
data["employee"]["experience"] = 5

In [5]:
# Update an existing field
data["employee"]["department"] = "Machine Learning"

In [6]:
# Delete a field
del data["employee"]["age"]
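
`del` raises a `KeyError` if the field does not exist, which matters when a cell is run twice. A possible alternative is `dict.pop` with a default. The sketch below works on a small stand-in record named `sample`, so it does not touch the notebook's `data` variable.

```python
# A small stand-in for the employee record used above
sample = {"employee": {"name": "John Doe", "department": "Data Science"}}

# dict.pop removes the key and returns its value, or the default if it is absent
removed = sample["employee"].pop("age", None)
print("Removed value:", removed)

# dict.update adds or changes several fields in one call
sample["employee"].update({"department": "Machine Learning", "experience": 5})
print(sample["employee"])
```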

### **Writing a JSON - json.dump() vs json.dumps()**

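- **json.dump(data, file)**: Serializes a Python object and writes the JSON to a file
- **json.dumps(data)**: Serializes a Python object and returns the JSON as a string

Both functions accept `indent` (for example `indent=4`), which formats the JSON with indentation for better readability.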

In [7]:
# Save the updated JSON back to a file
with open("data/updated_data.json", "w") as file:
    json.dump(data, file, indent=4)

print("Updated JSON file saved successfully!")

Updated JSON file saved successfully!


In [8]:
# Open the JSON file
with open("data/data.json", "r") as file:
    data = json.load(file)  # Parse JSON into a Python dictionary

# Convert json to string
json_string = json.dumps(data, indent=4)

print(json_string)

{
    "employee": {
        "name": "John Doe",
        "age": 28,
        "department": "Data Science",
        "skills": [
            "Python",
            "SQL",
            "Pandas"
        ],
        "location": {
            "city": "Bangalore",
            "country": "India"
        }
    }
}


In [9]:
# Open the JSON file
with open("data/updated_data.json", "r") as file:
    data = json.load(file)  # Parse JSON into a Python dictionary

# Convert json to string
json_string = json.dumps(data, indent=4)

print(json_string)

{
    "employee": {
        "name": "John Doe",
        "department": "Machine Learning",
        "skills": [
            "Python",
            "SQL",
            "Pandas"
        ],
        "location": {
            "city": "Bangalore",
            "country": "India"
        },
        "experience": 5
    }
}
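
Besides `indent`, `json.dumps` and `json.dump` take a few other formatting options. The sketch below shows `separators`, `sort_keys`, and `ensure_ascii` on a made-up `profile` dictionary, then checks that the output parses back to the same data.

```python
import json

# Hypothetical record; the non-ASCII name shows the effect of ensure_ascii
profile = {"name": "Zoë", "city": "Bangalore", "age": 28}

# Compact output: no spaces after separators, useful for network payloads
compact = json.dumps(profile, separators=(",", ":"))

# Readable output: sorted keys, indentation, non-ASCII characters kept as-is
pretty = json.dumps(profile, indent=4, sort_keys=True, ensure_ascii=False)

print(compact)
print(pretty)

# Round trip: parsing either string gives back an equal dictionary
print(json.loads(compact) == profile)
print(json.loads(pretty) == profile)
```

By default (`ensure_ascii=True`) non-ASCII characters are written as `\uXXXX` escapes, which is why the compact string shows `Zo\u00eb` while the pretty one shows `Zoë`.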


## **Understanding Nested JSON Structures**

Real-world JSON is often deeply nested. Let’s learn how to extract and modify nested values and how to handle missing keys.

### **Reading a Nested JSON File**

In [10]:
import json

# Load JSON file
with open("data/company_data.json", "r") as file:
    data = json.load(file)

# Convert json to string
json_string = json.dumps(data, indent=4)

print(json_string)

{
    "company": "TechCorp",
    "employees": [
        {
            "id": 101,
            "name": "Alice",
            "role": "Data Analyst",
            "skills": [
                "Python",
                "SQL"
            ],
            "projects": {
                "current": "Sales Forecasting",
                "completed": [
                    "Customer Segmentation",
                    "Churn Prediction"
                ]
            }
        },
        {
            "id": 102,
            "name": "Bob",
            "role": "Data Engineer",
            "skills": [
                "Spark",
                "Hadoop"
            ],
            "projects": {
                "current": "ETL Pipeline Optimization",
                "completed": [
                    "Data Warehouse Setup"
                ]
            }
        }
    ]
}


### **Iterating over JSON Data**

In [11]:
# Print all key-value pairs
for key, value in data.items():
    print(f"{key}: {value}")
    print()

company: TechCorp

employees: [{'id': 101, 'name': 'Alice', 'role': 'Data Analyst', 'skills': ['Python', 'SQL'], 'projects': {'current': 'Sales Forecasting', 'completed': ['Customer Segmentation', 'Churn Prediction']}}, {'id': 102, 'name': 'Bob', 'role': 'Data Engineer', 'skills': ['Spark', 'Hadoop'], 'projects': {'current': 'ETL Pipeline Optimization', 'completed': ['Data Warehouse Setup']}}]
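
Iterating with `.items()` only visits the top level, so the whole `employees` list is printed as one value. One way to reach every nested value is a small recursive generator. The function `walk` and its path format are assumptions made for this sketch, not a standard API.

```python
def walk(node, path=""):
    """Yield (path, value) pairs for every leaf value in parsed JSON."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from walk(value, f"{path}.{key}" if path else key)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from walk(item, f"{path}[{index}]")
    else:
        # Strings, numbers, booleans, and None are leaves
        yield path, node


# A trimmed copy of the company data shown above
sample = {
    "company": "TechCorp",
    "employees": [
        {"id": 101, "name": "Alice", "skills": ["Python", "SQL"]},
    ],
}

for path, value in walk(sample):
    print(f"{path} = {value}")
```

For the trimmed sample this prints lines such as `company = TechCorp` and `employees[0].skills[1] = SQL`.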



### **Extracting Data From Nested JSON**

In [12]:
# Extract company name

print("Company Name:", data["company"])

Company Name: TechCorp


In [13]:
# Extract all employees' names

for emp in data["employees"]:
    print("Employee:", emp["name"])

Employee: Alice
Employee: Bob


In [14]:
# Extract Bob's current project

for emp in data["employees"]:
    if emp["name"] == "Bob":
        print("Bob's Current Project:", emp["projects"]["current"])

Bob's Current Project: ETL Pipeline Optimization
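
The loop-and-`if` pattern above can be wrapped in a reusable lookup. This is a sketch assuming a hypothetical helper `find_employee` that returns the first match or `None`. The employee list is a trimmed copy of the data above, and "Dana" is a made-up name used to show the no-match case.

```python
def find_employee(employees, name):
    """Return the first employee dict with a matching name, or None."""
    return next((emp for emp in employees if emp.get("name") == name), None)


# Trimmed copy of the employee list shown above
employees = [
    {"name": "Alice", "projects": {"current": "Sales Forecasting"}},
    {"name": "Bob", "projects": {"current": "ETL Pipeline Optimization"}},
]

bob = find_employee(employees, "Bob")
print(bob["projects"]["current"])

# A name that is not in the list returns None instead of raising an error
print(find_employee(employees, "Dana"))
```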


### **Adding and Updating Fields**

In [15]:
# Add Bob's email

for emp in data["employees"]:
    if emp["name"] == "Bob":
        emp["email"] = "bob@techcorp.org"

In [16]:
# Updating Alice's role

for emp in data["employees"]:
    if emp["name"] == "Alice":
        emp["role"] = "Senior Data Analyst"

In [17]:
# Add a new employee

new_employee = {
    "id": 103,
    "name": "Charlie",
    "role": "ML Engineer",
    "skills": ["Python", "TensorFlow", "NLP"],
    "projects": {
        "current": "Chatbot Development",
        "completed": ["Sentiment Analysis"]
    }
}

# Append new employee to the JSON data
data["employees"].append(new_employee)

### **Handling Missing Keys using .get() Method**

In [18]:
# Let's try to get the email ids of all the employees

for emp in data["employees"]:
    print(emp["email"])

KeyError: 'email'

In [19]:
for emp in data["employees"]:
    print(emp.get("email", "Not Available"))

Not Available
bob@techcorp.org
Not Available
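
`.get()` can also be chained to reach into nested objects that may be missing. The sketch below assumes a hypothetical `contact` object that is not part of the company data above. The `or {}` fallback also covers the case where the field exists but is `null`.

```python
def get_email(emp):
    """Return the nested contact email, or a placeholder if any level is missing."""
    # "or {}" handles both a missing "contact" key and an explicit null value
    contact = emp.get("contact") or {}
    return contact.get("email", "Not Available")


# Hypothetical records with a nested, optional "contact" object
employees = [
    {"name": "Alice", "contact": {"email": "alice@example.com"}},
    {"name": "Bob"},                      # no "contact" key at all
    {"name": "Charlie", "contact": None}  # "contact": null in the JSON
]

for emp in employees:
    print(emp["name"], "->", get_email(emp))
```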


### **Filtering JSON Data**

In [20]:
# Get Employees Who Know Python

python_experts = [emp["name"] for emp in data["employees"] if "Python" in emp["skills"]]
print("Python Experts:", python_experts)

Python Experts: ['Alice', 'Charlie']


In [21]:
# Get Employees Working on a Specific Project

target_project = "ETL Pipeline Optimization"
for emp in data["employees"]:
    if emp["projects"]["current"] == target_project:
        print(emp["name"], "is working on", target_project)

Bob is working on ETL Pipeline Optimization
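
Filters can also combine conditions or look at list lengths. The sketch below uses a trimmed copy of the three employees from this section. The helper `has_any_skill` is a hypothetical name introduced for illustration.

```python
# Trimmed copy of the employee data used in this section
employees = [
    {"name": "Alice", "skills": ["Python", "SQL"],
     "projects": {"completed": ["Customer Segmentation", "Churn Prediction"]}},
    {"name": "Bob", "skills": ["Spark", "Hadoop"],
     "projects": {"completed": ["Data Warehouse Setup"]}},
    {"name": "Charlie", "skills": ["Python", "TensorFlow", "NLP"],
     "projects": {"completed": ["Sentiment Analysis"]}},
]


def has_any_skill(emp, wanted):
    """True if the employee lists at least one of the wanted skills."""
    return any(skill in wanted for skill in emp.get("skills", []))


# Employees who know Spark or Hadoop
big_data = [emp["name"] for emp in employees
            if has_any_skill(emp, {"Spark", "Hadoop"})]
print("Big data skills:", big_data)

# Employees with at least two completed projects
experienced = [emp["name"] for emp in employees
               if len(emp["projects"]["completed"]) >= 2]
print("Two or more completed projects:", experienced)
```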


### **Writing JSON Data Back to a File**

In [22]:
# Save back to JSON file
with open("data/updated_company_data.json", "w") as file:
    json.dump(data, file, indent=4)

print("Updated JSON file saved successfully!")

Updated JSON file saved successfully!
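
After writing a file, it can be worth reading it back to confirm that nothing was lost. The sketch below writes to a temporary directory so it does not touch the `data/` folder. The `payload` dictionary is a trimmed, illustrative version of the company data.

```python
import json
import tempfile
from pathlib import Path

payload = {"company": "TechCorp", "employees": [{"id": 101, "name": "Alice"}]}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "company.json"

    # Explicit UTF-8 avoids depending on the platform's default encoding
    with open(path, "w", encoding="utf-8") as file:
        json.dump(payload, file, indent=4, ensure_ascii=False)

    # Read the file back and compare it with what was written
    with open(path, "r", encoding="utf-8") as file:
        restored = json.load(file)

print("Round trip OK:", restored == payload)
```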


## **Advanced JSON Operations**

### **Sorting JSON Data**

In [23]:
sorted_employees = sorted(data["employees"], key=lambda x: x["name"])

print(json.dumps(sorted_employees, indent=4))

[
    {
        "id": 101,
        "name": "Alice",
        "role": "Senior Data Analyst",
        "skills": [
            "Python",
            "SQL"
        ],
        "projects": {
            "current": "Sales Forecasting",
            "completed": [
                "Customer Segmentation",
                "Churn Prediction"
            ]
        }
    },
    {
        "id": 102,
        "name": "Bob",
        "role": "Data Engineer",
        "skills": [
            "Spark",
            "Hadoop"
        ],
        "projects": {
            "current": "ETL Pipeline Optimization",
            "completed": [
                "Data Warehouse Setup"
            ]
        },
        "email": "bob@techcorp.org"
    },
    {
        "id": 103,
        "name": "Charlie",
        "role": "ML Engineer",
        "skills": [
            "Python",
            "TensorFlow",
            "NLP"
        ],
        "projects": {
            "current": "Chatbot Development",
            "completed": [
                "Sentiment Analysis"
            ]
        }
    }
]
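
`sorted()` also handles descending order and tie-breaking. The sketch below uses a trimmed copy of the three employees and shows a tuple key and the `reverse` flag.

```python
# Trimmed copy of the employee list, deliberately out of order
employees = [
    {"id": 102, "name": "Bob", "skills": ["Spark", "Hadoop"]},
    {"id": 103, "name": "Charlie", "skills": ["Python", "TensorFlow", "NLP"]},
    {"id": 101, "name": "Alice", "skills": ["Python", "SQL"]},
]

# Most skills first; ties are broken alphabetically by name
by_skill_count = sorted(
    employees, key=lambda emp: (-len(emp["skills"]), emp["name"])
)
print([emp["name"] for emp in by_skill_count])

# Highest id first
by_id_desc = sorted(employees, key=lambda emp: emp["id"], reverse=True)
print([emp["id"] for emp in by_id_desc])
```

`sorted()` returns a new list and leaves `employees` unchanged. Use `employees.sort(...)` to reorder the list in place instead.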

### **Merging Two JSON Objects**

In [24]:
import json

json1 = {"name": "Alice", "age": 25}
json2 = {"city": "New York", "role": "Data Scientist"}

merged_json = {**json1, **json2}

print(json.dumps(merged_json, indent=4))

{
    "name": "Alice",
    "age": 25,
    "city": "New York",
    "role": "Data Scientist"
}
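
The `{**a, **b}` merge is shallow: when both dictionaries share a key, the value from the second one replaces the first, including whole nested objects. The sketch below contrasts that with a recursive merge. The function `deep_merge` and the sample dictionaries are assumptions made for illustration.

```python
def deep_merge(base, override):
    """Return a new dict; nested dicts are merged, other values are replaced."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


# Hypothetical records that share the "name" and "location" keys
base_profile = {"name": "Alice", "location": {"city": "New York", "country": "USA"}}
update = {"name": "Alice B.", "location": {"city": "Boston"}}

shallow = {**base_profile, **update}
print(shallow["location"])  # the whole nested object was replaced

deep = deep_merge(base_profile, update)
print(deep["location"])     # only "city" changed; "country" was kept
```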


## **Handling JSON Data from APIs**

Now, let’s fetch JSON from an API using the third-party `requests` library (install it with `pip install requests` if needed).

### **Fetching JSON from a REST API**

In [25]:
import requests

# Fetch JSON from API
url = "https://jsonplaceholder.typicode.com/users"
response = requests.get(url)



In [26]:
# Convert response to JSON
users = response.json()

print("Type of 'users' object:", type(users))

# Print the number of users
print("Number of users:", len(users))

# Print first user's details
print(json.dumps(users[0], indent=4))

Type of 'users' object: <class 'list'>
Number of users: 10
{
    "id": 1,
    "name": "Leanne Graham",
    "username": "Bret",
    "email": "Sincere@april.biz",
    "address": {
        "street": "Kulas Light",
        "suite": "Apt. 556",
        "city": "Gwenborough",
        "zipcode": "92998-3874",
        "geo": {
            "lat": "-37.3159",
            "lng": "81.1496"
        }
    },
    "phone": "1-770-736-8031 x56442",
    "website": "hildegard.org",
    "company": {
        "name": "Romaguera-Crona",
        "catchPhrase": "Multi-layered client-server neural-net",
        "bs": "harness real-time e-markets"
    }
}
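
API responses like this one are usually reduced to the few fields an analysis needs. The sketch below works on a hard-coded, trimmed copy of the first user so it runs without a network call. The helper `summarize` is a hypothetical name introduced for illustration.

```python
# Trimmed copy of the first user returned by the API above
sample_users = [
    {
        "id": 1,
        "name": "Leanne Graham",
        "email": "Sincere@april.biz",
        "address": {
            "city": "Gwenborough",
            "geo": {"lat": "-37.3159", "lng": "81.1496"},
        },
        "company": {"name": "Romaguera-Crona"},
    },
]


def summarize(user):
    """Flatten one nested user record into a single-level dictionary."""
    address = user.get("address", {})
    geo = address.get("geo", {})
    return {
        "name": user.get("name"),
        "email": user.get("email"),
        "city": address.get("city"),
        "company": user.get("company", {}).get("name"),
        # The API returns coordinates as strings, so convert them to floats
        "lat": float(geo["lat"]) if "lat" in geo else None,
        "lng": float(geo["lng"]) if "lng" in geo else None,
    }


for user in sample_users:
    print(summarize(user))
```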


### **Handling API Errors**

API requests can fail because of **server errors, wrong URLs, or network problems**. Check the outcome of every request before using the response.

In [27]:
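import requests

url = "https://jsonplaceholder.typicode.com/users"

try:
    # A timeout keeps the call from hanging if the server never responds
    response = requests.get(url, timeout=10)
except requests.exceptions.RequestException as exc:
    # Covers connection errors, timeouts, and malformed URLs
    print("Request failed:", exc)
else:
    if response.status_code == 200:
        data = response.json()
        print("Data fetched successfully!")
    else:
        print("Error fetching data:", response.status_code)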

Data fetched successfully!
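
A `200` status does not guarantee that the body is well-formed JSON, for example when a proxy returns an HTML error page. The sketch below checks both conditions. The helper `parse_api_body` is a hypothetical name, and it takes the status code and body text as plain arguments so it can be tried without a network call. With `requests`, these would be `response.status_code` and `response.text`.

```python
import json


def parse_api_body(status_code, body_text):
    """Return parsed JSON for a 200 response with a valid body, else None."""
    if status_code != 200:
        print("Error fetching data:", status_code)
        return None
    try:
        return json.loads(body_text)
    except json.JSONDecodeError as exc:
        print("Response was not valid JSON:", exc.msg)
        return None


print(parse_api_body(200, '[{"id": 1}]'))                       # valid JSON
print(parse_api_body(404, "{}"))                                # error status
print(parse_api_body(200, "<html>Service unavailable</html>"))  # not JSON
```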
