# 🧬 Working with JSON Data in Python

**Welcome!** This notebook explores how to effectively work with JSON (JavaScript Object Notation) data in Python using the built-in `json` module and modern best practices. JSON is the de facto standard for data exchange on the web and is widely used for configuration files, APIs, and more.

**Target Audience:** Python developers needing to serialize Python objects to JSON or deserialize JSON data into Python objects.

**Learning Objectives:**
*   Understand the structure and data types of JSON.
*   Serialize Python objects (like dicts, lists) into JSON strings (`dumps`, `dump`).
*   Deserialize JSON strings or files into Python objects (`loads`, `load`).
*   Handle custom Python objects during serialization and deserialization.
*   Leverage modern techniques like `dataclasses` for easier JSON handling.
*   Understand best practices for performance, security, and schema validation.
*   Identify common pitfalls and prepare for related interview questions.

## 1. Introduction: What is JSON?

**JSON (JavaScript Object Notation)** is a lightweight, text-based, human-readable data interchange format. It was derived from JavaScript, but it's language-independent and supported by virtually all modern programming languages.

**Why is it so popular?**

*   **Human-Readable:** Easy for developers to read and write.
*   **Machine-Parsable:** Easy for software to parse and generate.
*   **Widely Used:** The standard for web APIs (REST APIs), configuration files, and data storage/transfer in many applications.
*   **Language Independent:** Works across different programming languages.

**Analogy: Universal Data Translator**

Think of JSON as a universal language for simple data structures. Python has its own way of representing lists and dictionaries, and so do Java and JavaScript. JSON provides a common format that all these languages can translate their basic data structures into (serialization) and translate back from (deserialization), allowing them to communicate data effectively.

**Common Use Cases:**
*   **Web APIs:** Receiving data from or sending data to web services.
*   **Configuration Files:** Storing application settings.
*   **Data Storage:** Simple data persistence (though databases are usually better for complex scenarios).
*   **Inter-process Communication:** Exchanging data between different programs or services.

## 2. JSON Structure and Data Types

JSON data consists of **key-value pairs** (like Python dictionaries) and **ordered sequences** (like Python lists).

**JSON Data Types:**

| JSON Type        | Description                                     | Example                     |
| :--------------- | :---------------------------------------------- | :-------------------------- |
| `object`         | An unordered collection of key/value pairs. Keys must be strings, values can be any JSON type. | `{"name": "Alice", "age": 30}` |
| `array`          | An ordered sequence of values. Values can be any JSON type. | `["apple", "banana", 123]` |
| `string`         | A sequence of Unicode characters, enclosed in double quotes (`"`). | `"Hello, World!"`         |
| `number`         | An integer or floating-point number.            | `123`, `-4.56`, `1.2e3`       |
| `boolean`        | `true` or `false`.                              | `true`, `false`             |
| `null`           | Represents an empty or non-existent value.      | `null`                      |

**Example JSON Document:**
```json
{
    "firstName": "Jane",
    "lastName": "Doe",
    "isActive": true,
    "age": 28,
    "address": {
        "streetAddress": "123 Main St",
        "city": "Anytown",
        "postalCode": "12345"
    },
    "phoneNumbers": [
        {
            "type": "home",
            "number": "555-1234"
        },
        {
            "type": "work",
            "number": "555-5678"
        }
    ],
    "spouse": null
}
```

## 3. Python to JSON: Serialization (Encoding)

Serialization means converting a Python object into a JSON formatted string or writing it to a file.

**Core Functions:**
*   `json.dumps(obj, *, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, cls=None, indent=None, separators=None, default=None, sort_keys=False, **kw)`: Serializes a Python object `obj` to a JSON formatted **string**.
*   `json.dump(obj, fp, *, ...)`: Serializes `obj` and writes it to a **file-like object** `fp` (opened in text mode, usually with UTF-8 encoding).

**Python to JSON Type Conversion:**

| Python        | JSON           |
| :------------ | :------------- |
| `dict`        | `object`       |
| `list`, `tuple`| `array`        |
| `str`         | `string`       |
| `int`, `float`| `number`       |
| `True`        | `true`         |
| `False`       | `false`        |
| `None`        | `null`         |
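
Types outside this table (such as `Decimal` or `set`) raise a `TypeError` unless a fallback is supplied through the `default` parameter. The following is a minimal sketch, not a canonical recipe: the helper name `encode_extra_types` and the choice to emit `Decimal` values as strings are assumptions made for illustration.

```python
import json
from decimal import Decimal

def encode_extra_types(obj):
    """Hypothetical fallback for types the json module cannot serialize itself."""
    if isinstance(obj, Decimal):
        # Emit as a string to avoid binary floating-point rounding
        return str(obj)
    if isinstance(obj, (set, frozenset)):
        # Sets have no JSON equivalent; a sorted list keeps the output stable
        return sorted(obj)
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")

account = {"owner": "Alice", "balance": Decimal("123.45"), "tags": {"vip", "beta"}}
encoded = json.dumps(account, default=encode_extra_types)
print(encoded)
# {"owner": "Alice", "balance": "123.45", "tags": ["beta", "vip"]}
```

Whether a `Decimal` should become a JSON string or a JSON number depends on what the consumer of the data expects, so treat the string choice here as one option among several.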


### 3.1 Using `json.dumps()` (Object to String)

In [1]:
import json
from decimal import Decimal # Example of a non-standard type

person_data = {
    "name": "Alice",
    "age": 30,
    "isStudent": False,
    "courses": [
        {"title": "History 101", "credits": 3},
        {"title": "Math 202", "credits": 4}
    ],
    "address": None,
    # Decimals are not directly serializable by default
    # "balance": Decimal("123.45") 
}

# --- Basic Serialization --- 
json_string_compact = json.dumps(person_data)
print("--- Compact JSON String ---")
print(json_string_compact)
# Output: {"name": "Alice", "age": 30, "isStudent": false, "courses": [{"title": "History 101", "credits": 3}, {"title": "Math 202", "credits": 4}], "address": null}

# --- Pretty-Printing with Indentation --- 
# indent: Adds newlines and spaces for readability
# sort_keys: Sorts dictionary keys alphabetically (useful for consistent output)
json_string_pretty = json.dumps(person_data, indent=4, sort_keys=True)
print("\n--- Pretty-Printed JSON String ---")
print(json_string_pretty)
# Output:
# {
#     "address": null,
#     "age": 30,
#     "courses": [
#         {
#             "credits": 3,
#             "title": "History 101"
#         },
#         {
#             "credits": 4,
#             "title": "Math 202"
#         }
#     ],
#     "isStudent": false,
#     "name": "Alice"
# }

# --- Customizing Separators --- 
# (Less common, can produce more compact output)
json_string_custom_sep = json.dumps(person_data, separators=(',', ':')) # No extra spaces
print("\n--- JSON String with Custom Separators ---")
print(json_string_custom_sep)
# Output: {"name":"Alice","age":30,"isStudent":false,"courses":[{"title":"History 101","credits":3},{"title":"Math 202","credits":4}],"address":null}

# --- Handling Non-ASCII Characters --- 
data_with_unicode = {"name": "Björn", "city": "München"}
# ensure_ascii=True (Default): Escapes non-ASCII chars
json_unicode_escaped = json.dumps(data_with_unicode, indent=2)
print("\n--- JSON with Escaped Unicode (ensure_ascii=True) ---")
print(json_unicode_escaped)
# Output:
# {
#   "name": "Bj\u00f6rn",
#   "city": "M\u00fcnchen"
# }

# ensure_ascii=False: Keeps non-ASCII chars (requires UTF-8 handling)
json_unicode_native = json.dumps(data_with_unicode, indent=2, ensure_ascii=False)
print("\n--- JSON with Native Unicode (ensure_ascii=False) ---")
print(json_unicode_native)
# Output:
# {
#   "name": "Björn",
#   "city": "München"
# }

--- Compact JSON String ---
{"name": "Alice", "age": 30, "isStudent": false, "courses": [{"title": "History 101", "credits": 3}, {"title": "Math 202", "credits": 4}], "address": null}

--- Pretty-Printed JSON String ---
{
    "address": null,
    "age": 30,
    "courses": [
        {
            "credits": 3,
            "title": "History 101"
        },
        {
            "credits": 4,
            "title": "Math 202"
        }
    ],
    "isStudent": false,
    "name": "Alice"
}

--- JSON String with Custom Separators ---
{"name":"Alice","age":30,"isStudent":false,"courses":[{"title":"History 101","credits":3},{"title":"Math 202","credits":4}],"address":null}

--- JSON with Escaped Unicode (ensure_ascii=True) ---
{
  "name": "Bj\u00f6rn",
  "city": "M\u00fcnchen"
}

--- JSON with Native Unicode (ensure_ascii=False) ---
{
  "name": "Björn",
  "city": "München"
}


### 3.2 Using `json.dump()` (Object to File)

Always use a context manager (`with open(...)`) to ensure the file is properly closed.

In [2]:
import json

person_data = {
    "name": "Bob",
    "age": 42,
    "city": "London",
    "occupation": "Engineer"
}

file_path = 'person_data.json'

try:
    # Use 'w' mode for writing, specify UTF-8 encoding (good practice)
    with open(file_path, 'w', encoding='utf-8') as json_file:
        # Use dump() to write directly to the file object
        # Pretty-print using indent for readability in the file
        json.dump(person_data, json_file, indent=4, ensure_ascii=False) 
        print(f"\nSuccessfully wrote data to {file_path}")

except IOError as e:
    print(f"Error writing to file {file_path}: {e}")

# You can now check the contents of 'person_data.json'


Successfully wrote data to person_data.json


## 4. JSON to Python: Deserialization (Decoding)

Deserialization means converting a JSON formatted string or reading from a file into a Python object (usually dictionaries and lists).

**Core Functions:**
*   `json.loads(s, *, cls=None, object_hook=None, parse_float=None, parse_int=None, parse_constant=None, object_pairs_hook=None, **kw)`: Deserializes a JSON **string** `s` into a Python object.
*   `json.load(fp, *, ...)`: Deserializes JSON data read from a **file-like object** `fp`, i.e. any object with a `.read()` method, opened in text mode (usually with UTF-8 encoding) or in binary mode.

**JSON to Python Type Conversion:**

| JSON           | Python        |
| :------------- | :------------ |
| `object`       | `dict`        |
| `array`        | `list`        |
| `string`       | `str`         |
| `number` (int) | `int`         |
| `number` (real)| `float`       |
| `true`         | `True`        |
| `false`        | `False`       |
| `null`         | `None`        |
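
The `parse_float` parameter in the `json.loads` signature above changes how real numbers are converted. As a small sketch (the sample payload is made up for illustration), passing `Decimal` keeps exact decimal values instead of binary floats, while integers are unaffected:

```python
import json
from decimal import Decimal

raw = '{"item": "widget", "price": 19.99, "quantity": 3}'

as_float = json.loads(raw)                          # default: real numbers become float
as_decimal = json.loads(raw, parse_float=Decimal)   # real numbers become Decimal

print(type(as_float["price"]))       # <class 'float'>
print(type(as_decimal["price"]))     # <class 'decimal.Decimal'>
print(as_decimal["price"] * 3)       # 59.97
print(type(as_decimal["quantity"]))  # <class 'int'>
```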

### 4.1 Using `json.loads()` (String to Object)

In [3]:
import json

json_string = '''
{
    "name": "Charlie",
    "id": 789,
    "enabled": true,
    "items": ["widget", "gadget"],
    "metadata": null
}
'''

invalid_json_string = '{"name": "Eve", "age": }' # Malformed JSON

# --- Basic Deserialization --- 
try:
    python_dict = json.loads(json_string)
    print("--- Deserialized Python Object ---")
    print(python_dict)
    print(f"Type: {type(python_dict)}")
    print(f"Name: {python_dict['name']}")
    print(f"Items type: {type(python_dict['items'])}")

except json.JSONDecodeError as e:
    print(f"\nError decoding basic JSON: {e}")

# --- Handling Decoding Errors --- 
try:
    python_dict_invalid = json.loads(invalid_json_string)
    print("\n--- Invalid JSON Deserialized (Should not happen) ---")
    print(python_dict_invalid)
except json.JSONDecodeError as e:
    # This is expected
    print(f"\nSuccessfully caught error decoding invalid JSON: {e}")
    # Example output: Successfully caught error decoding invalid JSON: Expecting value: line 1 column 24 (char 23)

--- Deserialized Python Object ---
{'name': 'Charlie', 'id': 789, 'enabled': True, 'items': ['widget', 'gadget'], 'metadata': None}
Type: <class 'dict'>
Name: Charlie
Items type: <class 'list'>

Successfully caught error decoding invalid JSON: Expecting value: line 1 column 24 (char 23)
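
Beyond its message, `json.JSONDecodeError` exposes the error location as attributes (`msg`, `lineno`, `colno`, `pos`), which helps when reporting problems to users. A minimal sketch using the same malformed string as above; the `error_info` dictionary is only an illustrative way to collect the values:

```python
import json

bad = '{"name": "Eve", "age": }'
try:
    json.loads(bad)
except json.JSONDecodeError as exc:
    # Collect the structured details instead of parsing the message text
    error_info = {"msg": exc.msg, "lineno": exc.lineno, "colno": exc.colno, "pos": exc.pos}
    print(error_info)
    # Point at the offending character (pos is a 0-based index into the string)
    print(bad)
    print(" " * exc.pos + "^")
# {'msg': 'Expecting value', 'lineno': 1, 'colno': 24, 'pos': 23}
```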


### 4.2 Using `json.load()` (File to Object)

In [4]:
import json

# Assuming 'person_data.json' was created in the previous step
file_path = 'person_data.json'
non_existent_file = 'no_such_file.json'

# --- Reading from Existing File ---
try:
    # Use 'r' mode for reading, specify UTF-8 encoding
    with open(file_path, 'r', encoding='utf-8') as json_file:
        # Use load() to read from the file object
        loaded_data = json.load(json_file)
        print(f"\n--- Data Loaded from {file_path} ---")
        print(loaded_data)
        print(f"Type: {type(loaded_data)}")
        print(f"Name: {loaded_data.get('name', 'N/A')}") # Use .get for safe access

except FileNotFoundError:
    print(f"\nError: File not found at {file_path}")
except json.JSONDecodeError as e:
    print(f"\nError decoding JSON from file {file_path}: {e}")
except IOError as e:
    print(f"\nError reading file {file_path}: {e}")

# --- Handling File Not Found --- 
try:
    with open(non_existent_file, 'r', encoding='utf-8') as json_file:
        loaded_data_non_existent = json.load(json_file)
except FileNotFoundError:
    # This is expected
    print(f"\nSuccessfully caught error: File not found at {non_existent_file}")


--- Data Loaded from person_data.json ---
{'name': 'Bob', 'age': 42, 'city': 'London', 'occupation': 'Engineer'}
Type: <class 'dict'>
Name: Bob

Successfully caught error: File not found at no_such_file.json


## 5. Handling Custom Python Objects

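The default `json` functions only know how to handle the built-in types listed in the conversion tables above (`dict`, `list`, `tuple`, `str`, `int`, `float`, `bool`, `None`). Trying to serialize a custom class instance will raise a `TypeError`.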

**Traditional Approaches (Less Preferred Now):**

1.  **Custom Encoder Function (`default`):** Pass a function to `json.dumps(default=...)`. This function is called for objects the encoder doesn't recognize. It should return a serializable representation (e.g., a dictionary).
2.  **Custom Encoder Class (`cls`):** Subclass `json.JSONEncoder` and override the `default()` method. Pass this class to `json.dumps(cls=...)`.
3.  **Custom Decoder Hook (`object_hook`):** Pass a function to `json.loads(object_hook=...)`. This function is called with the dictionary result of decoding any JSON object. It can check for specific keys (like a `"__class__"` key added during encoding) and return a custom object instance instead of the dictionary.

**Modern Approach (Recommended): `dataclasses` or `pydantic`**

The `dataclasses` module (in the standard library since Python 3.7) and the third-party `pydantic` library simplify this significantly. Both let you define structured data classes from type hints: `dataclasses` converts instances to plain dictionaries via `dataclasses.asdict()`, while `pydantic` ships its own JSON serialization, parsing, and validation methods.

### 5.1 Traditional Approach (Illustrative)

In [5]:
import json
from datetime import datetime

class Task:
    def __init__(self, description: str, due_date: datetime, completed: bool = False):
        self.description = description
        self.due_date = due_date
        self.completed = completed

    def __repr__(self):
        status = '✓' if self.completed else '✗'
        return f"Task(description='{self.description}', due='{self.due_date.isoformat()}', status='{status}')"

# --- Custom Encoder Function --- 
def custom_encoder(obj):
    if isinstance(obj, datetime):
        return {"__datetime__": True, "iso_format": obj.isoformat()} # Add metadata
    elif isinstance(obj, Task):
        return {
            "__task__": True, # Add metadata
            "description": obj.description,
            # Recursively call dumps on the due_date, which will use custom_encoder again
            "due_date": json.dumps(obj.due_date, default=custom_encoder),
            "completed": obj.completed
        }
    # The encoder only calls this function for objects it cannot serialize itself,
    # so anything that reaches this point is unsupported: raise TypeError.
    raise TypeError(f"Object of type {obj.__class__.__name__} is not JSON serializable")

task1 = Task("Buy groceries", datetime(2023, 10, 27, 18, 0, 0))

try:
    # This would fail without the 'default' function
    # json_task_string_fail = json.dumps(task1, indent=2)
    
    json_task_string = json.dumps(task1, default=custom_encoder, indent=2)
    print("--- Serialized Custom Object (Traditional) ---")
    print(json_task_string)
    # Output shows nested JSON string for date due to recursive dumps call in encoder
    # {
    #   "__task__": true,
    #   "description": "Buy groceries",
    #   "due_date": "{\"__datetime__\": true, \"iso_format\": \"2023-10-27T18:00:00\"}",
    #   "completed": false
    # }

except TypeError as e:
    print(f"Serialization error: {e}")

# --- Custom Decoder Hook --- 
def custom_decoder(dct):
    if "__datetime__" in dct:
        return datetime.fromisoformat(dct["iso_format"])
    elif "__task__" in dct:
        # Need to decode the nested JSON string for due_date first
        # This highlights the complexity of the traditional approach
        due_date_obj = json.loads(dct["due_date"], object_hook=custom_decoder) 
        return Task(dct["description"], due_date_obj, dct["completed"])
    return dct # Return dict as is if not a recognized custom type

try:
    # Now deserialize the string back into a Task object
    task_object = json.loads(json_task_string, object_hook=custom_decoder)
    print("\n--- Deserialized Custom Object (Traditional) ---")
    print(task_object)
    print(f"Type: {type(task_object)}")
    print(f"Due date type: {type(task_object.due_date)}")

except json.JSONDecodeError as e:
    print(f"Deserialization error: {e}")
except Exception as e:
    print(f"An unexpected error occurred during deserialization: {e}")

--- Serialized Custom Object (Traditional) ---
{
  "__task__": true,
  "description": "Buy groceries",
  "due_date": "{\"__datetime__\": true, \"iso_format\": \"2023-10-27T18:00:00\"}",
  "completed": false
}

--- Deserialized Custom Object (Traditional) ---
Task(description='Buy groceries', due='2023-10-27T18:00:00', status='✗')
Type: <class '__main__.Task'>
Due date type: <class 'datetime.datetime'>


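*(Note: The traditional approach shown above is quite complex, especially with nested custom objects. It requires careful handling of encoding/decoding metadata, and the nested `json.dumps` call inside `custom_encoder` double-encodes `due_date` as a JSON string inside the JSON document, which the decoder then has to unwrap. This motivates the use of modern libraries.)*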
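
Part of that complexity can be avoided while staying with `default` and `object_hook`. The encoder calls `default` again on any unsupported value that `default` returns, and `object_hook` is applied to inner objects before outer ones, so the nested `json.dumps`/`json.loads` calls are not required. The sketch below is one possible variant, not the only way to structure it; the function names `encode_task_types` and `decode_task_types` are hypothetical.

```python
import json
from datetime import datetime

class Task:
    def __init__(self, description, due_date, completed=False):
        self.description = description
        self.due_date = due_date
        self.completed = completed

def encode_task_types(obj):
    if isinstance(obj, datetime):
        return {"__datetime__": True, "iso_format": obj.isoformat()}
    if isinstance(obj, Task):
        # Return the datetime as-is: the encoder calls this function again for it
        return {"__task__": True, "description": obj.description,
                "due_date": obj.due_date, "completed": obj.completed}
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")

def decode_task_types(dct):
    if "__datetime__" in dct:
        return datetime.fromisoformat(dct["iso_format"])
    if "__task__" in dct:
        # due_date is already a datetime here: inner objects are decoded first
        return Task(dct["description"], dct["due_date"], dct["completed"])
    return dct

task = Task("Buy groceries", datetime(2023, 10, 27, 18, 0, 0))
text = json.dumps(task, default=encode_task_types)
restored = json.loads(text, object_hook=decode_task_types)
print(text)
print(type(restored.due_date))  # <class 'datetime.datetime'>
```

With this variant, `due_date` appears in the output as a nested JSON object rather than as an escaped JSON string.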

### 5.2 Modern Approach: Using `dataclasses`

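The `dataclasses` module provides a decorator and functions for automatically adding special methods like `__init__` and `__repr__` to user-defined classes. Dataclass instances can be converted to dictionaries with `dataclasses.asdict()`, which simplifies JSON handling.

We can combine `dataclasses` with a custom `json.JSONEncoder` subclass (the `cls` mechanism described above) to get a much cleaner encoder. Decoding still yields dictionaries, which are converted back to dataclass instances explicitly.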

In [6]:
import json
import dataclasses
from datetime import datetime
from typing import List, Optional

# Define data structures using dataclasses
@dataclasses.dataclass
class Address:
    street: str
    city: str
    zip_code: str

@dataclasses.dataclass
class Person:
    name: str
    age: int
    is_active: bool
    # Use Optional for fields that might be null/None
    address: Optional[Address] = None 
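    # Store dates as ISO strings for direct JSON compatibility.
    # Note: datetime.utcnow() is deprecated since Python 3.12;
    # datetime.now(timezone.utc) is the timezone-aware replacement.
    created_at: str = dataclasses.field(default_factory=lambda: datetime.utcnow().isoformat())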

# --- Enhanced Encoder using dataclasses.asdict --- 
class DataclassJSONEncoder(json.JSONEncoder):
    def default(self, obj):
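        # is_dataclass() is also True for dataclass types, so exclude classes
        if dataclasses.is_dataclass(obj) and not isinstance(obj, type):
            # Convert dataclass instance to a dictionary
            return dataclasses.asdict(obj)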
        # Let the base class default method handle others or raise TypeError
        return super().default(obj)

# Create instances
address1 = Address(street="456 Oak Ave", city="Somewhere", zip_code="67890")
person1 = Person(name="David", age=25, is_active=True, address=address1)
person2 = Person(name="Eve", age=35, is_active=False) # No address

# Serialize using the custom encoder
try:
    json_person1 = json.dumps(person1, cls=DataclassJSONEncoder, indent=2)
    json_person2 = json.dumps(person2, cls=DataclassJSONEncoder, indent=2)
    
    print("--- Serialized Dataclass Object (Modern) ---")
    print(json_person1)
    print("\n")
    print(json_person2)

except TypeError as e:
    print(f"Serialization error: {e}")

# --- Decoding (Simpler as result is dict, manual conversion needed) ---
# Note: `loads` returns dicts. You'd typically write a function 
#       to convert the dict back to your dataclass instance if needed.
try:
    loaded_dict1 = json.loads(json_person1)
    print("\n--- Deserialized to Dictionary (Modern) ---")
    print(loaded_dict1)
    print(f"Type: {type(loaded_dict1)}")

    # Manual conversion back to dataclass (if necessary)
    # Requires handling nested structures correctly
    def dict_to_person(data: dict) -> Person:
        addr_data = data.get('address')
        address_obj = Address(**addr_data) if addr_data else None
        return Person(
            name=data['name'], 
            age=data['age'], 
            is_active=data['is_active'],
            address=address_obj,
            created_at=data['created_at']
        )
        
    person_obj_from_json = dict_to_person(loaded_dict1)
    print("\n--- Converted back to Dataclass Object ---")
    print(person_obj_from_json)
    print(f"Type: {type(person_obj_from_json)}")
    print(f"Address type: {type(person_obj_from_json.address)}")

except json.JSONDecodeError as e:
    print(f"Deserialization error: {e}")
except Exception as e:
    print(f"Error during conversion: {e}")

--- Serialized Dataclass Object (Modern) ---
{
  "name": "David",
  "age": 25,
  "is_active": true,
  "address": {
    "street": "456 Oak Ave",
    "city": "Somewhere",
    "zip_code": "67890"
  },
  "created_at": "2025-04-20T11:01:56.632533"
}


{
  "name": "Eve",
  "age": 35,
  "is_active": false,
  "address": null,
  "created_at": "2025-04-20T11:01:56.632771"
}

--- Deserialized to Dictionary (Modern) ---
{'name': 'David', 'age': 25, 'is_active': True, 'address': {'street': '456 Oak Ave', 'city': 'Somewhere', 'zip_code': '67890'}, 'created_at': '2025-04-20T11:01:56.632533'}
Type: <class 'dict'>

--- Converted back to Dataclass Object ---
Person(name='David', age=25, is_active=True, address=Address(street='456 Oak Ave', city='Somewhere', zip_code='67890'), created_at='2025-04-20T11:01:56.632533')
Type: <class '__main__.Person'>
Address type: <class '__main__.Address'>



**Note on Pydantic:** The third-party `pydantic` library takes this further. In Pydantic v2, models offer built-in methods like `.model_dump_json()` and `.model_validate_json()` that handle serialization, deserialization, *and* data validation based on type hints, often reducing boilerplate code even more than `dataclasses` alone.
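
Where a third-party dependency is not an option, the manual `dict_to_person` conversion shown earlier can be generalized with the standard library. The following is a rough sketch under narrow assumptions: the helper `from_dict` is hypothetical, and it only handles directly nested dataclasses and `Optional[...]` fields, not lists of dataclasses or other generic containers. The `Address` and `Person` classes are trimmed-down copies of the ones above so that the example is self-contained.

```python
import dataclasses
import json
from typing import Optional, Union, get_args, get_origin, get_type_hints

@dataclasses.dataclass
class Address:
    street: str
    city: str
    zip_code: str

@dataclasses.dataclass
class Person:
    name: str
    age: int
    address: Optional[Address] = None

def from_dict(cls, data):
    """Build a dataclass instance from a dict, recursing into nested dataclasses."""
    hints = get_type_hints(cls)
    kwargs = {}
    for field in dataclasses.fields(cls):
        if field.name not in data:
            continue  # rely on the field's default; cls() raises if there is none
        value = data[field.name]
        field_type = hints[field.name]
        # Unwrap Optional[X], which is Union[X, None], down to X
        if get_origin(field_type) is Union:
            non_none = [t for t in get_args(field_type) if t is not type(None)]
            if len(non_none) == 1:
                field_type = non_none[0]
        if dataclasses.is_dataclass(field_type) and isinstance(value, dict):
            value = from_dict(field_type, value)
        kwargs[field.name] = value
    return cls(**kwargs)

raw = '{"name": "David", "age": 25, "address": {"street": "456 Oak Ave", "city": "Somewhere", "zip_code": "67890"}}'
person = from_dict(Person, json.loads(raw))
print(person)
```

Note that this sketch performs no type validation: a string passed for `age` would be accepted unchanged, which is exactly the gap that `pydantic` closes.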

## 6. Performance, Schema Validation & Security

### 6.1 Performance
*   Python's built-in `json` module uses a C accelerator on CPython (with a pure-Python fallback), making it reasonably fast for most use cases.
*   For extreme performance needs (e.g., parsing massive JSON files or high-throughput APIs), third-party libraries like `orjson` (written in Rust) or `ujson` (UltraJSON, written in C) can offer significant speedups.
    ```python
    # Example using orjson (install with: pip install orjson)
    # import orjson 
    # serialized = orjson.dumps(my_object)  # Often much faster; returns bytes, not str
    # deserialized = orjson.loads(json_bytes)  # Accepts bytes or str
    ```

### 6.2 Schema Validation
*   **Why?** When receiving JSON from external sources (APIs, user input), you cannot assume it conforms to your expected structure. Missing keys, incorrect types, or unexpected values can cause errors.
*   **How?** Use libraries like `jsonschema` to validate incoming JSON data against a predefined JSON Schema definition.
    ```python
    # Example using jsonschema (install with: pip install jsonschema)
    # from jsonschema import validate, ValidationError
    # 
    # schema = {
    #    "type": "object",
    #    "properties": {
    #        "name": {"type": "string"},
    #        "age": {"type": "integer", "minimum": 0}
    #    },
    #    "required": ["name", "age"]
    # }
    # 
    # try:
    #     validate(instance=loaded_data, schema=schema)
    #     print("JSON data is valid!")
    # except ValidationError as err:
    #     print(f"JSON validation error: {err.message}")
    ```
*   Libraries like `pydantic` perform validation automatically during deserialization based on your data class definitions.
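
For small payloads where installing a validation library is not practical, the same kind of checks can be written by hand. The sketch below is an assumption-laden illustration: the function `validate_person` is hypothetical, covers only the `name`/`age` rules from the schema example above, and is not a substitute for a full JSON Schema implementation.

```python
import json

def validate_person(data):
    """Return a list of problems; an empty list means the data passed."""
    if not isinstance(data, dict):
        return ["top-level value must be an object"]
    errors = []
    if not isinstance(data.get("name"), str):
        errors.append("'name' is required and must be a string")
    age = data.get("age")
    # bool is a subclass of int in Python, so exclude it explicitly
    if not isinstance(age, int) or isinstance(age, bool):
        errors.append("'age' is required and must be an integer")
    elif age < 0:
        errors.append("'age' must be >= 0")
    return errors

print(validate_person(json.loads('{"name": "Alice", "age": 30}')))  # []
print(validate_person(json.loads('{"name": "Bob", "age": -1}')))    # ["'age' must be >= 0"]
print(validate_person(json.loads('{"age": true}')))
```

Hand-written checks like these grow quickly as the structure gets deeper, which is the point at which a schema or a validation library tends to pay off.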

### 6.3 Security
*   **Never use `eval()`** to parse JSON-like data from untrusted sources. It can execute arbitrary code.
*   **Validate Input:** Always validate JSON received from external sources (schema validation, type checks).
*   **Resource Limits:** Be cautious when parsing very large JSON files, as they can consume significant memory. Consider streaming parsers (like `ijson`) for large datasets if memory is a concern.
*   **Serialization of Sensitive Data:** Ensure you don't accidentally serialize sensitive information (passwords, keys) into JSON logs or API responses.
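
One way to reduce the risk of leaking secrets is to mask known sensitive keys before serializing data for logs or responses. This is a minimal sketch only: the `redact` helper and the `SENSITIVE_KEYS` set are hypothetical names chosen for illustration, and a real application would need its own list of keys and policy.

```python
import json

# Hypothetical set of key names to mask; adjust to the application's data
SENSITIVE_KEYS = {"password", "api_key", "token"}

def redact(value):
    """Return a copy of value with sensitive dictionary entries masked."""
    if isinstance(value, dict):
        return {
            key: "***" if isinstance(key, str) and key.lower() in SENSITIVE_KEYS else redact(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact(item) for item in value]
    return value

config = {"debug_mode": True, "db": {"host": "localhost", "username": "admin", "password": "secret123"}}
safe_json = json.dumps(redact(config))
print(safe_json)
# {"debug_mode": true, "db": {"host": "localhost", "username": "admin", "password": "***"}}
```

The original `config` dictionary is left untouched, so the application can keep using the real values while only the redacted copy is written out.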

## 7. Best Practices & Enterprise Considerations

1.  **Use `with open(...)`:** Always use context managers for file operations.
2.  **Specify Encoding (`utf-8`):** Explicitly set `encoding='utf-8'` when opening files for JSON to avoid issues across different platforms.
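3.  **Handle Errors Gracefully:** Wrap `json.load()`/`json.loads()` and `json.dump()`/`json.dumps()` calls in `try...except` blocks to catch `json.JSONDecodeError` (malformed input), `TypeError` (unserializable objects), and file errors such as `FileNotFoundError` or `OSError` (of which `IOError` is an alias in Python 3).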
4.  **Use `dataclasses` or `pydantic`:** Prefer these for defining data structures that need JSON conversion over manual dictionary manipulation, especially for complex/nested data.
5.  **Validate External Data:** Use schema validation (`jsonschema`) or validation-aware libraries (`pydantic`) for data from untrusted sources.
6.  **Consistent Formatting:** Use `indent` and `sort_keys=True` during serialization for human-readable and diff-friendly output where appropriate (e.g., config files).
7.  **Consider Performance:** For high-performance needs, evaluate libraries like `orjson`.
8.  **API Design:** When designing APIs, clearly document the expected JSON structure (using standards like OpenAPI/Swagger is recommended).
9.  **Logging:** Be careful not to log overly verbose or sensitive JSON data directly.
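
To illustrate the consistent-formatting point: dictionaries with the same content but different insertion order serialize differently unless `sort_keys=True` is used. A short sketch with made-up configuration values:

```python
import json

a = {"host": "localhost", "port": 5432}
b = {"port": 5432, "host": "localhost"}  # same content, different insertion order

same_without_sort = json.dumps(a) == json.dumps(b)
same_with_sort = json.dumps(a, sort_keys=True) == json.dumps(b, sort_keys=True)

print(same_without_sort)  # False
print(same_with_sort)     # True
```

Sorted output keeps version-control diffs of generated JSON files limited to actual value changes.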

## 8. Pitfalls and Common Interview Questions

**Common Pitfalls:**

*   **`TypeError` on Serialization:** Trying to serialize objects not supported by default (custom classes, sets, datetimes without a `default` handler).
*   **`JSONDecodeError`:** Trying to load malformed or empty JSON strings/files.
*   **Forgetting Encoding:** Reading/writing files without specifying `encoding='utf-8'` leading to `UnicodeDecodeError` or incorrect characters.
*   **Type Loss:** Python `tuple`s become JSON `array`s and deserialize back into Python `list`s. Python `set`s are not directly serializable.
*   **Integer Key Conversion:** JSON object keys *must* be strings. Python dictionaries can have integer keys, but `json.dumps` will convert them to strings.
*   **Security Risks:** Using `eval()`, not validating input.
*   **Performance Issues:** Loading huge JSON files entirely into memory.
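
Several of these pitfalls can be reproduced in a few lines. The snippet below is a small demonstration with made-up data, showing type loss on a round trip, integer keys turning into strings, and the two most common exceptions:

```python
import json

original = {"point": (1, 2), 1: "one"}
restored = json.loads(json.dumps(original))
print(restored)              # {'point': [1, 2], '1': 'one'}
print(restored == original)  # False: the tuple became a list, the int key a str

try:
    json.dumps({"tags": {"a", "b"}})
except TypeError as exc:
    set_error = str(exc)
    print(set_error)         # Object of type set is not JSON serializable

try:
    json.loads("")
except json.JSONDecodeError as exc:
    empty_error = str(exc)
    print(empty_error)       # Expecting value: line 1 column 1 (char 0)
```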

**Common Interview Questions:**

1.  What is JSON? Why is it commonly used?
2.  What are the main data types in JSON?
3.  How do you convert a Python dictionary to a JSON string? (Mention `json.dumps`).
4.  How do you read JSON data from a file into Python? (Mention `json.load`).
5.  How do you make JSON output more readable? (Mention `indent`).
6.  What happens if you try to serialize a custom Python object using `json.dumps`? How can you handle this? (Mention `default`, `cls`, `dataclasses`/`pydantic`).
7.  What error might you get when trying to load an invalid JSON string? (`JSONDecodeError`).
8.  Why is it important to specify file encoding when working with JSON files?
9.  What are some security concerns when handling JSON data?
10. How does JSON handle Python's `None`? (Becomes `null`).
11. When might you consider using a library like `orjson` instead of the built-in `json` module?

## 9. Challenge: Configuration Manager

**Goal:** Create a simple configuration manager that loads settings from a JSON file using dataclasses.

**Tasks:**

1.  **Define Dataclasses:** Create dataclasses `DatabaseConfig` and `AppConfig`. 
    *   `DatabaseConfig` should have `host` (str), `port` (int), `username` (str), `password` (str).
    *   `AppConfig` should have `debug_mode` (bool), `log_level` (str, e.g., "INFO"), and `db` (an instance of `DatabaseConfig`).
2.  **Create Sample JSON:** Create a JSON file (e.g., `config.json`; the solution below uses `challenge_config.json`) representing a valid `AppConfig` structure.
3.  **Write Loading Function:** Create a function `load_config(filepath: str) -> AppConfig` that:
    *   Takes the JSON file path as input.
    *   Reads the JSON file.
    *   Deserializes the JSON into a dictionary.
    *   Handles potential `FileNotFoundError` and `json.JSONDecodeError`.
    *   Converts the loaded dictionary into nested `AppConfig` and `DatabaseConfig` dataclass instances.
    *   Returns the `AppConfig` instance.
4.  **Test:** Call your `load_config` function and print the loaded configuration object's attributes to verify it works.

**(Bonus):** Add a function `save_config(config: AppConfig, filepath: str)` that serializes the `AppConfig` object back to a JSON file using the `DataclassJSONEncoder` from earlier or a similar approach.

In [7]:
# --- Solution Space for Challenge ---
import json
import dataclasses
from typing import Optional  # Used for load_config's return type

# 1. Define Dataclasses
@dataclasses.dataclass
class DatabaseConfig:
    host: str
    port: int
    username: str
    password: str # In real apps, handle secrets securely!

@dataclasses.dataclass
class AppConfig:
    debug_mode: bool
    log_level: str
    db: DatabaseConfig

# 2. Create Sample JSON (Do this manually or programmatically)
sample_config_dict = {
    "debug_mode": True,
    "log_level": "DEBUG",
    "db": {
        "host": "localhost",
        "port": 5432,
        "username": "admin",
        "password": "secret123"
    }
}
config_filepath = 'challenge_config.json'
try:
    with open(config_filepath, 'w', encoding='utf-8') as f:
        json.dump(sample_config_dict, f, indent=4)
    print(f"Sample config written to {config_filepath}")
except IOError as e:
    print(f"Error writing sample config: {e}")

# 3. Write Loading Function
def load_config(filepath: str) -> Optional[AppConfig]:
    """Loads application configuration from a JSON file.

    Args:
        filepath: Path to the JSON configuration file.

    Returns:
        An AppConfig instance or None if loading fails.
    """
    try:
        with open(filepath, 'r', encoding='utf-8') as f:
            config_dict = json.load(f)
        
        # Convert nested dictionary to DatabaseConfig instance
        db_config_dict = config_dict.get('db')
        if not isinstance(db_config_dict, dict):
            raise ValueError("Missing or invalid 'db' configuration section.")
        db_config = DatabaseConfig(**db_config_dict)
        
        # Convert top-level dictionary to AppConfig instance
        app_config = AppConfig(
            debug_mode=config_dict.get('debug_mode', False),
            log_level=config_dict.get('log_level', 'INFO'),
            db=db_config
        )
        return app_config

    except FileNotFoundError:
        print(f"Error: Configuration file not found at {filepath}")
        return None
    except json.JSONDecodeError as e:
        print(f"Error decoding JSON from {filepath}: {e}")
        return None
    except (TypeError, ValueError, KeyError) as e:
        # Catch errors during dictionary conversion (missing keys, wrong types)
        print(f"Error processing configuration data from {filepath}: {e}")
        return None
    except IOError as e:
        print(f"Error reading file {filepath}: {e}")
        return None

# 4. Test
loaded_app_config = load_config(config_filepath)

if loaded_app_config:
    print("\n--- Loaded Configuration ---")
    print(f"Debug Mode: {loaded_app_config.debug_mode}")
    print(f"Log Level: {loaded_app_config.log_level}")
    print(f"Database Host: {loaded_app_config.db.host}")
    print(f"Database Port: {loaded_app_config.db.port}")
    print(f"Loaded Config Object: {loaded_app_config}") # Uses dataclass repr
else:
    print("\nFailed to load configuration.")

# (Bonus) Save Function
class DataclassJSONEncoder(json.JSONEncoder):
    def default(self, obj):
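        # is_dataclass() is also True for dataclass types, so exclude classes
        if dataclasses.is_dataclass(obj) and not isinstance(obj, type):
            return dataclasses.asdict(obj)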
        return super().default(obj)

def save_config(config: AppConfig, filepath: str) -> bool:
    """Saves the AppConfig object to a JSON file."""
    try:
        with open(filepath, 'w', encoding='utf-8') as f:
            json.dump(config, f, cls=DataclassJSONEncoder, indent=4)
        print(f"\nConfiguration saved successfully to {filepath}")
        return True
    except (TypeError, IOError) as e:
        print(f"\nError saving configuration to {filepath}: {e}")
        return False

# Test saving (optional)
# if loaded_app_config:
#    save_config(loaded_app_config, 'saved_config.json')


Sample config written to challenge_config.json

--- Loaded Configuration ---
Debug Mode: True
Log Level: DEBUG
Database Host: localhost
Database Port: 5432
Loaded Config Object: AppConfig(debug_mode=True, log_level='DEBUG', db=DatabaseConfig(host='localhost', port=5432, username='admin', password='secret123'))


## 10. Conclusion

Working with JSON in Python is a fundamental skill. The built-in `json` module provides robust tools for serialization (`dumps`, `dump`) and deserialization (`loads`, `load`). While handling custom objects traditionally required careful implementation of encoders and hooks, modern approaches using `dataclasses` or libraries like `pydantic` significantly streamline the process.

By understanding JSON's structure, Python's conversion mechanisms, and best practices around error handling, validation, and security, you can confidently integrate JSON data into your applications, whether consuming APIs, managing configuration, or exchanging data between systems.