## Advanced JSON Data Exercises

This notebook contains a set of **advanced, realistic problems** around JSON encoding/decoding in Python.

We will focus on:

- Custom serialization using the `default` parameter of `json.dumps`
- Custom deserialization using `object_hook` in `json.loads`
- Handling types like `datetime`, `date`, and `Decimal`
- Designing small, reusable helper functions following best practices
- Simple testing of our solutions using `assert` statements

Each problem is followed by a solution and a few sanity checks.

---

In [1]:
import json
from datetime import datetime, date
from decimal import Decimal
from dataclasses import dataclass
from typing import Any, Dict, Iterable, List, Callable

# For pretty-printing intermediate results in some solutions
def pprint_json(obj: Any) -> None:
    """Pretty-print a Python object as JSON (for debugging / exploration)."""
    print(json.dumps(obj, indent=2, sort_keys=True, ensure_ascii=False))
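
As a quick illustration of what the helper's three formatting options do, here is a minimal sketch. The `sample` record is hypothetical and exists only for this example; the helper is repeated so the cell runs on its own.

```python
import json
from typing import Any


def pprint_json(obj: Any) -> None:
    """Pretty-print a Python object as JSON (same helper as above)."""
    print(json.dumps(obj, indent=2, sort_keys=True, ensure_ascii=False))


# Hypothetical sample record, used only for illustration.
sample = {"symbol": "BMW", "exchange": "Börse Frankfurt", "lots": [1, 2]}

# Keys are printed in sorted order and the umlaut is kept as-is.
pprint_json(sample)

# For comparison: with the json default (ensure_ascii=True) the umlaut is escaped.
escaped = json.dumps(sample, sort_keys=True)
unescaped = json.dumps(sample, sort_keys=True, ensure_ascii=False)
```

Keeping `ensure_ascii=False` makes the output easier to read, provided the destination (terminal or file) is UTF-8 capable.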

### Problem 1 – Serializing a Trading Log with Custom Types

You are given a small in-memory trading log. Each trade contains:

- a `symbol` (string)
- a `timestamp` (Python `datetime`)
- a `price` (Python `Decimal`)
- a `quantity` (integer)

#### Task

1. Define a list named `trades` containing 3–4 such trade dictionaries.
2. Implement a function `trading_encoder(obj: Any) -> Any` that:
   - encodes `datetime` objects using `.isoformat()` (e.g. `"2020-09-21T10:15:00"`)
   - encodes `Decimal` objects as **floats rounded to 4 decimal places**
   - raises `TypeError` for any other unsupported type
3. Use `json.dumps` with your encoder to create a **pretty-printed** JSON string:
   - `indent=2`
   - `sort_keys=True`
   - `ensure_ascii=False` (to allow non-ASCII symbols if needed)
4. Store the resulting JSON string in a variable called `trades_json`.

Add a couple of `assert` checks to validate that:
- `"T"` is present in at least one serialized timestamp
- prices are of JSON number type (not strings)

---

#### Solution 1 – Serializing the Trading Log

**Design choices & best practices:**

- We keep the encoder **focused**: it only knows how to handle `datetime` and `Decimal`.
- We raise `TypeError` for unsupported types so that bugs do **not** silently pass.
- We use a dedicated helper function `trading_encoder` and pass it to `json.dumps` via `default`, keeping our code modular and reusable.
- We validate the output with `assert` statements, which act as simple regression tests.

In [2]:
# 1. Define some sample trades
trades = [
    {
        "symbol": "AAPL",
        "timestamp": datetime(2020, 9, 21, 10, 15, 0),
        "price": Decimal("110.2534"),
        "quantity": 100,
    },
    {
        "symbol": "MSFT",
        "timestamp": datetime(2020, 9, 21, 10, 30, 5),
        "price": Decimal("202.98765"),
        "quantity": 50,
    },
    {
        "symbol": "TSLA",
        "timestamp": datetime(2020, 9, 21, 11, 0, 0),
        "price": Decimal("420.1"),
        "quantity": 10,
    },
]


# 2. Implement the custom encoder
def trading_encoder(obj: Any) -> Any:
    """Encoder for datetime and Decimal objects used in trading logs.

    - datetime -> ISO 8601 string
    - Decimal  -> float rounded to 4 decimal places
    """
    if isinstance(obj, datetime):
        return obj.isoformat()
    if isinstance(obj, Decimal):
        return round(float(obj), 4)
    raise TypeError(f"Object of type {type(obj).__name__!r} is not JSON serializable")


# 3. Serialize trades using json.dumps with pretty formatting
trades_json = json.dumps(
    trades,
    default=trading_encoder,
    indent=2,
    sort_keys=True,
    ensure_ascii=False,
)


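# 4. Simple checks (lightweight tests)
# Parse the JSON back so the checks inspect actual field values; a plain
# `"T" in trades_json` would pass trivially because of symbols like "TSLA".
trades_parsed = json.loads(trades_json)
assert any("T" in t["timestamp"] for t in trades_parsed), (
    "Expected ISO 8601 datetime representation with 'T' separator."
)
assert all(isinstance(t["price"], float) for t in trades_parsed), (
    "Prices should be serialized as JSON numbers, not strings."
)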

# Optional: uncomment to visually inspect the JSON
# print(trades_json)
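
Two consequences of this design are worth seeing in isolation: rounding a `Decimal` to a float discards precision, and any type the encoder does not recognise surfaces as a `TypeError` from `json.dumps`. The sketch below repeats the encoder so it runs on its own; the sample values are illustrative assumptions, not part of the trading log.

```python
import json
from datetime import datetime
from decimal import Decimal
from typing import Any


def trading_encoder(obj: Any) -> Any:
    """Same behaviour as the encoder above, repeated so this cell runs alone."""
    if isinstance(obj, datetime):
        return obj.isoformat()
    if isinstance(obj, Decimal):
        return round(float(obj), 4)
    raise TypeError(f"Object of type {type(obj).__name__!r} is not JSON serializable")


# A Decimal with five decimal places is rounded to four, so the fifth digit is lost.
rounded = json.loads(json.dumps({"price": Decimal("110.25349")}, default=trading_encoder))

# A set is not handled by the encoder, so json.dumps surfaces the TypeError.
try:
    json.dumps({"tags": {"tech", "nasdaq"}}, default=trading_encoder)
except TypeError as exc:
    unsupported_message = str(exc)
else:
    unsupported_message = ""
```

If exact prices matter downstream, encoding `Decimal` as a string (as Problem 4 does) is one way to avoid the precision loss.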

### Problem 2 – Robust JSON File Writer Utility

You often need to serialize Python objects to JSON files using a consistent configuration.

#### Task

Implement a function:

```python
def write_json(
    obj: Any,
    filename: str,
    *,
    default: Optional[Callable[[Any], Any]] = None,
) -> None:
    ...
```

The function should:

1. Open `filename` for writing using a context manager (`with open(...)`).
2. Use `json.dump` with the following defaults:
   - `indent=2`
   - `sort_keys=True`
   - `ensure_ascii=False`
3. Accept an **optional** `default` encoder function. If provided, pass it through to `json.dump`.
4. Propagate errors (do not swallow exceptions).

Then:

1. Use `write_json` to serialize the `trades` list from **Problem 1** into a file called `"trades.json"`, using the `trading_encoder` as `default`.
2. Re-open the file, read its contents back into a string `trades_json_from_file` and verify it is valid JSON using `json.loads`.

Add a couple of `assert` statements to check that:

- `trades_json_from_file` is a non-empty string
- `json.loads(trades_json_from_file)` is a list of the same length as `trades`.

---

#### Solution 2 – A Reusable JSON File Writer

**Design choices & best practices:**

- We expose a **narrow, well-defined API**: the function takes an object and a filename.
- We provide **sensible defaults** for formatting that can be reused across the project.
- We accept an optional `default` parameter to keep the function generic.
- We do not catch and silence exceptions; callers should handle them explicitly.
- We include basic tests to ensure the function works as intended.

In [3]:
from typing import Optional


def write_json(
    obj: Any,
    filename: str,
    *,
    default: Optional[Callable[[Any], Any]] = None,
) -> None:
    """Write a Python object to a JSON file.

    Parameters
    ----------
    obj:
        The Python object to serialize.
    filename:
        Path to the output JSON file.
    default:
        Optional encoder function passed through to json.dump.
    """
    dump_kwargs: Dict[str, Any] = {
        "indent": 2,
        "sort_keys": True,
        "ensure_ascii": False,
    }
    if default is not None:
        dump_kwargs["default"] = default

    with open(filename, "w", encoding="utf-8") as f:
        json.dump(obj, f, **dump_kwargs)


# Use the helper to write trades to disk
write_json(trades, "trades.json", default=trading_encoder)

# Read the file back and validate
with open("trades.json", "r", encoding="utf-8") as f:
    trades_json_from_file = f.read()

assert isinstance(trades_json_from_file, str) and trades_json_from_file.strip(), (
    "Expected a non-empty JSON string read from file."
)

trades_loaded = json.loads(trades_json_from_file)
assert isinstance(trades_loaded, list), "Expected a list after loading JSON."
assert len(trades_loaded) == len(trades), (
    "Loaded list should have the same length as the original 'trades'."
)

# Optional: uncomment to inspect
# pprint_json(trades_loaded)
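
The "propagate errors" requirement can be checked directly. The following is a minimal sketch that uses a condensed copy of `write_json` and a temporary directory so no files are left behind; the file name `example.json` and the sample data are assumptions made for illustration.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Callable, Optional


def write_json(obj: Any, filename: str, *, default: Optional[Callable[[Any], Any]] = None) -> None:
    """Condensed copy of the helper above so this example is self-contained."""
    with open(filename, "w", encoding="utf-8") as f:
        json.dump(obj, f, indent=2, sort_keys=True, ensure_ascii=False, default=default)


with tempfile.TemporaryDirectory() as tmp_dir:
    target = Path(tmp_dir) / "example.json"  # hypothetical file name

    # Normal case: the data comes back unchanged after a write/read cycle.
    write_json({"symbol": "AAPL", "quantity": 100}, str(target))
    reloaded = json.loads(target.read_text(encoding="utf-8"))

    # Error case: a set is not serializable and no default is supplied,
    # so the TypeError raised inside json.dump reaches the caller.
    try:
        write_json({"tags": {"a", "b"}}, str(target))
    except TypeError:
        error_propagated = True
    else:
        error_propagated = False
```

Note that opening the file in `"w"` mode truncates it before serialization starts, so after a failed call the target file may be empty or incomplete. Callers that need all-or-nothing behaviour would have to add that themselves.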

### Problem 3 – Custom Deserialization with `object_hook`

We now want to **deserialize** JSON data into richer Python objects.

Suppose we receive JSON data for end-of-day stock prices in the following format:

```json
{
  "symbol": "IBM",
  "date": "2020-09-21",
  "open": "120.48",
  "high": "120.70",
  "low": "118.58",
  "close": "120.25",
  "volume": 5205413
}
```

We would like to load such JSON objects into a strongly-typed Python structure.

#### Task

1. Define a dataclass `Bar` with the following fields:
   - `symbol: str`
   - `date: date`
   - `open: Decimal`
   - `high: Decimal`
   - `low: Decimal`
   - `close: Decimal`
   - `volume: int`
2. Implement a function `bar_object_hook(d: Dict[str, Any]) -> Any` suitable for use as `object_hook` in `json.loads` that:
   - Detects whether `d` looks like a Bar record (i.e. it has at least the keys above).
   - If it does, returns a `Bar` instance, converting types appropriately:
     - `date` from `"YYYY-MM-DD"` string using `date.fromisoformat`
     - OHLC fields from strings to `Decimal`
   - Otherwise returns the dict unchanged.
3. Create a JSON string `ibm_bar_json` with the structure above.
4. Use `json.loads(ibm_bar_json, object_hook=bar_object_hook)` and store the result in `ibm_bar`.

Add `assert` checks to verify:

- `isinstance(ibm_bar, Bar)`
- `ibm_bar.date` is a `date` instance
- `ibm_bar.close` is a `Decimal` instance

---

#### Solution 3 – Using `object_hook` for Rich Decoding

**Design choices & best practices:**

- We use a `dataclass` for concise, readable data containers.
- We keep `bar_object_hook` **pure** (no side effects) and narrowly focused on its job.
- The hook is defensive: it only converts dicts that match the expected schema.
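- We parse strings with `date.fromisoformat` and the `Decimal` constructor, both of which are strict: malformed input raises an exception (`ValueError` or `decimal.InvalidOperation`) instead of silently producing a wrong value.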
- We add assertions to verify the created object has the expected types.

In [4]:
@dataclass
class Bar:
    symbol: str
    date: date
    open: Decimal
    high: Decimal
    low: Decimal
    close: Decimal
    volume: int


def bar_object_hook(d: Dict[str, Any]) -> Any:
    """object_hook that converts suitable dicts into Bar instances.

    The hook is called for every dict decoded by json.loads. If the dict
    contains the keys of a Bar record, we construct a Bar instance; otherwise
    we return the dict unchanged.
    """
    expected_keys = {"symbol", "date", "open", "high", "low", "close", "volume"}
    if expected_keys.issubset(d.keys()):
        return Bar(
            symbol=d["symbol"],
            date=date.fromisoformat(d["date"]),
            open=Decimal(d["open"]),
            high=Decimal(d["high"]),
            low=Decimal(d["low"]),
            close=Decimal(d["close"]),
            volume=int(d["volume"]),
        )
    return d


# 3. Construct the JSON string
ibm_bar_json = json.dumps(
    {
        "symbol": "IBM",
        "date": "2020-09-21",
        "open": "120.48",
        "high": "120.70",
        "low": "118.58",
        "close": "120.25",
        "volume": 5205413,
    },
    indent=2,
    sort_keys=True,
)

# 4. Decode using the object_hook
ibm_bar = json.loads(ibm_bar_json, object_hook=bar_object_hook)

# Assertions
assert isinstance(ibm_bar, Bar), "Expected ibm_bar to be a Bar instance."
assert isinstance(ibm_bar.date, date), "Expected 'date' to be a datetime.date instance."
assert isinstance(ibm_bar.close, Decimal), "Expected 'close' to be a Decimal instance."

# Optional: uncomment to inspect
# print(ibm_bar)
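
Because the hook is called for every decoded JSON object, it also converts Bar records that are nested inside a larger document. The sketch below repeats the dataclass and hook so it runs on its own; the wrapper keys `source` and `bars` are hypothetical and chosen only for illustration.

```python
import json
from dataclasses import dataclass
from datetime import date
from decimal import Decimal
from typing import Any, Dict


@dataclass
class Bar:
    symbol: str
    date: date
    open: Decimal
    high: Decimal
    low: Decimal
    close: Decimal
    volume: int


def bar_object_hook(d: Dict[str, Any]) -> Any:
    """Same logic as the hook above, repeated so this cell runs alone."""
    expected_keys = {"symbol", "date", "open", "high", "low", "close", "volume"}
    if expected_keys.issubset(d.keys()):
        return Bar(
            symbol=d["symbol"],
            date=date.fromisoformat(d["date"]),
            open=Decimal(d["open"]),
            high=Decimal(d["high"]),
            low=Decimal(d["low"]),
            close=Decimal(d["close"]),
            volume=int(d["volume"]),
        )
    return d


# Hypothetical wrapper document: the Bar record sits inside a list.
nested_json = """
{
  "source": "example-feed",
  "bars": [
    {"symbol": "IBM", "date": "2020-09-21", "open": "120.48", "high": "120.70",
     "low": "118.58", "close": "120.25", "volume": 5205413}
  ]
}
"""

result = json.loads(nested_json, object_hook=bar_object_hook)
first_bar = result["bars"][0]
```

The outer object lacks the Bar keys, so it stays a plain `dict`, while the inner object is converted before the outer one is assembled.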

### Problem 4 – Round-Tripping with Symmetric Encoder/Decoder

For many applications, you want to be able to **encode** Python objects to JSON and then **decode** them back to equivalent Python objects.

We will design a simple but consistent round-trip mechanism for dictionaries that may contain:

- `datetime`
- `date`
- `Decimal`

#### Task

1. Implement a function `advanced_default_encoder(obj: Any) -> Any` that:
   - For `datetime`, returns a dict: `{"__type__": "datetime", "value": obj.isoformat()}`
   - For `date`, returns a dict: `{"__type__": "date", "value": obj.isoformat()}`
   - For `Decimal`, returns a dict: `{"__type__": "decimal", "value": str(obj)}`
   - Raises `TypeError` for unsupported types.
2. Implement a function `advanced_object_hook(d: Dict[str, Any]) -> Any` that:
   - Detects these special dictionaries (they have a `"__type__"` key).
   - Reconstructs the original Python objects using the corresponding constructors
     (`datetime.fromisoformat`, `date.fromisoformat`, `Decimal`).
   - Returns the dict unchanged if it does not have a `"__type__"` key.
3. Create a dictionary `payload` that contains a mix of these types, for example:
   - `"created_at"`: a `datetime`
   - `"settlement_date"`: a `date`
   - `"notional"`: a `Decimal`
   - some nested structure (e.g. a list of `Decimal` values)
4. Round-trip the data:
   - Serialize: `encoded = json.dumps(payload, default=advanced_default_encoder)`
   - Deserialize: `decoded = json.loads(encoded, object_hook=advanced_object_hook)`

Add `assert` checks that:

- `decoded` is equal (==) to `payload`
- The types of the special values are preserved (`datetime`, `date`, `Decimal`).

*(Note: `datetime`, `date`, and `Decimal` all compare by value with `==`, so `decoded == payload` checks the nested values rather than object identity.)*

---

#### Solution 4 – Designing a Symmetric Encoding Scheme

**Design choices & best practices:**

- We tag special values with a `"__type__"` field to avoid ambiguity during decoding.
- We keep the encoding scheme **self-describing**: all information needed to re-create the object is stored in the JSON.
- Using small, focused functions (`advanced_default_encoder` and `advanced_object_hook`) makes testing and reasoning easier.
- We explicitly fail for unknown types instead of silently guessing.
- We verify round-trip correctness using assertions.

In [5]:
def advanced_default_encoder(obj: Any) -> Any:
    """Encode datetime, date, and Decimal with type tags for round-tripping."""
    if isinstance(obj, datetime):
        return {"__type__": "datetime", "value": obj.isoformat()}
    if isinstance(obj, date) and not isinstance(obj, datetime):
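        # datetime is a subclass of date, which is why the datetime check comes
        # first; the extra `not isinstance` guard keeps this branch date-only
        # even if the checks are reordered later.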
        return {"__type__": "date", "value": obj.isoformat()}
    if isinstance(obj, Decimal):
        return {"__type__": "decimal", "value": str(obj)}
    raise TypeError(f"Object of type {type(obj).__name__!r} is not JSON serializable")


def advanced_object_hook(d: Dict[str, Any]) -> Any:
    """object_hook paired with advanced_default_encoder.

    It looks for dicts with a "__type__" key and reconstructs the corresponding
    Python objects. All other dicts are returned unchanged.
    """
    type_tag = d.get("__type__")
    if type_tag is None:
        return d

    value = d.get("value")
    if type_tag == "datetime":
        return datetime.fromisoformat(value)
    if type_tag == "date":
        return date.fromisoformat(value)
    if type_tag == "decimal":
        return Decimal(value)

    # Unknown type_tag – return as-is or raise; here, we choose to return as-is.
    return d


# 3. Example payload with nested structure
payload = {
    "created_at": datetime(2021, 1, 5, 14, 30, 45),
    "settlement_date": date(2021, 1, 7),
    "notional": Decimal("1000000.00"),
    "legs": [
        {"side": "buy", "amount": Decimal("500000.00")},
        {"side": "sell", "amount": Decimal("500000.00")},
    ],
}

# 4. Round-trip
encoded = json.dumps(payload, default=advanced_default_encoder)
decoded = json.loads(encoded, object_hook=advanced_object_hook)

# Assertions: structural and type preservation
assert decoded == payload, "Round-tripped object should be equal to the original payload."
assert isinstance(decoded["created_at"], datetime), "Expected 'created_at' to be datetime."
assert isinstance(decoded["settlement_date"], date), "Expected 'settlement_date' to be date."
assert isinstance(decoded["notional"], Decimal), "Expected 'notional' to be Decimal."
assert isinstance(decoded["legs"][0]["amount"], Decimal), "Expected nested 'amount' to be Decimal."

# Optional: uncomment to inspect
# pprint_json(json.loads(encoded))  # Inspect the *encoded* JSON structure
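
To make the tagging scheme and the ordering of the `isinstance` checks concrete, here is a small sketch. `tagged_encoder` is a hypothetical condensed stand-in for `advanced_default_encoder`, written so the example runs on its own.

```python
import json
from datetime import date, datetime
from decimal import Decimal
from typing import Any


def tagged_encoder(obj: Any) -> Any:
    """Hypothetical condensed version of advanced_default_encoder."""
    if isinstance(obj, datetime):  # must be tested before date
        return {"__type__": "datetime", "value": obj.isoformat()}
    if isinstance(obj, date):
        return {"__type__": "date", "value": obj.isoformat()}
    if isinstance(obj, Decimal):
        return {"__type__": "decimal", "value": str(obj)}
    raise TypeError(f"Object of type {type(obj).__name__!r} is not JSON serializable")


# datetime is a subclass of date, so an isinstance(obj, date) test alone
# would also match datetime objects.
datetime_is_date = isinstance(datetime(2021, 1, 5, 14, 30, 45), date)

# The encoded form is plain JSON; decoding it without a hook exposes the tags.
raw = json.loads(
    json.dumps(
        {
            "notional": Decimal("1000000.00"),
            "created_at": datetime(2021, 1, 5, 14, 30, 45),
        },
        default=tagged_encoder,
    )
)
```

Storing the `Decimal` as a string keeps trailing zeros such as `.00`, which a float representation would not preserve.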

### Problem 5 – Validating JSON Input Against a Simple Schema

In real systems, you often need to **validate** JSON payloads before trusting them.

You receive JSON representing a list of users, like this:

```json
[
  {"id": 1, "name": "Alice", "email": "alice@example.com"},
  {"id": 2, "name": "Bob"}
]
```

The expected schema for each user is:

- `id` – required, integer
- `name` – required, non-empty string
- `email` – optional, if present must be a non-empty string

#### Task

1. Implement a function `validate_user(user: Dict[str, Any]) -> None` that:
   - Raises `ValueError` if any required field is missing or has the wrong type.
   - Raises `ValueError` if `name` is an empty string.
   - Raises `ValueError` if `email` is present but empty or not a string.
2. Implement a function `validate_users_json(json_str: str) -> List[Dict[str, Any]]` that:
   - Parses `json_str` using `json.loads`.
   - Verifies the top-level object is a list.
   - Calls `validate_user` on each element.
   - Returns the parsed list if validation succeeds; otherwise propagates `ValueError`.
3. Test your function with:
   - A valid JSON string `users_ok_json`.
   - An invalid JSON string `users_bad_json` (e.g., with missing `name`).

Use `try/except` around the invalid case to demonstrate that the error is caught and reported.

---

#### Solution 5 – Lightweight Schema Validation

**Design choices & best practices:**

- We keep validation functions small and focused.
- We use **explicit checks** instead of assuming the shape of the data.
- We raise `ValueError` with informative messages instead of returning booleans.
- By separating `validate_user` and `validate_users_json`, we make the code easier to reuse and test.

In [6]:
def validate_user(user: Dict[str, Any]) -> None:
    """Validate a single user dict according to a simple schema.

    Raises ValueError if the user is invalid.
    """
    if not isinstance(user, dict):
        raise ValueError("User must be a dictionary.")

    # id: required int (bool is a subclass of int, so reject it explicitly)
    if "id" not in user:
        raise ValueError("User is missing required field 'id'.")
    if isinstance(user["id"], bool) or not isinstance(user["id"], int):
        raise ValueError("Field 'id' must be an integer.")

    # name: required non-empty string
    if "name" not in user:
        raise ValueError("User is missing required field 'name'.")
    if not isinstance(user["name"], str) or not user["name"].strip():
        raise ValueError("Field 'name' must be a non-empty string.")

    # email: optional, non-empty string if present
    if "email" in user:
        email = user["email"]
        if not isinstance(email, str) or not email.strip():
            raise ValueError("Field 'email', if present, must be a non-empty string.")


def validate_users_json(json_str: str) -> List[Dict[str, Any]]:
    """Parse and validate a JSON string containing a list of user objects."""
    data = json.loads(json_str)
    if not isinstance(data, list):
        raise ValueError("Top-level JSON object must be a list of users.")

    for user in data:
        validate_user(user)

    return data


# Valid test case
users_ok_json = json.dumps(
    [
        {"id": 1, "name": "Alice", "email": "alice@example.com"},
        {"id": 2, "name": "Bob"},
    ]
)

validated_users = validate_users_json(users_ok_json)
assert isinstance(validated_users, list) and len(validated_users) == 2

# Invalid test case: missing name
users_bad_json = json.dumps(
    [
        {"id": 1, "name": "Alice", "email": "alice@example.com"},
        {"id": 2},  # missing name
    ]
)

try:
    validate_users_json(users_bad_json)
except ValueError as ex:
    # In a real application, you would log this or return a helpful error response.
    # Here we just store the message for inspection.
    validation_error_message = str(ex)
else:
    raise AssertionError("Expected validation to fail for bad user JSON.")

assert "missing required field 'name'" in validation_error_message

# Optional: uncomment to inspect
# print(validation_error_message)
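
One detail that affects callers: `json.JSONDecodeError` is a subclass of `ValueError`, so a single `except ValueError` also catches malformed JSON. The sketch below shows one way to tell the two cases apart. `load_user_list` and `describe_failure` are hypothetical helpers introduced for this example, with `load_user_list` standing in for a trimmed-down `validate_users_json`.

```python
import json
from typing import Any, Dict, List


def load_user_list(json_str: str) -> List[Dict[str, Any]]:
    """Hypothetical, trimmed-down stand-in for validate_users_json."""
    data = json.loads(json_str)  # may raise json.JSONDecodeError
    if not isinstance(data, list):
        raise ValueError("Top-level JSON object must be a list of users.")
    return data


def describe_failure(json_str: str) -> str:
    """Return a short label for why loading failed, or 'ok'."""
    try:
        load_user_list(json_str)
    except json.JSONDecodeError:
        # The subclass has to be caught before ValueError to be distinguished.
        return "malformed JSON"
    except ValueError:
        return "schema violation"
    return "ok"


malformed = describe_failure('[{"id": 1')      # truncated document
wrong_shape = describe_failure('{"id": 1}')    # valid JSON, but not a list
fine = describe_failure("[]")
```

Whether to separate the two cases depends on the application; an API might report them to the client differently, for example.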

## Summary

In this advanced JSON notebook we:

- Implemented **custom encoders** using the `default` parameter in `json.dumps`.
- Used **`object_hook`** in `json.loads` to build rich Python objects.
- Designed a symmetric encoding/decoding scheme that supports `datetime`, `date`, and `Decimal` and preserves types across a JSON round-trip.
- Built a small **JSON file writer utility** that centralizes formatting and encoding configuration.
- Implemented **basic schema validation** for JSON payloads using explicit checks and helpful error messages.

These techniques form a strong foundation for working with JSON in real-world Python applications, where you often need:

- Custom handling of domain-specific data types
- Robust, reusable helpers for file I/O
- Clear and maintainable validation logic

Feel free to extend these patterns for more complex schemas, add logging, or integrate with formal JSON schema validation libraries as a next step.