### Notebook 04 Structured outputs & JSON schema

Objectives:<br>
Extract structured data with the json_extract.v1 prompt<br>
Validate against JSON schemas<br>
Repair malformed JSON<br>
Log token costs for validation + repair cycles

In [1]:
import sys
import pprint
sys.path.append('..')

from utils.prompts import render
from utils.llm_client import LLMClient
from utils.logging_utils import log_llm_call
from utils.router import pick_model
from utils.json_utils import safe_parse_json, validate_json_schema, create_simple_schema, format_schema_for_prompt
import json

### Part 1 JSON extraction with Schema

Define a schema and extract structured data.

In [2]:
text = """The cloudsync pro business plan cost $20 per user per month.
It includes 10TB and its currently available."""

schema = create_simple_schema({
                            "name": "string",
                            "price": "number",
                            "currency": "string",
                            "in_stock": "boolean"
                        }, required=["name","price","currency"])
pprint.pprint(schema)

{'properties': {'currency': {'type': 'string'},
                'in_stock': {'type': 'boolean'},
                'name': {'type': 'string'},
                'price': {'type': 'number'}},
 'required': ['name', 'price', 'currency'],
 'type': 'object'}
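The schema printed above has three parts that matter for validation: `type`, `properties`, and `required`. As a rough illustration of what checking a parsed object against such a schema involves, here is a minimal standard-library sketch. The names `validate_simple` and `example_schema` are hypothetical and this is not the implementation of `utils.json_utils.validate_json_schema`; it only covers flat object schemas like the one above.

```python
# Minimal sketch of validation for flat object schemas (hypothetical helper,
# not the notebook's utils.json_utils.validate_json_schema).

# Map JSON Schema type names to the Python types json.loads produces.
JSON_TYPES = {
    "string": str,
    "number": (int, float),
    "boolean": bool,
    "object": dict,
    "array": list,
}

def validate_simple(data, schema):
    """Return a list of error messages; an empty list means the data is valid."""
    if not isinstance(data, dict):
        return ["top-level value must be an object"]
    errors = []
    # Every field listed under "required" must be present.
    for field in schema.get("required", []):
        if field not in data:
            errors.append(f"missing required field: {field}")
    # Fields that are present must have the declared type.
    for field, rules in schema.get("properties", {}).items():
        if field not in data:
            continue
        value = data[field]
        expected = JSON_TYPES.get(rules.get("type"))
        # bool is a subclass of int in Python, so reject it explicitly for numbers.
        bool_as_number = rules.get("type") == "number" and isinstance(value, bool)
        if expected and (not isinstance(value, expected) or bool_as_number):
            errors.append(f"{field}: expected {rules['type']}, got {type(value).__name__}")
    return errors

example_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "price": {"type": "number"},
        "currency": {"type": "string"},
        "in_stock": {"type": "boolean"},
    },
    "required": ["name", "price", "currency"],
}

print(validate_simple({"name": "CloudSync Pro", "price": 20, "currency": "USD"}, example_schema))
print(validate_simple({"name": "CloudSync Pro", "price": "20"}, example_schema))
```

Returning a list of messages rather than raising on the first problem makes it easier to feed every error back to the model in a single repair prompt.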


In [3]:
prompt_text, spec = render("json_extract.v1", schema=schema, text=text)
print(prompt_text)

Extract the requested fields and return ONLY valid JSON matching this schema:
{'type': 'object', 'properties': {'name': {'type': 'string'}, 'price': {'type': 'number'}, 'currency': {'type': 'string'}, 'in_stock': {'type': 'boolean'}}, 'required': ['name', 'price', 'currency']}

Text:
The cloudsync pro business plan cost $20 per user per month.
It includes 10TB and its currently available.

Return ONLY JSON, no extra text.


In [None]:
model = pick_model('google', 'general')
client = LLMClient('google', model)

response = client.json_chat(
    [
        {"role": "user", "content": prompt_text}
    ],
    temperature=0.0
)['text']
print(response)




```json
{
  "name": "cloudsync pro business plan",
  "price": 20,
  "currency": "USD",
  "in_stock": true
}
```
True None {'name': 'cloudsync pro business plan', 'price': 20, 'currency': 'USD', 'in_stock': True}


Handle malformed JSON with automatic repair

In [7]:
success, data, error = safe_parse_json(response)
pprint.pprint(data)

{'currency': 'USD',
 'in_stock': True,
 'name': 'cloudsync pro business plan',
 'price': 20}
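The model wrapped its answer in a Markdown code fence, yet the parse still succeeded, so some cleanup happens before `json.loads`. The sketch below shows one way such a tolerant parser could work. It is an assumption for illustration, not the actual `safe_parse_json` code; `parse_with_repair` is a hypothetical name, and the trailing-comma repair is deliberately naive (it would also touch a comma that sits before a brace inside a string value).

```python
import json
import re

def parse_with_repair(raw):
    """Return (success, data, error), the same tuple shape used in the cell above."""
    text = raw.strip()
    # 1. Keep only the span from the first '{' to the last '}'.
    #    This discards code fences and any prose around the object.
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        return False, None, "no JSON object found"
    text = text[start:end + 1]
    # 2. Try a strict parse first.
    try:
        return True, json.loads(text), None
    except json.JSONDecodeError:
        pass
    # 3. Repair attempt: drop trailing commas before a closing brace or bracket.
    repaired = re.sub(r",\s*([}\]])", r"\1", text)
    try:
        return True, json.loads(repaired), None
    except json.JSONDecodeError as exc:
        return False, None, str(exc)

fence = "`" * 3  # built this way to avoid a literal fence inside the example
fenced = fence + 'json\n{"price": 20, "in_stock": true,}\n' + fence
print(parse_with_repair(fenced))
```

Cheap local repairs like these are worth trying before asking the model to fix its own output, since a second model call costs additional tokens.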


### Part 2 Pydantic models (Recommended Approach)

Use Pydantic for type-safe structured outputs with automatic validation and IDE support.

In [10]:
from pydantic import BaseModel, Field
from utils.json_utils import (
                            format_pydantic_schema_for_prompt,
                            parse_json_with_pydantic
)

class ProductInfo(BaseModel):
    name: str = Field(..., description="The name of the product")
    price: float = Field(..., description="The price of the product")
    currency: str = Field(..., description="The currency of the product")
    in_stock: bool = Field(..., description="Whether the product is in stock")

schema_str = format_pydantic_schema_for_prompt(ProductInfo)
print(schema_str)

{
  "properties": {
    "name": {
      "description": "The name of the product",
      "title": "Name",
      "type": "string"
    },
    "price": {
      "description": "The price of the product",
      "title": "Price",
      "type": "number"
    },
    "currency": {
      "description": "The currency of the product",
      "title": "Currency",
      "type": "string"
    },
    "in_stock": {
      "description": "Whether the product is in stock",
      "title": "In Stock",
      "type": "boolean"
    }
  },
  "required": [
    "name",
    "price",
    "currency",
    "in_stock"
  ],
  "title": "ProductInfo",
  "type": "object"
}


In [12]:
text = """The cloudsync pro business plan cost $20 per user per month.
It includes 10TB and its currently available."""

# Pass the Pydantic-derived schema string rather than the Part 1 dict schema
prompt_text, spec = render("json_extract.v1", schema=schema_str, text=text)

model = pick_model('google', 'general')
client = LLMClient('google', model)

response = client.json_chat(
    [
        {"role": "user", "content": prompt_text}
    ],
    temperature=0.0
)['text']
print(response)

```json
{
  "name": "cloudsync pro business plan",
  "price": 20,
  "currency": "USD",
  "in_stock": true
}
```
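Once the raw JSON is available, the Pydantic model can validate it directly. The sketch below uses Pydantic's own `model_validate_json` rather than the notebook's `parse_json_with_pydantic` helper, whose signature is not shown here. It assumes Pydantic v2, repeats the `ProductInfo` definition so it runs on its own, and uses hand-written JSON strings in place of a live model response.

```python
from pydantic import BaseModel, Field, ValidationError

class ProductInfo(BaseModel):
    name: str = Field(..., description="The name of the product")
    price: float = Field(..., description="The price of the product")
    currency: str = Field(..., description="The currency of the product")
    in_stock: bool = Field(..., description="Whether the product is in stock")

# A well-formed response: parsing and validation happen in one call.
good = '{"name": "cloudsync pro business plan", "price": 20, "currency": "USD", "in_stock": true}'
product = ProductInfo.model_validate_json(good)
print(product.price, type(product.price).__name__)  # the integer 20 is coerced to a float

# A response with a wrong type and a missing field.
bad = '{"name": "cloudsync pro business plan", "price": "twenty", "currency": "USD"}'
failed_fields = []
try:
    ProductInfo.model_validate_json(bad)
except ValidationError as exc:
    # Each error entry records the path of the failing field in "loc".
    failed_fields = sorted(err["loc"][0] for err in exc.errors())
print(failed_fields)
```

The collected field names can be included in a follow-up repair prompt so the model knows exactly which values to correct.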


In [13]:
# 1. The email we give the model to read
text = """
Hello,
I bought the CloudSync Pro last week, but the software keeps crashing every time I try to upload a file larger than 1GB. 
This is completely unacceptable as my entire team is stuck. Please fix this immediately!
Regards, John.
"""

# 2. The schema (template): this is where the routing logic lives
schema = create_simple_schema({
    "department": {
        "type": "string",
        "description": "The department this email should be routed to.",
        "enum": ["sales", "tech_support", "billing", "general_inquiry"]  # the model may pick only one of these four
    },
    "urgency": {
        "type": "string",
        "description": "How urgent is this customer's request?",
        "enum": ["low", "medium", "high", "critical"]  # priority level
    },
    "summary": "string"  # a short summary of the email
}, required=["department", "urgency", "summary"])

# 3. Build the prompt and send it to the model
prompt_text, spec = render("json_extract.v1", schema=schema, text=text)

response = client.json_chat(
    [
        {"role": "user", "content": prompt_text}
    ],
    temperature=0.0
)['text']

# 4. Parse the response into clean data
success, data, error = safe_parse_json(response)
pprint.pprint(data)

{'department': 'tech_support',
 'summary': 'CloudSync Pro software crashes when uploading files larger than '
            "1GB, impacting the user's entire team.",
 'urgency': 'critical'}
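An `enum` in the prompt constrains what the model is asked to produce, but it does not guarantee the answer stays inside the list. Before acting on the result, it is worth checking the values in code. The following is a minimal sketch of such a guard; `route_ticket` and the two fallback values are assumptions made for illustration, not part of the notebook's utilities.

```python
# Allowed values copied from the schema above.
ALLOWED_DEPARTMENTS = ("sales", "tech_support", "billing", "general_inquiry")
ALLOWED_URGENCY = ("low", "medium", "high", "critical")

def route_ticket(data):
    """Return (department, urgency), falling back to defaults on unexpected values."""
    department = data.get("department")
    urgency = data.get("urgency")
    if department not in ALLOWED_DEPARTMENTS:
        department = "general_inquiry"  # assumed fallback queue
    if urgency not in ALLOWED_URGENCY:
        urgency = "medium"  # assumed default priority
    return department, urgency

parsed = {
    "department": "tech_support",
    "urgency": "critical",
    "summary": "CloudSync Pro software crashes when uploading files larger than 1GB.",
}
print(route_ticket(parsed))
print(route_ticket({"department": "engineering", "summary": "unknown team"}))
```

Falling back to a general queue keeps an out-of-list answer from silently dropping the email; raising an error and retrying the extraction would be a reasonable alternative.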
