# Lab 1: The Schema Gym

**Module 02 — Function Calling & Tool Systems | Session 1, Part 2**

## Objectives

By the end of this lab you will be able to:

1. **Design** JSON Schemas that guide LLMs to call tools correctly
2. **Use Pydantic** to define strict schemas and validate structured output
3. **Parse and validate** simulated LLM JSON responses

> **No API keys needed** — this lab uses local Python only.

In [3]:
# Setup (this lab uses the Pydantic v2 API: model_json_schema, model_validate_json)
# !pip install "pydantic>=2"

import json
from typing import Optional
from pydantic import BaseModel, Field, ValidationError

---
## Part 1: Manual Schema Design

A tool schema tells the LLM **what** the tool does and **how** to call it.

### Walkthrough: Calculator Schema

Let's examine a well-designed schema:

In [4]:
calculator_schema = {
    "type": "function",
    "function": {
        "name": "execute_calculation",
        "description": (
            "Executes a basic arithmetic or exponentiation operation. "
            "Use for any math in user questions: percentages, growth rates, "
            "compound interest, splits, or simple arithmetic. "
            "Example: For 'What is 15% of 200?', use operation='multiply', "
            "operand_a=200, operand_b=0.15."
        ),
        "parameters": {
            "type": "object",
            "properties": {
                "operation": {
                    "type": "string",
                    "enum": ["add", "subtract", "multiply", "divide", "pow"],
                    "description": "The arithmetic operation to perform."
                },
                "operand_a": {
                    "type": "number",
                    "description": "The first operand (base for 'pow')."
                },
                "operand_b": {
                    "type": "number",
                    "description": "The second operand (exponent for 'pow')."
                }
            },
            "required": ["operation", "operand_a", "operand_b"]
        }
    }
}

print(json.dumps(calculator_schema, indent=2))

{
  "type": "function",
  "function": {
    "name": "execute_calculation",
    "description": "Executes a basic arithmetic or exponentiation operation. Use for any math in user questions: percentages, growth rates, compound interest, splits, or simple arithmetic. Example: For 'What is 15% of 200?', use operation='multiply', operand_a=200, operand_b=0.15.",
    "parameters": {
      "type": "object",
      "properties": {
        "operation": {
          "type": "string",
          "enum": [
            "add",
            "subtract",
            "multiply",
            "divide",
            "pow"
          ],
          "description": "The arithmetic operation to perform."
        },
        "operand_a": {
          "type": "number",
          "description": "The first operand (base for 'pow')."
        },
        "operand_b": {
          "type": "number",
          "description": "The second operand (exponent for 'pow')."
        }
      },
      "required": [
        "operation",
        "operand_a",
        "operand_b"
      ]
    }
  }
}

### Key Design Principles

| Principle | Example in Calculator Schema |
|-----------|----------------------------|
| **Verb name** | `execute_calculation` |
| **Rich description** | States the purpose, when to use the tool, and a worked example |
| **Enums over free text** | `["add", "subtract", "multiply", "divide", "pow"]` |
| **Explicit required** | All three params are required |
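
To make the schema's contract concrete, the sketch below shows a hypothetical Python function that such a schema could describe. The function body and the `OPERATIONS` dispatch table are assumptions for illustration (the lab does not define the implementation here); only the parameter names and the enum values come from `calculator_schema`.

```python
import operator

# Hypothetical dispatch table: one entry per value in the schema's "enum".
OPERATIONS = {
    "add": operator.add,
    "subtract": operator.sub,
    "multiply": operator.mul,
    "divide": operator.truediv,
    "pow": operator.pow,
}


def execute_calculation(operation: str, operand_a: float, operand_b: float) -> float:
    """Apply one of the enumerated operations to two operands."""
    if operation not in OPERATIONS:
        # The enum should prevent this, but model output is still untrusted
        # input, so the function checks again.
        raise ValueError(f"Unsupported operation: {operation!r}")
    return OPERATIONS[operation](operand_a, operand_b)


# The example from the schema description: 15% of 200.
arguments = {"operation": "multiply", "operand_a": 200, "operand_b": 0.15}
result = execute_calculation(**arguments)
print(result)
```

Because the schema's property names match the function's keyword arguments, the parsed arguments can be unpacked directly with `**arguments`.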

### Exercise: Design a Hotel Search Schema

Build a schema for `search_hotels(location, price_range, amenities)`:
- `location`: a string (city name)
- `price_range`: one of "budget", "mid", "luxury"
- `amenities`: an array of strings from a fixed set (pool, wifi, gym, parking, restaurant)

In [5]:
# TODO: Design the search_hotels schema
# Hint: Use "enum" for price_range and for items inside the amenities array.
# Hint: For an array of enums, use:
#   "type": "array", "items": {"type": "string", "enum": [...]}

search_hotels_schema = {
    "type": "function",
    "function": {
        "name": "search_hotels",
        "description": (
            "Searches for hotels in a city. Use when the user asks to find or "
            "compare places to stay; optionally filters by price tier and "
            "required amenities."
        ),
        "parameters": {
            "type": "object",
            "properties": {
                # 'location': free-text city name
                "location": {"type": "string"},
                # 'price_range': one of three fixed tiers
                "price_range": {"type": "string", "enum": ["budget", "mid", "luxury"]},
                # 'amenities': array of enum strings (note the plural keyword "items")
                "amenities": {
                    "type": "array",
                    "items": {"type": "string", "enum": ["pool", "wifi", "gym", "parking", "restaurant"]},
                },
            },
            "required": ["location"],  # only the city is needed to run a search
        }
    }
}

print(json.dumps(search_hotels_schema, indent=2))

{
  "type": "function",
  "function": {
    "name": "search_hotels",
    "description": "Searches for hotels in a city. Use when the user asks to find or compare places to stay; optionally filters by price tier and required amenities.",
    "parameters": {
      "type": "object",
      "properties": {
        "location": {
          "type": "string"
        },
        "price_range": {
          "type": "string",
          "enum": [
            "budget",
            "mid",
            "luxury"
          ]
        },
        "amenities": {
          "type": "array",
          "items": {
            "type": "string",
            "enum": [
              "pool",
              "wifi",
              "gym",
              "parking",
              "restaurant"
            ]
          }
        }
      },
      "required": [
        "location"
      ]
    }
  }
}


In [6]:
# Validation check — run this to verify your schema
assert search_hotels_schema["function"]["name"] == "search_hotels"
props = search_hotels_schema["function"]["parameters"]["properties"]
assert "location" in props, "Missing 'location' property"
assert "price_range" in props, "Missing 'price_range' property"
assert "amenities" in props, "Missing 'amenities' property"
assert "enum" in props["price_range"], "price_range should use enum"
assert props["amenities"]["type"] == "array", "amenities should be an array"
print("All checks passed!")

All checks passed!
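
Key-presence asserts like these do not catch a misspelled keyword. By default, JSON Schema validators ignore keywords they do not recognize, so an array property written with `item` instead of `items` still looks like a valid schema but no longer constrains its elements. The helper below is a minimal sketch of a stricter check; `lint_tool_schema`, its rules, and the `tag_item` example schema are hypothetical and not part of any library.

```python
def lint_tool_schema(schema: dict) -> list[str]:
    """Return a list of human-readable problems found in a tool schema."""
    problems = []
    params = schema["function"]["parameters"]
    props = params.get("properties", {})

    # Every required name must actually be declared as a property.
    for name in params.get("required", []):
        if name not in props:
            problems.append(f"required field '{name}' is not in properties")

    for name, spec in props.items():
        # Arrays need the plural keyword "items" to constrain their elements.
        if spec.get("type") == "array" and "items" not in spec:
            problems.append(f"array property '{name}' has no 'items' keyword")
        # Descriptions help the model decide how to fill each argument.
        if "description" not in spec:
            problems.append(f"property '{name}' has no description")

    return problems


# Deliberately broken example schema with a misspelled keyword.
example_schema = {
    "type": "function",
    "function": {
        "name": "tag_item",
        "description": "Attaches tags to an item.",
        "parameters": {
            "type": "object",
            "properties": {
                "tags": {
                    "type": "array",
                    "item": {"type": "string"},  # typo: should be "items"
                    "description": "Tags to attach.",
                },
            },
            "required": ["tags", "item_id"],
        },
    },
}

for problem in lint_tool_schema(example_schema):
    print(problem)
```

An empty list means none of these particular rules fired; it does not prove the schema is well designed.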


---
## Part 2: Pydantic Structured Output

Instead of writing raw JSON Schema dicts, you can use **Pydantic models** to:
1. Generate schemas automatically
2. Validate and parse LLM output with type safety

### Walkthrough: Calculator as Pydantic

In [7]:
from enum import Enum

class Operation(str, Enum):
    ADD = "add"
    SUBTRACT = "subtract"
    MULTIPLY = "multiply"
    DIVIDE = "divide"
    POW = "pow"

class CalculationRequest(BaseModel):
    """Request model for the calculator tool."""
    operation: Operation = Field(description="The arithmetic operation to perform.")
    operand_a: float = Field(description="The first operand (base for 'pow').")
    operand_b: float = Field(description="The second operand (exponent for 'pow').")

# Pydantic auto-generates JSON Schema
print(json.dumps(CalculationRequest.model_json_schema(), indent=2))

{
  "$defs": {
    "Operation": {
      "enum": [
        "add",
        "subtract",
        "multiply",
        "divide",
        "pow"
      ],
      "title": "Operation",
      "type": "string"
    }
  },
  "description": "Request model for the calculator tool.",
  "properties": {
    "operation": {
      "$ref": "#/$defs/Operation",
      "description": "The arithmetic operation to perform."
    },
    "operand_a": {
      "description": "The first operand (base for 'pow').",
      "title": "Operand A",
      "type": "number"
    },
    "operand_b": {
      "description": "The second operand (exponent for 'pow').",
      "title": "Operand B",
      "type": "number"
    }
  },
  "required": [
    "operation",
    "operand_a",
    "operand_b"
  ],
  "title": "CalculationRequest",
  "type": "object"
}
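
The schema Pydantic generates describes only the parameters object; the `"type": "function"` wrapper used in Part 1 still has to be added around it. The helper below is one way this could be done. `to_tool_schema` is a hypothetical name, the wrapper layout simply mirrors `calculator_schema` from Part 1, and the model is repeated so the block runs on its own.

```python
from enum import Enum

from pydantic import BaseModel, Field


class Operation(str, Enum):
    ADD = "add"
    SUBTRACT = "subtract"
    MULTIPLY = "multiply"
    DIVIDE = "divide"
    POW = "pow"


class CalculationRequest(BaseModel):
    """Request model for the calculator tool."""
    operation: Operation = Field(description="The arithmetic operation to perform.")
    operand_a: float = Field(description="The first operand (base for 'pow').")
    operand_b: float = Field(description="The second operand (exponent for 'pow').")


def to_tool_schema(model: type[BaseModel], name: str, description: str) -> dict:
    """Wrap a Pydantic model's JSON Schema in the Part 1 tool-schema layout."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": model.model_json_schema(),
        },
    }


tool = to_tool_schema(
    CalculationRequest,
    name="execute_calculation",
    description="Executes a basic arithmetic or exponentiation operation.",
)
print(tool["function"]["parameters"]["required"])
```

Note that the generated parameters use `$defs` and `$ref` for the enum. Whether a particular API accepts that form is worth checking against its documentation before relying on this wrapper.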


### Exercise: Hotel Search Result Model

Define a Pydantic model for a structured **search result** that an LLM might return.

The model should represent a single hotel with:
- `name`: string
- `city`: string
- `price_per_night`: float (must be > 0)
- `rating`: float (between 1.0 and 5.0)
- `amenities`: list of strings

In [8]:
# TODO: Define the HotelResult Pydantic model
# Hint: Use Field(gt=0) for price, Field(ge=1.0, le=5.0) for rating

class HotelResult(BaseModel):
    """Structured result for a hotel search."""
    name: str
    city: str
    price_per_night: float = Field(gt=0)    # must be strictly positive
    rating: float = Field(ge=1.0, le=5.0)   # inclusive range 1.0 to 5.0
    amenities: list[str]


# Print the auto-generated schema
print(json.dumps(HotelResult.model_json_schema(), indent=2))

{
  "description": "Structured result for a hotel search.",
  "properties": {
    "name": {
      "title": "Name",
      "type": "string"
    },
    "city": {
      "title": "City",
      "type": "string"
    },
    "price_per_night": {
      "exclusiveMinimum": 0,
      "title": "Price Per Night",
      "type": "number"
    },
    "rating": {
      "maximum": 5.0,
      "minimum": 1.0,
      "title": "Rating",
      "type": "number"
    },
    "amenities": {
      "items": {
        "type": "string"
      },
      "title": "Amenities",
      "type": "array"
    }
  },
  "required": [
    "name",
    "city",
    "price_per_night",
    "rating",
    "amenities"
  ],
  "title": "HotelResult",
  "type": "object"
}


In [9]:
# Validation check
schema = HotelResult.model_json_schema()
assert "name" in schema["properties"], "Missing 'name' field"
assert "price_per_night" in schema["properties"], "Missing 'price_per_night' field"
assert "rating" in schema["properties"], "Missing 'rating' field"

# Test that validation works
try:
    HotelResult(name="Test", city="Riyadh", price_per_night=-50, rating=3.0, amenities=[])
    print("ERROR: Should have rejected negative price!")
except ValidationError:
    print("Correctly rejected negative price!")

print("All checks passed!")

Correctly rejected negative price!
All checks passed!
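
A model whose `amenities` field is typed as a plain list of strings accepts any string, including values outside the fixed set from Part 1. If the result model should enforce that set as well, `typing.Literal` is one option. The sketch below is a variant under that assumption; `StrictHotelResult` and `Amenity` are hypothetical names.

```python
from typing import Literal

from pydantic import BaseModel, Field, ValidationError

# The fixed amenity set from the Part 1 exercise.
Amenity = Literal["pool", "wifi", "gym", "parking", "restaurant"]


class StrictHotelResult(BaseModel):
    """Hotel result whose amenities are limited to a fixed set."""
    name: str
    city: str
    price_per_night: float = Field(gt=0)
    rating: float = Field(ge=1.0, le=5.0)
    amenities: list[Amenity]


# Boundary value rating=5.0 is allowed because the constraint is inclusive.
ok = StrictHotelResult(
    name="Test", city="Riyadh", price_per_night=100, rating=5.0,
    amenities=["pool", "wifi"],
)

try:
    StrictHotelResult(
        name="Test", city="Riyadh", price_per_night=100, rating=5.0,
        amenities=["spa"],  # not in the fixed set
    )
    rejected = False
except ValidationError as e:
    rejected = True
    print(f"Rejected: {e.errors()[0]['msg']}")
```

The trade-off is flexibility: a stricter model rejects amenities the set does not list, which may or may not be what you want for data coming back from a search.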


---
## Part 3: Parsing & Validation

LLMs return JSON as **strings**. You need to parse them safely.

### Approach 1: Raw `json.loads` (fragile)

In [None]:
# Simulated LLM output (a raw JSON string)
llm_output_good = '{"operation": "multiply", "operand_a": 500, "operand_b": 0.15}'
llm_output_bad = '{"operation": "multiply", "operand_a": 500, operand_b: 0.15}'  # Invalid JSON: the key operand_b is not quoted

# Raw parsing — no type checking, crashes on bad JSON
try:
    parsed = json.loads(llm_output_good)
    print(f"Parsed successfully: {parsed}")
except json.JSONDecodeError as e:
    print(f"Parse error: {e}")

try:
    parsed = json.loads(llm_output_bad)
    print(f"Parsed successfully: {parsed}")
except json.JSONDecodeError as e:
    print(f"Parse error (expected): {e}")
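
Syntax errors are only half the problem. `json.loads` accepts any well-formed JSON, so a response with a wrong type or an unsupported operation parses without complaint, and the mistake surfaces later, if at all. A small stdlib-only illustration (the payload is made up):

```python
import json

# Well-formed JSON, but operand_a is a string and "modulo" is not in the enum.
llm_output_wrong_types = '{"operation": "modulo", "operand_a": "500", "operand_b": 0.15}'

parsed = json.loads(llm_output_wrong_types)  # no exception is raised here
print(f"operation={parsed['operation']!r}, operand_a is a {type(parsed['operand_a']).__name__}")

# The mistake only shows up when the value is actually used.
try:
    parsed["operand_a"] / parsed["operand_b"]
    failed_late = False
except TypeError as e:
    failed_late = True
    print(f"Fails later, far from the parsing step: {e}")
```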

### Approach 2: Pydantic `model_validate_json` (robust)

Pydantic parses JSON **and** validates types and constraints in one step.

In [None]:
# Pydantic parsing — validates types, enums, and constraints
try:
    request = CalculationRequest.model_validate_json(llm_output_good)
    print(f"Valid request: {request}")
    print(f"Operation enum: {request.operation}")
    print(f"Type-safe operand_a: {request.operand_a} (type: {type(request.operand_a).__name__})")
except ValidationError as e:
    print(f"Validation error: {e}")

# Test with an invalid operation
bad_operation = '{"operation": "modulo", "operand_a": 10, "operand_b": 3}'
try:
    request = CalculationRequest.model_validate_json(bad_operation)
except ValidationError as e:
    print(f"\nCaught invalid operation: {e.errors()[0]['msg']}")
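
Each entry in `e.errors()` is a dict with keys such as `type`, `loc`, `msg`, and `input`, which makes it straightforward to turn a failure into a short message, for example to log it or to send it back to the model in a retry prompt. The sketch below assumes that use. `summarize_errors` is a hypothetical helper, and the model is repeated so the block runs on its own.

```python
from enum import Enum

from pydantic import BaseModel, Field, ValidationError


class Operation(str, Enum):
    ADD = "add"
    SUBTRACT = "subtract"
    MULTIPLY = "multiply"
    DIVIDE = "divide"
    POW = "pow"


class CalculationRequest(BaseModel):
    """Request model for the calculator tool."""
    operation: Operation = Field(description="The arithmetic operation to perform.")
    operand_a: float = Field(description="The first operand (base for 'pow').")
    operand_b: float = Field(description="The second operand (exponent for 'pow').")


def summarize_errors(error: ValidationError) -> str:
    """Render one 'field: message' line per validation error."""
    lines = []
    for err in error.errors():
        # "loc" is a tuple of field names or indexes; it is empty for
        # document-level errors such as invalid JSON.
        location = ".".join(str(part) for part in err["loc"]) or "<root>"
        lines.append(f"{location}: {err['msg']}")
    return "\n".join(lines)


# Three problems at once: bad enum value, non-numeric operand, missing field.
bad_payload = '{"operation": "modulo", "operand_a": "ten"}'
try:
    CalculationRequest.model_validate_json(bad_payload)
    summary = ""
except ValidationError as e:
    summary = summarize_errors(e)

print(summary)
```

Pydantic collects all field errors in a single pass, so one summary can report every problem rather than only the first.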

### Exercise: Parse Simulated Hotel Results

An LLM returned the following JSON responses. Parse each one using your `HotelResult` model and handle errors gracefully.

In [10]:
# Simulated LLM responses — some valid, some not
responses = [
    # Valid
    '{"name": "Grand Palace", "city": "Riyadh", "price_per_night": 350.0, "rating": 4.5, "amenities": ["pool", "wifi"]}',
    # Invalid: rating out of range
    '{"name": "Budget Inn", "city": "Jeddah", "price_per_night": 80.0, "rating": 6.0, "amenities": ["wifi"]}',
    # Invalid: malformed JSON
    '{"name": "Seaside Resort", "city": "Dammam", price_per_night: 200}',
    # Valid
    '{"name": "Oasis Hotel", "city": "Medina", "price_per_night": 150.0, "rating": 4.0, "amenities": ["gym", "parking"]}',
]

# TODO: Loop through responses and parse each one.
# - On success, print the hotel name and rating
# - On ValidationError, print which fields failed
# - On malformed JSON, print "Malformed JSON". Note that model_validate_json
#   reports unparseable input as a ValidationError of type "json_invalid";
#   it does not raise json.JSONDecodeError.

valid_hotels = []

for i, response in enumerate(responses):
    try:
        hotel = HotelResult.model_validate_json(response)
    except ValidationError as e:
        errors = e.errors()
        if errors[0]["type"] == "json_invalid":
            print(f"Malformed JSON in response {i}")
        else:
            # Report each failing field together with its message
            failed = "; ".join(f"{'.'.join(map(str, err['loc']))}: {err['msg']}" for err in errors)
            print(f"Validation error in response {i}: {failed}")
    else:
        print(f"Valid hotel: {hotel.name} (Rating: {hotel.rating})")
        valid_hotels.append(hotel)

print(f"\nSuccessfully parsed {len(valid_hotels)} out of {len(responses)} responses.")

Valid hotel: Grand Palace (Rating: 4.5)
Validation error in response 1: rating: Input should be less than or equal to 5
Malformed JSON in response 2
Valid hotel: Oasis Hotel (Rating: 4.0)

Successfully parsed 2 out of 4 responses.


---
## Reflection

### Key Takeaways

1. **Description engineering** is critical — the LLM uses it to decide *when* to call a tool
2. **Enums** dramatically improve accuracy by constraining the LLM's output space
3. **Pydantic** gives you type-safe parsing + validation in one step
4. **Always handle errors** — LLMs produce malformed JSON more often than you'd expect

### When to Use What

| Approach | Use When |
|----------|----------|
| Raw `json.loads` | Quick scripts, you control the format |
| Pydantic `model_validate_json` | Production code, untrusted LLM output |
| Raw dict schemas | Simple tools, quick prototyping |
| Pydantic `model_json_schema()` | Complex schemas, reusable validation |

### Up Next

**Lab 2 — Calculator Tool**: Wire up a real tool to OpenAI's API and build a complete tool-calling agent.