# Lab 1: Tool Calling — From Schema to Agent

**Module 02 — Function Calling & Tool Systems**

## Objectives

By the end of this lab you will be able to:

1. **Design** JSON Schemas that guide LLMs to call tools correctly
2. **Use Pydantic** to define schemas and validate structured output
3. **Implement** a tool execution function with structured error returns
4. **Wire** a tool into an LLM API using the two-call pattern with LiteLLM

---

## The Story

We're building a calculator tool. We'll do it in four parts, then run it in a live demo:

1. Design the schema (tell the LLM what the tool does)
2. Upgrade to Pydantic (validate inputs automatically)
3. Implement execution (the actual arithmetic)
4. Connect to the API (let the LLM call it)
5. Run the live demo (watch the tool being called)

In [None]:
# Setup
# !uv pip install litellm pydantic python-dotenv

import json
import os
import logging
from typing import Optional, List, Dict, Any
from enum import Enum
from pydantic import BaseModel, Field, ValidationError
from dotenv import load_dotenv
import litellm

load_dotenv()
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# Optional: have LiteLLM drop request parameters that the selected provider does not support
litellm.drop_params = True

---
## Part 1: Schema Design

A tool schema tells the LLM **what** the tool does and **how** to call it.
The `description` field is effectively a prompt: it strongly influences whether and when the model decides to use the tool.

### Walkthrough: Calculator Schema

Key design principles:

| Principle | Example |
|-----------|---------|
| **Verb name** | `execute_calculation` |
| **Rich description** | Includes purpose, examples, and edge cases |
| **Enums over free text** | Constrains the model's output space |
| **All required** | No optional fields unless you mean it |
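
The principles in the table can be checked mechanically. The sketch below is a hypothetical `lint_tool_schema` helper (not part of the lab's checker or of any library) that flags a few violations in an OpenAI-style tool schema. The verb prefixes and the 40-character threshold are arbitrary assumptions chosen for illustration.

```python
from typing import Any, Dict, List

# Hypothetical helper for illustration; the verb prefixes and the
# 40-character description threshold are arbitrary assumptions.
VERB_PREFIXES = ("get_", "search_", "execute_", "create_", "update_", "delete_")


def lint_tool_schema(schema: Dict[str, Any]) -> List[str]:
    """Return a list of warnings for an OpenAI-style tool schema."""
    warnings: List[str] = []
    fn = schema.get("function", {})
    name = fn.get("name", "")
    params = fn.get("parameters", {})
    properties = params.get("properties", {})
    required = set(params.get("required", []))

    if not name.startswith(VERB_PREFIXES):
        warnings.append(f"name '{name}' does not start with a verb prefix")
    if len(fn.get("description", "")) < 40:
        warnings.append("description is short; add purpose and an example")
    for prop in sorted(set(properties) - required):
        warnings.append(f"property '{prop}' is optional; is that intended?")
    return warnings


# A deliberately weak schema: noun name, terse description, free-text input
weak_schema = {
    "type": "function",
    "function": {
        "name": "calculator",
        "description": "Does math.",
        "parameters": {
            "type": "object",
            "properties": {"expression": {"type": "string"}},
            "required": [],
        },
    },
}

for warning in lint_tool_schema(weak_schema):
    print("WARN:", warning)
```

A lint like this only catches structural problems. Whether the description actually steers the model well still has to be tested against real prompts.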

In [None]:
calculator_schema = {
    "type": "function",
    "function": {
        "name": "execute_calculation",
        "description": (
            "Executes a basic arithmetic or exponentiation operation. "
            "Use for any math in user questions: percentages, growth rates, "
            "compound interest, splits, or simple arithmetic. "
            "Example: For 'What is 15% of 200?', use operation='multiply', "
            "operand_a=200, operand_b=0.15."
        ),
        "parameters": {
            "type": "object",
            "properties": {
                "operation": {
                    "type": "string",
                    "enum": ["add", "subtract", "multiply", "divide", "pow"],
                    "description": "The arithmetic operation to perform."
                },
                "operand_a": {
                    "type": "number",
                    "description": "The first operand (base for 'pow')."
                },
                "operand_b": {
                    "type": "number",
                    "description": "The second operand (exponent for 'pow')."
                }
            },
            "required": ["operation", "operand_a", "operand_b"]
        }
    }
}

print(json.dumps(calculator_schema, indent=2))
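
To see what the schema actually constrains, it can help to check a set of arguments against it by hand. The sketch below is a deliberately minimal, hypothetical `check_arguments` function that covers only `required`, `enum`, and `number` types. It is not a full JSON Schema validator, and the parameters dict is a trimmed copy of the one above so the cell runs on its own.

```python
from typing import Any, Dict, List

# Trimmed copy of the calculator parameters (descriptions omitted)
PARAMETERS = {
    "type": "object",
    "properties": {
        "operation": {"type": "string", "enum": ["add", "subtract", "multiply", "divide", "pow"]},
        "operand_a": {"type": "number"},
        "operand_b": {"type": "number"},
    },
    "required": ["operation", "operand_a", "operand_b"],
}


def check_arguments(parameters: Dict[str, Any], arguments: Dict[str, Any]) -> List[str]:
    """Hypothetical helper: return a list of problems (an empty list means OK)."""
    problems: List[str] = []
    for name in parameters.get("required", []):
        if name not in arguments:
            problems.append(f"missing required field '{name}'")
    for name, value in arguments.items():
        spec = parameters["properties"].get(name)
        if spec is None:
            problems.append(f"unexpected field '{name}'")
            continue
        if "enum" in spec and value not in spec["enum"]:
            problems.append(f"'{name}' must be one of {spec['enum']}")
        # bool is a subclass of int in Python, so exclude it explicitly
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if spec["type"] == "number" and not is_number:
            problems.append(f"'{name}' must be a number")
    return problems


good = check_arguments(PARAMETERS, {"operation": "multiply", "operand_a": 200, "operand_b": 0.15})
bad = check_arguments(PARAMETERS, {"operation": "modulo", "operand_a": "200"})
print(good)
print(bad)
```

Writing checks like this by hand for every tool gets tedious quickly, which is the motivation for moving to Pydantic in Part 2.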

### Exercise: Design a Hotel Search Schema

Build a schema for `search_hotels(location, price_range, amenities)`:
- `location`: a string (city name)
- `price_range`: one of `"budget"`, `"mid"`, `"luxury"`
- `amenities`: an array of strings from a fixed set (`pool`, `wifi`, `gym`, `parking`, `restaurant`)

In [None]:
# TODO: Design the search_hotels schema
# Hint: Use "enum" for price_range and for items inside the amenities array.
# For an array of enums: "type": "array", "items": {"type": "string", "enum": [...]}

search_hotels_schema = {
    "type": "function",
    "function": {
        "name": "search_hotels",
        "description": "TODO: Write a clear description that tells the LLM when to use this tool.",
        "parameters": {
            "type": "object",
            "properties": {
                # TODO: Define 'location' (type: string)
                # TODO: Define 'price_range' (type: string, enum)
                # TODO: Define 'amenities' (type: array of enum strings)
            },
            "required": []  # TODO: Which fields should be required?
        }
    }
}

print(json.dumps(search_hotels_schema, indent=2))

In [None]:
# Validation check
from checker.lab01 import check_hotel_schema
check_hotel_schema(search_hotels_schema)

---
## Part 2: Pydantic Schemas

Raw dicts work, but **Pydantic** gives you:
1. **Single source of truth** — define schema once in Python, generate JSON automatically
2. **Automatic validation** — catches invalid LLM output before it reaches your code
3. **Type safety** — IDE autocomplete and type checking
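
As a minimal illustration of the validation point, the sketch below parses three simulated LLM outputs with a hypothetical `WeatherQuery` model (not part of the lab). It assumes Pydantic v2, where `model_validate_json` reports both constraint violations and malformed JSON as `ValidationError`.

```python
from pydantic import BaseModel, Field, ValidationError


class WeatherQuery(BaseModel):
    """Hypothetical model used only for this illustration."""
    city: str
    days: int = Field(ge=1, le=7, description="Forecast length in days.")


samples = [
    '{"city": "Riyadh", "days": 3}',    # valid
    '{"city": "Riyadh", "days": 30}',   # days out of range
    '{"city": "Riyadh", days: 3}',      # malformed JSON (unquoted key)
]

outcomes = []
for raw in samples:
    try:
        query = WeatherQuery.model_validate_json(raw)
        outcomes.append("ok")
        print("OK:", query)
    except ValidationError as exc:
        # Each error dict carries a machine-readable "type" field
        error_type = exc.errors()[0]["type"]
        outcomes.append(error_type)
        print("Rejected:", error_type)
```

The `type` field of each error makes it possible to tell a malformed payload apart from a well-formed one that breaks a constraint, without catching a separate exception class.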

### Walkthrough: Calculator as a Pydantic Model

In [None]:
class Operation(str, Enum):
    ADD = "add"
    SUBTRACT = "subtract"
    MULTIPLY = "multiply"
    DIVIDE = "divide"
    POW = "pow"


class CalculationRequest(BaseModel):
    """
    Executes a basic arithmetic or exponentiation operation.
    Use for any math in user questions: percentages, growth rates,
    compound interest, splits, or simple arithmetic.
    Example: For 'What is 15% of 200?', use operation='multiply',
    operand_a=200, operand_b=0.15.
    """
    operation: Operation = Field(
        description="The arithmetic operation. 'pow' calculates operand_a to the power of operand_b."
    )
    operand_a: float = Field(description="The first operand (base for 'pow').")
    operand_b: float = Field(description="The second operand (exponent for 'pow', divisor for 'divide').")


# This replaces the hand-written dict above
CALCULATOR_SCHEMA = {
    "type": "function",
    "function": {
        "name": "execute_calculation",
        "description": CalculationRequest.__doc__.strip(),
        "parameters": CalculationRequest.model_json_schema()
    }
}

print(json.dumps(CALCULATOR_SCHEMA, indent=2))
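
One detail worth inspecting in the printed output: when a field is an `Enum` subclass, Pydantic v2 places the allowed values under a `$defs` entry and references it from the property, and it also adds `title` keys. The sketch below uses two hypothetical models to compare that with a `Literal` field, which inlines the `enum` list. Whether a given provider handles one form better than the other is an assumption to verify against its documentation.

```python
import json
from enum import Enum
from typing import Literal

from pydantic import BaseModel


class Color(str, Enum):
    RED = "red"
    GREEN = "green"


class WithEnum(BaseModel):
    """Hypothetical model: enum values end up under "$defs"."""
    color: Color


class WithLiteral(BaseModel):
    """Hypothetical model: enum values are inlined in the property."""
    color: Literal["red", "green"]


enum_schema = WithEnum.model_json_schema()
literal_schema = WithLiteral.model_json_schema()

print(json.dumps(enum_schema, indent=2))
print(json.dumps(literal_schema, indent=2))
```

Both forms constrain the field to the same values. The difference is only in how the generated JSON Schema is laid out.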

### Exercise: Hotel Search Result Model

Define a Pydantic model for a structured hotel search result:
- `name`: string
- `city`: string
- `price_per_night`: float (must be > 0)
- `rating`: float (between 1.0 and 5.0)
- `amenities`: list of strings

In [None]:
# TODO: Define the HotelResult Pydantic model
# Hint: Use Field(gt=0) for price, Field(ge=1.0, le=5.0) for rating

class HotelResult(BaseModel):
    """Structured result for a hotel search."""
    # TODO: Define the fields with appropriate types and constraints
    pass


print(json.dumps(HotelResult.model_json_schema(), indent=2))

In [None]:
# Validation check
from checker.lab01 import check_hotel_model
check_hotel_model(HotelResult)

# Parse simulated LLM responses
responses = [
    '{"name": "Grand Palace", "city": "Riyadh", "price_per_night": 350.0, "rating": 4.5, "amenities": ["pool", "wifi"]}',
    '{"name": "Budget Inn", "city": "Jeddah", "price_per_night": 80.0, "rating": 6.0, "amenities": ["wifi"]}',   # rating out of range
    '{"name": "Seaside", "city": "Dammam", price_per_night: 200}',  # malformed JSON
    '{"name": "Oasis", "city": "Medina", "price_per_night": 150.0, "rating": 4.0, "amenities": ["gym"]}',
]

valid_hotels = []
for i, response in enumerate(responses):
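    # TODO: Parse with HotelResult.model_validate_json(response)
    # Catch ValidationError. In Pydantic v2, model_validate_json reports malformed JSON
    # as a ValidationError (error type "json_invalid"), not as json.JSONDecodeError,
    # so inspect e.errors()[0]["type"] to tell the two cases apart.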
    pass

print(f"\nSuccessfully parsed {len(valid_hotels)} out of {len(responses)} responses.")

---
## Part 3: Tool Execution

The tool execution function must:
- **Always** return a dict with `success`, `result`, `error`
- **Never** raise an uncaught exception — return structured errors instead
- Handle domain errors (e.g., division by zero) explicitly

### Walkthrough: The return contract

In [None]:
# Every tool always returns one of these two shapes:
SUCCESS = {"success": True,  "result": 42.0, "error": None}
FAILURE = {"success": False, "result": None, "error": "Division by zero is not allowed."}

# This consistent contract means the agent loop never needs to special-case tool results
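
One way to enforce this contract without repeating the try/except in every tool is a small wrapper. The sketch below is a hypothetical `safe_tool` decorator, with a `square_root` tool invented for the example. It is an illustration of the idea, not the structure the exercise below asks for.

```python
import functools
import math
from typing import Any, Callable, Dict


def safe_tool(func: Callable[..., Any]) -> Callable[..., Dict[str, Any]]:
    """Hypothetical decorator: wrap a plain function so it follows the return contract."""
    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Dict[str, Any]:
        try:
            return {"success": True, "result": func(*args, **kwargs), "error": None}
        except Exception as exc:  # convert any failure into a structured error
            return {"success": False, "result": None, "error": f"{type(exc).__name__}: {exc}"}
    return wrapper


@safe_tool
def square_root(value: float) -> float:
    """Example tool (not part of the lab): square root of a non-negative number."""
    if value < 0:
        raise ValueError("Cannot take the square root of a negative number.")
    return math.sqrt(value)


print(square_root(16))
print(square_root(-1))
print(square_root("sixteen"))  # a wrong argument type is reported, not raised
```

The trade-off is that a blanket wrapper produces generic error text. Handling domain errors explicitly, as the exercise does for division by zero, gives the model a clearer message to act on.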

### Exercise: Implement `execute_calculation`

In [None]:
def execute_calculation(operation: str, operand_a: float, operand_b: float) -> Dict[str, Any]:
    """
    Performs the calculation and returns a structured result.

    Args:
        operation: One of "add", "subtract", "multiply", "divide", "pow"
        operand_a: The first operand
        operand_b: The second operand

    Returns:
        {"success": True/False, "result": <number or None>, "error": <str or None>}
    """
    logger.info(f"Executing: {operand_a} {operation} {operand_b}")
    result = None
    error = None

    try:
        # TODO: Implement the operation logic
        # - "add":      operand_a + operand_b
        # - "subtract": operand_a - operand_b
        # - "multiply": operand_a * operand_b
        # - "divide":   operand_a / operand_b  (handle division by zero!)
        # - "pow":      operand_a ** operand_b
        # - else:       set error = f"Unsupported operation: {operation}"
        pass
    except Exception as e:
        error = f"Calculation error: {str(e)}"

    if error:
        return {"success": False, "result": None, "error": error}
    return {"success": True, "result": result, "error": None}


def get_tool_schemas() -> List[Dict[str, Any]]:
    return [CALCULATOR_SCHEMA]


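def execute_tool(tool_name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
    if tool_name == "execute_calculation":
        try:
            return execute_calculation(**arguments)
        except TypeError as e:
            # Missing or unexpected argument names would otherwise raise here
            return {"success": False, "result": None, "error": f"Invalid arguments: {e}"}
    return {"success": False, "result": None, "error": f"Unknown tool: {tool_name}"}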


# Quick test
print(execute_calculation("add", 10, 5))
print(execute_calculation("multiply", 500, 0.15))
print(execute_calculation("divide", 10, 0))
print(execute_calculation("pow", 2, 10))

In [None]:
# Validation checks
from checker.lab01 import check_calculator_logic
check_calculator_logic(execute_calculation)

---
## Part 4: Wiring to the API — The Two-Call Pattern

LLM tool calling works as a two-step conversation:

```
1. You:  messages + tool schemas  →  LLM
   LLM: "Here are tool_calls I need you to execute"

2. You:  execute tools, append results  →  LLM
   LLM: "Based on the results, the answer is..."
```
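
The message flow can be traced offline before any real API is involved. The sketch below replaces the model with a hypothetical `fake_llm` function and uses an invented `add_numbers` tool, so no network call is made. The message shapes follow the OpenAI-style format used in the rest of this lab.

```python
import json
from typing import Any, Dict, List


def fake_llm(messages: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Stand-in for a real model (an assumption for this sketch; no API is called)."""
    last = messages[-1]
    if last["role"] == "tool":
        # Second call: the tool result is in the history, so answer in text
        result = json.loads(last["content"])["result"]
        return {"role": "assistant", "content": f"The answer is {result}."}
    # First call: ask the caller to run a tool
    return {
        "role": "assistant",
        "content": None,
        "tool_calls": [{
            "id": "call_1",
            "type": "function",
            "function": {"name": "add_numbers", "arguments": json.dumps({"a": 2, "b": 3})},
        }],
    }


def add_numbers(a: float, b: float) -> Dict[str, Any]:
    """Invented tool that follows the success/result/error contract."""
    return {"success": True, "result": a + b, "error": None}


messages: List[Dict[str, Any]] = [{"role": "user", "content": "What is 2 + 3?"}]

# Call 1: the model replies with tool_calls instead of text
first = fake_llm(messages)
messages.append(first)

# Execute each requested tool and append its result as a "tool" message
for call in first["tool_calls"]:
    args = json.loads(call["function"]["arguments"])
    result = add_numbers(**args)
    messages.append({"role": "tool", "tool_call_id": call["id"], "content": json.dumps(result)})

# Call 2: the model sees the tool result and produces the final answer
final = fake_llm(messages)
print([m["role"] for m in messages])
print(final["content"])
```

Note that the `tool_call_id` on the tool message must match the `id` from the assistant's tool call, which is how the model pairs each result with its request.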

### Walkthrough: `get_ai_response_with_tools`

In [None]:
def get_ai_response_with_tools(
    messages: List[Dict[str, Any]],
    model: str = "openai/gpt-4o-mini"
) -> Dict[str, Any]:
    """
    Sends messages to the LLM via LiteLLM, handles tool calls, returns final text.
    Returns: {"response_text": str, "tool_results": list}
    """

    # --- First API Call: send messages + tool schemas ---
    try:
        response = litellm.completion(
            model=model,
            messages=messages,
            tools=get_tool_schemas(),
            tool_choice="auto",
            temperature=0.1
        )
    except Exception as e:
        logger.error(f"API call failed: {e}")
        return {"response_text": "Error connecting to API.", "tool_results": []}

    response_message = response.choices[0].message
    tool_results = []

    # --- Handle Tool Calls ---
    if getattr(response_message, "tool_calls", None):
        logger.info(f"Model initiated {len(response_message.tool_calls)} tool call(s).")

        # Append assistant message (with tool_calls) to history
        messages.append(response_message)

        # Execute each tool call
        for tool_call in response_message.tool_calls:
            tool_name = tool_call.function.name
            try:
                # Pydantic validates the arguments before execution
                request = CalculationRequest.model_validate_json(tool_call.function.arguments)
                tool_result = execute_tool(tool_name, request.model_dump())
            except ValidationError as e:
                # Pydantic v2 also raises ValidationError (type "json_invalid") for malformed
                # JSON, so a separate json.JSONDecodeError handler would never run here
                tool_result = {"success": False, "error": f"Validation Error: {e}", "result": None}

            tool_results.append(tool_result)
            messages.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": json.dumps(tool_result)
            })

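        # --- Second API Call: get final answer ---
        try:
            second_response = litellm.completion(
                model=model, messages=messages, temperature=0.1
            )
        except Exception as e:
            logger.error(f"Second API call failed: {e}")
            return {"response_text": "Error connecting to API.", "tool_results": tool_results}
        response_text = second_response.choices[0].message.content or "Calculation complete."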
    else:
        response_text = response_message.content

    return {"response_text": response_text, "tool_results": tool_results}

---
## Part 5: Live Demo

Run a few questions through the agent and observe the tool being called.

In [None]:
SYSTEM_PROMPT = "You are a helpful assistant with access to a calculator. Use it for any math."

test_questions = [
    "What is 15% of 200?",
    "If I invest 1000 at 8% annual interest, what will it be after 5 years?",
    "What is 2 to the power of 10?",
]

for question in test_questions:
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user",   "content": question}
    ]
    result = get_ai_response_with_tools(messages)

    print(f"Q: {question}")
    print(f"A: {result['response_text']}")
    if result["tool_results"]:
        print(f"   [Tool called {len(result['tool_results'])} time(s), results: {result['tool_results']}]")
    print()

---
## Reflection

### Key Takeaways

1. **Description engineering** — the LLM uses the description to decide *when* to call a tool
2. **Enums** improve accuracy by constraining the model's output space
3. **Pydantic** gives you type-safe parsing + validation in one step
4. **Always return structured errors** — never let a tool raise an exception into the agent loop
5. **Two-call pattern** — tool use is a conversation turn, not a function call
6. **LiteLLM** keeps the code provider-agnostic: the same message format works across providers

### Up Next

**Lab 2 — Plugin System**: Scale from one tool to many with a registry, rate limiting, permissions, and MCP.