# Programmatic Tool Calling (PTC) with the Claude API

Programmatic Tool Calling (PTC) allows Claude to write code that calls tools programmatically within the Code Execution environment, rather than requiring round-trips through the model for each tool invocation. This substantially reduces end-to-end latency for multiple tool calls, and can dramatically reduce token consumption by allowing the model to write code that removes irrelevant context before it hits the model’s context window (for example, by grepping for key information within large and noisy files).

When faced with third-party APIs and tools that you may not be able to modify directly, PTC can help reduce usage of context by allowing Claude to write code that can be invoked in the Code Execution environment. 
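To make the idea concrete, here is a minimal sketch of the kind of script Claude might write inside the Code Execution environment. The `fetch_logs` stub and its log format are assumptions made for illustration; in a real PTC run the call would be routed to your own tool and the script would be generated by the model.

```python
# Hypothetical stand-in for a tool exposed to code execution; the real tool
# and its log format may differ.
def fetch_logs() -> list[str]:
    return [
        "2025-06-02T13:00:00Z [INFO] drone D_1 departed",
        "2025-06-02T13:05:00Z [WARN] drone D_2 low signal",
        "2025-06-02T13:10:00Z [INFO] drone D_3 delivered",
        "2025-06-02T13:30:00Z [ERROR] Job J_42 FAILED: BATTERY_CRITICAL",
    ]


def filter_by_level(lines: list[str], levels: tuple[str, ...] = ("WARN", "ERROR")) -> list[str]:
    """Keep only lines tagged with one of the given levels."""
    tags = tuple(f"[{level}]" for level in levels)
    return [line for line in lines if any(tag in line for tag in tags)]


all_lines = fetch_logs()
relevant = filter_by_level(all_lines)

# Only this short summary is printed, so only it would reach the model's context.
print(f"{len(relevant)} of {len(all_lines)} lines kept")
for line in relevant:
    print(line)
```

The full log never has to pass through the model; only the filtered lines printed at the end do.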

In this cookbook, we will work with a mock third-party API provided by SkyNet, a drone delivery service. SkyNet manages a fleet of autonomous delivery drones across multiple distribution centers. We will use their logging API both without and with Programmatic Tool Calling to demonstrate the benefits of PTC.

## By the end of this cookbook, you'll be able to:

- Understand the difference between regular tool calling and programmatic tool calling (PTC)
- Write agents that leverage PTC 


## Prerequisites

Before following this guide, ensure you have:

**Required Knowledge**

- Python fundamentals - comfortable with async/await, functions, and basic data structures
- Basic understanding of agentic patterns and tool calling

**Required Tools**

- Python 3.11 or higher
- Anthropic API key


## Setup

First, install the required dependencies:

In [15]:
%%capture
%pip install -qU anthropic python-dotenv

Note: Ensure your .env file contains:

`ANTHROPIC_API_KEY=your_key_here`

Load your environment variables and configure the client. We also load a helper utility to visualize Claude message responses.


In [16]:
from dotenv import load_dotenv
from visualize import visualize

load_dotenv()

MODEL = "claude-sonnet-4-5"

viz = visualize(auto_show=True)

## Understanding the Third-Party API

In [api.py](api.py), there are three functions defined: `get_delivery_logs`, `get_container_logs`, and `check_pod_health`. The `get_delivery_logs` function allows us to provide a time range for logs. The `get_container_logs` function allows us to view logs for a given container. The `check_pod_health` function provides a way to check the health status of different pod types in the infrastructure, albeit with a delay on each call.

We've also defined `TOOLS`, which contains the JSON schema definitions for each tool, including descriptions and parameters. For brevity, we import those in this notebook instead of defining them here. See [api.py](api.py) for how the tools are defined. The description that Claude receives for each tool is important in helping it understand both when and how to use it.
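For orientation, a tool definition for the Messages API has a `name`, a `description`, and a JSON Schema under `input_schema`. The entry below is an illustrative guess at what one item in `TOOLS` could look like. The parameter names match the tool calls logged in this notebook's output, but the descriptions are assumptions and the actual contents of api.py may differ.

```python
import json

# Illustrative only: the real definition lives in api.py and may differ.
example_tool = {
    "name": "get_delivery_logs",
    "description": (
        "Return delivery log entries between start_time and end_time. "
        "Timestamps are ISO 8601 strings in UTC."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "start_time": {
                "type": "string",
                "description": "Start of the range, e.g. 2025-06-02T13:00:00Z",
            },
            "end_time": {
                "type": "string",
                "description": "End of the range, e.g. 2025-06-02T14:00:00Z",
            },
        },
        "required": ["start_time", "end_time"],
    },
}

print(json.dumps(example_tool, indent=2))
```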

In this scenario, we are investigating a reported issue with drone deliveries. We know that a particular job failed, but aren't sure of the root cause. Traditionally, we might look into multiple services, scan through logs, and try to correlate failures. Instead, we will ask Claude to do the investigation for us, using tools as needed to identify what went wrong.

We'll pass our tool definitions to the messages API and ask Claude to investigate. Read the docs on [implementing tool use](https://docs.claude.com/en/docs/agents-and-tools/tool-use/implement-tool-use) if you are not familiar with how tool use works with Claude's API.

In [17]:
import anthropic
import json
from api import get_delivery_logs, get_container_logs, check_pod_health
from api import TOOLS as tools

client = anthropic.Anthropic()


tool_functions = {
    "get_delivery_logs": get_delivery_logs,
    "get_container_logs": get_container_logs,
    "check_pod_health": check_pod_health,
}

## Traditional Tool Calling (Baseline)

In this first example, we'll use traditional tool calling to establish our baseline.

We'll call the `messages.create` API with our initial query. When the model stops with a `tool_use` reason, we will execute the tool as requested, and then add the output from the tool to the messages and call the model again.

In [18]:
from anthropic.types import MessageParam, TextBlock, ToolUseBlock
import time

def run_agent_without_ptc(user_message):
    """Run agent using traditional tool calling"""
    messages: list[MessageParam] = [{"role": "user", "content": user_message}]
    total_tokens = 0
    start_time = time.time()

    api_counter = 0
    while True:
        response = client.messages.create(
            model=MODEL, max_tokens=4000, tools=tools, messages=messages
        )
        viz.capture(response)
        api_counter += 1

        # Track token usage
        total_tokens += response.usage.input_tokens + response.usage.output_tokens

        # Check if we're done
        if response.stop_reason == "end_turn":
            # Extract the first text block from the response
            final_response = next(
                (
                    block.text
                    for block in response.content
                    if isinstance(block, TextBlock)
                ),
                None,
            )
            elapsed_time = time.time() - start_time
            return final_response, messages, total_tokens, elapsed_time, api_counter

        # Process tool calls
        if response.stop_reason == "tool_use":
            # Add assistant's response to messages
            messages.append({"role": "assistant", "content": response.content})

            # Execute each tool call
            tool_results = []
            for block in response.content:
                if isinstance(block, ToolUseBlock):
                    tool_name = block.name
                    tool_input = block.input
                    tool_use_id = block.id

                    # Execute the tool
                    print(f"Execution of tool: {tool_name} with input: {tool_input}")
                    result = tool_functions[tool_name](**tool_input)

                    tool_results.append(
                        {
                            "type": "tool_result",
                            "tool_use_id": tool_use_id,
                            "content": json.dumps(result),
                        }
                    )

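            # Add tool results to messages
            messages.append({"role": "user", "content": tool_results})

        else:
            # Stop on any other reason (for example "max_tokens") rather than looping forever
            print(f"Unexpected stop reason: {response.stop_reason}")
            elapsed_time = time.time() - start_time
            return None, messages, total_tokens, elapsed_time, api_counter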


Our initial query to the model provides some instructions to help guide the model. For brevity, we've asked the model to only call each tool once. For deeper investigations, the model may wish to look into multiple systems or time spans.

In [19]:
query = (
    "Job J_42 failed on 2025-06-02 between 1:00-2:00PM. What happened and what was the root cause? "
    "Investigate using the available tools. When looking at logs, filter only for WARN and ERROR entries. "
    "For simplicity, avoid calling tools multiple times if not necessary."
)


In [20]:
# Run the agent
result, conversation, total_tokens, elapsed_time, api_count_without_ptc = run_agent_without_ptc(
    query
)

print(f"Result: {result}")
print(f"API calls made: {api_count_without_ptc}")
print(f"Total tokens used: {total_tokens:,}")
print(f"Total time taken: {elapsed_time:.2f}s")

Execution of tool: get_delivery_logs with input: {'start_time': '2025-06-02T13:00:00Z', 'end_time': '2025-06-02T14:00:00Z'}
Execution of tool: get_container_logs with input: {'start_time': '2025-06-02T13:00:00Z', 'end_time': '2025-06-02T14:00:00Z'}


Result: Now let me filter through the logs to identify only WARN and ERROR entries relevant to Job J_42:

## Investigation Results for Job J_42

### Job J_42 Timeline (WARN/ERROR only):

**Delivery Logs:**
- **13:30:00Z [ERROR]** - **Job J_42 FAILED: BATTERY_CRITICAL**

This is the immediate cause - the job failed due to a critical battery condition.

### Root Cause Analysis:

Looking at the container infrastructure logs (WARN/ERROR only) around the time of failure:

**Battery Management System Issues:**
- **13:20:00Z [ERROR]** - `battery-mgmt-7a3f`: "battery voltage sensor malfunction - reporting incorrect readings across fleet"
- **13:30:36Z [ERROR]** - `battery-mgmt-5x4w`: "battery voltage sensor malfunction - reporting incorrect readings across fleet"
- **13:41:24Z [ERROR]** - `battery-mgmt-12ii`: "battery voltage sensor malfunction - reporting incorrect readings across fleet"

**Other Context Around the Timeframe:**
- **13:09:36Z [WARN]** - `navigation-service-6hid`: "GPS signal d

Great! Claude was able to use the available tools to identify potential system-wide cascading failures. However, it used a lot of tokens to accomplish this task: Claude had to ingest all the log data returned by the tools in order to isolate the errors.

Let's see if we can use PTC to improve performance.

To enable PTC on tools, we must first add the `allowed_callers` field to any tool that should be callable via code execution.

**Key points to consider**

- Tools without `allowed_callers` default to model-only invocation
- Tools can be invoked by both the model AND code execution by including multiple callers: `["direct", "code_execution_20250825"]`
- Only opt in tools that are safe for programmatic/repeated execution.
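One way to apply the last point is to opt in tools by name rather than blanket-enabling all of them. The helper below is a sketch under that assumption; `opt_in_tools`, the `keep_direct` flag, and the `restart_pod` tool are hypothetical names used only for illustration.

```python
import copy

CODE_EXECUTION_CALLER = "code_execution_20250825"


def opt_in_tools(tools: list[dict], safe_names: set[str], keep_direct: bool = False) -> list[dict]:
    """Return copies of `tools`, adding allowed_callers only to tools named in `safe_names`."""
    callers = ["direct", CODE_EXECUTION_CALLER] if keep_direct else [CODE_EXECUTION_CALLER]
    configured = []
    for tool in tools:
        tool_copy = copy.deepcopy(tool)  # leave the caller's definitions untouched
        if tool_copy["name"] in safe_names:
            tool_copy["allowed_callers"] = list(callers)
        configured.append(tool_copy)
    return configured


# Hypothetical tool stubs, reduced to the one field this helper reads.
sample_tools = [{"name": "get_delivery_logs"}, {"name": "restart_pod"}]
configured = opt_in_tools(sample_tools, safe_names={"get_delivery_logs"})
print(configured)
```

Here the read-only log tool becomes callable from code execution, while the hypothetical side-effecting `restart_pod` tool keeps the default model-only behavior.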


In [None]:
from anthropic.types.beta import BetaToolParam
import copy

# Configure tools for PTC by copying the existing tools and adding PTC-specific fields
ptc_tools: list[BetaToolParam] = []

for tool in tools:
    # Deep copy the tool to avoid modifying the original
    ptc_tool: BetaToolParam = copy.deepcopy(tool) # type: ignore

    # Add PTC-specific fields
    ptc_tool["allowed_callers"] = ["code_execution_20250825"]  # type: ignore

    ptc_tools.append(ptc_tool)

# Add the code execution tool
ptc_tools.append(
    {
        "type": "code_execution_20250825",  # type: ignore
        "name": "code_execution",
    }
)

Now that we've updated our tool definitions to allow programmatic tool calling, we can run our agent with PTC. This requires a few changes to our function, starting with a switch to the `beta` messages API:

1. We add the `code-execution-with-tools-2025-09-08` and `code-execution-2025-08-25` beta flags to `betas`.
2. We pass the `container_id` with each request once one has been returned. This is only necessary for stateful workflows like ours; single-turn workflows do not need it.
3. We check the `caller` field in the `tool_use` block to determine whether a tool call came from a direct model invocation or from programmatic invocation.

Note that in either case we send our tool results via the Claude API. However, only results for `direct` invocations are "seen" by the model; results for `code_execution_20250825` invocations are seen only by the code execution container.
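The caller check can be sketched in isolation as follows. This example uses plain dictionaries shaped like `tool_use` blocks rather than SDK objects, and it treats a missing `caller` field as a direct invocation; both are assumptions made for illustration.

```python
def caller_type(block: dict) -> str:
    """Return the caller type of a dict-shaped tool_use block.

    Assumption: a block without a caller field was invoked directly by the model.
    """
    caller = block.get("caller")
    return caller["type"] if caller else "direct"


def group_by_caller(blocks: list[dict]) -> dict[str, list[str]]:
    """Map each caller type to the names of the tools it invoked, skipping non-tool blocks."""
    grouped: dict[str, list[str]] = {}
    for block in blocks:
        if block.get("type") != "tool_use":
            continue
        grouped.setdefault(caller_type(block), []).append(block["name"])
    return grouped


# Hand-written sample content, not real API output.
sample_blocks = [
    {"type": "tool_use", "name": "get_delivery_logs", "caller": {"type": "code_execution_20250825"}},
    {"type": "tool_use", "name": "check_pod_health", "caller": {"type": "direct"}},
    {"type": "text", "text": "Investigating..."},
]
grouped = group_by_caller(sample_blocks)
print(grouped)
```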

In [None]:
from anthropic.types.beta import (
    BetaToolUseBlock,
    BetaTextBlock,
)

def run_agent_with_ptc(user_message):
    """Run agent using PTC"""
    # Start each run with a fresh conversation so reruns don't accumulate history
    messages = [{"role": "user", "content": user_message}]
    total_tokens = 0
    start_time = time.time()
    container_id = None
    api_counter = 0

    while True:
        # Build request with PTC beta headers
        request_params = {
            "model": MODEL,
            "max_tokens": 4000,
            "tools": ptc_tools,
            "messages": messages,
        }

        response = client.beta.messages.create(
            **request_params,
            betas=[
                "code-execution-with-tools-2025-09-08",
                "code-execution-2025-08-25",
            ],
            extra_body={"container": container_id} if container_id else None,
        )
        viz.capture(response)
        api_counter += 1


        # Track container for stateful execution
        if hasattr(response, "container") and response.container:
            container_id = response.container.id
            print(f"\n[Container] ID: {container_id}")
            if hasattr(response.container, "expires_at"):
                # If the container has expired, we would need to restart our workflow. In our case, it completes before expiration.
                print(f"[Container] Expires at: {response.container.expires_at}")

        # Track token usage
        total_tokens += response.usage.input_tokens + response.usage.output_tokens

        if response.stop_reason == "end_turn":
            # Extract the first text block from the response
            final_response = next(
                (
                    block.text
                    for block in response.content
                    if isinstance(block, BetaTextBlock)
                ),
                None,
            )
            elapsed_time = time.time() - start_time
            return final_response, messages, total_tokens, elapsed_time, api_counter

        # As before, we process tool calls
        if response.stop_reason == "tool_use":
            # First, add the assistant's response to messages
            messages.append({"role": "assistant", "content": response.content})

            # Collect all tool results
            tool_results = []

            for block in response.content:
                if isinstance(block, BetaToolUseBlock):
                    tool_name = block.name
                    tool_input = block.input
                    tool_use_id = block.id

                    # We can use caller type to understand how the tool was invoked
                    caller_type = block.caller['type']  # type: ignore

                    if caller_type == "code_execution_20250825":
                        print(f"[PTC] Tool called from code execution environment: {tool_name}")

                    elif caller_type == "direct":
                        print(f"[Direct] Tool called by model: {tool_name}")

                    result = tool_functions[tool_name](**tool_input)

                    # Format result as proper content for the API
                    if isinstance(result, list) and result and isinstance(result[0], str):
                        content = "\n".join(result)
                    elif isinstance(result, (dict, list)):
                        content = json.dumps(result)
                    else:
                        content = str(result)

                    tool_results.append(
                        {
                            "type": "tool_result",
                            "tool_use_id": tool_use_id,
                            "content": content,
                        }
                    )

            messages.append({"role": "user", "content": tool_results})

        else:
            print(f"\nUnexpected stop reason: {response.stop_reason}")
            elapsed_time = time.time() - start_time

            final_response = next(
                (
                    block.text
                    for block in response.content
                    if isinstance(block, BetaTextBlock)
                ),
                f"Stopped with reason: {response.stop_reason}",
            )
            return final_response, messages, total_tokens, elapsed_time, api_counter

In [23]:
# Run the PTC agent
result_ptc, conversation_ptc, total_tokens_ptc, elapsed_time_ptc, api_count_with_ptc = run_agent_with_ptc(
    query
)

In [None]:
print(f"\n{'=' * 60}")
print(f"Result: {result_ptc}")
print(f"\n{'=' * 60}")
print("Performance Metrics:")
# Use the counter from the run; the final response is never appended to the conversation
print(f"  Total API calls to Claude: {api_count_with_ptc}")
print(f"  Total tokens used: {total_tokens_ptc:,}")
print(f"  Total time taken: {elapsed_time_ptc:.2f}s")


Result: ## Investigation Summary

### What Happened to Job J_42

**Job J_42 failed at 13:30:00 UTC (1:30 PM) on 2025-06-02 with error: BATTERY_CRITICAL**

### Root Cause Analysis

The root cause was a **systemic battery management system failure**, not an actual battery problem:

1. **Battery Management System Malfunction**: The container logs reveal multiple ERROR messages from battery-mgmt pods reporting "battery voltage sensor malfunction - reporting incorrect readings across fleet" starting at 13:20:00 and continuing through 13:41:24.

2. **False Battery Alerts**: This sensor malfunction caused the system to incorrectly report batteries as critically low, leading to:
   - Multiple jobs failing with BATTERY_CRITICAL errors (11+ failures during the time period)
   - Job J_42 specifically failed at 13:30:00 due to this false battery reading

3. **System Health Status**: The battery-mgmt pod health check confirms the issue:
   - **Status**: Unhealthy
   - **40% of battery management p

## Performance Comparison

Let's compare the performance between traditional tool calling and PTC:

In [None]:
import pandas as pd

# Create comparison dataframe
comparison_data = {
    "Metric": [
        "API Calls",
        "Total Tokens",
        "Token Reduction",
    ],
    "Traditional": [
        api_count_without_ptc,
        f"{total_tokens:,}",
        "-",
    ],
    "PTC": [
        api_count_with_ptc,
        f"{total_tokens_ptc:,}",
        f"{((total_tokens - total_tokens_ptc) / total_tokens * 100):.1f}%",
    ],
}

df = pd.DataFrame(comparison_data)
print(df.to_string(index=False))

         Metric Traditional    PTC
      API Calls           2      4
   Total Tokens      38,754 21,976
Token Reduction           -  43.3%


## Key Takeaways

PTC demonstrated one of the core capabilities in this workflow:

### Large Data Parsing Without Context Pollution
Claude wrote code to filter through hundreds of log entries and extract only the relevant ERROR-level messages, avoiding the need to send all logs back through the model's context window.

Claude did not use the following two patterns in this run, but PTC also makes them possible:

### Advanced Tool Use Patterns
Code running in the execution environment can implement conditional logic: for example, first checking delivery logs to identify the failure type (BATTERY_CRITICAL), then deciding to query the battery management system logs specifically.

### Reduced Latency
When several tool calls run inside a single code execution, no round-trip to the model is needed between invocations, which can reduce end-to-end latency. That did not happen here: the PTC run made 4 API calls against 2 for the baseline.

## When to Use PTC

PTC is most beneficial when:
- Working with large datasets that need filtering or parsing
- Multiple tool calls are needed in sequence
- Implementing conditional logic based on intermediate results
- Tools are safe for programmatic/repeated execution

## Next Steps

Try adapting this pattern to your own use cases:
- Log analysis and debugging workflows
- Large file processing (CSV, JSON, XML)
- Database query result filtering
- Multi-step API orchestration
- Health checks with early termination
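As a starting point for the last item, the sketch below shows the shape of code that could run inside the execution environment: check pod types in order and stop at the first unhealthy one. The stub, its return shape, and the `telemetry` and `dispatch` pod types are hypothetical; the real `check_pod_health` in api.py may behave differently.

```python
import time


# Hypothetical stand-in for the slow check_pod_health tool.
def check_pod_health_stub(pod_type: str) -> dict:
    time.sleep(0.01)  # simulate the per-call delay
    unhealthy = {"battery-mgmt"}
    status = "unhealthy" if pod_type in unhealthy else "healthy"
    return {"pod_type": pod_type, "status": status}


def first_unhealthy(pod_types, check):
    """Check pod types in order; return the first unhealthy report and the types checked so far."""
    checked = []
    for pod_type in pod_types:
        report = check(pod_type)
        checked.append(pod_type)
        if report["status"] != "healthy":
            return report, checked  # stop early, skipping the remaining slow calls
    return None, checked


report, checked = first_unhealthy(
    ["navigation-service", "battery-mgmt", "telemetry", "dispatch"],
    check_pod_health_stub,
)
print(report, checked)
```

Because the loop runs in code, the remaining checks are skipped without the model having to decide after each call.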