# Programmatic Tool Calling (PTC) with the Claude API

Programmatic Tool Calling (PTC) allows Claude to write code that calls tools programmatically within the Code Execution environment, rather than requiring round-trips through the model for each tool invocation. This substantially reduces end-to-end latency for multiple tool calls, and can dramatically reduce token consumption by allowing the model to write code that removes irrelevant context before it hits the model’s context window (for example, by grepping for key information within large and noisy files).

When you work with third-party APIs and tools that you cannot modify directly, PTC can still reduce context usage: Claude writes code that calls those tools in the Code Execution environment and passes only the relevant results back to the model.

In this cookbook, we will work with a mock API for team expense management. The API is designed to require multiple invocations and to return large results, which helps illustrate the benefits of Programmatic Tool Calling.

## By the end of this cookbook, you'll be able to:

- Understand the difference between regular tool calling and programmatic tool calling (PTC)
- Write agents that leverage PTC 


## Prerequisites

Before following this guide, ensure you have:

**Required Knowledge**

- Python fundamentals - comfortable with async/await, functions, and basic data structures
- Basic understanding of agentic patterns and tool calling

**Required Tools**

- Python 3.11 or higher
- Anthropic API key


## Setup

First, install the required dependencies:

In [1]:
# %pip install -qU anthropic python-dotenv pandas

Note: Ensure your .env file contains:

`ANTHROPIC_API_KEY=your_key_here`

Load your environment variables and configure the client. We also load a helper utility to visualize Claude message responses.


In [2]:
from dotenv import load_dotenv
from utils.visualize import visualize

load_dotenv()

MODEL = "claude-sonnet-4-5"

viz = visualize(auto_show=True)

## Understanding the Third-Party API

In [utils/team_expense_api.py](utils/team_expense_api.py), there are three functions defined: `get_team_members`, `get_expenses`, and `get_budget_by_level`. The `get_team_members` function allows us to retrieve all employees in a given department with their role, level, and contact information. The `get_expenses` function returns all expense line items for an employee in a specific quarter; this can be dozens of records per employee (the tool description below cites 20-50+), which adds up to hundreds across a team. The `get_budget_by_level` function provides quarterly budget limits for different expense categories based on employee level.

These functions are decorated with `@beta_tool`, which automatically generates the tool schema definitions including descriptions and parameters. We'll import these tools directly in this notebook. The descriptions that Claude receives for these tools are crucial in helping it understand both when and how to use them effectively.
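
To make the idea of generating a schema from a function more concrete, here is a minimal standard-library sketch. It is not the SDK's implementation: `sketch_tool_schema` and `example_budget_lookup` are hypothetical names used only for illustration, and the type mapping is deliberately simplistic.

```python
import inspect

# Hypothetical mapping for illustration; a real decorator handles many more types.
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def sketch_tool_schema(func):
    """Build a tool definition dict from a function's signature and docstring."""
    properties = {}
    required = []
    for name, param in inspect.signature(func).parameters.items():
        # Fall back to "string" when the annotation is missing or unknown
        properties[name] = {"type": _JSON_TYPES.get(param.annotation, "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "input_schema": {
            "type": "object",
            "properties": properties,
            "required": required,
        },
    }


def example_budget_lookup(level: str) -> dict:
    """Returns budget limits for a given employee level."""
    return {}


schema = sketch_tool_schema(example_budget_lookup)
print(schema["name"], schema["input_schema"]["required"])
```

The docstring becomes the tool description, which is why the wording of the docstring matters as much as the code itself.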

In this scenario, we need to analyze team expenses and identify which employees have exceeded their budgets. Traditionally, we might manually pull expense reports for each person, sum up their expenses by category, compare against budget limits, and compile a report. Instead, we will ask Claude to perform this analysis for us, using the available tools to retrieve team data, fetch potentially hundreds of expense line items, and determine who has gone over budget.

The key challenge here is that each employee may have dozens of expense line items, and hundreds accumulate across the team, all of which need to be fetched, parsed, and aggregated. This makes it an ideal use case for demonstrating the benefits of Programmatic Tool Calling.
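
A rough way to see why this matters is to compare the size of a raw tool payload with the size of the summary that code could produce from it. The sketch below uses hypothetical mock records shaped like the fields named in the `get_expenses` description; the values are random and do not come from the real mock API.

```python
import json
import random

random.seed(0)  # deterministic mock data


def mock_expenses(n):
    """Generate n hypothetical expense records (illustrative values only)."""
    categories = ["travel", "lodging", "meals", "software"]
    statuses = ["approved", "pending", "rejected"]
    return [
        {
            "date": f"2025-07-{(i % 28) + 1:02d}",
            "category": random.choice(categories),
            "description": f"Expense item {i}",
            "amount": round(random.uniform(10, 500), 2),
            "currency": "USD",
            "status": random.choice(statuses),
        }
        for i in range(n)
    ]


records = mock_expenses(50)

# Traditional tool calling: the whole JSON payload lands in the model's context
raw_payload = json.dumps(records)

# PTC-style: code reduces the records to the single figure the question needs
approved_travel = sum(
    r["amount"]
    for r in records
    if r["status"] == "approved" and r["category"] in ("travel", "lodging")
)
summary = json.dumps({"approved_travel_total": round(approved_travel, 2)})

print(f"raw payload: {len(raw_payload)} chars, summary: {len(summary)} chars")
```

Character counts are only a proxy for tokens, but the gap between the two strings gives a feel for how much context the summary avoids.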

We'll pass our tool definitions to the messages API and ask Claude to perform the analysis. Read the docs on [implementing tool use](https://docs.claude.com/en/docs/agents-and-tools/tool-use/implement-tool-use) if you are not familiar with how tool use works with Claude's API.

In [3]:
import json

import anthropic
from utils.team_expense_api import get_budget_by_level, get_expenses, get_team_members

client = anthropic.Anthropic()

# Tool definitions for the team expense API
tools = [
    {
        "name": "get_team_members",
        "description": "Returns a list of team members for a given department. Each team member includes their ID, name, role, level (junior, mid, senior, staff, principal), and contact information. Use this to get a list of people whose expenses you want to analyze. Available departments are: engineering, sales, and marketing.",
        "input_schema": {
            "type": "object",
            "properties": {
                "department": {
                    "type": "string",
                    "description": "The department name (e.g., 'engineering', 'sales', 'marketing'). Case-insensitive.",
                }
            },
            "required": ["department"],
        },
    },
    {
        "name": "get_expenses",
        "description": "Returns all expense line items for a given employee in a specific quarter. Each expense includes date, category, description, amount (in USD), currency, and status (approved, pending, rejected). An employee may have 20-50+ expense line items per quarter. Categories include: 'travel' (flights, trains, rental cars, taxis, parking), 'lodging' (hotels, airbnb), 'meals', 'software', 'equipment', 'conference', 'office', and 'internet'. IMPORTANT: Only expenses with status='approved' should be counted toward budget limits.",
        "input_schema": {
            "type": "object",
            "properties": {
                "employee_id": {
                    "type": "string",
                    "description": "The unique employee identifier (e.g., 'ENG001', 'SAL002', 'MKT001')",
                },
                "quarter": {
                    "type": "string",
                    "description": "Quarter identifier: 'Q1', 'Q2', 'Q3', or 'Q4'",
                },
            },
            "required": ["employee_id", "quarter"],
        },
    },
    {
        "name": "get_budget_by_level",
        "description": "Returns budget limits for a given employee level. Returns a JSON object with quarterly budget limits (in USD). IMPORTANT MAPPINGS: 'travel_limit' covers both 'travel' AND 'lodging' expense categories. 'meals_limit' covers 'meals' expenses. 'equipment_limit' covers 'equipment' expenses. 'software_limit' covers 'software' expenses. 'conference_limit' covers 'conference' expenses. Other categories ('office', 'internet') count toward 'total_limit' but don't have specific category limits. Available levels: junior, mid, senior, staff, principal.",
        "input_schema": {
            "type": "object",
            "properties": {
                "level": {
                    "type": "string",
                    "description": "Employee level: 'junior', 'mid', 'senior', 'staff', or 'principal'",
                }
            },
            "required": ["level"],
        },
    },
]

tool_functions = {
    "get_team_members": get_team_members,
    "get_expenses": get_expenses,
    "get_budget_by_level": get_budget_by_level,
}

## Traditional Tool Calling (Baseline)

In this first example, we'll use traditional tool calling to establish our baseline.

We'll call the `messages.create` API with our initial query. When the model stops with a `tool_use` reason, we will execute the tool as requested, and then add the output from the tool to the messages and call the model again.

In [4]:
import time

from anthropic.types import MessageParam, TextBlock, ToolUseBlock


def run_agent_without_ptc(user_message):
    """Run agent using traditional tool calling"""
    messages: list[MessageParam] = [{"role": "user", "content": user_message}]
    total_tokens = 0
    start_time = time.time()
    api_counter = 0

    while True:
        response = client.messages.create(
            model=MODEL,
            max_tokens=4000,
            tools=tools,
            messages=messages,
        )

        api_counter += 1

        # Track token usage
        total_tokens += response.usage.input_tokens + response.usage.output_tokens
        viz.capture(response)
        if response.stop_reason == "end_turn":
            # Extract the first text block from the response
            final_response = next(
                (block.text for block in response.content if isinstance(block, TextBlock)),
                None,
            )
            elapsed_time = time.time() - start_time
            return final_response, messages, total_tokens, elapsed_time, api_counter

        # Process tool calls
        if response.stop_reason == "tool_use":
            # First, add the assistant's response to messages
            messages.append({"role": "assistant", "content": response.content})

            # Collect all tool results
            tool_results = []

            for block in response.content:
                if isinstance(block, ToolUseBlock):
                    tool_name = block.name
                    tool_input = block.input
                    tool_use_id = block.id

                    result = tool_functions[tool_name](**tool_input)

                    # Format result as proper content for the API
                    if isinstance(result, list) and result and isinstance(result[0], str):
                        content = "\n".join(result)
                    elif isinstance(result, (dict, list)):
                        content = json.dumps(result)
                    else:
                        content = str(result)

                    tool_results.append(
                        {
                            "type": "tool_result",
                            "tool_use_id": tool_use_id,
                            "content": content,
                        }
                    )

            messages.append({"role": "user", "content": tool_results})

        else:
            print(f"\nUnexpected stop reason: {response.stop_reason}")
            elapsed_time = time.time() - start_time

            final_response = next(
                (block.text for block in response.content if isinstance(block, TextBlock)),
                f"Stopped with reason: {response.stop_reason}",
            )
            return final_response, messages, total_tokens, elapsed_time, api_counter
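
The result-formatting branch inside the loop above is a natural candidate for a small helper, since the same logic is needed wherever tool results are sent back. A minimal sketch follows; `format_tool_result` is a hypothetical name, not part of the SDK.

```python
import json


def format_tool_result(result):
    """Convert a tool's Python return value into string content for a tool_result block."""
    # A non-empty list of strings is joined line by line
    if isinstance(result, list) and result and isinstance(result[0], str):
        return "\n".join(result)
    # Dicts and other lists are serialized as JSON
    if isinstance(result, (dict, list)):
        return json.dumps(result)
    # Anything else falls back to its string representation
    return str(result)


print(format_tool_result({"travel_limit": 2000}))
```

With such a helper, the body of the loop shrinks to a single `content = format_tool_result(result)` call.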

Our initial query gives the model explicit instructions: which team to look at, which quarter to fetch, and which budget limit to check. For brevity, we keep it to a single department and quarter. For deeper investigations, the model may wish to look into multiple systems or time spans.

In [5]:
query = "Get the engineering team members, fetch their Q3 expenses, and identify who exceeded their travel budget (travel_limit). For each person who exceeded, show their name, level, total travel+lodging spending, and their travel budget limit."

In [6]:
# Run the agent
result, conversation, total_tokens, elapsed_time, api_count_without_ptc = run_agent_without_ptc(
    query
)

print(f"Result: {result}")
print(f"API calls made: {api_count_without_ptc}")
print(f"Total tokens used: {total_tokens:,}")
print(f"Total time taken: {elapsed_time:.2f}s")

Result: Now let me analyze the data to identify who exceeded their travel budget. I'll calculate the total approved travel + lodging expenses for each person and compare them to their travel_limit.

**Analysis Results:**

Based on the Q3 expenses for the engineering team, here are the employees who exceeded their travel budget (travel_limit covers both travel and lodging expenses):

---

### **Employees Who Exceeded Their Travel Budget:**

1. **Alice Chen** (Senior Software Engineer)
   - **Level:** Senior
   - **Travel Budget Limit:** $6,000
   - **Total Travel + Lodging Spending:** $3,698.52 (Travel: $2,431.30 + Lodging: $2,488.48 - some lodging rejected = $2,946.38)
   
   Let me recalculate this properly:
   - Travel (approved): $204.58 + $244.19 + $44.02 + $71.80 + $359.38 + $106.74 + $51.86 + $1,348.73 = $2,431.30
   - Lodging (approved): $152.60 + $551.58 + $268.13 + $676.53 = $1,648.84
   - **Total: $4,080.14** (Under budget ✓)

2. **Bob Martinez** (Staff Engineer)
   - **Level

We can see that Claude was able to call the available tools and work through the expense data to look for team members who exceeded their travel budgets. However, we can also see that we used a lot of tokens to accomplish this task. Claude had to ingest all the expense line items through its context window (dozens of records per employee, hundreds in total) in order to parse them, sum up the totals by category, and compare against budget limits.

With traditional tool calling, every single expense record flows through the model's context, significantly increasing token consumption. Let's see if we can use PTC to improve performance by allowing Claude to write code that processes these large datasets in the code execution environment instead.

## Programmatic Tool Calling

To enable PTC on tools, we must first add the `allowed_callers` field to any tool that should be callable via code execution.

**Key points to consider**

- Tools without `allowed_callers` default to model-only invocation
- Tools can be invoked by both the model AND code execution by including multiple callers: `["direct", "code_execution_20250825"]`
- Only opt in tools that are safe for programmatic/repeated execution.


In [7]:
import copy


ptc_tools = copy.deepcopy(tools)
for tool in ptc_tools:
    tool["allowed_callers"] = ["code_execution_20250825"]  # type: ignore


# Add the code execution tool
ptc_tools.append(
    {
        "type": "code_execution_20250825",  # type: ignore
        "name": "code_execution",
    }
)
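
If some tools should remain callable by the model directly as well as from code, the same pattern extends to multiple callers. The sketch below assumes a tool list shaped like the one defined earlier; `example_tools` and `with_allowed_callers` are hypothetical names used for illustration.

```python
import copy

# Hypothetical stand-in for one entry of the tools list
example_tools = [
    {
        "name": "get_budget_by_level",
        "description": "Returns budget limits for a given employee level.",
        "input_schema": {
            "type": "object",
            "properties": {"level": {"type": "string"}},
            "required": ["level"],
        },
    }
]


def with_allowed_callers(tool_defs, callers):
    """Return a deep copy of tool_defs with allowed_callers set on every tool."""
    updated = copy.deepcopy(tool_defs)
    for tool in updated:
        tool["allowed_callers"] = list(callers)
    return updated


# Allow both direct model invocation and invocation from code execution
dual_tools = with_allowed_callers(example_tools, ["direct", "code_execution_20250825"])
print(dual_tools[0]["allowed_callers"])
```

Copying before mutating keeps the original definitions usable for the traditional baseline.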

Now that we've updated our tool definitions to allow programmatic tool calling, we can run our agent with PTC. Doing so requires a few changes to our function, which now uses the `beta` messages API:

1. We've added the `code-execution-with-tools-2025-09-08` and `code-execution-2025-08-25` flags to `betas`.
2. We pass the `container_id` with each request once it is defined. This is only necessary for stateful workflows like ours; single-turn workflows do not need it.
3. We can check the `caller` field in the `tool_use` block to determine whether the tool call came from a direct model invocation or from programmatic invocation.

Note that in either case we send our tool results via the Claude API. However, only `direct` invocations will be "seen" by the model; results for `code_execution_20250825` callers will only be seen by the code execution container.
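
To illustrate the distinction, the sketch below separates tool calls by caller type. It uses plain dicts as stand-ins for `tool_use` blocks; the exact shape of the SDK objects is an assumption here, as is treating a missing `caller` as a direct invocation. `split_by_caller` is a hypothetical helper.

```python
# Plain dicts standing in for response content blocks (assumed shape)
example_blocks = [
    {"type": "tool_use", "id": "toolu_a", "name": "get_team_members",
     "caller": {"type": "direct"}},
    {"type": "tool_use", "id": "toolu_b", "name": "get_expenses",
     "caller": {"type": "code_execution_20250825"}},
    {"type": "text", "text": "Fetching expenses..."},
]


def split_by_caller(blocks):
    """Split tool_use blocks into (direct, programmatic) lists of tool names."""
    direct, programmatic = [], []
    for block in blocks:
        if block.get("type") != "tool_use":
            continue  # skip text and other block types
        # Assumption: a block without caller info is a direct model invocation
        caller_type = (block.get("caller") or {}).get("type", "direct")
        if caller_type == "direct":
            direct.append(block["name"])
        else:
            programmatic.append(block["name"])
    return direct, programmatic


direct_calls, ptc_calls = split_by_caller(example_blocks)
print(direct_calls, ptc_calls)
```

A split like this can be handy for logging how much of a workflow runs inside the container versus through the model.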

In [8]:
from anthropic.types.beta import (
    BetaTextBlock,
    BetaToolUseBlock,
)


def run_agent_with_ptc(user_message):
    """Run agent using PTC"""
    # Keep the history local so each run starts from a fresh conversation
    messages = [{"role": "user", "content": user_message}]
    total_tokens = 0
    start_time = time.time()
    container_id = None
    api_counter = 0

    while True:
        # Build request with PTC beta headers
        request_params = {
            "model": MODEL,
            "max_tokens": 4000,
            "tools": ptc_tools,
            "messages": messages,
        }

        response = client.beta.messages.create(
            **request_params,
            betas=["code-execution-with-tools-2025-09-08", "code-execution-2025-08-25"],
            extra_body={"container": container_id} if container_id else None,
        )
        viz.capture(response)
        api_counter += 1

        # Track container for stateful execution
        if hasattr(response, "container") and response.container:
            container_id = response.container.id
            print(f"\n[Container] ID: {container_id}")
            if hasattr(response.container, "expires_at"):
                # If the container has expired, we would need to restart our workflow. In our case, it completes before expiration.
                print(f"[Container] Expires at: {response.container.expires_at}")

        # Track token usage
        total_tokens += response.usage.input_tokens + response.usage.output_tokens

        if response.stop_reason == "end_turn":
            # Extract the first text block from the response
            final_response = next(
                (block.text for block in response.content if isinstance(block, BetaTextBlock)),
                None,
            )
            elapsed_time = time.time() - start_time
            return final_response, messages, total_tokens, elapsed_time, api_counter

        # As before, we process tool calls
        if response.stop_reason == "tool_use":
            # First, add the assistant's response to messages
            messages.append({"role": "assistant", "content": response.content})

            # Collect all tool results
            tool_results = []

            for block in response.content:
                if isinstance(block, BetaToolUseBlock):
                    tool_name = block.name
                    tool_input = block.input
                    tool_use_id = block.id

                    # We can use caller type to understand how the tool was invoked
                    caller_type = block.caller["type"]  # type: ignore

                    if caller_type == "code_execution_20250825":
                        print(f"[PTC] Tool called from code execution environment: {tool_name}")

                    elif caller_type == "direct":
                        print(f"[Direct] Tool called by model: {tool_name}")

                    result = tool_functions[tool_name](**tool_input)

                    # Format result as proper content for the API
                    if isinstance(result, list) and result and isinstance(result[0], str):
                        content = "\n".join(result)
                    elif isinstance(result, (dict, list)):
                        content = json.dumps(result)
                    else:
                        content = str(result)

                    tool_results.append(
                        {
                            "type": "tool_result",
                            "tool_use_id": tool_use_id,
                            "content": content,
                        }
                    )

            messages.append({"role": "user", "content": tool_results})

        else:
            print(f"\nUnexpected stop reason: {response.stop_reason}")
            elapsed_time = time.time() - start_time

            final_response = next(
                (block.text for block in response.content if isinstance(block, BetaTextBlock)),
                f"Stopped with reason: {response.stop_reason}",
            )
            return final_response, messages, total_tokens, elapsed_time, api_counter
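
The comment in the loop above notes that an expired container would force a restart of the workflow. A minimal sketch of such a check follows, assuming `expires_at` is a timezone-aware `datetime` as the printed log output suggests; `container_expired` and its `margin_seconds` parameter are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def container_expired(expires_at, now=None, margin_seconds=0):
    """Return True if the expiry time has passed or falls within the safety margin."""
    now = now or datetime.now(timezone.utc)
    return now + timedelta(seconds=margin_seconds) >= expires_at


# Illustrative timestamps only
expires_at = datetime(2025, 11, 19, 20, 59, 13, tzinfo=timezone.utc)
before = datetime(2025, 11, 19, 20, 55, 0, tzinfo=timezone.utc)
after = datetime(2025, 11, 19, 21, 0, 0, tzinfo=timezone.utc)

print(container_expired(expires_at, now=before))  # still valid
print(container_expired(expires_at, now=after))   # expired
```

A safety margin lets a long-running agent decide to start over before a request fails mid-workflow.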

In [9]:
# Run the PTC agent
result_ptc, conversation_ptc, total_tokens_ptc, elapsed_time_ptc, api_count_with_ptc = (
    run_agent_with_ptc(query)
)


[Container] ID: container_011CVHuuWEtZJ5P1b1KmjecE
[Container] Expires at: 2025-11-19 20:59:13.145825+00:00
[PTC] Tool called from code execution environment: get_team_members



[Container] ID: container_011CVHuuWEtZJ5P1b1KmjecE
[Container] Expires at: 2025-11-19 20:59:28.092780+00:00
[PTC] Tool called from code execution environment: get_team_members



[Container] ID: container_011CVHuuWEtZJ5P1b1KmjecE
[Container] Expires at: 2025-11-19 20:59:30.208705+00:00
[PTC] Tool called from code execution environment: get_expenses
[PTC] Tool called from code execution environment: get_budget_by_level
[PTC] Tool called from code execution environment: get_expenses
[PTC] Tool called from code execution environment: get_budget_by_level
[PTC] Tool called from code execution environment: get_budget_by_level
[PTC] Tool called from code execution environment: get_budget_by_level
[PTC] Tool called from code execution environment: get_expenses
[PTC] Tool called from code execution environment: get_budget_by_level
[PTC] Tool called from code execution environment: get_expenses
[PTC] Tool called from code execution environment: get_expenses
[PTC] Tool called from code execution environment: get_expenses
[PTC] Tool called from code execution environment: get_expenses
[PTC] Tool called from code execution environment: get_expenses


In [10]:
print(f"\n{'=' * 60}")
print(f"Result: {result_ptc}")
print(f"\n{'=' * 60}")
print("Performance Metrics:")
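# Note: this counts assistant turns stored in the conversation. The final end_turn
# response is not appended to it, so the count is one lower than api_count_with_ptc.
print(
    f"  Total API calls to Claude: {len([m for m in conversation_ptc if m['role'] == 'assistant'])}"
)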
print(f"  Total tokens used: {total_tokens_ptc:,}")
print(f"  Total time taken: {elapsed_time_ptc:.2f}s")


Result: ## Summary

Two engineering team members exceeded their travel budget in Q3:

1. **Emma Johnson** (Junior Software Engineer)
   - Total Travel + Lodging Spending: **$4,686.37**
   - Travel Budget Limit: **$2,000.00**
   - Exceeded by: **$2,686.37** (134% over budget)

2. **Carol White** (Software Engineer - Mid-level)
   - Total Travel + Lodging Spending: **$4,239.18**
   - Travel Budget Limit: **$4,000.00**
   - Exceeded by: **$239.18** (6% over budget)

Emma Johnson significantly exceeded her junior-level travel budget, spending more than double her quarterly limit. Carol White's overage is relatively minor at about 6% over her mid-level limit.

Performance Metrics:
  Total API calls to Claude: 3
  Total tokens used: 14,767
  Total time taken: 32.33s


## Performance Comparison

Let's compare the performance between traditional tool calling and PTC:

In [11]:
import pandas as pd

# Create comparison dataframe
comparison_data = {
    "Metric": [
        "API Calls",
        "Total Tokens",
        "Elapsed Time (s)",
        "Token Reduction",
        "Time Reduction",
    ],
    "Traditional": [
        api_count_without_ptc,
        f"{total_tokens:,}",
        f"{elapsed_time:.2f}",
        "-",
        "-",
    ],
    "PTC": [
        api_count_with_ptc,
        f"{total_tokens_ptc:,}",
        f"{elapsed_time_ptc:.2f}",
        f"{((total_tokens - total_tokens_ptc) / total_tokens * 100):.1f}%",
        f"{((elapsed_time - elapsed_time_ptc) / elapsed_time * 100):.1f}%",
    ],
}

df = pd.DataFrame(comparison_data)
print(df.to_string(index=False))

          Metric Traditional    PTC
       API Calls           3      4
    Total Tokens      33,471 14,767
Elapsed Time (s)       39.36  32.33
 Token Reduction           -  55.9%
  Time Reduction           -  17.9%
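
The two reduction figures in the table come from the same formula, `(baseline - improved) / baseline * 100`. As a quick check, the sketch below recomputes them from the values shown above; `percent_reduction` is a hypothetical helper name.

```python
def percent_reduction(baseline, improved):
    """Percentage by which `improved` is lower than `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - improved) / baseline * 100


# Values taken from the comparison table
token_reduction = percent_reduction(33_471, 14_767)
time_reduction = percent_reduction(39.36, 32.33)

print(f"Token reduction: {token_reduction:.1f}%")  # 55.9%
print(f"Time reduction: {time_reduction:.1f}%")    # 17.9%
```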


## Key Takeaways

In this example, PTC demonstrated significant performance improvements through three core capabilities:

### 1. Context Preservation Through Large Data Parsing
This was the primary benefit demonstrated in our workflow. Claude wrote code to fetch and process hundreds of expense line items (20-50+ per employee across 8 team members) within the code execution environment, parsing JSON, filtering by status, summing amounts by category, and comparing against budget limits—all without sending the raw expense data through the model's context window. This resulted in a **55.9% reduction in token usage**.

### 2. Programmatic Tool Orchestration
Claude wrote code that could call multiple tools programmatically within a loop, processing all team members systematically. While we saw 4 API calls with PTC versus 3 with traditional tool calling, the token efficiency gains far outweighed the minimal increase in round trips.

### 3. Computational Logic in Code Execution
Rather than requiring the model to mentally track and sum dozens of expenses, Claude delegated the arithmetic and aggregation logic to Python code. This reduced cognitive load on the model and ensured precise calculations.
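
The kind of aggregation logic described here might look like the sketch below. It is an illustration of the technique rather than the code Claude actually wrote: it applies the rules from the tool descriptions (only approved expenses count, and `travel_limit` covers both `travel` and `lodging`) to hypothetical sample records.

```python
TRAVEL_CATEGORIES = {"travel", "lodging"}  # both count toward travel_limit


def approved_travel_total(expenses):
    """Sum approved travel and lodging amounts."""
    return sum(
        e["amount"]
        for e in expenses
        if e["status"] == "approved" and e["category"] in TRAVEL_CATEGORIES
    )


def travel_overage(expenses, travel_limit):
    """Return the amount over the travel limit, or 0.0 if within budget."""
    total = approved_travel_total(expenses)
    return round(max(0.0, total - travel_limit), 2)


# Hypothetical sample records for illustration
sample = [
    {"category": "travel", "amount": 1500.00, "status": "approved"},
    {"category": "lodging", "amount": 900.00, "status": "approved"},
    {"category": "lodging", "amount": 400.00, "status": "rejected"},
    {"category": "meals", "amount": 120.00, "status": "approved"},
]

print(approved_travel_total(sample))      # 2400.0
print(travel_overage(sample, 2000.00))    # 400.0
```

Only the final figures need to reach the model; the individual records stay inside the execution environment.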

--- 

## When to Use PTC

PTC is most beneficial when:

- **Working with large datasets** that need filtering, parsing, or aggregation (like our expense analysis)
- **Multiple tool calls are needed** in sequence or in loops across similar entities
- **Computational logic** can reduce what needs to flow through the model's context
- **Tools are safe** for programmatic/repeated execution without human oversight

## Conclusion

Our team expense analysis demonstrated PTC's primary strength: **dramatically reducing context consumption when working with large, structured datasets**. By allowing Claude to write code that orchestrates tool calls and processes results programmatically, we achieved a 55.9% reduction in token usage while maintaining accuracy and insight quality. This makes PTC particularly valuable for workflows involving bulk data processing, repeated tool invocations, or scenarios where raw tool outputs would otherwise pollute the model's context.

## Next Steps

Try adapting this pattern to your own use cases:
- Financial data analysis and reporting
- Multi-entity health checks or status monitoring  
- Large file processing (CSV, JSON, XML parsing)
- Database query result aggregation
- Batch API operations across multiple resources