# Tombstoning Tool Calls

This notebook demonstrates the concept of **tombstoning tool calls** to reduce token consumption in LLM applications.

**Concept**: When a tool call returns a very long output, we can replace that output with a short summary after its first use. In the example below, this cuts the token count of that output by more than 95%.

**Source**: Mentioned in a recent [Anthropic podcast on YouTube](https://www.youtube.com/watch?v=XuvKFsktX0Q).

**Approach Demonstrated**:
**LLM-Derived Tombstoning**: Using a smaller LLM to summarize tool outputs into 5 words or fewer

## Setup


In [None]:
%pip install openai tiktoken


In [2]:
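import tiktoken

# Older tiktoken releases may not recognize "gpt-5"; fall back to o200k_base (an assumption, so counts are estimates).
try:
    encoding = tiktoken.encoding_for_model("gpt-5")
except KeyError:
    encoding = tiktoken.get_encoding("o200k_base")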

def count_tokens(text):
    return len(encoding.encode(text))

## Initial Tool Call Example


In [3]:
from openai import OpenAI
from getpass import getpass
import json

# Prompt for API key securely
api_key = getpass("Enter your OpenAI API key: ")

# Initialize the OpenAI client
client = OpenAI(api_key=api_key)

In [4]:
# Define a tool that returns a very long response (simulating verbose tool outputs)
tools = [
    {
        "type": "function",
        "name": "get_customer_profile",
        "description": "Retrieve detailed customer profile and interaction history from the CRM system.",
        "parameters": {
            "type": "object",
            "properties": {
                "customer_id": {
                    "type": "string",
                    "description": "The unique identifier for the customer in the CRM system",
                },
            },
            "required": ["customer_id"],
        },
    },
]

def get_customer_profile(customer_id):
    return f"""CUSTOMER PROFILE - ID: {customer_id}

BASIC INFORMATION:
Name: Sarah Chen
Email: sarah.chen@techstartup.io
Phone: +1 (555) 234-9876
Company: TechStartup Inc.
Position: VP of Engineering
Industry: Software Development
Account Value: $485,000 ARR
Customer Since: March 15, 2023
Account Status: Active - Premium Tier

ENGAGEMENT HISTORY:
Sarah has been an exceptionally engaged customer with our platform, demonstrating consistent usage patterns and a deep understanding of our product capabilities. Over the past 18 months, she has participated in 12 webinars, attended 3 in-person conferences, and completed our advanced certification program. Her team of 45 engineers has achieved a 94% platform adoption rate, which is significantly above our customer average of 67%.

RECENT INTERACTIONS:
- Nov 8, 2024: Submitted feature request for enhanced API rate limiting controls
- Oct 22, 2024: Participated in Beta program for new analytics dashboard
- Oct 15, 2024: Attended Q3 Business Review meeting with our Customer Success team
- Sep 30, 2024: Raised support ticket #45892 regarding integration issues (resolved within 4 hours)
- Sep 12, 2024: Provided testimonial for case study on DevOps transformation

PURCHASE HISTORY:
The account has shown steady growth with strategic upsells aligned to their business needs. Initial purchase was our Professional plan at $15,000/month. Six months in, they upgraded to Enterprise at $28,000/month. Most recently, they added our Advanced Security module ($12,500/month) and AI-powered monitoring suite ($8,500/month). Contract renewal is scheduled for March 2025 with strong indicators for expansion into our new Data Governance offering.

SUPPORT METRICS:
Total Tickets: 23 (18 resolved, 5 ongoing)
Average Resolution Time: 6.2 hours
Customer Satisfaction Score: 9.2/10
Net Promoter Score: 9/10
Last Support Contact: 6 days ago

KEY OPPORTUNITIES:
Sarah mentioned in our last QBR that they're planning a major infrastructure modernization project in Q1 2025, which could be an excellent opportunity to introduce our Infrastructure Automation suite. Additionally, they're expanding their team by 30% next quarter, which aligns perfectly with our volume licensing incentives. She's also expressed interest in our upcoming Machine Learning Operations module during the beta program feedback session.

RISK FACTORS:
Minimal churn risk identified. The only concern noted was the recent integration issue, but it was resolved quickly and Sarah expressed satisfaction with our response time. Competitor analysis shows that two of their portfolio companies use alternative solutions, but Sarah has been vocal about the superior ROI they've experienced with our platform.

RELATIONSHIP STRENGTH:
Executive sponsorship is strong with quarterly touchpoints at the C-level. Sarah has introduced us to three other companies in her network, resulting in two new customers. She's scheduled to speak at our annual user conference next month, showcasing their success story on reducing deployment times by 75% using our platform."""


In [5]:
# Initialize conversation with a user query
messages = [
    {"role": "user", "content": "Can you pull up the profile for customer CUST-2847?"}
]

In [6]:
# Make API call with tool definitions
response = client.responses.create(
    model="gpt-5",
    tools=tools,
    input=messages
)

# Append the model's output items (including the function call) to the history
messages += response.output

# Execute tool calls and append results to messages
for item in response.output:
    if item.type == "function_call":
        if item.name == "get_customer_profile":
            args = json.loads(item.arguments)
            customer_data = get_customer_profile(args["customer_id"])
            
            messages.append({
                "type": "function_call_output",
                "call_id": item.call_id,
                "output": json.dumps({
                  "customer_profile": customer_data
                })
            })

## Analyze Original Tool Call

Save the original tool call output and measure its token count to establish a baseline.


In [8]:

original_tool_call = messages[-1]

In [9]:
original_tool_call

{'type': 'function_call_output',
 'call_id': 'call_pBFRjuJ4ngrCap0bZH4p2y30',
 'output': '{"customer_profile": "CUSTOMER PROFILE - ID: CUST-2847\\n\\nBASIC INFORMATION:\\nName: Sarah Chen\\nEmail: sarah.chen@techstartup.io\\nPhone: +1 (555) 234-9876\\nCompany: TechStartup Inc.\\nPosition: VP of Engineering\\nIndustry: Software Development\\nAccount Value: $485,000 ARR\\nCustomer Since: March 15, 2023\\nAccount Status: Active - Premium Tier\\n\\nENGAGEMENT HISTORY:\\nSarah has been an exceptionally engaged customer with our platform, demonstrating consistent usage patterns and a deep understanding of our product capabilities. Over the past 18 months, she has participated in 12 webinars, attended 3 in-person conferences, and completed our advanced certification program. Her team of 45 engineers has achieved a 94% platform adoption rate, which is significantly above our customer average of 67%.\\n\\nRECENT INTERACTIONS:\\n- Nov 8, 2024: Submitted feature request for enhanced API rate limi

In [10]:
original_tool_call_tokens = count_tokens(original_tool_call["output"])

In [11]:
original_tool_call_tokens

679

## LLM-Derived Tombstoning

Use a lightweight LLM (here `gpt-5-nano`) to automatically summarize the tool output into 5 words or fewer.

In [12]:

# Use a lightweight model to summarize the output
new_output = client.responses.create(
    model="gpt-5-nano",
    instructions="Rewrite this into 5 words or less",
    input=original_tool_call["output"]
)

In [13]:
llm_summary_tool_call = new_output.output_text

In [14]:
llm_summary_tool_call

'VP Eng at premium SaaS.'

In [15]:
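new_llm_summary_tool_call = {
    "type": "function_call_output",
    # Reuse the saved call_id rather than relying on the leftover loop variable `item`
    "call_id": original_tool_call["call_id"],
    "output": json.dumps({
      "customer_profile": llm_summary_tool_call
    })
}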

In [16]:
new_llm_summary_tool_call

{'type': 'function_call_output',
 'call_id': 'call_pBFRjuJ4ngrCap0bZH4p2y30',
 'output': '{"customer_profile": "VP Eng at premium SaaS."}'}

In [17]:
new_llm_summary_tool_call_tokens = count_tokens(new_llm_summary_tool_call["output"])

In [18]:
new_llm_summary_tool_call_tokens

13
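
The cells above build the tombstoned item but do not put it back into the conversation history. A minimal sketch of that swap follows, assuming the history is a list whose tool outputs are plain dicts keyed by `call_id`. The helper name `apply_tombstone`, the `history` list, and the `call_123` ID are hypothetical and used only for illustration.

```python
def apply_tombstone(history, call_id, summary_output):
    """Return a copy of history with the matching tool output replaced.

    Only plain-dict items are inspected; other items (for example SDK
    objects appended from response.output) are passed through unchanged.
    """
    updated = []
    for entry in history:
        is_target = (
            isinstance(entry, dict)
            and entry.get("type") == "function_call_output"
            and entry.get("call_id") == call_id
        )
        if is_target:
            # Keep type and call_id so the output still pairs with its function call
            updated.append({**entry, "output": summary_output})
        else:
            updated.append(entry)
    return updated


# Illustrative history with hypothetical values
history = [
    {"role": "user", "content": "Can you pull up the profile for customer CUST-2847?"},
    {"type": "function_call_output", "call_id": "call_123", "output": "...very long profile text..."},
]
tombstoned = apply_tombstone(
    history, "call_123", '{"customer_profile": "VP Eng at premium SaaS."}'
)
print(tombstoned[1]["output"])
```

In this notebook, the equivalent call would likely be `apply_tombstone(messages, original_tool_call["call_id"], new_llm_summary_tool_call["output"])`, applied once the model has already used the full output. Returning a new list leaves the original history intact in case the full output is needed again.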

## Results Comparison

Compare token counts between the original and tombstoned tool calls.


In [19]:
# Display original vs tombstoned token counts
(original_tool_call_tokens, new_llm_summary_tool_call_tokens)

(679, 13)

In [20]:
# Calculate token savings
diff_tokens = original_tool_call_tokens - new_llm_summary_tool_call_tokens

In [21]:
diff_tokens

666

### Scale to Multiple Tool Calls

Estimate the savings if your agent makes 5 tool calls with outputs of similar size.

In [22]:
diff_tokens * 5

3330

In [23]:
# Calculate percentage reduction
percentage_diff = (diff_tokens / original_tool_call_tokens) * 100

In [24]:
print(f"Token reduction: {percentage_diff:.2f}%")

Token reduction: 98.09%
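
The arithmetic in this section can be collected into one helper. This is a sketch under the assumption that every tool output is about the same size as the one measured here; the function name `estimate_savings` is hypothetical, and the inputs are the token counts from the cells above.

```python
def estimate_savings(original_tokens, tombstone_tokens, num_calls=1):
    """Return (tokens_saved, percent_reduction) for num_calls similar outputs."""
    saved_per_call = original_tokens - tombstone_tokens
    percent = saved_per_call / original_tokens * 100
    return saved_per_call * num_calls, percent


saved, percent = estimate_savings(679, 13, num_calls=5)
print(f"Tokens saved across 5 calls: {saved}")   # 3330
print(f"Token reduction: {percent:.2f}%")        # 98.09%
```

The estimate leaves out the tokens spent on the summarization call itself, so the net saving depends on how many later turns would otherwise resend the full output.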
