# Automatic Context Compaction

Long-running agentic tasks can often exceed context limits. Tool-heavy workflows and long conversations quickly consume the context window. In [Effective Context Engineering for AI Agents](https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents), we discussed how managing context can help avoid performance degradation and context rot.

The Anthropic Python SDK can help manage this context by automatically compressing conversation history when token usage exceeds a configurable threshold, allowing tasks to continue beyond the typical 200k token context limit.

In this cookbook, we'll demonstrate context compaction through an **agentic customer service workflow**. Imagine you've built an AI customer service agent tasked with processing a queue of support tickets. For each ticket, the agent must classify the issue, search the knowledge base, set priority, route to the appropriate team, draft a response, and mark it complete. As it processes ticket after ticket, the conversation history fills with classifications, knowledge base searches, and drafted responses, quickly consuming thousands of tokens.

## What is Context Compaction?

When building agentic workflows with tool use, conversations can grow very large as the agent iterates on complex tasks. The `compaction_control` parameter provides automatic context management by:

1. Monitoring token usage per turn in the conversation
2. When a threshold is exceeded, injecting a summary prompt as a user turn
3. Having the model generate a summary wrapped in `<summary></summary>` tags. These tags aren't parsed, but are there to help guide the model.
4. Clearing the conversation history and resuming with only the summary
5. Continuing the task with the compressed context
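
The steps above can be sketched in plain Python. This is a conceptual illustration only, not the SDK's implementation: `maybe_compact`, `fake_summarize`, and the message dictionaries are hypothetical stand-ins, and the summarizer is a stub where the real flow would call the model.

```python
from typing import Callable

Message = dict[str, str]

SUMMARY_REQUEST = "Summarize your progress so far inside <summary></summary> tags."


def maybe_compact(
    messages: list[Message],
    tokens_used: int,
    threshold: int,
    summarize: Callable[[list[Message]], str],
) -> list[Message]:
    """Return the history to use for the next turn."""
    # Step 1: compare the latest turn's token usage against the threshold.
    if tokens_used < threshold:
        return messages

    # Step 2: append a summary request as a user turn.
    request = {"role": "user", "content": SUMMARY_REQUEST}
    # Step 3: ask the model for a summary (stubbed here by `summarize`).
    summary = summarize(messages + [request])
    # Step 4: drop the old history and keep only the summary.
    return [{"role": "user", "content": summary}]


def fake_summarize(history: list[Message]) -> str:
    # Stand-in for a model call; a real summary would describe task progress.
    return f"<summary>{len(history) - 1} earlier messages condensed</summary>"


history = [{"role": "user", "content": "Process the ticket queue."}] + [
    {"role": "assistant", "content": f"tool call {i}"} for i in range(10)
]
compacted = maybe_compact(history, tokens_used=3_107, threshold=3_000, summarize=fake_summarize)
untouched = maybe_compact(history, tokens_used=2_925, threshold=3_000, summarize=fake_summarize)
print(len(history), "->", len(compacted))
```

Step 5 is simply the next iteration of the agent loop, which now runs against the single-message history returned here.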

## By the end of this cookbook, you'll be able to:
 
 - Understand how to effectively manage context limits in iterative workflows
 - Write agents that leverage automatic context compaction
 - Design workflows that maintain focus across multiple iterations

## Prerequisites

Before following this guide, ensure you have:

**Required Knowledge**

- Basic understanding of agentic patterns and tool calling

**Required Tools**

- Python 3.11 or higher
- Anthropic API key
- Anthropic SDK >= 0.74.1

## Setup

First, install the required dependencies:

In [None]:
# %pip install -qU anthropic python-dotenv



Note: Ensure your .env file contains:

`ANTHROPIC_API_KEY=your_key_here`

Load your environment variables and configure the client. We also load a helper utility to visualize Claude message responses.


In [2]:
from dotenv import load_dotenv
from utils.visualize import visualize

load_dotenv()

MODEL = "claude-sonnet-4-5"
viz = visualize(auto_show=True)

## Setting the Stage

In [utils/customer_service_tools.py](utils/customer_service_tools.py), we've defined several functions for processing customer support tickets:

- `get_next_ticket()` - Retrieves the next unprocessed ticket from the queue
- `classify_ticket(ticket_id, category)` - Categorizes issues as billing, technical, account, product, or shipping
- `search_knowledge_base(query)` - Finds relevant help articles and solutions
- `set_priority(ticket_id, priority)` - Assigns priority levels (low, medium, high, urgent)
- `route_to_team(ticket_id, team)` - Routes tickets to the appropriate support team
- `draft_response(ticket_id, response_text)` - Creates customer-facing responses
- `mark_complete(ticket_id)` - Finalizes processed tickets

For a customer service agent, these tools enable processing tickets systematically. Each ticket requires classification, research, prioritization, routing, and response drafting. When processing 20-30 tickets in sequence, the conversation history fills with tool results from every classification, every knowledge base search, and every drafted response. Because the full history is resent on every call, the context grows with each turn and cumulative input tokens grow roughly quadratically.

Each function is turned into a tool for our agent with the `beta_tool` decorator:

```python
import anthropic
from anthropic import beta_tool

@beta_tool
def get_next_ticket() -> dict:
    """Retrieve the next unprocessed support ticket from the queue."""
    ...
```

Since the functions in `utils/customer_service_tools.py` are already defined with the `beta_tool` decorator, we can import them and pass them to the Claude API directly.

In [3]:
import anthropic
from utils.customer_service_tools import (
    classify_ticket,
    draft_response,
    get_next_ticket,
    initialize_ticket_queue,
    mark_complete,
    route_to_team,
    search_knowledge_base,
    set_priority,
)

client = anthropic.Anthropic()

tools = [
    get_next_ticket,
    classify_ticket,
    search_knowledge_base,
    set_priority,
    route_to_team,
    draft_response,
    mark_complete,
]

## Baseline: Running Without Compaction

Let's start with a realistic customer service scenario: processing a queue of support tickets. The workflow looks like this:

**For Each Ticket:**
1. Fetch the ticket using `get_next_ticket()`
2. Classify the issue category (billing, technical, account, product, shipping)
3. Search the knowledge base for relevant information
4. Set appropriate priority (low, medium, high, urgent)
5. Route to the correct team
6. Draft a customer response
7. Mark the ticket complete
8. Move to the next ticket

**The Challenge**: With 5 tickets in the queue, each requiring 7 tool calls, Claude will make 35 tool calls. The results from each step, including classifications, knowledge base searches, and drafted responses, accumulate in the conversation history. Without compaction, all of this data stays in context for every subsequent call: by ticket #5, the context includes complete details from all 4 previous tickets.

Let's run this workflow **without compaction** and observe what happens:

In [4]:
from anthropic.types.beta import BetaMessageParam

num_tickets = 5
initialize_ticket_queue(num_tickets)

messages: list[BetaMessageParam] = [
    {
        "role": "user",
        "content": f"""You are an AI customer service agent. Your task is to process support tickets from a queue.

For EACH ticket, you must complete ALL these steps:

1. **Fetch ticket**: Call get_next_ticket() to retrieve the next unprocessed ticket
2. **Classify**: Call classify_ticket() to categorize the issue (billing/technical/account/product/shipping)
3. **Research**: Call search_knowledge_base() to find relevant information for this ticket type
4. **Prioritize**: Call set_priority() to assign priority (low/medium/high/urgent) based on severity
5. **Route**: Call route_to_team() to assign to the appropriate team
6. **Draft**: Call draft_response() to create a helpful customer response using KB information
7. **Complete**: Call mark_complete() to finalize this ticket
8. **Continue**: Immediately fetch the next ticket and repeat

IMPORTANT RULES:
- Process tickets ONE AT A TIME in sequence
- Complete ALL 7 steps for each ticket before moving to the next
- Keep fetching and processing tickets until you get an error that the queue is empty
- There are {num_tickets} tickets total - process all of them
- Be thorough but efficient

Begin by fetching the first ticket.""",
    }
]

total_input = 0
total_output = 0
turn_count = 0

runner = client.beta.messages.tool_runner(
    model=MODEL,
    max_tokens=4096,
    tools=tools,
    messages=messages,
)

for message in runner:
    turn_count += 1
    total_input += message.usage.input_tokens
    total_output += message.usage.output_tokens
    print(
        f"Turn {turn_count:2d}: Input={message.usage.input_tokens:7,} tokens | "
        f"Output={message.usage.output_tokens:5,} tokens | "
        f"Messages={len(runner._params['messages']):2d} | "
        f"Cumulative In={total_input:8,}"
    )

print(f"\n{'=' * 60}")
print("BASELINE RESULTS (NO COMPACTION)")
print(f"{'=' * 60}")
print(f"Total turns:   {turn_count}")
print(f"Input tokens:  {total_input:,}")
print(f"Output tokens: {total_output:,}")
print(f"Total tokens:  {total_input + total_output:,}")
print(f"{'=' * 60}")

Turn  1: Input=  1,537 tokens | Output=   57 tokens | Messages= 1 | Cumulative In=   1,537
Turn  2: Input=  1,759 tokens | Output=  153 tokens | Messages= 3 | Cumulative In=   3,296
Turn  3: Input=  2,221 tokens | Output=  137 tokens | Messages= 5 | Cumulative In=   5,517
Turn  4: Input=  2,526 tokens | Output=  245 tokens | Messages= 7 | Cumulative In=   8,043
Turn  5: Input=  2,820 tokens | Output=   58 tokens | Messages= 9 | Cumulative In=  10,863
Turn  6: Input=  3,003 tokens | Output=   57 tokens | Messages=11 | Cumulative In=  13,866
Turn  7: Input=  3,234 tokens | Output=  141 tokens | Messages=13 | Cumulative In=  17,100
Turn  8: Input=  3,703 tokens | Output=  137 tokens | Messages=15 | Cumulative In=  20,803
Turn  9: Input=  4,008 tokens | Output=  277 tokens | Messages=17 | Cumulative In=  24,811
Turn 10: Input=  4,334 tokens | Output=   58 tokens | Messages=19 | Cumulative In=  29,145
Turn 11: Input=  4,516 tokens | Output=   56 tokens | Messages=21 | Cumulative In=  33,661

Now that we have our baseline, we can see how token usage grows as we process more tickets. By ticket #5, the context is bloated with all previous ticket details, leading to high token consumption. Let's preview the final response from Claude after processing all tickets:

In [5]:
print(message.content[-1].text)

Perfect! I have successfully processed all 5 tickets in the queue. Here's a summary of what was completed:

## Processing Summary

**TICKET-1 - Jane Johnson** (Two-factor authentication issue)
- Category: Account
- Priority: High
- Team: Account Services
- Issue: Customer locked out due to 2FA/lost phone with backup codes not working

**TICKET-2 - Alex Johnson** (Billing cycle confusion)
- Category: Billing
- Priority: Medium
- Team: Billing Team
- Issue: Confusion about billing dates (signed up 11/3 but charged 11/13)

**TICKET-3 - Morgan Smith** (App crashes on startup)
- Category: Technical
- Priority: High
- Team: Tech Support
- Issue: App crashing after v4.5.1 update on iPad Pro with macOS 14.2

**TICKET-4 - Jane Davis** (Payment method update error)
- Category: Billing
- Priority: Medium
- Team: Billing Team
- Issue: Unable to update credit card information - getting error message

**TICKET-5 - Chris Jones** (Payment method update error)
- Category: Billing
- Priority: Medium
- T

### Understanding the Problem

In the baseline workflow above, Claude had to:
- Process **5 support tickets** sequentially
- Complete **7 steps per ticket** (fetch, classify, research, prioritize, route, draft, complete)
- Make **35 tool calls** with results accumulating in conversation history
- Store **every classification, every knowledge base search, every drafted response** in memory

**Why This Happens**:
1. **Steadily growing context** - With each tool use, the entire conversation history (including all previous tool results) is sent to Claude, so cumulative input tokens grow roughly quadratically with the number of turns
2. **Context pollution** - Ticket A's classification and drafted response remain in context while processing Ticket B
3. **Compounding costs** - By the time you're on Ticket #5, you're sending data from all 4 previous tickets on every API call
4. **Slower responses** - Processing massive contexts takes longer
5. **Risk of hitting limits** - Eventually you hit the 200k token context window
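
To make the growth pattern concrete, here is a rough model that assumes each turn adds a fixed number of tokens to the context. The starting size (1,537) comes from the baseline run above; the 300-token increment is only an approximation of the per-turn growth seen there, and `cumulative_input_tokens` is an illustrative helper, not part of the SDK.

```python
def cumulative_input_tokens(first_turn: int, growth_per_turn: int, turns: int) -> int:
    """Total input tokens over `turns` calls when every call resends the full history."""
    total = 0
    context = first_turn
    for _ in range(turns):
        total += context            # the whole context is sent on each call
        context += growth_per_turn  # the new tool call and result are appended
    return total


for turns in (10, 20, 40):
    print(turns, cumulative_input_tokens(1_537, 300, turns))
```

For 10 turns the model gives 28,870 input tokens, close to the 29,145 observed in the baseline. Going from 20 to 40 turns more than triples the total (87,740 to 295,480), which is why the total climbs faster than the number of turns.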


**What We Actually Need**: After completing Ticket A, we only need a **brief summary** (ticket resolved, category, priority) - not the full classification result, knowledge base search, and complete drafted response. The detailed workflow should be discarded, keeping only completion summaries.

Let's see how automatic context compaction solves this problem.

## Enabling Automatic Context Compaction

Now let's run the exact same customer service workflow, but with automatic context compaction enabled. We simply add the `compaction_control` parameter to our tool runner.

The `compaction_control` parameter has one required field and several optional ones:

- **`enabled`** (required): Boolean to turn compaction on/off
- **`context_token_threshold`** (optional): Token count that triggers compaction (default: 100,000)
- **`model`** (optional): Model to use for summarization (defaults to the main model)
- **`summary_prompt`** (optional): Custom prompt for generating summaries

For this customer service workflow, we'll use a deliberately low **3,000 token threshold** so that compaction triggers early, after roughly one ticket's worth of classifications, KB searches, and responses. This allows Claude to:
1. **Keep completion summaries** (tickets resolved, categories, outcomes)
2. **Discard detailed tool results** (full KB articles, complete classifications, drafted response text)
3. **Start fresh** when processing the next batch of tickets

This mimics how a real support agent works: resolve the ticket, document it briefly, move to the next case.

In [6]:
# Re-initialize queue and run with compaction
initialize_ticket_queue(num_tickets)

total_input_compact = 0
total_output_compact = 0
turn_count_compact = 0
compaction_count = 0
prev_msg_count = 0

runner = client.beta.messages.tool_runner(
    model=MODEL,
    max_tokens=4096,
    tools=tools,
    messages=messages,
    compaction_control={
        "enabled": True,
        "context_token_threshold": 3000,
    },
)

for message in runner:
    turn_count_compact += 1
    total_input_compact += message.usage.input_tokens
    total_output_compact += message.usage.output_tokens
    messages_list = list(runner._params["messages"])
    curr_msg_count = len(messages_list)

    if curr_msg_count < prev_msg_count:
        compaction_count += 1
        print(f"🔄 Compaction occurred! Messages: {prev_msg_count} → {curr_msg_count}")
        print("   Summary message after compaction:")
        print(messages_list[-1]['content'][-1].text)    # type: ignore

    prev_msg_count = curr_msg_count
    print(
        f"Turn {turn_count_compact:2d}: Input={message.usage.input_tokens:7,} tokens | "
        f"Output={message.usage.output_tokens:5,} tokens | "
        f"Messages={len(runner._params['messages']):2d} | "
        f"Cumulative In={total_input_compact:8,}"
    )



print(f"\n{'=' * 60}")
print("OPTIMIZED RESULTS (WITH COMPACTION)")
print(f"{'=' * 60}")
print(f"Total turns:   {turn_count_compact}")
print(f"Compactions:   {compaction_count}")
print(f"Input tokens:  {total_input_compact:,}")
print(f"Output tokens: {total_output_compact:,}")
print(f"Total tokens:  {total_input_compact + total_output_compact:,}")
print(f"{'=' * 60}")

Turn  1: Input=  1,537 tokens | Output=   61 tokens | Messages= 1 | Cumulative In=   1,537
Turn  2: Input=  1,775 tokens | Output=  154 tokens | Messages= 3 | Cumulative In=   3,312
Turn  3: Input=  2,259 tokens | Output=  137 tokens | Messages= 5 | Cumulative In=   5,571
Turn  4: Input=  2,564 tokens | Output=  254 tokens | Messages= 7 | Cumulative In=   8,135
Turn  5: Input=  2,867 tokens | Output=   58 tokens | Messages= 9 | Cumulative In=  11,002
Turn  6: Input=  3,051 tokens | Output=   56 tokens | Messages=11 | Cumulative In=  14,053
🔄 Compaction occurred! Messages: 11 → 1
   Summary message after compaction:
<summary>
## Task Progress Summary

**Objective:** Process 5 support tickets from the queue, completing all 7 steps for each ticket in sequence.

**Steps Required Per Ticket:**
1. Fetch ticket (get_next_ticket)
2. Classify (classify_ticket)
3. Research (search_knowledge_base)
4. Prioritize (set_priority)
5. Route (route_to_team)
6. Draft response (draft_response)
7. Mar

In [7]:
print(message.content[-1].text)

🎉 **TASK COMPLETED!**

All 5 tickets have been successfully processed through all 7 required steps:

### Final Summary:

**TICKET-1** ✓ - Jane Davis - Billing (unexpected charge after cancellation) → billing-team
**TICKET-2** ✓ - Alex Jones - Technical (app crash v5.7.18) → tech-support  
**TICKET-3** ✓ - Jane Johnson - Billing (duplicate charges) → billing-team
**TICKET-4** ✓ - Alex Smith - Shipping (damaged product) → logistics-team
**TICKET-5** ✓ - Chris Smith - Account (email change without old access) → account-services

All tickets have been:
1. ✓ Fetched from queue
2. ✓ Classified appropriately
3. ✓ Researched in knowledge base
4. ✓ Prioritized (4 high, 1 medium)
5. ✓ Routed to correct teams
6. ✓ Response drafted
7. ✓ Marked complete

**Status:** 5 of 5 tickets completed and ready for team review! 🎯


### Comparing Results

Notice the dramatic difference! With compaction enabled at a 3,000 token threshold:

1. **Context resets once the threshold is crossed** - When accumulated tool results push token usage past the threshold, the SDK automatically:
   - Injects a summary prompt
   - Has Claude generate a completion summary wrapped in `<summary></summary>` tags
   - Clears the conversation history and discards detailed classifications, KB searches, and responses
   - Continues with only the completion summary

2. **Input tokens stay bounded** - Instead of accumulating to 100k+ as we process more tickets, input tokens reset after each compaction. When processing Ticket #5, we're NOT carrying the full tool results from Tickets #1-4.

3. **Task completes successfully** - The workflow continues smoothly through all tickets without hitting context limits

4. **Quality is preserved** - The summaries retain critical information:
   - Tickets processed with their IDs
   - Categories and priorities assigned
   - Teams routed to
   - Overall progress status
   
   All tickets are still properly classified, prioritized, routed, and responded to.

5. **Natural workflow** - This mirrors how real support agents work: resolve a ticket, document it briefly in the system, close it, move to the next one. You don't keep every knowledge base article and full response draft open while working on new tickets.

Let's visualize the token savings:

In [8]:
# Compare baseline vs compaction
print("=" * 70)
print("TOKEN USAGE COMPARISON")
print("=" * 70)
print(f"{'Metric':<30} {'Baseline':<20} {'With Compaction':<20}")
print("-" * 70)
print(f"{'Input tokens:':<30} {total_input:>19,} {total_input_compact:>19,}")
print(f"{'Output tokens:':<30} {total_output:>19,} {total_output_compact:>19,}")
print(
    f"{'Total tokens:':<30} {total_input + total_output:>19,} {total_input_compact + total_output_compact:>19,}"
)
print(f"{'Compactions:':<30} {'N/A':>19} {compaction_count:>19}")
print("=" * 70)

# Calculate savings
token_savings = (total_input + total_output) - (total_input_compact + total_output_compact)
savings_percent = (
    (token_savings / (total_input + total_output)) * 100 if (total_input + total_output) > 0 else 0
)

print(f"\n💰 Token Savings: {token_savings:,} tokens ({savings_percent:.1f}% reduction)")

TOKEN USAGE COMPARISON
Metric                         Baseline             With Compaction     
----------------------------------------------------------------------
Input tokens:                              150,881              51,894
Output tokens:                               3,902               3,856
Total tokens:                              154,783              55,750
Compactions:                                   N/A                   5

💰 Token Savings: 99,033 tokens (64.0% reduction)
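
As a quick sanity check, the savings figure can be recomputed from the totals in the table above. The numbers are copied from that output; the variable names are just for illustration.

```python
baseline_total = 150_881 + 3_902   # input + output tokens, baseline run
compacted_total = 51_894 + 3_856   # input + output tokens, with compaction

saved = baseline_total - compacted_total
percent = saved / baseline_total * 100

print(f"{saved:,} tokens saved ({percent:.1f}% reduction)")
```

Nearly all of the saving comes from input tokens (98,987 of the 99,033); output tokens are almost unchanged, as expected when the agent does the same work with a smaller context.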


## How Compaction Works Under the Hood

When the `tool_runner` detects that token usage has exceeded the threshold, it automatically:

1. **Pauses the workflow** before making the next API call
2. **Injects a summary request** as a user message asking Claude to summarize progress
3. **Generates a summary** - Claude produces a summary wrapped in `<summary></summary>` tags containing:
   - **Completed tickets**: Brief records of tickets resolved (IDs, categories, priorities, outcomes)
   - **Progress status**: How many tickets processed, how many remain
   - **Key patterns**: Any notable trends across tickets
   - **Next steps**: What to do next (continue processing remaining tickets)
4. **Clears history** - The entire conversation history (including all tool results) is replaced with just the summary
5. **Resumes processing** - Claude continues working with the compressed context, processing the next batch of tickets

## Customizing Compaction Configuration

You can customize how compaction works to fit your specific use case. Here are the key configuration options:

### Adjusting the Threshold

The `context_token_threshold` determines when compaction triggers:

```python
compaction_control={
    "enabled": True,
    "context_token_threshold": 5000,  # Low threshold: compact after every couple of tickets
}
```

**Guidelines:**
- **Very low thresholds (5k-20k)**: 
  - Use for iterative task processing with clear boundaries
  - Our customer service workflow uses 5k to compact after processing several tickets
  - More frequent compaction, minimal context accumulation
  - Best for sequential entity processing
  
- **Medium thresholds (50k-100k)**: 
  - Multi-phase workflows with fewer, larger natural checkpoints
  - Balance between context retention and management
  - Suitable for workflows with expensive tool calls
  
- **High thresholds (100k-150k)**: 
  - Tasks requiring substantial historical context
  - Less frequent compaction preserves more raw details
  - Higher per-call costs but fewer compactions
  
- **Default (100k)**: Good balance for general long-running tasks

**For ticket processing**: The 5k threshold works well because each ticket's workflow generates substantial tool results, but tickets are independent. After resolving Ticket A, you don't need its detailed KB searches when processing Ticket B.
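
One way to reason about a threshold is to estimate how many turns fit under it. The sketch below assumes the context grows by a roughly constant amount per turn. The 1,537-token start is taken from the runs above, the 300-token increment is a rough approximation of the per-turn growth in those runs, and `turns_until_compaction` is a hypothetical helper.

```python
def turns_until_compaction(first_turn: int, growth_per_turn: int, threshold: int) -> int:
    """First turn number (1-based) whose input size exceeds the threshold."""
    if growth_per_turn <= 0:
        raise ValueError("growth_per_turn must be positive")
    turn = 1
    context = first_turn
    while context <= threshold:
        context += growth_per_turn
        turn += 1
    return turn


for threshold in (3_000, 5_000, 20_000):
    print(threshold, turns_until_compaction(1_537, 300, threshold))
```

Under these assumptions a 3,000-token threshold is crossed at turn 6, which lines up with the compaction run earlier in this cookbook. A 5,000-token threshold would be crossed around turn 13.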

### Using a Different Model for Summarization

You can use a faster/cheaper model for generating summaries:

```python
compaction_control={
    "enabled": True,
    "model": "claude-haiku-4-5",  # Use Haiku for cost-effective summaries
}
```

This is useful when you want to optimize costs: Haiku can generate good summaries at a much lower cost than Sonnet.

### Custom Summary Prompts

You can provide a custom prompt to guide how summaries are generated. This is especially useful for customer service workflows where you need to preserve specific types of information.

For our workflow, we want to ensure each compaction retains:
- **Ticket summaries** for all completed tickets
- **Categories and priorities** assigned
- **Teams routed to**
- **Progress status** (tickets completed, tickets remaining)
- **Next steps** in the workflow

```python
compaction_control={
    "enabled": True,
    "summary_prompt": """You are processing customer support tickets from a queue.

Create a focused summary that preserves:

1. **COMPLETED TICKETS**: For each ticket you've fully processed:
   - Ticket ID and customer name
   - Issue category and priority assigned
   - Team routed to
   - Brief outcome

2. **PROGRESS STATUS**: 
   - How many tickets you've completed
   - Approximately how many remain in the queue

3. **NEXT STEPS**: Continue processing the next ticket

Format with clear sections and wrap in <summary></summary> tags."""
}
```

Let's see this in action:

In [9]:
# Re-initialize queue with custom summary prompt
initialize_ticket_queue(num_tickets)

custom_summary_prompt = """You are processing customer support tickets from a queue.

Create a focused summary that preserves:

1. **COMPLETED TICKETS**: For each ticket you've fully processed:
   - Ticket ID and customer name
   - Issue category and priority assigned
   - Team routed to
   - Brief outcome

2. **PROGRESS STATUS**:
   - How many tickets you've completed
   - Approximately how many remain in the queue

3. **NEXT STEPS**: Continue processing the next ticket

Format with clear sections. Wrap in <summary></summary> tags."""

runner = client.beta.messages.tool_runner(
    model=MODEL,
    max_tokens=4096,
    tools=tools,
    messages=messages,
    compaction_control={
        "enabled": True,
        "context_token_threshold": 5000,
        "summary_prompt": custom_summary_prompt,
    },
)

turn_count_custom = 0
for message in runner:
    turn_count_custom += 1

print(f"\n✅ Custom summaries completed in {turn_count_custom} turns")
print("   Each summary maintains structured records while discarding verbose tool results.")


✅ Custom summaries completed in 21 turns
   Each summary maintains structured records while discarding verbose tool results.


In [10]:
print(message.content[-1].text)

Perfect! It appears TICKET-5 has already been completed. 

## ✅ ALL TICKETS PROCESSED - SESSION COMPLETE

### FINAL SUMMARY

**5 of 5 tickets successfully processed:**

1. **TICKET-1** - Morgan Brown: App crashes (iPhone) ‚Üí Tech Support [HIGH]
2. **TICKET-2** - Chris Smith: Invalid tracking number ‚Üí Logistics Team [MEDIUM]
3. **TICKET-3** - John Brown: Plan comparison inquiry ‚Üí Product Success [LOW]
4. **TICKET-4** - Alex Johnson: App crashes (Android) ‚Üí Tech Support [HIGH]
5. **TICKET-5** - Morgan Davis: Upload permission errors ‚Üí Tech Support [HIGH/URGENT]

### PROCESSING STATISTICS
- **Total Tickets:** 5
- **Completion Rate:** 100%
- **Categories Handled:** Technical (3), Shipping (1), Product (1)
- **Priority Distribution:** 
  - High/Urgent: 3 tickets
  - Medium: 1 ticket
  - Low: 1 ticket

All tickets have been classified, prioritized, routed to appropriate teams, and provided with drafted responses. The support teams can now review and send the responses to customers

## Best Practices

When using automatic context compaction, follow these guidelines:

### 1. Choose the Right Threshold

- **Start with the default (100k)** for most workflows
- **Lower thresholds (5k-20k)** for:
  - **Iterative processing** like our ticket workflow (compact after processing several items)
  - **Multi-phase workflows** with natural checkpoints
  - Very long-running tasks that need many iterations
  - Cost-sensitive applications
  - Tasks where older context becomes less relevant
- **Higher thresholds (100k-150k)** for:
  - Tasks requiring detailed historical context across all phases
  - Complex reasoning that builds on previous steps
  - Workflows with expensive tool calls where you want to minimize repetition

**For ticket processing**: A threshold of 5k-10k works well because each ticket workflow is self-contained. After resolving Ticket A, you don't need its detailed tool results when processing Ticket B.

### 2. Craft Effective Summary Prompts

Good summary prompts should specify:
- **What to preserve**: Critical information, IDs, categories, outcomes, progress
- **Format**: Structured output helps maintain consistency
- **Context**: Remind Claude of the task and current phase
- **Next steps**: What remains to be done

Example for ticket processing:
```python
summary_prompt = """You are processing customer support tickets.

Include in your summary:
- COMPLETED: Brief record of each ticket (ID, category, priority, team, outcome)
- PROGRESS: Tickets completed vs remaining
- NEXT: Continue processing remaining tickets

Wrap in <summary></summary> tags."""
```
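
If you maintain several workflows, it can help to assemble summary prompts from named sections so that each one covers the points listed above. `build_summary_prompt` is a hypothetical helper sketched for illustration, not an SDK function.

```python
def build_summary_prompt(task: str, sections: dict[str, str]) -> str:
    """Compose a summary prompt from a task reminder and labelled sections."""
    if not sections:
        raise ValueError("at least one section is required")
    lines = [task, "", "Include in your summary:"]
    lines += [f"- {name.upper()}: {description}" for name, description in sections.items()]
    lines += ["", "Wrap in <summary></summary> tags."]
    return "\n".join(lines)


prompt = build_summary_prompt(
    "You are processing customer support tickets.",
    {
        "completed": "Brief record of each ticket (ID, category, priority, team, outcome)",
        "progress": "Tickets completed vs remaining",
        "next": "Continue processing remaining tickets",
    },
)
print(prompt)
```

With these inputs the helper reproduces the ticket-processing prompt shown above, and the resulting string can be passed as `summary_prompt`.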

### 3. Use Appropriate Models

- **Main task**: Use the model best suited for the task (Sonnet, Haiku)
- **Summarization**: Consider using Haiku for cost savings if summaries don't require complex reasoning

```python
compaction_control={
    "enabled": True,
    "model": "claude-haiku-4-5",  # Cheaper summaries
}
```

### 4. Monitor Compaction Behavior

Track when compaction occurs to optimize your threshold:

```python
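# `runner` is the tool runner created earlier with compaction enabled
prev_msg_count = 0

for turn_count, message in enumerate(runner, start=1):
    # `_params` is a private attribute and may change between SDK versions
    curr_msg_count = len(list(runner._params["messages"]))

    # A shrinking message list means the history was replaced by a summary
    if curr_msg_count < prev_msg_count:
        print(f"Compaction at turn {turn_count}")

    prev_msg_count = curr_msg_count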
```

In our ticket workflow, you should see compaction trigger after processing several tickets.
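
The same check can be wrapped in a small tracker so the loop body stays tidy. `CompactionTracker` is a hypothetical helper; it only needs the message count for each turn, however you obtain it, and the counts below are synthetic.

```python
from dataclasses import dataclass, field


@dataclass
class CompactionTracker:
    """Records the turns at which the message list shrank."""

    prev_count: int = 0
    turn: int = 0
    compaction_turns: list[int] = field(default_factory=list)

    def observe(self, message_count: int) -> bool:
        """Register one turn; return True if a compaction happened before it."""
        self.turn += 1
        compacted = message_count < self.prev_count
        if compacted:
            self.compaction_turns.append(self.turn)
        self.prev_count = message_count
        return compacted


tracker = CompactionTracker()
# Message counts as they might appear across turns (synthetic data).
for count in [1, 3, 5, 7, 9, 11, 1, 3, 5, 1]:
    tracker.observe(count)

print(tracker.compaction_turns)
```

The gaps between recorded turns show how often compaction fires, which is a useful signal when tuning the threshold.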

## Limitations and Considerations

While automatic context compaction is powerful, there are important limitations to understand:

### ⚠️ Server-Side Sampling Loops

**Current Limitation**: Compaction does not work optimally with server-side sampling loops, such as server-side web search tools.

**Why**: Cache tokens accumulate across sampling loops, which can trigger compaction prematurely based on cached content rather than actual conversation history.

**Workaround**: Use compaction with client-side tool loops and avoid it where sampling happens server-side:
- ✅ Client-side tools (like the customer service API in this cookbook)
- ✅ Standard agentic workflows with regular tool use
- ✅ File operations, database queries, API calls
- ❌ Server-side Extended Thinking
- ❌ Server-side web search tools

### Information Loss

**Trade-off**: Summaries inherently lose some information. While Claude is good at identifying key points, some details will be compressed or omitted.

**In ticket processing**: 
- ✅ **Retained**: Ticket IDs, categories, priorities, teams, outcomes, progress status
- ❌ **Lost**: Full knowledge base article text, complete drafted response text, detailed classification reasoning

This is usually acceptable: you don't need every KB article and full response text in perpetuity, just the completion records.

**Mitigation**:
- Use custom summary prompts to preserve critical information
- Set higher thresholds for tasks requiring extensive historical context
- Structure your tasks to be modular (each phase builds on summaries, not raw details)
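
A lightweight way to check that a summary kept what matters is to look for the identifiers you expect. The sketch below assumes ticket IDs follow the `TICKET-<number>` pattern seen in the outputs above. `missing_ticket_ids` is a hypothetical helper, and the summary text is made up for the example.

```python
import re


def missing_ticket_ids(summary: str, expected_ids: list[str]) -> list[str]:
    """Return the expected ticket IDs that the summary does not mention."""
    mentioned = set(re.findall(r"TICKET-\d+", summary))
    return [ticket_id for ticket_id in expected_ids if ticket_id not in mentioned]


summary = """<summary>
COMPLETED: TICKET-1 (account, high), TICKET-2 (billing, medium)
PROGRESS: 2 of 5 tickets done
</summary>"""

print(missing_ticket_ids(summary, ["TICKET-1", "TICKET-2", "TICKET-3"]))
```

Matching whole IDs with a regular expression avoids false positives such as treating `TICKET-10` as a mention of `TICKET-1`.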

### When NOT to Use Compaction

Avoid compaction for:

1. **Short tasks**: If your task completes within 50k-100k tokens, compaction adds unnecessary overhead
2. **Tasks requiring full audit trails**: Some tasks need access to ALL previous details
3. **Server-side sampling workflows**: As mentioned above, wait for this limitation to be addressed
4. **Highly iterative refinement**: Tasks where each step critically depends on exact details from all previous steps

### When TO Use Compaction

Compaction is ideal for:

1. **Sequential processing**: Like our ticket workflow, where multiple items are processed one after another
2. **Multi-phase workflows**: Where each phase can summarize progress before moving on
3. **Iterative data processing**: Processing large datasets in chunks or entities one at a time
4. **Extended analysis sessions**: Analyzing data across many entities
5. **Batch operations**: Processing hundreds of items where each is independent

**Ticket processing is a perfect use case** because:
- Each ticket workflow is largely independent
- You need completion summaries, not full tool results
- Natural compaction points exist (after completing several tickets)
- The workflow is iterative and sequential

## Summary

Automatic context compaction is a powerful feature that enables long-running agentic workflows to exceed typical context limits. In this cookbook, we've explored compaction through a customer service ticket processing workflow.

### Key Takeaways

1. **The Problem**: Without compaction, iterative workflows face severe challenges:
   - **Context accumulation**: Processing Ticket #10 means carrying tool results from Tickets #1-9
   - **Quadratic cumulative cost**: Each new ticket adds to an ever-growing context that is resent on every call
   - **High costs**: Thousands of unnecessary tokens sent with each API call
   - **Context limits**: Risk of hitting 200k token limit before completing all tickets

2. **The Solution**: The `compaction_control` parameter automates context management:
   ```python
   compaction_control={
       "enabled": True,
       "context_token_threshold": 5000,  # Compact after processing several tickets
       "model": "claude-haiku-4-5",      # Optional, defaults to main model
       "summary_prompt": "..."            # Optional, customize for your use case
   }
   ```

3. **How It Works**: When tokens exceed the threshold, the SDK:
   - Pauses the workflow
   - Generates a summary of progress and completed tickets
   - Clears the conversation history (discarding detailed tool results)
   - Continues seamlessly with compressed context (retaining completion summaries)

4. **Real Impact**: In our ticket processing example, compaction enabled:
   - Processing all 5 tickets with about 35 tool calls
   - Managing classifications, KB searches, and drafted responses for each ticket
   - **Significant token savings** (a 64% reduction in total tokens in the run above)
   - **Successful completion** without hitting context limits
   - **Quality preserved**: All tickets properly classified, prioritized, routed, and resolved

5. **Natural Workflow**: Compaction mirrors how real support agents work:
   - Process Ticket A thoroughly → Document completion briefly
   - Close that ticket and clear your workspace
   - Move to Ticket B with a clean slate
   - At the end, all tickets are documented in the system
   
   You don't keep every knowledge base article and full response draft open while working on new tickets.

### When to Use Compaction

- ✅ Sequential processing (customer service, data processing)
- ✅ Iterative workflows (process hundreds of items)
- ✅ Multi-phase workflows with natural checkpoints
- ✅ Tasks processing large volumes of tool results
- ✅ Cost-sensitive production applications
- ❌ Tasks requiring full, uncompressed conversation history
- ❌ Server-side sampling loop workflows (current limitation)

### Key Configuration Guidelines

- **5k-20k threshold**: For iterative processing with natural per-item compaction points
- **50k-100k threshold**: For multi-phase workflows with fewer, larger compaction points
- **100k threshold (default)**: For general long-running tasks
- **Custom summary prompts**: Essential for preserving workflow-specific information

### Next Steps

Try implementing compaction in your own workflows:
1. Identify natural compaction points (after processing each item, completing each phase, etc.)
2. Start with an aggressive threshold (5k-10k) if you have clear per-item boundaries
3. Use custom summary prompts to preserve critical information
4. Monitor when compaction triggers and verify quality is maintained
5. Adjust threshold based on your specific needs

For more on effective context management, see [Effective Context Engineering for AI Agents](https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents).