## Context Management (Automatic Tool Clearing) with Claude Sonnet 4.5 in Amazon Bedrock

This notebook demonstrates how to use **Automatic Tool Call Clearing** in Amazon Bedrock, a beta feature in Anthropic Claude Sonnet 4.5 that helps manage context window size by automatically removing old tool use/result pairs as conversations grow.

**Key Benefits:**
- Reduces token usage in long conversations with multiple tool calls
- Maintains recent context while clearing older tool interactions
- Prevents hitting context window limits in multi-turn tool use scenarios

**Documentation:** [Amazon Bedrock - Automatic Tool Call Clearing](https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-anthropic-claude-messages-tool-use.html#model-parameters-anthropic-claude-automatic-tool-call-clearing)

### Initial Setup

First, we'll install the latest AWS Boto3 SDK and import the required libraries to interact with Amazon Bedrock.

In [None]:
!pip install -U -q boto3

In [11]:
import boto3
import json
import random
from datetime import datetime

bedrock = boto3.client(service_name='bedrock-runtime', region_name='us-west-2')

### Weather Function Implementation

We'll create a simple `get_weather` function that returns simulated weather data. In a production scenario, this would call a real weather API. This function will be made available to Claude as a tool.

In [13]:
def get_weather(location):
    """
    Dummy weather function that returns placeholder weather data for any location.
    In a real implementation, this would call an actual weather API.
    """
    # Generate random but realistic weather data
    temperatures = [15, 18, 22, 25, 28, 20, 16]
    conditions = ['sunny', 'partly cloudy', 'cloudy', 'light rain', 'clear']
    humidity_levels = [45, 55, 65, 70, 80]
    wind_speeds = [5, 8, 12, 15, 20]
    
    weather_data = {
        "location": location,
        "temperature": random.choice(temperatures),
        "condition": random.choice(conditions),
        "humidity": random.choice(humidity_levels),
        "wind_speed": random.choice(wind_speeds),
        "timestamp": datetime.now().isoformat(),
        "unit": "Celsius"
    }
    
    return json.dumps(weather_data)

# Test the function
print("Sample weather data:")
print(get_weather("New York"))

Sample weather data:
{"location": "New York", "temperature": 18, "condition": "clear", "humidity": 45, "wind_speed": 15, "timestamp": "2025-10-09T11:39:04.369565", "unit": "Celsius"}


### Using Amazon Bedrock InvokeModel API

Now let's make a single API call to demonstrate the basic setup. Notice the key configuration parameters:

**Important Configuration:**
- `anthropic_beta: ["context-management-2025-06-27"]` - Enables the context management beta feature
- `context_management.edits` - Defines the automatic tool clearing strategy:
  - `trigger: 50 input_tokens` - Start clearing when context exceeds 50 tokens (set low for demo purposes)
  - `keep: 1 tool_uses` - Keep only the most recent tool use/result pair
  - `clear_at_least: 50 input_tokens` - Only clear if we can remove at least 50 tokens
  - `exclude_tools: ["memory"]` - Never clear memory tool interactions

**Learn more:** [Tool Use with Claude](https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-anthropic-claude-messages-tool-use.html)

In [14]:
response = bedrock.invoke_model(
    modelId="global.anthropic.claude-sonnet-4-5-20250929-v1:0",
    body=json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "anthropic_beta": ["context-management-2025-06-27"],
        "system":[{"type":"text", "text": "You're a helpful assistant"}],
        "max_tokens": 4096,
        "messages": [
            {
                "role": "user",
                "content": "Create a comprehensive report about the current weather in Madrid. Validate the weather at least 3 times to make sure the measurement is accurate."
            }
        ],
        "tools": [
            {
                "type": "memory_20250818",
                "name": "memory"
            },
            {
                "name": "get_weather",
                "description": "Get the weather in a given city",
                "input_schema": {
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": "The city and state, e.g. San Francisco, CA"
                        }
                    },
                    "required": ["location"]
                }
            }
        ],
        "context_management": {
            "edits": [{
                "type": "clear_tool_uses_20250919",
                "trigger": {
                    "type": "input_tokens",
                    "value": 50
                },
                "keep": {
                    "type": "tool_uses",
                    "value": 1
                },
                "clear_at_least": {
                    "type": "input_tokens",
                    "value": 50
                },
                "exclude_tools": ["memory"]
            }]
        },
    })
)

response_body = json.loads(response['body'].read().decode('utf8'))
print(f"Response: {json.dumps(response_body, indent=2)}")

Response: {
  "model": "claude-sonnet-4-5-20250929",
  "id": "msg_bdrk_01EtpYa6bXTEpQ47y9ZxKt5N",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "I'll help you create a comprehensive weather report for Madrid with multiple validations. Let me start by checking my memory, then gather the weather data."
    },
    {
      "type": "tool_use",
      "id": "toolu_bdrk_01Q6iTeUjBZ3RhvsLbJSyduA",
      "name": "memory",
      "input": {
        "command": "view",
        "path": "/memories"
      }
    }
  ],
  "stop_reason": "tool_use",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 1680,
    "cache_creation_input_tokens": 0,
    "cache_read_input_tokens": 0,
    "output_tokens": 100
  },
  "context_management": {
    "applied_edits": []
  }
}


### Complete Conversation Loop with Tool Handling

Now we'll implement a complete multi-turn conversation that:
1. Sends a request asking Claude to validate the weather at least 3 times
2. Handles tool use requests from Claude
3. Executes the tools and returns results
4. Continues the conversation until Claude completes the task

**Watch for:** The `context_management.applied_edits` field in responses, which shows when and how tool clearing occurred.
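
One way to surface this field while the loop runs is a small helper that pulls the clearing statistics out of each parsed response. The sketch below is hypothetical (`summarize_applied_edits` is not part of any SDK) and assumes each applied edit may carry the `cleared_tool_uses` and `cleared_input_tokens` keys analyzed later in this notebook; missing keys default to zero.

```python
def summarize_applied_edits(response_body):
    """Return (cleared_tool_uses, cleared_input_tokens) totals for one response.

    Hypothetical helper: expects the parsed JSON response body and assumes each
    entry in context_management.applied_edits may carry the keys
    'cleared_tool_uses' and 'cleared_input_tokens'.
    """
    edits = (response_body.get("context_management") or {}).get("applied_edits") or []
    tool_uses = sum(edit.get("cleared_tool_uses", 0) for edit in edits)
    tokens = sum(edit.get("cleared_input_tokens", 0) for edit in edits)
    return tool_uses, tokens


# A response with no clearing, shaped like the single-call output above
no_clearing = {"context_management": {"applied_edits": []}}

# Illustrative response shaped after the clearing events described in this notebook
with_clearing = {
    "context_management": {
        "applied_edits": [
            {
                "type": "clear_tool_uses_20250919",
                "cleared_tool_uses": 2,
                "cleared_input_tokens": 62,
            }
        ]
    }
}

print(summarize_applied_edits(no_clearing))
print(summarize_applied_edits(with_clearing))
```

Calling this on `response_body` inside the loop gives a one-line view of whether clearing happened on that request, without reading the full JSON dump.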

In [15]:
def handle_tool_calls(tool_uses):
    """
    Handle tool calls and return tool results.
    """
    tool_results = []
    
    for tool_use in tool_uses:
        tool_name = tool_use['name']
        tool_input = tool_use['input']
        tool_use_id = tool_use['id']
        
        if tool_name == 'get_weather':
            # Call our weather function
            result = get_weather(tool_input['location'])
            tool_results.append({
                "type": "tool_result",
                "tool_use_id": tool_use_id,
                "content": result
            })
        elif tool_name == 'memory':
            # Handle memory tool (placeholder - would need actual memory implementation)
            result = json.dumps({"status": "Memory accessed", "data": "No previous weather data found"})
            tool_results.append({
                "type": "tool_result",
                "tool_use_id": tool_use_id,
                "content": result
            })
        else:
            # Unknown tool
            tool_results.append({
                "type": "tool_result",
                "tool_use_id": tool_use_id,
                "content": f"Error: Unknown tool {tool_name}",
                "is_error": True
            })
    
    return tool_results

def run_conversation_with_tools():
    """
    Run a complete conversation with automatic tool clearing.
    """
    messages = [
        {
            "role": "user",
            "content": "Create a comprehensive report about the current weather in Madrid. Validate the weather at least 3 times to make sure the measurement is accurate."
        }
    ]
    
    max_iterations = 5  # Prevent infinite loops
    iteration = 0
    
    while iteration < max_iterations:
        iteration += 1
        print(f"\n=== Iteration {iteration} ===")
        
        # Make API call
        response = bedrock.invoke_model(
            modelId="global.anthropic.claude-sonnet-4-5-20250929-v1:0",
            body=json.dumps({
                "anthropic_version": "bedrock-2023-05-31",
                "anthropic_beta": ["context-management-2025-06-27"],
                "system": [{"type": "text", "text": "You're a helpful assistant that provides detailed weather reports."}],
                "max_tokens": 4096,
                "messages": messages,
                "tools": [
                    {
                        "type": "memory_20250818",
                        "name": "memory"
                    },
                    {
                        "name": "get_weather",
                        "description": "Get the weather in a given city",
                        "input_schema": {
                            "type": "object",
                            "properties": {
                                "location": {
                                    "type": "string",
                                    "description": "The city and state, e.g. San Francisco, CA"
                                }
                            },
                            "required": ["location"]
                        }
                    }
                ],
                "context_management": {
                    "edits": [{
                        "type": "clear_tool_uses_20250919",
                        "trigger": {
                            "type": "input_tokens",
                            "value": 50
                        },
                        "keep": {
                            "type": "tool_uses",
                            "value": 1
                        },
                        "clear_at_least": {
                            "type": "input_tokens",
                            "value": 50
                        },
                        "exclude_tools": ["memory"]
                    }]
                }
            })
        )
        
        # Parse response
        response_body = json.loads(response['body'].read().decode('utf8'))
        print(f"Response: {json.dumps(response_body, indent=2)}")
        
        # Add assistant message to conversation
        messages.append({
            "role": "assistant",
            "content": response_body['content']
        })
        
        # Check if there are tool uses
        tool_uses = [content for content in response_body['content'] if content.get('type') == 'tool_use']
        
        if not tool_uses:
            print("\nConversation completed - no more tool uses.")
            break
        
        # Handle tool calls
        print(f"\nHandling {len(tool_uses)} tool use(s)...")
        tool_results = handle_tool_calls(tool_uses)
        
        # Add tool results to conversation
        messages.append({
            "role": "user",
            "content": tool_results
        })
        
        print(f"Tool results: {json.dumps(tool_results, indent=2)}")
    
    return messages

# Run the conversation
conversation_history = run_conversation_with_tools()


=== Iteration 1 ===
Response: {
  "model": "claude-sonnet-4-5-20250929",
  "id": "msg_bdrk_0187Qa4K8HzgiAg95zzUepQL",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "I'll help you create a comprehensive weather report for Madrid with multiple validations. Let me start by checking my memory, then get the weather data."
    },
    {
      "type": "tool_use",
      "id": "toolu_bdrk_01D5TKwwZ8JvMpLdrZ2BhciQ",
      "name": "memory",
      "input": {
        "command": "view",
        "path": "/memories"
      }
    }
  ],
  "stop_reason": "tool_use",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 1686,
    "cache_creation_input_tokens": 0,
    "cache_read_input_tokens": 0,
    "output_tokens": 100
  },
  "context_management": {
    "applied_edits": []
  }
}

Handling 1 tool use(s)...
Tool results: [
  {
    "type": "tool_result",
    "tool_use_id": "toolu_bdrk_01D5TKwwZ8JvMpLdrZ2BhciQ",
    "content": "{\"status\": \"Memory

### Understanding the Context Management Applied

Let's analyze what happened during the conversation above. The automatic tool clearing feature activated in **Iterations 3 and 4**.

#### What Happened in Each Iteration:

**Iteration 1:**
- Input tokens: 1,686
- Claude requested the `memory` tool
- No clearing occurred: the 50-token trigger was already exceeded, but the request contained no earlier tool use/result pairs to clear

**Iteration 2:**
- Input tokens: 1,815
- Claude requested 3 `get_weather` tool calls (for validation)
- No clearing occurred yet: the only earlier tool interaction was the excluded `memory` call

**Iteration 3:** ⚡ **First Clearing Event**
- Input tokens: 2,130
- Context management activated:
  - `cleared_tool_uses: 2` - Removed 2 old tool use/result pairs
  - `cleared_input_tokens: 62` - Freed up 62 tokens
- **What was cleared:** The first 2 of the 3 `get_weather` calls from Iteration 2
- **What was kept:** The most recent `get_weather` call (as per `keep: 1` config)
- **What was excluded:** The `memory` tool call (as per `exclude_tools: ["memory"]`)

**Iteration 4:** ⚡ **Second Clearing Event**
- Input tokens: 2,404
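- Same clearing pattern applied (the loop resends the full message history, so the same two `get_weather` pairs were cleared again):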
  - `cleared_tool_uses: 2`
  - `cleared_input_tokens: 62`
- Final response generated with comprehensive weather report

### Visual Representation of Context Clearing

Here's a visual breakdown of how the context window was managed throughout the conversation:

```
ITERATION 1 (1,686 tokens)
┌─────────────────────────────────────────┐
│ User: "Create weather report..."        │
│ Assistant: [text] + [memory tool_use]   │
│ User: [memory tool_result]              │
└─────────────────────────────────────────┘
Context Management: No clearing (no earlier tool uses to clear)

ITERATION 2 (1,815 tokens)
┌─────────────────────────────────────────┐
│ User: "Create weather report..."        │
│ Assistant: [memory tool_use]            │
│ User: [memory tool_result]              │
│ Assistant: [get_weather #1 tool_use]    │ ← 3 weather calls
│            [get_weather #2 tool_use]    │
│            [get_weather #3 tool_use]    │
│ User: [get_weather #1 result]           │
│       [get_weather #2 result]           │
│       [get_weather #3 result]           │
└─────────────────────────────────────────┘
Context Management: No clearing yet

ITERATION 3 (2,130 tokens) ⚡ CLEARING ACTIVATED
┌─────────────────────────────────────────┐
│ User: "Create weather report..."        │
│ Assistant: [memory tool_use]            │ ← KEPT (excluded)
│ User: [memory tool_result]              │ ← KEPT (excluded)
│ Assistant: [get_weather #1] ❌ CLEARED  │
│            [get_weather #2] ❌ CLEARED  │
│            [get_weather #3] ✓ KEPT      │ ← Most recent kept
│ User: [result #1] ❌ CLEARED            │
│       [result #2] ❌ CLEARED            │
│       [result #3] ✓ KEPT                │
│ Assistant: [memory tool_use]            │
│ User: [memory tool_result]              │
└─────────────────────────────────────────┘
Cleared: 2 tool uses, 62 tokens freed

ITERATION 4 (2,404 tokens) ⚡ CLEARING ACTIVATED AGAIN
┌─────────────────────────────────────────┐
│ User: "Create weather report..."        │
│ Assistant: [memory tool_use]            │ ← KEPT (excluded)
│ User: [memory tool_result]              │ ← KEPT (excluded)
│ Assistant: [get_weather #3] ✓ KEPT      │ ← Most recent kept
│ User: [result #3] ✓ KEPT                │
│ Assistant: [memory tool_use]            │
│ User: [memory tool_result]              │
│ Assistant: [Final Report Text]          │
└─────────────────────────────────────────┘
Cleared: the same 2 tool uses again, 62 tokens freed
```

### Key Insights from the Context Management

#### 1. Trigger Threshold
We set a very low trigger threshold (`50 tokens`) for demonstration purposes. In production, you'd typically use much higher values (e.g., `100,000 tokens`) to only clear when approaching context limits.

#### 2. Selective Clearing
The configuration determined what was removed and what stayed:
- ✅ **Kept:** Memory tool interactions (excluded via `exclude_tools`)
- ✅ **Kept:** Most recent `get_weather` tool use/result pair
- ❌ **Cleared:** Older `get_weather` tool interactions that were no longer needed
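
To make the selection rule concrete, the sketch below re-creates it locally on a list of tool names. It is a simplified model of the behavior observed in this run, not the service's actual algorithm: it assumes excluded tools are skipped entirely and that the `keep` most recent non-excluded tool uses are retained. The function name `select_tool_uses_to_clear` is hypothetical.

```python
def select_tool_uses_to_clear(tool_names, keep=1, exclude_tools=()):
    """Return indices of tool uses that this simplified model would clear.

    tool_names: tool names in the order the tool uses appeared (oldest first).
    keep: number of most recent non-excluded tool uses to retain.
    exclude_tools: tool names that are never cleared.
    """
    excluded = set(exclude_tools)
    # Only non-excluded tool uses are candidates for clearing
    candidates = [i for i, name in enumerate(tool_names) if name not in excluded]
    # With keep <= 0 nothing is retained, so every candidate is cleared
    if keep <= 0:
        return candidates
    # Retain the `keep` most recent candidates and clear the older ones
    return candidates[:-keep]


# Tool-use order assumed from the diagram above: memory, three weather calls, memory
history = ["memory", "get_weather", "get_weather", "get_weather", "memory"]
cleared = select_tool_uses_to_clear(history, keep=1, exclude_tools=["memory"])
print(cleared)  # indices of the two older get_weather calls
```

Under these assumptions, the model clears two tool uses, which matches the `cleared_tool_uses: 2` value reported in Iterations 3 and 4.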

#### 3. Token Savings
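- Tokens cleared: 62 on each of the two requests where clearing applied (Iterations 3 and 4). Because the loop resends the full message history each time, both events cleared the same two `get_weather` pairs, so the savings apply per request rather than adding up to 124 tokens of distinct history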
- This prevented the context from growing unnecessarily while maintaining enough information for Claude to complete the task

#### 4. Preserved Functionality
Despite clearing old tool calls, Claude still had access to:
- The original user request
- Memory tool interactions (for long-term context)
- The most recent weather data
- All previous text responses

This allowed Claude to generate a comprehensive final report without needing all the intermediate weather validation data.

### Configuration Best Practices

When implementing automatic tool clearing in your applications, consider these guidelines:

#### Trigger Threshold
```python
"trigger": {
    "type": "input_tokens",
    "value": 100000  # Start clearing when approaching context limits
}
```
- Use higher values (50K-100K) for production
- Consider your model's context window size
- Balance between context preservation and token efficiency
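
As a rough way to pick a trigger value, the threshold can be derived from the model's context window. The sketch below assumes a 200,000-token window and a 50% fraction; both numbers are illustrative assumptions, so check the limits for the model you deploy. `trigger_for_window` is a hypothetical helper.

```python
def trigger_for_window(context_window_tokens, fraction=0.5):
    """Build a trigger block that fires at a fraction of the context window."""
    if not 0 < fraction < 1:
        raise ValueError("fraction must be between 0 and 1 (exclusive)")
    return {
        "type": "input_tokens",
        "value": int(context_window_tokens * fraction),
    }


ASSUMED_CONTEXT_WINDOW = 200_000  # assumption, not taken from this notebook
trigger = trigger_for_window(ASSUMED_CONTEXT_WINDOW)
print(trigger)
```

With these assumed inputs the helper yields the same 100,000-token value shown in the snippet above.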

#### Keep Recent Tools
```python
"keep": {
    "type": "tool_uses",
    "value": 3  # Keep last 3 tool interactions
}
```
- Keep enough recent tools for Claude to maintain context
- Typical values: 1-5 depending on your use case

#### Exclude Critical Tools
```python
"exclude_tools": ["memory", "database_query", "user_preferences"]
```
- Exclude tools that provide essential long-term context
- Examples: memory, user data, session information

#### Clear Threshold
```python
"clear_at_least": {
    "type": "input_tokens",
    "value": 1000  # Only clear if we can free at least 1000 tokens
}
```
- Prevents clearing for minimal token savings
- Useful when using prompt caching (avoid breaking cache for small gains)
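
The four settings above can be assembled with a small builder so that the same configuration is reused across requests. This is a sketch: `build_clear_tool_uses_edit` is a hypothetical helper, its default values are illustrative rather than recommended, and the dictionary layout simply mirrors the request bodies used earlier in this notebook.

```python
def build_clear_tool_uses_edit(trigger_tokens=100_000, keep_tool_uses=3,
                               clear_at_least_tokens=1_000, exclude_tools=None):
    """Return a context_management block shaped like the one used in this notebook."""
    if trigger_tokens <= 0:
        raise ValueError("trigger_tokens must be positive")
    if keep_tool_uses < 0 or clear_at_least_tokens < 0:
        raise ValueError("keep_tool_uses and clear_at_least_tokens must be non-negative")
    edit = {
        "type": "clear_tool_uses_20250919",
        "trigger": {"type": "input_tokens", "value": trigger_tokens},
        "keep": {"type": "tool_uses", "value": keep_tool_uses},
        "clear_at_least": {"type": "input_tokens", "value": clear_at_least_tokens},
    }
    # Only include exclude_tools when at least one tool name is given
    if exclude_tools:
        edit["exclude_tools"] = list(exclude_tools)
    return {"edits": [edit]}


context_management = build_clear_tool_uses_edit(exclude_tools=["memory"])
```

The resulting dictionary can be passed as the `"context_management"` value in the request body, in place of the inline literal used in the earlier cells.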

**Learn more:** [Bedrock Tool Use Documentation](https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-anthropic-claude-messages-tool-use.html)

### Common Use Cases for Automatic Tool Clearing

This feature is particularly valuable for:

1. **Long-Running Conversations**
   - Customer support chatbots with extended interactions
   - Multi-step workflows requiring many tool calls
   - Iterative data analysis tasks

2. **Repetitive Tool Usage**
   - Monitoring systems that check status repeatedly
   - Data validation requiring multiple checks
   - Search operations with refinement iterations

3. **Resource-Intensive Applications**
   - Applications with large tool outputs
   - Systems processing many documents
   - Multi-agent systems with frequent tool interactions

4. **Cost Optimization**
   - Reducing token usage in high-volume applications
   - Extending conversation length without hitting limits
   - Maintaining performance while controlling costs

### Additional Resources

- [Amazon Bedrock User Guide](https://docs.aws.amazon.com/bedrock/latest/userguide/what-is-bedrock.html)
- [Anthropic Claude Messages API - Tool Use](https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-anthropic-claude-messages-tool-use.html)
- [Automatic Tool Call Clearing](https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-anthropic-claude-messages-tool-use.html#model-parameters-anthropic-claude-automatic-tool-call-clearing)
- [Anthropic Tool Use Documentation](https://docs.anthropic.com/en/docs/tool-use)
- [Anthropic blog on Context Management](https://www.anthropic.com/news/context-management)