# Context Offloading with Amazon Bedrock

**Context offloading** means storing information outside the LLM's context, usually via a tool that stores and manages the data.

## Why Offload Context?

1. **Avoid Context Pollution**: Keep the main context clean and focused
2. **Structured Reasoning**: Give the model a dedicated space to think through complex problems
3. **Better Tool Chains**: Process tool outputs carefully before acting
4. **Policy Compliance**: Verify decisions against guidelines without cluttering context

**Key Insight**: Anthropic reports that a "think" tool (scratchpad), paired with an optimized prompt, can yield up to a **54% relative improvement** on complex agent tasks (τ-Bench airline domain).

**References:**
- [Claude's Think Tool](https://www.anthropic.com/engineering/claude-think-tool)
- [How to Fix Your Context](https://www.dbreunig.com/2025/06/26/how-to-fix-your-context.html#context-offloading)

In [None]:
import boto3
import json
from typing import List, Dict, Any, Optional

bedrock = boto3.client('bedrock-runtime', region_name='us-east-1')
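# Assumption: some accounts/regions only accept an inference profile ID
# (for example, the same ID with a "us." prefix) for this model.
MODEL_ID = "anthropic.claude-sonnet-4-20250514-v1:0"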

## 1. The "Think" Tool Pattern

A simple scratchpad tool that lets the model reason without polluting the main context.

From Anthropic's τ-Bench research:
- **Airline domain**: 54% improvement with think tool + optimized prompt
- **Retail domain**: 3.7% improvement with think tool alone
- **SWE-Bench**: 1.6% improvement on code tasks
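
The percentages above read most naturally as relative gains over a baseline score rather than absolute percentage points. The sketch below shows the difference; the two scores are hypothetical values chosen only to illustrate the arithmetic, not figures from the research.

```python
def relative_improvement(baseline: float, improved: float) -> float:
    """Return the change from baseline to improved as a percentage of baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (improved - baseline) / baseline * 100.0


# Hypothetical scores (NOT taken from the benchmark), for illustration only.
baseline_score = 0.40
with_think_score = 0.50

absolute_gain = with_think_score - baseline_score
relative_gain = relative_improvement(baseline_score, with_think_score)

print(f"absolute gain: {absolute_gain:.2f} points")
print(f"relative gain: {relative_gain:.1f}%")
```

A small absolute gain on a low baseline can therefore show up as a large relative improvement, which is worth keeping in mind when comparing domains.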

In [None]:
# The "think" tool definition - a scratchpad for reasoning
THINK_TOOL = {
    "toolSpec": {
        "name": "think",
        "description": """Use this tool to think about something. It will not obtain new information or change any state, but just log the thought. Use it when:
- Complex reasoning or brainstorming is needed
- You need to analyze tool outputs before acting
- You need to verify compliance with policies
- You need to plan multi-step actions""",
        "inputSchema": {
            "json": {
                "type": "object",
                "properties": {
                    "thought": {
                        "type": "string",
                        "description": "Your reasoning, analysis, or notes"
                    }
                },
                "required": ["thought"]
            }
        }
    }
}

print("Think tool defined:")
print(json.dumps(THINK_TOOL, indent=2))
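
The same `toolSpec` nesting recurs for every tool defined later in this notebook. As a minimal sketch, a small builder can produce that shape and catch a `required` key that has no matching property. `make_tool` is a hypothetical helper introduced here for illustration; it is not part of boto3 or Bedrock.

```python
import json
from typing import Any, Dict, List, Optional


def make_tool(
    name: str,
    description: str,
    properties: Dict[str, Dict[str, Any]],
    required: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Build a tool entry in the same dict shape as THINK_TOOL (hypothetical helper)."""
    # Default: treat every declared property as required.
    required = list(properties) if required is None else required
    missing = [key for key in required if key not in properties]
    if missing:
        raise ValueError(f"required keys not in properties: {missing}")
    return {
        "toolSpec": {
            "name": name,
            "description": description,
            "inputSchema": {
                "json": {
                    "type": "object",
                    "properties": properties,
                    "required": required,
                }
            },
        }
    }


think_tool = make_tool(
    "think",
    "Use this tool to think about something. It only logs the thought.",
    {"thought": {"type": "string", "description": "Your reasoning, analysis, or notes"}},
)
print(json.dumps(think_tool, indent=2))
```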

## 2. Basic Agent with Think Tool

Let's build a simple agent that uses the think tool for reasoning.

In [None]:
class ThinkingAgent:
    """Agent with a think tool for structured reasoning."""
    
    def __init__(self, system_prompt: str, tools: Optional[List[Dict]] = None):
        self.system_prompt = system_prompt
        self.tools = [THINK_TOOL] + (tools or [])
        self.messages = []
        self.thought_log = []  # Store thoughts separately
        
    def _call_bedrock(self) -> Dict:
        """Make a Bedrock API call."""
        return bedrock.converse(
            modelId=MODEL_ID,
            messages=self.messages,
            system=[{"text": self.system_prompt}],
            toolConfig={"tools": self.tools},
            inferenceConfig={"temperature": 0.7, "maxTokens": 2000}
        )
    
    def _handle_tool_use(self, tool_use: Dict) -> Dict:
        """Handle tool calls - think tool just logs, others would execute."""
        tool_name = tool_use["name"]
        tool_input = tool_use["input"]
        
        if tool_name == "think":
            # Log the thought separately; only a short acknowledgement is returned as the tool result
            thought = tool_input["thought"]
            self.thought_log.append(thought)
            print(f"💭 THINKING: {thought[:200]}...")
            return {"status": "thought recorded"}
        
        # Handle other tools here
        return {"error": f"Unknown tool: {tool_name}"}
    
    def run(self, user_input: str) -> str:
        """Run the agent with user input."""
        self.messages.append({
            "role": "user",
            "content": [{"text": user_input}]
        })
        
        while True:
            response = self._call_bedrock()
            stop_reason = response["stopReason"]
            output = response["output"]["message"]
            
            self.messages.append(output)
            
            if stop_reason == "end_turn":
                # Extract final text response
                for block in output["content"]:
                    if "text" in block:
                        return block["text"]
                return ""
            
            elif stop_reason == "tool_use":
                # Process tool calls
                tool_results = []
                for block in output["content"]:
                    if "toolUse" in block:
                        result = self._handle_tool_use(block["toolUse"])
                        tool_results.append({
                            "toolResult": {
                                "toolUseId": block["toolUse"]["toolUseId"],
                                "content": [{"text": json.dumps(result)}]
                            }
                        })
                
                self.messages.append({"role": "user", "content": tool_results})
            else:
                break
        
        return "Agent stopped unexpectedly"
    
    def get_thoughts(self) -> List[str]:
        """Get all recorded thoughts."""
        return self.thought_log
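
One caveat about the loop above: each thought still travels in `self.messages`, because the assistant's `toolUse` block (including its `thought` input) is appended to the history. Keeping a separate `thought_log` does not by itself shrink the context. A minimal sketch of one way to offload older thoughts follows: it replaces the input of all but the most recent `think` calls with a short stub while leaving every `toolUse`/`toolResult` pair in place. `compact_thoughts` and `PLACEHOLDER` are hypothetical names, and whether rewriting earlier turns helps or hurts answer quality is an assumption that would need testing.

```python
import copy
from typing import Any, Dict, List

PLACEHOLDER = "[thought offloaded to external log]"  # hypothetical stub text


def compact_thoughts(messages: List[Dict[str, Any]], keep_last: int = 1) -> List[Dict[str, Any]]:
    """Return a copy of messages where older think-tool inputs are replaced by a stub."""
    compacted = copy.deepcopy(messages)  # never mutate the caller's history
    # Collect every think toolUse block, in conversation order.
    think_calls = [
        block["toolUse"]
        for message in compacted
        if message.get("role") == "assistant"
        for block in message.get("content", [])
        if "toolUse" in block and block["toolUse"].get("name") == "think"
    ]
    cutoff = max(len(think_calls) - keep_last, 0)
    for tool_use in think_calls[:cutoff]:
        tool_use["input"] = {"thought": PLACEHOLDER}
    return compacted


# Hand-written history in the same shape the agent loop builds.
history = [
    {"role": "user", "content": [{"text": "Plan a caching strategy."}]},
    {"role": "assistant", "content": [
        {"toolUse": {"toolUseId": "t1", "name": "think",
                     "input": {"thought": "A long first thought about listings..."}}}]},
    {"role": "user", "content": [
        {"toolResult": {"toolUseId": "t1", "content": [{"text": "{\"status\": \"thought recorded\"}"}]}}]},
    {"role": "assistant", "content": [
        {"toolUse": {"toolUseId": "t2", "name": "think",
                     "input": {"thought": "A second thought about carts..."}}}]},
    {"role": "user", "content": [
        {"toolResult": {"toolUseId": "t2", "content": [{"text": "{\"status\": \"thought recorded\"}"}]}}]},
]

slim = compact_thoughts(history, keep_last=1)
print(slim[1]["content"][0]["toolUse"]["input"]["thought"])
print(slim[3]["content"][0]["toolUse"]["input"]["thought"])
```

The full text of each thought remains available in `thought_log`, so nothing is lost for debugging; only the copy sent back to the model is shortened.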

## 3. Demo: Problem Solving with Think Tool

In [None]:
# Create an agent with instructions to use the think tool
system_prompt = """You are a helpful assistant that thinks carefully before responding.

## Using the think tool
Before answering complex questions, use the think tool to:
- Break down the problem into parts
- Consider different approaches
- Verify your reasoning
- Check for edge cases

Always think before giving a final answer on complex topics."""

agent = ThinkingAgent(system_prompt)

# Test with a complex question
question = """I'm designing a caching strategy for an e-commerce API. 
We have product listings, user sessions, and shopping carts.
What should I cache and what invalidation strategy should I use?"""

print("👤 USER:", question)
print("\n" + "="*60 + "\n")

response = agent.run(question)

print("\n" + "="*60)
print("🤖 FINAL RESPONSE:")
print(response)

print("\n" + "="*60)
print("📝 THOUGHT LOG:")
for i, thought in enumerate(agent.get_thoughts(), 1):
    print(f"\n--- Thought {i} ---")
    print(thought)

## 4. Policy-Heavy Environment

The think tool shines in environments with complex rules. Here's a customer service example.

In [None]:
# Customer service tools
CUSTOMER_TOOLS = [
    {
        "toolSpec": {
            "name": "get_order",
            "description": "Get order details by order ID",
            "inputSchema": {
                "json": {
                    "type": "object",
                    "properties": {
                        "order_id": {"type": "string", "description": "The order ID"}
                    },
                    "required": ["order_id"]
                }
            }
        }
    },
    {
        "toolSpec": {
            "name": "process_refund",
            "description": "Process a refund for an order",
            "inputSchema": {
                "json": {
                    "type": "object",
                    "properties": {
                        "order_id": {"type": "string"},
                        "amount": {"type": "number"},
                        "reason": {"type": "string"}
                    },
                    "required": ["order_id", "amount", "reason"]
                }
            }
        }
    }
]

# Mock order database
ORDERS = {
    "ORD-123": {
        "status": "delivered",
        "total": 150.00,
        "items": ["Widget A", "Widget B"],
        "delivery_date": "2025-11-25",
        "customer_tier": "gold"
    }
}

In [None]:
# Policy-aware system prompt with think tool guidance
POLICY_PROMPT = """You are a customer service agent for an e-commerce company.

## Refund Policy
- Full refunds within 30 days of delivery
- 50% refund between 31 and 60 days
- No refunds after 60 days
- Gold/Platinum customers get extended 90-day full refund window
- Damaged items always get full refund regardless of time

## Using the think tool
Before processing ANY refund, you MUST use the think tool to:
1. List the specific policy rules that apply
2. Check the order details against each rule
3. Calculate the correct refund amount
4. Verify the decision is compliant

Example think tool usage:
<think_example>
Order ORD-123 refund request analysis:
- Order status: delivered
- Delivery date: 2025-11-25 (5 days ago)
- Customer tier: gold
- Applicable rules:
  * Within 30 days? YES (5 days)
  * Gold customer? YES (90-day window applies)
- Decision: Full refund of $150.00 is compliant
</think_example>

Never process a refund without thinking through the policy first."""
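
Because the refund rules are simple enough to compute directly, a deterministic check can cross-validate whatever the model concludes in its think step. The sketch below is one reading of the policy, under these assumptions: day boundaries (30, 60, 90) are inclusive, Gold/Platinum customers past 90 days fall back to the standard rules (so no refund), and the caller supplies a `damaged` flag. `refund_fraction` is a hypothetical helper, not part of the agent above.

```python
from datetime import date


def refund_fraction(
    delivery_date: date,
    today: date,
    customer_tier: str,
    damaged: bool = False,
) -> float:
    """Return the fraction of the order total to refund (1.0, 0.5, or 0.0)."""
    if damaged:
        return 1.0  # damaged items: full refund regardless of time
    days_since_delivery = (today - delivery_date).days
    if days_since_delivery <= 30:
        return 1.0  # standard full-refund window
    if customer_tier.lower() in {"gold", "platinum"} and days_since_delivery <= 90:
        return 1.0  # extended window for Gold/Platinum
    if days_since_delivery <= 60:
        return 0.5  # partial refund window
    return 0.0


# Same figures as the ORD-123 example in the prompt: delivered 5 days ago, gold tier.
fraction = refund_fraction(date(2025, 11, 25), date(2025, 11, 30), "gold")
print(f"Refund: ${150.00 * fraction:.2f}")
```

Comparing this result with the amount the model passes to `process_refund` gives a cheap guard against a plausible-sounding but non-compliant decision.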

In [None]:
class PolicyAgent(ThinkingAgent):
    """Agent that handles policy-heavy decisions."""
    
    def _handle_tool_use(self, tool_use: Dict) -> Dict:
        tool_name = tool_use["name"]
        tool_input = tool_use["input"]
        
        if tool_name == "think":
            thought = tool_input["thought"]
            self.thought_log.append(thought)
            print(f"💭 POLICY CHECK: {thought[:300]}...")
            return {"status": "thought recorded"}
        
        elif tool_name == "get_order":
            order_id = tool_input["order_id"]
            if order_id in ORDERS:
                print(f"📦 Retrieved order: {order_id}")
                return ORDERS[order_id]
            return {"error": "Order not found"}
        
        elif tool_name == "process_refund":
            print(f"💰 Processing refund: ${tool_input['amount']} for {tool_input['order_id']}")
            return {"status": "refund_processed", "confirmation": "REF-" + tool_input["order_id"]}
        
        return {"error": f"Unknown tool: {tool_name}"}

# Create policy-aware agent
policy_agent = PolicyAgent(POLICY_PROMPT, CUSTOMER_TOOLS)

# Test refund request
request = "I'd like a refund for order ORD-123. The product just wasn't what I expected."

print("👤 CUSTOMER:", request)
print("\n" + "="*60 + "\n")

response = policy_agent.run(request)

print("\n" + "="*60)
print("🤖 AGENT RESPONSE:")
print(response)

print("\n📋 POLICY REASONING:")
for thought in policy_agent.get_thoughts():
    print(thought)
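
The prompt tells the model it MUST think before refunding, but nothing in the code enforces that. A minimal sketch of a code-level check is shown below. It assumes the agent records tool names in call order in a list (a hypothetical `tool_calls` list that `_handle_tool_use` could append to) and requires a fresh `think` call before each `process_refund`.

```python
from typing import List


def thought_precedes_refund(tool_calls: List[str]) -> bool:
    """Return True if every process_refund call follows a think call made since the last refund."""
    thought_since_last_refund = False
    for name in tool_calls:
        if name == "think":
            thought_since_last_refund = True
        elif name == "process_refund":
            if not thought_since_last_refund:
                return False
            # Each refund needs its own policy check.
            thought_since_last_refund = False
    return True


# Hypothetical call sequences for illustration.
compliant = ["get_order", "think", "process_refund"]
skipped_check = ["get_order", "process_refund"]

print(thought_precedes_refund(compliant))
print(thought_precedes_refund(skipped_check))
```

In an agent, the same check could run inside the `process_refund` branch and return an error result instead of processing the refund, prompting the model to think first.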

## 5. Extended Scratchpad Pattern

For complex multi-step tasks, use a more structured scratchpad with multiple storage areas.

In [None]:
# Extended scratchpad tools
SCRATCHPAD_TOOLS = [
    {
        "toolSpec": {
            "name": "note",
            "description": "Save a note to the scratchpad for later reference",
            "inputSchema": {
                "json": {
                    "type": "object",
                    "properties": {
                        "category": {
                            "type": "string",
                            "enum": ["facts", "decisions", "questions", "todos"],
                            "description": "Category for the note"
                        },
                        "content": {"type": "string", "description": "The note content"}
                    },
                    "required": ["category", "content"]
                }
            }
        }
    },
    {
        "toolSpec": {
            "name": "recall",
            "description": "Recall notes from a specific category",
            "inputSchema": {
                "json": {
                    "type": "object",
                    "properties": {
                        "category": {
                            "type": "string",
                            "enum": ["facts", "decisions", "questions", "todos", "all"]
                        }
                    },
                    "required": ["category"]
                }
            }
        }
    }
]

In [None]:
class ScratchpadAgent:
    """Agent with categorized scratchpad for complex reasoning."""
    
    def __init__(self, system_prompt: str, tools: Optional[List[Dict]] = None):
        self.system_prompt = system_prompt
        self.tools = SCRATCHPAD_TOOLS + (tools or [])
        self.messages = []
        self.scratchpad = {
            "facts": [],
            "decisions": [],
            "questions": [],
            "todos": []
        }
    
    def _call_bedrock(self) -> Dict:
        return bedrock.converse(
            modelId=MODEL_ID,
            messages=self.messages,
            system=[{"text": self.system_prompt}],
            toolConfig={"tools": self.tools},
            inferenceConfig={"temperature": 0.7, "maxTokens": 2000}
        )
    
    def _handle_tool_use(self, tool_use: Dict) -> Dict:
        tool_name = tool_use["name"]
        tool_input = tool_use["input"]
        
        if tool_name == "note":
            category = tool_input["category"]
            content = tool_input["content"]
            self.scratchpad[category].append(content)
            print(f"📝 [{category.upper()}]: {content[:100]}...")
            return {"status": "noted", "category": category}
        
        elif tool_name == "recall":
            category = tool_input["category"]
            if category == "all":
                return self.scratchpad
            return {category: self.scratchpad.get(category, [])}
        
        return {"error": f"Unknown tool: {tool_name}"}
    
    def run(self, user_input: str) -> str:
        self.messages.append({"role": "user", "content": [{"text": user_input}]})
        
        while True:
            response = self._call_bedrock()
            stop_reason = response["stopReason"]
            output = response["output"]["message"]
            self.messages.append(output)
            
            if stop_reason == "end_turn":
                for block in output["content"]:
                    if "text" in block:
                        return block["text"]
                return ""
            
            elif stop_reason == "tool_use":
                tool_results = []
                for block in output["content"]:
                    if "toolUse" in block:
                        result = self._handle_tool_use(block["toolUse"])
                        tool_results.append({
                            "toolResult": {
                                "toolUseId": block["toolUse"]["toolUseId"],
                                "content": [{"text": json.dumps(result)}]
                            }
                        })
                self.messages.append({"role": "user", "content": tool_results})
            else:
                break
        
        return "Agent stopped"
    
    def get_scratchpad(self) -> Dict:
        return self.scratchpad
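
The scratchpad above lives only in memory, so it disappears when the kernel restarts. As a sketch of taking the offloading one step further, the notes can be written to a JSON file and reloaded later. `save_scratchpad` and `load_scratchpad` are hypothetical helpers, and the file location is an arbitrary temporary path chosen for the demo.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List

CATEGORIES = ("facts", "decisions", "questions", "todos")


def save_scratchpad(scratchpad: Dict[str, List[str]], path: Path) -> None:
    """Write the scratchpad to disk as JSON."""
    path.write_text(json.dumps(scratchpad, indent=2), encoding="utf-8")


def load_scratchpad(path: Path) -> Dict[str, List[str]]:
    """Load a scratchpad, returning empty categories if the file does not exist."""
    if not path.exists():
        return {category: [] for category in CATEGORIES}
    stored = json.loads(path.read_text(encoding="utf-8"))
    # Keep only the known categories so a stale file cannot add unexpected keys.
    return {category: list(stored.get(category, [])) for category in CATEGORIES}


notes = {
    "facts": ["1M daily active users"],
    "decisions": [],
    "questions": ["Which push provider?"],
    "todos": ["Size the queue"],
}

with tempfile.TemporaryDirectory() as tmp_dir:
    pad_path = Path(tmp_dir) / "scratchpad.json"
    save_scratchpad(notes, pad_path)
    restored = load_scratchpad(pad_path)
    missing = load_scratchpad(Path(tmp_dir) / "does_not_exist.json")

print(restored["facts"])
```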

In [None]:
# Test the scratchpad agent
scratchpad_prompt = """You are a technical architect helping design systems.

Use the scratchpad tools to organize your thinking:
- note(facts): Record important facts and constraints
- note(decisions): Record design decisions and rationale
- note(questions): Record questions that need answers
- note(todos): Record action items
- recall: Review your notes when needed

Always organize your thoughts before giving recommendations."""

architect = ScratchpadAgent(scratchpad_prompt)

task = """Help me design a notification system for a mobile app. 
Requirements: push notifications, in-app notifications, email digests.
We have 1M daily active users and need to handle 10M notifications/day."""

print("👤 USER:", task)
print("\n" + "="*60 + "\n")

response = architect.run(task)

print("\n" + "="*60)
print("🤖 RECOMMENDATION:")
print(response)

print("\n📋 SCRATCHPAD CONTENTS:")
for category, notes in architect.get_scratchpad().items():
    if notes:
        print(f"\n[{category.upper()}]")
        for note in notes:
            print(f"  • {note}")

## 6. With vs Without Think Tool

Let's compare agent behavior with and without the think tool.

In [None]:
def run_without_think(question: str) -> Dict:
    """Run without think tool - direct response."""
    response = bedrock.converse(
        modelId=MODEL_ID,
        messages=[{"role": "user", "content": [{"text": question}]}],
        system=[{"text": "You are a helpful assistant. Answer directly and concisely."}],
        inferenceConfig={"temperature": 0.7, "maxTokens": 1000}
    )
    return {
        "response": response["output"]["message"]["content"][0]["text"],
        "input_tokens": response["usage"]["inputTokens"],
        "output_tokens": response["usage"]["outputTokens"]
    }

def run_with_think(question: str) -> Dict:
    """Run with think tool - structured reasoning."""
    agent = ThinkingAgent(
        """You are a helpful assistant that thinks carefully.
        
Before answering, use the think tool to:
1. Identify the key aspects of the question
2. Consider different perspectives
3. Structure your response

Then provide a clear, well-reasoned answer."""
    )
    
    response = agent.run(question)
    return {
        "response": response,
        "thoughts": agent.get_thoughts(),
        "message_count": len(agent.messages)
    }

In [None]:
# Compare on a complex question
complex_question = """Should I use a monolithic or microservices architecture 
for a new e-commerce platform that needs to handle Black Friday traffic spikes?"""

print("📊 COMPARISON TEST")
print("="*60)
print(f"Question: {complex_question}")
print("="*60)

print("\n🔵 WITHOUT THINK TOOL:")
result_without = run_without_think(complex_question)
print(f"Response length: {len(result_without['response'])} chars")
print(f"Tokens: {result_without['input_tokens']} in, {result_without['output_tokens']} out")
print(f"\nResponse preview: {result_without['response'][:300]}...")

print("\n" + "="*60)

print("\n🟢 WITH THINK TOOL:")
result_with = run_with_think(complex_question)
print(f"Response length: {len(result_with['response'])} chars")
print(f"Thoughts recorded: {len(result_with['thoughts'])}")
print("\nThoughts:")
for i, t in enumerate(result_with['thoughts'], 1):
    print(f"  {i}. {t[:150]}...")
print(f"\nResponse preview: {result_with['response'][:300]}...")
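
The comparison above reports token counts only for the run without the think tool. The agent loop makes several API calls and resends the growing history each time, so a fair comparison needs the usage summed across calls. The sketch below assumes each response carries the same `usage` keys that `run_without_think` already reads; `total_usage` is a hypothetical helper and the response dicts are made-up stand-ins rather than real API output.

```python
from typing import Any, Dict, Iterable


def total_usage(responses: Iterable[Dict[str, Any]]) -> Dict[str, int]:
    """Sum inputTokens and outputTokens over a sequence of response dicts."""
    totals = {"inputTokens": 0, "outputTokens": 0}
    for response in responses:
        usage = response.get("usage", {})
        for key in totals:
            totals[key] += usage.get(key, 0)
    return totals


# Made-up responses standing in for two turns of an agent loop.
fake_responses = [
    {"usage": {"inputTokens": 120, "outputTokens": 80}},
    {"usage": {"inputTokens": 230, "outputTokens": 150}},
]

usage_totals = total_usage(fake_responses)
print(f"Tokens: {usage_totals['inputTokens']} in, {usage_totals['outputTokens']} out")
```

To use this with `ThinkingAgent`, the loop would need to keep each response (for example in a list attribute) so the totals can be computed after `run` returns.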

## Key Takeaways

✅ **Think tool = scratchpad** - A place for reasoning that doesn't pollute main context

✅ **Best for complex scenarios**:
- Tool output analysis (process before acting)
- Policy-heavy environments (verify compliance)
- Sequential decisions (each step builds on previous)

✅ **Implementation tips**:
- Add domain-specific examples in the system prompt
- Use structured categories for complex tasks
- Log thoughts separately for debugging/analysis

✅ **When NOT to use**:
- Simple, single-turn questions
- Non-sequential tool calls (a single call or independent parallel calls)
- When extended thinking (reasoning before the response starts) is sufficient

**The key insight**: Sometimes the best way to improve context quality is to give the model a place to put its working notes that won't interfere with the main task.