# Context Summarization with Amazon Bedrock

**Context Summarization** = Boiling down an accrued context into a condensed summary to prevent context distraction.

## Why Summarize Context?

1. **Context Distraction**: Beyond roughly 100k tokens, agents have been observed to favor repeating actions from their long history over synthesizing novel plans
2. **Cost & Latency**: Smaller contexts = faster responses and lower costs
3. **Focus**: Preserves what matters, removes noise

**Reference**: [How to Fix Your Context](https://www.dbreunig.com/2025/06/26/how-to-fix-your-context.html#context-summarization)

```python
import boto3
import json
from typing import List, Dict

bedrock = boto3.client('bedrock-runtime', region_name='us-east-1')
MODEL_ID = "us.anthropic.claude-sonnet-4-20250514-v1:0"
```

## 1. Basic Summarization Helper

```python
def summarize_text(text: str, focus: str = None, max_tokens: int = 500) -> str:
    """Summarize text with optional focus area."""
    focus_instruction = f" Focus on: {focus}" if focus else ""
    
    response = bedrock.converse(
        modelId=MODEL_ID,
        messages=[{"role": "user", "content": [{"text": 
            f"Summarize this concisely, preserving key facts and decisions.{focus_instruction}\n\n{text}"
        }]}],
        inferenceConfig={"temperature": 0.3, "maxTokens": max_tokens}
    )
    return response['output']['message']['content'][0]['text']
```
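`summarize_text` returns whatever text comes back, even when the model stopped because it hit `maxTokens`, which leaves a summary cut off mid-sentence. The Converse response carries a `stopReason` field that flags this case. Below is a minimal sketch of a stricter extractor; `extract_summary` and `TruncatedSummaryError` are hypothetical names, and the response dicts are hand-built to mirror the Converse response shape so the example runs without AWS credentials.

```python
from typing import Any, Dict


class TruncatedSummaryError(RuntimeError):
    """Hypothetical error: the model stopped because it ran out of output tokens."""


def extract_summary(response: Dict[str, Any]) -> str:
    """Return the first text block of a Converse response, rejecting cut-off output."""
    if response.get("stopReason") == "max_tokens":
        # The summary ended mid-thought; retry with a larger maxTokens or a shorter input
        raise TruncatedSummaryError("summary hit maxTokens")
    return response["output"]["message"]["content"][0]["text"]


# Hand-built dicts mirroring the shape bedrock.converse() returns
complete = {
    "output": {"message": {"role": "assistant", "content": [{"text": "Revenue rose 12%."}]}},
    "stopReason": "end_turn",
}
cut_off = {
    "output": {"message": {"role": "assistant", "content": [{"text": "Revenue rose"}]}},
    "stopReason": "max_tokens",
}

print(extract_summary(complete))  # Revenue rose 12%.
try:
    extract_summary(cut_off)
except TruncatedSummaryError as err:
    print(f"rejected: {err}")  # rejected: summary hit maxTokens
```

Inside `summarize_text`, the last line would become `return extract_summary(response)`; raising is a deliberate choice so a truncated summary never silently replaces real context.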

## 2. Conversation Summarizer

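When the message count reaches a threshold, fold the conversation so far into a summary that is appended to the system prompt, and keep only the last two exchanges verbatim.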

```python
class ConversationSummarizer:
    """Manages conversation with automatic summarization."""
    
    def __init__(self, system_prompt: str, max_messages: int = 10):
        self.system_prompt = system_prompt
        self.max_messages = max_messages
        self.messages = []
        self.summary = None
        
    def _summarize_history(self) -> str:
        """Summarize the conversation history, folding in any earlier summary."""
        # Only the first 200 characters of each message are sent, to keep this call cheap
        history = "\n".join([
            f"{m['role'].upper()}: {m['content'][0]['text'][:200]}..." 
            for m in self.messages
        ])
        # Carry the earlier summary forward; without it, a second compression
        # would discard everything that was summarized the first time
        previous = f"Earlier summary:\n{self.summary}\n\n" if self.summary else ""
        
        response = bedrock.converse(
            modelId=MODEL_ID,
            messages=[{"role": "user", "content": [{"text": 
                "Summarize this conversation, preserving: key decisions, facts, "
                "user preferences, and current task state.\n\n"
                f"{previous}Recent messages:\n{history}"
            }]}],
            inferenceConfig={"temperature": 0.3, "maxTokens": 400}
        )
        return response['output']['message']['content'][0]['text']
    
    def _maybe_compress(self):
        """Compress once the message count reaches the threshold."""
        if len(self.messages) >= self.max_messages:
            self.summary = self._summarize_history()
            # Keep only the last 2 exchanges (4 messages). This runs before the next
            # user turn is appended, so the kept window starts with a user message
            self.messages = self.messages[-4:]
            print(f"📦 Compressed! Summary: {self.summary[:100]}...")
    
    def chat(self, user_input: str) -> str:
        """Send a message and return the assistant's reply."""
        self._maybe_compress()
        
        # Build the system prompt, appending the summary if one exists
        system = self.system_prompt
        if self.summary:
            system += f"\n\nPrevious conversation summary:\n{self.summary}"
        
        self.messages.append({"role": "user", "content": [{"text": user_input}]})
        
        try:
            response = bedrock.converse(
                modelId=MODEL_ID,
                messages=self.messages,
                system=[{"text": system}],
                inferenceConfig={"temperature": 0.7, "maxTokens": 1000}
            )
        except Exception:
            # Roll back the user turn so a retry doesn't send two user messages in a row
            self.messages.pop()
            raise
        
        answer = response['output']['message']['content'][0]['text']
        self.messages.append({"role": "assistant", "content": [{"text": answer}]})
        
        return answer
    
    def stats(self):
        return {"messages": len(self.messages), "has_summary": self.summary is not None}
```
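The compression rule inside `ConversationSummarizer` can be pulled out into a pure function, which makes it testable without Bedrock. This is a hypothetical refactor, not the notebook's code: `compress_window` summarizes only the messages being dropped (the class summarizes all of them, including the four it keeps), takes the summarizer as a callable, and makes the size of the kept window a parameter.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Message = Dict[str, Any]


def compress_window(
    messages: List[Message],
    previous_summary: Optional[str],
    summarize: Callable[[str], str],
    max_messages: int = 10,
    keep_last: int = 4,
) -> Tuple[Optional[str], List[Message]]:
    """Once `messages` reaches `max_messages`, fold all but the last `keep_last`
    messages (plus any earlier summary) into a new summary."""
    if keep_last <= 0 or keep_last % 2:
        # An even window taken from an even-length user/assistant history
        # starts with a user turn
        raise ValueError("keep_last must be a positive even number")
    if len(messages) < max_messages or len(messages) <= keep_last:
        return previous_summary, messages  # nothing to drop yet
    dropped, kept = messages[:-keep_last], messages[-keep_last:]
    transcript = "\n".join(
        f"{m['role'].upper()}: {m['content'][0]['text']}" for m in dropped
    )
    earlier = f"Earlier summary:\n{previous_summary}\n\n" if previous_summary else ""
    return summarize(earlier + transcript), kept


def turn(role: str, text: str) -> Message:
    """Hypothetical helper that builds a Converse-style message."""
    return {"role": role, "content": [{"text": text}]}


history = [turn("user", "q1"), turn("assistant", "a1"),
           turn("user", "q2"), turn("assistant", "a2"),
           turn("user", "q3"), turn("assistant", "a3")]
# A stand-in summarizer that just wraps its input, so the example runs offline
summary, history = compress_window(history, None, lambda t: f"<{t}>", max_messages=6)
print([m["content"][0]["text"] for m in history])  # ['q2', 'a2', 'q3', 'a3']
```

In the class, `_maybe_compress` would call this with `self.summary` and a lambda wrapping `summarize_text`; because the earlier summary is part of the prompt, the first exchange survives the second and later compressions.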

```python
# Demo: Conversation with auto-summarization
chat = ConversationSummarizer(
    "You are a helpful coding assistant.",
    max_messages=6  # Low threshold for demo
)

questions = [
    "What's the best way to handle errors in Python?",
    "Can you show me a try-except example?",
    "What about custom exceptions?",
    "How do I log errors properly?",
    "What logging levels should I use?"
]

for q in questions:
    print(f"\n👤 {q}")
    response = chat.chat(q)
    print(f"🤖 {response[:150]}...")
    print(f"   Stats: {chat.stats()}")
```


````text
👤 What's the best way to handle errors in Python?
🤖 Here are the best practices for handling errors in Python:

## 1. Use Try-Except Blocks

```python
try:
    # Code that might raise an exception
    r...
   Stats: {'messages': 2, 'has_summary': False}

👤 Can you show me a try-except example?
🤖 Here are several try-except examples from simple to more complex:

## Basic Try-Except

```python
try:
    number = int(input("Enter a number: "))
   ...
   Stats: {'messages': 4, 'has_summary': False}

👤 What about custom exceptions?
🤖 Here's a comprehensive guide to creating and using custom exceptions in Python:

## 1. Basic Custom Exception

```python
class CustomError(Exception):...
   Stats: {'messages': 6, 'has_summary': False}

👤 How do I log errors properly?
📦 Compressed! Summary: ## Conversation Summary

**Topic**: Python error handling best practices

**Key Decisions/Recommenda...
🤖 Here's a comprehensive guide to logging errors properly in Python:

## 1. 
````

## 3. Hierarchical Summarization

For documents too long to summarize in one call: split them into chunks, summarize each chunk, then summarize the combined chunk summaries.

```python
def chunk_text(text: str, chunk_size: int = 2000) -> List[str]:
    """Split text into chunks of at most `chunk_size` words."""
    words = text.split()
    chunks = []
    for i in range(0, len(words), chunk_size):
        chunks.append(' '.join(words[i:i + chunk_size]))
    return chunks

def hierarchical_summarize(text: str, focus: str = None, chunk_size: int = 2000) -> Dict:
    """Summarize long text in two levels: each chunk first, then the chunk summaries."""
    chunks = chunk_text(text, chunk_size)
    
    if len(chunks) <= 1:
        # Short enough for a single call; no second level needed
        summary = summarize_text(text, focus)
        return {"summary": summary, "chunk_summaries": [summary], "chunks": len(chunks), "levels": 1}
    
    # Level 1: Summarize each chunk
    chunk_summaries = [summarize_text(c, focus, max_tokens=300) for c in chunks]
    print(f"📄 Summarized {len(chunks)} chunks")
    
    # Level 2: Combine summaries
    combined = "\n\n".join([f"Section {i+1}: {s}" for i, s in enumerate(chunk_summaries)])
    final = summarize_text(combined, focus, max_tokens=500)
    
    return {
        "summary": final,
        "chunk_summaries": chunk_summaries,
        "chunks": len(chunks),
        "levels": 2
    }
```
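`chunk_text` cuts at hard word-count boundaries, so a sentence or a pair of related facts can straddle two chunks and be summarized separately. A common refinement, not part of the original notebook, is to overlap consecutive chunks. The sketch below is a hypothetical `chunk_words_with_overlap` helper; the 200-word default overlap is an arbitrary assumption, not a tuned value.

```python
from typing import List


def chunk_words_with_overlap(text: str, chunk_size: int = 2000, overlap: int = 200) -> List[str]:
    """Split text into chunks of at most `chunk_size` words; each chunk after the
    first starts with the last `overlap` words of the previous one."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this chunk reaches the end; later windows would be pure overlap
    return chunks


sample = " ".join(str(i) for i in range(10))
print(chunk_words_with_overlap(sample, chunk_size=4, overlap=1))
# ['0 1 2 3', '3 4 5 6', '6 7 8 9']
```

With `overlap=0` it produces the same chunks as `chunk_text`; the cost of overlap is that repeated words are sent (and paid for) twice.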

```python
# Demo with sample long text
# Note: the sample is roughly 1,300 words, under chunk_text's default of
# 2,000 words per chunk, so this run takes the single-chunk path
long_text = """
Amazon Web Services (AWS) provides a comprehensive cloud computing platform. 
The platform includes compute services like EC2 for virtual servers and Lambda for serverless functions.
Storage options include S3 for object storage, EBS for block storage, and EFS for file systems.
Database services range from RDS for relational databases to DynamoDB for NoSQL.
Networking is handled through VPC, Route 53 for DNS, and CloudFront for CDN.
Security features include IAM for access management, KMS for encryption, and WAF for web protection.
Machine learning services include SageMaker for model training and Bedrock for foundation models.
Analytics services include Athena for queries, Redshift for warehousing, and Kinesis for streaming.
Developer tools include CodePipeline, CodeBuild, and CodeDeploy for CI/CD workflows.
Management tools include CloudWatch for monitoring, CloudFormation for infrastructure as code.
""" * 10  # Repeat to make it longer

result = hierarchical_summarize(long_text, focus="key AWS services")
print(f"\n📊 Result: {result['chunks']} chunks, {result['levels']} levels")
print(f"\n📝 Final Summary:\n{result['summary']}")
```


```text
📊 Result: 1 chunks, 1 levels

📝 Final Summary:
## AWS Key Services Summary

AWS offers a comprehensive cloud platform with core services across multiple categories:

**Compute:** EC2 (virtual servers), Lambda (serverless functions)

**Storage:** S3 (object storage), EBS (block storage), EFS (file systems)

**Database:** RDS (relational), DynamoDB (NoSQL)

**Networking:** VPC, Route 53 (DNS), CloudFront (CDN)

**Security:** IAM (access management), KMS (encryption), WAF (web protection)

**Machine Learning:** SageMaker (model training), Bedrock (foundation models)

**Analytics:** Athena (queries), Redshift (data warehousing), Kinesis (streaming)

**Developer Tools:** CodePipeline, CodeBuild, CodeDeploy (CI/CD)

**Management:** CloudWatch (monitoring), CloudFormation (infrastructure as code)
```
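The run above takes the single-chunk path, so it never exercises level 2, and `hierarchical_summarize` stops at two levels even if the combined chunk summaries are themselves too long for one call. A minimal sketch of an any-depth variant, assuming a hypothetical `reduce_summaries` helper that takes the summarizer as a callable (so it can be checked offline) and merges at most `fan_in` summaries per call:

```python
from typing import Callable, List, Tuple


def reduce_summaries(chunks: List[str], summarize: Callable[[str], str],
                     fan_in: int = 5) -> Tuple[str, int]:
    """Summarize every chunk, then merge groups of `fan_in` summaries per call
    until one summary remains. Returns (final_summary, levels)."""
    if not chunks:
        raise ValueError("need at least one chunk")
    if fan_in < 2:
        raise ValueError("fan_in must be >= 2 so each level shrinks")
    level = [summarize(chunk) for chunk in chunks]  # level 1: one summary per chunk
    levels = 1
    while len(level) > 1:
        groups = [level[i:i + fan_in] for i in range(0, len(level), fan_in)]
        level = [
            # A lone leftover summary is passed up unchanged instead of re-summarized
            summarize("\n\n".join(f"Section {j + 1}: {s}" for j, s in enumerate(group)))
            if len(group) > 1 else group[0]
            for group in groups
        ]
        levels += 1
    return level[0], levels


# Offline check with a stand-in summarizer that just counts calls
calls = []


def fake_summarize(text: str) -> str:
    calls.append(text)
    return f"s{len(calls)}"


final, depth = reduce_summaries([f"chunk {i}" for i in range(12)], fake_summarize)
print(final, depth, len(calls))  # s16 3 16
```

With the notebook's helpers, `reduce_summaries(chunk_text(long_text, 300), lambda t: summarize_text(t, max_tokens=300))` would run the same reduction against Bedrock. A larger `fan_in` means fewer levels and calls, at the price of longer inputs to each merge call.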


## 4. Structured Context Manager

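Keep context in named sections, each with its own summarization policy: goals are always kept verbatim, facts and history are summarized once they pass a character threshold, and the scratchpad holds temporary notes that are left out of the compiled context.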

```python
class StructuredContextManager:
    """Manages context with separate sections that can be individually summarized."""
    
    def __init__(self):
        self.sections = {
            "goals": [],      # Never summarize - always keep full
            "facts": [],      # Summarize when large
            "history": [],    # Aggressively summarize
            "scratchpad": []  # Temporary, can be cleared; not included in compiled context
        }
        self.summaries = {}
        
    def add(self, section: str, content: str):
        """Add content to a section."""
        if section not in self.sections:
            raise ValueError(f"Unknown section: {section!r}")
        self.sections[section].append(content)
            
    def _summarize_section(self, section: str, threshold: int) -> None:
        """Summarize a section once its raw text exceeds `threshold` characters."""
        content = "\n".join(self.sections[section])
        if len(content) > threshold:
            # Fold in the earlier summary so a second pass doesn't discard it
            previous = self.summaries.get(section)
            if previous:
                content = f"{previous}\n{content}"
            self.summaries[section] = summarize_text(content, max_tokens=300)
            self.sections[section] = []  # Raw entries are now covered by the summary
    
    def _render(self, section: str) -> str:
        """Summary (if any) followed by entries added after it was written."""
        parts = [self.summaries.get(section, "")] + self.sections[section]
        return "\n".join(p for p in parts if p)
    
    def compile_context(self) -> str:
        """Compile all sections into final context string."""
        # Summarize sections that have grown past their thresholds
        self._summarize_section("facts", threshold=2000)
        self._summarize_section("history", threshold=1000)
        
        parts = []
        
        # Goals: always full
        if self.sections["goals"]:
            parts.append("## Goals\n" + "\n".join(self.sections["goals"]))
        
        # Facts: summary plus any facts added since
        facts = self._render("facts")
        if facts:
            parts.append(f"## Key Facts\n{facts}")
        
        # History: summary plus any history added since
        history = self._render("history")
        if history:
            parts.append(f"## History Summary\n{history}")
        
        return "\n\n".join(parts)
    
    def clear_scratchpad(self):
        """Clear temporary notes."""
        self.sections["scratchpad"] = []
```
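`compile_context` hard-codes which sections are summarized and at what size. One way to make those rules configurable, sketched here with a hypothetical `SectionPolicy` table and `compile_sections` function, is to describe each section as data. Unlike the class, this version is stateless: it re-summarizes an oversized section on every call instead of caching the summary, and the summarizer is passed in as a callable.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class SectionPolicy:
    title: str
    threshold: Optional[int] = None  # in characters; None means never summarize
    include: bool = True             # False keeps the section out of the compiled context


# Mirrors the rules hard-coded in StructuredContextManager
POLICIES: Dict[str, SectionPolicy] = {
    "goals": SectionPolicy("Goals"),
    "facts": SectionPolicy("Key Facts", threshold=2000),
    "history": SectionPolicy("History Summary", threshold=1000),
    "scratchpad": SectionPolicy("Scratchpad", include=False),
}


def compile_sections(sections: Dict[str, List[str]],
                     summarize: Callable[[str], str],
                     policies: Dict[str, SectionPolicy] = POLICIES) -> str:
    """Render sections in policy order, summarizing any that exceed their threshold."""
    parts = []
    for name, policy in policies.items():
        text = "\n".join(sections.get(name, []))
        if not policy.include or not text:
            continue
        if policy.threshold is not None and len(text) > policy.threshold:
            text = summarize(text)
        parts.append(f"## {policy.title}\n{text}")
    return "\n\n".join(parts)


print(compile_sections(
    {"goals": ["Build a REST API"], "history": ["step " * 300], "scratchpad": ["tmp"]},
    summarize=lambda t: f"[summary of {len(t)} chars]",
))
```

Adding a section, or tightening a threshold, then becomes a one-line change to `POLICIES` rather than an edit to the compile logic.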

```python
# Demo: Structured context
ctx = StructuredContextManager()

# Add goals (never summarized)
ctx.add("goals", "Build a REST API for user management")
ctx.add("goals", "Use Python with FastAPI framework")

# Add facts (summarized when large)
ctx.add("facts", "Database: PostgreSQL on RDS")
ctx.add("facts", "Auth: JWT tokens with 1-hour expiry")
ctx.add("facts", "Deployment: ECS Fargate")

# Add history (aggressively summarized)
ctx.add("history", "Created project structure")
ctx.add("history", "Set up database models")
ctx.add("history", "Implemented user CRUD endpoints")
ctx.add("history", "Added authentication middleware")

print("📋 Compiled Context:")
print(ctx.compile_context())
```

```text
📋 Compiled Context:
## Goals
Build a REST API for user management
Use Python with FastAPI framework

## Key Facts
Database: PostgreSQL on RDS
Auth: JWT tokens with 1-hour expiry
Deployment: ECS Fargate

## History Summary
Created project structure
Set up database models
Implemented user CRUD endpoints
Added authentication middleware
```


## 5. Progressive Summarization Agent

An agent that carries a running summary, rather than the full transcript, from step to step, and rewrites that summary after each step.

```python
class ProgressiveSummarizationAgent:
    """Agent that maintains a running summary of its work."""
    
    def __init__(self, task: str):
        self.task = task
        self.running_summary = f"Task: {task}"
        self.step_count = 0
        
    def execute_step(self, step_description: str) -> str:
        """Execute a step and update running summary."""
        self.step_count += 1
        
        # Execute the step
        response = bedrock.converse(
            modelId=MODEL_ID,
            messages=[{"role": "user", "content": [{"text": 
                f"Context:\n{self.running_summary}\n\nExecute this step: {step_description}"
            }]}],
            inferenceConfig={"temperature": 0.7, "maxTokens": 800}
        )
        result = response['output']['message']['content'][0]['text']
        
        # Update running summary (compress old + add new)
        self._update_summary(step_description, result)
        
        return result
    
    def _update_summary(self, step: str, result: str):
        """Update running summary with new step."""
        update_prompt = f"""Current summary:
{self.running_summary}

New step completed: {step}
Result: {result[:500]}

Create an updated summary that:
1. Preserves the original task
2. Keeps key decisions and findings
3. Notes current progress
4. Stays under 300 words"""
        
        response = bedrock.converse(
            modelId=MODEL_ID,
            messages=[{"role": "user", "content": [{"text": update_prompt}]}],
            inferenceConfig={"temperature": 0.3, "maxTokens": 400}
        )
        self.running_summary = response['output']['message']['content'][0]['text']
        
    def get_summary(self) -> str:
        return f"Steps completed: {self.step_count}\n\n{self.running_summary}"
```
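`_update_summary` asks for a summary under 300 words but never checks the result, so nothing stops the running summary from growing over many steps. A minimal sketch of a check-and-retry guard, assuming a hypothetical `update_with_word_cap` helper that takes the model call as a callable; words are counted with a plain whitespace split, and `build_update_prompt` (also hypothetical) reproduces the notebook's prompt with its two limits made into parameters.

```python
from typing import Callable


def build_update_prompt(summary: str, step: str, result: str,
                        max_words: int = 300, result_chars: int = 500) -> str:
    """The notebook's update prompt with its two limits made into parameters."""
    return (
        f"Current summary:\n{summary}\n\n"
        f"New step completed: {step}\n"
        f"Result: {result[:result_chars]}\n\n"
        "Create an updated summary that:\n"
        "1. Preserves the original task\n"
        "2. Keeps key decisions and findings\n"
        "3. Notes current progress\n"
        f"4. Stays under {max_words} words"
    )


def update_with_word_cap(summary: str, step: str, result: str,
                         complete: Callable[[str], str],
                         max_words: int = 300, retries: int = 1) -> str:
    """Ask for an updated summary; if it overshoots `max_words`, ask the model to
    shorten it (up to `retries` times). May still exceed the cap if retries run out."""
    new_summary = complete(build_update_prompt(summary, step, result, max_words))
    for _ in range(retries):
        if len(new_summary.split()) <= max_words:
            break
        # Re-ask rather than truncating, so the cut follows meaning, not position
        new_summary = complete(
            f"Shorten this summary to under {max_words} words, keeping the task "
            f"and key decisions:\n\n{new_summary}"
        )
    return new_summary


# Offline check: a stand-in model that overshoots once, then complies
replies = iter(["word " * 400, "Task: caching design. Chose Redis."])
final = update_with_word_cap("Task: caching design", "Choose caching technology",
                             "Redis selected", lambda prompt: next(replies))
print(final)  # Task: caching design. Chose Redis.
```

In the agent, `complete` would wrap the `bedrock.converse` call already made in `_update_summary`. Each retry costs one extra call, so `retries` is kept small.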

```python
# Demo: Progressive summarization
agent = ProgressiveSummarizationAgent("Design a caching strategy for an e-commerce API")

steps = [
    "Identify what data should be cached (products, user sessions, cart)",
    "Choose caching technology (Redis vs Memcached)",
    "Define cache invalidation strategy",
    "Plan cache warming approach"
]

for step in steps:
    print(f"\n🔧 Step: {step}")
    result = agent.execute_step(step)
    print(f"✅ Result: {result[:200]}...")

print("\n" + "="*60)
print("📊 FINAL SUMMARY")
print("="*60)
print(agent.get_summary())
```


````text
🔧 Step: Identify what data should be cached (products, user sessions, cart)
✅ Result: # Caching Strategy: Data Identification for E-commerce API

## Primary Data Categories for Caching

### 1. **Product Data** 🛍️
**High Priority - Cache Aggressively**

- **Product catalogs and listings...

🔧 Step: Choose caching technology (Redis vs Memcached)
✅ Result: # Technology Selection: Redis vs Memcached Analysis

## Decision: **Redis Selected** ✅

### Comprehensive Comparison Analysis

| Feature | Redis | Memcached | E-commerce Requirement |
|---------|-----...

🔧 Step: Define cache invalidation strategy
✅ Result: # Cache Invalidation Strategy Design

## Cache Invalidation Patterns Implementation

### 1. Time-Based Expiration (TTL) - Primary Strategy
**Implementation across data types:**

```redis
# Product cat...

🔧 Step: Plan cache warming approach
✅ Result: # Cache Warming Strategy for E-commerce API

## Cache Warming Approach ✅

### 1. **Proactive Cache Warmin
````

## Key Takeaways

✅ **Summarize proactively** - Don't wait until context is full  
✅ **Preserve what matters** - Goals, decisions, key facts  
✅ **Use structure** - Different sections need different treatment  
✅ **Hierarchical approach** - For very long content, summarize in levels  
✅ **Running summaries** - Update as you go, not all at once  

**When to use:**
- Long conversations approaching context limits
- Multi-step agent tasks
- Processing large documents
- When you notice quality degradation
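"Summarize proactively" needs a concrete trigger. A minimal sketch, assuming a hypothetical `should_summarize` helper: it prefers the exact input-token count reported by the previous Converse call (`response['usage']['inputTokens']`) and otherwise falls back to a rough estimate of about 4 characters per token, a common rule of thumb for English text rather than a real tokenizer count. The 0.6 trigger ratio is an arbitrary assumption chosen to fire well before the hard limit.

```python
from typing import Any, Dict, List, Optional


def estimate_tokens(messages: List[Dict[str, Any]], chars_per_token: float = 4.0) -> int:
    """Rough token estimate from the character count of all text blocks."""
    chars = sum(len(block.get("text", "")) for m in messages for block in m["content"])
    return int(chars / chars_per_token)


def should_summarize(messages: List[Dict[str, Any]], context_budget: int,
                     trigger_ratio: float = 0.6,
                     last_input_tokens: Optional[int] = None) -> bool:
    """True once usage reaches `trigger_ratio` of the budget. An exact count from
    the previous call lags one turn behind, but avoids estimation error."""
    used = last_input_tokens if last_input_tokens is not None else estimate_tokens(messages)
    return used >= trigger_ratio * context_budget


msgs = [{"role": "user", "content": [{"text": "x" * 4000}]}]
print(estimate_tokens(msgs))                        # 1000
print(should_summarize(msgs, context_budget=1500))  # True  (1000 >= 900)
print(should_summarize(msgs, context_budget=2000))  # False (1000 < 1200)
```

A check like this could replace the fixed message count in `ConversationSummarizer._maybe_compress`, so long answers trigger compression sooner than short ones.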

**References:**
- [How to Fix Your Context](https://www.dbreunig.com/2025/06/26/how-to-fix-your-context.html)
- [LangChain Context Engineering](https://github.com/langchain-ai/context_engineering)