# Extended Context Window Demonstration

This notebook demonstrates the **Extended Context Window** feature for AWS Bedrock's Claude Sonnet 4 model, which supports up to **1 million tokens** in a single request.

## Key Features

- 🚀 **Simple Flag**: Use `enable_extended_context=True` for easy activation
- 🔧 **Manual Control**: Pass custom `additionalModelRequestFields` for fine-grained control
- 📊 **Token Usage**: Monitor input/output token consumption
- 🔄 **Automatic Fallback**: Graceful handling when extended context isn't supported
- 🎯 **Parameter Tracking**: Learn which model/region combinations support which parameters

## ⚠️ Important Notes

- **Beta Feature**: Extended context is a beta service as defined in AWS Service Terms
- **Model Support**: Currently only Claude Sonnet 4 supports 1M token context
- **Pricing**: Separate pricing applies for extended context usage
- **Quotas**: Separate service quotas apply
- **Region Availability**: Not all regions may support this beta feature

## Prerequisites

- AWS credentials configured with Bedrock access
- Access to Claude Sonnet 4 model in your AWS account
- Sufficient quota for extended context usage

## Setup and Imports

In [1]:
import sys
import json
from pathlib import Path
import logging

# Add the src directory to path for imports
sys.path.append(str(Path.cwd().parent / "src"))

# Import the LLMManager and related classes
from bestehorn_llmmanager import LLMManager, ParallelLLMManager
from bestehorn_llmmanager.bedrock.models.llm_manager_structures import (
    AuthConfig, RetryConfig, AuthenticationType, RetryStrategy
)
from bestehorn_llmmanager.bedrock.models.model_specific_structures import ModelSpecificConfig
from bestehorn_llmmanager.bedrock.models.parallel_structures import BedrockConverseRequest
from bestehorn_llmmanager.bedrock.tracking.parameter_compatibility_tracker import (
    ParameterCompatibilityTracker
)

# Configure logging for better visibility
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)

print("✅ Imports successful!")
print(f"📁 Working directory: {Path.cwd()}")

✅ Imports successful!
📁 Working directory: d:\Users\bestem\Code Workspace\LLMManager\notebooks


## Helper Functions

In [2]:
def display_response(response, title="Response"):
    """Display a formatted response from LLMManager."""
    print(f"\n{title}")
    print("=" * len(title))
    
    if response.success:
        print(f"✅ Success: {response.success}")
        print(f"🤖 Model: {response.model_used}")
        print(f"🌍 Region: {response.region_used}")
        print(f"⏱️  Total Duration: {response.total_duration_ms:.2f}ms")
        
        # Display content
        content = response.get_content()
        if content:
            print("\n💬 Response Content:")
            print("-" * 20)
            # Truncate long responses for readability
            if len(content) > 500:
                print(content[:500] + "...\n[truncated]")
            else:
                print(content)
        
        # Display usage statistics
        input_tokens = response.get_input_tokens()
        output_tokens = response.get_output_tokens()
        total_tokens = response.get_total_tokens()
        
        if total_tokens > 0:
            print("\n📊 Token Usage:")
            print(f"   Input Tokens: {input_tokens:,}")
            print(f"   Output Tokens: {output_tokens:,}")
            print(f"   Total Tokens: {total_tokens:,}")
        
        # Display parameter information
        if response.had_parameters_removed():
            print(f"\n⚠️  Parameters Removed: {response.parameters_removed}")
            warnings = response.get_parameter_warnings()
            if warnings:
                print(f"   Warnings:")
                for warning in warnings:
                    print(f"      - {warning}")
    else:
        print(f"❌ Success: {response.success}")
        print(f"🔄 Attempts: {len(response.attempts)}")
    
    if response.warnings:
        print("\n⚠️  Warnings:")
        for warning in response.warnings:
            print(f"   - {warning}")

def create_large_text(size_kb: int = 100) -> str:
    """Create a large text string for testing extended context."""
    # Approximate: 1 token ≈ 4 characters
    # 1 KB ≈ 250 tokens
    base_text = "This is a sample text for testing extended context windows. "
    repetitions = (size_kb * 1024) // len(base_text)
    return base_text * repetitions

print("✅ Helper functions defined!")

✅ Helper functions defined!
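The comments in `create_large_text` rely on a rough rule of 4 characters per token. As a minimal sketch, that heuristic can also be used to check whether a prompt is likely to fit in the 1M-token window before sending it. The names `estimate_tokens`, `fits_in_context`, and both constants are hypothetical and not part of LLMManager, and the sketch assumes that input tokens plus the reserved output tokens must fit in the window together. Real tokenization varies with the text, so treat the result as a pre-flight estimate only.

```python
# Hypothetical helpers for illustration; not part of LLMManager.
CHARS_PER_TOKEN = 4                  # rough heuristic from the helper's comments
EXTENDED_CONTEXT_LIMIT = 1_000_000   # extended context window, in tokens


def estimate_tokens(text: str, chars_per_token: int = CHARS_PER_TOKEN) -> int:
    """Estimate the token count of `text` from its character length."""
    return len(text) // chars_per_token


def fits_in_context(text: str, max_output_tokens: int = 512,
                    limit: int = EXTENDED_CONTEXT_LIMIT) -> bool:
    """Return True if estimated input plus reserved output fits within `limit`."""
    return estimate_tokens(text) + max_output_tokens <= limit


sample = "This is a sample text for testing extended context windows. " * 1000
print(estimate_tokens(sample))   # 60,000 characters // 4 = 15000
print(fits_in_context(sample))   # True
```

Because the estimate can be off in either direction, it is safer to leave some headroom below the limit than to rely on an exact fit.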


## Example 1: Simple Extended Context with Flag 🚀

The easiest way to enable extended context is using the `enable_extended_context=True` flag.
This automatically adds the required beta header for Claude Sonnet 4.

In [3]:
print("🚀 Example 1: Simple Extended Context with Flag")
print("=" * 50)

# Initialize manager with Claude Sonnet 4
manager = LLMManager(
    models=["Claude Sonnet 4"],
    regions=["us-east-1"],
    auth_config=AuthConfig(
        auth_type=AuthenticationType.PROFILE,
        profile_name="default"
    )
)

# Create a large text (simulating a large document)
# 2,500 KB is roughly 640,000 tokens by the 4-characters-per-token estimate,
# which exceeds the standard context window but stays within the ~1M-token limit
large_text = create_large_text(size_kb=2500)
print(f"📄 Created text of approximately {len(large_text):,} characters")
print(f"   Estimated tokens: ~{len(large_text) // 4:,}")

# Create message
messages = [
    {
        "role": "user",
        "content": [
            {
                "text": f"Please summarize the following text in 2-3 sentences:\n\n{large_text}"
            }
        ]
    }
]

try:
    # Use extended context with simple flag
    print("\n🔄 Sending request with enable_extended_context=True...")
    response = manager.converse(
        messages=messages,
        enable_extended_context=True,
        inference_config={
            "maxTokens": 512,
            "temperature": 0.3
        }
    )
    
    display_response(response, "🚀 Extended Context Response")
    
    # Confirm only if the request succeeded with the extended context flag intact
    if response.success and not response.had_parameters_removed():
        print("\n✅ Extended context feature was successfully enabled!")
        print("   The model can now handle up to 1 million tokens.")

except Exception as e:
    print(f"❌ Error: {e}")
    print(f"   Type: {type(e).__name__}")
    print("\n💡 Troubleshooting:")
    print("   - Ensure you have access to Claude Sonnet 4")
    print("   - Check that your region supports the extended context beta")
    print("   - Verify your AWS credentials are configured correctly")

🚀 Example 1: Simple Extended Context with Flag

📄 Created text of approximately 2,559,960 characters
   Estimated tokens: ~639,990

🔄 Sending request with enable_extended_context=True...

🚀 Extended Context Response
✅ Success: True
🤖 Model: Claude Sonnet 4
🌍 Region: us-east-1
⏱️  Total Duration: 27005.01ms

💬 Response Content:
--------------------
The provided text consists entirely of the same sentence repeated thousands of times: "This is a sample text for testing extended context windows." This appears to be a test document designed to evaluate how well AI systems can handle and process very long, repetitive content within their context window limitations. The repetitive nature suggests it's specifically created to test extended context processing capabilities rather than convey any meaningful information.

📊 Token Usage:
   Input Tokens: 469,349
   Output Tokens: 82
   Total Tokens: 469,431

   - Model 'Claude Sonnet 4' in region 'us-east-1' requires inference profile access. Using regional_cris profile.

✅ Extended context feature was successfully enabled!
   The model can now handle up to 1 million tokens.


## Example 2: Manual additionalModelRequestFields 🔧

For more control, you can specify `additionalModelRequestFields` manually through the `additional_model_request_fields` argument of `converse()`.
This allows you to combine multiple beta features or use other model-specific parameters.

In [4]:
print("🔧 Example 2: Manual additionalModelRequestFields")
print("=" * 50)

# Create a message
messages = [
    {
        "role": "user",
        "content": [
            {
                "text": "Explain the concept of extended context windows in large language models."
            }
        ]
    }
]

try:
    # Pass custom model-specific parameters directly
    print("\n🔄 Sending request with manual additionalModelRequestFields...")
    response = manager.converse(
        messages=messages,
        additional_model_request_fields={
            "anthropic_beta": [
                "context-1m-2025-08-07"  # Extended context beta header
            ]
        },
        inference_config={
            "maxTokens": 500,
            "temperature": 0.5
        }
    )
    
    display_response(response, "🔧 Manual Configuration Response")
    
    print("\n✅ Successfully used manual additionalModelRequestFields!")
    print("   This approach gives you full control over model-specific parameters.")

except Exception as e:
    print(f"❌ Error: {e}")
    print(f"   Type: {type(e).__name__}")

🔧 Example 2: Manual additionalModelRequestFields

🔄 Sending request with manual additionalModelRequestFields...

🔧 Manual Configuration Response
✅ Success: True
🤖 Model: Claude Sonnet 4
🌍 Region: us-east-1
⏱️  Total Duration: 11355.49ms

💬 Response Content:
--------------------
# Extended Context Windows in Large Language Models

## What is a Context Window?

A **context window** is the maximum amount of text (measured in tokens) that a language model can process and remember at one time. Think of it as the model's "working memory" - it determines how much previous conversation, document content, or input the model can reference when generating responses.

## Traditional Limitations

Early language models had relatively small context windows:
- **GPT-3**: ~4,000 tokens...
[truncated]

📊 Token Usage:
   Input Tokens: 20
   Output Tokens: 500
   Total Tokens: 520

✅ Successfully used manual additionalModelRequestFields!
   This approach gives you full control over model-specific parameters.
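
For orientation, the field name in this example matches the `additionalModelRequestFields` parameter of the Bedrock Converse API. The sketch below builds what an equivalent raw request might look like with boto3's `bedrock-runtime` client. The model ID is the one used in Example 6 of this notebook; `build_converse_kwargs` and `send` are hypothetical helper names, and how LLMManager assembles its own requests internally is not shown here.

```python
from typing import Any, Dict, List

# Beta flag and model ID as they appear elsewhere in this notebook.
EXTENDED_CONTEXT_BETA = "context-1m-2025-08-07"
MODEL_ID = "us.anthropic.claude-sonnet-4-20250514-v1:0"


def build_converse_kwargs(prompt: str, max_tokens: int = 500) -> Dict[str, Any]:
    """Build keyword arguments for a bedrock-runtime `converse` call."""
    messages: List[Dict[str, Any]] = [
        {"role": "user", "content": [{"text": prompt}]}
    ]
    return {
        "modelId": MODEL_ID,
        "messages": messages,
        "inferenceConfig": {"maxTokens": max_tokens, "temperature": 0.5},
        "additionalModelRequestFields": {
            "anthropic_beta": [EXTENDED_CONTEXT_BETA]
        },
    }


def send(prompt: str, region: str = "us-east-1") -> Dict[str, Any]:
    """Send the request with boto3 (needs AWS credentials and model access)."""
    import boto3  # imported lazily so the builder works without boto3 installed

    client = boto3.client("bedrock-runtime", region_name=region)
    return client.converse(**build_converse_kwargs(prompt))


kwargs = build_converse_kwargs("Explain extended context windows.")
print(kwargs["additionalModelRequestFields"])
```

Keeping the request builder separate from the network call makes the payload easy to inspect or log before anything is sent.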

## Example 3: Token Usage Reporting 📊

Monitor token consumption to understand the cost and performance of extended context requests.

In [None]:
print("📊 Example 3: Token Usage Reporting")
print("=" * 40)

# Create texts of different sizes
test_sizes = [
    (100, "Small"),
    (500, "Medium"),
    (1000, "Large")
]

results = []

for size_kb, label in test_sizes:
    text = create_large_text(size_kb=size_kb)
    estimated_tokens = len(text) // 4
    
    messages = [
        {
            "role": "user",
            "content": [
                {
                    "text": f"Provide a one-sentence summary of this text:\n\n{text}"
                }
            ]
        }
    ]
    
    try:
        print(f"\n🔄 Testing {label} text (~{estimated_tokens:,} tokens)...")
        response = manager.converse(
            messages=messages,
            enable_extended_context=True,
            inference_config={"maxTokens": 100}
        )
        
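        if response.success:
            results.append({
                "label": label,
                "estimated": estimated_tokens,
                "actual_input": response.get_input_tokens(),
                "output": response.get_output_tokens(),
                "total": response.get_total_tokens()
            })
            print(f"   ✅ Input: {response.get_input_tokens():,} tokens")
            print(f"   ✅ Output: {response.get_output_tokens():,} tokens")
        else:
            # Record unsuccessful responses so they still appear in the summary
            print("   ❌ Request did not succeed")
            results.append({
                "label": label,
                "estimated": estimated_tokens,
                "error": "request did not succeed"
            })
    
    except Exception as e:
        print(f"   ❌ Error: {e}")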
        results.append({
            "label": label,
            "estimated": estimated_tokens,
            "error": str(e)
        })

# Display summary
print("\n📊 Token Usage Summary")
print("=" * 60)
print(f"{'Size':<10} {'Estimated':<12} {'Actual Input':<15} {'Output':<10} {'Total':<10}")
print("-" * 60)
for result in results:
    if 'error' not in result:
        print(f"{result['label']:<10} {result['estimated']:<12,} "
              f"{result['actual_input']:<15,} {result['output']:<10,} "
              f"{result['total']:<10,}")
    else:
        print(f"{result['label']:<10} {result['estimated']:<12,} ERROR")

print("\n💡 Note: Extended context allows up to 1M tokens, but costs scale with usage.")
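
The summary table compares estimated and actual input tokens, and the two can differ noticeably. As a worked example using the numbers reported in Example 1 (2,559,960 characters, 639,990 estimated tokens, 469,349 actual input tokens), the sketch below computes the observed characters-per-token ratio and the relative error of the 4-characters-per-token estimate. The function names are hypothetical, and the ratio applies only to this repetitive sample text; other content will tokenize differently.

```python
def chars_per_token(char_count: int, input_tokens: int) -> float:
    """Observed average number of characters per input token."""
    if input_tokens <= 0:
        raise ValueError("input_tokens must be positive")
    return char_count / input_tokens


def estimation_error(estimated: int, actual: int) -> float:
    """Relative error of an estimate; positive means an overestimate."""
    if actual <= 0:
        raise ValueError("actual must be positive")
    return (estimated - actual) / actual


# Figures taken from the Example 1 output in this notebook.
ratio = chars_per_token(2_559_960, 469_349)
error = estimation_error(639_990, 469_349)

print(f"{ratio:.2f} characters per token")
print(f"{error:+.1%} estimation error")
```

For this sample the heuristic overestimates the input by roughly a third, which is why the actual token counts, not the estimates, should drive cost tracking.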

## Example 4: Handling Parameter Incompatibility 🔄

The system automatically handles cases where extended context isn't supported,
falling back gracefully to standard context.

In [None]:
print("🔄 Example 4: Handling Parameter Incompatibility")
print("=" * 50)

# Initialize manager with multiple models (some may not support extended context)
multi_model_manager = LLMManager(
    models=["Claude Sonnet 4", "Claude 3 Haiku"],
    regions=["us-east-1", "us-west-2"],
    auth_config=AuthConfig(
        auth_type=AuthenticationType.PROFILE,
        profile_name="default"
    ),
    retry_config=RetryConfig(
        max_retries=3,
        retry_strategy=RetryStrategy.MODEL_FIRST
    )
)

messages = [
    {
        "role": "user",
        "content": [
            {
                "text": "What are the benefits of extended context windows in AI models?"
            }
        ]
    }
]

try:
    print("\n🔄 Sending request with enable_extended_context=True...")
    print("   The system will try Claude Sonnet 4 first, then fall back if needed.")
    
    response = multi_model_manager.converse(
        messages=messages,
        enable_extended_context=True,
        inference_config={"maxTokens": 400}
    )
    
    display_response(response, "🔄 Fallback Handling Response")
    
    # Check if parameters were removed
    if response.had_parameters_removed():
        print("\n⚠️  Parameter Removal Detected:")
        print(f"   Removed parameters: {response.parameters_removed}")
        print(f"   Model used: {response.model_used}")
        print(f"   Region used: {response.region_used}")
        print("\n💡 The system automatically retried without extended context.")
    else:
        print("\n✅ Extended context was supported by the selected model/region!")
        print(f"   Model: {response.model_used}")
        print(f"   Region: {response.region_used}")

except Exception as e:
    print(f"❌ Error: {e}")
    print(f"   Type: {type(e).__name__}")
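
The fallback idea described above can be illustrated with a small, self-contained sketch: try the request with the extra fields, and if the backend rejects them, retry once without them and report that the parameters were removed. This is not LLMManager's implementation. `converse_with_fallback`, `fake_send`, and the use of `ValueError` as the "unsupported parameter" signal are all assumptions made for the example.

```python
from typing import Any, Callable, Dict, Optional, Tuple


def converse_with_fallback(
    send: Callable[[Optional[Dict[str, Any]]], Any],
    extra_fields: Dict[str, Any],
) -> Tuple[Any, bool]:
    """Try a request with `extra_fields`; on rejection, retry once without them.

    Returns a tuple of (result, parameters_removed).
    """
    try:
        return send(extra_fields), False
    except ValueError:
        # Assumption: ValueError stands in for "parameter not supported".
        return send(None), True


def fake_send(fields: Optional[Dict[str, Any]]) -> str:
    """Stand-in for a model that rejects any additional request fields."""
    if fields:
        raise ValueError("unsupported parameter: anthropic_beta")
    return "ok (standard context)"


result, removed = converse_with_fallback(
    fake_send, {"anthropic_beta": ["context-1m-2025-08-07"]}
)
print(result, removed)
```

Returning the `removed` flag alongside the result mirrors the role of `had_parameters_removed()` in the example above: the caller can tell whether the answer came from a request with or without extended context.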

## Example 5: ModelSpecificConfig for Reusable Configuration 🎯

Use `ModelSpecificConfig` to create reusable parameter configurations.

In [None]:
print("🎯 Example 5: ModelSpecificConfig")
print("=" * 35)

# Create reusable configuration
config = ModelSpecificConfig(
    enable_extended_context=True,
    custom_fields={
        # You can add other model-specific parameters here
    }
)

print("📋 Created ModelSpecificConfig:")
print(f"   enable_extended_context: {config.enable_extended_context}")
print(f"   custom_fields: {config.custom_fields}")

# Initialize manager with default config
config_manager = LLMManager(
    models=["Claude Sonnet 4"],
    regions=["us-east-1"],
    auth_config=AuthConfig(
        auth_type=AuthenticationType.PROFILE,
        profile_name="default"
    ),
    model_specific_config=config  # Set as default
)

messages = [
    {
        "role": "user",
        "content": [
            {
                "text": "Explain how configuration objects improve code maintainability."
            }
        ]
    }
]

try:
    print("\n🔄 Sending request (config applied automatically)...")
    # All requests use the config by default
    response = config_manager.converse(
        messages=messages,
        inference_config={"maxTokens": 300}
    )
    
    display_response(response, "🎯 Config-Based Response")
    
    print("\n✅ ModelSpecificConfig was applied automatically!")
    print("   This approach is great for consistent parameter usage across requests.")

except Exception as e:
    print(f"❌ Error: {e}")
    print(f"   Type: {type(e).__name__}")
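
When a default configuration and per-request fields both carry model-specific parameters, they have to be combined somehow. This notebook does not show how `ModelSpecificConfig` does that, so the following is only a hypothetical sketch of one reasonable policy: per-request values override defaults, except that `anthropic_beta` lists are combined without duplicates. The function `merge_request_fields` and the flag `example-beta-flag` are made up for illustration.

```python
from typing import Any, Dict


def merge_request_fields(
    defaults: Dict[str, Any], overrides: Dict[str, Any]
) -> Dict[str, Any]:
    """Merge per-request fields over defaults, combining `anthropic_beta` lists."""
    merged = {**defaults, **overrides}

    # Keep default beta flags and append any new ones, preserving order.
    betas = list(defaults.get("anthropic_beta", []))
    for flag in overrides.get("anthropic_beta", []):
        if flag not in betas:
            betas.append(flag)
    if betas:
        merged["anthropic_beta"] = betas
    return merged


defaults = {"anthropic_beta": ["context-1m-2025-08-07"]}
# "example-beta-flag" is a placeholder, not a real beta feature.
overrides = {"anthropic_beta": ["example-beta-flag"], "top_k": 50}

print(merge_request_fields(defaults, overrides))
```

Combining the lists rather than replacing them means a per-request beta flag does not silently switch off extended context that was enabled by default.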

## Example 6: Querying Parameter Compatibility 🔍

The system tracks which model/region combinations support which parameters.
You can query this information for optimization and debugging.

In [None]:
print("🔍 Example 6: Querying Parameter Compatibility")
print("=" * 45)

# Get compatibility tracker
tracker = ParameterCompatibilityTracker.get_instance()

# Get statistics
stats = tracker.get_statistics()

print("\n📊 Parameter Compatibility Statistics:")
print(f"   Total tracked combinations: {stats.get('total_combinations', 0)}")
print(f"   Compatible combinations: {stats.get('compatible_count', 0)}")
print(f"   Incompatible combinations: {stats.get('incompatible_count', 0)}")

# Check specific combination
test_params = {"anthropic_beta": ["context-1m-2025-08-07"]}
is_incompatible = tracker.is_known_incompatible(
    model_id="us.anthropic.claude-sonnet-4-20250514-v1:0",
    region="us-east-1",
    parameters=test_params
)

print("\n🔍 Checking specific combination:")
print("   Model: Claude Sonnet 4")
print("   Region: us-east-1")
print(f"   Parameters: {test_params}")
print(f"   Known incompatible: {'❌ Yes' if is_incompatible else '✅ No'}")

print("\n💡 The tracker learns from each request to optimize future attempts.")
print("   This helps skip known incompatible combinations during retry.")
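
To make the tracking idea concrete, here is a toy in-memory version: it remembers which (model, region, parameters) combinations were rejected so that a retry loop could skip them. It is a hypothetical sketch, not the library's `ParameterCompatibilityTracker`. Only the method name `is_known_incompatible` is borrowed from the example above; `record_incompatible`, the class name, and the model and region strings are invented for illustration.

```python
import json
from typing import Any, Dict, Set, Tuple

Key = Tuple[str, str, str]


class SimpleCompatibilityTracker:
    """Toy in-memory tracker of rejected model/region/parameter combinations."""

    def __init__(self) -> None:
        self._incompatible: Set[Key] = set()

    @staticmethod
    def _key(model_id: str, region: str, parameters: Dict[str, Any]) -> Key:
        # Serialize parameters deterministically so equal dicts give equal keys.
        return (model_id, region, json.dumps(parameters, sort_keys=True))

    def record_incompatible(
        self, model_id: str, region: str, parameters: Dict[str, Any]
    ) -> None:
        """Remember that this combination was rejected."""
        self._incompatible.add(self._key(model_id, region, parameters))

    def is_known_incompatible(
        self, model_id: str, region: str, parameters: Dict[str, Any]
    ) -> bool:
        """Return True only if this exact combination was recorded as rejected."""
        return self._key(model_id, region, parameters) in self._incompatible


toy_tracker = SimpleCompatibilityTracker()
params = {"anthropic_beta": ["context-1m-2025-08-07"]}

# Placeholder identifiers, not real Bedrock model IDs or regions.
toy_tracker.record_incompatible("example-model-id", "example-region", params)

print(toy_tracker.is_known_incompatible("example-model-id", "example-region", params))
print(toy_tracker.is_known_incompatible("example-model-id", "other-region", params))
```

Note that "not known incompatible" only means no failure has been recorded yet; it does not prove the combination is supported.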

## Summary and Best Practices 📝

### Key Takeaways

1. **Simple Flag**: Use `enable_extended_context=True` for easy activation
2. **Manual Control**: Use `additionalModelRequestFields` for fine-grained control
3. **Automatic Fallback**: System handles incompatibility gracefully
4. **Token Monitoring**: Always check token usage for cost management
5. **Configuration Objects**: Use `ModelSpecificConfig` for reusable settings

### Best Practices

- ✅ **Monitor token usage** to understand costs
- ✅ **Use multi-model configuration** for automatic fallback
- ✅ **Check response warnings** to detect parameter removal
- ✅ **Test with smaller contexts** before scaling to 1M tokens
- ✅ **Verify region support** for beta features

### Common Pitfalls

- ❌ Assuming all regions support extended context
- ❌ Not monitoring token consumption
- ❌ Ignoring parameter removal warnings
- ❌ Using extended context when standard context is sufficient

### Additional Resources

- [AWS Bedrock Documentation](https://docs.aws.amazon.com/bedrock/)
- [Claude Model Documentation](https://docs.anthropic.com/)
- [LLMManager Documentation](../docs/forLLMConsumption.md)