# Optimizing Reasoning Budget Allocation

In this notebook, we'll explore how to optimize Claude 3.7 Sonnet's reasoning budget allocation to balance quality, cost, and performance. Building on what we learned in the previous two lessons, we'll develop practical strategies to:

1. **Dynamically allocate reasoning budgets** based on task requirements
2. **Visualize the tradeoffs** between budget size, response quality, and cost  
3. **Create reusable patterns** for different kinds of applications

By the end of this lesson, you'll have practical approaches for determining the right amount of "thinking power" to allocate for different tasks.

## Prerequisites

This notebook builds on concepts covered in Lessons 1 and 2:
- Understanding of Claude 3.7 Sonnet's extended thinking capability
- Familiarity with the Bedrock API for invoking Claude
- Task complexity classification framework

Let's begin by setting up our environment and importing the necessary libraries.

In [None]:
# Import required libraries
import boto3
import json
import time
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from IPython.display import display, Markdown, HTML

# Import our utility functions from previous lessons
import claude_utils

# Set plotting style
plt.style.use('seaborn-v0_8-whitegrid')
sns.set_context("notebook", font_scale=1.2)

# Configure pandas display options
pd.set_option('display.max_colwidth', None)

#### Initialize Bedrock Clients:

In [None]:
# Set up the Bedrock clients using our utility module
REGION = 'us-west-2'  # Change to your preferred region
bedrock, bedrock_runtime = claude_utils.create_bedrock_clients(REGION)

# Claude 3.7 Sonnet model ID
CLAUDE_37_SONNET_MODEL_ID = 'us.anthropic.claude-3-7-sonnet-20250219-v1:0'

# Verify model availability
claude_utils.verify_model_availability(bedrock, CLAUDE_37_SONNET_MODEL_ID)

## Recap: The Reasoning Budget Concept

Before diving into optimization strategies, let's quickly recap what we've learned about Claude's reasoning budget:

- The **reasoning budget** is the maximum number of tokens Claude may spend on its extended thinking process
- Minimum reasoning budget is **1,024 tokens**
- Larger budgets allow for more thorough reasoning on complex problems
- Extended thinking incurs costs as part of the output tokens ($15 per million tokens)

Think of the reasoning budget like allocating CPU time to a computational task - more complex tasks benefit from larger allocations, but there are diminishing returns beyond a certain point.

In this lesson, we'll focus on finding the "sweet spot" for different types of tasks - optimizing for both performance and cost-effectiveness.
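
To make the cost point concrete, here is a minimal sketch of the arithmetic, using the prices quoted in this notebook ($3 per million input tokens, $15 per million output tokens). The helper names `estimate_cost` and `max_thinking_cost` are illustrative and not part of `claude_utils`. Because thinking tokens are billed as output tokens, a fully used budget gives an approximate upper bound on the reasoning cost.

```python
# Prices as quoted in this notebook; check current pricing before relying on them.
INPUT_PRICE_PER_TOKEN = 3 / 1_000_000
OUTPUT_PRICE_PER_TOKEN = 15 / 1_000_000


def estimate_cost(input_tokens, output_tokens):
    """Approximate request cost in dollars (hypothetical helper)."""
    return input_tokens * INPUT_PRICE_PER_TOKEN + output_tokens * OUTPUT_PRICE_PER_TOKEN


def max_thinking_cost(reasoning_budget):
    """Approximate upper bound on thinking cost if the whole budget is used."""
    return reasoning_budget * OUTPUT_PRICE_PER_TOKEN


for budget in [1024, 4096, 16384]:
    print(f"{budget:>6} budget tokens -> at most ${max_thinking_cost(budget):.5f} of thinking")
```

Doubling the budget doubles this ceiling, which is why the rest of the lesson looks for the smallest budget that still produces a good answer.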

## Understanding the Tradeoffs of Budget Allocation

Before we implement a dynamic budget allocation system, let's understand the key tradeoffs involved:

1. **Budget Size vs. Response Quality**: Larger budgets generally lead to more thorough reasoning and potentially better responses, but with diminishing returns
   
2. **Budget Size vs. Response Time**: Larger budgets increase processing time as Claude spends more time thinking
   
3. **Budget Size vs. Cost**: Larger budgets increase token usage and therefore cost

These tradeoffs create an optimization challenge - finding the right budget that balances quality, speed, and cost for a specific task.

Let's explore these tradeoffs by testing a range of reasoning budgets on a moderately complex problem and measuring the results.

#### Testing Different Budget Sizes

In [None]:
def test_reasoning_budgets(prompt, budget_sizes=(1024, 2048, 4096, 8192, 16384), max_tokens=2000):
    """
    Test different reasoning budget sizes on the same prompt and collect performance metrics
    
    Args:
        prompt (str): The prompt to test
        budget_sizes (list): List of reasoning budget sizes to test
        max_tokens (int): Maximum tokens for responses
        
    Returns:
        pd.DataFrame: Performance metrics for each budget size
    """
    results = []
    
    print(f"Testing prompt: {prompt[:100]}..." if len(prompt) > 100 else f"Testing prompt: {prompt}")
    print(f"Testing {len(budget_sizes)} different reasoning budget sizes...\n")
    
    for budget in budget_sizes:
        print(f"Testing reasoning budget: {budget} tokens")
        
        start_time = time.time()
        response = claude_utils.invoke_claude(
            bedrock_runtime,
            prompt, 
            CLAUDE_37_SONNET_MODEL_ID, 
            enable_reasoning=True,
            reasoning_budget=budget,
            max_tokens=max_tokens
        )
        elapsed_time = time.time() - start_time
        
        # Extract metrics
        input_tokens = response.get('usage', {}).get('inputTokens', 0)
        output_tokens = response.get('usage', {}).get('outputTokens', 0)
        total_tokens = response.get('usage', {}).get('totalTokens', 0)
        
        # Calculate costs (approximate)
        input_cost = input_tokens * 0.000003  # $3 per million tokens
        output_cost = output_tokens * 0.000015  # $15 per million tokens
        total_cost = input_cost + output_cost
        
        # Calculate throughput in tokens per second (a rough efficiency proxy; it does not measure answer quality)
        efficiency = total_tokens / elapsed_time if elapsed_time > 0 else 0
        
        # Store results
        results.append({
            'budget': budget,
            'elapsed_time': elapsed_time,
            'input_tokens': input_tokens,
            'output_tokens': output_tokens,
            'total_tokens': total_tokens,
            'input_cost': input_cost,
            'output_cost': output_cost,
            'total_cost': total_cost,
            'efficiency': efficiency,
            'response': claude_utils.extract_response_content(response)
        })
        
        print(f"Completed in {elapsed_time:.2f}s, {total_tokens} tokens, ${total_cost:.6f}\n")
    
    return pd.DataFrame(results)

# Test with a moderately complex problem
test_prompt = """
A retailer sells two products, Product A and Product B. Product A costs $20 to make and sells for $50. 
Product B costs $10 to make and sells for $25. The retailer has $2000 available for production costs, 
and warehouse space for at most 250 units in total.

Due to customer demand, they need to produce at least twice as many units of Product B as Product A.

What is the optimal production plan to maximize profit?
"""

# Run the test
budget_test_results = test_reasoning_budgets(test_prompt)

# Display a subset of the results (excluding the actual responses to keep the output manageable)
display_cols = ['budget', 'elapsed_time', 'total_tokens', 'total_cost', 'efficiency']
display(budget_test_results[display_cols])

#### Visualizing the Tradeoffs:

In [None]:
def visualize_budget_tradeoffs(df):
    """
    Create visualizations to show the tradeoffs between different budget sizes
    
    Args:
        df (pd.DataFrame): DataFrame with test results
    """
    fig, axes = plt.subplots(2, 2, figsize=(16, 12))
    
    # 1. Response Time vs. Budget Size
    axes[0, 0].plot(df['budget'], df['elapsed_time'], marker='o', linestyle='-', linewidth=2)
    axes[0, 0].set_title('Response Time vs. Budget Size', fontsize=14)
    axes[0, 0].set_xlabel('Reasoning Budget (tokens)', fontsize=12)
    axes[0, 0].set_ylabel('Response Time (seconds)', fontsize=12)
    axes[0, 0].grid(True, alpha=0.3)
    
    # 2. Token Usage vs. Budget Size
    axes[0, 1].plot(df['budget'], df['total_tokens'], marker='o', linestyle='-', linewidth=2, color='green')
    axes[0, 1].set_title('Token Usage vs. Budget Size', fontsize=14)
    axes[0, 1].set_xlabel('Reasoning Budget (tokens)', fontsize=12)
    axes[0, 1].set_ylabel('Total Tokens Used', fontsize=12)
    axes[0, 1].grid(True, alpha=0.3)
    
    # 3. Cost vs. Budget Size
    axes[1, 0].plot(df['budget'], df['total_cost'], marker='o', linestyle='-', linewidth=2, color='red')
    axes[1, 0].set_title('Cost vs. Budget Size', fontsize=14)
    axes[1, 0].set_xlabel('Reasoning Budget (tokens)', fontsize=12)
    axes[1, 0].set_ylabel('Total Cost ($)', fontsize=12)
    axes[1, 0].grid(True, alpha=0.3)
    
    # 4. Efficiency vs. Budget Size
    axes[1, 1].plot(df['budget'], df['efficiency'], marker='o', linestyle='-', linewidth=2, color='purple')
    axes[1, 1].set_title('Efficiency vs. Budget Size', fontsize=14)
    axes[1, 1].set_xlabel('Reasoning Budget (tokens)', fontsize=12)
    axes[1, 1].set_ylabel('Efficiency (Tokens/Second)', fontsize=12)
    axes[1, 1].grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.show()
    
    # Report the budget size with the highest throughput (tokens per second)
    # idxmax() returns an index label, so look the row up with .loc rather than .iloc
    best_row = df.loc[df['efficiency'].idxmax()]
    
    print(f"The highest-throughput budget size is {best_row['budget']} tokens")
    print(f"This produced {best_row['total_tokens']} tokens")
    print(f"In {best_row['elapsed_time']:.2f} seconds")
    print(f"At a cost of ${best_row['total_cost']:.6f}")

# Visualize the results
visualize_budget_tradeoffs(budget_test_results)

## Analyzing Response Quality

While the quantitative metrics (time, tokens, cost) are important, we also need to consider the **quality** of responses at different budget sizes. Let's examine the responses at the minimum budget (1,024 tokens) compared to the largest budget we tested.

This qualitative analysis helps us understand the real-world value we get from allocating larger reasoning budgets.
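
Because the test prompt is a small linear program, one way to ground the comparison is to compute a reference answer independently of the model. The sketch below brute-forces integer production plans under the constraints stated in the prompt. The function name `best_production_plans` is our own and is not part of `claude_utils`.

```python
def best_production_plans(budget=2000, capacity=250):
    """Enumerate integer plans for the retailer problem and return the best ones."""
    best_profit, best_plans = -1, []
    for a in range(budget // 20 + 1):          # units of Product A
        for b in range(budget // 10 + 1):      # units of Product B
            feasible = (
                20 * a + 10 * b <= budget      # production cost limit
                and a + b <= capacity          # warehouse space
                and b >= 2 * a                 # at least twice as many B as A
            )
            if not feasible:
                continue
            profit = (50 - 20) * a + (25 - 10) * b
            if profit > best_profit:
                best_profit, best_plans = profit, [(a, b)]
            elif profit == best_profit:
                best_plans.append((a, b))
    return best_profit, best_plans


profit, plans = best_production_plans()
print(f"Maximum profit: ${profit}")
print(f"Number of optimal integer plans: {len(plans)}")
print(f"Examples (A, B): {plans[0]} and {plans[-1]}")
```

Both products earn 1.5 times their production cost in profit, so every feasible plan that spends the full $2000 reaches the same maximum. A response can therefore be correct with different unit counts, as long as it is feasible and reports the maximum profit.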

#### Comparing Response Quality:

In [None]:
# Display the response with the smallest budget
min_budget = min(budget_test_results['budget'])
min_budget_response = budget_test_results[budget_test_results['budget'] == min_budget]['response'].iloc[0]

# Display the response with the largest budget
max_budget = max(budget_test_results['budget'])
max_budget_response = budget_test_results[budget_test_results['budget'] == max_budget]['response'].iloc[0]

# Create a comparison display
print(f"Response with {min_budget} tokens budget:")
display(Markdown(min_budget_response))

print(f"\nResponse with {max_budget} tokens budget:")
display(Markdown(max_budget_response))

## Implementing Dynamic Budget Allocation

Now that we understand the tradeoffs involved, let's implement a dynamic budget allocation strategy that adjusts the reasoning budget based on task complexity.

Our approach will:
1. Classify the task complexity
2. Allocate a reasoning budget based on the complexity
3. Apply budget adjustments based on time sensitivity and cost constraints
4. Maintain a record of performance for continuous improvement

This creates a system that can adapt to different tasks while balancing performance and cost considerations.

## Using Task Complexity to Guide Budget Allocation

The first step in our dynamic budget allocation strategy is determining the complexity of the task. Instead of using a fixed budget for all tasks, we'll adjust the reasoning budget based on how complex the task is.

### The Task Complexity Classifier

We'll implement a `classify_task_complexity` function that:

1. Uses a smaller, more efficient model (Claude 3.5 Haiku) to quickly categorize the task
2. Classifies tasks into four complexity levels:
   - **Simple**: Basic factual questions, straightforward calculations
   - **Medium**: Multi-step reasoning, moderate analysis
   - **Complex**: Detailed analysis, constraint problems, system design
   - **Very Complex**: Advanced mathematical proofs, multi-stage system design

This classification serves as the foundation for our budget allocation strategy. Using a smaller model for this classification step is both faster and more cost-effective than using Claude 3.7 Sonnet for the entire process.

Think of this as the "triage" step in our workflow - similar to how a hospital quickly determines the urgency of each patient to allocate resources appropriately.
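
Small models do not always return the bare label (they may add quotes, punctuation, or a short preamble), so it can help to normalize the classifier's text before validating it. A minimal sketch, assuming a hypothetical helper named `normalize_complexity_label` and the same fallback to `medium` used in this lesson:

```python
import re

VALID_CATEGORIES = ("simple", "medium", "complex", "very_complex")


def normalize_complexity_label(raw_text, default="medium"):
    """Map free-form classifier output to one of the four category names."""
    if not raw_text:
        return default
    # Lowercase, then collapse every run of non-letters into a single underscore
    cleaned = re.sub(r"[^a-z]+", "_", raw_text.strip().lower()).strip("_")
    if cleaned in VALID_CATEGORIES:
        return cleaned
    # Check the longest names first so "very_complex" is not read as "complex"
    for category in sorted(VALID_CATEGORIES, key=len, reverse=True):
        if category in cleaned:
            return category
    return default


for sample in ["'Complex'", "Very Complex.", "Category: very-complex", "I am not sure"]:
    print(f"{sample!r} -> {normalize_complexity_label(sample)}")
```

The substring fallback is deliberately permissive; a stricter version could return the default whenever the output is not an exact match.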

#### Task Complexity Classification

In [None]:
def classify_task_complexity(prompt, model_id='anthropic.claude-3-5-haiku-20241022-v1:0'):
    """
    Use a more efficient model to classify the complexity of a task
    
    Args:
        prompt (str): The prompt to classify
        model_id (str): Model ID to use for classification
        
    Returns:
        str: Complexity category ('simple', 'medium', 'complex', 'very_complex')
    """
    system_prompt = [
        {"text": """You are a task complexity classifier. Classify the complexity of the given task 
                   into one of these categories: 'simple', 'medium', 'complex', or 'very_complex'.
                   
                   Examples:
                   - simple: Basic factual questions, straightforward calculations
                   - medium: Multi-step reasoning, moderate analysis
                   - complex: Detailed analysis, constraint problems, system design
                   - very_complex: Advanced mathematical proofs, multi-stage system design
                   
                   Respond with only the category name, nothing else."""}
    ]
    
    messages = [
        {
            "role": "user",
            "content": [{"text": f"Classify the task complexity: {prompt}"}]
        }
    ]
    
    try:
        response = bedrock_runtime.converse(
            modelId=model_id,
            messages=messages,
            system=system_prompt,
            inferenceConfig={
                "temperature": 0,
                "maxTokens": 10  # Only need a short response
            }
        )
        
        # Extract the classification
        result = None
        if response.get('output', {}).get('message', {}).get('content'):
            content_blocks = response['output']['message']['content']
            for block in content_blocks:
                if 'text' in block:
                    result = block['text'].strip().lower()
                    break
        
        # Ensure valid category
        valid_categories = ['simple', 'medium', 'complex', 'very_complex']
        if result not in valid_categories:
            result = 'medium'  # Default if unclear
        
        return result
        
    except Exception as e:
        print(f"Error classifying complexity: {e}")
        return "medium"  # Default if error

## The Dynamic Budget Allocator

Now that we can classify task complexity, we need a systematic way to determine the appropriate reasoning budget. Our `DynamicBudgetAllocator` class provides this functionality.

### Design Philosophy

The Dynamic Budget Allocator follows these principles:

1. **Complexity-driven**: The primary factor determining budget size is task complexity
2. **Context-aware**: Adjusts allocations based on situation (time-sensitive, cost-constrained)
3. **Adaptive**: Records performance for each allocation so the defaults can be tuned over time
4. **Practical**: Balances theoretical ideals with real-world constraints

### How It Works

The allocator maintains a default budget for each complexity level and applies adjustments based on constraints:

- For **time-sensitive** situations, it reduces budgets to prioritize faster responses
- For **cost-constrained** scenarios, it limits budgets to control expenses
- For **simple tasks**, it avoids extended thinking altogether (using standard mode)

This approach gives us the flexibility to handle different scenarios while maintaining a consistent allocation logic.
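
The adjustment rules above can be sketched as a pure function that takes an already-classified complexity, which makes the logic easy to check without any API calls. The function name `allocate` is hypothetical. The budget values mirror this lesson's defaults, and we assume time-sensitive medium tasks fall back to standard mode, in line with the guidance for time-sensitive applications.

```python
MIN_BUDGET = 1024  # smallest budget accepted for extended thinking

DEFAULT_BUDGETS = {"simple": 0, "medium": 2048, "complex": 4096, "very_complex": 8192}
# Tokens removed from the default when the task is time-sensitive (assumed values)
TIME_SENSITIVE_REDUCTION = {"simple": 0, "medium": 2048, "complex": 2048, "very_complex": 4096}


def allocate(complexity, time_sensitive=False, cost_constrained=False):
    """Return the reasoning budget in tokens; 0 means standard mode."""
    budget = DEFAULT_BUDGETS.get(complexity, DEFAULT_BUDGETS["medium"])
    if time_sensitive:
        budget -= TIME_SENSITIVE_REDUCTION.get(complexity, 0)
    if cost_constrained and budget > 0:
        budget = max(MIN_BUDGET, budget // 2)  # halve, but never below the minimum
    return budget if budget >= MIN_BUDGET else 0


for level in DEFAULT_BUDGETS:
    standard = allocate(level)
    fast = allocate(level, time_sensitive=True)
    cheap = allocate(level, cost_constrained=True)
    print(f"{level:<13} standard={standard:>5} time_sensitive={fast:>5} cost_constrained={cheap:>5}")
```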

Let's implement this class and see how it works in practice.

In [None]:
class DynamicBudgetAllocator:
    """
    Allocates reasoning budgets dynamically based on task complexity and constraints
    """
    def __init__(self):
        # Default budgets by complexity
        self.default_budgets = {
            'simple': 0,  # No extended thinking for simple tasks
            'medium': 2048,
            'complex': 4096,
            'very_complex': 8192
        }
        
        # Budget adjustments for time sensitivity
        self.time_sensitive_adjustments = {
            'simple': 0,
            'medium': -2048,  # Drops the budget to 0, so no extended thinking when time-sensitive
            'complex': -2048,  # Reduce budget for time-sensitive tasks
            'very_complex': -4096  # Significant reduction for time-sensitive tasks
        }
        
        # Performance tracking
        self.performance_history = {}
    
    def allocate_budget(self, prompt, time_sensitive=False, cost_constrained=False):
        """
        Allocate an appropriate reasoning budget for a task
        
        Args:
            prompt (str): The user prompt
            time_sensitive (bool): Whether the task is time-sensitive
            cost_constrained (bool): Whether to prioritize cost saving
            
        Returns:
            dict: Allocation decision including reasoning budget and strategy details
        """
        # Step 1: Classify task complexity
        complexity = classify_task_complexity(prompt)
        
        # Step 2: Get base budget for this complexity
        base_budget = self.default_budgets.get(complexity, 2048)
        
        # Step 3: Apply adjustments
        final_budget = base_budget
        
        # Apply time sensitivity adjustment
        if time_sensitive and complexity in self.time_sensitive_adjustments:
            final_budget += self.time_sensitive_adjustments[complexity]
        
        # Apply cost constraint adjustment (reduce by 50% if cost constrained)
        if cost_constrained and final_budget > 0:
            final_budget = max(1024, final_budget // 2)  # Minimum 1024 if using extended thinking
        
        # Step 4: Determine whether to use extended thinking
        use_extended_thinking = final_budget >= 1024
        
        # If not using extended thinking, set budget to 0
        if not use_extended_thinking:
            final_budget = 0
        
        # Step 5: Create allocation decision
        allocation = {
            'complexity': complexity,
            'use_extended_thinking': use_extended_thinking,
            'reasoning_budget': final_budget,
            'time_sensitive': time_sensitive,
            'cost_constrained': cost_constrained
        }
        
        return allocation
    
    def update_performance(self, allocation, elapsed_time, token_count, cost):
        """
        Update performance history for continuous learning
        
        Args:
            allocation (dict): The allocation decision
            elapsed_time (float): Time taken for response
            token_count (int): Total tokens used
            cost (float): Total cost
        """
        complexity = allocation['complexity']
        budget = allocation['reasoning_budget']
        
        if complexity not in self.performance_history:
            self.performance_history[complexity] = []
        
        self.performance_history[complexity].append({
            'budget': budget,
            'elapsed_time': elapsed_time,
            'token_count': token_count,
            'cost': cost,
            'timestamp': time.time()
        })

# Create an instance of our allocator
budget_allocator = DynamicBudgetAllocator()

In [None]:
def test_dynamic_allocation(prompts, allocator):
    """
    Test our dynamic budget allocator on a set of prompts
    
    Args:
        prompts (dict): Dictionary of prompt labels to prompt text
        allocator (DynamicBudgetAllocator): The budget allocator
        
    Returns:
        pd.DataFrame: Results of the test
    """
    results = []
    
    for label, prompt in prompts.items():
        print(f"\nTesting prompt: {label}")
        print(f"Prompt: {prompt[:100]}..." if len(prompt) > 100 else f"Prompt: {prompt}")
        
        # Get allocation for standard mode (not time-sensitive)
        standard_allocation = allocator.allocate_budget(prompt, time_sensitive=False)
        print(f"Standard mode allocation: {standard_allocation}")
        
        # Get allocation for time-sensitive mode
        time_sensitive_allocation = allocator.allocate_budget(prompt, time_sensitive=True)
        print(f"Time-sensitive allocation: {time_sensitive_allocation}")
        
        # Execute with the standard allocation
        print(f"\nExecuting with standard allocation...")
        start_time = time.time()
        
        response = claude_utils.invoke_claude(
            bedrock_runtime,
            prompt,
            CLAUDE_37_SONNET_MODEL_ID,
            enable_reasoning=standard_allocation['use_extended_thinking'],
            reasoning_budget=standard_allocation['reasoning_budget'],
            max_tokens=1000
        )
        
        elapsed_time = time.time() - start_time
        
        # Calculate metrics
        input_tokens = response.get('usage', {}).get('inputTokens', 0)
        output_tokens = response.get('usage', {}).get('outputTokens', 0)
        total_tokens = response.get('usage', {}).get('totalTokens', 0)
        total_cost = (input_tokens * 0.000003) + (output_tokens * 0.000015)
        
        # Update allocator's performance history
        allocator.update_performance(
            standard_allocation,
            elapsed_time,
            total_tokens,
            total_cost
        )
        
        # Store result
        results.append({
            'Prompt': label,
            'Complexity': standard_allocation['complexity'],
            'Use_Extended_Thinking': standard_allocation['use_extended_thinking'],
            'Reasoning_Budget': standard_allocation['reasoning_budget'],
            'Time_Sensitive_Budget': time_sensitive_allocation['reasoning_budget'],
            'Elapsed_Time': elapsed_time,
            'Total_Tokens': total_tokens,
            'Total_Cost': total_cost
        })
        
        print(f"Completed in {elapsed_time:.2f}s, {total_tokens} tokens, ${total_cost:.6f}")
    
    return pd.DataFrame(results)

# Test prompts of varying complexity
test_prompts = {
    "Simple_Fact": "What is the capital of France?",
    
    "Medium_Math": "If a rectangle has a perimeter of 30 units and a width of 5 units, what is its area?",
    
    "Complex_Analysis": """
    Analyze the advantages and disadvantages of implementing a universal basic income 
    in a developed economy. Consider economic, social, and political perspectives.
    """,
    
    "Very_Complex_Design": """
    Design a system for urban traffic management that optimizes traffic flow, reduces congestion,
    minimizes emissions, and adapts to changing conditions. Include sensing, data processing,
    decision-making components, and how they would interact.
    """
}

# Run the test
allocation_test_results = test_dynamic_allocation(test_prompts, budget_allocator)

# Display the results
display(allocation_test_results)

## Practical Strategies for Different Scenarios

Based on our experiments and analysis, here are some practical strategies for different scenarios:

### Time-Sensitive Applications
For applications where response time is critical (e.g., customer service chatbots, real-time assistants):

- Use standard mode (no extended thinking) for simple and medium complexity tasks
- Use minimal reasoning budgets (1,024-2,048 tokens) for complex tasks
- Consider a two-stage approach: quick response first, then deeper analysis if requested

### Depth-Critical Applications
For applications where reasoning quality and depth is the priority (e.g., research assistance, complex analysis):

- Use extended thinking for all but the simplest tasks
- Allocate generous reasoning budgets (4,096+ tokens) for complex and very complex tasks
- Consider progressive enhancement: start with medium budgets, increase if needed
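
Progressive enhancement can be sketched as a loop that retries with a larger budget until a caller-supplied check accepts the answer. Both `ask` and `is_good_enough` are hypothetical callables you would supply (for example, a wrapper around `claude_utils.invoke_claude` and a validation rule of your own); the stand-ins at the bottom exist only to make the example runnable offline.

```python
def escalate_budget(prompt, ask, is_good_enough, budgets=(2048, 4096, 8192)):
    """Try increasing budgets; return (budget_used, answer, attempts)."""
    if not budgets:
        raise ValueError("budgets must contain at least one value")
    answer = None
    for attempts, budget in enumerate(budgets, start=1):
        answer = ask(prompt, budget)
        if is_good_enough(answer):
            break  # stop at the first budget that passes the check
    # If no budget passed, the result from the largest budget is returned
    return budget, answer, attempts


# Offline stand-ins (not real model calls)
def fake_ask(prompt, budget):
    return f"answer using {budget} tokens"


def fake_check(answer):
    return "4096" in answer


print(escalate_budget("Analyze this design.", fake_ask, fake_check))
```

Each failed attempt is still billed, so this pattern trades extra cost on hard prompts for lower cost on easy ones.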

### Cost-Sensitive Applications
For applications with tight budget constraints:

- Only use extended thinking for complex and very complex tasks
- Limit reasoning budgets to the most efficient sizes identified through testing
- Consider caching common responses to avoid repeated reasoning costs
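
As a rough illustration of the caching idea, the sketch below keeps responses in an in-memory dictionary keyed by model, budget, and whitespace-normalized prompt. The `ResponseCache` class and the `call` parameter are assumptions for this example; a production system would more likely use a shared store with expiry.

```python
import hashlib


class ResponseCache:
    """In-memory response cache (illustrative, not part of claude_utils)."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(prompt, model_id, reasoning_budget):
        # Collapse whitespace so trivially different prompts share one entry
        normalized = " ".join(prompt.split())
        raw = f"{model_id}|{reasoning_budget}|{normalized}"
        return hashlib.sha256(raw.encode("utf-8")).hexdigest()

    def get_or_call(self, prompt, model_id, reasoning_budget, call):
        """Return the cached result, or run call(prompt) once and store it."""
        key = self._key(prompt, model_id, reasoning_budget)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        result = call(prompt)
        self._store[key] = result
        return result


cache = ResponseCache()
fake_model = lambda prompt: f"response to: {prompt.strip()}"  # offline stand-in
cache.get_or_call("What is the capital of France?", "demo-model", 0, fake_model)
cache.get_or_call("What is the  capital of France?", "demo-model", 0, fake_model)
print(f"hits={cache.hits}, misses={cache.misses}")
```

Including the budget in the key keeps answers produced with different amounts of reasoning separate.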

### Balanced Approach
For general-purpose applications balancing all factors:

- Use our dynamic allocation system with complexity-based budgeting
- Track performance metrics over time to identify optimal budgets for different task types
- Adjust allocations based on user feedback and business requirements
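
One way to act on tracked metrics is to average them per complexity level and budget. The sketch below assumes records shaped like the ones `DynamicBudgetAllocator.update_performance` stores; the function name `summarize_history` is hypothetical, and the example records are placeholder values, not measurements.

```python
from statistics import mean


def summarize_history(performance_history):
    """Average time and cost for each (complexity, budget) pair."""
    summary = {}
    for complexity, records in performance_history.items():
        by_budget = {}
        for record in records:
            by_budget.setdefault(record["budget"], []).append(record)
        for budget, group in sorted(by_budget.items()):
            summary[(complexity, budget)] = {
                "runs": len(group),
                "avg_time": mean(r["elapsed_time"] for r in group),
                "avg_cost": mean(r["cost"] for r in group),
            }
    return summary


# Placeholder records for illustration only (not real results)
example_history = {
    "complex": [
        {"budget": 4096, "elapsed_time": 10.0, "token_count": 1500, "cost": 0.02, "timestamp": 0.0},
        {"budget": 4096, "elapsed_time": 14.0, "token_count": 1700, "cost": 0.04, "timestamp": 1.0},
        {"budget": 2048, "elapsed_time": 8.0, "token_count": 1200, "cost": 0.01, "timestamp": 2.0},
    ]
}

for key, stats in summarize_history(example_history).items():
    print(key, stats)
```

With a real allocator you would pass `budget_allocator.performance_history` instead of the placeholder dictionary.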

#### Visualizing the Dynamic Allocation Strategy

In [None]:
def visualize_allocation_strategy():
    """
    Create a visualization of our dynamic allocation strategy
    """
    # Define complexity levels and scenarios
    complexities = ['Simple', 'Medium', 'Complex', 'Very Complex']
    scenarios = ['Standard', 'Time-Sensitive', 'Cost-Constrained']
    
    # Get budgets for each combination
    budgets = {}
    for scenario in scenarios:
        budgets[scenario] = []
        time_sensitive = scenario == 'Time-Sensitive'
        cost_constrained = scenario == 'Cost-Constrained'
        
        for complexity in complexities:
            # Create a sample prompt for each complexity
            if complexity == 'Simple':
                prompt = "What is the capital of France?"
            elif complexity == 'Medium':
                prompt = "Explain the greenhouse effect in simple terms."
            elif complexity == 'Complex':
                prompt = "Analyze the impacts of artificial intelligence on employment in the next decade."
            else:  # Very Complex
                prompt = "Design a system for managing autonomous vehicle traffic in a smart city."
            
            # Get allocation
            allocation = budget_allocator.allocate_budget(
                prompt, 
                time_sensitive=time_sensitive,
                cost_constrained=cost_constrained
            )
            
            budgets[scenario].append(allocation['reasoning_budget'])
    
    # Create the visualization
    fig, ax = plt.subplots(figsize=(12, 8))
    
    # Set up the bar positions
    bar_width = 0.25
    r1 = np.arange(len(complexities))
    r2 = [x + bar_width for x in r1]
    r3 = [x + bar_width for x in r2]
    
    # Create the bars
    ax.bar(r1, budgets['Standard'], width=bar_width, label='Standard', color='blue', alpha=0.7)
    ax.bar(r2, budgets['Time-Sensitive'], width=bar_width, label='Time-Sensitive', color='red', alpha=0.7)
    ax.bar(r3, budgets['Cost-Constrained'], width=bar_width, label='Cost-Constrained', color='green', alpha=0.7)
    
    # Add labels and legend
    ax.set_xlabel('Task Complexity', fontsize=14)
    ax.set_ylabel('Reasoning Budget (tokens)', fontsize=14)
    ax.set_title('Dynamic Budget Allocation Strategy', fontsize=16)
    ax.set_xticks([r + bar_width for r in range(len(complexities))])
    ax.set_xticklabels(complexities)
    ax.legend()
    
    # Add a horizontal line at 1024 tokens (minimum budget)
    ax.axhline(y=1024, color='gray', linestyle='--', alpha=0.7)
    ax.text(-0.1, 1100, 'Minimum Budget (1,024 tokens)', fontsize=10)  # Placed over the empty 'Simple' group so it stays inside the axes
    
    plt.tight_layout()
    plt.show()

# Visualize our allocation strategy
visualize_allocation_strategy()

## Case Study: Budget Optimization for a Complex Task

To apply what we've learned, let's walk through a case study of optimizing the reasoning budget for a complex task - developing a strategy for climate change mitigation.

This case study demonstrates how to:
1. Start with an appropriate budget based on task complexity
2. Evaluate the results against your specific requirements
3. Adjust as needed to find the optimal balance

The ideal budget will vary based on your specific priorities (quality vs. speed vs. cost), but this process provides a systematic approach to finding the right balance.
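
The "evaluate against your requirements" step can be made explicit with a small selection rule. The sketch below picks the largest tested budget that stays within optional time and cost limits, on the assumption that a larger budget does not reduce quality. The function name `pick_budget` is hypothetical, the row keys match the result rows built in the case study code, and the example rows are placeholder values rather than measurements.

```python
def pick_budget(results, max_seconds=None, max_cost=None):
    """Return the largest tested budget that fits both limits, or None."""
    eligible = [
        row for row in results
        if (max_seconds is None or row["Time (s)"] <= max_seconds)
        and (max_cost is None or row["Cost ($)"] <= max_cost)
    ]
    if not eligible:
        return None
    return max(eligible, key=lambda row: row["Budget"])["Budget"]


# Placeholder rows for illustration only (not real results)
example_rows = [
    {"Budget": 2048, "Time (s)": 12.0, "Cost ($)": 0.03},
    {"Budget": 4096, "Time (s)": 20.0, "Cost ($)": 0.05},
    {"Budget": 8192, "Time (s)": 35.0, "Cost ($)": 0.09},
]

print(pick_budget(example_rows, max_seconds=25))
print(pick_budget(example_rows, max_cost=0.02))
```

A `None` result signals that no tested budget meets the limits, so either the limits or the set of tested budgets needs to change.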

In [None]:
def run_case_study():
    """
    Run a case study on budget optimization for a complex task
    """
    # Define our complex task
    case_study_prompt = """
    Develop a comprehensive strategy for a mid-sized city (population 500,000) to reduce its carbon emissions
    by 50% by 2035. Consider transportation, buildings, energy generation, industry, and waste management.
    Include specific policy recommendations, technological solutions, financing mechanisms, and implementation timeline.
    """
    
    print("Case Study: Carbon Emission Reduction Strategy")
    print("-" * 80)
    print(f"Prompt: {case_study_prompt}")
    
    # Step 1: Classify the task
    complexity = classify_task_complexity(case_study_prompt)
    print(f"\nTask classified as: {complexity}")
    
    # Step 2: Get the recommended budget from our allocator
    allocation = budget_allocator.allocate_budget(case_study_prompt)
    recommended_budget = allocation['reasoning_budget']
    print(f"Recommended budget: {recommended_budget} tokens")
    
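    # Step 3: Test a range of budgets around the recommendation
    # Clamp to the 1,024-token minimum and drop duplicates (covers a task classified as simple)
    base_budget = max(1024, recommended_budget)
    test_budgets = sorted({
        max(1024, base_budget // 2),  # Half (or minimum)
        base_budget,                  # Recommended
        min(16384, base_budget * 2)   # Double (or maximum)
    })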
    
    print(f"\nTesting budgets: {test_budgets}")
    
    results = []
    response_texts = {}
    
    for budget in test_budgets:
        print(f"\nTesting budget: {budget} tokens")
        
        start_time = time.time()
        response = claude_utils.invoke_claude(
            bedrock_runtime,
            case_study_prompt,
            CLAUDE_37_SONNET_MODEL_ID,
            enable_reasoning=True,
            reasoning_budget=budget,
            max_tokens=1500
        )
        elapsed_time = time.time() - start_time
        
        # Extract metrics
        input_tokens = response.get('usage', {}).get('inputTokens', 0)
        output_tokens = response.get('usage', {}).get('outputTokens', 0)
        total_tokens = response.get('usage', {}).get('totalTokens', 0)
        total_cost = (input_tokens * 0.000003) + (output_tokens * 0.000015)
        
        # Store response text
        response_text = claude_utils.extract_response_content(response)
        response_texts[budget] = response_text
        
        # Calculate tokens per second
        tokens_per_second = total_tokens / elapsed_time if elapsed_time > 0 else 0
        
        # Store results
        results.append({
            'Budget': budget,
            'Time (s)': elapsed_time,
            'Tokens': total_tokens,
            'Cost ($)': total_cost,
            'Tokens/Second': tokens_per_second
        })
        
        print(f"Completed in {elapsed_time:.2f}s, {total_tokens} tokens, ${total_cost:.6f}")
    
    # Display results table
    results_df = pd.DataFrame(results)
    display(results_df)
    
    # Plot the results
    fig, ax = plt.subplots(1, 3, figsize=(18, 5))
    
    # Time plot
    ax[0].plot([r['Budget'] for r in results], [r['Time (s)'] for r in results], 'o-', linewidth=2)
    ax[0].set_title('Time vs. Budget')
    ax[0].set_xlabel('Budget (tokens)')
    ax[0].set_ylabel('Time (seconds)')
    ax[0].grid(True, alpha=0.3)
    
    # Cost plot
    ax[1].plot([r['Budget'] for r in results], [r['Cost ($)'] for r in results], 'o-', linewidth=2, color='red')
    ax[1].set_title('Cost vs. Budget')
    ax[1].set_xlabel('Budget (tokens)')
    ax[1].set_ylabel('Cost ($)')
    ax[1].grid(True, alpha=0.3)
    
    # Efficiency plot
    ax[2].plot([r['Budget'] for r in results], [r['Tokens/Second'] for r in results], 'o-', linewidth=2, color='green')
    ax[2].set_title('Efficiency vs. Budget')
    ax[2].set_xlabel('Budget (tokens)')
    ax[2].set_ylabel('Tokens per Second')
    ax[2].grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.show()
    
    # Show response previews
    for budget, text in response_texts.items():
        print(f"\nResponse preview ({budget} tokens budget):")
        print("-" * 80)
        # Display first 300 characters
        preview = text[:300] + "..." if len(text) > 300 else text
        print(preview)
    
    # Final recommendation
    most_efficient_idx = np.argmax([r['Tokens/Second'] for r in results])
    most_efficient_budget = results[most_efficient_idx]['Budget']
    
    fastest_idx = np.argmin([r['Time (s)'] for r in results])
    fastest_budget = results[fastest_idx]['Budget']
    
    print("\nRecommendations:")
    print(f"- For highest throughput (tokens/second): {most_efficient_budget} tokens budget")
    print(f"- For fastest response: {fastest_budget} tokens budget")
    print("- For optimal quality/cost balance: Review the response content and choose based on your requirements")

# Run the case study
run_case_study()

## Conclusion and Best Practices

In this notebook, we've explored how to optimize Claude 3.7 Sonnet's reasoning budget allocation to balance quality, cost, and performance. Here are the key takeaways:

### What We've Learned

1. **Budget Size Tradeoffs**: 
   - Larger budgets generally lead to more thorough reasoning
   - But also increase response time and cost
   - There's a point of diminishing returns for each task type

2. **Dynamic Allocation Strategy**:
   - Task complexity is the primary factor in budget allocation
   - Time sensitivity and cost constraints are important secondary factors
   - Different scenarios require different allocation strategies

3. **Finding the Optimal Budget**:
   - Start with a complexity-based recommendation
   - Test a range of budgets to find the balance of quality, time, and cost
   - Track performance metrics to refine your approach over time

### Best Practices

1. **For Production Systems**:
   - Implement task complexity classification as a first step
   - Use dynamic budget allocation based on use case requirements
   - Monitor and analyze performance to continuously optimize

2. **For Cost Optimization**:
   - Only use extended thinking when the complexity justifies it
   - Find the "efficiency sweet spot" for each task type
   - Consider caching common responses

3. **For Quality Optimization**:
   - Allocate larger budgets for complex tasks where quality is critical
   - Implement a feedback loop to identify when larger budgets are needed
   - Consider adjusting the prompt to focus the reasoning process

By implementing these practices, you can make the most of Claude 3.7 Sonnet's extended thinking capabilities while optimizing for your specific requirements.