# Introduction to Claude 3.7 Sonnet and Extended Thinking

This notebook provides an introduction to Anthropic's Claude 3.7 Sonnet model and its innovative "extended thinking" capability. We'll explore:

1. Overview of Claude 3.7 Sonnet's capabilities
2. Understanding extended thinking and how it works
3. Setting up Claude 3.7 in Amazon Bedrock
4. Comparing standard mode vs. extended thinking mode
5. Analyzing performance across different reasoning budgets
6. Deciding when to use extended thinking

By the end of this notebook, you'll have a clear understanding of when and how to use extended thinking to improve your AI workflows.

In [1]:
!pip install -r requirements.txt -qU --disable-pip-version-check

In [None]:
# Import required libraries
import boto3
from botocore.config import Config
import json
import time
import pandas as pd
import matplotlib.pyplot as plt
from termcolor import colored
from IPython.display import display, Markdown, HTML

## 1. Overview of Claude 3.7 Sonnet's Capabilities

Claude 3.7 Sonnet was Anthropic's most advanced model at its release in February 2025. It introduces several key improvements over previous Claude models:

### Key Capabilities

- **Hybrid Reasoning Approach**: Claude 3.7 Sonnet can operate in both standard mode and "extended thinking" mode, allowing you to control when the model engages in deeper reasoning.

- **Increased Output Length**: Supports up to 64K output tokens (8x longer than previous models), with up to 128K tokens in preview.

- **Enhanced Computer Use**: Improved capabilities for computer interaction, including additional actions like scroll, wait, left mouse down/up, hold key, and triple click.

- **Improved Code Generation**: Leading performance on coding benchmarks, particularly with SWE-bench Verified.

- **Reasoning Budget Control**: When using the API, you can control exactly how much "thinking power" to allocate to a task, from the minimum required budget to much larger allocations for complex problems.

Claude 3.7 Sonnet excels at tasks requiring deep analysis, complex problem-solving, and multi-step reasoning - while maintaining the ability to provide quick responses when extended thinking isn't needed.

## 2. Understanding Extended Thinking

### What is Extended Thinking?

Extended thinking is a groundbreaking capability that allows Claude to solve complex problems through step-by-step reasoning that is visible to the user. Think of it like watching a mathematician work through a proof on a whiteboard, rather than just seeing the final answer.

### How Extended Thinking Works

When enabled, Claude 3.7 Sonnet follows a process similar to human problem-solving:

1. It first processes the task in an internal "scratchpad" - thinking through the problem step by step
2. This reasoning process is visible to you in the API response
3. After completing its reasoning, Claude provides a final answer informed by this thinking

### Extended Thinking vs. Chain of Thought

**Traditional Chain of Thought (CoT):**
- Requires specific prompting to elicit step-by-step reasoning
- The reasoning quality depends heavily on prompt engineering
- Reasoning and response are intermingled
- No control over reasoning depth

**Extended Thinking:**
- Explicitly enabled via API parameter
- Reasoning budget can be precisely controlled
- Reasoning appears in a separate field from the final response
- More thorough and structured reasoning
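
To make the contrast concrete, here is a minimal sketch of how the two request shapes might differ. The function names `build_cot_request` and `build_extended_thinking_request` are hypothetical helpers introduced only for illustration; the `reasoning_config` field and the temperature of 1.0 mirror what this notebook uses with the Bedrock Converse API, and the 1,000-token answer allowance is an arbitrary assumption.

```python
def build_cot_request(model_id, question):
    """Chain of thought: step-by-step reasoning is requested through the prompt text."""
    prompt = f"{question}\n\nThink step by step before giving your final answer."
    return {
        "modelId": model_id,
        "messages": [{"role": "user", "content": [{"text": prompt}]}],
        "inferenceConfig": {"maxTokens": 1000},
    }


def build_extended_thinking_request(model_id, question, budget_tokens=1024):
    """Extended thinking: reasoning is enabled by a request field, not by the prompt."""
    return {
        "modelId": model_id,
        # The prompt is sent unchanged; no "think step by step" instruction is needed
        "messages": [{"role": "user", "content": [{"text": question}]}],
        # Assumption: leave 1,000 tokens for the final answer on top of the budget
        "inferenceConfig": {"maxTokens": budget_tokens + 1000, "temperature": 1.0},
        "additionalModelRequestFields": {
            "reasoning_config": {"type": "enabled", "budget_tokens": budget_tokens}
        },
    }


model_id = "us.anthropic.claude-3-7-sonnet-20250219-v1:0"
cot = build_cot_request(model_id, "What is 17 * 23?")
thinking = build_extended_thinking_request(model_id, "What is 17 * 23?", budget_tokens=2048)
print(cot["messages"][0]["content"][0]["text"])
print(thinking["additionalModelRequestFields"])
```

Neither dictionary is sent anywhere here; the point is only that the reasoning depth lives in the prompt wording for CoT and in an explicit parameter for extended thinking.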

### Reasoning Budget

A key innovation of Claude 3.7 Sonnet is the ability to control the "reasoning budget", the number of tokens allocated to the thinking process:

- Minimum budget: 1,024 tokens
- Can be increased up to the model's output limit (64K tokens, or 128K in preview)
- Larger budgets allow for more thorough reasoning on complex problems

Think of the reasoning budget like allocating CPU time to a computational task - more complex tasks benefit from larger allocations.
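
As a rough sketch of how these limits might be applied before sending a request, the hypothetical helper below (`plan_token_limits` is not part of any library) raises a too-small budget to the 1,024-token minimum and sizes the output cap so that it exceeds the budget. It assumes the output cap has to cover both the thinking tokens and the final answer; the 1,000-token answer allowance is an arbitrary default.

```python
MIN_REASONING_BUDGET = 1024  # minimum budget stated above


def plan_token_limits(reasoning_budget, answer_tokens=1000):
    """Return (budget, max_tokens) with the budget at or above the minimum
    and max_tokens leaving `answer_tokens` of room for the final answer."""
    if answer_tokens < 1:
        raise ValueError("answer_tokens must be at least 1")
    budget = max(reasoning_budget, MIN_REASONING_BUDGET)
    max_tokens = budget + answer_tokens
    return budget, max_tokens


print(plan_token_limits(500))        # budget raised to the minimum
print(plan_token_limits(4096, 500))  # budget kept, 500 tokens left for the answer
```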

## 3. Setting up Claude 3.7 in Amazon Bedrock

In [None]:

# Configure the AWS region
REGION = 'us-west-2'  # Change to your preferred region
config = Config(read_timeout=300)


# Initialize Bedrock clients
bedrock = boto3.client(
    service_name='bedrock',
    region_name=REGION,
    config=config
)

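# The runtime client makes the (potentially slow) inference calls,
# so it is the one that needs the extended read timeout
bedrock_runtime = boto3.client(
    service_name='bedrock-runtime',
    region_name=REGION,
    config=config
)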

# Claude 3.7 Sonnet model ID
CLAUDE_37_SONNET_MODEL_ID = 'us.anthropic.claude-3-7-sonnet-20250219-v1:0'

# For comparison, Claude 3.5 Sonnet model ID
CLAUDE_35_SONNET_MODEL_ID = 'us.anthropic.claude-3-5-sonnet-20241022-v2:0'

> **Note**: In this lesson, we show the functions below in full for clarity. In later notebooks, we will import them from the `claude_utils` module instead. The `claude_utils.py` module contains helper functions for creating Bedrock clients, invoking Claude with or without extended thinking, and displaying responses.

In [None]:
def invoke_claude(
    prompt, 
    model_id=CLAUDE_37_SONNET_MODEL_ID, 
    enable_reasoning=False, 
    reasoning_budget=1024,
    temperature=0.7,
    max_tokens=1000
):
    """
    Invoke Claude with or without extended thinking.
    
    Args:
        prompt (str): The prompt to send to Claude
        model_id (str): The model ID to use
        enable_reasoning (bool): Whether to enable extended thinking
        reasoning_budget (int): Token budget for reasoning (min 1024)
        temperature (float): Temperature for generation (0.0-1.0)
        max_tokens (int): Maximum tokens to generate
        
    Returns:
        dict: The complete API response
    """
    # Create system prompt and messages
    system_prompt = [{"text": "You're a helpful AI assistant."}]
    
    messages = [
        {
            "role": "user",
            "content": [{"text": prompt}]
        }
    ]
    
    # Base request parameters
    request_params = {
        "modelId": model_id,
        "messages": messages,
        "system": system_prompt,
        "inferenceConfig": {
            "temperature": temperature,
            "maxTokens": max_tokens
        }
    }
    
    # Add reasoning configuration if enabled
    if enable_reasoning:
        # When using reasoning, temperature must be 1.0
        request_params["inferenceConfig"]["temperature"] = 1.0
        
        # maxTokens covers both the thinking tokens and the final answer,
        # so it must be greater than reasoning_budget
        if max_tokens <= reasoning_budget:
            # Keep the caller's max_tokens as headroom for the final answer
            adjusted_max_tokens = reasoning_budget + max_tokens
            print(f"Info: Increasing maxTokens from {max_tokens} to {adjusted_max_tokens} to exceed reasoning budget")
            request_params["inferenceConfig"]["maxTokens"] = adjusted_max_tokens
        
        request_params["additionalModelRequestFields"] = {
            "reasoning_config": {
                "type": "enabled",
                "budget_tokens": reasoning_budget
            }
        }
    
    # Invoke the model
    start_time = time.time()
    response = bedrock_runtime.converse(**request_params)
    elapsed_time = time.time() - start_time
    
    # Add elapsed time to response for reference
    response["_elapsed_time"] = elapsed_time
    
    return response


def extract_response_content(response):
    """Extract the response content from Claude's API response"""
    if response.get('output', {}).get('message', {}).get('content'):
        content_blocks = response['output']['message']['content']
        for block in content_blocks:
            if 'text' in block:
                return block['text']
    return "No response content found"

def print_section_header(message):
    """Helper function to print a decorative section header"""
    width = 80
    decoration = colored("╔" + "═"*(width-2) + "╗", "blue")
    empty_line = colored("║" + " "*(width-2) + "║", "blue")
    
    # Center the message
    message_length = len(message)
    padding = (width - 2 - message_length) // 2
    message_line = colored("║", "blue") + " "*padding + colored(message, "cyan", attrs=["bold"]) + " "*(width - 2 - padding - message_length) + colored("║", "blue")
    
    bottom = colored("╚" + "═"*(width-2) + "╝", "blue")
    
    print("\n" + decoration)
    print(empty_line)
    print(message_line)
    print(empty_line)
    print(bottom + "\n")

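def display_claude_response(response, show_reasoning=False):
    """Display Claude's response in a nicely formatted way with colors.

    Note: show_reasoning is accepted but not used by this function;
    only the final text block of the response is displayed.
    """
    result = extract_response_content(response)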
    
    # Calculate costs (approximate)
    input_tokens = response.get('usage', {}).get('inputTokens', 0)
    output_tokens = response.get('usage', {}).get('outputTokens', 0)
    total_tokens = response.get('usage', {}).get('totalTokens', 0)
    
    input_cost = input_tokens * 0.000003  # $3 per million tokens
    output_cost = output_tokens * 0.000015  # $15 per million tokens
    total_cost = input_cost + output_cost
    
    # Create styled HTML output
    metrics_html = f"""
    <div style="background-color: #f0f8ff; padding: 15px; border-radius: 5px; margin: 10px 0;">
        <h3 style="color: #2c5282; margin-top: 0;">📊 Response Metrics</h3>
        <p style="color: #4a5568;"><strong>⏱️ Time:</strong> {response.get('_elapsed_time', 0):.2f} seconds</p>
        <p style="color: #4a5568;"><strong>🔤 Tokens:</strong> {total_tokens:,} total 
           (<span style="color: #38a169;">{input_tokens:,} input</span>, 
            <span style="color: #805ad5;">{output_tokens:,} output</span>)</p>
        <p style="color: #4a5568;"><strong>💰 Estimated cost:</strong> 
           <span style="color: #e53e3e;">${total_cost:.5f}</span></p>
    </div>
    """
    
    response_html = f"""
    <div style="background-color: #fff5f5; padding: 15px; border-radius: 5px; margin: 10px 0;">
        <h3 style="color: #2c5282; margin-top: 0;">🤖 Claude's Response:</h3>
        <div style="color: #4a5568;">{result}</div>
    </div>
    """
    
    # Display the formatted output
    display(HTML(metrics_html))
    display(HTML(response_html))
    
    # Also print colored console output for non-notebook environments
    print(colored("\n=== Response Metrics ===", "blue", attrs=["bold"]))
    print(colored(f"Time: {response.get('_elapsed_time', 0):.2f} seconds", "cyan"))
    print(colored(f"Tokens: {total_tokens:,} total ({input_tokens:,} input, {output_tokens:,} output)", "green"))
    print(colored(f"Estimated cost: ${total_cost:.5f}", "yellow"))
    print(colored("\n=== Claude's Response ===", "blue", attrs=["bold"]))
    print(colored(result, "white"))
    
    return result
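
`display_claude_response` shows only the final text block. To inspect the reasoning itself, a helper along the following lines could be used. This is a sketch, not part of `claude_utils`: the name `extract_reasoning_content` is hypothetical, and it assumes reasoning arrives in content blocks shaped like `{"reasoningContent": {"reasoningText": {"text": ...}}}`, so check that against a real response before relying on it.

```python
def extract_reasoning_content(response):
    """Return the concatenated reasoning text from a Converse-style response,
    or None if the response contains no reasoning blocks."""
    blocks = response.get("output", {}).get("message", {}).get("content", [])
    parts = []
    for block in blocks:
        # Assumed block shape: {"reasoningContent": {"reasoningText": {"text": ...}}}
        reasoning = block.get("reasoningContent", {})
        text = reasoning.get("reasoningText", {}).get("text")
        if text:
            parts.append(text)
    return "\n".join(parts) if parts else None


# Mock response for illustration only; this is not real model output
mock_response = {
    "output": {
        "message": {
            "content": [
                {"reasoningContent": {"reasoningText": {"text": "Consider the worst case first."}}},
                {"text": "Final answer goes here."},
            ]
        }
    }
}
print(extract_reasoning_content(mock_response))
```

With a real response, `extract_reasoning_content(reasoning_response)` would return `None` whenever extended thinking was not enabled for that call.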


## 4. Comparing Standard Mode vs. Extended Thinking Mode

Now let's see Claude 3.7 Sonnet in action, comparing its performance with and without extended thinking enabled. We'll test it on a few different types of problems to demonstrate when extended thinking provides the most benefit.

In [None]:
# Simple example - Capital city question
simple_prompt = "What is the capital of France?"

# Without extended thinking
print_section_header("Calling Claude 3.7 Sonnet WITHOUT extended thinking...")
standard_response = invoke_claude(
    simple_prompt,
    enable_reasoning=False,
    max_tokens=100
)

display_claude_response(standard_response, show_reasoning=False)

# With extended thinking
print_section_header("Calling Claude 3.7 Sonnet WITH extended thinking...")
reasoning_response = invoke_claude(
    simple_prompt,
    enable_reasoning=True,
    reasoning_budget=1024,  # Minimum budget
    max_tokens=100
)

display_claude_response(reasoning_response)

In [None]:
# More complex example - Sock drawer problem
complex_prompt = """
A man has 53 socks in his drawer: 21 identical blue, 15 identical black and 17 identical red.
The lights are out, and he is completely in the dark. How many socks must he take out to make 100 percent
certain he has at least one pair of black socks?
"""

# Without extended thinking
print("Calling Claude 3.7 Sonnet WITHOUT extended thinking...")
standard_complex_response = invoke_claude(
    complex_prompt,
    enable_reasoning=False,
    max_tokens=300
)

display_claude_response(standard_complex_response, show_reasoning=False)

# With extended thinking
print("\nCalling Claude 3.7 Sonnet WITH extended thinking...")
reasoning_complex_response = invoke_claude(
    complex_prompt,
    enable_reasoning=True,
    reasoning_budget=2048,  # A bit more budget for this problem
    max_tokens=2348  # Reasoning budget plus about 300 tokens of headroom for the final answer
)

display_claude_response(reasoning_complex_response)
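
To check either response, the expected answer can be computed directly. The sketch below uses a hypothetical helper, `draws_to_guarantee_pair`, that applies the worst-case argument: every sock of the other colors could be drawn first, and only then two socks of the target color.

```python
def draws_to_guarantee_pair(counts, target_color):
    """Worst case: all non-target socks come out first, then two target socks."""
    if counts[target_color] < 2:
        raise ValueError("not enough socks of the target color to form a pair")
    others = sum(n for color, n in counts.items() if color != target_color)
    return others + 2


socks = {"blue": 21, "black": 15, "red": 17}
print(draws_to_guarantee_pair(socks, "black"))  # 21 + 17 + 2 = 40
```

A response that answers 3 (one more than the number of colors) has solved the easier "any matching pair" puzzle rather than the one asked here.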

## 5. Analyzing Performance Across Different Reasoning Budgets

Now let's systematically analyze how different reasoning budgets affect Claude's performance on our sock drawer problem. We'll test five different budget sizes:

- **1,024 tokens**: The minimum required budget
- **2,048 tokens**: A moderate budget
- **4,096 tokens**: A generous budget
- **8,192 tokens**: A very large budget
- **16,384 tokens**: An extra-large budget, to check for diminishing returns

For each budget size, we'll measure:
1. **Response time**: How long it takes to get a response
2. **Token usage**: Total tokens used (input + output)
3. **Cost**: Estimated cost based on token usage
4. **Efficiency**: Tokens processed per second

This analysis will help us find the optimal reasoning budget that balances cost, speed, and performance. Think of it like finding the right amount of CPU time to allocate to a computational task - too little and the model might not have enough "thinking space" to solve the problem effectively, too much and you're wasting resources.
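
The cost and efficiency figures are simple arithmetic on the token counts. As a worked sketch, the hypothetical helper `summarize_usage` below uses the same prices assumed elsewhere in this notebook ($3 per million input tokens, $15 per million output tokens); the token counts and timing in the example are made up.

```python
INPUT_PRICE_PER_MTOK = 3.00    # USD per million input tokens (notebook's assumption)
OUTPUT_PRICE_PER_MTOK = 15.00  # USD per million output tokens (notebook's assumption)


def summarize_usage(input_tokens, output_tokens, elapsed_seconds):
    """Compute total tokens, estimated cost, and tokens per second for one call."""
    total_tokens = input_tokens + output_tokens
    cost = (
        input_tokens * INPUT_PRICE_PER_MTOK + output_tokens * OUTPUT_PRICE_PER_MTOK
    ) / 1_000_000
    # Guard against division by zero for mocked or cached calls
    tokens_per_second = total_tokens / elapsed_seconds if elapsed_seconds > 0 else 0.0
    return {
        "total_tokens": total_tokens,
        "cost": cost,
        "tokens_per_second": tokens_per_second,
    }


# Made-up numbers: 100 input tokens, 1,000 output tokens, 5.5 seconds
summary = summarize_usage(100, 1000, 5.5)
print(f"Cost: ${summary['cost']:.5f}, throughput: {summary['tokens_per_second']:.1f} tokens/s")
```

Because output tokens cost five times as much as input tokens under these prices, thinking tokens (which are billed as output) dominate the cost as the budget grows.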

In [None]:
# Compare different reasoning budgets for the complex problem
print("Testing different reasoning budgets on the sock drawer problem...")

results = []
budgets = [1024, 2048, 4096, 8192, 16384]  # Different budget sizes to test

for budget in budgets:
    print(f"\nTesting with reasoning budget: {budget} tokens")
    response = invoke_claude(
        complex_prompt,
        enable_reasoning=True,
        reasoning_budget=budget,
        max_tokens=300
    )
    
    # Extract metrics
    metrics = {
        'budget': budget,
        'time': response.get('_elapsed_time', 0),
        'input_tokens': response.get('usage', {}).get('inputTokens', 0),
        'output_tokens': response.get('usage', {}).get('outputTokens', 0),
        'total_tokens': response.get('usage', {}).get('totalTokens', 0),
        'cost': (response.get('usage', {}).get('inputTokens', 0) * 0.000003) + 
                (response.get('usage', {}).get('outputTokens', 0) * 0.000015)
    }
    
    results.append(metrics)
    
    # Display brief summary
    print(f"Time: {metrics['time']:.2f}s, Tokens: {metrics['total_tokens']}, Cost: ${metrics['cost']:.5f}")

# Create a DataFrame and display the results
performance_df = pd.DataFrame(results)
display(performance_df)

# Plot the results
plt.figure(figsize=(14, 8))

plt.subplot(2, 2, 1)
plt.plot(performance_df['budget'], performance_df['time'], marker='o')
plt.title('Time vs. Reasoning Budget')
plt.xlabel('Reasoning Budget (tokens)')
plt.ylabel('Time (seconds)')
plt.grid(True, alpha=0.3)

plt.subplot(2, 2, 2)
plt.plot(performance_df['budget'], performance_df['total_tokens'], marker='o')
plt.title('Total Tokens vs. Reasoning Budget')
plt.xlabel('Reasoning Budget (tokens)')
plt.ylabel('Total Tokens')
plt.grid(True, alpha=0.3)

plt.subplot(2, 2, 3)
plt.plot(performance_df['budget'], performance_df['cost'], marker='o')
plt.title('Cost vs. Reasoning Budget')
plt.xlabel('Reasoning Budget (tokens)')
plt.ylabel('Cost ($)')
plt.grid(True, alpha=0.3)

plt.subplot(2, 2, 4)
efficiency = performance_df['total_tokens'] / performance_df['time']
plt.plot(performance_df['budget'], efficiency, marker='o')
plt.title('Efficiency (Tokens/Second) vs. Reasoning Budget')
plt.xlabel('Reasoning Budget (tokens)')
plt.ylabel('Tokens per Second')
plt.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

## 6. When to Use Extended Thinking

Based on our experiments and performance analysis, we can draw some conclusions about when to use extended thinking:

### Best use cases for extended thinking:

1. **Complex reasoning tasks**: Math problems, logic puzzles, and multi-step reasoning benefit significantly from extended thinking.

2. **Problems requiring exhaustive analysis**: When Claude needs to consider many possibilities or edge cases.

3. **When accuracy is critical**: Extended thinking generally improves accuracy on challenging problems by reducing the chance of reasoning errors.

4. **Transparency requirements**: When you need to see Claude's reasoning process to verify its approach.

### When standard mode might be sufficient:

1. **Simple factual queries**: For straightforward questions like "What is the capital of France?", extended thinking adds cost without significant benefit.

2. **Creative tasks**: Creative writing, summarization, and other content generation tasks may not benefit as much from extended thinking.

3. **Time-sensitive applications**: If response speed is critical, standard mode provides faster responses.

4. **Cost-sensitive applications**: Extended thinking increases token usage and therefore cost.

### Finding the right reasoning budget:

The ideal reasoning budget depends on the complexity of your task:

- **Simple reasoning tasks**: 1,024-2,048 tokens
- **Moderate complexity**: 2,048-4,096 tokens
- **Complex problems**: 4,096-8,192 tokens
- **Very complex problems**: 8,192+ tokens

A budget sweep like the one in our performance analysis often reveals a "sweet spot":
- Too small a budget may not give Claude enough space to solve complex problems
- Beyond a certain point, larger budgets show diminishing returns while increasing cost and latency
- The efficiency (tokens per second) may peak at moderate budget sizes; check this against your own charts, since results vary from run to run

The performance charts show this tradeoff visually, helping you determine the optimal budget for your specific use cases.
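
The budget tiers listed above can be encoded as a small lookup so that callers pick a starting point by task type. This is only a sketch: the tier names and the function `suggest_reasoning_budget` are hypothetical, and the ranges are the rough guidance from this section rather than measured optima.

```python
# (low, high) token ranges from the guidance above; None means no stated upper bound
BUDGET_RANGES = {
    "simple": (1024, 2048),
    "moderate": (2048, 4096),
    "complex": (4096, 8192),
    "very_complex": (8192, None),
}


def suggest_reasoning_budget(complexity):
    """Return (starting_budget, upper_bound) for a complexity tier.

    The starting budget is the low end of the range, on the idea that it is
    cheaper to start small and increase only if answers are inadequate.
    """
    if complexity not in BUDGET_RANGES:
        raise ValueError(f"unknown complexity tier: {complexity!r}")
    return BUDGET_RANGES[complexity]


start, upper = suggest_reasoning_budget("moderate")
print(f"Start at {start} tokens; consider going up to {upper}")
```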

In the next notebook, we'll explore a more systematic framework for determining when to use extended thinking and how to optimize the reasoning budget for different task types.