# Extended Thinking with the Anthropic API

This notebook is a guide to using Claude's extended thinking for complex reasoning tasks. Extended thinking lets Claude reason step by step before answering and exposes that reasoning in the response.

## What is Extended Thinking?

Extended thinking allows Claude to:
- Work through complex problems step-by-step
- Show its reasoning process in `thinking` content blocks
- Improve response quality for tasks requiring deep analysis
- Chain multiple tool calls with reasoning steps in between

## Supported Models

Extended thinking is available in:
- **Claude Opus 4.1** (`claude-opus-4-1-20250805`)
- **Claude Opus 4** (`claude-opus-4-20250514`)
- **Claude Sonnet 4** (`claude-sonnet-4-20250514`)
- **Claude Sonnet 3.7** (`claude-3-7-sonnet-20250219`)

## Setup and Installation

First, let's install the required packages and set up our environment.

In [None]:
# Install the Anthropic Python SDK and python-dotenv (used below for load_dotenv)
!pip install anthropic python-dotenv

In [None]:
import os
import json
from anthropic import Anthropic
from typing import Dict, Any, List
from dotenv import load_dotenv

# Load environment variables from .env file (if it exists)
load_dotenv()

# Set up your API key
# You can either set it as an environment variable or directly here
# Option 1: Environment variable (recommended)
# export ANTHROPIC_API_KEY="your-api-key"

# Option 2: Direct assignment (only for testing)
# api_key = "your-api-key-here"

# Initialize the client
client = Anthropic(
    # api_key=api_key  # Uncomment if not using environment variable
)

## Basic Extended Thinking Example

Let's start with a simple example showing how to enable extended thinking.

In [None]:
def basic_thinking_example():
    """Basic example of using extended thinking"""
    
    response = client.messages.create(
        model="claude-opus-4-1-20250805",
        max_tokens=12288,  # Must be greater than budget_tokens; leaves ~4k tokens for the answer
        messages=[
            {
                "role": "user",
                "content": "What is the sum of the first 50 prime numbers? Walk me through your approach."
            }
        ],
        thinking={
            "type": "enabled",
            "budget_tokens": 8192  # Allow up to 8k tokens for thinking
        }
    )
    
    return response

# Run the example
result = basic_thinking_example()
print("Response:")
for block in result.content:
    if block.type == "thinking":
        print(f"\n=== THINKING ===\n{block.thinking}\n")  # thinking blocks expose .thinking, not .text
    elif block.type == "text":
        print(f"\n=== FINAL ANSWER ===\n{block.text}")

## Understanding Token Budgets

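The `budget_tokens` parameter sets the maximum number of tokens Claude can use for reasoning. It must be smaller than `max_tokens`, because the thinking budget counts toward the overall output limit: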
- **Minimum**: 1,024 tokens
- **Recommended starting point**: 8,192 tokens for moderate complexity
- **Complex tasks**: 16,384+ tokens
- **Very complex tasks**: 32,768+ tokens (consider batch processing)
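
The limits above can be checked before a request is sent. The following is a minimal sketch, not part of the SDK: `thinking_request_params` and its `answer_tokens` parameter are hypothetical names, and the helper assumes the rule that `max_tokens` must be larger than `budget_tokens`.

```python
MIN_THINKING_BUDGET = 1024  # minimum budget listed above


def thinking_request_params(budget_tokens: int, answer_tokens: int = 2048) -> dict:
    """Build the thinking-related keyword arguments for messages.create().

    Hypothetical helper: reserves `answer_tokens` on top of the thinking
    budget so that max_tokens is always greater than budget_tokens.
    """
    if budget_tokens < MIN_THINKING_BUDGET:
        raise ValueError(
            f"budget_tokens must be at least {MIN_THINKING_BUDGET}, got {budget_tokens}"
        )
    if answer_tokens < 1:
        raise ValueError("answer_tokens must be positive")
    return {
        "max_tokens": budget_tokens + answer_tokens,
        "thinking": {"type": "enabled", "budget_tokens": budget_tokens},
    }


params = thinking_request_params(8192)
print(params["max_tokens"])  # 10240
```

The returned dictionary can be unpacked into a request, for example `client.messages.create(model=..., messages=..., **params)`.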

In [None]:
def compare_thinking_budgets():
    """Compare results with different thinking budgets"""
    
    problem = """Solve this logic puzzle:
    Three friends (Alice, Bob, Charlie) each have a different pet (cat, dog, fish) 
    and live in different colored houses (red, blue, green).
    - The person with the cat doesn't live in the red house
    - Bob doesn't have the fish
    - Alice lives in the blue house
    - The person in the green house has a dog
    
    Who has which pet and lives in which house?"""
    
    budgets = [1024, 4096, 16384]
    results = {}
    
    for budget in budgets:
        print(f"\n{'='*50}")
        print(f"Testing with {budget} token budget")
        print('='*50)
        
        response = client.messages.create(
            model="claude-opus-4-1-20250805",
            max_tokens=budget + 1024,  # max_tokens must be greater than budget_tokens
            messages=[{"role": "user", "content": problem}],
            thinking={
                "type": "enabled",
                "budget_tokens": budget
            }
        )
        
        # Extract the final answer
        for block in response.content:
            if block.type == "text":
                print(f"Answer: {block.text[:200]}...")
                results[budget] = block.text
                break
    
    return results

# Uncomment to run (will use API credits)
# results = compare_thinking_budgets()

## Streaming Extended Thinking

For better user experience, you can stream responses including thinking blocks.
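
In a stream, thinking and answer text both arrive as `content_block_delta` events and are told apart by `delta.type` (`thinking_delta` versus `text_delta`). The sketch below collects both; `collect_stream` is a hypothetical helper, and the events are mimicked with `SimpleNamespace` objects (assumed to have the same shape as the SDK's raw stream events) so the logic can be run without an API call.

```python
from types import SimpleNamespace


def collect_stream(events):
    """Accumulate thinking and answer text from an iterable of stream events."""
    thinking_parts, text_parts = [], []
    for event in events:
        if event.type != "content_block_delta":
            continue  # start/stop and message-level events carry no text
        delta = event.delta
        if delta.type == "thinking_delta":
            thinking_parts.append(delta.thinking)
        elif delta.type == "text_delta":
            text_parts.append(delta.text)
        # Other delta types (for example signature_delta) are ignored
    return "".join(thinking_parts), "".join(text_parts)


def _delta(delta_type, **fields):
    """Build a fake content_block_delta event for the demo."""
    return SimpleNamespace(
        type="content_block_delta",
        delta=SimpleNamespace(type=delta_type, **fields),
    )


fake_events = [
    SimpleNamespace(type="content_block_start"),
    _delta("thinking_delta", thinking="Pick a pivot, "),
    _delta("thinking_delta", thinking="then partition."),
    _delta("signature_delta", signature="placeholder"),
    SimpleNamespace(type="content_block_stop"),
    _delta("text_delta", text="Quicksort works by partitioning."),
]

thinking, answer = collect_stream(fake_events)
print(thinking)  # Pick a pivot, then partition.
print(answer)    # Quicksort works by partitioning.
```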

In [None]:
def stream_thinking_example():
    """Example of streaming with extended thinking"""
    
    stream = client.messages.create(
        model="claude-opus-4-1-20250805",
        max_tokens=8192,  # Must be greater than budget_tokens
        messages=[
            {
                "role": "user",
                "content": "Explain the quicksort algorithm and implement it in Python."
            }
        ],
        thinking={
            "type": "enabled",
            "budget_tokens": 4096
        },
        stream=True
    )
    
    for event in stream:
        if event.type == "content_block_start":
            block = event.content_block
            if block.type == "thinking":
                print("\n=== THINKING STARTS ===")
            elif block.type == "text":
                print("\n=== RESPONSE STARTS ===")

        elif event.type == "content_block_delta":
            # Deltas are distinguished by event.delta.type, not by event.type
            if event.delta.type == "thinking_delta":
                print(event.delta.thinking, end="", flush=True)
            elif event.delta.type == "text_delta":
                print(event.delta.text, end="", flush=True)
            # signature_delta events carry no printable text and are skipped

        elif event.type == "content_block_stop":
            print("\n")

# Uncomment to run
# stream_thinking_example()

## Extended Thinking with Tool Use

Extended thinking can be combined with tool use, which lets Claude reason about which tool to call and how to interpret the results.
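
When Claude returns a `tool_use` block, your own code runs the tool and sends the result back. The following is a minimal sketch of a local handler for a calculator tool that takes an `expression` string. `evaluate_expression` is a hypothetical name, and restricting it to basic arithmetic through the `ast` module (instead of calling `eval`) is a design choice of this sketch, not something the API requires.

```python
import ast
import operator

# Only these operators are accepted; anything else is rejected
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def evaluate_expression(expression: str):
    """Evaluate +, -, *, / arithmetic on numeric literals without using eval()."""

    def _eval(node):
        if (
            isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)
        ):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](_eval(node.operand))
        raise ValueError(f"Unsupported expression: {expression!r}")

    return _eval(ast.parse(expression, mode="eval").body)


print(evaluate_expression("3 + 5 - 2"))  # 6
```

Function calls, names, and attribute access all raise `ValueError`, so model-supplied input cannot execute arbitrary code.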

In [None]:
def thinking_with_tools_example():
    """Example combining extended thinking with tool use"""
    
    # Define a simple calculator tool
    tools = [
        {
            "name": "calculator",
            "description": "Perform mathematical calculations",
            "input_schema": {
                "type": "object",
                "properties": {
                    "expression": {
                        "type": "string",
                        "description": "Mathematical expression to evaluate"
                    }
                },
                "required": ["expression"]
            }
        },
        {
            "name": "get_weather",
            "description": "Get current weather for a location",
            "input_schema": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "City name"
                    }
                },
                "required": ["location"]
            }
        }
    ]
    
    response = client.messages.create(
        model="claude-opus-4-1-20250805",
        max_tokens=8192,  # Must be greater than budget_tokens
        tools=tools,
        messages=[
            {
                "role": "user",
                "content": "If I have 3 apples and buy 5 more, then give away 2, how many do I have?"
            }
        ],
        thinking={
            "type": "enabled",
            "budget_tokens": 4096
        },
        tool_choice={"type": "auto"}  # Let Claude decide whether to use tools
    )
    
    return response

# Process the response
def process_tool_response(response):
    """Process a response that may contain tool use"""
    
    for block in response.content:
        if block.type == "thinking":
            print(f"\n=== THINKING ===\n{block.thinking[:500]}...\n")
        elif block.type == "text":
            print(f"\n=== RESPONSE ===\n{block.text}")
        elif block.type == "tool_use":
            print(f"\n=== TOOL USE ===\nTool: {block.name}\nInput: {block.input}")

# Uncomment to run
# response = thinking_with_tools_example()
# process_tool_response(response)

## Interleaved Thinking (Claude 4 Models Only)

Interleaved thinking allows Claude to think between tool calls, enabling more sophisticated multi-step reasoning.

In [None]:
def interleaved_thinking_example():
    """Example of interleaved thinking with multiple tool calls"""
    
    # This requires the beta header for interleaved thinking
    client_with_beta = Anthropic(
        default_headers={
            "anthropic-beta": "interleaved-thinking-2025-05-14"
        }
    )
    
    tools = [
        {
            "name": "search_database",
            "description": "Search for information in a database",
            "input_schema": {
                "type": "object",
                "properties": {
                    "query": {"type": "string"},
                    "filters": {"type": "object"}
                },
                "required": ["query"]
            }
        },
        {
            "name": "analyze_data",
            "description": "Analyze data and return insights",
            "input_schema": {
                "type": "object",
                "properties": {
                    "data": {"type": "array"},
                    "analysis_type": {"type": "string"}
                },
                "required": ["data", "analysis_type"]
            }
        }
    ]
    
    response = client_with_beta.messages.create(
        model="claude-opus-4-1-20250805",
        max_tokens=4096,
        tools=tools,
        messages=[
            {
                "role": "user",
                "content": "Find all customers who made purchases last month and analyze their buying patterns."
            }
        ],
        thinking={
            "type": "enabled",
            # With interleaved thinking, the budget may exceed max_tokens: it is the
            # total across all thinking blocks in one assistant turn
            "budget_tokens": 16384
        }
    )
    
    return response

# Note: With interleaved thinking, Claude can:
# 1. Think about what data to search for
# 2. Call search_database tool
# 3. Think about the results
# 4. Decide to call analyze_data tool
# 5. Think about the analysis
# 6. Provide final insights

## Handling Multi-Turn Conversations

When a conversation involves tool use, you must pass the thinking blocks back unmodified together with the tool results. For plain turns without tool use, earlier thinking blocks can be omitted from the history.
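
The rule applies to the assistant turn that contains a `tool_use` block: that turn is sent back with its `thinking` block intact, followed by a user turn carrying the `tool_result`. The sketch below shows the message layout. `append_tool_results` and `run_tool` are hypothetical names, blocks are shown as plain dictionaries, and the `signature` value is a placeholder for whatever the API returned.

```python
def append_tool_results(messages, assistant_content, run_tool):
    """Return a new message list extended with the assistant turn and tool results.

    assistant_content: content blocks from the previous response, unmodified
                       (thinking blocks included)
    run_tool: callable (name, input) -> str supplied by the caller
    """
    tool_results = [
        {
            "type": "tool_result",
            "tool_use_id": block["id"],
            "content": run_tool(block["name"], block["input"]),
        }
        for block in assistant_content
        if block["type"] == "tool_use"
    ]
    return messages + [
        {"role": "assistant", "content": assistant_content},
        {"role": "user", "content": tool_results},
    ]


previous_content = [
    {
        "type": "thinking",
        "thinking": "I should look up the weather.",
        "signature": "<signature returned by the API>",  # placeholder
    },
    {
        "type": "tool_use",
        "id": "toolu_example",  # placeholder id
        "name": "get_weather",
        "input": {"location": "Tokyo"},
    },
]
history = [{"role": "user", "content": "What's the weather in Tokyo?"}]
updated = append_tool_results(history, previous_content, lambda name, args: "18°C, clear")
print([m["role"] for m in updated])  # ['user', 'assistant', 'user']
```

SDK response objects expose the same fields as attributes (`block.id`, `block.name`, `block.input`), so the dictionary lookups would become attribute access when working with a real response.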

In [None]:
def multi_turn_conversation():
    """Example of handling multi-turn conversations with thinking"""
    
    messages = []
    
    # First turn
    messages.append({
        "role": "user",
        "content": "I need help planning a trip. What should I consider?"
    })
    
    response1 = client.messages.create(
        model="claude-opus-4-1-20250805",
        max_tokens=8192,  # Must be greater than budget_tokens
        messages=messages,
        thinking={
            "type": "enabled",
            "budget_tokens": 4096
        }
    )
    
    # Extract the text blocks for the conversation history.
    # No tool use is in flight here, so the thinking blocks from this turn
    # do not need to be sent back.
    assistant_message = {
        "role": "assistant",
        "content": []
    }
    
    for block in response1.content:
        if block.type == "text":
            assistant_message["content"].append({
                "type": "text",
                "text": block.text
            })
    
    messages.append(assistant_message)
    
    # Second turn
    messages.append({
        "role": "user",
        "content": "I'm thinking about visiting Japan in April. What's special about that time?"
    })
    
    response2 = client.messages.create(
        model="claude-opus-4-1-20250805",
        max_tokens=8192,  # Must be greater than budget_tokens
        messages=messages,
        thinking={
            "type": "enabled",
            "budget_tokens": 4096
        }
    )
    
    return response1, response2

# Uncomment to run
# r1, r2 = multi_turn_conversation()
# print("First response:", r1.content[-1].text[:200])
# print("\nSecond response:", r2.content[-1].text[:200])

## Best Practices and Tips

### 1. Choosing the Right Token Budget

- **Simple logic problems**: 1,024 - 4,096 tokens
- **Moderate complexity** (multi-step math, basic coding): 4,096 - 8,192 tokens
- **Complex tasks** (algorithm design, deep analysis): 16,384 - 32,768 tokens
- **Very complex tasks**: 32,768+ tokens (consider batch processing)

### 2. When to Use Extended Thinking

Extended thinking is most beneficial for:
- Mathematical problems requiring step-by-step calculation
- Complex coding challenges
- Logic puzzles and reasoning tasks
- Multi-step analysis problems
- Tasks requiring careful consideration of multiple factors

### 3. Performance Considerations

- Extended thinking increases response time
- Streaming is required when `max_tokens` > 21,333
- For budgets > 32k tokens, use batch processing to avoid timeouts
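
The thresholds above can be folded into a small routing check. This is a sketch only: `delivery_mode` is a hypothetical helper, and reading "32k" as 32,000 tokens is an assumption.

```python
STREAMING_MAX_TOKENS = 21_333  # streaming threshold from the list above
BATCH_BUDGET_TOKENS = 32_000   # assumption: "32k" read as 32,000 tokens


def delivery_mode(max_tokens: int, budget_tokens: int) -> str:
    """Suggest how to send a request: 'standard', 'stream', or 'batch'."""
    if budget_tokens > BATCH_BUDGET_TOKENS:
        return "batch"
    if max_tokens > STREAMING_MAX_TOKENS:
        return "stream"
    return "standard"


print(delivery_mode(max_tokens=10_240, budget_tokens=8_192))   # standard
print(delivery_mode(max_tokens=24_576, budget_tokens=16_384))  # stream
print(delivery_mode(max_tokens=40_000, budget_tokens=32_768))  # batch
```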

### 4. Cost Optimization

- Start with minimum budget and increase as needed
- Monitor actual token usage vs. allocated budget
- Claude may not use the entire budget if not needed

In [None]:
def optimize_thinking_budget():
    """Helper function to find optimal thinking budget for a task"""
    
    def test_budget(task: str, budget: int) -> Dict[str, Any]:
        """Test a specific budget and return metrics"""
        response = client.messages.create(
            model="claude-opus-4-1-20250805",
            max_tokens=budget + 2048,  # max_tokens must be greater than budget_tokens
            messages=[{"role": "user", "content": task}],
            thinking={
                "type": "enabled",
                "budget_tokens": budget
            }
        )
        
        # Extract metrics from usage
        return {
            "budget": budget,
            "input_tokens": response.usage.input_tokens,
            "output_tokens": response.usage.output_tokens,
            "total_tokens": response.usage.input_tokens + response.usage.output_tokens,
            "response_quality": len(response.content[-1].text) if response.content else 0
        }
    
    # Test different budgets
    task = "Write a function to find the nth Fibonacci number using dynamic programming"
    budgets = [1024, 2048, 4096, 8192]
    
    results = []
    for budget in budgets:
        print(f"Testing budget: {budget} tokens...")
        metrics = test_budget(task, budget)
        results.append(metrics)
        print(f"  Output tokens used: {metrics['output_tokens']}")
        print(f"  Response length: {metrics['response_quality']} chars\n")
    
    return results

# Uncomment to run optimization test
# optimization_results = optimize_thinking_budget()

## Common Patterns and Use Cases

In [None]:
# Pattern 1: Complex Problem Solving
def solve_complex_problem(problem: str):
    """Template for solving complex problems with extended thinking"""
    
    return client.messages.create(
        model="claude-opus-4-1-20250805",
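        max_tokens=20480,  # Must be greater than budget_tokens (16384 plus room for the answer)
        # System prompts go in the top-level `system` parameter, not in `messages`
        system="You are an expert problem solver. Break down complex problems systematically.",
        messages=[
            {
                "role": "user",
                "content": problem
            }
        ],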
        thinking={
            "type": "enabled",
            "budget_tokens": 16384
        }
    )

# Pattern 2: Code Generation with Reasoning
def generate_code_with_reasoning(requirements: str):
    """Generate code with detailed reasoning about design choices"""
    
    return client.messages.create(
        model="claude-opus-4-1-20250805",
        max_tokens=16384,  # Must be strictly greater than budget_tokens
        messages=[
            {
                "role": "user",
                "content": f"""Create a Python implementation for the following requirements:
                {requirements}
                
                Explain your design decisions and implementation approach."""
            }
        ],
        thinking={
            "type": "enabled",
            "budget_tokens": 8192
        }
    )

# Pattern 3: Data Analysis with Step-by-Step Reasoning
def analyze_data_with_reasoning(data_description: str, analysis_goal: str):
    """Analyze data with transparent reasoning process"""
    
    return client.messages.create(
        model="claude-opus-4-1-20250805",
        max_tokens=16384,  # Must be greater than budget_tokens (12288)
        messages=[
            {
                "role": "user",
                "content": f"""Data: {data_description}
                
                Analysis Goal: {analysis_goal}
                
                Please analyze this data step by step and provide insights."""
            }
        ],
        thinking={
            "type": "enabled",
            "budget_tokens": 12288
        }
    )

## Error Handling and Edge Cases

In [None]:
def robust_thinking_request(prompt: str, budget: int = 8192, max_retries: int = 3):
    """Robust wrapper for extended thinking requests with error handling"""
    
    import time
    from anthropic import APIError, RateLimitError
    
    for attempt in range(max_retries):
        try:
            response = client.messages.create(
                model="claude-opus-4-1-20250805",
                max_tokens=budget + 4096,  # max_tokens must be greater than budget_tokens
                messages=[{"role": "user", "content": prompt}],
                thinking={
                    "type": "enabled",
                    "budget_tokens": budget
                }
            )
            
            # Process response
            result = {
                "thinking": [],
                "response": "",
                "tokens_used": response.usage.output_tokens if hasattr(response, 'usage') else None
            }
            
            for block in response.content:
                if block.type == "thinking":
                    result["thinking"].append(block.thinking)  # thinking blocks expose .thinking
                elif block.type == "text":
                    result["response"] = block.text
            
            return result
            
        except RateLimitError as e:
            print(f"Rate limit hit. Waiting {2 ** attempt} seconds...")
            time.sleep(2 ** attempt)
            
        except APIError as e:
            print(f"API error: {e}")
            if attempt == max_retries - 1:
                raise
            
        except Exception as e:
            print(f"Unexpected error: {e}")
            raise
    
    return None

# Example usage
# result = robust_thinking_request(
#     "What is the optimal algorithm for finding the shortest path in a weighted graph?",
#     budget=8192
# )

## Summary and Key Takeaways

### Key Points to Remember:

1. **Extended thinking enhances Claude's reasoning** for complex tasks
2. **Token budgets** should match task complexity (1k-32k+ tokens) and must stay below `max_tokens`
3. **Thinking blocks are transparent** in Claude 3.7, summarized in Claude 4
4. **Interleaved thinking** (Claude 4 only) enables reasoning between tool calls
5. **Preserve thinking blocks** when continuing conversations with tool use
6. **Stream responses** for better UX, especially with large budgets
7. **Monitor token usage** to optimize costs and performance

### When to Use Extended Thinking:
✅ Complex mathematical problems  
✅ Multi-step reasoning tasks  
✅ Algorithm design and optimization  
✅ Deep analysis requiring systematic approach  
✅ Problems benefiting from transparent reasoning  

### When NOT to Use Extended Thinking:
❌ Simple factual queries  
❌ Basic text generation  
❌ Tasks where speed is critical  
❌ When you need deterministic outputs (thinking adds variability)  

---

**Reference**: This notebook is based on [Anthropic's Extended Thinking documentation](https://docs.anthropic.com/en/docs/build-with-claude/extended-thinking)