# Claude with Extended Thinking: A Comprehensive Tutorial

## What You'll Learn

In this notebook, we'll explore Claude's extended thinking capabilities - a powerful feature that gives Claude enhanced reasoning for complex tasks. We'll start with the basics and gradually build up to advanced use cases.

### Table of Contents
1. [Introduction to Extended Thinking](#introduction)
2. [Setting Up Your Environment](#setup)
3. [Basic Usage](#basic-usage)
4. [Understanding Thinking Blocks](#thinking-blocks)
5. [Advanced Features](#advanced-features)
6. [Best Practices](#best-practices)
7. [Real-World Examples](#examples)
8. [Performance and Cost Considerations](#performance)

<a id='introduction'></a>
## 1. Introduction to Extended Thinking

Extended thinking is a feature that allows Claude to "think" through complex problems step-by-step before providing a final answer. This is particularly useful for:

- üßÆ **Mathematical problems** requiring multi-step calculations
- üîç **Complex analysis** of documents or data
- üèóÔ∏è **Architecture decisions** in software development
- üéØ **Strategic planning** and decision-making

### How It Works

When extended thinking is enabled, Claude:
1. Creates internal "thinking" content blocks
2. Works through the problem systematically
3. Incorporates insights from this reasoning
4. Delivers a more thoughtful final response

<a id='setup'></a>
## 2. Setting Up Your Environment

Let's start by installing the necessary packages and setting up our API key.

In [1]:
# Install the Anthropic Python SDK
!pip install anthropic>=0.40.0

zsh:1: 0.40.0 not found


In [None]:
import os
import anthropic
from IPython.display import Markdown, display
import json
import time
from dotenv import load_dotenv

# Load environment variables from .env file (if it exists)
load_dotenv()

# Set up your API key
# You can either set it as an environment variable or directly here
# For security, we recommend using environment variables
api_key = os.getenv('ANTHROPIC_API_KEY')
if not api_key:
    api_key = input("Please enter your Anthropic API key: ")

client = anthropic.Anthropic(api_key=api_key)

### Supported Models

Extended thinking is supported in:
- **Claude Sonnet 4.5** (`claude-sonnet-4-5-20250929`) - Latest model with enhanced reasoning
- **Claude Haiku 4.5** (`claude-haiku-4-5-20251001`) - Fast model with thinking support
- **Claude Opus 4.1** (`claude-opus-4-1-20250805`) - Enhanced Opus with advanced reasoning
- **Claude Opus 4** (`claude-opus-4-20250514`)
- **Claude Sonnet 4** (`claude-sonnet-4-20250514`)
- **Claude Sonnet 3.7** (`claude-3-7-sonnet-20250219`)

**Note**: Claude 4+ models return **summarized thinking** (you're charged for full internal tokens), while Claude 3.7 returns full thinking output.

<a id='basic-usage'></a>
## 3. Basic Usage

Let's start with a simple example to see extended thinking in action.

In [5]:
def basic_thinking_example():
    """A simple example demonstrating extended thinking"""
    
    response = client.messages.create(
        model="claude-sonnet-4-5-20250929",  # Using latest model
        max_tokens=10000,
        thinking={
            "type": "enabled",
            "budget_tokens": 5000  # How many tokens Claude can use for thinking
        },
        messages=[{
            "role": "user",
            "content": "What is 27 * 453? Show me how you calculate this step by step."
        }]
    )
    
    # Process the response
    for block in response.content:
        if block.type == "thinking":
            print("ü§î Claude's Thinking Process (Summary):")
            print("-" * 50)
            print(block.thinking)
            print("-" * 50)
            print()
        elif block.type == "text":
            print("‚úÖ Final Answer:")
            print(block.text)

# Run the example
basic_thinking_example()

ü§î Claude's Thinking Process (Summary):
--------------------------------------------------
I need to calculate 27 * 453 step by step.

Let me break this down using the standard multiplication method.

27 * 453

I can break 27 into 20 + 7, or I can do it the traditional way by multiplying each digit.

Let me do it the traditional way:

```
    453
  √ó  27
  -----
```

First, multiply 453 by 7:
- 7 √ó 3 = 21 (write 1, carry 2)
- 7 √ó 5 = 35, plus 2 = 37 (write 7, carry 3)
- 7 √ó 4 = 28, plus 3 = 31 (write 31)

So 453 √ó 7 = 3,171

Next, multiply 453 by 20 (or by 2, then shift one position):
- 2 √ó 3 = 6
- 2 √ó 5 = 10 (write 0, carry 1)
- 2 √ó 4 = 8, plus 1 = 9

So 453 √ó 2 = 906, but since it's actually 453 √ó 20, we write it as 9,060

Now add them together:
```
    453
  √ó  27
  -----
  3,171  (453 √ó 7)
  9,060  (453 √ó 20)
  -----
 12,231
```

Let me verify: 3,171 + 9,060 = 12,231
--------------------------------------------------

‚úÖ Final Answer:
# 27 √ó 453

Let me solve this 

### Understanding the Parameters

- **`type: "enabled"`**: Turns on extended thinking
- **`budget_tokens`**: Maximum tokens Claude can use for reasoning
  - Minimum: 1,024 tokens
  - Standard tasks: 5,000-15,000 tokens
  - Complex tasks: 16,000-32,000 tokens
  - Very complex tasks: 32,000+ tokens (use batch processing to avoid timeouts)
- **`max_tokens`**: Must exceed `budget_tokens` to allow room for final response

**Important**: Claude may not use the entire budget - it only uses what's needed for the task.

<a id='thinking-blocks'></a>
## 4. Understanding Thinking Blocks

Let's explore how thinking blocks work and what information they contain.

In [6]:
def analyze_thinking_blocks():
    """Demonstrate the structure of thinking blocks"""
    
    response = client.messages.create(
        model="claude-sonnet-4-5-20250929",
        max_tokens=15000,
        thinking={
            "type": "enabled",
            "budget_tokens": 8000
        },
        messages=[{
            "role": "user",
            "content": """I have a list of numbers: [15, 23, 8, 42, 16, 4, 30, 12].
            
            Please:
            1. Find the median
            2. Calculate the mean
            3. Identify any outliers using the IQR method
            4. Suggest what this data might represent"""
        }]
    )
    
    # Analyze the response structure
    print("üìä Response Structure Analysis")
    print("=" * 60)
    
    for i, block in enumerate(response.content):
        print(f"\nBlock {i + 1}:")
        print(f"  Type: {block.type}")
        
        if block.type == "thinking":
            # Claude 4+ models provide summarized thinking
            print(f"  Thinking Summary Length: {len(block.thinking)} characters")
            print(f"  Has Signature: {'Yes' if hasattr(block, 'signature') else 'No'}")
            print("\n  Thinking Content (first 500 chars):")
            print(f"  {block.thinking[:500]}...")
            print("\n  Note: This is a summary. Full thinking tokens were used internally.")
        elif block.type == "text":
            print(f"  Text Length: {len(block.text)} characters")
            print("\n  Final Response:")
            display(Markdown(block.text))

analyze_thinking_blocks()

üìä Response Structure Analysis

Block 1:
  Type: thinking
  Thinking Summary Length: 1579 characters
  Has Signature: Yes

  Thinking Content (first 500 chars):
  Let me work through this step by step.

Given list: [15, 23, 8, 42, 16, 4, 30, 12]

**1. Find the median**

First, I need to sort the list:
[4, 8, 12, 15, 16, 23, 30, 42]

There are 8 numbers (even count), so the median is the average of the 4th and 5th values.
4th value: 15
5th value: 16
Median = (15 + 16) / 2 = 31 / 2 = 15.5

**2. Calculate the mean**

Mean = sum of all values / count
Sum = 4 + 8 + 12 + 15 + 16 + 23 + 30 + 42 = 150
Count = 8
Mean = 150 / 8 = 18.75

**3. Identify outliers using...

  Note: This is a summary. Full thinking tokens were used internally.

Block 2:
  Type: text
  Text Length: 1141 characters

  Final Response:


# Analysis of Your Data: [15, 23, 8, 42, 16, 4, 30, 12]

## 1. **Median: 15.5**
First, sort the data: [4, 8, 12, 15, 16, 23, 30, 42]

With 8 values (even count), the median is the average of the 4th and 5th values:
- (15 + 16) √∑ 2 = **15.5**

## 2. **Mean: 18.75**
- Sum: 4 + 8 + 12 + 15 + 16 + 23 + 30 + 42 = 150
- Mean: 150 √∑ 8 = **18.75**

## 3. **Outlier Analysis (IQR Method)**

**Calculations:**
- Q1 (25th percentile): 10
- Q3 (75th percentile): 26.5
- IQR: 26.5 - 10 = 16.5
- Lower bound: 10 - (1.5 √ó 16.5) = -14.75
- Upper bound: 26.5 + (1.5 √ó 16.5) = 51.25

**Result: No outliers detected** 
All values fall within the range [-14.75, 51.25]

*(Note: 42 is the highest value and approaches the upper bound, but doesn't exceed it)*

## 4. **Possible Data Representations**

This data could represent:
- **Ages** of people in a group (mix of children and adults)
- **Wait times** in minutes at a service location
- **Test scores** or grades (out of 50 points)
- **Response times** in seconds for a task
- **Number of items** purchased or daily sales counts

The spread suggests moderate variability with one notably higher value (42).

### Key Points About Thinking Blocks

1. **Summarization (Claude 4+ models)**: Returns summarized thinking, not full reasoning
2. **Billing**: You're charged for the full internal thinking tokens, not just the summary
3. **Signature**: Each thinking block includes a cryptographic signature for verification
4. **Context Window**: Thinking blocks from previous turns are stripped and don't count toward your context window
5. **Preservation**: With tool use, thinking blocks must be preserved and passed back unchanged
6. **Security**: The signature field is opaque and should not be parsed or modified

<a id='advanced-features'></a>
## 5. Advanced Features

### 5.1 Streaming Responses

For better user experience, especially with longer thinking times, you can stream responses.

In [7]:
def stream_thinking_example():
    """Demonstrate streaming with extended thinking"""
    
    print("üåä Streaming Extended Thinking Example")
    print("=" * 60)
    
    with client.messages.stream(
        model="claude-sonnet-4-5-20250929",
        max_tokens=12000,
        thinking={"type": "enabled", "budget_tokens": 10000},
        messages=[{
            "role": "user",
            "content": """Design a simple REST API for a todo list application. 
            Include endpoints for CRUD operations and consider:
            - Authentication
            - Error handling
            - Data validation
            - Response formats"""
        }],
    ) as stream:
        current_block_type = None
        
        for event in stream:
            if event.type == "content_block_start":
                current_block_type = event.content_block.type
                if current_block_type == "thinking":
                    print("\nü§î Claude is thinking...", end="", flush=True)
                elif current_block_type == "text":
                    print("\n\n‚úÖ Final Response:\n", end="", flush=True)
            
            elif event.type == "content_block_delta":
                if event.delta.type == "thinking_delta":
                    # Show progress dots for thinking
                    # Note: Thinking deltas may arrive in "chunky" patterns
                    print(".", end="", flush=True)
                elif event.delta.type == "text_delta":
                    print(event.delta.text, end="", flush=True)
            
            elif event.type == "content_block_stop":
                if current_block_type == "thinking":
                    print(" Done thinking!")

stream_thinking_example()

üåä Streaming Extended Thinking Example

ü§î Claude is thinking............................................. Done thinking!


‚úÖ Final Response:
# Todo List REST API Design

## API Overview

Here's a complete REST API design with implementation examples:

## 1. API Specification

### Base URL
```
https://api.todoapp.com/v1
```

### Endpoints Summary

| Method | Endpoint | Description | Auth Required |
|--------|----------|-------------|---------------|
| POST | /auth/register | Register new user | No |
| POST | /auth/login | Login user | No |
| POST | /auth/refresh | Refresh token | Yes |
| GET | /todos | Get all todos | Yes |
| GET | /todos/:id | Get specific todo | Yes |
| POST | /todos | Create todo | Yes |
| PUT | /todos/:id | Update todo | Yes |
| PATCH | /todos/:id | Partial update | Yes |
| DELETE | /todos/:id | Delete todo | Yes |

## 2. Implementation Example (Node.js + Express)

### Project Structure
```
todo-api/
‚îú‚îÄ‚îÄ src/
‚îÇ   ‚îú‚îÄ‚îÄ controllers/
‚îÇ   ‚îÇ   ‚îú

### 5.2 Extended Thinking with Tool Use

Extended thinking can be combined with tool use for even more powerful applications.

In [8]:
def thinking_with_tools_example():
    """Demonstrate extended thinking with tool use"""
    
    # Define a simple calculator tool
    tools = [{
        "name": "calculator",
        "description": "Perform mathematical calculations",
        "input_schema": {
            "type": "object",
            "properties": {
                "expression": {
                    "type": "string",
                    "description": "Mathematical expression to evaluate"
                }
            },
            "required": ["expression"]
        }
    }]
    
    response = client.messages.create(
        model="claude-sonnet-4-5-20250929",
        max_tokens=20000,
        thinking={
            "type": "enabled",
            "budget_tokens": 8000
        },
        tools=tools,
        # Note: Extended thinking only supports tool_choice: "auto" or "none"
        tool_choice={"type": "auto"},
        messages=[{
            "role": "user",
            "content": """I'm planning a party for 25 people. Each person will eat:
            - 3 slices of pizza (8 slices per pizza)
            - 2 sodas ($1.50 each)
            - 1 dessert ($3.00 each)
            
            Pizzas cost $12 each. Calculate the total cost and quantities needed."""
        }]
    )
    
    print("üéâ Party Planning with Extended Thinking")
    print("=" * 60)
    
    for block in response.content:
        if block.type == "thinking":
            print("\nü§î Planning Process:")
            print(block.thinking[:1000] + "...\n")
        elif block.type == "tool_use":
            print(f"\nüîß Using tool: {block.name}")
            print(f"   Input: {block.input}")
        elif block.type == "text":
            print("\nüìã Final Plan:")
            display(Markdown(block.text))
    
    print("\n‚ö†Ô∏è Important Notes:")
    print("- Extended thinking only works with tool_choice: 'auto' or 'none'")
    print("- The entire assistant turn operates in a single thinking mode")
    print("- When continuing conversations, preserve thinking blocks unchanged")

# Note: This example shows the structure but won't execute the tool
# In a real application, you'd handle tool execution and pass results back
thinking_with_tools_example()

üéâ Party Planning with Extended Thinking

ü§î Planning Process:
Let me break down this party planning problem:

1. **Pizza calculation:**
   - 25 people √ó 3 slices per person = 75 slices needed
   - 8 slices per pizza
   - Number of pizzas needed = 75 √∑ 8 = 9.375, so I need to round up to 10 pizzas
   - Cost: 10 pizzas √ó $12 = $120

2. **Sodas calculation:**
   - 25 people √ó 2 sodas per person = 50 sodas needed
   - Cost: 50 sodas √ó $1.50 = $75

3. **Desserts calculation:**
   - 25 people √ó 1 dessert per person = 25 desserts needed
   - Cost: 25 desserts √ó $3.00 = $75

4. **Total cost:**
   - Pizza + Sodas + Desserts = $120 + $75 + $75 = $270

Let me use the calculator to verify these calculations....


üìã Final Plan:


I'll help you calculate the quantities and costs for your party!


üîß Using tool: calculator
   Input: {'expression': '25 * 3'}

üîß Using tool: calculator
   Input: {'expression': '75 / 8'}

üîß Using tool: calculator
   Input: {'expression': '10 * 12'}

üîß Using tool: calculator
   Input: {'expression': '25 * 2'}

üîß Using tool: calculator
   Input: {'expression': '50 * 1.50'}

üîß Using tool: calculator
   Input: {'expression': '25 * 3.00'}

üîß Using tool: calculator
   Input: {'expression': '120 + 75 + 75'}

‚ö†Ô∏è Important Notes:
- Extended thinking only works with tool_choice: 'auto' or 'none'
- The entire assistant turn operates in a single thinking mode
- When continuing conversations, preserve thinking blocks unchanged


### 5.3 Interleaved Thinking (Beta)

A powerful new feature for Claude 4 models that enables reasoning between tool calls.

In [9]:
def budget_comparison():
    """Compare different thinking budgets"""
    
    problem = """Analyze this business scenario:
    A coffee shop has 3 locations. Location A makes $2,500/day, 
    Location B makes $1,800/day, and Location C makes $3,200/day.
    Operating costs are 65% of revenue. They want to open a 4th location.
    What factors should they consider and what's the minimum daily revenue 
    the new location needs to be profitable?"""
    
    budgets = [1024, 5000, 15000]
    
    print("üí∞ Thinking Budget Comparison")
    print("=" * 60)
    
    for budget in budgets:
        print(f"\nüìä Budget: {budget:,} tokens")
        print("-" * 40)
        
        start_time = time.time()
        
        response = client.messages.create(
            model="claude-sonnet-4-5-20250929",
            max_tokens=8000,
            thinking={"type": "enabled", "budget_tokens": budget},
            messages=[{"role": "user", "content": problem}]
        )
        
        elapsed_time = time.time() - start_time
        
        # Get response length and usage
        response_text = ""
        for block in response.content:
            if block.type == "text":
                response_text = block.text
        
        print(f"‚è±Ô∏è  Time: {elapsed_time:.2f} seconds")
        print(f"üìù Response length: {len(response_text)} characters")
        print(f"üßÆ Tokens used - Input: {response.usage.input_tokens}, Output: {response.usage.output_tokens}")
        print(f"üí° Response preview: {response_text[:200]}...")
    
    print("\n\nüí° Budget Guidelines:")
    print("-" * 40)
    print("Start minimal (1,024 tokens) and increase based on task complexity")

budget_comparison()

üí∞ Thinking Budget Comparison

üìä Budget: 1,024 tokens
----------------------------------------
‚è±Ô∏è  Time: 19.06 seconds
üìù Response length: 1807 characters
üßÆ Tokens used - Input: 133, Output: 822
üí° Response preview: # Coffee Shop Expansion Analysis

## Current Performance Breakdown

**Revenue & Profitability:**
- Location A: $2,500/day ‚Üí $875/day profit (35%)
- Location B: $1,800/day ‚Üí $630/day profit (35%)
- Loc...

üìä Budget: 5,000 tokens
----------------------------------------
‚è±Ô∏è  Time: 22.42 seconds
üìù Response length: 1742 characters
üßÆ Tokens used - Input: 133, Output: 969
üí° Response preview: # Coffee Shop Expansion Analysis

## Current Performance Metrics

**Daily Revenue & Profit (at 65% operating costs):**
- Location A: $2,500 revenue ‚Üí $875 profit (35%)
- Location B: $1,800 revenue ‚Üí $...

üìä Budget: 15,000 tokens
----------------------------------------


BadRequestError: Error code: 400 - {'type': 'error', 'error': {'type': 'invalid_request_error', 'message': '`max_tokens` must be greater than `thinking.budget_tokens`. Please consult our documentation at https://docs.claude.com/en/docs/build-with-claude/extended-thinking#max-tokens-and-context-window-size'}, 'request_id': 'req_011CV66G9Y9NdwnodBsix4Tw'}

<a id='best-practices'></a>
## 6. Best Practices

### 6.1 Choosing the Right Budget

In [None]:
def budget_comparison():
    """Compare different thinking budgets"""
    
    problem = """Analyze this business scenario:
    A coffee shop has 3 locations. Location A makes $2,500/day, 
    Location B makes $1,800/day, and Location C makes $3,200/day.
    Operating costs are 65% of revenue. They want to open a 4th location.
    What factors should they consider and what's the minimum daily revenue 
    the new location needs to be profitable?"""
    
    budgets = [1024, 5000, 15000]
    
    print("üí∞ Thinking Budget Comparison")
    print("=" * 60)
    
    for budget in budgets:
        print(f"\nüìä Budget: {budget:,} tokens")
        print("-" * 40)
        
        start_time = time.time()
        
        response = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=1000,
            thinking={"type": "enabled", "budget_tokens": budget},
            messages=[{"role": "user", "content": problem}]
        )
        
        elapsed_time = time.time() - start_time
        
        # Get response length
        response_text = ""
        for block in response.content:
            if block.type == "text":
                response_text = block.text
        
        print(f"‚è±Ô∏è  Time: {elapsed_time:.2f} seconds")
        print(f"üìù Response length: {len(response_text)} characters")
        print(f"üí° Response preview: {response_text[:200]}...")

budget_comparison()

### Key Prompting Tips:

1. **Be Specific**: Clearly state what you want analyzed
2. **Provide Context**: Include all relevant information
3. **Structure Your Input**: Use clear formatting and sections
4. **Define Success Criteria**: Specify what a good answer looks like
5. **Keep it General**: Claude performs better with high-level instructions rather than step-by-step directives
6. **Let Claude Think**: Don't tell Claude to "think step by step" - extended thinking handles this automatically
7. **Use Multishot Examples**: For complex patterns, show examples using `<thinking>` tags in your prompts
8. **Verify Work**: Ask Claude to check its reasoning with test cases for better consistency

In [None]:
def prompting_best_practices():
    """Demonstrate effective prompting strategies"""
    
    # Good prompt - clear, specific, structured
    good_prompt = """Analyze the following investment options and recommend the best choice:

Option A: Stock Portfolio
- Expected annual return: 8%
- Risk level: High
- Minimum investment: $10,000
- Liquidity: High (can sell anytime)

Option B: Real Estate
- Expected annual return: 6%
- Risk level: Medium
- Minimum investment: $50,000
- Liquidity: Low (takes months to sell)

Option C: Bonds
- Expected annual return: 4%
- Risk level: Low
- Minimum investment: $5,000
- Liquidity: Medium

Investor Profile:
- Age: 35
- Investment horizon: 15 years
- Risk tolerance: Medium
- Available capital: $75,000
- Goal: Retirement savings

Please provide:
1. Analysis of each option
2. Recommended allocation
3. Justification for your recommendation"""
    
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=3000,
        thinking={"type": "enabled", "budget_tokens": 12000},
        messages=[{"role": "user", "content": good_prompt}]
    )
    
    print("‚úÖ Best Practices Example: Structured Investment Analysis")
    print("=" * 60)
    
    for block in response.content:
        if block.type == "text":
            display(Markdown(block.text))

prompting_best_practices()

def document_analysis_example():
    """Analyze a complex document with extended thinking"""
    
    # Simulated legal document excerpt
    document = """PURCHASE AGREEMENT - EXECUTIVE SUMMARY
    
    This Agreement is entered into as of January 15, 2024, between TechCorp Inc. 
    ("Buyer") and DataSystems LLC ("Seller").
    
    TERMS:
    1. Purchase Price: $45,000,000 (Forty-five million dollars)
    2. Payment Structure:
       - Initial Payment: $20,000,000 upon closing
       - Deferred Payment: $15,000,000 payable over 3 years
       - Performance Earnout: Up to $10,000,000 based on revenue targets
    
    3. Conditions Precedent:
       - Regulatory approval from FTC
       - No material adverse change in Seller's business
       - Retention of key employees (minimum 80% for 12 months)
    
    4. Representations and Warranties:
       - Seller warrants all intellectual property is free of encumbrances
       - Financial statements are accurate per GAAP
       - No pending litigation exceeding $500,000
    
    5. Termination Clauses:
       - Either party may terminate if closing doesn't occur by March 31, 2024
       - Buyer may terminate if due diligence reveals material issues
       - Break-up fee: $2,000,000 if Buyer terminates without cause
    """
    
    response = client.messages.create(
        model="claude-sonnet-4-5-20250929",
        max_tokens=4000,
        thinking={"type": "enabled", "budget_tokens": 20000},
        messages=[{
            "role": "user",
            "content": f"""Analyze this purchase agreement and identify:
            
            1. Key risks for the buyer
            2. Key risks for the seller
            3. Potential deal breakers
            4. Areas that need clarification
            5. Recommendations for both parties
            
            Document:
            {document}"""
        }]
    )
    
    print("üìÑ Complex Document Analysis with Extended Thinking")
    print("=" * 60)
    
    for block in response.content:
        if block.type == "thinking":
            print("\nüß† Analysis Process (Summary):")
            print(block.thinking[:800] + "...\n")
        elif block.type == "text":
            print("üìä Detailed Analysis:")
            display(Markdown(block.text))

document_analysis_example()

<a id='examples'></a>
## 7. Real-World Examples

### 7.1 Complex Document Analysis

In [None]:
def architecture_planning_example():
    """Use extended thinking for software architecture decisions"""
    
    requirements = """Design a microservices architecture for an e-commerce platform with:
    
    Functional Requirements:
    - User authentication and profiles
    - Product catalog with search
    - Shopping cart and checkout
    - Order management and tracking
    - Payment processing
    - Inventory management
    - Review and rating system
    
    Non-Functional Requirements:
    - Handle 100,000 concurrent users
    - 99.9% uptime
    - Response time < 200ms for catalog
    - Scalable to 10x current load
    - Multi-region deployment
    - GDPR compliant
    
    Tech Stack Preferences:
    - Cloud-native (AWS/GCP/Azure)
    - Container-based deployment
    - Modern languages (Python/Go/Node.js)
    """
    
    response = client.messages.create(
        model="claude-sonnet-4-5-20250929",
        max_tokens=5000,
        thinking={"type": "enabled", "budget_tokens": 25000},
        messages=[{
            "role": "user",
            "content": f"""Create a detailed microservices architecture plan.
            
            Include:
            1. Service breakdown and responsibilities
            2. Communication patterns (sync/async)
            3. Data storage strategy
            4. Security considerations
            5. Deployment architecture
            6. Scaling strategy
            7. Monitoring and observability
            
            Requirements:
            {requirements}"""
        }]
    )
    
    print("üèóÔ∏è Microservices Architecture Planning")
    print("=" * 60)
    
    for block in response.content:
        if block.type == "text":
            display(Markdown(block.text))

architecture_planning_example()

### 7.2 Code Architecture Planning

In [None]:
def cost_calculator():
    """Calculate costs for extended thinking usage"""
    
    # Current pricing (prices per million tokens)
    pricing = {
        "claude-sonnet-4.5": {"input": 3, "output": 15},
        "claude-haiku-4.5": {"input": 1, "output": 5},
        "claude-opus-4.1": {"input": 15, "output": 75},
        "claude-sonnet-4": {"input": 3, "output": 15},
        "claude-sonnet-3.7": {"input": 3, "output": 15}
    }
    
    print("üí∞ Extended Thinking Cost Calculator")
    print("=" * 60)
    
    # Example scenario
    scenarios = [
        {"name": "Simple Analysis", "input": 500, "thinking": 5000, "output": 1000},
        {"name": "Complex Problem", "input": 2000, "thinking": 20000, "output": 3000},
        {"name": "Deep Research", "input": 5000, "thinking": 50000, "output": 8000}
    ]
    
    # Show example with Sonnet 4.5
    model = "claude-sonnet-4.5"
    prices = pricing[model]
    
    print(f"\nüìä Cost Analysis for {model}")
    print("-" * 40)
    
    for scenario in scenarios:
        # Remember: thinking tokens are billed as output tokens
        input_cost = (scenario["input"] / 1_000_000) * prices["input"]
        thinking_cost = (scenario["thinking"] / 1_000_000) * prices["output"]
        output_cost = (scenario["output"] / 1_000_000) * prices["output"]
        total_cost = input_cost + thinking_cost + output_cost
        
        print(f"\n  {scenario['name']}:")
        print(f"    Input tokens: {scenario['input']:,}")
        print(f"    Thinking tokens: {scenario['thinking']:,} (billed as output)")
        print(f"    Output tokens: {scenario['output']:,}")
        print(f"    Total cost: ${total_cost:.4f}")
    
    print("\n\n‚ö†Ô∏è Important Pricing Notes:")
    print("-" * 40)
    print("‚Ä¢ Thinking tokens are charged at OUTPUT token rates")
    print("‚Ä¢ Claude 4+ models: Charged for full internal thinking, not summary")
    print("‚Ä¢ Thinking blocks from previous turns don't count toward context window")
    print("‚Ä¢ Use prompt caching to reduce costs on repeated patterns")

cost_calculator()

<a id='performance'></a>
## 8. Performance and Cost Considerations

### Understanding Token Usage and Costs

In [None]:
def performance_tips():
    """Demonstrate performance optimization strategies"""
    
    print("‚ö° Performance Optimization Strategies")
    print("=" * 60)
    
    strategies = [
        {
            "title": "1. Start with Minimal Budget",
            "description": "Begin with 1,024 tokens and increase only if needed",
            "example_budget": 1024,
            "use_case": "Simple calculations or basic analysis"
        },
        {
            "title": "2. Use Streaming for Better UX",
            "description": "Stream responses to show progress during long thinking",
            "example_budget": 10000,
            "use_case": "Interactive applications"
        },
        {
            "title": "3. Batch Processing for Large Budgets",
            "description": "Use batch API for thinking budgets > 32k tokens to avoid timeouts",
            "example_budget": 50000,
            "use_case": "Overnight analysis jobs or complex research"
        },
        {
            "title": "4. Cache Common Patterns",
            "description": "Use prompt caching for repeated analysis patterns",
            "example_budget": 15000,
            "use_case": "Standardized document analysis"
        },
        {
            "title": "5. Choose the Right Model",
            "description": "Use Haiku 4.5 for faster, cost-effective thinking on simpler tasks",
            "example_budget": 8000,
            "use_case": "High-volume tasks with moderate complexity"
        }
    ]
    
    for strategy in strategies:
        print(f"\n{strategy['title']}")
        print(f"  üìù {strategy['description']}")
        print(f"  üí° Budget: {strategy['example_budget']:,} tokens")
        print(f"  üéØ Best for: {strategy['use_case']}")
    
    print("\n\nüìà Budget vs. Quality Guidelines:")
    print("-" * 40)
    print("  1,024 - 5,000 tokens: Basic reasoning tasks")
    print("  5,000 - 15,000 tokens: Standard complex problems")
    print("  15,000 - 32,000 tokens: Deep analysis and research")
    print("  32,000+ tokens: Extensive multi-faceted problems (use batch API)")
    
    print("\n\nüéØ Best Use Cases for Extended Thinking:")
    print("-" * 40)
    print("  ✓ Complex STEM problems with sequential logic")
    print("  ✓ Constraint optimization with competing requirements")
    print("  ✓ Structured thinking frameworks")
    print("  ✓ Multi-step analysis requiring mental models")
    
    print("\n\n⚠️ When NOT to Use Extended Thinking:")
    print("-" * 40)
    print("  ✗ Simple queries or lookups")
    print("  ✗ Real-time chat applications")
    print("  ✗ Tasks where latency is critical")
    print("  ✗ High-volume, low-complexity requests")
    
    print("\n\nüîß Advanced Tips:")
    print("-" * 40)
    print("  • Use interleaved thinking (beta) for complex tool orchestration")
    print("  • Extended thinking works best in English (output can be in any language)")
    print("  • Ask Claude to verify its work with test cases for better consistency")
    print("  • Incompatible with modified temperature or top_k, and with forced tool use")

performance_tips()
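The tips above name a few hard constraints (the 1,024-token minimum budget, no temperature or top_k changes, no forced tool use) and one soft recommendation (batch processing above 32k). A minimal sketch of a pre-flight check that encodes them follows. `check_thinking_request` and its messages are hypothetical, the rule list is only what this notebook mentions, and the API itself remains the authority. It also assumes `budget_tokens` must stay below `max_tokens`, which is how we read the extended-thinking docs for standard (non-interleaved) thinking.

```python
# Hypothetical pre-flight check for extended-thinking request params.
# Encodes only the rules summarized in this notebook; the API may enforce more.

MIN_BUDGET_TOKENS = 1024              # API minimum for thinking.budget_tokens
BATCH_SUGGESTION = 32_000             # above this, the tips suggest the batch API
FORCED_TOOL_CHOICES = {"any", "tool"}  # tool_choice types that force tool use


def check_thinking_request(params: dict) -> list[str]:
    """Return human-readable issues with a request dict; [] means none found."""
    thinking = params.get("thinking") or {}
    if thinking.get("type") != "enabled":
        return []  # these rules only apply when thinking is enabled

    issues = []
    budget = thinking.get("budget_tokens", 0)
    max_tokens = params.get("max_tokens", 0)

    if budget < MIN_BUDGET_TOKENS:
        issues.append(f"budget_tokens={budget:,} is below the {MIN_BUDGET_TOKENS:,} minimum")
    if budget >= max_tokens:
        # Assumption: not relaxed here for interleaved thinking.
        issues.append(f"budget_tokens={budget:,} should be smaller than max_tokens={max_tokens:,}")
    # Conservatively flag any explicit sampling override.
    for key in ("temperature", "top_k"):
        if key in params:
            issues.append(f"{key} should not be modified when thinking is enabled")
    tool_choice = params.get("tool_choice") or {}
    if tool_choice.get("type") in FORCED_TOOL_CHOICES:
        issues.append("forced tool use (tool_choice 'any' or 'tool') is incompatible with thinking")
    if budget > BATCH_SUGGESTION:
        issues.append("budgets above 32k are better sent through the batch API")
    return issues


# Example: a request that mixes thinking with a custom temperature and a tiny budget.
request = {
    "model": "claude-sonnet-4-5",
    "max_tokens": 2000,
    "temperature": 0.2,
    "thinking": {"type": "enabled", "budget_tokens": 512},
    "messages": [{"role": "user", "content": "Plan a database migration."}],
}
for issue in check_thinking_request(request):
    print("-", issue)
```

Running a check like this before `client.messages.create(**request)` turns an API error into a readable message early; it does not replace handling errors from the API.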

## Summary and Next Steps

### What We've Learned

1. **Extended Thinking Basics**: How to enable and use Claude's reasoning capabilities
2. **Model Support**: All Claude 4+ models support extended thinking, including Sonnet 4.5 and Haiku 4.5
3. **Thinking Blocks**: Understanding summarized thinking in Claude 4+ vs full output in Claude 3.7
4. **Advanced Features**: Streaming, tool use, and interleaved thinking (beta)
5. **Best Practices**: Start with minimal budgets, use high-level instructions, verify work with test cases
6. **Real-World Applications**: Document analysis and architecture planning
7. **Cost Management**: Understanding pricing and optimization strategies
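For the streaming point in the recap, here is a short sketch of separating thinking text from answer text as events arrive. It assumes the raw event shape used in Anthropic's streaming examples: `content_block_delta` events whose `delta` is either a `thinking_delta` (text in `.thinking`) or a `text_delta` (text in `.text`). `collect_stream` is a hypothetical helper, and the fake events let it run without an API key.

```python
from types import SimpleNamespace


def collect_stream(events, show_progress: bool = False) -> tuple[str, str]:
    """Split streamed deltas into (thinking_text, answer_text).

    Only `content_block_delta` events are inspected; block start/stop events,
    message-level events, and signature deltas are skipped.
    """
    thinking_parts, answer_parts = [], []
    for event in events:
        if event.type != "content_block_delta":
            continue
        delta = event.delta
        if delta.type == "thinking_delta":
            fragment = delta.thinking
            thinking_parts.append(fragment)
        elif delta.type == "text_delta":
            fragment = delta.text
            answer_parts.append(fragment)
        else:
            continue  # e.g. signature_delta
        if show_progress:
            # Print each fragment as it arrives so long thinking shows progress.
            print(fragment, end="", flush=True)
    return "".join(thinking_parts), "".join(answer_parts)


# Offline demo: fake events shaped like the raw stream events.
def _delta(**fields):
    return SimpleNamespace(type="content_block_delta", delta=SimpleNamespace(**fields))


fake_events = [
    SimpleNamespace(type="content_block_start"),
    _delta(type="thinking_delta", thinking="27 * 453 = "),
    _delta(type="thinking_delta", thinking="12231"),
    _delta(type="signature_delta", signature="sig-abc"),
    SimpleNamespace(type="content_block_stop"),
    _delta(type="text_delta", text="The answer is 12,231."),
]
thinking, answer = collect_stream(fake_events)
```

With a configured client, the same function could consume the stream from `client.messages.stream(...)` inside a `with` block, e.g. `thinking, answer = collect_stream(stream, show_progress=True)`. The SDK's convenience events (such as `text`) are ignored because only `content_block_delta` is handled.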

### Key Takeaways

✅ **Use Extended Thinking for:**
- Complex multi-step problems requiring sequential reasoning
- Deep document analysis and structured evaluation
- Strategic planning and decision-making with multiple constraints
- STEM problems and optimization tasks
- Quality-critical tasks where accuracy matters more than speed
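When the same document is analyzed repeatedly with extended thinking (the "Standardized document analysis" case from the caching tip), one option is to place the document in a system block marked for prompt caching. The sketch below only builds request parameters. The helper name, the `claude-sonnet-4-5` alias, and the 4,000-token answer headroom are assumptions, and very short documents may fall below the minimum cacheable prompt length.

```python
def build_cached_analysis_request(document: str, question: str,
                                  budget_tokens: int = 15_000,
                                  model: str = "claude-sonnet-4-5") -> dict:
    """Hypothetical helper: Messages API params that reuse a cached document prefix."""
    return {
        "model": model,
        "max_tokens": budget_tokens + 4_000,  # keep budget_tokens below max_tokens
        "thinking": {"type": "enabled", "budget_tokens": budget_tokens},
        "system": [
            {"type": "text", "text": "You are a careful analyst. Answer only from the document."},
            # The large, repeated prefix is marked as cacheable.
            {"type": "text", "text": document, "cache_control": {"type": "ephemeral"}},
        ],
        # Only the question changes between calls.
        "messages": [{"role": "user", "content": question}],
    }


doc = "Quarterly report: revenue grew in two regions and fell in one. " * 50
first = build_cached_analysis_request(doc, "What are the main risks?")
second = build_cached_analysis_request(doc, "Summarize the revenue trends.")
```

After sending both with `client.messages.create(**params)`, `response.usage.cache_read_input_tokens` on the later call indicates whether the cached prefix was reused.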

❌ **Avoid Extended Thinking for:**
- Simple queries or lookups
- Real-time chat applications
- Tasks where latency is critical
- High-volume, low-complexity requests
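The high-volume warning is mostly arithmetic: thinking tokens are billed as output tokens, and on Claude 4 models you pay for the full thinking even though the response shows only a summary. A rough estimator follows. The prices, token counts, and daily volume are placeholders for illustration, not real measurements or current pricing.

```python
def estimate_cost_usd(input_tokens: int, output_tokens: int,
                      input_price_per_mtok: float, output_price_per_mtok: float) -> float:
    """Rough per-request cost; output_tokens should include thinking tokens."""
    return (input_tokens * input_price_per_mtok
            + output_tokens * output_price_per_mtok) / 1_000_000


# Placeholder prices in USD per million tokens; check current pricing before use.
INPUT_PRICE, OUTPUT_PRICE = 3.0, 15.0
REQUESTS_PER_DAY = 100_000  # hypothetical volume

# Hypothetical simple lookup: 500 input tokens, 100-token answer.
plain = estimate_cost_usd(500, 100, INPUT_PRICE, OUTPUT_PRICE)
# Same lookup with an assumed 5,000 thinking tokens added to the output.
with_thinking = estimate_cost_usd(500, 100 + 5_000, INPUT_PRICE, OUTPUT_PRICE)

print(f"Without thinking: ${plain * REQUESTS_PER_DAY:,.2f}/day")
print(f"With thinking:    ${with_thinking * REQUESTS_PER_DAY:,.2f}/day")
```

For real numbers, feed in `response.usage.input_tokens` and `response.usage.output_tokens` from actual calls rather than guesses.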

### New Features to Explore

- 🆕 **Interleaved Thinking (Beta)**: Enable reasoning between tool calls for more sophisticated multi-step workflows
- 🆕 **Haiku 4.5**: Fast, cost-effective extended thinking for high-volume tasks of moderate complexity
- 🆕 **Context Window Optimization**: Thinking blocks from previous assistant turns are stripped automatically and don't count toward the context window
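A sketch of the request shape for interleaved thinking follows. It assumes the beta flag `interleaved-thinking-2025-05-14` from Anthropic's extended-thinking docs (verify it is still current) and the SDK's `client.beta.messages.create`, which accepts a `betas` list. `build_interleaved_request` and the `lookup_order` tool are hypothetical, and the builder makes no network call.

```python
# Beta flag as published in the extended-thinking docs at the time of writing.
INTERLEAVED_THINKING_BETA = "interleaved-thinking-2025-05-14"


def build_interleaved_request(prompt: str, tools: list[dict],
                              budget_tokens: int = 10_000, max_tokens: int = 16_000,
                              model: str = "claude-sonnet-4-5") -> dict:
    """Hypothetical helper: params for client.beta.messages.create."""
    return {
        "model": model,
        "max_tokens": max_tokens,
        "thinking": {"type": "enabled", "budget_tokens": budget_tokens},
        "tools": tools,
        # tool_choice is left unset (auto); forced tool use is incompatible with thinking.
        "messages": [{"role": "user", "content": prompt}],
        "betas": [INTERLEAVED_THINKING_BETA],
    }


# Hypothetical tool definition used only for illustration.
order_lookup_tool = {
    "name": "lookup_order",
    "description": "Look up the status of an order by its ID.",
    "input_schema": {
        "type": "object",
        "properties": {"order_id": {"type": "string"}},
        "required": ["order_id"],
    },
}

params = build_interleaved_request("Where is order A-1001, and is it late?", [order_lookup_tool])
# With a configured client: response = client.beta.messages.create(**params)
```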

### Common Pitfalls to Avoid

⚠️ **Don't:**
- Toggle thinking in the middle of an assistant turn, including during tool-use loops (finish the turn first)
- Use thinking with forced tool use or with modified temperature or top_k settings
- Manually edit or parse signature fields
- Over-specify step-by-step instructions (high-level guidance to think deeply often works better)
- Forget to pass thinking blocks back unchanged when using tools
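The last pitfall is easy to hit in a tool loop. The sketch below appends the assistant's content exactly as returned, thinking blocks and signatures included, followed by one `tool_result` per `tool_use` block. `append_tool_round` and `fake_run_tool` are hypothetical; blocks may be SDK objects or plain dicts, and the demo uses dicts so it runs offline.

```python
def _field(block, name):
    """Read a field from an SDK content block (attribute) or a plain dict (key)."""
    return block[name] if isinstance(block, dict) else getattr(block, name)


def append_tool_round(messages, assistant_content, run_tool):
    """Return a new history with the assistant turn and its tool results appended.

    `assistant_content` is the full `response.content` list, passed back as-is so
    thinking blocks keep their original text and signature. `run_tool(name, input)`
    is a caller-supplied function returning the tool output as a string.
    """
    tool_results = [
        {
            "type": "tool_result",
            "tool_use_id": _field(block, "id"),
            "content": run_tool(_field(block, "name"), _field(block, "input")),
        }
        for block in assistant_content
        if _field(block, "type") == "tool_use"
    ]
    return messages + [
        {"role": "assistant", "content": list(assistant_content)},  # no edits, no filtering
        {"role": "user", "content": tool_results},
    ]


# Offline demo with dict blocks shaped like a thinking + tool_use response.
def fake_run_tool(name, tool_input):
    return f"{tool_input['city']}: 18°C, clear"  # hypothetical tool output


thinking_block = {"type": "thinking", "thinking": "I need current weather data.", "signature": "sig-abc"}
tool_block = {"type": "tool_use", "id": "toolu_01", "name": "get_weather", "input": {"city": "Paris"}}
history = [{"role": "user", "content": "What's the weather in Paris?"}]

updated = append_tool_round(history, [thinking_block, tool_block], fake_run_tool)
```

In a live loop, `updated` would be passed as `messages` to the next `client.messages.create(...)` call with the same `thinking` and `tools` settings.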

### Resources for Further Learning

- [Extended Thinking Documentation](https://docs.claude.com/en/docs/build-with-claude/extended-thinking)
- [Extended Thinking Tips](https://docs.claude.com/en/docs/build-with-claude/prompt-engineering/extended-thinking-tips)
- [Anthropic API Reference](https://docs.anthropic.com/api/)
- [Claude on Amazon Bedrock](https://aws.amazon.com/bedrock/claude/)
- [Claude on Google Cloud Vertex AI](https://cloud.google.com/vertex-ai)

### Try It Yourself!

Now that you understand extended thinking, try these challenges:

1. **Math Challenge**: Solve a complex optimization problem with constraints
2. **Analysis Challenge**: Analyze a dataset and provide insights with multi-step reasoning
3. **Planning Challenge**: Design a system architecture for your own project
4. **Budget Experiment**: Compare different thinking budgets on the same complex problem
5. **Tool Orchestration**: Use interleaved thinking to build a multi-step workflow
6. **Cost Optimization**: Find the sweet spot between budget and quality for your use case
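For the budget experiment (and the batch-processing tip), one approach is to send the same prompt at several budgets as a single Message Batch. This sketch assumes the batch request shape of `custom_id` plus `params`. The budget list mirrors the guideline boundaries printed earlier, while the helper name, model alias, prompt, and 4,000-token headroom are illustrative assumptions.

```python
def build_budget_experiment(prompt: str,
                            budgets: tuple[int, ...] = (1_024, 5_000, 15_000, 32_000),
                            model: str = "claude-sonnet-4-5",
                            answer_headroom: int = 4_000) -> list[dict]:
    """Hypothetical helper: one batch request per thinking budget, same prompt."""
    return [
        {
            "custom_id": f"budget-{budget}",  # lets you match results to budgets later
            "params": {
                "model": model,
                "max_tokens": budget + answer_headroom,  # keep budget_tokens < max_tokens
                "thinking": {"type": "enabled", "budget_tokens": budget},
                "messages": [{"role": "user", "content": prompt}],
            },
        }
        for budget in budgets
    ]


batch_requests = build_budget_experiment(
    "Find the shortest route visiting cities A, B, C and D given these distances: "
    "A-B 7, A-C 9, A-D 8, B-C 10, B-D 6, C-D 5."
)
```

With a configured client, `batch = client.messages.batches.create(requests=batch_requests)` submits the experiment; once processing ends, `client.messages.batches.results(batch.id)` yields entries whose `custom_id` identifies the budget, so answer quality can be compared against `usage.output_tokens` for each one.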

Happy thinking! ü§î‚ú®

---

**Last Updated**: Based on Claude documentation as of January 2025  
**API Version**: Anthropic Python SDK v0.40.0+
