# Anthropic Model List - Jupyter Notebook

This notebook demonstrates how to retrieve and display a list of available models from the Anthropic AI platform.

## Prerequisites

Before running this code, make sure you have:

1. Installed the Anthropic Python client:
   ```
   !pip install anthropic
   ```

2. Set your Anthropic API key as an environment variable:
   ```
   import os
   os.environ["ANTHROPIC_API_KEY"] = "your_api_key_here"  # Replace with your actual API key
   ```
   
   Alternatively, you can create a `.env` file and load it with the `python-dotenv` package (install it with `!pip install python-dotenv`):
   ```
   from dotenv import load_dotenv
   load_dotenv()
   ```

## Retrieving and Displaying Anthropic Models

The code below will:
1. Initialize the Anthropic client
2. Retrieve a list of available models (limited to 20)
3. Display each model's ID and display name in a formatted way

```python
# Import the Anthropic library
import anthropic

# Initialize the client
client = anthropic.Anthropic()

# Get the models
models = client.models.list(limit=20)

# Print model ID and display name
print("Available Anthropic Models:")
print("--------------------------")
for model in models.data:
    print(f"{model.id:<35} | {model.display_name}")
```

## How It Works

- The script uses the official Anthropic Python client to interact with their API
- We limit the results to 20 models, although Anthropic typically offers fewer than that
- For each model, we display:
  - The model ID (e.g., "claude-3-7-sonnet-20250219") - this is what you use in API calls
  - The human-readable display name (e.g., "Claude Sonnet 3.7")
- The `:<35` in the f-string is a formatting instruction that left-aligns the model ID text and pads it to 35 characters, creating a clean output
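To see the alignment in isolation, here is a small sketch that applies the same format spec to two hard-coded rows. The IDs and display names are sample values standing in for a live `client.models.list()` response.

```python
# Sample (model ID, display name) pairs; stand-ins for a live API response.
sample_models = [
    ("claude-3-haiku-20240307", "Claude Haiku 3"),
    ("claude-2.1", "Claude 2.1"),
]

# `<35` left-aligns the ID and pads it with spaces to a width of 35 characters.
rows = [f"{model_id:<35} | {name}" for model_id, name in sample_models]

for row in rows:
    print(row)
```

Because every ID is padded to the same width, the `|` separator lands in the same column on each row. An ID longer than 35 characters would not be truncated; it would simply push the separator to the right.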

## Expected Output

When run, this code should produce output similar to:

```
Available Anthropic Models:
--------------------------
claude-3-7-sonnet-20250219          | Claude Sonnet 3.7
claude-3-5-sonnet-20240620          | Claude Sonnet 3.5 (Old)
claude-3-opus-20240229              | Claude Opus 3
claude-3-sonnet-20240229            | Claude Sonnet 3
claude-2.1                          | Claude 2.1
```

## Enhanced Display (Optional)

If you want a more visually appealing display, you can use the `tabulate` library:

```python
import anthropic
from tabulate import tabulate

# Initialize the client
client = anthropic.Anthropic()

# Get the models
models = client.models.list(limit=20)

# Prepare data for tabulation
model_data = []
for model in models.data:
    # Format the date to be more readable
    created_date = model.created_at.strftime("%Y-%m-%d")
    model_data.append([model.id, model.display_name, created_date])

# Print as a table
print(tabulate(model_data, headers=["Model ID", "Display Name", "Created Date"], tablefmt="grid"))
```

Make sure to install tabulate first with `!pip install tabulate` if you choose this option.

In [4]:
import anthropic

# Initialize the client
client = anthropic.Anthropic()

# Get the models
models = client.models.list(limit=20)

# Print model ID and display name
print("Available Anthropic Models:")
print("--------------------------")
for model in models.data:
    print(f"{model.id:<35} | {model.display_name}")
    

Available Anthropic Models:
--------------------------
claude-opus-4-20250514              | Claude Opus 4
claude-sonnet-4-20250514            | Claude Sonnet 4
claude-3-7-sonnet-20250219          | Claude Sonnet 3.7
claude-3-5-sonnet-20241022          | Claude Sonnet 3.5 (New)
claude-3-5-haiku-20241022           | Claude Haiku 3.5
claude-3-5-sonnet-20240620          | Claude Sonnet 3.5 (Old)
claude-3-haiku-20240307             | Claude Haiku 3
claude-3-opus-20240229              | Claude Opus 3
claude-3-sonnet-20240229            | Claude Sonnet 3
claude-2.1                          | Claude 2.1
claude-2.0                          | Claude 2.0


In [5]:
import os
from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv()
anthropic_api_key = os.getenv("ANTHROPIC_API_KEY")

# Check if API keys are loaded
if anthropic_api_key:
    print("✅ API keys are successfully loaded.")
else:
    print("⚠️ Warning: One or more API keys are missing.")

# Optionally, display API keys (for debugging purposes only)
display_keys = False  # Change to True if you want to see the keys

if display_keys:
    print(f"Anthropic API Key: {anthropic_api_key}")
else:
    print("🔒 API keys are loaded but hidden for security.")


✅ API keys are successfully loaded.
🔒 API keys are loaded but hidden for security.


In [6]:
MODEL_NAME = "claude-3-5-haiku-20241022"
client = anthropic.Anthropic(api_key=anthropic_api_key)

def get_completion(prompt: str):
    message = client.messages.create(
        model=MODEL_NAME,
        max_tokens=2000,
        temperature=0.0,
        messages=[
          {"role": "user", "content": prompt}
        ]
    )
    return message.content[0].text

# Prompt
prompt = "Hello, Claude!"

# Get Claude's response
print(get_completion(prompt))

Hello! How are you doing today?


# Claude Model Comparison Guide

## Model Overview

Anthropic offers several Claude models, each optimized for different use cases. The tables below compare them to help you choose the right model for your needs. Treat the pricing figures as a rough guide only.

### Claude 4 Series (Latest)

| Model | Context Window | Strengths | Best For | Pricing (per M tokens) |
|-------|----------------|-----------|----------|------------------------|
| **Claude Opus 4** | 200K tokens | Most capable, advanced reasoning, complex tasks | Research, analysis, coding, creative writing | Input: $15, Output: $75 |
| **Claude Sonnet 4** | 200K tokens | Balanced performance and speed | General purpose, business applications | Input: $3, Output: $15 |

### Claude 3 Series

| Model | Context Window | Strengths | Best For | Pricing (per M tokens) |
|-------|----------------|-----------|----------|------------------------|
| **Claude 3.7 Sonnet** | 200K tokens | Enhanced reasoning, improved coding | Development, technical writing | Input: $3, Output: $15 |
| **Claude 3.5 Sonnet (New)** | 200K tokens | Fast, versatile, good at reasoning | Most general use cases | Input: $3, Output: $15 |
| **Claude 3.5 Haiku** | 200K tokens | Fastest, cost-effective | Simple tasks, high-volume processing | Input: $0.80, Output: $4 |
| **Claude 3.5 Sonnet (Old)** | 200K tokens | Previous generation Sonnet | Legacy applications | Input: $3, Output: $15 |
| **Claude 3 Opus** | 200K tokens | Most capable 3.x model | Complex reasoning, research | Input: $15, Output: $75 |
| **Claude 3 Sonnet** | 200K tokens | Balanced 3.x model | General purpose | Input: $3, Output: $15 |
| **Claude 3 Haiku** | 200K tokens | Fastest 3.x model | Simple, quick tasks | Input: $0.25, Output: $1.25 |

## Model Selection Guide

### 🎯 **For Production Applications**
- **Claude Sonnet 4**: Best balance of capability and cost
- **Claude 3.5 Sonnet (New)**: Proven performance, widely adopted

### ⚡ **For High-Volume/Speed-Critical Tasks**
- **Claude 3.5 Haiku**: Fastest response times, most cost-effective
- **Claude 3 Haiku**: Alternative for legacy systems

### 🧠 **For Complex Reasoning & Analysis**
- **Claude Opus 4**: Cutting-edge capabilities
- **Claude 3 Opus**: Proven complex reasoning abilities

### 💻 **For Coding & Technical Tasks**
- **Claude 3.7 Sonnet**: Enhanced for development workflows
- **Claude Sonnet 4**: Advanced coding capabilities

### 📝 **For Creative Writing & Content**
- **Claude Opus 4**: Most creative and nuanced
- **Claude 3.5 Sonnet**: Good creative balance

## Key Considerations

### **Context Window**
- All current models support 200K tokens (~150K words)
- Ideal for processing long documents, codebases, or conversations

### **Response Quality vs Speed**
- **Opus models**: Highest quality, slower responses
- **Sonnet models**: Balanced quality and speed
- **Haiku models**: Fastest responses, good quality for simpler tasks

### **Cost Optimization**
- Start with **Claude 3.5 Haiku** for prototyping
- Upgrade to **Sonnet** models for production
- Use **Opus** models only when maximum capability is required

### **Model Updates**
- Claude 4 series represents the latest generation
- Model IDs with a date suffix (e.g., `claude-3-5-haiku-20241022`) identify a specific snapshot; newer versions are released under new IDs
- Always test new models before switching production systems

---

## Messages format

As we saw in the previous lesson, we can use `client.messages.create()` to send a message to Claude and get a response:

In [13]:
response = client.messages.create(
    model="claude-3-haiku-20240307",
    max_tokens=1000,
    messages=[
        {"role": "user", "content": "What flavors are used in Coca Cola?"}
    ]
)

print(response)

Message(id='msg_014WE4hqyZBF7Ynw3eTG39SF', content=[TextBlock(text='The exact recipe for Coca-Cola is a closely guarded trade secret, but the general flavors and ingredients are known:\n\n- Carbonated water - This provides the bubbly carbonation.\n\n- Caffeine - Coca-Cola contains around 34mg of caffeine per 12oz serving.\n\n- Sugar (or high fructose corn syrup) - This provides the sweetness.\n\n- Caramel coloring - This gives Coca-Cola its distinctive brown color.\n\n- Phosphoric acid - This adds tartness and a slightly acidic flavor.\n\nThe specific flavors used are a blend of citrus flavors (such as lemon, lime, and orange), spices (such as cinnamon and nutmeg), and other flavorings. The original Coca-Cola formula also contained coca leaf extract, which provided a small amount of cocaine, but this has been removed since the early 1900s.\n\nThe exact blend of these flavors is what gives Coca-Cola its unique taste, and the full recipe has been kept secret by the Coca-Cola company sinc

Let's take a closer look at this bit: 
```py
messages=[
        {"role": "user", "content": "What flavors are used in Coca Cola?"}
    ]
```

The messages parameter is a crucial part of interacting with the Claude API. It allows you to provide the conversation history and context for Claude to generate a relevant response. 

The messages parameter expects a list of message dictionaries, where each dictionary represents a single message in the conversation.
Each message dictionary should have the following keys:

* `role`: A string indicating the role of the message sender. It can be either "user" (for messages sent by the user) or "assistant" (for messages sent by Claude).
* `content`: A string or list of content dictionaries representing the actual content of the message. If a string is provided, it will be treated as a single text content block. If a list of content dictionaries is provided, each dictionary should have a "type" (e.g., "text" or "image") and the corresponding content.  For now, we'll leave `content` as a single string.

Here's an example of a messages list with a single user message:

```py
messages = [
    {"role": "user", "content": "Hello Claude! How are you today?"}
]
```

And here's an example with multiple messages representing a conversation:

```py
messages = [
    {"role": "user", "content": "Hello Claude! How are you today?"},
    {"role": "assistant", "content": "Hello! I'm doing well, thank you. How can I assist you today?"},
    {"role": "user", "content": "Can you tell me a fun fact about ferrets?"},
    {"role": "assistant", "content": "Sure! Did you know that excited ferrets make a clucking vocalization known as 'dooking'?"},
]
```

Remember that messages always alternate between user and assistant messages (Source of Image: Anthropic Courses).

![Alternating Messages](images/alternating_messages.png)

The messages format lets us structure our API calls to Claude as a conversation, which enables **context preservation**: the list can carry the entire conversation history, including both user and assistant messages. Claude then has the full context available when generating a response, leading to more coherent and relevant outputs.

**Note: many use-cases don't require a conversation history, and there's nothing wrong with providing a list of messages that only contains a single message!** 
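The conventions described above (allowed roles, string or list content, a leading `user` message, alternating turns) can be checked locally before a request is sent. The sketch below is only a pre-check of this notebook's conventions; `validate_messages` is a hypothetical helper, not part of the Anthropic SDK, and the API performs its own validation.

```python
def validate_messages(messages):
    """Return a list of problems found in a messages list (empty if none)."""
    if not messages:
        return ["messages list is empty"]
    problems = []
    for i, msg in enumerate(messages):
        role = msg.get("role")
        if role not in ("user", "assistant"):
            problems.append(f"message {i}: unexpected role {role!r}")
        if not isinstance(msg.get("content"), (str, list)):
            problems.append(f"message {i}: content must be a string or a list")
        # Flag two consecutive messages from the same sender.
        if i > 0 and role == messages[i - 1].get("role"):
            problems.append(f"message {i}: same role as previous message")
    if messages[0].get("role") != "user":
        problems.append("first message must have role 'user'")
    return problems


conversation = [
    {"role": "user", "content": "Hello Claude! How are you today?"},
    {"role": "assistant", "content": "Hello! I'm doing well, thank you."},
    {"role": "user", "content": "Can you tell me a fun fact about ferrets?"},
]
print(validate_messages(conversation))
```

Returning a list of problems instead of raising on the first one makes it easier to log everything that is wrong with a malformed history in a single pass.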

In [None]:
# Example: Customer service bot with specific guidelines
system_prompt = """You are a friendly customer service representative for TechGadgets Inc.

Guidelines:
- Always be polite and professional
- Address customers by name when provided
- If you don't know something, offer to connect them with a specialist
- Keep responses concise but helpful
- Use emojis sparingly and professionally"""

response = client.messages.create(
    model="claude-3-5-haiku-20241022",
    max_tokens=300,
    system=system_prompt,
    messages=[
        {"role": "user", "content": "Hi, I'm Sarah. My laptop charger stopped working yesterday. What should I do?"}
    ]
)

print("🤖 Customer Service Response:")
print(response.content[0].text)

## System Prompt Best Practices

### ✅ Good System Prompts:
- Clear and specific about role/behavior
- Concise but complete
- Focus on high-level guidelines
- Set expectations for tone and format

### ❌ Avoid:
- Putting detailed instructions in system prompts (use user messages instead)
- Making them too long (they count toward token limits)
- Including task-specific details (those belong in user messages)

## Advanced System Prompt Example

In [None]:
# Without system prompt
response_no_system = client.messages.create(
    model="claude-3-5-haiku-20241022",
    max_tokens=200,
    messages=[
        {"role": "user", "content": "What are list comprehensions in Python?"}
    ]
)

print("📝 WITHOUT System Prompt:")
print(response_no_system.content[0].text)
print("\n" + "="*80 + "\n")

# With system prompt
response_with_system = client.messages.create(
    model="claude-3-5-haiku-20241022",
    max_tokens=200,
    system="You are a helpful Python tutor who explains concepts in simple terms with practical examples. Always include a code example.",
    messages=[
        {"role": "user", "content": "What are list comprehensions in Python?"}
    ]
)

print("🎯 WITH System Prompt:")
print(response_with_system.content[0].text)

# System Prompts

System prompts are a powerful way to set the context, role, and behavior for Claude before the conversation begins. They're perfect for:
- Defining Claude's role or persona
- Setting guidelines and constraints
- Providing background information
- Establishing output format preferences

## Why Use System Prompts?

System prompts are processed differently than user messages:
- They set the overall context for the entire conversation
- They're more stable and consistent across responses
- They help maintain character/role throughout long conversations
- They're separate from the conversation history
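The last point is visible in the request itself: the system prompt travels as a top-level `system` argument, not as an entry in `messages`. The sketch below assembles the keyword arguments without calling the API; `build_request` is a hypothetical helper introduced only for illustration.

```python
def build_request(prompt, model, max_tokens=300, system=None):
    """Assemble keyword arguments for client.messages.create()."""
    request = {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }
    # The system prompt is a separate top-level field, added only when provided.
    if system is not None:
        request["system"] = system
    return request


request = build_request(
    "What are list comprehensions in Python?",
    model="claude-3-5-haiku-20241022",
    system="You are a helpful Python tutor.",
)
# With a configured client: response = client.messages.create(**request)
print(sorted(request))
```

Keeping the system prompt out of `messages` means the conversation history can grow or be trimmed without touching the role definition.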

## Basic System Prompt Example

Let's see how system prompts affect Claude's behavior:

***

## Inspecting the message response
Next, let's take a look at the shape of the response we get back from Claude. 

Let's ask Claude to do something simple and then inspect the contents of the `response` we get back:

In [15]:
response = client.messages.create(
    model="claude-3-haiku-20240307",
    max_tokens=1000,
    messages=[
        {"role": "user", "content": "Translate Thank You to French. Respond with a single word"}
    ]
)

response

Message(id='msg_01WapibPPHkyoxMZYvJAEwQ8', content=[TextBlock(text='Merci.', type='text')], model='claude-3-haiku-20240307', role='assistant', stop_reason='end_turn', stop_sequence=None, type='message', usage=Usage(cache_creation_input_tokens=0, cache_read_input_tokens=0, input_tokens=20, output_tokens=7, service_tier='standard'))

We get back a `Message` object that contains a handful of properties.  Here's an example:

```
Message(id='msg_01Mq5gDnUmDESukTgwPV8xtG', content=[TextBlock(text='Bonjour.', type='text')], model='claude-3-haiku-20240307', role='assistant', stop_reason='end_turn', stop_sequence=None, type='message', usage=Usage(input_tokens=19, output_tokens=8))
```

 The most important piece of information is the `content` property: this contains the actual content the model generated for us.   This is a **list** of content blocks, each of which has a type that determines its shape.

 In order to access the actual text content of the model's response, we need to do the following:



In [16]:
print(response.content[0].text)

Merci.


In addition to `content`, the `Message` object contains some other pieces of information:

* `id` - a unique object identifier
* `type` - The object type, which will always be "message"
* `role` - The conversational role of the generated message. This will always be "assistant".
* `model` - The model that handled the request and generated the response
* `stop_reason` - The reason the model stopped generating.  We'll learn more about this later.
* `stop_sequence` - The custom stop sequence that caused the model to stop, if any (`None` in the example above). We'll learn more about this shortly.
* `usage` - information on billing and rate-limit usage. Contains information on:
    * `input_tokens` - The number of input tokens that were used.
    * `output_tokens` - The number of output tokens that were used.

It's important to know that we have access to these pieces of information, but if you only remember one thing, make it this: `content` contains the actual model-generated content.
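Since `content` is a list of typed blocks, a slightly more defensive way to read it is to join every text block rather than assume a single one at index 0. The sketch below uses `SimpleNamespace` objects as stand-ins for the SDK's `Message` and `TextBlock`, so it runs without an API call; `extract_text` is a hypothetical helper.

```python
from types import SimpleNamespace


def extract_text(message):
    """Concatenate the text of every text-type block in message.content."""
    return "".join(block.text for block in message.content if block.type == "text")


# Stand-in that mirrors the shape of the response shown above.
fake_message = SimpleNamespace(
    content=[SimpleNamespace(type="text", text="Merci.")],
    stop_reason="end_turn",
    usage=SimpleNamespace(input_tokens=20, output_tokens=7),
)

print(extract_text(fake_message))
total_tokens = fake_message.usage.input_tokens + fake_message.usage.output_tokens
print(total_tokens)
```

With a real response the same function can be called as `extract_text(response)`.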

## Messages list use cases

The messages list is a powerful feature that allows you to build complex interactions with Claude. Here are some common use cases:

### Putting words in Claude's mouth

Another common strategy for getting very specific outputs is to "put words in Claude's mouth".  Instead of only providing `user` messages to Claude, we can also supply an `assistant` message that Claude will use when generating output.  

When using Anthropic's API, you are not limited to just the `user` message. If you supply an `assistant` message, Claude will continue the conversation from the last `assistant` token.  Just remember that we must start with a `user` message.

Suppose I want Claude to write me a haiku that starts with the first line, "calming mountain air".  I can provide the following conversation history: 

```py
messages=[
        {"role": "user", "content": "Generate a beautiful haiku"},
        {"role": "assistant", "content": "calming mountain air"}
    ]
```
We tell Claude that we want it to generate a haiku, and we put the first line of the haiku in Claude's mouth.

In [17]:
response = client.messages.create(
    model="claude-3-haiku-20240307",
    max_tokens=500,
    messages=[
        {"role": "user", "content": "Generate a beautiful haiku"},
        {"role": "assistant", "content": "calming mountain air"}
    ]
)
print(response.content[0].text)


petals fall like gentle tears
nature's soothing dance


To get the entire haiku, starting with the line we provided:

In [18]:
print("calming mountain air" + response.content[0].text)

calming mountain air
petals fall like gentle tears
nature's soothing dance
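The prefill pattern can be wrapped in two small helpers, sketched below. Both function names are hypothetical, and the completion string is copied from the run above instead of coming from a live call. Stripping trailing whitespace from the prefill is a precaution, on the assumption that the API may reject an assistant turn that ends in whitespace.

```python
def build_prefilled_messages(prompt, prefill):
    """Return a messages list whose final assistant turn is the prefill."""
    return [
        {"role": "user", "content": prompt},
        {"role": "assistant", "content": prefill.rstrip()},
    ]


def join_prefill(prefill, completion):
    """Stitch the prefill and the model's continuation back together."""
    return prefill.rstrip() + completion


prefill = "calming mountain air"
messages = build_prefilled_messages("Generate a beautiful haiku", prefill)

# Stand-in for response.content[0].text from the run above.
completion = "\npetals fall like gentle tears\nnature's soothing dance"
print(join_prefill(prefill, completion))
```

The response only contains the continuation, so the prefill has to be added back whenever the full text is needed.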


In [None]:
import time

models_to_compare = [
    "claude-3-5-haiku-20241022",
    "claude-3-5-sonnet-20241022",
    "claude-3-opus-20240229"
]

prompt = "Write a professional email thanking a client for their business."

print("💵 Cost Comparison Across Models")
print("=" * 80 + "\n")

for model in models_to_compare:
    try:
        response = client.messages.create(
            model=model,
            max_tokens=300,
            temperature=0.7,
            messages=[{"role": "user", "content": prompt}]
        )
        
        cost, details = calculate_cost(
            response.usage.input_tokens,
            response.usage.output_tokens,
            model
        )
        
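        # Build a readable name from the ID, e.g. "claude-3-5-haiku-20241022" -> "Claude 3.5 Haiku"
        parts = model.split("-")[1:-1]
        version = ".".join(p for p in parts if p.isdigit())
        family = " ".join(p.capitalize() for p in parts if not p.isdigit())
        model_name = f"Claude {version} {family}"
        print(f"📊 {model_name}:")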
        print(f"   Tokens: {details['total_tokens']:,} | Cost: {details['total_cost']}")
        print()
        
        time.sleep(1)  # Rate limit protection
        
    except Exception as e:
        print(f"❌ {model}: {str(e)}\n")

print("=" * 80)
print("💡 Tip: Use Haiku for simple tasks, Sonnet for balanced needs, Opus for complex reasoning")

## Cost Comparison Across Models

Let's compare the cost of the same task across different Claude models:

In [None]:
# Make an API call
model_to_use = "claude-3-5-haiku-20241022"
response = client.messages.create(
    model=model_to_use,
    max_tokens=500,
    messages=[{
        "role": "user",
        "content": "Explain the concept of machine learning in 3 sentences."
    }]
)

# Extract token usage from response
input_tokens = response.usage.input_tokens
output_tokens = response.usage.output_tokens

# Calculate cost
cost, details = calculate_cost(input_tokens, output_tokens, model_to_use)

print("📞 API Call Results:")
print("=" * 80)
print("\n🤖 Claude's Response:")
print(response.content[0].text)
print("\n" + "=" * 80)
print("\n💰 Cost Analysis:")
print(f"  Model: {model_to_use}")
print(f"  Input tokens:  {details['input_tokens']:,}")
print(f"  Output tokens: {details['output_tokens']:,}")
print(f"  Total tokens:  {details['total_tokens']:,}")
print(f"  TOTAL COST:    {details['total_cost']}")
print("\n" + "=" * 80)

## Real API Call with Cost Tracking

Let's make an actual API call and track its cost:

In [None]:
def calculate_cost(input_tokens, output_tokens, model_id):
    """
    Calculate the cost of an API call based on token usage.
    
    Prices are in US dollars per million tokens, as of 2025.
    Returns (total_cost, details); for an unknown model, returns (None, error_message).
    """
    # Pricing table (dollars per 1M tokens)
    pricing = {
        "claude-opus-4-20250514": {"input": 15.00, "output": 75.00},
        "claude-sonnet-4-20250514": {"input": 3.00, "output": 15.00},
        "claude-3-7-sonnet-20250219": {"input": 3.00, "output": 15.00},
        "claude-3-5-sonnet-20241022": {"input": 3.00, "output": 15.00},
        "claude-3-5-haiku-20241022": {"input": 0.80, "output": 4.00},
        "claude-3-opus-20240229": {"input": 15.00, "output": 75.00},
        "claude-3-sonnet-20240229": {"input": 3.00, "output": 15.00},
        "claude-3-haiku-20240307": {"input": 0.25, "output": 1.25},
    }
    
    if model_id not in pricing:
        return None, "Model pricing not found"
    
    # Calculate costs
    input_cost = (input_tokens / 1_000_000) * pricing[model_id]["input"]
    output_cost = (output_tokens / 1_000_000) * pricing[model_id]["output"]
    total_cost = input_cost + output_cost
    
    return total_cost, {
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "total_tokens": input_tokens + output_tokens,
        "input_cost": f"${input_cost:.6f}",
        "output_cost": f"${output_cost:.6f}",
        "total_cost": f"${total_cost:.6f}"
    }

# Test the calculator
cost, details = calculate_cost(1000, 500, "claude-3-5-haiku-20241022")
print("💰 Cost Calculator Example:")
print(f"  Input tokens:  {details['input_tokens']:,}")
print(f"  Output tokens: {details['output_tokens']:,}")
print(f"  Total tokens:  {details['total_tokens']:,}")
print(f"  Input cost:    {details['input_cost']}")
print(f"  Output cost:   {details['output_cost']}")
print(f"  TOTAL COST:    {details['total_cost']}")

print("\n✅ Cost calculator function ready!")

# Token Counting & Cost Management

Understanding tokens and costs is essential for building production applications with Claude. Every API call costs money based on the number of tokens processed.
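As a worked example of that arithmetic, the sketch below prices a single call of 1,000 input tokens and 500 output tokens. The per-million prices are the Claude Opus 4 figures from the comparison table in this notebook, and the token counts are made-up sample values.

```python
# Prices in US dollars per million tokens (Claude Opus 4 row of the comparison table).
INPUT_PRICE_PER_M = 15.00
OUTPUT_PRICE_PER_M = 75.00

# Sample token counts; in practice these come from response.usage.
input_tokens = 1_000
output_tokens = 500

# cost = tokens / 1,000,000 * price per million tokens
input_cost = input_tokens / 1_000_000 * INPUT_PRICE_PER_M
output_cost = output_tokens / 1_000_000 * OUTPUT_PRICE_PER_M
total_cost = input_cost + output_cost

print(f"Input: ${input_cost:.4f}  Output: ${output_cost:.4f}  Total: ${total_cost:.4f}")
```

Output tokens dominate here even though there are half as many of them, because the output price is five times the input price.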

## What are Tokens?

- **Tokens** are the basic units of text that Claude processes
- Roughly: 1 token ≈ 4 characters or ≈ 0.75 words in English
- Both input (prompt) and output (response) consume tokens
- Different models have different pricing per token
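The 4-characters-per-token rule of thumb can be turned into a quick estimate for budgeting before a call is made. This is a rough heuristic sketch, not a tokenizer: `estimate_tokens` is a hypothetical helper, and the authoritative count is the `usage` field on the response.

```python
import math


def estimate_tokens(text, chars_per_token=4):
    """Rough token estimate based on character count (English text)."""
    return math.ceil(len(text) / chars_per_token)


prompt = "Explain the concept of machine learning in 3 sentences."
print(f"{len(prompt)} characters -> about {estimate_tokens(prompt)} tokens")
```

Rounding up keeps the estimate on the conservative side, which is usually what you want when checking a prompt against a budget.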

## Cost Calculator Function

Let's build a useful cost calculator:

In [None]:
# Example 1: Data extraction (use low temperature)
print("📊 Data Extraction (Temperature = 0.0):")
response = client.messages.create(
    model="claude-3-5-haiku-20241022",
    max_tokens=100,
    temperature=0.0,
    messages=[{"role": "user", "content": "Extract the email from: Contact us at support@example.com for help"}]
)
print(response.content[0].text)

print("\n" + "="*80 + "\n")

# Example 2: Creative writing (use high temperature)
print("✨ Creative Writing (Temperature = 1.0):")
response = client.messages.create(
    model="claude-3-5-haiku-20241022",
    max_tokens=100,
    temperature=1.0,
    messages=[{"role": "user", "content": "Write an opening line for a sci-fi story"}]
)
print(response.content[0].text)

## Other Important Parameters

### max_tokens
Controls the maximum length of the response.

```python
# Short response
response = client.messages.create(
    model="claude-3-5-haiku-20241022",
    max_tokens=50,  # Very short
    messages=[{"role": "user", "content": "Explain AI"}]
)
```

### top_p (Nucleus Sampling)
Alternative to temperature for controlling randomness.
- Range: 0.0 to 1.0
- Samples only from the smallest set of most-probable tokens whose cumulative probability reaches p

### top_k
Limits the number of tokens to consider at each step.
- Cuts off the long tail of low-probability tokens
- An advanced setting; adjusting `temperature` alone is usually enough
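To make the two filters concrete, here is a toy sketch of top-k and top-p (nucleus) filtering over a hand-written probability table. It illustrates the general sampling technique only; the token probabilities are invented, and nothing here reflects how Anthropic implements these parameters internally.

```python
def top_k_filter(probs, k):
    """Keep the k most probable tokens and renormalize."""
    kept = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(p for _, p in kept)
    return {tok: p / total for tok, p in kept}


def top_p_filter(probs, p):
    """Keep the smallest set of top tokens whose cumulative probability reaches p."""
    kept, cumulative = [], 0.0
    for tok, prob in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept.append((tok, prob))
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(prob for _, prob in kept)
    return {tok: prob / total for tok, prob in kept}


# Invented next-token distribution for illustration.
toy_probs = {"the": 0.5, "a": 0.3, "an": 0.15, "zebra": 0.05}

print(sorted(top_k_filter(toy_probs, 2)))
print(sorted(top_p_filter(toy_probs, 0.9)))
```

Top-k always keeps a fixed number of candidates, while top-p keeps more candidates when the distribution is flat and fewer when one token dominates.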

## Practical Temperature Examples

In [None]:
import time

prompt = "Write a creative tagline for an eco-friendly coffee shop"
temperatures = [0.0, 0.5, 1.0]

print("🌡️ Temperature Comparison\n" + "="*80)

for temp in temperatures:
    print(f"\n🔹 Temperature: {temp}")
    print("-" * 80)
    
    # Generate 3 responses to show variety
    for i in range(3):
        response = client.messages.create(
            model="claude-3-5-haiku-20241022",
            max_tokens=50,
            temperature=temp,
            messages=[{"role": "user", "content": prompt}]
        )
        tagline = response.content[0].text.strip()
        print(f"  {i+1}. {tagline}")
        time.sleep(0.5)  # Brief pause to avoid rate limits
    
print("\n" + "="*80)
print("📊 Notice: Lower temps = more similar outputs, Higher temps = more variety")

# Temperature & Model Parameters

Temperature controls the randomness and creativity of Claude's responses. Understanding how to use it effectively is crucial for getting the right outputs.

## What is Temperature?

- **Range**: 0.0 to 1.0
- **Low temperature (0.0-0.3)**: More focused and consistent (close to deterministic, though identical outputs are not guaranteed even at 0.0)
- **Medium temperature (0.4-0.7)**: Balanced creativity and consistency  
- **High temperature (0.8-1.0)**: More creative, diverse, unpredictable

## When to Use Different Temperatures

| Temperature | Best For | Examples |
|-------------|----------|----------|
| **0.0-0.3** | Factual tasks, code generation, data extraction | API responses, translations, summaries |
| **0.4-0.7** | Balanced tasks, Q&A, explanations | Customer support, tutorials, analysis |
| **0.8-1.0** | Creative tasks, brainstorming, variety | Marketing copy, stories, ideation |
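The effect in the table can be illustrated with the textbook temperature-scaled softmax: dividing the scores by a smaller temperature concentrates probability on the top token. This is a conceptual sketch with invented scores, not a description of Claude's internals; treating temperature 0 as a pure argmax is an assumption made here to avoid dividing by zero.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities, sharpened or flattened by temperature."""
    if temperature == 0:
        # Assumed convention: temperature 0 means always pick the top score.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Invented scores for three candidate tokens.
logits = [2.0, 1.0, 0.0]

for temperature in (0.2, 0.5, 1.0):
    probs = softmax_with_temperature(logits, temperature)
    print(temperature, [round(p, 3) for p in probs])
```

As the temperature rises, the top token's share shrinks and the alternatives become more likely, which is why repeated calls vary more.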

## Temperature Comparison Example

Let's see how temperature affects output for the same prompt:

### Few-shot prompting

One of the most useful prompting strategies is called "few-shot prompting" which involves providing a model with a small number of **examples**.  These examples help guide Claude's generated output.  The messages conversation history is an easy way to provide examples to Claude.

For example, suppose we want to use Claude to analyze the sentiment in tweets.  We could start by simply asking Claude to "please analyze the sentiment in this tweet: " and see what sort of output we get:

In [19]:
response = client.messages.create(
    model="claude-3-haiku-20240307",
    max_tokens=500,
    messages=[
        {"role": "user", "content": "Analyze the sentiment in this tweet: Just tried the new spicy pickles from @PickleCo, and my taste buds are doing a happy dance! 🌶️🥒 #pickleslove #spicyfood"},
    ]
)
print(response.content[0].text)

The sentiment in this tweet is positive. Here's a breakdown of the analysis:

1. Positive language: The tweet uses phrases like "doing a happy dance" and "pickleslove" which convey a sense of excitement and enjoyment.

2. Emojis: The use of the pepper and pickle emojis add a playful and enthusiastic tone to the tweet.

3. Endorsement: The tweeter is positively endorsing the new spicy pickles from the company "@PickleCo", suggesting they enjoyed the product.

4. Hashtags: The hashtags "#pickleslove" and "#spicyfood" reinforce the positive sentiment around the pickles and the tweeter's enjoyment of spicy food.

Overall, the tweet expresses a very favorable sentiment towards the new spicy pickles from @PickleCo. The tweeter seems genuinely excited and pleased with the product, indicating a positive experience.


The first time I ran the above code, Claude generated this long response: 
```
The sentiment in this tweet is overwhelmingly positive. The user expresses their enjoyment of the new spicy pickles from @PickleCo, using enthusiastic language and emojis to convey their delight.

Positive indicators:
1. "My taste buds are doing a happy dance!" - This phrase indicates that the user is extremely pleased with the taste of the pickles, to the point of eliciting a joyful physical response.

2. Emojis - The use of the hot pepper 🌶️ and cucumber 🥒 emojis further emphasizes the user's excitement about the spicy pickles.

3. Hashtags - The inclusion of #pickleslove and #spicyfood hashtags suggests that the user has a strong affinity for pickles and spicy food, and the new product aligns perfectly with their preferences.

4. Exclamation mark - The exclamation mark at the end of the first sentence adds emphasis to the user's positive experience.

Overall, the tweet conveys a strong sense of satisfaction, excitement, and enjoyment related to trying the new spicy pickles from @PickleCo.
```

This is a great response, but it's probably way more information than we need from Claude, especially if we're trying to automate the sentiment analysis of a large number of tweets.  

We might prefer that Claude respond with a standardized output format like a single word (POSITIVE, NEUTRAL, NEGATIVE) or a numeric value (1, 0, -1).  For readability and simplicity, let's get Claude to respond with either "POSITIVE" or "NEGATIVE".  One way of doing this is through few-shot prompting.  We can provide Claude with a conversation history that shows exactly how we want it to respond: 

```py
messages=[
        {"role": "user", "content": "Unpopular opinion: Pickles are disgusting. Don't @ me"},
        {"role": "assistant", "content": "NEGATIVE"},
        {"role": "user", "content": "I think my love for pickles might be getting out of hand. I just bought a pickle-shaped pool float"},
        {"role": "assistant", "content": "POSITIVE"},
        {"role": "user", "content": "Seriously why would anyone ever eat a pickle?  Those things are nasty!"},
        {"role": "assistant", "content": "NEGATIVE"},
        {"role": "user", "content": "Just tried the new spicy pickles from @PickleCo, and my taste buds are doing a happy dance! 🌶️🥒 #pickleslove #spicyfood"},
    ]
```



In [20]:
response = client.messages.create(
    model="claude-3-haiku-20240307",
    max_tokens=500,
    messages=[
        {"role": "user", "content": "Unpopular opinion: Pickles are disgusting. Don't @ me"},
        {"role": "assistant", "content": "NEGATIVE"},
        {"role": "user", "content": "I think my love for pickles might be getting out of hand. I just bought a pickle-shaped pool float"},
        {"role": "assistant", "content": "POSITIVE"},
        {"role": "user", "content": "Seriously why would anyone ever eat a pickle?  Those things are nasty!"},
        {"role": "assistant", "content": "NEGATIVE"},
        {"role": "user", "content": "Just tried the new spicy pickles from @PickleCo, and my taste buds are doing a happy dance! 🌶️🥒 #pickleslove #spicyfood"},
    ]
)
print(response.content[0].text)

POSITIVE
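When the same examples are reused for many inputs, the few-shot history can be generated from a list of labeled pairs. The sketch below shows one way to do that; `build_few_shot_messages` is a hypothetical helper, and the examples are the ones used above.

```python
def build_few_shot_messages(examples, new_input):
    """Turn (text, label) pairs into alternating user/assistant turns, then add the new input."""
    messages = []
    for text, label in examples:
        messages.append({"role": "user", "content": text})
        messages.append({"role": "assistant", "content": label})
    messages.append({"role": "user", "content": new_input})
    return messages


examples = [
    ("Unpopular opinion: Pickles are disgusting. Don't @ me", "NEGATIVE"),
    ("I think my love for pickles might be getting out of hand. I just bought a pickle-shaped pool float", "POSITIVE"),
    ("Seriously why would anyone ever eat a pickle?  Those things are nasty!", "NEGATIVE"),
]

messages = build_few_shot_messages(examples, "These pickles changed my life!")
# With a configured client: client.messages.create(model=..., max_tokens=..., messages=messages)
print(len(messages), messages[-1]["role"])
```

Keeping the examples in one list makes it easy to add a NEUTRAL case later or to swap examples per task without rewriting the call.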


# Error Handling & Retry Logic

Production applications need robust error handling to deal with API failures, rate limits, and network issues. Let's learn how to build resilient applications with Claude.

## Common Error Types

The Anthropic API can return several types of errors:

1. **APIError**: Base class for the SDK's API errors; server-side failures (500+ status codes) raise its `InternalServerError` subclass
2. **RateLimitError**: Too many requests (429 status code)
3. **APIConnectionError**: Network connectivity issues
4. **AuthenticationError**: Invalid API key (401 status code)
5. **BadRequestError**: Invalid request parameters (400 status code)
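
Not every error in this list is worth retrying. As a rough sketch, the status codes can be sorted into "transient" and "permanent" groups. The function `is_retryable_status` is a hypothetical helper, and the grouping is an assumption based only on the codes listed above, not an official policy.

```python
from typing import Optional


def is_retryable_status(status_code: Optional[int]) -> bool:
    """Guess whether retrying could help, based on the HTTP status code.

    Assumption: None means no HTTP response was received at all
    (a connection problem), which is usually transient.
    """
    if status_code is None:
        return True
    if status_code == 429:  # rate limited: wait, then try again
        return True
    # 500+ are server-side failures; 400 and 401 will fail the same way again
    return status_code >= 500


for code in (400, 401, 429, 500, None):
    print(code, is_retryable_status(code))
```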

## Basic Error Handling

Let's start with basic try-except error handling:

In [None]:
from anthropic import APIError, RateLimitError, APIConnectionError, AuthenticationError, BadRequestError

def safe_api_call(prompt, model="claude-3-5-haiku-20241022", max_tokens=500):
    """
    Make an API call with basic error handling.
    """
    try:
        response = client.messages.create(
            model=model,
            max_tokens=max_tokens,
            messages=[{"role": "user", "content": prompt}]
        )
        return response.content[0].text, None
    
    except RateLimitError as e:
        error_msg = "⚠️ Rate limit exceeded. Please wait before making more requests."
        print(error_msg)
        return None, error_msg
    
    except AuthenticationError as e:
        error_msg = "🔒 Authentication failed. Check your API key."
        print(error_msg)
        return None, error_msg
    
    except BadRequestError as e:
        error_msg = f"❌ Bad request: {str(e)}"
        print(error_msg)
        return None, error_msg
    
    except APIConnectionError as e:
        error_msg = "🌐 Connection error. Check your internet connection."
        print(error_msg)
        return None, error_msg
    
    except APIError as e:
        error_msg = f"⚡ API error: {str(e)}"
        print(error_msg)
        return None, error_msg
    
    except Exception as e:
        error_msg = f"❗ Unexpected error: {str(e)}"
        print(error_msg)
        return None, error_msg

# Test the function
result, error = safe_api_call("What is the capital of France?")
if result:
    print(f"✅ Success: {result}")
else:
    print(f"Failed with error: {error}")

## Retry Logic with Exponential Backoff

When rate limits or temporary errors occur, it's best to retry with exponential backoff. This means waiting progressively longer between retries.

### Why Exponential Backoff?
- **Prevents overwhelming the API** during high-traffic periods
- **Gives time for rate limits to reset**
- **Increases success rate** for transient errors
- **Standard practice** recommended by Anthropic
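
To make "waiting progressively longer" concrete, here is a minimal sketch that only computes the wait times, without calling the API. The function name `backoff_delays`, the cap `max_delay`, and the optional `jitter` parameter are assumptions added for illustration.

```python
import random
from typing import List, Optional


def backoff_delays(
    max_retries: int = 3,
    initial_delay: float = 1.0,
    factor: float = 2.0,
    max_delay: float = 30.0,
    jitter: float = 0.0,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Return the waits (in seconds) between attempts.

    There is one wait fewer than attempts, because nothing is
    awaited after the final attempt fails.
    """
    rng = rng or random.Random()
    delays = []
    delay = initial_delay
    for _ in range(max_retries - 1):
        capped = min(delay, max_delay)
        # jitter adds up to `jitter * capped` extra seconds at random
        delays.append(capped + rng.uniform(0, jitter * capped))
        delay *= factor
    return delays


print(backoff_delays(max_retries=4))  # [1.0, 2.0, 4.0]
```

A small amount of jitter (for example `jitter=0.5`) spreads out retries from many clients so they do not all hit the API at the same moment.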

Let's implement a robust retry function:

In [None]:
import time
from typing import Optional, Tuple

def call_claude_with_retry(
    prompt: str,
    model: str = "claude-3-5-haiku-20241022",
    max_tokens: int = 500,
    max_retries: int = 3,
    initial_delay: float = 1.0
) -> Tuple[Optional[str], Optional[str]]:
    """
    Call Claude API with exponential backoff retry logic.
    
    Args:
        prompt: The user prompt
        model: Model to use
        max_tokens: Maximum tokens in response
        max_retries: Maximum number of retry attempts
        initial_delay: Initial delay in seconds (doubles with each retry)
    
    Returns:
        Tuple of (response_text, error_message)
    """
    delay = initial_delay
    
    for attempt in range(max_retries):
        try:
            response = client.messages.create(
                model=model,
                max_tokens=max_tokens,
                messages=[{"role": "user", "content": prompt}]
            )
            
            # Success!
            if attempt > 0:
                print(f"✅ Success after {attempt + 1} attempts")
            
            return response.content[0].text, None
        
        except RateLimitError as e:
            if attempt < max_retries - 1:
                print(f"⚠️ Rate limit hit. Waiting {delay:.1f}s before retry {attempt + 2}/{max_retries}...")
                time.sleep(delay)
                delay *= 2  # Exponential backoff
            else:
                error_msg = f"❌ Rate limit exceeded after {max_retries} attempts"
                print(error_msg)
                return None, error_msg
        
        except APIConnectionError as e:
            if attempt < max_retries - 1:
                print(f"🌐 Connection error. Retrying in {delay:.1f}s... ({attempt + 2}/{max_retries})")
                time.sleep(delay)
                delay *= 2
            else:
                error_msg = f"❌ Connection failed after {max_retries} attempts"
                print(error_msg)
                return None, error_msg
        
        except (AuthenticationError, BadRequestError) as e:
            # Don't retry these - they won't succeed on retry
            error_msg = f"❌ Non-retryable error: {str(e)}"
            print(error_msg)
            return None, error_msg
        
        except APIError as e:
            if attempt < max_retries - 1:
                print(f"⚡ API error. Retrying in {delay:.1f}s... ({attempt + 2}/{max_retries})")
                time.sleep(delay)
                delay *= 2
            else:
                error_msg = f"❌ API error after {max_retries} attempts: {str(e)}"
                print(error_msg)
                return None, error_msg
    
    return None, "Max retries exceeded"

# Test the retry function
print("Testing retry logic with a normal request:\n")
result, error = call_claude_with_retry("Explain recursion in one sentence.")

if result:
    print(f"\n📝 Response: {result}")
else:
    print(f"\n❌ Failed: {error}")

## Practical Example: Batch Processing with Error Handling

Let's combine error handling and retry logic for a real-world use case - processing multiple prompts:

In [None]:
def batch_process_prompts(prompts: list, delay_between_requests: float = 0.5):
    """
    Process multiple prompts with error handling and rate limiting.
    
    Args:
        prompts: List of prompt strings
        delay_between_requests: Delay between successful requests (rate limiting)
    
    Returns:
        List of results (each is a dict with prompt, response, and error)
    """
    results = []
    
    print(f"📊 Processing {len(prompts)} prompts...\n")
    print("=" * 80)
    
    for i, prompt in enumerate(prompts, 1):
        print(f"\n[{i}/{len(prompts)}] Processing: '{prompt[:50]}...'")
        
        response, error = call_claude_with_retry(
            prompt=prompt,
            max_tokens=100,
            max_retries=3
        )
        
        results.append({
            "prompt": prompt,
            "response": response,
            "error": error,
            "success": response is not None
        })
        
        # Rate limiting: wait between successful requests
        if response and i < len(prompts):
            time.sleep(delay_between_requests)
    
    print("\n" + "=" * 80)
    print(f"\n✅ Completed: {sum(1 for r in results if r['success'])}/{len(prompts)} successful")
    
    return results

# Example batch processing
prompts = [
    "What is Python?",
    "What is JavaScript?",
    "What is Java?",
    "What is C++?"
]

results = batch_process_prompts(prompts, delay_between_requests=0.3)

# Display results
print("\n📋 Results Summary:")
print("=" * 80)
for i, result in enumerate(results, 1):
    if result['success']:
        print(f"\n{i}. ✅ {result['prompt']}")
        print(f"   Response: {result['response'][:100]}...")
    else:
        print(f"\n{i}. ❌ {result['prompt']}")
        print(f"   Error: {result['error']}")
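
After a batch run it is often useful to know which prompts failed so they can be resubmitted later. The sketch below assumes the result dictionaries have the same keys as those produced by `batch_process_prompts`; the helper `summarize_results` and the sample data are hypothetical.

```python
from typing import Any, Dict, List


def summarize_results(results: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Count successes and collect the prompts that failed."""
    failed = [r["prompt"] for r in results if not r["success"]]
    return {
        "total": len(results),
        "succeeded": len(results) - len(failed),
        "failed_prompts": failed,
    }


# Made-up results in the same shape as the batch output above
sample_results = [
    {"prompt": "What is Python?", "response": "A language.", "error": None, "success": True},
    {"prompt": "What is Java?", "response": None, "error": "Rate limit", "success": False},
]
summary = summarize_results(sample_results)
print(summary)
```

The `failed_prompts` list can be fed back into the batch function for a second pass.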

# Multi-turn Conversations

Building conversational applications requires managing conversation history effectively. Claude can maintain context across multiple turns by passing the full message history.

## Why Multi-turn Conversations?

- **Context preservation**: Claude remembers earlier parts of the conversation
- **Natural dialogue**: Enables back-and-forth interactions
- **Follow-up questions**: Users can ask clarifying questions
- **Personalization**: Build rapport and adapt to user needs

## Basic Multi-turn Example

In [None]:
# Start a conversation
conversation_history = []

def chat(user_message, system_prompt=None):
    """
    Send a message and maintain conversation history.
    """
    # Add user message to history
    conversation_history.append({
        "role": "user",
        "content": user_message
    })
    
    # Make API call with full history
    response = client.messages.create(
        model="claude-3-5-haiku-20241022",
        max_tokens=500,
        system=system_prompt if system_prompt else "You are a helpful assistant.",
        messages=conversation_history
    )
    
    # Add assistant response to history
    assistant_message = response.content[0].text
    conversation_history.append({
        "role": "assistant",
        "content": assistant_message
    })
    
    return assistant_message

# Example conversation
print("🗣️ Multi-turn Conversation Example")
print("=" * 80 + "\n")

# Turn 1
print("👤 User: What's the capital of France?")
response1 = chat("What's the capital of France?")
print(f"🤖 Claude: {response1}\n")

# Turn 2 - Claude remembers context
print("👤 User: What's the population?")
response2 = chat("What's the population?")
print(f"🤖 Claude: {response2}\n")

# Turn 3 - Still remembers
print("👤 User: What are some famous landmarks there?")
response3 = chat("What are some famous landmarks there?")
print(f"🤖 Claude: {response3}\n")

print("=" * 80)
print(f"📊 Conversation has {len(conversation_history)} messages")

## Conversation Management Class

For production applications, it's useful to create a conversation manager:

In [None]:
class ConversationManager:
    """
    Manage multi-turn conversations with Claude.
    """
    def __init__(self, system_prompt=None, model="claude-3-5-haiku-20241022"):
        self.messages = []
        self.system_prompt = system_prompt
        self.model = model
        self.total_input_tokens = 0
        self.total_output_tokens = 0
    
    def send_message(self, user_message: str) -> str:
        """Send a message and get response."""
        # Add user message
        self.messages.append({
            "role": "user",
            "content": user_message
        })
        
        # Get response
        response = client.messages.create(
            model=self.model,
            max_tokens=500,
            system=self.system_prompt if self.system_prompt else "You are a helpful assistant.",
            messages=self.messages
        )
        
        # Track tokens
        self.total_input_tokens += response.usage.input_tokens
        self.total_output_tokens += response.usage.output_tokens
        
        # Add assistant response
        assistant_message = response.content[0].text
        self.messages.append({
            "role": "assistant",
            "content": assistant_message
        })
        
        return assistant_message
    
    def get_history(self) -> list:
        """Get conversation history."""
        return self.messages
    
    def get_stats(self) -> dict:
        """Get conversation statistics."""
        return {
            "turns": len(self.messages) // 2,
            "total_messages": len(self.messages),
            "input_tokens": self.total_input_tokens,
            "output_tokens": self.total_output_tokens,
            "total_tokens": self.total_input_tokens + self.total_output_tokens
        }
    
    def reset(self):
        """Reset conversation."""
        self.messages = []
        self.total_input_tokens = 0
        self.total_output_tokens = 0

# Example: Create a technical support chatbot
support_bot = ConversationManager(
    system_prompt="""You are a friendly technical support agent for a software company.
    Be helpful, patient, and provide clear step-by-step instructions."""
)

print("🤖 Technical Support Chatbot")
print("=" * 80 + "\n")

# Simulate a support conversation
queries = [
    "My app keeps crashing when I try to export data",
    "I'm using version 2.1.5",
    "It happens right after I click the export button"
]

for query in queries:
    print(f"👤 User: {query}")
    response = support_bot.send_message(query)
    print(f"🤖 Support: {response}\n")

# Show statistics
stats = support_bot.get_stats()
print("=" * 80)
print("📊 Conversation Stats:")
print(f"   Turns: {stats['turns']}")
print(f"   Total messages: {stats['total_messages']}")
print(f"   Total tokens: {stats['total_tokens']:,}")
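
Because the full history is sent on every call, input tokens grow with each turn. One possible mitigation, sketched here under the assumption that older turns can simply be dropped, is to keep only the most recent exchanges. `trim_history` is a hypothetical helper, not a method of the class above.

```python
from typing import Dict, List


def trim_history(messages: List[Dict[str, str]], max_turns: int) -> List[Dict[str, str]]:
    """Keep roughly the last `max_turns` user/assistant exchanges."""
    if max_turns <= 0:
        return []
    trimmed = messages[-2 * max_turns:]
    # Drop leading assistant messages so the history still starts with a user turn
    while trimmed and trimmed[0]["role"] != "user":
        trimmed = trimmed[1:]
    return trimmed


history = [
    {"role": "user", "content": "My app keeps crashing"},
    {"role": "assistant", "content": "Which version are you using?"},
    {"role": "user", "content": "Version 2.1.5"},
    {"role": "assistant", "content": "When does it crash?"},
    {"role": "user", "content": "Right after I click export"},
    {"role": "assistant", "content": "Try clearing the cache."},
]
recent = trim_history(history, max_turns=2)
print(len(recent), recent[0]["content"])
```

Dropping old turns loses context, so summarizing them into a single message may work better for long support conversations.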

# Stop Sequences

Stop sequences tell Claude when to stop generating text. They're useful for:
- Controlling output format
- Implementing custom delimiters
- Creating structured outputs
- Preventing over-generation

## How Stop Sequences Work

When Claude encounters a stop sequence in its generation, it immediately stops and returns the text generated up to (but not including) that sequence.
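
The truncation behavior can be imitated locally. This sketch is only a simulation of the "up to but not including" rule on an already complete string; it does not call the API, and `apply_stop_sequences` is a hypothetical name.

```python
from typing import List, Optional, Tuple


def apply_stop_sequences(
    text: str, stop_sequences: List[str]
) -> Tuple[str, Optional[str]]:
    """Cut `text` at the earliest stop sequence and report which one matched."""
    earliest_index = len(text)
    matched = None
    for seq in stop_sequences:
        idx = text.find(seq)
        if idx != -1 and idx < earliest_index:
            earliest_index = idx
            matched = seq
    # The matched sequence itself is excluded from the returned text
    return text[:earliest_index], matched


output, hit = apply_stop_sequences("Python\nGo\nRust\nDONE and more", ["DONE", "END"])
print(repr(output), hit)
```

In a real response the two returned values correspond to the text content and the `stop_sequence` field shown in the examples that follow.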

## Basic Stop Sequence Example

In [None]:
# Example 1: Stop at a specific marker
print("🛑 Stop Sequence Example 1: Custom Delimiter")
print("=" * 80 + "\n")

response = client.messages.create(
    model="claude-3-5-haiku-20241022",
    max_tokens=500,
    stop_sequences=["###"],
    messages=[{
        "role": "user",
        "content": "Write a short story about a robot. End with ###"
    }]
)

print(f"Response: {response.content[0].text}")
print(f"\nStop reason: {response.stop_reason}")
print(f"Stop sequence: {response.stop_sequence}")

print("\n" + "=" * 80 + "\n")

# Example 2: Multiple stop sequences
print("🛑 Stop Sequence Example 2: Multiple Delimiters")
print("=" * 80 + "\n")

response = client.messages.create(
    model="claude-3-5-haiku-20241022",
    max_tokens=500,
    stop_sequences=["DONE", "END", "FINISHED"],
    messages=[{
        "role": "user",
        "content": "List 3 programming languages, one per line. After the list, write DONE."
    }]
)

print(f"Response: {response.content[0].text}")
print(f"\nStop reason: {response.stop_reason}")
print(f"Stop sequence triggered: {response.stop_sequence}")

## Practical Use Cases for Stop Sequences

### 1. Structured Data Extraction

In [None]:
# Extract structured data with stop sequences
print("📋 Use Case: Structured Data Extraction")
print("=" * 80 + "\n")

product_text = """
Extract product information from this text and format as:
Product: [name]
Price: [price]
Rating: [rating]
END_EXTRACTION

Text: "The UltraWidget 3000 is available for $299.99 and has a 4.5-star rating."
"""

response = client.messages.create(
    model="claude-3-5-haiku-20241022",
    max_tokens=200,
    stop_sequences=["END_EXTRACTION"],
    messages=[{"role": "user", "content": product_text}]
)

print(response.content[0].text)
print("\n✅ Stopped cleanly at delimiter")

### 2. Dialogue Systems

In [None]:
# Use stop sequences for dialogue formatting
print("💬 Use Case: Dialogue System")
print("=" * 80 + "\n")

response = client.messages.create(
    model="claude-3-5-haiku-20241022",
    max_tokens=300,
    stop_sequences=["\nUser:", "\nHuman:"],
    system="You are an AI assistant in a chat interface. Always end your responses naturally.",
    messages=[{
        "role": "user",
        "content": "Tell me about Python programming."
    }]
)

print(f"Assistant: {response.content[0].text}")
print("\n✅ Prevents generating fake user messages")

### 3. Code Generation with Boundaries

In [None]:
# Generate code with clear boundaries
print("💻 Use Case: Code Generation")
print("=" * 80 + "\n")

response = client.messages.create(
    model="claude-3-5-haiku-20241022",
    max_tokens=400,
    stop_sequences=["```\n\n", "# End of code"],
    messages=[{
        "role": "user",
        "content": """Write a Python function to calculate factorial. 
        Format it in a code block and add a comment '# End of code' after the function."""
    }]
)

print(response.content[0].text)
print("\n✅ Clean code extraction")

# Streaming Responses

Streaming allows you to receive Claude's response incrementally as it's generated, rather than waiting for the complete response. This dramatically improves perceived performance for users.

## Why Use Streaming?

- **Better UX**: Users see output immediately
- **Lower perceived latency**: Feels faster even if total time is the same
- **Progressive rendering**: Display content as it arrives
- **Early cancellation**: Users can stop generation if not needed

## Basic Streaming Example

In [None]:
import sys

# Simple streaming example
print("🌊 Streaming Response Example")
print("=" * 80 + "\n")

with client.messages.stream(
    model="claude-3-5-haiku-20241022",
    max_tokens=300,
    messages=[{
        "role": "user",
        "content": "Write a haiku about programming."
    }]
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)

print("\n\n" + "=" * 80)
print("✅ Response streamed in real-time!")

## Advanced Streaming with Event Handling

In [None]:
# Advanced streaming with full event access
print("🔄 Advanced Streaming with Events")
print("=" * 80 + "\n")

full_response = ""

with client.messages.stream(
    model="claude-3-5-haiku-20241022",
    max_tokens=500,
    messages=[{
        "role": "user",
        "content": "Explain the concept of streaming in 2 sentences."
    }]
) as stream:
    # Access different event types
    for event in stream:
        if event.type == "content_block_delta":
            # Text is being generated
            if hasattr(event.delta, 'text'):
                text_chunk = event.delta.text
                full_response += text_chunk
                print(text_chunk, end="", flush=True)
        
        elif event.type == "message_start":
            # Message generation started
            pass
        
        elif event.type == "message_stop":
            # Message generation completed
            print("\n\n✅ Stream completed")

print("\n" + "=" * 80)
print(f"📝 Full response captured: {len(full_response)} characters")

## Streaming with Token Tracking

In [None]:
# Stream and collect usage statistics
print("📊 Streaming with Token Tracking")
print("=" * 80 + "\n")

with client.messages.stream(
    model="claude-3-5-haiku-20241022",
    max_tokens=400,
    messages=[{
        "role": "user",
        "content": "List 5 benefits of using AI in software development."
    }]
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
    
    # Get the final message with usage statistics
    final_message = stream.get_final_message()

print("\n\n" + "=" * 80)
print("💰 Usage Statistics:")
print(f"   Input tokens:  {final_message.usage.input_tokens}")
print(f"   Output tokens: {final_message.usage.output_tokens}")
print(f"   Total tokens:  {final_message.usage.input_tokens + final_message.usage.output_tokens}")
print(f"   Stop reason:   {final_message.stop_reason}")

## Streaming Helper Function

In [None]:
def stream_response(prompt: str, model="claude-3-5-haiku-20241022", max_tokens=1000, 
                   show_stats=True):
    """
    Stream a response and optionally show statistics.
    
    Args:
        prompt: User prompt
        model: Model to use
        max_tokens: Maximum tokens
        show_stats: Whether to show usage statistics
    
    Returns:
        Full response text
    """
    full_response = ""
    
    with client.messages.stream(
        model=model,
        max_tokens=max_tokens,
        messages=[{"role": "user", "content": prompt}]
    ) as stream:
        for text in stream.text_stream:
            full_response += text
            print(text, end="", flush=True)
        
        final_message = stream.get_final_message()
    
    if show_stats:
        print(f"\n\n📊 Tokens: {final_message.usage.input_tokens} in / "
              f"{final_message.usage.output_tokens} out")
    
    return full_response

# Test the helper
print("🎯 Streaming Helper Function Test")
print("=" * 80 + "\n")

response = stream_response(
    "Write a motivational quote about learning.",
    max_tokens=100
)
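
The chunk-handling part of a streaming helper can be separated from the API call, which makes it testable without network access. The sketch below assumes the stream is any iterable of text chunks; `collect_stream` and `fake_text_stream` are hypothetical names, and the fake generator merely stands in for `stream.text_stream`.

```python
from typing import Callable, Iterable, Iterator, List, Optional


def collect_stream(
    chunks: Iterable[str],
    on_chunk: Optional[Callable[[str], None]] = None,
) -> str:
    """Consume text chunks, optionally reacting to each one, and return the full text."""
    parts: List[str] = []
    for chunk in chunks:
        parts.append(chunk)
        if on_chunk is not None:
            on_chunk(chunk)  # e.g. print to the console or push to a UI
    # Joining once at the end avoids repeated string concatenation
    return "".join(parts)


def fake_text_stream() -> Iterator[str]:
    """Stand-in for a real text stream, for local testing only."""
    yield from ["Keep ", "learning, ", "keep growing."]


seen: List[str] = []
full_text = collect_stream(fake_text_stream(), on_chunk=seen.append)
print(full_text)
```

With a real stream, the same function could be called as `collect_stream(stream.text_stream, on_chunk=...)` inside the `with` block.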

# Prompt Templates & Reusable Patterns

Building reusable prompt templates helps maintain consistency and makes your code more maintainable. This section covers practical patterns for production applications.

## Why Use Templates?

- **Consistency**: Ensure similar tasks use similar prompts
- **Maintainability**: Update prompts in one place
- **Testability**: Easy to A/B test different prompt versions
- **Scalability**: Share prompts across your team

## Basic Template Pattern

In [None]:
# Simple string formatting template
SUMMARIZATION_TEMPLATE = """Please summarize the following text in {num_sentences} sentences.
Focus on the {focus_area}.

Text to summarize:
{text}

Summary:"""

def summarize_text(text: str, num_sentences: int = 3, focus_area: str = "main points"):
    """Summarize text using a template."""
    prompt = SUMMARIZATION_TEMPLATE.format(
        num_sentences=num_sentences,
        focus_area=focus_area,
        text=text
    )
    
    response = client.messages.create(
        model="claude-3-5-haiku-20241022",
        max_tokens=500,
        messages=[{"role": "user", "content": prompt}]
    )
    
    return response.content[0].text

# Test the template
sample_text = """
Artificial intelligence is transforming software development through automated code generation,
intelligent debugging tools, and predictive analytics. Machine learning models can now suggest
code completions, identify potential bugs before they occur, and even write entire functions
based on natural language descriptions. This technology is making developers more productive
and helping them focus on higher-level design decisions.
"""

print("📝 Template-based Summarization")
print("=" * 80 + "\n")

summary = summarize_text(sample_text, num_sentences=2, focus_area="practical benefits")
print(f"Summary: {summary}")

## Class-based Template System

In [None]:
class PromptTemplate:
    """Reusable prompt template with variable substitution."""
    
    def __init__(self, template: str, system_prompt: str = None):
        self.template = template
        self.system_prompt = system_prompt
    
    def format(self, **kwargs) -> str:
        """Format template with provided variables."""
        return self.template.format(**kwargs)
    
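    def execute(self, model="claude-3-5-haiku-20241022", max_tokens=500, **kwargs):
        """Format and execute the template."""
        prompt = self.format(**kwargs)
        
        # Build the request, sending `system` only when a system prompt was
        # provided (passing system=None may be rejected by the API)
        request = {
            "model": model,
            "max_tokens": max_tokens,
            "messages": [{"role": "user", "content": prompt}],
        }
        if self.system_prompt:
            request["system"] = self.system_prompt
        
        response = client.messages.create(**request)
        
        return response.content[0].text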

# Create templates for different tasks
code_review_template = PromptTemplate(
    template="""Review the following {language} code and provide feedback on:
1. Code quality and best practices
2. Potential bugs or issues
3. Suggestions for improvement

Code:
```{language}
{code}
```

Review:""",
    system_prompt="You are an experienced software engineer conducting a code review."
)

translation_template = PromptTemplate(
    template="Translate the following text from {source_lang} to {target_lang}:\n\n{text}",
    system_prompt="You are a professional translator. Provide accurate, natural-sounding translations."
)

# Test code review template
print("🔍 Code Review Template Example")
print("=" * 80 + "\n")

code_sample = """
def calculate_avg(numbers):
    total = 0
    for num in numbers:
        total = total + num
    return total / len(numbers)
"""

review = code_review_template.execute(
    language="Python",
    code=code_sample,
    max_tokens=400
)

print(review)

print("\n" + "=" * 80 + "\n")

# Test translation template
print("🌍 Translation Template Example")
print("=" * 80 + "\n")

translation = translation_template.execute(
    source_lang="English",
    target_lang="Spanish",
    text="Hello, how are you today?",
    max_tokens=100
)

print(f"Translation: {translation}")

## Common Prompt Patterns

Here are some battle-tested prompt patterns you can adapt:

In [None]:
# Collection of reusable prompt patterns
PROMPT_PATTERNS = {
    "extraction": """Extract the following information from the text:
{fields}

Text:
{text}

Output format:
{format_spec}""",
    
    "classification": """Classify the following {item_type} into one of these categories:
{categories}

{item_type}: {content}

Category:""",
    
    "analysis": """Analyze the following {subject} and provide insights on:
{analysis_points}

{subject}:
{content}

Analysis:""",
    
    "generation": """Generate {count} {item_type} that meet these requirements:
{requirements}

Format: {format_spec}

Output:""",
    
    "comparison": """Compare {item_a} and {item_b} across these dimensions:
{dimensions}

{item_a}:
{content_a}

{item_b}:
{content_b}

Comparison:"""
}

# Example: Using the extraction pattern
def extract_contact_info(text: str):
    """Extract contact information using a template pattern."""
    prompt = PROMPT_PATTERNS["extraction"].format(
        fields="- Name\n- Email\n- Phone number",
        text=text,
        format_spec="One item per line in 'Field: Value' format"
    )
    
    response = client.messages.create(
        model="claude-3-5-haiku-20241022",
        max_tokens=200,
        messages=[{"role": "user", "content": prompt}]
    )
    
    return response.content[0].text

# Test extraction
contact_text = "Please contact John Smith at john.smith@email.com or call (555) 123-4567 for more information."

print("📋 Extraction Pattern Example")
print("=" * 80 + "\n")
print(extract_contact_info(contact_text))
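
A template with many placeholders fails with a `KeyError` if one variable is forgotten. As a hedged sketch, the standard library's `string.Formatter` can list a template's placeholders up front so the missing ones are reported before any API call is made. The helper names `template_fields` and `missing_fields` are assumptions for illustration.

```python
from string import Formatter
from typing import Set


def template_fields(template: str) -> Set[str]:
    """Return the named placeholders used in a str.format-style template."""
    return {name for _, name, _, _ in Formatter().parse(template) if name}


def missing_fields(template: str, **kwargs) -> Set[str]:
    """Return placeholders for which no value was supplied."""
    return template_fields(template) - set(kwargs)


template = "Translate the following text from {source_lang} to {target_lang}:\n\n{text}"
print(sorted(template_fields(template)))
print(sorted(missing_fields(template, text="Hello")))
```

Escaped braces such as `{{` and `}}` are treated as literal text, so JSON examples embedded in a template are not reported as placeholders.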

# JSON Mode & Structured Outputs

Getting structured data from Claude is essential for programmatic use. This section covers techniques for reliable JSON generation and parsing.

## Why Structured Outputs?

- **Programmatic processing**: Easy to parse and use in code
- **Type safety**: Define expected structure
- **Integration**: Works with APIs and databases
- **Consistency**: Same format every time

## Basic JSON Generation

In [None]:
import json

# Request JSON output explicitly
print("📊 Basic JSON Generation")
print("=" * 80 + "\n")

response = client.messages.create(
    model="claude-3-5-haiku-20241022",
    max_tokens=500,
    messages=[{
        "role": "user",
        "content": """Extract information about this product and return ONLY valid JSON:

Product: "iPhone 15 Pro - 256GB, Titanium Blue, with A17 Pro chip and 48MP camera"

Format:
{
  "name": "product name",
  "storage": "storage capacity",
  "color": "color",
  "processor": "processor",
  "camera": "camera specs"
}"""
    }]
)

json_text = response.content[0].text.strip()
print(f"Raw response:\n{json_text}\n")

# Parse JSON
try:
    data = json.loads(json_text)
    print("✅ Successfully parsed JSON:")
    print(json.dumps(data, indent=2))
except json.JSONDecodeError as e:
    print(f"❌ JSON parsing error: {e}")

## Structured Data Extraction Helper

In [None]:
import re

def extract_json_from_response(response_text: str) -> dict:
    """
    Extract and parse JSON from Claude's response.
    Handles cases where JSON is wrapped in markdown code blocks.
    """
    # Try direct parsing first
    try:
        return json.loads(response_text.strip())
    except json.JSONDecodeError:
        pass
    
    # Try extracting from code block
    json_match = re.search(r'```(?:json)?\n(.+?)\n```', response_text, re.DOTALL)
    if json_match:
        try:
            return json.loads(json_match.group(1))
        except json.JSONDecodeError:
            pass
    
    # Try finding JSON object
    json_match = re.search(r'\{.+\}', response_text, re.DOTALL)
    if json_match:
        try:
            return json.loads(json_match.group(0))
        except json.JSONDecodeError:
            pass
    
    raise ValueError("No valid JSON found in response")

def get_structured_output(prompt: str, schema_description: str = None) -> dict:
    """
    Get structured JSON output from Claude.
    
    Args:
        prompt: The task description
        schema_description: Optional JSON schema description
    
    Returns:
        Parsed JSON dictionary
    """
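    # Build the optional schema section separately: a backslash inside an
    # f-string expression is a SyntaxError before Python 3.12
    schema_section = (
        f"Required JSON schema:\n{schema_description}\n" if schema_description else ""
    )
    full_prompt = f"""{prompt}

{schema_section}
Respond with ONLY valid JSON, no additional text or markdown formatting."""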
    
    response = client.messages.create(
        model="claude-3-5-haiku-20241022",
        max_tokens=1000,
        messages=[{"role": "user", "content": full_prompt}]
    )
    
    response_text = response.content[0].text
    return extract_json_from_response(response_text)

# Test the helper
print("🔧 Structured Output Helper Example")
print("=" * 80 + "\n")

schema = """{
  "title": "string",
  "author": "string",
  "year": number,
  "genre": "string",
  "summary": "string"
}"""

result = get_structured_output(
    prompt="Extract book information: 'To Kill a Mockingbird by Harper Lee (1960) is a classic American novel about racial injustice.'",
    schema_description=schema
)

print("✅ Extracted structured data:")
print(json.dumps(result, indent=2))

## Complex Structured Outputs

In [None]:
# Generate complex nested JSON
print("🏗️ Complex Structured Output Example")
print("=" * 80 + "\n")

schema = """{
  "company": {
    "name": "string",
    "founded": number,
    "employees": number
  },
  "products": [
    {
      "name": "string",
      "category": "string",
      "price": number
    }
  ],
  "headquarters": {
    "city": "string",
    "country": "string"
  }
}"""

response = client.messages.create(
    model="claude-3-5-haiku-20241022",
    max_tokens=800,
    messages=[{
        "role": "user",
        "content": f"""Create a fictional tech company profile with this schema:

{schema}

Include 3 products. Return ONLY the JSON object."""
    }]
)

try:
    company_data = extract_json_from_response(response.content[0].text)
    print("✅ Generated complex structured data:")
    print(json.dumps(company_data, indent=2))
    
    # Validate structure
    assert "company" in company_data
    assert "products" in company_data
    assert len(company_data["products"]) == 3
    print("\n✅ Structure validation passed")
    
except (json.JSONDecodeError, AssertionError, ValueError) as e:
    print(f"❌ Error: {e}")
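
Bare `assert` statements stop at the first problem and are skipped when Python runs with `-O`. A small checker that collects every mismatch may be more informative. The sketch below is one possible approach for flat objects; `find_schema_problems` is a hypothetical helper and the sample dictionary is made up.

```python
from typing import Any, Dict, List


def find_schema_problems(data: Dict[str, Any], expected: Dict[str, type]) -> List[str]:
    """Compare top-level keys and value types against an expected mapping."""
    problems = []
    for key, expected_type in expected.items():
        if key not in data:
            problems.append(f"missing key: {key}")
        elif not isinstance(data[key], expected_type):
            problems.append(
                f"{key}: expected {expected_type.__name__}, "
                f"got {type(data[key]).__name__}"
            )
    return problems


# Made-up parsed response: the year came back as a string, the genre is absent
book = {"title": "To Kill a Mockingbird", "author": "Harper Lee", "year": "1960"}
expected = {"title": str, "author": str, "year": int, "genre": str}
problems = find_schema_problems(book, expected)
print(problems)
```

For nested schemas like the company profile above, a dedicated validation library is likely a better fit than extending this helper.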

# Advanced Use Cases

This section demonstrates real-world applications combining multiple techniques we've learned.

## Use Case 1: Content Moderation System

In [None]:
class ContentModerator:
    """
    Advanced content moderation system with confidence scores.
    Combines: Templates, JSON outputs, error handling
    """
    
    MODERATION_PROMPT = """Analyze this content for policy violations:

Content: {content}

Check for:
- Spam or promotional content
- Hate speech or harassment
- Explicit or inappropriate material
- Misinformation
- Violence or threats

Return JSON:
{{
  "is_safe": boolean,
  "confidence": number (0-1),
  "violations": ["list of violations if any"],
  "reason": "brief explanation",
  "recommended_action": "approve|review|reject"
}}"""
    
    def __init__(self):
        self.moderation_count = 0
    
    def moderate(self, content: str) -> dict:
        """Moderate content and return structured decision."""
        self.moderation_count += 1
        
        try:
            prompt = self.MODERATION_PROMPT.format(content=content)
            
            response = client.messages.create(
                model="claude-3-5-haiku-20241022",
                max_tokens=500,
                temperature=0.0,  # Deterministic for consistency
                system="You are a content moderator. Be consistent and fair in your assessments.",
                messages=[{"role": "user", "content": prompt}]
            )
            
            result = extract_json_from_response(response.content[0].text)
            result["moderation_id"] = self.moderation_count
            return result
            
        except Exception as e:
            return {
                "is_safe": False,
                "confidence": 0.0,
                "violations": [],
                "reason": f"Error during moderation: {str(e)}",
                "recommended_action": "review",
                "moderation_id": self.moderation_count
            }

# Test the moderator
moderator = ContentModerator()

test_contents = [
    "Check out my new blog post about Python programming!",
    "This product is amazing! Buy now at scamsite.com for 90% off!!!",
    "I really enjoyed this movie, the acting was superb."
]

print("🛡️ Content Moderation System")
print("=" * 80 + "\n")

for i, content in enumerate(test_contents, 1):
    print(f"[{i}] Moderating: \"{content[:50]}...\"")
    result = moderator.moderate(content)
    print(f"    Decision: {result['recommended_action'].upper()}")
    print(f"    Safe: {result['is_safe']} (confidence: {result['confidence']})")
    print(f"    Reason: {result['reason']}\n")
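The moderator returns a `confidence` score alongside `recommended_action`, but the loop above prints the model's recommendation as-is. One possible way to use the score is to send low-confidence or malformed decisions to human review. The sketch below is illustrative only: `final_action` and the `0.8` threshold are hypothetical and not part of the notebook.

```python
def final_action(result: dict, min_confidence: float = 0.8) -> str:
    """Return the action to take for a moderation result.

    Hypothetical helper: falls back to "review" whenever the result is
    malformed or the model's confidence is below `min_confidence`.
    """
    allowed = {"approve", "review", "reject"}
    action = str(result.get("recommended_action", "")).lower()
    confidence = result.get("confidence")

    # Malformed output: unknown action or non-numeric confidence
    if action not in allowed:
        return "review"
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        return "review"
    # Low confidence: let a human decide instead of acting automatically
    if confidence < min_confidence:
        return "review"
    return action


print(final_action({"recommended_action": "approve", "confidence": 0.95}))
print(final_action({"recommended_action": "reject", "confidence": 0.55}))
```

The threshold is a policy choice; a stricter value sends more content to human reviewers.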

## Use Case 2: Intelligent Document Processor

In [None]:
class DocumentProcessor:
    """
    Process documents with multiple operations.
    Combines: Multi-turn conversations, streaming, cost tracking
    """
    
    def __init__(self):
        self.conversation = ConversationManager(
            system_prompt="You are a document analysis assistant.",
            model="claude-3-5-haiku-20241022"
        )
    
    def process_document(self, document_text: str):
        """Process a document with multiple analysis steps."""
        print("📄 Document Processing Pipeline")
        print("=" * 80 + "\n")
        
        # Step 1: Summarization
        print("Step 1: Generating summary...")
        summary = self.conversation.send_message(
            f"Summarize this document in 2-3 sentences:\n\n{document_text}"
        )
        print(f"✅ Summary: {summary}\n")
        
        # Step 2: Key points (Claude remembers the document)
        print("Step 2: Extracting key points...")
        key_points = self.conversation.send_message(
            "Extract the 3 most important points from this document."
        )
        print(f"✅ Key Points:\n{key_points}\n")
        
        # Step 3: Action items (still remembers context)
        print("Step 3: Identifying action items...")
        actions = self.conversation.send_message(
            "Are there any action items or tasks mentioned?"
        )
        print(f"✅ Actions:\n{actions}\n")
        
        # Show stats
        stats = self.conversation.get_stats()
        print("=" * 80)
        print("📊 Processing Stats:")
        print(f"   Conversation turns: {stats['turns']}")
        print(f"   Total tokens: {stats['total_tokens']:,}")
        
        return {
            "summary": summary,
            "key_points": key_points,
            "actions": actions,
            "stats": stats
        }

# Test document processor
processor = DocumentProcessor()

sample_document = """
Project Update: Q4 2024

The development team has completed the new user authentication system ahead of schedule.
We need to conduct security testing before the December 15th release. The QA team should 
prioritize testing the password reset flow and two-factor authentication. Marketing has 
requested updated documentation by December 1st for the product launch.

Budget: The project is 15% under budget, saving $50,000. Remaining funds could be allocated
to the mobile app development starting in January.
"""

result = processor.process_document(sample_document)

## Use Case 3: Code Review Assistant

In [None]:
def code_review_assistant(code: str, language: str = "python") -> dict:
    """
    Comprehensive code review with structured feedback.
    Combines: JSON outputs, templates, best practices
    """
    
    prompt = f"""Review this {language} code and provide structured feedback.

Code:
```{language}
{code}
```

Return JSON with:
{{
  "overall_quality": "poor|fair|good|excellent",
  "score": number (1-10),
  "strengths": ["list of good aspects"],
  "issues": [
    {{
      "severity": "low|medium|high",
      "category": "bug|style|performance|security",
      "description": "issue description",
      "suggestion": "how to fix"
    }}
  ],
  "recommendations": ["list of improvements"]
}}"""
    
    response = client.messages.create(
        model="claude-3-5-haiku-20241022",
        max_tokens=1500,
        temperature=0.2,
        system="You are an experienced software engineer. Provide constructive, specific feedback.",
        messages=[{"role": "user", "content": prompt}]
    )
    
    return extract_json_from_response(response.content[0].text)

# Test code review
test_code = """
def process_users(users):
    results = []
    for user in users:
        if user['age'] > 18:
            results.append(user['name'].upper())
    return results
"""

print("👨‍💻 Code Review Assistant")
print("=" * 80 + "\n")

review = code_review_assistant(test_code)

print(f"📊 Overall Quality: {review['overall_quality'].upper()} (Score: {review['score']}/10)\n")

print("✅ Strengths:")
for strength in review['strengths']:
    print(f"   • {strength}")

print(f"\n⚠️ Issues Found: {len(review['issues'])}")
for i, issue in enumerate(review['issues'], 1):
    print(f"\n   [{i}] {issue['severity'].upper()} - {issue['category']}")
    print(f"       {issue['description']}")
    print(f"       💡 Suggestion: {issue['suggestion']}")

print("\n📝 Recommendations:")
for rec in review['recommendations']:
    print(f"   • {rec}")

## Use Case 4: Smart Customer Support Bot

In [None]:
class SupportBot:
    """
    Intelligent customer support with conversation management and analytics.
    Combines: Multi-turn conversations, classification, streaming
    """
    
    def __init__(self):
        self.sessions = {}
    
    def create_session(self, customer_id: str):
        """Create a new support session."""
        self.sessions[customer_id] = {
            "conversation": ConversationManager(
                system_prompt="""You are a friendly customer support agent.
                - Be empathetic and patient
                - Ask clarifying questions
                - Provide step-by-step solutions
                - Escalate complex issues when needed""",
                model="claude-3-5-haiku-20241022"
            ),
            "issue_category": None,
            "resolved": False
        }
    
    def classify_issue(self, message: str) -> str:
        """Classify the customer's issue."""
        response = client.messages.create(
            model="claude-3-5-haiku-20241022",
            max_tokens=50,
            temperature=0.0,
            messages=[{
                "role": "user",
                "content": f"""Classify this support request into ONE category:
                
Categories: technical|billing|account|shipping|product

Message: "{message}"

Category:"""
            }]
        )
        
        return response.content[0].text.strip().lower()
    
    def handle_message(self, customer_id: str, message: str) -> dict:
        """Handle a customer message."""
        if customer_id not in self.sessions:
            self.create_session(customer_id)
        
        session = self.sessions[customer_id]
        
        # Classify first message
        if session["conversation"].get_stats()["turns"] == 0:
            session["issue_category"] = self.classify_issue(message)
        
        # Get response
        response = session["conversation"].send_message(message)
        
        return {
            "response": response,
            "category": session["issue_category"],
            "turn": session["conversation"].get_stats()["turns"]
        }

# Test the support bot
bot = SupportBot()
customer_id = "CUST001"

print("🎧 Smart Customer Support Bot")
print("=" * 80 + "\n")

messages = [
    "Hi, I can't log into my account. It says my password is incorrect.",
    "Yes, I tried resetting it but I didn't receive the email.",
    "I'll check my spam folder. Thanks!"
]

for msg in messages:
    print(f"👤 Customer: {msg}")
    result = bot.handle_message(customer_id, msg)
    print(f"🤖 Support [{result['category']}]: {result['response']}\n")

stats = bot.sessions[customer_id]["conversation"].get_stats()
print("=" * 80)
print(f"📊 Session Stats: {stats['turns']} turns, {stats['total_tokens']} tokens")
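`classify_issue` returns whatever text the model produces, so a reply such as "Technical." or "Category: billing" would be stored verbatim as the category. A minimal sketch of one way to map the reply onto the allowed list is shown below; `normalize_category` and the `"other"` fallback are hypothetical and not part of the notebook.

```python
import re

ALLOWED_CATEGORIES = ("technical", "billing", "account", "shipping", "product")


def normalize_category(raw: str, default: str = "other") -> str:
    """Map free-form classifier output onto one of the allowed categories.

    Hypothetical helper: returns `default` when no allowed category
    appears as a whole word in the model's reply.
    """
    for word in re.findall(r"[a-z]+", raw.lower()):
        if word in ALLOWED_CATEGORIES:
            return word
    return default


print(normalize_category("Technical."))
print(normalize_category("Category: billing"))
print(normalize_category("I'm not sure"))
```

In `SupportBot`, this could wrap the return value of `classify_issue` so that downstream code only ever sees a known category.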

## Key Takeaways from Advanced Use Cases

### 1. **Content Moderation System**
- **Techniques**: Templates, JSON outputs, error handling, deterministic temperature
- **Key Features**: Confidence scores, structured decisions, fallback handling
- **Production Tips**: Use temperature=0.0 for consistency, always handle exceptions

### 2. **Document Processor**
- **Techniques**: Multi-turn conversations, context preservation, token tracking
- **Key Features**: Sequential analysis, conversation memory, statistics
- **Production Tips**: Break complex tasks into steps, track cumulative costs

### 3. **Code Review Assistant**
- **Techniques**: Structured outputs, severity categorization, actionable feedback
- **Key Features**: Multi-level analysis, specific suggestions, quality scores
- **Production Tips**: Clear schema definitions, specific feedback categories

### 4. **Customer Support Bot**
- **Techniques**: Session management, classification, multi-turn dialogue
- **Key Features**: Issue categorization, conversation tracking, per-customer sessions
- **Production Tips**: Classify early, maintain conversation context, track metrics

## Common Patterns Across Use Cases

✅ **Error Handling**: Wrap API calls and JSON parsing in try-except blocks with a safe fallback, as `ContentModerator.moderate` does  
✅ **Structured Outputs**: JSON for programmatic processing  
✅ **Cost Tracking**: Monitor token usage for budgeting  
✅ **Temperature Control**: Use 0.0-0.3 for consistent, repeatable tasks  
✅ **System Prompts**: Define clear roles and expectations  
✅ **Conversation Management**: Maintain context for multi-turn interactions

## Building Your Own Applications

When building production applications with Claude:

1. **Start Simple**: Begin with basic API calls, add complexity gradually
2. **Test Extensively**: Edge cases, error conditions, various inputs
3. **Monitor Performance**: Track latency, costs, success rates
4. **Iterate**: Refine prompts based on real-world usage
5. **Handle Failures**: Always have fallback logic
6. **Track Costs**: Monitor token usage to stay within budget
7. **Optimize**: Use Haiku for simple tasks, upgrade when needed

---

**Congratulations!** 🎉 You now have a comprehensive understanding of the Anthropic Claude API and production-ready patterns for building AI applications.

## Best Practices for Structured Outputs

### ✅ DO:
- **Be explicit**: "Respond with ONLY valid JSON"
- **Provide schemas**: Show the exact structure you want
- **Give examples**: Include sample JSON in your prompt
- **Validate outputs**: Always parse and validate the JSON
- **Handle errors gracefully**: Use try-except for parsing
- **Use stop sequences**: Can help ensure clean JSON generation

### ❌ DON'T:
- **Assume perfect JSON**: Always validate
- **Use complex nested structures**: Keep it simple when possible
- **Forget to handle code blocks**: Claude sometimes wraps JSON in markdown
- **Skip schema definition**: Be explicit about structure
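To illustrate the point about code blocks, here is a minimal sketch of parsing a reply that may or may not be wrapped in a markdown fence. `parse_json_reply` is a hypothetical name; the notebook's own `extract_json_from_response` is defined elsewhere and may work differently.

```python
import json
import re

# Matches a markdown code fence, optionally tagged "json"
FENCE_RE = re.compile(r"```(?:json)?\s*(.*?)```", re.DOTALL)


def parse_json_reply(text: str):
    """Parse JSON from a model reply, tolerating a markdown code fence.

    Raises json.JSONDecodeError if the remaining text is not valid JSON.
    """
    match = FENCE_RE.search(text)
    candidate = match.group(1) if match else text
    return json.loads(candidate.strip())


fenced_reply = 'Here you go:\n```json\n{"name": "Jane", "age": 25}\n```'
plain_reply = '{"name": "Jane", "age": 25}'
print(parse_json_reply(fenced_reply))
print(parse_json_reply(plain_reply))
```

Callers should still wrap the call in try-except, since the reply may contain no valid JSON at all.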

## Tips for Reliable JSON

1. **Explicit Instructions**
```python
prompt = """Return your response as valid JSON only.
No additional text, no markdown, just the JSON object."""
```

2. **Few-shot Examples**
```python
prompt = """Example format:
{"name": "John", "age": 30}

Now extract: "Jane is 25 years old"
"""
```

3. **Use Stop Sequences**
```python
stop_sequences=["}\n\n", "```"]  # The matched stop sequence is not part of the returned text, so re-append "}" if that one fires
```

4. **Validate with Schemas** (using libraries like `pydantic` or `jsonschema`)
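As a rough illustration of tip 4, the sketch below checks required keys and value types using only the standard library. It is a simplified stand-in for what `pydantic` or `jsonschema` provide, and `check_schema` and `MODERATION_SCHEMA` are hypothetical names modeled on the moderation JSON used earlier in the notebook.

```python
def check_schema(data: dict, schema: dict) -> list:
    """Return a list of problems; an empty list means `data` matches.

    `schema` maps each required key to the type (or tuple of types)
    its value must have.
    """
    problems = []
    for key, expected in schema.items():
        if key not in data:
            problems.append(f"missing key: {key}")
        elif isinstance(data[key], bool) and expected is not bool:
            # bool is a subclass of int, so reject it explicitly
            problems.append(f"wrong type for {key}: bool")
        elif not isinstance(data[key], expected):
            problems.append(f"wrong type for {key}: {type(data[key]).__name__}")
    return problems


MODERATION_SCHEMA = {
    "is_safe": bool,
    "confidence": (int, float),
    "violations": list,
    "reason": str,
    "recommended_action": str,
}

reply = {"is_safe": "yes", "confidence": 0.9}
for problem in check_schema(reply, MODERATION_SCHEMA):
    print(problem)
```

A dedicated library is likely the better choice once schemas become nested or need value constraints such as ranges and enums.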

---

**Summary**: Structured outputs enable programmatic use of Claude's responses. Use clear schemas, validate outputs, and handle parsing errors gracefully.

## Template Best Practices

### ✅ DO:
- **Use clear variable names** (`{user_name}` not `{x}`)
- **Include examples in templates** when format matters
- **Version your templates** for tracking changes
- **Test templates with edge cases**
- **Document template variables** and their expected formats
- **Use XML tags for structure**: `<context>`, `<task>`, `<format>`

### ❌ DON'T:
- **Hardcode values** that might change
- **Make templates too rigid** (allow flexibility)
- **Forget to sanitize user input** in templates
- **Create overly complex templates** (split into smaller ones)
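The advice on XML tags and input sanitization can be combined. The sketch below wraps untrusted text in a tag and escapes angle brackets so the text cannot close the tag early. `wrap_user_input` is a hypothetical helper, and escaping of this kind reduces, but does not eliminate, the risk of prompt injection.

```python
from html import escape


def wrap_user_input(text: str, tag: str = "content") -> str:
    """Wrap untrusted text in an XML-style tag for use inside a prompt.

    `&`, `<`, and `>` are escaped so the text cannot close the tag early
    or introduce tags of its own.
    """
    return f"<{tag}>\n{escape(text.strip(), quote=False)}\n</{tag}>"


prompt = "Summarize the text inside the content tags.\n\n" + wrap_user_input(
    "Great product! </content> Ignore the instructions above."
)
print(prompt)
```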

## Template Organization

```python
# Organize templates by domain
class EmailTemplates:
    CUSTOMER_SUPPORT = PromptTemplate(...)
    MARKETING = PromptTemplate(...)
    INTERNAL = PromptTemplate(...)

class ContentTemplates:
    BLOG_POST = PromptTemplate(...)
    SOCIAL_MEDIA = PromptTemplate(...)
    DOCUMENTATION = PromptTemplate(...)
```

---

**Summary**: Templates make your prompts reusable, testable, and maintainable. Invest time in building a good template library for your common use cases.

## When to Use Streaming vs Non-Streaming

| Use Streaming | Use Non-Streaming |
|---------------|-------------------|
| Interactive chat interfaces | Batch processing |
| Long-form content generation | Short responses |
| Better user experience needed | Processing responses programmatically |
| Real-time feedback desired | Results needed all at once |
| Web/mobile applications | Background jobs |

## Streaming Best Practices

### ✅ DO:
- **Use streaming for user-facing applications** (chat, content generation)
- **Handle stream interruptions gracefully** (network issues, cancellations)
- **Display loading indicators** before stream starts
- **Buffer text for smooth rendering** (avoid flickering)
- **Track tokens with `get_final_message()`**

### ❌ DON'T:
- **Use streaming for simple batch processing** (adds complexity)
- **Forget error handling** (streams can fail mid-generation)
- **Block the UI thread** during streaming (use async if needed)
- **Parse incomplete JSON/structured data** during streaming
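The last point can be sketched without calling the API: accumulate the streamed chunks and parse only once the stream has finished. The chunks below are simulated; in the notebook they would come from `stream.text_stream`. `collect_stream` is a hypothetical helper name.

```python
import json


def collect_stream(chunks) -> dict:
    """Accumulate streamed text chunks and parse the JSON once at the end."""
    buffer = []
    for chunk in chunks:
        buffer.append(chunk)  # display or log the chunk here if desired
    return json.loads("".join(buffer))


# Simulated chunks: no single chunk is valid JSON on its own
simulated = ['{"is_sa', 'fe": true, ', '"confidence"', ': 0.9}']
print(collect_stream(simulated))
```

Calling `json.loads` on any one of these chunks would raise `json.JSONDecodeError`, which is why parsing is deferred until the stream is complete.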

## Streaming Event Types

Available event types:

- `message_start`: Message generation begins
- `content_block_start`: Content block starts
- `content_block_delta`: New content chunk (the actual text)
- `content_block_stop`: Content block ends
- `message_delta`: Message metadata update
- `message_stop`: Message complete

## Error Handling in Streaming

```python
from anthropic import APIError

try:
    # Replace ... with your usual arguments (model, max_tokens, messages)
    with client.messages.stream(...) as stream:
        for text in stream.text_stream:
            print(text, end="", flush=True)
except APIError as e:
    print(f"\n❌ Stream error: {e}")
except KeyboardInterrupt:
    print("\n⚠️ Stream cancelled by user")
```

---

**Summary**: Streaming provides a superior user experience for interactive applications, allowing users to see results immediately rather than waiting for complete responses.

## Stop Sequence Best Practices

### ✅ DO:
- **Use distinctive markers** (e.g., "###", "END_OUTPUT", "---")
- **Combine with prompts** that mention the stop sequence
- **Use multiple stop sequences** for flexibility
- **Test your stop sequences** to ensure they work as expected
- **Use them for structured outputs** (JSON, CSV, etc.)

### ❌ DON'T:
- **Use common words** as stop sequences (they might appear naturally)
- **Make them too complex** (simple markers work best)
- **Rely solely on stop sequences** for control (combine with good prompts)
- **Forget to handle** cases where stop sequence isn't triggered

## Stop Reasons

Claude's `stop_reason` field tells you why generation stopped:
- **`end_turn`**: Natural completion
- **`max_tokens`**: Hit the max_tokens limit
- **`stop_sequence`**: Hit one of your stop sequences

```python
if response.stop_reason == "stop_sequence":
    print(f"Stopped at: {response.stop_sequence}")
elif response.stop_reason == "max_tokens":
    print("Warning: Output truncated due to max_tokens")
```

---

**Summary**: Stop sequences give you precise control over Claude's output boundaries, making them invaluable for structured data extraction and preventing unwanted generation.

## Managing Long Conversations

As conversations grow, token usage increases. Here are strategies for managing long conversations:

### 1. **Token Limits**
- Claude 3 family models have 200K-token context windows (check the model documentation for newer models)
- Input + output must fit within this limit
- Monitor conversation length

### 2. **Summarization Strategy**
```python
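def summarize_conversation(conversation, keep_recent=6, threshold=10):
    """Summarize old messages to save tokens; returns (summary, recent_messages)."""
    if len(conversation) <= threshold:
        return None, conversation  # Short enough: nothing to summarize
    # Keep recent messages, summarize old ones
    old_messages, recent = conversation[:-keep_recent], conversation[-keep_recent:]
    transcript = "\n".join(f"{m['role']}: {m['content']}" for m in old_messages)
    response = client.messages.create(  # `client` is the notebook's Anthropic client
        model="claude-3-5-haiku-20241022",
        max_tokens=300,
        messages=[{"role": "user", "content": "Summarize this conversation:\n\n" + transcript}],
    )
    # Replace old messages with the summary (e.g., by adding it to the system prompt)
    return response.content[0].text, recent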
```

### 3. **Sliding Window**
```python
def maintain_window(messages, max_messages=20):
    """Keep only recent messages."""
    if len(messages) > max_messages:
        return messages[-max_messages:]
    return messages
```

### 4. **Context Pruning**
- Remove less important messages
- Keep system prompt and recent context
- Preserve key information
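One way to approach pruning is to keep the newest messages that fit a token budget. The sketch below is illustrative: `estimate_tokens` uses a rough 4-characters-per-token guess (an assumption, not an exact count), `prune_to_budget` is a hypothetical helper, and messages are assumed to have plain string content. The system prompt is passed separately through the `system` parameter, so it is never pruned here.

```python
def estimate_tokens(text: str) -> int:
    """Very rough estimate: about 4 characters per token (an assumption)."""
    return max(1, len(text) // 4)


def prune_to_budget(messages: list, max_tokens: int) -> list:
    """Keep the most recent messages whose estimated size fits the budget."""
    kept = []
    used = 0
    # Walk backwards so the newest messages are kept first
    for message in reversed(messages):
        cost = estimate_tokens(message["content"])
        if used + cost > max_tokens:
            break
        kept.append(message)
        used += cost
    kept.reverse()
    # The Messages API expects the first message to come from the user
    while kept and kept[0]["role"] != "user":
        kept.pop(0)
    return kept


history = [
    {"role": "user", "content": "a" * 400},
    {"role": "assistant", "content": "b" * 400},
    {"role": "user", "content": "c" * 400},
    {"role": "assistant", "content": "d" * 400},
]
print(len(prune_to_budget(history, max_tokens=350)))
```

For accurate budgeting, the `usage` field on real responses is a better source of token counts than a character-based estimate.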

---

**Best Practices for Multi-turn Conversations:**
- ✅ Send the conversation history with every request (the API is stateless and does not remember earlier turns)
- ✅ Track token usage to avoid limits
- ✅ Implement conversation reset functionality
- ✅ Use system prompts for consistent behavior
- ✅ Consider summarization for very long conversations
- ❌ Don't exceed context window limits
- ❌ Don't lose important context from early in the conversation

# Prompt Engineering Best Practices Checklist

Use this comprehensive checklist when building applications with Claude to ensure you're following best practices.

## 🎯 1. Model Selection

- [ ] **Choose the right model for your task**
  - Use **Haiku** for simple, high-volume tasks (fastest, cheapest)
  - Use **Sonnet** for balanced performance (general purpose)
  - Use **Opus** for complex reasoning and analysis (most capable)

- [ ] **Consider context window requirements**
  - Claude 3 family models support 200K tokens (check the model documentation for newer models)
  - Plan for input + output tokens within limits

- [ ] **Test with cost-effective models first**
  - Start with Haiku for prototyping
  - Upgrade only if quality is insufficient

## 📝 2. Prompt Design

- [ ] **Write clear, specific prompts**
  - Be explicit about what you want
  - Provide context and constraints
  - Use concrete examples when possible

- [ ] **Use system prompts for role/behavior**
  - Define Claude's role or persona
  - Set high-level guidelines
  - Keep them concise (they count toward tokens)

- [ ] **Leverage few-shot learning**
  - Provide 2-5 examples for consistent formatting
  - Show Claude exactly how you want outputs
  - Use message history to demonstrate patterns

- [ ] **Structure complex prompts**
  - Break tasks into steps
  - Use XML tags for clarity: `<context>`, `<task>`, `<format>`
  - Number steps if order matters

## ⚙️ 3. Parameter Configuration

- [ ] **Set appropriate temperature**
  - **0.0-0.3**: Factual, deterministic tasks
  - **0.4-0.7**: Balanced tasks
  - **0.8-1.0**: Creative tasks

- [ ] **Configure max_tokens**
  - Estimate based on expected output length
  - Add buffer for safety (20-30% extra)
  - Monitor actual usage to optimize

- [ ] **Consider top_p and top_k**
  - Use top_p for controlled randomness
  - Adjust either temperature or top_p, not both (Anthropic's general recommendation); temperature alone is usually enough
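The `max_tokens` guidance above is simple arithmetic: take the expected output length and add a 20-30% margin. A minimal sketch follows; `max_tokens_with_buffer` is a hypothetical helper, and integer math is used so the result always rounds up.

```python
def max_tokens_with_buffer(expected_tokens: int, buffer_pct: int = 25) -> int:
    """Return the expected output length plus a safety margin, rounded up."""
    if expected_tokens <= 0:
        raise ValueError("expected_tokens must be positive")
    # Ceiling division without floating point
    return (expected_tokens * (100 + buffer_pct) + 99) // 100


print(max_tokens_with_buffer(400))       # 400 tokens + 25%
print(max_tokens_with_buffer(1000, 20))  # 1000 tokens + 20%
```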

## 🛡️ 4. Error Handling & Reliability

- [ ] **Implement try-except blocks**
  - Catch specific error types
  - Handle each error appropriately
  - Log errors with context

- [ ] **Add retry logic with exponential backoff**
  - Start with 1s delay, double each time
  - Retry 3-5 times for transient errors
  - Don't retry auth/bad request errors

- [ ] **Implement rate limiting**
  - Add delays between batch requests (0.3-1s)
  - Track usage with response.usage
  - Monitor for rate limit errors

- [ ] **Handle timeouts gracefully**
  - Set reasonable timeout values
  - Provide fallback responses
  - Inform users of delays
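The retry guidance in this section can be sketched as follows. The two exception classes are stand-ins defined here so the example runs without network access. With the Anthropic SDK, the retryable group would likely correspond to errors such as `anthropic.RateLimitError`, `anthropic.APIConnectionError`, and `anthropic.InternalServerError`, and the non-retryable group to `anthropic.AuthenticationError` and `anthropic.BadRequestError`. `call_with_backoff` is a hypothetical helper, and the SDK also has its own built-in retry setting that may make a wrapper like this unnecessary.

```python
import time


class TransientError(Exception):
    """Stand-in for retryable failures (rate limits, 5xx, connection errors)."""


class PermanentError(Exception):
    """Stand-in for failures that retrying cannot fix (401, 400)."""


def call_with_backoff(func, max_retries=4, base_delay=1.0, sleep=time.sleep):
    """Call `func`, retrying transient failures with exponential backoff."""
    for attempt in range(max_retries + 1):
        try:
            return func()
        except TransientError:
            if attempt == max_retries:
                raise  # out of retries: surface the error to the caller
            sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
        # PermanentError is deliberately not caught, so it propagates at once


# Demo: fail twice, then succeed; record delays instead of really sleeping
calls = {"n": 0}
delays = []


def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientError("rate limited")
    return "ok"


print(call_with_backoff(flaky, sleep=delays.append), delays)
```

Passing `sleep` as a parameter keeps the helper easy to test; production code might also add random jitter to the delay.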

## 💰 5. Cost Management

- [ ] **Track token usage**
  - Monitor input and output tokens
  - Use calculate_cost() for estimates
  - Log usage for analysis

- [ ] **Optimize prompts for cost**
  - Remove unnecessary verbosity
  - Reuse system prompts
  - Batch related requests

- [ ] **Use appropriate models**
  - Don't use Opus for simple tasks
  - At the prices in the Quick Reference table, Haiku is 12x cheaper than Sonnet and 60x cheaper than Opus
  - Cost scales with tokens and model tier
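To make the cost comparison concrete, here is a small estimate based on the per-million-token prices listed in the Quick Reference table later in this checklist. `estimate_cost` and `PRICES_PER_MTOK` are hypothetical names (the notebook's own `calculate_cost()` is defined elsewhere), and the prices are a snapshot that may be out of date.

```python
# USD per 1M tokens as (input, output), copied from the Quick Reference table;
# treat these as a snapshot and check the pricing page for current values.
PRICES_PER_MTOK = {
    "haiku": (0.25, 1.25),
    "sonnet": (3.00, 15.00),
    "opus": (15.00, 75.00),
}


def estimate_cost(input_tokens: int, output_tokens: int, tier: str) -> float:
    """Estimate the USD cost of one request for the given model tier."""
    input_price, output_price = PRICES_PER_MTOK[tier]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Same workload on each tier: 2,000 input tokens and 500 output tokens
for tier in PRICES_PER_MTOK:
    print(f"{tier:<7} ${estimate_cost(2_000, 500, tier):.6f}")
```

In practice the token counts would come from `response.usage` rather than fixed numbers.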

## 🔒 6. Security & Privacy

- [ ] **Protect API keys**
  - Use environment variables
  - Never commit keys to version control
  - Rotate keys periodically

- [ ] **Sanitize user inputs**
  - Validate before sending to API
  - Remove sensitive information
  - Implement input length limits

- [ ] **Handle sensitive data appropriately**
  - Review Anthropic's data handling policies
  - Don't send PII unnecessarily
  - Consider data retention policies
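As a rough illustration of the input checks above, the sketch below rejects empty input, redacts email addresses, and enforces a length limit before the text is sent to the API. `prepare_user_input`, the regular expression, and the 2,000-character limit are all assumptions for illustration; real PII handling usually needs much more than one pattern.

```python
import re

# Simplified email pattern: good enough for a demo, not a full validator
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def prepare_user_input(text: str, max_chars: int = 2000) -> str:
    """Redact email addresses and enforce a length limit before an API call."""
    if not text or not text.strip():
        raise ValueError("input is empty")
    redacted = EMAIL_RE.sub("[EMAIL]", text.strip())
    return redacted[:max_chars]


print(prepare_user_input("Contact me at jane.doe@example.com please"))
```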

## 🧪 7. Testing & Validation

- [ ] **Test with diverse inputs**
  - Edge cases and unusual inputs
  - Different languages if applicable
  - Various prompt phrasings

- [ ] **Validate outputs**
  - Check response format
  - Verify content accuracy
  - Test error scenarios

- [ ] **Monitor in production**
  - Track success rates
  - Monitor latency
  - Analyze token usage patterns

## 📊 8. Production Readiness

- [ ] **Implement logging**
  - Log all API calls with metadata
  - Include timestamps and user context
  - Track errors and retries

- [ ] **Add monitoring**
  - Set up alerts for failures
  - Monitor rate limits
  - Track cost trends

- [ ] **Plan for scaling**
  - Implement request queuing
  - Consider caching strategies
  - Load test before launch

- [ ] **Document your implementation**
  - Document prompt templates
  - Explain model selection rationale
  - Maintain error handling guides

## 🎨 9. User Experience

- [ ] **Provide loading indicators**
  - Show progress for long requests
  - Set user expectations
  - Handle interruptions gracefully

- [ ] **Stream responses when possible**
  - For better perceived performance
  - Allow users to see progress
  - Enable early cancellation

- [ ] **Handle errors in a user-friendly way**
  - Show helpful error messages
  - Suggest actions to resolve issues
  - Provide fallback options

## 🔄 10. Maintenance & Iteration

- [ ] **Version your prompts**
  - Track prompt changes
  - A/B test improvements
  - Maintain prompt history

- [ ] **Stay updated on model changes**
  - Review Anthropic's changelogs
  - Test new models when released
  - Update model IDs as needed

- [ ] **Gather feedback**
  - Collect user feedback
  - Monitor output quality
  - Iterate on prompts

---

## Quick Reference Summary

| Aspect | Recommendation |
|--------|----------------|
| **Model** | Haiku for simple, Sonnet for balanced, Opus for complex |
| **Temperature** | 0.0-0.3 factual, 0.4-0.7 balanced, 0.8-1.0 creative |
| **Max Tokens** | Estimate + 20-30% buffer |
| **Retries** | 3-5 attempts with exponential backoff |
| **Rate Limiting** | 0.3-1s delay between requests |
| **Cost** | Haiku: $0.25/$1.25, Sonnet: $3/$15, Opus: $15/$75 (input/output, USD per 1M tokens; check the pricing page for current rates) |

✅ **Remember**: Start simple, test thoroughly, optimize iteratively!

---

## Additional Resources

- **Anthropic Documentation**: https://docs.anthropic.com/
- **Prompt Engineering Guide**: https://docs.anthropic.com/claude/docs/prompt-engineering
- **API Reference**: https://docs.anthropic.com/claude/reference/
- **Pricing**: https://www.anthropic.com/api

## Best Practices for Error Handling

### ✅ DO:
- **Always wrap API calls in try-except blocks**
- **Use exponential backoff for rate limits** (start with 1s, double each time)
- **Implement retry logic for transient errors** (rate limits, connection issues, 5xx errors)
- **Log errors for debugging** (include timestamps and error details)
- **Set reasonable timeouts** to prevent hanging requests
- **Add delays between batch requests** to avoid rate limits

### ❌ DON'T:
- **Retry authentication errors** (401) - fix your API key instead
- **Retry bad request errors** (400) - fix your request parameters
- **Retry indefinitely** - set a max retry limit (3-5 is typical)
- **Ignore errors silently** - always log or handle them
- **Spam retries immediately** - use exponential backoff

## Rate Limit Guidelines

Anthropic's rate limits vary by tier and model:
- **Free tier**: Lower limits, suitable for development
- **Build tier**: Higher limits for production apps
- **Scale tier**: Custom limits for enterprise

### Rate Limiting Tips:
1. **Track your usage** with the `usage` field in responses
2. **Implement request queuing** for high-volume applications
3. **Use Haiku models** for higher throughput (they're faster and cheaper)
4. **Distribute load** across time if possible
5. **Monitor the `retry-after` header** when you hit rate limits
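Tip 5 can be sketched as a small function that decides how long to wait after a rate-limit response. It assumes the response headers are available as a plain dict and that `retry-after` holds a number of seconds; the header may also be an HTTP date, which this sketch does not handle. `retry_delay` and its defaults are hypothetical.

```python
def retry_delay(headers: dict, fallback: float = 1.0, max_delay: float = 60.0) -> float:
    """Choose how long to wait (in seconds) after a rate-limit response.

    Uses the `retry-after` header when it is present and numeric;
    otherwise returns `fallback`. The result is capped at `max_delay`.
    """
    # Header names are case-insensitive, so normalize before the lookup
    normalized = {key.lower(): value for key, value in headers.items()}
    try:
        delay = float(normalized["retry-after"])
    except (KeyError, TypeError, ValueError):
        delay = fallback
    return min(max(delay, 0.0), max_delay)


print(retry_delay({"Retry-After": "7"}))
print(retry_delay({}))
```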

---

### Summary

Robust error handling is essential for production applications:
- Use try-except blocks to catch specific error types
- Implement exponential backoff for retries
- Don't retry non-transient errors (auth, bad requests)
- Add delays between batch requests
- Log errors for debugging

With proper error handling, your application will be resilient and provide a better user experience!