# 🎓 Lesson 1.3: Controlling Outputs

## 📚 What You'll Learn

By the end of this lesson, you'll understand:
- How temperature affects randomness
- What max_tokens does and how to use it
- Stop sequences for controlling output
- Getting consistent, predictable results
- Top-p (nucleus) sampling

**Time to Complete**: 40-50 minutes

---

## 🎲 Understanding Randomness

Claude doesn't just give the "one correct answer" - it's probabilistic! Imagine Claude as a creative writer:

**Question**: "The cat sat on the..."

**Possible completions**:
- mat (70% likely)
- chair (15% likely)
- roof (10% likely)
- skateboard (5% likely)

**Temperature** controls how adventurous Claude is in choosing words!
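To make the idea concrete, here is a minimal sketch of how a temperature value can reshape the toy probabilities above. It assumes the common formulation where each probability is raised to the power `1 / T` and the result is renormalized. The function name `apply_temperature` is hypothetical, and this illustrates the concept rather than Claude's actual internals.

```python
# Illustrative only: these probabilities come from the toy example above,
# not from a real model.
probs = {"mat": 0.70, "chair": 0.15, "roof": 0.10, "skateboard": 0.05}


def apply_temperature(probs, temperature):
    """Rescale a distribution: p_i ** (1 / T), then renormalize."""
    if temperature <= 0:
        # Treat T = 0 as "always pick the most likely word".
        best = max(probs, key=probs.get)
        return {word: 1.0 if word == best else 0.0 for word in probs}
    scaled = {word: p ** (1.0 / temperature) for word, p in probs.items()}
    total = sum(scaled.values())
    return {word: value / total for word, value in scaled.items()}


for t in (0.0, 0.5, 1.0):
    adjusted = apply_temperature(probs, t)
    summary = ", ".join(f"{word}: {p:.2f}" for word, p in adjusted.items())
    print(f"T={t}: {summary}")
```

Lower values push almost all of the weight onto "mat", while `T=1.0` leaves the original distribution unchanged.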

---

## 🚀 Setup

In [None]:
import os
from dotenv import load_dotenv
from anthropic import Anthropic

load_dotenv()
client = Anthropic(api_key=os.environ.get("ANTHROPIC_API_KEY"))

print("✅ Client ready!")

## 🌡️ Temperature: Controlling Creativity

Temperature ranges from 0 to 1 (the API default is 1.0):
- **0.0**: Most predictable (strongly favors the most likely word; outputs can still vary slightly between runs)
- **0.5**: Balanced (a middle ground between consistency and variety)
- **1.0**: Creative, random (takes more risks)

Let's see the difference!

In [None]:
# Same question, different temperatures
question = "Write a creative opening line for a sci-fi story."

# Temperature = 0 (Very predictable)
response_cold = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=100,
    temperature=0.0,  # Most predictable setting
    messages=[{"role": "user", "content": question}]
)

# LINE-BY-LINE EXPLANATION:
# ---------------------------
# temperature=0.0
#   This makes Claude VERY conservative
#   It strongly favors the most probable next word
#   Results are consistent (though not guaranteed identical) across runs
#   Good for: factual questions, data extraction, code generation

print("🥶 TEMPERATURE = 0.0 (Conservative)")
print(response_cold.content[0].text)
print("\n" + "="*60 + "\n")

In [None]:
# Temperature = 1 (Very creative)
response_hot = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=100,
    temperature=1.0,  # Maximum creativity!
    messages=[{"role": "user", "content": question}]
)

# LINE-BY-LINE EXPLANATION:
# ---------------------------
# temperature=1.0
#   This makes Claude VERY creative and unpredictable
#   It considers less likely word choices
#   Results vary significantly between runs
#   Good for: creative writing, brainstorming, diverse outputs

print("🔥 TEMPERATURE = 1.0 (Creative)")
print(response_hot.content[0].text)
print("\n" + "="*60 + "\n")

In [None]:
# Let's run the SAME question multiple times with different temperatures
def test_temperature(temp, runs=3):
    """Test how temperature affects output consistency."""
    print(f"\n🌡️ Testing Temperature = {temp}")
    print("="*60)
    
    for i in range(runs):
        response = client.messages.create(
            model="claude-3-5-sonnet-20241022",
            max_tokens=50,
            temperature=temp,
            messages=[{"role": "user", "content": "Name a color."}]
        )
        print(f"Run {i+1}: {response.content[0].text}")

# Test low temperature (should be very similar)
test_temperature(0.0)

# Test high temperature (should be varied)
test_temperature(1.0)

### 💡 When to Use Each Temperature

| Temperature | Use Case | Example |
|------------|----------|----------|
| 0.0 - 0.3 | Facts, data extraction, code | "Extract the email from this text" |
| 0.4 - 0.7 | Balanced responses | "Explain quantum physics" |
| 0.8 - 1.0 | Creative writing, brainstorming | "Write a poem" |

---

## 🎫 Max Tokens: Setting Length Limits

Remember: **1 token ≈ 4 characters** or **¾ of a word**

Max tokens controls the MAXIMUM length of Claude's response.
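The rule of thumb above can be turned into a rough budgeting helper. This is a sketch built on that heuristic only. The function names `estimate_tokens` and `suggest_max_tokens` and the 1.5x headroom factor are assumptions for illustration, and real token counts depend on the tokenizer.

```python
import math


def estimate_tokens(text, chars_per_token=4):
    """Rough token estimate from character count (heuristic, not a tokenizer)."""
    return math.ceil(len(text) / chars_per_token)


def suggest_max_tokens(expected_words, headroom=1.5):
    """Turn an expected word count into a max_tokens budget with headroom."""
    # 1 token is roughly 3/4 of a word, so tokens are roughly words / 0.75.
    return math.ceil(expected_words / 0.75 * headroom)


print(estimate_tokens("Explain artificial intelligence in detail."))
print(suggest_max_tokens(300))
```

Leaving some headroom makes it more likely that Claude finishes naturally instead of being cut off at the limit.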

In [None]:
# Very short response
response_short = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=20,  # Very short!
    messages=[{"role": "user", "content": "Explain artificial intelligence in detail."}]
)

print("📏 MAX_TOKENS = 20 (Very Short)")
print(response_short.content[0].text)
print(f"\nActual tokens used: {response_short.usage.output_tokens}")
print(f"Stop reason: {response_short.stop_reason}")  # Will be 'max_tokens'!
print("\n" + "="*60 + "\n")

In [None]:
# Longer response
response_long = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=500,  # Much more room
    messages=[{"role": "user", "content": "Explain artificial intelligence in detail."}]
)

print("📏 MAX_TOKENS = 500 (Longer)")
print(response_long.content[0].text)
print(f"\nActual tokens used: {response_long.usage.output_tokens}")
print(f"Stop reason: {response_long.stop_reason}")  # Might be 'end_turn'
print("\n" + "="*60 + "\n")

### 🔍 Understanding Stop Reasons

- **end_turn**: Claude finished naturally (complete thought)
- **max_tokens**: Hit the token limit (response was cut off)
- **stop_sequence**: Hit a custom stop sequence (explained below)

If you see `max_tokens` as the stop reason, increase `max_tokens`!
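One way to act on these values in code is a small helper that maps each stop reason to a "complete or not" decision. This is a sketch. `describe_stop_reason` is a hypothetical name, and a `SimpleNamespace` stand-in replaces a real API response so the example runs without a network call.

```python
from types import SimpleNamespace


def describe_stop_reason(response):
    """Return (is_complete, message) for a response-like object."""
    reason = response.stop_reason
    if reason == "end_turn":
        return True, "Claude finished naturally."
    if reason == "stop_sequence":
        return True, "Stopped at a custom stop sequence."
    if reason == "max_tokens":
        return False, "Cut off by max_tokens; consider raising the limit."
    return False, f"Unhandled stop reason: {reason!r}"


# Stand-in object with the same attribute name as a real response.
fake_response = SimpleNamespace(stop_reason="max_tokens")
is_complete, message = describe_stop_reason(fake_response)
print(is_complete, message)
```

The same function should work on a real response object, since it only reads the `stop_reason` attribute.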

---

## 🛑 Stop Sequences: Custom Stop Points

Stop sequences tell Claude "as soon as you generate THIS text, STOP."

In [None]:
# Stop at a specific word
response_stop = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=500,
    stop_sequences=["However"],  # Stop if Claude says "However"
    messages=[{
        "role": "user",
        "content": "List the pros and cons of electric cars."
    }]
)

# LINE-BY-LINE EXPLANATION:
# ---------------------------
# stop_sequences=["However"]
#   This is a LIST of strings
#   When Claude generates any of these strings, it STOPS immediately
#   Useful for structured output (e.g., stop at "###" or "END")
#   The stop sequence is NOT included in the output

print("🛑 WITH STOP SEQUENCE: ['However']")
print(response_stop.content[0].text)
print(f"\nStop reason: {response_stop.stop_reason}")
print("\n" + "="*60 + "\n")

In [None]:
# Practical example: Generate a numbered list that stops at item 5
response_list = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=500,
    stop_sequences=["6."],  # Stop before item 6
    messages=[{
        "role": "user",
        "content": "List programming languages. Number each one."
    }]
)

print("🛑 Stop at item 6:")
print(response_list.content[0].text)
print(f"\nStop reason: {response_list.stop_reason}")

### üí° Stop Sequence Use Cases

1. **Structured output**: Stop at delimiters ("###", "---", "END")
2. **Limiting lists**: Stop at a specific item number
3. **Sections**: Stop when a new section starts
4. **Dialogue**: Stop after one character speaks
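To build intuition for these use cases without calling the API, the behavior can be simulated locally. The sketch below assumes the semantics described earlier: output ends at the earliest match and the stop sequence itself is excluded. `truncate_at_stop` is a hypothetical helper, not part of the SDK.

```python
def truncate_at_stop(text, stop_sequences):
    """Cut text at the earliest stop sequence; return (kept_text, matched)."""
    earliest_index = None
    matched = None
    for stop in stop_sequences:
        index = text.find(stop)
        if index != -1 and (earliest_index is None or index < earliest_index):
            earliest_index, matched = index, stop
    if earliest_index is None:
        return text, None
    return text[:earliest_index], matched


draft = "1. Python\n2. Rust\n3. Go\n4. Java\n5. C\n6. Ruby\n7. Swift"
kept, matched = truncate_at_stop(draft, ["6.", "END"])
print(kept)
print(f"Matched: {matched!r}")
```

Trying a candidate stop sequence against sample text like this is a quick way to spot ones that are too common and would fire earlier than intended.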

---

## 🎯 Top-p (Nucleus Sampling)

Top-p is another way to control randomness (alternative to temperature).
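As a rough illustration of nucleus sampling, the sketch below applies a top-p cutoff to the toy word probabilities from the start of the lesson. It keeps the most likely tokens until their cumulative probability reaches `top_p`, then renormalizes. `top_p_filter` is a hypothetical name, and this illustrates the concept rather than Claude's exact implementation.

```python
def top_p_filter(probs, top_p):
    """Keep the smallest set of top tokens whose cumulative probability >= top_p."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept = []
    cumulative = 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(p for _, p in kept)
    # Renormalize so the surviving tokens form a valid distribution.
    return {token: p / total for token, p in kept}


probs = {"mat": 0.70, "chair": 0.15, "roof": 0.10, "skateboard": 0.05}
for cutoff in (0.1, 0.8, 1.0):
    print(cutoff, top_p_filter(probs, cutoff))
```

With a cutoff of 0.1 only "mat" survives, because that single token already covers more than 10% of the probability mass. Raising the cutoff lets less likely words back in.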

In [None]:
# Top-p example
response_topp = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=200,
    top_p=0.1,  # Sample only from the top tokens that together cover 10% of probability
    messages=[{
        "role": "user",
        "content": "Describe a sunset in poetic language."
    }]
)

# LINE-BY-LINE EXPLANATION:
# ---------------------------
# top_p=0.1
#   'p' stands for probability
#   0.1 means "keep the most likely tokens until their combined
#   probability reaches 10%, then sample only from those"
#   Lower top_p = more focused, less random
#   Higher top_p = more diverse
#   Range: 0.0 to 1.0
#   Don't use both temperature AND top_p at the same time!

print("🎯 TOP-P = 0.1 (Focused)")
print(response_topp.content[0].text)

### 🤔 Temperature vs Top-p

**Use temperature when**:
- You want simple control (0 = boring, 1 = creative)
- You're familiar with the concept

**Use top-p when**:
- You want more precise control over randomness
- You're doing advanced sampling

**Don't use both at the same time!** Pick one or the other.
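If you want code to enforce the "pick one" rule, a small guard can build the sampling arguments for you. This is a sketch. `sampling_kwargs` is a hypothetical helper, and the range checks simply mirror the 0.0 to 1.0 ranges stated in this lesson.

```python
def sampling_kwargs(temperature=None, top_p=None):
    """Build sampling keyword arguments, allowing only one randomness control."""
    if temperature is not None and top_p is not None:
        raise ValueError("Set temperature or top_p, not both.")
    kwargs = {}
    if temperature is not None:
        if not 0.0 <= temperature <= 1.0:
            raise ValueError("temperature must be between 0.0 and 1.0")
        kwargs["temperature"] = temperature
    if top_p is not None:
        if not 0.0 <= top_p <= 1.0:
            raise ValueError("top_p must be between 0.0 and 1.0")
        kwargs["top_p"] = top_p
    return kwargs


print(sampling_kwargs(temperature=0.2))
print(sampling_kwargs(top_p=0.9))
```

The returned dictionary can be unpacked into a request, for example `client.messages.create(..., **sampling_kwargs(temperature=0.2))`.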

---

## 🎯 Practice Exercise 1: Temperature Experiments

**Task**: Test how temperature affects creative vs factual outputs.

In [None]:
# YOUR TURN: Test different temperatures

def compare_temperatures(question, temps=(0.0, 0.5, 1.0)):
    """Compare the same question at different temperatures."""
    for temp in temps:
        response = client.messages.create(
            model="claude-3-5-sonnet-20241022",
            max_tokens=100,
            temperature=temp,
            messages=[{"role": "user", "content": question}]
        )
        print(f"\n🌡️ Temperature {temp}:")
        print(response.content[0].text)
        print("-" * 60)

# Test 1: Creative question (should show big differences)
print("\n" + "="*60)
print("TEST 1: Creative Question")
print("="*60)
compare_temperatures("Invent a new ice cream flavor.")

# Test 2: Factual question (should be similar)
print("\n" + "="*60)
print("TEST 2: Factual Question")
print("="*60)
compare_temperatures("What is the capital of France?")

## 🎯 Practice Exercise 2: Token Management

**Task**: Create a function that ensures complete responses.

In [None]:
# YOUR TURN: Build a smart token manager

def smart_ask(question, initial_tokens=100, max_retries=3):
    """
    Ask Claude a question and ensure we get a complete answer.
    If we hit max_tokens, retry with more tokens.
    """
    tokens = initial_tokens

    for attempt in range(1, max_retries + 1):
        response = client.messages.create(
            model="claude-3-5-sonnet-20241022",
            max_tokens=tokens,
            messages=[{"role": "user", "content": question}]
        )

        # Only 'max_tokens' means the answer was cut off; any other
        # stop reason (such as 'end_turn') means Claude stopped on its own.
        if response.stop_reason != "max_tokens":
            print(f"✅ Complete answer within {tokens} tokens!")
            return response.content[0].text

        # If cut off, double the budget for the next attempt
        print(f"⚠️ Response cut off at {tokens} tokens (attempt {attempt}/{max_retries}).")
        tokens *= 2

    # Out of retries: return the last (truncated) response
    print(f"❌ Still incomplete after {max_retries} attempts.")
    return response.content[0].text

# Test it!
result = smart_ask("Explain the history of the internet in detail.", initial_tokens=50)
print("\nFinal result:")
print(result)

## 🎯 Practice Exercise 3: Stop Sequences for Structured Data

**Task**: Use stop sequences to extract only what you need.

In [None]:
# Generate a recipe but stop after ingredients
recipe_response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=500,
    stop_sequences=["Instructions:", "Directions:"],
    messages=[{
        "role": "user",
        "content": "Write a recipe for chocolate chip cookies. Include ingredients and instructions."
    }]
)

print("🍪 Recipe (Ingredients Only):")
print(recipe_response.content[0].text)
print(f"\nStop reason: {recipe_response.stop_reason}")

In [None]:
# YOUR TURN: Generate a story but stop after the first paragraph

# Note: the API may reject stop sequences that contain only whitespace
# (such as "\n\n"), so this version asks Claude for a visible marker instead.
story_response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=500,
    stop_sequences=["###"],  # Stop at the paragraph marker
    messages=[{
        "role": "user",
        "content": "Write a mystery story about a detective. Put ### on its own line after each paragraph."
    }]
)

print("📖 Story (First Paragraph Only):")
print(story_response.content[0].text)

## 🎯 Challenge: Build a Consistent Data Extractor

**Goal**: Extract structured data consistently using low temperature.

In [None]:
# Build a contact info extractor

def extract_contact_info(text):
    """
    Extract name, email, and phone from text.
    Uses low temperature for consistency.
    """
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=200,
        temperature=0.0,  # Most consistent setting
        messages=[{
            "role": "user",
            "content": f"""Extract the contact information from this text.
Return ONLY in this format:
Name: [name]
Email: [email]
Phone: [phone]

Text: {text}"""
        }]
    )
    return response.content[0].text

# Test it multiple times - results should be nearly identical!
sample_text = "Hi, I'm John Doe. You can reach me at john.doe@email.com or call 555-1234."

print("Testing consistency with temperature=0.0...\n")
for i in range(3):
    print(f"Run {i+1}:")
    print(extract_contact_info(sample_text))
    print("-" * 40)
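Because the extractor returns a fixed `Name:` / `Email:` / `Phone:` layout, the text can be turned into a dictionary for downstream code. This is a minimal sketch. `parse_contact_block` is a hypothetical helper, and it assumes Claude followed the requested format exactly, with missing fields left as `None`.

```python
def parse_contact_block(block):
    """Parse 'Name: ...', 'Email: ...', 'Phone: ...' lines into a dict."""
    parsed = {"name": None, "email": None, "phone": None}
    for line in block.splitlines():
        # Split on the first colon only, so values may contain colons.
        label, separator, value = line.partition(":")
        key = label.strip().lower()
        if separator and key in parsed:
            parsed[key] = value.strip() or None
    return parsed


# Sample text in the format the prompt asks for (not real API output).
sample_output = "Name: John Doe\nEmail: john.doe@email.com\nPhone: 555-1234"
print(parse_contact_block(sample_output))
```

Checking for `None` values afterwards is a simple way to detect responses that drifted from the requested format.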

## 💡 Best Practices Summary

### Temperature
- ✅ Use 0.0 for: Data extraction, coding, factual Q&A
- ✅ Use 0.7-1.0 for: Creative writing, brainstorming
- ❌ Don't use temperature with top_p

### Max Tokens
- ✅ Set higher than needed, let Claude finish naturally
- ✅ Check `stop_reason` to verify completeness
- ✅ Use `usage.output_tokens` to track actual length
- ❌ Don't set too low and cut off responses

### Stop Sequences
- ✅ Use for structured output
- ✅ Use to limit length precisely
- ✅ Can have multiple stop sequences
- ❌ Don't make them too common (might stop too early)

---

## ✅ Lesson Complete!

### What You Learned:
- ✅ How temperature controls randomness (0 = consistent, 1 = creative)
- ✅ Max tokens sets length limits
- ✅ Stop sequences control output endpoints
- ✅ Top-p for advanced sampling
- ✅ When to use each parameter
- ✅ Building consistent extractors

### Key Concepts:
1. **Temperature = creativity dial** (0 to 1)
2. **Max tokens = length limit** (check stop_reason)
3. **Stop sequences = custom endpoints** (for structured output)
4. **Low temp = consistency** (for data extraction)

### Next Steps:
📖 **Lesson 1.4**: Working with JSON - Learn to get perfectly structured data!

---

## 🤔 Reflection Questions

1. When would you use temperature=0.0 vs temperature=1.0?
2. What happens if max_tokens is too small?
3. How can you tell if a response was cut off?
4. Give 3 examples of good stop sequences.
5. Why shouldn't you use temperature and top_p together?

Ready to continue? Open `lesson_1.4.ipynb`!