# How to Prompt with Reasoning Models (OpenAI & Claude)

This notebook demonstrates how to effectively use reasoning models like **OpenAI's o3/o4-mini** and **Claude's Extended Thinking** for various tasks, including structured prompting, few-shot learning, and generating responses with specific policies. Below is a detailed guide to help students navigate and understand the notebook.

## 1. Introduction
This notebook is designed to teach students how to:
* Load and configure API keys for reasoning models (OpenAI & Anthropic).
* List available OpenAI models programmatically.
* Apply best practices for prompting reasoning models.
* Use structured formats and few-shot examples to improve model responses.
* Compare different reasoning models (o3, o4-mini, o3-mini) and their costs.
* Understand when to use reasoning models vs. regular models.
* Leverage Claude's Extended Thinking feature for advanced reasoning.

## 2. Prerequisites
Before running the notebook, ensure the following:

**Python Environment:** Install Python 3.8+.

**Required Libraries:**
- `openai` - For OpenAI API access
- `anthropic` - For Claude API access
- `python-dotenv` - For environment variable management
- `IPython` - For enhanced display

**API Keys:**
Obtain API keys for OpenAI and Anthropic.
Store them in a `.env` file in the same directory as the notebook:

```bash
OPENAI_API_KEY=your_openai_api_key
ANTHROPIC_API_KEY=your_anthropic_api_key
```

**Install dependencies:**
```bash
pip install openai anthropic python-dotenv ipython jupyter
```

# Notebook Sections

## 📚 Loading API Keys

### Purpose
Load API keys securely using the `dotenv` library to authenticate with the OpenAI and Anthropic APIs.

### Code Implementation
```python
from dotenv import load_dotenv
import os

# Load environment variables from .env file
load_dotenv()

# Retrieve API keys from environment variables
openai_api_key = os.getenv("OPENAI_API_KEY")
anthropic_api_key = os.getenv("ANTHROPIC_API_KEY")

# Verify API keys are loaded successfully
if openai_api_key:
    print("✅ OpenAI API key loaded successfully")
else:
    print("❌ OpenAI API key not found")

if anthropic_api_key:
    print("✅ Anthropic API key loaded successfully")
else:
    print("❌ Anthropic API key not found")
```

### Expected Output
- Confirmation messages indicating whether API keys are successfully loaded
- Security best practice: API key values are not displayed for privacy protection

### Prerequisites
- Create a `.env` file in your project root directory
- Add your API keys in the format: `OPENAI_API_KEY=your_actual_key_here`
- Install required dependencies: `pip install python-dotenv`
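
As a minimal sketch of the check described above, the helper below reports which keys are missing without ever printing their values. The names `REQUIRED_KEYS` and `missing_keys` are illustrative and not part of the notebook; the function accepts a plain mapping so it can be tried without a real `.env` file.

```python
import os

# Hypothetical helper names; the key names match the .env file described above.
REQUIRED_KEYS = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY")


def missing_keys(env=None):
    """Return the names of required keys that are absent or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_KEYS if not env.get(name)]


# Example with a stand-in mapping instead of the real environment
print(missing_keys({"OPENAI_API_KEY": "placeholder"}))  # ['ANTHROPIC_API_KEY']
```

Calling `missing_keys()` with no argument after `load_dotenv()` checks the real environment.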

In [None]:
### Python Code:

import os
from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv()

# Fetch API keys
openai_api_key = os.getenv("OPENAI_API_KEY")
anthropic_api_key = os.getenv("ANTHROPIC_API_KEY")

# Check if API keys are loaded
if openai_api_key and anthropic_api_key:
    print("✅ API keys are successfully loaded.")
else:
    print("⚠️ Warning: One or more API keys are missing.")

# Optionally, display API keys (for debugging purposes only)
display_keys = False  # Change to True if you want to see the keys

if display_keys:
    print(f"OpenAI API Key: {openai_api_key}")
    print(f"Anthropic API Key: {anthropic_api_key}")
else:
    print("🔒 API keys are loaded but hidden for security.")

# How to List Available OpenAI Models via the API
* You can programmatically retrieve the list of available models from OpenAI using their Python client. This is useful to check which models (e.g., gpt-5, gpt-5-mini, gpt-5-nano, gpt-4-turbo, gpt-3.5-turbo, etc.) are accessible to your API key and account.

* Here's how you can do it:

In [None]:
import os
from openai import OpenAI
from dotenv import load_dotenv

# Load environment variables (if using .env for API key)
load_dotenv()

# Initialize OpenAI client
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

# List available models
models = client.models.list()
print("Available OpenAI Models:")
for model in models.data:
    print(model.id)
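
The full model list can be long, so it may help to narrow it to the reasoning models. The sketch below assumes that o-series model IDs start with `o` followed by a digit (as in `o3` and `o4-mini`); that naming pattern is an assumption, and `reasoning_model_ids` is an illustrative name.

```python
import re

# Assumption: o-series reasoning models have IDs that start with "o" and a digit.
O_SERIES = re.compile(r"^o\d")


def reasoning_model_ids(model_ids):
    """Keep only IDs that look like o-series reasoning models, sorted."""
    return sorted(mid for mid in model_ids if O_SERIES.match(mid))


# Sample IDs for illustration; a live call would supply the real list
sample_ids = ["gpt-4o", "o3", "o4-mini", "gpt-3.5-turbo", "o3-mini"]
print(reasoning_model_ids(sample_ids))  # ['o3', 'o3-mini', 'o4-mini']
```

With a live client, the input would be `[m.id for m in client.models.list().data]`.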

In [None]:
import os
from dotenv import load_dotenv
from openai import OpenAI
from anthropic import Anthropic
from IPython.display import display, Markdown, HTML
import time

load_dotenv()
openai_api_key = os.getenv("OPENAI_API_KEY")
anthropic_api_key = os.getenv("ANTHROPIC_API_KEY")

# Initialize clients
openai_client = OpenAI(api_key=openai_api_key)
anthropic_client = Anthropic(api_key=anthropic_api_key)

# OpenAI Models
GPT_MODEL = 'gpt-4o'  # Regular model for comparison
O3_MODEL = 'o3'  # Latest advanced reasoning model
O4_MINI_MODEL = 'o4-mini'  # Cost-effective reasoning model
O3_MINI_MODEL = 'o3-mini'  # Balanced reasoning model

# Claude Models
CLAUDE_OPUS = 'claude-opus-4-6'  # Most capable
CLAUDE_SONNET = 'claude-sonnet-4-5-20250929'  # Balanced

print("✅ All models configured successfully!")
print(f"📊 OpenAI Reasoning Models: {O3_MODEL}, {O4_MINI_MODEL}, {O3_MINI_MODEL}")
print(f"📊 Claude Models: {CLAUDE_OPUS}, {CLAUDE_SONNET}")
print(f"📊 Comparison Model: {GPT_MODEL}")

# 4 Principles on Prompting with Reasoning Models

When working with reasoning models like o3, o4-mini, and Claude with Extended Thinking, follow these four key principles:

## 1. Keep It Simple and Direct
Reasoning models work best with clear, straightforward prompts. Avoid unnecessary complexity or verbose instructions.

## 2. No Explicit Chain-of-Thought (CoT) Required
**Do NOT** provide step-by-step instructions like "Think through this step by step" or "Let's break this down". 

Reasoning models have built-in reasoning capabilities. Adding explicit CoT instructions can:
- Interfere with the model's internal reasoning process
- Lead to overly verbose outputs
- Cause inaccurate results or refusals

❌ **Bad:** "Think through this step by step, and don't skip any steps..."

✅ **Good:** "Generate a function that outputs the SMILES IDs for all the molecules involved in insulin."

## 3. Use Structured Formats
Leverage consistent structures like XML tags or markdown to organize your inputs:
- Helps the model parse complex instructions
- Ensures more uniform output
- Improves reliability in production systems

**Example:** Use `<instructions>`, `<policy>`, `<query>` tags to structure your prompts.
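
One way to keep the tagged sections consistent across requests is a small builder function. This is a sketch only: `tag` and `build_structured_prompt` are hypothetical helper names, and the tag names follow the example above.

```python
def tag(name, body):
    """Wrap body in a pseudo-XML tag."""
    return f"<{name}>{body}</{name}>"


def build_structured_prompt(instructions, policy, query):
    """Join the three tagged sections, one per line."""
    return "\n".join([
        tag("instructions", instructions),
        tag("policy", policy),
        tag("query", query),
    ])


print(build_structured_prompt(
    "Answer as a support assistant.",
    "Only discuss refunds and product information.",
    "Can I return my order?",
))
```

Keeping the policy and the user's query in separate tags makes it easy to swap the query while the rest of the prompt stays fixed.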

## 4. Show Rather Than Tell (Few-Shot Learning)
Instead of explaining what you want, provide 1-2 examples:
- Demonstrates the desired format and style
- Provides domain context naturally
- Often more effective than lengthy instructions

**Example:** Show an example of a legal response before asking for a similar analysis.
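
A few-shot prompt can be assembled the same way. The sketch below is illustrative (`build_few_shot_prompt` is a hypothetical name) and wraps each question and response pair in an `<example>` block ahead of the real query.

```python
def build_few_shot_prompt(role, examples, query):
    """Build a prompt with one <example> block per (question, response) pair."""
    parts = [f"<prompt>{role}</prompt>"]
    for question, response in examples:
        parts.append(
            "<example>\n"
            f"<question>{question}</question>\n"
            f"<response>{response}</response>\n"
            "</example>"
        )
    parts.append(f"<query>{query}</query>")
    return "\n".join(parts)


print(build_few_shot_prompt(
    "You are a lawyer specializing in competition law.",
    [("Can I run a joint campaign with a competitor?",
      "It can raise antitrust concerns; here is what to consider...")],
    "Is exclusive dealing legal?",
))
```

One or two examples are usually enough; each extra example adds input tokens to every request.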

In [None]:
bad_prompt = ("Generate a function that outputs the SMILES IDs for all the molecules involved in insulin.\n"
              "Think through this step by step, and don't skip any steps:\n"
              "- Identify all the molecules involved in insulin\n"
              "- Make the function\n"
              "- Loop through each molecule, outputting each into the function and returning a SMILES ID\n"
              "Molecules: ")

try:
    response = openai_client.chat.completions.create(
        model=GPT_MODEL, 
        messages=[{"role":"user","content": bad_prompt}]
    )
    print(f"✅ Response generated successfully with {GPT_MODEL}")
except Exception as e:
    print(f"❌ Error: {e}")

In [None]:
from IPython.display import display, HTML, Markdown

display(HTML('<div style="background-color: #f0fff8; padding: 10px; border-radius: 5px; border: 1px solid #d3d3d3;"></hr><h2>🔽 &nbsp; Markdown Output – Beginning</h2></hr></div>'))
display(Markdown(response.choices[0].message.content))
display(HTML('<div style="background-color: #fff4f4; padding: 10px; border-radius: 5px; border: 1px solid #d3d3d3;"></hr><h2>🔼 &nbsp; Markdown Output – End</h2></hr></div>'))

In [None]:
good_prompt = ("Generate a function that outputs the SMILES IDs for all the molecules involved in insulin.")

try:
    start_time = time.time()
    response = openai_client.chat.completions.create(
        model=O3_MODEL, 
        messages=[{"role":"user","content": good_prompt}]
    )
    end_time = time.time()
    
    # Display metrics
    print(f"✅ Response generated successfully with {O3_MODEL}")
    print(f"⏱️ Response time: {end_time - start_time:.2f}s")
    # Reasoning tokens are reported under usage.completion_tokens_details
    details = getattr(response.usage, 'completion_tokens_details', None)
    reasoning_tokens = getattr(details, 'reasoning_tokens', None)
    if reasoning_tokens is not None:
        print(f"🧠 Reasoning tokens: {reasoning_tokens}")
    print(f"📝 Completion tokens: {response.usage.completion_tokens}")
    print(f"🎫 Total tokens: {response.usage.total_tokens}")
except Exception as e:
    print(f"❌ Error: {e}")

In [None]:
display(HTML('<div style="background-color: #f0fff8; padding: 10px; border-radius: 5px; border: 1px solid #d3d3d3;"></hr><h2>🔽 &nbsp; Markdown Output – Beginning</h2></hr></div>'))
display(Markdown(response.choices[0].message.content))
display(HTML('<div style="background-color: #fff4f4; padding: 10px; border-radius: 5px; border: 1px solid #d3d3d3;"></hr><h2>🔼 &nbsp; Markdown Output – End</h2></hr></div>'))

# 3. Use structured formats
* Using a consistent structure like XML or markdown can help structure your inputs and ensure a more uniform output. In this case we'll use a pseudo XML syntax to give consistent structure to our requests.

In [None]:
structured_prompt = ("<instructions>You are a customer service assistant for AnyCorp, a provider "
          "of fine storage solutions. Your role is to follow your policy to answer the user's question. "
          "Be kind and respectful at all times.</instructions>\n"
          "<policy>**AnyCorp Customer Service Assistant Policy**\n\n"
            "1. **Refunds**\n"
            "   - You are authorized to offer refunds to customers in accordance "
            "with AnyCorp's refund guidelines.\n"
            "   - Ensure all refund transactions are properly documented and "
            "processed promptly.\n\n"
            "2. **Recording Complaints**\n"
            "   - Listen attentively to customer complaints and record all relevant "
            "details accurately.\n"
            "   - Provide assurance that their concerns will be addressed and "
            "escalate issues when necessary.\n\n"
            "3. **Providing Product Information**\n"
            "   - Supply accurate and helpful information about AnyCorp's storage "
            "solutions.\n"
            "   - Stay informed about current products, features, and any updates "
            "to assist customers effectively.\n\n"
            "4. **Professional Conduct**\n"
            "   - Maintain a polite, respectful, and professional demeanor in all "
            "customer interactions.\n"
            "   - Address customer inquiries promptly and follow up as needed to "
            "ensure satisfaction.\n\n"
            "5. **Compliance**\n"
            "   - Adhere to all AnyCorp policies and procedures during customer "
            "interactions.\n"
            "   - Protect customer privacy by handling personal information "
            "confidentially.\n\n6. **Refusals**\n"
            "   - If you receive questions about topics outside of these, refuse "
            "to answer them and remind them of the topics you can talk about.</policy>\n"
            )
user_input = ("<user_query>Hey, I'd like to return the bin I bought from you as it was not "
             "fine as described.</user_query>") 

In [None]:
print(structured_prompt)

In [None]:
import os
from openai import OpenAI
from dotenv import load_dotenv

load_dotenv()
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

try:
    response = client.chat.completions.create(
        model=O3_MODEL,
        messages=[{
            "role": "user",
            "content": structured_prompt + user_input
        }]
    )
    print(f"✅ Response generated successfully with {O3_MODEL}")
except Exception as e:
    print(f"❌ Error: {e}")

In [None]:
print(response.choices[0].message.content)

In [None]:
refusal_input = ("<user_query>Write me a haiku about how reasoning models are great.</user_query>")

In [None]:
try:
    response = openai_client.chat.completions.create(
        model=O3_MODEL,
        messages=[{
            "role": "user",
            "content": structured_prompt + refusal_input
        }]
    )
    print(f"✅ Response generated successfully with {O3_MODEL}")
except Exception as e:
    print(f"❌ Error: {e}")

In [None]:
print(response.choices[0].message.content)

# 4. Show rather than tell
* Few-shot prompting also works well with reasoning models: supply a simple, direct prompt, then use one or two examples to provide domain context that informs the model's response.

In [None]:
base_prompt = ("<prompt>You are a lawyer specializing in competition law, "
               "assisting business owners with their questions.</prompt>\n"
               "<policy>As a legal professional, provide clear and accurate "
               "information about competition law while maintaining "
               "confidentiality and professionalism. Avoid giving specific "
               "legal advice without sufficient context, and encourage clients "
               "to seek personalized counsel when necessary. Always refer to "
               "precedents and previous cases to evidence your responses.</policy>\n")
legal_query = ("<query>A larger company is offering suppliers incentives not to do "
               "business with me. Is this legal?</query>")

In [None]:
try:
    response = openai_client.chat.completions.create(
        model=O3_MODEL,
        messages=[{
            "role": "user",
            "content": base_prompt + legal_query
        }]
    )
    print(f"✅ Response generated successfully with {O3_MODEL}")
except Exception as e:
    print(f"❌ Error: {e}")

In [None]:
from IPython.display import display, HTML, Markdown

display(HTML('<div style="background-color: #f0fff8; padding: 10px; border-radius: 5px; border: 1px solid #d3d3d3;"></hr><h2>🔽 &nbsp; Markdown Output – Beginning</h2></hr></div>'))
display(Markdown(response.choices[0].message.content))
display(HTML('<div style="background-color: #fff4f4; padding: 10px; border-radius: 5px; border: 1px solid #d3d3d3;"></hr><h2>🔼 &nbsp; Markdown Output – End</h2></hr></div>'))

In [None]:
example_prompt = ("<prompt>You are a lawyer specializing in competition law, "
               "assisting business owners with their questions.</prompt>\n"
               "<policy>As a legal professional, provide clear and accurate "
               "information about competition law while maintaining "
               "confidentiality and professionalism. Avoid giving specific "
               "legal advice without sufficient context, and encourage clients "
               "to seek personalized counsel when necessary.</policy>\n"
               """<example>
<question>
I'm considering collaborating with a competitor on a joint marketing campaign. Are there any antitrust issues I should be aware of?
</question>
<response>
Collaborating with a competitor on a joint marketing campaign can raise antitrust concerns under U.S. antitrust laws, particularly the Sherman Antitrust Act of 1890 (15 U.S.C. §§ 1–7). Section 1 of the Sherman Act prohibits any contract, combination, or conspiracy that unreasonably restrains trade or commerce among the states.

**Key Considerations:**

1. **Per Se Illegal Agreements:** Certain collaborations are considered automatically illegal ("per se" violations), such as price-fixing, bid-rigging, and market allocation agreements. For example, in *United States v. Topco Associates, Inc.*, 405 U.S. 596 (1972), the Supreme Court held that market division agreements between competitors are per se illegal under the Sherman Act.

2. **Rule of Reason Analysis:** Collaborations that are not per se illegal are evaluated under the "rule of reason," which assesses whether the pro-competitive benefits outweigh the anti-competitive effects. In *Broadcast Music, Inc. v. Columbia Broadcasting System, Inc.*, 441 U.S. 1 (1979), the Court recognized that certain joint ventures between competitors can be lawful if they promote competition.

3. **Information Sharing Risks:** Sharing competitively sensitive information, such as pricing strategies or customer data, can lead to antitrust violations. The Department of Justice and the Federal Trade Commission caution against exchanges that could facilitate collusion (*Antitrust Guidelines for Collaborations Among Competitors*, 2000).

**Recommendations:**

- **Define the Scope:** Clearly delineate the parameters of the collaboration to focus on the marketing campaign without involving competitive aspects like pricing or market division.
- **Implement Safeguards:** Establish protocols to prevent the exchange of sensitive information that is not essential to the marketing effort.
- **Legal Consultation:** Given the complexities of antitrust laws, consult with a legal professional to ensure the collaboration complies with all legal requirements.

**Conclusion:**

While joint marketing campaigns between competitors are not inherently illegal, they must be structured carefully to avoid antitrust pitfalls. Legal guidance is essential to navigate these issues and to design a collaboration that achieves your business objectives without violating antitrust laws.
</response>
</example>""")

In [None]:
try:
    response = openai_client.chat.completions.create(
        model=O3_MODEL,
        messages=[{
            "role": "user",
            "content": example_prompt + legal_query
        }]
    )
    print(f"✅ Response generated successfully with {O3_MODEL}")
except Exception as e:
    print(f"❌ Error: {e}")

In [None]:
display(HTML('<div style="background-color: #f0fff8; padding: 10px; border-radius: 5px; border: 1px solid #d3d3d3;"></hr><h2>🔽 &nbsp; Markdown Output – Beginning</h2></hr></div>'))
display(Markdown(response.choices[0].message.content))
display(HTML('<div style="background-color: #fff4f4; padding: 10px; border-radius: 5px; border: 1px solid #d3d3d3;"></hr><h2>🔼 &nbsp; Markdown Output – End</h2></hr></div>'))

# Summary & Best Practices

## 🎓 Key Takeaways:

1. **Keep prompts simple** - Let reasoning models figure out the steps
2. **Don't provide explicit CoT** - It can interfere with internal reasoning
3. **Use structured formats** - XML tags help organize complex requests
4. **Show examples** - Few-shot learning works great with reasoning models
5. **Choose the right model** - Consider task complexity, budget, and speed needs
6. **Monitor costs** - Reasoning tokens are billed as output tokens, so reasoning models often cost more per request
7. **Use Claude Extended Thinking** - When you need transparency in reasoning

## 📚 Further Learning:

- **OpenAI Documentation**: https://platform.openai.com/docs/models
- **Anthropic Documentation**: https://docs.anthropic.com/
- **Prompt Engineering Guide**: https://www.promptingguide.ai/

## 🚀 Next Steps:

1. Try the interactive testing function with your own prompts
2. Experiment with different models on the same task
3. Compare cost vs. quality for your specific use case
4. Practice with the 4 principles in real scenarios

---

**Happy Reasoning! 🧠✨**

In [None]:
# Try Your Own Prompt! (Interactive Example)

# Uncomment and modify this to test with your own prompt:
# test_reasoning_model(
#     prompt="Your question here",
#     model_name="o3",  # Options: 'o3', 'o4-mini', 'o3-mini', 'gpt-4o'
#     display_metrics=True
# )

print("💡 Ready to test! Uncomment the code above and add your prompt.")

# Model Selection Guide

## 🤔 Which Model Should I Use?

### Decision Tree:

```
START: What's your task?
│
├─ Simple factual query? → Use gpt-4o (regular model)
│
├─ Complex reasoning required?
│   │
│   ├─ Budget is NOT a concern? → Use o3 or Claude Opus
│   │
│   ├─ Need cost-effective solution? → Use o4-mini or o3-mini
│   │
│   └─ Need transparency in reasoning? → Use Claude with Extended Thinking
│
└─ Creative writing/generation? → Use gpt-4o (reasoning not needed)
```
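
The decision tree can also be written as a small function. This is a sketch under one assumption the tree itself does not state: when several sub-branches apply, the need for visible reasoning is checked first, then budget. `choose_model` and its parameter names are illustrative.

```python
def choose_model(complex_reasoning, budget_sensitive=False, needs_visible_reasoning=False):
    """Return a model suggestion following the decision tree above."""
    if not complex_reasoning:
        # Simple factual queries and creative writing both land here
        return "gpt-4o"
    if needs_visible_reasoning:
        # Assumed priority: transparency is checked before budget
        return "Claude with Extended Thinking"
    if budget_sensitive:
        return "o4-mini or o3-mini"
    return "o3 or Claude Opus"


print(choose_model(complex_reasoning=True, budget_sensitive=True))  # o4-mini or o3-mini
```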

## ✅ When TO Use Reasoning Models:

1. **Complex Multi-Step Problems**
   - Mathematical proofs
   - Logic puzzles
   - Strategic planning

2. **Code Analysis & Debugging**
   - Finding subtle bugs
   - Analyzing complex algorithms
   - Architecture decisions

3. **Nuanced Decision Making**
   - Ethical dilemmas
   - Trade-off analysis
   - Legal reasoning

4. **Research & Analysis**
   - Literature review
   - Data interpretation
   - Comparative analysis

## ❌ When NOT to Use Reasoning Models:

1. **Simple Tasks** - Overkill for basic queries
2. **Creative Writing** - Regular models often better
3. **Quick Responses Needed** - Reasoning models are slower
4. **High-Volume Applications** - Cost adds up quickly
5. **Straightforward Summarization** - Regular models sufficient

## 💡 Model-Specific Recommendations:

### Use **o3** when:
- Accuracy is critical
- Problem requires deep reasoning
- Budget allows ($15-60 per 1M tokens)
- Examples: Medical diagnosis, legal analysis, complex math

### Use **o4-mini / o3-mini** when:
- Good reasoning needed at lower cost
- Moderate complexity problems
- High-volume applications
- Examples: Customer support, code review, tutoring

### Use **Claude Opus with Extended Thinking** when:
- You need to see the reasoning process
- Building AI systems that explain decisions
- Educational purposes (showing how AI thinks)
- Examples: AI safety research, transparent decision-making

### Use **gpt-4o** (regular) when:
- Speed matters more than reasoning depth
- Simple factual questions
- Creative tasks
- General conversation
- Examples: Chatbots, content generation, translation

In [None]:
def test_reasoning_model(prompt, model_name='o3', display_metrics=True):
    """
    Test any reasoning model with your custom prompt
    
    Args:
        prompt: Your question or task
        model_name: 'o3', 'o4-mini', 'o3-mini', or 'gpt-4o'
        display_metrics: Whether to show performance metrics
    """
    model_map = {
        'o3': O3_MODEL,
        'o4-mini': O4_MINI_MODEL,
        'o3-mini': O3_MINI_MODEL,
        'gpt-4o': GPT_MODEL
    }
    
    if model_name not in model_map:
        print(f"❌ Unknown model: {model_name}")
        print(f"Available models: {list(model_map.keys())}")
        return
    
    try:
        print(f"🤖 Testing with {model_name}...\n")
        start_time = time.time()
        
        response = openai_client.chat.completions.create(
            model=model_map[model_name],
            messages=[{"role": "user", "content": prompt}],
            max_completion_tokens=2000
        )
        
        end_time = time.time()
        
        if display_metrics:
            print(f"⏱️ Time: {end_time - start_time:.2f}s")
            print(f"🎫 Tokens: {response.usage.total_tokens}")
            cost = calculate_cost(response.usage, model_map[model_name])
            print(f"💰 Cost: ${cost:.4f}\n")
        
        display(HTML(f'<div style="background-color: #f5f5f5; padding: 15px; border-radius: 5px; border-left: 4px solid #2196f3; margin: 10px 0;"><h3>🎯 Response from {model_name}</h3></div>'))
        display(Markdown(response.choices[0].message.content))
        
        return response
        
    except Exception as e:
        print(f"❌ Error: {e}")
        return None

print("✅ Interactive testing function ready!")
print("\n📝 Usage example:")
print('test_reasoning_model("Your prompt here", model_name="o3")')

# Interactive Testing Function

Now you can test your own prompts with any reasoning model!

In [None]:
# Example 3: Multi-Step Planning

planning_prompt = """Plan a 3-day trip to Singapore with the following constraints:
- Budget: $500 USD total
- Interests: Technology, food, culture
- Must visit at least 2 museums
- Need to account for transportation between locations
- Want to try at least 5 different local dishes

Provide a detailed day-by-day itinerary with estimated costs."""

try:
    print("🗺️ Testing Multi-Step Planning with o4-mini...\n")
    response = openai_client.chat.completions.create(
        model=O4_MINI_MODEL,
        messages=[{"role": "user", "content": planning_prompt}],
        max_completion_tokens=1200
    )
    
    display(HTML('<div style="background-color: #e0f7fa; padding: 15px; border-radius: 5px; margin: 10px 0;"><h3>🏖️ Travel Itinerary</h3></div>'))
    display(Markdown(response.choices[0].message.content))
    
except Exception as e:
    print(f"❌ Error: {e}")

In [None]:
# Example 2: Code Debugging

debug_prompt = """Find and explain the bug in this Python code:

```python
def find_duplicates(arr):
    seen = {}
    duplicates = []
    for num in arr:
        if seen[num]:
            duplicates.append(num)
        seen[num] = True
    return duplicates

# Test
result = find_duplicates([1, 2, 3, 2, 4, 5, 3])
print(result)
```

Explain what's wrong and provide the corrected code."""

try:
    print("🐛 Testing Code Debugging with o3...\n")
    response = openai_client.chat.completions.create(
        model=O3_MODEL,
        messages=[{"role": "user", "content": debug_prompt}],
        max_completion_tokens=800
    )
    
    display(HTML('<div style="background-color: #e8eaf6; padding: 15px; border-radius: 5px; margin: 10px 0;"><h3>🔧 Debugging Analysis</h3></div>'))
    display(Markdown(response.choices[0].message.content))
    
except Exception as e:
    print(f"❌ Error: {e}")

In [None]:
# Example 1: Mathematical Proof

math_prompt = """Prove that the square root of 2 is irrational using proof by contradiction. 
Provide a complete, rigorous mathematical proof."""

try:
    print("🧮 Testing Mathematical Reasoning with o3...\n")
    response = openai_client.chat.completions.create(
        model=O3_MODEL,
        messages=[{"role": "user", "content": math_prompt}],
        max_completion_tokens=1000
    )
    
    display(HTML('<div style="background-color: #f3e5f5; padding: 15px; border-radius: 5px; margin: 10px 0;"><h3>📐 Mathematical Proof Result</h3></div>'))
    display(Markdown(response.choices[0].message.content))
    
except Exception as e:
    print(f"❌ Error: {e}")

# Advanced Reasoning Examples

Let's test reasoning models with more complex, real-world scenarios that require multi-step thinking.

## Example Categories:
1. **Mathematical Proofs** - Formal reasoning with mathematics
2. **Code Debugging** - Finding and explaining bugs
3. **Multi-Step Planning** - Complex task decomposition
4. **Ethical Dilemmas** - Nuanced reasoning about trade-offs
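
The example cells in this part of the notebook cap output with `max_completion_tokens`. For reasoning models that cap covers hidden reasoning tokens as well as the visible answer, so a low value can leave little or nothing for the answer itself. The sketch below shows one way to spot that case; `describe_completion` is a hypothetical helper, and the `fake` object is only a stand-in shaped like a Chat Completions response.

```python
from types import SimpleNamespace


def describe_completion(response):
    """Summarize whether the cap was hit and how many tokens went to reasoning."""
    choice = response.choices[0]
    details = getattr(response.usage, "completion_tokens_details", None)
    return {
        "truncated": choice.finish_reason == "length",
        "empty": not (choice.message.content or "").strip(),
        "reasoning_tokens": getattr(details, "reasoning_tokens", None),
    }


# Stand-in object shaped like a Chat Completions response (not a real API call)
fake = SimpleNamespace(
    choices=[SimpleNamespace(finish_reason="length",
                             message=SimpleNamespace(content=""))],
    usage=SimpleNamespace(
        completion_tokens_details=SimpleNamespace(reasoning_tokens=800)),
)
print(describe_completion(fake))
```

If `truncated` is true and `empty` is true, raising `max_completion_tokens` is usually the first thing to try.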

In [None]:
# Compare Multiple Models on the Same Task

test_prompt = "Solve this logic puzzle: Five houses in a row are painted different colors. The green house is immediately to the left of the white house. The person in the green house drinks coffee. The person in the center house drinks milk. Who drinks water and in which house?"

models_to_test = [
    ('o3', O3_MODEL, openai_client, 'openai'),
    ('o4-mini', O4_MINI_MODEL, openai_client, 'openai'),
    ('gpt-4o', GPT_MODEL, openai_client, 'openai')
]

print("🔬 Running Model Comparison...\n")
results = []

for name, model, client, api_type in models_to_test:
    try:
        start_time = time.time()
        
        if api_type == 'openai':
            response = client.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": test_prompt}],
                max_completion_tokens=500
            )
        
        end_time = time.time()
        elapsed = end_time - start_time
        cost = calculate_cost(response.usage, model)
        
        results.append({
            'model': name,
            'time': elapsed,
            'tokens': response.usage.total_tokens,
            'cost': cost,
            'success': True
        })
        
        print(f"✅ {name}: {elapsed:.2f}s | {response.usage.total_tokens} tokens | ${cost:.4f}")
        
    except Exception as e:
        print(f"❌ {name}: Error - {str(e)[:50]}...")
        results.append({
            'model': name,
            'success': False,
            'error': str(e)
        })

print("\n📊 Comparison complete!")
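
Once the loop has filled `results`, a short helper can pick out the cheapest run that succeeded. This is a sketch: `cheapest_successful` is a hypothetical name, and the sample numbers below are made up for illustration rather than measured.

```python
def cheapest_successful(results):
    """Return the successful result with the lowest cost, or None if all failed."""
    ok = [r for r in results if r.get("success")]
    return min(ok, key=lambda r: r["cost"]) if ok else None


# Made-up sample data in the same shape as the results list built above
sample_results = [
    {"model": "o3", "time": 12.4, "tokens": 900, "cost": 0.031, "success": True},
    {"model": "o4-mini", "time": 6.1, "tokens": 850, "cost": 0.004, "success": True},
    {"model": "gpt-4o", "success": False, "error": "timeout"},
]
best = cheapest_successful(sample_results)
print(best["model"])  # o4-mini
```

Cost alone says nothing about answer quality, so the responses themselves still need to be compared.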

In [None]:
# Cost Calculator Function

def calculate_cost(usage, model_name):
    """Calculate approximate cost for API call"""
    # Pricing in dollars per 1M tokens (approximate)
    pricing = {
        'o3': {'input': 15.00, 'output': 60.00},
        'o4-mini': {'input': 1.10, 'output': 4.40},
        'o3-mini': {'input': 1.10, 'output': 4.40},
        'gpt-4o': {'input': 2.50, 'output': 10.00},
        'claude-opus-4-6': {'input': 15.00, 'output': 75.00},
        'claude-sonnet-4-5-20250929': {'input': 3.00, 'output': 15.00}
    }
    
    if model_name not in pricing:
        # Unknown model: return NaN so callers can still format the result as a number
        return float("nan")
    
    input_tokens = getattr(usage, 'input_tokens', 0) or getattr(usage, 'prompt_tokens', 0)
    output_tokens = getattr(usage, 'output_tokens', 0) or getattr(usage, 'completion_tokens', 0)
    
    input_cost = (input_tokens / 1_000_000) * pricing[model_name]['input']
    output_cost = (output_tokens / 1_000_000) * pricing[model_name]['output']
    
    return input_cost + output_cost

print("✅ Cost calculator function ready!")

# Model Comparison: Performance & Cost Analysis

Let's compare different reasoning models across multiple dimensions:
- **Response Quality**: How well does it solve complex problems?
- **Speed**: Time to generate response
- **Cost**: API pricing (approximate)
- **Token Usage**: Input + output tokens

## Pricing Reference (Approximate as of 2025):

### OpenAI Models:
| Model | Input (per 1M tokens) | Output (per 1M tokens) | Best For |
|-------|----------------------|------------------------|----------|
| o3 | $15.00 | $60.00 | Most complex reasoning |
| o4-mini | $1.10 | $4.40 | Fast reasoning, budget-friendly |
| o3-mini | $1.10 | $4.40 | Balanced reasoning |
| gpt-4o | $2.50 | $10.00 | General tasks |

### Claude Models:
| Model | Input (per 1M tokens) | Output (per 1M tokens) | Extended Thinking |
|-------|----------------------|------------------------|-------------------|
| Claude Opus | $15.00 | $75.00 | Available |
| Claude Sonnet | $3.00 | $15.00 | Available |

**Note:** Prices may vary. Check official pricing pages for current rates.
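
As a worked example of reading the table, the sketch below prices a single request of 1,000 input tokens and 2,000 output tokens at the o4-mini rates listed above. `estimate_cost` is an illustrative helper, and the rates are the approximate figures from the table, not live pricing.

```python
def estimate_cost(input_tokens, output_tokens, input_price, output_price):
    """Cost in dollars, with prices quoted per 1M tokens."""
    input_cost = (input_tokens / 1_000_000) * input_price
    output_cost = (output_tokens / 1_000_000) * output_price
    return input_cost + output_cost


# o4-mini row from the table: $1.10 input, $4.40 output per 1M tokens
cost = estimate_cost(1_000, 2_000, 1.10, 4.40)
print(f"${cost:.4f}")  # $0.0099
```

The input side contributes $0.0011 and the output side $0.0088, so output tokens dominate the cost in this example.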

In [None]:
# Display Claude's Response with Thinking

if 'response' in locals():
    # Check if this is a Claude response (has .content as a list)
    if hasattr(response, 'content') and isinstance(response.content, list):
        display(HTML('<div style="background-color: #e8f5e9; padding: 15px; border-radius: 5px; border-left: 4px solid #4caf50; margin: 10px 0;"><h3>🧠 Claude\'s Thinking Process & Response</h3></div>'))
        
        for block in response.content:
            if block.type == "thinking":
                display(HTML('<div style="background-color: #fff3e0; padding: 10px; border-radius: 5px; margin: 5px 0;"><strong>💭 Internal Reasoning:</strong></div>'))
                # Thinking blocks carry their text in .thinking, not .text
                display(Markdown(block.thinking))
            elif block.type == "text":
                display(HTML('<div style="background-color: #e3f2fd; padding: 10px; border-radius: 5px; margin: 5px 0;"><strong>💬 Final Response:</strong></div>'))
                display(Markdown(block.text))
        
        display(HTML('<div style="background-color: #fce4ec; padding: 10px; border-radius: 5px; margin: 10px 0;"><strong>🔚 End of Claude Response</strong></div>'))
    else:
        print("⚠️ The 'response' variable contains an OpenAI response, not a Claude response.")
        print("ℹ️  This cell is designed to display Claude's Extended Thinking output.")
        print("📝 Please run the Claude Extended Thinking cell (above) first.")
else:
    print("⚠️ No response to display. Run the Claude Extended Thinking cell first.")
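
Outside a notebook, the same separation can be done without any display calls. The sketch below assumes the block shapes used above: `thinking` blocks expose a `.thinking` string and `text` blocks expose `.text`. Any other block type is skipped. `split_blocks` is a hypothetical helper and the sample blocks are stand-ins, not real API output.

```python
from types import SimpleNamespace


def split_blocks(content):
    """Separate thinking text from answer text in a list of content blocks."""
    thinking, answer = [], []
    for block in content:
        if block.type == "thinking":
            thinking.append(block.thinking)
        elif block.type == "text":
            answer.append(block.text)
    return "\n".join(thinking), "\n".join(answer)


# Stand-in blocks shaped like a response with Extended Thinking enabled
sample_content = [
    SimpleNamespace(type="thinking", thinking="Compare the two options first."),
    SimpleNamespace(type="text", text="Option B is cheaper."),
]
reasoning, final_answer = split_blocks(sample_content)
print(final_answer)  # Option B is cheaper.
```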

In [None]:
# Example: Claude with Extended Thinking

try:
    print(f"🤖 Testing Claude {CLAUDE_OPUS} with Extended Thinking...")
    start_time = time.time()
    
    response = anthropic_client.messages.create(
        model=CLAUDE_OPUS,
        max_tokens=4000,
        thinking={
            "type": "enabled",
            "budget_tokens": 2000
        },
        messages=[{
            "role": "user",
            "content": good_prompt
        }]
    )
    
    end_time = time.time()
    
    # Display metrics
    print("✅ Response generated successfully")
    print(f"⏱️ Response time: {end_time - start_time:.2f}s")
    print(f"📝 Total tokens: {response.usage.input_tokens + response.usage.output_tokens}")
    print("\n📊 Response Structure:")
    
    # Claude's response may include thinking blocks, which carry their text in .thinking
    for block in response.content:
        block_text = getattr(block, 'text', None) or getattr(block, 'thinking', None)
        print(f"  - {block.type}: {len(block_text) if block_text else 'N/A'} characters")
    
except Exception as e:
    print(f"❌ Error with Claude Extended Thinking: {e}")
    print("ℹ️ Note: Extended Thinking may not be available on all accounts or regions")

# Claude's Extended Thinking Feature

Claude doesn't have separate "reasoning models" like OpenAI's o-series. Instead, Claude offers an **Extended Thinking** feature that can be enabled on supported Claude models, such as the Opus and Sonnet models configured in this notebook.

## How Extended Thinking Works:
- You enable it by adding a `thinking` parameter
- The response includes the model's reasoning in `thinking` content blocks ahead of the final answer
- You can control the "thinking budget" (how many tokens to use for reasoning)
- Best for complex problems requiring multi-step reasoning
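
The request shape can be sketched as a function that builds the keyword arguments for `messages.create`. Two constraints are assumed here from Anthropic's documentation as understood at the time of writing: the thinking budget has a minimum of 1,024 tokens, and it must be smaller than `max_tokens`. `thinking_request` and `MIN_THINKING_BUDGET` are illustrative names.

```python
MIN_THINKING_BUDGET = 1024  # assumed minimum budget; check the current docs


def thinking_request(prompt, model, max_tokens=4000, budget_tokens=2000):
    """Build keyword arguments for a Messages API call with thinking enabled."""
    if budget_tokens < MIN_THINKING_BUDGET:
        raise ValueError(f"budget_tokens must be at least {MIN_THINKING_BUDGET}")
    if budget_tokens >= max_tokens:
        raise ValueError("budget_tokens must be smaller than max_tokens")
    return {
        "model": model,
        "max_tokens": max_tokens,
        "thinking": {"type": "enabled", "budget_tokens": budget_tokens},
        "messages": [{"role": "user", "content": prompt}],
    }


# "example-model-id" is a placeholder, not a real model name
kwargs = thinking_request("Explain the trade-offs.", "example-model-id")
print(kwargs["thinking"])  # {'type': 'enabled', 'budget_tokens': 2000}
```

In the notebook this would be used as `anthropic_client.messages.create(**thinking_request(prompt, CLAUDE_OPUS))`.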

## When to Use Claude Extended Thinking:
- Complex analytical tasks
- Multi-step problem solving
- When you need transparency in reasoning
- Situations requiring careful deliberation