# GPT-5 Reasoning Models: A Practical Guide

This notebook demonstrates OpenAI's GPT-5 reasoning capabilities using the **Responses API**.

## What Are Reasoning Models?

Reasoning models emit **hidden reasoning tokens** before generating their final answer. This allows them to:
- Break down complex problems into steps
- Explore multiple approaches before committing to an answer
- Verify their work and catch mistakes
- Handle multi-step tasks more reliably

Think of it like working a problem out on scratch paper in math class: the model "thinks through" the problem first, and you see only the final answer.

## Setup

First, let's set up our OpenAI client with the Responses API.

In [1]:
import os
import getpass
from openai import OpenAI

# Set API key
if not os.environ.get("OPENAI_API_KEY"):
    os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter your OpenAI API key: ")

client = OpenAI()
print("✓ OpenAI client initialized")

✓ OpenAI client initialized


## Reasoning Effort Levels

GPT-5 supports four reasoning effort levels:

| Level | Use When | Speed | Cost | Quality |
|-------|----------|-------|------|----------|
| **minimal** | Simple classification, extraction, formatting | Fastest | Lowest | Good |
| **low** | Straightforward tasks with some reasoning | Fast | Low | Better |
| **medium** | Balanced default for most workflows | Moderate | Moderate | Great |
| **high** | Complex multi-step tasks, critical accuracy | Slower | Higher | Best |

**Key Insight**: Higher effort generally means more reasoning tokens, which tends to improve accuracy on hard tasks at the cost of higher latency and spend.
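As a rough illustration of the table above, an application could pick the effort level from a task category before building the request. The category names, the `EFFORT_BY_TASK` table, and the `choose_effort` helper are hypothetical and not part of the OpenAI SDK; adjust the mapping to your own workload.

```python
# Hypothetical lookup table; the task categories are illustrative only.
EFFORT_BY_TASK = {
    "classification": "minimal",
    "extraction": "minimal",
    "formatting": "minimal",
    "simple_reasoning": "low",
    "code_generation": "medium",
    "algorithm_design": "high",
    "critical_accuracy": "high",
}


def choose_effort(task_type: str, default: str = "medium") -> str:
    """Return a reasoning effort for a task category, falling back to the default."""
    return EFFORT_BY_TASK.get(task_type, default)


print(choose_effort("classification"))    # minimal
print(choose_effort("algorithm_design"))  # high
print(choose_effort("something_else"))    # medium (fallback)
```

The returned string can be passed as `reasoning={"effort": choose_effort(...)}`.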

## Example 1: Minimal Reasoning - Quick Classification

Perfect for: sentiment analysis, simple categorization, extraction tasks

In [3]:
# Simple sentiment classification with minimal reasoning
response = client.responses.create(
    model="gpt-5",
    reasoning={"effort": "minimal"},
    input=[
        {
            "role": "developer",
            "content": "Classify the sentiment as: positive, neutral, or negative. Return only one word."
        },
        {
            "role": "user",
            "content": "The new update completely broke my workflow. Very disappointed."
        }
    ]
)

print(f"Sentiment: {response.output_text}")
print(f"Total tokens: {response.usage.total_tokens}")

Sentiment: negative
Total tokens: 46


## Example 2: Medium Reasoning - Code Generation

Perfect for: standard coding tasks, data transformations, moderate complexity problems

In [None]:
# Generate a practical Python function with medium reasoning
prompt = """
Write a Python function that validates an email address.
Requirements:
- Check for @ symbol
- Verify domain has at least one dot
- Ensure no spaces
- Return True/False

Include a docstring and 2-3 test cases.
"""

response = client.responses.create(
    model="gpt-5",
    reasoning={"effort": "medium"},
    input=[{"role": "user", "content": prompt}]
)

print(response.output_text)
print(f"\n{'='*60}")
print(f"Reasoning tokens: {response.usage.output_tokens_details.reasoning_tokens}")
print(f"Output tokens: {response.usage.output_tokens}")
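When logging token usage across many calls, a small helper can keep the bookkeeping in one place. The sketch below assumes the Responses API usage object reports reasoning tokens under `output_tokens_details.reasoning_tokens` and that `output_tokens` already includes them. `summarize_usage` is a hypothetical helper, and the `SimpleNamespace` stand-in with made-up numbers only exists so the example runs without an API call.

```python
from types import SimpleNamespace


def summarize_usage(usage) -> dict:
    """Collect token counts from a usage-like object.

    Assumes reasoning tokens are nested under output_tokens_details and
    treats any missing field as 0.
    """
    details = getattr(usage, "output_tokens_details", None)
    reasoning = getattr(details, "reasoning_tokens", 0) or 0
    output = getattr(usage, "output_tokens", 0) or 0
    return {
        "input": getattr(usage, "input_tokens", 0) or 0,
        "reasoning": reasoning,
        # Assumption: output_tokens counts hidden reasoning plus visible text.
        "visible_output": output - reasoning,
        "total": getattr(usage, "total_tokens", 0) or 0,
    }


# Stand-in object with made-up numbers (not real API output).
fake_usage = SimpleNamespace(
    input_tokens=50,
    output_tokens=400,
    output_tokens_details=SimpleNamespace(reasoning_tokens=256),
    total_tokens=450,
)
print(summarize_usage(fake_usage))
```

With a real response, the call would be `summarize_usage(response.usage)`.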

## Example 3: High Reasoning - Complex Problem Solving

Perfect for: algorithm design, multi-step logic, edge case handling, critical accuracy needs

In [None]:
# Complex algorithmic problem with high reasoning
problem = """
Design an algorithm to find the longest palindromic substring in a string.

Requirements:
- Handle edge cases (empty string, single character, no palindromes)
- Optimize for time complexity
- Provide the implementation in Python
- Explain the approach and time/space complexity
"""

response = client.responses.create(
    model="gpt-5",
    reasoning={"effort": "high"},
    input=[{"role": "user", "content": problem}]
)

print(response.output_text)
print(f"\n{'='*60}")
print(f"Reasoning tokens: {response.usage.output_tokens_details.reasoning_tokens}")
print(f"Total tokens: {response.usage.total_tokens}")
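To sanity-check whatever the model returns, it helps to have a known-good baseline to compare against. The following is a hand-written reference using the expand-around-center technique (O(n²) time, O(1) extra space). It is not model output, and the model may well choose a different approach, such as Manacher's algorithm.

```python
def longest_palindrome(s: str) -> str:
    """Return the longest palindromic substring of s (first one found on ties)."""
    if not s:
        return ""
    best_start, best_len = 0, 1

    def expand(left: int, right: int) -> None:
        """Grow outward from a center while the ends match, then record the span."""
        nonlocal best_start, best_len
        while left >= 0 and right < len(s) and s[left] == s[right]:
            left -= 1
            right += 1
        # After the loop, the palindrome covers indices left + 1 .. right - 1.
        length = right - left - 1
        if length > best_len:
            best_start, best_len = left + 1, length

    for center in range(len(s)):
        expand(center, center)      # odd-length palindromes
        expand(center, center + 1)  # even-length palindromes
    return s[best_start:best_start + best_len]


print(longest_palindrome("babad"))  # bab
print(longest_palindrome("cbbd"))   # bb
```

Running the model's implementation and this baseline on the same inputs is a cheap way to catch edge-case regressions.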

## Example 4: Verbosity Control

GPT-5 introduces a **verbosity** parameter to control output length (separate from reasoning depth).

In [None]:
question = "What is dependency injection and why is it useful?"

# Try different verbosity levels
for verbosity in ["low", "medium", "high"]:
    response = client.responses.create(
        model="gpt-5",
        reasoning={"effort": "medium"},
        text={"verbosity": verbosity},
        input=[{"role": "user", "content": question}]
    )
    
    print(f"\n{'='*60}")
    print(f"VERBOSITY: {verbosity.upper()}")
    print(f"{'='*60}")
    print(response.output_text)
    print(f"\nOutput tokens: {response.usage.output_tokens}")

## Example 5: Comparing Reasoning Efforts

Let's see how different reasoning levels handle the same moderately complex task.

In [None]:
task = """
You have a list of meeting times in 24-hour format as strings: 
["09:00-10:30", "10:00-11:00", "14:00-15:30", "15:00-16:00"]

Write a Python function that finds all overlapping meetings.
Return a list of tuples showing which meetings overlap.
"""

efforts = ["minimal", "medium", "high"]
results = {}

for effort in efforts:
    response = client.responses.create(
        model="gpt-5",
        reasoning={"effort": effort},
        input=[{"role": "user", "content": task}]
    )
    
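    results[effort] = {
        "output": response.output_text,
        # Reasoning tokens are nested under output_tokens_details in the Responses API
        "reasoning_tokens": response.usage.output_tokens_details.reasoning_tokens,
        "output_tokens": response.usage.output_tokens,
        "total_tokens": response.usage.total_tokens
    }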

# Display comparison
for effort in efforts:
    print(f"\n{'='*70}")
    print(f"REASONING EFFORT: {effort.upper()}")
    print(f"{'='*70}")
    print(results[effort]["output"])
    print(f"\n📊 Token Usage:")
    print(f"   Reasoning: {results[effort]['reasoning_tokens']}")
    print(f"   Output: {results[effort]['output_tokens']}")
    print(f"   Total: {results[effort]['total_tokens']}")
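When comparing the three outputs, a hand-written baseline makes it easier to judge correctness rather than just token counts. The sketch below is one straightforward way to solve the task, not model output. It assumes well-formed `HH:MM-HH:MM` strings and treats meetings that merely touch (one ends exactly when the next starts) as non-overlapping.

```python
from itertools import combinations


def to_minutes(hhmm: str) -> int:
    """Convert an 'HH:MM' string to minutes since midnight."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def find_overlaps(meetings: list[str]) -> list[tuple[str, str]]:
    """Return every pair of meetings whose time ranges overlap."""
    parsed = []
    for meeting in meetings:
        start, end = meeting.split("-")
        parsed.append((to_minutes(start), to_minutes(end), meeting))

    overlaps = []
    for (s1, e1, m1), (s2, e2, m2) in combinations(parsed, 2):
        # Two intervals overlap when each one starts before the other ends.
        if s1 < e2 and s2 < e1:
            overlaps.append((m1, m2))
    return overlaps


meetings = ["09:00-10:30", "10:00-11:00", "14:00-15:30", "15:00-16:00"]
print(find_overlaps(meetings))
# [('09:00-10:30', '10:00-11:00'), ('14:00-15:30', '15:00-16:00')]
```

The pairwise check is O(n²), which is fine for a handful of meetings. Sorting by start time first would scale better for long lists.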

## Best Practices for GPT-5 Reasoning Models

### 1. **Don't Ask for Chain-of-Thought**
- ❌ Bad: "Think step-by-step and show your reasoning"
- ✓ Good: Just state the task clearly

*Reasoning models already think internally - asking them to plan more can degrade performance.*

### 2. **Choose the Right Effort Level**
- **minimal**: Classification, extraction, simple formatting
- **medium**: Default for most tasks (best price/performance)
- **high**: Critical accuracy, complex multi-step problems

### 3. **Be Clear and Direct**
- Use explicit instructions
- Avoid contradictory directives
- Define clear success criteria

### 4. **Use Developer Messages**
Set context and tone with `{"role": "developer"}` messages:
```python
{
    "role": "developer",
    "content": "You are a Python expert. Provide production-ready code with error handling."
}
```

### 5. **Control Verbosity Separately**
Use `text={"verbosity": "low"}` for concise outputs when you don't need detailed explanations.

### 6. **Leverage the Responses API**
Use `responses.create()` instead of `chat.completions.create()` for better reasoning persistence across tool calls.
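The practices above (developer message, effort, verbosity, Responses API) can be gathered into one small request builder. This is a sketch under stated assumptions: `build_request` is a hypothetical helper, not an SDK function, and it only assembles keyword arguments so it runs without a network call. `previous_response_id` is the Responses API parameter for chaining a call to an earlier response.

```python
from typing import Optional


def build_request(
    user_text: str,
    effort: str = "medium",
    verbosity: Optional[str] = None,
    developer_text: Optional[str] = None,
    previous_response_id: Optional[str] = None,
) -> dict:
    """Assemble keyword arguments for client.responses.create()."""
    messages = []
    if developer_text:
        messages.append({"role": "developer", "content": developer_text})
    messages.append({"role": "user", "content": user_text})

    kwargs = {"model": "gpt-5", "reasoning": {"effort": effort}, "input": messages}
    if verbosity:
        kwargs["text"] = {"verbosity": verbosity}
    if previous_response_id:
        # Links this call to an earlier response so its context can be reused.
        kwargs["previous_response_id"] = previous_response_id
    return kwargs


request = build_request(
    "Refactor this function for readability.",
    verbosity="low",
    developer_text="You are a Python expert.",
)
print(request)
# With a configured client: response = client.responses.create(**request)
```

Keeping request assembly separate from the API call also makes the settings easy to unit test.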

## When to Use Reasoning Models?

### ✅ Great Use Cases
- Complex code generation and debugging
- Multi-step problem solving
- Data analysis with multiple considerations
- Document comparison and legal review
- Algorithm design
- Tasks requiring verification and self-correction

### ⚠️ Probably Overkill
- Simple text generation
- Basic Q&A
- Creative writing
- Translation
- Summarization of straightforward content

*For these simpler tasks, consider GPT-4.1 or use `reasoning={"effort": "minimal"}` to reduce latency and cost.*

## Example 6: Practical Application - Bug Detection

Let's use medium reasoning to analyze code for potential bugs.

In [None]:
buggy_code = '''
def calculate_average(numbers):
    total = 0
    for num in numbers:
        total += num
    return total / len(numbers)

def process_user_data(data):
    result = {}
    for item in data:
        result[item['id']] = item['name'].upper()
    return result
'''

prompt = f"""
Review this Python code for potential bugs, edge cases, and issues.
For each issue found:
1. Describe the problem
2. Show what input would cause it to fail
3. Provide a fix

Code:
```python
{buggy_code}
```
"""

response = client.responses.create(
    model="gpt-5",
    reasoning={"effort": "medium"},
    input=[{"role": "user", "content": prompt}]
)

print(response.output_text)
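For comparison with the model's review, here is a hand-written sketch of the kinds of fixes such a review might suggest. It is not model output, and the design choices (returning `None` for an empty list, skipping malformed items instead of raising) are assumptions that a real codebase might decide differently.

```python
def calculate_average(numbers):
    """Return the mean of numbers, or None for an empty input."""
    numbers = list(numbers)  # also accepts generators
    if not numbers:
        return None  # avoids ZeroDivisionError on an empty list
    return sum(numbers) / len(numbers)


def process_user_data(data):
    """Map each item's id to its upper-cased name, skipping malformed items."""
    result = {}
    for item in data:
        user_id = item.get("id")
        name = item.get("name")
        if user_id is None or not isinstance(name, str):
            continue  # missing key, or a name that is None / not a string
        result[user_id] = name.upper()
    return result


print(calculate_average([]))         # None
print(calculate_average([1, 2, 3]))  # 2.0
print(process_user_data([{"id": 1, "name": "ada"}, {"id": 2}]))  # {1: 'ADA'}
```

The original functions fail on exactly these inputs: an empty list raises `ZeroDivisionError`, and an item without `name` raises `KeyError`.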

## Key Takeaways

1. **Reasoning models think before answering** - hidden reasoning tokens enable better problem-solving

2. **Start with medium effort** - it's the best balance for most tasks

3. **Use minimal for speed** - when you need fast responses for simple tasks

4. **Use high for accuracy** - when correctness is critical and you can afford the latency

5. **Keep prompts clear and direct** - don't ask the model to "think step-by-step"

6. **Control output length with verbosity** - separate from reasoning depth

7. **Use the Responses API** - it preserves reasoning context across tool calls better than Chat Completions

## Resources

- [OpenAI GPT-5 Documentation](https://platform.openai.com/docs/models#gpt-5)
- [OpenAI Cookbook - GPT-5 Prompting Guide](https://cookbook.openai.com/examples/gpt-5/gpt-5_prompting_guide)
- [Responses API Documentation](https://platform.openai.com/docs/api-reference/responses)