# OpenAI Thinking Parameters with GPT-5.2

This notebook demonstrates OpenAI's GPT-5.2 reasoning capabilities using the **Responses API**.

## What Are Reasoning Models?

Reasoning models emit **hidden reasoning tokens** before generating their final answer. This allows them to:
- Break down complex problems into steps
- Explore multiple approaches before committing to an answer
- Verify their work and catch mistakes
- Handle multi-step tasks more reliably

Think of it like working a problem out on scratch paper: the model "thinks through" the problem first, and only the final answer is handed in.

## GPT-5.2 Overview

| Feature | Value |
|---------|-------|
| Context Window | 400K tokens |
| Input Cost | $1.75 per M tokens |
| Cached Input | $0.175 per M tokens |
| Output Cost | $14.00 per M tokens |
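
The prices above translate into a per-request estimate with a little arithmetic. The sketch below is illustrative only: the function name `estimate_cost` is made up for this notebook, and it assumes that cached tokens are a subset of the input tokens and that reasoning tokens are already counted (and billed) as output tokens.

```python
# Prices in USD per million tokens, copied from the table above.
INPUT_PER_M = 1.75
CACHED_INPUT_PER_M = 0.175
OUTPUT_PER_M = 14.00


def estimate_cost(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Estimate the USD cost of one request.

    Assumptions: `cached_tokens` is the part of `input_tokens` served from
    the prompt cache, and `output_tokens` already includes reasoning tokens.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    uncached = input_tokens - cached_tokens
    return (
        uncached * INPUT_PER_M
        + cached_tokens * CACHED_INPUT_PER_M
        + output_tokens * OUTPUT_PER_M
    ) / 1_000_000


# 2,000 input tokens (none cached) and 1,500 output tokens.
print(f"${estimate_cost(2_000, 1_500):.4f}")  # $0.0245
```

Because output tokens cost eight times as much as input tokens here, the reasoning effort level usually dominates the bill for short prompts.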

## Table of Contents

1. [Setup](#setup)
2. [Responses API Basics](#responses-api)
3. [Reasoning Effort Parameter](#reasoning-effort)
4. [Verbosity Control](#verbosity)
5. [Tool Use with Reasoning](#tools)
6. [Structured Outputs](#structured)
7. [Best Practices](#best-practices)
8. [Practical Examples](#examples)
9. [Summary](#summary)

<a id='setup'></a>
## 1. Setup

First, let's set up our environment and initialize the OpenAI client.

In [None]:
import os
import getpass
from openai import OpenAI
from dotenv import load_dotenv

# Load environment variables from .env file (if it exists)
load_dotenv()

# Set API key
if not os.environ.get("OPENAI_API_KEY"):
    os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter your OpenAI API key: ")

client = OpenAI()
print("OpenAI client initialized")

<a id='responses-api'></a>
## 2. Responses API Basics

This notebook calls GPT-5.2 through the **Responses API** (`client.responses.create`) rather than the older Chat Completions API. The Responses API exposes the reasoning controls used below, such as `reasoning` and `text`, as first-class parameters.

### Key Differences from Chat Completions

| Feature | Chat Completions | Responses API |
|---------|-----------------|---------------|
| Method | `chat.completions.create()` | `responses.create()` |
| Messages | `messages=[...]` | `input=[...]` |
| Roles | system, user, assistant | developer (or system), user, assistant |
| Output | `response.choices[0].message.content` | `response.output_text` |
| Reasoning | `reasoning_effort="..."` | `reasoning={"effort": "..."}` |
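
When migrating existing code, the `messages` list can often be reused as `input` with only the role renamed. The helper below is a minimal sketch, not part of the OpenAI SDK: `to_responses_input` is a hypothetical name, and it only handles messages whose `content` is plain text.

```python
def to_responses_input(messages: list[dict]) -> list[dict]:
    """Copy Chat Completions-style messages, renaming `system` to `developer`."""
    converted = []
    for message in messages:
        role = "developer" if message["role"] == "system" else message["role"]
        # Build a new dict so the caller's list is left untouched.
        converted.append({"role": role, "content": message["content"]})
    return converted


chat_messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"},
]
print(to_responses_input(chat_messages))
```

The result can be passed as `input=` to `client.responses.create`.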

In [None]:
# Basic usage pattern with the Responses API
response = client.responses.create(
    model="gpt-5.2",
    input=[
        {
            "role": "developer",
            "content": "You are a helpful assistant that provides clear, concise answers."
        },
        {
            "role": "user",
            "content": "What is the capital of France?"
        }
    ]
)

print(f"Answer: {response.output_text}")
print(f"Total tokens: {response.usage.total_tokens}")

<a id='reasoning-effort'></a>
## 3. Reasoning Effort Parameter

GPT-5.2 supports five reasoning effort levels:

| Level | Use When | Speed | Cost | Quality |
|-------|----------|-------|------|----------|
| **none** | No reasoning needed, fastest response | Fastest | Lowest | Basic |
| **low** | Simple tasks with minimal reasoning | Fast | Low | Good |
| **medium** | Balanced default for most workflows | Moderate | Moderate | Great |
| **high** | Complex multi-step tasks, critical accuracy | Slower | Higher | Best |
| **xhigh** | Extremely complex problems, maximum accuracy | Slowest | Highest | Maximum |

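**Key Insight**: Higher effort generally means more reasoning tokens, which tends to improve accuracy on hard problems at the cost of higher latency and spend (reasoning tokens are billed as output tokens).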

In [None]:
# Example: No reasoning - Quick classification
response = client.responses.create(
    model="gpt-5.2",
    reasoning={"effort": "none"},
    input=[
        {
            "role": "developer",
            "content": "Classify the sentiment as: positive, neutral, or negative. Return only one word."
        },
        {
            "role": "user",
            "content": "The new update completely broke my workflow. Very disappointed."
        }
    ]
)

print(f"Sentiment: {response.output_text}")
print(f"Total tokens: {response.usage.total_tokens}")

In [None]:
# Example: Medium reasoning - Code generation
prompt = """
Write a Python function that validates an email address.
Requirements:
- Check for @ symbol
- Verify domain has at least one dot
- Ensure no spaces
- Return True/False

Include a docstring and 2-3 test cases.
"""

response = client.responses.create(
    model="gpt-5.2",
    reasoning={"effort": "medium"},
    input=[{"role": "user", "content": prompt}]
)

print(response.output_text)
print(f"\n{'='*60}")
print(f"Reasoning tokens: {response.usage.output_tokens_details.reasoning_tokens}")
print(f"Output tokens: {response.usage.output_tokens}")

In [None]:
# Example: High reasoning - Complex algorithmic problem
problem = """
Design an algorithm to find the longest palindromic substring in a string.

Requirements:
- Handle edge cases (empty string, single character, no palindromes)
- Optimize for time complexity
- Provide the implementation in Python
- Explain the approach and time/space complexity
"""

response = client.responses.create(
    model="gpt-5.2",
    reasoning={"effort": "high"},
    input=[{"role": "user", "content": problem}]
)

print(response.output_text)
print(f"\n{'='*60}")
print(f"Reasoning tokens: {response.usage.output_tokens_details.reasoning_tokens}")
print(f"Total tokens: {response.usage.total_tokens}")

In [None]:
# Side-by-side comparison of reasoning efforts
task = """
You have a list of meeting times in 24-hour format as strings: 
["09:00-10:30", "10:00-11:00", "14:00-15:30", "15:00-16:00"]

Write a Python function that finds all overlapping meetings.
Return a list of tuples showing which meetings overlap.
"""

efforts = ["none", "low", "medium", "high"]
results = {}

for effort in efforts:
    response = client.responses.create(
        model="gpt-5.2",
        reasoning={"effort": effort},
        input=[{"role": "user", "content": task}]
    )
    
    results[effort] = {
        "output": response.output_text,
        "reasoning_tokens": response.usage.output_tokens_details.reasoning_tokens,
        "output_tokens": response.usage.output_tokens,
        "total_tokens": response.usage.total_tokens
    }

# Display comparison
for effort in efforts:
    print(f"\n{'='*70}")
    print(f"REASONING EFFORT: {effort.upper()}")
    print(f"{'='*70}")
    print(f"Token Usage: Reasoning={results[effort]['reasoning_tokens']}, Output={results[effort]['output_tokens']}, Total={results[effort]['total_tokens']}")
    print(f"\nOutput preview: {results[effort]['output'][:300]}...")
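
To compare effort levels, it helps to separate hidden reasoning from the visible answer. The sketch below assumes that `output_tokens` already includes `reasoning_tokens`; `summarize_usage` is a hypothetical helper, and the numbers in the usage line are placeholders rather than measured results.

```python
def summarize_usage(reasoning_tokens: int, output_tokens: int) -> dict:
    """Split output tokens into hidden reasoning and visible answer tokens."""
    if not 0 <= reasoning_tokens <= output_tokens:
        raise ValueError("reasoning_tokens must be between 0 and output_tokens")
    visible = output_tokens - reasoning_tokens
    # Guard against division by zero when nothing was generated.
    share = reasoning_tokens / output_tokens if output_tokens else 0.0
    return {"visible_tokens": visible, "reasoning_share": share}


# Placeholder counts for illustration only, not measured results.
print(summarize_usage(reasoning_tokens=600, output_tokens=800))
# {'visible_tokens': 200, 'reasoning_share': 0.75}
```

Applied to each entry of `results`, this shows how much of the output budget each effort level spends on reasoning.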

<a id='verbosity'></a>
## 4. Verbosity Control

GPT-5.2 introduces a **verbosity** parameter to control output length. This is **independent from reasoning depth** - you can have high reasoning with concise output, or low reasoning with verbose output.

| Verbosity | Description |
|-----------|-------------|
| **low** | Concise, bullet-point style answers |
| **medium** | Balanced explanations (default) |
| **high** | Detailed, comprehensive responses |
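
Since the two settings are independent, every effort level can be paired with every verbosity level. The sketch below builds the keyword arguments for one combination and checks the values against the levels listed in this notebook; `request_options` is a hypothetical helper name.

```python
from itertools import product

# Levels as listed in the tables of this notebook.
EFFORTS = ("none", "low", "medium", "high", "xhigh")
VERBOSITIES = ("low", "medium", "high")


def request_options(effort: str, verbosity: str) -> dict:
    """Return the `reasoning` and `text` keyword arguments for one request."""
    if effort not in EFFORTS:
        raise ValueError(f"unknown effort: {effort!r}")
    if verbosity not in VERBOSITIES:
        raise ValueError(f"unknown verbosity: {verbosity!r}")
    return {"reasoning": {"effort": effort}, "text": {"verbosity": verbosity}}


# Every effort level can be combined with every verbosity level.
grid = [request_options(e, v) for e, v in product(EFFORTS, VERBOSITIES)]
print(len(grid))  # 15
print(request_options("high", "low"))
```

The returned dict can be unpacked into a call, for example `client.responses.create(model="gpt-5.2", input=..., **request_options("high", "low"))`.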

In [None]:
question = "What is dependency injection and why is it useful?"

# Try different verbosity levels
for verbosity in ["low", "medium", "high"]:
    response = client.responses.create(
        model="gpt-5.2",
        reasoning={"effort": "medium"},
        text={"verbosity": verbosity},
        input=[{"role": "user", "content": question}]
    )
    
    print(f"\n{'='*60}")
    print(f"VERBOSITY: {verbosity.upper()}")
    print(f"{'='*60}")
    print(response.output_text[:500] + "..." if len(response.output_text) > 500 else response.output_text)
    print(f"\nOutput tokens: {response.usage.output_tokens}")

<a id='tools'></a>
## 5. Tool Use with Reasoning

GPT-5.2 can use tools (function calling) while reasoning. The model will think about which tools to use and how to use them.

In [None]:
# Define tools for function calling.
# The Responses API takes a flat tool definition: name, description, and
# parameters sit next to "type" rather than inside a nested "function" key.
tools = [
    {
        "type": "function",
        "name": "get_weather",
        "description": "Get the current weather for a location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "City name, e.g., 'San Francisco'"
                },
                "unit": {
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"]
                }
            },
            "required": ["location"]
        }
    }
]

response = client.responses.create(
    model="gpt-5.2",
    reasoning={"effort": "medium"},
    tools=tools,
    input=[
        {
            "role": "user",
            "content": "What's the weather like in Tokyo and New York today?"
        }
    ]
)

print("Response:", response.output_text if response.output_text else "Tool calls requested")
if hasattr(response, 'output') and response.output:
    for item in response.output:
        if hasattr(item, 'type') and item.type == 'function_call':
            print(f"Tool call: {item.name}({item.arguments})")
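
The cell above only prints the requested tool calls. A minimal sketch of the next step, running each call locally and packaging the result, might look like the following. The `get_weather` body is a hypothetical stub, `run_tool_calls` and `HANDLERS` are made-up names, and `SimpleNamespace` objects stand in for `response.output` so the example runs without an API call.

```python
import json
from types import SimpleNamespace


def get_weather(location: str, unit: str = "celsius") -> dict:
    """Hypothetical stub; a real version would call a weather service."""
    return {"location": location, "unit": unit, "forecast": "unknown (stub)"}


# Maps tool names the model may request to local Python callables.
HANDLERS = {"get_weather": get_weather}


def run_tool_calls(output_items) -> list[dict]:
    """Execute each function_call item and build its function_call_output item."""
    results = []
    for item in output_items:
        if getattr(item, "type", None) != "function_call":
            continue  # skip reasoning and message items
        arguments = json.loads(item.arguments)  # arguments arrive as a JSON string
        result = HANDLERS[item.name](**arguments)
        results.append({
            "type": "function_call_output",
            "call_id": item.call_id,
            "output": json.dumps(result),
        })
    return results


# Stand-in for response.output so the sketch runs without an API call.
fake_output = [
    SimpleNamespace(type="reasoning"),
    SimpleNamespace(type="function_call", name="get_weather",
                    call_id="call_1", arguments='{"location": "Tokyo"}'),
]
print(run_tool_calls(fake_output))
```

In a real run, the returned items would typically be appended to the conversation input, together with the model's original output items, and sent in a second `client.responses.create` call so the model can write its final answer.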

In [None]:
# Built-in web search tool
response = client.responses.create(
    model="gpt-5.2",
    tools=[{"type": "web_search_preview"}],
    input=[{"role": "user", "content": "What were the major tech announcements this week?"}]
)

print(response.output_text)

<a id='structured'></a>
## 6. Structured Outputs

Use Pydantic models to get structured, typed responses from GPT-5.2.

In [None]:
from pydantic import BaseModel, Field
from typing import List

class CodeReview(BaseModel):
    """Structured code review output"""
    issues: List[str] = Field(description="List of identified issues")
    suggestions: List[str] = Field(description="Improvement suggestions")
    severity: str = Field(description="Overall severity: low, medium, high")
    score: int = Field(description="Code quality score from 1-10")

code_to_review = """
def calculate_average(numbers):
    total = 0
    for num in numbers:
        total += num
    return total / len(numbers)
"""

response = client.responses.create(
    model="gpt-5.2",
    reasoning={"effort": "medium"},
    input=[
        {
            "role": "developer",
            "content": "Review code for bugs, edge cases, and improvements."
        },
        {
            "role": "user",
            "content": f"Review this code:\n```python\n{code_to_review}\n```"
        }
    ],
    # The json_schema format requires a "name" alongside the schema
    text={"format": {"type": "json_schema", "name": "code_review", "schema": CodeReview.model_json_schema()}}
)

# Validate the JSON text directly into the Pydantic model
review = CodeReview.model_validate_json(response.output_text)
print(f"Score: {review.score}/10")
print(f"Severity: {review.severity}")
print(f"\nIssues:")
for issue in review.issues:
    print(f"  - {issue}")
print(f"\nSuggestions:")
for suggestion in review.suggestions:
    print(f"  - {suggestion}")
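
Note that `CodeReview` types `severity` as a plain `str` and `score` as a plain `int`, so the allowed values appear only in the field descriptions and are not enforced. The sketch below shows one way to check them after parsing, using only the standard library. It is an illustration rather than the Pydantic approach: `ReviewResult` and `parse_review` are hypothetical names, and the sample payload is hand-written, not real model output.

```python
import json
from dataclasses import dataclass

ALLOWED_SEVERITIES = {"low", "medium", "high"}


@dataclass
class ReviewResult:
    issues: list
    suggestions: list
    severity: str
    score: int


def parse_review(json_text: str) -> ReviewResult:
    """Parse a JSON review and enforce the ranges named in the field descriptions."""
    data = json.loads(json_text)
    review = ReviewResult(**data)  # raises TypeError on missing or extra keys
    if review.severity not in ALLOWED_SEVERITIES:
        raise ValueError(f"unexpected severity: {review.severity!r}")
    if not isinstance(review.score, int) or not 1 <= review.score <= 10:
        raise ValueError(f"score out of range: {review.score!r}")
    return review


# Hand-written sample payload, not real model output.
sample = json.dumps({
    "issues": ["Division by zero on an empty list"],
    "suggestions": ["Raise ValueError for empty input"],
    "severity": "medium",
    "score": 6,
})
print(parse_review(sample))
```

With Pydantic, the same constraints could instead be expressed in the model itself, for example with a `Literal` type for `severity`.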

<a id='best-practices'></a>
## 7. Best Practices

### Write Briefs, Not Prompts

Reasoning models work best with comprehensive context, not chat-style prompts.

### Focus on WHAT, Not HOW

Let the model decide how to approach the problem. Don't micromanage the reasoning process.

### Don't Ask for Chain-of-Thought

Reasoning models already think internally, so asking them to "think step-by-step" is usually unnecessary and can sometimes degrade performance.

In [None]:
# BAD: Chat-style prompt
bad_prompt = "Can you help me optimize this function?"

# GOOD: Brief-style prompt with full context
good_prompt = """
CONTEXT:
I'm building a high-performance mathematics library for a financial trading system.
The library needs to handle real-time calculations with microsecond precision.

CURRENT IMPLEMENTATION:
def calculate_fibonacci(n):
    if n <= 1:
        return n
    return calculate_fibonacci(n-1) + calculate_fibonacci(n-2)

PROBLEMS:
- Exponential time complexity O(2^n)
- RecursionError once n nears Python's default recursion limit (1000)
- Called millions of times per second in production

REQUIREMENTS:
1. Optimize for speed (target: < 1 microsecond for n < 100)
2. Handle large values (n up to 10,000)
3. Thread-safe implementation

DELIVERABLE:
Provide a production-ready Python implementation with time complexity analysis.
"""

print("BAD (lazy prompt):")
print(bad_prompt)
print("\n" + "="*60 + "\n")
print("GOOD (comprehensive brief):")
print(good_prompt)
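
The section layout of the good prompt above can be turned into a small template so that briefs stay consistent across tasks. This is a sketch under assumptions: `build_brief` is a hypothetical helper, and the example task (a `dedupe` function) is invented purely to show the output shape.

```python
def build_brief(context: str, current: str, problems: list[str],
                requirements: list[str], deliverable: str) -> str:
    """Assemble a brief using the section headings from the example above."""
    problem_lines = "\n".join(f"- {p}" for p in problems)
    requirement_lines = "\n".join(f"{i}. {r}" for i, r in enumerate(requirements, 1))
    return (
        f"CONTEXT:\n{context}\n\n"
        f"CURRENT IMPLEMENTATION:\n{current}\n\n"
        f"PROBLEMS:\n{problem_lines}\n\n"
        f"REQUIREMENTS:\n{requirement_lines}\n\n"
        f"DELIVERABLE:\n{deliverable}\n"
    )


# Invented example content, used only to show the resulting layout.
brief = build_brief(
    context="Internal reporting tool; runs nightly on about 1M rows.",
    current="def dedupe(rows): return list(set(rows))",
    problems=["Loses row order", "Fails on unhashable rows"],
    requirements=["Preserve first-seen order", "Accept dict rows"],
    deliverable="A Python implementation with complexity analysis.",
)
print(brief)
```

The resulting string can be sent as the `content` of a single user message.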

In [None]:
# Choosing the Right Effort Level
effort_guide = {
    "none": ["Simple classification", "Data extraction", "Formatting"],
    "low": ["Basic Q&A", "Simple code fixes", "Summarization"],
    "medium": ["Code generation", "Data analysis", "Most general tasks"],
    "high": ["Algorithm design", "Complex debugging", "Architecture planning"],
    "xhigh": ["Research problems", "Novel algorithm design", "Critical accuracy tasks"]
}

print("Effort Level Selection Guide:")
print("="*60)
for effort, use_cases in effort_guide.items():
    print(f"\n{effort.upper()}:")
    for case in use_cases:
        print(f"  - {case}")
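
The guide above maps effort levels to task types; in application code the reverse lookup is often more convenient. The sketch below inverts the same dictionary. `suggest_effort` is a hypothetical helper, and falling back to `medium` for unlisted tasks is an assumption based on the table calling it the balanced default.

```python
EFFORT_GUIDE = {
    "none": ["Simple classification", "Data extraction", "Formatting"],
    "low": ["Basic Q&A", "Simple code fixes", "Summarization"],
    "medium": ["Code generation", "Data analysis", "Most general tasks"],
    "high": ["Algorithm design", "Complex debugging", "Architecture planning"],
    "xhigh": ["Research problems", "Novel algorithm design", "Critical accuracy tasks"],
}

# Invert the guide: lowercase task type -> effort level.
TASK_TO_EFFORT = {
    task.lower(): effort
    for effort, tasks in EFFORT_GUIDE.items()
    for task in tasks
}


def suggest_effort(task_type: str, default: str = "medium") -> str:
    """Look up the suggested effort for a task type, falling back to `default`."""
    return TASK_TO_EFFORT.get(task_type.strip().lower(), default)


print(suggest_effort("Algorithm design"))    # high
print(suggest_effort("data extraction"))     # none
print(suggest_effort("Something unlisted"))  # medium
```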

<a id='examples'></a>
## 8. Practical Examples

In [None]:
# Example: Code Review with Medium Reasoning
buggy_code = '''
def calculate_average(numbers):
    total = 0
    for num in numbers:
        total += num
    return total / len(numbers)

def process_user_data(data):
    result = {}
    for item in data:
        result[item['id']] = item['name'].upper()
    return result
'''

prompt = f"""
Review this Python code for potential bugs, edge cases, and issues.
For each issue found:
1. Describe the problem
2. Show what input would cause it to fail
3. Provide a fix

Code:
```python
{buggy_code}
```
"""

response = client.responses.create(
    model="gpt-5.2",
    reasoning={"effort": "medium"},
    input=[{"role": "user", "content": prompt}]
)

print(response.output_text)

In [None]:
# Example: Business Analysis with High Reasoning
analysis_brief = """
COMPANY CONTEXT:
TechStartup Inc. - B2B SaaS platform for inventory management
- Current MRR: $125,000
- Customer Count: 187
- Monthly Churn Rate: 5.2%
- Customer Acquisition Cost (CAC): $3,200
- Customer Lifetime Value (CLV): $18,500

CHALLENGE:
High churn in SMB segment (8.5% monthly) vs Enterprise (2.1% monthly)

DELIVERABLE:
Provide 3 prioritized recommendations to reduce SMB churn, with expected impact.
"""

response = client.responses.create(
    model="gpt-5.2",
    reasoning={"effort": "high"},
    text={"verbosity": "medium"},
    input=[{"role": "user", "content": analysis_brief}]
)

print(response.output_text)
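
Before trusting the model's recommendations, it can be useful to sanity-check the figures in the brief by hand. The sketch below uses the common approximation that, under a constant monthly churn rate, the average customer lifetime is one over that rate. The approximation and the helper name `expected_lifetime_months` are assumptions added here, not part of the brief.

```python
def expected_lifetime_months(monthly_churn: float) -> float:
    """Average customer lifetime in months under a constant monthly churn rate."""
    if not 0 < monthly_churn <= 1:
        raise ValueError("monthly_churn must be in (0, 1]")
    return 1 / monthly_churn


# Figures copied from the brief above.
cac = 3_200
clv = 18_500
print(f"CLV:CAC ratio: {clv / cac:.2f}")
for segment, churn in [("Blended", 0.052), ("SMB", 0.085), ("Enterprise", 0.021)]:
    print(f"{segment}: {expected_lifetime_months(churn):.1f} months")
```

Under this approximation, an SMB customer stays roughly a quarter as long as an Enterprise customer, which is the gap the recommendations need to address.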

In [None]:
# Example: Algorithm Design with High Reasoning
algorithm_brief = """
PROBLEM:
Design a rate limiter for an API gateway.

REQUIREMENTS:
- Support 10,000 requests/second per user
- Distributed across 50 servers
- Must handle clock skew between servers
- Support burst traffic (allow 2x normal rate for 5 seconds)

DELIVERABLE:
1. Algorithm design with pseudocode
2. Data structure choices with justification
3. How to handle the distributed nature
4. Trade-offs and limitations
"""

response = client.responses.create(
    model="gpt-5.2",
    reasoning={"effort": "high"},
    input=[{"role": "user", "content": algorithm_brief}]
)

print(response.output_text)

<a id='summary'></a>
## 9. Summary

### Key Takeaways

1. **Use the Responses API** - `client.responses.create()` is designed for reasoning models

2. **Choose the right effort level**:
   - `none`/`low`: Simple tasks, fast responses
   - `medium`: Default for most tasks (best balance)
   - `high`/`xhigh`: Complex problems requiring deep reasoning

3. **Control verbosity separately** - Reasoning depth and output length are independent

4. **Write briefs, not prompts** - Provide comprehensive context for best results

5. **Don't ask for chain-of-thought** - The model reasons internally automatically

6. **Use developer role** - Set system-level context with `{"role": "developer"}`

### When to Use Reasoning Models

**Great Use Cases:**
- Complex code generation and debugging
- Multi-step problem solving
- Data analysis with multiple considerations
- Algorithm design
- Strategic planning

**Probably Overkill:**
- Simple text generation
- Basic Q&A
- Translation
- Simple formatting tasks

### Resources

- [OpenAI Responses API Documentation](https://platform.openai.com/docs/api-reference/responses)
- [GPT-5.2 Model Card](https://platform.openai.com/docs/models#gpt-5)
- [Reasoning Best Practices](https://platform.openai.com/docs/guides/reasoning)