# Using GPT-5 - OpenAI API Guide

## Introduction to OpenAI's Most Intelligent Model

GPT-5 is OpenAI's most advanced reasoning model, specifically trained for:
- **Code generation**, bug fixing, and refactoring
- **Instruction following** with high accuracy
- **Long context** handling and **tool calling** for agentic tasks

This notebook demonstrates GPT-5's key features with practical code examples.

## Setup

Set your API key as an environment variable, then install the OpenAI Python client if you haven't already:

In [None]:
import os
import getpass

def _set_env(var: str):
    if not os.environ.get(var):
        os.environ[var] = getpass.getpass(f"{var}: ")

_set_env("OPENAI_API_KEY")

In [None]:
# Install OpenAI client
# Run this if you're running this notebook on Google Colab
!pip install openai

In [1]:
from openai import OpenAI
import os

# Initialize client
client = OpenAI(
    api_key=os.environ.get("OPENAI_API_KEY")  # Set your API key as environment variable
)

## GPT-5 Model Variants

| Model | Best For | Trade-offs |
|-------|----------|------------|
| **`gpt-5`** | Complex reasoning, broad knowledge, code-heavy tasks | Highest capability, higher latency |
| **`gpt-5-mini`** | Cost-optimized reasoning and chat | Balanced speed, cost, and capability |
| **`gpt-5-nano`** | High-throughput, simple tasks | Fastest, most cost-effective |

The system card uses different names than the API. OpenAI provides this mapping table:

<img src="../assets/map-api-system-names.png" width=40%>

source: https://platform.openai.com/docs/guides/latest-model

## Quickstart: Fast Responses with Low Reasoning

For faster, lower-latency responses similar to GPT-4.1, use **low reasoning effort**, optionally combined with **low verbosity** (covered in the Verbosity Control section):

In [3]:
# Fast response with low reasoning effort
result = client.responses.create(
    model="gpt-5",
    input="Write a haiku about code.",
    reasoning={"effort": "low"},
)

print("Output:", result.output_text)
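# Note: the response object has no `reasoning_text` attribute, so this prints N/A;
# token usage is reported under `result.usage` instead.
print("\nReasoning tokens used:", len(result.reasoning_text.split()) if hasattr(result, 'reasoning_text') else 'N/A')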

Output: Silent loops hum on—
logic weaves midnight patterns,
bugs bloom, dawn reveals.

Reasoning tokens used: N/A
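
The `hasattr` check above prints `N/A` because the response has no `reasoning_text` attribute. A minimal sketch of reading the count from the usage block instead, assuming the `usage.output_tokens_details.reasoning_tokens` field exposed by recent versions of the OpenAI Python SDK. The helper name `reasoning_tokens_used` is illustrative, and the stand-in object exists only so the snippet runs without an API call:

```python
from types import SimpleNamespace
from typing import Optional


def reasoning_tokens_used(response) -> Optional[int]:
    """Return the reasoning token count from a response, or None if absent."""
    usage = getattr(response, "usage", None)
    details = getattr(usage, "output_tokens_details", None)
    return getattr(details, "reasoning_tokens", None)


# Stand-in object shaped like a response (assumed field layout, not real data).
fake_response = SimpleNamespace(
    usage=SimpleNamespace(output_tokens_details=SimpleNamespace(reasoning_tokens=64))
)
print(reasoning_tokens_used(fake_response))
```

With a real call, pass `result` from the cell above instead of `fake_response`.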


## Reasoning Effort Control

GPT-5 supports four reasoning levels: `minimal` (new), `low`, `medium`, and `high`.

- **`minimal`**: Fastest time-to-first-token, best for coding & instruction following
- **`low`**: Quick responses with light reasoning
- **`medium`**: Default, balanced reasoning (similar to o3)
- **`high`**: Most thorough reasoning for complex problems
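
Because the effort level is a plain string, a typo only surfaces when the API rejects the request. The sketch below validates the value locally before building the `reasoning` argument. The helper name `reasoning_kwargs` is hypothetical; the four levels are the ones listed above.

```python
VALID_EFFORTS = ("minimal", "low", "medium", "high")


def reasoning_kwargs(effort: str = "medium") -> dict:
    """Build the `reasoning` keyword argument, rejecting unknown effort levels."""
    if effort not in VALID_EFFORTS:
        raise ValueError(f"effort must be one of {VALID_EFFORTS}, got {effort!r}")
    return {"reasoning": {"effort": effort}}


# Intended use (not executed here):
#   client.responses.create(model="gpt-5", input="...", **reasoning_kwargs("low"))
print(reasoning_kwargs("minimal"))
```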

In [4]:
# Minimal reasoning for fastest response
response_minimal = client.responses.create(
    model="gpt-5",
    input="Write a Python function to check if a number is prime.",
    reasoning={"effort": "minimal"}
)

print("Minimal Reasoning Output:")
print(response_minimal.output_text)

Minimal Reasoning Output:
Here’s a concise and efficient Python function to check if an integer is prime:

```python
import math

def is_prime(n: int) -> bool:
    # Handle small and negative numbers
    if n <= 1:
        return False
    if n <= 3:
        return True  # 2 and 3 are prime
    if n % 2 == 0 or n % 3 == 0:
        return False

    # Check divisibility by numbers of the form 6k ± 1 up to sqrt(n)
    limit = int(math.isqrt(n))
    i = 5
    while i <= limit:
        if n % i == 0 or n % (i + 2) == 0:
            return False
        i += 6
    return True
```

Notes:
- Uses early exits for small cases and even/3-multiples.
- Only tests potential factors up to sqrt(n).
- Iterates using 6k ± 1 optimization for fewer checks.


In [5]:
# High reasoning for complex problems
response_high = client.responses.create(
    model="gpt-5",
    input="How much gold would it take to coat the Statue of Liberty in a 1mm layer? Show your calculations.",
    reasoning={"effort": "high"}
)

print("High Reasoning Output:")
print(response_high.output_text)

High Reasoning Output:
Short answer: about 25–30 metric tons of gold. A reasonable estimate using the statue’s known copper skin gives about 25.5 metric tons.

Assumptions and data
- Coating only the statue (not the pedestal), exterior surface only
- Copper skin thickness of the statue: 3/32 in = 0.09375 in = 0.00238125 m
- Mass of the statue’s copper skin: 62,000 lb = 28,124 kg
- Density of copper ρCu = 8,960 kg/m^3
- Density of gold ρAu = 19,320 kg/m^3
- Desired gold thickness tAu = 1 mm = 0.001 m

Steps
1) Infer the statue’s exterior surface area A from the copper skin:
   A = mCu / (ρCu × tCu)
     = 28,124 kg / (8,960 kg/m^3 × 0.00238125 m)
     ≈ 28,124 / 21.336
     ≈ 1,318 m^2

2) Volume of gold needed for a 1 mm coat:
   VAu = A × tAu = 1,318 m^2 × 0.001 m = 1.318 m^3

3) Mass of gold:
   mAu = ρAu × VAu = 19,320 kg/m^3 × 1.318 m^3 ≈ 25,458 kg

Result
- Gold required ≈ 25,500 kg ≈ 25.5 metric tons (≈ 28.1 short tons)

Note
- If you prefer to use a different surface-area estima
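
The arithmetic in the answer above is easy to check independently. This short sketch recomputes the three steps from the figures the model quoted (the inputs themselves are taken as given, not verified); it lands at roughly 25.5 metric tons, though the last digits can differ slightly from the model's intermediate values.

```python
# Figures quoted in the model's answer above (taken as given).
copper_mass_kg = 28_124
copper_density = 8_960          # kg/m^3
copper_thickness_m = 0.00238125
gold_density = 19_320           # kg/m^3
gold_thickness_m = 0.001

# Step 1: surface area inferred from the copper skin, A = m / (rho * t)
area_m2 = copper_mass_kg / (copper_density * copper_thickness_m)

# Steps 2 and 3: gold volume and mass for a uniform 1 mm coat
gold_volume_m3 = area_m2 * gold_thickness_m
gold_mass_kg = gold_density * gold_volume_m3

print(f"area ~ {area_m2:.0f} m^2, gold ~ {gold_mass_kg / 1000:.1f} metric tons")
```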

## Verbosity Control

Control output length with the `verbosity` parameter:
- **`low`**: Concise answers, minimal code comments
- **`medium`**: Balanced explanations (default)
- **`high`**: Thorough explanations, detailed code documentation

In [6]:
# Low verbosity for concise responses
response_concise = client.responses.create(
    model="gpt-5",
    input="Generate a SQL query to find the top 5 customers by total purchase amount.",
    text={"verbosity": "low"}
)

print("Concise Output:")
print(response_concise.output_text)

Concise Output:
SELECT
  c.id AS customer_id,
  c.name AS customer_name,
  SUM(oi.quantity * oi.unit_price) AS total_purchase
FROM customers c
JOIN orders o        ON o.customer_id = c.id
JOIN order_items oi  ON oi.order_id = o.id
WHERE o.status = 'completed'
GROUP BY c.id, c.name
ORDER BY total_purchase DESC
LIMIT 5;


In [7]:
# High verbosity for detailed explanations
response_detailed = client.responses.create(
    model="gpt-5",
    input="Explain how async/await works in JavaScript.",
    text={"verbosity": "high"}
)

print("Detailed Output:")
print(response_detailed.output_text[:500] + "...")  # Truncated for display

Detailed Output:
Async/await is JavaScript’s way to write asynchronous code that looks and reads like synchronous code, while still being non-blocking under the hood. It is built on top of Promises and the event loop.

Core ideas
- A Promise represents a value that will be available later (fulfilled) or fail (rejected).
- An async function always returns a Promise.
  - If you return a value, it becomes Promise.resolve(value).
  - If you throw, it becomes Promise.reject(error).
- await “pauses” the async function...


- High verbosity: Use when you need the model to provide thorough explanations of documents or perform extensive code refactoring.

- Low verbosity: Best for situations where you want concise answers or simple code generation, such as SQL queries.
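
To see how much the setting changes output length for your own prompts, one option is to run the same prompt at each level and compare sizes. This is a rough sketch: `compare_verbosity` is a hypothetical helper, and the fake client below only stands in for the real `client` so the snippet runs offline.

```python
from types import SimpleNamespace


def compare_verbosity(client, prompt: str, model: str = "gpt-5") -> dict:
    """Return {verbosity level: character count of the answer} for one prompt."""
    lengths = {}
    for level in ("low", "medium", "high"):
        response = client.responses.create(
            model=model,
            input=prompt,
            text={"verbosity": level},
        )
        lengths[level] = len(response.output_text)
    return lengths


# Offline stand-in with made-up sizes; swap in the real `client` from the
# setup section to measure actual outputs.
class _FakeResponses:
    def create(self, **kwargs):
        size = {"low": 10, "medium": 50, "high": 200}[kwargs["text"]["verbosity"]]
        return SimpleNamespace(output_text="x" * size)


fake_client = SimpleNamespace(responses=_FakeResponses())
print(compare_verbosity(fake_client, "Explain closures."))
```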

## Custom Tools: Freeform Text Inputs

GPT-5 introduces **custom tools** that accept raw text instead of structured JSON. They are well suited to inputs such as:
- Code snippets to execute
- SQL queries
- Shell commands
- Configuration files

In [28]:
# Define a custom tool for code execution
response_with_tool = client.responses.create(
    model="gpt-5",
    input="Use the code_exec tool to calculate the factorial of 10.",
    tools=[
        {
            "type": "custom",
            "name": "code_exec",
            "description": "Executes arbitrary Python code that prints something and returns the result"
        }
    ]
)

print("Tool Call Generated:")
for reasoning_output in response_with_tool.output:
    if reasoning_output.type == "custom_tool_call":
        print(f"Tool: {reasoning_output.name}")
        print(f"Input: {reasoning_output.input}")

Tool Call Generated:
Tool: code_exec
Input: import math
print(math.factorial(10))


In [29]:
def code_exec(code: str) -> str:
    """
    Executes arbitrary Python code and returns the captured output as a string.
    WARNING: This function is dangerous and should only be used in secure, sandboxed environments.
    """
    import contextlib
    import io
    import traceback

    buffer = io.StringIO()
    # One shared namespace, so imports and definitions in the code can see each other
    namespace = {}
    try:
        # Capture print output; stdout is restored even if the code raises
        with contextlib.redirect_stdout(buffer):
            try:
                # Try to compile as a single expression first
                compiled = compile(code, "<string>", "eval")
            except SyntaxError:
                # Not an expression, so run it as statements
                exec(code, namespace)
            else:
                output = eval(compiled, namespace)
                if output is not None:
                    print(output)
        result = buffer.getvalue()
    except Exception:
        result = "Error:\n" + traceback.format_exc()
    return result.strip()

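# Select the custom tool call explicitly instead of relying on the leftover loop variable
tool_call = next(item for item in response_with_tool.output if item.type == "custom_tool_call")
code_exec(tool_call.input)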

'3628800'
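
In an agentic loop, the tool result usually goes back to the model. The sketch below shows one way to turn custom tool calls into follow-up input items. It assumes the result item type is `custom_tool_call_output` keyed by `call_id`, which should be checked against the current API reference. `run_custom_tool_calls` and the `echo` handler are hypothetical names.

```python
from types import SimpleNamespace


def run_custom_tool_calls(output_items, handlers: dict) -> list:
    """Run each custom tool call and build the follow-up input items."""
    results = []
    for item in output_items:
        if getattr(item, "type", None) != "custom_tool_call":
            continue  # skip reasoning items, messages, and other tool types
        handler = handlers.get(item.name)
        if handler is None:
            output = f"Error: no handler registered for tool {item.name!r}"
        else:
            output = handler(item.input)
        results.append(
            {
                "type": "custom_tool_call_output",  # assumed item type
                "call_id": item.call_id,
                "output": output,
            }
        )
    return results


# Stand-in output items so the sketch runs offline.
fake_output = [
    SimpleNamespace(type="reasoning"),
    SimpleNamespace(type="custom_tool_call", name="echo", call_id="call_1", input="hello"),
]
print(run_custom_tool_calls(fake_output, {"echo": str.upper}))
```

With a real response, pass `response_with_tool.output` and `{"code_exec": code_exec}`, then send the returned list as `input` in a follow-up request that sets `previous_response_id`.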

## Context-Free Grammar (CFG) Constraints

Constrain custom tool outputs to a specific syntax using Lark grammars (or regex grammars).

We'll see a detailed example in notebook 2.0.
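
As a brief preview, here is a hedged sketch of a regex-constrained custom tool. The `format` field names (`type`, `syntax`, `definition`) are an assumption based on OpenAI's custom-tool documentation and should be verified before use. The tool name `set_date` and the date pattern are hypothetical. The local check uses Python's `re`, whose dialect may differ from the one the API applies.

```python
import re

# Hypothetical pattern: an ISO-style date such as 2025-08-07.
DATE_PATTERN = r"^\d{4}-\d{2}-\d{2}$"

# Tool definition with a grammar constraint (field names assumed, see above).
date_tool = {
    "type": "custom",
    "name": "set_date",  # hypothetical tool name
    "description": "Records a date in YYYY-MM-DD form.",
    "format": {
        "type": "grammar",
        "syntax": "regex",
        "definition": DATE_PATTERN,
    },
}


def matches_grammar(tool_input: str) -> bool:
    """Local sanity check that a tool input fits the pattern."""
    return re.fullmatch(DATE_PATTERN, tool_input) is not None


print(matches_grammar("2025-08-07"), matches_grammar("Aug 7, 2025"))
```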

## Allowed Tools: Selective Tool Access

Define a full toolkit but restrict which tools can be used in specific contexts:

In [34]:
# Define multiple tools but restrict usage
all_tools = [
    {"type": "function", "name": "get_weather", "description": "Get current weather"},
    {"type": "function", "name": "search_docs", "description": "Search documentation"},
    {"type": "function", "name": "run_tests", "description": "Execute test suite"},
    {"type": "function", "name": "deploy_code", "description": "Deploy to production"}
]

# Only allow safe operations
response_restricted = client.responses.create(
    model="gpt-5",
    input="What's the weather like and can you search for React hooks documentation?",
    tools=all_tools,
    tool_choice={
        "type": "allowed_tools",
        "mode": "auto",  # Model decides which to use
        "tools": [
            {"type": "function", "name": "get_weather"},
            {"type": "function", "name": "search_docs"}
        ]
    }
)

response_restricted.output

[ResponseReasoningItem(id='rs_68d691511d5081978f965991433591860257326b2b51fa3d', summary=[], type='reasoning', content=None, encrypted_content=None, status=None),
 ResponseFunctionToolCall(arguments='{}', call_id='call_jmfCsbT4SMmwhJpplgvZojSl', name='get_weather', type='function_call', id='fc_68d69155d3148197a52b0a1e93bd5dfb0257326b2b51fa3d', status='completed'),
 ResponseFunctionToolCall(arguments='{}', call_id='call_PArxuLpuaC7kF6Bq01RPvkMr', name='search_docs', type='function_call', id='fc_68d69155e72081979fbeedce747ba4230257326b2b51fa3d', status='completed')]

In [44]:
response_restricted.tool_choice.tools

[{'type': 'function', 'name': 'get_weather'},
 {'type': 'function', 'name': 'search_docs'}]
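
Writing the allowed subset by hand invites typos in tool names. A small helper can derive the `tool_choice` payload from the full toolkit and fail early on unknown names. `allowed_tools_choice` is a hypothetical helper and `demo_tools` is a trimmed copy of the toolkit above; the payload shape mirrors the request in the previous cell.

```python
def allowed_tools_choice(all_tools: list, allowed_names: list, mode: str = "auto") -> dict:
    """Build a `tool_choice` that limits the model to a named subset of tools."""
    known = {tool["name"]: tool for tool in all_tools}
    missing = [name for name in allowed_names if name not in known]
    if missing:
        raise ValueError(f"unknown tools: {missing}")
    return {
        "type": "allowed_tools",
        "mode": mode,
        "tools": [{"type": known[name]["type"], "name": name} for name in allowed_names],
    }


demo_tools = [
    {"type": "function", "name": "get_weather", "description": "Get current weather"},
    {"type": "function", "name": "deploy_code", "description": "Deploy to production"},
]
print(allowed_tools_choice(demo_tools, ["get_weather"]))
```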

## Tool Preambles for Transparency

Preambles are short, user-visible explanations that GPT-5 emits before a tool call. Prompt for them explicitly to see why the model is calling a tool:

In [45]:
# Enable preambles for tool transparency
response_preamble = client.responses.create(
    model="gpt-5",
    input="Before you call a tool, explain why you are calling it. Now search for information about Python decorators.",
    tools=[
        {"type": "function", "name": "search_docs", "description": "Search documentation"}
    ]
)

print("Response with preamble:")
print(response_preamble.output_text)

Response with preamble:
I will use the search tool to fetch authoritative, up-to-date references and tutorials on Python decorators so I can provide accurate and concise information.
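
`output_text` returns only the message text, so the tool call that follows the preamble has to be read from `response.output`. The sketch below separates the two. It assumes message items carry a `content` list of `output_text` parts and that function calls appear as `function_call` items, as in the output shown in the Allowed Tools section. `split_preamble_and_calls` is a hypothetical helper, and the stand-in items are made up.

```python
from types import SimpleNamespace


def split_preamble_and_calls(output_items):
    """Separate user-visible message text from function tool calls."""
    preamble_parts, calls = [], []
    for item in output_items:
        if item.type == "message":
            for part in item.content:
                if part.type == "output_text":
                    preamble_parts.append(part.text)
        elif item.type == "function_call":
            calls.append((item.name, item.arguments))
    return "\n".join(preamble_parts), calls


# Stand-in items so the sketch runs offline.
fake_output = [
    SimpleNamespace(type="reasoning"),
    SimpleNamespace(
        type="message",
        content=[SimpleNamespace(type="output_text", text="I will search the docs first.")],
    ),
    SimpleNamespace(type="function_call", name="search_docs", arguments="{}"),
]
preamble, calls = split_preamble_and_calls(fake_output)
print(preamble)
print(calls)
```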


## Migration Guide

### From Other Models to GPT-5

| From Model | Migrate To | Recommended Settings |
|------------|------------|---------------------|
| **o3** | `gpt-5` | `reasoning.effort: "medium"` or `"high"` |
| **gpt-4.1** | `gpt-5` | `reasoning.effort: "minimal"` or `"low"` |
| **o4-mini** | `gpt-5-mini` | Default settings with prompt tuning |
| **gpt-4.1-nano** | `gpt-5-nano` | Default settings with prompt tuning |

### ⚠️ Important: Unsupported Parameters

GPT-5 does **NOT** support:
- `temperature`
- `top_p` 
- `logprobs`

Use GPT-5-specific controls instead:
- `reasoning: {effort: ...}`
- `text: {verbosity: ...}`
- `max_output_tokens`
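
The migration table and the unsupported-parameter list can be applied mechanically to existing request arguments. This is a sketch under stated assumptions: `migrate_request` is a hypothetical helper, it picks the first effort level where the table offers two, and it leaves prompt tuning to you.

```python
# Model mapping and starting effort levels from the migration table above.
# Where the table gives two options, the first is used; None keeps the default.
MIGRATION = {
    "o3": ("gpt-5", "medium"),
    "gpt-4.1": ("gpt-5", "minimal"),
    "o4-mini": ("gpt-5-mini", None),
    "gpt-4.1-nano": ("gpt-5-nano", None),
}
UNSUPPORTED = ("temperature", "top_p", "logprobs")


def migrate_request(kwargs: dict) -> dict:
    """Return a copy of request kwargs adapted for GPT-5."""
    # Drop parameters that GPT-5 does not accept
    migrated = {k: v for k, v in kwargs.items() if k not in UNSUPPORTED}
    target = MIGRATION.get(kwargs.get("model"))
    if target is not None:
        model, effort = target
        migrated["model"] = model
        if effort is not None:
            # Keep an explicit reasoning setting if the caller already provided one
            migrated.setdefault("reasoning", {"effort": effort})
    return migrated


legacy = {"model": "gpt-4.1", "input": "Summarize this.", "temperature": 0.2, "top_p": 0.9}
print(migrate_request(legacy))
```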

## Responses API vs Chat Completions

### Key Advantage: Chain of Thought (CoT) Persistence

The Responses API passes reasoning between turns, resulting in:
- Improved intelligence
- Fewer reasoning tokens generated
- Higher cache hit rates
- Lower latency

In [46]:
# Multi-turn conversation with CoT persistence
first_response = client.responses.create(
    model="gpt-5",
    input="Let's solve a complex problem. What's the optimal way to implement a LRU cache in Python?",
    reasoning={"effort": "medium"}
)

print("First response:", first_response.output_text[:200] + "...")

# Continue conversation, passing previous response ID
follow_up = client.responses.create(
    model="gpt-5",
    input="Now add thread-safety to that implementation.",
    previous_response_id=first_response.id  # Passes CoT from previous turn
)

print("\nFollow-up (with CoT context):", follow_up.output_text[:200] + "...")

First response: “Optimal” depends on what you need:

- Fastest and simplest for function memoization: use functools.lru_cache (implemented in C).
- General-purpose key/value LRU cache with O(1) get/put: use OrderedDi...

Follow-up (with CoT context): Here’s a thread-safe LRU cache (O(1) get/put) built on OrderedDict with an internal RLock. It snapshots iteration results to avoid races during traversal.

from collections import OrderedDict
import t...
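
The two-call pattern above generalizes to any number of turns. A minimal sketch, assuming each response exposes `id` and `output_text` as in the cell above; `run_conversation` is a hypothetical helper and the fake client only records calls so the snippet runs offline.

```python
from types import SimpleNamespace


def run_conversation(client, turns, model="gpt-5"):
    """Send each turn in order, chaining responses with previous_response_id."""
    previous_id = None
    replies = []
    for turn in turns:
        kwargs = {"model": model, "input": turn}
        if previous_id is not None:
            kwargs["previous_response_id"] = previous_id
        response = client.responses.create(**kwargs)
        previous_id = response.id
        replies.append(response.output_text)
    return replies


# Offline stand-in that records the keyword arguments of each call.
class _FakeResponses:
    def __init__(self):
        self.calls = []

    def create(self, **kwargs):
        self.calls.append(kwargs)
        n = len(self.calls)
        return SimpleNamespace(id=f"resp_{n}", output_text=f"reply {n}")


fake_client = SimpleNamespace(responses=_FakeResponses())
replies = run_conversation(fake_client, ["First question", "Follow-up"])
print(replies)
```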


## Best Practices

### 1. Choose the Right Model
- **`gpt-5`**: Complex reasoning, coding, multi-step tasks
- **`gpt-5-mini`**: General chat, moderate complexity
- **`gpt-5-nano`**: Simple tasks, high throughput

### 2. Optimize for Your Use Case
- **Speed Priority**: Use `minimal` reasoning + `low` verbosity
- **Quality Priority**: Use `high` reasoning + `high` verbosity
- **Balanced**: Use defaults (`medium` for both)

### 3. Leverage New Features
- **Custom Tools**: For code execution, SQL, configs
- **Allowed Tools**: For safety and predictability
- **Preambles**: For debugging and transparency

### 4. Use Responses API for Multi-turn
- Pass `previous_response_id` on follow-up turns to carry context forward
- Reduces re-reasoning and improves coherence

## Practical Example: Building a Code Assistant

In [47]:
def code_assistant(task, code_context=None, optimize_for="balanced"):
    """
    GPT-5 powered code assistant with configurable optimization.
    
    Args:
        task: What you want the assistant to do
        code_context: Existing code to work with
        optimize_for: "speed", "quality", or "balanced"
    """
    
    # Configure based on optimization preference
    settings = {
        "speed": {"reasoning": {"effort": "minimal"}, "text": {"verbosity": "low"}},
        "quality": {"reasoning": {"effort": "high"}, "text": {"verbosity": "high"}},
        "balanced": {"reasoning": {"effort": "medium"}, "text": {"verbosity": "medium"}}
    }
    
    config = settings.get(optimize_for, settings["balanced"])
    
    # Build the prompt
    prompt = task
    if code_context:
        prompt = f"Task: {task}\n\nExisting code:\n```python\n{code_context}\n```"
    
    # Call GPT-5
    response = client.responses.create(
        model="gpt-5",
        input=prompt,
        **config
    )
    
    return response.output_text

# Example usage
result = code_assistant(
    task="Add error handling and logging to this function",
    code_context="""def divide(a, b):
    return a / b""",
    optimize_for="quality"
)

print(result)

Here’s a version with error handling and logging:

```python
import logging

logger = logging.getLogger(__name__)

def divide(a, b):
    """
    Divide a by b with logging and error handling.

    - Logs inputs and result at DEBUG level.
    - Logs and re-raises exceptions for invalid inputs or division by zero.
    """
    try:
        logger.debug("divide called with a=%r, b=%r", a, b)
        result = a / b
    except ZeroDivisionError:
        logger.exception("Division by zero: a=%r, b=%r", a, b)
        raise
    except TypeError:
        logger.exception(
            "Unsupported operand types for division: a=%r (%s), b=%r (%s)",
            a, type(a).__name__, b, type(b).__name__
        )
        raise
    except Exception:
        logger.exception("Unexpected error during division: a=%r, b=%r", a, b)
        raise
    else:
        logger.debug("division result: %r", result)
        return result
```

Note:
- In application code, configure logging once (avoid doing this in l

## Conclusion

GPT-5 adds finer control over reasoning effort, verbosity, and tool use. Key takeaways:

1. **Use the Responses API** for multi-turn conversations to leverage CoT persistence
2. **Configure reasoning and verbosity** based on your latency/quality requirements  
3. **Leverage new features** like custom tools and allowed tools for better control
4. **Choose the right model variant** (gpt-5, gpt-5-mini, gpt-5-nano) for your use case
5. **Migrate gradually** using the recommended settings for your current model

For more information:
- [GPT-5 System Card](https://openai.com/index/gpt-5-system-card/)
- [GPT-5 Prompting Guide](https://cookbook.openai.com/examples/gpt-5/gpt-5_prompting_guide)
- [GPT-5 Frontend Development](https://cookbook.openai.com/examples/gpt-5/gpt-5_frontend)
- [API Documentation](https://platform.openai.com/docs/guides/latest-model)