# Introduction to OpenAI Responses API

The Responses API is OpenAI's most advanced interface for generating model responses. It provides a powerful, stateful way to interact with models, supporting text and image inputs with text or JSON outputs. The API enables multi-turn conversations, tool integration (web search, file search, code interpreter, computer use), function calling for custom code execution, and advanced features like structured outputs and reasoning models. This notebook demonstrates all core operations including creating responses, managing conversation state, and understanding the response lifecycle.

## Setup

First, let's import the OpenAI library and initialize our client.

In [1]:
from openai import OpenAI
import os
import json

# Initialize the client
client = OpenAI(api_key=os.getenv('OPENAI_API_KEY'))

## 1. Create a Model Response

Creating a response generates model output from text, image, or file inputs. The API supports sophisticated features like tool integration, structured outputs, function calling, and conversation continuity via `previous_response_id` or `conversation` parameters. Responses can run synchronously or asynchronously (background mode) and support streaming for real-time output.
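As a rough sketch of how streamed output might be consumed: with `stream=True` the SDK yields typed events, and text arrives in events whose `type` is `response.output_text.delta`. The helper name `collect_stream_text` is hypothetical, and the stand-in events at the bottom only mimic the shape of real events so the snippet runs without a network call.

```python
from types import SimpleNamespace


def collect_stream_text(events, on_delta=None):
    """Accumulate text deltas from an iterable of streaming events."""
    chunks = []
    for event in events:
        # Text arrives incrementally in `response.output_text.delta` events;
        # all other event types are ignored by this sketch.
        if getattr(event, "type", None) == "response.output_text.delta":
            chunks.append(event.delta)
            if on_delta is not None:
                on_delta(event.delta)
    return "".join(chunks)


# With a real client (assumes the `client` created in Setup):
#   stream = client.responses.create(model="gpt-5-mini", input="Say hi", stream=True)
#   text = collect_stream_text(stream, on_delta=lambda d: print(d, end="", flush=True))

# Offline stand-in events (hypothetical data, same shape as real events).
fake_events = [
    SimpleNamespace(type="response.created"),
    SimpleNamespace(type="response.output_text.delta", delta="Hel"),
    SimpleNamespace(type="response.output_text.delta", delta="lo"),
    SimpleNamespace(type="response.completed"),
]
streamed_text = collect_stream_text(fake_events)
print(streamed_text)
```

Keeping the accumulation logic separate from the API call makes it easy to exercise without spending tokens.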

### Key Parameters

- **`model`** (string, optional): Model ID like `"gpt-5-mini"` or `"gpt-5"`. Determines capabilities, performance, and cost.
- **`input`** (string/array, optional): Text, images, or files to process. Can be a simple string or complex array of items.
- **`instructions`** (string, optional): System/developer message for context. Not carried over when using `previous_response_id`.
- **`temperature`** (number, 0-2, default 1): Controls randomness. Higher = more creative, lower = more deterministic.
- **`max_output_tokens`** (integer, optional): Upper bound on the tokens generated for a response, counting both visible output and reasoning tokens.
- **`tools`** (array, optional): Built-in tools (web search, file search) or custom functions the model can call.
- **`tool_choice`** (string/object, optional): Controls tool selection ("auto", "required", "none", or specific tool).
- **`text`** (object, optional): Configuration for text output format (plain text or structured JSON).
- **`previous_response_id`** (string, optional): ID of previous response for multi-turn conversations.
- **`conversation`** (string/object, optional): Conversation ID or object for persistent context.
- **`metadata`** (map, optional): Custom key-value pairs (up to 16 pairs; keys up to 64 characters, values up to 512 characters).
- **`background`** (boolean, default false): Run asynchronously, enabling cancellation.
- **`stream`** (boolean, default false): Stream response data in real-time via server-sent events.
- **`store`** (boolean, default true): Whether to store response for later retrieval.
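The `metadata` limits listed above can be checked locally before a request is sent. The following is a minimal sketch, assuming the limits are exactly as stated (16 pairs, 64-character keys, 512-character string values); `validate_metadata` is a hypothetical helper, not part of the SDK.

```python
MAX_PAIRS = 16
MAX_KEY_CHARS = 64
MAX_VALUE_CHARS = 512


def validate_metadata(metadata):
    """Return `metadata` unchanged, or raise ValueError if it breaks a limit."""
    if len(metadata) > MAX_PAIRS:
        raise ValueError(f"metadata has {len(metadata)} pairs; the limit is {MAX_PAIRS}")
    for key, value in metadata.items():
        # Both keys and values are expected to be strings.
        if not isinstance(key, str) or not isinstance(value, str):
            raise ValueError(f"metadata key and value must be strings: {key!r}")
        if len(key) > MAX_KEY_CHARS:
            raise ValueError(f"metadata key {key[:20]!r}... exceeds {MAX_KEY_CHARS} characters")
        if len(value) > MAX_VALUE_CHARS:
            raise ValueError(f"metadata value for {key!r} exceeds {MAX_VALUE_CHARS} characters")
    return metadata


# Usage with a real request (assumes the `client` created in Setup):
#   client.responses.create(
#       model="gpt-5-mini",
#       input="Write a haiku.",
#       metadata=validate_metadata({"type": "poetry", "format": "haiku"}),
#   )
```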

In [2]:
# Simple text response
response = client.responses.create(
    model="gpt-5-mini",
    input="Explain quantum entanglement in two sentences suitable for a high school student."
)

print(f"Response ID: {response.id}")
print(f"Model: {response.model}")
print(f"Status: {response.status}")
print(f"\nOutput Text:\n{response.output_text}")
print(f"\nToken Usage:")
print(f"  Input: {response.usage.input_tokens}")
print(f"  Output: {response.usage.output_tokens}")
print(f"  Total: {response.usage.total_tokens}")

Response ID: resp_07156feef9c847ff006926ed7097fc81909ee0b1dc4dd9740d
Model: gpt-5-mini-2025-08-07
Status: completed

Output Text:
Quantum entanglement is a quantum phenomenon where two or more particles become linked so that measuring a property (like spin or polarization) of one immediately fixes the corresponding property of the other, even if they are far apart. Although the outcomes are random and can't be used to send information faster than light, the measurements show stronger correlations than you'd expect from ordinary, non-quantum objects.

Token Usage:
  Input: 21
  Output: 339
  Total: 360


In [4]:
# Response with system instructions and metadata (temperature is left at its default)
creative_response = client.responses.create(
    model="gpt-5-mini",
    input="Write a haiku about artificial intelligence.",
    instructions="You are a creative poet who specializes in concise, evocative imagery.",
    metadata={"type": "poetry", "format": "haiku"}
)

print(f"Creative Response ({creative_response.temperature} temperature):\n")
print(creative_response.output_text)
print(f"\nMetadata: {creative_response.metadata}")

Creative Response (1.0 temperature):

Glass brain hums softly
learning the hush of voices
Dreams without heartbeat

Metadata: {'type': 'poetry', 'format': 'haiku'}


In [5]:
# Multi-turn conversation using previous_response_id
first_response = client.responses.create(
    model="gpt-5-mini",
    input="What are the three laws of thermodynamics?"
)

print("First Response:")
print(first_response.output_text)
print(f"\nResponse ID: {first_response.id}")

# Follow-up question using previous response
second_response = client.responses.create(
    model="gpt-5-mini",
    input="Can you give me a real-world example of the second law?",
    previous_response_id=first_response.id
)

print("\n" + "="*50)
print("Follow-up Response:")
print(second_response.output_text)

First Response:
Usually stated as four fundamental principles if you include the Zeroth law, but classically the “three laws” are the Zeroth (often numbered 0), First, Second and Third. Briefly:

- Zeroth law (thermal equilibrium / temperature): If A is in thermal equilibrium with B and B is in thermal equilibrium with C, then A is in thermal equilibrium with C. This justifies the existence of temperature.

- First law (energy conservation): Energy is conserved. For a closed system, change in internal energy equals heat added minus work done by the system:
  dU = δQ − δW
  (or ΔU = Q − W).

- Second law (direction of processes / entropy): In any real (irreversible) process the total entropy of an isolated system never decreases:
  ΔS_total ≥ 0.
  For reversible heat transfer, dS = δQ_rev/T. This law also implies limits on heat-engine efficiency (Carnot efficiency η ≤ 1 − Tc/Th).

- Third law (zero‑temperature entropy / unattainability): As temperature approaches absolute zero, the entr
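
The chaining pattern used in this cell can be generalized to any number of turns. The sketch below is one possible way to do it; `ask_in_thread` is a hypothetical helper, and it assumes each response is stored (the default `store=True`) so that it can be referenced by `previous_response_id`.

```python
def ask_in_thread(client, model, questions):
    """Send questions one at a time, linking each turn to the previous response."""
    previous_id = None
    answers = []
    for question in questions:
        kwargs = {"model": model, "input": question}
        # Only the first turn is sent without a link to an earlier response.
        if previous_id is not None:
            kwargs["previous_response_id"] = previous_id
        response = client.responses.create(**kwargs)
        answers.append(response.output_text)
        previous_id = response.id
    return answers


# Usage (assumes the `client` created in Setup):
#   answers = ask_in_thread(client, "gpt-5-mini", [
#       "What are the three laws of thermodynamics?",
#       "Can you give me a real-world example of the second law?",
#   ])
```

Because `instructions` are not carried over between linked responses, a variant of this helper would need to pass them again on every turn if they are required.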

## 2. Get a Model Response

Retrieving a response allows you to access previously generated outputs by ID. This is essential for asynchronous workflows, conversation history retrieval, or inspecting stored responses. The endpoint supports streaming mode to fetch response data as it's being generated if the response is still in progress.

### Parameters

- **`response_id`** (string, required): The unique ID of the response to retrieve.
- **`include`** (array, optional): Additional data to include (web search sources, code outputs, image URLs, logprobs).
- **`stream`** (boolean, optional): Stream the response data if still generating.
- **`starting_after`** (integer, optional): Sequence number to start streaming from.
- **`include_obfuscation`** (boolean, optional): Include random chars to normalize payload sizes (security feature).

In [6]:
# Retrieve a previously created response
retrieved_response = client.responses.retrieve(response.id)

print(f"Retrieved Response ID: {retrieved_response.id}")
print(f"Created At: {retrieved_response.created_at}")
print(f"Status: {retrieved_response.status}")
print(f"Model: {retrieved_response.model}")
print(f"\nOutput:")
print(retrieved_response.output_text)

# Check if there was an error
if retrieved_response.error:
    print(f"\nError: {retrieved_response.error}")
else:
    print(f"\nNo errors - response completed successfully")

Retrieved Response ID: resp_07156feef9c847ff006926ed7097fc81909ee0b1dc4dd9740d
Created At: 1764158832.0
Status: completed
Model: gpt-5-mini-2025-08-07

Output:
Quantum entanglement is a quantum phenomenon where two or more particles become linked so that measuring a property (like spin or polarization) of one immediately fixes the corresponding property of the other, even if they are far apart. Although the outcomes are random and can't be used to send information faster than light, the measurements show stronger correlations than you'd expect from ordinary, non-quantum objects.

No errors - response completed successfully
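
For asynchronous workflows, retrieval is typically wrapped in a polling loop. The following is a minimal sketch rather than a recommended implementation: `wait_for_response` is a hypothetical helper, the interval and timeout defaults are arbitrary, and the set of terminal statuses is taken from the status values listed later in this notebook.

```python
import time

# Statuses after which the response will no longer change.
TERMINAL_STATUSES = {"completed", "failed", "cancelled", "incomplete"}


def wait_for_response(client, response_id, poll_interval=2.0, timeout=120.0):
    """Poll `responses.retrieve` until the response reaches a terminal status."""
    deadline = time.monotonic() + timeout
    while True:
        response = client.responses.retrieve(response_id)
        if response.status in TERMINAL_STATUSES:
            return response
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"{response_id} is still {response.status} after {timeout} seconds"
            )
        time.sleep(poll_interval)


# Usage (assumes the `client` created in Setup and a response created with background=True):
#   finished = wait_for_response(client, background_response.id)
#   print(finished.status)
```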


## 3. Delete a Model Response

Deleting a response permanently removes it from storage. This is useful for managing data retention, removing sensitive information, or cleaning up test responses. The REST endpoint returns a deletion confirmation containing the response ID and a `deleted` flag.

### Parameters

- **`response_id`** (string, required): The ID of the response to delete.

In [7]:
# Create a test response to delete
test_response = client.responses.create(
    model="gpt-5-mini",
    input="This is a test response for deletion."
)

print(f"Created test response: {test_response.id}")

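# Note: Uncomment to actually delete the response.
# The REST endpoint returns a confirmation object; the Python SDK's delete()
# may return None instead, so check before reading fields.
# deletion_result = client.responses.delete(test_response.id)
# if deletion_result is not None:
#     print(f"\nDeletion Result:")
#     print(f"  ID: {deletion_result.id}")
#     print(f"  Deleted: {deletion_result.deleted}")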

print("\nSkipping actual deletion to preserve response for following examples")

Created test response: resp_0685f13b7b552e71006926edcb8b2081958c93f7410fd8856e

Skipping actual deletion to preserve response for following examples


## 4. Cancel a Response

Cancelling a response stops an in-progress generation. This operation only works for responses created in background mode (`background=True` in the Python SDK, `true` in the JSON request body). Cancellation is useful for managing long-running requests, handling user interruptions, or stopping unnecessary computation when requirements change.

### Parameters

- **`response_id`** (string, required): The ID of the background response to cancel.

In [None]:
# Create a background response that can be cancelled
background_response = client.responses.create(
    model="gpt-5-mini",
    input="Write a detailed essay about the history of computing, covering at least 10 major milestones.",
    background=True,
    max_output_tokens=2000
)

print(f"Created background response: {background_response.id}")
print(f"Initial Status: {background_response.status}")

# Give the request a moment to be picked up, then cancel it
import time
time.sleep(1)
cancelled_response = client.responses.cancel(background_response.id)
print(f"\nCancelled Response Status: {cancelled_response.status}")

Created background response: resp_01043e89a93740c6006926edf9b55c81959e1327b006edc498
Initial Status: queued

Cancelled Response Status: cancelled
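
Since cancellation only makes sense while a response is still running, one option is to check the status first. This is a sketch under that assumption; `cancel_if_running` is a hypothetical helper, and a response can still finish between the check and the cancel call, so the returned status should be inspected either way.

```python
# Statuses in which a background response has not finished yet.
CANCELLABLE_STATUSES = {"queued", "in_progress"}


def cancel_if_running(client, response_id):
    """Cancel a background response only if it has not finished yet."""
    current = client.responses.retrieve(response_id)
    if current.status in CANCELLABLE_STATUSES:
        return client.responses.cancel(response_id)
    # Already in a terminal state: return it unchanged.
    return current


# Usage (assumes the `client` created in Setup):
#   result = cancel_if_running(client, background_response.id)
#   print(result.status)
```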


## 5. List Input Items

Listing input items retrieves all content that was provided to the model when generating a specific response. This includes user messages, system instructions, images, files, and context from previous responses. The endpoint supports pagination and ordering, making it ideal for auditing, debugging, or understanding what context the model had when generating output.

### Parameters

- **`response_id`** (string, required): The response ID to list input items from.
- **`limit`** (integer, optional): Number of items to return (1-100, default 20).
- **`order`** (string, optional): Sort order - `"asc"` or `"desc"` (default).
- **`after`** (string, optional): Item ID for pagination cursor.
- **`include`** (array, optional): Additional data to include (images, logprobs, etc.).

In [12]:
# List input items from a previous response
input_items = client.responses.input_items.list(
    response.id,
    limit=10,
    order="asc"
)

print(f"Input Items for Response {response.id}:")
print(f"Total items retrieved: {len(input_items.data)}")
print(f"Has more items: {input_items.has_more}")
print(f"First item ID: {input_items.first_id}")
print(f"Last item ID: {input_items.last_id}")

print("\nItem Details:")
for i, item in enumerate(input_items.data, 1):
    print(f"\n{i}. Item ID: {item.id}")
    print(f"   Type: {item.type}")
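    # Not every item type carries a role (function call outputs, for example), so read it defensively
    print(f"   Role: {getattr(item, 'role', None)}")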
    if hasattr(item, 'content') and item.content:
        for content_item in item.content:
            if content_item.type == 'input_text':
                text_preview = content_item.text[:80] + "..." if len(content_item.text) > 80 else content_item.text
                print(f"   Content: {text_preview}")

Input Items for Response resp_07156feef9c847ff006926ed7097fc81909ee0b1dc4dd9740d:
Total items retrieved: 1
Has more items: False
First item ID: msg_07156feef9c847ff006926ed709ac88190b892752dd2b1067a
Last item ID: msg_07156feef9c847ff006926ed709ac88190b892752dd2b1067a

Item Details:

1. Item ID: msg_07156feef9c847ff006926ed709ac88190b892752dd2b1067a
   Type: message
   Role: user
   Content: Explain quantum entanglement in two sentences suitable for a high school student...


In [13]:
# List input items from a multi-turn conversation
multi_turn_inputs = client.responses.input_items.list(
    second_response.id,
    limit=5
)

print(f"Multi-turn Conversation Input Items:")
print(f"Total items: {len(multi_turn_inputs.data)}")
print("\nThis shows both the original question and the follow-up:")
for item in multi_turn_inputs.data:
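    # Read role and content defensively: non-message items may lack them
    if getattr(item, 'role', None) == 'user' and getattr(item, 'content', None):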
        for content in item.content:
            if hasattr(content, 'text'):
                print(f"  - {content.text}")

Multi-turn Conversation Input Items:
Total items: 4

This shows both the original question and the follow-up:
  - Can you give me a real-world example of the second law?
  - What are the three laws of thermodynamics?


## 6. Get Input Token Counts

Token counting lets you estimate costs and validate inputs before creating a response. This endpoint calculates the number of input tokens that a given set of parameters would consume, including conversation history, instructions, tools, and formatting overhead. It is useful for budgeting, rate limiting, and checking that a request fits within the model's context window.

### Parameters

- **`model`** (string, optional): Model ID to count tokens for (different models have different tokenization).
- **`input`** (string/array, optional): The input content to count tokens for.
- **`instructions`** (string, optional): System message to include in the count.
- **`conversation`** (string/object, optional): Conversation context to include.
- **`previous_response_id`** (string, optional): Previous response for multi-turn counting.
- **`tools`** (array, optional): Tools configuration to include in count.
- **`text`**, **`tool_choice`**, **`reasoning`** (various, optional): Other configuration affecting token count.

In [14]:
# Count tokens for a simple input
token_count = client.responses.input_tokens.count(
    model="gpt-5-mini",
    input="What is the capital of France?"
)

print(f"Simple Query Token Count: {token_count.input_tokens}")

Simple Query Token Count: 13


In [15]:
# Count tokens with instructions and longer input
complex_token_count = client.responses.input_tokens.count(
    model="gpt-5-mini",
    input="""Analyze the following customer feedback and extract:
    1. Sentiment (positive, negative, neutral)
    2. Main topics mentioned
    3. Action items for the product team
    
    Feedback: 'I love the new UI design, but the app crashes frequently when uploading large files. 
    The customer support team was very helpful though.'""",
    instructions="You are an expert customer feedback analyst with experience in product management."
)

print(f"Complex Query Token Count: {complex_token_count.input_tokens}")
print(f"\nThis includes:")
print(f"  - System instructions")
print(f"  - User input text")
print(f"  - Formatting overhead")

Complex Query Token Count: 95

This includes:
  - System instructions
  - User input text
  - Formatting overhead


In [16]:
# Compare token counts across different models
models_to_compare = ["gpt-5-mini", "gpt-5"]
test_input = "Explain the concept of machine learning in simple terms."

print("Token Count Comparison:")
for model in models_to_compare:
    count = client.responses.input_tokens.count(
        model=model,
        input=test_input
    )
    print(f"  {model}: {count.input_tokens} tokens")

Token Count Comparison:
  gpt-5-mini: 16 tokens
  gpt-5: 16 tokens
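
A token count becomes useful once it is compared against a budget. The arithmetic below is a simple sketch: both helper names are hypothetical, and the context window size and price used in the example are placeholders, not real figures for any model. It also assumes that input tokens and the reserved output budget share one context window.

```python
def fits_context(input_tokens, max_output_tokens, context_window):
    """Return True if the input plus the reserved output budget fits in the window."""
    return input_tokens + max_output_tokens <= context_window


def estimate_input_cost(input_tokens, usd_per_million_tokens):
    """Estimate the input cost in USD for a given per-million-token price."""
    return input_tokens / 1_000_000 * usd_per_million_tokens


# Placeholder values for illustration only; look up real limits and prices.
ASSUMED_CONTEXT_WINDOW = 128_000
ASSUMED_USD_PER_MILLION = 0.50

counted_tokens = 95  # e.g. token_count.input_tokens from the cells above
print(fits_context(counted_tokens, 2_000, ASSUMED_CONTEXT_WINDOW))
print(estimate_input_cost(counted_tokens, ASSUMED_USD_PER_MILLION))
```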


## 7. The Response Object

The Response object is the core data structure returned by the API, containing all information about a model generation including outputs, metadata, configuration, and usage statistics. Understanding this structure is crucial for extracting results, handling errors, and managing conversation state.

### Response Object Structure

**Core Identification:**
- **`id`** (string): Unique identifier (e.g., `"resp_67ccd2bed1ec8190b14f964abc0542670bb6a6b452d3795b"`).
- **`object`** (string): Always `"response"` for this type.
- **`created_at`** (number): Unix timestamp of creation, in seconds.

**Status and Completion:**
- **`status`** (string): One of `completed`, `failed`, `in_progress`, `cancelled`, `queued`, or `incomplete`.
- **`error`** (object): Error details if status is `failed`.
- **`incomplete_details`** (object): Reason for incomplete status.

**Output:**
- **`output`** (array): Generated content items (messages, tool calls, reasoning).
- **`output_text`** (string, SDK only): Aggregated text from all output items (convenience property).

**Configuration:**
- **`model`** (string): Model ID used.
- **`instructions`** (string/array): System messages provided.
- **`temperature`**, **`top_p`** (number): Sampling parameters.
- **`max_output_tokens`** (integer): Token limit.
- **`tools`** (array): Available tools.
- **`tool_choice`** (string/object): Tool selection strategy.
- **`text`** (object): Output format configuration.

**Conversation State:**
- **`conversation`** (object): Linked conversation if any.
- **`previous_response_id`** (string): Previous response in chain.

**Usage and Metadata:**
- **`usage`** (object): Token counts (input, output, cached, reasoning).
- **`metadata`** (map): Custom key-value pairs.
- **`store`** (boolean): Whether response is stored.
- **`background`** (boolean): Whether run in background mode.

In [18]:
# Examine a complete response object
sample_response = client.responses.create(
    model="gpt-5-mini",
    input="Describe photosynthesis in one paragraph.",
    instructions="You are a biology teacher explaining concepts to high school students.",
    max_output_tokens=200,
    metadata={"subject": "biology", "topic": "photosynthesis"}
)

print("Response Object Structure:\n")
print(f"ID: {sample_response.id}")
print(f"Object Type: {sample_response.object}")
print(f"Created At: {sample_response.created_at}")
print(f"Status: {sample_response.status}")
print(f"\nModel Configuration:")
print(f"  Model: {sample_response.model}")
print(f"  Temperature: {sample_response.temperature}")
print(f"  Max Output Tokens: {sample_response.max_output_tokens}")
print(f"  Top P: {sample_response.top_p}")
print(f"\nOutput:")
print(f"  Number of output items: {len(sample_response.output)}")
print(f"  Output text: {sample_response.output_text[:100]}...")
print(f"\nUsage Statistics:")
print(f"  Input Tokens: {sample_response.usage.input_tokens}")
print(f"  Output Tokens: {sample_response.usage.output_tokens}")
print(f"  Total Tokens: {sample_response.usage.total_tokens}")
print(f"  Cached Tokens: {sample_response.usage.input_tokens_details.cached_tokens}")
print(f"\nMetadata: {sample_response.metadata}")
print(f"Store: {sample_response.store}")
print(f"Background: {sample_response.background}")

Response Object Structure:

ID: resp_00d7e4d6f8a98d60006926ee49f1b881939473c810177b2f1c
Object Type: response
Created At: 1764159049.0
Status: incomplete

Model Configuration:
  Model: gpt-5-mini-2025-08-07
  Temperature: 1.0
  Max Output Tokens: 200
  Top P: 1.0

Output:
  Number of output items: 2
  Output text: Photosynthesis is the process by which plants, algae, and some bacteria use sunlight to convert carb...

Usage Statistics:
  Input Tokens: 29
  Output Tokens: 141
  Total Tokens: 170
  Cached Tokens: 0

Metadata: {'subject': 'biology', 'topic': 'photosynthesis'}
Store: True
Background: False
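
The run above ended with status `incomplete`, which is worth handling explicitly rather than treating every returned response as finished. A minimal sketch follows, assuming the `incomplete_details` object exposes a `reason` field and the `error` object a `message` field; `describe_outcome` is a hypothetical helper.

```python
def describe_outcome(response):
    """Return a short human-readable summary of how a response ended."""
    status = response.status
    if status == "completed":
        return "completed"
    if status == "incomplete":
        details = getattr(response, "incomplete_details", None)
        reason = getattr(details, "reason", None) or "unknown reason"
        return f"incomplete ({reason})"
    if status == "failed":
        error = getattr(response, "error", None)
        message = getattr(error, "message", None) or "no error message"
        return f"failed ({message})"
    # queued, in_progress, or cancelled
    return status


# Usage with the response created above:
#   print(describe_outcome(sample_response))
```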


In [23]:
# Examine the output array structure
print("Output Array Structure:\n")
for i, output_item in enumerate(sample_response.output, 1):
    print(f"Output Item {i}:")
    print(f"  Type: {output_item.type}")
    print(f"  ID: {output_item.id}")
    print(f"  Status: {output_item.status}")
    
    if hasattr(output_item, 'content') and output_item.content:
        print(f"  Content Items: {len(output_item.content)}")
        for j, content in enumerate(output_item.content, 1):
            print(f"    {j}. Type: {content.type}")
            if hasattr(content, 'text'):
                print(f"       Text length: {len(content.text)} chars")

Output Array Structure:

Output Item 1:
  Type: reasoning
  ID: rs_00d7e4d6f8a98d60006926ee4a3a2c81938065b8bf7e77a6c1
  Status: None
Output Item 2:
  Type: message
  ID: msg_00d7e4d6f8a98d60006926ee4c0244819384977ba94effa683
  Status: incomplete
  Content Items: 1
    1. Type: output_text
       Text length: 376 chars
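
The `output_text` convenience property can be approximated by walking the `output` array by hand, which is useful when working with raw JSON or when only some items are wanted. This is a sketch of that idea, not the SDK's actual implementation; `join_output_text` is a hypothetical helper.

```python
def join_output_text(output_items):
    """Concatenate the text of all `output_text` parts found in message items."""
    texts = []
    for item in output_items:
        # Skip reasoning items, tool calls, and anything else that is not a message.
        if getattr(item, "type", None) != "message":
            continue
        for part in item.content:
            if part.type == "output_text":
                texts.append(part.text)
    return "".join(texts)


# Usage with the response created above:
#   print(join_output_text(sample_response.output))
```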


## 8. The Input Item List

The Input Item List represents all content provided to the model when generating a response. This includes user messages, system instructions, conversation history from previous responses, and any additional context. Understanding this structure helps with debugging, auditing input costs, and managing conversation state.

### Input Item List Structure

- **`object`** (string): Always `"list"` for list responses.
- **`data`** (array): Array of input items used to generate the response.
  - Each item can be a message, tool result, or other input type
  - Contains `id`, `type`, `role`, and `content` fields
- **`first_id`** (string): ID of the first item in the current page.
- **`last_id`** (string): ID of the last item in the current page.
- **`has_more`** (boolean): Whether additional items exist beyond this page.

In [24]:
# Examine input item list structure
input_list = client.responses.input_items.list(
    sample_response.id,
    limit=10
)

print("Input Item List Structure:\n")
print(f"Object Type: {input_list.object}")
print(f"Number of Items: {len(input_list.data)}")
print(f"First Item ID: {input_list.first_id}")
print(f"Last Item ID: {input_list.last_id}")
print(f"Has More Items: {input_list.has_more}")

print("\nItem Breakdown:")
for i, item in enumerate(input_list.data, 1):
    print(f"\nItem {i}:")
    print(f"  ID: {item.id}")
    print(f"  Type: {item.type}")
    print(f"  Role: {getattr(item, 'role', None)}")
    
    if hasattr(item, 'content') and item.content:
        print(f"  Content Items: {len(item.content)}")
        for content_item in item.content:
            print(f"    - Type: {content_item.type}")
            if hasattr(content_item, 'text'):
                preview = content_item.text[:60] + "..." if len(content_item.text) > 60 else content_item.text
                print(f"      Text: {preview}")

Input Item List Structure:

Object Type: list
Number of Items: 1
First Item ID: msg_00d7e4d6f8a98d60006926ee49f4708193b8ec7e8b7ddd7ee0
Last Item ID: msg_00d7e4d6f8a98d60006926ee49f4708193b8ec7e8b7ddd7ee0
Has More Items: False

Item Breakdown:

Item 1:
  ID: msg_00d7e4d6f8a98d60006926ee49f4708193b8ec7e8b7ddd7ee0
  Type: message
  Role: user
  Content Items: 1
    - Type: input_text
      Text: Describe photosynthesis in one paragraph.


In [25]:
# Demonstrate pagination with input items
# First, create a response with multiple input items via conversation
conversation = client.conversations.create(
    items=[
        {"type": "message", "role": "user", "content": "What is Python?"},
        {"type": "message", "role": "user", "content": "What are its main features?"},
        {"type": "message", "role": "user", "content": "Give me a code example."}
    ]
)

conv_response = client.responses.create(
    model="gpt-5-mini",
    conversation=conversation.id,
    input="Now explain list comprehensions."
)

# List with pagination
first_page = client.responses.input_items.list(
    conv_response.id,
    limit=2,
    order="asc"
)

print(f"First Page of Input Items:")
print(f"  Items in this page: {len(first_page.data)}")
print(f"  Has more: {first_page.has_more}")

if first_page.has_more:
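    # Reuse the first page's sort order: the `after` cursor is interpreted relative to `order`,
    # so falling back to the default ("desc") would walk backwards from the cursor
    second_page = client.responses.input_items.list(
        conv_response.id,
        limit=2,
        order="asc",
        after=first_page.last_id
    )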
    print(f"\nSecond Page of Input Items:")
    print(f"  Items in this page: {len(second_page.data)}")
    print(f"  Has more: {second_page.has_more}")

First Page of Input Items:
  Items in this page: 2
  Has more: True

Second Page of Input Items:
  Items in this page: 1
  Has more: False
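
Cursor pagination like this can be wrapped in a generator so callers never handle `after` and `has_more` directly. The sketch below assumes the page fields described in this section (`data`, `last_id`, `has_more`); `iter_input_items` is a hypothetical helper, and the SDK's page objects may also offer their own automatic paging.

```python
def iter_input_items(client, response_id, page_size=20, order="asc"):
    """Yield every input item for a response, following the `after` cursor."""
    after = None
    while True:
        # Keep `order` identical on every page so the cursor moves in one direction.
        kwargs = {"limit": page_size, "order": order}
        if after is not None:
            kwargs["after"] = after
        page = client.responses.input_items.list(response_id, **kwargs)
        yield from page.data
        if not page.has_more:
            break
        after = page.last_id


# Usage (assumes the `client` created in Setup):
#   for item in iter_input_items(client, conv_response.id, page_size=2):
#       print(item.id, item.type)
```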


## Summary

This notebook covered the core operations of the OpenAI Responses API:

1. **Creating responses** - Text generation with instructions, metadata, and multi-turn conversations
2. **Retrieving responses** - Accessing stored responses by ID
3. **Deleting responses** - Removing responses from storage
4. **Cancelling responses** - Stopping background response generation
5. **Listing input items** - Inspecting what was sent to the model
6. **Counting tokens** - Estimating costs and validating inputs
7. **Response object structure** - Understanding output format and metadata
8. **Input item list structure** - Managing conversation context and pagination

The Responses API provides the most advanced interface for working with OpenAI models, enabling sophisticated applications with tool integration, structured outputs, conversation state management, and fine-grained control over model behavior.