# Managing Conversations in the OpenAI Responses API

## Introduction

In this lesson, we'll explore how to effectively manage conversations in the OpenAI Responses API. The Responses API is the modern replacement for the Assistants API, providing a streamlined, stateful interface for building conversational AI applications.

### Key Differences from Assistants API

**Assistants API → Responses API Mapping:**
- Assistants → Prompts (instructions)
- Threads → Conversations (implicit, linked via response IDs)
- Runs → Responses
- Run-Steps → Items

**Main Advantages:**
- Server-side conversation state management
- No need to manually track threads and message history
- Simplified API with less boilerplate code
- Built-in tools (web search, file search, code interpreter)
- Conversation forking capabilities

First, let's set up our environment:

In [None]:
import os
import getpass

def _set_env(var: str):
    if not os.environ.get(var):
        os.environ[var] = getpass.getpass(f"{var}: ")

_set_env("OPENAI_API_KEY")

In [None]:
from openai import OpenAI
import time

# Initialize the OpenAI client
client = OpenAI()

## Understanding Conversations in the Responses API

Unlike the Assistants API, which required explicit thread creation and management, the Responses API handles conversations implicitly through response chaining. Each response has a unique ID, and you maintain conversation continuity by passing the ID of the previous response with your next request.

**Key Concepts:**
- Responses are stored for 30 days by default
- You can disable storage with `store=False`
- Conversations are formed by linking responses via `previous_response_id`
- All prior input tokens remain billable, even when using `previous_response_id`
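
The billing point in the list above is easiest to see with numbers. The sketch below is a rough model, not OpenAI's billing formula: it assumes every turn is billed for all earlier inputs and outputs in the chain plus the new input, and it ignores instructions, tool tokens, and any prompt-caching discounts. `billed_input_tokens` is a hypothetical helper written for this illustration.

```python
def billed_input_tokens(turns):
    """Estimate billed input tokens per turn for a chained conversation.

    `turns` is a list of (new_input_tokens, output_tokens) pairs. Assumption:
    each turn re-sends the whole prior chain (inputs and outputs) as input.
    """
    billed = []
    context = 0  # tokens accumulated from earlier turns
    for new_input, output in turns:
        billed.append(context + new_input)
        context += new_input + output
    return billed


# Three turns: the billed input grows even though each new message is short.
print(billed_input_tokens([(10, 50), (8, 40), (12, 30)]))  # [10, 68, 120]
```

Under these assumptions the third turn is billed for 120 input tokens although the user only typed 12 tokens' worth of new text.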

### Creating a Basic Response

Let's start by creating a simple response:

In [None]:
def create_basic_response(user_input):
    """Create a basic response without conversation history."""
    response = client.responses.create(
        model="gpt-4o-mini",
        input=user_input
    )
    print(f"Response ID: {response.id}")
    print(f"Output: {response.output[0].content[0].text}")
    return response

# Create a new response
response = create_basic_response("Tell me a joke about programming.")
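
Indexing `response.output[0].content[0].text` works when the first output item is a message, which is the case for a plain text request like this one. When tools or reasoning models are involved, other item types can come first. A more defensive sketch walks all output items instead; the helper name `extract_text` is an assumption for illustration, and the Python SDK also exposes a `response.output_text` convenience property that serves the same purpose.

```python
from types import SimpleNamespace


def extract_text(response):
    """Concatenate the text of every output_text part in a response."""
    parts = []
    for item in response.output:
        if getattr(item, "type", None) != "message":
            continue  # skip tool calls, reasoning items, etc.
        for part in item.content:
            if getattr(part, "type", None) == "output_text":
                parts.append(part.text)
    return "".join(parts)


# Stand-in object shaped like a response, so the helper can be tried offline.
fake = SimpleNamespace(output=[
    SimpleNamespace(type="web_search_call"),
    SimpleNamespace(type="message", content=[
        SimpleNamespace(type="output_text", text="Hello!"),
    ]),
])
print(extract_text(fake))  # Hello!
```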

### Continuing Conversations with previous_response_id

To continue a conversation, simply pass the `previous_response_id` parameter. The API automatically retrieves the full conversation history:

In [None]:
def continue_conversation(previous_response_id, user_input):
    """Continue an existing conversation by referencing the previous response."""
    response = client.responses.create(
        model="gpt-4o-mini",
        input=user_input,
        previous_response_id=previous_response_id
    )
    print(f"Response ID: {response.id}")
    print(f"Output: {response.output[0].content[0].text}")
    return response

# Continue the conversation
response_2 = continue_conversation(response.id, "Tell me another one!")
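
Because each stored response records the ID it was chained from, a conversation behaves like a linked list that can be walked backwards. A minimal sketch, assuming `client` is the OpenAI client created earlier and that every response in the chain is still stored; `collect_chain` is a hypothetical helper name.

```python
def collect_chain(client, response_id):
    """Return response IDs from the start of the conversation to `response_id`."""
    chain = []
    current = response_id
    while current is not None:
        chain.append(current)
        fetched = client.responses.retrieve(current)
        current = fetched.previous_response_id  # None for the first turn
    chain.reverse()  # oldest first
    return chain
```

Calling `collect_chain(client, response_2.id)` should return the IDs of the two responses created so far, oldest first. The walk costs one `retrieve` call per turn, so it is better suited to debugging than to hot paths.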

## Managing Conversation State

### Using the store Parameter

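By default, responses are stored on OpenAI's servers (`store=True`). You can disable this, for example when a conversation contains sensitive data. A response created with `store=False` cannot be retrieved later or referenced through `previous_response_id`: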

In [None]:
def create_ephemeral_response(user_input):
    """Create a response that won't be stored on OpenAI's servers."""
    response = client.responses.create(
        model="gpt-4o-mini",
        input=user_input,
        store=False  # Don't store conversation state
    )
    print(f"Response ID: {response.id}")
    print(f"Output: {response.output[0].content[0].text}")
    return response

# Example: Create a response without storing
ephemeral_response = create_ephemeral_response("What's the weather like?")
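
With `store=False` there is no server-side state to chain from, so the caller has to resend the history on every turn. A minimal sketch under that assumption: `client` is the OpenAI client from the setup cell, `chat_stateless` is a hypothetical helper, and the assistant's reply is appended to the local history as a plain `assistant` message.

```python
def chat_stateless(client, history, user_input, model="gpt-4o-mini"):
    """Send one turn without server-side storage, keeping history locally.

    `history` is a list of {"role": ..., "content": ...} dicts and is
    updated in place with the new user message and the assistant reply.
    """
    history.append({"role": "user", "content": user_input})
    response = client.responses.create(
        model=model,
        input=history,  # full conversation so far
        store=False,    # nothing is kept on the server
    )
    reply = response.output_text
    history.append({"role": "assistant", "content": reply})
    return reply
```

Starting from `history = []`, each call to `chat_stateless(client, history, "...")` grows the list by two messages, and the list itself becomes the conversation state.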

### Retrieving Previous Responses

You can retrieve any stored response by its ID:

In [None]:
def retrieve_response(response_id):
    """Retrieve a previously stored response by its ID."""
    fetched_response = client.responses.retrieve(response_id=response_id)
    print(f"Retrieved Response ID: {fetched_response.id}")
    print(f"Output: {fetched_response.output[0].content[0].text}")
    return fetched_response

# Example: Retrieve the first response we created
retrieved = retrieve_response(response.id)

## Forking Conversations

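One powerful feature is the ability to fork conversations: you can branch from any previous response to explore alternative paths. Forking uses the same `previous_response_id` parameter as a normal continuation; the only difference is that you pass the ID of an earlier response rather than the latest one: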

In [None]:
def fork_conversation(fork_from_id, user_input):
    """Fork a conversation from a specific response ID."""
    response = client.responses.create(
        model="gpt-4o-mini",
        input=user_input,
        previous_response_id=fork_from_id
    )
    print(f"Forked Response ID: {response.id}")
    print(f"Output: {response.output[0].content[0].text}")
    return response

# Fork from the first response with a different question
forked_response = fork_conversation(
    response.id, 
    "Actually, can you explain what makes that joke funny?"
)
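
Since the API only links each response to its parent, keeping track of branches is left to the application. One possible sketch is a small local index of parent links; `ConversationTree` and its methods are hypothetical names, not part of the SDK.

```python
from collections import defaultdict


class ConversationTree:
    """Local record of which response was created from which."""

    def __init__(self):
        self.parent = {}                   # response_id -> parent id or None
        self.children = defaultdict(list)  # parent id -> child ids

    def add(self, response_id, previous_response_id=None):
        self.parent[response_id] = previous_response_id
        if previous_response_id is not None:
            self.children[previous_response_id].append(response_id)

    def leaves(self):
        """IDs with no children, i.e. the tip of each branch."""
        return [rid for rid in self.parent if not self.children.get(rid)]

    def path_to(self, response_id):
        """IDs from the root down to `response_id`."""
        path = []
        while response_id is not None:
            path.append(response_id)
            response_id = self.parent[response_id]
        return path[::-1]


tree = ConversationTree()
tree.add("resp_1")
tree.add("resp_2", "resp_1")   # normal continuation
tree.add("resp_3", "resp_1")   # fork from the first response
print(tree.leaves())           # ['resp_2', 'resp_3']
print(tree.path_to("resp_3"))  # ['resp_1', 'resp_3']
```

In a real application you would call `tree.add(response.id, previous_response_id)` after every `client.responses.create` call.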

## Working with Instructions (System Prompts)

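You can provide instructions to shape the model's behavior. Instructions play the same role as a system prompt in Chat Completions. They apply only to the request that carries them: when you chain with `previous_response_id`, instructions from earlier responses are not carried over, so resend them on each turn if they should persist: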

In [None]:
def create_response_with_instructions(instructions, user_input):
    """Create a response with custom instructions."""
    response = client.responses.create(
        model="gpt-4o-mini",
        instructions=instructions,
        input=user_input
    )
    print(f"Response: {response.output[0].content[0].text}")
    return response

# Example: Create a response with a specific persona
pirate_response = create_response_with_instructions(
    instructions="You are a helpful coding assistant that talks like a pirate.",
    user_input="How do I declare a variable in Python?"
)
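
Instructions are not carried over from one response to the next when chaining with `previous_response_id`, so a persona has to be resent on each turn. A minimal sketch of one way to do that, assuming `client` is the OpenAI client from the setup cell; `make_chat` is a hypothetical helper, not an SDK function.

```python
def make_chat(client, instructions, model="gpt-4o-mini"):
    """Return a send(text) function that resends `instructions` every turn."""
    state = {"last_id": None}

    def send(text):
        response = client.responses.create(
            model=model,
            instructions=instructions,  # resent on each call
            input=text,
            previous_response_id=state["last_id"],
        )
        state["last_id"] = response.id  # chain the next turn to this one
        return response.output_text

    return send
```

With `send = make_chat(client, "You are a helpful coding assistant that talks like a pirate.")`, every call to `send("...")` continues the same conversation and keeps the persona.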

## Working with Different Content Types

### Text Messages with Multiple Turns

You can structure input with role-based messages for more complex conversations:

In [None]:
def create_multi_turn_response():
    """Create a response with structured message history."""
    response = client.responses.create(
        model="gpt-4o-mini",
        input=[
            {
                "role": "user",
                "content": "I'm learning about data structures."
            },
            {
                "role": "user",
                "content": "Can you explain what a hash table is?"
            }
        ]
    )
    print(f"Response: {response.output[0].content[0].text}")
    return response

# Example with multiple message turns
multi_turn_response = create_multi_turn_response()
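
The same list format can also carry earlier assistant turns, which is useful when replaying a conversation you stored yourself. The sketch below builds such a list from `(role, text)` pairs; `build_input` and `ALLOWED_ROLES` are hypothetical names, and the role set reflects the message roles the API accepts as far as this example assumes.

```python
ALLOWED_ROLES = {"user", "assistant", "system", "developer"}


def build_input(turns):
    """Convert (role, text) pairs into the list format accepted by `input`."""
    messages = []
    for role, text in turns:
        if role not in ALLOWED_ROLES:
            raise ValueError(f"Unsupported role: {role!r}")
        messages.append({"role": role, "content": text})
    return messages


messages = build_input([
    ("user", "I'm learning about data structures."),
    ("assistant", "Great! Which one would you like to start with?"),
    ("user", "Can you explain what a hash table is?"),
])
print(len(messages))  # 3
```

The resulting list can be passed directly as `input=messages` to `client.responses.create`.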

### Messages with Images

The Responses API supports multimodal inputs including images:

In [None]:
def analyze_image(image_url, question):
    """Analyze an image by providing a URL."""
    response = client.responses.create(
        model="gpt-4o",  # Use gpt-4o for vision capabilities
        input=[
            {
                "role": "user",
                "content": [
                    {
                        "type": "input_text",
                        "text": question
                    },
                    {
                        "type": "input_image",
                        "image_url": image_url
                    }
                ]
            }
        ]
    )
    print(f"Response: {response.output[0].content[0].text}")
    return response

# Example: Analyze an image (uncomment with a real image URL)
# image_response = analyze_image(
#     "https://example.com/image.jpg",
#     "What do you see in this image?"
# )

### Working with Base64 Images

You can also provide images as base64-encoded strings:

In [None]:
import base64

def analyze_local_image(image_path, question):
    """Analyze a local image file by encoding it as base64."""
    with open(image_path, "rb") as image_file:
        image_data = base64.b64encode(image_file.read()).decode('utf-8')
    
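    # Determine image format from file extension; "jpg" must become "jpeg"
    # so that the data URL carries a valid MIME type (image/jpeg)
    image_format = image_path.split('.')[-1].lower()
    if image_format == "jpg":
        image_format = "jpeg"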
    
    response = client.responses.create(
        model="gpt-4o",
        input=[
            {
                "role": "user",
                "content": [
                    {
                        "type": "input_text",
                        "text": question
                    },
                    {
                        "type": "input_image",
                        "image_url": f"data:image/{image_format};base64,{image_data}"
                    }
                ]
            }
        ]
    )
    print(f"Response: {response.output[0].content[0].text}")
    return response

# Example: Analyze a local image (uncomment with a real image path)
# local_image_response = analyze_local_image(
#     "/path/to/image.png",
#     "Describe this image in detail."
# )
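
Deriving the MIME type from the raw file extension is fragile, so one alternative is to let the standard library's `mimetypes` module do the lookup. A small sketch of that approach; `to_data_url` is a hypothetical helper whose result could be passed as the `image_url` value of an `input_image` part.

```python
import base64
import mimetypes


def to_data_url(image_path):
    """Encode a local image file as a base64 data URL."""
    mime_type, _ = mimetypes.guess_type(image_path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Not a recognized image file: {image_path}")
    with open(image_path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"
```

Rejecting unknown extensions up front avoids sending a malformed data URL to the API.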

## Using Built-in Tools

### Web Search Tool

The Responses API includes built-in tools like web search:

In [None]:
def search_web(query):
    """Use the built-in web search tool."""
    response = client.responses.create(
        model="gpt-4o",
        input=query,
        tools=[{"type": "web_search"}]
    )
    
    # The response may include tool execution results
    for item in response.output:
        if hasattr(item, 'content'):
            for content in item.content:
                if hasattr(content, 'text'):
                    print(f"Response: {content.text}")
    
    return response

# Example: Search for current information
# web_response = search_web("What are the latest developments in AI in 2025?")
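
Web search answers typically attach their sources as annotations on the output text. The sketch below collects them, assuming each citation is an annotation with `type == "url_citation"` and `title` and `url` fields; `extract_citations` is a hypothetical helper, and the stand-in object only mimics the response shape for offline use.

```python
from types import SimpleNamespace


def extract_citations(response):
    """Collect (title, url) pairs from url_citation annotations."""
    citations = []
    for item in response.output:
        for part in getattr(item, "content", None) or []:
            for ann in getattr(part, "annotations", None) or []:
                if getattr(ann, "type", None) == "url_citation":
                    citations.append((ann.title, ann.url))
    return citations


# Offline stand-in: a search call item followed by a message with one citation.
fake_search = SimpleNamespace(output=[
    SimpleNamespace(type="web_search_call"),
    SimpleNamespace(type="message", content=[
        SimpleNamespace(
            type="output_text",
            text="Some answer.",
            annotations=[SimpleNamespace(
                type="url_citation",
                title="Example",
                url="https://example.com",
            )],
        ),
    ]),
])
print(extract_citations(fake_search))  # [('Example', 'https://example.com')]
```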

### File Search Tool

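You can enable file search for document retrieval and analysis. File search runs against vector stores you have already created, so the tool definition must include their IDs:

In [None]:
def use_file_search(query, vector_store_ids):
    """Use the built-in file search tool against existing vector stores."""
    response = client.responses.create(
        model="gpt-4o",
        input=query,
        # file_search needs the IDs of the vector stores to search
        tools=[{"type": "file_search", "vector_store_ids": vector_store_ids}]
    )
    # output[0] is usually the file_search_call item, so read output_text
    print(f"Response: {response.output_text}")
    return response

# Example usage with file search (uncomment with a real vector store ID)
# file_search_response = use_file_search(
#     "Find information about Python decorators in the documentation.",
#     vector_store_ids=["<your-vector-store-id>"]
# )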

### Code Interpreter Tool

Enable the code interpreter for data analysis and code execution:

In [None]:
def use_code_interpreter(query):
    """Use the built-in code interpreter tool."""
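    response = client.responses.create(
        model="gpt-4o",
        input=query,
        # The tool needs a container to run in; "auto" lets the API create one
        tools=[{"type": "code_interpreter", "container": {"type": "auto"}}]
    )
    # output may start with code_interpreter_call items, so read output_text
    print(f"Response: {response.output_text}")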
    return response

# Example: Request data analysis
code_response = use_code_interpreter(
    "Create a Python function to calculate the Fibonacci sequence up to n terms."
)
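
With tools enabled, `response.output` usually holds tool-call items in addition to the final message, which is why indexing the first item is unreliable. A quick way to see what came back is to count items by type. This is a sketch; `count_item_types` is a hypothetical helper and the stand-in object only imitates the response shape.

```python
from collections import Counter
from types import SimpleNamespace


def count_item_types(response):
    """Count output items by their `type` field."""
    return Counter(item.type for item in response.output)


# Offline stand-in for a response that ran code twice before answering.
fake_run = SimpleNamespace(output=[
    SimpleNamespace(type="code_interpreter_call"),
    SimpleNamespace(type="code_interpreter_call"),
    SimpleNamespace(type="message"),
])
print(dict(count_item_types(fake_run)))
# {'code_interpreter_call': 2, 'message': 1}
```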

## Managing Context Windows and Token Limits

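You can cap the length of the generated output with `max_output_tokens`. The `max_prompt_tokens` and `max_completion_tokens` parameters belong to Assistants API runs and are not accepted by `responses.create`; to bound the input side, use truncation (next section):

In [None]:
def create_limited_response(user_input):
    """Create a response with a cap on generated tokens."""
    response = client.responses.create(
        model="gpt-4o-mini",
        input=user_input,
        max_output_tokens=1000  # Upper bound on generated tokens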
    )
    print(f"Response: {response.output[0].content[0].text}")
    return response

# Example with token limits
limited_response = create_limited_response(
    "Explain machine learning in simple terms."
)
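
In the Responses API the output cap is `max_output_tokens`, and a response that reaches it comes back with `status == "incomplete"` rather than raising an error. A minimal sketch of checking for that case, assuming the `status` and `incomplete_details.reason` fields of the response object; `hit_output_limit` is a hypothetical helper.

```python
from types import SimpleNamespace


def hit_output_limit(response):
    """True if generation stopped because max_output_tokens was reached."""
    if response.status != "incomplete":
        return False
    details = response.incomplete_details
    return details is not None and details.reason == "max_output_tokens"


# Offline stand-ins for a cut-off response and a normal one.
cut_off = SimpleNamespace(
    status="incomplete",
    incomplete_details=SimpleNamespace(reason="max_output_tokens"),
)
finished = SimpleNamespace(status="completed", incomplete_details=None)
print(hit_output_limit(cut_off), hit_output_limit(finished))  # True False
```

When the check returns `True`, the text received so far is still available, and you can decide whether to raise the limit or ask the model to continue.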

### Automatic Truncation

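By default (`truncation="disabled"`), a request whose input exceeds the model's context window fails with an error. With `truncation="auto"`, the API instead drops conversation items as needed to make the input fit: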

In [None]:
def create_truncated_response(previous_response_id, user_input):
    """Create a response with automatic truncation for long conversations."""
    response = client.responses.create(
        model="gpt-4o-mini",
        input=user_input,
        previous_response_id=previous_response_id,
        truncation="auto"  # Automatically truncate to fit context window
    )
    print(f"Response: {response.output[0].content[0].text}")
    return response

# Example with truncation
truncated_response = create_truncated_response(
    response.id,
    "Continue our conversation from earlier."
)
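
For comparison, when you manage history yourself (for example with `store=False`), a similar idea can be applied client-side. The sketch below illustrates the concept only and is not the algorithm the API uses; `trim_history` is a hypothetical helper, and the default estimate of roughly four characters per token is a crude assumption.

```python
def trim_history(messages, max_tokens,
                 estimate=lambda text: max(1, len(text) // 4)):
    """Drop the oldest messages until the estimated total fits `max_tokens`.

    The most recent message is always kept, even if it alone exceeds the budget.
    """
    kept = list(messages)
    total = sum(estimate(m["content"]) for m in kept)
    while len(kept) > 1 and total > max_tokens:
        total -= estimate(kept[0]["content"])
        kept.pop(0)  # discard the oldest message
    return kept


history = [
    {"role": "user", "content": "a" * 40},       # ~10 tokens
    {"role": "assistant", "content": "b" * 40},  # ~10 tokens
    {"role": "user", "content": "c" * 8},        # ~2 tokens
]
print(len(trim_history(history, max_tokens=12)))  # 2
```

For accurate budgeting you would replace the estimator with a real tokenizer.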

## Complete Conversation Example

Let's create a complete multi-turn conversation with state management:

In [None]:
def multi_turn_conversation():
    """Demonstrate a complete multi-turn conversation."""
    print("=" * 60)
    print("Starting a multi-turn conversation")
    print("=" * 60)
    
    # Turn 1: Initial response
    response_1 = client.responses.create(
        model="gpt-4o-mini",
        instructions="You are a helpful Python programming tutor.",
        input="I'm new to Python. What should I learn first?"
    )
    print(f"\nTurn 1 - Response ID: {response_1.id}")
    print(f"Assistant: {response_1.output[0].content[0].text}\n")
    
    # Turn 2: Continue conversation
    response_2 = client.responses.create(
        model="gpt-4o-mini",
        input="Can you give me an example of a for loop?",
        previous_response_id=response_1.id
    )
    print(f"Turn 2 - Response ID: {response_2.id}")
    print(f"Assistant: {response_2.output[0].content[0].text}\n")
    
    # Turn 3: Continue further
    response_3 = client.responses.create(
        model="gpt-4o-mini",
        input="What's the difference between a list and a tuple?",
        previous_response_id=response_2.id
    )
    print(f"Turn 3 - Response ID: {response_3.id}")
    print(f"Assistant: {response_3.output[0].content[0].text}\n")
    
    # Turn 4: Fork back to turn 1
    response_4 = client.responses.create(
        model="gpt-4o-mini",
        input="Actually, what about web development with Python instead?",
        previous_response_id=response_1.id  # Fork from first response
    )
    print(f"Turn 4 (Forked from Turn 1) - Response ID: {response_4.id}")
    print(f"Assistant: {response_4.output[0].content[0].text}\n")
    
    print("=" * 60)
    print("Conversation complete")
    print("=" * 60)
    
    return response_1, response_2, response_3, response_4

# Run the complete conversation example
conv_responses = multi_turn_conversation()

## Best Practices

### 1. Conversation Management

- **Store response IDs**: Keep track of response IDs to continue conversations
- **Use forking wisely**: Fork conversations to explore alternative paths without losing context
- **Monitor token usage**: Remember that all prior tokens are billable when using `previous_response_id`
- **Set storage preferences**: Use `store=False` for sensitive conversations

### 2. Performance Optimization

- **Use appropriate models**: Use `gpt-4o-mini` for simple tasks, `gpt-4o` for complex reasoning
- **Limit context**: Use `max_output_tokens` to cap output length and `truncation="auto"` to fit long conversations into the context window
- **Cache responses**: Store frequently accessed responses locally to reduce API calls

### 3. Content Best Practices

- **Clear instructions**: Provide clear, specific instructions for consistent behavior
- **Structured input**: Use role-based messages for complex conversations
- **Tool selection**: Choose appropriate tools (web_search, file_search, code_interpreter) based on the task

### 4. Error Handling

Always implement proper error handling:

In [None]:
from openai import OpenAIError

def safe_create_response(user_input, previous_response_id=None):
    """Create a response, returning None if the API call fails."""
    try:
        response = client.responses.create(
            model="gpt-4o-mini",
            input=user_input,
            previous_response_id=previous_response_id
        )
        return response
    except OpenAIError as e:
        # Base class for SDK errors: connection problems, rate limits, bad requests
        print(f"Error creating response: {e}")
        return None

# Example with error handling
safe_response = safe_create_response("Hello, how are you?")
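
Transient failures such as rate limits or network hiccups are often worth retrying rather than reporting straight away. The sketch below shows a generic exponential-backoff wrapper; `with_retries` is a hypothetical helper, and it catches every exception only for brevity, so in practice you would narrow it to the errors you consider retryable. The OpenAI client also has its own `max_retries` option, which may already be enough for simple cases.

```python
import time


def with_retries(call, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Run `call()` and retry on exceptions with exponential backoff.

    Re-raises the last exception once `max_attempts` is exhausted.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except Exception:
            if attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))  # 1s, 2s, 4s, ...
```

A call such as `with_retries(lambda: client.responses.create(model="gpt-4o-mini", input="Hello"))` would then make up to three attempts before giving up.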

## Exercise: Build a Conversational Assistant

Try this exercise to practice working with conversations:

In [None]:
def interactive_conversation():
    """
    Create an interactive conversation loop.
    Type 'quit' to exit.
    """
    print("Starting interactive conversation. Type 'quit' to exit.\n")
    
    last_response_id = None
    
    while True:
        user_input = input("You: ")
        
        if user_input.lower() in ['quit', 'exit', 'q']:
            print("Goodbye!")
            break
        
        try:
            response = client.responses.create(
                model="gpt-4o-mini",
                instructions="You are a friendly and helpful assistant.",
                input=user_input,
                previous_response_id=last_response_id
            )
            
            last_response_id = response.id
            print(f"\nAssistant: {response.output[0].content[0].text}\n")
            
        except Exception as e:
            print(f"Error: {e}\n")

# Uncomment to run the interactive conversation
# interactive_conversation()

## Migration from Assistants API

If you're migrating from the Assistants API, here's a quick reference:

### Assistants API Pattern
```python
# Old way (Assistants API)
assistant = client.beta.assistants.create(
    name="My Assistant",
    instructions="You are helpful.",
    model="gpt-4"
)

thread = client.beta.threads.create()

message = client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="Hello"
)

run = client.beta.threads.runs.create(
    thread_id=thread.id,
    assistant_id=assistant.id
)
```

### Responses API Pattern
```python
# New way (Responses API)
response = client.responses.create(
    model="gpt-4o",
    instructions="You are helpful.",
    input="Hello"
)

# Continue conversation
response_2 = client.responses.create(
    model="gpt-4o",
    input="How are you?",
    previous_response_id=response.id
)
```

**Benefits of Migration:**
- Simpler API with less boilerplate
- Faster response times
- Built-in state management
- Conversation forking capabilities
- Unified interface for tools and multimodal inputs

## Conclusion

The Responses API provides a streamlined, powerful way to build conversational AI applications. Key takeaways:

1. **Simplified State Management**: No need to manually manage threads and messages
2. **Server-Side Storage**: Responses are stored for 30 days by default, unless you set `store=False`
3. **Flexible Continuation**: Use `previous_response_id` to continue or fork conversations
4. **Built-in Tools**: Web search, file search, and code interpreter available out of the box
5. **Multimodal Support**: Handle text, images, and files in the same API
6. **Cost Management**: Control token usage with limits and truncation

The Responses API represents a significant improvement over the Assistants API, offering better performance, simpler code, and more powerful features for building modern AI applications.

## Additional Resources

- [OpenAI Responses API Documentation](https://platform.openai.com/docs/api-reference/responses)
- [OpenAI Cookbook - Responses API Examples](https://cookbook.openai.com/examples/responses_api/responses_example)
- [Migration Guide: Assistants API to Responses API](https://apimagic.ai/blog/switching-assistant-responses-api)