# Introduction

In a [previous blog post](https://drchrislevy.github.io/blog), I explored various aspects of building AI agents, including coding agents, ReAct prompting, and tool-calling loops. Recently, OpenAI announced [new tools for building agents](https://openai.com/index/new-tools-for-building-agents/), introducing the Responses API and Agents SDK. OpenAI set the standard with their Chat Completions API, which has been widely adopted and extended by developers.

However, it's important to remain cautious. Remember what happened to the Assistants API? It never fully emerged from beta and is scheduled to be sunset in 2026. In contrast, these new APIs are not marked as beta, indicating that OpenAI feels more confident about this direction. Given their prior success with the Chat Completions API, there's reason to be optimistic that they might achieve similar success again. OpenAI has likely incorporated key learnings from their experience with the Assistants API into these latest developments, making these new tools intriguing and well worth exploring further.

# Responses API

OpenAI plans to continue supporting the [Chat Completions API](https://platform.openai.com/docs/guides/responses-vs-chat-completions#the-chat-completions-api-is-not-going-away). However, for new projects, they recommend using the newly introduced [Responses API](https://platform.openai.com/docs/api-reference/responses). 

One advantage of the Chat Completions API I've appreciated is its wide adoption by other LLM providers, making it easy to switch between services. Because of this flexibility, it may still be practical to use Chat Completions for some new projects. It remains to be seen whether other providers will adopt the Responses API as well.

Here's a screenshot from the OpenAI documentation explaining the Responses API:

<img src="static_blog_imgs/why_responses_api.png" width="75%">

### Key points to note about the Responses API:

- **Stateful**: It includes a `previous_response_id` to support long-running conversations.
- It is distinctly different from the Chat Completions API.
- If your application doesn't require built-in tools, you can confidently continue using **Chat Completions**.
- When ready for advanced capabilities tailored to agent workflows, the Responses API is recommended.
- The Responses API represents OpenAI's future direction for agent-building.


## Quickstart

I'm not going to go over all the details of the Responses API, because that's what the docs are for.
But I'm going to cover some things that are **new** to me.


In [1]:
from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()

client = OpenAI()

response = client.responses.create(model="gpt-4o-mini", input="Tell a quick dad joke!")

print(response)

Response(id='resp_67dcb9129b288191b03676b4c3ee91770b18a606f20fdf95', created_at=1742518546.0, error=None, incomplete_details=None, instructions=None, metadata={}, model='gpt-4o-mini-2024-07-18', object='response', output=[ResponseOutputMessage(id='msg_67dcb912f2f48191911626bf906ad6840b18a606f20fdf95', content=[ResponseOutputText(annotations=[], text="Why don't skeletons fight each other? \n\nThey don't have the guts!", type='output_text')], role='assistant', status='completed', type='message')], parallel_tool_calls=True, temperature=1.0, tool_choice='auto', tools=[], top_p=1.0, max_output_tokens=None, previous_response_id=None, reasoning=Reasoning(effort=None, generate_summary=None), status='completed', text=ResponseTextConfig(format=ResponseFormatText(type='text')), truncation='disabled', usage=ResponseUsage(input_tokens=31, input_tokens_details=InputTokensDetails(cached_tokens=0), output_tokens=16, output_tokens_details=OutputTokensDetails(reasoning_tokens=0), total_tokens=47), user=

In [2]:
response.to_dict()

{'id': 'resp_67dcb9129b288191b03676b4c3ee91770b18a606f20fdf95',
 'created_at': 1742518546.0,
 'error': None,
 'incomplete_details': None,
 'instructions': None,
 'metadata': {},
 'model': 'gpt-4o-mini-2024-07-18',
 'object': 'response',
 'output': [{'id': 'msg_67dcb912f2f48191911626bf906ad6840b18a606f20fdf95',
   'content': [{'annotations': [],
     'text': "Why don't skeletons fight each other? \n\nThey don't have the guts!",
     'type': 'output_text'}],
   'role': 'assistant',
   'status': 'completed',
   'type': 'message'}],
 'parallel_tool_calls': True,
 'temperature': 1.0,
 'tool_choice': 'auto',
 'tools': [],
 'top_p': 1.0,
 'max_output_tokens': None,
 'previous_response_id': None,
 'reasoning': {'effort': None, 'generate_summary': None},
 'status': 'completed',
 'text': {'format': {'type': 'text'}},
 'truncation': 'disabled',
 'usage': {'input_tokens': 31,
  'input_tokens_details': {'cached_tokens': 0},
  'output_tokens': 16,
  'output_tokens_details': {'reasoning_tokens': 0},


In [3]:
print(response.output[0].content[0].text)

Why don't skeletons fight each other? 

They don't have the guts!


Or a little shortcut:

In [4]:
print(response.output_text)

Why don't skeletons fight each other? 

They don't have the guts!
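As far as I can tell, `output_text` is a convenience that joins the `output_text` parts of every `message` item in `response.output`. Here is a minimal sketch of that idea, working on the plain dict returned by `response.to_dict()`. The helper name `collect_output_text` is my own and is not part of the SDK.

```python
def collect_output_text(response_dict: dict) -> str:
    """Join every `output_text` part found in the `message` items of a response dict."""
    parts = []
    for item in response_dict.get("output", []):
        # Skip non-message items such as tool calls.
        if item.get("type") != "message":
            continue
        for content in item.get("content", []):
            if content.get("type") == "output_text":
                parts.append(content["text"])
    return "".join(parts)


# Trimmed copy of the `response.to_dict()` output shown earlier.
example = {
    "output": [
        {
            "type": "message",
            "role": "assistant",
            "content": [
                {
                    "type": "output_text",
                    "annotations": [],
                    "text": "Why don't skeletons fight each other? \n\nThey don't have the guts!",
                }
            ],
        }
    ]
}

print(collect_output_text(example))
```

With a live response you would call `collect_output_text(response.to_dict())`.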


- Response objects are saved for 30 days by default. You can disable this behavior by setting `store=False` when creating a Response.
- Stored responses can be viewed on the dashboard logs page or retrieved via the API.

I never inspected traces with Chat Completions because I don't think they are stored by default.
With the Responses API, traces are stored and can be inspected for 30 days by default, unless you disable storage on the API call.
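If you don't want a request to show up in the logs, a small wrapper like the one below could opt out of storage. This is only a sketch: `create_unstored` is a hypothetical helper name, and `client` is assumed to be an `OpenAI()` client like the one created in the first cell.

```python
def create_unstored(client, prompt: str, model: str = "gpt-4o-mini"):
    """Create a response with storage disabled, so it is not kept for 30 days."""
    return client.responses.create(
        model=model,
        input=prompt,
        store=False,  # opt out of the default response storage
    )
```

Usage would look like `create_unstored(client, "Tell a quick dad joke!").output_text`. I would expect `client.responses.retrieve` to fail for a response created this way, but I have not verified that.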

<img src="static_blog_imgs/responses_logs.png" width="100%">

<img src="static_blog_imgs/responses_log_example.png" width="100%">

Or retrieve traces via the API.

In [5]:
client.responses.retrieve(response.id)

Response(id='resp_67dcb9129b288191b03676b4c3ee91770b18a606f20fdf95', created_at=1742518546.0, error=None, incomplete_details=None, instructions=None, metadata={}, model='gpt-4o-mini-2024-07-18', object='response', output=[ResponseOutputMessage(id='msg_67dcb912f2f48191911626bf906ad6840b18a606f20fdf95', content=[ResponseOutputText(annotations=[], text="Why don't skeletons fight each other? \n\nThey don't have the guts!", type='output_text')], role='assistant', status='completed', type='message')], parallel_tool_calls=True, temperature=1.0, tool_choice='auto', tools=[], top_p=1.0, max_output_tokens=None, previous_response_id=None, reasoning=Reasoning(effort=None, generate_summary=None), status='completed', text=ResponseTextConfig(format=ResponseFormatText(type='text')), truncation='disabled', usage=ResponseUsage(input_tokens=31, input_tokens_details=InputTokensDetails(cached_tokens=0), output_tokens=16, output_tokens_details=OutputTokensDetails(reasoning_tokens=0), total_tokens=47), user=

## Instruction Following

In [6]:
response = client.responses.create(model="gpt-4o-mini", instructions="You return markdown and lots of emojis. ", input="Tell a quick dad joke!")
print(response.output_text)

Why did the scarecrow win an award? 🌾🏆 

Because he was outstanding in his field! 😄


*The instructions parameter gives the model high-level instructions on how it should behave while generating a response, including tone, goals, and examples of correct responses. Any instructions provided this way will take priority over a prompt in the input parameter.* [source](https://platform.openai.com/docs/guides/text#message-roles-and-instruction-following)

This example is roughly equivalent to:

In [7]:
response = client.responses.create(
    model="gpt-4o-mini",
    input=[{"role": "developer", "content": "You return markdown and lots of emojis. "}, {"role": "user", "content": "Tell a quick dad joke!"}],
)

print(response.output_text)

Why did the scarecrow win an award? 🎉

Because he was outstanding in his field! 🌾😂


The argument `instructions` is used to insert a system (or developer) message as the first item in the model's context [source](https://platform.openai.com/docs/api-reference/responses/create#responses-create-instructions).
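One way to picture this is as a simple prepend. The sketch below is only a mental model of the behavior described in the docs, not the actual server-side logic. The function name `context_with_instructions` and the choice of the `developer` role are my assumptions.

```python
def context_with_instructions(instructions, input_items, role="developer"):
    """Mimic the documented behavior: `instructions` becomes the first context item."""
    # A plain string input is treated as a single user message.
    if isinstance(input_items, str):
        input_items = [{"role": "user", "content": input_items}]
    if instructions is None:
        return list(input_items)
    return [{"role": role, "content": instructions}, *input_items]


for message in context_with_instructions(
    "You return markdown and lots of emojis. ", "Tell a quick dad joke!"
):
    print(message)
```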






In [8]:
response = client.responses.create(
    model="gpt-4o-mini",
    input=[{"role": "developer", "content": "You return markdown and lots of emojis. "}, {"role": "user", "content": "Tell a quick dad joke!"}],
    instructions="You must talk like a pirate and do not return any markdown or emojis.",
)

print(response.output_text)

Why did the scarecrow win an award? Because he was outstanding in his field!


<img src="static_blog_imgs/instructions_insertion.png" width="100%">

## Conversation State

We can manually handle the chat history using alternating `user` and `assistant` messages, just as previously done with the chat completions API.



In [9]:
response = client.responses.create(
    model="gpt-4o-mini",
    input=[
        {"role": "developer", "content": "You are a helpful assistant."},
        {"role": "user", "content": "My name is Chris, and my age is 40."},
        {"role": "assistant", "content": "Nice to meet you, Chris!"},
        {"role": "user", "content": "How old am I?"},
    ],
)

print(response.output_text)

You mentioned that you are 40 years old.
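For a multi-turn chat, the manual approach means keeping the message list yourself and resending it on every call. Here is a minimal sketch of that bookkeeping. `ManualConversation` is a hypothetical class of my own, and `client` is assumed to be the `OpenAI()` client from earlier.

```python
class ManualConversation:
    """Keeps the full message list client-side and resends it on every turn."""

    def __init__(self, client, model="gpt-4o-mini",
                 developer_message="You are a helpful assistant."):
        self.client = client
        self.model = model
        self.messages = [{"role": "developer", "content": developer_message}]

    def send(self, user_text: str) -> str:
        self.messages.append({"role": "user", "content": user_text})
        response = self.client.responses.create(model=self.model, input=self.messages)
        # Store the reply so the next turn sees it as context.
        self.messages.append({"role": "assistant", "content": response.output_text})
        return response.output_text
```

Usage would be along the lines of `chat = ManualConversation(client)`, then `chat.send("My name is Chris, and my age is 40.")` followed by `chat.send("How old am I?")`.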


Alternatively, we can use the `previous_response_id` parameter to manage conversation state.

In [10]:
response = client.responses.create(
    model="gpt-4o-mini",
    input=[
        {"role": "developer", "content": "You are a helpful assistant."},
        {"role": "user", "content": "My name is Chris, and my age is 40."},
    ],
)

print(response.output_text)
print(response.id)

Nice to meet you, Chris! If there's anything specific you'd like to talk about or any questions you have, feel free to let me know!
resp_67dcb9189d248191aa8b5380deffe0170a7498d0c4209037


In [11]:
response = client.responses.create(
    model="gpt-4o-mini",
    input=[
        {"role": "user", "content": "How old am I"},
    ],
    previous_response_id=response.id,
)
print(response.output_text)
print(response.id)

You mentioned that you are 40 years old.
resp_67dcb919f984819194f8ba91b5c7d4130a7498d0c4209037


In [12]:
response = client.responses.create(
    model="gpt-4o-mini",
    input=[
        {"role": "user", "content": "And what was my name?"},
    ],
    previous_response_id=response.id,
)
print(response.output_text)
print(response.id)

Your name is Chris.
resp_67dcb91ae9148191a8974cf8d267ba5e0a7498d0c4209037


When using `previous_response_id`, all previous input tokens for responses in the chain are billed as input tokens in the API [source](https://platform.openai.com/docs/guides/conversation-state#openai-apis-for-conversation-state).
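To see the effect on token usage, the chaining pattern from the cells above can be wrapped in a loop that records `usage.input_tokens` for each turn. This is a rough sketch: `run_chain` is a hypothetical helper, and `client` is assumed to be the `OpenAI()` client from earlier.

```python
def run_chain(client, prompts, model="gpt-4o-mini"):
    """Send prompts one at a time, linking each call to the previous response."""
    previous_id = None
    replies = []
    input_tokens_per_turn = []
    for prompt in prompts:
        response = client.responses.create(
            model=model,
            input=[{"role": "user", "content": prompt}],
            previous_response_id=previous_id,  # None on the first turn
        )
        previous_id = response.id
        replies.append(response.output_text)
        # Per the billing note above, earlier turns in the chain count as input
        # tokens, so this number is expected to grow turn by turn (an assumption
        # worth checking against your own usage data).
        input_tokens_per_turn.append(response.usage.input_tokens)
    return replies, input_tokens_per_turn
```

Calling `run_chain(client, ["My name is Chris.", "What is my name?"])` returns the replies plus the per-turn input token counts, which makes the cost of a long chain easy to eyeball.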

When you view the logs in the dashboard for a message that used `previous_response_id`, there is a link/button to find the previous response.

<img src="static_blog_imgs/prev_response_log.png" width="100%">


When using `previous_response_id`, the **instructions** from a previous response will not be carried over to the next response. This makes it simple to swap out system (or developer) messages in new responses [source](https://platform.openai.com/docs/api-reference/responses/create#responses-create-instructions). The `instructions` parameter only applies to the current response generation request.

In [13]:
response = client.responses.create(
    model="gpt-4o-mini",
    input=[
        {"role": "user", "content": "What country is the city Toronto in?"},
    ],
    instructions="You only write lower case letters",
)
print(response.output_text)
response.id

toronto is in canada.


'resp_67dcb91bd80c8191b96a2e4c3ed03b3508e4210ba7e03814'

In [14]:
response = client.responses.create(
    model="gpt-4o-mini",
    input=[
        {"role": "user", "content": "What country was it again?"},
    ],
    previous_response_id=response.id
)
print(response.output_text)

Toronto is in Canada.


In [15]:
response = client.responses.create(
    model="gpt-4o-mini",
    input=[
        {"role": "user", "content": "I forget, what was it?"},
    ],
    instructions="You only write UPPER CASE letters",
    previous_response_id=response.id
)
print(response.output_text)

TORONTO IS IN CANADA.
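Since instructions are not inherited along the chain, one option is to resend them on every turn. Below is a small sketch of that idea. `InstructedChain` is a hypothetical class name, and `client` is assumed to be the `OpenAI()` client from earlier.

```python
class InstructedChain:
    """Chains responses with previous_response_id and resends instructions each turn."""

    def __init__(self, client, instructions, model="gpt-4o-mini"):
        self.client = client
        self.instructions = instructions  # can be reassigned between turns
        self.model = model
        self.previous_id = None

    def send(self, user_text: str) -> str:
        response = self.client.responses.create(
            model=self.model,
            input=[{"role": "user", "content": user_text}],
            instructions=self.instructions,  # not carried over from earlier responses
            previous_response_id=self.previous_id,
        )
        self.previous_id = response.id
        return response.output_text
```

Reassigning `chain.instructions` between calls swaps the instructions for later turns while keeping the conversation history.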


Of course, there are all the other OpenAI LLM goodies such as function calling, structured outputs, streaming, analyzing images, etc. I assume there
are slight changes to the API calls, but I am already familiar with these things from the Chat Completions API, so there is no point in going over them here.

## Built-in tools

Something new here to me is the ability to use built-in tools with the Responses API.
As of writing, the available tools include built-in ones such as web search, file search, and computer use, alongside function calling.
I'm already familiar with tool/function calling. But let's take a look at some of these other tools.







In [16]:
# Importing these to make the output look nicer
from monsterui.all import render_md
from fasthtml.common import show

### Web Search

In [25]:
response = client.responses.create(
    model="gpt-4o-mini",
    # "web_search_preview_2025_03_11" points to a dated version of the tool
    tools=[{"type": "web_search_preview"}],
    input="Did Alabama State win their first game of the March Madness tournament in 2025?",
)

show(render_md(response.output_text))

In [26]:
response.to_dict()

{'id': 'resp_67dcbc296a848191beea151d745164f5086920f67eda0861',
 'created_at': 1742519337.0,
 'error': None,
 'incomplete_details': None,
 'instructions': None,
 'metadata': {},
 'model': 'gpt-4o-mini-2024-07-18',
 'object': 'response',
 'output': [{'id': 'ws_67dcbc2a09ec8191991c57849138d81d086920f67eda0861',
   'status': 'completed',
   'type': 'web_search_call'},
  {'id': 'msg_67dcbc2c6eac8191a9bd11350095c2b8086920f67eda0861',
   'content': [{'annotations': [{'end_index': 363,
       'start_index': 210,
       'title': 'Alabama State stuns Saint Francis for last-second First Four victory',
       'type': 'url_citation',
       'url': 'https://www.reuters.com/sports/basketball/alabama-state-stuns-saint-francis-last-second-first-four-victory-2025-03-19/?utm_source=openai'},
      {'end_index': 627,
       'start_index': 463,
       'title': 'No. 1 overall seed Auburn puts away Alabama State 83-63 to open March Madness',
       'type': 'url_citation',
       'url': 'https://www.clickorl
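Each `url_citation` annotation in the output above carries `start_index` and `end_index` offsets into the message text. Here is a minimal sketch for pulling the citations, along with the text span each one covers, out of the dict form. The helper name `extract_citations` and the returned keys are my own choices.

```python
def extract_citations(response_dict: dict) -> list:
    """Return one dict per url_citation annotation, with the text span it covers."""
    citations = []
    for item in response_dict.get("output", []):
        if item.get("type") != "message":
            continue  # e.g. the web_search_call item
        for content in item.get("content", []):
            if content.get("type") != "output_text":
                continue
            text = content["text"]
            for annotation in content.get("annotations", []):
                if annotation.get("type") == "url_citation":
                    citations.append({
                        "title": annotation["title"],
                        "url": annotation["url"],
                        # Slice of the message text that the citation refers to.
                        "span": text[annotation["start_index"]:annotation["end_index"]],
                    })
    return citations
```

With a live response this would be called as `extract_citations(response.to_dict())`.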

You can also force the use of the `web_search_preview` tool by setting the `tool_choice` parameter to `{"type": "web_search_preview"}`:


In [27]:
response = client.responses.create(
    model="gpt-4o-mini",
    tools=[{"type": "web_search_preview"}],
    tool_choice={"type": "web_search_preview"},
    input="Is there an upcoming federal election in Canada?"
)
show(render_md(response.output_text))


In [28]:
response.to_dict()

{'id': 'resp_67dcbc2fa508819182c54a57f44f00fb00684d0dd54f2bf5',
 'created_at': 1742519343.0,
 'error': None,
 'incomplete_details': None,
 'instructions': None,
 'metadata': {},
 'model': 'gpt-4o-mini-2024-07-18',
 'object': 'response',
 'output': [{'id': 'ws_67dcbc2fb01c8191aaa188f79e91d28100684d0dd54f2bf5',
   'status': 'completed',
   'type': 'web_search_call'},
  {'id': 'msg_67dcbc3211888191b2b8fc4c36c1899000684d0dd54f2bf5',
   'content': [{'annotations': [{'end_index': 313,
       'start_index': 209,
       'title': '45th Canadian federal election - Wikipedia',
       'type': 'url_citation',
       'url': 'https://en.m.wikipedia.org/wiki/45th_Canadian_federal_election?utm_source=openai'},
      {'end_index': 894,
       'start_index': 736,
       'title': 'Canada PM Carney to call for April 28 election on Sunday, Globe and Mail reports',
       'type': 'url_citation',
       'url': 'https://www.reuters.com/world/americas/canada-pm-carney-expected-call-snap-election-april-28-globe-

Note that for a web search we have:

- A `web_search_call` output item with the ID of the search call.

In [29]:
response.output[0]

ResponseFunctionWebSearch(id='ws_67dcbc2fb01c8191aaa188f79e91d28100684d0dd54f2bf5', status='completed', type='web_search_call')

- The annotations:




In [30]:
response.output[1].content[0].annotations

[AnnotationURLCitation(end_index=313, start_index=209, title='45th Canadian federal election - Wikipedia', type='url_citation', url='https://en.m.wikipedia.org/wiki/45th_Canadian_federal_election?utm_source=openai'),
 AnnotationURLCitation(end_index=894, start_index=736, title='Canada PM Carney to call for April 28 election on Sunday, Globe and Mail reports', type='url_citation', url='https://www.reuters.com/world/americas/canada-pm-carney-expected-call-snap-election-april-28-globe-mail-reports-2025-03-20/?utm_source=openai'),
 AnnotationURLCitation(end_index=1163, start_index=938, title='Canada PM Carney to call for April 28 election on Sunday, Globe and Mail reports', type='url_citation', url='https://www.reuters.com/world/americas/canada-pm-carney-expected-call-snap-election-april-28-globe-mail-reports-2025-03-20/?utm_source=openai'),
 AnnotationURLCitation(end_index=1298, start_index=1166, title='Canada election expected to be held on April 28', type='url_citation', url='https://ww

- The text content:

In [31]:
response.output[1].content[0].text







"Yes, there is an upcoming federal election in Canada. The 45th Canadian federal election is scheduled to take place on or before October 20, 2025, as per the fixed-date provisions of the Canada Elections Act. ([en.m.wikipedia.org](https://en.m.wikipedia.org/wiki/45th_Canadian_federal_election?utm_source=openai))\n\nHowever, recent reports indicate that Prime Minister Mark Carney plans to call for an early election on April 28, 2025. Carney, who became Prime Minister after Justin Trudeau's resignation, aims to secure a strong mandate amid ongoing trade tensions and sovereignty threats from U.S. President Donald Trump. The election would see Carney's Liberal Party facing off against the Conservative Party led by Pierre Poilievre. ([reuters.com](https://www.reuters.com/world/americas/canada-pm-carney-expected-call-snap-election-april-28-globe-mail-reports-2025-03-20/?utm_source=openai))\n\n\n## Canada's Upcoming Federal Election:\n- [Canada PM Carney to call for April 28 election on Sund

To refine search results based on geography, you can specify an approximate user location using country, city, region, and/or timezone.




In [32]:
response = client.responses.create(
    model="gpt-4o-mini",
    tools=[{
        "type": "web_search_preview",
        "user_location": {
            "type": "approximate",
            "country": "CA", #  two-letter ISO country code
            "city": "Halifax", # free text strings
            "region": "Nova Scotia", # free text strings
        }
    }],
    input="What are the best restaurants around Halifax?",
)

show(render_md(response.output_text))

The parameter `search_context_size` controls how much context is retrieved from the web to help the tool formulate a response. The tokens used by the search tool **do not** affect the context window of the main model.
Choosing `search_context_size` is a trade-off between cost, quality, and latency. The available values are `high`, `medium`, and `low`; the default is `medium`.
The [pricing page](https://platform.openai.com/docs/pricing) has all the details.
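Here is a small sketch of a builder for the tool configuration that combines `search_context_size` with the optional location fields used above. The helper `web_search_tool` and its validation are my own additions, not part of the SDK. The resulting dict is meant to be passed in the `tools` list.

```python
from typing import Optional

ALLOWED_CONTEXT_SIZES = ("low", "medium", "high")


def web_search_tool(
    search_context_size: str = "medium",
    country: Optional[str] = None,
    city: Optional[str] = None,
    region: Optional[str] = None,
) -> dict:
    """Build a web_search_preview tool dict with an optional approximate location."""
    if search_context_size not in ALLOWED_CONTEXT_SIZES:
        raise ValueError(f"search_context_size must be one of {ALLOWED_CONTEXT_SIZES}")
    tool = {"type": "web_search_preview", "search_context_size": search_context_size}
    # Only include the location fields that were actually provided.
    location = {
        key: value
        for key, value in (("country", country), ("city", city), ("region", region))
        if value is not None
    }
    if location:
        tool["user_location"] = {"type": "approximate", **location}
    return tool


print(web_search_tool("low", country="CA", city="Halifax", region="Nova Scotia"))
```

Usage would then be `client.responses.create(model="gpt-4o-mini", tools=[web_search_tool("low", country="CA")], input="...")`.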

## File Search