# Introduction

In a [previous blog post](https://drchrislevy.github.io/blog) I did a deep dive into agents, coding agents, ReAct prompting, and tool calling loops.
Recently OpenAI announced [new tools for building agents](https://openai.com/index/new-tools-for-building-agents/). 
OpenAI set the standard with their chat completions API and many developers have built on top of it. 
I don't know how these new APIs and SDKs are going to pan out in the long run, but I think it's definitely worth exploring them.

# Responses API

OpenAI intends to keep [supporting the chat completions API](https://platform.openai.com/docs/guides/responses-vs-chat-completions#the-chat-completions-api-is-not-going-away), but for new projects they recommend the new [Responses API](https://platform.openai.com/docs/api-reference/responses). One thing I **like** about the chat completions API is that many other LLM providers adopted the same API, so it is easy to switch between providers. For this reason alone, it probably makes sense to use the chat completions API for some projects, even new ones. I wonder if other providers will follow OpenAI's lead and adopt the Responses API?

Here is a screenshot from the OpenAI docs regarding the Responses API:

<img src="static_blog_imgs/why_responses_api.png" width="75%">

Some things I think are worth noting:

- The Responses API is **stateful**
    - Responses has `previous_response_id` to help you with long-running conversations.
- It's a completely different API from the chat completions API
- *If you don't need built-in tools for your application, you can confidently continue **using Chat Completions**.*
- *When you're ready for advanced capabilities designed specifically for agent workflows, we recommend the Responses API.*
- *The Responses API represents the future direction for building agents on OpenAI.*
- The *Assistants API* is being sunset in 2026. I've never used it personally.

## Quickstart

I'm not going to go over all the details of the Responses API, because that's what the docs are for.
But I'm going to cover some things that are **new** to me.


In [1]:
from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()

client = OpenAI()

response = client.responses.create(model="gpt-4o-mini", input="Tell a quick dad joke!")

print(response)

Response(id='resp_67dc628d293c8191bc5f6158a0774349022a9190ff1f6329', created_at=1742496397.0, error=None, incomplete_details=None, instructions=None, metadata={}, model='gpt-4o-mini-2024-07-18', object='response', output=[ResponseOutputMessage(id='msg_67dc628d77748191876ea996230aa8b8022a9190ff1f6329', content=[ResponseOutputText(annotations=[], text="Why can't you give Elsa a balloon?  \n\nBecause she will let it go!", type='output_text')], role='assistant', status='completed', type='message')], parallel_tool_calls=True, temperature=1.0, tool_choice='auto', tools=[], top_p=1.0, max_output_tokens=None, previous_response_id=None, reasoning=Reasoning(effort=None, generate_summary=None), status='completed', text=ResponseTextConfig(format=ResponseFormatText(type='text')), truncation='disabled', usage=ResponseUsage(input_tokens=31, input_tokens_details=InputTokensDetails(cached_tokens=0), output_tokens=17, output_tokens_details=OutputTokensDetails(reasoning_tokens=0), total_tokens=48), user=

In [2]:
response.to_dict()

{'id': 'resp_67dc628d293c8191bc5f6158a0774349022a9190ff1f6329',
 'created_at': 1742496397.0,
 'error': None,
 'incomplete_details': None,
 'instructions': None,
 'metadata': {},
 'model': 'gpt-4o-mini-2024-07-18',
 'object': 'response',
 'output': [{'id': 'msg_67dc628d77748191876ea996230aa8b8022a9190ff1f6329',
   'content': [{'annotations': [],
     'text': "Why can't you give Elsa a balloon?  \n\nBecause she will let it go!",
     'type': 'output_text'}],
   'role': 'assistant',
   'status': 'completed',
   'type': 'message'}],
 'parallel_tool_calls': True,
 'temperature': 1.0,
 'tool_choice': 'auto',
 'tools': [],
 'top_p': 1.0,
 'max_output_tokens': None,
 'previous_response_id': None,
 'reasoning': {'effort': None, 'generate_summary': None},
 'status': 'completed',
 'text': {'format': {'type': 'text'}},
 'truncation': 'disabled',
 'usage': {'input_tokens': 31,
  'input_tokens_details': {'cached_tokens': 0},
  'output_tokens': 17,
  'output_tokens_details': {'reasoning_tokens': 0},


In [3]:
print(response.output[0].content[0].text)

Why can't you give Elsa a balloon?  

Because she will let it go!


Or a little shortcut:

In [4]:
print(response.output_text)

Why can't you give Elsa a balloon?  

Because she will let it go!
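
As far as I can tell, `output_text` is a convenience that joins the text of every `output_text` part found in the message items of `output`. Here is a minimal sketch of that logic working on the plain dict from `response.to_dict()`. The helper name `collect_output_text` is my own and is not part of the SDK.

```python
def collect_output_text(response_dict):
    """Join the text of all output_text parts found in message output items."""
    texts = []
    for item in response_dict.get("output", []):
        if item.get("type") != "message":
            continue  # skip non-message items such as tool calls
        for part in item.get("content", []):
            if part.get("type") == "output_text":
                texts.append(part["text"])
    return "".join(texts)


# Trimmed copy of the response dict shown above
example = {
    "output": [
        {
            "type": "message",
            "role": "assistant",
            "content": [
                {
                    "type": "output_text",
                    "text": "Why can't you give Elsa a balloon?  \n\nBecause she will let it go!",
                    "annotations": [],
                }
            ],
        }
    ]
}

print(collect_output_text(example))
```

Walking the structure by hand like this is mostly useful when a response contains more than one output item and you want to see which item the text came from.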


- Response objects are saved for 30 days by default. You can disable this behavior by setting `store` to `false` when creating a Response.
- Stored responses can be viewed in the dashboard logs page or retrieved via the API.

I never inspected traces with chat completions because I don't think they are enabled by default.
With the Responses API, you can inspect traces for 30 days by default unless you disable storage on the API call.
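
If you would rather not have a particular request stored, a small wrapper like the one below is one way to do it. This is only a sketch: the function name `create_without_storage` is hypothetical, and it takes an already constructed `OpenAI()` client as its first argument.

```python
def create_without_storage(client, prompt, model="gpt-4o-mini"):
    """Create a response that opts out of the default 30-day storage."""
    # store=False applies to this single request only
    return client.responses.create(model=model, input=prompt, store=False)


# Usage, assuming `client = OpenAI()` as in the quickstart:
# response = create_without_storage(client, "Tell a quick dad joke!")
# print(response.output_text)
```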

<img src="static_blog_imgs/responses_logs.png" width="100%">

<img src="static_blog_imgs/responses_log_example.png" width="100%">

You can also retrieve a stored response via the API.

In [5]:
client.responses.retrieve(response.id)

Response(id='resp_67dc628d293c8191bc5f6158a0774349022a9190ff1f6329', created_at=1742496397.0, error=None, incomplete_details=None, instructions=None, metadata={}, model='gpt-4o-mini-2024-07-18', object='response', output=[ResponseOutputMessage(id='msg_67dc628d77748191876ea996230aa8b8022a9190ff1f6329', content=[ResponseOutputText(annotations=[], text="Why can't you give Elsa a balloon?  \n\nBecause she will let it go!", type='output_text')], role='assistant', status='completed', type='message')], parallel_tool_calls=True, temperature=1.0, tool_choice='auto', tools=[], top_p=1.0, max_output_tokens=None, previous_response_id=None, reasoning=Reasoning(effort=None, generate_summary=None), status='completed', text=ResponseTextConfig(format=ResponseFormatText(type='text')), truncation='disabled', usage=ResponseUsage(input_tokens=31, input_tokens_details=InputTokensDetails(cached_tokens=0), output_tokens=17, output_tokens_details=OutputTokensDetails(reasoning_tokens=0), total_tokens=48), user=

## Instruction Following

In [6]:
response = client.responses.create(model="gpt-4o-mini", instructions="You return markdown and lots of emojis. ", input="Tell a quick dad joke!")
print(response.output_text)

Why don't skeletons fight each other?  

Because they don't have the guts! 😂💀


*The instructions parameter gives the model high-level instructions on how it should behave while generating a response, including tone, goals, and examples of correct responses. Any instructions provided this way will take priority over a prompt in the input parameter.* [source](https://platform.openai.com/docs/guides/text#message-roles-and-instruction-following)

This example is roughly equivalent to:

In [7]:
response = client.responses.create(
    model="gpt-4o-mini",
    input=[{"role": "developer", "content": "You return markdown and lots of emojis. "}, {"role": "user", "content": "Tell a quick dad joke!"}],
)

print(response.output_text)

Why did the scarecrow win an award?  

Because he was outstanding in his field! 🌾🤣


The argument `instructions`: **Inserts a system (or developer) message as the first item in the model's context.** [source](https://platform.openai.com/docs/api-reference/responses/create#responses-create-instructions)






In [8]:
response = client.responses.create(
    model="gpt-4o-mini",
    input=[{"role": "developer", "content": "You return markdown and lots of emojis. "}, {"role": "user", "content": "Tell a quick dad joke!"}],
    instructions="You must talk like a pirate and do not return any markdown or emojis.",
)

print(response.output_text)

Why did the scarecrow win an award? Because he was outstanding in his field!


<img src="static_blog_imgs/instructions_insertion.png" width="100%">
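
My mental model of the last example, based on the quoted docs, is that the `instructions` text is placed ahead of everything passed in `input`. The sketch below only illustrates that ordering. It is not OpenAI's implementation, `build_context` is a hypothetical helper, and I picked the `developer` role for the inserted message since the docs say "system (or developer)".

```python
def build_context(input_items, instructions=None):
    """Return the message order the model is assumed to see."""
    context = []
    if instructions is not None:
        # instructions go first, ahead of any developer/user messages in `input`
        context.append({"role": "developer", "content": instructions})
    context.extend(input_items)
    return context


context = build_context(
    [
        {"role": "developer", "content": "You return markdown and lots of emojis. "},
        {"role": "user", "content": "Tell a quick dad joke!"},
    ],
    instructions="You must talk like a pirate and do not return any markdown or emojis.",
)

for message in context:
    print(message["role"], "->", message["content"])
```

With two conflicting instruction messages in the context, the response above came back without emojis, which is consistent with the `instructions` text taking priority.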

## Conversation State

We can manually handle the chat history using alternating `user` and `assistant` messages, just as previously done with the chat completions API.



In [9]:
response = client.responses.create(
    model="gpt-4o-mini",
    input=[
        {"role": "developer", "content": "You are a helpful assistant."},
        {"role": "user", "content": "My name is Chris, and my age is 40."},
        {"role": "assistant", "content": "Nice to meet you, Chris!"},
        {"role": "user", "content": "How old am I?"},
    ],
)

print(response.output_text)

You mentioned that you are 40 years old.
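
When managing the history by hand, I find it easier to wrap the bookkeeping in a small helper. A minimal sketch, where `chat_turn` is a hypothetical name and `client` is the `OpenAI()` client from the quickstart:

```python
def chat_turn(client, history, user_text, model="gpt-4o-mini"):
    """Send one user turn and record both sides of the exchange in `history`."""
    history.append({"role": "user", "content": user_text})
    response = client.responses.create(model=model, input=history)
    # Store the reply so the next request carries the full conversation
    history.append({"role": "assistant", "content": response.output_text})
    return response.output_text


# Usage, assuming `client = OpenAI()`:
# history = [{"role": "developer", "content": "You are a helpful assistant."}]
# chat_turn(client, history, "My name is Chris, and my age is 40.")
# print(chat_turn(client, history, "How old am I?"))
```

The whole `history` list is sent on every call, so the request grows with each turn.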


Alternatively, we can use the `previous_response_id` parameter to manage conversation state.

In [10]:
response = client.responses.create(
    model="gpt-4o-mini",
    input=[
        {"role": "developer", "content": "You are a helpful assistant."},
        {"role": "user", "content": "My name is Chris, and my age is 40."},
    ],
)

print(response.output_text)
print(response.id)

Nice to meet you, Chris! If there's anything specific you'd like to talk about or ask, feel free to let me know!
resp_67dc6293a3fc8191beb395d9740cfe69018771069fff6258


In [11]:
response = client.responses.create(
    model="gpt-4o-mini",
    input=[
        {"role": "user", "content": "How old am I"},
    ],
    previous_response_id=response.id,
)
print(response.output_text)
print(response.id)

You mentioned that you are 40 years old.
resp_67dc6294b01881918ed7ee04ab8408b8018771069fff6258


In [12]:
response = client.responses.create(
    model="gpt-4o-mini",
    input=[
        {"role": "user", "content": "And what was my name?"},
    ],
    previous_response_id=response.id,
)
print(response.output_text)
print(response.id)

Your name is Chris.
resp_67dc62957bac8191a758ec17f3409ef8018771069fff6258


**Even when using previous_response_id, all previous input tokens for responses in the chain are billed as input tokens in the API.** [source](https://platform.openai.com/docs/guides/conversation-state#openai-apis-for-conversation-state)
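
To get a feel for what that means for cost, here is a rough back-of-the-envelope sketch. It assumes every earlier input and output in the chain is counted again as input on the next request, and it ignores caching discounts and any per-message overhead. The token counts are made up for illustration.

```python
def billed_input_tokens(turns):
    """Estimate billed input tokens per turn for a chained conversation.

    turns: list of (new_input_tokens, output_tokens), oldest turn first.
    """
    billed = []
    carried = 0  # tokens from earlier turns that are counted again as input
    for new_input, output in turns:
        billed.append(carried + new_input)
        carried += new_input + output
    return billed


# Three hypothetical turns: (new input tokens, output tokens)
print(billed_input_tokens([(30, 25), (12, 10), (14, 6)]))
# [30, 67, 91]
```

So `previous_response_id` saves you from re-sending the history yourself, but under this model the billed input still grows with every turn.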

When you view the logs in the dashboard, for this message there is a link/button to find the previous response.
<img src="static_blog_imgs/prev_response_log.png" width="100%">


**When using along with previous_response_id, the instructions from a previous response will not be carried over to the next response. This makes it simple to swap out system (or developer) messages in new responses.** [source](https://platform.openai.com/docs/api-reference/responses/create#responses-create-instructions)

**Note that the instructions parameter only applies to the current response generation request. If you are managing conversation state with the `previous_response_id` parameter, the instructions used on previous turns will not be present in the context. If you'd like to persist the same model instructions across turns, use a developer message instead.** [source](https://platform.openai.com/docs/guides/text#message-roles-and-instruction-following)

In [13]:
response = client.responses.create(
    model="gpt-4o-mini",
    input=[
        {"role": "user", "content": "What country is the city Toronto in?"},
    ],
    instructions='You only write lower case letters',
)
print(response.output_text)
response.id

toronto is in canada.


'resp_67dc629656bc81918e3e3f328c351ce1016a810fe9349c9f'

In [14]:
response = client.responses.create(
    model="gpt-4o-mini",
    input=[
        {"role": "user", "content": "What country was it again?"},
    ],
    previous_response_id=response.id
)
print(response.output_text)

Toronto is in Canada.


In [15]:
response = client.responses.create(
    model="gpt-4o-mini",
    input=[
        {"role": "user", "content": "I forget, what was it?"},
    ],
    instructions='You only write UPPER CASE letters',
    previous_response_id=response.id
)
print(response.output_text)

TORONTO IS IN CANADA.
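
Since `instructions` only apply to the request they are sent with, one option besides a developer message is to pass the same `instructions` on every call. A minimal sketch, with `ask` as a hypothetical helper name and `client` being the `OpenAI()` client from earlier:

```python
def ask(client, text, instructions, previous_response_id=None, model="gpt-4o-mini"):
    """Send one turn, re-sending `instructions` so they apply to this request too."""
    return client.responses.create(
        model=model,
        input=[{"role": "user", "content": text}],
        instructions=instructions,
        previous_response_id=previous_response_id,
    )


# Usage, assuming `client = OpenAI()`:
# style = "You only write lower case letters"
# first = ask(client, "What country is the city Toronto in?", style)
# second = ask(client, "What country was it again?", style, previous_response_id=first.id)
# print(second.output_text)
```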


Of course there are all the other OpenAI LLM goodies such as function calling, structured outputs, streaming, analyzing images, etc. I assume there
are slight changes to the API calls, but I am already familiar with these things from the chat completions API, so there is no point in going over them here.

## Built-in tools

Something new here to me is the ability to use built-in tools with the Responses API.
As of writing, the built-in tools include things like web search, file search, computer use, and function calling.
I'm already familiar with tool/function calling. But let's take a look at some of these other tools.







In [31]:
# Importing these to make the output look nicer
from monsterui.all import render_md
from fasthtml.common import show

### Web Search

In [35]:
response = client.responses.create(
    model="gpt-4o-mini",
    tools=[{"type": "web_search_preview"}],  # web_search_preview_2025_03_11 points to a dated version of the tool
    input="Did Alabama State win their first game of the March Madness tournament in 2025?",
)

show(render_md(response.output_text))

In [36]:
response.to_dict()

{'id': 'resp_67dc694b6c88819193f172c1c74b47670cefcc8e86f5a6fd',
 'created_at': 1742498123.0,
 'error': None,
 'incomplete_details': None,
 'instructions': None,
 'metadata': {},
 'model': 'gpt-4o-mini-2024-07-18',
 'object': 'response',
 'output': [{'id': 'ws_67dc694bbc18819196799890345f1d740cefcc8e86f5a6fd',
   'status': 'completed',
   'type': 'web_search_call'},
  {'id': 'msg_67dc694e1ca8819193ae55dda79393f90cefcc8e86f5a6fd',
   'content': [{'annotations': [{'end_index': 636,
       'start_index': 483,
       'title': 'Alabama State stuns Saint Francis for last-second First Four victory',
       'type': 'url_citation',
       'url': 'https://www.reuters.com/sports/basketball/alabama-state-stuns-saint-francis-last-second-first-four-victory-2025-03-19/?utm_source=openai'},
      {'end_index': 898,
       'start_index': 690,
       'title': 'Alabama State stuns Saint Francis for last-second First Four victory',
       'type': 'url_citation',
       'url': 'https://www.reuters.com/sport

You can also force the use of the `web_search_preview` tool by setting the `tool_choice` parameter to `{"type": "web_search_preview"}`:


In [45]:
response = client.responses.create(
    model="gpt-4o-mini",
    tools=[{"type": "web_search_preview"}],
    tool_choice={"type": "web_search_preview"},
    input="Is there an upcoming federal election in Canada?"
)
show(render_md(response.output_text))


In [72]:
response.to_dict()

{'id': 'resp_67dc6f81973c81919e1903d5bb441b030a76b63a8b73a92a',
 'created_at': 1742499713.0,
 'error': None,
 'incomplete_details': None,
 'instructions': None,
 'metadata': {},
 'model': 'gpt-4o-mini-2024-07-18',
 'object': 'response',
 'output': [{'id': 'ws_67dc6f81a0a48191b27193dbd6b309570a76b63a8b73a92a',
   'status': 'completed',
   'type': 'web_search_call'},
  {'id': 'msg_67dc6f8469b481919426b0050aeafd450a76b63a8b73a92a',
   'content': [{'annotations': [{'end_index': 313,
       'start_index': 209,
       'title': '45th Canadian federal election - Wikipedia',
       'type': 'url_citation',
       'url': 'https://en.m.wikipedia.org/wiki/45th_Canadian_federal_election?utm_source=openai'},
      {'end_index': 852,
       'start_index': 694,
       'title': 'Canada PM Carney to call for April 28 election on Sunday, Globe and Mail reports',
       'type': 'url_citation',
       'url': 'https://www.reuters.com/world/americas/canada-pm-carney-expected-call-snap-election-april-28-globe-

Note that for a web search we have:

- A `web_search_call` output item with the ID of the search call.

In [85]:
response.output[0]

ResponseFunctionWebSearch(id='ws_67dc6f81a0a48191b27193dbd6b309570a76b63a8b73a92a', status='completed', type='web_search_call')

- We have the annotations:




In [88]:
response.output[1].content[0].annotations

[AnnotationURLCitation(end_index=313, start_index=209, title='45th Canadian federal election - Wikipedia', type='url_citation', url='https://en.m.wikipedia.org/wiki/45th_Canadian_federal_election?utm_source=openai'),
 AnnotationURLCitation(end_index=852, start_index=694, title='Canada PM Carney to call for April 28 election on Sunday, Globe and Mail reports', type='url_citation', url='https://www.reuters.com/world/americas/canada-pm-carney-expected-call-snap-election-april-28-globe-mail-reports-2025-03-20/?utm_source=openai'),
 AnnotationURLCitation(end_index=1134, start_index=909, title='Canada PM Carney to call for April 28 election on Sunday, Globe and Mail reports', type='url_citation', url='https://www.reuters.com/world/americas/canada-pm-carney-expected-call-snap-election-april-28-globe-mail-reports-2025-03-20/?utm_source=openai'),
 AnnotationURLCitation(end_index=1269, start_index=1137, title='Canada election expected to be held on April 28', type='url_citation', url='https://ww

- And we have the text response content:

In [89]:
response.output[1].content[0].text







"Yes, there is an upcoming federal election in Canada. The 45th Canadian federal election is scheduled to take place on or before October 20, 2025, as per the fixed-date provisions of the Canada Elections Act. ([en.m.wikipedia.org](https://en.m.wikipedia.org/wiki/45th_Canadian_federal_election?utm_source=openai))\n\nHowever, recent developments suggest that an earlier election may be called. Prime Minister Mark Carney, who assumed office on March 9, 2025, after Justin Trudeau's resignation, is expected to announce a snap election for April 28, 2025. This decision aims to capitalize on the Liberal Party's improved standing in the polls, following tensions with U.S. President Donald Trump. ([reuters.com](https://www.reuters.com/world/americas/canada-pm-carney-expected-call-snap-election-april-28-globe-mail-reports-2025-03-20/?utm_source=openai))\n\n\n## Canada's Upcoming Federal Election Developments:\n- [Canada PM Carney to call for April 28 election on Sunday, Globe and Mail reports](h

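In the output above, the first annotation's `start_index` and `end_index` (209 and 313) appear to cover the inline markdown citation link itself, with `end_index` being exclusive. Under that assumption, here is a small sketch that pairs each citation with the text it covers and strips the citations out for a cleaner display. Both helper names are hypothetical and the sample text is made up.

```python
def cited_spans(text, annotations):
    """Pair each url_citation with the slice of text it covers."""
    spans = []
    for ann in annotations:
        if ann.get("type") != "url_citation":
            continue
        spans.append({
            "url": ann["url"],
            "title": ann["title"],
            "cited_text": text[ann["start_index"]:ann["end_index"]],
        })
    return spans


def strip_citations(text, annotations):
    """Remove the cited spans, working right to left so indices stay valid."""
    citations = [a for a in annotations if a.get("type") == "url_citation"]
    for ann in sorted(citations, key=lambda a: a["start_index"], reverse=True):
        text = text[:ann["start_index"]] + text[ann["end_index"]:]
    return text.strip()


# Made-up sample in the same shape as the real output
sample_text = "Toronto is in Canada. ([example.com](https://example.com/toronto))"
sample_annotations = [{
    "type": "url_citation",
    "title": "Example page",
    "url": "https://example.com/toronto",
    "start_index": sample_text.index("(["),
    "end_index": len(sample_text),
}]

print(cited_spans(sample_text, sample_annotations)[0]["cited_text"])
print(strip_citations(sample_text, sample_annotations))
```

With a real response you would pass `response.output[1].content[0].text` and the annotations converted to dicts, for example from `response.to_dict()`.
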
To refine search results based on geography, you can specify an approximate user location using `country`, `city`, `region`, and/or `timezone`.




In [92]:
response = client.responses.create(
    model="gpt-4o-mini",
    tools=[{
        "type": "web_search_preview",
        "user_location": {
            "type": "approximate",
            "country": "CA",  # two-letter ISO country code
            "city": "Halifax",  # free-text string
            "region": "Nova Scotia",  # free-text string
        },
    }],
    input="What are the best restaurants around Halifax?",
)

show(render_md(response.output_text))

The `search_context_size` parameter controls how much context is retrieved from the web to help the tool formulate a response. The tokens used by the search tool **do not** affect the context window of the main model.
Choosing a value is a trade-off between cost, quality, and latency. The available values are `'high'`, `'medium'`, and `'low'`, and the default is `'medium'`.
The [pricing page](https://platform.openai.com/docs/pricing) has all the details.
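
To keep the tool configuration in one place, something like the following sketch could build the `web_search_preview` entry with a validated `search_context_size`. The helper `web_search_tool` is hypothetical. It only assembles the dict that goes into `tools`.

```python
VALID_CONTEXT_SIZES = ("low", "medium", "high")


def web_search_tool(search_context_size="medium", user_location=None):
    """Build a web_search_preview tool entry for the `tools` list."""
    if search_context_size not in VALID_CONTEXT_SIZES:
        raise ValueError(
            f"search_context_size must be one of {VALID_CONTEXT_SIZES}, "
            f"got {search_context_size!r}"
        )
    tool = {"type": "web_search_preview", "search_context_size": search_context_size}
    if user_location is not None:
        # user_location holds optional keys such as country, city, region, timezone
        tool["user_location"] = {"type": "approximate", **user_location}
    return tool


print(web_search_tool("low", {"country": "CA", "city": "Halifax"}))

# Usage, assuming `client = OpenAI()`:
# response = client.responses.create(
#     model="gpt-4o-mini",
#     tools=[web_search_tool("low")],
#     input="What are the best restaurants around Halifax?",
# )
```
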

## File Search