# Introduction

<img src="static_blog_imgs/agents_cogs_cartoon.png" width="40%">


In a [previous blog post](https://drchrislevy.github.io/blog), I explored various aspects of building AI agents, including coding agents, ReAct prompting, and tool-calling loops. Recently, OpenAI announced [new tools for building agents](https://openai.com/index/new-tools-for-building-agents/), introducing the Responses API and Agents SDK. OpenAI set the standard with their Chat Completions API, which has been widely adopted and extended by developers.

However, it's important to remain cautious. Remember what happened to the Assistants API? It never fully emerged from beta and is scheduled to be sunset in 2026. In contrast, these new APIs are not marked as beta, indicating that OpenAI feels more confident about this direction. Given their prior success with the Chat Completions API, there's reason to be optimistic that they might achieve similar success again. OpenAI has likely incorporated key learnings from their experience with the Assistants API into these latest developments, making these new tools intriguing and well worth exploring further.

# Responses API

OpenAI plans to continue supporting the [Chat Completions API](https://platform.openai.com/docs/guides/responses-vs-chat-completions#the-chat-completions-api-is-not-going-away). However, for new projects, they recommend using the newly introduced [Responses API](https://platform.openai.com/docs/api-reference/responses). 

One advantage of the Chat Completions API I've appreciated is its wide adoption by other LLM providers, making it easy to switch between services. Because of this flexibility, it may still be practical to use Chat Completions for some new projects. It remains to be seen whether other providers will adopt the Responses API as well.

Here's a screenshot from the OpenAI documentation explaining the Responses API:

<img src="static_blog_imgs/why_responses_api.png" width="75%">

### Key points to note about the Responses API:

- **Stateful**: It includes a `previous_response_id` to support long-running conversations.
- It is distinctly different from the Chat Completions API.
- If your application doesn't require built-in tools, you can confidently continue using **Chat Completions**.
- When ready for advanced capabilities tailored to agent workflows, the Responses API is recommended.
- The Responses API represents OpenAI's future direction for agent-building.


## Quickstart

I'm not going to go over all the details of the Responses API, because that's what the docs are for.
But I'm going to cover some things that are **new** to me.


In [172]:
from dotenv import load_dotenv
from openai import OpenAI

load_dotenv(dotenv_path='../../.env')

client = OpenAI()

response = client.responses.create(model="gpt-4o-mini", input="Tell a quick dad joke!")

print(response)

Response(id='resp_67f17dae5a248191ad48845675e5f0270ec9ac340ff2e128', created_at=1743879598.0, error=None, incomplete_details=None, instructions=None, metadata={}, model='gpt-4o-mini-2024-07-18', object='response', output=[ResponseOutputMessage(id='msg_67f17daf01d081919d4da886e8bbd6200ec9ac340ff2e128', content=[ResponseOutputText(annotations=[], text='Why did the scarecrow win an award? \n\nBecause he was outstanding in his field!', type='output_text')], role='assistant', status='completed', type='message')], parallel_tool_calls=True, temperature=1.0, tool_choice='auto', tools=[], top_p=1.0, max_output_tokens=None, previous_response_id=None, reasoning=Reasoning(effort=None, generate_summary=None), status='completed', text=ResponseTextConfig(format=ResponseFormatText(type='text')), truncation='disabled', usage=ResponseUsage(input_tokens=13, input_tokens_details=InputTokensDetails(cached_tokens=0), output_tokens=19, output_tokens_details=OutputTokensDetails(reasoning_tokens=0), total_toke

In [2]:
response.to_dict()

{'id': 'resp_67edbed0ea6081919ac9fbae7f5d43840bb7a39cd0aed44a',
 'created_at': 1743634128.0,
 'error': None,
 'incomplete_details': None,
 'instructions': None,
 'metadata': {},
 'model': 'gpt-4o-mini-2024-07-18',
 'object': 'response',
 'output': [{'id': 'msg_67edbed1635c8191b0520cf21f7902180bb7a39cd0aed44a',
   'content': [{'annotations': [],
     'text': 'Why did the scarecrow win an award? \n\nBecause he was outstanding in his field!',
     'type': 'output_text'}],
   'role': 'assistant',
   'status': 'completed',
   'type': 'message'}],
 'parallel_tool_calls': True,
 'temperature': 1.0,
 'tool_choice': 'auto',
 'tools': [],
 'top_p': 1.0,
 'max_output_tokens': None,
 'previous_response_id': None,
 'reasoning': {'effort': None, 'generate_summary': None},
 'status': 'completed',
 'text': {'format': {'type': 'text'}},
 'truncation': 'disabled',
 'usage': {'input_tokens': 13,
  'input_tokens_details': {'cached_tokens': 0},
  'output_tokens': 19,
  'output_tokens_details': {'reasoning_

In [3]:
print(response.output[0].content[0].text)

Why did the scarecrow win an award? 

Because he was outstanding in his field!


Or a little shortcut:

In [4]:
print(response.output_text)

Why did the scarecrow win an award? 

Because he was outstanding in his field!
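As far as I can tell, the shortcut just collects every `output_text` part from the message items in `output`. Here is a minimal sketch of that idea working on the `to_dict()` structure shown above. The helper name `extract_output_text` is mine, not part of the SDK, and the real property may differ in details.

```python
def extract_output_text(response_dict: dict) -> str:
    """Concatenate every `output_text` part found in message items."""
    parts = []
    for item in response_dict.get("output", []):
        if item.get("type") != "message":
            continue  # skip non-message items such as tool calls
        for content in item.get("content", []):
            if content.get("type") == "output_text":
                parts.append(content.get("text", ""))
    return "".join(parts)


# Trimmed-down copy of the response dictionary shown earlier
example = {
    "output": [
        {
            "type": "message",
            "role": "assistant",
            "content": [
                {
                    "type": "output_text",
                    "text": "Why did the scarecrow win an award? \n\nBecause he was outstanding in his field!",
                    "annotations": [],
                }
            ],
        }
    ]
}

print(extract_output_text(example))
```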


- Response objects are saved for 30 days by default. You can disable this behavior by setting `store` to `false` when creating a Response.
- Stored responses can be viewed on the dashboard logs page or retrieved via the API.

I never inspected traces with Chat Completions because I don't think they are stored by default.
With the Responses API, traces are available for 30 days by default unless you disable storage on the API call.
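If you don't want a response kept around, a small wrapper like the one below would do it. This is only a sketch: `create_unstored_response` is a hypothetical helper name, and `client` is assumed to be an `OpenAI()` client like the one created earlier.

```python
def create_unstored_response(client, prompt: str, model: str = "gpt-4o-mini"):
    """Create a response that opts out of the default 30-day storage.

    `client` is assumed to be an `openai.OpenAI()` instance.
    """
    # store=False asks the API not to retain this response
    return client.responses.create(model=model, input=prompt, store=False)


# Usage (needs a real client and API key):
# response = create_unstored_response(client, "Tell a quick dad joke!")
```

I would expect a response created this way not to show up in the logs page or be retrievable later, since it was never stored.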

<img src="static_blog_imgs/responses_logs.png" width="100%">

<img src="static_blog_imgs/responses_log_example.png" width="100%">

Or retrieve traces via the API.

In [5]:
client.responses.retrieve(response.id)

Response(id='resp_67edbed0ea6081919ac9fbae7f5d43840bb7a39cd0aed44a', created_at=1743634128.0, error=None, incomplete_details=None, instructions=None, metadata={}, model='gpt-4o-mini-2024-07-18', object='response', output=[ResponseOutputMessage(id='msg_67edbed1635c8191b0520cf21f7902180bb7a39cd0aed44a', content=[ResponseOutputText(annotations=[], text='Why did the scarecrow win an award? \n\nBecause he was outstanding in his field!', type='output_text')], role='assistant', status='completed', type='message')], parallel_tool_calls=True, temperature=1.0, tool_choice='auto', tools=[], top_p=1.0, max_output_tokens=None, previous_response_id=None, reasoning=Reasoning(effort=None, generate_summary=None), status='completed', text=ResponseTextConfig(format=ResponseFormatText(type='text')), truncation='disabled', usage=ResponseUsage(input_tokens=13, input_tokens_details=InputTokensDetails(cached_tokens=0), output_tokens=19, output_tokens_details=OutputTokensDetails(reasoning_tokens=0), total_toke

## Instruction Following

In [6]:
response = client.responses.create(model="gpt-4o-mini", instructions="You return markdown and lots of emojis. ", input="Tell a quick dad joke!")
print(response.output_text)

Why did the scarecrow win an award? 🌾🏆

Because he was outstanding in his field! 😂


*The instructions parameter gives the model high-level instructions on how it should behave while generating a response, including tone, goals, and examples of correct responses. Any instructions provided this way will take priority over a prompt in the input parameter.* [source](https://platform.openai.com/docs/guides/text#message-roles-and-instruction-following)

This example is roughly equivalent to:

In [7]:
response = client.responses.create(
    model="gpt-4o-mini",
    input=[{"role": "developer", "content": "You return markdown and lots of emojis. "}, {"role": "user", "content": "Tell a quick dad joke!"}],
)

print(response.output_text)

Why did the scarecrow win an award?  

Because he was outstanding in his field! 🌾😂


The argument `instructions` is used to insert a system (or developer) message as the first item in the model's context [source](https://platform.openai.com/docs/api-reference/responses/create#responses-create-instructions).
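A rough mental model of that insertion, written as plain Python. This is my own sketch of the behaviour described in the docs, not OpenAI's implementation. The function name `build_context` is hypothetical, and I'm assuming the inserted message uses the `developer` role.

```python
def build_context(input_items, instructions=None):
    """Return the message list the model would see, instructions first."""
    items = list(input_items)  # copy so the caller's list is untouched
    if instructions is not None:
        # Assumption: instructions become a developer message at position 0
        items.insert(0, {"role": "developer", "content": instructions})
    return items


context = build_context(
    [
        {"role": "developer", "content": "You return markdown and lots of emojis. "},
        {"role": "user", "content": "Tell a quick dad joke!"},
    ],
    instructions="You must talk like a pirate and do not return any markdown or emojis.",
)

for message in context:
    print(message["role"], "->", message["content"])
```

With this picture in mind, passing both a developer message in `input` and an `instructions` argument means the model sees two developer-style messages, with the `instructions` one first.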






In [8]:
response = client.responses.create(
    model="gpt-4o-mini",
    input=[{"role": "developer", "content": "You return markdown and lots of emojis. "}, {"role": "user", "content": "Tell a quick dad joke!"}],
    instructions="You must talk like a pirate and do not return any markdown or emojis.",
)

print(response.output_text)

Why did the pirate go to school? To improve his "arrrticulation!"


<img src="static_blog_imgs/instructions_insertion.png" width="100%">

## Conversation State

We can manually handle the chat history using alternating `user` and `assistant` messages, just as we did with the Chat Completions API.



In [9]:
response = client.responses.create(
    model="gpt-4o-mini",
    input=[
        {"role": "developer", "content": "You are a helpful assistant."},
        {"role": "user", "content": "My name is Chris, and my age is 40."},
        {"role": "assistant", "content": "Nice to meet you, Chris!"},
        {"role": "user", "content": "How old am I?"},
    ],
)

print(response.output_text)

You're 40 years old.


Alternatively, we can use the `previous_response_id` parameter to manage conversation state.

In [10]:
response = client.responses.create(
    model="gpt-4o-mini",
    input=[
        {"role": "developer", "content": "You are a helpful assistant."},
        {"role": "user", "content": "My name is Chris, and my age is 40."},
    ],
)

print(response.output_text)
print(response.id)

Nice to meet you, Chris! How can I assist you today?
resp_67edbef36968819181d0bb3ca233c9b2018e3a257691136c


In [11]:
response = client.responses.create(
    model="gpt-4o-mini",
    input=[
        {"role": "user", "content": "How old am I"},
    ],
    previous_response_id=response.id,
)
print(response.output_text)
print(response.id)

You mentioned that you are 40 years old.
resp_67edbef6e18881919dcb074c38a4e75e018e3a257691136c


In [12]:
response = client.responses.create(
    model="gpt-4o-mini",
    input=[
        {"role": "user", "content": "And what was my name?"},
    ],
    previous_response_id=response.id,
)
print(response.output_text)
print(response.id)

Your name is Chris.
resp_67edbef7fa90819186efba3354f2a0bd018e3a257691136c


When using `previous_response_id`, all previous input tokens for responses in the chain are billed as input tokens in the API [source](https://platform.openai.com/docs/guides/conversation-state#openai-apis-for-conversation-state).
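To see why this matters for cost, here is a back-of-the-envelope sketch. It assumes every request re-reads the whole chain so far, including earlier outputs, and it ignores any caching discounts. The token counts are made up for illustration, and `billed_input_tokens` is a hypothetical helper.

```python
def billed_input_tokens(turns):
    """Estimate billed input tokens per request in a response chain.

    turns: list of (new_input_tokens, output_tokens), one pair per request.
    Simplifying assumption: each request is billed for all earlier inputs
    and outputs plus its own new input.
    """
    billed = []
    history = 0  # tokens accumulated from earlier turns
    for new_input, output in turns:
        billed.append(history + new_input)
        history += new_input + output
    return billed


# Made-up token counts for a three-turn chain
turns = [(25, 15), (10, 12), (12, 8)]
print(billed_input_tokens(turns))  # [25, 50, 74]
```

So although the later requests only send a short new message, the billed input grows with every turn, just as it would if you resent the full history yourself.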

When you view the logs in the dashboard for a message that used `previous_response_id`, there is a link/button to find the previous response.

<img src="static_blog_imgs/prev_response_log.png" width="100%">


When using `previous_response_id`, the **instructions** from a previous response will not be carried over to the next response. This makes it simple to swap out system (or developer) messages in new responses [source](https://platform.openai.com/docs/api-reference/responses/create#responses-create-instructions). The `instructions` parameter only applies to the current response generation request.

In [13]:
response = client.responses.create(
    model="gpt-4o-mini",
    input=[
        {"role": "user", "content": "What country is the city Toronto in?"},
    ],
    instructions="You only write lower case letters",
)
print(response.output_text)
response.id

toronto is in canada.


'resp_67edbefcba848191bbf8762e794e3144063735a4e8fd49c8'

In [14]:
response = client.responses.create(
    model="gpt-4o-mini",
    input=[
        {"role": "user", "content": "What country was it again?"},
    ],
    previous_response_id=response.id,
)
print(response.output_text)

Toronto is in Canada.


In [15]:
response = client.responses.create(
    model="gpt-4o-mini",
    input=[
        {"role": "user", "content": "I forget, what was it?"},
    ],
    instructions="You only write UPPER CASE letters",
    previous_response_id=response.id,
)
print(response.output_text)

TORONTO IS IN CANADA.
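If you do want the same instructions on every turn, you have to pass them every time. A small wrapper could handle that along with threading `previous_response_id`. This is a sketch only: the `Conversation` class is hypothetical, and `client` is assumed to be the `OpenAI()` client from earlier.

```python
class Conversation:
    """Threads `previous_response_id` and re-sends instructions each turn."""

    def __init__(self, client, model="gpt-4o-mini", instructions=None):
        self.client = client
        self.model = model
        self.instructions = instructions
        self.last_response_id = None  # no chain yet

    def send(self, text):
        kwargs = {
            "model": self.model,
            "input": [{"role": "user", "content": text}],
        }
        if self.instructions is not None:
            # Instructions are not carried over, so include them every time
            kwargs["instructions"] = self.instructions
        if self.last_response_id is not None:
            kwargs["previous_response_id"] = self.last_response_id
        response = self.client.responses.create(**kwargs)
        self.last_response_id = response.id
        return response.output_text


# Usage (needs a real client and API key):
# chat = Conversation(client, instructions="You only write lower case letters")
# print(chat.send("What country is the city Toronto in?"))
# print(chat.send("What country was it again?"))
```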


Of course there are all the other OpenAI LLM goodies such as function calling, structured outputs, streaming, analyzing images, and so on.
They even released a bunch of new audio features (see [here](https://openai.com/index/introducing-our-next-generation-audio-models/) and [here](https://platform.openai.com/docs/guides/audio)).
I'm just going to cover some things that are new to me, as a way to get familiar with some of the new features.

## Built-in tools

Something new here to me is the ability to use built-in tools with the Responses API.
As of writing, the built-in tools include things like web search, file search, computer use, and function calling.
I'm already familiar with tool/function calling. But let's take a look at some of these other tools.







In [16]:
# Importing these to make the output look nicer
from fasthtml.common import show
from monsterui.all import render_md

### Web Search

In [17]:
response = client.responses.create(
    model="gpt-4o-mini",
    tools=[{"type": "web_search_preview"}],  # web_search_preview_2025_03_11 points to a dated version of the tool
    input="Did Alabama State win their first game of the March Madness tournament in 2025?",
)

show(render_md(response.output_text))

In [18]:
response.to_dict()

{'id': 'resp_67edbf09d9588191a91f13d6104c5a320d2bf909fd3c9e1b',
 'created_at': 1743634185.0,
 'error': None,
 'incomplete_details': None,
 'instructions': None,
 'metadata': {},
 'model': 'gpt-4o-mini-2024-07-18',
 'object': 'response',
 'output': [{'id': 'ws_67edbf0a2498819184d034b0930e37460d2bf909fd3c9e1b',
   'status': 'completed',
   'type': 'web_search_call'},
  {'id': 'msg_67edbf0bf4b4819196f84b35112a4a0d0d2bf909fd3c9e1b',
   'content': [{'annotations': [{'end_index': 397,
       'start_index': 244,
       'title': 'Alabama State stuns Saint Francis for last-second First Four victory',
       'type': 'url_citation',
       'url': 'https://www.reuters.com/sports/basketball/alabama-state-stuns-saint-francis-last-second-first-four-victory-2025-03-19/?utm_source=openai'},
      {'end_index': 776,
       'start_index': 626,
       'title': 'No. 1 overall seed Auburn puts away Alabama State 83-63 to open March Madness - WTOP News',
       'type': 'url_citation',
       'url': 'https://

You can also force the use of the `web_search_preview` tool by setting the `tool_choice` parameter to `{"type": "web_search_preview"}`:


In [19]:
response = client.responses.create(
    model="gpt-4o-mini",
    tools=[{"type": "web_search_preview"}],
    tool_choice={"type": "web_search_preview"},
    input="Is there an upcoming federal election in Canada?",
)
show(render_md(response.output_text))

In [20]:
response.to_dict()

{'id': 'resp_67edbf0d8fe481919fdf71321c16b9ea001b9061119ef2f9',
 'created_at': 1743634189.0,
 'error': None,
 'incomplete_details': None,
 'instructions': None,
 'metadata': {},
 'model': 'gpt-4o-mini-2024-07-18',
 'object': 'response',
 'output': [{'id': 'ws_67edbf0da74081918431bf3330c3e50a001b9061119ef2f9',
   'status': 'completed',
   'type': 'web_search_call'},
  {'id': 'msg_67edbf0fe9788191b1eaeb310e55c7ce001b9061119ef2f9',
   'content': [{'annotations': [{'end_index': 407,
       'start_index': 303,
       'title': '45th Canadian federal election - Wikipedia',
       'type': 'url_citation',
       'url': 'https://en.m.wikipedia.org/wiki/45th_Canadian_federal_election?utm_source=openai'},
      {'end_index': 1079,
       'start_index': 909,
       'title': 'More Canadians want next federal election in 2024 than 2025: Nanos | CTV News',
       'type': 'url_citation',
       'url': 'https://www.ctvnews.ca/politics/nearly-1-in-2-canadians-would-prefer-the-next-federal-election-happen

Note that for a web search we have:

- A `web_search_call` output item with the ID of the search call.

In [21]:
response.output[0]

ResponseFunctionWebSearch(id='ws_67edbf0da74081918431bf3330c3e50a001b9061119ef2f9', status='completed', type='web_search_call')

- the annotations:




In [22]:
response.output[1].content[0].annotations

[AnnotationURLCitation(end_index=407, start_index=303, title='45th Canadian federal election - Wikipedia', type='url_citation', url='https://en.m.wikipedia.org/wiki/45th_Canadian_federal_election?utm_source=openai'),
 AnnotationURLCitation(end_index=1079, start_index=909, title='More Canadians want next federal election in 2024 than 2025: Nanos | CTV News', type='url_citation', url='https://www.ctvnews.ca/politics/nearly-1-in-2-canadians-would-prefer-the-next-federal-election-happen-before-2025-nanos-survey-1.6709260?utm_source=openai'),
 AnnotationURLCitation(end_index=1580, start_index=1359, title="Canada gov't could avoid confidence vote with slim mini budget, says source", type='url_citation', url='https://www.reuters.com/world/americas/canada-govt-could-avoid-confidence-vote-with-slim-mini-budget-says-source-2024-10-10/?utm_source=openai'),
 AnnotationURLCitation(end_index=1782, start_index=1583, title='Canada PM Trudeau reflecting on criticism amid leadership crisis, says ally', 

- text content:

In [23]:
response.output[1].content[0].text

"As of April 2, 2025, the next Canadian federal election is scheduled for October 20, 2025, in accordance with the fixed-date provisions of the Canada Elections Act. This date may be adjusted to October 27, 2025, to avoid conflicting with the Hindu festival of Diwali and municipal elections in Alberta. ([en.m.wikipedia.org](https://en.m.wikipedia.org/wiki/45th_Canadian_federal_election?utm_source=openai))\n\nHowever, recent political developments have introduced the possibility of an earlier election. Prime Minister Justin Trudeau's minority government has faced internal challenges, including the resignation of Finance Minister Chrystia Freeland, who criticized Trudeau's leadership. Additionally, a significant number of Canadians have expressed a preference for an earlier election. A Nanos Research survey indicated that 46% of respondents would prefer the next federal election to occur before 2025. ([ctvnews.ca](https://www.ctvnews.ca/politics/nearly-1-in-2-canadians-would-prefer-the-n

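In the outputs above, each annotation's `start_index` and `end_index` look like character offsets into the message text that cover the inline citation link. Here is a small sketch that pairs each citation with the slice of text it points at. It works on the plain dictionaries you get from `to_dict()`. The helper name `cited_spans` and the example text and URL are made up.

```python
def cited_spans(text, annotations):
    """Return title, url, and the cited slice of text for each URL citation."""
    spans = []
    for annotation in annotations:
        if annotation.get("type") != "url_citation":
            continue  # ignore other annotation types
        start, end = annotation["start_index"], annotation["end_index"]
        spans.append(
            {
                "title": annotation["title"],
                "url": annotation["url"],
                "span": text[start:end],
            }
        )
    return spans


# Made-up example in the same shape as the real output
text = "Toronto is in Canada. ([example.com](https://example.com/toronto))"
start = text.index("(")
annotations = [
    {
        "type": "url_citation",
        "title": "Toronto facts",
        "url": "https://example.com/toronto",
        "start_index": start,
        "end_index": len(text),
    }
]

for citation in cited_spans(text, annotations):
    print(citation["title"], "|", citation["span"])
```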
To refine search results based on geography, you can specify an approximate user location using country, city, region, and/or timezone.




In [24]:
response = client.responses.create(
    model="gpt-4o-mini",
    tools=[
        {
            "type": "web_search_preview",
            "user_location": {
                "type": "approximate",
                "country": "CA",  #  two-letter ISO country code
                "city": "Halifax",  # free text strings
                "region": "Nova Scotia",  # free text strings
            },
        }
    ],
    input="What are the best restaurants around Halifax?",
)

show(render_md(response.output_text))

The parameter `search_context_size` controls how much context is retrieved from the web search. The tokens used by the search tool **do not** affect the context window of the main model.
Choosing the `search_context_size` parameter is a trade-off between cost, quality, and latency. The available values are `high`, `medium`, and `low`. The default is `medium`.
The [pricing page](https://platform.openai.com/docs/pricing) has all the details.

In [25]:
response = client.responses.create(
    model="gpt-4o",
    tools=[
        {
            "type": "web_search_preview",
            "user_location": {
                "type": "approximate",
            },
            "search_context_size": "high",
        }
    ],
    input="Give me a markdown table of all the 2025 March Madness games and scores that have been played so far.",
)

show(render_md(response.output_text))

| Round | Date | Matchup | Score |
|:------|:-----|:--------|:------|
| First Four | March 18, 2025 | (16) Alabama State vs. (16) Saint Francis | 70-68 |
| | | (11) North Carolina vs. (11) San Diego State | 95-68 |
| | March 19, 2025 | (16) Mount St. Mary's vs. (16) American University | 83-72 |
| | | (11) Xavier vs. (11) Texas | 86-80 |
| First Round | March 20, 2025 | (9) Creighton vs. (8) Louisville | 89-75 |
| | | (4) Purdue vs. (13) High Point | 75-63 |
| | | (3) Wisconsin vs. (14) Montana | 85-66 |
| | | (1) Houston vs. (16) SIU Edwardsville | 78-40 |
| | | (1) Auburn vs. (16) Alabama State | 83-63 |
| | | (12) McNeese vs. (5) Clemson | 69-67 |

### File Search

File search in the Responses API allows models to easily access information from previously uploaded files using semantic and keyword search. By uploading files to vector stores, you enhance the model's built-in knowledge without additional coding—OpenAI manages everything automatically. It's essentially RAG (Retrieval-Augmented Generation) hosted by OpenAI. The documentation is [here](https://platform.openai.com/docs/guides/tools-file-search).
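I haven't run this myself yet, but from the docs the request has the same shape as the web search one: a `file_search` tool entry that points at a vector store. The sketch below only builds the request arguments. `file_search_request` is a hypothetical helper, and the vector store ID is a placeholder you would replace with one you created.

```python
def file_search_request(question, vector_store_id, model="gpt-4o-mini"):
    """Build keyword arguments for a Responses API call that uses file search."""
    return {
        "model": model,
        "input": question,
        "tools": [{"type": "file_search", "vector_store_ids": [vector_store_id]}],
    }


# "vs_hypothetical" is a placeholder, not a real vector store ID
request = file_search_request("What does the report say about Q1?", "vs_hypothetical")
print(request)

# Usage (needs a real client, API key, and vector store):
# response = client.responses.create(**request)
```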

**TODO:** Come back to this when interested.

### Computer Use

**TODO:** Come back when interested. See [here](https://platform.openai.com/docs/guides/tools-computer-use).

# Agents SDK

This is what I came here to learn more about.
I wrote extensively about agents in [part 1](https://drchrislevy.github.io/blog), but I never used any frameworks.
I built everything from scratch to understand the concepts.

There are a lot of pieces here that can be glued together, as seen from the OpenAI documentation [here](https://platform.openai.com/docs/guides/agents).
Here is an image from their documentation showing the components:

<img src="static_blog_imgs/openai_agent_components.png" width="75%">

<img src="static_blog_imgs/model_use_cases.png" width="75%">





## Hello World

If you try to run the opening example from the [docs](https://openai.github.io/openai-agents-python/#hello-world-example) (which uses `run_sync`) in a notebook, it won't work. `run_sync` is a convenience method that wraps async code to run synchronously by creating a new event loop. It's meant for simple synchronous scripts. However, this causes problems in environments that already have an event loop running, such as Jupyter notebooks. So we will call `Runner.run` with `await` directly at the notebook level when required.
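The same failure mode can be reproduced with plain `asyncio`, no SDK needed. In this sketch `fake_agent_run` and `run_blocking` are hypothetical stand-ins for an async agent call and a sync wrapper around it. They are not SDK functions, and the SDK's own wrapper may be implemented differently.

```python
import asyncio


async def fake_agent_run(prompt):
    """Stand-in for an async agent call."""
    await asyncio.sleep(0)
    return f"echo: {prompt}"


def run_blocking(prompt):
    """Sync wrapper that needs to start its own event loop."""
    coro = fake_agent_run(prompt)
    try:
        return asyncio.run(coro)
    except RuntimeError:
        coro.close()  # avoid a "never awaited" warning
        raise


def loop_is_running():
    """True when called from inside a running event loop (e.g. a notebook cell)."""
    try:
        asyncio.get_running_loop()
        return True
    except RuntimeError:
        return False


async def call_from_inside_loop():
    """Simulates a notebook: an event loop is already running here."""
    try:
        return run_blocking("Hey")
    except RuntimeError as exc:
        return f"RuntimeError: {exc}"


print(run_blocking("Hey"))                   # works in a plain script
print(asyncio.run(call_from_inside_loop()))  # fails inside a running loop
```

In a notebook `loop_is_running()` would return `True`, which is why awaiting the coroutine directly is the simpler option there.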

In [26]:
from agents import Agent, Runner

agent = Agent(name="Chris", instructions="You are a helpful assistant.")
result = await Runner.run(agent, "Hey")
print(result.final_output)

Hello! How can I assist you today?


In [27]:
result.to_input_list()

[{'content': 'Hey', 'role': 'user'},
 {'id': 'msg_67edbf2eb5d88191a3c25734c4d1164c05e7df807c96b84c',
  'content': [{'annotations': [],
    'text': 'Hello! How can I assist you today?',
    'type': 'output_text'}],
  'role': 'assistant',
  'status': 'completed',
  'type': 'message'}]
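The list returned by `to_input_list()` is handy for continuing the conversation: append the next user message and pass the whole thing back in as the input of the next run. A minimal sketch, where `next_turn_input` is a hypothetical helper and the items are a trimmed copy of the output above:

```python
def next_turn_input(previous_items, user_message):
    """Return a new input list with the next user message appended."""
    return list(previous_items) + [{"role": "user", "content": user_message}]


# Trimmed copy of the items shown above
previous_items = [
    {"content": "Hey", "role": "user"},
    {
        "content": [{"annotations": [], "text": "Hello! How can I assist you today?", "type": "output_text"}],
        "role": "assistant",
        "status": "completed",
        "type": "message",
    },
]

new_input = next_turn_input(previous_items, "What did I just say?")
print(len(new_input), new_input[-1])

# Usage in a notebook (needs the SDK and an API key):
# result = await Runner.run(agent, next_turn_input(result.to_input_list(), "What did I just say?"))
```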

In [28]:
result

RunResult(input='Hey', new_items=[MessageOutputItem(agent=Agent(name='Chris', instructions='You are a helpful assistant.', handoff_description=None, handoffs=[], model=None, model_settings=ModelSettings(temperature=None, top_p=None, frequency_penalty=None, presence_penalty=None, tool_choice=None, parallel_tool_calls=False, truncation=None, max_tokens=None), tools=[], mcp_servers=[], input_guardrails=[], output_guardrails=[], output_type=None, hooks=None, tool_use_behavior='run_llm_again', reset_tool_choice=True), raw_item=ResponseOutputMessage(id='msg_67edbf2eb5d88191a3c25734c4d1164c05e7df807c96b84c', content=[ResponseOutputText(annotations=[], text='Hello! How can I assist you today?', type='output_text')], role='assistant', status='completed', type='message'), type='message_output_item')], raw_responses=[ModelResponse(output=[ResponseOutputMessage(id='msg_67edbf2eb5d88191a3c25734c4d1164c05e7df807c96b84c', content=[ResponseOutputText(annotations=[], text='Hello! How can I assist you t

## Building an Agent with Coding Abilities

In my [Part 1 blog post on agents](https://drchrislevy.github.io//blog), I implemented a CodeAct-style agent from scratch.  
This agent was able to express all of its actions directly in code, rather than relying on JSON-style tools.

Now that we're using the OpenAI Agents SDK, we're limited to JSON-style tools, so we can't use the CodeAct approach directly.  
I actually created an [issue on the OpenAI Agents SDK GitHub repo](https://github.com/openai/openai-agents-python/issues/383) to ask whether there are any future plans to support CodeAct-style agents within the SDK.  
I think it would be a great addition, though it would likely require a fair amount of work to implement. There are no known plans at the time of writing.

Anyway, back to the task at hand. Let's build an agent using the OpenAI Agents SDK that can write arbitrary Python code to solve a task.  
We'll execute that code in a sandboxed environment using Modal. I also covered this in my [Part 1 blog post](https://drchrislevy.github.io//blog).

Here is the code which uses Modal to execute the code in a sandboxed environment:


In [140]:
import json

import modal

# Create image with IPython installed
image = modal.Image.debian_slim().pip_install("ipython", "pandas")


# Create the driver program that will run in the sandbox
def create_driver_program():
    return """
import json
import sys
import re
from IPython.core.interactiveshell import InteractiveShell
from IPython.utils.io import capture_output

def strip_ansi_codes(text):
    ansi_escape = re.compile(r'\\x1B(?:[@-Z\\\\-_]|\\[[0-?]*[ -/]*[@-~])')
    return ansi_escape.sub('', text)

# Create a persistent IPython shell instance
shell = InteractiveShell()
shell.colors = 'NoColor'  # Disable color output
shell.autoindent = False  # Disable autoindent

# Keep reading commands from stdin
while True:
    try:
        # Read a line of JSON from stdin
        command = json.loads(input())
        code = command.get('code')
        
        if code is None:
            print(json.dumps({"error": "No code provided"}))
            continue
            
        # Execute the code and capture output
        with capture_output() as captured:
            result = shell.run_cell(code)

        # Clean the outputs
        stdout = strip_ansi_codes(captured.stdout)
        stderr = strip_ansi_codes(captured.stderr)
        error = strip_ansi_codes(str(result.error_in_exec)) if not result.success else None

        # Format the response
        response = {
            "stdout": stdout,
            "stderr": stderr,
            "success": result.success,
            "result": repr(result.result) if result.success else None,
            "error": error
        }
        
        # Send the response
        print(json.dumps(response), flush=True)
        
    except Exception as e:
        print(json.dumps({"error": strip_ansi_codes(str(e))}), flush=True)
"""


def create_sandbox(timeout=300):
    """Creates and returns a Modal sandbox running an IPython shell."""
    app = modal.App.lookup("ipython-sandbox", create_if_missing=True)

    # Create the sandbox with the driver program
    with modal.enable_output():
        sandbox = modal.Sandbox.create("python", "-c", create_driver_program(), image=image, app=app, timeout=timeout)

    return sandbox


def execute_python_code(code: str, sandbox=None) -> dict:
    created_sandbox = False
    if sandbox is None:
        sandbox = create_sandbox()
        created_sandbox = True
    # Send the code to the sandbox
    sandbox.stdin.write(json.dumps({"code": code}))
    sandbox.stdin.write("\n")
    sandbox.stdin.drain()

    # Get the response
    response = next(iter(sandbox.stdout))
    if created_sandbox:
        sandbox.terminate()
    return json.loads(response)
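The protocol between the notebook and the sandbox is simple: one JSON line in, one JSON line out, with state kept between requests. To make that concrete, here is a local stand-in that uses `exec` instead of IPython and Modal. It is **not sandboxed** and only illustrates the message format. `handle_command` is a hypothetical name, and it returns fewer fields than the real driver.

```python
import contextlib
import io
import json


def handle_command(line: str, namespace: dict) -> str:
    """Process one JSON request line and return one JSON response line."""
    command = json.loads(line)
    code = command.get("code")
    if code is None:
        return json.dumps({"error": "No code provided"})

    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(code, namespace)  # NOT sandboxed: illustration only
        return json.dumps({"stdout": buffer.getvalue(), "success": True, "error": None})
    except Exception as exc:
        return json.dumps({"stdout": buffer.getvalue(), "success": False, "error": repr(exc)})


# The shared namespace plays the role of the persistent IPython shell
namespace = {}
first = json.loads(handle_command(json.dumps({"code": "x = 2\nprint(x)"}), namespace))
second = json.loads(handle_command(json.dumps({"code": "y = 6\nprint(x + y)"}), namespace))
print(first["stdout"], second["stdout"])
```

The second request can use `x` because both requests share the same namespace, which mirrors how the driver keeps one `InteractiveShell` alive for the lifetime of the sandbox.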

I will wrap it as a tool that the agent can use.
We will also give the agent the `WebSearchTool`.

In [167]:
from agents import WebSearchTool, function_tool

sb = create_sandbox()


@function_tool()
def execute_code(code: str) -> dict:
    """Execute the given Python code in a sandboxed environment and return the output."""
    return execute_python_code(code, sb)

In [89]:
coding_instructions = """
You solve tasks using an agentic coding loop in Python.

Follow this loop carefully:

- Think and write code. Send the code to the execute_code tool.
- Get code output results back.
- Think again, improve or add more code.
- Get code output results back.
...
- Repeat until you've solved the problem completely.

If you encounter any errors:

- FIX THEM and continue the loop.
- If modules are not found, install them using: !pip install <package_name>

Never give up. Continue iterating until the task is fully solved.
The sandbox environment is safe, isolated, and supports running arbitrary Python code.
State is maintained between code snippets:
Variables and definitions persist across executions.
# first code snippet
x = 2
print(x)
# second code snippet in separate request
y = 6
print(x + y)  # This works because state persists

Begin your agentic coding loop!

"""

coding_agent = Agent(
    name="Code Agent",
    instructions=coding_instructions,
    model="gpt-4o",
    tools=[execute_code, WebSearchTool()],
)
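Conceptually, running an agent like this is a tool-calling loop: ask the model, run whatever tool it requests, feed the result back, and repeat until it gives a final answer or runs out of turns. The sketch below simulates that loop with a scripted stand-in for the model. It is not the SDK's implementation, and every name in it (`run_tool_loop`, `scripted_model`, `fake_execute_code`, `TurnLimitReached`) is made up for illustration.

```python
import json


class TurnLimitReached(Exception):
    """Hypothetical error raised when the loop runs out of turns."""


def run_tool_loop(model_step, tools, user_input, max_turns=10):
    """Alternate between the model and its tools until a final answer."""
    history = [{"role": "user", "content": user_input}]
    for _ in range(max_turns):
        action = model_step(history)
        if action["type"] == "final":
            return action["content"], history
        # The model asked for a tool: run it and feed the result back
        result = tools[action["name"]](**action["arguments"])
        history.append(
            {"type": "function_call", "name": action["name"], "arguments": json.dumps(action["arguments"])}
        )
        history.append({"type": "function_call_output", "output": json.dumps(result)})
    raise TurnLimitReached(f"no final answer after {max_turns} turns")


def scripted_model(history):
    """Stand-in for the LLM: call the tool once, then answer."""
    outputs = [item for item in history if item.get("type") == "function_call_output"]
    if not outputs:
        return {"type": "tool", "name": "execute_code", "arguments": {"code": "print(2 + 6)"}}
    stdout = json.loads(outputs[-1]["output"])["stdout"]
    return {"type": "final", "content": f"The code printed {stdout.strip()}"}


def fake_execute_code(code):
    """Stand-in for the sandbox tool; returns a canned result."""
    return {"stdout": "8\n", "success": True}


answer, history = run_tool_loop(scripted_model, {"execute_code": fake_execute_code}, "What is 2 + 6?")
print(answer)
```

This is also why a turn limit matters: a model that keeps requesting tools without ever answering would otherwise loop forever.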

In [78]:
result = await Runner.run(
    coding_agent,
    """
                          Use an open source weather API to get the temperature in Halifax, Nova Scotia for the last 60 days. 
                          Then fit a statistical model to predict the temperature for the next week. 
                          Print the final predictions in a table (markdown format).
                          Todays date is 2025-04-02.""",
    max_turns=20,
)
print(result.final_output)

Here's the markdown table with the predicted mean temperatures for Halifax, Nova Scotia, from April 1 to April 7, 2025:

```markdown
| Date                |   Predicted_Mean_Temperature |
|:--------------------|-----------------------------:|
| 2025-04-01 00:00:00 |                      3.30113 |
| 2025-04-02 00:00:00 |                      2.24348 |
| 2025-04-03 00:00:00 |                      2.5076  |
| 2025-04-04 00:00:00 |                      2.32606 |
| 2025-04-05 00:00:00 |                      2.01458 |
| 2025-04-06 00:00:00 |                      2.52581 |
| 2025-04-07 00:00:00 |                      2.66123 |
```

These predictions were generated using an ARIMA model based on the historical temperature data.


The result object exposes several other attributes. Here are some worth noting:

In [79]:
print(result.input)


                          Use an open source weather API to get the temperature in Halifax, Nova Scotia for the last 60 days. 
                          Then fit a statistical model to predict the temperature for the next week. 
                          Print the final predictions in a table (markdown format).
                          Todays date is 2025-04-02.


In [80]:
for r in result.raw_responses:
    print(r)

ModelResponse(output=[ResponseFunctionWebSearch(id='ws_67edcd295da481919095a35d1770ce060e2583e709c47d03', status='completed', type='web_search_call'), ResponseOutputMessage(id='msg_67edcd2b183081919aae61ed7d739ca60e2583e709c47d03', content=[ResponseOutputText(annotations=[AnnotationURLCitation(end_index=358, start_index=239, title='🏛️ Historical Weather API | Open-Meteo.com', type='url_citation', url='https://open-meteo-website.pages.dev/en/docs/historical-weather-api?utm_source=openai')], text='To obtain the temperature data for Halifax, Nova Scotia over the past 60 days and predict the temperatures for the next week, we can utilize the Open-Meteo API, which provides free access to historical weather data for non-commercial use. ([open-meteo-website.pages.dev](https://open-meteo-website.pages.dev/en/docs/historical-weather-api?utm_source=openai))\n\n**Step 1: Retrieve Historical Weather Data**\n\nWe\'ll use the Open-Meteo Historical Weather API to fetch daily mean temperatures for Hal

In [81]:
result.new_items

[ToolCallItem(agent=Agent(name='Code Agent', instructions="\nYou solve tasks using an agentic coding loop in Python.\n\nFollow this loop carefully:\n\n- Think and write code. Send the code to the execute_code tool.\n- Get code output results back.\n- Think again, improve or add more code.\n- Get code output results back.\n...\n- Repeat until you've solved the problem completely.\n\nIf you encounter any errors:\n\n- FIX THEM and continue the loop.\n- If modules are not found, install them using: !pip install <package_name>\n\nNever give up. Continue iterating until the task is fully solved.\nThe sandbox environment is safe, isolated, and supports running arbitrary Python code.\nState is maintained between code snippets:\nVariables and definitions persist across executions.\n# first code snippet\nx = 2\nprint(x)\n# second code snippet in separate request\ny = 6\nprint(x + y)  # This works because state persists\n\nBegin your agentic coding loop!\n\n", handoff_description=None, handoffs=[

In [82]:
result.to_input_list()

[{'content': '\n                          Use an open source weather API to get the temperature in Halifax, Nova Scotia for the last 60 days. \n                          Then fit a statistical model to predict the temperature for the next week. \n                          Print the final predictions in a table (markdown format).\n                          Todays date is 2025-04-02.',
  'role': 'user'},
 {'id': 'ws_67edcd295da481919095a35d1770ce060e2583e709c47d03',
  'status': 'completed',
  'type': 'web_search_call'},
 {'id': 'msg_67edcd2b183081919aae61ed7d739ca60e2583e709c47d03',
  'content': [{'annotations': [{'end_index': 358,
      'start_index': 239,
      'title': '🏛️ Historical Weather API | Open-Meteo.com',
      'type': 'url_citation',
      'url': 'https://open-meteo-website.pages.dev/en/docs/historical-weather-api?utm_source=openai'}],
    'text': 'To obtain the temperature data for Halifax, Nova Scotia over the past 60 days and predict the temperatures for the next week, we

We could hack together a function to display the conversation like this:

In [87]:
def show_conversation(conversation):
    import json  # needed below to parse tool-call arguments
    from ast import literal_eval

    for message in conversation:
        if message.get("role") == "user":
            show(render_md("### User:"))
            show(render_md(message.get("content", "")))

        elif message.get("role") == "assistant":
            show(render_md("### Assistant:"))
            # Handle the new content format which is a list of dictionaries
            if isinstance(message.get("content"), list):
                for content_item in message["content"]:
                    if content_item.get("type") == "output_text":
                        show(render_md(content_item.get("text", "")))

        # Handle tool calls (like execute_code)
        elif message.get("type") == "function_call":
            show(render_md(f"### Tool Call: {message.get('name')}"))
            if "arguments" in message:
                arguments = json.loads(message["arguments"])
                for k, v in arguments.items():
                    show(render_md(f"#### Argument: {k}"))
                    show(render_md(f"```python\n{v}\n```"))

        # Handle tool outputs
        elif message.get("type") == "function_call_output":
            # Tool outputs arrive as the string repr of a dict, so default to "{}" rather than {}
            output = literal_eval(message.get("output", "{}"))
            for k, v in output.items():
                if not v:
                    continue
                show(render_md(f"#### Output: {k}"))
                show(render_md(f"```python\n{v}\n```"))
        elif message.get("type") == "web_search_call":
            show(render_md("### Performing Web Search:"))
        else:
            raise Exception(f"Unknown message type: {message.get('type')}")


# Use it like this:
show_conversation(result.to_input_list())

Everything the agent did, along with the traces, can also be viewed in the OpenAI traces dashboard.

<img src="static_blog_imgs/code_agent_traces.png" width="100%">

Note: I could have pre-installed those pip packages but it's nice to see the agent fix errors on its own.

## Multi-Agent Collaboration

The Agents SDK supports multi-agent collaboration via handoffs.


In [156]:
from agents.extensions.handoff_prompt import RECOMMENDED_PROMPT_PREFIX

print(RECOMMENDED_PROMPT_PREFIX)

# System context
You are part of a multi-agent system called the Agents SDK, designed to make agent coordination and execution easy. Agents uses two primary abstraction: **Agents** and **Handoffs**. An agent encompasses instructions and tools and can hand off a conversation to another agent when appropriate. Handoffs are achieved by calling a handoff function, generally named `transfer_to_<agent_name>`. Transfers between agents are handled seamlessly in the background; do not mention or draw attention to these transfers in your conversation with the user.
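To make the `transfer_to_<agent_name>` convention concrete, here is a minimal sketch of how an agent name could map to a handoff tool name. The helper `handoff_tool_name` is hypothetical and is not the SDK's own function; the normalization rule (lowercase, non-alphanumeric characters replaced with underscores) is an assumption that happens to match the `transfer_to_code_agent` name seen in the run output further down.

```python
import re


def handoff_tool_name(agent_name: str) -> str:
    """Hypothetical helper: guess the handoff tool name for an agent name."""
    # Assumption: non-alphanumeric characters become underscores and the
    # result is lowercased. The SDK's real rule may differ in edge cases.
    slug = re.sub(r"[^a-zA-Z0-9]", "_", agent_name.strip()).lower()
    return f"transfer_to_{slug}"


print(handoff_tool_name("Code Agent"))
print(handoff_tool_name("Review Agent"))
```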



In [157]:
# Create the coding agent first so the review agent can reference it.
coding_agent = Agent(
    name="Code Agent",
    instructions=f"{RECOMMENDED_PROMPT_PREFIX} {coding_instructions}. After completing your task, hand off the results to the review agent for final review.",
    model="gpt-4o",
    tools=[execute_code, WebSearchTool()],
    handoff_description="Specialist agent for coding and executing code",
)

review_agent = Agent(
    name="Review Agent",
    instructions=f"{RECOMMENDED_PROMPT_PREFIX} You need to review the code, logic, and answer provided by the coding agent. If the coding agent does not provide a complete answer then explain to the coding agent what it needs to do to fix it.",
    model="gpt-4o",
    handoff_description="Specialist agent for reviewing code",
    handoffs=[coding_agent],
)

# The two agents hand off to each other, so close the cycle once both exist.
coding_agent.handoffs.append(review_agent)

triage_agent = Agent(
    name="Triage Agent",
    instructions=f"{RECOMMENDED_PROMPT_PREFIX} You determine which agent to use based on the user's request. You start by handing off the request and details to the coding agent.",
    model="gpt-4o",
    handoff_description="Specialist agent for triaging tasks",
    handoffs=[coding_agent, review_agent],
)

In [158]:
result = await Runner.run(
    triage_agent,
    "I need a table of dow jones closing prices for the last 30 days. Present the results in markdown format. Also compute the summary statistics of the closing prices using Python code.",
)
print(result.final_output)

Here are the summary statistics for the Dow Jones closing prices over the last 30 days:

- **Mean Closing Price:** 44,585.69
- **Median Closing Price:** 44,556.04
- **Standard Deviation:** 185.31
- **Minimum Closing Price:** 44,303.40
- **Maximum Closing Price:** 44,882.13

If you need any further assistance, feel free to ask!


In [159]:
result.to_input_list()

[{'content': 'I need a table of dow jones closing prices for the last 30 days. Present the results in markdown format. Also compute the summary statistics of the closing prices using Python code.',
  'role': 'user'},
 {'arguments': '{}',
  'call_id': 'call_wRBF3CRHCzUQ30m66M6Y8SXO',
  'name': 'transfer_to_code_agent',
  'type': 'function_call',
  'id': 'fc_67ee75c7f78c81918f256657292ab05a0cc8de9de06d4715',
  'status': 'completed'},
 {'call_id': 'call_wRBF3CRHCzUQ30m66M6Y8SXO',
  'output': "{'assistant': 'Code Agent'}",
  'type': 'function_call_output'},
 {'id': 'ws_67ee75c8eb8c8191a234f8b2298213ae0cc8de9de06d4715',
  'status': 'completed',
  'type': 'web_search_call'},
 {'id': 'msg_67ee75cafaa08191a68f4d723ed6edb30cc8de9de06d4715',
  'content': [{'annotations': [{'end_index': 1262,
      'start_index': 1175,
      'title': 'Dow Jones Industrial Average Historical Data - DJI | ADVFN',
      'type': 'url_citation',
      'url': 'https://www.advfn.com/stock-market/DOWI/DJI/historical?utm_

In [160]:
show_conversation(result.to_input_list())

Date,Closing Price
2025-04-02,44711.43
2025-04-01,44368.56
2025-03-31,44593.65
2025-03-28,44470.41
2025-03-27,44303.4
2025-03-26,44747.63
2025-03-25,44873.28
2025-03-24,44556.04
2025-03-21,44421.91
2025-03-20,44544.66


## Pre-Defined Workflow

To be honest, I had a lot of issues getting the previous example to work the way I wanted. 
Maybe it's a skill issue? Maybe it's a prompting issue or an obvious bug in my code?
In the run above, there is no handoff to the review agent at the end.
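One quick way to confirm which handoffs actually happened is to scan the items returned by `result.to_input_list()` for `function_call` entries whose name starts with `transfer_to_`. The sketch below is a rough illustration only: `handoff_trail` is a hypothetical helper, and `example_items` is a trimmed-down stand-in with the same shape as the multi-agent run's output, not the real data.

```python
def handoff_trail(items):
    """Return the names of handoff tool calls, in the order they occurred."""
    return [
        item["name"]
        for item in items
        if item.get("type") == "function_call"
        and item.get("name", "").startswith("transfer_to_")
    ]


# Trimmed-down stand-in for result.to_input_list() (illustrative, not real data).
example_items = [
    {"role": "user", "content": "I need a table of dow jones closing prices..."},
    {"type": "function_call", "name": "transfer_to_code_agent", "arguments": "{}"},
    {"type": "function_call_output", "output": "{'assistant': 'Code Agent'}"},
    {"type": "web_search_call", "status": "completed"},
]

print(handoff_trail(example_items))
```

If `transfer_to_review_agent` never shows up in that list, the review agent was never reached.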

An agentic workflow is probably overkill for this task, 
since a pre-defined workflow could handle it.

Let's show an example of a pre-defined workflow.

In [168]:
from pydantic import BaseModel


class ReviewResult(BaseModel):
    success: bool
    feedback: str


review_agent = Agent(
    name="Review Agent",
    instructions="""
    Review the work done by the coding agent. 
    Give it feedback on whether or not it was successful.
    Review the code and the output as well as the final answer to make sure it completely answers the user's request.
    If it does not, ask the coding agent to fix it.
    Note: The end user has no ability to view plots so always give feedback that results must be presented in markdown formatted tables instead.
    """,
    model="gpt-4o",
    output_type=ReviewResult,
)

coding_agent = Agent(
    name="Code Agent",
    instructions=coding_instructions,
    model="gpt-4o",
    tools=[execute_code, WebSearchTool()],
)

In [None]:
task = "I need data for dow jones closing prices for the last 30 days as well as a prediction for the next week. Plot the results in a line chart."
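# First pass: the coding agent attempts the task.
res = await Runner.run(coding_agent, task)
# Alternate between reviewing and revising, for at most 5 rounds.
for i in range(5):
    res = await Runner.run(review_agent, res.to_input_list(), max_turns=20)
    if res.final_output.success:
        break
    else:
        # The review is part of res.to_input_list(), so the coding agent sees the feedback.
        res = await Runner.run(coding_agent, res.to_input_list(), max_turns=20)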
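The control flow of that loop is easier to see without any API calls. Below is a minimal, offline sketch of the same generate-then-review pattern. Everything in it is a stand-in: `fake_coder`, `fake_reviewer`, `Review`, and `run_workflow` are hypothetical names that replace the `Runner.run(...)` calls and the `ReviewResult` model, and the "review" is just a string check.

```python
from dataclasses import dataclass


@dataclass
class Review:
    success: bool
    feedback: str


def fake_coder(task, feedback=None):
    # Stand-in for Runner.run(coding_agent, ...): the first draft is a chart,
    # and the revised draft is a markdown table once feedback arrives.
    return "markdown table" if feedback else "line chart"


def fake_reviewer(answer):
    # Stand-in for Runner.run(review_agent, ...): only tables are accepted.
    ok = "table" in answer
    return Review(success=ok, feedback="" if ok else "Use a markdown table.")


def run_workflow(task, max_rounds=5):
    answer = fake_coder(task)
    for round_number in range(1, max_rounds + 1):
        review = fake_reviewer(answer)
        if review.success:
            return answer, round_number
        # Feed the reviewer's feedback back into the next coding attempt.
        answer = fake_coder(task, review.feedback)
    return answer, max_rounds


print(run_workflow("dow jones closing prices"))
```

The point of the design is that the orchestration lives in ordinary Python rather than in the model's decision to hand off, so the review step always runs.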

In [166]:
res.to_input_list()

[{'content': 'I need data for dow jones closing prices for the last 30 days as well as a prediction for the next week. Plot the results in a line chart.',
  'role': 'user'},
 {'id': 'msg_67ee7627f89c8191ac7149c0ad293d230697dfdf62cfc520',
  'content': [{'annotations': [],
    'text': "To accomplish this task, I'll follow these steps:\n\n1. **Fetch the last 30 days of Dow Jones closing prices.**\n2. **Predict the closing prices for the next week using a time series forecasting model.**\n3. **Plot the historical and predicted prices in a line chart.**\n\nI'll begin by fetching the historical data.",
    'type': 'output_text'}],
  'role': 'assistant',
  'status': 'completed',
  'type': 'message'},
 {'id': 'ws_67ee762a3fc481918238bec9cfc808fb0697dfdf62cfc520',
  'status': 'completed',
  'type': 'web_search_call'},
 {'id': 'msg_67ee762c40088191beef5a6491e2b07b0697dfdf62cfc520',
  'content': [{'annotations': [],
    'text': "To obtain the Dow Jones Industrial Average (DJIA) closing prices for

In [165]:
show_conversation(res.to_input_list())

Date,Closing Price
2025-03-27,42299.7
2025-03-28,41583.9
2025-03-31,42001.76
2025-04-01,41989.96
2025-04-02,42225.32

Date,Forecasted Price
2025-04-03,42597.53
2025-04-04,42444.95
2025-04-05,42430.02
2025-04-06,42403.91
2025-04-07,42243.07
2025-04-08,42289.15
2025-04-09,42311.54


# Conclusion

After digging a little into OpenAI's new suite of tools, I find myself with a balanced perspective on their offerings. The Responses API streamlines implementation, particularly for web search, which is remarkably straightforward to incorporate into projects, and the practical benefits for rapid development are tangible. But because it is so new, the Responses API format is currently limited to OpenAI's ecosystem.

The Agents SDK demonstrates potential for prototyping agentic systems without writing repetitive code. The interface is clean and intuitive, though I'm still evaluating where multi-agent approaches provide advantages over traditional workflows in practical scenarios. 

File search, computer vision capabilities, and the new audio and text-to-speech models remain on my radar for future exploration. Their technical specifications suggest promising applications, though practical implementation will be the true test of their utility.

I think next I need to dip my toes into all the MCP hype.

