# Deep Research

One of the classic cross-business Agentic use cases!

## Goal

Build an end‑to‑end research pipeline that plans web searches, performs them (via a hosted WebSearchTool), synthesizes findings into a long-form report, and emails the result.

This lab demonstrates multi-agent orchestration, structured outputs (Pydantic), hosted tools, and traceable execution.

## Learning outcomes

- Create agents with tooling requirements and structured output schemas.
- Compose a planner → searcher → writer → emailer workflow.
- Use function tools to integrate third‑party services (SendGrid).
- Observe and debug execution using OpenAI traces.
- Control cost and concurrency (limit number of searches, run searches in parallel).

## Imports and Setup

In [2]:
from agents import Agent, WebSearchTool, trace, Runner, gen_trace_id, function_tool
from agents.model_settings import ModelSettings
from pydantic import BaseModel, Field
from dotenv import load_dotenv
import asyncio
import sendgrid
import os
from sendgrid.helpers.mail import Mail, Email, To, Content
from typing import Dict
from IPython.display import display, Markdown

In [3]:
load_dotenv(override=True)

True

# TOOLS

## OpenAI Hosted Tools

The OpenAI Agents SDK includes the following hosted tools:

- The `WebSearchTool` lets an agent search the web.
- The `FileSearchTool` allows retrieving information from your OpenAI Vector Stores.
- The `ComputerTool` allows automating computer use tasks like taking screenshots and clicking.

### Important note - API charge of WebSearchTool

Pricing: https://platform.openai.com/docs/pricing#web-search

Calls to the OpenAI `WebSearchTool` can add up to $2-$3 across the next 2 labs. When a single call triggers multiple searches, charges can exceed that.

Skip running these cells if cost is a concern. We'll use free and low-cost search tools with other platforms.

### WebSearchTool

A **Search Agent** is an agent equipped with the `WebSearchTool`, which it can call to perform web searches as part of its reasoning process. The optional `search_context_size` parameter controls how much search context the tool retrieves; its values (`low`, `medium`, `high`) correspond to increasing cost.

To ensure the agent actually uses the tool, we make tool use mandatory through the `model_settings` parameter:

```python
model_settings=ModelSettings(tool_choice="required"),
```

In [31]:
INSTRUCTIONS = "You are a research assistant. Given a search term, you search the web for that term and \
produce a concise summary of the results. The summary must be 2-3 paragraphs and less than 300 \
words. Capture the main points. Write succinctly, no need to have complete sentences or good \
grammar. This will be consumed by someone synthesizing a report, so it's vital you capture the \
essence and ignore any fluff. Do not include any additional commentary other than the summary itself."

search_agent = Agent(
    name="Search agent",
    instructions=INSTRUCTIONS,
    tools=[WebSearchTool(search_context_size="low")],
    model="gpt-4o-mini",
    model_settings=ModelSettings(tool_choice="required"),
)

In [None]:
message = "Latest AI Agent frameworks in 2025"

with trace("Search"):
    result = await Runner.run(search_agent, message)

display(Markdown(result.final_output))

**Output:**
![Search Agent Output](../img/search-websearchtool.png)

**Traces:**
![Search Agent Traces](../img/search-traces.png)

# E2E Deep Research Agent with Structured Output

## Planning Phase

### Planner Agent - with Structured Outputs

**Structured Outputs**

Structured outputs allow agents to return data in a predefined format, making it easier to parse and use the results. In this example, we define **Pydantic models** to structure the output of our planner agent. Adding `Field` descriptions tells the model what each field is for.

**Planner Agent**

The Planner Agent generates a list of web search queries to gather information on a given topic. It uses structured outputs to ensure that each search query is accompanied by a clear rationale.

We create two Pydantic models: `WebSearchItem` to represent an individual search query and its reasoning, and `WebSearchPlan` to hold a list of these search items. By asking the agent for the rationale behind each search term, we encourage more thoughtful and relevant queries (similar to "chain-of-thought" prompting).

*Side note: the `reason` field is declared before `query`, so the model writes its rationale first. Because each token is predicted from the tokens before it, the query is then conditioned on that rationale, which tends to make it more consistent with the stated reasoning.*

In [4]:
# Use Pydantic to define the Schema of our response - this is known as "Structured Outputs"

class WebSearchItem(BaseModel):
    reason: str = Field(description="Your reasoning for why this search is important to the query.")
    query: str = Field(description="The search term to use for the web search.")


class WebSearchPlan(BaseModel):
    searches: list[WebSearchItem] = Field(description="A list of web searches to perform to best answer the query.")
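
To see what the schema gives us without calling the API, here is a minimal sketch that validates a hand-written JSON reply against the same two models. The `raw` string is a hypothetical model reply made up for illustration, and the sketch assumes Pydantic v2 (`model_validate_json`, `model_json_schema`).

```python
from pydantic import BaseModel, Field, ValidationError


class WebSearchItem(BaseModel):
    reason: str = Field(description="Your reasoning for why this search is important to the query.")
    query: str = Field(description="The search term to use for the web search.")


class WebSearchPlan(BaseModel):
    searches: list[WebSearchItem] = Field(description="A list of web searches to perform to best answer the query.")


# Hypothetical model reply, hand-written for illustration only
raw = '{"searches": [{"reason": "Find current frameworks", "query": "AI agent frameworks 2025"}]}'
plan = WebSearchPlan.model_validate_json(raw)
print(plan.searches[0].query)

# The generated JSON schema lists the fields in declaration order: reason first, then query
field_order = list(WebSearchItem.model_json_schema()["properties"])
print(field_order)

# A reply that omits a required field fails validation instead of slipping through
try:
    WebSearchPlan.model_validate_json('{"searches": [{"query": "no reason given"}]}')
    rejected = False
except ValidationError:
    rejected = True
print(rejected)
```

When an agent is created with `output_type=WebSearchPlan`, `result.final_output` is an instance of this model, so downstream code can rely on attribute access rather than string parsing.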

In [None]:
HOW_MANY_SEARCHES = 3 # limit # of searches to perform to control costs

INSTRUCTIONS = f"You are a helpful research assistant. Given a query, come up with a set of web searches \
to perform to best answer the query. Output {HOW_MANY_SEARCHES} terms to query for."

planner_agent = Agent(
    name="PlannerAgent",
    instructions=INSTRUCTIONS,
    model="gpt-4o-mini",
    output_type=WebSearchPlan,
)
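
Note that `HOW_MANY_SEARCHES` only appears in the prompt, so nothing in the schema stops the model from returning more items. If cost control matters, one option is to truncate the plan in code before searching. The sketch below is an illustration under that assumption; `cap_plan` is a hypothetical helper that is not part of the lab, and the models are redefined without descriptions to keep the block self-contained.

```python
from pydantic import BaseModel

HOW_MANY_SEARCHES = 3  # same cap as in the planner prompt


class WebSearchItem(BaseModel):
    reason: str
    query: str


class WebSearchPlan(BaseModel):
    searches: list[WebSearchItem]


def cap_plan(plan: WebSearchPlan, limit: int = HOW_MANY_SEARCHES) -> WebSearchPlan:
    """Return a copy of the plan that keeps at most `limit` searches (hypothetical helper)."""
    return WebSearchPlan(searches=plan.searches[:limit])


# Simulate a planner that returned more searches than requested
oversized = WebSearchPlan(
    searches=[WebSearchItem(reason=f"r{i}", query=f"q{i}") for i in range(5)]
)
capped = cap_plan(oversized)
print([item.query for item in capped.searches])
```

Truncating keeps the first items the planner produced; ranking or deduplicating them first would be a natural refinement.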

In [None]:
message = "Latest AI Agent frameworks in 2025"

with trace("Search"):
    result = await Runner.run(planner_agent, message)
    print(result.final_output)

![WebSearchPlan Structured Output](../img/websearch-plan-output.png)

### Send Email - Function tool

Define a custom function tool, `send_email`, that sends emails through a third-party service (SendGrid here; Resend and similar services work the same way).

In [6]:
EMAIL = os.getenv("EMAIL")
SENDGRID_API_KEY = os.getenv("SENDGRID_API_KEY")

In [7]:
@function_tool
def send_email(subject: str, html_body: str) -> Dict[str, str]:
    """ Send out an email with the given subject and HTML body """
    sg = sendgrid.SendGridAPIClient(api_key=SENDGRID_API_KEY)
    from_email = Email(EMAIL)
    to_email = To(EMAIL)
    content = Content("text/html", html_body)
    mail = Mail(from_email, to_email, subject, content).get()
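    response = sg.client.mail.send.post(request_body=mail)
    # Report the HTTP status so the agent (and the trace) can see what SendGrid returned
    return {"status": "success", "status_code": str(response.status_code)}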

In [8]:
send_email

FunctionTool(name='send_email', description='Send out an email with the given subject and HTML body', params_json_schema={'properties': {'subject': {'title': 'Subject', 'type': 'string'}, 'html_body': {'title': 'Html Body', 'type': 'string'}}, 'required': ['subject', 'html_body'], 'title': 'send_email_args', 'type': 'object', 'additionalProperties': False}, on_invoke_tool=<function function_tool.<locals>._create_function_tool.<locals>._on_invoke_tool at 0x1114993a0>, strict_json_schema=True, is_enabled=True)

In [None]:
INSTRUCTIONS = """You are able to send a nicely formatted HTML email based on a detailed report.
You will be provided with a detailed report. You should use your tool to send one email, providing the 
report converted into clean, well presented HTML with an appropriate subject line."""

email_agent = Agent(
    name="Email agent",
    instructions=INSTRUCTIONS,
    tools=[send_email],
    model="gpt-4o-mini",
)

### Writer Agent - with Structured Outputs

The Writer Agent writes a report based on the research findings gathered by the Search Agent. It also uses structured outputs, returning a short summary, the full markdown report, and follow-up questions as separate fields.

In [38]:
INSTRUCTIONS = (
    "You are a senior researcher tasked with writing a cohesive report for a research query. "
    "You will be provided with the original query, and some initial research done by a research assistant.\n"
    "You should first come up with an outline for the report that describes the structure and "
    "flow of the report. Then, generate the report and return that as your final output.\n"
    "The final output should be in markdown format, and it should be lengthy and detailed. Aim "
    "for 5-10 pages of content, at least 1000 words."
)


class ReportData(BaseModel):
    short_summary: str = Field(description="A short 2-3 sentence summary of the findings.")

    markdown_report: str = Field(description="The final report")

    follow_up_questions: list[str] = Field(description="Suggested topics to research further")


writer_agent = Agent(
    name="WriterAgent",
    instructions=INSTRUCTIONS,
    model="gpt-4o-mini",
    output_type=ReportData,
)

## Search Execution

The next three coroutines plan and execute the searches. Writing the report follows in the next section.

Coroutines:
- `plan_searches(query) -> WebSearchPlan` using `planner_agent`
- `perform_searches(search_plan) -> list[str]`, which runs `search` for every item in parallel with `asyncio.gather`
- `search(item) -> str` using `search_agent`

In [39]:
async def plan_searches(query: str):
    """ Use the planner_agent to plan which searches to run for the query """
    print("Planning searches...")
    result = await Runner.run(planner_agent, f"Query: {query}")
    print(f"Will perform {len(result.final_output.searches)} searches")
    return result.final_output

async def perform_searches(search_plan: WebSearchPlan):
    """ Call search() for each item in the search plan """
    print("Searching...")
    tasks = [asyncio.create_task(search(item)) for item in search_plan.searches]
    results = await asyncio.gather(*tasks)
    print("Finished searching")
    return results

async def search(item: WebSearchItem):
    """ Use the search agent to run a web search for each item in the search plan """
    search_input = f"Search term: {item.query}\nReason for searching: {item.reason}"
    result = await Runner.run(search_agent, search_input)
    return result.final_output
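
`asyncio.gather` starts every search at once and raises as soon as one of them fails. If you want to cap how many searches run concurrently and keep the results of the ones that succeed, the following sketch shows one way to do it with `asyncio.Semaphore` and `return_exceptions=True`. It is an illustration only: `fake_search` stands in for `search(item)` so the block runs without API calls, and `MAX_CONCURRENT_SEARCHES` and `bounded_searches` are hypothetical names.

```python
import asyncio

MAX_CONCURRENT_SEARCHES = 2  # hypothetical cap; tune to your rate limits and budget


async def fake_search(query: str) -> str:
    """Stand-in for `search(item)`: sleeps instead of calling the API."""
    await asyncio.sleep(0.01)
    if not query:
        raise ValueError("empty query")
    return f"summary for {query!r}"


async def bounded_searches(queries: list[str], limit: int = MAX_CONCURRENT_SEARCHES) -> list[str]:
    """Run searches concurrently, at most `limit` at a time, dropping failed ones."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(query: str) -> str:
        # Only `limit` coroutines can be inside this block at the same time
        async with semaphore:
            return await fake_search(query)

    # return_exceptions=True returns failures as values instead of raising the first one
    results = await asyncio.gather(*(guarded(q) for q in queries), return_exceptions=True)
    return [r for r in results if not isinstance(r, BaseException)]


summaries = asyncio.run(bounded_searches(["agent frameworks", "", "agent SDKs"]))
print(summaries)
```

Inside a notebook, where an event loop is already running, replace `asyncio.run(...)` with `await bounded_searches(...)`.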

## Reporting & Emailing

The final step is to compile the report and send it via email using the `send_email` tool defined earlier. This allows for easy sharing of the research findings with stakeholders.

Coroutines:
- `write_report(query, search_results) -> ReportData` using `writer_agent`
- `send_email(report) -> ReportData` using `email_agent`, which in turn calls the `send_email` function tool

In [40]:
async def write_report(query: str, search_results: list[str]):
    """ Use the writer agent to write a report based on the search results"""
    print("Thinking about report...")
    writer_input = f"Original query: {query}\nSummarized search results: {search_results}"
    result = await Runner.run(writer_agent, writer_input)
    print("Finished writing report")
    return result.final_output

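# Note: this coroutine reuses the name `send_email` and shadows the function tool defined earlier.
# `email_agent` keeps its own reference to the tool, as long as it was created before this cell runs.
async def send_email(report: ReportData):
    """ Use the email agent to send an email with the report """
    print("Writing email...")
    await Runner.run(email_agent, report.markdown_report)
    print("Email sent")
    return report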

## Showtime!

In [None]:
query = "Latest AI Agent frameworks in 2025"

with trace("Research trace"):
    print("Starting research...")
    search_plan = await plan_searches(query)
    search_results = await perform_searches(search_plan)
    report = await write_report(query, search_results)
    await send_email(report)  
    print("Hooray!")


![output logs](../img/deepresearch-outprints.png)

### As always, take a look at the trace

https://platform.openai.com/traces