# Building an **AI Research Assistant** with the OpenAI Agents SDK

This notebook provides a reference pattern for implementing a multi‑agent AI Research Assistant that can plan, search, curate, and draft high‑quality reports with citations.

Although the Deep Research feature is available in ChatGPT, individuals and companies may want to implement their own API-based solution for more fine-grained control over the output.

With support for Agents and built-in tools such as Code Interpreter, Web Search, and File Search, the Responses API makes building your own Research Assistant fast and easy.

## Table of Contents
1. [Overview](#overview)
2. [Solution Workflow](#workflow)
3. [High‑Level Architecture](#architecture)
4. [Agent Definitions (Pseudo Code)](#agents)
    * Research Planning Agent
    * Web Search Agent
    * Knowledge Assistant Agent
    * Report Creation Agent
    * Data Analysis Agent (optional)
    * Image‑Gen Agent (optional)
5. [Guardrails & Best Practices](#best-practices)
6. [Risks & Mitigation](#risks)

### 1 — Overview <a id='overview'></a>
The AI Research Assistant helps drive better research quality and faster turnaround for knowledge content. It:

1. **Performs autonomous Internet research** to gather the most recent sources.
2. **Incorporates internal data sources** such as a company's proprietary knowledge sources.
3. **Reduces analyst effort from days to minutes** by automating search, curation and first‑draft writing.
4. **Produces draft reports with citations** and built‑in hallucination detection.

### 2 — Solution Workflow <a id='workflow'></a>
The typical workflow consists of five orchestrated steps: 

| Step | Purpose | Model |
|------|---------|-------|
| **Query Expansion** | Draft multi‑facet prompts / hypotheses | `gpt‑4o` |
| **Search‑Term Generation** | Expand/clean user query into rich keyword list | `gpt‑4o` |
| **Conduct Research** | Run web & internal searches, rank & summarise results | `gpt‑4o` + tools |
| **Draft Report** | Produce first narrative with reasoning & inline citations | `o1` / `gpt‑4o` |
| **Report Expansion** | Polish formatting, add charts / images / appendix | `gpt‑4o` + tools |

### 3 — High‑Level Architecture <a id='architecture'></a>
The architecture groups agents and tools as follows:

* **Research Planning Agent** – interprets the user request and produces a research plan/agenda.
* **Knowledge Assistant Agent** – orchestrates parallel web & file searches via built‑in tools, curates short‑term memory.
* **Web Search Agent(s)** – perform Internet queries, deduplicate, rank and summarise pages.
* **Report Creation Agent** – consumes curated corpus and drafts the structured report.
* **(Optional) Data Analysis Agent** – executes code for numeric/CSV analyses via the Code Interpreter tool.
* **(Optional) Image‑Gen Agent** – generates illustrative figures.

Input/output guardrails wrap user prompts and final content for policy, safety and citation checks.
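
The parallel fan-out performed by the Knowledge Assistant Agent can be pictured with plain `asyncio`. The sketch below is illustrative only and does not use the Agents SDK: `search_web`, `search_files`, and `gather_corpus` are hypothetical names, and the two search functions return canned data instead of calling the Web Search or File Search tools.

```python
import asyncio


async def search_web(term: str) -> list[dict]:
    """Hypothetical stand-in for a web search tool call (returns canned data)."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return [{"source": "web", "url": f"https://example.com/{term.replace(' ', '-')}"}]


async def search_files(term: str) -> list[dict]:
    """Hypothetical stand-in for an internal file search tool call."""
    await asyncio.sleep(0)
    return [{"source": "file", "url": f"file://kb/{term.replace(' ', '-')}.md"}]


async def gather_corpus(terms: list[str]) -> list[dict]:
    """Run web and file searches for every term concurrently, then deduplicate by URL."""
    tasks = [fn(term) for term in terms for fn in (search_web, search_files)]
    batches = await asyncio.gather(*tasks)  # results come back in task order

    corpus: list[dict] = []
    seen: set[str] = set()
    for batch in batches:
        for hit in batch:
            if hit["url"] not in seen:
                seen.add(hit["url"])
                corpus.append(hit)
    return corpus
```

In a script this could be driven with `asyncio.run(gather_corpus(terms))`; inside a notebook cell, where an event loop is already running, `await gather_corpus(terms)` is the usual form.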

### 4 — Pre-requisites <a id='pre-requisites'></a>

Create a virtual environment.

Install dependencies:

In [None]:
%pip install openai openai-agents --quiet

### 5 — Agents (Pseudo Code) <a id='agents'></a>
Below are skeletal class definitions illustrating how each agent’s policy and tool‑usage might look.

#### Step 1 - Query Expansion

The query expansion step ensures that the subsequent agents conducting research have sufficient context about the user's inquiry.

The first step is to understand the user's intent and make sure the user has provided sufficient detail for subsequent agents to search the web, build a knowledge repository, and prepare a deep-dive report. The `query_expansion_agent.py` module accomplishes this with a prompt that outlines the minimum information needed from the user to generate a report, such as timeframe, industry, and target audience. The prompt can be tailored to the needs of your deep research assistant. The agent sets an `is_task_clear` field to yes or no: when it is no, the agent prompts the user with additional questions; when sufficient information is available, it outputs the expanded prompt.
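
To make the contract concrete, here is a minimal, library-free sketch of the output shape and the clarification loop. The field names (`is_task_clear`, `questions`, `expanded_query`) are the ones used later in this notebook; everything else is an assumption for illustration: `stub_agent` is a keyword check standing in for the LLM, `REQUIRED_DETAILS` is a made-up checklist, and the real `QueryExpansionAgent` may structure its output differently.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class QueryExpansionResult:
    """Assumed shape of the agent output (field names mirror this notebook)."""
    is_task_clear: str        # "yes" or "no"
    questions: str = ""       # clarifying questions when the task is not clear
    expanded_query: str = ""  # enriched prompt when the task is clear


# Hypothetical checklist of details the prompt must mention.
REQUIRED_DETAILS = {
    "timeframe": {"year", "years", "month", "months"},
    "region": {"us", "eu", "global"},
}


def stub_agent(prompt: str) -> QueryExpansionResult:
    """Keyword stub standing in for the LLM call."""
    words = set(prompt.lower().replace(",", " ").split())
    missing = [name for name, hints in REQUIRED_DETAILS.items() if not words & hints]
    if missing:
        return QueryExpansionResult("no", questions="Please specify: " + ", ".join(missing))
    return QueryExpansionResult("yes", expanded_query=f"Draft a research report covering: {prompt}")


def clarify(prompt: str, ask_user: Callable[[str], str], max_rounds: int = 3) -> str:
    """Loop until the task is clear, appending each user answer to the prompt."""
    for _ in range(max_rounds):
        result = stub_agent(prompt)
        if result.is_task_clear == "yes":
            return result.expanded_query
        prompt = f"{prompt}. {ask_user(result.questions)}"
    raise ValueError("Task still unclear after the maximum number of clarification rounds")
```

Capping the number of rounds is a design choice in this sketch, so that an unhelpful answer cannot keep the loop running indefinitely.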

This is also an opportunity to enforce input guardrails for any research topics that you'd like to restrict the user from researching based on your usage policies.

##### Input Guardrails with Agents SDK 
Let's assume our fictitious guardrail prevents the user from generating a report on a non-AI-related topic. For this we define a guardrail agent. The guardrail agent in `topic_guradrail.py` checks whether the topic is related to AI and, if not, raises an exception. The function `ai_topic_guardrail` is passed to `QueryExpansionAgent()` as `input_guardrails`.
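
The tripwire idea can be shown without the SDK. The following is a plain-Python analogue, not the Agents SDK API: `TopicCheck`, `TopicTripwire`, `classify_topic`, and `guarded_task` are hypothetical names, and a keyword match stands in for the LLM-based classification that the guardrail agent would perform.

```python
import re
from dataclasses import dataclass


@dataclass
class TopicCheck:
    """Result of the topic check (hypothetical shape)."""
    is_ai_topic: bool
    reasoning: str


class TopicTripwire(Exception):
    """Raised when the guardrail rejects the request (analogue of a tripwire exception)."""

    def __init__(self, check: TopicCheck):
        super().__init__(check.reasoning)
        self.check = check


# Illustrative keyword list; a real guardrail would ask a model instead.
AI_KEYWORDS = ("ai", "artificial intelligence", "machine learning", "llm")


def classify_topic(prompt: str) -> TopicCheck:
    """Keyword stub standing in for the guardrail agent's classification."""
    text = prompt.lower()
    for keyword in AI_KEYWORDS:
        # Word boundaries prevent "ai" from matching inside words such as "retail".
        if re.search(rf"\b{re.escape(keyword)}\b", text):
            return TopicCheck(True, f"The request mentions '{keyword}'.")
    return TopicCheck(False, "No AI-related term found in the request.")


def guarded_task(prompt: str) -> str:
    """Run the guardrail first; the main agent is reached only if it passes."""
    check = classify_topic(prompt)
    if not check.is_ai_topic:
        raise TopicTripwire(check)
    return f"(query expansion would run here for: {prompt})"
```

The caller handles the rejection the same way the next cell does: wrap the call in `try`/`except` and read the reasoning off the exception.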

In [3]:
from ai_research_assistant_resources.agents_tools_registry.query_expansion_agent import QueryExpansionAgent
from agents import InputGuardrailTripwireTriggered

query_expansion_agent_guardrail_check = QueryExpansionAgent()

try:

    result = await query_expansion_agent_guardrail_check.task("Write a research report on the latest trends in luxury goods market")

except InputGuardrailTripwireTriggered as e:
    reason = e.guardrail_result.output.output_info.reasoning
    #            └─────┬─────┘
    #            GuardrailFunctionOutput
    print("🚫 Guardrail tripped – not an AI topic:", reason)


🚫 Guardrail tripped – not an AI topic: The request is about trends in the luxury goods market, which is not focused on artificial intelligence.


In [4]:
from ai_research_assistant_resources.agents_tools_registry.query_expansion_agent import QueryExpansionAgent

query_expansion_agent = QueryExpansionAgent()

# Initial prompt to the agent
prompt: str = "Draft a research report on the latest trends in AI developments"
expanded_query = "" 

try: 

    while True:
        # Execute the agent with the current prompt
        result = await query_expansion_agent.task(prompt)

        # When the task is clear, show the expanded query and exit.
        if result.is_task_clear == "yes":
            expanded_query = result.expanded_query
            print("\nExpanded query:\n", expanded_query)
            break

        # Otherwise, display the clarifying questions and ask the user for input.
        print("\nThe task is not clear. The agent asks:\n", result.questions)
        prompt = input("Please provide the missing details so I can refine the query: ")
        print("\n")
        print("user input: ", prompt)
        

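# Catch only the guardrail tripwire (imported in the previous cell) so that
# unrelated errors are not mislabelled as a guardrail failure.
except InputGuardrailTripwireTriggered as e:
    print("Non-AI topic guardrail tripped!", e)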


The task is not clear. The agent asks:
 Could you please specify the timeframe you have in mind for the research report (e.g., current year, last 5 years, or another period)? Additionally, should the report focus on any specific geographic region or subfields within AI developments (e.g., machine learning, natural language processing) or cover the topic broadly?


user input:  within the last 1 year, in the US and around ehtical AI development 

Expanded query:
 Draft a research report that examines the latest trends in ethical AI development within the United States over the last year, providing an analysis of emerging practices, challenges, and regulatory considerations unique to this timeframe and region.


#### Step 2 - Web Search Terms 

Conducting a web search is typically an integral part of the deep research process. First, we generate web search terms relevant to the research report. In the next step, we search the web and build a knowledge repository from the results.

The `WebSearchTermsGenerationAgent` takes the expanded prompt as input and generates succinct search terms. You can structure the search-term generation prompt according to your users' typical requirements, for example to include adjacent industries or competitors in the search terms. You can also control how much data you gather, e.g., the number of search terms to generate. In our case, we limit it to 3 search terms.
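
Even when the prompt asks for a fixed number of terms, it can help to enforce the cap in code. The helper below is a hypothetical post-processing step, not part of `WebSearchTermsGenerationAgent`: it collapses stray whitespace, drops case-insensitive duplicates, and keeps at most `max_terms` entries.

```python
def limit_search_terms(terms: list[str], max_terms: int = 3) -> list[str]:
    """Normalize whitespace, drop case-insensitive duplicates, keep the first max_terms."""
    if max_terms < 1:
        raise ValueError("max_terms must be at least 1")

    cleaned: list[str] = []
    seen: set[str] = set()
    for term in terms:
        normalized = " ".join(term.split())  # collapse runs of whitespace
        key = normalized.casefold()          # case-insensitive comparison key
        if normalized and key not in seen:
            seen.add(key)
            cleaned.append(normalized)
    return cleaned[:max_terms]
```

For example, `limit_search_terms(result.Search_Queries, 3)` could be applied before the search loop in the next step.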

In [5]:
placeholder_query = "Draft a research report that examines the latest trends in ethical AI development within the United States over the last year, providing an analysis of emerging practices, challenges, and regulatory considerations unique to this timeframe and region."

from ai_research_assistant_resources.agents_tools_registry.web_search_terms_generation_agent import WebSearchTermsGenerationAgent

search_terms_agent = WebSearchTermsGenerationAgent(3)

result = await search_terms_agent.task(placeholder_query)

search_terms_raw = result

for i, query in enumerate(search_terms_raw.Search_Queries, start=1):
    print(f"{i}. {query}")

1. Ethical AI development trends USA 2025
2. Challenges in AI ethics and regulations in 2025
3. Emerging AI practices and legal considerations in the US 2025


#### Step 3 - Search the web and build an inventory of data sources

We will use custom web search to identify knowledge content that forms the baseline for our report. You can learn more about building custom web search and retrieval in [Building a Bring Your Own Browser (BYOB) Tool for Web Browsing and Summarization](https://cookbook.openai.com/examples/third_party/web_search_with_google_api_bring_your_own_browser_tool). You will also need a Google Custom Search API key and Custom Search Engine ID (CSE ID) in a `.env` file at the root.

NOTE: The reason for using custom web search is to provide more fine-grained control over which information is retrieved, and to apply guardrails such as excluding competitors' content from your report.
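
For orientation, this is roughly what a request to the Google Custom Search JSON API looks like. The sketch only builds the request URL and makes no network call; `build_cse_url` is a hypothetical helper, and the notebook's own `get_results_for_search_term` may assemble its request differently.

```python
from urllib.parse import urlencode

CSE_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_cse_url(query: str, api_key: str, cse_id: str, num: int = 10) -> str:
    """Build a Custom Search JSON API request URL for the given query."""
    if not 1 <= num <= 10:
        # The API returns at most 10 results per request.
        raise ValueError("num must be between 1 and 10")
    params = {"key": api_key, "cx": cse_id, "q": query, "num": num}
    return f"{CSE_ENDPOINT}?{urlencode(params)}"
```

The `key` and `cx` values correspond to the API key and CSE ID stored in the `.env` file.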

This is a 3-step process:

1. Obtain the search results (top 10 pages)
2. Scroll the pages, and summarize the key points
3. Apply output guardrails to weed out irrelevant or undesirable results (e.g., the timeframe of the content doesn't align with the user's need, or the content mentions a competitor)
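
The third step can be as simple as a deterministic filter applied before any model sees the content. The sketch below assumes each result is a dict with a `url` key and an ISO-formatted `published` date; both key names are hypothetical, since the structure returned by `get_results_for_search_term` is not shown in this notebook.

```python
from datetime import date
from urllib.parse import urlparse


def filter_results(results: list[dict], earliest: date, blocked_domains: set[str]) -> list[dict]:
    """Keep results published on or after `earliest` whose host is not a blocked domain."""
    kept = []
    for item in results:
        host = urlparse(item["url"]).netloc.lower()
        if host.startswith("www."):
            host = host[4:]
        # Reject the blocked domain itself and any of its subdomains.
        if any(host == d or host.endswith("." + d) for d in blocked_domains):
            continue
        published = item.get("published")
        # Undated content is dropped here; keeping it would be an equally valid policy.
        if published is None or date.fromisoformat(published) < earliest:
            continue
        kept.append(item)
    return kept
```

Checks that need judgment, such as whether a page merely mentions a competitor, would still need a model-based guardrail on top of this.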

Prerequisite: `pip install nest_asyncio`

In [7]:
from ai_research_assistant_resources.utils.web_search_and_util import get_results_for_search_term
import json
from dotenv import load_dotenv
import os

load_dotenv('.env')

api_key = os.getenv('API_KEY')
cse_id = os.getenv('CSE_ID')

if not api_key or not cse_id:
    raise ValueError("API_KEY and CSE_ID must be set as environment variables or in a .env file")

research_results = []

for i, query in enumerate(search_terms_raw.Search_Queries, start=1):
    print(f"{i}. {query}")
    results =  get_results_for_search_term(query)
    research_results.append(results)

# Write the collected results to a file (or print a friendly message if every search came back empty).
if any(research_results):
    with open("research_results.json", "w", encoding="utf-8") as f:
        json.dump(research_results, f, indent=2, ensure_ascii=False)
    print("Results written to research_results.json")
else:
    print("No results returned. Check your API credentials or search term.")

1. Ethical AI development trends USA 2025
2. Challenges in AI ethics and regulations in 2025
3. Emerging AI practices and legal considerations in the US 2025
Results written to research_results.json


### Step-4: 

### 5 — Guardrails & Best Practices <a id='best-practices'></a>
* **Crawl → Walk → Run**: start with a single agent, then expand into a swarm. 
* **Expose intermediate reasoning** (“show the math”) to build user trust. 
* **Parameterise UX** so analysts can tweak report format and source mix. 
* **Native OpenAI tools first** (web browsing, file ingestion) before reinventing low‑level retrieval. 

In [None]:
from ai_research_assistant_resources.utils.web_search_and_util import get_results_for_search_term
import json
from dotenv import load_dotenv
import os

load_dotenv('.env')

api_key = os.getenv('API_KEY')
cse_id = os.getenv('CSE_ID')

if not api_key or not cse_id:
    raise ValueError("API_KEY and CSE_ID must be set as environment variables or in a .env file")
else:
    # Confirm that the credentials loaded without echoing the secrets themselves.
    print("API_KEY and CSE_ID are set.")

# Call the helper directly, as in the Step 3 cell. Passing its return value to
# asyncio.run() fails with "TypeError: An asyncio.Future, a coroutine or an
# awaitable is required", because the call does not return a coroutine.
results = get_results_for_search_term("AI Trends")

# Pretty-print the JSON response (or a friendly message if no results).
if results:
    print(json.dumps(results, indent=2))
else:
    print("No results returned. Check your API credentials or search term.")


### 6 — Risks & Mitigation <a id='risks'></a>
| Pitfall | Mitigation |
|---------|------------|
| Scope‑creep & endless roadmap | Narrow MVP & SMART milestones |
| Hallucinations & weak guardrails | Golden‑set evals, RAG with citation checks |
| Run‑away infra costs | Cost curve modelling; efficient models + autoscaling |
| Talent gaps | Upskill & leverage Agents SDK to offload core reasoning |
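
As one concrete form of the citation checks mentioned in the table, a draft can be validated mechanically before it reaches a reviewer. The sketch below assumes the report uses numeric inline markers such as `[1]` that index into an ordered source list; that format and the `check_citations` helper are assumptions for illustration, not something this notebook prescribes.

```python
import re


def check_citations(report: str, sources: list[str]) -> dict[str, list[int]]:
    """Compare numeric [n] markers in the report against a 1-indexed source list."""
    cited = {int(n) for n in re.findall(r"\[(\d+)\]", report)}
    valid = set(range(1, len(sources) + 1))
    return {
        "dangling": sorted(cited - valid),  # cited in the text, but no such source
        "unused": sorted(valid - cited),    # listed as a source, but never cited
    }
```

A non-empty `dangling` list is a cheap signal that the draft cites something the research step never retrieved. It does not verify that a cited source actually supports the claim.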