# Deep Research

In [1]:
from dotenv import load_dotenv
from agents import Agent, trace, Runner, WebSearchTool, ModelSettings
from pydantic import BaseModel
from pprint import pprint
import asyncio
import os
from IPython.display import display, Markdown
import requests

In [2]:
load_dotenv(override=True)

True
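`load_dotenv(override=True)` returned `True`, meaning a `.env` file was found and its variables were loaded. The Agents SDK then reads the API key from the environment. The sketch below, which assumes the key lives in `OPENAI_API_KEY`, fails fast with a readable message when it is missing. The helper name `require_env` is hypothetical.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is unset or empty.

    Hypothetical helper: call it right after load_dotenv() so a missing key
    surfaces here instead of as an authentication error inside an agent run.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your .env file")
    return value
```

Usage in the notebook: `key = require_env("OPENAI_API_KEY")`, then print at most a short prefix so the secret never ends up in saved cell output.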

In [3]:
agent = Agent(name="jokester", instructions="You are a jokester bot that tells funny jokes.", model="gpt-4.1-mini")

In [4]:
with trace("telling a joke"):
    result = await Runner.run(agent, "tell me a joke about autonomous AI agent")
    print(result.final_output)

Why did the autonomous AI agent bring a ladder to work?

Because it heard it was supposed to be *self-*driving to new heights! 🚗🤖😄


The trace can be found here:
https://platform.openai.com/logs/trace?trace_id=trace_2e0f07a3fa4645579d4d2c5fc5fbc40c

On the work laptop, traces are listed at: https://platform.openai.com/logs?api=traces
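The joke cell (In [4]) awaits `Runner.run` at the top level, which works only because Jupyter already runs cells inside an event loop. A plain `.py` script has no running loop, so the same call has to be wrapped in a coroutine and started with `asyncio.run` (the SDK also provides `Runner.run_sync` for that case). The sketch below shows only the wrapping pattern; `fake_run` is a hypothetical stand-in for `Runner.run` so it runs without an API key.

```python
import asyncio


async def fake_run(agent_name: str, prompt: str) -> str:
    """Hypothetical stand-in for Runner.run: pretends to call a model."""
    await asyncio.sleep(0.01)  # simulate network latency
    return f"[{agent_name}] reply to: {prompt}"


async def main() -> str:
    # In a script, every await must live inside a coroutine like this one.
    return await fake_run("jokester", "tell me a joke about autonomous AI agent")


if __name__ == "__main__":
    # asyncio.run creates the event loop that Jupyter otherwise provides.
    # Inside a notebook this call raises, because a loop is already running.
    print(asyncio.run(main()))
```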

# More complicated stuff

## Search Agent

In [31]:
INSTRUCTIONS = """You are an AI research assistant for Dwelling Health named “EventOutcomeLiteratureSearcher”.
Your role: when you receive the instruction “initiate search”, you will perform a literature search for any **recent (past 1 year)** publications linking **environmental factors or exposures** to **health or biomedical outcomes**.

Examples of environmental factors or exposures:
- Air pollution, wildfire smoke, traffic emissions  
- Heavy metals (lead, mercury, arsenic)  
- Water contaminants (PFAS, microplastics)  
- Radiation exposure (radon, UV, EMF)  
- Climate events (heat waves, floods, hurricanes)  
- Built environment and urban noise  
- Indoor pollutants (mold, VOCs, dust)  

Examples of health or biomedical outcomes:
- Respiratory diseases (asthma, COPD, lung cancer)  
- Cardiovascular diseases (stroke, hypertension, heart disease)  
- Neurological or cognitive disorders (dementia, ADHD)  
- Reproductive or developmental outcomes (birth defects, fertility)  
- Metabolic or endocrine effects (obesity, diabetes)  
- Immune and inflammatory responses  
- Mental health impacts (depression, anxiety)  

Inputs when triggered:
- No additional inputs required beyond “initiate search”.

Outputs:
Provide a numbered list of relevant studies. For each study include:
1. Title  
2. Authors  
3. Year  
4. Source (PubMed, arXiv, bioRxiv, medRxiv, Other)  
5. One-line summary of the paper’s focus

If no relevant studies published in the past year are found, respond exactly:
“No relevant studies found in the past year.”

Guardrails:
Do:
- Restrict results to publications in the **past 12 months**.
- Use the structured output format (numbered list, each entry with the five fields).
- Prefer high-impact or peer-reviewed biomedical literature when available.
- Provide a confidence score (0.0–1.0) with each study or overall.

Don‚Äôt:
- Fabricate or hallucinate any papers or details.
- Output narrative text or commentary beyond the structured list.
- Include personally identifying information beyond author names as published.

"""
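The instructions define a strict output contract: either a numbered list in which every study carries Title, Authors, Year, Source and a one-line summary, or the exact fallback sentence. A model can still drift from that contract (the run below opens with a line of narrative, for instance), so a cheap post-check is useful. The sketch below is a hypothetical validator, not part of the Agents SDK; it assumes fields are written as `Label:` as in the sample output, and it only approximates "past 12 months" with a year comparison.

```python
import re

# Exact fallback sentence the instructions ask for (curly quotes are stripped before comparing).
FALLBACK = "No relevant studies found in the past year."
# The five fields the instructions require for every study.
REQUIRED_FIELDS = ("Title", "Authors", "Year", "Source", "Summary")
YEAR_PATTERN = re.compile(r"Year\s*:\W*(\d{4})", re.IGNORECASE)


def split_entries(text: str) -> list[str]:
    """Split a numbered list ("1. ...", "2. ...") into one string per entry."""
    starts = [m.start() for m in re.finditer(r"^\s*\d+\.\s", text, re.MULTILINE)]
    ends = starts[1:] + [len(text)]
    return [text[a:b] for a, b in zip(starts, ends)]


def check_search_output(text: str, current_year: int) -> list[str]:
    """Return the problems found in the agent's answer; an empty list means it looks valid.

    The year test is a coarse proxy for "past 12 months": it only flags
    entries whose Year is older than current_year - 1.
    """
    if text.strip().strip('"\u201c\u201d') == FALLBACK:
        return []
    entries = split_entries(text)  # any narrative before "1." is ignored
    if not entries:
        return ["no numbered entries found"]
    problems = []
    for number, entry in enumerate(entries, start=1):
        for field in REQUIRED_FIELDS:
            if not re.search(rf"\b{field}\s*:", entry, re.IGNORECASE):
                problems.append(f"entry {number}: missing {field}")
        year = YEAR_PATTERN.search(entry)
        if year and int(year.group(1)) < current_year - 1:
            problems.append(f"entry {number}: year {year.group(1)} is too old")
    return problems
```

Usage: `check_search_output(result.final_output, current_year=2025)` after a run; a non-empty list could trigger a retry.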

In [32]:
search_agent = Agent(
    name="search_agent",
    instructions=INSTRUCTIONS,
    tools=[WebSearchTool(search_context_size="low")],  # "low" context: cheaper, faster searches
    model="gpt-4.1-mini",
    model_settings=ModelSettings(tool_choice="required"),  # force the agent to call the web search tool
)

In [33]:
message = "initiate search"
with trace("Search"):
    result = await Runner.run(search_agent, message)

print(result.final_output)

Here are recent studies from the past year linking environmental factors to health outcomes:

1. **Title:** Burning of fossil fuels caused 1,500 deaths in recent European heat wave, study estimates
   - **Authors:** Not specified
   - **Year:** 2025
   - **Source:** AP News
   - **Summary:** A study estimated that approximately 1,500 deaths during a recent European heat wave were due to human-caused climate change, marking the first direct link between fossil fuel combustion and specific mortality rates in a single weather event. ([apnews.com](https://apnews.com/article/08421987a1ff7e0de4aac7278a41da21?utm_source=openai))

2. **Title:** Blood tests show highest levels of forever chemicals in those living near New Mexico plume
   - **Authors:** Not specified
   - **Year:** 2025
   - **Source:** AP News
   - **Summary:** A state-funded study in New Mexico revealed alarmingly high levels of PFAS (per- and polyfluoroalkyl substances) in residents' bloodstreams near a contamination plume fr

## Planner

Given a query, the planner decides which 5 web searches are most relevant to run. Structured outputs ensure the response comes back in exactly the shape we need.

In [20]:
HOW_MANY_SEARCHES = 5
INSTRUCTIONS = f"You are a helpful research assistant. Given a query, come up with a set of web searches to perform to best answer the query. Output a list of {HOW_MANY_SEARCHES} terms to query"

In [22]:
# use pydantic to define the output
class WebSearchItem(BaseModel):
    reason: str
    "Your reason for why this search is important"

    query: str
    "The search term to use for websearch"

class WebSearchPlan(BaseModel):
    searches: list[WebSearchItem]

# passing output_type makes the agent return a validated WebSearchPlan instead of free text
planner_agent = Agent(
    name="PlannerAgent",
    instructions=INSTRUCTIONS,
    model="gpt-4.1-mini",
    output_type=WebSearchPlan,
)
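`output_type=WebSearchPlan` guarantees the shape of the plan, but nothing in the schema itself forces exactly `HOW_MANY_SEARCHES` items; only the prompt asks for that. A small post-check can catch a short or repetitive plan. The sketch below is a hypothetical helper; it works on any object with a `.searches` list whose items have a `.query` string, so the demo uses `SimpleNamespace` stand-ins instead of the pydantic models.

```python
from types import SimpleNamespace


def check_plan(plan, expected: int) -> list[str]:
    """Return problems with a search plan (empty list = OK).

    Accepts any object shaped like WebSearchPlan: a `.searches` list of
    items that each have a `.query` string.
    """
    problems = []
    if len(plan.searches) != expected:
        problems.append(f"expected {expected} searches, got {len(plan.searches)}")
    queries = [item.query.strip().lower() for item in plan.searches]
    if any(not q for q in queries):
        problems.append("plan contains an empty query")
    if len(set(queries)) != len(queries):
        problems.append("plan contains duplicate queries")
    return problems


# Stand-in plan shaped like WebSearchPlan: too short, and the two queries differ only in case.
demo_plan = SimpleNamespace(searches=[
    SimpleNamespace(reason="r", query="lead exposure studies"),
    SimpleNamespace(reason="r", query="Lead exposure studies"),
])
print(check_plan(demo_plan, 5))  # ['expected 5 searches, got 2', 'plan contains duplicate queries']
```

In the notebook this would be called as `check_plan(result.final_output, HOW_MANY_SEARCHES)`.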

In [26]:
message = "What are the latest studies on lead exposure?"
with trace("Search"):
    result = await Runner.run(planner_agent, message)
    print(result.final_output)

searches=[WebSearchItem(reason='To find the most recent scientific studies and research articles on lead exposure', query='latest research studies on lead exposure 2024'), WebSearchItem(reason='To explore recent health impact findings related to lead exposure', query='recent health effects of lead exposure studies'), WebSearchItem(reason='To identify new guidelines and safety standards regarding lead exposure', query='2024 lead exposure safety standards and guidelines'), WebSearchItem(reason='To discover recent epidemiological studies on populations affected by lead exposure', query='recent epidemiological studies on lead exposure'), WebSearchItem(reason='To find reviews summarizing advancements and findings in lead exposure research', query='latest review articles on lead exposure research 2024')]


## Writer Agent

In [35]:
INSTRUCTIONS = """
You are a health communication assistant for Dwelling Health. 
Your role is to help everyday users understand new scientific findings about how environmental exposures affect human health.

When provided with links or summaries of recent research papers, read and synthesize them into short, engaging write-ups written in plain, accessible English. 
Assume the reader has limited background in biology or chemistry.

Each write-up should include:
1. A brief, easy-to-understand summary of what the study found and why it matters.
2. A one-sentence takeaway that connects the finding to daily life or personal choices (this is the "action" section).
   Example: “If you live in an older home, consider testing your tap water for lead,” or “Try to reduce the use of plastic containers for hot food.”
3. Keep the tone clear, factual, and empowering ‚Äî avoid alarmist or overly technical language.
4. Do not make unsupported medical claims or overstate certainty. If findings are early or mixed, say so clearly (e.g., “Scientists are still learning how microplastics may affect health.”).
5. Start with "as per the latest research by <Authors> in a report/paper titled <Title> published in <Year>,"
Output should be formatted as a short paragraph (3–5 sentences) per paper, suitable for an in-app pop-up or news card.
Whenever possible, include the paper title and year in parentheses at the beginning to show credibility.
"""
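Rule 5 fixes the opening words of every write-up, and the output spec asks for 3–5 sentences per paper. Both can be checked mechanically before a card is shown in the app. The helpers below (`opener_pattern`, `count_sentences`, `check_card`) are hypothetical; the opener check accepts "report", "paper" or the literal "report/paper", and the sentence count is a naive split after `.`, `!` or `?`, so abbreviations such as "e.g." will be over-counted.

```python
import re


def opener_pattern(authors: str, title: str, year: int) -> re.Pattern:
    """Regex for the rule-5 opener; tolerates optional quotes around the title."""
    return re.compile(
        rf"as per the latest research by {re.escape(authors)} in a (?:report/paper|report|paper) "
        rf'titled ["“]?{re.escape(title)}["”]? published in {year},',
        re.IGNORECASE,
    )


def count_sentences(text: str) -> int:
    """Naively count sentences by splitting after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return len([p for p in parts if p])


def check_card(card: str, authors: str, title: str, year: int) -> list[str]:
    """Return the rule violations found in one write-up (empty list = passes)."""
    problems = []
    if not opener_pattern(authors, title, year).match(card.strip()):
        problems.append("missing required opener")
    n = count_sentences(card)
    if not 3 <= n <= 5:
        problems.append(f"{n} sentences; expected 3 to 5")
    return problems
```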


In [36]:
class ReportData(BaseModel):
    short_summary: str
    """A short 2-3 sentence summary of the report/paper"""

    bulletin: str
    """Final message to the user"""

writer_agent = Agent(name="WriterAgent", instructions=INSTRUCTIONS, model="gpt-4.1-mini", output_type=ReportData)

In [37]:
# orchestrate the deep research: the planner picks the searches, the search agent runs them,
# and the writer agent turns the results into a report

async def plan_searches(query: str) -> WebSearchPlan:
    """Use the planner_agent to plan which searches to run for the query."""
    print("Planning searches...")
    result = await Runner.run(planner_agent, f"Query: {query}")
    print(f"Will perform {len(result.final_output.searches)} searches")
    return result.final_output

async def perform_searches(search_plan: WebSearchPlan) -> list[str]:
    """Call search() concurrently for each item in the search plan."""
    print("Searching...")
    tasks = [asyncio.create_task(search(item)) for item in search_plan.searches]
    results = await asyncio.gather(*tasks)
    print("Finished searching")
    return results

async def search(item: WebSearchItem) -> str:
    """Use the search agent to run a web search for one item in the search plan."""
    prompt = f"Search term: {item.query}\nReason for searching: {item.reason}"
    result = await Runner.run(search_agent, prompt)
    return result.final_output

async def write_report(papers: list[str]) -> ReportData:
    """Use the writer agent to write a report from the papers."""
    prompt = f"Papers: {papers}"
    result = await Runner.run(writer_agent, prompt)
    return result.final_output
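`perform_searches` starts every search as its own task and awaits them together with `asyncio.gather`, so the searches run concurrently and the results come back in the same order as `search_plan.searches`. The sketch below reproduces that fan-out with a hypothetical `fake_search` coroutine in place of the search agent. It also passes `return_exceptions=True`, so one failed search is returned as an exception object instead of aborting the whole batch; that option is a design choice the notebook does not use.

```python
import asyncio
import time


async def fake_search(query: str, delay: float) -> str:
    """Hypothetical stand-in for search(): waits `delay` seconds, then returns a result."""
    await asyncio.sleep(delay)
    if query == "bad query":
        raise RuntimeError("search failed")
    return f"results for {query!r}"


async def run_all(queries: list[tuple[str, float]]) -> list:
    """Run all searches concurrently; results keep the order of `queries`."""
    tasks = [asyncio.create_task(fake_search(q, d)) for q, d in queries]
    # return_exceptions=True hands back failures as exception objects instead of raising.
    return await asyncio.gather(*tasks, return_exceptions=True)
```

Because the tasks overlap, three searches of 0.2 s, 0.1 s and 0.1 s finish in roughly 0.2 s rather than 0.4 s. Inside the notebook, call `await run_all([...])` directly; in a script, use `asyncio.run(run_all([...]))`.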

In [38]:
# run the deep research
query = "What are the latest studies on lead exposure?"
search_plan = await plan_searches(query)
papers = await perform_searches(search_plan)
report = await write_report(papers)
print(report)


Planning searches...
Will perform 5 searches
Searching...
Finished searching
short_summary="As per the latest research by multiple scientists between 2024 and 2025, lead exposure continues to have serious health impacts across life stages and populations. Studies (e.g., by Aaron Reuben and Katherine Svensson) link childhood lead exposure to mental health disorders, faster memory loss, and long-term brain health issues. Economic analyses reveal billions in lost productivity in low- and middle-income countries due to lead's effects on children. Research also ties lead exposure to increased stroke risk and cardiovascular deaths. Advances in detection technology, like Purdue's portable analyzers, are improving quick and precise lead exposure testing. Regulatory updates from agencies including EPA and FDA aim to tighten safety standards for lead in paint dust, water, and baby foods, reflecting ongoing efforts to reduce lead-related health risks. However, lead exposure remains widespread, es