# Medical Research Agent

## Overview

The Medical Research Agent is an AI-powered system designed to process complex medical text and provide clear and structured insights.

It can help with understanding:

- **Medical Conditions**: diseases, disorders and related issues
- **Medicines**: drugs, treatments and their uses
- **Symptoms**: what they may indicate and when to be cautious
- **Treatments**: available options and important considerations

## How It Works

1. **Text Understanding**: The agent simplifies complex medical text into clear summaries.
2. **Information Extraction**: It identifies symptoms, causes, and treatment details from the input.
3. **Web Retrieval (RAG)**: The system retrieves supporting medical information through Tavily web search and Wikipedia.
4. **Structured Report Generation**: All information is combined into a clean and easy-to-read medical report.

## Medical Disclaimer

This tool is for educational and informational purposes only.
It should not be used as a substitute for professional medical advice, diagnosis, or treatment.
Always consult a qualified healthcare provider for medical concerns.

## Installation

In this step, we will install all the required packages:

In [12]:
%%capture --no-stderr
%pip install --quiet -U langgraph langchain_openai langchain_community langchain_core tavily-python langchain-tavily wikipedia

## Setting up the API Keys

This project needs two API keys:

1. **OpenAI API Key**: available from https://platform.openai.com/api-keys
2. **Tavily API Key**: available from https://tavily.com

In [13]:
import os
import getpass

def _set_env(var: str):
    if not os.environ.get(var):
        os.environ[var] = getpass.getpass(f"{var}: ")

_set_env("OPENAI_API_KEY")
_set_env("TAVILY_API_KEY")
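Before running the later cells, it can help to confirm that both keys are set without echoing their values. The following is a minimal sketch; `missing_keys` and `REQUIRED_KEYS` are hypothetical names introduced here for illustration and are not part of the notebook.

```python
import os

# Names of the environment variables this notebook expects
REQUIRED_KEYS = ("OPENAI_API_KEY", "TAVILY_API_KEY")

def missing_keys(names=REQUIRED_KEYS, env=None):
    """Return the names in `names` that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]

# Example with an explicit mapping instead of the real environment
print(missing_keys(env={"OPENAI_API_KEY": "sk-example"}))
```

Calling `missing_keys()` with no arguments checks the real environment and returns an empty list when both keys are present.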

## Importing Dependencies


In [14]:
from typing import List, Annotated
from typing_extensions import TypedDict
from pydantic import BaseModel, Field
from langchain_openai import ChatOpenAI
from langgraph.graph import StateGraph, START, END
from langgraph.checkpoint.memory import MemorySaver
from langchain_core.messages import AIMessage, HumanMessage, SystemMessage, get_buffer_string
from langchain_tavily import TavilySearch
from langchain_community.document_loaders import WikipediaLoader
from IPython.display import display, HTML, Image
import operator

# Initialize LLM
llm = ChatOpenAI(model="gpt-4o", temperature=0)

# Initialize Tavily Search
tavily_search = TavilySearch(max_results=3)

print("All dependencies are loaded successfully!")

All dependencies are loaded successfully!


## Define Medical Analyst Models

We will create specialized medical analysts:

In [15]:
class MedicalAnalyst(BaseModel):
    """Medical specialist analyst"""
    affiliation: str = Field(description="Medical affiliation or specialty")
    name: str = Field(description="Name of the medical analyst")
    role: str = Field(description="Medical role or specialty area")
    description: str = Field(description="Focus area, concerns, and medical expertise")

    @property
    def persona(self) -> str:
        return f"Name: {self.name}\nRole: {self.role}\nAffiliation: {self.affiliation}\nDescription: {self.description}\n"

class MedicalPerspectives(BaseModel):
    analysts: List[MedicalAnalyst] = Field(
        description="List of medical analysts with their specialties"
    )

class GenerateAnalystsState(TypedDict):
    topic: str  # Medical topic or condition
    max_analysts: int  # Number of analysts
    human_analyst_feedback: str  # Human feedback
    analysts: List[MedicalAnalyst]  # Generated analysts

print("Medical analyst models defined!")

Medical analyst models defined!


## Create Medical Analysts

Generating specialized medical analysts for different aspects of the condition:

In [16]:
analyst_instructions = """You are tasked with creating a set of medical specialist personas to research a health topic.

1. Review the medical topic: {topic}

2. Consider any feedback: {human_analyst_feedback}

3. Determine the most important medical perspectives (symptoms, treatments, prevention, prognosis, causes, etc.)

4. Create {max_analysts} medical specialists, each focusing on a different aspect.

Example specialists:
- Symptomatologist (focuses on symptoms and diagnosis)
- Treatment Specialist (focuses on treatment options)
- Prevention Expert (focuses on prevention and risk factors)
- Pharmacologist (focuses on medications)
"""

def create_medical_analysts(state: GenerateAnalystsState):
    """Create medical analyst personas"""
    topic = state['topic']
    max_analysts = state['max_analysts']
    human_analyst_feedback = state.get('human_analyst_feedback', '')

    structured_llm = llm.with_structured_output(MedicalPerspectives)
    system_message = analyst_instructions.format(
        topic=topic,
        human_analyst_feedback=human_analyst_feedback,
        max_analysts=max_analysts
    )

    analysts = structured_llm.invoke([
        SystemMessage(content=system_message),
        HumanMessage(content="Generate the medical specialist analysts.")
    ])

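    # Clear the feedback once it has been used; otherwise should_continue
    # would route back to this node indefinitely.
    return {"analysts": analysts.analysts, "human_analyst_feedback": ""}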

def should_continue(state: GenerateAnalystsState):
    """Check if we should continue or end"""
    if state.get('human_analyst_feedback', None):
        return "create_analysts"
    return END

# Build analyst generation graph
builder = StateGraph(GenerateAnalystsState)
builder.add_node("create_analysts", create_medical_analysts)
builder.add_edge(START, "create_analysts")
builder.add_conditional_edges("create_analysts", should_continue, ["create_analysts", END])

memory = MemorySaver()
analyst_graph = builder.compile(checkpointer=memory)

print("Medical analyst generation system is ready!")

Medical analyst generation system is ready!


## Medical Interview System

Setting up the interview system where the analysts can collect information from medical experts:

In [17]:
from langgraph.graph import MessagesState

class InterviewState(MessagesState):
    max_num_turns: int  # Number of interview turns
    context: Annotated[list, operator.add]  # Web search results
    analyst: MedicalAnalyst  # The analyst
    interview: str  # Interview transcript
    sections: list  # Final sections

class SearchQuery(BaseModel):
    search_query: str = Field(description="Medical search query for web research")

print("Interview state models are defined!")

Interview state models are defined!
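The `Annotated[list, operator.add]` annotation on `context` asks LangGraph to combine each node's update with the existing value instead of overwriting it. This matters when more than one search node writes to `context` in the same step. The sketch below imitates that merge in plain Python; `apply_update` is a hypothetical stand-in for what the framework does internally, not a LangGraph function.

```python
import operator

def apply_update(current, update, reducer=None):
    """Merge a node's update into the current value, as a reducer would."""
    if reducer is None:
        return update  # no reducer: the new value replaces the old one
    return reducer(current, update)

context = []
# Two search nodes each return a one-element list for "context"
context = apply_update(context, ["<web results>"], operator.add)
context = apply_update(context, ["<wikipedia results>"], operator.add)
print(context)  # both entries are kept

# Without a reducer, the second update would replace the first
print(apply_update(["<web results>"], ["<wikipedia results>"]))
```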


## Generation of Questions

Each analyst generates targeted questions to ask the medical expert:

In [18]:
question_instructions = """You are a medical analyst interviewing an expert about a health topic.

Your goal is to gather specific, evidence-based medical insights.

1. Focus on: {goals}

2. Ask specific questions about:
   - Clinical evidence and research
   - Treatment efficacy and safety
   - Patient outcomes
   - Current medical guidelines

3. Avoid vague questions - be specific and clinical

Begin by introducing yourself, then ask your medical question.

When satisfied, end with: "Thank you so much for your help!"
"""

def generate_question(state: InterviewState):
    """Generate analyst question"""
    analyst = state["analyst"]
    messages = state["messages"]

    system_message = question_instructions.format(goals=analyst.persona)
    question = llm.invoke([SystemMessage(content=system_message)] + messages)

    return {"messages": [question]}

print("The Question generation process is ready!")

The Question generation process is ready!


## Web Search Functions

Next, we search the web and Wikipedia for medical information:

In [19]:

search_instructions = SystemMessage(content="""Given a medical conversation, generate a search query to find evidence-based medical information.

Focus on:
- Peer-reviewed medical sources
- Clinical guidelines
- Medical research
- Reputable health organizations

Create a precise medical search query based on the conversation.""")

def search_web(state: InterviewState):
    """Search web for medical information"""
    structured_llm = llm.with_structured_output(SearchQuery)
    search_query = structured_llm.invoke([search_instructions] + state['messages'])

    # Perform web search
    try:
        search_results = tavily_search.invoke(search_query.search_query)

        # Handle different response formats
        if isinstance(search_results, list):
            search_docs = search_results
        elif isinstance(search_results, dict):
            search_docs = search_results.get("results", [])
        else:
            search_docs = []

        # Format results
        if search_docs:
            formatted_search_docs = "\n\n---\n\n".join([
                f'<Document href="{doc.get("url", "N/A")}"/>\n{doc.get("content", doc.get("snippet", ""))}\n</Document>'
                for doc in search_docs
                if isinstance(doc, dict)
            ])
        else:
            formatted_search_docs = "No search results found."

    except Exception as e:
        print(f"Search error: {e}")
        formatted_search_docs = f"Search error occurred: {str(e)}"

    return {"context": [formatted_search_docs]}

def search_wikipedia(state: InterviewState):
    """Search Wikipedia for medical information"""
    structured_llm = llm.with_structured_output(SearchQuery)
    search_query = structured_llm.invoke([search_instructions] + state['messages'])

    try:
        # Search Wikipedia
        search_docs = WikipediaLoader(query=search_query.search_query, load_max_docs=2).load()

        # Format results
        if search_docs:
            formatted_search_docs = "\n\n---\n\n".join([
                f'<Document source="{doc.metadata.get("source", "Wikipedia")}" page="{doc.metadata.get("page", "")}"/>\n{doc.page_content}\n</Document>'
                for doc in search_docs
            ])
        else:
            formatted_search_docs = "No Wikipedia results found."

    except Exception as e:
        print(f"Wikipedia search error: {e}")
        formatted_search_docs = f"Wikipedia search error: {str(e)}"

    return {"context": [formatted_search_docs]}

print(" Fixed search functions loaded!")
print(" Now I will need to rebuild the interview graph - so running the 'Build Interview Graph' cell again!")

 Fixed search functions loaded!
 Now I will need to rebuild the interview graph - so running the 'Build Interview Graph' cell again!
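The result handling in `search_web` can be tried without any API call. The sketch below separates it into two small functions and runs them on made-up sample data. `normalize_results` and `format_docs` are hypothetical names, the sample URL and text are invented for illustration, and the sketch writes the opening `<Document ...>` tag without the trailing slash so that each block has a matching close.

```python
def normalize_results(raw):
    """Return a list of result dicts from either a list or a dict response."""
    if isinstance(raw, dict):
        raw = raw.get("results", [])
    if not isinstance(raw, list):
        return []
    return [doc for doc in raw if isinstance(doc, dict)]

def format_docs(docs):
    """Join results into one context string, one block per source."""
    if not docs:
        return "No search results found."
    return "\n\n---\n\n".join(
        f'<Document href="{doc.get("url", "N/A")}">\n'
        f'{doc.get("content", doc.get("snippet", ""))}\n</Document>'
        for doc in docs
    )

# Made-up sample data, for illustration only
sample = {"results": [{"url": "https://example.org/a", "content": "Text A"}]}
formatted = format_docs(normalize_results(sample))
print(formatted)
```

Keeping normalization apart from formatting makes it easy to check each response shape on its own.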


## Generating Medical Expert Answers

The medical expert uses web-searched information to answer questions:

In [20]:
answer_instructions = """You are a medical expert being interviewed.

Analyst focus: {goals}

Use ONLY the provided medical sources to answer: {context}

Guidelines:
1. Base answers on evidence from the provided sources
2. Cite sources using [1], [2], etc.
3. Be specific and clinical
4. Note limitations if sources are insufficient
5. List sources at the end

Provide accurate, evidence-based medical information."""

def generate_answer(state: InterviewState):
    """Generate expert answer"""
    analyst = state["analyst"]
    messages = state["messages"]
    context = state["context"]

    system_message = answer_instructions.format(goals=analyst.persona, context=context)
    answer = llm.invoke([SystemMessage(content=system_message)] + messages)
    answer.name = "medical_expert"

    return {"messages": [answer]}

def save_interview(state: InterviewState):
    """Save interview transcript"""
    messages = state["messages"]
    interview = get_buffer_string(messages)
    return {"interview": interview}

def route_messages(state: InterviewState, name: str = "medical_expert"):
    """Route between questions and answers"""
    messages = state["messages"]
    max_num_turns = state.get('max_num_turns', 2)

    # Count expert responses
    num_responses = len([m for m in messages if isinstance(m, AIMessage) and m.name == name])

    if num_responses >= max_num_turns:
        return 'save_interview'

    # Check for end signal
    if len(messages) >= 2:
        last_question = messages[-2]
        if "Thank you so much for your help" in last_question.content:
            return 'save_interview'

    return "ask_question"

print("Answer generation system ready!")

Answer generation system ready!
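The routing rule can be checked in isolation. The sketch below mirrors the logic of `route_messages` with a hypothetical `Msg` dataclass standing in for LangChain message objects, so it omits the `isinstance(m, AIMessage)` check. The sample questions and answers are invented.

```python
from dataclasses import dataclass

@dataclass
class Msg:
    """Hypothetical stand-in for a chat message."""
    content: str
    name: str = ""

def route(messages, max_num_turns=2, expert="medical_expert"):
    """Stop on the turn limit or the closing phrase, otherwise keep asking."""
    answered = sum(1 for m in messages if m.name == expert)
    if answered >= max_num_turns:
        return "save_interview"
    # The last message is the expert's answer, so the question is second to last
    if len(messages) >= 2 and "Thank you so much for your help" in messages[-2].content:
        return "save_interview"
    return "ask_question"

one_round = [Msg("What are early signs?"), Msg("Increased thirst [1].", "medical_expert")]
two_rounds = one_round + [Msg("And later?"), Msg("Neuropathy [2].", "medical_expert")]
print(route(one_round))   # one answer so far, limit is 2
print(route(two_rounds))  # limit reached
```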


## Building the Interview Graph

In [21]:
from langgraph.graph import StateGraph

# Building the interview graph
interview_builder = StateGraph(InterviewState)
interview_builder.add_node("ask_question", generate_question)
interview_builder.add_node("search_web", search_web)
interview_builder.add_node("search_wikipedia", search_wikipedia)
interview_builder.add_node("answer_question", generate_answer)
interview_builder.add_node("save_interview", save_interview)

# Defining the flow
interview_builder.add_edge(START, "ask_question")
interview_builder.add_edge("ask_question", "search_web")
interview_builder.add_edge("ask_question", "search_wikipedia")
interview_builder.add_edge("search_web", "answer_question")
interview_builder.add_edge("search_wikipedia", "answer_question")
interview_builder.add_conditional_edges("answer_question", route_messages, ["ask_question", "save_interview"])
interview_builder.add_edge("save_interview", END)

interview_graph = interview_builder.compile()

print("Interview graph is compiled!")

Interview graph is compiled!


## Report Generation

Generating the final medical report from all the interviews:

In [22]:
section_writer_instructions = """You are a medical writer creating a section of a medical report.

Topic: {topic}

Your focus: {focus}

Interview transcript: {interview}

Create a well-structured section with:
1. Clear medical terminology
2. Evidence-based information
3. Proper citations [1], [2], etc.
4. Clinical relevance

Write in a professional medical style."""

def write_section(interview: str, analyst: MedicalAnalyst, topic: str):
    """Write a section based on interview"""
    system_message = section_writer_instructions.format(
        topic=topic,
        focus=analyst.description,
        interview=interview
    )
    section = llm.invoke([SystemMessage(content=system_message)])
    return section.content

report_writer_instructions = """You are a medical report compiler creating a comprehensive medical document.

Topic: {topic}

Sections from specialists:
{sections}

Create a comprehensive medical report with:

# {topic}: Complete Medical Overview

## Executive Summary
[Brief overview]

## Detailed Analysis
[Integrate all sections logically]

## Key Takeaways
[Important points]

## Sources
[All citations]

**MEDICAL DISCLAIMER**: This information is for educational purposes only...
"""

def compile_report(topic: str, sections: List[str]):
    """Compile final medical report"""
    formatted_sections = "\n\n".join(sections)
    system_message = report_writer_instructions.format(
        topic=topic,
        sections=formatted_sections
    )
    report = llm.invoke([SystemMessage(content=system_message)])
    return report.content

print("The Report generation system is ready!")

The Report generation system is ready!


## Running the Medical Research

Now we research a medical topic.
Change `medical_topic` below to study a different condition:

In [23]:
medical_topic = "Type 2 Diabetes Mellitus"  # We can change this to any medical condition
num_analysts = 3  # Number of specialist analysts
num_interview_turns = 2  # Questions per analyst

print(f" Researching: {medical_topic}")
print(f" Analysts: {num_analysts}")
print(f" Interview turns: {num_interview_turns}\n")

 Researching: Type 2 Diabetes Mellitus
 Analysts: 3
 Interview turns: 2
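These two settings largely determine how many chat-model calls a run makes. A rough upper bound can be read off the node functions defined earlier: each interview turn calls the model once for the question, once for each of the two search queries, and once for the answer, and analyst creation adds one call. The sketch below assumes a single pass of the interview graph per analyst with no retries, and it leaves out section writing and report compilation. `max_llm_calls` is a hypothetical helper.

```python
def max_llm_calls(num_analysts, num_turns):
    """Upper bound on chat-model calls for analyst creation plus interviews."""
    calls_per_turn = 4    # question + web query + Wikipedia query + answer
    analyst_creation = 1  # one structured-output call creates all analysts
    return analyst_creation + num_analysts * num_turns * calls_per_turn

print(max_llm_calls(num_analysts=3, num_turns=2))
```

An interview that ends early on the closing phrase uses fewer calls than this bound.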



### Step 1: Generating the Medical Analysts


In [24]:
# Generate analysts
thread = {"configurable": {"thread_id": "1"}}

for event in analyst_graph.stream(
    {"topic": medical_topic, "max_analysts": num_analysts},
    thread,
    stream_mode="values"
):
    analysts = event.get('analysts', '')
    if analysts:
        print("\n" + "="*80)
        print(" MEDICAL SPECIALIST ANALYSTS GENERATED")
        print("="*80 + "\n")
        for analyst in analysts:
            print(f" {analyst.name}")
            print(f"   Role: {analyst.role}")
            print(f"   Affiliation: {analyst.affiliation}")
            print(f"   Focus: {analyst.description}")
            print("-" * 80)


 MEDICAL SPECIALIST ANALYSTS GENERATED

 Dr. Sarah Thompson
   Role: Symptomatologist
   Affiliation: Endocrinology Department
   Focus: Dr. Thompson specializes in the identification and analysis of symptoms associated with Type 2 Diabetes Mellitus. Her focus is on understanding the early signs of the disease, such as increased thirst, frequent urination, and unexplained weight loss, to aid in timely diagnosis. She is also concerned with the progression of symptoms and their impact on patients' quality of life.
--------------------------------------------------------------------------------
 Dr. Michael Lee
   Role: Treatment Specialist
   Affiliation: Diabetes Treatment Center
   Focus: Dr. Lee is an expert in the management and treatment of Type 2 Diabetes Mellitus. He focuses on developing personalized treatment plans that include lifestyle modifications, oral medications, and insulin therapy. His expertise extends to the latest advancements in diabetes management technologies and

### Step 2: Conducting Interviews with Web Research

In [25]:
import wikipedia
# Getting the analysts from state
final_state = analyst_graph.get_state(thread)
analysts = final_state.values.get('analysts')

# Conducting the interviews
interviews = []

for analyst in analysts:
    print(f"\n Starting the interview with {analyst.name}...")
    print(f" Focus: {analyst.description}\n")

    # Running the interview
    interview_state = {
        "analyst": analyst,
        "messages": [],
        "max_num_turns": num_interview_turns,
        "context": []
    }

    # Stream the graph once: log each node and keep the saved transcript.
    # (Calling invoke() afterwards would run the whole interview a second time.)
    transcript = ''
    for event in interview_graph.stream(interview_state, stream_mode="updates"):
        for node_name, update in event.items():
            print(f" {node_name}...")
            if node_name == "save_interview":
                transcript = update.get('interview', '')

    interviews.append(transcript)
    print(f" Interview with {analyst.name} complete!\n")

print("\n" + "="*80)
print("ALL INTERVIEWS ARE COMPLETED")
print("="*80)


 Starting the interview with Dr. Sarah Thompson...
 Focus: Dr. Thompson specializes in the identification and analysis of symptoms associated with Type 2 Diabetes Mellitus. Her focus is on understanding the early signs of the disease, such as increased thirst, frequent urination, and unexplained weight loss, to aid in timely diagnosis. She is also concerned with the progression of symptoms and their impact on patients' quality of life.

 ask_question...
 search_wikipedia...
 search_web...
 answer_question...
 save_interview...
 Interview with Dr. Sarah Thompson complete!


 Starting the interview with Dr. Michael Lee...
 Focus: Dr. Lee is an expert in the management and treatment of Type 2 Diabetes Mellitus. He focuses on developing personalized treatment plans that include lifestyle modifications, oral medications, and insulin therapy. His expertise extends to the latest advancements in diabetes management technologies and their integration into patient care.

 ask_question...
 searc