## ❓ Testing Questions

### 🧠 `research_paper` — Mentions academic/research sources  
These should all be classified as **`research_paper`**:

- "Generate a report using academic studies on climate change."
- "Summarize the latest findings from research papers on AI ethics."
- "Prove your answer with scholarly articles."
- "I want evidence from research papers on the effectiveness of remote learning."
- "Provide insights from peer-reviewed studies about mental health."
- "Explain using research papers how sleep affects productivity."

---

### 📄 `report_generation` — General requests without needing academic sources  
These should be classified as **`report_generation`**:

- "Write a report on the history of the internet."
- "Summarize the impact of social media on teenagers."
- "Generate a report about the benefits of exercise."
- "Create a short report on global warming for school."
- "Write a brief summary of renewable energy technologies."

> Even though these sound formal, none ask for academic or research-backed material — so they should fall under `report_generation`.

---

### 🌐 `google_search` — Basic, factual, or current-event questions  
These should be classified as **`google_search`**:

- "Who is the current Prime Minister of Canada?"
- "What is the population of Tokyo?"
- "Best smartphones under $500 in 2025."
- "When was the iPhone 16 released?"
- "Where is the Eiffel Tower located?"
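The example queries above can double as a small regression set. The sketch below shows one way to score any classifier callable against them. `keyword_classifier` is a hypothetical stand-in used only so the snippet runs without an LLM, and `evaluate` is an illustrative helper that is not part of the notebook.

```python
from typing import Callable, Dict, List, Tuple

# A few labelled queries copied from the lists above.
TEST_CASES: Dict[str, List[str]] = {
    "research_paper": [
        "Prove your answer with scholarly articles.",
        "Provide insights from peer-reviewed studies about mental health.",
    ],
    "report_generation": [
        "Write a report on the history of the internet.",
        "Generate a report about the benefits of exercise.",
    ],
    "google_search": [
        "What is the population of Tokyo?",
        "Where is the Eiffel Tower located?",
    ],
}


def keyword_classifier(query: str) -> str:
    """Hypothetical stand-in for QueryClassifierAgent.classify (no LLM needed)."""
    q = query.lower()
    if any(k in q for k in ("research paper", "scholarly", "peer-reviewed", "academic")):
        return "research_paper"
    if any(k in q for k in ("report", "summar")):
        return "report_generation"
    return "google_search"


def evaluate(classify: Callable[[str], str]) -> Tuple[float, List[Tuple[str, str, str]]]:
    """Return (accuracy, mismatches); each mismatch is (query, expected, predicted)."""
    mismatches = []
    total = 0
    for expected, queries in TEST_CASES.items():
        for query in queries:
            total += 1
            predicted = classify(query)
            if predicted != expected:
                mismatches.append((query, expected, predicted))
    return (total - len(mismatches)) / total, mismatches


accuracy, mismatches = evaluate(keyword_classifier)
print(f"accuracy={accuracy:.2f}, mismatches={mismatches}")
```

Passing `QueryClassifierAgent().classify` instead of the stand-in would score the real agent against the same cases.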

In [None]:
import os
import sys

sys.path.append(os.path.abspath(".."))

In [None]:
import requests
import xml.etree.ElementTree as ET
from scholarly import scholarly


In [None]:
from dotenv import load_dotenv
from typing import Literal
from pydantic import BaseModel, Field
from autogen import AssistantAgent
from utils.config_loader import load_config # yaml loader
from langchain_core.tools import tool
from langchain_groq import ChatGroq
from langchain_core.prompts import ChatPromptTemplate
from langchain_ollama import OllamaLLM

In [7]:
load_dotenv()

True

### Model Loaders

In [8]:
class ConfigLoader:
    def __init__(self):
        print("Loading config...")
        self.config = load_config()

    def __getitem__(self, key):
        return self.config[key]

In [9]:
class ModelLoader(BaseModel):  # BaseModel provides data validation and settings management
    # model_key must be one of these exact strings: "groq", "ollama-deepseek", "ollama-llama3", or "ollama-mistral".
    # The Literal type acts as a validation check: creating a ModelLoader with any other model_key makes Pydantic raise an error.
    model_key: Literal[
        "groq", 
        "ollama-deepseek", 
        "ollama-llama3", 
        "ollama-mistral"
    ] = "ollama-llama3" # default is ollama-llama3

    config: ConfigLoader = Field(default_factory=ConfigLoader, exclude=True)

    class Config:
        arbitrary_types_allowed = True

    def load_llm(self):
        print("LLM loading...")
        print(f"Loading model with config key: {self.model_key}")

        # Read provider and model_name dynamically from config
        provider = self.config["llm"][self.model_key]["provider"]
        model_name = self.config["llm"][self.model_key]["model_name"]

        if provider == "groq":
            groq_api_key = os.getenv("GROQ_API_KEY")
            print(f"Using Groq model: {model_name}")
            return ChatGroq(model=model_name, api_key=groq_api_key)


        elif provider == "ollama":
            print(f"Using Ollama model: {model_name}")
            return OllamaLLM(model=model_name)

        else:
            raise ValueError(f"Unsupported provider: {provider}")
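`load_llm` reads `provider` and `model_name` from `config["llm"][model_key]`, so the YAML file presumably has one entry per key. The sketch below spells out that assumed shape as a plain dictionary and isolates the lookup. `EXAMPLE_CONFIG`, `resolve_model`, and the Groq model name are illustrative placeholders, not values taken from the project's config.

```python
from typing import Any, Dict, Tuple

# Assumed shape of the YAML config once loaded; model names are placeholders.
EXAMPLE_CONFIG: Dict[str, Any] = {
    "llm": {
        "groq": {"provider": "groq", "model_name": "example-groq-model"},
        "ollama-llama3": {"provider": "ollama", "model_name": "llama3.2:latest"},
    }
}

SUPPORTED_PROVIDERS = {"groq", "ollama"}


def resolve_model(config: Dict[str, Any], model_key: str) -> Tuple[str, str]:
    """Look up (provider, model_name) for a key, with explicit error messages."""
    try:
        entry = config["llm"][model_key]
    except KeyError as exc:
        raise KeyError(f"No llm config entry for key: {model_key}") from exc
    provider = entry["provider"]
    if provider not in SUPPORTED_PROVIDERS:
        raise ValueError(f"Unsupported provider: {provider}")
    return provider, entry["model_name"]


print(resolve_model(EXAMPLE_CONFIG, "ollama-llama3"))
```

Separating the lookup from the client construction keeps the config check testable without an API key or a running Ollama server.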

### Tools

### 1. QueryClassifierAgent

In [46]:
from langchain_ollama.llms import OllamaLLM
from prompt_library.prompt import query_classifier_prompt
from langchain_core.tools import Tool

class QueryClassifierAgent:
    def __init__(self, model_name: str = "llama3.2:latest"):
        self.llm = OllamaLLM(model=model_name)
        self.chain = query_classifier_prompt | self.llm

    def classify(self, query: str) -> str:
        response = self.chain.invoke({"query": query})
        return response.strip()
    

def classify_query_tool(query: str) -> str:
    agent = QueryClassifierAgent()
    return agent.classify(query)

In [47]:
query_classifier_tool = Tool(
    name="query_classifier",
    description="Classifies user queries into categories like google_search, research_paper, report_generation, or unknown.",
    func=classify_query_tool,
)

In [40]:
agent = QueryClassifierAgent()

# query_google_search_1 = "What is the capital of Argentina?"
# category_1 = agent.classify(query_google_search_1)
# print("Query:", query_google_search_1)
# print("Predicted Category:", category_1)

# query_google_search_2 = "How do I reset my iPhone?"
# category_2 = agent.classify(query_google_search_2)
# print("Query:", query_google_search_2)
# print("Predicted Category:", category_2)

# query_research_paper_1 = "Can you list recent research papers on reinforcement learning?"
# category_3 = agent.classify(query_research_paper_1)
# print("Query:", query_research_paper_1)
# print("Predicted Category:", category_3)

# query_research_paper_2 = "What are the latest studies on climate change and food security?"
# category_4 = agent.classify(query_research_paper_2)
# print("Query:", query_research_paper_2)
# print("Predicted Category:", category_4)

# query_report_1 = "Generate a report on the growth of the electric vehicle market in India."
# category_5 = agent.classify(query_report_1)
# print("Query:", query_report_1)
# print("Predicted Category:", category_5)

# query_report_2 = "Summarize a report about cyber threats in healthcare."
# category_6 = agent.classify(query_report_2)
# print("Query:", query_report_2)
# print("Predicted Category:", category_6)

# query_unknown_1 = "Tell me a joke about quantum mechanics."
# category_7 = agent.classify(query_unknown_1)
# print("Query:", query_unknown_1)
# print("Predicted Category:", category_7)

query_unknown_2 = "What's your favorite programming language?"
category_8 = agent.classify(query_unknown_2)
print("Query:", query_unknown_2)
print("Predicted Category:", category_8)

Query: What's your favorite programming language?
Predicted Category: google_search
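`classify()` returns the stripped model reply as-is, so any routing that depends on the label relies on the model emitting exactly one of the four category names. The sketch below is one possible guard: it maps a raw reply onto an allowed label and falls back to `unknown` when the reply is ambiguous. `normalize_category` is a hypothetical helper. It does not correct a wrong but well-formed label such as the `google_search` prediction above; that would need prompt changes.

```python
import re

ALLOWED_CATEGORIES = ("google_search", "research_paper", "report_generation", "unknown")


def normalize_category(raw: str) -> str:
    """Map a raw LLM reply onto one allowed label, defaulting to 'unknown'."""
    text = raw.strip().lower()
    if text in ALLOWED_CATEGORIES:
        return text
    # Accept replies such as "Category: `research_paper`." that wrap a single label.
    found = {label for label in ALLOWED_CATEGORIES if re.search(rf"\b{label}\b", text)}
    # Zero matches or several competing matches are both treated as unknown.
    return found.pop() if len(found) == 1 else "unknown"


print(normalize_category("Category: `report_generation`."))
```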


### Architecture

USER
 │
 ▼
┌────────────────────┐
│ QueryClassifierAgent│
└────────────────────┘
        │
        ├───────────────┬────────────────────┬────────────────────────────┐
        │               │                    │                            │
        ▼               ▼                    ▼                            ▼
"google_search"   "report_generation"   "research_paper"             "unknown"
        │               │                    │                            │
        │               ▼                    ▼                            ▼
        │    ┌────────────────────┐   ┌────────────────────┐     ┌────────────────────┐
        │    │QuestionDecomposerAgent│   │ResearchClassifierAgent│     │ UnknownHandlerAgent │
        │    └────────────────────┘   └────────────────────┘     └────────────────────┘
        │               │                    │
        │               ▼                    ├────────────┬───────────────┬─────────────┐
        │    ┌────────────────────┐   ▼            ▼               ▼             ▼
        │    │ GoogleSearchAgent  │ "preprint"  "biomedical"  "multidisciplinary"
        │    └────────────────────┘    │            │               │
        │               │              ▼            ▼               ▼
        │               │    ┌────────────────┐ ┌────────────────┐ ┌────────────────────────┐
        │               │    │ PreprintAgent  │ │ BiomedicalAgent│ │ MultidisciplinaryAgent │
        │               │    └────────────────┘ └────────────────┘ └────────────────────────┘
        │               │         │              │                    │
        │               │         └──────┬───────┴────────┬───────────┘
        │               │                ▼                ▼
        │               │      ┌────────────────────────────┐
        │               │      │   ReportGeneratorAgent     │
        │               │      └────────────────────────────┘
        │               │                │
        └───────────────┴────────────────┘
                         ▼
                     Final Response
                         │
                         ▼
                       USER



---

## Detailed Flow

1. **User Query** arrives and is passed to `QueryClassifierAgent`.

2. **QueryClassifierAgent** classifies intent into:
   - `"google_search"`: Routes directly to `GoogleSearchAgent`.
   - `"report_generation"`: Routes to `QuestionDecomposerAgent`.
   - `"research_paper"`: Routes to `ResearchClassifierAgent`.
   - `"unknown"`: Routes to `UnknownHandlerAgent`.

3. If **google_search**:
   - `GoogleSearchAgent` executes search and returns results to the user.

4. If **report_generation**:
   - `QuestionDecomposerAgent` breaks down query into sub-questions.
   - Sub-questions passed to `GoogleSearchAgent` for info retrieval.
   - Results forwarded to `ReportGeneratorAgent`.
   - Report synthesized and returned.

5. If **research_paper**:
   - `ResearchClassifierAgent` classifies domain into:
     - `preprint` → `PreprintAgent`
     - `biomedical` → `BiomedicalAgent`
     - `multidisciplinary` → `MultidisciplinaryAgent`
   - Selected agent fetches papers.
   - Papers passed to `ReportGeneratorAgent`.
   - Final report returned.

6. If **unknown**:
   - `UnknownHandlerAgent` sends default message to user.
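
The flow above can be sketched as a plain dispatch table. Every function here is a stub standing in for the corresponding agent, and the names and return strings are illustrative only. The notebook's own orchestration uses LangGraph, which this sketch deliberately leaves out so that it runs with the standard library alone.

```python
from typing import Callable, Dict, List


# Stub stages standing in for the real agents; each just tags its input.
def google_search(query: str) -> str:
    return f"search({query})"


def decompose(query: str) -> List[str]:
    return [f"{query} [part 1]", f"{query} [part 2]"]


def classify_domain(query: str) -> str:
    return "preprint"  # stub: the real agent would pick one of three domains


def fetch_papers(domain: str, query: str) -> List[str]:
    return [f"{domain}-paper({query})"]


def generate_report(sources: List[str]) -> str:
    return "report[" + "; ".join(sources) + "]"


def handle_report(query: str) -> str:
    # report_generation: decompose, search each sub-question, then synthesize.
    return generate_report([google_search(q) for q in decompose(query)])


def handle_research(query: str) -> str:
    # research_paper: pick a domain, fetch papers, then synthesize.
    return generate_report(fetch_papers(classify_domain(query), query))


def handle_unknown(query: str) -> str:
    return "Sorry, I can't help with that request."


ROUTES: Dict[str, Callable[[str], str]] = {
    "google_search": google_search,
    "report_generation": handle_report,
    "research_paper": handle_research,
    "unknown": handle_unknown,
}


def route(category: str, query: str) -> str:
    """Dispatch on the classifier label; unrecognised labels go to the unknown handler."""
    return ROUTES.get(category, handle_unknown)(query)


print(route("report_generation", "solar power"))
```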

---

## Notes

- Agents use LLMs with specialized prompts.
- The graph is orchestrated with LangGraph `StateGraph`.
- Modular design allows adding/removing agents easily.
- Supports multi-source paper fetching, topic expansion, and report generation.

---

**End of Architecture Documentation**


USER
 │
 ▼
┌────────────────────┐
│ QueryClassifierAgent│
└────────────────────┘
        │
        ├───────────────┬────────────────────┬────────────────────────────┐
        │               │                    │                            │
        ▼               ▼                    ▼                            ▼
"google_search"   "report_generation"   "research_paper"             "unknown"
        │               │                    │                            │
        │               ▼                    │                            ▼
        │    ┌────────────────────┐         │                    ┌────────────────────┐
        │    │QuestionDecomposerAgent│       │                    │ UnknownHandlerAgent │
        │    └────────────────────┘         │                    └────────────────────┘
        │               │                    │
        │               ▼                    ▼
        │    ┌────────────────────┐   ┌────────────────────┐
        │    │ GoogleSearchAgent  │   │ResearchClassifierAgent│
        │    └────────────────────┘   └────────────────────┘
        │               │                    │
        │               │        ┌────────────┬───────────────┬─────────────┐
        │               │        │            │               │             │
        │               │        ▼            ▼               ▼             ▼
        │               │   "preprint"  "biomedical"  "multidisciplinary"
        │               │        │            │               │
        │               │        ▼            ▼               ▼
        │               │  ┌─────────────┐ ┌───────────────┐ ┌────────────────────────┐
        │               │  │ PreprintAgent│ │ BiomedicalAgent│ │ MultidisciplinaryAgent │
        │               │  └─────────────┘ └───────────────┘ └────────────────────────┘
        │               │        │            │               │
        │               │        └──────┬─────┴────────┬──────┘
        │               │               ▼             ▼
        │               │      ┌────────────────────────────┐
        │               │      │   ReportGeneratorAgent     │
        │               │      └────────────────────────────┘
        │               │               │
        └───────────────┴───────────────┘
                        ▼
                 Final Response
                        │
                        ▼
                      USER


In [None]:
class ResearchAgents:
    def __init__(self, model_loader: ModelLoader):
        self.llm = model_loader.load_llm()  # Get the actual LLM instance (ChatGroq or OllamaLLM)

        # Summarizer Agent
        self.summarizer_agent = AssistantAgent(
            name="summarizer_agent",
            system_message=(
                "Summarize the retrieved research papers and present concise summaries to the user. "
                "JUST GIVE THE RELEVANT SUMMARIES OF THE RESEARCH PAPER AND NOT YOUR THOUGHT PROCESS."
            ),
            llm=self.llm,  # NOTE: autogen's AssistantAgent normally takes `llm_config` (a dict); confirm the installed version accepts `llm=`
            human_input_mode="NEVER",
            code_execution_config=False
        )

        # Advantages and Disadvantages Agent
        self.advantages_disadvantages_agent = AssistantAgent(
            name="advantages_disadvantages_agent",
            system_message=(
                "Analyze the summaries of the research papers and provide a list of advantages and disadvantages "
                "for each paper in a pointwise format. JUST GIVE THE ADVANTAGES AND DISADVANTAGES, NOT YOUR THOUGHT PROCESS."
            ),
            llm=self.llm,
            human_input_mode="NEVER",
            code_execution_config=False
        )

        # Search Agent (optional)
        self.search_agent = AssistantAgent(
            name="search_agent",
            system_message="Suggest 3 related research topics for a given query.",
            llm=self.llm,
            human_input_mode="NEVER",
            code_execution_config=False
        )

    def summarize_paper(self, paper_summary):
        """Generates a summary of the research paper."""
        summary_response = self.summarizer_agent.generate_reply(
            messages=[{"role": "user", "content": f"Summarize this paper: {paper_summary}"}]
        )
        return summary_response.get("content", "Summarization failed!") if isinstance(summary_response, dict) else str(summary_response)

    def analyze_advantages_disadvantages(self, paper_summary):
        """Analyzes advantages and disadvantages of the research paper."""
        adv_dis_response = self.advantages_disadvantages_agent.generate_reply(
            messages=[{"role": "user", "content": f"Analyze advantages and disadvantages of this paper: {paper_summary}"}]
        )
        return adv_dis_response.get("content", "Analysis failed!") if isinstance(adv_dis_response, dict) else str(adv_dis_response)


### 1. Multidisciplinary Research Platforms
These cover a broad range of subjects across many academic fields.

- Scholarly (Google Scholar scraping)
- ScienceDirect (Elsevier)
- Semantic Scholar API

Why? They span a wide spectrum of disciplines, from engineering and social sciences to biomedical and computer science.
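
Because these platforms overlap in coverage, the same paper can come back from more than one of them. The sketch below shows one way to combine several result lists while dropping repeated titles. `merge_papers` is a hypothetical helper, and matching on a normalized title is an assumption: it is a simple heuristic, not a reliable identity check such as a DOI comparison.

```python
from typing import Dict, Iterable, List


def _title_key(title: str) -> str:
    """Lowercase and collapse whitespace so trivially different titles compare equal."""
    return " ".join(title.lower().split())


def merge_papers(*sources: Iterable[Dict[str, str]], limit: int = 5) -> List[Dict[str, str]]:
    """Concatenate sources in order, keeping the first occurrence of each title."""
    if limit <= 0:
        return []
    seen = set()
    merged: List[Dict[str, str]] = []
    for source in sources:
        for paper in source:
            key = _title_key(paper.get("title", ""))
            if not key or key in seen:
                continue  # skip untitled entries and repeats
            seen.add(key)
            merged.append(paper)
            if len(merged) >= limit:
                return merged
    return merged


scholar = [{"title": "Deep Learning  Survey", "link": "a"}]
semantic = [{"title": "deep learning survey", "link": "b"}, {"title": "Graph Methods", "link": "c"}]
print([p["link"] for p in merge_papers(scholar, semantic)])
```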

[User Query]
     ↓
[QueryClassifierAgent]  ←– Classifies intent (search, research, report, etc.)
     ↓
 ┌──────────────┬────────────────────┬─────────────┐
 ↓              ↓                    ↓
[GoogleSearch] [ResearchAgent]    [ReportGeneratorAgent]
                  ↓                        ↓
       ┌──────────┬────────────────────┬─────────────────────┐
       ↓          ↓                    ↓
  [Preprint]  [Multidisciplinary]   [Biomedical & Life Sciences]
   (arXiv)     (Scholar, SD, SS)     (PubMed, Springer)


In [14]:
import requests
from scholarly import scholarly  # Make sure you have scholarly installed
from typing import List, Dict
from langchain_core.tools import tool

load_dotenv()
ScienceDirect_Elsevier_API = os.getenv("ScienceDirect_Elsevier_API")

In [None]:
# LangChain's @tool decorator wraps functions, not classes, so the class is left undecorated.
class MultidisciplinaryResearchPlatforms:
    def __init__(self, search_agent=None, sciencedirect_api_key=None):
        print("MultidisciplinaryResearchPlatforms Init")
        self.search_agent = search_agent  # Optional LLM-based related topics generator
        self.sciencedirect_api_key = sciencedirect_api_key  # For Elsevier API

    def fetch_google_scholar_papers(self, query: str) -> List[Dict]: # Scholarly (Google Scholar scraping)
        """
        Fetches top 5 research papers from Google Scholar.
        If fewer than 5 papers are found, expands the search using related topics.
        Returns:
            list: A list of dictionaries containing paper details (title, summary, link).
        """
        papers = []
        search_results = scholarly.search_pubs(query)

        for i, paper in enumerate(search_results):
            if i >= 5:
                break
            papers.append({
                "title": paper["bib"]["title"],
                "summary": paper["bib"].get("abstract", "No summary available"),
                "link": paper.get("pub_url", "No link available")
            })

        # Expand search if fewer than 5 papers
        if len(papers) < 5 and self.search_agent:
            related_topics_response = self.search_agent.generate_reply(
                messages=[{"role": "user", "content": f"Suggest 3 related research topics for '{query}'"}]
            )
            related_topics = related_topics_response.get("content", "").split("\n")
            for topic in related_topics:
                topic = topic.strip()
                if topic and len(papers) < 5:
                    new_papers = scholarly.search_pubs(topic)
                    for i, paper in enumerate(new_papers):
                        if len(papers) >= 5:
                            break
                        papers.append({
                            "title": paper["bib"]["title"],
                            "summary": paper["bib"].get("abstract", "No summary available"),
                            "link": paper.get("pub_url", "No link available")
                        })
        return papers

    def fetch_sciencedirect_papers(self, query: str) -> List[Dict]: # ScienceDirect (Elsevier)
        """
        Fetches top 5 papers from ScienceDirect via Elsevier API.
        Requires an API key.
        """
        # Use the attribute set in __init__ (self.sciencedirect_api_key)
        if not self.sciencedirect_api_key:
            raise ValueError("ScienceDirect API key not provided.")

        headers = {
            "X-ELS-APIKey": self.sciencedirect_api_key,
            "Accept": "application/json"
        }
        params = {
            "query": query,
            "count": 5,
            "sort": "relevance"
        }

        url = "https://api.elsevier.com/content/search/sciencedirect"
        response = requests.get(url, headers=headers, params=params)

        papers = []
        if response.status_code == 200:
            results = response.json()
            entries = results.get("search-results", {}).get("entry", [])
            for entry in entries:
                papers.append({
                    "title": entry.get("dc:title", "No title"),
                    "summary": entry.get("dc:description", "No summary available"),
                    "link": entry.get("prism:url", "No link available")
                })

        # Expand search if fewer than 5 papers
        if len(papers) < 5 and self.search_agent:
            related_topics_response = self.search_agent.generate_reply(
                messages=[{"role": "user", "content": f"Suggest 3 related research topics for '{query}'"}]
            )
            related_topics = related_topics_response.get("content", "").split("\n")
            for topic in related_topics:
                topic = topic.strip()
                if topic and len(papers) < 5:
                    params["query"] = topic
                    response = requests.get(url, headers=headers, params=params)
                    if response.status_code == 200:
                        results = response.json()
                        entries = results.get("search-results", {}).get("entry", [])
                        for entry in entries:
                            if len(papers) >= 5:
                                break
                            papers.append({
                                "title": entry.get("dc:title", "No title"),
                                "summary": entry.get("dc:description", "No summary available"),
                                "link": entry.get("prism:url", "No link available")
                            })
        return papers

    def fetch_semantic_scholar_papers(self, query: str) -> List[Dict]:
        """
        Fetches top 5 papers from Semantic Scholar API.
        """
        url = "https://api.semanticscholar.org/graph/v1/paper/search"
        params = {
            "query": query,
            "limit": 5,
            "fields": "title,abstract,url"
        }

        response = requests.get(url, params=params)
        papers = []

        if response.status_code == 200:
            data = response.json()
            for paper in data.get("data", []):
                papers.append({
                    "title": paper.get("title", "No title"),
                    "summary": paper.get("abstract", "No summary available"),
                    "link": paper.get("url", "No link available")
                })

        # Expand search if fewer than 5 papers
        if len(papers) < 5 and self.search_agent:
            related_topics_response = self.search_agent.generate_reply(
                messages=[{"role": "user", "content": f"Suggest 3 related research topics for '{query}'"}]
            )
            related_topics = related_topics_response.get("content", "").split("\n")
            for topic in related_topics:
                topic = topic.strip()
                if topic and len(papers) < 5:
                    params["query"] = topic
                    params["limit"] = 5 - len(papers)
                    response = requests.get(url, params=params)
                    if response.status_code == 200:
                        data = response.json()
                        for paper in data.get("data", []):
                            if len(papers) >= 5:
                                break
                            papers.append({
                                "title": paper.get("title", "No title"),
                                "summary": paper.get("abstract", "No summary available"),
                                "link": paper.get("url", "No link available")
                            })
        return papers
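
The three fetch methods above repeat the same "expand the search if fewer than 5 papers" block. The sketch below factors that pattern into one function, under the assumption that each source can be wrapped as a `search(query)` callable returning a list of paper dictionaries. `parse_topics` and `fetch_with_expansion` are hypothetical helpers. `parse_topics` also accepts a plain string reply, since `generate_reply` may return either a string or a dictionary (the `ResearchAgents` methods already check for both).

```python
import re
from typing import Callable, Dict, List, Optional, Union

Paper = Dict[str, str]


def parse_topics(reply: Union[str, dict, None]) -> List[str]:
    """Turn an agent reply (str, or dict with 'content') into a clean topic list."""
    if isinstance(reply, dict):
        text = reply.get("content") or ""
    else:
        text = reply or ""
    topics = []
    for line in text.splitlines():
        # Drop list markers such as "1.", "2)", "-", or "*" at the start of a line.
        cleaned = re.sub(r"^\s*(?:\d+[.)]|[-*])\s*", "", line).strip()
        if cleaned:
            topics.append(cleaned)
    return topics


def fetch_with_expansion(
    query: str,
    search: Callable[[str], List[Paper]],
    suggest_topics: Optional[Callable[[str], Union[str, dict, None]]] = None,
    limit: int = 5,
) -> List[Paper]:
    """Search once, then retry with suggested topics until `limit` papers are collected."""
    papers = list(search(query))[:limit]
    if len(papers) < limit and suggest_topics is not None:
        for topic in parse_topics(suggest_topics(query)):
            if len(papers) >= limit:
                break
            papers.extend(search(topic))
            papers = papers[:limit]
    return papers


# Stubs for illustration: each search returns two fake papers.
def fake_search(q: str) -> List[Paper]:
    return [{"title": f"{q}-{i}"} for i in range(2)]


def fake_suggest(q: str) -> str:
    return "1. alpha\n2. beta\n3. gamma"


print([p["title"] for p in fetch_with_expansion("base", fake_search, fake_suggest)])
```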


### 2. Preprint & Early-stage Research Repositories
These focus on sharing preliminary findings before formal peer review.

- arXiv API

Why? arXiv hosts preprints primarily in physics, math, computer science, and related quantitative fields.
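
The arXiv API returns an Atom XML feed. The sketch below shows the two steps in isolation: building a URL-encoded query and parsing entries from a feed. `build_arxiv_url`, `parse_arxiv_feed`, and the sample feed are illustrative. The feed content is made up so the snippet runs offline, and the endpoint is the one used in the class that follows.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List
from urllib.parse import urlencode

ATOM = "{http://www.w3.org/2005/Atom}"  # Atom namespace prefix used by ElementTree


def build_arxiv_url(query: str, max_results: int = 5) -> str:
    """URL-encode the query so spaces and punctuation survive the request."""
    params = {"search_query": f"all:{query}", "start": 0, "max_results": max_results}
    return "http://export.arxiv.org/api/query?" + urlencode(params)


def parse_arxiv_feed(xml_text: str) -> List[Dict[str, str]]:
    """Extract title, summary, and link from each Atom entry."""
    papers = []
    for entry in ET.fromstring(xml_text).findall(f"{ATOM}entry"):
        title = entry.findtext(f"{ATOM}title", default="No title")
        summary = entry.findtext(f"{ATOM}summary", default="No summary available")
        papers.append({
            # Collapse the line breaks and indentation that feeds often contain.
            "title": " ".join(title.split()),
            "summary": " ".join(summary.split()),
            "link": entry.findtext(f"{ATOM}id", default="No link available"),
        })
    return papers


# Made-up feed for illustration only.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <id>http://arxiv.org/abs/0000.00000v1</id>
    <title>An Example
      Title</title>
    <summary>  A made-up abstract.  </summary>
  </entry>
</feed>"""

print(build_arxiv_url("graph neural networks"))
print(parse_arxiv_feed(SAMPLE_FEED))
```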



In [None]:
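# LangChain's @tool decorator wraps functions, not classes, so the class is left undecorated.
class PreprintEarlyStageResearchRepositories:
    def __init__(self, search_agent=None):
        print("PreprintEarlyStageResearchRepositories Init")
        self.search_agent = search_agent  # Optional agent that suggests related topics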

    def fetch_arxiv_papers(self, query):
        """
            Fetches top 5 research papers from ArXiv based on the user query.
            If <5 papers are found, expands the search using related topics.
            
            Returns:
                list: A list of dictionaries containing paper details (title, summary, link).
        """
        
        def search_arxiv(query):
            """Helper function to query the arXiv API."""
            url = "http://export.arxiv.org/api/query"
            # Pass the query via params so requests URL-encodes spaces and special characters
            params = {"search_query": f"all:{query}", "start": 0, "max_results": 5}
            response = requests.get(url, params=params)
            if response.status_code == 200:
                root = ET.fromstring(response.text)  # Converts XML string into an ElementTree object
                return [
                    {  # {http://www.w3.org/2005/Atom} is a namespace required for parsing XML correctly
                        "title": entry.find("{http://www.w3.org/2005/Atom}title").text,
                        "summary": entry.find("{http://www.w3.org/2005/Atom}summary").text,
                        "link": entry.find("{http://www.w3.org/2005/Atom}id").text
                    }
                    for entry in root.findall("{http://www.w3.org/2005/Atom}entry")
                ]
            return []

        papers = search_arxiv(query)  # Initial search with the user's query

        if len(papers) < 5 and self.search_agent:  # Fewer than 5 papers and a search agent is available: expand the search
            # self.search_agent is expected to be an LLM-based agent that suggests
            # 3 related research topics for the given query.
            related_topics_response = self.search_agent.generate_reply(
                messages=[{"role": "user", "content": f"Suggest 3 related research topics for '{query}'"}]
            )
            related_topics = related_topics_response.get("content", "").split("\n")  # One topic per line

            for topic in related_topics:
                topic = topic.strip()  # Remove surrounding whitespace
                if topic and len(papers) < 5:
                    new_papers = search_arxiv(topic)  # Search arXiv again for each LLM-suggested topic
                    papers.extend(new_papers)  # Add new papers to the list
                    papers = papers[:5]  # Keep at most 5 papers

        return papers

    def fetch_google_scholar_papers(self, query):
        """
        Fetches top 5 research papers from Google Scholar.
        If fewer than 5 papers are found, expands the search using related topics.
        Returns:
            list: A list of dictionaries containing paper details (title, summary, link).
        """
        papers = []
        search_results = scholarly.search_pubs(query)  # Search for research papers on Google Scholar

        # Get the first 5 papers from Google Scholar
        for i, paper in enumerate(search_results):
            if i >= 5:
                break
            papers.append({
                "title": paper["bib"]["title"],
                "summary": paper["bib"].get("abstract", "No summary available"),
                "link": paper.get("pub_url", "No link available")
            })
        
        # If fewer than 5 papers, expand search using related topics
        if len(papers) < 5 and self.search_agent:  # Assuming self.search_agent is defined
            # Use the search agent to suggest related research topics
            related_topics_response = self.search_agent.generate_reply(
                messages=[{"role": "user", "content": f"Suggest 3 related research topics for '{query}'"}]
            )
            related_topics = related_topics_response.get("content", "").split("\n")

            for topic in related_topics:
                topic = topic.strip()  # Clean up the topic text
                if topic and len(papers) < 5:
                    # Re-query Google Scholar using the new related topic
                    new_papers = scholarly.search_pubs(topic)
                    for i, paper in enumerate(new_papers):
                        if len(papers) >= 5:
                            break
                        papers.append({
                            "title": paper["bib"]["title"],
                            "summary": paper["bib"].get("abstract", "No summary available"),
                            "link": paper.get("pub_url", "No link available")
                        })

        return papers


### 3. Biomedical and Life Sciences Databases
These specialize in health, medicine, and biological sciences.

- PubMed
- Springer Nature Open Access API (with a strong presence in medical and life sciences)

Why? PubMed is a biomedical powerhouse, while Springer Nature provides many peer-reviewed medical and biological journals.
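
PubMed records are returned as XML. Titles can contain inline markup, and structured abstracts are split across several `AbstractText` elements, so reading only `.text` from the first match can drop content. The sketch below parses a made-up record with `itertext()` and joins all abstract sections. `parse_pubmed_articles` and the sample XML, including the PMID, are illustrative and not real data.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def parse_pubmed_articles(xml_text: str) -> List[Dict[str, str]]:
    """Extract title, full abstract, and link from an efetch-style XML document."""
    papers = []
    for article in ET.fromstring(xml_text).findall(".//PubmedArticle"):
        pmid = article.findtext(".//PMID", default="")
        title_elem = article.find(".//ArticleTitle")
        # itertext() also collects text nested inside inline tags such as <i>.
        title = "".join(title_elem.itertext()).strip() if title_elem is not None else "No title"
        sections = ["".join(s.itertext()).strip() for s in article.findall(".//AbstractText")]
        summary = " ".join(s for s in sections if s) or "No abstract"
        link = f"https://pubmed.ncbi.nlm.nih.gov/{pmid}/" if pmid else "No link available"
        papers.append({"title": title, "summary": summary, "link": link})
    return papers


# Made-up record for illustration only.
SAMPLE_XML = """<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>12345</PMID>
      <Article>
        <ArticleTitle>Sleep and <i>productivity</i></ArticleTitle>
        <Abstract>
          <AbstractText Label="BACKGROUND">First part.</AbstractText>
          <AbstractText Label="RESULTS">Second part.</AbstractText>
        </Abstract>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>"""

print(parse_pubmed_articles(SAMPLE_XML))
```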

In [None]:
import requests
from typing import List, Dict
from xml.etree import ElementTree as ET

# LangChain's @tool decorator wraps functions, not classes, so the class is left undecorated.
class BiomedicalResearchPlatforms:
    def __init__(self, search_agent=None, springer_api_key=None):
        print("BiomedicalResearchPlatforms Init")
        self.search_agent = search_agent
        self.springer_api_key = springer_api_key

    def fetch_pubmed_papers(self, query: str) -> List[Dict]:
        """
        Fetches top 5 papers from PubMed using NCBI E-utilities.
        """
        base_url = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"
        search_url = f"{base_url}esearch.fcgi"
        fetch_url = f"{base_url}efetch.fcgi"

        params = {
            "db": "pubmed",
            "term": query,
            "retmode": "xml",
            "retmax": 5
        }

        response = requests.get(search_url, params=params)
        ids = []
        if response.status_code == 200:
            root = ET.fromstring(response.text)
            ids = [id_elem.text for id_elem in root.findall(".//Id")]

        papers = []

        if ids:
            fetch_params = {
                "db": "pubmed",
                "id": ",".join(ids),
                "retmode": "xml"
            }
            fetch_response = requests.get(fetch_url, params=fetch_params)
            if fetch_response.status_code == 200:
                fetch_root = ET.fromstring(fetch_response.text)
                for article in fetch_root.findall(".//PubmedArticle"):
                    title_elem = article.find(".//ArticleTitle")
                    abstract_elem = article.find(".//AbstractText")
                    link = f"https://pubmed.ncbi.nlm.nih.gov/{article.find('.//PMID').text}/"

                    papers.append({
                        "title": title_elem.text if title_elem is not None else "No title",
                        "summary": abstract_elem.text if abstract_elem is not None else "No abstract",
                        "link": link
                    })

        # Expand search if needed
        if len(papers) < 5 and self.search_agent:
            related_topics = self._get_related_topics(query)
            for topic in related_topics:
                topic = topic.strip()
                if topic and len(papers) < 5:
                    params["term"] = topic
                    response = requests.get(search_url, params=params)
                    if response.status_code == 200:
                        root = ET.fromstring(response.text)
                        ids = [id_elem.text for id_elem in root.findall(".//Id")]
                        if ids:
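                            # Rebuild fetch_params here: it is undefined if the first search returned no IDs
                            fetch_params = {"db": "pubmed", "id": ",".join(ids), "retmode": "xml"}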
                            fetch_response = requests.get(fetch_url, params=fetch_params)
                            if fetch_response.status_code == 200:
                                fetch_root = ET.fromstring(fetch_response.text)
                                for article in fetch_root.findall(".//PubmedArticle"):
                                    if len(papers) >= 5:
                                        break
                                    title_elem = article.find(".//ArticleTitle")
                                    abstract_elem = article.find(".//AbstractText")
                                    link = f"https://pubmed.ncbi.nlm.nih.gov/{article.find('.//PMID').text}/"

                                    papers.append({
                                        "title": title_elem.text if title_elem is not None else "No title",
                                        "summary": abstract_elem.text if abstract_elem is not None else "No abstract",
                                        "link": link
                                    })

        return papers

    def fetch_springer_papers(self, query: str) -> List[Dict]:
        """
        Fetches up to 5 papers from the Springer Nature Open Access API.
        """
        if not self.springer_api_key:
            raise ValueError("Springer API key is required.")

        # Use the JSON endpoint: /openaccess/jats returns JATS XML, which response.json() cannot parse
        base_url = "https://api.springernature.com/openaccess/json"
        params = {
            "q": query,
            "api_key": self.springer_api_key,
            "p": 5  # results per page
        }

        def to_paper(record: Dict) -> Dict:
            # "url" is a list of {"value": ...} entries; it may be missing or empty
            urls = record.get("url") or [{"value": "No link"}]
            return {
                "title": record.get("title", "No title"),
                "summary": record.get("abstract", "No abstract available"),
                "link": urls[0].get("value", "No link")
            }

        papers = []
        response = requests.get(base_url, params=params, timeout=30)
        if response.status_code == 200:
            papers.extend(to_paper(record) for record in response.json().get("records", []))

        # Expand search if needed: try related topics until 5 papers are collected
        if len(papers) < 5 and self.search_agent:
            for topic in self._get_related_topics(query):
                topic = topic.strip()
                if not topic:
                    continue
                if len(papers) >= 5:
                    break
                params["q"] = topic
                response = requests.get(base_url, params=params, timeout=30)
                if response.status_code != 200:
                    continue
                for record in response.json().get("records", []):
                    if len(papers) >= 5:
                        break
                    papers.append(to_paper(record))

        return papers

    def _get_related_topics(self, query: str) -> List[str]:
        """
        Uses the search_agent to suggest related research topics, one per line.
        """
        if not self.search_agent:
            return []

        reply = self.search_agent.generate_reply(
            messages=[{"role": "user", "content": f"Suggest 3 related research topics for '{query}'"}]
        )
        # generate_reply may return a plain string, a message dict, or None
        content = reply.get("content") if isinstance(reply, dict) else reply
        return (content or "").split("\n")
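
Splitting the reply on newlines keeps any list markers the model emits (such as "1." or "-") inside each topic string, so they would be sent to the search APIs as part of the query. Below is a minimal sketch of a cleanup step, assuming the reply is a short numbered or bulleted list. `clean_topic_lines` and its `limit` parameter are hypothetical and not part of the class above.

```python
import re
from typing import List

# Hypothetical helper (not in the original class): removes a leading list marker
# such as "1.", "2)", "-", "*" or "•" from each line and drops empty lines.
# The marker must be followed by whitespace, so "1.5 degrees" is left untouched.
_LIST_MARKER = re.compile(r"^\s*(?:\d+[.)]|[-*•])\s+")

def clean_topic_lines(reply_text: str, limit: int = 3) -> List[str]:
    topics = []
    for line in reply_text.splitlines():
        # Strip the marker, surrounding whitespace, and wrapping quotes.
        topic = _LIST_MARKER.sub("", line).strip().strip("\"'")
        if topic:
            topics.append(topic)
    return topics[:limit]

if __name__ == "__main__":
    sample = "1. Federated learning\n2) Model compression\n\n- Edge inference"
    for topic in clean_topic_lines(sample):
        print(topic)
```

A helper like this could wrap the string returned by `_get_related_topics` before the topics are used as search terms. Replies written as one comma-separated sentence would need a different rule.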


## Our Google: Search Agent

Given a search term, this agent searches the internet for it and summarizes the results.

In [None]:
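from pydantic import BaseModel
from dotenv import load_dotenv
import asyncio
import os
from IPython.display import display, Markdown
from pprint import pprint
import requests
# Agent is used below but was never imported. This assumes the OpenAI Agents SDK
# (the `openai-agents` package), whose Agent accepts name, instructions, model and output_type.
from agents import Agent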

# See note above about cost of WebSearchTool

HOW_MANY_SEARCHES = 5

INSTRUCTIONS = f"You are a helpful research assistant. Given a query, come up with a set of web searches \
to perform to best answer the query. Output {HOW_MANY_SEARCHES} terms to query for."

# We use Pydantic objects to describe the Schema of the output

class WebSearchItem(BaseModel):
    reason: str
    "Your reasoning for why this search is important to the query."

    query: str
    "The search term to use for the web search."


class WebSearchPlan(BaseModel):
    searches: list[WebSearchItem]
    """A list of web searches to perform to best answer the query."""

# We pass in the Pydantic object to ensure the output follows the schema

planner_agent = Agent(
    name="PlannerAgent",
    instructions=INSTRUCTIONS,
    model="gpt-4.1-mini",
    output_type=WebSearchPlan,
)

[User Query]
     ↓
[QueryClassifierAgent]  ← Classifies intent (search, research, report, etc.)
     ↓
 ┌──────────────┬────────────────────┐
 ↓              ↓                    ↓
[GoogleSearch] [ResearchAgent]    [ReportGeneratorAgent]
                  ↓                        ↓
       ┌──────────┼────────────────────┬───┘
       ↓          ↓                    ↓
  [Preprint]  [Multidisciplinary]   [Biomedical & Life Sciences]
   (arXiv)     (Scholar, SD, SS)     (PubMed, Springer)


In [None]:
1. QueryClassifierAgent
2. GoogleSearchAgent
3. ResearchAgent
   3.1. PreprintAgent
   3.2. MultidisciplinaryAgent
   3.3. BiomedicalAgent
4. ReportGeneratorAgent
5. PaperContentExtractorAgent

## 🧭 Overall System Workflow

### 🟢 Step 1: User Enters a Query
The user might ask something like:

- "Find me research papers on AI in medical imaging."
- "What's the latest on ChatGPT?"
- "Generate a report on breast cancer detection using AI."

### 🟡 Step 2: QueryClassifierAgent
This agent sits at the front of the system and decides where each query goes.
It classifies the query into one of the following intents:

| Query Type | Action Taken |
|---|---|
| 🔍 Google/Internet search | Route to GoogleSearchAgent |
| 📄 Research query (papers, articles) | Route to ResearchAgent |
| 📑 Report generation | Route to ReportGeneratorAgent |
| ❓ Other queries (optional) | Can route to FallbackAgent or ask for clarification |
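
The routing in the table can be sketched without a model in the loop. The example below is only a keyword heuristic standing in for the LLM-based QueryClassifierAgent. The labels come from the testing questions at the top of this notebook, while the patterns, `classify_query`, `route`, and `ROUTES` are assumptions made for illustration. The optional fallback intent is not modeled.

```python
import re

# Assumption: simple keyword rules stand in for the LLM classifier.
RESEARCH_PATTERN = re.compile(
    r"\b(research papers?|academic|scholarly|peer-reviewed|studies|journal articles?)\b",
    re.IGNORECASE,
)
REPORT_PATTERN = re.compile(r"\b(report|summar(?:y|ize|ise))\b", re.IGNORECASE)

# Maps each intent label to the agent named in the routing table.
ROUTES = {
    "research_paper": "ResearchAgent",
    "report_generation": "ReportGeneratorAgent",
    "google_search": "GoogleSearchAgent",
}

def classify_query(query: str) -> str:
    # Research cues are checked first, so "a report using academic studies"
    # is labeled research_paper, matching the testing questions.
    if RESEARCH_PATTERN.search(query):
        return "research_paper"
    if REPORT_PATTERN.search(query):
        return "report_generation"
    # Everything else is treated as a general web query.
    return "google_search"

def route(query: str) -> str:
    return ROUTES[classify_query(query)]

if __name__ == "__main__":
    for q in [
        "Find me research papers on AI in medical imaging.",
        "Generate a report on breast cancer detection using AI.",
        "Who is the current Prime Minister of Canada?",
    ]:
        print(f"{q!r} -> {classify_query(q)} -> {route(q)}")
```

A heuristic like this is mainly useful as a cheap baseline, or as a regression check against the testing questions before an LLM classifier is swapped in.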

### 🔵 Step 3: Path Selection Based on Classification
✅ **If it's a Google-like query:** GoogleSearchAgent handles it, using the Google Search API, the Bing API, or web scraping to answer general questions.

✅ **If it's a research-related query:** it is routed to ResearchAgent, which further classifies the query by domain:

| Domain | Sub-Agent Called |
|---|---|
| 🧪 Preprint / early-stage research | PreprintAgent → (arXiv API) |
| 🌐 Multidisciplinary | MultidisciplinaryAgent → (Google Scholar via `scholarly`, ScienceDirect, Semantic Scholar) |
| 🧬 Biomedical / life sciences | BiomedicalAgent → (PubMed, Springer Nature Open Access API) |

Each sub-agent:

- Queries its relevant platform(s)
- Returns paper results (title, abstract, link)
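
One way to picture the domain dispatch is a lookup table from domain name to a fetcher, with every fetcher returning the same record shape. This is a hedged sketch: the stub fetchers, the `SUB_AGENTS` table, `run_research`, and the fallback to the multidisciplinary sources are all assumptions, and the stubs return placeholder data instead of calling any API. The `"summary"` key holds the abstract, as in the fetch methods shown earlier.

```python
from typing import Callable, Dict, List

Paper = Dict[str, str]  # expected keys: "title", "summary", "link"

def _stub(source: str, query: str) -> List[Paper]:
    # Placeholder record so the sketch runs offline; the link is not a real paper URL.
    return [{
        "title": f"{source} result for {query}",
        "summary": "placeholder abstract",
        "link": "https://example.org/paper",
    }]

def fetch_preprints(query: str) -> List[Paper]:
    return _stub("Preprint", query)

def fetch_multidisciplinary(query: str) -> List[Paper]:
    return _stub("Multidisciplinary", query)

def fetch_biomedical(query: str) -> List[Paper]:
    return _stub("Biomedical", query)

# Hypothetical dispatch table: domain label -> sub-agent fetcher.
SUB_AGENTS: Dict[str, Callable[[str], List[Paper]]] = {
    "preprint": fetch_preprints,
    "multidisciplinary": fetch_multidisciplinary,
    "biomedical": fetch_biomedical,
}

def run_research(query: str, domain: str, limit: int = 5) -> List[Paper]:
    # Assumption: unknown domains fall back to the multidisciplinary sources.
    fetcher = SUB_AGENTS.get(domain, fetch_multidisciplinary)
    required = {"title", "summary", "link"}
    # Keep only well-formed records and cap the list length.
    return [p for p in fetcher(query) if required <= p.keys()][:limit]

if __name__ == "__main__":
    for paper in run_research("AI in medical imaging", "biomedical"):
        print(paper["title"], "|", paper["link"])
```

Keeping the record shape identical across sub-agents means the report step can consume results without knowing which source produced them.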

### 📕 Step 4: If It's a Report Request
There are two scenarios.

✳️ **Case A: The user directly asks for a report**

Example: "Generate a report on Alzheimer's treatment advances."

QueryClassifierAgent routes it to ReportGeneratorAgent, which:

1. Analyzes the query
2. Calls the appropriate sub-agent via ResearchAgent
3. Fetches research results
4. Summarizes and organizes them
5. Outputs a structured report (e.g., in Markdown or PDF)

✳️ **Case B: The user first asks for papers, then for a report**

Example:

1. User: "Give me papers on blockchain in healthcare" → routed to ResearchAgent
2. Results are fetched and shown.
3. User: "Now make a report on that" → routed to ReportGeneratorAgent

ReportGeneratorAgent then:

- Takes the existing paper list (from memory/context)
- Summarizes, clusters, and formats the content into a report
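
Case B can be illustrated with a small formatting step that reuses papers already held in memory. The sketch below covers only the formatting part: it does no summarization or clustering, and `build_markdown_report`, `session_memory`, and the sample record are hypothetical names and placeholder data, not the notebook's actual implementation.

```python
from typing import Dict, List

def build_markdown_report(topic: str, papers: List[Dict[str, str]]) -> str:
    """Formats already-fetched papers into a Markdown report."""
    lines = [f"# Report: {topic}", "", f"Sources reviewed: {len(papers)}", ""]
    for i, paper in enumerate(papers, start=1):
        # One numbered section per paper: title, abstract text, then the link.
        lines.append(f"## {i}. {paper.get('title', 'No title')}")
        lines.append("")
        lines.append(paper.get("summary", "No abstract"))
        lines.append("")
        lines.append(f"Link: {paper.get('link', 'No link')}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"

# Assumption: results from the previous turn are kept in a per-session dict.
session_memory = {
    "last_papers": [
        {
            "title": "Placeholder paper on blockchain in healthcare",
            "summary": "Placeholder abstract text.",
            "link": "https://example.org/paper-1",
        }
    ]
}

if __name__ == "__main__":
    report = build_markdown_report("Blockchain in healthcare", session_memory["last_papers"])
    print(report)
```

Because the report step only reads from memory here, the follow-up request needs no new API calls. An LLM summarization pass could later replace the raw abstract line.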

## 🧱 Agent Responsibilities

| Agent | Role |
|---|---|
| QueryClassifierAgent | Classifies the user's intent |
| GoogleSearchAgent | Handles general web/internet queries |
| ResearchAgent | Classifies and routes research queries |
| PreprintAgent | Fetches from arXiv |
| MultidisciplinaryAgent | Fetches from Google Scholar, ScienceDirect, Semantic Scholar |
| BiomedicalAgent | Fetches from PubMed and Springer |
| ReportGeneratorAgent | Summarizes research and produces structured reports |