# PaperMind – Your intelligent assistant for scientific paper discovery and research 🚀🚀🚀

## Overview



This project implements an intelligent research assistant that helps users navigate, understand, and analyze scientific literature using LangGraph and advanced language models. By combining various academic APIs with sophisticated paper processing techniques, it creates a seamless experience for researchers, students, and professionals working with academic papers.

> NOTE: The presented workflow is not domain specific: each step in the graph can be adapted to a different domain by simply changing the prompts.


## Motivation

Research literature review represents a significant time investment in R&D, with studies showing that researchers spend 30-50% of their time reading, analyzing, and synthesizing academic papers. This challenge is universal across the research community. While thorough literature review is crucial for advancing science and technology, the current process remains inefficient and time-consuming.

Key challenges include:
- Extensive time commitment (30-50% of R&D hours) dedicated to reading and processing papers
- Inefficient search processes across fragmented database ecosystems
- Complex task of synthesizing and connecting findings across multiple papers
- Resource-intensive maintenance of comprehensive literature reviews
- Ongoing effort required to stay current with new publications

## Key components

1. State-Driven Workflow Engine
    - StateGraph Architecture: Five-node system for orchestrated research
    - Decision Making Node: Query intent analysis and routing
    - Planning Node: Research strategy formulation
    - Agent Node: LLM reasoning and tool selection
    - Tool Execution Node: Paper retrieval and processing
    - Judge Node: Quality validation and improvement cycles

2. Paper Processing Integration 
    - Source Integration: CORE / arXiv API for comprehensive paper access
    - Document Processing: PDF content extraction with text structure preservation

3. Analysis Workflow 
    - State-aware processing pipeline 
    - Multi-step validation gates 
    - Quality-focused improvement cycles 
    - Human-in-the-loop validation options

An overview of the workflow is shown below:

![image](https://i.ibb.co/0BBzkcb/mermaid-diagram-2024-11-17-195744.png)

## Method details

1. The system requires 
    - OpenAI API key to access GPT-4o. This model was chosen after comparing its performance with open-source alternatives (in particular Llama 3). However, any other LLM with tool-calling capabilities can be used.
    - CORE API key for paper retrieval. CORE is one of the largest online repositories for scientific papers, counting over 136 million papers, and offers a free API for personal use. A key can be requested [here](https://core.ac.uk/services/api#form).
    - arXiv does not require an API key for accessing its research paper database. arXiv’s API is open and free to use without registration or authentication.

2. Technical Architecture: 
    - LangGraph for state orchestration.
    - PDFplumber for document processing.
    - Pydantic for structured data handling.

> Acknowledgment: Special thanks to CORE and arXiv for enabling academic paper access through their APIs😊

---

## Setup

This cell installs the required dependencies.

In [None]:
! pip install --upgrade --quiet langchain==0.2.16 langchain-community==0.2.16 langchain-openai==0.1.23 langgraph==0.2.18 langsmith==0.1.114 pdfplumber python-dotenv

This cell imports the required libraries and sets the environment variables.

In [1]:
import io
import json
import os
import urllib3
import time

import pdfplumber
import xml.etree.ElementTree as ET
from dotenv import load_dotenv
from IPython.display import display, Markdown
from langchain_core.messages import BaseMessage, SystemMessage, ToolMessage, AIMessage
from langchain_core.tools import BaseTool, tool
from langchain_openai import ChatOpenAI
from langgraph.graph import END, StateGraph
from langgraph.graph.state import CompiledStateGraph
from langgraph.graph.message import add_messages
from pydantic import BaseModel, Field
from typing import Annotated, ClassVar, Sequence, TypedDict, Optional

urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
load_dotenv()

# Load environment variables
openai_api_key = os.getenv("OPENAI_API_KEY")
openai_base_url = os.getenv("OPENAI_BASE_URL")
core_api_key = os.getenv("CORE_API_KEY")

## Prompts

This cell contains the prompts used in the workflow.

The `core_agent_prompt` and `arxiv_agent_prompt` each contain a section explaining how to write complex queries for the CORE API and the arXiv API respectively, enabling the agent to solve more complex tasks.

In [2]:
# Prompt for the initial decision making on how to reply to the user
decision_making_prompt = """
You are an experienced scientific researcher.
Your goal is to help the user with their scientific research.

Based on the user query, decide whether you need to perform research or can answer the question directly.
- You should perform research if the user query requires any supporting evidence or information.
- You should answer the question directly only for simple conversational questions, like "how are you?".
"""

# Prompt to create a step by step plan to answer the user query
planning_prompt = """
# IDENTITY AND PURPOSE

You are an experienced scientific researcher.
Your goal is to make a new step by step plan to help the user with their scientific research.

Subtasks should not rely on any assumptions or guesses, but only rely on the information provided in the context or look up for any additional information.

If any feedback is provided about a previous answer, incorporate it in your new planning.


# TOOLS

For each subtask, indicate the external tool required to complete the subtask. 
Tools can be one of the following:
{tools}
"""

# Prompt for the agent to answer the user query with the CORE API
core_agent_prompt = """
# IDENTITY AND PURPOSE

You are an experienced scientific researcher. 
Your goal is to help the user with their scientific research. You have access to a set of external tools to complete your tasks.
Follow the plan you wrote to successfully complete the task.

Add extensive inline citations to support any claim made in the answer.


# EXTERNAL KNOWLEDGE

## CORE API

The CORE API has a specific query language that allows you to explore a vast papers collection and perform complex queries. See the following table for a list of available operators:

| Operator       | Accepted symbols         | Meaning                                                                                      |
|---------------|-------------------------|----------------------------------------------------------------------------------------------|
| And           | AND, +, space          | Logical binary and.                                                                           |
| Or            | OR                     | Logical binary or.                                                                            |
| Grouping      | (...)                  | Used to prioritise and group elements of the query.                                           |
| Field lookup  | field_name:value       | Used to support lookup of specific fields.                                                    |
| Range queries | fieldName(>, <,>=, <=) | For numeric and date fields, it allows to specify a range of valid values to return.         |
| Exists queries| _exists_:fieldName     | Allows for complex queries, it returns all the items where the field specified by fieldName is not empty. |

Use this table to formulate more complex queries filtering for specific papers, for example publication date/year.
Here are the relevant fields of a paper object you can use to filter the results:
{
  "authors": [{"name": "Last Name, First Name"}],
  "documentType": "presentation" or "research" or "thesis",
  "publishedDate": "2019-08-24T14:15:22Z",
  "title": "Title of the paper",
  "yearPublished": "2019"
}

Example queries:
- "machine learning AND yearPublished:2023"
- "maritime biology AND yearPublished>=2023 AND yearPublished<=2024"
- "cancer research AND authors:Vaswani, Ashish AND authors:Bello, Irwan"
- "title:Attention is all you need"
- "mathematics AND _exists_:abstract"
"""

# Prompt for the agent to answer the user query with the arXiv API
arxiv_agent_prompt = """
# IDENTITY AND PURPOSE

You are an experienced scientific researcher.  
Your goal is to help the user with their scientific research.  
You have access to a set of external tools to complete your tasks.  
Follow the plan you wrote to successfully complete the task.  

Add extensive inline citations to support any claim made in the answer.  

# EXTERNAL KNOWLEDGE

## arXiv API  

The arXiv API allows you to search for research papers by keywords, authors, and categories.  
You can use the following syntax to construct search queries:

### **Query Syntax:**
- `"all:{keywords}"` → Searches for `keywords` in **title, abstract, and authors**
- `"ti:{title}"` → Searches for papers with a **specific title**
- `"abs:{abstract_keywords}"` → Searches within the **abstract**
- `"au:{author_name}"` → Searches for a specific **author** (e.g., `"au:Ashish Vaswani"`)
- `"cat:{category}"` → Searches by subject **category** (e.g., `"cat:cs.LG"` for Machine Learning)
- `"submittedDate:[YYYYMMDDHHMM TO YYYYMMDDHHMM]"` → Filters by **submission date range** (24-hour time to the minute, in GMT)
- **Logical Operators:** `AND`, `OR`, `NOT` can be used for complex queries  
- **Grouping:** Parentheses `(...)` can be used to group terms  

### **Example Queries:**
1. Find recent papers on **LLM agents and reasoning**:  
   - `"all:(LLM AND agent AND reasoning)"`  
2. Find papers on **reinforcement learning by Ashish Vaswani**:  
   - `"all:(reinforcement learning) AND au:Ashish Vaswani"`  
3. Find papers on **neurosymbolic AI submitted after 2024-01-01**:  
   - `"all:(neurosymbolic AI) AND submittedDate:[202401010000 TO *]"`  
4. Find **GPT-related** papers in **Machine Learning (cs.LG)**:  
   - `"all:GPT AND cat:cs.LG"`  

Use this format to construct precise queries for retrieving relevant scientific papers.
"""


# Prompt for the judging step to evaluate the quality of the final answer
judge_prompt = """
You are an expert scientific researcher.
Your goal is to review the final answer you provided for a specific user query.

Look at the conversation history between you and the user. Based on it, you need to decide if the final answer is satisfactory or not.

A good final answer should:
- Directly answer the user query. For example, it does not answer a question about a different paper or area of research.
- Answer extensively the request from the user.
- Take into account any feedback given through the conversation.
- Provide inline sources to support any claim made in the answer.

In case the answer is not good enough, provide clear and concise feedback on what needs to be improved to pass the evaluation.
"""

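The CORE query operators listed in `core_agent_prompt` can also be composed programmatically. The sketch below is illustrative only: `build_core_query` is a hypothetical helper that is not part of this notebook or of any CORE client, and it simply joins the operators from the table in the prompt with `AND`.

```python
from typing import Optional


def build_core_query(
    keywords: str,
    year_from: Optional[int] = None,
    year_to: Optional[int] = None,
    require_abstract: bool = False,
) -> str:
    """Compose a CORE search query from keywords and optional filters (hypothetical helper)."""
    clauses = [keywords]
    if year_from is not None:
        # Range query on the numeric `yearPublished` field
        clauses.append(f"yearPublished>={year_from}")
    if year_to is not None:
        clauses.append(f"yearPublished<={year_to}")
    if require_abstract:
        # Exists query: keep only papers whose abstract is not empty
        clauses.append("_exists_:abstract")
    return " AND ".join(clauses)


query = build_core_query("maritime biology", year_from=2023, year_to=2024)
print(query)
```

The printed query has the same shape as the second example query in the prompt, so the agent and any hand-written tests can share one format.
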
## Utility classes and functions

This cell contains the utility classes and functions used in the workflow. It includes wrappers around the CORE and arXiv APIs, the Pydantic models for the input and output of the nodes, and a few general-purpose functions.

The `CoreAPIWrapper` class includes a retry mechanism to handle transient errors and make the workflow more robust.


In [3]:
class CoreAPIWrapper(BaseModel):
    """Simple wrapper around the CORE API."""
    base_url: ClassVar[str] = "https://api.core.ac.uk/v3"  
    api_key: ClassVar[str] = core_api_key

    top_k_results: int = Field(description = "Top k results obtained by running a query on Core", default = 1)

    def _get_search_response(self, query: str) -> dict:
        http = urllib3.PoolManager()

        # Retry mechanism to handle transient errors
        max_retries = 5    
        for attempt in range(max_retries):
            response = http.request(
                'GET',
                f"{self.base_url}/search/outputs", 
                headers={"Authorization": f"Bearer {self.api_key}"}, 
                fields={"q": query, "limit": self.top_k_results}
            )
            if 200 <= response.status < 300:
                return json.loads(response.data.decode('utf-8'))
            elif attempt < max_retries - 1:
                time.sleep(2 ** (attempt + 2))
            else:
                raise Exception(f"Got non 2xx response from CORE API: {response.status} {response.data}")

    def search(self, query: str) -> str:
        response = self._get_search_response(query)
        results = response.get("results", [])
        if not results:
            return "No relevant results were found"

        # Format the results in a string
        docs = []
        for result in results:
            published_date_str = result.get('publishedDate') or result.get('yearPublished', '')
            authors_str = ' and '.join([item['name'] for item in result.get('authors', [])])
            docs.append((
                f"* ID: {result.get('id', '')},\n"
                f"* Title: {result.get('title', '')},\n"
                f"* Published Date: {published_date_str},\n"
                f"* Authors: {authors_str},\n"
                f"* Abstract: {result.get('abstract', '')},\n"
                f"* Paper URLs: {result.get('sourceFulltextUrls') or result.get('downloadUrl', '')}"
            ))
        return "\n-----\n".join(docs)
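
The retry loop in `_get_search_response` waits `2 ** (attempt + 2)` seconds between attempts. The sketch below isolates that schedule so it can be inspected without touching the network. `backoff_delays` and `call_with_retries` are hypothetical names introduced for illustration, and the `fetch` callable stands in for the HTTP request.

```python
import time
from typing import Callable, List, Optional, Tuple


def backoff_delays(max_retries: int = 5) -> List[int]:
    """Waits (in seconds) between attempts, using the 2 ** (attempt + 2) schedule."""
    return [2 ** (attempt + 2) for attempt in range(max_retries - 1)]


def call_with_retries(
    fetch: Callable[[], Tuple[int, str]],
    max_retries: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call `fetch` until it returns a 2xx status or the attempts run out."""
    status: Optional[int] = None
    for attempt in range(max_retries):
        status, body = fetch()
        if 200 <= status < 300:
            return body
        if attempt < max_retries - 1:
            sleep(2 ** (attempt + 2))
    raise RuntimeError(f"Got non 2xx response after {max_retries} attempts: {status}")


# Simulated endpoint that fails twice before succeeding (no network involved).
responses = iter([(503, ""), (429, ""), (200, "ok")])
waits: List[float] = []
result = call_with_retries(lambda: next(responses), sleep=waits.append)
print(backoff_delays())  # [4, 8, 16, 32]
print(result, waits)     # ok [4, 8]
```

With the default of five attempts, the worst case adds 4 + 8 + 16 + 32 = 60 seconds of waiting before the exception is raised.
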
    

The `ArxivAPIWrapper` class interacts with the arXiv API and parses its Atom XML responses.

In [4]:
class ArxivAPIWrapper(BaseModel):
    """Wrapper around the arXiv API for paper searching."""
    base_url: ClassVar[str] = "http://export.arxiv.org/api/query"

    top_k_results: int = Field(description="Top k results obtained by running a query on arXiv", default=1)

    def _get_search_response(self, query: str) -> str:
        http = urllib3.PoolManager()
        # Pass the parameters via `fields` so urllib3 URL-encodes spaces and special characters
        response = http.request(
            "GET",
            self.base_url,
            fields={
                "search_query": f"all:{query}",
                "start": 0,
                "max_results": self.top_k_results,
                "sortBy": "submittedDate",
                "sortOrder": "descending",
            },
        )
        if response.status != 200:
            raise Exception(f"Failed to fetch data from arXiv: {response.status}")

        return response.data.decode("utf-8")

    def _parse_arxiv_response(self, xml_data: str) -> list[dict]:
        """Parses XML response from arXiv API and extracts relevant paper details."""
        root = ET.fromstring(xml_data)
        namespace = {"arxiv": "http://www.w3.org/2005/Atom"}

        papers = []
        for entry in root.findall("arxiv:entry", namespace):
            title = entry.find("arxiv:title", namespace).text.strip()
            summary = entry.find("arxiv:summary", namespace).text.strip()
            published_date = entry.find("arxiv:published", namespace).text.strip()
            authors = [author.find("arxiv:name", namespace).text.strip() for author in entry.findall("arxiv:author", namespace)]
            link = entry.find("arxiv:id", namespace).text.strip()

            papers.append({
                "title": title,
                "abstract": summary,
                "published_date": published_date,
                "authors": ", ".join(authors),
                "link": link
            })
        
        return papers

    def search(self, query: str) -> str:
        xml_response = self._get_search_response(query)
        papers = self._parse_arxiv_response(xml_response)

        if not papers:
            return "No relevant results were found"

        docs = [
            f"* Title: {paper['title']}\n"
            f"* Published Date: {paper['published_date']}\n"
            f"* Authors: {paper['authors']}\n"
            f"* Abstract: {paper['abstract']}\n"
            f"* Link: {paper['link']}\n"
            for paper in papers
        ]
        return "\n-----\n".join(docs)
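
The parsing step in `_parse_arxiv_response` can be tried offline. The sketch below runs the same namespace-aware lookups against a hand-written Atom snippet (placeholder data, not a real arXiv response). Unlike the notebook's parser, the hypothetical `text_of` helper tolerates missing elements and collapses the line breaks that arXiv inserts into long titles.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the Atom format; every value is a placeholder.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <id>http://arxiv.org/abs/0000.00000v1</id>
    <published>2025-01-01T00:00:00Z</published>
    <title>  A Placeholder
      Title </title>
    <summary>Placeholder abstract.</summary>
    <author><name>Ada Example</name></author>
    <author><name>Bob Example</name></author>
  </entry>
</feed>
"""

ATOM = {"atom": "http://www.w3.org/2005/Atom"}


def text_of(entry: ET.Element, tag: str) -> str:
    """Return the whitespace-normalised text of a child element, or '' if it is missing."""
    node = entry.find(f"atom:{tag}", ATOM)
    if node is None or node.text is None:
        return ""
    return " ".join(node.text.split())


root = ET.fromstring(SAMPLE_FEED)
papers = []
for entry in root.findall("atom:entry", ATOM):
    papers.append({
        "title": text_of(entry, "title"),
        "link": text_of(entry, "id"),
        "authors": ", ".join(
            " ".join(name.text.split())
            for name in entry.findall("atom:author/atom:name", ATOM)
        ),
    })
print(papers)
```
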

In [5]:
class SearchPapersInput(BaseModel):
    """Input object to search papers on the selected archive (CORE or arXiv)."""
    query: str = Field(description="The query to search for on the selected archive.")
    max_papers: int = Field(description="The maximum number of papers to return. It defaults to 1, but you can increase it up to 10 in case you need to perform a more comprehensive search.", default=1, ge=1, le=10)

class DecisionMakingOutput(BaseModel):
    """Output object of the decision making node."""
    requires_research: bool = Field(description="Whether the user query requires research or not.")
    answer: Optional[str] = Field(default=None, description="The answer to the user query. It should be None if the user query requires research, otherwise it should be a direct answer to the user query.")

class JudgeOutput(BaseModel):
    """Output object of the judge node."""
    is_good_answer: bool = Field(description="Whether the answer is good or not.")
    feedback: Optional[str] = Field(default=None, description="Detailed feedback about why the answer is not good. It should be None if the answer is good.")

def format_tools_description(tools: list[BaseTool]) -> str:
    return "\n\n".join([f"- {tool.name}: {tool.description}\n Input arguments: {tool.args}" for tool in tools])

async def print_stream(app: CompiledStateGraph, input: str) -> Optional[BaseMessage]:
    display(Markdown("## New research running"))
    display(Markdown(f"### Input:\n\n{input}\n\n"))
    display(Markdown("### Stream:\n\n"))

    # Stream the results 
    all_messages = []
    async for chunk in app.astream({"messages": [input]}, stream_mode="updates"):
        for updates in chunk.values():
            if messages := updates.get("messages"):
                all_messages.extend(messages)
                for message in messages:
                    message.pretty_print()
                    print("\n\n")
 
    # Return the last message if any
    if not all_messages:
        return None
    return all_messages[-1]

## Agent state

This cell defines the agent state, which contains the following information:
- `requires_research`: Whether the user query requires research or not.
- `num_feedback_requests`: The number of times the LLM asked for feedback.
- `is_good_answer`: Whether the LLM's final answer is good or not.
- `messages`: The conversation history between the user and the LLM.

In [6]:
class AgentState(TypedDict):
    """The state of the agent during the paper research process."""
    # TypedDict fields cannot declare default values, so nodes read unset keys with state.get(...)
    requires_research: bool
    num_feedback_requests: int
    is_good_answer: bool
    messages: Annotated[Sequence[BaseMessage], add_messages]
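
Each node returns only the keys it wants to change, and the `messages` field is annotated with the `add_messages` reducer so that new messages extend the history instead of replacing it. The sketch below is a simplified stand-in for that merge, written in plain Python with strings in place of message objects. `apply_update` is a hypothetical function, and the real reducer does more (for example, it works on `BaseMessage` objects and can replace a message that has a matching ID).

```python
from typing import Any, Dict


def apply_update(state: Dict[str, Any], update: Dict[str, Any]) -> Dict[str, Any]:
    """Simplified stand-in for how a node's partial update is merged into the state."""
    merged = dict(state)
    for key, value in update.items():
        if key == "messages":
            # Reducer-backed field: new messages are appended to the history.
            merged[key] = list(state.get(key, [])) + list(value)
        else:
            # Plain field: the new value overwrites the old one.
            merged[key] = value
    return merged


state: Dict[str, Any] = {"messages": ["user: find papers on RL"]}
state = apply_update(state, {"requires_research": True})
state = apply_update(state, {"messages": ["ai: here is my plan"]})
print(state)
```
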

## Agent tools

This cell defines the tools available to the agent. The toolkit contains a tool to search for scientific papers, a tool to download a scientific paper from a given URL, and a tool to ask for human feedback. As written, the search tool calls `ArxivAPIWrapper`; swap in `CoreAPIWrapper` to search CORE instead.

To make the paper download more robust, the tool includes a retry mechanism, similar to the one used for the CORE API, as well as a mock browser header to avoid 403 errors.

In [7]:
@tool("search-papers", args_schema=SearchPapersInput)
def search_papers(query: str, max_papers: int = 1) -> str:
    """Search for scientific papers using the arXiv API.

    Example:
    {"query": "Attention is all you need", "max_papers": 1}

    Returns:
        A list of the relevant papers found with the corresponding relevant information.
    """
    try:
        return ArxivAPIWrapper(top_k_results=max_papers).search(query)
    except Exception as e:
        return f"Error performing paper search: {e}"

@tool("download-paper")
def download_paper(url: str) -> str:
    """Download a specific scientific paper from a given URL.

    Example:
    {"url": "https://sample.pdf"}

    Returns:
        The paper content.
    """
    try:        
        http = urllib3.PoolManager(
            cert_reqs='CERT_NONE',
        )
        
        # Mock browser headers to avoid 403 error
        headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36',
            'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
            'Accept-Language': 'en-US,en;q=0.5',
            'Accept-Encoding': 'gzip, deflate, br',
            'Connection': 'keep-alive',
        }
        max_retries = 5
        for attempt in range(max_retries):
            response = http.request('GET', url, headers=headers)
            if 200 <= response.status < 300:
                pdf_file = io.BytesIO(response.data)
                with pdfplumber.open(pdf_file) as pdf:
                    text = ""
                    for page in pdf.pages:
                        text += (page.extract_text() or "") + "\n"  # extract_text() can return None for pages without text
                return text
            elif attempt < max_retries - 1:
                time.sleep(2 ** (attempt + 2))
            else:
                raise Exception(f"Got non 2xx when downloading paper: {response.status} {response.data}")
    except Exception as e:
        return f"Error downloading paper: {e}"

@tool("ask-human-feedback")
def ask_human_feedback(question: str) -> str:
    """Ask for human feedback. You should call this tool when encountering unexpected errors."""
    return input(question)

tools = [search_papers, download_paper, ask_human_feedback]
tools_dict = {tool.name: tool for tool in tools}

## Workflow nodes

This cell defines the nodes of the workflow. Note that `judge_node` ends the execution once the answer has been reviewed twice without passing, which keeps latency acceptable.

In [8]:
# LLMs
base_llm = ChatOpenAI(model="gpt-4o", temperature=0.0, api_key=openai_api_key, base_url=openai_base_url)
decision_making_llm = base_llm.with_structured_output(DecisionMakingOutput)
agent_llm = base_llm.bind_tools(tools)
judge_llm = base_llm.with_structured_output(JudgeOutput)

# Decision making node
def decision_making_node(state: AgentState):
    """Entry point of the workflow. Based on the user query, the model can either respond directly or perform a full research, routing the workflow to the planning node"""
    system_prompt = SystemMessage(content=decision_making_prompt)
    response: DecisionMakingOutput = decision_making_llm.invoke([system_prompt] + state["messages"])
    output = {"requires_research": response.requires_research}
    if response.answer:
        output["messages"] = [AIMessage(content=response.answer)]
    return output

# Task router function
def router(state: AgentState):
    """Router directing the user query to the appropriate branch of the workflow."""
    if state["requires_research"]:
        return "planning"
    else:
        return "end"

# Planning node
def planning_node(state: AgentState):
    """Planning node that creates a step by step plan to answer the user query."""
    system_prompt = SystemMessage(content=planning_prompt.format(tools=format_tools_description(tools)))
    response = base_llm.invoke([system_prompt] + state["messages"])
    return {"messages": [response]}

# Tool call node
def tools_node(state: AgentState):
    """Tool call node that executes the tools based on the plan."""
    outputs = []
    for tool_call in state["messages"][-1].tool_calls:
        tool_result = tools_dict[tool_call["name"]].invoke(tool_call["args"])
        outputs.append(
            ToolMessage(
                content=json.dumps(tool_result),
                name=tool_call["name"],
                tool_call_id=tool_call["id"],
            )
        )
    return {"messages": outputs}

# Agent call node
def agent_node(state: AgentState):
    """Agent call node that uses the LLM with tools to answer the user query."""
    system_prompt = SystemMessage(content=arxiv_agent_prompt)
    response = agent_llm.invoke([system_prompt] + state["messages"])
    return {"messages": [response]}

# Should continue function
def should_continue(state: AgentState):
    """Check if the agent should continue or end."""
    messages = state["messages"]
    last_message = messages[-1]

    # Continue to the tools node if the agent requested tool calls; otherwise hand over to the judge
    if last_message.tool_calls:
        return "continue"
    else:
        return "end"

# Judge node
def judge_node(state: AgentState):
    """Node to let the LLM judge the quality of its own final answer."""
    # End execution if the LLM failed to provide a good answer twice.
    num_feedback_requests = state.get("num_feedback_requests", 0)
    if num_feedback_requests >= 2:
        return {"is_good_answer": True}

    system_prompt = SystemMessage(content=judge_prompt)
    response: JudgeOutput = judge_llm.invoke([system_prompt] + state["messages"])
    output = {
        "is_good_answer": response.is_good_answer,
        "num_feedback_requests": num_feedback_requests + 1
    }
    if response.feedback:
        output["messages"] = [AIMessage(content=response.feedback)]
    return output

# Final answer router function
def final_answer_router(state: AgentState):
    """Router to end the workflow or improve the answer."""
    if state["is_good_answer"]:
        return "end"
    else:
        return "planning"
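
The cap on review rounds can be checked in isolation. The sketch below mirrors the counting logic of `judge_node` in plain Python. `judge_step` and `MAX_REVIEWS` are hypothetical names, and the `review` callable replaces the LLM call with a reviewer that always rejects the answer.

```python
from typing import Any, Callable, Dict

MAX_REVIEWS = 2  # mirrors the `num_feedback_requests >= 2` check in judge_node


def judge_step(state: Dict[str, Any], review: Callable[[], bool]) -> Dict[str, Any]:
    """Stand-in for judge_node: `review` returns True when the answer is good."""
    count = state.get("num_feedback_requests", 0)
    if count >= MAX_REVIEWS:
        # Give up and accept the current answer to bound latency.
        return {"is_good_answer": True}
    return {"is_good_answer": review(), "num_feedback_requests": count + 1}


state: Dict[str, Any] = {}
visits = 0
while True:
    visits += 1
    state.update(judge_step(state, review=lambda: False))  # reviewer always rejects
    if state["is_good_answer"]:
        break
print(visits, state)
```

Under these assumptions the judge is visited three times: two real reviews, each followed by a new planning round, and a third visit that accepts the answer without calling the reviewer.
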


## Workflow definition

This cell defines the workflow using LangGraph.

In [9]:
# Initialize the StateGraph
workflow = StateGraph(AgentState)

# Add nodes to the graph
workflow.add_node("decision_making", decision_making_node)
workflow.add_node("planning", planning_node)
workflow.add_node("tools", tools_node)
workflow.add_node("agent", agent_node)
workflow.add_node("judge", judge_node)

# Set the entry point of the graph
workflow.set_entry_point("decision_making")

# Add edges between nodes
workflow.add_conditional_edges(
    "decision_making",
    router,
    {
        "planning": "planning",
        "end": END,
    }
)
workflow.add_edge("planning", "agent")
workflow.add_edge("tools", "agent")
workflow.add_conditional_edges(
    "agent",
    should_continue,
    {
        "continue": "tools",
        "end": "judge",
    },
)
workflow.add_conditional_edges(
    "judge",
    final_answer_router,
    {
        "planning": "planning",
        "end": END,
    }
)

# Compile the graph
app = workflow.compile()
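
To see which paths the graph allows without calling an LLM, the edges can be copied into a dictionary and walked by hand. This is a sketch under simplifying assumptions: `EDGES` and `trace` are hypothetical, the `"next"` label marks unconditional edges, and the router labels are supplied as a list in place of real routing decisions.

```python
from typing import Dict, Iterator, List

# Edges copied from the workflow definition; conditional edges map a router label to a target.
EDGES: Dict[str, Dict[str, str]] = {
    "decision_making": {"planning": "planning", "end": "END"},
    "planning": {"next": "agent"},
    "agent": {"continue": "tools", "end": "judge"},
    "tools": {"next": "agent"},
    "judge": {"planning": "planning", "end": "END"},
}


def trace(decisions: List[str]) -> List[str]:
    """Walk the graph from the entry point, consuming one router label per conditional edge."""
    labels: Iterator[str] = iter(decisions)
    node = "decision_making"
    path: List[str] = []
    while node != "END":
        path.append(node)
        targets = EDGES[node]
        node = targets["next"] if "next" in targets else targets[next(labels)]
    return path


# A conversational query that is answered directly.
print(trace(["end"]))
# A research query with one tool call and an answer the judge accepts.
print(trace(["planning", "continue", "end", "end"]))
```
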

## Example use case for PhD academic research

This cell tests the workflow with several example queries. These queries are designed to evaluate the agent on the following aspects:
- Completing tasks that are representative of the work a PhD researcher might need to perform.
- Addressing more specific tasks that require researching papers within a defined timeframe.
- Tackling tasks across multiple areas of research.
- Critically evaluating its own responses by sourcing specific information from the papers.

In [12]:
test_inputs = [
    "Find 5 recent papers on LLM, agents and reinforcement learning (RL).",

    "Find 5 recent papers on LLM, agents and reasoning."
]

# Run tests and store the results for later visualisation
outputs = []
for test_input in test_inputs:
    final_answer = await print_stream(app, test_input)
    outputs.append(final_answer.content if final_answer else "")  # print_stream returns None when no message was produced

## New research running

### Input:

Find 5 recent papers on LLM, agents and reinforcement learning (RL).



### Stream:




Let's search for 5 recent papers on the topic of "LLM, agents, and reinforcement learning (RL)." I will perform the search now.



Tool Calls:
  search-papers (call_pZv0TvhcbICj8EvS0jJA67oS)
 Call ID: call_pZv0TvhcbICj8EvS0jJA67oS
  Args:
    query: LLM AND agents AND reinforcement learning
    max_papers: 5



Name: search-papers

"* Title: Towards Autonomous Reinforcement Learning for Real-World Robotic\n  Manipulation with Large Language Models\n* Published Date: 2025-03-06T10:08:44Z\n* Authors: Niccol\u00f2 Turcato, Matteo Iovino, Aris Synodinos, Alberto Dalla Libera, Ruggero Carli, Pietro Falco\n* Abstract: Recent advancements in Large Language Models (LLMs) and Visual Language\nModels (VLMs) have significantly impacted robotics, enabling high-level\nsemantic motion planning applications. Reinforcement Learning (RL), a\ncomplementary paradigm, enables agents to autonomously optimize complex\nbehaviors through interaction and reward signals. However, designing effective\nreward fu

## New research running

### Input:

Find 5 recent papers on LLM, agents and reasoning.



### Stream:




{"query": "LLM agents reasoning", "max_papers": 5}



Tool Calls:
  search-papers (call_dL6EAIMJoN66e2ntRsld2g8V)
 Call ID: call_dL6EAIMJoN66e2ntRsld2g8V
  Args:
    query: LLM agents reasoning
    max_papers: 5



Name: search-papers

"* Title: LLMVoX: Autoregressive Streaming Text-to-Speech Model for Any LLM\n* Published Date: 2025-03-06T18:59:38Z\n* Authors: Sambal Shikhar, Mohammed Irfan Kurpath, Sahal Shaji Mullappilly, Jean Lahoud, Fahad Khan, Rao Muhammad Anwer, Salman Khan, Hisham Cholakkal\n* Abstract: Recent advancements in speech-to-speech dialogue systems leverage LLMs for\nmultimodal interactions, yet they remain hindered by fine-tuning requirements,\nhigh computational overhead, and text-speech misalignment. Existing\nspeech-enabled LLMs often degrade conversational quality by modifying the LLM,\nthereby compromising its linguistic capabilities. In contrast, we propose\nLLMVoX, a lightweight 30M-parameter, LLM-agnostic, autoregressive streaming TTS\nsystem that generates

## Display results

This cell displays the results of the test queries for a more compact visualisation of the results.


In [13]:
for input, output in zip(test_inputs, outputs):
    display(Markdown(f"## Input:\n\n{input}\n\n"))
    display(Markdown(f"## Output:\n\n{output}\n\n"))

## Input:

Find 5 recent papers on LLM, agents and reinforcement learning (RL).



## Output:

Here are 5 recent papers on the topic of "LLM, agents, and reinforcement learning (RL)":

1. **Title:** Towards Autonomous Reinforcement Learning for Real-World Robotic Manipulation with Large Language Models  
   **Authors:** Niccolò Turcato, Matteo Iovino, Aris Synodinos, Alberto Dalla Libera, Ruggero Carli, Pietro Falco  
   **Published Date:** March 6, 2025  
   **Abstract:** This paper introduces ARCHIE, an unsupervised pipeline leveraging GPT-4 to generate reward functions from natural language task descriptions for RL agents. The approach automates task success criteria coding and demonstrates its effectiveness in robotic manipulation tasks.  
   **Link:** [Read the paper](http://arxiv.org/abs/2503.04280v1)  

2. **Title:** Pretrained LLMs as Real-Time Controllers for Robot Operated Serial Production Line  
   **Authors:** Muhammad Waseem, Kshitij Bhatta, Chen Li, Qing Chang  
   **Published Date:** March 5, 2025  
   **Abstract:** This study explores the use of GPT-4 as a control framework for mobile robot scheduling in manufacturing systems. The LLM-based framework achieves performance comparable to state-of-the-art methods like Multi-Agent Reinforcement Learning (MARL) without extensive retraining.  
   **Link:** [Read the paper](http://arxiv.org/abs/2503.03889v1)  

3. **Title:** Human Implicit Preference-Based Policy Fine-tuning for Multi-Agent Reinforcement Learning in USV Swarm  
   **Authors:** Hyeonjun Kim, Kanghoon Lee, Junho Park, Jiachen Li, Jinkyoo Park  
   **Published Date:** March 5, 2025  
   **Abstract:** This paper proposes a Reinforcement Learning with Human Feedback (RLHF) approach for multi-agent systems, using LLMs to validate feedback scenarios. The method refines policies for Unmanned Surface Vehicle (USV) swarms, addressing challenges like collision avoidance and task allocation.  
   **Link:** [Read the paper](http://arxiv.org/abs/2503.03796v1)  

4. **Title:** Persuasion at Play: Understanding Misinformation Dynamics in Demographic-Aware Human-LLM Interactions  
   **Authors:** Angana Borah, Rada Mihalcea, Verónica Pérez-Rosas  
   **Published Date:** March 3, 2025  
   **Abstract:** This research investigates the dynamics of misinformation in human-LLM interactions, using a multi-agent LLM framework to analyze demographic-based susceptibility to misinformation and echo chamber behavior.  
   **Link:** [Read the paper](http://arxiv.org/abs/2503.02038v1)  

5. **Title:** Improving Retrospective Language Agents via Joint Policy Gradient Optimization  
   **Authors:** Xueyang Feng, Bo Lan, Quanyu Dai, Lei Wang, Jiakai Tang, Xu Chen, Zhenhua Dong, Ji-Rong Wen  
   **Published Date:** March 3, 2025  
   **Abstract:** The paper introduces RetroAct, a framework that integrates imitation learning and reinforcement learning to enhance task-planning and self-reflective evolution in language agents. It demonstrates significant improvements in task performance and decision-making.  
   **Link:** [Read the paper](http://arxiv.org/abs/2503.01490v1)  

Let me know if you'd like to explore any of these papers in more detail!



## Input:

Find 5 recent papers on LLM, agents and reasoning.



## Output:

Here are 5 recent papers on LLMs, agents, and reasoning:

1. **Title:** [LLMVoX: Autoregressive Streaming Text-to-Speech Model for Any LLM](http://arxiv.org/abs/2503.04724v1)  
   **Authors:** Sambal Shikhar, Mohammed Irfan Kurpath, Sahal Shaji Mullappilly, et al.  
   **Abstract:** This paper introduces LLMVoX, a lightweight, LLM-agnostic, autoregressive streaming text-to-speech system. It focuses on improving multimodal interactions in speech-to-speech dialogue systems by decoupling speech synthesis from LLM processing. The system supports infinite-length dialogues and generalizes to new languages with minimal adaptation.  
   **Published Date:** 2025-03-06  

2. **Title:** [Shifting Long-Context LLMs Research from Input to Output](http://arxiv.org/abs/2503.04723v1)  
   **Authors:** Yuhao Wu, Yushi Bai, Zhiqing Hu, et al.  
   **Abstract:** This paper advocates for a shift in research focus from processing extended input contexts to generating long-form outputs. It highlights the challenges in tasks like novel writing, long-term planning, and complex reasoning, emphasizing the need for foundational LLMs tailored for high-quality, long-form text generation.  
   **Published Date:** 2025-03-06  

3. **Title:** [Enough Coin Flips Can Make LLMs Act Bayesian](http://arxiv.org/abs/2503.04722v1)  
   **Authors:** Ritwik Gupta, Rodolfo Corona, Jiaxin Ge, et al.  
   **Abstract:** This study explores whether LLMs perform structured reasoning consistent with Bayesian frameworks. Using biased coin flips as a controlled setting, the authors find that LLMs can update their priors in a Bayesian manner with sufficient in-context learning demonstrations.  
   **Published Date:** 2025-03-06  

4. **Title:** [Predictable Scale: Part I -- Optimal Hyperparameter Scaling Law in Large Language Model Pretraining](http://arxiv.org/abs/2503.04715v1)  
   **Authors:** Houyi Li, Wenzheng Zheng, Jingcheng Hu, et al.  
   **Abstract:** This paper identifies universal scaling laws for hyperparameters in LLM pretraining. It provides insights into optimal learning rates and batch sizes, offering a plug-and-play tool for hyperparameter optimization. The findings are validated across diverse model architectures and data distributions.  
   **Published Date:** 2025-03-06  

5. **Title:** [Universality of Layer-Level Entropy-Weighted Quantization Beyond Model Architecture and Size](http://arxiv.org/abs/2503.04704v1)  
   **Authors:** Alireza Behtash, Marijan Fofonjka, Ethan Baird, et al.  
   **Abstract:** This paper presents Entropy-Weighted Quantization (EWQ), a method for selective model quantization that is independent of model architecture or size. The approach reduces memory usage while maintaining performance, offering a new perspective on efficient LLM deployment.  
   **Published Date:** 2025-03-06  

Let me know if you'd like to explore any of these papers in more detail!



---

## Comparative Analysis

We evaluated our Scientific Paper Agent against two leading AI knowledge copilots: Microsoft Copilot and Perplexity AI. Using a standardized query, "Find 8 papers on quantum machine learning", we compared the three systems across multiple dimensions to understand the strengths, limitations, and best-fit use cases of each.


### Test Case Implementation

We implemented a controlled test using the same research query across all three platforms:
- Query: "Find 8 papers on quantum machine learning"
- Sample Size: Multiple test runs to ensure consistency
- Evaluation Time: Early 2024
- Metrics Tracked: Response time, metadata quality, and result structure
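
The response-time metric listed above could be collected with a small timing harness along the following lines. This is a minimal sketch, not the script used for the comparison: `time_query` and `fake_agent` are hypothetical names introduced here, and `fake_agent` is only a stand-in for whichever system is being measured.

```python
import statistics
import time
from typing import Callable, Dict


def time_query(run_query: Callable[[str], object], query: str, runs: int = 3) -> Dict[str, float]:
    """Call run_query(query) `runs` times and summarize wall-clock latency in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        run_query(query)
        durations.append(time.perf_counter() - start)
    return {
        "runs": runs,
        "mean_s": statistics.mean(durations),
        "min_s": min(durations),
        "max_s": max(durations),
    }


def fake_agent(query: str) -> str:
    # Hypothetical stand-in for a real system under test (assumption, not the actual agent).
    time.sleep(0.01)
    return f"results for: {query}"
```

Calling `time_query(fake_agent, "Find 8 papers on quantum machine learning")` returns the mean, minimum, and maximum latency over the runs. Repeating the query several times smooths out one-off network delays.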

### Key Findings

Our agent produced more rigorous academic output and more complete metadata, but took approximately 30 seconds per query. Microsoft Copilot (about 2 seconds) and Perplexity AI (4-5 seconds) responded considerably faster. This tradeoff between speed and depth reflects different design philosophies and target use cases.

The comparative analysis reveals a clear differentiation in approaches:
- Our Agent: Optimized for thorough academic research with comprehensive validation
- Microsoft Copilot: Focused on rapid information retrieval and general overview
- Perplexity AI: Balanced approach with emphasis on source verification

### Microsoft Copilot results

![image](https://i.ibb.co/y4Zf4Pc/Screenshot-2024-11-17-at-21-40-21.png)

### Perplexity AI results
![image](https://i.ibb.co/n1rr7kW/Screenshot-2024-11-17-at-21-40-42.png)

### Metrics Comparison
![image](https://i.ibb.co/5KbTmFq/Screenshot-2024-11-17-at-22-03-43.png)


The table above compares our research assistant agent with Microsoft Copilot and Perplexity AI on the standardized query "Find 8 papers on quantum machine learning", covering response time, metadata quality, and academic value. The main trade-off is clear: our agent takes longer to respond (about 30 seconds versus 2-5 seconds), but it returns more detailed metadata, validated sources, and structured academic output. Which tool fits best depends on the task: Copilot excels at quick exploration, while our agent is better suited to in-depth academic research.

---



## Limitations

1. Technical Limitations
    - API rate limits for paper access
    - Long processing time for large PDFs
    - Limited to publicly accessible papers
  
2. Functional Limitations
    - No support for image analysis in papers
    - Limited context window for very long papers
    - Cannot perform mathematical derivations
    - Limited support for non-English papers
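
API rate limits are commonly softened by retrying with exponential backoff. The sketch below illustrates the idea under stated assumptions: `RateLimitError` and `call_with_backoff` are hypothetical names, not part of this project or of any paper API client, and a real client would need to map its own throttling response (for example HTTP 429) onto the exception.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical error raised by a paper-API client when it is throttled."""


def call_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying on RateLimitError with exponentially growing delays."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the error to the caller.
            # Delay doubles each attempt; random jitter avoids synchronized retries.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)
    raise AssertionError("unreachable")
```

The `sleep` parameter is injectable only so the function can be exercised without real waiting. In normal use the default `time.sleep` applies.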


## Potential Improvements

1. Technical Improvements
    - Implement parallel processing for multiple papers
    - Add caching system for frequently accessed papers
    - Integrate multiple academic APIs for broader coverage
    - Implement batch processing for large datasets

2. Functional Improvements
    - Add support for figure and table extraction
    - Implement cross-referencing between papers
    - Add citation network analysis
    - Include domain-specific validation rules
        
3. User Experience
    - Add interactive feedback mechanisms
    - Implement progress tracking
    - Add customizable validation criteria
    - Include export options for research summaries
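
Two of the technical improvements listed above, caching and parallel processing, can be sketched with the Python standard library alone. This is an illustration only, assuming a `fetch` callable that maps a paper ID to its text; `make_cached_fetcher` and `fetch_many` are hypothetical helpers, not functions from this project.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import lru_cache
from typing import Callable, Iterable, List


def make_cached_fetcher(fetch: Callable[[str], str], max_entries: int = 128) -> Callable[[str], str]:
    """Wrap fetch so repeated requests for the same paper ID are served from memory."""

    @lru_cache(maxsize=max_entries)
    def cached(paper_id: str) -> str:
        return fetch(paper_id)

    return cached


def fetch_many(fetch: Callable[[str], str], paper_ids: Iterable[str], max_workers: int = 4) -> List[str]:
    """Fetch several papers concurrently; results keep the order of paper_ids."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, paper_ids))
```

Threads suit this workload because paper retrieval is mostly network-bound. Note that `lru_cache` can still call `fetch` more than once for the same ID if identical requests arrive at the same moment, so it reduces duplicate downloads but does not strictly eliminate them.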
        
   
## Specific Use Cases

1. Academic Research: literature review and paper analysis.
    - Comprehensive search
    - Citation tracking
    - Cross-reference validation

2. Industry Research: technical documentation and patent analysis.
    - Focused search
    - Technical specification extraction
    - Competitive analysis

3. Education: student research assistance.
    - Simplified explanations
    - Learning resource identification
    - Guided research process


---

## Conclusion

This implementation demonstrates how state-driven architectures can transform academic paper analysis. By combining LangGraph's orchestration capabilities with robust API integrations, we've created a system that maintains research rigor while automating key aspects of paper processing. The workflow's emphasis on validation and quality control ensures reliable research outputs while significantly streamlining the paper analysis process.
