A powerful Model Context Protocol (MCP) server for AI-assisted academic research, hosted on Hugging Face Spaces. Enables LLMs to search the web, read webpages, and discover research papers with citations and PDF links — all through a clean SSE interface.
| Service | URL |
|---|---|
| MCP Server (SSE) | https://Codemaster67-ResearchPaperMCP.hf.space/sse |
| LangChain Agent API | https://Codemaster67-GoolgeLangchainAgent.hf.space/ |
This project is split into two independent services that work together:

- `MCP_SERVER/` — A FastMCP server that exposes research tools as MCP-compatible endpoints, hosted on Hugging Face Spaces via SSE.
- `langchain_backend/` — A FastAPI service that wraps the MCP server with a LangGraph ReAct agent (powered by Google Gemini) and exposes a REST chat API consumed by a frontend.
```
MCP_For_Researcher/
│
├── MCP_SERVER/              # MCP Tool Server (Hugging Face Spaces)
│   ├── mcp_server.py        # FastMCP server — defines all 5 MCP tools
│   ├── Testmcp.py           # Quick local test script for MCP tools
│   └── dockerfile           # Docker config for HF Spaces deployment
│
├── langchain_backend/       # LangChain Agent + FastAPI Backend
│   ├── agent.py             # FastAPI app — LangGraph ReAct agent + /initialize & /chat endpoints
│   └── TestUI.py            # Streamlit test UI for the agent backend
│
└── README.md
```
The MCP server exposes 5 research tools that any MCP-compatible AI agent can call directly over SSE.

| Tool | Description |
|---|---|
| `search_web` | Google search via SerpAPI — returns titles, links, and snippets |
| `fetch_web_content` | Extracts full Markdown content from any URL using Jina Reader |
| `academic_research` | Queries Semantic Scholar (with automatic OpenAlex fallback) |
| `get_paper_id` | Resolves a paper title to its DOI, ArXiv ID, and OpenAlex ID |
| `find_related_papers` | Finds similar papers given a Semantic Scholar ID, OpenAlex ID, or DOI |
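The automatic OpenAlex fallback used by `academic_research` follows the standard try-primary-then-fallback pattern. A minimal sketch of that pattern is below — the two search functions are hypothetical stand-ins, not the server's actual implementation:

```python
# Hypothetical stand-ins for the real Semantic Scholar / OpenAlex calls.
def search_semantic_scholar(query, limit):
    raise RuntimeError("rate limited")  # simulate a primary-source failure

def search_openalex(query, limit):
    return [{"title": f"result for {query}"}][:limit]

def academic_search_with_fallback(query, limit=5):
    """Try the primary source; on error or empty results, use the fallback."""
    try:
        results = search_semantic_scholar(query, limit)
        if results:
            return results
    except Exception:
        pass  # network error, rate limit, etc. -> fall through to fallback
    return search_openalex(query, limit)
```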
General web + YouTube search using Google (via SerpAPI).

```python
search_web(query="attention is all you need explained", required_links=5)
# Returns: [{ title, link, snippet }, ...]
```

Reads and extracts full Markdown text from any webpage. Powered by Jina Reader. (No YouTube support.)

```python
fetch_web_content(url="https://arxiv.org/abs/1706.03762")
# Returns: full page text as Markdown
```

Searches academic databases. Tries Semantic Scholar first; falls back to OpenAlex automatically.

```python
academic_research(query="transformer models NLP", limit=5)
# Returns: [{ title, authors, year, citationCount, abstract, openAccessPdf, externalIds }, ...]
```

Resolves a paper title or keywords to all known identifiers.

```python
get_paper_id(query="BERT pre-training deep bidirectional transformers")
# Returns: { title, paperId, doi, openalex, arxiv, source }
```

Finds recommended/similar papers. Accepts a Semantic Scholar ID, OpenAlex ID, or DOI.

```python
find_related_papers(paper_id="204e3073870fae3d05bcbc2f6a8e263d9b72e776", limit=5)
# Returns: [{ title, authors, year, citationCount, url }, ...]
```

Run the MCP server locally:

```bash
cd MCP_SERVER

# Install dependencies
pip install fastmcp requests

# Run the server (SSE on port 7860)
python mcp_server.py
```

Or build and run it with Docker:

```bash
cd MCP_SERVER
docker build -t research-mcp .
docker run -p 7860:7860 research-mcp
```

A FastAPI service that acts as the bridge between your frontend and the MCP server. On startup it connects to the live MCP SSE endpoint and fetches all available tools. Users bring their own Gemini API key.


| Method | Endpoint | Description |
|---|---|---|
| POST | `/initialize` | Initialize the ReAct agent with your Gemini API key + model |
| POST | `/chat` | Send a message (+ optional file) and receive an AI response |
Run it locally:

```bash
cd langchain_backend

# Install dependencies
pip install fastapi uvicorn langchain-mcp-adapters langchain-google-genai langgraph

# Start the server
python agent.py
# API available at http://localhost:7860
```

Initialize the agent:

```bash
curl -X POST http://localhost:7860/initialize \
  -F "api_key=YOUR_GOOGLE_GEMINI_API_KEY" \
  -F "model_name=gemini-2.5-flash"
```

Send a message:

```bash
curl -X POST http://localhost:7860/chat \
  -F "message=Find me the top 5 papers on vision transformers with citation counts"
```

You can also attach a file (PDF, image, or text) to the `/chat` endpoint as multipart form data.
`TestUI.py` provides a quick browser-based interface to test the agent backend without a production frontend:

```bash
cd langchain_backend
pip install streamlit
streamlit run TestUI.py
```

The MCP SSE endpoint can be used directly by any MCP-compatible client — no LangChain required.
```python
from langchain_mcp_adapters.client import MultiServerMCPClient

client = MultiServerMCPClient({
    "ResearchAgent": {
        "url": "https://Codemaster67-ResearchPaperMCP.hf.space/sse",
        "transport": "sse"
    }
})
tools = await client.get_tools()
```

Or register it in an MCP client configuration:

```json
{
  "mcpServers": {
    "ResearchAgent": {
      "url": "https://Codemaster67-ResearchPaperMCP.hf.space/sse",
      "transport": "sse"
    }
  }
}
```

| Component | Technology |
|---|---|
| MCP Framework | FastMCP |
| Web Search | SerpAPI (Google engine) |
| Web Reader | Jina Reader (r.jina.ai) |
| Academic Search | Semantic Scholar API + OpenAlex |
| Agent Framework | LangGraph ReAct Agent |
| LLM | Google Gemini (via langchain-google-genai) |
| Agent Backend | FastAPI + Uvicorn |
| Test UI | Streamlit |
| Hosting | Hugging Face Spaces |
| Containerization | Docker (Python 3.10-slim) |
These keys are currently hard-coded in the server source. For production, move them to Hugging Face Spaces Secrets or environment variables.


| Variable | Service | Purpose |
|---|---|---|
| `SERP_API_KEY` | SerpAPI | Google web search |
| `JINA_API_KEY` | Jina AI | Webpage content extraction |
| `OPEN_ALEX_API_KEY` | OpenAlex | Polite-pool access for the fallback academic search |

The Gemini API key is provided at runtime by the user via `/initialize` — it is never stored server-side.
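As one way to move the keys out of the source, the server could read them from the environment at startup. A minimal sketch (not the current `mcp_server.py` implementation; variable names match the table above):

```python
import os

# The three keys the server needs, per the table above.
REQUIRED_KEYS = ("SERP_API_KEY", "JINA_API_KEY", "OPEN_ALEX_API_KEY")

def load_keys() -> dict:
    """Read required API keys, failing fast at startup if any are missing."""
    keys = {name: os.getenv(name, "") for name in REQUIRED_KEYS}
    missing = [name for name, value in keys.items() if not value]
    if missing:
        # Better to crash on startup than on the first tool call.
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return keys
```

On Hugging Face Spaces, values set as Space Secrets are exposed to the container as environment variables, so the same code works locally and in deployment.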
Pull requests are welcome! Feel free to open an issue for bugs, feature requests, or new tool ideas.