Synthesizing findings across dozens of papers into a literature review takes researchers weeks. Cross-checking whether a specific claim holds up across the broader literature is equally time-consuming. The Research Orchestration System automates both tasks with three specialized multi-step agents built on Elastic Agent Builder that search, synthesize, review, and verify each other's work. Currently running on ~200 Agentic AI papers (~5,000 full-text chunks) indexed in Elasticsearch, the system is corpus-agnostic: the included indexing pipeline converts any collection of PDFs into searchable, embedded chunks.
```
[React Frontend] ── Vite + Tailwind chat UI (Netlify)
|
| HTTPS (SSE streaming)
v
[FastAPI Backend] ── GCP VM + Caddy reverse proxy
|--- /api/research (REST API — SSE streaming, literature review)
|--- /api/verify (REST API — SSE streaming, claim verification)
|--- /mcp (MCP endpoint for Claude Desktop)
|--- /slack/* (Slack bot — OAuth / slash commands)
|
v
[Elastic Cloud] ── Agent Builder
|--- Research Agent (searches corpus, writes reviews)
|--- Peer Review Agent (evaluates drafts, issues verdicts)
|--- Claim Verification Agent (verifies claims against corpus)
|--- papers_metadata index (~200 papers — titles, abstracts, embeddings)
|--- papers_chunks index (~5,000 chunks — full-text passages + embeddings)
```
| Interface | URL / Command |
|---|---|
| Web App | https://elasticresearchagent.netlify.app |
| MCP (Claude Desktop) | Add connector: https://elasticresearchagent.duckdns.org/mcp |
| Slack | `/research <topic>` or `/check-claim <claim>` after installing via Add to Slack |
| REST API | `POST /api/research` (literature review) or `POST /api/verify` (claim verification) |
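For scripted access, both REST endpoints stream Server-Sent Events over HTTPS. A minimal Python consumer sketch; the request-body shape here is an assumption, so check routers/research.py for the exact schema:

```python
import requests

# Stream a literature review from the hosted endpoint.
# NOTE: the body shape {"query": ...} is an assumption, not confirmed by this README.
resp = requests.post(
    "https://elasticresearchagent.duckdns.org/api/research",
    json={"query": "How do agentic AI systems handle tool errors?"},
    stream=True,
)
for line in resp.iter_lines(decode_unicode=True):
    if line and line.startswith("data:"):
        print(line.removeprefix("data:").strip())  # one SSE event per line
```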
The Research Agent runs a six-step pipeline:
- Plans sub-questions from the user's query
- Scopes the corpus using ES|QL analytics
- Identifies key papers by citation count
- Retrieves evidence using hybrid keyword + semantic search across full-text chunks (see the sketch after this list)
- Cross-checks findings for contradictions by running targeted searches for opposing evidence
- Synthesizes everything into a structured literature review with inline citations and confidence tags ([SUPPORTED], [CONTESTED], [INSUFFICIENT])
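The hybrid retrieval step combines a BM25 match query with a kNN vector query in a single Elasticsearch request. A minimal sketch, assuming the chunk text and vector live in fields named "text" and "embedding" (both field names, and the endpoint, are assumptions):

```python
from elasticsearch import Elasticsearch
from sentence_transformers import SentenceTransformer

es = Elasticsearch("http://localhost:9200")  # hypothetical endpoint
model = SentenceTransformer("all-MiniLM-L6-v2")

query = "emergent coordination in multi-agent systems"
resp = es.search(
    index="papers_chunks",
    query={"match": {"text": query}},            # keyword (BM25) leg
    knn={
        "field": "embedding",                    # semantic leg
        "query_vector": model.encode(query).tolist(),
        "k": 10,
        "num_candidates": 100,
    },
    size=10,
)
for hit in resp["hits"]["hits"]:
    print(hit["_score"], hit["_source"]["text"][:100])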
The Peer Review Agent evaluates each research draft through seven verification steps:
- Checks structural completeness
- Batch-verifies that all cited references exist via ES|QL (sketched after this list)
- Audits confidence tags by verifying each claim
- Spot-checks quantitative claims against source text
- Identifies missing high-impact papers by comparing cited references against the most-cited papers for the topic
- Validates contradictions by independently searching the corpus
- Issues a final verdict: 'PASS' or 'REVISION_NEEDED'
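The batch reference check takes a single ES|QL round trip. A sketch, assuming elasticsearch-py 8.x, a "title" field on papers_metadata, and exact-match comparison (the real tool may match more loosely):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # hypothetical endpoint

# Titles extracted from the draft's reference list (illustrative values).
cited = ["Example Paper A", "Example Paper B"]

# Pull all indexed titles in one ES|QL query, then diff in Python.
result = es.esql.query(query="FROM papers_metadata | KEEP title | LIMIT 1000")
known = {row[0] for row in result["values"]}
hallucinated = [title for title in cited if title not in known]
```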
The Research Agent and Peer Review Agent operate in an orchestrated loop: if the reviewer finds the draft unsatisfactory, it sends back specific, actionable feedback and the Research Agent revises accordingly, for up to two iterations.
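In outline, the loop in server/services/orchestrator.py looks like the following sketch; the agent call signatures are hypothetical stand-ins:

```python
def research_review_loop(research_agent, review_agent, query: str,
                         max_iterations: int = 2) -> str:
    """Sketch of the research-review loop; real runs also stream SSE events."""
    draft = research_agent.run(query)
    for _ in range(max_iterations):
        verdict, feedback = review_agent.run(draft)
        if verdict == "PASS":
            break
        draft = research_agent.run(query, feedback=feedback)
    return draft
```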
The Claim Verification Agent evaluates a specific claim against the corpus through a five-step pipeline:
- Parses the claim into testable statements
- Finds relevant papers using search queries with varied terminology
- Gathers evidence and classifies each excerpt as SUPPORTS, CONTRADICTS, or QUALIFIES
- Assesses nuances by searching for methodological differences and scope limitations
- Produces a structured verdict with a confidence level (a possible shape is sketched below)
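This README does not pin down the verdict schema; a hypothetical shape, for orientation only:

```python
from dataclasses import dataclass, field

@dataclass
class ClaimVerdict:
    """Hypothetical verdict structure; all field names are illustrative."""
    claim: str
    verdict: str                # e.g. "SUPPORTED" / "CONTESTED" / "INSUFFICIENT"
    confidence: str             # e.g. "high" / "medium" / "low"
    supporting: list[str] = field(default_factory=list)     # SUPPORTS excerpts
    contradicting: list[str] = field(default_factory=list)  # CONTRADICTS excerpts
    qualifying: list[str] = field(default_factory=list)     # QUALIFIES excerpts
```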
Each agent uses 5 custom tools (2 index searches, 3 ES|QL tools) plus default platform tools. A FastAPI backend orchestrates the agent loop, streaming real-time reasoning traces via SSE.
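The SSE pattern itself is compact. A minimal FastAPI sketch in which the event generator is a stand-in for the real Agent Builder client in server/services/agent.py:

```python
import asyncio

from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

async def run_agent_loop(query: str):
    """Stand-in for the real streaming client in server/services/agent.py."""
    for step in ("planning", "searching corpus", "synthesizing draft"):
        await asyncio.sleep(0)  # placeholder for agent round trips
        yield step

@app.post("/api/research")
async def research(body: dict):
    async def event_stream():
        async for step in run_agent_loop(body.get("query", "")):
            yield f"data: {step}\n\n"  # one SSE frame per reasoning step
    return StreamingResponse(event_stream(), media_type="text/event-stream")
```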
```
├── server/
│ ├── main.py # FastAPI entry point (REST + MCP + Slack routes)
│ ├── mcp_server.py # MCP server (research_literature_review + research_draft + verify_claim tools)
│ ├── config.py # Environment config, agent IDs, headers
│ ├── routers/
│ │ └── research.py # POST /api/research + /api/verify — SSE streaming endpoints
│ └── services/
│ ├── agent.py # Elastic Agent Builder streaming client
│ ├── orchestrator.py # Research-review loop + claim verification orchestrator
│ └── workflow.py # Elastic Workflows API client (legacy)
│
├── frontend/ # React + Vite + Tailwind chat UI
│ └── src/
│ ├── App.tsx # Main app with chat state management
│ ├── hooks/
│ │ └── useResearchStream.ts # SSE client hook
│ └── components/ # Header, Sidebar, ChatContainer, ReasoningTrace, etc.
│
├── slack_bot/
│ ├── bolt_app.py # Slack Bolt app with OAuth (multi-workspace)
│ ├── handlers.py # /research + /check-claim slash command handlers
│ └── formatting.py # Markdown → Slack mrkdwn conversion
│
├── workflows/
│ ├── research_review_loop.yaml # Elastic Workflows YAML (3-iteration loop)
│ └── peer_review.yaml # Standalone peer review workflow
│
├── config.py # Elasticsearch client config (for indexing scripts)
├── setup_indexes.py # Creates papers_metadata + papers_chunks indexes
├── load_metadata.py # Indexes paper metadata with abstract embeddings
├── parse_pdfs.py # Extracts, chunks, and sections PDF text
├── index_chunks.py # Embeds and indexes parsed chunks
├── run_indexing.py # Orchestrates the full ingestion pipeline
├── test_search.py # Validates keyword, vector, and ES|QL queries
├── evaluate_reports.py # Measures report quality against the corpus
├── app.py # Streamlit frontend (legacy, replaced by React app)
│
├── reports/ # Auto-saved literature review reports
├── tests/
│ └── test_mcp_server.py # MCP server unit tests
│
├── DEPLOYMENT.md # Full deployment guide (GCP, Caddy, Netlify, DNS)
├── SLACK.md # Slack bot setup and troubleshooting
├── TOOLS.md # Custom Elastic Agent Builder tools documentation
├── EVALUATION.md # Evaluation metrics and interpretation guide
└── PROGRESS.md # Development progress tracker
```
| Index | Documents | Purpose |
|---|---|---|
| papers_metadata | ~200 | Paper-level data: title, authors, abstract, year, citation count, keywords, abstract embedding (384d) |
| papers_chunks | ~5,000 | Chunked full-text passages (~385 words each) with embeddings for hybrid search |
Embeddings use all-MiniLM-L6-v2 (384 dimensions, cosine similarity).
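Generating a compatible vector takes one call with sentence-transformers:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
vec = model.encode("Agentic AI systems coordinate multiple LLM calls.")
assert vec.shape == (384,)  # matches the 384-dim dense_vector mappings
```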
evaluate_reports.py cross-references generated reports against the Elasticsearch corpus to measure:
| Metric | What It Measures |
|---|---|
| Citation Accuracy | Whether cited papers exist in the corpus (detects hallucinated references) |
| Claim Grounding | Whether quantitative claims can be traced to source text |
| Corpus Coverage | How many papers the agent cited out of the total corpus |
| Confidence Distribution | Balance of [SUPPORTED], [CONTESTED], [INSUFFICIENT] tags |
| Report Statistics | Word count, sections, references, research gaps, contradictions |
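Some of these metrics are simple text statistics. For instance, the confidence distribution reduces to a tag count over the report; a minimal sketch:

```python
import re
from collections import Counter

def confidence_distribution(report: str) -> Counter:
    """Tally the [SUPPORTED]/[CONTESTED]/[INSUFFICIENT] tags in a report."""
    return Counter(re.findall(r"\[(SUPPORTED|CONTESTED|INSUFFICIENT)\]", report))
```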
See EVALUATION.md for a detailed explanation of each metric and how to interpret it.
- Python 3.10+
- Node.js 18+ (for frontend)
- Elasticsearch credentials in .env

Install and run the backend:

```
pip install -r requirements.txt
uvicorn server.main:app --host 0.0.0.0 --port 8000
```

In a second terminal, start the frontend:

```
cd frontend
npm install
npm run dev
```

The frontend dev server runs at http://localhost:5173 and proxies API requests to http://localhost:8000.
The indexing pipeline converts any collection of PDFs into searchable, embedded chunks in Elasticsearch. Place your PDFs in the data/ directory and run:
```
python setup_indexes.py   # Create ES indexes with embedding mappings
python run_indexing.py    # Load metadata, parse PDFs, chunk text, generate embeddings, index
python test_search.py     # Validate keyword, vector, and ES|QL search
```

The pipeline handles PDF parsing, text chunking (~385 words), embedding generation (all-MiniLM-L6-v2, 384d), and indexing into two Elasticsearch indexes (papers_metadata + papers_chunks). The agents then work over whatever corpus is indexed.
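The chunking step can be as simple as a fixed word window; a simplified stand-in for parse_pdfs.py, which additionally tracks section boundaries:

```python
def chunk_words(text: str, size: int = 385) -> list[str]:
    """Split extracted PDF text into ~385-word chunks."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]
```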
The system runs on a GCP e2-micro VM (Always Free tier) with Caddy for HTTPS, DuckDNS for DNS, and Netlify for the frontend. Total cost is ~$0-4/month (mostly GCP external IP). See DEPLOYMENT.md for full details.
| File | Contents |
|---|---|
| DEPLOYMENT.md | GCP VM, Caddy, DuckDNS, Netlify, server management |
| SLACK.md | Slack bot OAuth setup, slash command configuration |
| TOOLS.md | Custom Elastic Agent Builder tools and search strategy |
| EVALUATION.md | Report quality metrics, benchmarks, interpretation |
| PROGRESS.md | Development history and phase-by-phase progress |