🔬 VMARO: Vectorless Multi-Agent Research Orchestrator

Feed it a research topic -> Get back a comprehensive thematic tree, parallel methodology evaluations, and a structured, funding-ready grant proposal evaluated for novelty. All without a vector database or embeddings layer.

Overview

VMARO is an 8-stage, multi-agent AI pipeline for academic research and grant writing, preceded by a topic-normalization step (stage 0). Instead of the conventional RAG approach (chunking text and retrieving by vector similarity), VMARO uses LLM-driven structural synthesis to build an interpretable "Thematic Tree" directly from multiple live academic sources.

The multi-model engine works through the literature in sequence: it analyzes the papers, detects emerging macro-trends, and isolates critical research gaps. It then pits a primary methodology against a "challenger" in a parallel evaluation phase and formats the winning approach to specific institutional guidelines (e.g., NIH, NSF, ERC). Finally, it generates the full proposal with a quantified novelty score and PDF/LaTeX exports.

Key Features & Architecture Improvements

Vectorless Navigation: No FAISS, no ChromaDB. Replaces black-box semantic retrieval with direct semantic clustering, constructing a visual Thematic Tree directly from high-signal abstracts and metadata.
Intelligent Quality Gates: Built-in "LLM-as-a-Judge" layers validate outputs between stages. Each gate returns one of three verdicts (PASS, REVISE, FAIL), so shallow or hallucinated output is flagged before it reaches the next stage.
Parallel Methodology Evaluation: VMARO doesn't just pick the first idea. It drafts a primary methodology, constructs a challenger counter-approach, and objectively evaluates which design has stronger statistical power and feasibility.
Intent-Aware Preprocessing: Raw user input, whether a phrase or a paragraph, is normalized into a structured payload with domain classification, query variants, and explicit research intent (survey_gaps, propose_methodology) before retrieval begins. This prevents garbage-in-garbage-out at the pipeline root.
Institutional Format Matching: Automatically restructures and tunes rhetorical tone to align with rigorous schemas (e.g., NSF, NIH, ERC) using a dedicated Format Matcher. You can upload custom JSON format templates as well.
Stateful Resiliency: All outputs cache natively via utils/cache.py. Process interrupted? The pipeline resumes immediately from the last checkpoint to save API credits.
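The checkpoint behaviour described under Stateful Resiliency can be sketched roughly as follows. This is a minimal illustration, not the contents of utils/cache.py: the function name `run_stage`, the one-JSON-file-per-stage layout, and the `topic_key` argument are all assumptions made for the example.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Callable


def run_stage(cache_dir: Path, topic_key: str, stage: str,
              compute: Callable[[], Any]) -> Any:
    """Return the cached result for a stage, or compute and store it.

    Hypothetical helper: one JSON file per (topic, stage) pair.
    """
    path = cache_dir / f"{topic_key}_{stage}.json"
    if path.exists():
        # Checkpoint hit: skip the (expensive) LLM/API call entirely.
        return json.loads(path.read_text(encoding="utf-8"))
    result = compute()
    cache_dir.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(result), encoding="utf-8")
    return result


# Demo with a stand-in stage that records how often it really runs.
calls = []


def fake_literature_stage() -> dict:
    calls.append("literature")
    return {"papers": ["paper-a", "paper-b"]}


demo_dir = Path(tempfile.mkdtemp())
first = run_stage(demo_dir, "federated_learning", "literature", fake_literature_stage)
second = run_stage(demo_dir, "federated_learning", "literature", fake_literature_stage)
```

In this sketch the second call reads the stored file instead of invoking the stage again, which is the property that saves API credits after an interrupted run.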

The 8-Stage Pipeline

[Research Topic]
       ↓
 0️⃣  Topic Normalization       (Intent classification + query variant generation)
       ↓
 1️⃣  Literature Mining         (Multi-API Fetcher: arXiv, PubMed, Scholar + LLM)
       ↓
 2️⃣  Thematic Tree Builder     (Clusters into hierarchical themes) → 🛡️ [Quality Gate 1]
       ↓
 3️⃣  Trend Analysis            (Detects dominant/emerging signals)
       ↓
 4️⃣  Gap Identification        (Auto-detects and ranks multiple research gaps) → 🛡️ [Quality Gate 2]
       ↓
     [User Intervenes: Selects Gap or Defines Custom]
       ↓
 5️⃣  Methodology Evaluator     (Drafts Primary vs Challenger Methodologies -> Selects Winner)
       ↓
 6️⃣  Format Selection          (Matches winning approach to grant styles + User Override)
       ↓
 7️⃣  Grant Writing             (Detailed content generation constrained by format schema)
       ↓
 8️⃣  Novelty Scoring           (Coarse tree pass → Deep paper comparison → 0-100 Score)
       ↓
[Streamlit Dashboard / LaTeX PDF Export]
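How a quality gate might sit between a stage and the next one can be sketched as below. The three verdicts come from the feature list above; everything else (the `run_with_gate` name, the feedback string passed back to the stage, the revision limit) is an assumption for illustration, and the toy judge stands in for a real LLM-as-a-Judge call.

```python
from typing import Any, Callable, Tuple

VERDICTS = ("PASS", "REVISE", "FAIL")


def run_with_gate(stage: Callable[[str], Any],
                  judge: Callable[[Any], Tuple[str, str]],
                  max_revisions: int = 2) -> Any:
    """Run a stage, re-running it with judge feedback while the verdict is REVISE."""
    feedback = ""
    for _ in range(max_revisions + 1):
        output = stage(feedback)
        verdict, feedback = judge(output)
        if verdict not in VERDICTS:
            raise ValueError(f"unknown verdict: {verdict!r}")
        if verdict == "PASS":
            return output
        if verdict == "FAIL":
            raise RuntimeError(f"quality gate failed: {feedback}")
    raise RuntimeError("output still needs revision after the retry limit")


# Toy stand-ins: the stage adds a theme once it receives feedback,
# and the judge requires at least three themes.
def toy_tree_stage(feedback: str) -> list:
    themes = ["privacy", "aggregation"]
    if feedback:
        themes.append("genomics")
    return themes


def toy_judge(output: list) -> Tuple[str, str]:
    if len(output) >= 3:
        return "PASS", ""
    return "REVISE", "too few themes; broaden the clustering"


gated_themes = run_with_gate(toy_tree_stage, toy_judge)
```

Here the first attempt is sent back with REVISE and the second passes, so `gated_themes` holds three themes.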

Dashboard Workflows in Action

1. Command Center / Overview Dashboard

2. Literature Mining & Corpus Generation

3. Thematic Tree Synthesis

4. Gap Identification & Selection

5. Parallel Methodology Evaluation

6. Generated Proposal & Novelty Scoring

Quickstart

1. Clone & Environment

git clone https://github.com/your-org/vmaro.git
cd vmaro

# Create and sync virtual environment
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

2. Configure API Keys

cp .env.example .env

Edit .env and fill in the keys for your accounts. VMARO uses multiple providers (Gemini / Groq / AWS) and rotates requests across the configured keys round-robin, which helps avoid hitting restrictive free-tier rate limits.

# Foundational LLMs
GROQ_API_KEY_1=your_key
GEMINI_4_AWS_KEY_1=your_key

# External sources (optional; skipped in standard use when left empty)
SEMANTIC_SCHOLAR_KEY=
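The round-robin key handling can be pictured with a short sketch. It assumes that additional keys follow the numbered-suffix pattern shown above (a `GROQ_API_KEY_2` variable is hypothetical; only `_1` appears in the example file), and the helper names are invented for illustration rather than taken from the repository.

```python
import itertools
import os
from typing import Iterator, List, Mapping, Optional


def load_keys(prefix: str, env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Collect PREFIX_1, PREFIX_2, ... until the first missing or empty entry."""
    env = os.environ if env is None else env
    keys = []
    index = 1
    while True:
        value = env.get(f"{prefix}_{index}")
        if not value:
            break
        keys.append(value)
        index += 1
    return keys


def key_cycle(keys: List[str]) -> Iterator[str]:
    """Yield keys in round-robin order, forever."""
    if not keys:
        raise ValueError("no API keys configured")
    return itertools.cycle(keys)


# Demo with an in-memory environment instead of a real .env file.
demo_env = {"GROQ_API_KEY_1": "key-a", "GROQ_API_KEY_2": "key-b"}
rotation = key_cycle(load_keys("GROQ_API_KEY", demo_env))
picked = [next(rotation) for _ in range(3)]
```

With two keys configured, three consecutive requests in this sketch use key-a, key-b, then key-a again.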

3. Run via CLI

To let the automated orchestrator handle everything programmatically:

python main.py --topic "Federated Learning in Bioinformatics"

Want to bypass the parallel methodology evaluation? Add the --no-parallel flag.
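For readers wiring the CLI into scripts, the two flags mentioned here could be parsed along these lines. This is only a sketch of the interface as documented above; the actual argument handling in main.py may differ, and the help strings are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the two documented flags (hypothetical reconstruction)."""
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument("--topic", required=True,
                        help="research topic to run the pipeline on")
    parser.add_argument("--no-parallel", action="store_true",
                        help="skip the primary-vs-challenger methodology evaluation")
    return parser


# Parse an explicit argument list so the example runs without a shell.
args = build_parser().parse_args(
    ["--topic", "Federated Learning in Bioinformatics", "--no-parallel"]
)
```

argparse exposes `--no-parallel` as `args.no_parallel`, which is True here and False when the flag is omitted.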

4. Interactive UI Mode (Recommended)

To use the interactive tree visualizer (Agraph), manual gap selection, and one-click format/PDF generation:

streamlit run app.py

Open http://localhost:8501 in your browser.

Repository Structure

vmaro/
├── agents/
│   ├── literature_agent.py      # Agent 1: Multi-API Fetch & Consolidate
│   ├── tree_agent.py            # Agent 2: Hierarchical Clustering
│   ├── trend_agent.py           # Agent 3: Macro-Signals Identification
│   ├── gap_agent.py             # Agent 4: Target Discovery
│   ├── methodology_agent.py     # Agent 5a: Method generation
│   ├── methodology_evaluator.py # Agent 5b: Primary vs Challenger eval
│   ├── format_matcher.py        # Agent 6: Matching proposal formats
│   ├── grant_agent.py           # Agent 7: Format-compliant Grant Writing
│   └── novelty_agent.py         # Agent 8: Score validation
├── utils/
│   ├── multi_api_fetcher.py     # Scholar, PubMed, arXiv, CrossRef multiplexer
│   ├── schema.py                # Pydantic-like validations, LLM cleanup & Key rotation
│   ├── quality_gate.py          # Quality evaluator middleware
│   ├── format_loader.py         # Loads and registers JSON schemas for Grants
│   └── latex_exporter.py        # Converts generated outputs to PDF / Tex
├── app.py                       # Modern Streamlit UI application
├── main.py                      # CrewAI Orchestrator Execution script
└── ...

📫 Capabilities vs Limitations

Capabilities:

Deduplication: Multi-API fetches eliminate cross-source duplicates.
Robust Fail-Safes: API keys are rotated cyclically. clean_json_response() extracts valid JSON from LLM responses that arrive wrapped in markdown fences or surrounding prose.
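The kind of cleanup that clean_json_response() performs can be illustrated with a small stand-in. The function below, `extract_json`, is a hypothetical sketch written for this README and is not the repository's implementation; it assumes the response contains a single JSON object or array, optionally inside a markdown code fence.

```python
import json
import re
from typing import Any

# Matches a markdown code fence (three backticks), optionally tagged "json".
FENCE = re.compile(r"`{3}(?:json)?\s*(.*?)`{3}", re.DOTALL | re.IGNORECASE)


def extract_json(raw: str) -> Any:
    """Pull the first JSON object or array out of a chatty LLM response."""
    match = FENCE.search(raw)
    candidate = match.group(1) if match else raw
    # Locate the earliest opening bracket and the latest closing bracket.
    starts = [i for i in (candidate.find("{"), candidate.find("[")) if i != -1]
    if not starts:
        raise ValueError("no JSON object or array found")
    start = min(starts)
    end = max(candidate.rfind("}"), candidate.rfind("]"))
    # json.loads raises json.JSONDecodeError (a ValueError) on malformed input.
    return json.loads(candidate[start:end + 1])


fence = "`" * 3
fenced_reply = (
    "Here is the result:\n"
    + fence + "json\n"
    + '{"gaps": ["g1", "g2"]}\n'
    + fence
    + "\nHope that helps."
)
parsed_fenced = extract_json(fenced_reply)
parsed_plain = extract_json("Sure! [1, 2, 3]")
```

A sketch like this still fails on truncated or otherwise malformed JSON, so callers should be prepared to catch ValueError and retry the request.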

Limitations & Future Items:

Paper count is intentionally bounded at 20 to optimize token efficiency and maintain coherent thematic clustering — larger corpora dilute signal without improving output quality at current LLM context limits.
Deeper automated web-searching in the Methodology generation phase for specific up-to-date Python/R package implementations.

License

MIT License. Feel free to fork, expand, and commercialize.
