LLM Tree Reasoning ◦ Knowledge Graph Multi-Hop ◦ Pixel-Precise Citations ◦ Unmatched Performance
Many approaches have been proposed to go beyond naive chunk-and-embed RAG, but each has fundamental limitations:
| Approach | Strength | Limitation |
|---|---|---|
| Embedding-based (e.g. naive RAG) | Fast semantic search | Similarity ≠ relevance; misses exact-match and structural context |
| Graph-based (e.g. GraphRAG) | Cross-document entity linking | Concept skeleton without source-text evidence; extraction loses details |
| Hybrid graph (e.g. LightRAG) | Dual-level retrieval (local + global) | Answers synthesized from KG summaries, not grounded in original text; higher hallucination risk |
| Reasoning-based (e.g. PageIndex) | High single-doc accuracy | Query latency scales linearly with document count; not production-ready |
When a domain expert encounters a question, they don't scan every page — they instantly recall where relevant information lives, draw on their mental map of how concepts connect, then synthesize a grounded answer from multiple sources. ForgeRAG mirrors this workflow: BM25 + vector search surfaces candidate regions in milliseconds, a knowledge graph provides the conceptual connections across documents, and LLM tree navigation reasons over document structure to pinpoint the exact sections that matter — all fused into a single answer with traceable citations.
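The lexical half of the pre-filter described above can be illustrated with a small, self-contained Okapi BM25 scorer. This is only a sketch of the standard formula: the whitespace tokenizer, the `k1` and `b` values, and the sample section texts are assumptions made for illustration, not ForgeRAG's actual implementation.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document in `docs` against `query` with Okapi BM25."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalization: longer-than-average documents are penalized.
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Toy section texts (illustrative only).
sections = [
    "supplier contracts for display panels",
    "quarterly revenue summary",
    "display panels supplier audit report",
]
scores = bm25_scores("display supplier", sections)
# Indices of sections, best match first.
ranked = sorted(range(len(sections)), key=lambda i: scores[i], reverse=True)
```

A section that shares no terms with the query scores zero, so it drops out of the candidate set before the more expensive LLM tree navigation runs.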
To handle multi-hop questions (e.g. "Which suppliers of Apple also supply Samsung?"), we introduce a knowledge graph path that extracts entities and relations at ingestion time, then runs Leiden community detection with LLM-generated summaries to enable high-level thematic retrieval alongside entity-level traversal. Inspired by LightRAG's context assembly, the KG path injects synthesized entity descriptions, relation summaries, and community overviews directly into the generation prompt — giving the LLM a "distilled knowledge layer" on top of raw text chunks.
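To make the multi-hop idea concrete, here is a minimal sketch of answering the example question from extracted triples. The triple format, the helper names, and the placeholder supplier names are all hypothetical; the real graph stores are NetworkX or Neo4j, and their schema is not shown in this README.

```python
from collections import defaultdict

# Toy (subject, relation, object) triples with placeholder supplier names.
triples = [
    ("SupplierA", "supplies", "Apple"),
    ("SupplierA", "supplies", "Samsung"),
    ("SupplierB", "supplies", "Apple"),
    ("SupplierC", "supplies", "Apple"),
    ("SupplierC", "supplies", "Samsung"),
]


def build_index(triples):
    """Index triples as relation -> object -> set of subjects."""
    index = defaultdict(lambda: defaultdict(set))
    for subj, rel, obj in triples:
        index[rel][obj].add(subj)
    return index


def common_subjects(index, relation, obj_a, obj_b):
    """Entities linked by `relation` to both `obj_a` and `obj_b`."""
    return index[relation][obj_a] & index[relation][obj_b]


index = build_index(triples)
shared = sorted(common_subjects(index, "supplies", "Apple", "Samsung"))
```

Answering this from raw chunks would require the two supplier facts to appear in the same retrieved passage; with the graph, the answer is a set intersection over two one-hop neighborhoods.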
We evaluate against LightRAG using the UltraDomain benchmark methodology (LLM-as-judge pairwise comparison). Win rates shown as ForgeRAG% / LightRAG%.
🚧 More comprehensive benchmarks against additional RAG systems, domains, and metrics are in progress.
| Domain | Comprehensiveness | Diversity | Empowerment | Overall |
|---|---|---|---|---|
| Agriculture | 58.6 / 41.4 | 47.1 / 52.9 | 52.9 / 47.1 | 56.4 / 43.6 |
| CS | 55.6 / 44.4 | 48.4 / 51.6 | 54.0 / 46.0 | 54.8 / 45.2 |
| Legal | 57.0 / 43.0 | 46.5 / 53.5 | 53.5 / 46.5 | 55.6 / 44.4 |
| Mix | 56.3 / 43.7 | 47.8 / 52.2 | 54.3 / 45.7 | 55.1 / 44.9 |
Judge: qwen3-max · Reproduce
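As a reading aid for the table above, the sketch below shows how a pairwise win rate can be tallied from judge verdicts, and how the per-domain Overall figures combine. The verdict labels and the `win_rates` helper are hypothetical. Taking the unweighted mean of the four Overall values gives 55.475, which rounds to the 55.48% headline figure quoted in this README; whether the project computes its headline exactly this way is an assumption.

```python
from collections import Counter
from fractions import Fraction


def win_rates(verdicts):
    """Return (a_pct, b_pct) from a list of 'A'/'B' judge verdicts."""
    counts = Counter(verdicts)
    total = counts["A"] + counts["B"]
    return 100 * counts["A"] / total, 100 * counts["B"] / total


# Toy verdicts for illustration; these are not benchmark data.
a_pct, b_pct = win_rates(["A", "B", "A", "A"])

# ForgeRAG "Overall" column from the table, in tenths of a percent so the
# arithmetic stays exact: 56.4, 54.8, 55.6, 55.1.
overall_tenths = [564, 548, 556, 551]
mean_overall = Fraction(sum(overall_tenths), 10 * len(overall_tenths))
```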
Note on Faithfulness: The UltraDomain benchmark evaluates Comprehensiveness, Diversity, and Empowerment, but not factual accuracy. ForgeRAG provides pixel-precise `[c_N]` citations for every claim, enabling verification against source text. LightRAG synthesizes answers from knowledge graph summaries without traceable citations, which scores well on breadth but carries higher hallucination risk.
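The sketch below illustrates how `[c_N]` markers in an answer could be resolved to a page and bounding box for verification. The `Citation` record, its field names, and the sample coordinates are hypothetical; the actual citation schema is defined by the project's API, not by this example.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    """Hypothetical citation record: a page number plus a bounding box."""
    doc_id: str
    page: int
    bbox: tuple  # (x0, y0, x1, y1) in page coordinates

CITATION_RE = re.compile(r"\[c_(\d+)\]")


def resolve_citations(answer, citations):
    """Return the Citation for each [c_N] marker, in order of first appearance."""
    seen = []
    for match in CITATION_RE.finditer(answer):
        idx = int(match.group(1))
        # Skip unknown ids and repeated markers.
        if idx in citations and idx not in seen:
            seen.append(idx)
    return [citations[i] for i in seen]


# Illustrative data only.
citations = {
    1: Citation("contract.pdf", 3, (72.0, 140.5, 523.0, 182.0)),
    2: Citation("contract.pdf", 7, (72.0, 300.0, 523.0, 344.5)),
}
answer = "The term is five years [c_2], renewable once [c_1][c_2]."
resolved = resolve_citations(answer, citations)
```

Each resolved record carries enough information for a viewer to open the cited page and highlight the exact region, which is what makes a claim checkable against the source text.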
Compared to heavier platforms like RAGFlow, ForgeRAG focuses on core pipeline design — a lean retrieval-answering chain you can deploy out of the box.
🔍 Dual-reasoning retrieval · BM25 + vector pre-filter → LLM tree nav + KG, fused via RRF
📌 Pixel-precise citations · Every claim links to exact page + bounding box, click to highlight
🔗 Full retrieval tracing · Inspect path scores, expansion decisions, and merge logic per query
💬 Multi-turn conversations · Context-aware follow-ups with conversation history
📄 Multi-format ingestion · PDF, DOCX, PPTX, XLSX, HTML, Markdown, TXT
🔌 Pluggable & web config · Swap any backend via Web UI, apply & restart in one click
🏆 Outperforms LightRAG · 55.48% overall win rate on UltraDomain benchmark
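The RRF fusion named in the feature list can be sketched in a few lines. This is the standard Reciprocal Rank Fusion formula, where each item scores the sum of `1 / (k + rank)` over the lists it appears in. The constant `k=60` is a commonly used default, and the section identifiers are made up; the value and identifiers ForgeRAG actually uses are not stated here.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists with Reciprocal Rank Fusion."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by item name for determinism.
    return sorted(scores, key=lambda item: (-scores[item], item))


# Hypothetical outputs of the two retrieval paths, best hit first.
tree_path = ["sec-3.2", "sec-1.1", "sec-4.0"]
kg_path = ["sec-1.1", "sec-5.3", "sec-3.2"]
fused = rrf_fuse([tree_path, kg_path])
```

Because RRF uses only ranks, the two paths do not need comparable score scales: an item ranked well by both paths rises above one ranked first by a single path.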
📸 Screenshots
Chat · Structured answers with pixel-precise citations
Ingestion · Document processing pipeline with tree building
Knowledge Graph · Entity-relation visualization
- Python 3.10+
- Node.js 18+ (for building the frontend)
- An LLM API key (OpenAI, DeepSeek, or any LiteLLM-compatible provider)
- Recommended: 4+ CPU cores, 8GB+ RAM (16GB+ for large documents with KG extraction)
git clone https://github.com/deeplethe/ForgeRAG.git
cd ForgeRAG
# Python dependencies
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install -r requirements.txt
# Frontend
cd web && npm install && npm run build && cd ..
# Configure (interactive wizard: pick provider, set keys, done)
python scripts/setup.py
# Run (use multiple workers for responsive UI during ingestion)
python main.py --workers 4
Open http://localhost:8000. The web UI is served automatically.
Note: We recommend running with `--workers 4` (or more). Document ingestion involves heavy LLM calls (tree building, KG extraction, embedding) that can block the API if only one worker is running. Multiple workers ensure the web UI stays responsive while documents are being processed in the background.
git clone https://github.com/deeplethe/ForgeRAG.git
cd ForgeRAG
python scripts/docker_setup.py # Interactive wizard: pick provider, set keys, done
docker compose up -d # PostgreSQL + pgvector + ForgeRAG, ready to go
Open http://localhost:8000. See Deployment Guide for details.
Tip: We strongly recommend enabling MinerU — it significantly improves document structure parsing accuracy, especially for PDFs with complex layouts, tables, and formulas. Enable it in the web UI settings after startup.
| Component | Options |
|---|---|
| PDF Parser | PyMuPDF (fast) → MinerU (layout-aware, table/formula) → VLM (vision-language) |
| Relational DB | SQLite (default), PostgreSQL, MySQL |
| Vector Store | ChromaDB (default), pgvector (PostgreSQL), Qdrant, Milvus, Weaviate |
| Blob Storage | Local filesystem (default), Amazon S3, Alibaba OSS |
| Graph Store | NetworkX in-memory (default), Neo4j |
| LLM / Embeddings | Any LiteLLM-supported provider: OpenAI, Azure, Anthropic, Ollama, DeepSeek, Cohere, etc. |
| Flag | Default | Description |
|---|---|---|
| `--config` | auto-detect | Path to forgerag.yaml |
| `--host` | `0.0.0.0` | Bind address (or $FORGERAG_HOST) |
| `--port` | `8000` | Bind port (or $FORGERAG_PORT) |
| `--reload` | off | Hot-reload for development |
| `--workers` | `4` | Uvicorn workers |
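For readers wondering how the flags and environment variables in the table above might interact, here is a minimal `argparse` sketch. It is not the project's `main.py`: the `build_parser` helper is hypothetical, and the precedence shown (explicit flag over environment variable over built-in default) is an assumption.

```python
import argparse
import os


def build_parser():
    """Hypothetical parser mirroring the documented flags and defaults."""
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument("--config", default=None,
                        help="Path to forgerag.yaml (auto-detected when omitted)")
    # Assumed precedence: CLI flag > environment variable > built-in default.
    parser.add_argument("--host",
                        default=os.environ.get("FORGERAG_HOST", "0.0.0.0"))
    parser.add_argument("--port", type=int,
                        default=int(os.environ.get("FORGERAG_PORT", "8000")))
    parser.add_argument("--reload", action="store_true",
                        help="Hot-reload for development")
    parser.add_argument("--workers", type=int, default=4,
                        help="Uvicorn workers")
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(["--port", "9000", "--workers", "8"])
```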
The diagram above shows the complete data flow. For detailed pipeline documentation with per-node annotations, see Architecture Overview.
The REST API is available at /api/v1/. Interactive docs:
- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
Key endpoints:
| Endpoint | Description |
|---|---|
| `POST /api/v1/query` | Ask a question (streaming SSE or sync) |
| `POST /api/v1/documents` | Upload and ingest a document |
| `GET /api/v1/documents/{id}/tree` | Document hierarchical structure |
| `GET /api/v1/graph` | Knowledge graph visualization |
| `PUT /api/v1/settings/key/{key}` | Update config at runtime |
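A query request can be built with the Python standard library alone, as sketched below. Only the endpoint path and HTTP method come from the table above. The JSON field name `query` and the `build_query_request` helper are assumptions for illustration, so check the API Reference or the Swagger UI for the real request schema.

```python
import json
import urllib.request


def build_query_request(question, base_url="http://localhost:8000"):
    """Build (but do not send) a POST request to the query endpoint."""
    # ASSUMPTION: the field name "query" is illustrative, not confirmed.
    body = json.dumps({"query": question}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/v1/query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_query_request("Which suppliers of Apple also supply Samsung?")

# To send it against a running server:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

Building the request separately from sending it keeps the example runnable without a server and makes the payload easy to inspect.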
- Getting Started — Installation, first document, step-by-step guide
- Architecture Overview — How ingestion, retrieval, and answering pipelines work
- Configuration Reference — Every config option with defaults and examples
- API Reference — REST API endpoints, request/response formats, SSE streaming
- Deployment Guide — Docker deploy, production checklist, Nginx, Ollama
- Development Guide — Dev setup, testing, adding new backends
ForgeRAG/
├── api/ # FastAPI routes and schemas
├── answering/ # Answer generation pipeline
├── config/ # Pydantic configuration models
├── embedder/ # Embedding backends (LiteLLM, sentence-transformers)
├── graph/ # Knowledge graph stores (NetworkX, Neo4j)
├── ingestion/ # Document ingestion pipeline + format conversion
├── parser/ # PDF parsing, chunking, tree building
├── persistence/ # Database layer (relational, vector, blob)
├── retrieval/ # Retrieval pipeline (BM25, vector, tree, KG, merge)
├── scripts/ # CLI utilities (setup wizard, Docker setup, batch ingest)
├── web/ # Vue 3 frontend
├── docs/ # Detailed documentation
├── main.py # Application entry point
└── forgerag.yaml # Your local config (git-ignored)
- 🧪 More benchmarks against additional RAG systems and domains
- 🔄 Scale to 1M+ documents · incremental indexing, async KG
- 🌐 Multi-language retrieval · cross-lingual query and document support
- 📦 Python SDK · `pip install forgerag-sdk`
- 🛠️ Config panel hints & diagnostics · Missing provider warnings, validation feedback
- ⚡ Performance optimization · Faster ingestion, query caching, async embedding
We welcome contributions of all kinds — bug fixes, new features, documentation improvements, and more.
Please read our Contributing Guide before submitting a pull request.


