Document RAG

Local document RAG with hybrid retrieval, cross-encoder reranking, and inline clickable citations. Single-user; infrastructure runs in Docker, with an optional fully containerised mode.

Stack: Docling · Crawl4AI · faster-whisper · BGE-M3 · Qdrant · BGE-reranker-v2-m3 · OpenAI · FastAPI · Next.js · Celery · Postgres · Redis · Ragas

Layout

apps/
  api/        FastAPI: ingest, chat (SSE), library, jobs (SSE), originals
  worker/     Celery: parse → chunk → embed → upsert
  web/        Next.js + Tailwind: Sources, Library, Chat, Settings
packages/
  rag-core/   Reusable Python library: parsers, chunking, embedding,
              vectorstore, reranker, retrieval, generator, pipeline
infra/
  docker-compose.yml + Dockerfile.gpu + Dockerfile.web
eval/         Ragas golden-set runner + run differ
data/
  originals/  Uploaded source files (one dir per document)
  models/     Embedder/reranker/Whisper weights cache
docs/         architecture, ingestion, retrieval, eval, runbook

Quickstart

Prereqs: Docker (for infra only), uv, pnpm, make, an OpenAI API key, and ~16 GB of VRAM on the host for the embedder + reranker (+ Whisper for audio). The NVIDIA Container Toolkit is not needed: api/worker run in the host venv with direct GPU access.

cp .env.example .env       # set OPENAI_API_KEY
make install               # uv sync + pnpm install
make up                    # qdrant + postgres + redis (in Docker)
make migrate               # apply Postgres schema (once)

Then run the app on the host (three terminals):

make api          # uvicorn on :8000 (GPU)
make worker       # celery worker (GPU)
make web          # next dev on :3000

Open http://localhost:3000.

Fully containerised stack (optional): if you have the NVIDIA Container Toolkit and prefer everything in Docker, run make up-full instead. It builds the api/worker/web containers and starts them alongside the infra.

Highlights

One model, two views. BGE-M3 produces dense + sparse vectors in a single forward pass. Qdrant uses both with RRF fusion — no separate BM25 service.
Cross-encoder rerank. BGE-reranker-v2-m3 rescores the top 50 hybrid candidates and keeps the best 8. Toggleable for ablation.
Citations are real. The generator is forced to cite [n] from the passed sources; the UI parses these markers as the response streams and turns them into chips that open a side drawer with the chunk preview and a link to the original file.
Multilingual. EN / RU / HE supported by BGE-M3 + reranker; Hebrew renders RTL automatically.
Audio + video + images + websites. Docling handles documents and OCR; Crawl4AI fetches single URLs or crawls recursively; faster-whisper transcribes audio and video.
Eval from day one. make eval runs Ragas (faithfulness, context precision/recall, answer relevancy) plus a deterministic citation-recall check against eval/golden.jsonl.
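
To make the RRF fusion mentioned under "One model, two views" concrete, here is a minimal plain-Python sketch of Reciprocal Rank Fusion. In this project the fusion happens inside Qdrant, so this is only an illustration of the formula, not the code path the app uses. The function name rrf_fuse, the chunk ids, and the constant k=60 (a commonly used value) are assumptions for the example.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60, limit=None):
    """Fuse several ranked lists of ids with Reciprocal Rank Fusion.

    Each id scores sum(1 / (k + rank)) over the lists it appears in,
    where rank is 1-based. k=60 is an assumed, conventional constant.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for determinism.
    fused = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return fused if limit is None else fused[:limit]


# Hypothetical chunk ids returned by the dense and sparse searches.
dense_hits = ["c3", "c1", "c7"]
sparse_hits = ["c1", "c9", "c3"]

fused = rrf_fuse([dense_hits, sparse_hits])
for doc_id, score in fused:
    print(f"{doc_id}  {score:.5f}")
```

Chunks that appear in both lists ("c1", "c3") rise above chunks found by only one view, which is the property that lets hybrid retrieval work without tuning score scales between dense and sparse results.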
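
The citation handling described under "Citations are real" can be sketched as follows. The real parsing lives in the web UI; this Python version only illustrates the idea of finding [n] markers and keeping those that point at a source actually passed to the generator. The names Citation and extract_citations, and the rule of dropping out-of-range numbers, are assumptions for the example.

```python
import re
from dataclasses import dataclass

# Matches inline markers such as [1] or [12].
CITATION_RE = re.compile(r"\[(\d+)\]")


@dataclass(frozen=True)
class Citation:
    number: int  # the n in [n], 1-based index into the passed sources
    start: int   # character offset where the marker begins
    end: int     # character offset just past the marker


def extract_citations(answer, num_sources):
    """Return [n] markers that refer to one of the passed sources.

    Markers whose number is outside 1..num_sources are ignored
    (an assumed policy for this sketch).
    """
    found = []
    for match in CITATION_RE.finditer(answer):
        number = int(match.group(1))
        if 1 <= number <= num_sources:
            found.append(Citation(number, match.start(), match.end()))
    return found


# Hypothetical generator output with three sources passed in.
answer = "Qdrant fuses both vectors [1]. Reranking follows [3][7]."
citations = extract_citations(answer, num_sources=3)
print([c.number for c in citations])
```

When the answer arrives as a stream, a marker can be split across two chunks, so a streaming parser would need to hold back a trailing partial marker until the next chunk arrives.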

Documentation

Architecture — components and data flow
Ingestion — parsers and chunking
Retrieval — hybrid search, rerank, citations
Eval — golden set + Ragas + diff workflow
Runbook — operations, failure modes, backups

Configuration

All runtime knobs live in .env. See .env.example for the full list. Common toggles:

Variable	Default	What it does
`OPENAI_MODEL`	`gpt-5`	Generator
`RETRIEVAL_TOP_K`	`8`	Final passages sent to the LLM
`RETRIEVAL_PREFETCH`	`50`	Hybrid candidates before rerank
`RERANK_ENABLED`	`true`	Cross-encoder rerank stage
`QUERY_REWRITING_ENABLED`	`true`	Optional query rewrite for retrieval
`EMBEDDING_MODEL`	`BAAI/bge-m3`	Embedder (dense + sparse vectors)
`RERANKER_MODEL`	`BAAI/bge-reranker-v2-m3`	Cross-encoder reranker
`WHISPER_MODEL`	`distil-large-v3`	Audio/video transcription
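
As an illustration of how the retrieval toggles above might be read at startup, here is a minimal sketch using only the standard library. The variable names and defaults come from the table; the RetrievalSettings class, the accepted boolean spellings, and the check that top-k does not exceed the prefetch size are assumptions, and the project's actual settings code may differ. It also assumes the values from .env are already exported into the process environment.

```python
import os
from dataclasses import dataclass


def _env_bool(env, name, default):
    """Parse a boolean flag; accepted spellings are an assumption."""
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class RetrievalSettings:
    top_k: int                     # RETRIEVAL_TOP_K
    prefetch: int                  # RETRIEVAL_PREFETCH
    rerank_enabled: bool           # RERANK_ENABLED
    query_rewriting_enabled: bool  # QUERY_REWRITING_ENABLED

    @classmethod
    def from_env(cls, env=None):
        env = os.environ if env is None else env
        settings = cls(
            top_k=int(env.get("RETRIEVAL_TOP_K", "8")),
            prefetch=int(env.get("RETRIEVAL_PREFETCH", "50")),
            rerank_enabled=_env_bool(env, "RERANK_ENABLED", True),
            query_rewriting_enabled=_env_bool(env, "QUERY_REWRITING_ENABLED", True),
        )
        # Assumed sanity check: the final cut cannot be larger than
        # the candidate pool it is taken from.
        if settings.top_k > settings.prefetch:
            raise ValueError("RETRIEVAL_TOP_K must not exceed RETRIEVAL_PREFETCH")
        return settings


# Passing a dict instead of os.environ keeps the example deterministic.
defaults = RetrievalSettings.from_env({})
ablation = RetrievalSettings.from_env({"RERANK_ENABLED": "false"})
print(defaults)
print(ablation.rerank_enabled)
```

Setting RERANK_ENABLED=false in this way corresponds to the ablation toggle mentioned in the highlights: the hybrid candidates would be truncated to the top-k without the cross-encoder pass.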

Development

make fmt        # ruff format
make lint       # ruff check + mypy
make test       # pytest (rag-core unit tests)

API: apps/api/app/main.py. Worker: apps/worker/worker/celery_app.py. Web dev server: pnpm --filter web dev.
