A distributed Retrieval-Augmented Generation (RAG) system that answers user questions by retrieving relevant documents from a vector store and generating answers using an LLM. The system is split into separate services (orchestrator, retriever, generator) so each can be scaled and deployed independently.
- Domain-aware retrieval: Queries are categorized into domains (e.g.
overview,concepts); the orchestrator routes to the appropriate retriever. - Vector search: Documents are embedded with Sentence Transformers and indexed with pgvector for fast similarity search.
- LLM generation: Answers are generated via Groq (configurable model).
- PostgreSQL-backed documents: Document metadata and content are stored in PostgreSQL; the retriever loads by domain and builds hnsw index stored in pgvector.
flowchart TB
Client["Client (Browser)"]
subgraph orchestrator["Orchestrator (FastAPI)"]
O["Categorize query to domain → Route to domain-specific retriever → Call generator with contexts + prompt"]
end
subgraph retrievers["Retrievers (FastAPI) — distributed by domain"]
R1["Retriever: overview"]
R2["Retriever: tech-concepts"]
R3["Retriever: operations+trouble_shooting"]
end
subgraph generator["Generator (FastAPI)"]
G["Build prompt → Groq API call → Return answer"]
end
subgraph db["PostgreSQL"]
D["documents: id, title, content, domain, ..."]
end
Client -->|"POST /query"| orchestrator
orchestrator -->|"route by domain"| R1
orchestrator -->|"route by domain"| R2
orchestrator -->|"route by domain"| R3
orchestrator -->|"POST /generate (contexts, prompt)"| generator
R1 & R2 & R3 -->|"SQL (documents by domain)"| db
- Orchestrator: Single entry point. Classifies the query into a domain, calls the right retriever, then the generator; returns the final answer.
- Retriever: Per-domain (or shared) service. Reads documents from PostgreSQL for its
DOMAIN, embeds them, builds an index, and serves/queryfor semantic search. - Generator: Accepts a list of contexts and the user prompt, builds a RAG prompt, calls the LLM (Groq), returns
{ answer, model, latency_ms }. - PostgreSQL: Stores the
documentstable and acts as the source of truth; schema inretriever/init_db.sql.
├── orchestrator/ # Query routing, domain classification, orchestration
├── retriever/ # Document loading, embedding, /query API
├── generator/ # Prompt building, Groq LLM, /generate API
├── client/ # Optional React + Vite chat UI (proxies to orchestrator)
├── docs/ # Architecture, PostgreSQL, troubleshooting
├── scripts/ # setup-venv.sh, run-all.sh, etc.
└── docker-compose.yml # postgres(pgvector), retriever, generator, orchestrator
-
Orchestrator: Create
orchestrator/.envwith:GENERATOR_URL=http://generator:8000/generateRETRIEVER_OVERVIEW_URL=http://retriever:8000/queryRETRIEVER_CONCEPTS_URL=http://retriever:8000/query
-
Generator: Create
generator/.envwith:GROQ_API_KEY=<your-groq-api-key>MODEL_NAME(optional, default used if not set)
-
Run stack:
docker compose up -d
- Orchestrator: http://localhost:8000
- Generator: http://localhost:8001
- Retriever: http://localhost:8002
- Postgres: localhost:5432
-
Query:
curl -X POST http://localhost:8000/query \ -H "Content-Type: application/json" \ -d '{"query_text": "What is this project about?", "top_k": 5}'
- PostgreSQL: Run Postgres (e.g.
docker compose up -d postgres) and applyretriever/init_db.sqlif needed. - Python env:
./scripts/setup-venv.shthen activate the venv. - Environment: Set
.envinorchestrator/,generator/,retriever/(see each service anddocs/POSTGRES_CONNECTION.md). - Run services:
./scripts/run-all.sh(or run orchestrator, retriever, generator in separate terminals).
- Dev: From
client/, runnpm install && npm run dev(e.g. http://localhost:5173). Point the app at the orchestrator (e.g. Vite proxy/api→http://localhost:8000). - Prod: Build and serve the client (e.g. nginx) and proxy
/apito the orchestrator.
| Component | Key env vars |
|---|---|
| Orchestrator | GENERATOR_URL, RETRIEVER_OVERVIEW_URL, RETRIEVER_CONCEPTS_URL |
| Retriever | DATABASE_URL, DOMAIN (e.g. overview, concepts) |
| Generator | GROQ_API_KEY, MODEL_NAME |
Domain classification uses orchestrator/domain_samples.json: each key is a domain name, and the value is a list of example queries used to compute domain embeddings. The orchestrator embeds the user query and picks the domain with the highest similarity.