A production-grade domain-specific RAG engine — ask questions against your documents and receive precise, citation-grounded answers.
```
┌─────────────────────────────────────────────────────────────────┐
│                           RAGPipeline                           │
│                                                                 │
│  Query ──► HybridRetriever ──────────────────────────────────►  │
│              │                                                  │
│              ├── Dense: EmbedderProtocol → DocumentStore ANN    │
│              ├── Sparse: BM25Okapi (in-store corpus)            │
│              ├── Fuse: RRF (K=60, Cormack et al. 2009)          │
│              ├── Expand: child → parent context resolution      │
│              └── Rerank: CrossEncoder ms-marco-MiniLM           │
│                                                                 │
│  Chunks ──► CitationGroundedGenerator ──────────────────────►   │
│              │                                                  │
│              ├── Context block: [src N] (source p.PAGE)         │
│              ├── Claude API call with citation-enforcing prompt │
│              └── Parse + validate [src N] markers               │
│                                                                 │
│  ◄──────────────────────────── CitedAnswer (answer + citations) │
└─────────────────────────────────────────────────────────────────┘
```
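The fusion step in the diagram is reciprocal rank fusion: each retriever contributes 1/(K + rank) per document, with K = 60, so only ranks matter and no score normalisation between BM25 and cosine similarity is needed. A minimal self-contained sketch of the idea (not the repo's actual HybridRetriever code):

```python
# Reciprocal rank fusion (Cormack et al. 2009), illustrative sketch only.
K = 60

def rrf_fuse(*rankings: list[str], k: int = K) -> list[tuple[str, float]]:
    """Fuse ranked lists of document IDs; ranks here are 0-based, so +1."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

dense = ["d3", "d1", "d7"]    # ANN hits, best first
sparse = ["d1", "d9", "d3"]   # BM25 hits, best first
print(rrf_fuse(dense, sparse))  # d1 and d3 rise to the top
```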
Document ingestion:

```
Source (PDF / URL / dir) → Loader → HierarchicalChunker
    → BGEEmbedder / OpenAIEmbedder → DocumentStore (Chroma | Weaviate)
```
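The HierarchicalChunker implements the parent/child scheme of ADR-001: small child chunks are embedded for precise retrieval, and each child points at a larger parent chunk, which is what the retriever hands to the generator (the "Expand: child → parent" step in the pipeline diagram). A hypothetical sketch of the shape of that expansion; field and function names here are illustrative, not the repo's actual models:

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    id: str
    text: str
    parent_id: str | None  # children point at their parent; parents carry None

def expand_to_parents(hits: list[Chunk], parents: dict[str, Chunk]) -> list[Chunk]:
    """Swap retrieved child chunks for their parents, deduplicated, order-preserving."""
    seen: set[str] = set()
    out: list[Chunk] = []
    for child in hits:
        resolved = parents.get(child.parent_id) if child.parent_id else child
        if resolved and resolved.id not in seen:
            seen.add(resolved.id)
            out.append(resolved)
    return out
```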
| Layer | Default | Production | Rationale |
|---|---|---|---|
| Embedding | OpenAI text-embedding-3-small | BAAI/bge-large-en-v1.5 | Zero-infra default; BGE opt-in via [local-embed] |
| Vector store | Chroma (in-process) | Weaviate 1.27.0 | Zero Docker for dev; HNSW tuning + gRPC for prod |
| Retrieval fusion | BM25 + ANN → RRF (K=60) | Same | Parameter-free, robust to miscalibrated retrievers |
| Re-ranker | cross-encoder/ms-marco-MiniLM-L-6-v2 | Same | Strong precision at low latency (~50ms CPU) |
| LLM | claude-sonnet-4-6 | Same | Citation-enforcing prompt, structured output |
| Tracing | Langfuse (opt-in no-op) | Langfuse cloud | Set STRATUM_LANGFUSE_SECRET_KEY to enable; system runs identically without it |
| API | FastAPI + uvicorn | Same | /query endpoint, /health liveness probe |
| UI | Streamlit | Same | Chat interface with citation rendering |
| Eval | DeepEval + Ollama judge | Same | Pytest-native, zero API cost for local judge |
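Citation grounding is enforced rather than trusted: the generator labels each retrieved chunk [src N] in the context block, prompts Claude to cite with those markers, then parses the answer and rejects markers that do not map to a real source. A minimal sketch of that validation step; the regex and names are assumptions, not the repo's parser:

```python
import re

SRC = re.compile(r"\[src (\d+)\]")

def extract_citations(answer: str, num_sources: int) -> list[int]:
    """Return cited source indices; raise if the model cites a nonexistent one."""
    cited = sorted({int(m) for m in SRC.findall(answer)})
    invalid = [i for i in cited if not 1 <= i <= num_sources]
    if invalid:
        raise ValueError(f"answer cites unknown sources: {invalid}")
    return cited

answer = "Multi-head attention attends to different subspaces [src 1][src 3]."
print(extract_citations(answer, num_sources=4))  # [1, 3]
```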
Prerequisites:

- Python 3.11+
- An Anthropic API key
```
git clone https://github.com/adedaramola/stratum.git
cd stratum
pip install -e ".[dev]"
```

```
# .env (create in the stratum/ directory)
STRATUM_ANTHROPIC_API_KEY=sk-ant-...

# Optional — defaults to OpenAI embeddings if omitted
STRATUM_OPENAI_API_KEY=sk-proj-...
```

```
# Single PDF
stratum-ingest --source /path/to/document.pdf
# Directory of PDFs
stratum-ingest --source /path/to/docs/
# Web page
stratum-ingest --source https://example.com/page
```

```
make api
# → FastAPI running at http://localhost:8000
# → Docs at http://localhost:8000/docs
```

```
make ui
# → Streamlit running at http://localhost:8501
```

```python
from rag.pipeline import build_pipeline
from rag.config import get_settings
pipeline = build_pipeline(get_settings())
result = pipeline.query("What is the purpose of multi-head attention?")
print(result.answer)
for c in result.citations:
    print(f"  [{c.index}] {c.source} p.{c.page}")
```

```
curl -X POST http://localhost:8000/query \
  -H "Content-Type: application/json" \
  -d '{"question": "What is the purpose of multi-head attention?"}'
```

```
pip install -e ".[local-embed]"
```
```
# .env
STRATUM_EMBED_BACKEND=local
```

Requires ~2GB RAM. Uses BAAI/bge-large-en-v1.5 (1024-dim).
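The local backend wraps a sentence-transformers model; behaviour is roughly equivalent to this sketch (illustrative, not the repo's BGEEmbedder):

```python
from sentence_transformers import SentenceTransformer

# First call downloads the model weights; inference needs roughly 2GB RAM.
model = SentenceTransformer("BAAI/bge-large-en-v1.5")
vectors = model.encode(["What is multi-head attention?"], normalize_embeddings=True)
print(vectors.shape)  # (1, 1024), matching the 1024-dim figure above
```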
```
make docker-up   # starts Weaviate on localhost:8080
```

```
# .env
STRATUM_STORE_BACKEND=weaviate
```

Re-ingest after switching backends — vectors are not portable between Chroma and Weaviate.
```
# .env
STRATUM_EVAL_JUDGE_BACKEND=openai
STRATUM_OPENAI_API_KEY=sk-proj-...
```

Default judge is local Ollama (llama3.1:8b) at zero API cost.
```
pip install -e ".[observability]"
```

```
# .env
STRATUM_LANGFUSE_PUBLIC_KEY=pk-lf-...
STRATUM_LANGFUSE_SECRET_KEY=sk-lf-...

# Optional — defaults to Langfuse cloud
STRATUM_LANGFUSE_HOST=https://cloud.langfuse.com
```

When keys are absent the system runs identically — all trace calls are no-ops. Tracing covers retrieval (dense hits, RRF scores, rerank), generation (prompt tokens, latency), and end-to-end query spans.
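The runs-identically-without-keys guarantee is the null-object pattern: the pipeline always calls a tracer, and the factory returns a do-nothing implementation when Langfuse keys are absent. A hypothetical sketch of that shape; class and method names are illustrative:

```python
import os
from typing import Protocol

class TracerProtocol(Protocol):
    def span(self, name: str, **attrs) -> None: ...

class NoopTracer:
    def span(self, name: str, **attrs) -> None:
        pass  # deliberately does nothing: zero overhead, zero config

class LangfuseTracer:
    def span(self, name: str, **attrs) -> None:
        ...  # forward to the Langfuse client here

def build_tracer() -> TracerProtocol:
    if os.getenv("STRATUM_LANGFUSE_SECRET_KEY"):
        return LangfuseTracer()
    return NoopTracer()  # absent keys: every span() call is a no-op
```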
The terraform/ directory provisions a production-ready deployment on AWS.
```
Internet
    │
    ▼
ALB (stratum-prod-alb)
    ├── :80   → FastAPI (port 8000)
    └── :8501 → Streamlit UI (port 8501)
    │
    ▼
EC2 t3.medium (stratum-prod-api)
    ├── stratum-api.service (uvicorn, 2 workers)
    └── stratum-ui.service (streamlit)
    │
    ▼ (private network)
EC2 t3.medium (stratum-prod-weaviate)
    └── Weaviate 1.27.0 (Docker, 20GB EBS data volume)

S3 (stratum-prod-docs-*)
    └── Raw document storage
```
Prerequisites:

- AWS CLI configured (`aws configure`)
- Terraform >= 1.5
- An SSH key pair created in AWS EC2
```
cd terraform/

# Copy and fill in secrets
cp terraform.tfvars.example terraform.tfvars
# Edit terraform.tfvars — add API keys, key pair name, GitHub token

terraform init
terraform apply
```

Terraform outputs the live URLs when complete:

```
api_endpoint = "http://<alb-dns>/query"
api_docs_url = "http://<alb-dns>/docs"
ui_url = "http://<alb-dns>:8501"
```

```
# 1. Upload PDF to S3
aws s3 cp my-document.pdf s3://$(terraform output -raw documents_bucket_name)/raw/

# 2. SSH into the API instance
ssh -i ~/.ssh/<key>.pem ec2-user@$(terraform output -raw api_instance_public_ip)

# 3. Download and ingest
sudo aws s3 cp s3://<bucket>/raw/my-document.pdf /opt/stratum/data/raw/my-document.pdf
sudo chown stratum:stratum /opt/stratum/data/raw/my-document.pdf
sudo -u stratum bash -c 'cd /opt/stratum && /opt/stratum/.venv/bin/stratum-ingest \
    --source /opt/stratum/data/raw/my-document.pdf'

# 4. Restart services to pick up the new BM25 corpus
sudo systemctl restart stratum-api stratum-ui
```

Tear down everything with:

```
terraform destroy
```

| Command | Description |
|---|---|
| `make install` | Install all dev dependencies + pre-commit hooks |
| `make lint` | ruff check + format check |
| `make format` | ruff format + autofix |
| `make typecheck` | mypy strict mode |
| `make test-unit` | Unit tests (no network, no API keys) |
| `make test-integration` | Integration tests (Chroma: no Docker; Weaviate: skipped without Docker) |
| `make eval` | DeepEval gate (requires Ollama + golden dataset) |
| `make ingest SOURCE=path` | Ingest a document or directory |
| `make api` | Start FastAPI on port 8000 |
| `make ui` | Start Streamlit UI on port 8501 |
| `make docker-up` | Start Weaviate service container |
| `make ci` | Full CI: lint + typecheck + unit tests |
| `make clean` | Remove build artefacts and caches |
```
stratum/
├── src/rag/              # All package code (PEP 517 src layout)
│   ├── interfaces/       # typing.Protocol definitions — the public contract
│   ├── ingestion/        # Loaders, chunker, embedders
│   ├── store/            # Chroma, Weaviate, factory
│   ├── retrieval/        # HybridRetriever (BM25 + dense + RRF + rerank)
│   ├── generation/       # CitationGroundedGenerator
│   ├── evaluation/       # DeepEval harness + Ollama judge adapter
│   ├── api/              # FastAPI app (main.py)
│   ├── scripts/          # CLI entry points (stratum-ingest)
│   ├── config.py         # Pydantic Settings — one object rules all config
│   ├── exceptions.py     # Domain exception hierarchy
│   └── pipeline.py       # RAGPipeline + build_pipeline() factory
├── tests/
│   ├── unit/             # No network, no Docker — fast (94 tests)
│   ├── integration/      # Chroma: no Docker; Weaviate: skipped without Docker
│   └── e2e/              # DeepEval gate (weekly CI schedule)
├── terraform/            # AWS infrastructure (EC2, ALB, Weaviate, S3)
├── docs/                 # ADRs and evaluation guide
├── config/               # eval_thresholds.yaml
├── app.py                # Streamlit chat UI
└── data/golden/          # Golden QA pairs for DeepEval evaluation
```
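The interfaces/ package is what keeps the backends swappable: embedders and stores are typed against typing.Protocol contracts, and config.py resolves which concrete class to build from STRATUM_-prefixed environment variables. A hypothetical sketch of both halves; method names and fields are assumptions, not the repo's actual definitions:

```python
from typing import Protocol
from pydantic_settings import BaseSettings, SettingsConfigDict

class EmbedderProtocol(Protocol):
    """Structural contract; BGEEmbedder and OpenAIEmbedder would both satisfy it."""
    def embed(self, texts: list[str]) -> list[list[float]]: ...

class Settings(BaseSettings):
    model_config = SettingsConfigDict(env_prefix="STRATUM_", env_file=".env")
    embed_backend: str = "openai"   # read from STRATUM_EMBED_BACKEND
    store_backend: str = "chroma"   # read from STRATUM_STORE_BACKEND
```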
| Phase | Scope | Status |
|---|---|---|
| Phase 1 | Foundation: config, exceptions, interfaces | ✅ Complete |
| Phase 2 | Stores, ingestion, retrieval, generation, unit tests | ✅ Complete |
| Phase 3 | FastAPI, Streamlit UI, AWS deployment, DeepEval gate | ✅ Complete |
| Phase 4 | Golden dataset curation, baseline establishment, eval gate activation | 🔄 In progress |
| Phase 5 | Learned fusion weights, streaming responses, multi-tenancy | 📋 Planned |
DeepEval runs weekly (Monday 06:00 UTC) and on manual trigger — not on every PR. Uses a local Ollama judge by default (zero API cost). See docs/evaluation.md.
```
# Pull the judge model
make ollama-pull

# Run locally
make eval
```
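Because the gate is pytest-native, a golden QA pair is just a test function. A hedged sketch of what one case might look like with DeepEval's standard API; the metric choice and threshold wiring here are illustrative (see docs/evaluation.md for the real harness):

```python
from deepeval import assert_test
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase

from rag.config import get_settings
from rag.pipeline import build_pipeline

def test_golden_attention_question():
    question = "What is the purpose of multi-head attention?"
    result = build_pipeline(get_settings()).query(question)
    case = LLMTestCase(input=question, actual_output=result.answer)
    # threshold values live in config/eval_thresholds.yaml in the real harness
    assert_test(case, [AnswerRelevancyMetric(threshold=0.7)])
```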
| # | Decision | Status |
|---|---|---|
| ADR-001 | Hierarchical chunking (parent/child) | Accepted |
| ADR-002 | Dual store backend (Chroma + Weaviate) | Accepted |
| ADR-003 | Separate parent/child collections | Accepted |
| ADR-004 | BM25 corpus co-located with vector store | Accepted |
| ADR-005 | Drop `unstructured` dependency | Accepted |
| ADR-006 | Citation grounding enforced at generation time | Accepted |
| ADR-007 | DeepEval over RAGAS with local Ollama judge | Accepted |
MIT