A production-grade domain-specific RAG engine — ask questions against your documents and receive precise, citation-grounded answers.
```
┌─────────────────────────────────────────────────────────────────┐
│                           RAGPipeline                           │
│                                                                 │
│  Query ──► HybridRetriever ──────────────────────────────────►  │
│              │                                                  │
│              ├── Dense: EmbedderProtocol → DocumentStore ANN    │
│              ├── Sparse: BM25Okapi (in-store corpus)            │
│              ├── Fuse: RRF (K=60, Cormack et al. 2009)          │
│              ├── Expand: child → parent context resolution      │
│              └── Rerank: CrossEncoder ms-marco-MiniLM           │
│                                                                 │
│  Chunks ──► CitationGroundedGenerator ──────────────────────►   │
│              │                                                  │
│              ├── Context block: [src N] (source p.PAGE)         │
│              ├── Claude API call with citation-enforcing prompt │
│              └── Parse + validate [src N] markers               │
│                                                                 │
│  ◄──────────────────────────── CitedAnswer (answer + citations) │
└─────────────────────────────────────────────────────────────────┘
```
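The fusion step in the diagram is reciprocal rank fusion: each retriever contributes 1/(K + rank) per document, with K = 60, so only ranks matter and no score normalisation between BM25 and cosine similarity is needed. A minimal self-contained sketch of the idea (not the repo's actual HybridRetriever code):

```python
# Reciprocal rank fusion (Cormack et al. 2009), illustrative sketch only.
K = 60

def rrf_fuse(*rankings: list[str], k: int = K) -> list[tuple[str, float]]:
    """Fuse ranked lists of document IDs; ranks here are 0-based, so +1."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

dense = ["d3", "d1", "d7"]    # ANN hits, best first
sparse = ["d1", "d9", "d3"]   # BM25 hits, best first
print(rrf_fuse(dense, sparse))  # d1 and d3 rise to the top
```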
Document ingestion:

```
Source (PDF / URL / dir) → Loader → HierarchicalChunker
    → BGEEmbedder / OpenAIEmbedder → DocumentStore (Chroma | Weaviate)
```
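The HierarchicalChunker implements the parent/child scheme of ADR-001: small child chunks are embedded for precise retrieval, and each child points at a larger parent chunk, which is what the retriever hands to the generator (the "Expand: child → parent" step in the pipeline diagram). A hypothetical sketch of the shape of that expansion; field and function names here are illustrative, not the repo's actual models:

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    id: str
    text: str
    parent_id: str | None  # children point at their parent; parents carry None

def expand_to_parents(hits: list[Chunk], parents: dict[str, Chunk]) -> list[Chunk]:
    """Swap retrieved child chunks for their parents, deduplicated, order-preserving."""
    seen: set[str] = set()
    out: list[Chunk] = []
    for child in hits:
        resolved = parents.get(child.parent_id) if child.parent_id else child
        if resolved and resolved.id not in seen:
            seen.add(resolved.id)
            out.append(resolved)
    return out
```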
| Layer | Default | Production | Rationale |
|---|---|---|---|
| Embedding | OpenAI text-embedding-3-small | BAAI/bge-large-en-v1.5 | Zero-infra default; BGE opt-in via [local-embed] |
| Vector store | Chroma (in-process) | Weaviate 1.27.0 | Zero Docker for dev; HNSW tuning + gRPC for prod |
| Retrieval fusion | BM25 + ANN → RRF (K=60) | Same | Parameter-free, robust to miscalibrated retrievers |
| Re-ranker | cross-encoder/ms-marco-MiniLM-L-6-v2 | Same | Strong precision at low latency (~50ms CPU) |
| LLM | claude-sonnet-4-6 | Same | Citation-enforcing prompt, structured output |
| Tracing | Langfuse (opt-in no-op) | Langfuse cloud | Set STRATUM_LANGFUSE_SECRET_KEY to enable; system runs identically without it |
| API | FastAPI + uvicorn | Same | /query endpoint, /health liveness probe |
| UI | Streamlit | Same | Chat interface with citation rendering |
| Eval | DeepEval + Ollama judge | Same | Pytest-native, zero API cost for local judge |
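Citation grounding is enforced rather than trusted: the generator labels each retrieved chunk [src N] in the context block, prompts Claude to cite with those markers, then parses the answer and rejects markers that do not map to a real source. A minimal sketch of that validation step; the regex and names are assumptions, not the repo's parser:

```python
import re

SRC = re.compile(r"\[src (\d+)\]")

def extract_citations(answer: str, num_sources: int) -> list[int]:
    """Return cited source indices; raise if the model cites a nonexistent one."""
    cited = sorted({int(m) for m in SRC.findall(answer)})
    invalid = [i for i in cited if not 1 <= i <= num_sources]
    if invalid:
        raise ValueError(f"answer cites unknown sources: {invalid}")
    return cited

answer = "Multi-head attention attends to different subspaces [src 1][src 3]."
print(extract_citations(answer, num_sources=4))  # [1, 3]
```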
Prerequisites:

- Python 3.11+
- An Anthropic API key
```
git clone https://github.com/adedaramola/stratum.git
cd stratum
pip install -e ".[dev]"
```

```
# .env (create in the stratum/ directory)
STRATUM_ANTHROPIC_API_KEY=sk-ant-...

# Optional — defaults to OpenAI embeddings if omitted
STRATUM_OPENAI_API_KEY=sk-proj-...
```

```
# Single PDF
stratum-ingest --source /path/to/document.pdf
# Directory of PDFs
stratum-ingest --source /path/to/docs/
# Web page
stratum-ingest --source https://example.com/page
```

```
make api
# → FastAPI running at http://localhost:8000
# → Docs at http://localhost:8000/docs
```

```
make ui
# → Streamlit running at http://localhost:8501
```

```python
from rag.pipeline import build_pipeline
from rag.config import get_settings
pipeline = build_pipeline(get_settings())
result = pipeline.query("What is the purpose of multi-head attention?")
print(result.answer)
for c in result.citations:
    print(f"  [{c.index}] {c.source} p.{c.page}")
```

```
curl -X POST http://localhost:8000/query \
  -H "Content-Type: application/json" \
  -d '{"question": "What is the purpose of multi-head attention?"}'
```

```
pip install -e ".[local-embed]"
```
```
# .env
STRATUM_EMBED_BACKEND=local
```

Requires ~2GB RAM. Uses BAAI/bge-large-en-v1.5 (1024-dim).
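The local backend wraps a sentence-transformers model; behaviour is roughly equivalent to this sketch (illustrative, not the repo's BGEEmbedder):

```python
from sentence_transformers import SentenceTransformer

# First call downloads the model weights; inference needs roughly 2GB RAM.
model = SentenceTransformer("BAAI/bge-large-en-v1.5")
vectors = model.encode(["What is multi-head attention?"], normalize_embeddings=True)
print(vectors.shape)  # (1, 1024), matching the 1024-dim figure above
```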
```
make docker-up   # starts Weaviate on localhost:8080
```

```
# .env
STRATUM_STORE_BACKEND=weaviate
```

Re-ingest after switching backends — vectors are not portable between Chroma and Weaviate.
```
# .env
STRATUM_EVAL_JUDGE_BACKEND=openai
STRATUM_OPENAI_API_KEY=sk-proj-...
```

Default judge is local Ollama (llama3.1:8b) at zero API cost.
```
pip install -e ".[observability]"
```

```
# .env
STRATUM_LANGFUSE_PUBLIC_KEY=pk-lf-...
STRATUM_LANGFUSE_SECRET_KEY=sk-lf-...

# Optional — defaults to Langfuse cloud
STRATUM_LANGFUSE_HOST=https://cloud.langfuse.com
```

When keys are absent the system runs identically — all trace calls are no-ops. Tracing covers retrieval (dense hits, RRF scores, rerank), generation (prompt tokens, latency), and end-to-end query spans.
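The runs-identically-without-keys guarantee is the null-object pattern: the pipeline always calls a tracer, and the factory returns a do-nothing implementation when Langfuse keys are absent. A hypothetical sketch of that shape; class and method names are illustrative:

```python
import os
from typing import Protocol

class TracerProtocol(Protocol):
    def span(self, name: str, **attrs) -> None: ...

class NoopTracer:
    def span(self, name: str, **attrs) -> None:
        pass  # deliberately does nothing: zero overhead, zero config

class LangfuseTracer:
    def span(self, name: str, **attrs) -> None:
        ...  # forward to the Langfuse client here

def build_tracer() -> TracerProtocol:
    if os.getenv("STRATUM_LANGFUSE_SECRET_KEY"):
        return LangfuseTracer()
    return NoopTracer()  # absent keys: every span() call is a no-op
```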
The terraform/ directory provisions a production-ready deployment on AWS.
```
Internet
    │
    ▼
ALB (stratum-prod-alb)
    ├── :80   → FastAPI (port 8000)
    └── :8501 → Streamlit UI (port 8501)
    │
    ▼
EC2 t3.medium (stratum-prod-api)
    ├── stratum-api.service (uvicorn, 2 workers)
    └── stratum-ui.service (streamlit)
    │
    ▼ (private network)
EC2 t3.medium (stratum-prod-weaviate)
    └── Weaviate 1.27.0 (Docker, 20GB EBS data volume)

S3 (stratum-prod-docs-*)
    └── Raw document storage
```
Prerequisites:

- AWS CLI configured (`aws configure`)
- Terraform >= 1.5
- An SSH key pair created in AWS EC2
```
cd terraform/

# Copy and fill in secrets
cp terraform.tfvars.example terraform.tfvars
# Edit terraform.tfvars — add API keys, key pair name, GitHub token

terraform init
terraform apply
```

Terraform outputs the live URLs when complete:

```
api_endpoint = "http://<alb-dns>/query"
api_docs_url = "http://<alb-dns>/docs"
ui_url = "http://<alb-dns>:8501"
```

```
# 1. Upload PDF to S3
aws s3 cp my-document.pdf s3://$(terraform output -raw documents_bucket_name)/raw/

# 2. SSH into the API instance
ssh -i ~/.ssh/<key>.pem ec2-user@$(terraform output -raw api_instance_public_ip)

# 3. Download and ingest
sudo aws s3 cp s3://<bucket>/raw/my-document.pdf /opt/stratum/data/raw/my-document.pdf
sudo chown stratum:stratum /opt/stratum/data/raw/my-document.pdf
sudo -u stratum bash -c 'cd /opt/stratum && /opt/stratum/.venv/bin/stratum-ingest \
    --source /opt/stratum/data/raw/my-document.pdf'

# 4. Restart services to pick up the new BM25 corpus
sudo systemctl restart stratum-api stratum-ui
```

Tear down everything with:

```
terraform destroy
```

| Command | Description |
|---|---|
| `make install` | Install all dev dependencies + pre-commit hooks |
| `make lint` | ruff check + format check |
| `make format` | ruff format + autofix |
| `make typecheck` | mypy strict mode |
| `make test-unit` | Unit tests (no network, no API keys) |
| `make test-integration` | Integration tests (Chroma: no Docker; Weaviate: skipped without Docker) |
| `make eval` | DeepEval gate (requires Ollama + golden dataset) |
| `make ingest SOURCE=path` | Ingest a document or directory |
| `make api` | Start FastAPI on port 8000 |
| `make ui` | Start Streamlit UI on port 8501 |
| `make docker-up` | Start Weaviate service container |
| `make ci` | Full CI: lint + typecheck + unit tests |
| `make clean` | Remove build artefacts and caches |
```
stratum/
├── src/rag/              # All package code (PEP 517 src layout)
│   ├── interfaces/       # typing.Protocol definitions — the public contract
│   ├── ingestion/        # Loaders, chunker, embedders
│   ├── store/            # Chroma, Weaviate, factory
│   ├── retrieval/        # HybridRetriever (BM25 + dense + RRF + rerank)
│   ├── generation/       # CitationGroundedGenerator
│   ├── evaluation/       # DeepEval harness + Ollama judge adapter
│   ├── api/              # FastAPI app (main.py)
│   ├── scripts/          # CLI entry points (stratum-ingest)
│   ├── config.py         # Pydantic Settings — one object rules all config
│   ├── exceptions.py     # Domain exception hierarchy
│   └── pipeline.py       # RAGPipeline + build_pipeline() factory
├── tests/
│   ├── unit/             # No network, no Docker — fast (94 tests)
│   ├── integration/      # Chroma: no Docker; Weaviate: skipped without Docker
│   └── e2e/              # DeepEval gate (weekly CI schedule)
├── terraform/            # AWS infrastructure (EC2, ALB, Weaviate, S3)
├── docs/                 # ADRs and evaluation guide
├── config/               # eval_thresholds.yaml
├── app.py                # Streamlit chat UI
└── data/golden/          # Golden QA pairs for DeepEval evaluation
```
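The interfaces/ package is what keeps the backends swappable: embedders and stores are typed against typing.Protocol contracts, and config.py resolves which concrete class to build from STRATUM_-prefixed environment variables. A hypothetical sketch of both halves; method names and fields are assumptions, not the repo's actual definitions:

```python
from typing import Protocol
from pydantic_settings import BaseSettings, SettingsConfigDict

class EmbedderProtocol(Protocol):
    """Structural contract; BGEEmbedder and OpenAIEmbedder would both satisfy it."""
    def embed(self, texts: list[str]) -> list[list[float]]: ...

class Settings(BaseSettings):
    model_config = SettingsConfigDict(env_prefix="STRATUM_", env_file=".env")
    embed_backend: str = "openai"   # read from STRATUM_EMBED_BACKEND
    store_backend: str = "chroma"   # read from STRATUM_STORE_BACKEND
```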
| Phase | Scope | Status |
|---|---|---|
| Phase 1 | Foundation: config, exceptions, interfaces | ✅ Complete |
| Phase 2 | Stores, ingestion, retrieval, generation, unit tests | ✅ Complete |
| Phase 3 | FastAPI, Streamlit UI, AWS deployment, DeepEval gate | ✅ Complete |
| Phase 4 | Golden dataset curation, baseline establishment, eval gate activation | 🔄 In progress |
| Phase 5 | Learned fusion weights, streaming responses, multi-tenancy | 📋 Planned |
DeepEval runs weekly (Monday 06:00 UTC) and on manual trigger — not on every PR. Uses a local Ollama judge by default (zero API cost). See docs/evaluation.md.
```
# Pull the judge model
make ollama-pull

# Run locally
make eval
```
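Because the gate is pytest-native, a golden QA pair is just a test function. A hedged sketch of what one case might look like with DeepEval's standard API; the metric choice and threshold wiring here are illustrative (see docs/evaluation.md for the real harness):

```python
from deepeval import assert_test
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase

from rag.config import get_settings
from rag.pipeline import build_pipeline

def test_golden_attention_question():
    question = "What is the purpose of multi-head attention?"
    result = build_pipeline(get_settings()).query(question)
    case = LLMTestCase(input=question, actual_output=result.answer)
    # threshold values live in config/eval_thresholds.yaml in the real harness
    assert_test(case, [AnswerRelevancyMetric(threshold=0.7)])
```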
| # | Decision | Status |
|---|---|---|
| ADR-001 | Hierarchical chunking (parent/child) | Accepted |
| ADR-002 | Dual store backend (Chroma + Weaviate) | Accepted |
| ADR-003 | Separate parent/child collections | Accepted |
| ADR-004 | BM25 corpus co-located with vector store | Accepted |
| ADR-005 | Drop `unstructured` dependency | Accepted |
| ADR-006 | Citation grounding enforced at generation time | Accepted |
| ADR-007 | DeepEval over RAGAS with local Ollama judge | Accepted |
MIT