
Stratum

A production-grade, domain-specific RAG engine — ask questions against your documents and receive precise, citation-grounded answers.


Architecture Overview

┌─────────────────────────────────────────────────────────────────┐
│                         RAGPipeline                              │
│                                                                  │
│  Query ──► HybridRetriever ──────────────────────────────────►  │
│               │                                                  │
│               ├── Dense:  EmbedderProtocol → DocumentStore ANN  │
│               ├── Sparse: BM25Okapi (in-store corpus)           │
│               ├── Fuse:   RRF (K=60, Cormack et al. 2009)       │
│               ├── Expand: child → parent context resolution     │
│               └── Rerank: CrossEncoder ms-marco-MiniLM          │
│                                                                  │
│  Chunks ──► CitationGroundedGenerator ──────────────────────►   │
│               │                                                  │
│               ├── Context block: [src N] (source p.PAGE)        │
│               ├── Claude API call with citation-enforcing prompt │
│               └── Parse + validate [src N] markers              │
│                                                                  │
│  ◄──────────────────────────── CitedAnswer (answer + citations) │
└─────────────────────────────────────────────────────────────────┘

Document ingestion:
  Source (PDF / URL / dir) → Loader → HierarchicalChunker
  → BGEEmbedder / OpenAIEmbedder → DocumentStore (Chroma | Weaviate)
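
The fusion step is small enough to show inline. Here is a minimal, illustrative sketch of Reciprocal Rank Fusion with K=60 (Cormack et al. 2009); the function and variable names are ours, not Stratum's API:

from collections import defaultdict

def rrf_fuse(ranked_lists: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists of document IDs: each list contributes 1/(k + rank)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.__getitem__, reverse=True)

# BM25 and dense rankings disagree; RRF rewards documents both retrievers surface.
fused = rrf_fuse([["d3", "d1", "d2"], ["d1", "d4", "d3"]])

Because only ranks enter the formula, RRF needs no score normalization, which is why it stays robust when the underlying retrievers are miscalibrated.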

Stack

| Layer | Default | Production | Rationale |
| --- | --- | --- | --- |
| Embedding | OpenAI text-embedding-3-small | BAAI/bge-large-en-v1.5 | Zero-infra default; BGE opt-in via [local-embed] |
| Vector store | Chroma (in-process) | Weaviate 1.27.0 | Zero Docker for dev; HNSW tuning + gRPC for prod |
| Retrieval fusion | BM25 + ANN → RRF (K=60) | Same | Parameter-free, robust to miscalibrated retrievers |
| Re-ranker | cross-encoder/ms-marco-MiniLM-L-6-v2 | Same | Strong precision at low latency (~50 ms on CPU) |
| LLM | claude-sonnet-4-6 | Same | Citation-enforcing prompt, structured output |
| Tracing | Langfuse (opt-in no-op) | Langfuse cloud | Set STRATUM_LANGFUSE_SECRET_KEY to enable; system runs identically without it |
| API | FastAPI + uvicorn | Same | /query endpoint, /health liveness probe |
| UI | Streamlit | Same | Chat interface with citation rendering |
| Eval | DeepEval + Ollama judge | Same | Pytest-native, zero API cost for local judge |
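
The last mile of the pipeline is marker validation: the generator should only cite sources that were actually in the context block. A toy sketch of what that check could look like (the real CitationGroundedGenerator may differ):

import re

def validate_citations(answer: str, num_sources: int) -> list[int]:
    """Return cited [src N] indices, raising if any marker is out of range."""
    cited = sorted({int(n) for n in re.findall(r"\[src (\d+)\]", answer)})
    invalid = [n for n in cited if not 1 <= n <= num_sources]
    if invalid:
        raise ValueError(f"answer cites unknown sources: {invalid}")
    return cited

validate_citations("Attention mixes subspaces [src 1][src 3].", num_sources=4)  # [1, 3]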

Local Quickstart

Prerequisites

  • Python 3.11+
  • An Anthropic API key

1. Clone and install

git clone https://github.com/adedaramola/stratum.git
cd stratum
pip install -e ".[dev]"

2. Configure

# .env (create in the stratum/ directory)
STRATUM_ANTHROPIC_API_KEY=sk-ant-...

# Optional — defaults to OpenAI embeddings if omitted
STRATUM_OPENAI_API_KEY=sk-proj-...
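
All STRATUM_* variables flow through the Pydantic Settings object in src/rag/config.py. A rough sketch of the pattern (the field names below are assumptions, not the repo's exact schema):

from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    model_config = SettingsConfigDict(env_prefix="STRATUM_", env_file=".env")

    anthropic_api_key: str                 # STRATUM_ANTHROPIC_API_KEY
    openai_api_key: str | None = None      # STRATUM_OPENAI_API_KEY (optional)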

3. Ingest documents

# Single PDF
stratum-ingest --source /path/to/document.pdf

# Directory of PDFs
stratum-ingest --source /path/to/docs/

# Web page
stratum-ingest --source https://example.com/page
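
Ingestion chunks hierarchically (ADR-001): small child chunks are embedded and matched, and each match is expanded to its larger parent chunk before generation. A toy illustration of the idea, not the repo's HierarchicalChunker:

from dataclasses import dataclass

@dataclass
class Chunk:
    text: str
    parent_id: int | None = None  # children point back at their parent

def chunk_hierarchically(text: str, parent_size: int = 2000, child_size: int = 400):
    parents = [Chunk(text[i : i + parent_size]) for i in range(0, len(text), parent_size)]
    children = [
        Chunk(p.text[j : j + child_size], parent_id=pid)
        for pid, p in enumerate(parents)
        for j in range(0, len(p.text), child_size)
    ]
    return parents, children  # children get indexed; parents supply answer-time context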

4. Start the API

make api
# → FastAPI running at http://localhost:8000
# → Docs at http://localhost:8000/docs

5. Start the UI

make ui
# → Streamlit running at http://localhost:8501

6. Query via Python

from rag.pipeline import build_pipeline
from rag.config import get_settings

pipeline = build_pipeline(get_settings())
result = pipeline.query("What is the purpose of multi-head attention?")
print(result.answer)
for c in result.citations:
    print(f"  [{c.index}] {c.source} p.{c.page}")

7. Query via API

curl -X POST http://localhost:8000/query \
  -H "Content-Type: application/json" \
  -d '{"question": "What is the purpose of multi-head attention?"}'

Optional Backends

Local embeddings (BGE — no API key required)

pip install -e ".[local-embed]"

# .env
STRATUM_EMBED_BACKEND=local

Requires ~2GB RAM. Uses BAAI/bge-large-en-v1.5 (1024-dim).
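
The local backend presumably wraps sentence-transformers. For reference, the same model can be exercised standalone (the normalization flag below is our assumption, common practice for BGE retrieval):

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-large-en-v1.5")
vectors = model.encode(["What is multi-head attention?"], normalize_embeddings=True)
print(vectors.shape)  # (1, 1024)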

Weaviate (production vector store)

make docker-up   # starts Weaviate on localhost:8080

# .env
STRATUM_STORE_BACKEND=weaviate

Re-ingest after switching backends — vectors are not portable between Chroma and Weaviate.

OpenAI eval judge

# .env
STRATUM_EVAL_JUDGE_BACKEND=openai
STRATUM_OPENAI_API_KEY=sk-proj-...

Default judge is local Ollama (llama3.1:8b) at zero API cost.

Langfuse tracing (opt-in)

pip install -e ".[observability]"

# .env
STRATUM_LANGFUSE_PUBLIC_KEY=pk-lf-...
STRATUM_LANGFUSE_SECRET_KEY=sk-lf-...
# Optional — defaults to Langfuse cloud
STRATUM_LANGFUSE_HOST=https://cloud.langfuse.com

When keys are absent the system runs identically — all trace calls are no-ops. Tracing instruments retrieval (dense hits, RRF scores, rerank), generation (prompt tokens, latency), and end-to-end query spans.
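
A common way to get that behavior is a null-object tracer. A sketch of the pattern, not necessarily Stratum's wiring:

class NoOpTracer:
    """Accepts every trace call and does nothing, so call sites never branch."""

    def span(self, name: str, **attrs):
        return self

    def __enter__(self):
        return self

    def __exit__(self, *exc) -> bool:
        return False

def get_tracer(secret_key: str | None):
    if secret_key is None:
        return NoOpTracer()  # no keys: every trace call becomes a cheap no-op
    ...  # otherwise, construct a real Langfuse client here

with get_tracer(None).span("retrieval", query="..."):
    pass  # traced code runs unchanged whether or not Langfuse is configured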


AWS Deployment

The terraform/ directory provisions a production-ready deployment on AWS.

Infrastructure

Internet
   │
   ▼
ALB (stratum-prod-alb)
   ├── :80   → FastAPI  (port 8000)
   └── :8501 → Streamlit UI (port 8501)
   │
   ▼
EC2 t3.medium (stratum-prod-api)
   ├── stratum-api.service  (uvicorn, 2 workers)
   └── stratum-ui.service   (streamlit)
   │
   ▼ (private network)
EC2 t3.medium (stratum-prod-weaviate)
   └── Weaviate 1.27.0 (Docker, 20GB EBS data volume)

S3 (stratum-prod-docs-*)
   └── Raw document storage

Requirements

  • AWS CLI configured (aws configure)
  • Terraform >= 1.5
  • An SSH key pair created in AWS EC2

Deploy

cd terraform/

# Copy and fill in secrets
cp terraform.tfvars.example terraform.tfvars
# Edit terraform.tfvars — add API keys, key pair name, GitHub token

terraform init
terraform apply

Terraform outputs the live URLs when complete:

api_endpoint = "http://<alb-dns>/query"
api_docs_url = "http://<alb-dns>/docs"
ui_url       = "http://<alb-dns>:8501"

Ingest documents on AWS

# 1. Upload PDF to S3
aws s3 cp my-document.pdf s3://$(terraform output -raw documents_bucket_name)/raw/

# 2. SSH into the API instance
ssh -i ~/.ssh/<key>.pem ec2-user@$(terraform output -raw api_instance_public_ip)

# 3. Download and ingest
sudo aws s3 cp s3://<bucket>/raw/my-document.pdf /opt/stratum/data/raw/my-document.pdf
sudo chown stratum:stratum /opt/stratum/data/raw/my-document.pdf
sudo -u stratum bash -c 'cd /opt/stratum && /opt/stratum/.venv/bin/stratum-ingest \
  --source /opt/stratum/data/raw/my-document.pdf'

# 4. Restart services to pick up the new BM25 corpus
sudo systemctl restart stratum-api stratum-ui

Tear down

terraform destroy

Development Workflow

| Command | Description |
| --- | --- |
| make install | Install all dev dependencies + pre-commit hooks |
| make lint | ruff check + format check |
| make format | ruff format + autofix |
| make typecheck | mypy strict mode |
| make test-unit | Unit tests (no network, no API keys) |
| make test-integration | Integration tests (Chroma: no Docker; Weaviate: skipped without Docker) |
| make eval | DeepEval gate (requires Ollama + golden dataset) |
| make ingest SOURCE=path | Ingest a document or directory |
| make api | Start FastAPI on port 8000 |
| make ui | Start Streamlit UI on port 8501 |
| make docker-up | Start Weaviate service container |
| make ci | Full CI: lint + typecheck + unit tests |
| make clean | Remove build artefacts and caches |

Project Layout

stratum/
├── src/rag/                  # All package code (PEP 517 src layout)
│   ├── interfaces/           # typing.Protocol definitions — the public contract
│   ├── ingestion/            # Loaders, chunker, embedders
│   ├── store/                # Chroma, Weaviate, factory
│   ├── retrieval/            # HybridRetriever (BM25 + dense + RRF + rerank)
│   ├── generation/           # CitationGroundedGenerator
│   ├── evaluation/           # DeepEval harness + Ollama judge adapter
│   ├── api/                  # FastAPI app (main.py)
│   ├── scripts/              # CLI entry points (stratum-ingest)
│   ├── config.py             # Pydantic Settings — one object rules all config
│   ├── exceptions.py         # Domain exception hierarchy
│   └── pipeline.py           # RAGPipeline + build_pipeline() factory
├── tests/
│   ├── unit/                 # No network, no Docker — fast (94 tests)
│   ├── integration/          # Chroma: no Docker; Weaviate: skipped without Docker
│   └── e2e/                  # DeepEval gate (weekly CI schedule)
├── terraform/                # AWS infrastructure (EC2, ALB, Weaviate, S3)
├── docs/                     # ADRs and evaluation guide
├── config/                   # eval_thresholds.yaml
├── app.py                    # Streamlit chat UI
└── data/golden/              # Golden QA pairs for DeepEval evaluation
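
The interfaces/ package is the seam that keeps backends swappable. A plausible shape for one of its typing.Protocol contracts (the method name and signature are assumptions):

from typing import Protocol

class EmbedderProtocol(Protocol):
    def embed(self, texts: list[str]) -> list[list[float]]:
        """Return one vector per input text."""
        ...

Because Protocol typing is structural, BGEEmbedder and OpenAIEmbedder can satisfy the contract without sharing a base class, and mypy strict mode verifies conformance wherever a concrete embedder is passed in.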

Implementation Phases

| Phase | Scope | Status |
| --- | --- | --- |
| Phase 1 | Foundation: config, exceptions, interfaces | ✅ Complete |
| Phase 2 | Stores, ingestion, retrieval, generation, unit tests | ✅ Complete |
| Phase 3 | FastAPI, Streamlit UI, AWS deployment, DeepEval gate | ✅ Complete |
| Phase 4 | Golden dataset curation, baseline establishment, eval gate activation | 🔄 In progress |
| Phase 5 | Learned fusion weights, streaming responses, multi-tenancy | 📋 Planned |

Evaluation

DeepEval runs weekly (Monday 06:00 UTC) and on manual trigger — not on every PR. Uses a local Ollama judge by default (zero API cost). See docs/evaluation.md.

# Pull the judge model
make ollama-pull

# Run locally
make eval
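
For orientation, a hedged sketch of what one golden-pair test might assert. The metric and threshold here are illustrative; the real gate reads its thresholds from config/eval_thresholds.yaml:

from deepeval import assert_test
from deepeval.metrics import FaithfulnessMetric
from deepeval.test_case import LLMTestCase

def test_golden_pair():
    case = LLMTestCase(
        input="What is the purpose of multi-head attention?",
        actual_output="It lets the model attend to several representation subspaces at once.",
        retrieval_context=["Multi-head attention allows the model to jointly attend to ..."],
    )
    assert_test(case, [FaithfulnessMetric(threshold=0.7)])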

ADR Index

| # | Decision | Status |
| --- | --- | --- |
| ADR-001 | Hierarchical chunking (parent/child) | Accepted |
| ADR-002 | Dual store backend (Chroma + Weaviate) | Accepted |
| ADR-003 | Separate parent/child collections | Accepted |
| ADR-004 | BM25 corpus co-located with vector store | Accepted |
| ADR-005 | Drop unstructured dependency | Accepted |
| ADR-006 | Citation grounding enforced at generation time | Accepted |
| ADR-007 | DeepEval over RAGAS with local Ollama judge | Accepted |

License

MIT
