A complete, production-grade reference implementation of modern Retrieval-Augmented Generation systems for 2025 and beyond.
This repository accompanies the article series "The Enterprise RAG Engineering Playbook (2025 Edition)" and provides modular, end-to-end components for building high-accuracy, scalable, and secure RAG pipelines.
Build production-grade systems for:
- Enterprise Search: Find information across thousands of documents
- Compliance Assistants: Policy Q&A with audit trails
- Legal Reasoning Tools: Contract analysis with citations
- Customer Support Copilots: Grounded answers from knowledge bases
- Multi-hop Knowledge Systems: Complex queries requiring graph traversal
This repository includes reference implementations of all core RAG subsystems:
| # | Module | Description |
|---|---|---|
| 1 | Document Ingestion | OCR cleanup, boilerplate removal, metadata extraction |
| 2 | Semantic Chunking | Title-aware, section-aware, token-budget segmentation |
| 3 | Embeddings | Versioning, drift detection, deterministic preprocessing |
| 4 | Hybrid Retrieval | Vector (HNSW) + BM25 + metadata filtering + score fusion |
| 5 | Reranking | Cross-encoder reranking, batch inference, deduplication |
| 6 | Context Assembly | Structured prompts, token budgeting, inline citations |
| 7 | Verification | Deterministic checks + LLM critic validation |
| 8 | Graph-RAG | Entity linking, knowledge graph traversal, multi-hop evidence |
| 9 | Monitoring | Recall@k, latency dashboards, drift detection |
| 10 | Deployment | FastAPI, circuit breakers, Docker, Kubernetes |
| 11 | Security | ACL filtering, PII redaction, audit logging |
| 12 | Cost Optimization | Model routing, caching, GPU batching |
enterprise-rag-stack/
│
├── README.md
├── LICENSE
├── requirements.txt
│
├── docs/
│   ├── architecture_overview.md
│   ├── diagrams/
│   └── article_series_index.md
│
├── config/
│   ├── embedding_config.yaml
│   ├── retrieval_config.yaml
│   ├── model_routing.yaml
│   ├── index_config.yaml
│   ├── reranker_config.yaml
│   └── logging.yaml
│
├── data/
│   ├── sample_documents/
│   ├── golden_eval_set.json
│   └── pii_redaction_rules.json
│
├── ingestion/                  # Document normalization pipeline
│   ├── normalize.py
│   ├── ingest_pipeline.py
│   └── dedupe.py
│
├── chunking/                   # Semantic chunking strategies
│   ├── semantic_chunker.py
│   ├── sentence_splitter.py
│   └── chunk_eval_tools.py
│
├── embeddings/                 # Embedding with versioning
│   ├── embedder.py
│   └── drift_monitor.py
│
├── retrieval/                  # Hybrid retrieval system
│   ├── hybrid_retriever.py
│   ├── vector_retriever.py
│   ├── lexical_retriever.py
│   └── score_fusion.py
│
├── reranking/                  # Cross-encoder reranking
│   ├── cross_encoder_reranker.py
│   └── batch_reranking.py
│
├── context/                    # Context assembly & prompts
│   ├── context_builder.py
│   ├── context_budgeting.py
│   └── prompt_templates/
│
├── verification/               # Answer verification
│   ├── deterministic_checks.py
│   ├── critic_llm.py
│   └── verification_pipeline.py
│
├── graph_rag/                  # Knowledge graph integration
│   ├── entity_linking.py
│   ├── graph_builder.py
│   ├── graph_traversal.py
│   └── kg_summarizer.py
│
├── monitoring/                 # Evaluation & metrics
│   ├── eval_runner.py
│   ├── recall_metrics.py
│   ├── latency_metrics.py
│   └── evaluation_schema.md
│
├── deployment/                 # FastAPI, Docker, K8s
│   ├── fastapi_app.py
│   ├── circuit_breaker.py
│   ├── docker/
│   │   ├── Dockerfile
│   │   └── docker-compose.yaml
│   └── k8s/
│
├── security/                   # ACLs, PII, audit
│   ├── pii_redaction.py
│   ├── acl_filters.py
│   └── audit_logging.py
│
└── cost_optimization/          # Model routing, caching
    ├── model_router.py
    ├── caching_layer.py
    ├── gpu_batcher.py
    └── unit_economics_calculator.py
# Clone the repository
git clone https://github.com/AdnanSattar/enterprise-rag-stack.git
cd enterprise-rag-stack
# Install dependencies
pip install -r requirements.txt
# Set environment variables
export OPENAI_API_KEY="your-api-key"
export CHROMA_PATH="./data/chroma"

# Ingest sample documents
python ingestion/ingest_pipeline.py --input data/sample_documents --tenant default
# Run hybrid retriever
python retrieval/hybrid_retriever.py --query "What are the payment terms?"

# Run FastAPI server
python deployment/fastapi_app.py
# Or with uvicorn
uvicorn deployment.fastapi_app:app --reload --port 8000

# Build and start all services
docker-compose up -d
# Check logs
docker-compose logs -f rag-api

Accuracy = Quality(Ingestion) × Recall(Retrieval) × Precision(Reranking) × Grounding(Verification)

score = vector_weight × s_vector + lexical_weight × s_bm25

Why hybrid? Pure vector search fails on keyword-heavy queries (SKU numbers, codes), while BM25 captures exact matches.
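
The fusion formula above can be sketched in a few lines of Python. This is a minimal illustration, not the code in retrieval/score_fusion.py. The function names, the min-max normalization step, and the 0.6/0.4 weights are assumptions made for the example. Normalization is included because raw cosine similarities and raw BM25 scores live on different scales.

```python
from typing import Dict, List, Tuple


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1] so vector and BM25 scores are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores are equal: treat every document as equally relevant.
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def fuse_scores(
    vector_scores: Dict[str, float],
    bm25_scores: Dict[str, float],
    vector_weight: float = 0.6,   # assumed weight, tune per corpus
    lexical_weight: float = 0.4,  # assumed weight, tune per corpus
) -> List[Tuple[str, float]]:
    """Weighted sum of normalized scores, sorted from best to worst."""
    s_vector = min_max_normalize(vector_scores)
    s_bm25 = min_max_normalize(bm25_scores)
    fused = {
        doc_id: vector_weight * s_vector.get(doc_id, 0.0)
        + lexical_weight * s_bm25.get(doc_id, 0.0)
        for doc_id in set(s_vector) | set(s_bm25)
    }
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Hypothetical scores for a SKU lookup: the exact-match document is only
# second by vector similarity but dominates on BM25.
ranked = fuse_scores(
    vector_scores={"doc_sku": 0.55, "doc_general": 0.60, "doc_other": 0.40},
    bm25_scores={"doc_sku": 14.2, "doc_other": 1.1},
)
```

In this hypothetical case the document containing the exact SKU ranks first after fusion, even though a more general document scored higher on vector similarity alone.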
| Level | Method | What It Catches |
|---|---|---|
| 1 | Deterministic | Numbers, dates, currencies not in context |
| 2 | LLM Critic | Unsupported claims, hallucinations |
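
A Level 1 deterministic check can be as simple as comparing numeric tokens in the answer against the retrieved context. The sketch below is an assumption about how such a check might look, not the contents of verification/deterministic_checks.py. The function names and the regular expression are hypothetical, and the pattern only covers plain numbers, decimals, and thousands separators.

```python
import re
from typing import List

# Matches integers, decimals, and comma-grouped numbers such as 30, 1.5, 1,200.
NUMBER_PATTERN = re.compile(r"\d+(?:[.,]\d+)*")


def extract_numbers(text: str) -> List[str]:
    """Return every numeric token found in the text, in order."""
    return NUMBER_PATTERN.findall(text)


def unsupported_numbers(answer: str, context: str) -> List[str]:
    """Return numbers that appear in the answer but nowhere in the context."""
    context_numbers = set(extract_numbers(context))
    return [n for n in extract_numbers(answer) if n not in context_numbers]


context = "Payment is due within 30 days. The late fee is 1.5% per month."
answer = "Payment is due within 45 days, with a late fee of 1.5% per month."

# A non-empty result means the answer should be rejected or escalated
# to the Level 2 LLM critic.
flagged = unsupported_numbers(answer, context)
```

String comparison is deliberately strict here: "1,200" and "1200" would not match, so a production check would likely normalize number formats first.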
# CRITICAL: When model changes, re-index everything
index_name = f"docs_v1_{model_slug}_{embedding_version}"

┌──────────────────────────────────────────────────────────────────────┐
│                         Enterprise RAG Stack                         │
├──────────────────────────────────────────────────────────────────────┤
│                                                                      │
│  ┌─────────┐   ┌─────────┐   ┌─────────┐   ┌─────────┐               │
│  │ Ingest  │──▶│  Chunk  │──▶│  Embed  │──▶│  Index  │               │
│  │ & Clean │   │Semantic │   │ Version │   │  HNSW   │               │
│  └─────────┘   └─────────┘   └─────────┘   └─────────┘               │
│                                                                      │
│  ┌─────────┐   ┌─────────┐   ┌─────────┐   ┌─────────┐               │
│  │  Query  │──▶│ Hybrid  │──▶│ Rerank  │──▶│ Context │               │
│  │Classify │   │Retrieve │   │  Top-K  │   │Assemble │               │
│  └─────────┘   └─────────┘   └─────────┘   └─────────┘               │
│                                                                      │
│  ┌─────────┐   ┌─────────┐   ┌─────────┐                             │
│  │   LLM   │──▶│ Verify  │──▶│Response │                             │
│  │Generate │   │ Answer  │   │ + Cache │                             │
│  └─────────┘   └─────────┘   └─────────┘                             │
│                                                                      │
└──────────────────────────────────────────────────────────────────────┘
| Metric | Target | Alert Threshold |
|---|---|---|
| Recall@5 | > 0.85 | < 0.75 |
| Precision@5 | > 0.70 | < 0.60 |
| P95 Latency | < 2000ms | > 5000ms |
| Hallucination Rate | < 5% | > 10% |
| Cache Hit Rate | > 40% | < 20% |
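
The retrieval metrics and alert thresholds in the table above can be computed with a few small helpers. This is a sketch under stated assumptions: the function names, the metric keys in `ALERT_RULES`, and the nearest-rank percentile method are choices made for illustration and are not taken from monitoring/recall_metrics.py or monitoring/latency_metrics.py.

```python
from typing import Dict, List, Sequence, Set, Tuple


def recall_at_k(retrieved: Sequence[str], relevant: Set[str], k: int = 5) -> float:
    """Fraction of relevant documents that appear in the top-k results."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def precision_at_k(retrieved: Sequence[str], relevant: Set[str], k: int = 5) -> float:
    """Fraction of the top-k slots filled by relevant documents."""
    if k <= 0:
        return 0.0
    return sum(1 for doc_id in retrieved[:k] if doc_id in relevant) / k


def p95(values: Sequence[float]) -> float:
    """Nearest-rank 95th percentile."""
    if not values:
        raise ValueError("p95 requires at least one value")
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in integer arithmetic
    return ordered[rank - 1]


# Alert thresholds copied from the table; the key names are hypothetical.
ALERT_RULES: Dict[str, Tuple[str, float]] = {
    "recall_at_5": ("below", 0.75),
    "precision_at_5": ("below", 0.60),
    "p95_latency_ms": ("above", 5000.0),
    "hallucination_rate": ("above", 0.10),
    "cache_hit_rate": ("below", 0.20),
}


def breached_alerts(metrics: Dict[str, float]) -> List[str]:
    """Return the names of metrics that crossed their alert threshold."""
    breached = []
    for name, (direction, threshold) in ALERT_RULES.items():
        if name not in metrics:
            continue
        value = metrics[name]
        if (direction == "below" and value < threshold) or (
            direction == "above" and value > threshold
        ):
            breached.append(name)
    return breached
```

Note that a metric can sit between its target and its alert threshold (for example, Recall@5 of 0.80): it misses the target but does not fire an alert.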
| Variable | Description | Default |
|---|---|---|
| OPENAI_API_KEY | OpenAI API key | Required |
| CHROMA_PATH | ChromaDB storage path | ./data/chroma |
| COLLECTION_NAME | Vector collection name | docs_v1 |
| REDIS_URL | Redis connection URL | redis://localhost:6379 |
| LOG_LEVEL | Logging level | INFO |
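
One way to read these variables in application code is a small settings object that applies the documented defaults and fails fast when the required key is missing. The `Settings` class and `load_settings` function below are hypothetical names used for illustration; the repository may load its configuration differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    """Runtime settings, with defaults taken from the table above."""
    openai_api_key: str
    chroma_path: str = "./data/chroma"
    collection_name: str = "docs_v1"
    redis_url: str = "redis://localhost:6379"
    log_level: str = "INFO"


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables, failing fast if the key is missing."""
    api_key = env.get("OPENAI_API_KEY")
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is required but not set")
    return Settings(
        openai_api_key=api_key,
        chroma_path=env.get("CHROMA_PATH", "./data/chroma"),
        collection_name=env.get("COLLECTION_NAME", "docs_v1"),
        redis_url=env.get("REDIS_URL", "redis://localhost:6379"),
        log_level=env.get("LOG_LEVEL", "INFO"),
    )
```

Passing the environment as a parameter keeps the function easy to test: `load_settings({"OPENAI_API_KEY": "test-key"})` returns the defaults without touching the real process environment.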
- config/embedding_config.yaml: Embedding model, versioning, drift detection
- config/retrieval_config.yaml: Hybrid weights, BM25 params, reranking
- config/model_routing.yaml: LLM selection based on query complexity
- config/index_config.yaml: Vector store and ANN index settings (HNSW/IVF)
- config/reranker_config.yaml: Cross-encoder and lightweight reranking config
- config/logging.yaml: Structured logging configuration
This repository accompanies "The Enterprise RAG Engineering Playbook" article series on Medium:
| Part | Title | Link | Modules Covered |
|---|---|---|---|
| Part 1 | Building a High-Quality Retrieval Foundation | Read on Medium | ingestion/, chunking/, embeddings/ |
| Part 2 | High Recall Retrieval and Precision Reranking | Read on Medium | retrieval/, reranking/ |
| Part 3 | Context Assembly and Grounded Prompting | Read on Medium | context/ |
| Part 4 | Verification Layers and Graph-RAG | Read on Medium | verification/, graph_rag/ |
| Part 5 | Monitoring, Evaluation and Lifecycle Management | Read on Medium | monitoring/, deployment/ |
| Part 6 | Security, Compliance and Cost Optimization | Read on Medium | security/, cost_optimization/ |
Full article series index: docs/article_series_index.md
from monitoring.eval_runner import RAGEvaluator
from data.golden_eval_set import load_eval_set
evaluator = RAGEvaluator(pipeline=my_pipeline)
results = evaluator.run_evaluation(load_eval_set())
print(f"Recall@5: {results.mean_recall:.3f}")
print(f"P95 Latency: {results.p95_latency_ms:.1f}ms")

# Run against golden evaluation set
python monitoring/eval_runner.py --eval-set data/golden_eval_set.json

- Row-level ACLs via metadata filters
- PII redaction before indexing
- Namespace isolation for multi-tenancy
- Audit logging for compliance
- Encrypted connections (TLS)
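
Two of the controls listed above, PII redaction before indexing and row-level ACL filtering, can be illustrated with a short sketch. Everything here is an assumption for the sake of the example: the regular expressions cover only email addresses and US-style SSNs, and the metadata keys `tenant` and `allowed_groups` are hypothetical rather than the schema used in security/acl_filters.py.

```python
import re
from typing import Any, Dict, List, Set

# Deliberately narrow patterns; real deployments need a broader rule set.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def redact_pii(text: str) -> str:
    """Replace email addresses and SSN-like tokens before the text is indexed."""
    text = EMAIL_PATTERN.sub("[EMAIL]", text)
    return SSN_PATTERN.sub("[SSN]", text)


def filter_by_acl(
    chunks: List[Dict[str, Any]], tenant: str, user_groups: Set[str]
) -> List[Dict[str, Any]]:
    """Keep only chunks in the caller's tenant that share at least one group."""
    return [
        chunk
        for chunk in chunks
        if chunk["metadata"]["tenant"] == tenant
        and user_groups & set(chunk["metadata"]["allowed_groups"])
    ]


chunks = [
    {"id": "c1", "metadata": {"tenant": "acme", "allowed_groups": ["legal"]}},
    {"id": "c2", "metadata": {"tenant": "acme", "allowed_groups": ["finance"]}},
    {"id": "c3", "metadata": {"tenant": "globex", "allowed_groups": ["legal"]}},
]
visible = filter_by_acl(chunks, tenant="acme", user_groups={"legal", "hr"})
```

In practice the ACL condition is usually pushed down into the vector store query as a metadata filter, so that unauthorized chunks are never retrieved in the first place; post-filtering as shown here is the simpler fallback.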
Export metrics to Prometheus/Grafana:
curl http://localhost:8000/metrics

Key metrics:
- rag_query_latency_ms: End-to-end latency
- rag_retrieval_recall: Recall@k
- rag_cache_hit_rate: Cache effectiveness
- rag_hallucination_rate: Verification failures
- Fork the repository
- Create a feature branch: git checkout -b feature/amazing-feature
- Commit your changes: git commit -m 'Add amazing feature'
- Push to the branch: git push origin feature/amazing-feature
- Open a Pull Request
Please ensure:
- Code is formatted with black
- Type hints are included
- Tests pass
- Documentation is updated
This project is licensed under the MIT License - see the LICENSE file for details.
Enterprise RAG Stack - Built with ❤️ for production RAG systems
Companion code for the Medium article series: The Enterprise RAG Engineering Playbook