A production-grade, multi-tenant Retrieval-Augmented Generation (RAG) platform built on a microservices architecture. The system integrates:
- Weaviate for scalable vector storage and similarity search.
- ArangoDB for managing and querying knowledge graphs, enabling rich semantic relationships and context-aware retrieval.
- Agentic orchestration for coordinating retrieval, reasoning, and response generation across services and tenants.
The architecture ensures isolation and scalability for multiple tenants, supports domain-specific customization, and is designed for high availability, observability, and extensibility in enterprise environments.
- Watch Novus System Demo on YouTube - Complete walkthrough of the Novus Enterprise RAG System features and capabilities
- Novus RAG Factory Presentation - Comprehensive slide deck covering the Novus RAG system architecture, features, and implementation details
- Novus System Documentation (PDF) - Complete technical documentation for the Novus Enterprise RAG System
- Services Documentation - Detailed documentation for all microservices
- System Scripts Guide - Complete guide for system management scripts
- Design Document - Architectural design and technical specifications
- Monitoring Guide - Observability and monitoring setup
- Event-driven communication via Kafka
- Independent scaling for each service
- Fault isolation and resilience
- Blue-green deployments with zero downtime
- Streaming processing for large files (>1GB)
- Semantic chunking with hierarchical structure
- Table extraction and OCR support
- Content classification and entity extraction
- Multi-model support (OpenAI + local models)
- GC-PROD meta embeddings for graph-aware retrieval
- Progressive distillation framework
- Automatic fallback and quality validation
- Version-aware routing with atomic switching
- Dynamic Weaviate collections based on content type
- Blue-green indexing for updates
- Automatic cleanup and retention policies
- Vector search (dense embeddings)
- Sparse retrieval (BM25/SPLADE)
- Knowledge graph traversal
- Cross-encoder reranking
- LLM synthesis with provenance
- Prometheus metrics for all services with per-service resource monitoring
- Grafana dashboards for visualization with pre-built production dashboards
- Distributed tracing with Jaeger and OpenTelemetry integration
- Centralized logging with Loki and trace correlation
- cAdvisor for container-level infrastructure metrics
- Pushgateway for Kafka worker metrics
- Comprehensive alerting with 50+ production-ready alert rules
- Real-time monitoring of CPU, memory, disk I/O, and network I/O per service
- Query stage breakdown for performance optimization (vector DB, embedding, LLM)
- Health checks and automated incident detection
- Docker & Docker Compose
- 16GB+ RAM recommended (8GB minimum)
- 50GB free disk space for data and metrics storage
- OpenAI API key (optional, for best performance)
git clone <repository-url>
cd Novus
export OPENAI_API_KEY="your-openai-api-key-here"
# If you use an enterprise OpenAI API key, also set OPENAI_API_URL
export OPENAI_API_URL="your-openai-api-url-here"

Create persistent storage directories for all services:
# Create all data directories for services and monitoring
mkdir -p data/{weaviate,arangodb,arangodb-apps,opensearch,redis,mongo,minio,upload-temp,prometheus,grafana,loki,pushgateway}
# Set appropriate permissions
chmod -R 777 data/

Data Directory Structure:
- data/weaviate - Vector database storage
- data/arangodb - Knowledge graph database
- data/arangodb-apps - ArangoDB applications
- data/opensearch - Full-text search index
- data/redis - Cache and session data
- data/mongo - User data and metadata
- data/minio - Object storage for PDFs
- data/upload-temp - Temporary upload files
- data/prometheus - Metrics time-series data
- data/grafana - Dashboards and visualizations
- data/loki - Log aggregation storage
- data/pushgateway - Kafka worker metrics
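Before starting the stack, it can help to confirm that every one of these directories exists. The function below is a minimal sketch (check_data_dirs is a hypothetical helper, not one of the Novus scripts) that walks the same list as the mkdir command and returns a non-zero status if anything is missing.

```shell
# Hypothetical helper (not part of the Novus scripts): report any data
# directory that is missing under a base path (default: ./data).
# The directory names mirror the mkdir command above.
check_data_dirs() {
  local base="${1:-data}" missing=0 d
  for d in weaviate arangodb arangodb-apps opensearch redis mongo minio \
           upload-temp prometheus grafana loki pushgateway; do
    if [ ! -d "$base/$d" ]; then
      echo "missing: $base/$d"
      missing=$((missing + 1))
    fi
  done
  # Succeeds only when every directory exists
  [ "$missing" -eq 0 ]
}
```

Running check_data_dirs && ./start-system.sh starts the system only when all directories are present.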
# Start with existing images
./start-system.sh
# Or rebuild and start (first time or after code changes)
./start-system.sh --build
# Clean rebuild (fresh start, removes all data)
./start-system.sh --clean --build

Options:
- --build - Force rebuild all Docker images
- --clean - Clean all volumes and data (WARNING: destructive)
- --no-monitoring - Skip monitoring stack (Prometheus, Grafana, etc.)
- --no-chat - Skip chat UI
- --dev - Development mode (verbose logging)
- --help - Show help message
The script starts services in the correct order:
Infrastructure (started first):
- Zookeeper & Kafka (Message Queue)
- Redis (Cache)
- MongoDB (Auth & Metadata)
- MinIO (Object Storage)
- Weaviate (Vector Database)
- ArangoDB (Graph Database)
- OpenSearch (Search Engine)
Core Services:
- API Gateway (Port 8000)
- Upload Service (Port 8001)
- Query Service (Port 8002)
- Collection Registry (Port 8003)
- PDF Processing Service (Port 8004)
- Embedding Service
- Weaviate Manager
- ArangoDB Updater
- Memory Service (Port 8005)
Monitoring (optional):
- Prometheus (Port 9090)
- Grafana (Port 3000)
- Jaeger (Port 16686)
- cAdvisor, Pushgateway, Loki, Promtail
UI (optional):
- Novus Chat (Port 4200)
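Once start-system.sh returns, the HTTP services may still be initializing. A minimal polling sketch, assuming each core service on ports 8000 to 8003 exposes the /health endpoint used in the health checks at the end of this README and answers without an HTTP error status when ready (wait_for_health is a hypothetical helper):

```shell
# Hypothetical helper: poll the /health endpoints of the core HTTP services
# until all of them respond successfully or the retry budget runs out.
# Arguments: number of attempts (default 30), seconds between them (default 2).
wait_for_health() {
  local retries="${1:-30}" delay="${2:-2}" attempt=1 ok port
  while [ "$attempt" -le "$retries" ]; do
    ok=1
    for port in 8000 8001 8002 8003; do
      # -f: fail on HTTP status >= 400; -s: no progress output
      if ! curl -fs -o /dev/null "http://localhost:$port/health"; then
        ok=0
        break
      fi
    done
    if [ "$ok" -eq 1 ]; then
      echo "all services healthy"
      return 0
    fi
    # Wait before the next round, but not after the final attempt
    if [ "$attempt" -lt "$retries" ]; then
      sleep "$delay"
    fi
    attempt=$((attempt + 1))
  done
  echo "services not healthy after $retries attempts" >&2
  return 1
}
```

For example, wait_for_health 60 5 checks up to 60 times, 5 seconds apart. Services without a listed port, such as the Embedding Service, are better checked with ./system-status.sh --health.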
# Quick status overview
./system-status.sh
# Detailed health checks
./system-status.sh --health
# View resource usage
./system-status.sh --metrics
# View logs for a specific service
./system-status.sh --logs api-gateway

# Graceful shutdown
./stop-system.sh
# Force stop
./stop-system.sh --force
# Stop and remove all data
./stop-system.sh --clean

# Scale to 3 workers for faster processing
./scale-pdf-workers.sh 3
# Scale back to 1 worker
./scale-pdf-workers.sh 1

# Restart a specific service
docker-compose restart api-gateway
# View logs for all services
docker-compose logs -f
# View logs for specific service
docker-compose logs -f embedding-service
# Check running containers
docker-compose ps
# Execute command in container
docker exec -it api-gateway bash

- Chat UI: http://localhost:4200
- API Gateway: http://localhost:8000
- Grafana Monitoring: http://localhost:3000 (admin/admin)
- Prometheus Metrics: http://localhost:9090
- Jaeger Tracing: http://localhost:16686
- Upload Service: http://localhost:8001
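The upload request below can also be wrapped in a small shell function so the optional metadata fields of POST /upload (topic, device, version) are sent only when provided. This is a sketch: upload_pdf and the UPLOAD_URL override are hypothetical names, and the default URL is the Upload Service address listed above.

```shell
# Hypothetical wrapper for POST /upload on the Upload Service.
# file and tenant_id are required; topic, device and version are added
# only when non-empty. UPLOAD_URL is an assumed override variable.
upload_pdf() {
  local file="${1:-}" tenant="${2:-}" topic="${3:-}" device="${4:-}" version="${5:-}"
  if [ -z "$file" ] || [ -z "$tenant" ]; then
    echo "usage: upload_pdf FILE TENANT_ID [TOPIC] [DEVICE] [VERSION]" >&2
    return 2
  fi
  # Rebuild the function's positional parameters as the curl argument list
  set -- -sS -X POST "${UPLOAD_URL:-http://localhost:8001}/upload" \
    -F "file=@$file" -F "tenant_id=$tenant"
  if [ -n "$topic" ]; then set -- "$@" -F "topic=$topic"; fi
  if [ -n "$device" ]; then set -- "$@" -F "device=$device"; fi
  if [ -n "$version" ]; then set -- "$@" -F "version=$version"; fi
  curl "$@"
}
```

For example, upload_pdf your-document.pdf demo Device TechCorp8000 sends the same fields as the curl command that follows.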
curl -X POST "http://localhost:8001/upload" \
-F "file=@your-document.pdf" \
-F "tenant_id=demo" \
-F "topic=Device" \
-F "device=TechCorp8000"

# OpenAI Configuration
export OPENAI_API_KEY="sk-..."
export OPENAI_MODEL="gpt-4o-mini"
export EMBEDDER="text-embedding-3-large"
# Service URLs (auto-configured in Docker)
export WEAVIATE_URL="http://weaviate:8080"
export ARANGO_URL="http://arangodb:8529"
export KAFKA_BOOTSTRAP_SERVERS="kafka:9092"

Each service has its own configuration file:
- Upload Service: services/upload-service/app/config.py
- PDF Processing: services/pdf-processing-service/app/config.py
- Embedding Service: services/embedding-service/app/config.py
- And so on...
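To see which OpenAI-related variables are set in the current shell before running start-system.sh, a small check such as the following can help. It is a sketch: show_openai_env is hypothetical, it prints only the first four characters of the key, and it does not check the service URLs, which are auto-configured in Docker.

```shell
# Hypothetical pre-flight check for the OpenAI variables shown above.
# Returns non-zero when OPENAI_API_KEY is empty or unset.
show_openai_env() {
  local key="${OPENAI_API_KEY:-}"
  if [ -z "$key" ]; then
    echo "OPENAI_API_KEY is not set" >&2
    return 1
  fi
  # ${key#????} drops the first four characters; stripping that suffix
  # from the full key leaves only the first four characters.
  echo "OPENAI_API_KEY=${key%"${key#????}"}..."
  echo "OPENAI_API_URL=${OPENAI_API_URL:-<not set>}"
  echo "OPENAI_MODEL=${OPENAI_MODEL:-<not set>}"
  echo "EMBEDDER=${EMBEDDER:-<not set>}"
}
```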
The Novus system includes a comprehensive three-pillar observability solution:
- Metrics (Prometheus) - Quantitative system measurements
- Traces (Jaeger) - Request flow through distributed services
- Logs (Loki) - Detailed event records with trace correlation
Access Grafana at http://localhost:3000 (admin/admin):
- Production Overview: System-wide health, request rates, error rates, and latency
- Query Service Performance: Detailed query stage breakdown (vector DB, embedding, LLM)
- Ingestion Pipeline: Document processing throughput and worker metrics
- Resource Utilization: CPU, memory, disk I/O, and network I/O per service
- Tracing Correlation: Link metrics, logs, and traces for root cause analysis
Access Prometheus at http://localhost:9090:
- Per-service resource metrics (CPU, memory, I/O)
- HTTP request metrics (rate, latency, errors)
- Query stage-level metrics
- Database operation metrics
- Container metrics from cAdvisor
- Kafka worker metrics via Pushgateway
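The same metrics can be queried without Grafana through Prometheus's standard HTTP API (/api/v1/query). In this sketch prom_query and PROM_URL are hypothetical names; the example relies only on Prometheus's built-in up metric, since Novus-specific metric names are catalogued in the metrics reference rather than here.

```shell
# Sketch: run an instant PromQL query against the Prometheus HTTP API.
# -G turns the --data-urlencode argument into a URL-encoded query string.
prom_query() {
  local expr="${1:-up}"
  curl -sG "${PROM_URL:-http://localhost:9090}/api/v1/query" \
    --data-urlencode "query=$expr"
}
```

For example, prom_query 'up == 0' returns, as JSON, the scrape targets that are currently down.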
Access Jaeger at http://localhost:16686:
- Trace complete request flows across all services
- Identify performance bottlenecks
- Correlate traces with metrics and logs via trace_id
- Analyze query stage latencies in detail
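Traces can also be pulled as JSON from the HTTP API that the Jaeger UI itself uses. Jaeger treats this API as internal, so its response format may change between versions; jaeger_traces is a hypothetical helper, and query-service is only assumed to be the service name reported to Jaeger.

```shell
# Sketch: fetch recent traces for one service from Jaeger's query API.
# Arguments: service name (assumed default: query-service), max traces (default 20).
jaeger_traces() {
  local service="${1:-query-service}" limit="${2:-20}"
  curl -sG "${JAEGER_URL:-http://localhost:16686}/api/traces" \
    --data-urlencode "service=$service" \
    --data-urlencode "limit=$limit"
}
```

Running curl -s http://localhost:16686/api/services lists the service names Jaeger has received spans from, which is the quickest way to find the name to pass.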
# View all service logs
docker-compose logs -f
# View specific service logs
docker-compose logs -f query-service
# Search logs with trace correlation in Loki
# Access via Grafana → Explore → Loki
# Query: {service_name="query-service"} |= "trace:"
# Follow processing pipeline
docker-compose logs -f pdf-processing-service embedding-service weaviate-manager

More than 50 production-ready alert rules are configured in monitoring/alerting-rules.yml:
- Service health alerts (down, high error rate)
- Resource alerts (CPU, memory, disk I/O)
- Performance alerts (high latency, slow queries)
- Pipeline alerts (backlog, processing failures)
- Business metrics alerts (low throughput, high failure rate)
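After editing the rule file, it can be validated with promtool before Prometheus reloads it. A sketch, assuming Docker is available and the command is run from the repository root; check_alert_rules is hypothetical, and it uses the promtool binary shipped in the official prom/prometheus image.

```shell
# Sketch: validate a Prometheus rule file with promtool inside a container.
# The rule file's directory is mounted read-only at /rules.
check_alert_rules() {
  local rules="${1:-monitoring/alerting-rules.yml}" dir
  # docker -v needs an absolute host path
  case "$rules" in
    /*) dir=$(dirname "$rules") ;;
    *) dir="$(pwd)/$(dirname "$rules")" ;;
  esac
  docker run --rm -v "$dir:/rules:ro" --entrypoint /bin/promtool \
    prom/prometheus check rules "/rules/$(basename "$rules")"
}
```

To see which rules are currently pending or firing, curl -s http://localhost:9090/api/v1/alerts queries Prometheus directly.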
- Quick Start: monitoring/README.md - Overview and setup
- Observability Guide: monitoring/docs/OBSERVABILITY_GUIDE.md - Complete architecture and usage
- Metrics Reference: monitoring/docs/METRICS_REFERENCE.md - Complete metrics catalog
- Jaeger Tracing: monitoring/docs/JAEGER_GUIDE.md - Distributed tracing setup
# Scale processing services for high load
docker-compose up -d --scale pdf-processing-service=3
docker-compose up -d --scale embedding-service=5
# Monitor resource utilization in Grafana to guide scaling decisions

POST /upload
Content-Type: multipart/form-data
Parameters:
- file: PDF file
- tenant_id: Tenant identifier
- topic: Document topic (optional)
- device: Device type (optional)
- version: Version (optional)

POST /query
Content-Type: application/json
{
"query": "How to configure BGP on TechCorp 8000?",
"tenant_id": "demo",
"filters": {
"device": "TechCorp8000",
"topic": "Configuration"
}
}

# List collections
GET /collections?tenant_id=demo
# Get collection info
GET /collections/{logical_name}
# Resolve collection
GET /resolve/{logical_name}

- Create service directory in services/
- Implement with FastAPI or async worker pattern
- Add to docker-compose.yaml
- Update monitoring configuration

- Add model configuration to embedding-service/app/config.py
- Implement model loader in embedding_generator.py
- Update dimension mappings

- Add entity extractors in arangodb-updater/
- Define new node/edge types
- Update graph traversal queries
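The POST /query and GET /collections endpoints listed earlier can be exercised with two small shell helpers. This is a sketch under stated assumptions: novus_query and list_collections are hypothetical; the defaults assume POST /query is served by the Query Service on port 8002 and /collections by the Collection Registry on port 8003 (set QUERY_URL or REGISTRY_URL if your deployment routes through the API Gateway instead); and the JSON escaping only handles single-line questions.

```shell
# Hypothetical helper for POST /query.
# Arguments: question, tenant_id (default demo), optional filters JSON object.
novus_query() {
  local question="${1:-}" tenant="${2:-demo}" filters="${3:-}" body
  if [ -z "$question" ]; then
    echo "usage: novus_query QUESTION [TENANT_ID] [FILTERS_JSON]" >&2
    return 2
  fi
  # Escape backslashes and double quotes so the question is a valid JSON string
  question=$(printf '%s' "$question" | sed 's/\\/\\\\/g; s/"/\\"/g')
  body="{\"query\": \"$question\", \"tenant_id\": \"$tenant\""
  # FILTERS_JSON is passed through verbatim and must already be a JSON object
  if [ -n "$filters" ]; then
    body="$body, \"filters\": $filters"
  fi
  body="$body}"
  curl -sS -X POST "${QUERY_URL:-http://localhost:8002}/query" \
    -H "Content-Type: application/json" -d "$body"
}

# Hypothetical helper for GET /collections?tenant_id=...
list_collections() {
  curl -sSG "${REGISTRY_URL:-http://localhost:8003}/collections" \
    --data-urlencode "tenant_id=${1:-demo}"
}
```

For example, novus_query 'How to configure BGP on TechCorp 8000?' demo '{"device": "TechCorp8000", "topic": "Configuration"}' sends the same body as the example request.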
- JWT-based authentication via API Gateway
- Tenant isolation at all levels
- Role-based access control (RBAC)
- Client-side embedding option for sensitive data
- PII detection and redaction
- Data residency controls
- Immutable audit logs for all operations
- Provenance tracking for all answers
- Right-to-be-forgotten support
- HNSW index tuning: ef_construction, ef_search, max_connections
- Batch size optimization for embeddings
- Collection sharding strategies
- ArangoDB SmartGraphs for large datasets
- Query optimization with indexes
- Caching frequent traversals
- Parallel PDF processing workers
- Embedding batch size tuning
- Memory management for large documents
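The HNSW parameters listed above correspond to Weaviate's vectorIndexConfig fields efConstruction, ef and maxConnections. The sketch below creates a class with explicit values through Weaviate's REST schema endpoint (POST /v1/schema) so their effect can be measured in isolation. Assumptions: create_hnsw_class is hypothetical; the class uses vectorizer none on the assumption that vectors are supplied by the Embedding Service; the values are illustrative; and Weaviate's port 8080 is reachable from where you run it (in Novus, collections are normally created by the Weaviate Manager).

```shell
# Sketch: create a Weaviate class with explicit HNSW settings.
# Weaviate class names must start with an uppercase letter.
# Note: the WEAVIATE_URL shown earlier (http://weaviate:8080) resolves only
# inside the Docker network; the default here assumes a published port.
create_hnsw_class() {
  local name="${1:-}" ef_construction="${2:-128}" ef="${3:-64}" max_conn="${4:-32}"
  if [ -z "$name" ]; then
    echo "usage: create_hnsw_class CLASS [EF_CONSTRUCTION] [EF] [MAX_CONNECTIONS]" >&2
    return 2
  fi
  curl -sS -X POST "${WEAVIATE_URL:-http://localhost:8080}/v1/schema" \
    -H "Content-Type: application/json" \
    -d "{
  \"class\": \"$name\",
  \"vectorizer\": \"none\",
  \"vectorIndexType\": \"hnsw\",
  \"vectorIndexConfig\": {
    \"efConstruction\": $ef_construction,
    \"ef\": $ef,
    \"maxConnections\": $max_conn
  }
}"
}
```

Larger efConstruction and maxConnections generally improve recall at the cost of indexing time and memory, while ef trades query latency for recall. In Weaviate, ef can be changed on an existing class, but efConstruction and maxConnections are fixed when the class is created.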
# Check all service health
curl http://localhost:8000/health
curl http://localhost:8001/health
curl http://localhost:8002/health
curl http://localhost:8003/health

Built on top of excellent open-source projects:
- Weaviate - Vector database
- ArangoDB - Multi-model database
- LangChain - LLM framework
- FastAPI - Web framework
- Apache Kafka - Message streaming

