SunilMohan13/Novus

Novus Enterprise RAG System

A production-grade, multi-tenant Retrieval-Augmented Generation (RAG) platform built on a microservices architecture. The system integrates:

  • Weaviate for scalable vector storage and similarity search.
  • ArangoDB for managing and querying knowledge graphs, enabling rich semantic relationships and context-aware retrieval.
  • Agentic orchestration for coordinating retrieval, reasoning, and response generation across services and tenants.

The architecture ensures isolation and scalability for multiple tenants, supports domain-specific customization, and is designed for high availability, observability, and extensibility in enterprise environments.

📚 Documentation

Video Demo

Slide Deck & Presentations

Technical Documentation

Additional Resources

🏗️ Novus Architecture Overview

[Figure: Novus Architecture Overview]

[Figure: Novus RAG Architecture Overview]

✨ Key Features

🔄 Microservices Architecture

  • Event-driven communication via Kafka
  • Independent scaling for each service
  • Fault isolation and resilience
  • Blue-green deployments with zero downtime

📄 Advanced PDF Processing

  • Streaming processing for large files (>1GB)
  • Semantic chunking with hierarchical structure
  • Table extraction and OCR support
  • Content classification and entity extraction

🧠 Intelligent Embeddings

  • Multi-model support (OpenAI + local models)
  • GC-PROD meta embeddings for graph-aware retrieval
  • Progressive distillation framework
  • Automatic fallback and quality validation
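
The fallback and validation behaviour listed above could look roughly like the following sketch. It is not taken from the Novus code base: the function names, the validation rules, and the `(name, embedder)` pairing are assumptions made for illustration only.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical type: any callable that turns text into a vector.
Embedder = Callable[[str], Sequence[float]]


def is_valid(vector: Sequence[float], expected_dim: int) -> bool:
    """Accept a vector only if it has the expected size, finite values, and a non-zero entry."""
    if len(vector) != expected_dim:
        return False
    if not all(math.isfinite(x) for x in vector):
        return False
    return any(x != 0.0 for x in vector)


def embed_with_fallback(
    text: str,
    embedders: Sequence[Tuple[str, Embedder]],
    expected_dims: Dict[str, int],
) -> Tuple[str, List[float]]:
    """Try each (name, embedder) pair in order and return the first valid result."""
    errors = []
    for name, embed in embedders:
        try:
            vector = list(embed(text))
        except Exception as exc:  # e.g. a timeout or quota error from a remote model
            errors.append(f"{name}: {exc}")
            continue
        if is_valid(vector, expected_dims[name]):
            return name, vector
        errors.append(f"{name}: failed validation")
    raise RuntimeError("all embedders failed: " + "; ".join(errors))
```

Returning the model name alongside the vector matters because vectors from different models are not comparable; the caller needs the name to route the vector to a matching collection.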

🗃️ Dynamic Collection Management

  • Version-aware routing with atomic switching
  • Dynamic Weaviate collections based on content type
  • Blue-green indexing for updates
  • Automatic cleanup and retention policies
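
One way to picture version-aware routing with atomic switching is a small in-memory registry that maps a logical collection name to the physical collection currently serving reads. The class below is a hypothetical sketch, not the Collection Registry service itself; the names `CollectionRegistry`, `switch`, and `rollback` are assumptions.

```python
import threading
from typing import Dict, Optional


class CollectionRegistry:
    """Maps a logical collection name to the physical collection that serves reads."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._active: Dict[str, str] = {}
        self._previous: Dict[str, str] = {}

    def resolve(self, logical_name: str) -> str:
        """Return the physical collection currently behind a logical name."""
        with self._lock:
            return self._active[logical_name]

    def switch(self, logical_name: str, new_physical: str) -> Optional[str]:
        """Point the logical name at a fully indexed collection in one locked step.

        Returns the collection that was active before, so it can be cleaned up
        later or used for a rollback.
        """
        with self._lock:
            old = self._active.get(logical_name)
            self._active[logical_name] = new_physical
            if old is not None:
                self._previous[logical_name] = old
            return old

    def rollback(self, logical_name: str) -> str:
        """Restore the collection that was active before the last switch."""
        with self._lock:
            old = self._previous.pop(logical_name)
            self._active[logical_name] = old
            return old
```

In a blue-green update the new ("green") collection is indexed in full while readers keep resolving to the old ("blue") one; `switch` is called only once indexing has finished, so readers never see a partially built index.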

🔍 Multi-Modal Retrieval

  • Vector search (dense embeddings)
  • Sparse retrieval (BM25/SPLADE)
  • Knowledge graph traversal
  • Cross-encoder reranking
  • LLM synthesis with provenance
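
The README does not say how results from the dense, sparse, and graph retrievers are merged before reranking. A common choice is Reciprocal Rank Fusion (RRF), sketched below as an assumption rather than as the method Novus uses; the document IDs are made up.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of document IDs into one list.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Ties are broken by document ID for determinism.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


if __name__ == "__main__":
    dense = ["a", "b", "c"]   # vector search order
    sparse = ["b", "d", "a"]  # BM25/SPLADE order
    graph = ["b", "c"]        # knowledge graph traversal order
    print(reciprocal_rank_fusion([dense, sparse, graph]))
```

RRF uses only ranks, so it needs no score normalisation across retrievers whose scores live on different scales. The fused list would then be passed to the cross-encoder reranker.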

📊 Production-Ready Observability

  • Prometheus metrics for all services with per-service resource monitoring
  • Grafana dashboards for visualization with pre-built production dashboards
  • Distributed tracing with Jaeger and OpenTelemetry integration
  • Centralized logging with Loki and trace correlation
  • cAdvisor for container-level infrastructure metrics
  • Pushgateway for Kafka worker metrics
  • Comprehensive alerting with 50+ production-ready alert rules
  • Real-time monitoring of CPU, memory, disk I/O, and network I/O per service
  • Query stage breakdown for performance optimization (vector DB, embedding, LLM)
  • Health checks and automated incident detection

🚀 Quick Start

Prerequisites

  • Docker & Docker Compose
  • 16GB+ RAM recommended (8GB minimum)
  • 50GB free disk space for data and metrics storage
  • OpenAI API key (optional, for best performance)

1. Clone and Setup

git clone <repository-url>
cd Novus
export OPENAI_API_KEY="your-openai-api-key-here"
# If you use an enterprise OpenAI API key, also set OPENAI_API_URL
export OPENAI_API_URL="your-openai-api-url-here"

2. Create Data Directories

Create persistent storage directories for all services:

# Create all data directories for services and monitoring
mkdir -p data/{weaviate,arangodb,arangodb-apps,opensearch,redis,mongo,minio,upload-temp,prometheus,grafana,loki,pushgateway}

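# Make the directories writable by the service containers
# (777 is world-writable; use tighter permissions outside local development)
chmod -R 777 data/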

Data Directory Structure:

  • data/weaviate - Vector database storage
  • data/arangodb - Knowledge graph database
  • data/arangodb-apps - ArangoDB applications
  • data/opensearch - Full-text search index
  • data/redis - Cache and session data
  • data/mongo - User data and metadata
  • data/minio - Object storage for PDFs
  • data/upload-temp - Temporary upload files
  • data/prometheus - Metrics time-series data
  • data/grafana - Dashboards and visualizations
  • data/loki - Log aggregation storage
  • data/pushgateway - Kafka worker metrics

3. Start the System

Quick Start

# Start with existing images
./start-system.sh

# Or rebuild and start (first time or after code changes)
./start-system.sh --build

# Clean rebuild (fresh start, removes all data)
./start-system.sh --clean --build

Start Options

  • --build - Force rebuild all Docker images
  • --clean - Clean all volumes and data (WARNING: destructive)
  • --no-monitoring - Skip monitoring stack (Prometheus, Grafana, etc.)
  • --no-chat - Skip chat UI
  • --dev - Development mode (verbose logging)
  • --help - Show help message

What Gets Started

The script starts services in the correct order:

Infrastructure (started first):

  • Zookeeper & Kafka (Message Queue)
  • Redis (Cache)
  • MongoDB (Auth & Metadata)
  • MinIO (Object Storage)
  • Weaviate (Vector Database)
  • ArangoDB (Graph Database)
  • OpenSearch (Search Engine)

Core Services:

  • API Gateway (Port 8000)
  • Upload Service (Port 8001)
  • Query Service (Port 8002)
  • Collection Registry (Port 8003)
  • PDF Processing Service (Port 8004)
  • Embedding Service
  • Weaviate Manager
  • ArangoDB Updater
  • Memory Service (Port 8005)

Monitoring (optional):

  • Prometheus (Port 9090)
  • Grafana (Port 3000)
  • Jaeger (Port 16686)
  • cAdvisor, Pushgateway, Loki, Promtail

UI (optional):

  • Novus Chat (Port 4200)

4. System Management

Check System Status

# Quick status overview
./system-status.sh

# Detailed health checks
./system-status.sh --health

# View resource usage
./system-status.sh --metrics

# View logs for a specific service
./system-status.sh --logs api-gateway

Stop the System

# Graceful shutdown
./stop-system.sh

# Force stop
./stop-system.sh --force

# Stop and remove all data
./stop-system.sh --clean

Scale PDF Workers

# Scale to 3 workers for faster processing
./scale-pdf-workers.sh 3

# Scale back to 1 worker
./scale-pdf-workers.sh 1

Common Operations

# Restart a specific service
docker-compose restart api-gateway

# View logs for all services
docker-compose logs -f

# View logs for specific service
docker-compose logs -f embedding-service

# Check running containers
docker-compose ps

# Open a shell in a service container
docker-compose exec api-gateway bash

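5. Access the System

Assuming the default ports listed under What Gets Started, the main entry points are:

  • API Gateway: http://localhost:8000
  • Novus Chat UI: http://localhost:4200
  • Grafana: http://localhost:3000
  • Prometheus: http://localhost:9090
  • Jaeger: http://localhost:16686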

6. Upload Documents

curl -X POST "http://localhost:8001/upload" \
  -F "file=@your-document.pdf" \
  -F "tenant_id=demo" \
  -F "topic=Device" \
  -F "device=TechCorp8000"

🔧 Configuration

Environment Variables

# OpenAI Configuration
export OPENAI_API_KEY="sk-..."
export OPENAI_MODEL="gpt-4o-mini"
export EMBEDDER="text-embedding-3-large"

# Service URLs (auto-configured in Docker)
export WEAVIATE_URL="http://weaviate:8080"
export ARANGO_URL="http://arangodb:8529"
export KAFKA_BOOTSTRAP_SERVERS="kafka:9092"

Service Configuration

Each service has its own configuration file:

  • Upload Service: services/upload-service/app/config.py
  • PDF Processing: services/pdf-processing-service/app/config.py
  • Embedding Service: services/embedding-service/app/config.py
  • And so on...

📈 Monitoring & Observability

Enhanced Observability Stack

The Novus system includes a comprehensive three-pillar observability solution:

  1. Metrics (Prometheus) - Quantitative system measurements
  2. Traces (Jaeger) - Request flow through distributed services
  3. Logs (Loki) - Detailed event records with trace correlation

Grafana Dashboards

Access Grafana at http://localhost:3000 (default login: admin/admin):

  • Production Overview: System-wide health, request rates, error rates, and latency
  • Query Service Performance: Detailed query stage breakdown (vector DB, embedding, LLM)
  • Ingestion Pipeline: Document processing throughput and worker metrics
  • Resource Utilization: CPU, memory, disk I/O, and network I/O per service
  • Tracing Correlation: Link metrics, logs, and traces for root cause analysis

Prometheus Metrics

Access Prometheus at http://localhost:9090:

  • Per-service resource metrics (CPU, memory, I/O)
  • HTTP request metrics (rate, latency, errors)
  • Query stage-level metrics
  • Database operation metrics
  • Container metrics from cAdvisor
  • Kafka worker metrics via Pushgateway
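
Query stage-level metrics imply timing each stage of a request separately. The sketch below shows the idea with a plain dictionary so that it stays dependency-free; in a real service the durations would more likely feed a Prometheus histogram. `StageTimer` and the stage names are hypothetical.

```python
import time
from contextlib import contextmanager
from typing import Callable, Dict, Iterator


class StageTimer:
    """Accumulates wall-clock time per named stage of a single query."""

    def __init__(self, clock: Callable[[], float] = time.perf_counter) -> None:
        self._clock = clock
        self.durations: Dict[str, float] = {}

    @contextmanager
    def stage(self, name: str) -> Iterator[None]:
        """Time the enclosed block and add the result to the named stage."""
        start = self._clock()
        try:
            yield
        finally:
            elapsed = self._clock() - start
            self.durations[name] = self.durations.get(name, 0.0) + elapsed


if __name__ == "__main__":
    timer = StageTimer()
    with timer.stage("embedding"):
        time.sleep(0.01)  # stand-in for the embedding call
    with timer.stage("vector_db"):
        time.sleep(0.02)  # stand-in for the vector search
    print(timer.durations)
```

The clock is injectable so the timer can be tested without sleeping.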

Distributed Tracing

Access Jaeger at http://localhost:16686:

  • Trace complete request flows across all services
  • Identify performance bottlenecks
  • Correlate traces with metrics and logs via trace_id
  • Analyze query stage latencies in detail
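
Correlating logs with traces requires every log line to carry the current trace ID. The sketch below uses only the standard library and a hypothetical `start_trace` helper; with OpenTelemetry in place the ID would normally come from the active span instead. The `trace:` prefix is chosen to match the Loki query shown under Log Management.

```python
import contextvars
import io
import logging
import uuid
from typing import Optional, TextIO

# Holds the trace ID for the current request; "-" means "no active trace".
_trace_id: contextvars.ContextVar = contextvars.ContextVar("trace_id", default="-")


class TraceIdFilter(logging.Filter):
    """Copies the current trace ID onto every log record."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.trace_id = _trace_id.get()
        return True


def build_logger(name: str, stream: TextIO) -> logging.Logger:
    """Create a logger whose lines include 'trace:<id>'."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter("%(levelname)s %(name)s trace:%(trace_id)s %(message)s")
    )
    handler.addFilter(TraceIdFilter())
    logger = logging.getLogger(name)
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


def start_trace(trace_id: Optional[str] = None) -> str:
    """Set the trace ID for the current context (hypothetical helper)."""
    trace_id = trace_id or uuid.uuid4().hex
    _trace_id.set(trace_id)
    return trace_id


if __name__ == "__main__":
    buffer = io.StringIO()
    log = build_logger("query-service", buffer)
    start_trace("abc123")
    log.info("vector search done")
    print(buffer.getvalue(), end="")
```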

Log Management

# View all service logs
docker-compose logs -f

# View specific service logs
docker-compose logs -f query-service

# Search logs with trace correlation in Loki
# Access via Grafana → Explore → Loki
# Query: {service_name="query-service"} |= "trace:"

# Follow processing pipeline
docker-compose logs -f pdf-processing-service embedding-service weaviate-manager

Alerting

50+ production-ready alert rules configured in monitoring/alerting-rules.yml:

  • Service health alerts (down, high error rate)
  • Resource alerts (CPU, memory, disk I/O)
  • Performance alerts (high latency, slow queries)
  • Pipeline alerts (backlog, processing failures)
  • Business metrics alerts (low throughput, high failure rate)

Monitoring Documentation

  • Quick Start: monitoring/README.md - Overview and setup
  • Observability Guide: monitoring/docs/OBSERVABILITY_GUIDE.md - Complete architecture and usage
  • Metrics Reference: monitoring/docs/METRICS_REFERENCE.md - Complete metrics catalog
  • Jaeger Tracing: monitoring/docs/JAEGER_GUIDE.md - Distributed tracing setup

Scaling Services

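# Scale processing services for high load.
# Pass every --scale flag in one command: a later `up` that omits a flag
# returns that service to its configured replica count.
docker-compose up -d --scale pdf-processing-service=3 --scale embedding-service=5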

# Monitor resource utilization in Grafana to guide scaling decisions

🔍 API Documentation

Upload API

POST /upload
Content-Type: multipart/form-data

Parameters:
- file: PDF file
- tenant_id: Tenant identifier
- topic: Document topic (optional)
- device: Device type (optional)
- version: Version (optional)
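
For clients that cannot shell out to curl, the multipart request can be built with the standard library alone. This is a sketch under assumptions: the field names come from the parameter list above, the base URL assumes the Upload Service on port 8001, and `build_upload_request` is a hypothetical helper, not part of Novus.

```python
import urllib.request
import uuid
from typing import Dict


def build_upload_request(
    base_url: str, filename: str, content: bytes, fields: Dict[str, str]
) -> urllib.request.Request:
    """Build a multipart/form-data POST for /upload without sending it."""
    boundary = uuid.uuid4().hex
    parts = []
    # Plain form fields such as tenant_id, topic, device, version.
    for name, value in fields.items():
        parts.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n".encode("utf-8")
        )
    # The PDF itself, sent under the "file" field.
    parts.append(
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/pdf\r\n\r\n".encode("utf-8")
        + content
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/upload",
        data=b"".join(parts),
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


if __name__ == "__main__":
    request = build_upload_request(
        "http://localhost:8001", "manual.pdf", b"%PDF-1.7", {"tenant_id": "demo"}
    )
    print(request.full_url, len(request.data), "bytes")
    # To send it: urllib.request.urlopen(request, timeout=60)
```

This helper holds the whole file in memory, so it suits small documents only; very large PDFs would need a streaming upload.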

Query API

POST /query
Content-Type: application/json

{
  "query": "How to configure BGP on TechCorp 8000?",
  "tenant_id": "demo",
  "filters": {
    "device": "TechCorp8000",
    "topic": "Configuration"
  }
}
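
A minimal client for the request above might look like this. It assumes the Query Service is reachable on port 8002 and that the response body is JSON; the README does not document the response schema, so the sketch returns the parsed body as-is. Both function names are hypothetical.

```python
import json
import urllib.request
from typing import Any, Dict, Optional


def build_query_request(
    base_url: str,
    query: str,
    tenant_id: str,
    filters: Optional[Dict[str, str]] = None,
) -> urllib.request.Request:
    """Build the POST /query request without sending it."""
    payload: Dict[str, Any] = {"query": query, "tenant_id": tenant_id}
    if filters:
        payload["filters"] = filters
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/query",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(request: urllib.request.Request, timeout: float = 30.0) -> Any:
    """Send the request and parse the body, assuming the service answers with JSON."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    request = build_query_request(
        "http://localhost:8002",
        "How to configure BGP on TechCorp 8000?",
        "demo",
        {"device": "TechCorp8000", "topic": "Configuration"},
    )
    print(request.get_method(), request.full_url)
    # With the stack running: print(run_query(request))
```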

Collection Management

# List collections
GET /collections?tenant_id=demo

# Get collection info
GET /collections/{logical_name}

# Resolve collection
GET /resolve/{logical_name}

🏗️ Development

Adding New Services

  1. Create service directory in services/
  2. Implement with FastAPI or async worker pattern
  3. Add to docker-compose.yaml
  4. Update monitoring configuration

Custom Embedding Models

  1. Add model configuration to embedding-service/app/config.py
  2. Implement model loader in embedding_generator.py
  3. Update dimension mappings

Extending Knowledge Graph

  1. Add entity extractors in arangodb-updater/
  2. Define new node/edge types
  3. Update graph traversal queries

🔒 Security & Compliance

Authentication

  • JWT-based authentication via API Gateway
  • Tenant isolation at all levels
  • Role-based access control (RBAC)
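
To illustrate how JWT validation and tenant isolation fit together, the sketch below verifies an HS256-signed token and rejects requests whose `tenant_id` does not match the token. The signing algorithm, the `tenant_id` claim name, and all function names are assumptions; the README does not describe the gateway's token format, and a production service would normally rely on a maintained JWT library.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Any, Dict, Optional


def _b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def _b64url_decode(segment: str) -> bytes:
    # JWT segments drop base64 padding; restore it before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def sign_hs256(claims: Dict[str, Any], secret: bytes) -> str:
    """Create a token (used here only to produce test input)."""
    header = _b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    body = _b64url_encode(json.dumps(claims).encode())
    signature = hmac.new(secret, f"{header}.{body}".encode("ascii"), hashlib.sha256).digest()
    return f"{header}.{body}.{_b64url_encode(signature)}"


def verify_hs256(token: str, secret: bytes, now: Optional[float] = None) -> Dict[str, Any]:
    """Check algorithm, signature, and expiry; return the claims."""
    try:
        header_b64, body_b64, signature_b64 = token.split(".")
    except ValueError:
        raise PermissionError("malformed token")
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise PermissionError("unexpected algorithm")
    expected = hmac.new(secret, f"{header_b64}.{body_b64}".encode("ascii"), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise PermissionError("bad signature")
    claims = json.loads(_b64url_decode(body_b64))
    current = time.time() if now is None else now
    if "exp" in claims and claims["exp"] <= current:
        raise PermissionError("token expired")
    return claims


def authorize_tenant(token: str, secret: bytes, requested_tenant: str,
                     now: Optional[float] = None) -> Dict[str, Any]:
    """Reject requests for a tenant other than the one named in the token."""
    claims = verify_hs256(token, secret, now)
    if claims.get("tenant_id") != requested_tenant:
        raise PermissionError("tenant mismatch")
    return claims
```

Checking the tenant at the gateway is only the first layer; "tenant isolation at all levels" also implies that downstream stores filter by tenant.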

Data Privacy

  • Client-side embedding option for sensitive data
  • PII detection and redaction
  • Data residency controls

Audit & Compliance

  • Immutable audit logs for all operations
  • Provenance tracking for all answers
  • Right-to-be-forgotten support

📊 Performance Tuning

Vector Search Optimization

  • HNSW index tuning: ef_construction, ef_search, max_connections
  • Batch size optimization for embeddings
  • Collection sharding strategies
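
In Weaviate's schema, the three HNSW knobs named above appear under `vectorIndexConfig` as `efConstruction`, `ef` (the search-time value), and `maxConnections`. The helper below builds such a class definition as a plain dictionary. The helper name, the class name, and the numbers in the usage example are illustrative assumptions, not recommended settings.

```python
from typing import Any, Dict


def hnsw_class_definition(
    class_name: str,
    ef_construction: int,
    max_connections: int,
    ef_search: int = -1,
) -> Dict[str, Any]:
    """Build a Weaviate class definition with explicit HNSW settings.

    ef_search=-1 leaves the search-time ef to Weaviate's dynamic selection.
    """
    if ef_construction <= 0 or max_connections <= 0:
        raise ValueError("ef_construction and max_connections must be positive")
    if ef_search != -1 and ef_search <= 0:
        raise ValueError("ef_search must be positive, or -1 for dynamic ef")
    return {
        "class": class_name,
        "vectorIndexType": "hnsw",
        "vectorIndexConfig": {
            "efConstruction": ef_construction,  # build-time candidate list size
            "maxConnections": max_connections,  # graph edges per node
            "ef": ef_search,                    # search-time candidate list size
        },
    }


if __name__ == "__main__":
    # Illustrative values only; tune against recall and latency measurements.
    print(hnsw_class_definition("TechCorpManuals", ef_construction=256, max_connections=32))
```

Higher `efConstruction` and `maxConnections` generally improve recall at the cost of slower indexing and more memory, while a higher `ef` trades query latency for recall.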

Knowledge Graph Performance

  • ArangoDB SmartGraphs for large datasets
  • Query optimization with indexes
  • Caching frequent traversals

Processing Pipeline

  • Parallel PDF processing workers
  • Embedding batch size tuning
  • Memory management for large documents

Health Checks

# Check health of the API Gateway, Upload, Query, and Collection Registry services
curl http://localhost:8000/health
curl http://localhost:8001/health
curl http://localhost:8002/health
curl http://localhost:8003/health
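
The same checks can be scripted so that a failing service is reported by name. The sketch assumes each service answers `GET /health` with HTTP 200 when healthy; the service labels in `SERVICES` are for display only, and `check_health` is a hypothetical helper.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict

# Ports taken from the Core Services list; the labels are display names only.
SERVICES: Dict[str, int] = {
    "api-gateway": 8000,
    "upload-service": 8001,
    "query-service": 8002,
    "collection-registry": 8003,
}


def http_status(url: str, timeout: float = 3.0) -> int:
    """Return the HTTP status code for url, or 0 if the service is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        return exc.code  # the service answered, but with an error status
    except (urllib.error.URLError, OSError):
        return 0  # connection refused, DNS failure, or timeout


def check_health(
    services: Dict[str, int] = SERVICES,
    fetch: Callable[[str], int] = http_status,
    host: str = "localhost",
) -> Dict[str, bool]:
    """Map each service name to True if its /health endpoint returned 200."""
    return {
        name: fetch(f"http://{host}:{port}/health") == 200
        for name, port in services.items()
    }


if __name__ == "__main__":
    for name, healthy in check_health().items():
        print(f"{name}: {'ok' if healthy else 'UNHEALTHY'}")
```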

🙏 Acknowledgments


About

Novus is an enterprise-grade, event-driven RAG architecture that combines multi-agent orchestration, polyglot persistence, and ML-driven intelligence to deliver scalable, reliable, and production-ready Retrieval-Augmented Generation systems.
