Enterprise-grade RAG pipeline toolkit for Node.js. Build production-ready Retrieval-Augmented Generation systems with modular plugins, streaming support, and comprehensive observability.
`@devilsdev/rag-pipeline-utils` is a modular toolkit for building scalable RAG (Retrieval-Augmented Generation) pipelines in Node.js. Designed for enterprise applications, it provides a plugin-based architecture with built-in streaming, performance optimization, observability, and comprehensive testing utilities.
- Modular Components: Swap loaders, embedders, retrievers, LLMs, and rerankers without code changes
- Contract Validation: Runtime and CI verification of plugin interfaces
- Plugin Marketplace: Discover and publish community plugins
- Hot-swappable: Change components via configuration without restarts
- Streaming Support: Real-time token streaming for LLM responses
- Parallel Processing: Concurrent embedding and retrieval operations
- Memory Safeguards: Automatic backpressure and memory management
- Benchmarking Tools: Built-in performance measurement and optimization
- Structured Logging: Comprehensive event tracking and debugging
- Metrics Collection: Performance counters, histograms, and gauges
- Distributed Tracing: OpenTelemetry-compatible request tracing
- Health Monitoring: Built-in diagnostics and system health checks
- CLI Tools: Full-featured command-line interface
- Interactive Wizard: Guided pipeline setup and configuration
- Plugin Scaffolding: Generate new plugins with best practices
- Comprehensive Testing: Unit, integration, and contract testing utilities
- Schema Validation: Strict configuration validation with JSON Schema
- Error Handling: Robust error recovery and reporting
- Type Safety: Full TypeScript support and JSDoc annotations
- CI/CD Integration: GitHub Actions workflows and automated testing
- Zero Critical Vulnerabilities: Total vulnerabilities reduced from 98 to 17 (an 83% reduction)
- Automated Security Monitoring: GitHub Dependabot with weekly vulnerability scans
- CI/CD Security Integration: Build failure on critical vulnerabilities
- Compliance Ready: OWASP, NIST, and CIS security standards
- Dependency Validation: Automated license and security compliance checking
- Security Audit Tools: Built-in `npm run security:audit` command and reporting
- Node.js 18.0.0 or higher
- npm or yarn package manager
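```bash
# Global install (provides the rag-pipeline CLI)
npm install -g @devilsdev/rag-pipeline-utils

# Local install as a project dependency
npm install @devilsdev/rag-pipeline-utils
```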
```bash
rag-pipeline init
```
This launches an interactive wizard to configure your pipeline with preferred plugins.
Create a `.ragrc.json` configuration file:
```json
{
  "loader": {
    "pdf": "@devilsdev/pdf-loader",
    "markdown": "@devilsdev/markdown-loader"
  },
  "embedder": {
    "openai": "@devilsdev/openai-embedder"
  },
  "retriever": {
    "chroma": "@devilsdev/chroma-retriever"
  },
  "llm": {
    "openai": "@devilsdev/openai-llm"
  },
  "pipeline": {
    "loader": "pdf",
    "embedder": "openai",
    "retriever": "chroma",
    "llm": "openai"
  },
  "performance": {
    "maxConcurrency": 5,
    "enableStreaming": true,
    "enableObservability": true
  }
}
```
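Each entry under `pipeline` names a key in the matching plugin map, so `"loader": "pdf"` selects `loader.pdf`. A minimal sketch of that lookup, assuming resolution is plain key matching; `resolvePipeline` is a hypothetical helper for illustration, not part of the package's API:

```javascript
// Hypothetical helper (not part of the package API): map every stage chosen
// in "pipeline" to the package registered under that stage's plugin map.
function resolvePipeline(config) {
  const resolved = {};
  for (const [stage, key] of Object.entries(config.pipeline ?? {})) {
    const registry = config[stage];
    if (!registry || !Object.prototype.hasOwnProperty.call(registry, key)) {
      throw new Error(`pipeline.${stage} is "${key}", but "${stage}.${key}" is not defined`);
    }
    resolved[stage] = registry[key];
  }
  return resolved;
}

// Usage with the plugin maps from the .ragrc.json example
const sampleConfig = {
  loader: { pdf: '@devilsdev/pdf-loader', markdown: '@devilsdev/markdown-loader' },
  embedder: { openai: '@devilsdev/openai-embedder' },
  retriever: { chroma: '@devilsdev/chroma-retriever' },
  llm: { openai: '@devilsdev/openai-llm' },
  pipeline: { loader: 'pdf', embedder: 'openai', retriever: 'chroma', llm: 'openai' },
};
console.log(resolvePipeline(sampleConfig).retriever); // prints @devilsdev/chroma-retriever
```

Failing fast on an undefined key surfaces configuration typos before any plugin package is loaded.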
```bash
# Ingest documents with automatic plugin detection
rag-pipeline ingest ./docs --loader pdf --embedder openai --retriever chroma

# Ingest with streaming and performance monitoring
rag-pipeline ingest ./docs --stream --benchmark --trace

# Batch ingest multiple document types
rag-pipeline ingest ./docs/**/*.{pdf,md,txt} --parallel --batch-size 10
```
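The `--batch-size` option implies that ingestion groups documents before embedding them. A minimal sketch of that grouping, assuming batches are simple fixed-size slices of the matched file list; `toBatches` is a hypothetical helper, not the CLI's internal code:

```javascript
// Hypothetical helper: split a list of items into consecutive batches of at
// most `batchSize` elements (the final batch may be smaller).
function toBatches(items, batchSize) {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError('batchSize must be a positive integer');
  }
  const batches = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}

// Usage: 23 matched files with a batch size of 10
const files = Array.from({ length: 23 }, (_, i) => `./docs/file-${i}.md`);
const batches = toBatches(files, 10);
console.log(batches.map((batch) => batch.length)); // [ 10, 10, 3 ]
```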
```bash
# Basic query
rag-pipeline query "What is vector search?" --llm openai

# Streaming query with real-time responses
rag-pipeline query "Explain RAG architecture" --llm openai --stream

# Query with custom retrieval parameters
rag-pipeline query "How does embedding work?" --top-k 5 --similarity-threshold 0.8
```
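The `--top-k` and `--similarity-threshold` options apply two filters to retrieval results: discard weak matches, then keep the best few. A minimal sketch, assuming cosine similarity as the score (a real retriever plugin may use another metric or delegate ranking to the vector store); `retrieveTopK` and the in-memory `docs` array are hypothetical:

```javascript
// Cosine similarity between two equal-length vectors (0 if either is all zeros).
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical retrieval step: score every stored vector, drop scores below
// the threshold, sort best-first, and keep at most topK hits.
function retrieveTopK(queryVector, docs, { topK = 5, similarityThreshold = 0 } = {}) {
  return docs
    .map((doc) => ({ id: doc.id, score: cosineSimilarity(queryVector, doc.vector) }))
    .filter((hit) => hit.score >= similarityThreshold)
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}

const docs = [
  { id: 'a', vector: [1, 0] },
  { id: 'b', vector: [0.9, 0.1] },
  { id: 'c', vector: [0, 1] },
];
const hits = retrieveTopK([1, 0], docs, { topK: 5, similarityThreshold: 0.8 });
console.log(hits.map((hit) => hit.id)); // [ 'a', 'b' ]
```

Applying the threshold before `topK` means a query can return fewer than `topK` results when few documents are similar enough.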
```bash
# Run complex DAG pipelines
rag-pipeline dag run ./examples/academic-rag.yaml

# Interactive pipeline builder
rag-pipeline wizard

# System diagnostics and health check
rag-pipeline doctor

# Plugin management
rag-pipeline plugin list
rag-pipeline plugin install @community/custom-embedder
rag-pipeline plugin scaffold my-custom-loader
```
Each plugin type implements a standardized interface with runtime validation:
| Plugin Type | Required Methods | Optional Methods | Description |
|---|---|---|---|
| Loader | `load(filePath)` | `validate()`, `getMetadata()` | Document ingestion and parsing |
| Embedder | `embed(texts)`, `embedQuery(query)` | `getDimensions()`, `getBatchSize()` | Text vectorization |
| Retriever | `store(vectors)`, `retrieve(queryVector)` | `delete()`, `update()` | Vector storage and similarity search |
| LLM | `generate(prompt)`, `stream(prompt)` | `getTokenCount()`, `getModels()` | Language model inference |
| Reranker | `rerank(query, documents)` | `getScore()` | Result relevance optimization |
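At its simplest, runtime contract validation checks that a plugin exposes every required method for its type. A minimal sketch built from the Required Methods column of the plugin table; `REQUIRED_METHODS` and `missingMethods` are hypothetical names, not the package's own contract checker:

```javascript
// Required methods per plugin type, copied from the plugin table.
const REQUIRED_METHODS = {
  loader: ['load'],
  embedder: ['embed', 'embedQuery'],
  retriever: ['store', 'retrieve'],
  llm: ['generate', 'stream'],
  reranker: ['rerank'],
};

// Hypothetical runtime check: list the required methods a plugin lacks.
// Works for object literals and class instances (prototype methods included).
function missingMethods(type, plugin) {
  const required = REQUIRED_METHODS[type];
  if (!required) throw new Error(`Unknown plugin type: ${type}`);
  return required.filter((name) => typeof plugin[name] !== 'function');
}

// Usage: a retriever that forgot to implement retrieve()
const partialRetriever = {
  async store(vectors) {},
};
console.log(missingMethods('retriever', partialRetriever)); // [ 'retrieve' ]
```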
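```javascript
// Example: Custom embedder plugin
// Sketch: calls the OpenAI embeddings REST endpoint using the global fetch
// available in Node.js 18+.
export class MyCustomEmbedder {
  constructor(options = {}) {
    this.apiKey = options.apiKey;
    this.model = options.model || 'text-embedding-ada-002';
  }

  // Batch embedding: returns one vector per input text, in input order
  async embed(texts) {
    const response = await fetch('https://api.openai.com/v1/embeddings', {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        Authorization: `Bearer ${this.apiKey}`,
      },
      body: JSON.stringify({ model: this.model, input: texts }),
    });
    if (!response.ok) {
      throw new Error(`Embedding request failed with status ${response.status}`);
    }
    const { data } = await response.json();
    return data.sort((a, b) => a.index - b.index).map((item) => item.embedding);
  }

  // Single query embedding reuses the batch path
  async embedQuery(query) {
    const [vector] = await this.embed([query]);
    return vector;
  }

  // Plugin metadata (required)
  static metadata = {
    name: 'my-custom-embedder',
    version: '1.0.0',
    type: 'embedder',
    description: 'Custom embedding implementation',
  };
}
```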
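Real embedders usually call a remote model, so tests often substitute a deterministic offline embedder. A minimal sketch, assuming only the `embed(texts)` / `embedQuery(query)` contract and the optional `getDimensions()` from the plugin table; `MockEmbedder` and `embedSync` are hypothetical names, and the vectors carry no semantic meaning:

```javascript
// Hypothetical offline embedder for tests: hashes character codes into a
// fixed number of buckets, then L2-normalizes. Same text -> same vector.
class MockEmbedder {
  constructor({ dimensions = 8 } = {}) {
    this.dimensions = dimensions;
  }

  embedSync(text) {
    const vector = new Array(this.dimensions).fill(0);
    for (let i = 0; i < text.length; i++) {
      vector[(text.charCodeAt(i) + i) % this.dimensions] += 1;
    }
    // Empty text has norm 0; fall back to 1 to avoid dividing by zero
    const norm = Math.sqrt(vector.reduce((sum, v) => sum + v * v, 0)) || 1;
    return vector.map((v) => v / norm);
  }

  async embed(texts) {
    return texts.map((text) => this.embedSync(text));
  }

  async embedQuery(query) {
    return this.embedSync(query);
  }

  getDimensions() {
    return this.dimensions;
  }

  static metadata = {
    name: 'mock-embedder',
    version: '0.0.0',
    type: 'embedder',
    description: 'Deterministic offline embedder for tests',
  };
}

// Usage
new MockEmbedder().embed(['hello', 'world']).then((vectors) => {
  console.log(vectors.length, vectors[0].length); // 2 8
});
```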
```
┌─────────────────┐      ┌─────────────────┐      ┌─────────────────┐
│  CLI Interface  │──────│ Pipeline Engine │──────│ Plugin Registry │
└─────────────────┘      └────────┬────────┘      └─────────────────┘
                                  │
                ┌─────────────────┼─────────────────┐
                │                 │                 │
         ┌──────▼──────┐   ┌──────▼──────┐   ┌──────▼──────┐
         │Observability│   │ Performance │   │     DAG     │
         │   System    │   │  Optimizer  │   │   Engine    │
         └──────┬──────┘   └──────┬──────┘   └──────┬──────┘
                │                 │                 │
         ┌──────▼──────┐   ┌──────▼──────┐   ┌──────▼──────┐
         │   Logging   │   │  Streaming  │   │  Workflow   │
         │   Tracing   │   │  Parallel   │   │  Execution  │
         │   Metrics   │   │   Memory    │   │ Validation  │
         └─────────────┘   └─────────────┘   └─────────────┘
```
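The DAG Engine's workflow validation and execution ordering can be pictured as a topological sort. A minimal sketch using Kahn's algorithm, assuming nodes are plain `{ id, dependsOn }` objects; that node shape is an assumption and may not match the YAML format used by `rag-pipeline dag run`, and `topologicalOrder` is a hypothetical helper:

```javascript
// Hypothetical DAG ordering: returns node ids so every node follows its
// dependencies; rejects unknown dependencies and cycles before anything runs.
function topologicalOrder(nodes) {
  const inDegree = new Map();
  const dependents = new Map();
  for (const { id } of nodes) {
    inDegree.set(id, 0);
    dependents.set(id, []);
  }
  for (const { id, dependsOn = [] } of nodes) {
    for (const dep of dependsOn) {
      if (!inDegree.has(dep)) throw new Error(`${id} depends on unknown node ${dep}`);
      inDegree.set(id, inDegree.get(id) + 1);
      dependents.get(dep).push(id);
    }
  }
  // Start with nodes that have no dependencies
  const ready = nodes.filter((n) => inDegree.get(n.id) === 0).map((n) => n.id);
  const order = [];
  while (ready.length > 0) {
    const id = ready.shift();
    order.push(id);
    for (const next of dependents.get(id)) {
      inDegree.set(next, inDegree.get(next) - 1);
      if (inDegree.get(next) === 0) ready.push(next);
    }
  }
  // Any node never reaching in-degree 0 sits on a cycle
  if (order.length !== nodes.length) throw new Error('Cycle detected in DAG');
  return order;
}

const dag = [
  { id: 'retrieve', dependsOn: ['embed'] },
  { id: 'load' },
  { id: 'embed', dependsOn: ['load'] },
  { id: 'generate', dependsOn: ['retrieve'] },
];
console.log(topologicalOrder(dag)); // [ 'load', 'embed', 'retrieve', 'generate' ]
```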
```
@devilsdev/rag-pipeline-utils/
├── bin/
│   └── cli.js                     # CLI entry point
├── src/
│   ├── cli/                       # Command-line interface
│   │   ├── enhanced-cli-commands.js
│   │   ├── interactive-wizard.js
│   │   ├── doctor-command.js
│   │   └── plugin-marketplace-commands.js
│   ├── core/                      # Core pipeline engine
│   │   ├── create-pipeline.js
│   │   ├── plugin-registry.js
│   │   ├── plugin-contracts.js
│   │   ├── observability/         # Monitoring & logging
│   │   ├── performance/           # Optimization tools
│   │   └── plugin-marketplace/
│   ├── config/                    # Configuration management
│   │   ├── load-config.js
│   │   └── enhanced-ragrc-schema.js
│   ├── dag/                       # DAG workflow engine
│   │   └── dag-engine.js
│   ├── utils/                     # Utility functions
│   │   ├── logger.js
│   │   ├── retry.js
│   │   └── plugin-scaffolder.js
│   └── mocks/                     # Development mocks
├── __tests__/                     # Test suites
│   ├── unit/
│   ├── integration/
│   └── fixtures/
├── docs/                          # Documentation
├── examples/                      # Usage examples
├── scripts/                       # Build & maintenance
├── .ragrc.schema.json             # Configuration schema
├── package.json
└── README.md
```
```bash
# OpenAI Configuration
OPENAI_API_KEY=your_openai_api_key
OPENAI_MODEL=gpt-4

# Vector Database Configuration
PINECONE_API_KEY=your_pinecone_key
PINECONE_ENVIRONMENT=us-west1-gcp

# Performance Settings
RAG_MAX_CONCURRENCY=5
RAG_BATCH_SIZE=10
RAG_ENABLE_STREAMING=true
```
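Environment variables always arrive as strings, so numeric and boolean settings need parsing. A minimal sketch, assuming the fallbacks mirror the example values (5, 10, `true`), which may differ from the package's real defaults; `readPerformanceEnv` is a hypothetical helper:

```javascript
// Hypothetical helper: read the performance settings from environment
// variables, falling back to defaults for missing or invalid values.
function readPerformanceEnv(env = process.env) {
  // Accept only positive integers; anything else uses the fallback
  const toPositiveInt = (value, fallback) => {
    const n = Number.parseInt(value, 10);
    return Number.isInteger(n) && n > 0 ? n : fallback;
  };
  return {
    maxConcurrency: toPositiveInt(env.RAG_MAX_CONCURRENCY, 5),
    batchSize: toPositiveInt(env.RAG_BATCH_SIZE, 10),
    // Only the literal string "true" (case-insensitive) enables streaming
    enableStreaming: (env.RAG_ENABLE_STREAMING ?? 'true').toLowerCase() === 'true',
  };
}

console.log(readPerformanceEnv({ RAG_MAX_CONCURRENCY: '8', RAG_ENABLE_STREAMING: 'false' }));
// { maxConcurrency: 8, batchSize: 10, enableStreaming: false }
```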
```json
{
  "pipeline": {
    "loader": "pdf",
    "embedder": "openai",
    "retriever": "chroma",
    "llm": "openai",
    "reranker": "cross-encoder"
  },
  "performance": {
    "maxConcurrency": 5,
    "batchSize": 10,
    "enableStreaming": true,
    "enableObservability": true,
    "maxMemoryMB": 512,
    "tokenLimit": 100000
  },
  "observability": {
    "enableLogging": true,
    "enableTracing": true,
    "enableMetrics": true,
    "logLevel": "info",
    "exportFormat": "json"
  },
  "plugins": {
    "marketplace": {
      "registryUrl": "https://registry.rag-pipeline.dev",
      "autoUpdate": false,
      "allowPrerelease": false
    }
  }
}
```
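`maxConcurrency` caps how many operations (for example embedding calls) run at once. A minimal sketch of such a limiter, assuming a simple worker-pool approach; `mapWithConcurrency` is a hypothetical helper and not necessarily how the package's performance optimizer schedules work:

```javascript
// Hypothetical helper: apply an async worker to every item while keeping at
// most `maxConcurrency` calls in flight. Results keep the input order.
async function mapWithConcurrency(items, maxConcurrency, worker) {
  if (!Number.isInteger(maxConcurrency) || maxConcurrency < 1) {
    throw new RangeError('maxConcurrency must be a positive integer');
  }
  const results = new Array(items.length);
  let nextIndex = 0;

  // Each runner repeatedly claims the next unprocessed index. Claiming needs
  // no lock because this JavaScript code runs on a single thread.
  async function runner() {
    while (nextIndex < items.length) {
      const index = nextIndex++;
      results[index] = await worker(items[index], index);
    }
  }

  const runnerCount = Math.min(maxConcurrency, items.length);
  await Promise.all(Array.from({ length: runnerCount }, () => runner()));
  return results;
}

// Usage: simulate 7 embedding calls with maxConcurrency = 5
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
let inFlight = 0;
let peak = 0;
mapWithConcurrency([1, 2, 3, 4, 5, 6, 7], 5, async (x) => {
  inFlight++;
  peak = Math.max(peak, inFlight);
  await sleep(10);
  inFlight--;
  return x * 2;
}).then((out) => console.log(out.join(','), peak)); // prints 2,4,6,8,10,12,14 5
```

A fixed pool of runners keeps memory use flat regardless of input size, because no more than `maxConcurrency` pending calls exist at any moment.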
- Legal Document Analysis: Process contracts, agreements, and legal documents
- Technical Documentation: Index API docs, manuals, and knowledge bases
- Research Papers: Academic literature search and analysis
- Customer Support: FAQ automation and ticket resolution
- Code Documentation: Generate and maintain code documentation
- API Integration: Semantic search across API documentation
- Knowledge Management: Team knowledge base and onboarding
- Content Generation: Automated content creation and editing
- Healthcare: Medical literature search and clinical decision support
- Finance: Financial document analysis and compliance
- Education: Personalized learning and content recommendation
- E-commerce: Product search and recommendation systems
We welcome contributions from the community! Here's how you can help:
```bash
# Clone the repository
git clone https://github.com/DevilsDev/rag-pipeline-utils.git
cd rag-pipeline-utils

# Install dependencies
npm install

# Run tests
npm test

# Run linting
npm run lint

# Start development server
npm run dev
```
- Plugin Development: Create new plugins following our Plugin Developer Guide
- Bug Reports: Use GitHub Issues with detailed reproduction steps
- Feature Requests: Discuss new features in GitHub Discussions
- Documentation: Help improve docs and examples
- Testing: Add tests for new features and bug fixes
- GitHub Discussions: Questions and community chat
- GitHub Issues: Bug reports and feature requests
- Documentation: Comprehensive guides and API docs
- Plugin Marketplace: Community plugins and extensions
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
- Built with ❤️ by Ali Kahwaji and the DevilsDev team
- Inspired by the open-source AI/ML community
- Special thanks to all contributors