@DevilsDev/rag-pipeline-utils


Enterprise-grade RAG pipeline toolkit for Node.js: build production-ready Retrieval-Augmented Generation systems with modular plugins, streaming support, and comprehensive observability.

Overview

@devilsdev/rag-pipeline-utils is a modular toolkit for building scalable RAG (Retrieval-Augmented Generation) pipelines in Node.js. Designed for enterprise applications, it provides a plugin-based architecture with built-in streaming, performance optimization, observability, and comprehensive testing utilities.

✨ Key Features

🔌 Plugin Architecture

  • Modular Components: Swap loaders, embedders, retrievers, LLMs, and rerankers without code changes
  • Contract Validation: Runtime and CI verification of plugin interfaces
  • Plugin Marketplace: Discover and publish community plugins
  • Hot-swappable: Change components via configuration without restarts

🚀 Performance & Scalability

  • Streaming Support: Real-time token streaming for LLM responses
  • Parallel Processing: Concurrent embedding and retrieval operations
  • Memory Safeguards: Automatic backpressure and memory management
  • Benchmarking Tools: Built-in performance measurement and optimization
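
Token streaming means a consumer can act on each token as it arrives rather than waiting for the full response. The following is only an illustrative sketch using a plain async generator; `streamTokens` and `collect` are hypothetical names, not part of this package's API.

```javascript
// Hypothetical LLM stub: yields a response one token at a time.
async function* streamTokens(prompt) {
  for (const token of `Echo: ${prompt}`.split(' ')) {
    yield token;
  }
}

// Consume tokens as they arrive instead of waiting for the full response.
async function collect(stream) {
  const tokens = [];
  for await (const token of stream) {
    tokens.push(token); // a UI would render each token here
  }
  return tokens.join(' ');
}

collect(streamTokens('hello world')).then((text) => console.log(text));
```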

📊 Enterprise Observability

  • Structured Logging: Comprehensive event tracking and debugging
  • Metrics Collection: Performance counters, histograms, and gauges
  • Distributed Tracing: OpenTelemetry-compatible request tracing
  • Health Monitoring: Built-in diagnostics and system health checks

🛠️ Developer Experience

  • CLI Tools: Full-featured command-line interface
  • Interactive Wizard: Guided pipeline setup and configuration
  • Plugin Scaffolding: Generate new plugins with best practices
  • Comprehensive Testing: Unit, integration, and contract testing utilities

🔒 Production Ready

  • Schema Validation: Strict configuration validation with JSON Schema
  • Error Handling: Robust error recovery and reporting
  • Type Safety: Full TypeScript support and JSDoc annotations
  • CI/CD Integration: GitHub Actions workflows and automated testing

🛡️ Enterprise Security

  • Zero Critical Vulnerabilities: Total known vulnerabilities reduced from 98 to 17 (83% reduction)
  • Automated Security Monitoring: GitHub Dependabot with weekly vulnerability scans
  • CI/CD Security Integration: Build failure on critical vulnerabilities
  • Compliance Ready: OWASP, NIST, and CIS security standards
  • Dependency Validation: Automated license and security compliance checking
  • Security Audit Tools: Built-in npm run security:audit and reporting

📦 Installation

Prerequisites

  • Node.js 18.0.0 or higher
  • npm or yarn package manager

Install globally via npm (for CLI use)

npm install -g @devilsdev/rag-pipeline-utils

Install as project dependency

npm install @devilsdev/rag-pipeline-utils

🚀 Quick Start

1. Initialize a new RAG pipeline

rag-pipeline init

This launches an interactive wizard to configure your pipeline with preferred plugins.

2. Configure your pipeline

Create a .ragrc.json configuration file:

{
  "loader": {
    "pdf": "@devilsdev/pdf-loader",
    "markdown": "@devilsdev/markdown-loader"
  },
  "embedder": {
    "openai": "@devilsdev/openai-embedder"
  },
  "retriever": {
    "chroma": "@devilsdev/chroma-retriever"
  },
  "llm": {
    "openai": "@devilsdev/openai-llm"
  },
  "pipeline": {
    "loader": "pdf",
    "embedder": "openai",
    "retriever": "chroma",
    "llm": "openai"
  },
  "performance": {
    "maxConcurrency": 5,
    "enableStreaming": true,
    "enableObservability": true
  }
}
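
In this file, each `pipeline` entry selects a plugin by the key under which it was declared in the matching section. A minimal sketch of checking that every selection resolves to a declared plugin; `findUnresolvedPlugins` is a hypothetical helper for illustration, not a function exported by this package.

```javascript
// Hypothetical helper: lists pipeline stages whose selected key
// has no matching entry in the corresponding plugin section.
function findUnresolvedPlugins(config) {
  const problems = [];
  for (const [stage, key] of Object.entries(config.pipeline ?? {})) {
    const declared = config[stage] ?? {};
    if (!Object.hasOwn(declared, key)) {
      problems.push(`${stage}: "${key}" is not declared`);
    }
  }
  return problems;
}

// Example config with one deliberate mistake: "cohere" is never declared.
const config = {
  loader: { pdf: '@devilsdev/pdf-loader' },
  embedder: { openai: '@devilsdev/openai-embedder' },
  pipeline: { loader: 'pdf', embedder: 'cohere' },
};

console.log(findUnresolvedPlugins(config));
```

Running a check like this before starting the pipeline surfaces typos in plugin keys early.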

🖥️ CLI Usage

Document Ingestion

# Ingest documents with explicitly selected plugins
rag-pipeline ingest ./docs --loader pdf --embedder openai --retriever chroma

# Ingest with streaming and performance monitoring
rag-pipeline ingest ./docs --stream --benchmark --trace

# Batch ingest multiple document types
rag-pipeline ingest ./docs/**/*.{pdf,md,txt} --parallel --batch-size 10
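
The `--parallel` and `--batch-size` flags suggest that files are processed in fixed-size groups. How the CLI actually schedules work is not described here, so the following is only a sketch of the general idea; `processInBatches` and `ingestFile` are hypothetical names.

```javascript
// Hypothetical sketch: process items in batches of `batchSize`,
// running each batch concurrently and the batches one after another.
async function processInBatches(items, batchSize, worker) {
  const results = [];
  for (let i = 0; i < items.length; i += batchSize) {
    const batch = items.slice(i, i + batchSize);
    results.push(...(await Promise.all(batch.map((item) => worker(item)))));
  }
  return results;
}

// Stand-in for real ingestion work (an assumption for illustration).
const ingestFile = async (path) => ({ path, ok: true });

processInBatches(['a.pdf', 'b.md', 'c.txt'], 2, ingestFile)
  .then((results) => console.log(results.length));
```

Bounding the batch size keeps the number of in-flight operations predictable, which matters when each operation calls a rate-limited embedding API.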

Querying

# Basic query
rag-pipeline query "What is vector search?" --llm openai

# Streaming query with real-time responses
rag-pipeline query "Explain RAG architecture" --llm openai --stream

# Query with custom retrieval parameters
rag-pipeline query "How does embedding work?" --top-k 5 --similarity-threshold 0.8
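
The `--top-k` and `--similarity-threshold` parameters bound how many results are returned and how similar they must be. A minimal sketch of that filtering step, assuming cosine similarity as the metric (the metric the retriever actually uses is not stated here); `retrieveTopK` is a hypothetical helper.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical helper: keep documents scoring at or above `threshold`,
// sorted best first, and return at most `topK` of them.
function retrieveTopK(queryVector, documents, topK, threshold) {
  return documents
    .map((doc) => ({ id: doc.id, score: cosineSimilarity(queryVector, doc.vector) }))
    .filter((hit) => hit.score >= threshold)
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}

const docs = [
  { id: 'a', vector: [1, 0] },
  { id: 'b', vector: [0.9, 0.1] },
  { id: 'c', vector: [0, 1] },
];
console.log(retrieveTopK([1, 0], docs, 5, 0.8).map((hit) => hit.id));
```

With a threshold of 0.8, document `c` (orthogonal to the query) is dropped even though `topK` would allow it.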

Advanced Workflows

# Run complex DAG pipelines
rag-pipeline dag run ./examples/academic-rag.yaml

# Interactive pipeline builder
rag-pipeline wizard

# System diagnostics and health check
rag-pipeline doctor

# Plugin management
rag-pipeline plugin list
rag-pipeline plugin install @community/custom-embedder
rag-pipeline plugin scaffold my-custom-loader
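
A DAG pipeline has to run each step only after the steps it depends on. The sketch below shows a Kahn-style topological ordering over a plain object; the graph shape and the `topologicalOrder` name are assumptions for illustration and do not reflect the YAML format or the DAG engine's actual implementation.

```javascript
// Hypothetical sketch: order DAG nodes so each runs after its dependencies.
// `graph` maps a node name to the list of node names it depends on.
function topologicalOrder(graph) {
  const remaining = new Map(
    Object.entries(graph).map(([node, deps]) => [node, new Set(deps)])
  );
  const order = [];
  while (remaining.size > 0) {
    // Nodes with no unmet dependencies are ready to run.
    const ready = [...remaining.keys()].filter((n) => remaining.get(n).size === 0);
    if (ready.length === 0) throw new Error('Cycle or missing dependency in DAG');
    for (const node of ready) {
      order.push(node);
      remaining.delete(node);
      for (const deps of remaining.values()) deps.delete(node);
    }
  }
  return order;
}

console.log(topologicalOrder({
  load: [],
  embed: ['load'],
  retrieve: ['embed'],
  generate: ['retrieve'],
}));
```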

🔌 Plugin Architecture

Plugin Contracts

Each plugin type implements a standardized interface with runtime validation:

| Plugin Type | Required Methods | Optional Methods | Description |
|-------------|------------------|------------------|-------------|
| Loader | `load(filePath)` | `validate()`, `getMetadata()` | Document ingestion and parsing |
| Embedder | `embed(texts)`, `embedQuery(query)` | `getDimensions()`, `getBatchSize()` | Text vectorization |
| Retriever | `store(vectors)`, `retrieve(queryVector)` | `delete()`, `update()` | Vector storage and similarity search |
| LLM | `generate(prompt)`, `stream(prompt)` | `getTokenCount()`, `getModels()` | Language model inference |
| Reranker | `rerank(query, documents)` | `getScore()` | Result relevance optimization |
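
Runtime validation of these contracts can be as simple as checking that each required method exists. The sketch below takes its method names from the table above; `REQUIRED_METHODS` and `missingMethods` are hypothetical names, and the package's own validator may work differently.

```javascript
// Required methods per plugin type, taken from the contract table.
const REQUIRED_METHODS = {
  loader: ['load'],
  embedder: ['embed', 'embedQuery'],
  retriever: ['store', 'retrieve'],
  llm: ['generate', 'stream'],
  reranker: ['rerank'],
};

// Hypothetical validator: returns the required method names the plugin lacks.
function missingMethods(type, plugin) {
  const required = REQUIRED_METHODS[type];
  if (!required) throw new Error(`Unknown plugin type: ${type}`);
  return required.filter((name) => typeof plugin[name] !== 'function');
}

// An embedder that forgot to implement embedQuery().
const partialEmbedder = { embed: async (texts) => texts.map(() => [0]) };
console.log(missingMethods('embedder', partialEmbedder));
```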

Plugin Development

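// Example: Custom embedder plugin
export class MyCustomEmbedder {
  constructor(options = {}) {
    this.apiKey = options.apiKey;
    this.model = options.model || 'text-embedding-ada-002';
  }

  // Batch embedding: returns one vector per input text
  async embed(texts) {
    const vectors = await Promise.all(texts.map((text) => this.embedQuery(text)));
    return vectors;
  }

  // Single query embedding: returns one vector
  async embedQuery(query) {
    // Placeholder so the example runs: a fixed-size character histogram,
    // not a real embedding. Replace with a call to your embedding provider.
    const vector = new Array(8).fill(0);
    for (const ch of query) {
      vector[ch.codePointAt(0) % vector.length] += 1;
    }
    return vector;
  }

  // Plugin metadata (required)
  static metadata = {
    name: 'my-custom-embedder',
    version: '1.0.0',
    type: 'embedder',
    description: 'Custom embedding implementation'
  };
}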
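
The same pattern applies to other plugin types. As a further sketch, here is a toy reranker that implements `rerank(query, documents)` by counting shared terms; the class name, the scoring rule, and the assumption that documents are plain strings are all illustrative, not taken from the package.

```javascript
// Hypothetical reranker: orders documents by how many query terms they contain.
class TermOverlapReranker {
  static metadata = {
    name: 'term-overlap-reranker',
    version: '1.0.0',
    type: 'reranker',
    description: 'Illustrative reranker based on term overlap',
  };

  async rerank(query, documents) {
    const terms = new Set(query.toLowerCase().split(/\s+/).filter(Boolean));
    const score = (text) =>
      text.toLowerCase().split(/\s+/).filter((word) => terms.has(word)).length;
    // Sort a copy so the caller's array is left untouched.
    return [...documents].sort((a, b) => score(b) - score(a));
  }
}

new TermOverlapReranker()
  .rerank('vector search', ['about cats', 'vector search basics', 'search tips'])
  .then((ranked) => console.log(ranked[0]));
```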

🏗️ Architecture Overview

┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   CLI Interface │────│  Pipeline Engine │────│ Plugin Registry │
└─────────────────┘    └──────────────────┘    └─────────────────┘
                                │
                ┌───────────────┼───────────────┐
                │               │               │
        ┌───────▼──────┐ ┌──────▼──────┐ ┌─────▼─────┐
        │ Observability│ │Performance  │ │   DAG     │
        │   System     │ │ Optimizer   │ │  Engine   │
        └──────────────┘ └─────────────┘ └───────────┘
                │               │               │
        ┌───────▼──────┐ ┌──────▼──────┐ ┌─────▼─────┐
        │   Logging    │ │  Streaming  │ │Workflow   │
        │   Tracing    │ │  Parallel   │ │Execution  │
        │   Metrics    │ │  Memory     │ │Validation │
        └──────────────┘ └─────────────┘ └───────────┘

📁 Project Structure

@devilsdev/rag-pipeline-utils/
├── bin/
│   └── cli.js                 # CLI entry point
├── src/
│   ├── cli/                   # Command-line interface
│   │   ├── enhanced-cli-commands.js
│   │   ├── interactive-wizard.js
│   │   ├── doctor-command.js
│   │   └── plugin-marketplace-commands.js
│   ├── core/                  # Core pipeline engine
│   │   ├── create-pipeline.js
│   │   ├── plugin-registry.js
│   │   ├── plugin-contracts.js
│   │   ├── observability/     # Monitoring & logging
│   │   ├── performance/       # Optimization tools
│   │   └── plugin-marketplace/
│   ├── config/                # Configuration management
│   │   ├── load-config.js
│   │   └── enhanced-ragrc-schema.js
│   ├── dag/                   # DAG workflow engine
│   │   └── dag-engine.js
│   ├── utils/                 # Utility functions
│   │   ├── logger.js
│   │   ├── retry.js
│   │   └── plugin-scaffolder.js
│   └── mocks/                 # Development mocks
├── __tests__/                 # Test suites
│   ├── unit/
│   ├── integration/
│   └── fixtures/
├── docs/                      # Documentation
├── examples/                  # Usage examples
├── scripts/                   # Build & maintenance
├── .ragrc.schema.json         # Configuration schema
├── package.json
└── README.md

📚 Configuration

Environment Variables

# OpenAI Configuration
OPENAI_API_KEY=your_openai_api_key
OPENAI_MODEL=gpt-4

# Vector Database Configuration
PINECONE_API_KEY=your_pinecone_key
PINECONE_ENVIRONMENT=us-west1-gcp

# Performance Settings
RAG_MAX_CONCURRENCY=5
RAG_BATCH_SIZE=10
RAG_ENABLE_STREAMING=true

Advanced Configuration

{
  "pipeline": {
    "loader": "pdf",
    "embedder": "openai",
    "retriever": "chroma",
    "llm": "openai",
    "reranker": "cross-encoder"
  },
  "performance": {
    "maxConcurrency": 5,
    "batchSize": 10,
    "enableStreaming": true,
    "enableObservability": true,
    "maxMemoryMB": 512,
    "tokenLimit": 100000
  },
  "observability": {
    "enableLogging": true,
    "enableTracing": true,
    "enableMetrics": true,
    "logLevel": "info",
    "exportFormat": "json"
  },
  "plugins": {
    "marketplace": {
      "registryUrl": "https://registry.rag-pipeline.dev",
      "autoUpdate": false,
      "allowPrerelease": false
    }
  }
}
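
A setting such as `maxMemoryMB` implies comparing current memory use against a limit. How the toolkit enforces it is not described here; the sketch below only shows one plausible check using Node's `process.memoryUsage()`, with `isOverMemoryLimit` as a hypothetical name.

```javascript
// Hypothetical guard: true when heap usage exceeds the configured limit.
// `heapUsedBytes` defaults to the current process heap usage.
function isOverMemoryLimit(maxMemoryMB, heapUsedBytes = process.memoryUsage().heapUsed) {
  const heapUsedMB = heapUsedBytes / (1024 * 1024);
  return heapUsedMB > maxMemoryMB;
}

// With explicit values: 600 MB used against a 512 MB limit.
console.log(isOverMemoryLimit(512, 600 * 1024 * 1024));
```

A pipeline could call a check like this between batches and pause ingestion while it returns true.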

🚀 Use Cases

Enterprise Document Processing

  • Legal Document Analysis: Process contracts, agreements, and legal documents
  • Technical Documentation: Index API docs, manuals, and knowledge bases
  • Research Papers: Academic literature search and analysis
  • Customer Support: FAQ automation and ticket resolution

Development Workflows

  • Code Documentation: Generate and maintain code documentation
  • API Integration: Semantic search across API documentation
  • Knowledge Management: Team knowledge base and onboarding
  • Content Generation: Automated content creation and editing

Industry Applications

  • Healthcare: Medical literature search and clinical decision support
  • Finance: Financial document analysis and compliance
  • Education: Personalized learning and content recommendation
  • E-commerce: Product search and recommendation systems

🀝 Contributing

We welcome contributions from the community! Here's how you can help:

Development Setup

# Clone the repository
git clone https://github.com/DevilsDev/rag-pipeline-utils.git
cd rag-pipeline-utils

# Install dependencies
npm install

# Run tests
npm test

# Run linting
npm run lint

# Start development server
npm run dev

Contribution Guidelines

  • Plugin Development: Create new plugins following our Plugin Developer Guide
  • Bug Reports: Use GitHub Issues with detailed reproduction steps
  • Feature Requests: Discuss new features in GitHub Discussions
  • Documentation: Help improve docs and examples
  • Testing: Add tests for new features and bug fixes

Community

📄 License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

🙏 Acknowledgments

  • Built with ❤️ by Ali Kahwaji and the DevilsDev team
  • Inspired by the open-source AI/ML community
  • Special thanks to all contributors

Ready to build your next RAG application?
Get Started • Documentation • Community
