Multi-Agent RAG Workflow Orchestrator

This repository contains the specifications and plans for a Multi-Agent Retrieval-Augmented Generation (RAG) Workflow Orchestrator. The project leverages n8n for visual workflow orchestration, Ollama for local Large Language Model (LLM) inference, and Qdrant as a vector database for efficient semantic search.

The goal is to create a robust, observable, and cost-effective system that enables natural language querying over extensive document knowledge bases, providing accurate answers with source citations.

🆕 What's New in v4.2?

Version 4.2 is a major revision based on an analysis of eight production-grade systems (Dify, Flowise, AnythingLLM, MCP Servers, Browser-Use, Jan, AutoGen, Verba). Key enhancements include:

🎯 Critical Improvements

40-60% better RAG accuracy (target): hybrid search (semantic + keyword) followed by a reranking pipeline.
3-5x faster web agents (target): Browser-Use pattern with stealth browsers and LLM-as-judge validation.
Production observability: LLMOps feedback loop with PostgreSQL logging and real-time cost tracking.
100% privacy compliance (target): MCP server integration providing a local-only mode for GDPR/HIPAA-sensitive deployments.
99.9% uptime (target): provider abstraction layer with health checks and automatic failover.
25-40% additional cost savings (target): streaming-first design, hard budget limits, and early termination.
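
The hybrid search item above does not say how semantic and keyword results are combined. One common option is reciprocal rank fusion (RRF); the sketch below assumes that technique and is not taken from this project. The function name, the `k=60` default, and the document IDs are illustrative.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a commonly used default, not a value
    taken from this project.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from a vector search and a keyword search.
semantic_hits = ["doc-3", "doc-1", "doc-7"]
keyword_hits = ["doc-1", "doc-9", "doc-3"]

fused = reciprocal_rank_fusion([semantic_hits, keyword_hits])
print(fused)
```

Documents that rank well in both lists rise to the top, and a reranking model could then be applied to the first few fused results.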

📊 Expected Impact

| Metric | Improvement |
| --- | --- |
| RAG Retrieval Precision | +30-46% |
| Web Agent Success Rate | +50-58% |
| API Cost Reduction | 25-40% |
| System Uptime | 95% → 99.9% |
| Privacy Compliance | 0% → 100% |

👉 See v4.2-ENHANCEMENT-PLAN.md for complete implementation details

🌟 Core Principles (Project Constitution)

The development of this orchestrator adheres to the following non-negotiable principles:

Visual-First Design: All agentic workflows are designed to be visually representable in n8n, ensuring clarity, intuitive debugging, and accessibility for both technical and non-technical stakeholders.
LLM-Agnostic Architecture: The system supports multiple LLM providers interchangeably (Ollama, Anthropic, OpenAI), abstracting LLM calls and enabling cost optimization, vendor lock-in avoidance, and offline capabilities.
Fail-Safe Operations: Robust error handling, automatic retries with exponential backoff, and graceful degradation are implemented for every LLM call, tool execution, and data retrieval to ensure system resilience.
Cost-Conscious Design: Token usage and API costs are actively managed through monitoring, prompt optimization, caching, and prioritizing local Ollama inference.
Observable Agent Flows: Every step of the agent workflow is traceable and debuggable with structured logging, capture of LLM prompts/responses, RAG retrieval details, and comprehensive execution metrics.
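
To illustrate the retry policy named in the Fail-Safe Operations principle, here is a minimal sketch of exponential backoff with jitter. In this project, retries are configured on n8n nodes rather than written by hand, so the helper name, defaults, and the simulated failure below are all assumptions made for illustration.

```python
import random
import time


def call_with_retries(fn, max_attempts=4, base_delay=0.5, max_delay=8.0,
                      sleep=time.sleep):
    """Call fn(); on failure wait up to base_delay * 2**attempt and retry."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            delay = min(max_delay, base_delay * (2 ** attempt))
            # Random jitter keeps many clients from retrying in lockstep.
            sleep(random.uniform(0, delay))


# Simulated LLM call that fails twice before succeeding.
attempts = {"count": 0}


def flaky_llm_call():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("simulated timeout")
    return "ok"


# The sleep function is injected so the example runs instantly.
result = call_with_retries(flaky_llm_call, sleep=lambda seconds: None)
print(result, attempts["count"])
```

Capping the delay with `max_delay` keeps the worst-case wait bounded, which matters when a fallback provider is available.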

✨ Features

The Multi-Agent RAG Workflow Orchestrator provides the following key functionalities:

Document Knowledge Base Query (P1): Users can submit natural language questions via webhook and receive accurate, cited answers from the knowledge base.
Document Ingestion and Indexing (P1): System administrators can upload documents (PDF, Markdown, TXT) for automatic processing, chunking, embedding, and indexing into the vector database.
Multi-Agent Workflow with Tool Calling (P2): Agents can perform multi-step reasoning, deciding whether to retrieve additional context, call external tools (e.g., Google Search), or generate final responses.
LLM Provider Fallback Chain (P3): Automatic failover to cloud LLM providers (Anthropic, OpenAI) if the primary local Ollama service is unavailable or slow.
Observable Workflow Execution (P3): Administrators gain detailed insights into workflow executions, including LLM prompts, retrieved chunks, token usage, and error traces.
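
The chunking step of document ingestion can be sketched as a sliding window over words with some overlap between neighbouring chunks. The source does not state the chunk size, the overlap, or whether splitting is by words, tokens, or characters, so every parameter here is an assumption.

```python
def chunk_text(text, chunk_size=500, overlap=50):
    """Split text into chunks of at most chunk_size words.

    Consecutive chunks share `overlap` words so that a sentence cut at a
    boundary still appears whole in at least one chunk.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("require chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


# Twelve placeholder words, small sizes so the overlap is easy to see.
sample = " ".join(f"w{i}" for i in range(12))
chunks = chunk_text(sample, chunk_size=5, overlap=2)
for chunk in chunks:
    print(chunk)
```

Each chunk would then be embedded and stored with its source reference so that answers can cite the originating document.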

🏗️ Architecture

The system is built upon a modern, containerized stack:

Orchestration: n8n (v1.x, self-hosted via Docker) for visual workflow design and execution.
LLM Inference: Ollama (latest) for running local LLMs (e.g., llama3.1:8b for generation, nomic-embed-text for embeddings).
Vector Database: Qdrant (v1.7+, self-hosted via Docker) for efficient storage and retrieval of document embeddings.
Queue Management: Redis (v7.x) to support n8n's queue mode for concurrent workflow execution.
Deployment: Docker Compose (v2.x) for easy setup and management of the multi-container environment.

Key Architectural Decisions:

Visual-First Approach: All agent logic is implemented as n8n node configurations, minimizing custom code and maximizing visual clarity.
Explicit State Management: Agent state is passed explicitly between n8n nodes using JSON objects, avoiding hidden global states.
Robust Error Handling: Native n8n retry logic, error branches, and timeouts are configured for all external API calls.
Cost Optimization: Prioritization of local Ollama inference, query caching, and token usage tracking.
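
As a rough illustration of explicit state management, the sketch below passes one JSON document from step to step, with each step reading the state and returning an updated copy. The field names and step functions are hypothetical; in the actual system these steps would be n8n nodes rather than Python functions.

```python
import json
from dataclasses import asdict, dataclass, field, replace
from typing import Optional


@dataclass
class AgentState:
    """Hypothetical state object handed from one step to the next."""
    query: str
    retrieved_chunks: list = field(default_factory=list)
    tokens_used: int = 0
    answer: Optional[str] = None

    def to_json(self):
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, payload):
        return cls(**json.loads(payload))


def retrieve_step(state_json):
    """Stand-in for a retrieval node: adds one chunk to the state."""
    state = AgentState.from_json(state_json)
    chunks = state.retrieved_chunks + ["chunk about Qdrant"]
    return replace(state, retrieved_chunks=chunks).to_json()


def generate_step(state_json):
    """Stand-in for a generation node: writes an answer and token count."""
    state = AgentState.from_json(state_json)
    answer = f"Answer based on {len(state.retrieved_chunks)} chunk(s)"
    updated = replace(state, answer=answer, tokens_used=state.tokens_used + 42)
    return updated.to_json()


initial = AgentState(query="What is Qdrant?").to_json()
final = AgentState.from_json(generate_step(retrieve_step(initial)))
print(final)
```

Because every step receives and returns the full state, any intermediate value can be logged or replayed without relying on hidden globals.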

🚀 Getting Started

To set up and run the Multi-Agent RAG Workflow Orchestrator locally, follow these high-level steps:

Clone the Repository:

```
git clone https://github.com/your-org/your-repo.git
cd your-repo
```

Prepare Environment:
- Ensure Docker and Docker Compose are installed.
- Create a .env file based on .env.example and configure the necessary API keys (for the Anthropic/OpenAI fallbacks) and service URLs.

Start Services:
```
docker compose up -d
```
This launches the n8n, Ollama, Qdrant, and Redis containers.

Pull Ollama Models:

```
ollama pull llama3.1:8b
ollama pull nomic-embed-text
```

Initialize Qdrant Collection: Run the provided script to create the knowledge_base collection in Qdrant:
```
./scripts/setup-qdrant-collection.sh
```
Import n8n Workflows: Import the pre-defined n8n workflows from the workflows/ directory into your n8n instance (accessible at http://localhost:5678) using the provided script:
```
./scripts/import-workflows.sh
```
Configure Credentials: Within n8n, configure your Anthropic and OpenAI API keys in the Credentials section.
Test Document Ingestion: Trigger the Document Ingestion workflow in n8n to process sample documents from test-data/sample-documents/ and index them into Qdrant.
Test Query Processing: Use the webhook endpoint for the Query Processing workflow to submit natural language queries and receive answers.
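
Two of the steps above come down to plain HTTP requests: creating the knowledge_base collection in Qdrant and posting a question to the query webhook. The sketch below only builds those requests so they can be inspected before anything is sent. Several details are assumptions: the contents of the setup script are not shown in this README, the vector size of 768 matches nomic-embed-text but should be checked against the model actually used, the Cosine distance is a guess, and the webhook path `query` and the `question` field are hypothetical.

```python
import json
import urllib.request

QDRANT_URL = "http://localhost:6333"  # Qdrant's default REST port
N8N_URL = "http://localhost:5678"     # n8n address from the steps above


def build_json_request(url, body, method):
    """Build (but do not send) a JSON request."""
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method,
    )


def build_create_collection_request(name="knowledge_base", vector_size=768):
    """Qdrant REST call that creates a collection: PUT /collections/{name}."""
    body = {"vectors": {"size": vector_size, "distance": "Cosine"}}
    return build_json_request(f"{QDRANT_URL}/collections/{name}", body, "PUT")


def build_query_request(question, webhook_path="query"):
    """POST to an n8n webhook; the path and payload shape are assumptions."""
    body = {"question": question}
    return build_json_request(f"{N8N_URL}/webhook/{webhook_path}", body, "POST")


def send(request, timeout=30):
    """Send a request and decode a JSON response (requires running services)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


create_req = build_create_collection_request()
query_req = build_query_request("What does the ingestion workflow do?")
print(create_req.get_method(), create_req.full_url)
print(query_req.get_method(), query_req.full_url)
```

Calling `send(create_req)` or `send(query_req)` would perform the request once the containers are up; the real webhook URL is shown on the Webhook node of the imported workflow.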

For detailed instructions, refer to docs/quickstart.md (to be created).

⚙️ Workflows

The project is structured around several key n8n workflows:

Document Ingestion (001-document-ingestion.json): Handles reading documents, text extraction, chunking, embedding generation via Ollama, and insertion into Qdrant.
Query Processing (002-query-processing.json): Receives user queries, generates embeddings, performs semantic search in Qdrant, augments LLM prompts, and generates answers using Ollama.
Multi-Agent Orchestration (003-multi-agent-orchestration.json): Implements the ReAct pattern, allowing agents to decide on tool usage (e.g., Google Search) and iterate for complex reasoning.
Provider Fallback (004-provider-fallback.json): Manages the LLM provider chain, automatically switching from Ollama to Anthropic or OpenAI in case of failures or timeouts.
Metrics Aggregation (005-metrics-aggregation.json): A scheduled workflow that fetches n8n execution logs, extracts metrics (latency, token usage, status), and exports them for observability.
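
The provider fallback logic can be sketched as an ordered list of providers tried in turn until one succeeds. The provider functions below are stand-ins that make no network calls, and all names are hypothetical; the real workflow would call Ollama, Anthropic, and OpenAI through n8n nodes.

```python
def generate_with_fallback(prompt, providers):
    """Try each (name, generate) pair in order; return the first success.

    Returns a (provider_name, text) tuple, or raises RuntimeError listing
    every failure if no provider succeeds.
    """
    errors = []
    for name, generate in providers:
        try:
            return name, generate(prompt)
        except Exception as exc:
            # Record the failure and move on to the next provider.
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


# Stand-in providers: the local one simulates a timeout.
def ollama_generate(prompt):
    raise TimeoutError("local model did not respond in time")


def anthropic_generate(prompt):
    return f"[anthropic] answer to: {prompt}"


def openai_generate(prompt):
    return f"[openai] answer to: {prompt}"


chain = [
    ("ollama", ollama_generate),
    ("anthropic", anthropic_generate),
    ("openai", openai_generate),
]

provider, text = generate_with_fallback("hi", chain)
print(provider, text)
```

Returning the provider name alongside the text makes it easy to log which provider served each request, which feeds the metrics workflow.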

📋 Spec-Driven Development

This project follows a Spec-Driven Development (SDD) approach, utilizing the GitHub Spec Kit. This methodology ensures that requirements, motivations, and technical aspects are clearly outlined before implementation. Key artifacts include:

Constitution (0.constitution.md): Defines core project principles and values.
Specification (01-spec.md): Details functional and non-functional requirements, user stories, and acceptance criteria.
Plan (02-plan.md): Outlines the technical implementation strategy, architecture decisions, and technology stack.
Research (03-research.md): Documents technology research, decisions, and rationale.
Tasks (04-tasks.md): Breaks down the implementation into actionable, prioritized tasks.
