rag_control

A runtime governance, security, and execution control layer for Retrieval-Augmented Generation (RAG) systems.

rag_control provides enterprise-grade policy enforcement, security governance, and observability for RAG applications. Control what your RAG system retrieves and how it generates responses, and enforce compliance policies at runtime.

Overview

RAG systems are powerful but can be risky in production:

  • Hallucinations: LLMs may generate content not grounded in retrieved documents
  • Data Leakage: Sensitive information might be retrieved or exposed
  • Compliance: Regulations require audit trails and enforcement controls
  • Cost: Token usage and retrieval operations need optimization

rag_control addresses these challenges with:

  • Policy-Based Generation: Define and enforce generation policies (temperature, output length, citation requirements, external knowledge restrictions)
  • Runtime Enforcement: Validate responses against policies before returning them to users
  • Governance & Security: Apply organization-level rules, role-based access control, and data classification filters
  • Comprehensive Audit Logging: Track all requests, decisions, and denials for compliance
  • Distributed Tracing: Understand execution flow and identify performance bottlenecks
  • Metrics & Observability: 18 metrics covering throughput, latency, quality, costs, and errors

Key Features

🛡️ Policy Enforcement

  • Define multiple policies with different strictness levels
  • Control temperature, max output tokens, reasoning depth
  • Enforce citation requirements and validation
  • Prevent responses that draw on external (non-retrieved) knowledge
  • Apply context-aware fallback strategies

🔐 Governance & Security

  • Organization-level access control
  • Retrieval filtering by data classification and metadata
  • User context validation
  • Policy resolution based on org rules and data sensitivity

📊 Observability

  • Audit Logging: Full request/response lifecycle tracking
  • Distributed Tracing: OpenTelemetry integration for flow analysis
  • Metrics: Token usage, latency, error rates, policy decisions

🚀 Production Ready

  • Exception-swallowing pattern ensures governance failures never break request flow
  • Comprehensive error handling with custom exception types
  • Type-safe with mypy strict mode compliance
  • 100% code coverage with extensive test suite
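
The exception-swallowing pattern mentioned above can be sketched in a few lines. This is a minimal illustration, not rag_control's implementation: the decorator name `swallow_exceptions`, the `emit_audit_event` hook, and the choice of an audit sink as the failing component are all assumptions made for the example.

```python
import functools
import logging

logger = logging.getLogger("governance_sketch")


def swallow_exceptions(default=None):
    """Decorator: log any exception from the wrapped hook and return `default`."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except Exception:
                # Record the failure, but never let it propagate to the caller.
                logger.exception("hook %s failed; continuing", func.__name__)
                return default
        return wrapper
    return decorator


@swallow_exceptions(default=False)
def emit_audit_event(event: dict) -> bool:
    # Hypothetical sink that happens to be unavailable.
    raise ConnectionError("audit sink unreachable")


def handle_request(query: str) -> str:
    emit_audit_event({"event": "request.received"})  # failure is logged, not raised
    return f"answer for: {query}"
```

The request still completes when the hook fails; the failure only shows up in the log.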

Installation

pip install rag_control openai_adapter pinecone_adapter

Requirements

  • Python 3.10+

Quick Start

1. Define Policies

Create a policy_config.yaml:

policies:
  - name: strict_citations
    description: Strict policy with citation enforcement
    generation:
      reasoning_level: limited
      allow_external_knowledge: false
      require_citations: true
      temperature: 0.0
    enforcement:
      validate_citations: true
      block_on_missing_citations: true
      prevent_external_knowledge: true
      max_output_tokens: 512
    logging:
      level: full

  - name: soft_research
    description: Relaxed policy for exploratory research
    generation:
      reasoning_level: full
      allow_external_knowledge: false
      require_citations: true
      temperature: 0.1
    enforcement:
      validate_citations: true
      block_on_missing_citations: false
      prevent_external_knowledge: true
      max_output_tokens: 512
    logging:
      level: full

filters:
  - name: enterprise_only
    condition:
      field: org_tier
      operator: equals
      value: enterprise
      source: user
orgs:
  - org_id: acme_corp
    description: Acme Corporation with strict citation requirements
    default_policy: strict_citations
    document_policy:
      top_k: 8
    policy_rules:
      - name: deny_untrusted_document_source
        description: Deny if any retrieved document comes from untrusted source
        priority: 60
        effect: deny
        when:
          any:
            - field: metadata.source
              operator: equals
              value: public-web
              source: documents
              document_match: any
      - name: enforce_strict_citations
        description: Enforce strict citations for enterprise queries
        priority: 50
        effect: enforce
        when:
          all:
            - field: org_tier
              operator: equals
              value: enterprise
              source: user
        policy: strict_citations

2. Initialize the Engine

from rag_control import RAGControl
from rag_control.models import UserContext
from openai_adapter import OpenAILLMAdapter, OpenAIQueryEmbeddingAdapter
from pinecone_adapter import PineconeVectorStoreAdapter

# Initialize adapters
llm_adapter = OpenAILLMAdapter(
    api_key="sk-your-openai-key",
    model="gpt-4"
)

embedding_adapter = OpenAIQueryEmbeddingAdapter(
    api_key="sk-your-openai-key",
    model="text-embedding-3-small"
)

vector_store = PineconeVectorStoreAdapter(
    api_key="your-pinecone-key",
    index_name="documents",
    embedding_model="text-embedding-3-small"
)

# Initialize rag_control
engine = RAGControl(
    llm=llm_adapter,
    query_embedding=embedding_adapter,
    vector_store=vector_store,
    config_path="policy_config.yaml"
)

# Create a user context
user_context = UserContext(
    org_id="acme_corp",  # matches the org defined in policy_config.yaml
    user_id="user-123",
    attributes={
        "namespace": "demo",
        "dept": "hr",
    },
)

3. Run Queries

# Execute a query with governance and policy enforcement
result = engine.run(
    query="What are the key findings from our Q1 report?",
    user_context=user_context
)

print(f"Policy applied: {result.policy_name}")
print(f"Enforcement passed: {result.enforcement_passed}")
print(f"Response: {result.response.content}")
print(f"Tokens used: {result.response.token_count}")

# Or stream responses
stream_result = engine.stream(
    query="Summarize the financial impact...",
    user_context=user_context
)

for chunk in stream_result.response:
    print(chunk.content, end="", flush=True)

Architecture

Core Components

  • Engine: Orchestrates the RAG execution pipeline with governance and policy enforcement
  • Policy Registry: Manages generation and enforcement policies
  • Governance Registry: Applies organization-level rules and access control
  • Filter Registry: Manages data classification and retrieval filters
  • Adapters: Pluggable interfaces for LLMs, embeddings, and vector stores

Execution Flow

1. Validate org identity from user context
   ↓
2. Resolve org and apply retrieval filters
   ↓
3. Embed query
   ↓
4. Retrieve documents with org-level top_k
   ↓
5. Resolve policy via governance rules
   ↓
6. Build prompt with policy context
   ↓
7. Call LLM with policy-controlled parameters
   ↓
8. Apply enforcement checks (citations, knowledge, etc.)
   ↓
9. Emit audit events and traces
   ↓
10. Return response or raise policy violation
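
The ordering of these steps can be pictured as a list of stage functions run over a shared context. The sketch below is a simplified stand-in under stated assumptions, not the engine itself: `PolicyViolation`, the stage names, and the stubbed retrieval and generation are all hypothetical, and several steps (embedding, policy resolution, audit emission) are omitted for brevity.

```python
class PolicyViolation(Exception):
    """Hypothetical exception for a denied or failed request."""


def validate_org(ctx: dict) -> None:
    if not ctx.get("org_id"):
        raise PolicyViolation("missing org_id")


def retrieve(ctx: dict) -> None:
    # Stub retrieval: honour the org-level top_k.
    ctx["documents"] = ctx["corpus"][: ctx["top_k"]]


def generate(ctx: dict) -> None:
    # Stub LLM call: cite every retrieved document.
    ctx["response"] = "Answer " + " ".join(f"[{d}]" for d in ctx["documents"])


def enforce(ctx: dict) -> None:
    # Stub enforcement: a response without any citation marker is rejected.
    if ctx["require_citations"] and "[" not in ctx["response"]:
        raise PolicyViolation("missing citations")


STAGES = [validate_org, retrieve, generate, enforce]


def run(ctx: dict) -> dict:
    """Run each stage in order; any stage may raise PolicyViolation."""
    ctx["trail"] = []
    for stage in STAGES:
        stage(ctx)
        ctx["trail"].append(stage.__name__)
    return ctx
```

Keeping stages as plain functions over one context makes the order explicit and lets a violation at any step stop the request before a response is returned.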

Observability

Audit Logging

Every request generates audit events:

{
    "event": "request.received",
    "request_id": "req-abc123",
    "org_id": "acme_corp",
    "user_id": "user-123",
    "timestamp": "2026-03-04T10:30:00Z"
}
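
For readers who want to produce or parse events of this shape in their own tooling, the following standard-library sketch builds a record with the same five fields. The helper name `build_audit_event` and the request ID format are assumptions for illustration; rag_control's actual event schema may include more fields.

```python
import json
import uuid
from datetime import datetime, timezone


def build_audit_event(event: str, org_id: str, user_id: str) -> dict:
    """Build an audit record with the fields shown in the example above."""
    return {
        "event": event,
        "request_id": f"req-{uuid.uuid4().hex[:6]}",  # hypothetical ID format
        "org_id": org_id,
        "user_id": user_id,
        # UTC timestamp in the same ISO 8601 form as the example.
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    }


record = build_audit_event("request.received", "acme_corp", "user-123")
line = json.dumps(record)  # one JSON object per line suits most log pipelines
```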

Distributed Tracing

OpenTelemetry integration tracks execution stages:

request_span
├── org_lookup_span
├── embedding_span
├── retrieval_span
├── policy_resolution_span
├── llm_generation_span
└── enforcement_span
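
The parent/child relationship in this tree can be illustrated without any tracing dependency. The recorder below is a standard-library stand-in written for this README; it is not the OpenTelemetry API and not how rag_control creates spans. It only shows how nested context managers yield one parent span with six timed children.

```python
import time
from contextlib import contextmanager


class SpanRecorder:
    """Toy span recorder: tracks name, parent, and duration of each span."""

    def __init__(self) -> None:
        self.finished: list[tuple[str, str | None, float]] = []
        self._stack: list[str] = []

    @contextmanager
    def span(self, name: str):
        parent = self._stack[-1] if self._stack else None
        self._stack.append(name)
        start = time.perf_counter()
        try:
            yield
        finally:
            # Close the span even if the stage raised.
            self._stack.pop()
            self.finished.append((name, parent, time.perf_counter() - start))


CHILD_SPANS = [
    "org_lookup_span", "embedding_span", "retrieval_span",
    "policy_resolution_span", "llm_generation_span", "enforcement_span",
]

recorder = SpanRecorder()
with recorder.span("request_span"):
    for stage in CHILD_SPANS:
        with recorder.span(stage):
            pass  # the real stage work would run here
```

Child spans finish before their parent, so `request_span` is recorded last and its duration covers all six stages.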

Metrics (18 total)

  • Throughput: Request count, throughput per second
  • Latency: Request duration, stage durations
  • Quality: Retrieved document scores, top-k metrics
  • LLM: Token counts, efficiency ratios
  • Errors: Error types, error categories, denial reasons
  • Custom: Policy resolutions, embedding dimensions

Documentation

For extensive documentation, guides, and API references, visit the docs directory:

  • Getting Started guides and quick start tutorials
  • Core concepts and architecture
  • API reference and adapters documentation
  • Observability and monitoring guides
  • Configuration reference

Spec documents are in rag_control/spec/.

Examples

See the examples/ directory for:

  • controller-config.yaml: Complete policy configuration example

Security

  • Exception-swallowing pattern ensures governance failures are handled gracefully
  • All external inputs validated with Pydantic
  • Type-safe with strict mypy enforcement
  • Regular security audits and dependency updates

Contributing

See CONTRIBUTING.md for guidelines:

  • Issues: Anyone can open issues for bugs and feature requests
  • Pull Requests: RetrievalLabs team members only
  • Code Standards: 100% coverage, type checking, formatting compliance required

Support

  • Check DEVELOPMENT.md for setup issues
  • Review spec documentation in rag_control/spec/ for detailed contracts
  • Open an issue for bugs and feature requests

License

This project is licensed under the RetrievalLabs Business-Restricted License (RBRL).

  • Personal/Non-Commercial Use: Permitted
  • Business/Commercial Use: Prohibited without a written contract with RetrievalLabs Co.
  • Modifications/Derivative Works: Prohibited without a written contract with RetrievalLabs Co.

See LICENSE for full terms.


Built by RetrievalLabs — Enterprise RAG Governance and Security