A runtime governance, security, and execution control layer for Retrieval-Augmented Generation (RAG) systems.
rag_control provides enterprise-grade policy enforcement, security governance, and observability for RAG applications. Control what your RAG system retrieves, how it generates responses, and enforce compliance policies at runtime.
RAG systems are powerful but can be risky in production:
- Hallucinations: LLMs may generate content not grounded in retrieved documents
- Data Leakage: Sensitive information might be retrieved or exposed
- Compliance: Regulations require audit trails and enforcement controls
- Cost: Token usage and retrieval operations need optimization
rag_control addresses these challenges with:
- Policy-Based Generation: Define and enforce generation policies (temperature, output length, citation requirements, external knowledge restrictions)
- Runtime Enforcement: Validate responses against policies before returning them to users
- Governance & Security: Apply organization-level rules, role-based access control, and data classification filters
- Comprehensive Audit Logging: Track all requests, decisions, and denials for compliance
- Distributed Tracing: Understand execution flow and identify performance bottlenecks
- Metrics & Observability: 18+ metrics covering throughput, latency, quality, costs, and errors
- Define multiple policies with different strictness levels
- Control temperature, max output tokens, reasoning depth
- Enforce citation requirements and validation
- Prevent external knowledge generation
- Apply context-aware fallback strategies
- Organization-level access control
- Retrieval filtering by data classification and metadata
- User context validation
- Policy resolution based on org rules and data sensitivity
- Audit Logging: Full request/response lifecycle tracking
- Distributed Tracing: OpenTelemetry integration for flow analysis
- Metrics: Token usage, latency, error rates, policy decisions
- Exception-swallowing pattern ensures governance failures never break request flow
- Comprehensive error handling with custom exception types
- Type-safe with mypy strict mode compliance
- 100% code coverage with extensive test suite
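The exception-swallowing pattern listed above can be illustrated with a small decorator. This is a minimal sketch rather than rag_control's actual implementation: `swallow_exceptions` and `emit_audit_event` are hypothetical names, and the sketch assumes the pattern wraps side-effect hooks (such as audit logging) so that a failing hook is logged instead of aborting the request.

```python
import functools
import logging

logger = logging.getLogger("governance_sketch")


def swallow_exceptions(default=None):
    """Decorator: log any exception from the wrapped hook and return `default`.

    Hypothetical helper for illustration; not part of rag_control's API.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except Exception:  # deliberately broad: the hook must never raise
                logger.exception("hook %s failed; continuing", func.__name__)
                return default
        return wrapper
    return decorator


@swallow_exceptions(default=False)
def emit_audit_event(event: dict) -> bool:
    """Pretend audit sink that fails when the event has no request_id."""
    if "request_id" not in event:
        raise ValueError("missing request_id")
    return True
```

Calling `emit_audit_event({})` logs the `ValueError` and returns `False` instead of raising, so the caller's request path continues.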
pip install rag_control openai_adapter pinecone_adapter
- Python 3.10+
Create a policy_config.yaml:
policies:
  - name: strict_citations
    description: Strict policy with citation enforcement
    generation:
      reasoning_level: limited
      allow_external_knowledge: false
      require_citations: true
      temperature: 0.0
    enforcement:
      validate_citations: true
      block_on_missing_citations: true
      prevent_external_knowledge: true
      max_output_tokens: 512
    logging:
      level: full
  - name: soft_research
    description: Relaxed policy for exploratory research
    generation:
      reasoning_level: full
      allow_external_knowledge: false
      require_citations: true
      temperature: 0.1
    enforcement:
      validate_citations: true
      block_on_missing_citations: false
      prevent_external_knowledge: true
      max_output_tokens: 512
    logging:
      level: full
filters:
  - name: enterprise_only
    condition:
      field: org_tier
      operator: equals
      value: enterprise
      source: user
orgs:
  - org_id: acme_corp
    description: Acme Corporation with strict citation requirements
    default_policy: strict_citations
    document_policy:
      top_k: 8
    policy_rules:
      - name: deny_untrusted_document_source
        description: Deny if any retrieved document comes from untrusted source
        priority: 60
        effect: deny
        when:
          any:
            - field: metadata.source
              operator: equals
              value: public-web
              source: documents
              document_match: any
      - name: enforce_strict_citations
        description: Enforce strict citations for enterprise queries
        priority: 50
        effect: enforce
        when:
          all:
            - field: org_tier
              operator: equals
              value: enterprise
              source: user
        policy: strict_citations
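To make the `policy_rules` semantics more concrete, here is a rough sketch of how such rules could be evaluated. It is not the library's resolver: `lookup`, `condition_matches`, and `resolve_policy` are hypothetical names, and the sketch assumes that rules are checked from highest to lowest `priority`, that the first matching rule wins, that only the `equals` operator is needed, and that the org's `default_policy` applies when nothing matches.

```python
from typing import Any, Optional


def lookup(record: dict, dotted_field: str) -> Any:
    """Follow a dotted path such as 'metadata.source' through nested dicts."""
    value: Any = record
    for key in dotted_field.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


def condition_matches(cond: dict, user: dict, documents: list) -> bool:
    """Evaluate one condition; only the 'equals' operator is sketched here."""
    if cond["operator"] != "equals":
        raise NotImplementedError(cond["operator"])
    if cond["source"] == "user":
        return lookup(user, cond["field"]) == cond["value"]
    hits = [lookup(doc, cond["field"]) == cond["value"] for doc in documents]
    # document_match 'any' needs one matching document, 'all' needs every one.
    if cond.get("document_match", "any") == "any":
        return any(hits)
    return bool(hits) and all(hits)


def resolve_policy(org: dict, user: dict, documents: list) -> tuple:
    """Return (effect, policy_name); assumes higher priority is checked first."""
    rules = sorted(org.get("policy_rules", []),
                   key=lambda rule: rule["priority"], reverse=True)
    for rule in rules:
        when = rule["when"]
        if "any" in when:
            matched = any(condition_matches(c, user, documents) for c in when["any"])
        else:
            matched = all(condition_matches(c, user, documents) for c in when["all"])
        if matched:
            policy: Optional[str] = rule.get("policy")
            return rule["effect"], policy
    return "default", org["default_policy"]


# The acme_corp org from the YAML above, written as a plain dict.
ACME = {
    "default_policy": "strict_citations",
    "policy_rules": [
        {"name": "deny_untrusted_document_source", "priority": 60, "effect": "deny",
         "when": {"any": [{"field": "metadata.source", "operator": "equals",
                           "value": "public-web", "source": "documents",
                           "document_match": "any"}]}},
        {"name": "enforce_strict_citations", "priority": 50, "effect": "enforce",
         "when": {"all": [{"field": "org_tier", "operator": "equals",
                           "value": "enterprise", "source": "user"}]},
         "policy": "strict_citations"},
    ],
}
```

Under these assumptions, an enterprise user whose retrieved documents all come from internal sources resolves to `("enforce", "strict_citations")`, while a single document with `metadata.source` equal to `public-web` resolves to `("deny", None)` because the deny rule has the higher priority.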
Run a query with the engine:
from rag_control import RAGControl
from rag_control.models import UserContext
from openai_adapter import OpenAILLMAdapter, OpenAIQueryEmbeddingAdapter
from pinecone_adapter import PineconeVectorStoreAdapter

# Initialize adapters
llm_adapter = OpenAILLMAdapter(
    api_key="sk-your-openai-key",
    model="gpt-4"
)
embedding_adapter = OpenAIQueryEmbeddingAdapter(
    api_key="sk-your-openai-key",
    model="text-embedding-3-small"
)
vector_store = PineconeVectorStoreAdapter(
    api_key="your-pinecone-key",
    index_name="documents",
    embedding_model="text-embedding-3-small"
)

# Initialize rag_control
engine = RAGControl(
    llm=llm_adapter,
    query_embedding=embedding_adapter,
    vector_store=vector_store,
    config_path="policy_config.yaml"
)

# Create a user context
user_context = UserContext(
    org_id="acme_corp",  # must match an org_id defined in policy_config.yaml
    user_id="user-123",
    attributes={
        "namespace": "demo",
        "dept": "hr"
    },
)

# Execute a query with governance and policy enforcement
result = engine.run(
    query="What are the key findings from our Q1 report?",
    user_context=user_context
)
print(f"Policy applied: {result.policy_name}")
print(f"Enforcement passed: {result.enforcement_passed}")
print(f"Response: {result.response.content}")
print(f"Tokens used: {result.response.token_count}")

# Or stream responses
stream_result = engine.stream(
    query="Summarize the financial impact...",
    user_context=user_context
)
for chunk in stream_result.response:
    print(chunk.content, end="", flush=True)
- Engine: Orchestrates the RAG execution pipeline with governance and policy enforcement
- Policy Registry: Manages generation and enforcement policies
- Governance Registry: Applies organization-level rules and access control
- Filter Registry: Manages data classification and retrieval filters
- Adapters: Pluggable interfaces for LLMs, embeddings, and vector stores
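The pluggable adapter idea can be sketched with structural typing. The interfaces below are assumptions made for illustration: the method names `embed`, `search`, and `generate`, the `Document` dataclass, and `InMemoryVectorStore` are hypothetical and may not match the real adapter contracts in rag_control.

```python
from dataclasses import dataclass
from typing import Protocol, runtime_checkable


@dataclass
class Document:
    text: str
    score: float
    metadata: dict


@runtime_checkable
class QueryEmbeddingAdapter(Protocol):
    def embed(self, query: str) -> list: ...


@runtime_checkable
class VectorStoreAdapter(Protocol):
    def search(self, vector: list, top_k: int) -> list: ...


@runtime_checkable
class LLMAdapter(Protocol):
    def generate(self, prompt: str, temperature: float, max_output_tokens: int) -> str: ...


class InMemoryVectorStore:
    """Toy store that ranks documents by dot product with the query vector."""

    def __init__(self, items):
        # items: list of (embedding_vector, Document) pairs
        self._items = items

    def search(self, vector, top_k):
        scored = [
            Document(doc.text, sum(a * b for a, b in zip(vector, vec)), doc.metadata)
            for vec, doc in self._items
        ]
        return sorted(scored, key=lambda d: d.score, reverse=True)[:top_k]
```

Because the protocols are structural, `InMemoryVectorStore` satisfies `VectorStoreAdapter` without inheriting from it, which is one way an engine could accept any backend that exposes the expected methods.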
1. Validate org identity from user context
↓
2. Resolve org and apply retrieval filters
↓
3. Embed query
↓
4. Retrieve documents with org-level top_k
↓
5. Resolve policy via governance rules
↓
6. Build prompt with policy context
↓
7. Call LLM with policy-controlled parameters
↓
8. Apply enforcement checks (citations, knowledge, etc.)
↓
9. Emit audit events and traces
↓
10. Return response or raise policy violation
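The ten steps above can be sketched as a single function with injected stages. This is a simplified illustration, not the engine's code: `run_pipeline`, `PolicyViolation`, `PipelineResult`, the callable parameters, and the `request.completed` event name are all hypothetical, retrieval filters from step 2 are omitted, and documents are plain strings.

```python
from dataclasses import dataclass


class PolicyViolation(Exception):
    """Hypothetical exception for denied requests or failed enforcement."""


@dataclass
class PipelineResult:
    policy_name: str
    content: str


def run_pipeline(query, user, orgs, embed, retrieve, resolve_policy,
                 generate, enforce, audit):
    # Steps 1-2: validate the org identity and resolve its configuration.
    org = orgs.get(user.get("org_id"))
    if org is None:
        raise PolicyViolation(f"unknown org: {user.get('org_id')!r}")
    audit("request.received")
    # Steps 3-4: embed the query and retrieve with the org-level top_k.
    vector = embed(query)
    documents = retrieve(vector, org["top_k"])
    # Step 5: resolve the policy; None stands for a deny decision in this sketch.
    policy = resolve_policy(org, user, documents)
    if policy is None:
        raise PolicyViolation("denied by governance rule")
    # Steps 6-7: build the prompt and call the LLM with policy parameters.
    context = "\n".join(documents)
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
    content = generate(prompt, policy["temperature"], policy["max_output_tokens"])
    # Step 8: enforcement runs before anything is returned to the caller.
    if not enforce(content, documents, policy):
        raise PolicyViolation("response failed enforcement")
    # Steps 9-10: record the outcome and return the response.
    audit("request.completed")
    return PipelineResult(policy["name"], content)
```

Passing each stage in as a callable keeps the sketch testable with stubs; a real engine would wire in the adapters and registries instead.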
Every request generates audit events:
{
  "event": "request.received",
  "request_id": "req-abc123",
  "org_id": "acme_corp",
  "user_id": "user-123",
  "timestamp": "2026-03-04T10:30:00Z"
}
OpenTelemetry integration tracks execution stages:
request_span
├── org_lookup_span
├── embedding_span
├── retrieval_span
├── policy_resolution_span
├── llm_generation_span
└── enforcement_span
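The span tree above can be mimicked with a small standard-library recorder to show how the nesting arises. `SpanRecorder` and `traced_request` are hypothetical stand-ins written for this sketch; with the OpenTelemetry Python API, each `with` block would typically use `tracer.start_as_current_span(name)` instead.

```python
import time
from contextlib import contextmanager


class SpanRecorder:
    """Tiny stand-in for a tracer: records span names, parents, and durations."""

    def __init__(self):
        self.finished = []  # (name, parent_name, duration_seconds)
        self._stack = []

    @contextmanager
    def span(self, name):
        parent = self._stack[-1] if self._stack else None
        self._stack.append(name)
        start = time.perf_counter()
        try:
            yield
        finally:
            # Spans are recorded when they close, so children finish first.
            self._stack.pop()
            self.finished.append((name, parent, time.perf_counter() - start))


def traced_request(recorder):
    stages = ["org_lookup_span", "embedding_span", "retrieval_span",
              "policy_resolution_span", "llm_generation_span", "enforcement_span"]
    with recorder.span("request_span"):
        for stage in stages:
            with recorder.span(stage):
                pass  # the real stage work would run here
```

Each stage span records `request_span` as its parent, and the root span closes last, which matches the tree shown above.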
- Throughput: Request count, requests per second
- Latency: Request duration, stage durations
- Quality: Retrieved document scores, top-k metrics
- LLM: Token counts, efficiency ratios
- Errors: Error types, error categories, denial reasons
- Custom: Policy resolutions, embedding dimensions
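As a rough illustration of the metric categories above, the following sketch keeps labeled counters and latency samples in memory. `MetricsSketch` and the metric names used with it are hypothetical; the actual metric names and exporter are defined by the project's Metrics Contract, not by this example.

```python
from collections import Counter, defaultdict
from statistics import mean


class MetricsSketch:
    """In-memory counters and duration samples, keyed by name and labels."""

    def __init__(self):
        self.counters = Counter()
        self.durations = defaultdict(list)

    def increment(self, name, **labels):
        # Sort labels so the same label set always maps to the same key.
        key = (name, tuple(sorted(labels.items())))
        self.counters[key] += 1

    def observe(self, name, seconds):
        self.durations[name].append(seconds)

    def average(self, name):
        return mean(self.durations[name])
```

A request handler could call `increment("requests", org_id="acme_corp")` once per request and `observe("request_duration", elapsed)` on completion, then export the aggregates to a metrics backend.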
For extensive documentation, guides, and API references, visit the docs directory:
- Getting Started guides and quick start tutorials
- Core concepts and architecture
- API reference and adapters documentation
- Observability and monitoring guides
- Configuration reference
Quick links to spec documents:
- DEVELOPMENT.md: Development setup, testing, quality standards
- CONTRIBUTING.md: Contribution guidelines
- Execution Contract: Engine behavior specification
- Audit Log Contract: Audit logging specification
- Metrics Contract: Metrics specification
- Tracing Contract: Tracing specification
- Control Plane Config Contract: Configuration specification
See the examples/ directory for:
- controller-config.yaml: Complete policy configuration example
- Exception-swallowing pattern ensures governance failures are handled gracefully
- All external inputs validated with Pydantic
- Type-safe with strict mypy enforcement
- Regular security audits and dependency updates
See CONTRIBUTING.md for guidelines:
- Issues: Anyone can open issues for bugs and feature requests
- Pull Requests: RetrievalLabs team members only
- Code Standards: 100% coverage, type checking, formatting compliance required
- Check DEVELOPMENT.md for setup issues
- Review spec documentation in rag_control/spec/ for detailed contracts
- Open an issue for bugs and feature requests
This project is licensed under the RetrievalLabs Business-Restricted License (RBRL).
- Personal/Non-Commercial Use: Permitted
- Business/Commercial Use: Prohibited without a written contract with RetrievalLabs Co.
- Modifications/Derivative Works: Prohibited without a written contract with RetrievalLabs Co.
See LICENSE for full terms.
Built by RetrievalLabs — Enterprise RAG Governance and Security