Description
Phase
Phase 0 — Foundations | Track 0.3 — AI Principles Framework | Priority: P0
Summary
Create the guardrails module with base classes for action validation, content filtering, and policy enforcement.
What
Create operator_use/guardrails/ package:
operator_use/guardrails/
├── __init__.py # Public API exports
├── base.py # Abstract base classes
├── action_validator.py # Pre-execution tool call validation
├── content_filter.py # Post-execution output filtering
├── policy_engine.py # Risk classification and policy decisions
└── registry.py # Guardrail registration and lookup
Base classes:
- Guardrail: abstract base with check(context) -> GuardrailResult
- ActionValidator: validates tool calls before execution (block/allow/confirm)
- ContentFilter: filters LLM output before sending to user
- PolicyEngine: evaluates risk level of an action given context
- GuardrailResult: result object with action (allow/block/confirm), reason, severity
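A minimal sketch of how the first of these base classes could fit together. It is illustrative only and makes several assumptions the issue does not state: the Action enum, the string-valued severity field, ActionValidator inheriting from Guardrail, the context keys tool_name and args, and the DenyListValidator example rule are all hypothetical.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from enum import Enum
from typing import Any


class Action(Enum):
    """The three outcomes named in the issue (enum itself is an assumption)."""
    ALLOW = "allow"
    BLOCK = "block"
    CONFIRM = "confirm"


@dataclass(frozen=True)
class GuardrailResult:
    action: Action
    reason: str = ""
    severity: str = "low"  # assumption: the issue does not define severity levels


class Guardrail(ABC):
    """Abstract base: every guardrail answers check(context)."""

    @abstractmethod
    def check(self, context: dict[str, Any]) -> GuardrailResult:
        ...


class ActionValidator(Guardrail):
    """Validates a tool call before it runs."""

    @abstractmethod
    def validate(
        self, tool_name: str, args: dict[str, Any], context: dict[str, Any]
    ) -> GuardrailResult:
        ...

    def check(self, context: dict[str, Any]) -> GuardrailResult:
        # Assumption: the context carries the pending call under these keys.
        return self.validate(context["tool_name"], context.get("args", {}), context)


class DenyListValidator(ActionValidator):
    """Hypothetical example rule: block any tool named in a deny list."""

    def __init__(self, denied: set[str]) -> None:
        self.denied = denied

    def validate(self, tool_name, args, context):
        if tool_name in self.denied:
            return GuardrailResult(Action.BLOCK, f"tool '{tool_name}' is denied", "high")
        return GuardrailResult(Action.ALLOW)


validator = DenyListValidator({"shell"})
blocked = validator.validate("shell", {"cmd": "ls"}, {})
allowed = validator.check({"tool_name": "read_file", "args": {"path": "notes.txt"}})
```

Routing check() through validate() is one way to let a registry treat every guardrail uniformly while keeping the tool-specific signature from the acceptance criteria; the real design may choose differently.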
Integration points:
- Hook into agent tool execution loop (before tool call)
- Hook into response pipeline (before sending to user)
- Configurable per-agent (some agents may have stricter policies)
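The two hooks and the per-agent configuration could look roughly like the sketch below. Everything named here is hypothetical (guarded_tool_call, guarded_response, ToolCallBlocked, AGENT_VALIDATORS, and the example rules), and validators are simplified to plain callables returning "allow", "block", or "confirm" so the example stays self-contained.

```python
from typing import Callable

# Assumption: a validator is any callable returning "allow", "block", or "confirm".
Validator = Callable[[str, dict, dict], str]
OutputFilter = Callable[[str, dict], str]


class ToolCallBlocked(Exception):
    """Raised when a guardrail refuses a tool call."""


def guarded_tool_call(tool_name, args, tool_fn, validators, context,
                      confirm=lambda name, call_args: False):
    """Hook 1: run every validator before the tool executes."""
    for validate in validators:
        decision = validate(tool_name, args, context)
        if decision == "block":
            raise ToolCallBlocked(f"{tool_name} blocked")
        if decision == "confirm" and not confirm(tool_name, args):
            raise ToolCallBlocked(f"{tool_name} not confirmed")
    return tool_fn(**args)


def guarded_response(text, filters, context):
    """Hook 2: pass the response through every filter before sending it."""
    for apply_filter in filters:
        text = apply_filter(text, context)
    return text


def no_shell(tool_name, args, context):
    return "block" if tool_name == "shell" else "allow"


def confirm_writes(tool_name, args, context):
    return "confirm" if tool_name == "write_file" else "allow"


def redact_secret(text, context):
    secret = context.get("secret")
    return text.replace(secret, "[redacted]") if secret else text


# Per-agent configuration: the "strict" agent adds a confirmation rule.
AGENT_VALIDATORS = {
    "default": [no_shell],
    "strict": [no_shell, confirm_writes],
}

result = guarded_tool_call(
    "add", {"a": 2, "b": 3}, lambda a, b: a + b, AGENT_VALIDATORS["strict"], {}
)
reply = guarded_response("token is abc123", [redact_secret], {"secret": "abc123"})
```

With this shape, a write_file call passes under the "default" list but raises ToolCallBlocked under "strict" unless the confirm callback approves it.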
Why
This module is the backbone for all Phase 2 guardrails. Having clean base classes and a registry pattern means each guardrail is a plugin — easy to add, test, and configure independently.
Acceptance Criteria
- operator_use/guardrails/ package exists with all base classes
- ActionValidator has abstract validate(tool_name, args, context) method
- ContentFilter has abstract filter(content, context) method
- PolicyEngine has classify_risk(action) returning safe|review|dangerous
- Registry pattern for registering and looking up guardrails
- Unit tests for base classes in tests/security/test_guardrails_base.py
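As a rough illustration of the ContentFilter, PolicyEngine, and registry criteria, the sketch below shows one possible shape. The issue does not specify return types or the registry API, so these are assumptions: filter() returning the filtered string, classify_risk() taking a tool name, and the GuardrailRegistry methods register/get/names. RedactingFilter and ToolNamePolicy are hypothetical example plugins.

```python
from abc import ABC, abstractmethod
from typing import Any, Literal

RiskLevel = Literal["safe", "review", "dangerous"]


class ContentFilter(ABC):
    @abstractmethod
    def filter(self, content: str, context: dict[str, Any]) -> str:
        """Return the content that is safe to send (assumed return type)."""


class PolicyEngine(ABC):
    @abstractmethod
    def classify_risk(self, action: str) -> RiskLevel:
        """Classify an action; here an action is assumed to be a tool name."""


class GuardrailRegistry:
    """Name-based registration and lookup (hypothetical API)."""

    def __init__(self) -> None:
        self._items: dict[str, object] = {}

    def register(self, name: str, guardrail: object) -> None:
        if name in self._items:
            raise ValueError(f"guardrail '{name}' is already registered")
        self._items[name] = guardrail

    def get(self, name: str) -> object:
        return self._items[name]  # raises KeyError for unknown names

    def names(self) -> list[str]:
        return sorted(self._items)


class RedactingFilter(ContentFilter):
    """Hypothetical plugin: replace a known secret in the output."""

    def __init__(self, secret: str) -> None:
        self.secret = secret

    def filter(self, content, context):
        return content.replace(self.secret, "[redacted]")


class ToolNamePolicy(PolicyEngine):
    """Hypothetical plugin: classify risk by tool name alone."""

    DANGEROUS = {"shell", "delete_file"}
    REVIEW = {"write_file"}

    def classify_risk(self, action):
        if action in self.DANGEROUS:
            return "dangerous"
        if action in self.REVIEW:
            return "review"
        return "safe"


registry = GuardrailRegistry()
registry.register("redact", RedactingFilter("abc123"))
registry.register("policy", ToolNamePolicy())
```

Tests in tests/security/test_guardrails_base.py could then cover that the abstract classes cannot be instantiated directly, that duplicate registration is rejected, and that lookups return the registered instance.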
References
- OWASP LLM Top 10
- Design Doc: docs/plans/2026-03-29-security-ai-guardrails-performance-design.md
Blocks
All Phase 1 issues (guardrail rules registered here), All Phase 2 issues