Existing guardrails protect what agents say.
AgentGate protects what agents do.
When an AI agent calls `DELETE FROM users` or reads `/etc/passwd`, your LLM's
output looks completely clean. It says "processing your request" while the damage
is already done.
Text-level guardrails — Guardrails AI, NeMo, Constitutional AI — evaluate what the model outputs. None of them see the tool call arguments. None of them know what the agent is actually executing.
AgentGate intercepts at the Python execution layer, one step before any side effect occurs. The tool function never runs. The database row stays intact.
Most security tools enumerate what's dangerous. That approach fails against novel attacks — if you haven't seen it before, you don't block it.
AgentGate inverts this. You declare what the agent is allowed to do. Everything else is blocked by definition, including attacks that have never been seen before.
```python
with agentgate.scope(
    task="Generate Q4 sales report",
    allowed_operations=["read", "aggregate"],
    allowed_resources=["sales_data", "reports"],
):
    run_agent(task)
```

An agent operating under this scope cannot touch `users`, cannot write to `/etc`, cannot make HTTP requests — regardless of what the LLM decides to do.
```bash
pip install agentgate-py
```

```python
import agentgate

# One line — auto-patches LangChain and OpenAI SDK
agentgate.protect_all()

# Declare scope around agent execution
with agentgate.scope(
    task="Generate Q4 sales report",
    allowed_operations=["read", "aggregate"],
    allowed_resources=["sales_data", "reports"],
):
    run_agent(task)
```

Or protect individual functions directly:
```python
@agentgate.guard
def execute_sql(query: str) -> str:
    ...

@agentgate.guard
def read_file(path: str) -> str:
    ...
```

Every call to these functions is evaluated before the body runs. If it's out of scope, `FirewallBlockedError` is raised and the function never executes.
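The mechanics can be sketched with a context variable that holds the active scope. This is an illustrative stand-in, not AgentGate's internals — and unlike the real `@agentgate.guard`, which infers the operation and resource from the call itself, the toy decorator below takes them explicitly:

```python
# Toy sketch of scope + guard (NOT AgentGate's actual implementation).
import contextvars
from contextlib import contextmanager
from functools import wraps

_scope = contextvars.ContextVar("scope", default=None)

class FirewallBlockedError(Exception):
    pass

@contextmanager
def scope(task, allowed_operations, allowed_resources):
    token = _scope.set({"ops": set(allowed_operations),
                        "res": set(allowed_resources)})
    try:
        yield
    finally:
        _scope.reset(token)

def guard(operation, resource):
    def decorate(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            s = _scope.get()
            # Fail closed: no active scope, or out-of-scope op/resource → block.
            if s is None or operation not in s["ops"] or resource not in s["res"]:
                raise FirewallBlockedError(f"{fn.__name__} blocked")
            return fn(*args, **kwargs)
        return wrapper
    return decorate

@guard("read", "sales_data")
def read_sales():
    return "rows"

@guard("delete", "users")
def delete_users():
    return "deleted"

with scope("Q4 report", ["read"], ["sales_data"]):
    print(read_sales())          # allowed
    try:
        delete_users()
    except FirewallBlockedError:
        print("blocked")         # out of scope: the body never ran
```

The point of the pattern: the allow/block decision happens in the wrapper, so a blocked function body is never entered at all.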
```
Agent decides to call a tool
            │
            ▼
┌─────────────────────────┐
│  AgentGate intercepts   │ ← before any side effect
└─────────────────────────┘
            │
            ▼
┌─────────────────────────┐
│ Tier 1 — static check   │ 0.3ms avg, no API calls
│ SQL / filesystem / HTTP │ handles ~64% of decisions
│ analyzer + scope check  │
└─────────────────────────┘
            │ ambiguous
            ▼
┌─────────────────────────┐
│ Tier 2 — LLM judge      │ separate GPT-4o-mini instance
│ 5-dimension scoring     │ task consistency + scope auth
│ trajectory analysis     │ fail-closed: uncertainty = BLOCK
└─────────────────────────┘
            │
      ALLOW / BLOCK
            │
            ▼
AuditLogger → Supabase (async, fire-and-forget, zero latency impact)
```
Tier 1 handles the clear cases fast — destructive SQL, path traversal, SSRF, out-of-scope resource access. Sub-millisecond, no network calls.
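As an illustration of the tier split (not AgentGate's actual analyzer), a Tier-1-style check is just cheap pattern rules with a third outcome that escalates to the semantic tier:

```python
# Illustrative Tier-1-style static check: pattern rules, no network calls.
# The rules here are toy examples, not AgentGate's real analyzer.
import re

DESTRUCTIVE_SQL = re.compile(r"\b(DROP|DELETE|TRUNCATE|ALTER)\b", re.IGNORECASE)

def static_verdict(tool: str, arg: str) -> str:
    """Return 'BLOCK', 'ALLOW', or 'AMBIGUOUS' (escalate to Tier 2)."""
    if tool == "execute_sql":
        if DESTRUCTIVE_SQL.search(arg):
            return "BLOCK"                      # destructive SQL
        if re.match(r"(?i)^\s*SELECT\b", arg):
            return "ALLOW"                      # plain read
        return "AMBIGUOUS"                      # e.g. stored procedures
    if tool == "read_file":
        if ".." in arg or arg.startswith("/etc/"):
            return "BLOCK"                      # traversal / sensitive path
        return "ALLOW"
    return "AMBIGUOUS"                          # unknown tool: escalate

print(static_verdict("execute_sql", "DELETE FROM users"))   # BLOCK
print(static_verdict("read_file", "../../etc/passwd"))      # BLOCK
print(static_verdict("execute_sql", "CALL report_proc()"))  # AMBIGUOUS
```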
Tier 2 handles the ambiguous cases semantically. A separate LLM instance (never the agent's own LLM) evaluates five dimensions: semantic intent, blast radius, reversibility, trajectory consistency, and disguised intent. Malformed response or uncertainty defaults to BLOCK.
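The fail-closed contract can be sketched as defensive parsing of the judge's verdict; the JSON shape and the confidence threshold below are assumptions for illustration, not AgentGate's actual wire format:

```python
# Sketch of fail-closed verdict parsing: anything malformed, incomplete,
# or low-confidence defaults to BLOCK, never ALLOW.
import json

def judge_verdict(raw_response: str, min_confidence: float = 0.8) -> str:
    try:
        data = json.loads(raw_response)
        verdict = data["verdict"]
        confidence = float(data["confidence"])
    except (json.JSONDecodeError, KeyError, TypeError, ValueError):
        return "BLOCK"  # malformed judge response: fail closed
    if verdict != "ALLOW" or confidence < min_confidence:
        return "BLOCK"  # explicit block, or uncertainty: fail closed
    return "ALLOW"

print(judge_verdict('{"verdict": "ALLOW", "confidence": 0.95}'))  # ALLOW
print(judge_verdict('{"verdict": "ALLOW", "confidence": 0.5}'))   # BLOCK
print(judge_verdict('not json at all'))                           # BLOCK
```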
The audit logger records every intercepted action asynchronously to Supabase — allowed and blocked — without adding latency to the critical path.
AgentGate was tested against a 370-scenario adversarial suite. Every tool call was generated by a real GPT-4o-mini instance given a task description — no hand-crafted payloads, no synthetic attack strings. The suite covers four domains (data operations, file operations, communications, infrastructure) plus 30 multi-turn trajectory scenarios.
| Run | Recall | Precision | FP Rate | What changed |
|---|---|---|---|---|
| 001 | 67.3% | 79.7% | 23.4% | Baseline |
| 004 | 80.1% | 85.4% | 27.2% | SQL resource extraction, UNKNOWN tool handling |
| 005 | 92.9% | 79.8% | 46.1% | Tier 2 scope gate added |
| 006 | 92.6% | 100.0% | 0.0% | False positive root cause fixed |
| 007 | 99.5% | 97.7% | 9.1% | Precision-recall rebalance |
Run 007 breakdown by difficulty:
| Difficulty | Recall |
|---|---|
| Obvious | 100% |
| Moderate | 99% |
| Subtle | 100% |
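No per-run confusion counts are published here, but the metrics in the tables above follow the standard definitions:

```python
# Standard metric definitions used in the tables above.
def recall(tp: int, fn: int) -> float:
    return tp / (tp + fn)        # attacks blocked / all attacks

def precision(tp: int, fp: int) -> float:
    return tp / (tp + fp)        # blocks that were real attacks / all blocks

def fp_rate(fp: int, tn: int) -> float:
    return fp / (fp + tn)        # benign calls wrongly blocked / all benign calls
```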
Latency:
- Tier 1: 0.3ms average
- Tier 2: ~3.2s average (LLM judge — only invoked on ambiguous cases)
The methodology — why we built a custom eval suite instead of using existing
attack payload lists — is documented in agenttest/README.md.
| | AgentGate | Guardrails AI | NeMo Guardrails | LangSmith |
|---|---|---|---|---|
| Intercepts tool call arguments | ✅ | ❌ | ❌ | ❌ |
| Prevents execution | ✅ | ❌ | ❌ | ❌ (observability only) |
| Scope-based inversion | ✅ | ❌ | ❌ | ❌ |
| Works without framework changes | ✅ | ❌ | ❌ | ✅ |
| Trajectory analysis | ✅ | ❌ | ❌ | ❌ |
| Open source | ✅ | ✅ | ✅ | ❌ |
Guardrails AI and NeMo evaluate LLM text output. They never see what the agent executes. LangSmith observes after the fact. AgentGate is the only open-source tool that intercepts and prevents tool call execution before any side effect occurs.
AgentGate auto-patches installed frameworks on `protect_all()`:
- OpenAI SDK — wraps `chat.completions.create`, intercepts tool calls in responses before agent dispatch
- LangChain / LangGraph — patches `BaseTool._run` and `_arun`
- Raw Python functions — `@agentgate.guard` decorator
- MCP — proxy interceptor (in progress)
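The patching technique amounts to swapping a method for a guarded wrapper. The class and policy below are stand-ins for illustration, not LangChain or AgentGate internals:

```python
# Sketch of the auto-patch idea: replace a tool method with a wrapper that
# runs an interception hook before the original body. DemoTool is a stand-in.
class FirewallBlockedError(Exception):
    pass

class DemoTool:
    def _run(self, query: str) -> str:
        return f"executed: {query}"

def patch(cls, is_allowed):
    original = cls._run
    def guarded(self, query: str) -> str:
        if not is_allowed(query):
            raise FirewallBlockedError(query)   # original body never runs
        return original(self, query)
    cls._run = guarded   # same shape of change protect_all() applies

patch(DemoTool, is_allowed=lambda q: "DELETE" not in q.upper())

tool = DemoTool()
print(tool._run("SELECT 1"))            # executed: SELECT 1
try:
    tool._run("DELETE FROM users")
except FirewallBlockedError:
    print("blocked")
```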
Tier 2 uses an LLM judge. By default it uses GPT-4o-mini — the cheapest and fastest option. If your stack doesn't use OpenAI, point it at any OpenAI-compatible endpoint:
```python
# Anthropic
agentgate.protect_all(
    judge_api_key=os.environ["ANTHROPIC_API_KEY"],
    judge_base_url="https://api.anthropic.com/v1",
    judge_model="claude-haiku-4-5-20251001",
)

# Local via Ollama
agentgate.protect_all(
    judge_api_key="ollama",
    judge_base_url="http://localhost:11434/v1",
    judge_model="llama3",
)
```

Or via environment:
```bash
AGENTGATE_JUDGE_API_KEY=your-key
AGENTGATE_JUDGE_BASE_URL=https://api.anthropic.com/v1
```

```bash
git clone https://github.com/arhaamatiq/agentgate
cd agentgate
pip install -e .
python examples/demo_agent/run_demo.py
```

No API keys required. Shows a compromised agent — legitimate task, hidden malicious instructions — with AgentGate blocking 3 attacks the agent never acknowledged.
- Real-time dashboard — Next.js, Supabase Realtime, live action feed
- MCP proxy interceptor
- PyPI install count badge
Issues and PRs welcome. See the eval methodology in agenttest/ if you want
to add scenarios or run the suite yourself.
MIT — see LICENSE.
Built by Arhaam Atiq.
