Barbican is a runtime enforcement layer for MCP tool calls: a transparent stdio proxy that sits between an MCP client (Claude Code, Cursor) and a downstream MCP server, consults warden for an allow / deny / ask verdict on every tools/call, and enforces it before the call reaches the server. It also screens advertised tool descriptions for poisoning and audits every decision in the canonical agent-gov-core Finding / Report schema.
No LLM in the decision path. warden decides from a policy file; Barbican only maps, relays, enforces, and records, so the decision path is deterministic and local-first. Run node examples/run-demo.mjs for a full end-to-end demo against the real warden binary.
flowchart LR
Client["MCP client<br/>Claude Code · Cursor"] -->|tools/call| Barbican
Barbican -->|"allow → forward"| Server["MCP server<br/>(downstream, stdio)"]
Server -->|result| Barbican
Barbican -->|"deny → JSON-RPC error"| Client
Barbican <-->|"map → verdict"| Warden[("warden<br/>policy DSL engine")]
Barbican --> Findings["Findings + Reports<br/>(agent-gov-core schema)"]
Findings -.->|"review --reports"| GovVerdict["GovVerdict<br/>(audit consumer)"]
classDef box fill:#1e293b,stroke:#334155,color:#e2e8f0
classDef store fill:#0f172a,stroke:#1e293b,color:#e2e8f0,stroke-width:2px
classDef out fill:#0c4a6e,stroke:#0369a1,color:#e0f2fe
class Client,Server box
class Warden store
class Barbican,Findings,GovVerdict out
Ships as a single barbican command (a tsc build) that the MCP client launches in place of the real server. No daemon, no network surface of its own.
For checked evidence of the core enforcement claim, run npm run evidence: it asserts that denied tools/call requests never reach the downstream server, while an allowed call does, and writes a reproducible audit bundle. See EVIDENCE.md.
See also: examples/ for a self-contained end-to-end demo · warden for the policy DSL · agent-gov-core for the Finding / Report schema.
A client that speaks MCP will call whatever tools a server advertises, with whatever arguments the model produces — rm -rf /, a read of ~/.ssh/id_rsa, a curl … | bash. The server is trusted to behave, and the tool descriptions the server advertises are read by the agent as trusted context, which makes them a prompt-injection surface (tool poisoning) the model never sees coming.
Barbican puts a deterministic chokepoint on the wire. Every call is checked against a policy before it executes, every description is screened before the agent trusts it, and every decision is written to an auditable trail. The policy logic isn't Barbican's — warden owns that. Barbican is the runtime that makes warden's verdicts actually bind.
Barbican is the connective tissue of an existing four-repo suite. It does not reimplement any of them — it wires them together at runtime:
| Repo | Role | How Barbican uses it |
|---|---|---|
| warden | Policy DSL engine (Rust). | The decision core. Barbican spawns one long-lived warden <policy> --stdin process and relays its verdict on each call — unchanged. |
| agent-gov-core | Canonical Finding / Report schema. | Every screening hit and enforcement decision becomes a canonical Finding; on exit they roll up into per-tool Reports via createReport. |
| CapabilityEcho | Code-diff capability detector. | Its detectShellCapability regexes are reused inline (not forked) to screen commands lifted from tool-call arguments — a fix there flows here. |
| GovVerdict | Audit / report consumer. | Barbican's rolled-up reports are ingestible by govverdict review --reports <dir> alongside static scans. |
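The "one long-lived process, verdicts relayed per call" arrangement in the warden row depends on matching each response line to the request that caused it. A minimal sketch of FIFO correlation follows. The wire format (one JSON line in, one JSON line out, answered in order), the class name, and the `decision` field are assumptions for illustration, not warden's documented protocol or Barbican's actual code.

```javascript
// Sketch only: FIFO correlation for a line-oriented child process.
// Assumes responses arrive in the same order the requests were written.
class FifoCorrelator {
  constructor(writeLine) {
    this.writeLine = writeLine; // hypothetical: writes one line to the child's stdin
    this.pending = [];          // waiting resolvers, oldest first
  }

  // Queue a request and return a promise for the matching response line.
  consult(request) {
    return new Promise((resolve, reject) => {
      this.pending.push({ resolve, reject });
      this.writeLine(JSON.stringify(request));
    });
  }

  // Call once per line read from the child's stdout.
  onLine(line) {
    const waiter = this.pending.shift();
    if (!waiter) return; // unsolicited line: nothing to correlate
    try {
      waiter.resolve(JSON.parse(line));
    } catch (err) {
      waiter.reject(err); // malformed line fails only the oldest request
    }
  }

  // Reject everything still in flight, e.g. when the child exits.
  close(reason) {
    for (const waiter of this.pending.splice(0)) {
      waiter.reject(new Error(reason));
    }
  }
}
```

In a real proxy, `writeLine` would wrap the child's stdin and `onLine` would be fed by a line reader on its stdout; both are left out here so the correlation logic stays visible.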
tools/list — for every advertised tool, Barbican screens the description for tool-poisoning vectors and the call surface for risky shell capability. Screening is observe-only and always on (it never blocks a listing); it just emits Findings. Two detector families run:
- barbican.* (tool poisoning): original to Barbican, because CapabilityEcho reviews code diffs and has no analogue. High-precision rules cover instruction-override openers (ignore all previous instructions), pseudo-system tags (<IMPORTANT>…), concealment directives (do not tell the user), credential-path + read/transmit exfil recipes (read ~/.ssh/id_rsa and POST it), hidden zero-width / bidi unicode, and embedded hardcoded secrets.
- capability_echo.* (command screening): the reused CapabilityEcho shell detector, applied to commands lifted from tools/call arguments (pipe-to-shell, external download, …).
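To make the observe-only screening concrete, here is a rough sketch of rule-based description screening. The first two rule ids appear in the demo output; the third id, the severities for it, the finding shape, and all of the regexes are assumptions for illustration and are not Barbican's actual patterns.

```javascript
// Illustrative rules only; the real detectors are more extensive.
const RULES = [
  {
    id: "barbican.tool_poisoning_instruction",
    severity: "high",
    pattern: /\bignore\s+(all\s+)?(previous|prior)\s+instructions\b/i,
  },
  {
    id: "barbican.tool_poisoning_concealment",
    severity: "high",
    pattern: /\bdo\s+not\s+(tell|inform)\s+the\s+user\b/i,
  },
  {
    id: "barbican.tool_poisoning_hidden_unicode", // hypothetical id
    severity: "high",                             // hypothetical severity
    // zero-width characters and bidi control characters
    pattern: /[\u200B-\u200F\u202A-\u202E\u2060\u2066-\u2069\uFEFF]/,
  },
];

// Returns one finding per matching rule; never throws and never blocks.
function screenDescription(toolName, description) {
  return RULES.filter((rule) => rule.pattern.test(description)).map((rule) => ({
    ruleId: rule.id,
    severity: rule.severity,
    tool: toolName,
  }));
}
```

The function only reports matches, which mirrors the observe-only behaviour described above: the caller decides what to log, and the listing is passed through either way.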
tools/call — Barbican maps the MCP call to a warden action (via the mapping file), consults warden, and enforces the verdict:
| warden verdict | Barbican behaviour |
|---|---|
| allow | Forward the call to the downstream server. |
| deny | Return a JSON-RPC error to the client. The call never reaches the server. warden's own reason is surfaced in the error. |
| ask | Resolve per --on-ask (default deny). stdio has no channel to prompt a human, so the safe default is to deny-and-log. |
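The verdict table can be sketched as a single pure function. The verdict shape (`decision`, `reason`), the return shape, and the error code -32000 are assumptions chosen for illustration; only the allow / deny / ask behaviour and the deny-by-default handling of ask come from the table.

```javascript
// Sketch of the verdict table above, not Barbican's actual implementation.
function enforce(verdict, request, onAsk = "deny") {
  // An ask verdict collapses to allow or deny according to the --on-ask mode.
  const effective = verdict.decision === "ask" ? onAsk : verdict.decision;

  if (effective === "allow") {
    return { forward: true, request };
  }

  // Deny (or anything unrecognised): answer the client directly,
  // so the call is never written to the downstream server.
  return {
    forward: false,
    response: {
      jsonrpc: "2.0",
      id: request.id,
      error: {
        code: -32000, // hypothetical; inside JSON-RPC's implementation-defined range
        message: verdict.reason ?? "blocked by policy",
      },
    },
  };
}
```

Treating any unrecognised decision as a deny keeps the sketch fail-closed, which matches the default stance described for ask.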
warden reasons about actions (bash, read, write, …) and fields (command, path), but MCP tools have arbitrary names and argument shapes. A mapping file bridges the two:
{
"tools": {
"run_shell": { "action": "bash", "commandArg": "cmd" },
"read_file": { "action": "read", "pathArg": "path" },
"write_config": { "action": "write", "pathArg": "path" }
},
"default": { "action": "mcp_call" }
}

action becomes warden's tool; the named argument is lifted into warden's command or path so the policy can match on it. Unmapped tools fall through to default. With no --map, a single coarse mcp_call action applies to every call: the policy can still allow/deny by action name, but path/command granularity needs a per-server map.
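The lookup described above can be sketched as follows. The output field names (`tool`, `command`, `path`) follow the prose; the function name and the exact shape warden expects on its input are assumptions.

```javascript
// Sketch: turn an MCP tool call into a warden-style action using the map.
function mapCall(mapping, toolName, args = {}) {
  const tools = mapping?.tools ?? {};
  const hasEntry = Object.prototype.hasOwnProperty.call(tools, toolName);
  // Unmapped tools use "default"; with no map at all, use a coarse mcp_call.
  const entry = hasEntry
    ? tools[toolName]
    : mapping?.default ?? { action: "mcp_call" };

  const action = { tool: entry.action };
  if (entry.commandArg && args[entry.commandArg] !== undefined) {
    action.command = args[entry.commandArg]; // lifted so the policy can match on it
  }
  if (entry.pathArg && args[entry.pathArg] !== undefined) {
    action.path = args[entry.pathArg];
  }
  return action;
}
```

With the mapping file shown earlier, `mapCall(map, "run_shell", { cmd: "git status" })` yields `{ tool: "bash", command: "git status" }`, while a tool that is not listed yields `{ tool: "mcp_call" }`.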
Barbican consumes its sibling repos through local file: dependencies, so check them out next to each other:
<workspace>/
warden/ # cargo build → target/debug/warden
agent-gov-core/
CapabilityEcho/
barbican/
npm install
npm run build # tsc → dist/
npm test # builds, then node --test (34 tests)
npm run verify # npm test + the evidence demo (needs warden; see below)

You also need the warden binary on PATH, or pointed at via --warden. The unit suite is hermetic: it runs without warden, and the one evidence-demo test skips itself when no warden binary is found. npm run verify additionally runs the evidence demo for real, so it requires warden. CI (.github/workflows/ci.yml) checks out the sibling repos, builds warden, and runs npm run verify end to end on every push.
Barbican is launched by the client in place of the real server, and given the real server command after --. In a client's MCP config (Claude Code .mcp.json / claude_desktop_config.json):
{
"mcpServers": {
"files": {
"command": "barbican",
"args": [
"--policy", "/abs/path/policy.warden",
"--map", "/abs/path/map.json",
"--report-dir", "/abs/path/audit",
"--", "npx", "-y", "@modelcontextprotocol/server-filesystem", "/srv"
]
}
}
}

| Flag | Meaning |
|---|---|
| --policy <file> | Warden policy to consult on each tools/call. Without it, enforcement is off (screening still runs). |
| --warden <path> | Warden executable to spawn (default: warden on PATH). |
| --map <file> | MCP-tool → warden-action mapping (JSON). Falls back to a coarse single-action default when omitted. |
| --on-ask <mode> | How an ask verdict resolves: deny (default) or allow. |
| --fail-open | Forward calls if warden can't be reached (default: deny / fail-closed). |
| --otel | Export OpenTelemetry GenAI spans over OTLP/HTTP. Endpoint/headers from the standard OTEL_EXPORTER_OTLP_* env. Off by default. |
| --report-dir <d> | On exit, write one canonical agent-gov-core Report per tool kind into <d>. |
| --log <file> | Append structured JSON log lines to <file> (in addition to stderr). |
| -h, --help | Show help. |
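Since the downstream server command follows `--`, the argument list has to be split into Barbican's own flags and the command to spawn. A minimal sketch of that split is below; only the `--` convention comes from this README, and the function name and return shape are hypothetical.

```javascript
// Sketch: separate proxy flags from the downstream server command line.
function splitArgv(argv) {
  const sep = argv.indexOf("--");
  if (sep === -1 || sep === argv.length - 1) {
    throw new Error("expected a downstream server command after --");
  }
  return {
    flags: argv.slice(0, sep),       // everything before "--"
    serverCommand: argv[sep + 1],    // first token after "--"
    serverArgs: argv.slice(sep + 2), // remaining tokens, passed through untouched
  };
}
```

Applied to the args array in the client config above, this would give `npx` as the server command and `-y`, `@modelcontextprotocol/server-filesystem`, `/srv` as its arguments.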
examples/ contains a self-contained, end-to-end demo driven against the real warden binary: a toy server advertises a poisoned tool description plus shell / read / write tools, and a scripted session attempts two destructive calls, one that needs confirmation, one obviously safe call, and a curl … | bash.
# warden on PATH:
node examples/run-demo.mjs
# or point at a build:
BARBICAN_WARDEN=../warden/target/debug/warden node examples/run-demo.mjs

It narrates what the enforcement layer did: descriptions screened, calls blocked (with warden's own reason), calls allowed, and the canonical reports it rolled up:
Tool descriptions screened at tools/list (Barbican tool-poisoning detector):
[HIGH] barbican.tool_poisoning_instruction — ...overriding prior agent context.
[HIGH] barbican.tool_poisoning_concealment — ...hide activity from the user.
[CRITICAL] barbican.tool_poisoning_exfiltration — ...sensitive credential path...
Tool calls (warden decides, Barbican enforces):
BLOCKED run_shell cmd='rm -rf /' ...contains "rm -rf"
BLOCKED read_file path='config/.env' ...matches "**/.env*"
BLOCKED write_config path='...settings.json' ask → deny
ALLOWED run_shell cmd='git status' → ok (server ran the tool)
BLOCKED run_shell cmd='curl … | bash' ...contains "| bash"
Call arguments screened at tools/call (reused CapabilityEcho detectors):
[CRITICAL] capability_echo.shell_pipe_to_shell — ...pipes...to a shell.
[MEDIUM] capability_echo.shell_external_download — ...external URL.
Session audit rolled up into canonical reports:
capability_echo.report.json (rating: critical, findings: 2)
barbican.report.json (rating: critical, findings: 3)
The narrated demo shows what happened. The checked evidence demo asserts the enforcement claim directly: denied tools/call requests return JSON-RPC errors and never appear in the downstream server's call log, while an allowed git status call reaches the downstream exactly once.
npm run build
BARBICAN_WARDEN=../warden/target/debug/warden npm run evidence

On PowerShell:
$env:BARBICAN_WARDEN = "..\warden\target\debug\warden.exe"
npm run evidence

The run writes an audit bundle containing audit.log, downstream-calls.jsonl, canonical reports under reports/, and a machine-readable evidence-summary.json. See EVIDENCE.md for the full checklist and limitations.
- Structured log: every decision (warden_verdict, tool_decision, tool_call_denied, finding, …) is emitted as one JSON line to stderr and, with --log, appended to a file.
- Canonical reports: with --report-dir, findings roll up on exit into one <tool>.report.json per ToolKind (e.g. barbican.report.json, capability_echo.report.json). Report.tool is a single closed enum and validateReport requires every finding to match it, so findings are split per tool kind. The reports validate against agent-gov-core and are ingestible by govverdict review --reports <dir>.
- Traces: with --otel, each session is a root mcp.session span and each call an execute_tool {tool} child, carrying GenAI semantic-convention attributes (gen_ai.operation.name, gen_ai.tool.name, gen_ai.tool.call.id, gen_ai.conversation.id) plus barbican.decision and, on a block, barbican.deny.code / barbican.deny.reason.
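The per-tool-kind split of findings can be illustrated with a small grouping function. The finding fields (`tool`, `severity`), the severity ordering, and the "worst severity becomes the rating" rule are assumptions for the sketch; real reports are built and validated with agent-gov-core, whose API is deliberately not reproduced here.

```javascript
// Sketch of the rollup: one report per tool kind, skipping empty kinds.
const SEVERITY_ORDER = ["low", "medium", "high", "critical"];

function rollUp(findings) {
  const byTool = new Map();
  for (const finding of findings) {
    if (!byTool.has(finding.tool)) byTool.set(finding.tool, []);
    byTool.get(finding.tool).push(finding);
  }

  // Tool kinds with no findings never enter the map, so no empty report is produced.
  return [...byTool].map(([tool, items]) => ({
    file: `${tool}.report.json`,
    tool,
    // Assumed rating rule: the most severe finding wins.
    rating: items
      .map((f) => f.severity)
      .reduce((worst, s) =>
        SEVERITY_ORDER.indexOf(s) > SEVERITY_ORDER.indexOf(worst) ? s : worst
      ),
    findings: items,
  }));
}
```

Fed the five findings from the narrated demo, this sketch produces two entries, barbican.report.json with three findings and capability_echo.report.json with two, both rated critical under the assumed rule.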
npm test # tsc build, then node --test test/*.test.mjs

34 tests cover:
- the stdio relay (passthrough, request/response correlation, shutdown)
- the long-lived warden client (FIFO verdict correlation, error-line recovery, consult-after-close)
- the tool→action mapping (known tools, array-path splitting, default fall-through)
- enforcement (deny blocks and surfaces warden's reason, allow forwards, ask resolves deny-by-default and allow-when-configured, most-restrictive multi-path, fail-closed by default / fail-open when set)
- poisoning + command screening
- the OpenTelemetry GenAI spans (attributes, forward/blocked decision, parent-child trace, the relay seam)
- the session-report rollup (one valid canonical report per tool kind, empty-finding skip, and an end-to-end run against the poisoned demo server)
- the checked evidence demo showing denied calls do not reach the downstream server (this one test skips when no warden binary is present, so the unit suite stays hermetic)
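The "most-restrictive multi-path" case in the test list suggests that a call touching several paths gets one verdict per path, with the strictest one applied. A sketch of that combination step is below; the ordering deny > ask > allow and the verdict shape are assumptions about what "most restrictive" means here, not a description of Barbican's code.

```javascript
// Sketch: pick the strictest verdict from a list of per-path verdicts.
const RESTRICTIVENESS = { allow: 0, ask: 1, deny: 2 };

function mostRestrictive(verdicts) {
  if (verdicts.length === 0) {
    throw new Error("no verdicts to combine");
  }
  // On ties the earliest verdict is kept, so its reason is the one surfaced.
  return verdicts.reduce((worst, v) =>
    RESTRICTIVENESS[v.decision] > RESTRICTIVENESS[worst.decision] ? v : worst
  );
}
```

Under this assumed ordering, a single denied path is enough to block a call even when every other path would be allowed.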
v0.1.0. The runtime layer is feature-complete through a six-step incremental build, runnable and tested at each step. The decision path is deterministic and local-first; warden owns policy, Barbican owns the wire. Live:
- Transparent stdio JSON-RPC relay between MCP client and downstream server
- Long-lived warden consult on every tools/call, with FIFO verdict correlation
- Enforcement: allow → forward · deny → JSON-RPC error (call never reaches the server) · ask → configurable deny-by-default
- Tool-poisoning screening at tools/list (barbican.*) and reused CapabilityEcho command screening at tools/call (capability_echo.*)
- Canonical agent-gov-core Findings, rolled up on exit into per-tool Reports ingestible by GovVerdict
- OpenTelemetry GenAI spans (opt-in)
- Self-contained end-to-end demo against the real warden binary
- Checked evidence demo with downstream call-log assertions and canonical report artifacts
Both telemetry and report files are opt-in. With no flags, Barbican screens and audits to stderr and nothing leaves the machine.
- Not a standalone gateway. Barbican has no policy logic of its own; warden is the decision core. Run it without --policy and it screens and audits but enforces nothing.
- stdio only. Barbican proxies stdio MCP servers. Because stdio has no human-prompt channel, ask resolves to deny-by-default; flip it with --on-ask allow only when you understand the policy.
- Mapping is required for precise enforcement. Tools you don't map fall through to the default action, which is as coarse as your policy's handling of that action.
- Screening is heuristic. The poisoning rules are tuned high-precision to keep false positives low on a surface screened on every reconnect; they are a tripwire, not a proof. They observe; they never block a listing.
MIT.