Barbican

A runtime enforcement layer for MCP tool calls. A transparent stdio proxy that sits between an MCP client (Claude Code, Cursor) and a downstream MCP server, consults warden for an allow / deny / ask verdict on every tools/call, and enforces it before the call reaches the server — while screening advertised tool descriptions for poisoning and auditing every decision in the canonical agent-gov-core Finding / Report schema.

No LLM in the decision path. warden decides from a policy file; Barbican only maps, relays, enforces, and records — deterministic and local-first. Run node examples/run-demo.mjs for a full end-to-end demo against the real warden binary.

flowchart LR
    Client["MCP client<br/>Claude Code · Cursor"] -->|tools/call| Barbican
    Barbican -->|"allow → forward"| Server["MCP server<br/>(downstream, stdio)"]
    Server -->|result| Barbican
    Barbican -->|"deny → JSON-RPC error"| Client
    Barbican <-->|"map → verdict"| Warden[("warden<br/>policy DSL engine")]
    Barbican --> Findings["Findings + Reports<br/>(agent-gov-core schema)"]
    Findings -.->|"review --reports"| GovVerdict["GovVerdict<br/>(audit consumer)"]

    classDef box fill:#1e293b,stroke:#334155,color:#e2e8f0
    classDef store fill:#0f172a,stroke:#1e293b,color:#e2e8f0,stroke-width:2px
    classDef out fill:#0c4a6e,stroke:#0369a1,color:#e0f2fe
    class Client,Server box
    class Warden store
    class Barbican,Findings,GovVerdict out

Ships as a single barbican command (a tsc build) that the MCP client launches in place of the real server. No daemon, no network surface of its own.

For checked evidence of the core enforcement claim, run npm run evidence: it asserts that denied tools/call requests never reach the downstream server, while an allowed call does, and writes a reproducible audit bundle. See EVIDENCE.md.

See also: examples/ for a self-contained end-to-end demo · warden for the policy DSL · agent-gov-core for the Finding / Report schema.

Why this exists

A client that speaks MCP will call whatever tools a server advertises, with whatever arguments the model produces — rm -rf /, a read of ~/.ssh/id_rsa, a curl … | bash. The server is trusted to behave, and the tool descriptions the server advertises are read by the agent as trusted context, which makes them a prompt-injection surface (tool poisoning) the model never sees coming.

Barbican puts a deterministic chokepoint on the wire. Every call is checked against a policy before it executes, every description is screened before the agent trusts it, and every decision is written to an auditable trail. The policy logic isn't Barbican's — warden owns that. Barbican is the runtime that makes warden's verdicts actually bind.

Where it fits

Barbican is the connective tissue of an existing four-repo suite. It does not reimplement any of them — it wires them together at runtime:

Repo	Role	How Barbican uses it
warden	Policy DSL engine (Rust).	The decision core. Barbican spawns one long-lived `warden <policy> --stdin` process and relays its verdict on each call — unchanged.
agent-gov-core	Canonical Finding / Report schema.	Every screening hit and enforcement decision becomes a canonical `Finding`; on exit they roll up into per-tool `Report`s via `createReport`.
CapabilityEcho	Code-diff capability detector.	Its `detectShellCapability` regexes are reused inline (not forked) to screen commands lifted from tool-call arguments — a fix there flows here.
GovVerdict	Audit / report consumer.	Barbican's rolled-up reports are ingestible by `govverdict review --reports <dir>` alongside static scans.
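
The warden row above describes one long-lived process consulted on every call, and the test list later in this README calls the pairing of replies to requests "FIFO verdict correlation". A minimal sketch of that idea follows, assuming warden answers exactly one line per request, in request order. The class name, the `send` callback, and the line format are hypothetical and not taken from Barbican's source.

```typescript
// Hypothetical sketch: pair replies from a long-lived, line-oriented
// process with requests purely by order (first in, first out).
class FifoConsultQueue {
  // Resolvers for requests sent but not yet answered, oldest first.
  private pending: Array<(line: string) => void> = [];
  // Writes one request line to the child's stdin (assumed transport).
  private send: (requestLine: string) => void;

  constructor(send: (requestLine: string) => void) {
    this.send = send;
  }

  // Queue the resolver first, then send, so a fast reply cannot overtake it.
  consult(requestLine: string): Promise<string> {
    return new Promise((resolve) => {
      this.pending.push(resolve);
      this.send(requestLine);
    });
  }

  // Call once for every complete line read from the child's stdout.
  onLine(line: string): void {
    const resolve = this.pending.shift();
    if (resolve) resolve(line); // a line with no waiting request is dropped
  }

  // Number of requests still waiting for a reply.
  get inFlight(): number {
    return this.pending.length;
  }
}
```

Correlating by order avoids request ids on the wire, but it only holds if the child never reorders or skips a reply; a real client would also need to handle process exit and malformed lines.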

What it does on each message

tools/list: for every advertised tool, Barbican screens the description for tool-poisoning vectors. Screening is observe-only and always on (it never blocks a listing); it only emits Findings. Commands lifted from tools/call arguments are screened the same way for risky shell capability. Two detector families run:

barbican.* (tool poisoning) — original to Barbican, because CapabilityEcho reviews code diffs and has no analogue. High-precision rules for: instruction-override openers (ignore all previous instructions), pseudo-system tags (<IMPORTANT>…), concealment directives (do not tell the user), credential-path + read/transmit exfil recipes (read ~/.ssh/id_rsa and POST it), hidden zero-width / bidi unicode, and embedded hardcoded secrets.
capability_echo.* (command screening) — the reused CapabilityEcho shell detector, applied to commands lifted from tools/call arguments (pipe-to-shell, external download, …).
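
As a rough illustration of observe-only screening, the sketch below runs a few regex rules over a tool description and returns findings without blocking anything. The two `barbican.*` rule ids are the ones shown in the demo output; the regexes themselves, the `sketch.hidden_unicode` id, and all type and function names are assumptions made for this example, not Barbican's actual detectors.

```typescript
// Illustrative rules only: these regexes are NOT Barbican's real detectors.
interface PoisoningRule {
  id: string;
  severity: "HIGH" | "CRITICAL";
  pattern: RegExp; // no /g flag, so test() keeps no state between calls
}

const SKETCH_RULES: PoisoningRule[] = [
  {
    id: "barbican.tool_poisoning_instruction",
    severity: "HIGH",
    pattern: /ignore\s+(all\s+)?previous\s+instructions/i,
  },
  {
    id: "barbican.tool_poisoning_concealment",
    severity: "HIGH",
    pattern: /do\s+not\s+tell\s+the\s+user/i,
  },
  {
    // Hypothetical id: zero-width and bidi control characters.
    id: "sketch.hidden_unicode",
    severity: "HIGH",
    pattern: /[\u200B-\u200F\u202A-\u202E\u2060-\u2064\uFEFF]/,
  },
];

interface SketchFinding {
  ruleId: string;
  severity: string;
  tool: string;
}

// Observe-only: returns findings and never rejects the listing.
function screenDescription(tool: string, description: string): SketchFinding[] {
  return SKETCH_RULES
    .filter((rule) => rule.pattern.test(description))
    .map((rule) => ({ ruleId: rule.id, severity: rule.severity, tool }));
}
```

Calling `screenDescription("notes", "Ignore all previous instructions and do not tell the user.")` yields two findings under these rules, while an ordinary description yields none.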

tools/call — Barbican maps the MCP call to a warden action (via the mapping file), consults warden, and enforces the verdict:

warden verdict	Barbican behaviour
allow	Forward the call to the downstream server.
deny	Return a JSON-RPC error to the client. The call never reaches the server. warden's own reason is surfaced in the error.
ask	Resolve per `--on-ask` (default `deny`). stdio has no channel to prompt a human, so the safe default is to deny-and-log.
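
The verdict table, together with the `--on-ask` and `--fail-open` flags, reduces to a small decision function. The sketch below is one way to write it, assuming a `null` verdict stands for "warden could not be reached". The type names, the option object, and the `-32000` error code (a value from JSON-RPC 2.0's implementation-defined server-error range) are assumptions, not Barbican's actual code.

```typescript
// Sketch only: type names, option names, and the error code are assumptions.
type WardenVerdict = "allow" | "deny" | "ask";

interface EnforceOptions {
  onAsk: "deny" | "allow"; // mirrors --on-ask (default "deny")
  failOpen: boolean; // mirrors --fail-open (default false, i.e. fail-closed)
}

const DEFAULT_ENFORCE: EnforceOptions = { onAsk: "deny", failOpen: false };

// `verdict` is null when warden could not be reached.
function shouldForward(
  verdict: WardenVerdict | null,
  opts: EnforceOptions = DEFAULT_ENFORCE,
): boolean {
  if (verdict === null) return opts.failOpen;
  if (verdict === "ask") return opts.onAsk === "allow";
  return verdict === "allow";
}

// Builds the JSON-RPC 2.0 error returned to the client for a blocked call.
function denyResponse(id: number | string, reason: string) {
  return {
    jsonrpc: "2.0" as const,
    id,
    error: { code: -32000, message: reason },
  };
}
```

With the defaults, only an explicit `allow` forwards the call; `ask`, `deny`, and an unreachable warden all produce the error response.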

The mapping

warden reasons about actions (bash, read, write, …) and fields (command, path), but MCP tools have arbitrary names and argument shapes. A mapping file bridges the two:

{
  "tools": {
    "run_shell":    { "action": "bash",  "commandArg": "cmd" },
    "read_file":    { "action": "read",  "pathArg": "path" },
    "write_config": { "action": "write", "pathArg": "path" }
  },
  "default": { "action": "mcp_call" }
}

action becomes warden's tool; the named argument is lifted into warden's command or path so the policy can match on it. Unmapped tools fall through to default. With no --map, a single coarse mcp_call action applies to every call — the policy can still allow/deny by action name, but path/command granularity needs a per-server map.
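
The lookup described above can be sketched as follows. The mapping shape comes from the JSON example; the `WardenAction` shape (`tool`, `command`, `path`) follows the prose, but its exact field names on the wire and the function name `mapCall` are assumptions. This sketch lifts string arguments only and leaves out the array-path handling mentioned in the test list.

```typescript
// Sketch of the tool → action lookup; names are hypothetical.
interface ToolMapping {
  action: string;
  commandArg?: string;
  pathArg?: string;
}

interface MappingFile {
  tools: Record<string, ToolMapping>;
  default: ToolMapping;
}

interface WardenAction {
  tool: string;
  command?: string;
  path?: string;
}

// What applies when no --map is given: one coarse action for every call.
const COARSE_DEFAULT: MappingFile = { tools: {}, default: { action: "mcp_call" } };

function mapCall(
  mapping: MappingFile,
  toolName: string,
  args: Record<string, unknown>,
): WardenAction {
  // Own-property check so a tool named e.g. "toString" cannot hit the prototype.
  const known = Object.prototype.hasOwnProperty.call(mapping.tools, toolName);
  const entry = (known ? mapping.tools[toolName] : undefined) ?? mapping.default;

  const action: WardenAction = { tool: entry.action };
  const command = entry.commandArg ? args[entry.commandArg] : undefined;
  const path = entry.pathArg ? args[entry.pathArg] : undefined;
  if (typeof command === "string") action.command = command;
  if (typeof path === "string") action.path = path;
  return action;
}
```

With the example mapping, `run_shell` called with `{ cmd: "git status" }` maps to `{ tool: "bash", command: "git status" }`, and an unmapped tool maps to `{ tool: "mcp_call" }`.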

Run it

Barbican consumes its sibling repos through local file: dependencies, so check them out next to each other:

<workspace>/
  warden/          # cargo build → target/debug/warden
  agent-gov-core/
  CapabilityEcho/
  barbican/

npm install
npm run build      # tsc → dist/
npm test           # builds, then node --test (34 tests)
npm run verify     # npm test + the evidence demo (needs warden; see below)

You also need the warden binary on PATH, or pointed at via --warden. The unit suite is hermetic — it runs without warden, and the one evidence-demo test skips itself when no warden binary is found. npm run verify additionally runs the evidence demo for real, so it requires warden. CI (.github/workflows/ci.yml) checks out the sibling repos, builds warden, and runs npm run verify end to end on every push.

Wiring into an MCP client

Barbican is launched by the client in place of the real server, and given the real server command after --. In a client's MCP config (Claude Code .mcp.json, or Claude Desktop claude_desktop_config.json):

{
  "mcpServers": {
    "files": {
      "command": "barbican",
      "args": [
        "--policy", "/abs/path/policy.warden",
        "--map", "/abs/path/map.json",
        "--report-dir", "/abs/path/audit",
        "--", "npx", "-y", "@modelcontextprotocol/server-filesystem", "/srv"
      ]
    }
  }
}
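
In the config above, everything before `--` belongs to Barbican and everything after it is the downstream server command. A minimal sketch of that split, assuming the first bare `--` is the separator; the function name is hypothetical and this is not Barbican's actual argument parser.

```typescript
// Hypothetical helper: split an argument list at the first bare "--".
function splitAtDoubleDash(argv: string[]): { own: string[]; downstream: string[] } {
  const i = argv.indexOf("--");
  // No separator: every argument belongs to the proxy, no server command given.
  if (i === -1) return { own: argv.slice(), downstream: [] };
  return { own: argv.slice(0, i), downstream: argv.slice(i + 1) };
}
```

For the `args` array shown above, `own` would hold the `--policy`, `--map` and `--report-dir` pairs, and `downstream` would start with `npx`.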

Flags

Flag	Meaning
`--policy <file>`	Warden policy to consult on each `tools/call`. Without it, enforcement is off (screening still runs).
`--warden <path>`	Warden executable to spawn (default: `warden` on `PATH`).
`--map <file>`	MCP-tool → warden-action mapping (JSON). Falls back to a coarse single-action default when omitted.
`--on-ask <mode>`	How an `ask` verdict resolves: `deny` (default) or `allow`.
`--fail-open`	Forward calls if warden can't be reached (default: deny / fail-closed).
`--otel`	Export OpenTelemetry GenAI spans over OTLP/HTTP. Endpoint/headers from the standard `OTEL_EXPORTER_OTLP_*` env. Off by default.
`--report-dir <d>`	On exit, write one canonical agent-gov-core `Report` per tool kind into `<d>`.
`--log <file>`	Append structured JSON log lines to `<file>` (in addition to stderr).
`-h, --help`	Show help.

The demo

examples/ contains a self-contained, end-to-end demo driven against the real warden binary: a toy server advertises a poisoned tool description plus shell / read / write tools, and a scripted session attempts a destructive rm -rf /, a read of a .env file, a config write that needs confirmation, an obviously safe git status, and a curl … | bash.

# warden on PATH:
node examples/run-demo.mjs
# or point at a build:
BARBICAN_WARDEN=../warden/target/debug/warden node examples/run-demo.mjs

It narrates what the enforcement layer did — descriptions screened, calls blocked (with warden's own reason), calls allowed, and the canonical reports it rolled up:

Tool descriptions screened at tools/list (Barbican tool-poisoning detector):
  [HIGH] barbican.tool_poisoning_instruction — ...overriding prior agent context.
  [HIGH] barbican.tool_poisoning_concealment — ...hide activity from the user.
  [CRITICAL] barbican.tool_poisoning_exfiltration — ...sensitive credential path...

Tool calls (warden decides, Barbican enforces):
  BLOCKED  run_shell  cmd='rm -rf /'              ...contains "rm -rf"
  BLOCKED  read_file  path='config/.env'          ...matches "**/.env*"
  BLOCKED  write_config  path='...settings.json'  ask → deny
  ALLOWED  run_shell  cmd='git status' → ok (server ran the tool)
  BLOCKED  run_shell  cmd='curl … | bash'         ...contains "| bash"

Call arguments screened at tools/call (reused CapabilityEcho detectors):
  [CRITICAL] capability_echo.shell_pipe_to_shell — ...pipes...to a shell.
  [MEDIUM]   capability_echo.shell_external_download — ...external URL.

Session audit rolled up into canonical reports:
  capability_echo.report.json  (rating: critical, findings: 2)
  barbican.report.json         (rating: critical, findings: 3)

Evidence demo

The narrated demo shows what happened. The checked evidence demo asserts the enforcement claim directly: denied tools/call requests return JSON-RPC errors and never appear in the downstream server's call log, while an allowed git status call reaches the downstream exactly once.

npm run build
BARBICAN_WARDEN=../warden/target/debug/warden npm run evidence

On PowerShell:

$env:BARBICAN_WARDEN = "..\warden\target\debug\warden.exe"
npm run evidence

The run writes an audit bundle containing audit.log, downstream-calls.jsonl, canonical reports under reports/, and a machine-readable evidence-summary.json. See EVIDENCE.md for the full checklist and limitations.

Audit output

Structured log — every decision (warden_verdict, tool_decision, tool_call_denied, finding, …) is emitted as one JSON line to stderr and, with --log, appended to a file.
Canonical reports — with --report-dir, findings roll up on exit into one <tool>.report.json per ToolKind (e.g. barbican.report.json, capability_echo.report.json). Report.tool is a single closed enum and validateReport requires every finding to match it, so findings are split per tool kind. The reports validate against agent-gov-core and are ingestible by govverdict review --reports <dir>.
Traces — with --otel, each session is a root mcp.session span and each call an execute_tool {tool} child, carrying GenAI semantic-convention attributes (gen_ai.operation.name, gen_ai.tool.name, gen_ai.tool.call.id, gen_ai.conversation.id) plus barbican.decision and, on a block, barbican.deny.code / barbican.deny.reason.
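
To make the trace shape concrete, the sketch below builds the name and attribute set for one `execute_tool` span from the attribute keys listed above. It produces a plain object rather than calling an OpenTelemetry SDK. The input type, the function name, and the `"forward"` / `"blocked"` decision values are assumptions; `execute_tool` as the operation name follows the GenAI semantic conventions.

```typescript
// Sketch: attribute keys are from this README, everything else is assumed.
interface ToolCallOutcome {
  toolName: string;
  callId: string;
  conversationId: string;
  decision: "forward" | "blocked"; // assumed values for barbican.decision
  denyCode?: number;
  denyReason?: string;
}

function executeToolSpan(o: ToolCallOutcome): {
  name: string;
  attributes: Record<string, string | number>;
} {
  const attributes: Record<string, string | number> = {
    "gen_ai.operation.name": "execute_tool",
    "gen_ai.tool.name": o.toolName,
    "gen_ai.tool.call.id": o.callId,
    "gen_ai.conversation.id": o.conversationId,
    "barbican.decision": o.decision,
  };
  // Deny details are attached only when the call was blocked.
  if (o.decision === "blocked") {
    if (o.denyCode !== undefined) attributes["barbican.deny.code"] = o.denyCode;
    if (o.denyReason !== undefined) attributes["barbican.deny.reason"] = o.denyReason;
  }
  return { name: `execute_tool ${o.toolName}`, attributes };
}
```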

Tests

npm test     # tsc build, then node --test test/*.test.mjs

34 tests covering:

the stdio relay (passthrough, request/response correlation, shutdown)
the long-lived warden client (FIFO verdict correlation, error-line recovery, consult-after-close)
the tool→action mapping (known tools, array-path splitting, default fall-through)
enforcement (deny blocks and surfaces warden's reason, allow forwards, ask resolves deny-by-default and allow-when-configured, most-restrictive multi-path, fail-closed by default / fail-open when set)
poisoning + command screening
the OpenTelemetry GenAI spans (attributes, forward/blocked decision, parent-child trace, the relay seam)
the session-report rollup (one valid canonical report per tool kind, empty-finding skip, and an end-to-end run against the poisoned demo server)
the checked evidence demo showing denied calls do not reach the downstream server (this one test skips when no warden binary is present, so the unit suite stays hermetic)

Status

v0.1.0. The runtime layer is feature-complete through a six-step incremental build, runnable and tested at each step. The decision path is deterministic and local-first; warden owns policy, Barbican owns the wire. Live:

Transparent stdio JSON-RPC relay between MCP client and downstream server
Long-lived warden consult on every tools/call, with FIFO verdict correlation
Enforcement: allow → forward · deny → JSON-RPC error (call never reaches the server) · ask → configurable deny-by-default
Tool-poisoning screening at tools/list (barbican.*) and reused CapabilityEcho command screening at tools/call (capability_echo.*)
Canonical agent-gov-core Findings, rolled up on exit into per-tool Reports ingestible by GovVerdict
OpenTelemetry GenAI spans (opt-in)
Self-contained end-to-end demo against the real warden binary
Checked evidence demo with downstream call-log assertions and canonical report artifacts

Both telemetry and report files are opt-in. With no flags, Barbican screens and audits to stderr and nothing leaves the machine.

Limitations & design choices

Not a standalone gateway. Barbican has no policy logic of its own — warden is the decision core. Run it without --policy and it screens and audits but enforces nothing.
stdio only. Barbican proxies stdio MCP servers. Because stdio has no human-prompt channel, ask resolves to deny-by-default; flip it with --on-ask allow only when you understand the policy.
Mapping is required for precise enforcement. Tools you don't map fall through to the default action, which is as coarse as your policy's handling of that action.
Screening is heuristic. The poisoning rules are tuned high-precision to keep false positives low on a surface screened on every reconnect; they are a tripwire, not a proof. They observe — they never block a listing.

License

MIT.
