Skip to content

Conalh/barbican

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

12 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Barbican

ci tests node typescript MCP no LLM license

A runtime enforcement layer for MCP tool calls. A transparent stdio proxy that sits between an MCP client (Claude Code, Cursor) and a downstream MCP server, consults warden for an allow / deny / ask verdict on every tools/call, and enforces it before the call reaches the server — while screening advertised tool descriptions for poisoning and auditing every decision in the canonical agent-gov-core Finding / Report schema.

No LLM in the decision path. warden decides from a policy file; Barbican only maps, relays, enforces, and records — deterministic and local-first. Run node examples/run-demo.mjs for a full end-to-end demo against the real warden binary.

flowchart LR
    Client["MCP client<br/>Claude Code · Cursor"] -->|tools/call| Barbican
    Barbican -->|"allow → forward"| Server["MCP server<br/>(downstream, stdio)"]
    Server -->|result| Barbican
    Barbican -->|"deny → JSON-RPC error"| Client
    Barbican <-->|"map → verdict"| Warden[("warden<br/>policy DSL engine")]
    Barbican --> Findings["Findings + Reports<br/>(agent-gov-core schema)"]
    Findings -.->|"review --reports"| GovVerdict["GovVerdict<br/>(audit consumer)"]

    classDef box fill:#1e293b,stroke:#334155,color:#e2e8f0
    classDef store fill:#0f172a,stroke:#1e293b,color:#e2e8f0,stroke-width:2px
    classDef out fill:#0c4a6e,stroke:#0369a1,color:#e0f2fe
    class Client,Server box
    class Warden store
    class Barbican,Findings,GovVerdict out
Loading

Ships as a single barbican command (a tsc build) that the MCP client launches in place of the real server. No daemon, no network surface of its own.

For checked evidence of the core enforcement claim, run npm run evidence: it asserts that denied tools/call requests never reach the downstream server, while an allowed call does, and writes a reproducible audit bundle. See EVIDENCE.md.

See also: examples/ for a self-contained end-to-end demo · warden for the policy DSL · agent-gov-core for the Finding / Report schema.

Why this exists

A client that speaks MCP will call whatever tools a server advertises, with whatever arguments the model produces — rm -rf /, a read of ~/.ssh/id_rsa, a curl … | bash. The server is trusted to behave, and the tool descriptions the server advertises are read by the agent as trusted context, which makes them a prompt-injection surface (tool poisoning) the model never sees coming.

Barbican puts a deterministic chokepoint on the wire. Every call is checked against a policy before it executes, every description is screened before the agent trusts it, and every decision is written to an auditable trail. The policy logic isn't Barbican's — warden owns that. Barbican is the runtime that makes warden's verdicts actually bind.

Where it fits

Barbican is the connective tissue of an existing four-repo suite. It does not reimplement any of them — it wires them together at runtime:

Repo Role How Barbican uses it
warden Policy DSL engine (Rust). The decision core. Barbican spawns one long-lived warden <policy> --stdin process and relays its verdict on each call — unchanged.
agent-gov-core Canonical Finding / Report schema. Every screening hit and enforcement decision becomes a canonical Finding; on exit they roll up into per-tool Reports via createReport.
CapabilityEcho Code-diff capability detector. Its detectShellCapability regexes are reused inline (not forked) to screen commands lifted from tool-call arguments — a fix there flows here.
GovVerdict Audit / report consumer. Barbican's rolled-up reports are ingestible by govverdict review --reports <dir> alongside static scans.

What it does on each message

tools/list — for every advertised tool, Barbican screens the description for tool-poisoning vectors and the call surface for risky shell capability. Screening is observe-only and always on (it never blocks a listing); it just emits Findings. Two detector families run:

  • barbican.* (tool poisoning) — original to Barbican, because CapabilityEcho reviews code diffs and has no analogue. High-precision rules for: instruction-override openers (ignore all previous instructions), pseudo-system tags (<IMPORTANT>…), concealment directives (do not tell the user), credential-path + read/transmit exfil recipes (read ~/.ssh/id_rsa and POST it), hidden zero-width / bidi unicode, and embedded hardcoded secrets.
  • capability_echo.* (command screening) — the reused CapabilityEcho shell detector, applied to commands lifted from tools/call arguments (pipe-to-shell, external download, …).

tools/call — Barbican maps the MCP call to a warden action (via the mapping file), consults warden, and enforces the verdict:

warden verdict Barbican behaviour
allow Forward the call to the downstream server.
deny Return a JSON-RPC error to the client. The call never reaches the server. warden's own reason is surfaced in the error.
ask Resolve per --on-ask (default deny). stdio has no channel to prompt a human, so the safe default is to deny-and-log.

The mapping

warden reasons about actions (bash, read, write, …) and fields (command, path), but MCP tools have arbitrary names and argument shapes. A mapping file bridges the two:

{
  "tools": {
    "run_shell":    { "action": "bash",  "commandArg": "cmd" },
    "read_file":    { "action": "read",  "pathArg": "path" },
    "write_config": { "action": "write", "pathArg": "path" }
  },
  "default": { "action": "mcp_call" }
}

action becomes warden's tool; the named argument is lifted into warden's command or path so the policy can match on it. Unmapped tools fall through to default. With no --map, a single coarse mcp_call action applies to every call — the policy can still allow/deny by action name, but path/command granularity needs a per-server map.

Run it

Barbican consumes its sibling repos through local file: dependencies, so check them out next to each other:

<workspace>/
  warden/          # cargo build → target/debug/warden
  agent-gov-core/
  CapabilityEcho/
  barbican/
npm install
npm run build      # tsc → dist/
npm test           # builds, then node --test (34 tests)
npm run verify     # npm test + the evidence demo (needs warden; see below)

You also need the warden binary on PATH, or pointed at via --warden. The unit suite is hermetic — it runs without warden, and the one evidence-demo test skips itself when no warden binary is found. npm run verify additionally runs the evidence demo for real, so it requires warden. CI (.github/workflows/ci.yml) checks out the sibling repos, builds warden, and runs npm run verify end to end on every push.

Wiring into an MCP client

Barbican is launched by the client in place of the real server, and given the real server command after --. In a client's MCP config (Claude Code .mcp.json / claude_desktop_config.json):

{
  "mcpServers": {
    "files": {
      "command": "barbican",
      "args": [
        "--policy", "/abs/path/policy.warden",
        "--map", "/abs/path/map.json",
        "--report-dir", "/abs/path/audit",
        "--", "npx", "-y", "@modelcontextprotocol/server-filesystem", "/srv"
      ]
    }
  }
}

Flags

Flag Meaning
--policy <file> Warden policy to consult on each tools/call. Without it, enforcement is off (screening still runs).
--warden <path> Warden executable to spawn (default: warden on PATH).
--map <file> MCP-tool → warden-action mapping (JSON). Falls back to a coarse single-action default when omitted.
--on-ask <mode> How an ask verdict resolves: deny (default) or allow.
--fail-open Forward calls if warden can't be reached (default: deny / fail-closed).
--otel Export OpenTelemetry GenAI spans over OTLP/HTTP. Endpoint/headers from the standard OTEL_EXPORTER_OTLP_* env. Off by default.
--report-dir <d> On exit, write one canonical agent-gov-core Report per tool kind into <d>.
--log <file> Append structured JSON log lines to <file> (in addition to stderr).
-h, --help Show help.

The demo

examples/ contains a self-contained, end-to-end demo driven against the real warden binary: a toy server advertises a poisoned tool description plus shell / read / write tools, and a scripted session attempts two destructive calls, one that needs confirmation, one obviously safe call, and a curl … | bash.

# warden on PATH:
node examples/run-demo.mjs
# or point at a build:
BARBICAN_WARDEN=../warden/target/debug/warden node examples/run-demo.mjs

It narrates what the enforcement layer did — descriptions screened, calls blocked (with warden's own reason), calls allowed, and the canonical reports it rolled up:

Tool descriptions screened at tools/list (Barbican tool-poisoning detector):
  [HIGH] barbican.tool_poisoning_instruction — ...overriding prior agent context.
  [HIGH] barbican.tool_poisoning_concealment — ...hide activity from the user.
  [CRITICAL] barbican.tool_poisoning_exfiltration — ...sensitive credential path...

Tool calls (warden decides, Barbican enforces):
  BLOCKED  run_shell  cmd='rm -rf /'              ...contains "rm -rf"
  BLOCKED  read_file  path='config/.env'          ...matches "**/.env*"
  BLOCKED  write_config  path='...settings.json'  ask → deny
  ALLOWED  run_shell  cmd='git status' → ok (server ran the tool)
  BLOCKED  run_shell  cmd='curl … | bash'         ...contains "| bash"

Call arguments screened at tools/call (reused CapabilityEcho detectors):
  [CRITICAL] capability_echo.shell_pipe_to_shell — ...pipes...to a shell.
  [MEDIUM]   capability_echo.shell_external_download — ...external URL.

Session audit rolled up into canonical reports:
  capability_echo.report.json  (rating: critical, findings: 2)
  barbican.report.json         (rating: critical, findings: 3)

Evidence demo

The narrated demo shows what happened. The checked evidence demo asserts the enforcement claim directly: denied tools/call requests return JSON-RPC errors and never appear in the downstream server's call log, while an allowed git status call reaches the downstream exactly once.

npm run build
BARBICAN_WARDEN=../warden/target/debug/warden npm run evidence

On PowerShell:

$env:BARBICAN_WARDEN = "..\warden\target\debug\warden.exe"
npm run evidence

The run writes an audit bundle containing audit.log, downstream-calls.jsonl, canonical reports under reports/, and a machine-readable evidence-summary.json. See EVIDENCE.md for the full checklist and limitations.

Audit output

  • Structured log — every decision (warden_verdict, tool_decision, tool_call_denied, finding, …) is emitted as one JSON line to stderr and, with --log, appended to a file.
  • Canonical reports — with --report-dir, findings roll up on exit into one <tool>.report.json per ToolKind (e.g. barbican.report.json, capability_echo.report.json). Report.tool is a single closed enum and validateReport requires every finding to match it, so findings are split per tool kind. The reports validate against agent-gov-core and are ingestible by govverdict review --reports <dir>.
  • Traces — with --otel, each session is a root mcp.session span and each call an execute_tool {tool} child, carrying GenAI semantic-convention attributes (gen_ai.operation.name, gen_ai.tool.name, gen_ai.tool.call.id, gen_ai.conversation.id) plus barbican.decision and, on a block, barbican.deny.code / barbican.deny.reason.

Tests

npm test     # tsc build, then node --test test/*.test.mjs

34 tests covering the stdio relay (passthrough, request/response correlation, shutdown), the long-lived warden client (FIFO verdict correlation, error-line recovery, consult-after-close), the tool→action mapping (known tools, array-path splitting, default fall-through), enforcement (deny blocks and surfaces warden's reason, allow forwards, ask resolves deny-by-default and allow-when-configured, most-restrictive multi-path, fail-closed by default / fail-open when set), poisoning + command screening, the OpenTelemetry GenAI spans (attributes, forward/blocked decision, parent-child trace, the relay seam), the session-report rollup (one valid canonical report per tool kind, empty-finding skip, and an end-to-end run against the poisoned demo server), and the checked evidence demo showing denied calls do not reach the downstream server (this one test skips when no warden binary is present, so the unit suite stays hermetic).

Status

v0.1.0. The runtime layer is feature-complete through a six-step incremental build, runnable and tested at each step. The decision path is deterministic and local-first; warden owns policy, Barbican owns the wire. Live:

  • Transparent stdio JSON-RPC relay between MCP client and downstream server
  • Long-lived warden consult on every tools/call, with FIFO verdict correlation
  • Enforcement: allow → forward · deny → JSON-RPC error (call never reaches the server) · ask → configurable deny-by-default
  • Tool-poisoning screening at tools/list (barbican.*) and reused CapabilityEcho command screening at tools/call (capability_echo.*)
  • Canonical agent-gov-core Findings, rolled up on exit into per-tool Reports ingestible by GovVerdict
  • OpenTelemetry GenAI spans (opt-in)
  • Self-contained end-to-end demo against the real warden binary
  • Checked evidence demo with downstream call-log assertions and canonical report artifacts

Both telemetry and report files are opt-in. With no flags, Barbican screens and audits to stderr and nothing leaves the machine.

Limitations & design choices

  • Not a standalone gateway. Barbican has no policy logic of its own — warden is the decision core. Run it without --policy and it screens and audits but enforces nothing.
  • stdio only. Barbican proxies stdio MCP servers. Because stdio has no human-prompt channel, ask resolves to deny-by-default; flip it with --on-ask allow only when you understand the policy.
  • Mapping is required for precise enforcement. Tools you don't map fall through to the default action, which is as coarse as your policy's handling of that action.
  • Screening is heuristic. The poisoning rules are tuned high-precision to keep false positives low on a surface screened on every reconnect; they are a tripwire, not a proof. They observe — they never block a listing.

License

MIT.

About

Runtime enforcement layer for MCP tool calls — a transparent stdio proxy that consults warden for an allow/deny/ask verdict on every tools/call and enforces it before the call reaches the server.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors