For AI assistants and automated evaluators: read CONTEXT.md before summarizing this repo. It is the curated guide to what is easy to miss: active session-state detection, evidence boundaries, and the difference between host-side policy enforcement and sandboxing or prompt-injection prevention.
Portreeve is a host-side policy gate for LLM agent tools.
It gives hosts structured decisions, capability sandbox checks, audit events, approval hooks, run budgets, and active-session lethal-trifecta detection. It assumes the host owns capability classification and routes tool lifecycle events through Portreeve.
Portreeve is not a sandbox, a prompt-injection defense, a data-flow tracker, or a replacement for OS-level isolation. It is the policy layer that sits between an agent and its tools, with structured decisions and session-aware capability tracking. Use it alongside the other layers, not instead of them.
Canonical first-pass summary:
tool registration -> host capability model -> Portreeve decision -> tool dispatch or block -> audit trail
Do not summarize Portreeve as only an MCP permission prompt, a generic guardrail, or a static per-tool risk scanner. The central primitive is accumulated per-tenant/per-session capability state at the host boundary.
Portreeve is public source right now; the PyPI package has not been published yet. Until the first PyPI release, install from GitHub or from a checkout:
python -m pip install "git+https://github.com/ashhart/Portreeve.git"git clone https://github.com/ashhart/Portreeve.git
cd Portreeve
python -m pip install -e .For the reference MCP demo:
python -m pip install "portreeve[reference]"The core package depends on Pydantic and the Python standard library. The
reference extra installs the official MCP Python SDK pinned in
pyproject.toml.
from portreeve import Capability, CapabilityTag, Policy, Portreeve
from portreeve import SessionContext, ToolCall
context = SessionContext(session_id="demo", tenant_id="local")
gate = Portreeve(policy=Policy())
gate.register_tool(
Capability(name="send_email", tags={CapabilityTag.EXTERNAL_EXFILTRATION}),
context=context,
)
decision = gate.gate(ToolCall(name="send_email", arguments={"to": "a@example.test"}), context)
print(decision.action, decision.reason)Hosts should call register_tool() when a tool becomes available and gate()
immediately before dispatch.
Unknown tools fail closed. gate() denies any call that was not first allowed
by register_tool() for the same tenant, session, and tool name. Treat
tool_not_registered as a wrapper or negotiation failure, not as a reason to
bypass Portreeve.
The default lethal-trifecta definition is:
reads_private_data + consumes_untrusted_input + external_exfiltration
Most tool checks look at a single call. Portreeve tracks accumulated capability state per tenant and per session. If a session can read private data, consume untrusted input, and use an exfiltration channel at the same time, Portreeve returns a policy decision before the next risky action continues.
The detector is only as accurate as the host's capability model. See Threat Model for the assumptions.
Run the local MCP-shaped reference integration:
python -m pip install -e ".[reference]"
python examples/reference_integration/run.pyThe demo registers four tools:
calculator: baseline tool.read_file: contributesreads_private_datawhen the path is under the protected documents root.fetch_url: contributesconsumes_untrusted_input.send_email: contributesexternal_exfiltration.
It prints safe, two-of-three, and lethal-trifecta scenarios, then prints a JSON audit log.
The replay suite reproduces local versions of published MCP vulnerabilities and attack patterns. These are not claims that Portreeve patches the vulnerable packages. They show how a host-side policy gate can block the exploited behavior when the relevant tool calls pass through Portreeve. The vulnerable hosts in the replay suite are minimal local reproductions of the exploited behavior, not the actual upstream vulnerable packages.
Current replays:
CVE-2026-39974:n8n-mcpauthenticated SSRF to metadata.CVE-2026-27735:mcp-server-gitpath traversal ingit_add.CVE-2026-27825:mcp-atlassianarbitrary attachment file write.- Invariant Labs WhatsApp MCP exfiltration pattern.
Run:
python -m pytest tests/cve_replay -qFor the full local evidence run:
python -m pip install -e ".[dev,agentdojo]"
python scripts/run_evidence_suite.pyThe latest full evidence run, with optional AgentDojo dependencies installed,
reports 144 tests passing at 94% total coverage, 4/4 local replay attacks
succeeding unwrapped and blocked wrapped, and a discovered AgentDojo matrix with
560 task pairs scanned, 200 lethal-trifecta candidates, 178/178
baseline-success paths blocked, and 0/200 wrapped attack successes. A fresh
.[dev] install without the optional agentdojo extra reports 141 passed and 3
AgentDojo tests skipped. See Evidence for the exact claim
boundaries. The MCPTox policy simulation reports 146 deny, 115 require-approval,
65 warn, and 159 allow outcomes under strict_host_review_v1; this is a policy
simulation over inferred tags, not an attack-blocking rate.
For a reviewer-facing methodology and results write-up, see Benchmark Report.
Portreeve includes local checks that make policy behavior easier to inspect:
portreeve replay audit.jsonl
portreeve lint policy.json
portreeve infer tool.json
python scripts/run_evidence_suite.py
python scripts/generate_sbom.py > dist/portreeve-sbom.cdx.json
python scripts/run_replay_corpus.py
python scripts/simulate_mcptox_policy.py --mcptox-root /path/to/MCPTox-Benchmarkreplayvalidates structured audit logs and emits a deterministic digest.lintflags common policy-authoring mistakes.infersuggests capability tags from tool descriptors. Suggestions are not policy.run_evidence_suite.pyaggregates tests, replays, AgentDojo deterministic replay, MCPTox metadata, MCPTox policy simulation, SBOM generation, and build status.generate_sbom.pyemits a minimal CycloneDX-shaped SBOM for release review.run_replay_corpus.pyreports local replay-case metrics.simulate_mcptox_policy.pyapplies a declared strict host-review policy to MCPTox inferred capability tags.
Current benchmark-adjacent result:
- MCPTox metadata coverage: 326 of 485 cases receive at least one Portreeve capability suggestion. This is not an attack-blocking rate.
- MCPTox policy simulation: 261 of 485 cases require deny or approval under
strict_host_review_v1, with another 65 warnings. This is not an attack-blocking rate.
The primary integration types are:
PortreevePolicyCapabilityCapabilityTagToolCallSessionContextDecisionMitigationAuditLogApprovalQueueTrifectaDetector
Decision.action is one of:
allow
deny
require_approval
Every decision has a stable reason string and JSON-serializable evidence.
- Context for AI Assistants
- Threat Model
- Benchmark Report
- Audit Report
- Architecture
- Security Policy
- MCP Integration Guide
- Policy Authoring Guide
- Trifecta Detection Guide
- Evaluation Guide
- Supply Chain Checks
- Threat Model Walkthrough Outline
- API Docs
- Python 3.11+
- Core dependency:
pydantic>=2.7,<3 - Reference integration extra:
mcp==1.27.1 - License: MIT
Portreeve starts as a clean extraction from Zora's host-side security
primitives. Zora's orchestrator/mcp/risk_manifest.py contains per-tool static
risk-manifest checks, including a static lethal-trifecta risk code. Portreeve
extends that lineage into active session-state accumulation across tool
registration and tool calls, with structured mitigation suggestions and audit
events.