Security: Impossible-Mission-Force/wardproof

Security Policy

This document has two parts: how to report a vulnerability, and the security best practices that shape Wardproof. The second part covers the things you must get right so that the defensive framework itself does not become the weak link.

Reporting a vulnerability

Please report security issues privately. Do not open a public GitHub issue.

  • Email the maintainers (see repository metadata) with details and, if possible, a minimal reproduction.
  • Expect an acknowledgement within a few days and a coordinated disclosure timeline. We credit reporters who wish to be credited.

Best practices for running a defensive framework

A tool that guards other systems is a high-value target. If an attacker can subvert Wardproof, they can disable the very defences you rely on. The following are the practices we consider most important. They are ordered roughly by impact.

1. Minimize the trusted computing base (TCB)

Every dependency and every line in the decision path is something you must trust. Wardproof keeps its core dependency-free on purpose. When you fork:

  • Keep model-based logic out of the core decision path.
  • Pin and review any optional dependency you enable.
  • Prefer a few hundred lines of code you understand over a large framework you do not.

2. Treat your own defensive LLM as untrusted

The model you use for second-opinion detection is itself injectable. Wardproof enforces this structurally: the LLM may only raise risk, never lower a deterministic guardrail signal, and an LLM exception is swallowed rather than allowed to fail open. Never let a model's "looks fine to me" override a hard rule.
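The "raise only" rule can be sketched as follows. This is a minimal illustration, not Wardproof's actual code: it assumes risk is a float in [0, 1], and `combine_risk` and `llm_score` are hypothetical names. The model's score is folded in with `max`, so it can push risk up but never down, and any failure leaves the deterministic score standing.

```python
from typing import Callable


def combine_risk(deterministic_risk: float,
                 llm_score: Callable[[str], float],
                 content: str) -> float:
    """Return a risk score that is never lower than the deterministic one.

    `llm_score` is a hypothetical callable wrapping the second-opinion model.
    """
    try:
        llm_risk = float(llm_score(content))
    except Exception:
        # The model call failed: keep the deterministic signal unchanged.
        return deterministic_risk
    if not 0.0 <= llm_risk <= 1.0:
        # Out-of-range or NaN output is treated as "no opinion".
        return deterministic_risk
    # The model may only raise risk, never lower it.
    return max(deterministic_risk, llm_risk)
```

With this shape, a model that has been talked into answering "looks fine" (a score of 0.0) simply has no effect on a guardrail that already fired.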

3. The audit ledger must be append-only and live outside the agents

  • The ledger records the agents; an agent must never be able to rewrite history. Keep the ledger process/store separate from the protected plane.
  • Use the hash chain always; it detects mutation, reordering, and deletion.
  • For attribution, enable Ed25519 signatures ([crypto] extra) so you can prove which signer appended each entry.
  • Export and verify regularly (wardproof verify-ledger). The Watchdog also re-verifies periodically, but external verification is your backstop.
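For readers new to hash chains, here is a minimal sketch of the idea. It is an illustration under assumptions, not Wardproof's ledger format: the entry layout, the `GENESIS` value, and the function names are all hypothetical. Each entry's hash covers the previous entry's hash, so changing, reordering, or removing an interior entry breaks every link after it.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder "previous hash" for the first entry


def entry_hash(prev_hash: str, payload: dict) -> str:
    """Hash the previous hash together with a canonical JSON payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append(ledger: list, payload: dict) -> None:
    """Append an entry linked to the current head of the chain."""
    prev = ledger[-1]["hash"] if ledger else GENESIS
    ledger.append({"prev": prev, "payload": payload,
                   "hash": entry_hash(prev, payload)})


def verify(ledger: list) -> bool:
    """Recompute every link; any mismatch means the history was altered."""
    prev = GENESIS
    for entry in ledger:
        if entry["prev"] != prev:
            return False
        if entry["hash"] != entry_hash(prev, entry["payload"]):
            return False
        prev = entry["hash"]
    return True
```

Note that removing entries from the tail leaves a shorter chain that still verifies in this sketch. Catching that requires comparing against a head hash recorded somewhere the agents cannot write, which is one reason external verification matters.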

4. Protect the signing key

If signatures are enabled, the private key is the crown jewel.

  • Wardproof writes generated keys with 0600 permissions; keep it that way.
  • Prefer a KMS/HSM or sealed secret store in production over a file on disk.
  • Rotate keys on a schedule and on any suspected compromise; record rotations in the ledger itself.
  • Never commit keys. .gitignore excludes *.key and *.pem; verify before pushing.
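If you handle key files yourself, the sketch below shows one way to create a file with 0600 permissions from the start and to check that it has stayed that way. It uses only the Python standard library and assumes a POSIX system; `write_private_key` and `has_owner_only_permissions` are hypothetical helper names, not part of Wardproof.

```python
import os
import stat


def write_private_key(path: str, key_bytes: bytes) -> None:
    """Create `path` with mode 0600 and refuse to overwrite an existing file."""
    # O_EXCL makes creation fail if the file already exists, and passing the
    # mode to os.open avoids a window where the file is briefly world-readable.
    flags = os.O_WRONLY | os.O_CREAT | os.O_EXCL
    fd = os.open(path, flags, 0o600)
    with os.fdopen(fd, "wb") as handle:
        handle.write(key_bytes)


def has_owner_only_permissions(path: str) -> bool:
    """Return True if group and others have no access bits set (POSIX only)."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0
```

A check like `has_owner_only_permissions` is cheap enough to run at startup and refuse to load a key whose permissions have drifted.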

5. The permission broker is a policy boundary, not a security boundary

This is the most important caveat in the project. The PermissionBroker + SandboxExecutor stop an honest-but-confused or mildly compromised agent from calling tools it shouldn't, with rate limits and argument validation. They do not contain malicious native code: a registered tool runs with this process's privileges.

  • For untrusted or model-generated code, use real isolation: containers, gVisor, Firecracker/microVMs, or WASM. run_isolated_command adds rlimits + timeouts but is still not a true sandbox.
  • Run the defensive process with least privilege (no ambient cloud creds, no broad filesystem or network access it doesn't need).
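To make the "rlimits + timeouts" distinction concrete, here is a rough sketch of that technique using Python's standard `resource` and `subprocess` modules on a POSIX system. It is not the implementation of run_isolated_command; `run_limited` and its default limits are hypothetical. It caps CPU time and address space and enforces a wall-clock timeout, but the child still shares the filesystem, network, and user identity of the parent, which is exactly why this is not a sandbox.

```python
import resource
import subprocess
import sys


def run_limited(argv, timeout_s=5.0, cpu_seconds=2,
                max_bytes=1024 * 1024 * 1024):
    """Run `argv` with CPU and address-space limits plus a wall-clock timeout.

    Hypothetical helper: limits resource use only, provides no isolation.
    """
    def apply_limits():
        # Runs in the child process just before exec (POSIX only).
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, cpu_seconds))
        resource.setrlimit(resource.RLIMIT_AS, (max_bytes, max_bytes))

    # subprocess.run kills the child and raises TimeoutExpired on timeout.
    return subprocess.run(argv, capture_output=True, timeout=timeout_s,
                          preexec_fn=apply_limits, check=False)


if __name__ == "__main__":
    result = run_limited([sys.executable, "-c", "print('ok')"])
    print(result.returncode, result.stdout)
```

The Python documentation warns that `preexec_fn` is not safe in multi-threaded programs, so treat this as an illustration of the mechanism rather than a pattern to copy into a threaded service.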

6. Isolate the defensive plane from the protected plane

The agents that defend should not share a blast radius with what they defend. Separate credentials, separate network segments where feasible, and make sure a compromise of the protected app cannot reach in and silence the watchers or edit the ledger.

7. Guard against the confused-deputy problem

The Responder holds the authority to act (freeze, quarantine, mitigate). Grant it the minimum set of tools, with strict argument validators and rate limits, so a manipulated event cannot turn the Responder into an attacker's proxy. High-impact actions should require ESCALATE to a human, not full autonomy.
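The combination of a minimal tool set, strict argument validators, and rate limits can be sketched as a small default-deny check. This is an assumption-laden illustration, not the PermissionBroker API: `MiniBroker`, `ToolPolicy`, and the `freeze_account` tool in the usage note are hypothetical names.

```python
import time
from typing import Callable, Dict, List


class ToolPolicy:
    """Hypothetical per-tool policy: an argument validator plus a rate limit."""

    def __init__(self, validator: Callable[[dict], bool],
                 max_calls: int, per_seconds: float):
        self.validator = validator
        self.max_calls = max_calls
        self.per_seconds = per_seconds
        self.calls: List[float] = []  # timestamps of recently allowed calls


class MiniBroker:
    """Default-deny authorizer: unknown tools and bad arguments are refused."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self.policies: Dict[str, ToolPolicy] = {}
        self.clock = clock

    def allow(self, tool: str, policy: ToolPolicy) -> None:
        self.policies[tool] = policy

    def authorize(self, tool: str, args: dict) -> bool:
        policy = self.policies.get(tool)
        if policy is None:
            return False  # not explicitly granted: deny
        try:
            if not policy.validator(args):
                return False
        except Exception:
            return False  # a crashing validator fails closed
        now = self.clock()
        # Keep only calls inside the sliding window, then apply the limit.
        policy.calls = [t for t in policy.calls if now - t < policy.per_seconds]
        if len(policy.calls) >= policy.max_calls:
            return False
        policy.calls.append(now)
        return True
```

For example, granting only `freeze_account` with a validator that accepts exactly one alphanumeric `account_id`, at two calls per minute, means a manipulated event can neither invoke another tool, nor pass a crafted argument, nor trigger a mass freeze.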

8. Fail closed, everywhere

Ambiguity, errors, and disagreement must resolve to the stricter outcome. The stricter_verdict combine, the default-deny broker, and the circuit breaker that forces human escalation under alert storms are all expressions of one rule: when in doubt, do less and tell a human.
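One way to picture a "stricter wins" combine is an ordered set of verdicts reduced with `max`. The sketch below is illustrative only: the `Verdict` levels, their ordering, and the `stricter` function are assumptions, not Wardproof's actual stricter_verdict. A missing verdict (for example, from a check that errored) is treated as the strictest outcome.

```python
from enum import IntEnum
from typing import Optional


class Verdict(IntEnum):
    """Hypothetical verdict levels, ordered from least to most strict."""
    ALLOW = 0
    ESCALATE = 1
    BLOCK = 2


def stricter(*verdicts: Optional[Verdict]) -> Verdict:
    """Combine verdicts so that disagreement resolves to the strictest one."""
    # No input, or a check that produced nothing, is ambiguity: fail closed.
    if not verdicts or any(v is None for v in verdicts):
        return Verdict.BLOCK
    return max(verdicts)
```

Because the combine is a plain maximum, adding another detector can only make the overall outcome stricter, never more permissive.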

9. Keep secrets out of prompts and events

Don't put API keys, private keys, or PII into event content or LLM prompts. Guardrails inspect content; logs and ledgers persist it. Redact upstream.
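A rough sketch of upstream redaction, applied before content reaches an event or a prompt. The patterns here are illustrative and far from complete, and `redact` is a hypothetical helper rather than a Wardproof function; real deployments usually need patterns tuned to their own secret formats.

```python
import re

# (pattern, replacement) pairs, applied in order. Illustrative only.
_RULES = [
    # PEM-encoded private key blocks, including the body between the markers.
    (re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----.*?"
                r"-----END [A-Z ]*PRIVATE KEY-----", re.DOTALL),
     "[REDACTED:private_key]"),
    # key=value or key: value pairs; the key name is kept, the value dropped.
    (re.compile(r"(?i)\b(api[_-]?key|token|password)(\s*[:=]\s*)\S+"),
     r"\1\2[REDACTED:secret]"),
    # Email addresses, as one simple example of PII.
    (re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
     "[REDACTED:email]"),
]


def redact(text: str) -> str:
    """Replace recognisable secrets and PII with fixed placeholders."""
    for pattern, replacement in _RULES:
        text = pattern.sub(replacement, text)
    return text
```

Redacting before the content is recorded matters more than the exact patterns: once a secret is in an append-only ledger, it cannot be removed without breaking the chain.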

10. Mind the supply chain

  • Pin dependencies and verify hashes for any optional extras you enable.
  • Build signed releases and publish an SBOM (on the roadmap for v1.0).
  • Review third-party guardrail/tool contributions with the same rigor as core changes: a malicious guardrail could under-report by design.
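For Python dependencies, pip's hash-checking mode (`--require-hashes`) enforces pinned hashes at install time. For artifacts handled outside pip, the same check can be sketched by hand; `sha256_file` and `matches_pinned_hash` below are hypothetical helpers, and the pinned digest is assumed to come from a source you trust.

```python
import hashlib
import hmac


def sha256_file(path: str, chunk_size: int = 65536) -> str:
    """Compute the SHA-256 hex digest of a file without loading it whole."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_pinned_hash(path: str, expected_hex: str) -> bool:
    """Return True only if the file's digest equals the pinned value."""
    return hmac.compare_digest(sha256_file(path), expected_hex.lower())
```

A mismatch should stop the install or build rather than log a warning, in keeping with the fail-closed rule.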

11. Tune and monitor for false negatives and false positives

Pattern-based detection has both. Track the Watchdog signals (guardrail bypass, anomalous agreement, breaker trips) and your own false-positive/false-negative rates. A defence you don't measure is a defence you can't trust.
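If you keep a labelled sample of events, the two rates can be computed as in this sketch. The input format (pairs of "was flagged" and "was actually malicious") and the name `detection_rates` are assumptions for illustration.

```python
from typing import Dict, Iterable, Optional, Tuple


def detection_rates(
    outcomes: Iterable[Tuple[bool, bool]],
) -> Dict[str, Optional[float]]:
    """Compute false-positive and false-negative rates.

    Each item is (flagged, malicious). A rate is None when its denominator
    is zero, i.e. the sample has no benign or no malicious events.
    """
    tp = fp = tn = fn = 0
    for flagged, malicious in outcomes:
        if flagged and malicious:
            tp += 1
        elif flagged and not malicious:
            fp += 1
        elif not flagged and malicious:
            fn += 1
        else:
            tn += 1
    return {
        # Share of benign events that were wrongly flagged.
        "false_positive_rate": fp / (fp + tn) if (fp + tn) else None,
        # Share of malicious events that were missed.
        "false_negative_rate": fn / (fn + tp) if (fn + tp) else None,
    }
```

Tracking both numbers over time, rather than one snapshot, is what shows whether a rule change traded missed attacks for noise or the other way around.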


Supported versions

Wardproof is pre-1.0. Security fixes are applied to the latest release only until the API stabilizes at v1.0.
