This document has two parts: how to report a vulnerability, and the security best practices that shape Wardproof, that is, the things you must get right so that the defensive framework itself does not become the weak link.
Please report security issues privately. Do not open a public GitHub issue.
- Email the maintainers (see repository metadata) with details and, if possible, a minimal reproduction.
- Expect an acknowledgement within a few days and a coordinated disclosure timeline. We credit reporters who wish to be credited.
A tool that guards other systems is a high-value target. If an attacker can subvert Wardproof, they can disable the very defences you rely on. The following are the practices we consider most important. They are ordered roughly by impact.
Every dependency and every line in the decision path is something you must trust. Wardproof keeps its core dependency-free on purpose. When you fork:
- Keep model-based logic out of the core decision path.
- Pin and review any optional dependency you enable.
- Prefer a few hundred lines of code you understand over a large framework you do not.
The model you use for second-opinion detection is itself injectable, so treat its output as untrusted. Wardproof enforces this structurally: the LLM may only raise risk, never lower a deterministic guardrail signal, and an LLM exception is swallowed so that the deterministic result stands, rather than being allowed to fail open. Never let a model's "looks fine to me" override a hard rule.
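
The "raise only" rule can be sketched as follows. This is a minimal illustration, not Wardproof's actual code: `combined_risk` and the `llm_score` callable are hypothetical names, and the assumption that risk is a float in [0, 1] is ours.

```python
import math
from typing import Callable


def combined_risk(
    deterministic_risk: float,
    content: str,
    llm_score: Callable[[str], float],  # hypothetical second-opinion model
) -> float:
    """Return a risk that is never lower than the deterministic signal."""
    try:
        llm_risk = float(llm_score(content))
    except Exception:
        # Swallow model failures: the deterministic signal stands as-is.
        return deterministic_risk
    if math.isnan(llm_risk):
        # A nonsensical score is treated like a failure.
        return deterministic_risk
    # Clamp to [0, 1] (assumed scale), then let the model only raise risk.
    llm_risk = min(max(llm_risk, 0.0), 1.0)
    return max(deterministic_risk, llm_risk)
```

Because the result is a `max`, a model that has been talked into answering "0.0, looks fine" cannot pull the final risk below what the deterministic guardrails reported.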
- The ledger records the agents; an agent must never be able to rewrite history. Keep the ledger process/store separate from the protected plane.
- Always use the hash chain; it detects mutation, reordering, and deletion.
- For attribution, enable Ed25519 signatures (`[crypto]` extra) so you can prove which signer appended each entry.
- Export and verify regularly (`wardproof verify-ledger`). The `Watchdog` also re-verifies periodically, but external verification is your backstop.
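
To make the hash-chain idea concrete, here is a minimal sketch of an append-only chain and its verifier. It is an illustration under our own assumptions (SHA-256, JSON payloads, an all-zero genesis value, and the hypothetical names `append_entry` and `verify_chain`); Wardproof's on-disk format may differ.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed starting value for the first entry's "prev"


def entry_hash(prev_hash: str, payload: dict) -> str:
    """Hash the previous entry's hash together with a canonical payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_entry(chain: list, payload: dict) -> None:
    """Append a new entry that commits to the entry before it."""
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append(
        {"prev": prev, "payload": payload, "hash": entry_hash(prev, payload)}
    )


def verify_chain(chain: list) -> bool:
    """Recompute every link; any edit, reorder, or interior removal fails."""
    prev = GENESIS
    for entry in chain:
        if entry["prev"] != prev:
            return False
        if entry["hash"] != entry_hash(prev, entry["payload"]):
            return False
        prev = entry["hash"]
    return True
```

One caveat of this simple form: dropping entries from the end of the chain still verifies, unless the latest hash is also recorded somewhere the writer cannot reach. That is one reason external export and verification matter.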
If signatures are enabled, the private key is the crown jewel.
- Wardproof writes generated keys with `0600` permissions; keep it that way.
- Prefer a KMS/HSM or sealed secret store in production over a file on disk.
- Rotate keys on a schedule and on any suspected compromise; record rotations in the ledger itself.
- Never commit keys. `.gitignore` excludes `*.key` and `*.pem`; verify before pushing.
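
If you write key files yourself (for example in a fork or a rotation script), the sketch below shows one way to create the file with owner-only permissions from the start and to check an existing file. It assumes a POSIX system; `write_private_key` and `has_owner_only_permissions` are hypothetical helper names, not part of Wardproof.

```python
import os
import stat


def write_private_key(path: str, key_bytes: bytes) -> None:
    """Create the key file with mode 0600; refuse to overwrite an existing one."""
    # O_EXCL makes creation fail if the path already exists, and passing the
    # mode to os.open means the file is never briefly world-readable.
    flags = os.O_WRONLY | os.O_CREAT | os.O_EXCL
    fd = os.open(path, flags, 0o600)
    with os.fdopen(fd, "wb") as fh:
        fh.write(key_bytes)


def has_owner_only_permissions(path: str) -> bool:
    """True if neither group nor others have any access bits set."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode & 0o077 == 0
```

A check like `has_owner_only_permissions` is cheap enough to run at startup, so a key that was copied or restored with looser permissions is noticed before it is used.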
This is the most important point of honesty in the project. The
PermissionBroker + SandboxExecutor stop an honest-but-confused or mildly
compromised agent from calling tools it shouldn't, with rate limits and
argument validation. They do not contain malicious native code: a registered
tool runs with this process's privileges.
- For untrusted or model-generated code, use real isolation: containers, gVisor, Firecracker/microVMs, or WASM. `run_isolated_command` adds rlimits + timeouts but is still not a true sandbox.
- Run the defensive process with least privilege (no ambient cloud creds, no broad filesystem or network access it doesn't need).
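
To show what "rlimits + timeouts" buys you and what it does not, here is a minimal POSIX-only sketch. It is not the implementation of `run_isolated_command`; `run_with_limits` and its parameters are hypothetical.

```python
import resource
import subprocess


def run_with_limits(argv, cpu_seconds=2, timeout=5.0):
    """Run a command with a CPU-time rlimit and a wall-clock timeout."""

    def apply_limits():
        # Runs in the child between fork and exec. Never exceed the
        # existing hard limit, since raising it needs extra privilege.
        _, hard = resource.getrlimit(resource.RLIMIT_CPU)
        if hard == resource.RLIM_INFINITY:
            limit = cpu_seconds
        else:
            limit = min(cpu_seconds, hard)
        resource.setrlimit(resource.RLIMIT_CPU, (limit, limit))

    # subprocess.run kills the child and raises TimeoutExpired on timeout.
    return subprocess.run(
        argv,
        preexec_fn=apply_limits,
        capture_output=True,
        text=True,
        timeout=timeout,
        check=False,
    )
```

The child here still has the parent's user, filesystem view, and network access. The limits bound how much CPU and time it consumes, not what it can touch, which is why the options in the list above remain necessary for untrusted code. Note also that `preexec_fn` is documented as unsafe in multi-threaded programs.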
The agents that defend should not share a blast radius with what they defend. Separate credentials, separate network segments where feasible, and make sure a compromise of the protected app cannot reach in and silence the watchers or edit the ledger.
The Responder holds the authority to act (freeze, quarantine, mitigate). Grant
it the minimum set of tools, with strict argument validators and rate limits,
so a manipulated event cannot turn the Responder into an attacker's proxy.
High-impact actions should require ESCALATE to a human, not full autonomy.
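
A default-deny grant with an argument validator and a rate limit can be sketched like this. `MinimalBroker`, `ToolGrant`, and the tool name in the usage note are hypothetical stand-ins for illustration; they are not the PermissionBroker API.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ToolGrant:
    validator: Callable[[dict], bool]  # must return True for acceptable args
    max_calls: int                     # allowed calls per window
    per_seconds: float                 # sliding-window length
    calls: List[float] = field(default_factory=list)


class MinimalBroker:
    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._grants: Dict[str, ToolGrant] = {}
        self._clock = clock

    def grant(self, tool: str, validator, max_calls: int, per_seconds: float) -> None:
        self._grants[tool] = ToolGrant(validator, max_calls, per_seconds)

    def authorize(self, tool: str, args: dict) -> bool:
        grant = self._grants.get(tool)
        if grant is None:
            return False  # default deny: tools without a grant are refused
        try:
            if not grant.validator(args):
                return False
        except Exception:
            return False  # a crashing validator denies rather than allows
        now = self._clock()
        # Keep only the calls that are still inside the sliding window.
        grant.calls = [t for t in grant.calls if now - t < grant.per_seconds]
        if len(grant.calls) >= grant.max_calls:
            return False
        grant.calls.append(now)
        return True
```

For example, granting only a hypothetical `freeze_account` tool, with a validator that accepts exactly one string `account_id` and a limit of two calls per minute, means a manipulated event can at worst freeze two accounts a minute and cannot call anything else.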
Ambiguity, errors, and disagreement must resolve to the stricter outcome. The
stricter_verdict combine, the default-deny broker, and the circuit breaker
that forces human escalation under alert storms are all expressions of one
rule: when in doubt, do less and tell a human.
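
The "stricter outcome wins" rule can be sketched with an ordered enum. This is a stand-in, not the project's `stricter_verdict`: the verdict names other than ESCALATE, their ordering, and the helpers `stricter_of` and `safe_check` are assumptions made for illustration.

```python
from enum import IntEnum


class Verdict(IntEnum):
    # Assumed ordering: a higher value is a stricter outcome.
    ALLOW = 0
    ESCALATE = 1
    BLOCK = 2


def stricter_of(*verdicts: Verdict) -> Verdict:
    """Combine verdicts by taking the strictest; no input means escalate."""
    return max(verdicts, default=Verdict.ESCALATE)


def safe_check(check, event) -> Verdict:
    """Run one check; an error or an invalid result escalates to a human."""
    try:
        return Verdict(check(event))
    except Exception:
        return Verdict.ESCALATE
```

With this shape, two checks that disagree resolve to the stricter of their answers, and a check that raises can never produce ALLOW on its own.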
Don't put API keys, private keys, or PII into event content or LLM prompts.
Guardrails inspect content; logs and ledgers persist it. Redact upstream.
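
A minimal sketch of upstream redaction is shown below. The patterns are illustrative assumptions, not a complete secret or PII detector, and `redact` is a hypothetical helper; real deployments usually need patterns tuned to their own credential formats.

```python
import re

# Illustrative patterns only; extend them for your own secret formats.
_PATTERNS = [
    # PEM-encoded private key blocks.
    (
        re.compile(
            r"-----BEGIN [A-Z ]*PRIVATE KEY-----.*?-----END [A-Z ]*PRIVATE KEY-----",
            re.DOTALL,
        ),
        "[REDACTED_PRIVATE_KEY]",
    ),
    # key=value or key: value secrets; the key name is kept for context.
    (
        re.compile(r"(?i)\b(api[_-]?key|token|secret|password)\b(\s*[:=]\s*)\S+"),
        r"\1\2[REDACTED]",
    ),
    # Email addresses as a simple example of PII.
    (
        re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
        "[REDACTED_EMAIL]",
    ),
]


def redact(text: str) -> str:
    """Apply each pattern in order before the text reaches events or prompts."""
    for pattern, replacement in _PATTERNS:
        text = pattern.sub(replacement, text)
    return text
```

Running this before content is turned into an event means neither the guardrails, the LLM prompt, nor the ledger ever sees the raw value.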
- Pin dependencies and verify hashes for any optional extras you enable.
- Build signed releases and publish an SBOM (on the roadmap for v1.0).
- Review third-party guardrail/tool contributions with the same rigor as core changes; a malicious guardrail could under-report by design.
Pattern-based detection produces both false positives and false negatives.
Track the Watchdog signals (guardrail bypass, anomalous agreement, breaker
trips) and your own false-positive/false-negative rates. A defence you don't
measure is a defence you can't trust.
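
As a starting point for measuring your own rates, the sketch below computes false-positive and false-negative rates from labelled samples. The input shape, pairs of `(flagged, malicious)`, and the name `error_rates` are assumptions; how you obtain ground-truth labels is up to your review process.

```python
from typing import Dict, Iterable, Optional, Tuple


def error_rates(samples: Iterable[Tuple[bool, bool]]) -> Dict[str, Optional[float]]:
    """samples: (flagged_by_detector, actually_malicious) pairs."""
    false_pos = false_neg = benign = malicious_total = 0
    for flagged, malicious in samples:
        if malicious:
            malicious_total += 1
            if not flagged:
                false_neg += 1  # missed a real attack
        else:
            benign += 1
            if flagged:
                false_pos += 1  # flagged harmless content
    return {
        # None means the rate is undefined for this sample set.
        "false_positive_rate": false_pos / benign if benign else None,
        "false_negative_rate": false_neg / malicious_total if malicious_total else None,
    }
```

Tracking both numbers over time matters because tightening a pattern to reduce one typically moves the other.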
Wardproof is pre-1.0. Security fixes are applied to the latest release only until the API stabilizes at v1.0.