Skip to content

Agent Poisoning Defense

Nick Rygiel edited this page Jul 4, 2026 · 2 revisions

Agent and Prompt Poisoning Defense

Prompt injection and agent poisoning are control-plane failures. Public issues, pull requests, comments, documents, web pages, email, MCP tool output, external agent results, and uploads are untrusted data, not trusted instructions.

Required Defenses

  • Quote or wrap untrusted input as data.
  • Ignore instructions embedded inside source content.
  • Verify actors and triggers for write-capable workflows.
  • Split read and write workflows.
  • Keep secrets out of untrusted-input jobs.
  • Use least-privilege tokens.
  • Restrict network egress and tool arguments.
  • Snapshot inputs at trigger time.
  • Redact output sinks such as logs, artifacts, comments, and memory writes.
  • Require approval for mutation.
  • Preserve provenance.

Case study: https://flatt.tech/research/posts/poisoning-claude-code-one-github-issue-to-break-the-supply-chain/

Source doc: https://github.com/Protocol-Wealth/pwcli-core/blob/main/docs/agent-poisoning-defense.md

See also Adapter Control Demo.

Clone this wiki locally