Memory & context poisoning defense roadmap (OWASP ASI06)

## Context

Memory poisoning is the act of persisting malicious or adversarial data into an
AI agent's long-term memory so that it reshapes behavior in future sessions,
across users, or across tools. OWASP catalogues this threat as
[**ASI06 – Memory & Context Poisoning**](https://genai.owasp.org/resource/owasp-top-10-for-agentic-applications-for-2026/)
in the *OWASP Top 10 for Agentic Applications 2026*, published by the
[OWASP GenAI Security Project · Agentic Security Initiative](https://genai.owasp.org/initiatives/agentic-security-initiative/).
For a persistent-memory product like Mnemo, ASI06 is the threat class that most
directly targets the product's core surface — the memory store itself.

Mnemo is publishing a defense roadmap aligned with ASI06 six months before the
work ships, so the community can challenge the framing, surface additional
attack vectors, and contribute to the red-team suite before the defense
primitives are frozen. This is the first public defense-roadmap commitment we
are aware of from a persistent-memory layer for AI coding agents — comments
identifying prior art are welcome.

## Attack primitives in scope

The roadmap explicitly defends against:

- **Prompt-injection writes** — a skill or agent tool-call is coerced into
  writing attacker-controlled content into memory under a legitimate-looking
  author. *Defended by `T-Q4W29-01` (Bayesian trust scoring) +
  `T-Q4W29-03` (Policy-Bound Governance, PBG).*
- **Adversarial consolidation hijack** — a poisoned memory wins the
  consolidation ranker and displaces a canonical memory.
  *Defended by `T-Q4W29-01` signal weights (execution success rate,
  permissions-vs-category-norm penalty, abuse-report decay).*
- **Cross-tenant collusion** — skill A writes a memory in tenant X; skill B
  in tenant Y exfiltrates it through a shared retrieval path.
  *Defended by `T-Q4W29-03` immutable per-memory access policies.*
- **Embedding poisoning** — semantically-camouflaged content written to score
  high on cosine similarity for a targeted query class.
  *Defended by `T-Q4W29-02` red-team test suite + `T-Q4W29-01` trust dampening.*
- **Long-tail drift** — slow erosion through many low-confidence writes,
  rather than a single high-impact injection.
  *Defended by `T-Q4W29-01` decay terms and trust-score thresholds.*
- **Context exfiltration** — crafted retrieval queries designed to reconstitute
  sensitive memory content via rank-and-leak.
  *Defended by `T-Q4W29-03` PBG read policies + `T-Q4W29-04` SOC2 audit-log
  controls.*
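
For reviewers checking coverage, the primitive-to-defense mapping above can be written down as data. The sketch below transcribes the six bullets as-is; the helper names (`primitives_by_defense`, `single_layer_primitives`) are illustrative and not part of any Mnemo API.

```python
from collections import defaultdict

# Primitive -> roadmap IDs, transcribed from the list above.
DEFENSES: dict[str, tuple[str, ...]] = {
    "prompt-injection writes": ("T-Q4W29-01", "T-Q4W29-03"),
    "adversarial consolidation hijack": ("T-Q4W29-01",),
    "cross-tenant collusion": ("T-Q4W29-03",),
    "embedding poisoning": ("T-Q4W29-02", "T-Q4W29-01"),
    "long-tail drift": ("T-Q4W29-01",),
    "context exfiltration": ("T-Q4W29-03", "T-Q4W29-04"),
}


def primitives_by_defense(defenses: dict[str, tuple[str, ...]]) -> dict[str, list[str]]:
    """Invert the mapping: which primitives does each roadmap ID cover?"""
    index: dict[str, list[str]] = defaultdict(list)
    for primitive, roadmap_ids in defenses.items():
        for roadmap_id in roadmap_ids:
            index[roadmap_id].append(primitive)
    return dict(index)


def single_layer_primitives(defenses: dict[str, tuple[str, ...]]) -> list[str]:
    """Primitives that rely on exactly one deliverable (candidates for gap review)."""
    return [p for p, roadmap_ids in defenses.items() if len(roadmap_ids) == 1]
```

Inverting the table this way makes it easy to see which primitives depend on a single deliverable, which is one place a threat-model review might start.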

Additional primitives, reframings, or counter-examples are explicitly welcome
as comments below or PRs against
[`packs/core/skills/memory-protocol.md`](packs/core/skills/memory-protocol.md).

## Defense roadmap

Four deliverables are scoped for **W29–W30 of Q4 2026** (week-of-quarter, not
a calendar commitment):

- **`T-Q4W29-01` · Bayesian trust scoring for memories + agents.** Composite
  trust score per memory, computed locally on every retrieval. Signal weights
  (signature, author history, age, downloads, reviews, execution success rate,
  permissions-vs-category-norm penalty, abuse reports) will be calibrated
  during alpha. Memories scoring below a configurable threshold are hidden
  from retrieval unless explicitly surfaced.
- **`T-Q4W29-02` · Memory-poisoning red-team test suite.** Reproducible
  attack fixtures for each of the six primitives above, executed in CI with
  pass/fail assertions on the trust-score and policy layers. Community
  fixtures accepted under a single-file, deterministic, no-external-services
  contract.
- **`T-Q4W29-03` · Policy-Bound Governance (PBG) layer.** Immutable per-memory
  policies attached at write time — read-only actors, retention bounds,
  cross-tenant flags. Policies travel with the memory through consolidation,
  tiering, and export.
- **`T-Q4W29-04` · SOC2 Type I audit engagement + evidence collection plan.**
  Independent third-party validation of the controls above. Audit logs from
  PBG and trust-score events feed the evidence collection.
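
To make the `T-Q4W29-01` description concrete, here is a minimal sketch of one way a composite trust score and retrieval threshold could fit together. The roadmap does not specify the formula, so everything here is an assumption: the Beta(1, 1) posterior mean for execution success, the log-odds combination, the weights, the half-life, and all names (`MemorySignals`, `trust_score`, `retrievable`). Only three of the listed signals are modelled.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class MemorySignals:
    signed: bool                      # signature present and valid
    exec_successes: int               # executions that used this memory and succeeded
    exec_failures: int
    abuse_reports: int
    days_since_last_report: float     # used to decay abuse reports

# Hypothetical weights; the roadmap says real weights are calibrated during alpha.
W_SIGNED, W_EXEC, W_ABUSE = 1.0, 2.0, -1.5
ABUSE_HALF_LIFE_DAYS = 30.0


def _sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def trust_score(s: MemorySignals) -> float:
    """Return a score in (0, 1); higher means more trusted."""
    # Posterior mean of the success rate under a uniform Beta(1, 1) prior.
    success_rate = (s.exec_successes + 1) / (s.exec_successes + s.exec_failures + 2)
    # Abuse reports lose half their weight every ABUSE_HALF_LIFE_DAYS.
    decayed_reports = s.abuse_reports * 0.5 ** (s.days_since_last_report / ABUSE_HALF_LIFE_DAYS)
    log_odds = (
        W_SIGNED * (1.0 if s.signed else -1.0)
        + W_EXEC * (2.0 * success_rate - 1.0)
        + W_ABUSE * decayed_reports
    )
    return _sigmoid(log_odds)


def retrievable(memories: dict[str, MemorySignals], threshold: float = 0.5,
                include_hidden: bool = False) -> list[str]:
    """IDs visible to retrieval; low-trust memories appear only if explicitly surfaced."""
    return [mid for mid, s in memories.items()
            if include_hidden or trust_score(s) >= threshold]
```

Under these assumed weights, a signed memory with a good execution record stays above the default threshold, while an unsigned memory with recent abuse reports is hidden unless `include_hidden=True` is passed.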

Roadmap IDs are stable identifiers — they will not be renumbered. Progress
will be reported as comments on this issue, with PR links once each task
opens.
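
The PBG deliverable (`T-Q4W29-03`) describes immutable per-memory policies covering read-only actors, retention bounds, and cross-tenant flags. A minimal sketch of that shape, assuming frozen dataclasses for immutability, follows. The field names and the `can_read` check are hypothetical and do not describe the shipped design.

```python
from dataclasses import dataclass, FrozenInstanceError
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class MemoryPolicy:
    tenant: str                    # tenant that owns the memory
    readers: frozenset[str]        # actors allowed to read it
    retention: timedelta           # reads are refused after this age
    cross_tenant: bool = False     # whether other tenants may read at all


@dataclass(frozen=True)
class Memory:
    content: str
    written_at: datetime
    policy: MemoryPolicy           # attached at write time, never reassigned


def can_read(memory: Memory, actor: str, actor_tenant: str, now: datetime) -> bool:
    """Evaluate the policy that travels with the memory."""
    policy = memory.policy
    if now - memory.written_at > policy.retention:
        return False               # outside the retention bound
    if actor_tenant != policy.tenant and not policy.cross_tenant:
        return False               # cross-tenant read not permitted
    return actor in policy.readers


# Example: skill B in tenant Y cannot read a memory written in tenant X.
t0 = datetime(2026, 1, 1, tzinfo=timezone.utc)
memory = Memory("deploy notes", t0,
                MemoryPolicy("tenant-x", frozenset({"skill-a"}), timedelta(days=30)))
blocked = not can_read(memory, "skill-b", "tenant-y", t0 + timedelta(days=1))
```

Because both dataclasses are frozen, an attempt such as `memory.policy.cross_tenant = True` raises `FrozenInstanceError`. A real implementation would also need the policy to survive consolidation, tiering, and export, which this sketch does not model.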

## What help we want

- **Attack-vector PRs.** If you can demonstrate a poisoning primitive that is
  not among the six listed above, post a comment or open a PR with a minimal
  reproducer. Credited contributions will be listed in `docs/SECURITY.md` (to
  ship alongside the roadmap deliverables).
- **Red-team fixtures.** `T-Q4W29-02` will accept community fixtures that
  satisfy the reproducer contract (single file, deterministic, no external
  services). The contract itself will be specified at task start.
- **Threat-model review.** Maintainers of adjacent projects (agent
  frameworks, memory libraries, plugin registries) are invited to review the
  primitive-to-defense mapping above and call out gaps.
- **Signal-weight calibration.** Empirical data from other memory-scoring or
  reputation systems is welcome as a gist or paper link in this thread.
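
The reproducer contract is not yet specified, so the following is only a guess at what a single-file, deterministic, no-external-services fixture might look like. It targets the long-tail drift primitive. The function names, the result dictionary, and the stand-in `hidden_by_threshold` defense are all hypothetical placeholders.

```python
import random


def hidden_by_threshold(score: float, threshold: float = 0.5) -> bool:
    """Stand-in for the trust-score layer under test (not Mnemo's real check)."""
    return score < threshold


def long_tail_drift_fixture(seed: int = 0, writes: int = 50) -> dict:
    """Simulate many low-confidence writes and report whether any would surface."""
    rng = random.Random(seed)  # fixed seed keeps the run deterministic
    # Each simulated write carries a low trust score, as in a slow-erosion attack.
    scores = [rng.uniform(0.05, 0.30) for _ in range(writes)]
    leaked = [s for s in scores if not hidden_by_threshold(s)]
    return {
        "primitive": "long-tail drift",
        "writes": writes,
        "leaked": len(leaked),
        "passed": not leaked,  # pass only if every poisoned write stays hidden
    }


if __name__ == "__main__":
    result = long_tail_drift_fixture()
    assert result["passed"], result
```

The fixture uses only the standard library and a seeded generator, so two runs with the same seed produce the same result, which is the property a CI assertion would depend on.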

## Out of scope

- This is **not** a CVE disclosure. No exploited vulnerability has been
  reported against Mnemo. This issue tracks forward-looking defense, not
  incident response. For suspected vulnerabilities, follow the standard
  responsible-disclosure path that will be documented in `SECURITY.md`.
- This is **not** a bug report. Bug-report templates will ship in a
  follow-up.
- This is **not** a Mnemo-Cloud-only commitment. The defense roadmap applies
  to the single-binary local-first path, the hosted Mnemo Cloud path, and
  any self-hosted deployment.
- This is **not** a calendar-date commitment. The W29–W30 window is the
  reference; the shipping signal is the merged PR, not the issue.

## References

- **OWASP**:
  [Agentic Security Initiative](https://genai.owasp.org/initiatives/agentic-security-initiative/) ·
  [OWASP Top 10 for Agentic Applications 2026](https://genai.owasp.org/resource/owasp-top-10-for-agentic-applications-for-2026/) ·
  [Agentic AI – Threats and Mitigations taxonomy](https://genai.owasp.org/resource/agentic-ai-threats-and-mitigations/)
- **Mnemo packs**:
  [`memory-protocol`](packs/core/skills/memory-protocol.md) — always-active
  contract behind persistent memory ·
  [`judgment-day`](packs/core/skills/judgment-day.md) — adversarial
  pre-merge review used on Mnemo itself
- **Repo posture**:
  [`README.md`](README.md) — repo split, closed-core engine, public
  community surface

---

Comments, counter-examples, and PRs welcome. Maintainer triage notes will be
posted inline.

