Context
Memory poisoning is the act of persisting malicious or adversarial data into an
AI agent's long-term memory so that it reshapes behavior in future sessions,
across users, or across tools. OWASP catalogues this threat as
ASI06 – Memory & Context Poisoning in the OWASP Top 10 for Agentic
Applications 2026, published by the OWASP GenAI Security Project ·
Agentic Security Initiative.
For a persistent-memory product like Mnemo, ASI06 is the threat class that most
directly targets the product's core surface — the memory store itself.
Mnemo is publishing a defense roadmap aligned with ASI06 six months before the
work ships, so the community can challenge the framing, surface additional
attack vectors, and contribute to the red-team suite before the defense
primitives are frozen. This is the first public defense-roadmap commitment we
are aware of from a persistent-memory layer for AI coding agents — comments
identifying prior art are welcome.
Attack primitives in scope
The roadmap explicitly defends against:
- Prompt-injection writes — a skill or agent tool-call is coerced into
writing attacker-controlled content into memory under a legitimate-looking
author. Defended by T-Q4W29-01 (Bayesian trust scoring) +
T-Q4W29-03 (Policy-Bound Governance).
- Adversarial consolidation hijack — a poisoned memory wins the
consolidation ranker and displaces a canonical memory.
Defended by T-Q4W29-01 signal weights (execution success rate,
permissions-vs-category-norm penalty, abuse-report decay).
- Cross-tenant collusion — skill A writes a memory in tenant X; skill B
in tenant Y exfiltrates it through a shared retrieval path.
Defended by T-Q4W29-03 immutable per-memory access policies.
- Embedding poisoning — semantically-camouflaged content written to score
high on cosine similarity for a targeted query class.
Defended by T-Q4W29-02 red-team test suite + T-Q4W29-01 trust dampening.
- Long-tail drift — slow erosion through many low-confidence writes,
rather than a single high-impact injection.
Defended by T-Q4W29-01 decay terms and trust-score thresholds.
- Context exfiltration — crafted retrieval queries designed to reconstitute
sensitive memory content via rank-and-leak.
Defended by T-Q4W29-03 PBG read policies + T-Q4W29-04 SOC2 audit-log
controls.
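
To make the embedding-poisoning entry concrete, here is a minimal sketch of what "trust dampening" could look like at retrieval time. It assumes the ranker multiplies cosine similarity by a per-memory trust score in [0, 1]. The record shape, the `rank` helper, and the multiplicative form are all illustrative assumptions, not Mnemo's implementation.

```python
import math
from typing import NamedTuple


class Memory(NamedTuple):
    # Hypothetical record shape, for illustration only.
    text: str
    embedding: tuple[float, ...]
    trust: float  # assumed to lie in [0, 1]


def cosine(a, b):
    """Plain cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank(query, memories):
    """Order memories by similarity dampened by trust (assumed multiplicative)."""
    return sorted(
        memories,
        key=lambda m: cosine(query, m.embedding) * m.trust,
        reverse=True,
    )


query = (1.0, 0.0)
canonical = Memory("use the vetted deploy script", (0.9, 0.1), trust=0.9)
# Camouflaged content: a perfect similarity match, but from a low-trust source.
poisoned = Memory("camouflaged attacker content", (1.0, 0.0), trust=0.2)
top = rank(query, [poisoned, canonical])[0]
```

In this toy case the poisoned memory wins on raw similarity, yet the canonical memory ranks first once trust is factored in. A real ranker would likely need a less blunt combination than a plain product.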
Additional primitives, reframings, or counter-examples are explicitly welcome
as comments below or PRs against
packs/core/skills/memory-protocol.md.
Defense roadmap
Four deliverables are scoped for W29–W30 of Q4 2026 (week-of-quarter, not
a calendar commitment):
T-Q4W29-01 · Bayesian trust scoring for memories + agents. Composite
trust score per memory, computed locally on every retrieval. Signal weights
(signature, author history, age, downloads, reviews, execution success rate,
permissions-vs-category-norm penalty, abuse reports) calibrated during
alpha. Memories below a configurable threshold are hidden from retrieval
unless explicitly surfaced.
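
As a rough illustration of the scoring and threshold behavior described for T-Q4W29-01, the sketch below uses a Beta-Bernoulli posterior mean over execution outcomes and scales it by two assumed factors. The uniform prior, the factor values, the 0.4 threshold, and every name here are hypothetical. The roadmap states that the real signal weights are calibrated during alpha.

```python
from dataclasses import dataclass


@dataclass
class Signals:
    # Hypothetical subset of the signals named above.
    successes: int      # executions that succeeded with this memory in context
    failures: int       # executions that failed
    abuse_reports: int  # abuse reports filed against the memory
    signed: bool        # whether the write carried a valid signature


def trust_score(s: Signals, prior_a: float = 1.0, prior_b: float = 1.0) -> float:
    """Posterior mean of a Beta-Bernoulli model, scaled by assumed weights."""
    total = s.successes + s.failures + prior_a + prior_b
    posterior_mean = (s.successes + prior_a) / total
    signature_factor = 1.0 if s.signed else 0.5   # assumed weight
    abuse_factor = 1.0 / (1.0 + s.abuse_reports)  # assumed penalty shape
    return posterior_mean * signature_factor * abuse_factor


def visible(scored, threshold=0.4, include_hidden=False):
    """Hide memories below the threshold unless explicitly surfaced."""
    return [name for name, score in scored if include_hidden or score >= threshold]


fresh = trust_score(Signals(0, 0, 0, signed=True))      # prior only
proven = trust_score(Signals(18, 2, 0, signed=True))    # strong track record
reported = trust_score(Signals(18, 2, 3, signed=True))  # same record, 3 reports
scored = [("fresh", fresh), ("proven", proven), ("reported", reported)]
```

With these assumed weights, `visible(scored)` keeps the fresh and proven memories and hides the reported one, while `visible(scored, include_hidden=True)` returns all three.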
T-Q4W29-02 · Memory-poisoning red-team test suite. Reproducible
attack fixtures for each of the six primitives above, executed in CI with
pass/fail assertions on the trust-score and policy layers. Community
fixtures accepted under a single-file, deterministic, no-external-services
contract.
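
The fixture contract has not been specified yet, so the following is only a guess at the shape a single-file, deterministic, no-external-services fixture might take. It simulates the long-tail drift primitive against a stand-in store. The `run_fixture` entry point, the dict layout, and the 0.4 threshold are assumptions made for illustration.

```python
import random


def run_fixture(seed: int = 1234) -> dict:
    """Hypothetical long-tail drift fixture: 200 low-trust writes, one canonical memory."""
    rng = random.Random(seed)  # seeded RNG keeps every run identical
    threshold = 0.4            # assumed retrieval threshold
    store = [{"id": "canonical", "trust": 0.9}]
    for i in range(200):
        # Each drift write carries a low trust score in [0.05, 0.30).
        store.append({"id": f"drift-{i}", "trust": 0.05 + 0.25 * rng.random()})
    surfaced = [m["id"] for m in store if m["trust"] >= threshold]
    return {"writes": len(store) - 1, "surfaced": surfaced}


result = run_fixture()
# Pass/fail assertion: no drift write may cross the retrieval threshold.
assert result["surfaced"] == ["canonical"], result
```

A real fixture would exercise the actual trust-score and policy layers rather than a list of dicts. The point here is only the contract: one file, a fixed seed, no network or disk access, and a single hard assertion.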
T-Q4W29-03 · Policy-Bound Governance (PBG) layer. Immutable per-memory
policies attached at write time — read-only actors, retention bounds,
cross-tenant flags. Policies travel with the memory through consolidation,
tiering, and export.
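
One way to picture a policy that is attached at write time and travels with the memory is a pair of frozen dataclasses, sketched below. The field names mirror the three policy kinds listed above but are otherwise hypothetical, as are `can_read` and `consolidate`. Frozen dataclasses only prevent in-process mutation, so this says nothing about how immutability would be enforced in storage.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class Policy:
    # Hypothetical field names for the policy kinds named in the roadmap.
    read_only_actors: frozenset
    retention_days: int
    cross_tenant: bool


@dataclass(frozen=True)
class Memory:
    tenant: str
    content: str
    policy: Policy  # attached at write time, never reassigned


def can_read(memory: Memory, actor: str, actor_tenant: str) -> bool:
    """Allow reads only for listed actors, and across tenants only if flagged."""
    if actor_tenant != memory.tenant and not memory.policy.cross_tenant:
        return False
    return actor in memory.policy.read_only_actors


def consolidate(memory: Memory, merged_content: str) -> Memory:
    """Return the consolidated memory; the original policy object is carried over."""
    return replace(memory, content=merged_content)


policy = Policy(frozenset({"skill-a"}), retention_days=90, cross_tenant=False)
m = Memory(tenant="X", content="build uses make", policy=policy)
merged = consolidate(m, "build uses make; tests use pytest")

try:
    m.policy.cross_tenant = True  # frozen dataclass: mutation is rejected
except FrozenInstanceError:
    mutation_blocked = True
else:
    mutation_blocked = False
```

Here `skill-a` in tenant X can read the memory, a skill in tenant Y cannot, and the consolidated memory still carries the original policy object.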
T-Q4W29-04 · SOC2 Type I audit engagement + evidence collection plan.
Independent third-party validation of the controls above. Audit logs from
PBG and trust-score events feed the evidence collection.
Roadmap IDs are stable identifiers — they will not be renumbered. Progress
will be reported as comments on this issue, with PR links once each task
opens.
What help we want
- Attack-vector PRs. If you can demonstrate a poisoning primitive that is
not among the six listed above, post a comment with a minimal reproducer.
Credited contributions will be listed in docs/SECURITY.md, which will ship
alongside the roadmap deliverables.
- Red-team fixtures. T-Q4W29-02 will accept community fixtures that
satisfy the reproducer contract (single file, deterministic, no external
services). The contract itself will be specified at task start.
- Threat-model review. Maintainers of adjacent projects (agent
frameworks, memory libraries, plugin registries) are invited to review the
primitive-to-defense mapping above and call out gaps.
- Signal-weight calibration. Empirical data from other memory-scoring or
reputation systems is welcome, as a gist or paper link in this thread.
Out of scope
- This is not a CVE disclosure. No exploited vulnerability has been
reported against Mnemo. This issue tracks forward-looking defense, not
incident response. For suspected vulnerabilities, follow the standard
responsible-disclosure path that will be documented in SECURITY.md.
- This is not a bug report. Bug-report templates will ship in a
follow-up.
- This is not a Mnemo-Cloud-only commitment. The defense roadmap applies
to the single-binary local-first path, the hosted Mnemo Cloud path, and
any self-hosted deployment.
- This is not a calendar-date commitment. The W29–W30 window is the
reference; the shipping signal is the merged PR, not the issue.
References
- Agentic Security Initiative · OWASP Top 10 for Agentic Applications 2026 ·
Agentic AI – Threats and Mitigations taxonomy
- memory-protocol: always-active contract behind persistent memory
- judgment-day: adversarial pre-merge review used on Mnemo itself
- README.md: repo split, closed-core engine, public community surface
Comments, counter-examples, and PRs welcome. Maintainer triage notes will be
posted inline.