Memory & context poisoning defense roadmap (OWASP ASI06) #1

@ashmdev

Context

Memory poisoning is the act of persisting malicious or adversarial data into an
AI agent's long-term memory so that it reshapes behavior in future sessions,
across users, or across tools. OWASP catalogues this threat as
ASI06 – Memory & Context Poisoning
in the OWASP Top 10 for Agentic Applications 2026, published by the
OWASP GenAI Security Project · Agentic Security Initiative.
For a persistent-memory product like Mnemo, ASI06 is the threat class that most
directly targets the product's core surface — the memory store itself.

Mnemo is publishing a defense roadmap aligned with ASI06 six months before the
work ships, so the community can challenge the framing, surface additional
attack vectors, and contribute to the red-team suite before the defense
primitives are frozen. This is the first public defense-roadmap commitment we
are aware of from a persistent-memory layer for AI coding agents — comments
identifying prior art are welcome.

Attack primitives in scope

The roadmap explicitly defends against:

  • Prompt-injection writes — a skill or agent tool-call is coerced into
    writing attacker-controlled content into memory under a legitimate-looking
    author. Defended by T-Q4W29-01 (Bayesian trust scoring) +
    T-Q4W29-03 (Policy-Bound Governance).
  • Adversarial consolidation hijack — a poisoned memory wins the
    consolidation ranker and displaces a canonical memory.
    Defended by T-Q4W29-01 signal weights (execution success rate,
    permissions-vs-category-norm penalty, abuse-report decay).
  • Cross-tenant collusion — skill A writes a memory in tenant X; skill B
    in tenant Y exfiltrates it through a shared retrieval path.
    Defended by T-Q4W29-03 immutable per-memory access policies.
  • Embedding poisoning — semantically-camouflaged content written to score
    high on cosine similarity for a targeted query class.
    Defended by T-Q4W29-02 red-team test suite + T-Q4W29-01 trust dampening.
  • Long-tail drift — slow erosion through many low-confidence writes,
    rather than a single high-impact injection.
    Defended by T-Q4W29-01 decay terms and trust-score thresholds.
  • Context exfiltration — crafted retrieval queries designed to reconstitute
    sensitive memory content via rank-and-leak.
    Defended by T-Q4W29-03 PBG read policies + T-Q4W29-04 SOC2 audit-log
    controls.
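
For reviewers who want to check coverage mechanically, the primitive-to-defense mapping in the list above can be transcribed into a small table. This is only a sketch: the roadmap IDs and primitive names come from the list, while the dictionary layout and the helper names (`primitives_by_task`, `single_defense_primitives`) are illustrative assumptions, not part of Mnemo.

```python
# Primitive-to-defense mapping transcribed from the list above.
# The data layout and helper names are illustrative assumptions.
DEFENSES = {
    "prompt-injection writes": ["T-Q4W29-01", "T-Q4W29-03"],
    "adversarial consolidation hijack": ["T-Q4W29-01"],
    "cross-tenant collusion": ["T-Q4W29-03"],
    "embedding poisoning": ["T-Q4W29-02", "T-Q4W29-01"],
    "long-tail drift": ["T-Q4W29-01"],
    "context exfiltration": ["T-Q4W29-03", "T-Q4W29-04"],
}


def primitives_by_task(defenses):
    """Invert the mapping: roadmap ID -> primitives it is meant to cover."""
    by_task = {}
    for primitive, tasks in defenses.items():
        for task in tasks:
            by_task.setdefault(task, []).append(primitive)
    return by_task


def single_defense_primitives(defenses):
    """Primitives that rely on exactly one roadmap task."""
    return sorted(p for p, tasks in defenses.items() if len(tasks) == 1)
```

Calling `single_defense_primitives(DEFENSES)` lists the primitives with only one mapped defense, which may be a useful starting point for a gap review.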

Additional primitives, reframings, or counter-examples are explicitly welcome
as comments below or PRs against
packs/core/skills/memory-protocol.md.

Defense roadmap

Four deliverables are scoped for W29–W30 of Q4 2026 (week-of-quarter, not
a calendar commitment):

  • T-Q4W29-01 · Bayesian trust scoring for memories + agents. Composite
    trust score per memory, computed locally on every retrieval. Signal weights
    (signature, author history, age, downloads, reviews, execution success rate,
    permissions-vs-category-norm penalty, abuse reports) calibrated during
    alpha. Memories below a configurable threshold are hidden from retrieval
    unless explicitly surfaced.
  • T-Q4W29-02 · Memory-poisoning red-team test suite. Reproducible
    attack fixtures for each of the six primitives above, executed in CI with
    pass/fail assertions on the trust-score and policy layers. Community
    fixtures accepted under a single-file, deterministic, no-external-services
    contract.
  • T-Q4W29-03 · Policy-Bound Governance (PBG) layer. Immutable per-memory
    policies attached at write time — read-only actors, retention bounds,
    cross-tenant flags. Policies travel with the memory through consolidation,
    tiering, and export.
  • T-Q4W29-04 · SOC2 Type I audit engagement + evidence collection plan.
    Independent third-party validation of the controls above. Audit logs from
    PBG and trust-score events feed the evidence collection.
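
To make the T-Q4W29-01 description more concrete, here is a minimal sketch of one way a composite trust score and retrieval threshold could work. It is not Mnemo's scoring model: the signal weights are to be calibrated during alpha, so every number here is a placeholder. The sketch assumes a Beta-Bernoulli posterior over execution success, a multiplicative penalty per abuse report, and a discount for unsigned memories; it covers only three of the listed signals, and all names (`MemorySignals`, `trust_score`, `visible_ids`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MemorySignals:
    """A subset of the signals named in T-Q4W29-01 (hypothetical shape)."""
    signed: bool
    exec_successes: int
    exec_failures: int
    abuse_reports: int


def execution_posterior_mean(successes, failures, prior_a=1.0, prior_b=1.0):
    """Posterior mean of a Beta(prior_a, prior_b) prior after the observed outcomes."""
    return (successes + prior_a) / (successes + failures + prior_a + prior_b)


def trust_score(s, abuse_penalty=0.5, unsigned_factor=0.6):
    """Composite score in (0, 1]; both factors are placeholder values."""
    score = execution_posterior_mean(s.exec_successes, s.exec_failures)
    score *= abuse_penalty ** s.abuse_reports  # each report halves the score
    if not s.signed:
        score *= unsigned_factor
    return score


def visible_ids(memories, threshold=0.5, surfaced=frozenset()):
    """IDs returned by retrieval: at or above threshold, or explicitly surfaced."""
    return [
        mid for mid, signals in memories.items()
        if mid in surfaced or trust_score(signals) >= threshold
    ]
```

Under these assumptions, a signed memory with nine successful executions and one failure scores 10/12, and a single abuse report drops it below the default threshold of 0.5.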

Roadmap IDs are stable identifiers — they will not be renumbered. Progress
will be reported as comments on this issue, with PR links once each task
opens.
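
As an illustration of the T-Q4W29-03 idea, the sketch below attaches an immutable policy to a memory at write time and carries it through a consolidation step. It is an assumption-laden sketch, not the PBG design: the field names, the reading of "read-only actors" as an allow-list of readers, and the "most restrictive policy wins" merge rule are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class MemoryPolicy:
    tenant: str
    readers: frozenset       # actors allowed to read (assumed interpretation)
    retain_until: datetime   # retention bound
    cross_tenant: bool = False


@dataclass(frozen=True)
class Memory:
    content: str
    policy: MemoryPolicy


def can_read(memory, actor, actor_tenant, now):
    """Apply the policy that was attached at write time."""
    p = memory.policy
    if now > p.retain_until:
        return False
    if actor_tenant != p.tenant and not p.cross_tenant:
        return False
    return actor in p.readers


def consolidate(memories, merged_content):
    """Merge memories; the result inherits the most restrictive policy (assumed rule)."""
    policies = [m.policy for m in memories]
    if len({p.tenant for p in policies}) != 1:
        raise ValueError("consolidation needs memories from exactly one tenant")
    merged = MemoryPolicy(
        tenant=policies[0].tenant,
        readers=frozenset.intersection(*(p.readers for p in policies)),
        retain_until=min(p.retain_until for p in policies),
        cross_tenant=all(p.cross_tenant for p in policies),
    )
    return Memory(merged_content, merged)
```

With a rule like this, a skill in tenant Y cannot read a tenant X memory unless the writer set the cross-tenant flag, and consolidation cannot widen the set of readers.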

What help we want

  • Attack-vector PRs. If you can demonstrate a poisoning primitive that is
    not among the six listed above, post a comment or open a PR with a
    minimal reproducer. Credited contributions will be listed in
    docs/SECURITY.md (to ship alongside the roadmap deliverables).
  • Red-team fixtures. T-Q4W29-02 will accept community fixtures that
    satisfy the reproducer contract (single file, deterministic, no external
    services). The contract itself will be specified at task start.
  • Threat-model review. Maintainers of adjacent projects (agent
    frameworks, memory libraries, plugin registries) are invited to review the
    primitive-to-defense mapping above and call out gaps.
  • Signal-weight calibration. Empirical data from other memory-scoring or
    reputation systems is welcome as a gist or paper link in this thread.
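
Since the reproducer contract will only be specified at task start, the following is a guess at what a fixture satisfying "single file, deterministic, no external services" might look like, using the long-tail drift primitive. The in-memory store, the `toy_trust_score` stand-in for the system under test, and the result shape are all hypothetical.

```python
import random


def toy_trust_score(successes, failures):
    """Stand-in for the real scorer: Beta(1, 1) posterior mean of success."""
    return (successes + 1) / (successes + failures + 2)


def run_fixture(seed=0, threshold=0.5):
    """Long-tail drift: many low-confidence writes next to one canonical memory."""
    rng = random.Random(seed)  # seeded local RNG keeps the fixture deterministic
    store = [{"id": "canonical", "successes": 20, "failures": 1, "poisoned": False}]
    for i in range(50):
        store.append({
            "id": f"drift-{i}",
            "successes": 0,
            "failures": rng.randint(1, 3),
            "poisoned": True,
        })
    retrieved = [
        m for m in store
        if toy_trust_score(m["successes"], m["failures"]) >= threshold
    ]
    leaked = [m["id"] for m in retrieved if m["poisoned"]]
    canonical_kept = any(m["id"] == "canonical" for m in retrieved)
    # Pass only if no poisoned memory is retrievable and the canonical one survives.
    return {"passed": canonical_kept and not leaked, "leaked": leaked}
```

A CI job could call `run_fixture()` and fail the build when `passed` is false; everything the fixture needs lives in this one file, and no network or external service is touched.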

Out of scope

  • This is not a CVE disclosure. No exploited vulnerability has been
    reported against Mnemo. This issue tracks forward-looking defense, not
    incident response. For suspected vulnerabilities, follow the standard
    responsible-disclosure path that will be documented in docs/SECURITY.md.
  • This is not a bug report. Bug-report templates will ship in a
    follow-up.
  • This is not a Mnemo-Cloud-only commitment. The defense roadmap applies
    to the single-binary local-first path, the hosted Mnemo Cloud path, and
    any self-hosted deployment.
  • This is not a calendar-date commitment. The W29–W30 window is the
    reference; the shipping signal is the merged PR, not the issue.

Comments, counter-examples, and PRs welcome. Maintainer triage notes will be
posted inline.
