agentm memory system
title: memory-system — design
status: launched
kind: design
scope: feature
area: agentm/memory
governs: [scripts/harness_memory.py, harness/skills/memory/scripts/]
parent: agentm-hld.md
children: [memoryvault.md]
seeded: 2026-06-20
approved: 2026-06-21
Note
LAUNCHED (lifted into tracked wiki/designs/ 2026-06-24, AG Phase 3; originally approved 2026-06-21). Child design: the Memory pillar, parent agentm HLD; inherits the Foundations HLD by reference. The largest pillar, the ground the other three stand on. The seam content has migrated to the launched memory-storage-seam design.
Memory is the durable record agentm keeps — what it has learned, the plans and designs it works on, the standards it holds — and the single disciplined path every caller takes to reach it. It is the largest of the four pillars and the ground the other three stand on: Experience grows it, Opinions keep their learned half in it, Personas draw their state from it.
It is built to compound: entries are typed, densely linked to their neighbors, and indexed for recall — so as agentm learns and reaches outward, the record grows into an interconnected knowledge base.
The substrate serves one aim: every caller reaches storage the same way, and nothing leaks a dependency downward. A small set of generic pieces — none knowing a specific backend, host, or tool — arranged so everything points inward: a caller reaches the memory engine, the engine reaches storage through the resolution plane and the seam, and the concrete backends, personas, and crickets tools all depend up. The substrate depends on nothing below it.
The components:
- The memory engine — the verbs the rest of the system calls (`save` · `recall` · `forget`, plus reflection) and the cross-cutting logic that must live exactly once: idempotency + content-hash CAS, soft-delete, token-budgeted recall, link integrity.
- The resolution plane — how a call finds its store without naming one: the config holds the backend choice; the selector (`backend_selection.py`) maps a protocol name to a concrete `StorageBackend`, failing loud if it's missing.
- The storage seam — the one port to disk: a `StorageBackend` contract, a registry, an opaque `Locator`, and the storage tiers. (The seam's full contract — the verbs, the `Locator` guards, the tiers + the never-sync invariant, the reserved `DerivedMaintenance` — is the launched memory-storage-seam design; this pillar points down to it.)
- Harness-state I/O — plan/progress/feature state is backend-aware: it routes through the seam to the active backend, so state and memory reach disk the same way.
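The resolution plane's fail-loud lookup can be sketched as a small registry. This is an illustration under stated assumptions, not the code in `backend_selection.py`: `register_backend`, `select_backend`, `BackendNotFound`, and `InMemoryBackend` are hypothetical names, and the two-verb contract is a simplification of the seam's real one.

```python
from typing import Callable, Dict, Protocol


class StorageBackend(Protocol):
    """Simplified stand-in for the seam contract (the verbs are assumed)."""

    def write(self, locator: str, data: str) -> None: ...

    def read(self, locator: str) -> str: ...


class BackendNotFound(LookupError):
    """Raised when the config names a protocol nobody registered."""


_REGISTRY: Dict[str, Callable[[], StorageBackend]] = {}


def register_backend(protocol: str, factory: Callable[[], StorageBackend]) -> None:
    _REGISTRY[protocol] = factory


def select_backend(config: dict) -> StorageBackend:
    """Map the configured protocol name to a backend, failing loud if missing."""
    protocol = config.get("backend")
    if protocol not in _REGISTRY:
        known = ", ".join(sorted(_REGISTRY)) or "none"
        raise BackendNotFound(f"no backend {protocol!r}; registered: {known}")
    return _REGISTRY[protocol]()


class InMemoryBackend:
    """Hypothetical adapter used only for this demo."""

    def __init__(self) -> None:
        self._store: Dict[str, str] = {}

    def write(self, locator: str, data: str) -> None:
        self._store[locator] = data

    def read(self, locator: str) -> str:
        return self._store[locator]


register_backend("in-memory", InMemoryBackend)
backend = select_backend({"backend": "in-memory"})
backend.write("fix/example", "hello")
```

The caller only ever names a protocol string; the concrete class is injected at call time, which is what keeps the substrate free of downward imports.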
The one-way rule, enforced. Routing and memory code may import the seam + selector (substrate) but never a concrete backend — the LC-8 gate (check-process-seam-import-direction.sh) fails the build on any import storage_vault. The backend is chosen at call time and injected through the abstract contract. Backends and tools point up; the substrate points at nothing below.
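The import-direction rule above can be illustrated with a short static check. The real gate is the shell script named above; the Python below is only a sketch of the same idea, and `forbidden_imports` plus the `FORBIDDEN` set are hypothetical names.

```python
import ast

# Concrete backend modules that substrate code must never import (assumed set).
FORBIDDEN = {"storage_vault"}


def forbidden_imports(source: str, forbidden=FORBIDDEN) -> list:
    """Return sorted (line, module) pairs for each import of a forbidden module."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]
        else:
            continue
        for name in names:
            # Compare the top-level package so "storage_vault.x" is caught too.
            if name.split(".")[0] in forbidden:
                hits.append((node.lineno, name))
    return sorted(hits)


clean = "import storage_seam\nfrom backend_selection import select\n"
leaky = "import os\nimport storage_vault\nfrom storage_vault import VaultBackend\n"

clean_hits = forbidden_imports(clean)
leaky_hits = forbidden_imports(leaky)
```

A build gate would exit non-zero whenever the returned list is non-empty for any routing or memory module.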
agentm classifies every memory on three independent axes: by kind (what sort of thing it is), by durability (does it survive the session?), and by ownership (whose space is it?). Kind drives retrieval; durability and ownership drive placement.
By kind. Every entry declares a kind — open-ended but conventional: preference · workflow · fix · domain-reference · idea · skill-pointer. Kind is load-bearing: it picks the storage group an entry is written to, and it scopes which entries a phase pulls into its recall budget (a /work pass recalls different kinds than a /plan pass). Kind is how a caller asks for the right sort of memory rather than a flat search over everything.
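Kind-scoped, budgeted recall might look like the sketch below. The phase-to-kind mapping is hypothetical (this design does not list which kinds each phase recalls), the whitespace word count is a crude stand-in for a real tokenizer, and entries are assumed to arrive already ranked.

```python
# Hypothetical mapping: the real per-phase kind sets are not given in this design.
PHASE_KINDS = {
    "work": {"fix", "workflow", "skill-pointer"},
    "plan": {"preference", "domain-reference", "idea"},
}


def recall_for_phase(entries: list, phase: str, token_budget: int) -> list:
    """Pick entries of the phase's kinds, best first, until the budget is spent."""
    allowed = PHASE_KINDS.get(phase, set())
    picked, spent = [], 0
    for entry in entries:  # assumed pre-ranked, best first
        if entry["kind"] not in allowed:
            continue
        cost = len(entry["text"].split())  # crude token estimate
        if spent + cost > token_budget:
            continue  # too large for what is left; a smaller entry may still fit
        picked.append(entry["slug"])
        spent += cost
    return picked


entries = [
    {"slug": "a", "kind": "fix", "text": "one two three"},
    {"slug": "b", "kind": "idea", "text": "x"},
    {"slug": "c", "kind": "workflow", "text": "one two three four"},
    {"slug": "d", "kind": "fix", "text": "one"},
]
work_recall = recall_for_phase(entries, "work", token_budget=4)
plan_recall = recall_for_phase(entries, "plan", token_budget=4)
```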
By durability + ownership. Two planes:
- Layer 0 — agent-harness memory (not durable): the host's context window — the running conversation. Short-lived; it fills and gets summarized away. Scratch space; the record lives below, on disk. (The layer the whole substrate exists to compensate for.)
- Layer 1 — the durable record (on disk): resolved at runtime, never a cached path. It splits into three ownership tiers, by ascending agent autonomy — and each tier is a place in the vault:
| Tier | What it holds | Where | The agent's hand |
|---|---|---|---|
| T1 — personal | the operator's own notes | the Obsidian vault above Agent/ (its siblings) | reads + links; writes only when told, through a separate seam call |
| T2 — curated / collaborative | designs · plans · roadmaps, and the operator-directives the agent follows (voice, conventions, preferences) | the Agent/ root — our shared drive | writes as needed and reports each change in the digest; revertable |
| T3 — agentm-sole | the agent's own learned memory + insights | Agent/Memory/ | writes, curates, and prunes freely; no notice |
Personal (T1) sits outside agentm's vault_path (which is Agent/), so it is out of reach by default, not by policy alone. The agent writes there only through an explicit, separate storage-seam call that an operator request authorizes — keeping agent-controlled and user-controlled space cleanly apart (the memory-storage-seam design). Autonomous jobs work mostly in T3; when a job changes T2, that change lands in the digest for the operator to see and revert.
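Because the tiers are places in the vault, a path alone determines the tier. The sketch below shows that containment check, assuming purely lexical POSIX paths (no symlink or `..` resolution); `tier_of` and `may_write` are hypothetical helpers, not functions from the codebase.

```python
from pathlib import PurePosixPath


def tier_of(path: str, vault_path: str) -> str:
    """Classify a path as T1, T2, or T3 by where it sits relative to vault_path."""
    target = PurePosixPath(path)
    vault = PurePosixPath(vault_path)
    memory = vault / "Memory"
    if target == memory or memory in target.parents:
        return "T3"  # agentm-sole: Agent/Memory/
    if target == vault or vault in target.parents:
        return "T2"  # curated: anywhere else under Agent/
    return "T1"  # personal: outside vault_path entirely


def may_write(path: str, vault_path: str, operator_authorized: bool = False) -> bool:
    """T2/T3 are writable on the normal path; T1 needs an operator request."""
    return tier_of(path, vault_path) != "T1" or operator_authorized


VAULT = "/home/op/Obsidian/Agent"  # example location, assumed for the demo
t3 = tier_of("/home/op/Obsidian/Agent/Memory/fix/x.md", VAULT)
t2 = tier_of("/home/op/Obsidian/Agent/projects/p/roadmap.md", VAULT)
t1 = tier_of("/home/op/Obsidian/Home/note.md", VAULT)
```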
The diagram shows the nested ownership tiers (containment = autonomy) below the durability line; kind is the orthogonal third axis, carried in each entry's frontmatter.
The structure is opinionated — the agent controls most of it, so the design fixes the shape:
.../Obsidian/ T1 personal — the operator's vault; agentm does not touch it
├── Church/ · Home/ · Tech/ · … the operator's own notes
└── Agent/ T2 curated — agentm's vault_path; the shared drive
├── projects/<project>/ designs · plans (_harness) · roadmap · progress
├── _always-load/ operator-directives the agent follows (voice, conventions, non-negotiables)
├── preferences/ · feedback/ more operator-directives
├── _archive/ retired curated content (cold; recall skips it)
└── Memory/ T3 agentm-sole — the agent's own memory
├── _always-load/ the heat-promoted floor (flat — kind is not a subfolder here)
├── <kind>/<slug>.md the leaf: kind is the subfolder (insight/, domain-reference/, crystallized/, …)
├── _inbox/ raw reflection candidates (recall-excluded by default)
├── _idea-incubator/ agent-incubated ideas
├── _index.md a generated map of contents, rebuilt from frontmatter (never hand-kept)
├── _archive/ the cold zone (recall skips it)
└── _meta/ device-local sidecars — vector index, heat, embedding queue; derived, never synced as truth
T2 (curated) shape. Curated content lives under projects/<project>/ — each project carrying designs/, plans in _harness/, a roadmap, and a progress log — or in a named directive space at the Agent/ root. No loose notes at the root. The operator-directives the agent follows (voice, conventions, preferences, feedback) are curated: the operator owns them, the agent refines them and reports the change.
T3 (Memory) leaves reuse the live entry convention — <kind>/<slug>.md with the locked frontmatter (below); kind is the subfolder, because kind drives recall. The reserved leading-underscore folders (_always-load, _inbox, _archive, _idea-incubator, _meta) are recall-aware. Two consolidation kinds are designed-for: crystallized/ (the phase-close digest) and procedural/ (distilled how-to) — see How it grows.
Archive at every tier. Each tier carries a cold _archive/ for retired content. Recall skips it by default, so the record stays cheap to load as it grows; it opens only on deep research, an explicit ask, or granted permission. Nothing is hard-deleted — "prune" means move to _archive/ (markdown stays the source of truth; the revert-log is the undo). The per-tier archive policy follows the autonomy gradient: T1 archives only on an operator-confirmed proposal; T2 archives a curated artifact when its successor supersedes it; T3 the agent archives its own cold entries on its own. Capture describes the decay pipeline that feeds the archive.
A memory is one atomic entry: a markdown note with a locked frontmatter block, stored under its kind's group. The frontmatter is what makes it addressable and curatable:
- `kind` — the classification above (routes storage, scopes recall).
- `status` — `active` / `superseded` — recall filters superseded entries out.
- `always_load` — the boot-injection gate: an entry with `always_load: true` is assembled into context at session start; the heat policy flips this field as the entry heats up or cools (the Experience design).
- `supersedes` — a back-link to the entry this one replaces (the supersession chain — see Capture).
- plus `created` / `updated` / `tags` / `slug`.
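To make the entry contract concrete, here is a minimal reader for such a note. It assumes the frontmatter is delimited by `---` lines and holds flat `key: value` pairs, which is the common Obsidian convention but is not spelled out in this design; `parse_entry`, `is_recallable`, and `is_always_load` are hypothetical helpers.

```python
def parse_entry(text: str) -> tuple:
    """Split a note into (frontmatter dict, body); values stay as raw strings."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    meta = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            return meta, "\n".join(lines[i + 1:])
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return {}, text  # unterminated block: treat the whole note as body


def is_recallable(meta: dict) -> bool:
    """Recall filters superseded entries out."""
    return meta.get("status") != "superseded"


def is_always_load(meta: dict) -> bool:
    """The boot-injection gate."""
    return meta.get("always_load") == "true"


NOTE = """---
kind: fix
status: active
always_load: true
slug: example-fix
---
Why it mattered here, and how to apply it.
"""

meta, body = parse_entry(NOTE)
```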
The content rule is engagement, not encyclopedia: an entry captures why it mattered here and how to apply it, not a generic description. One note is one fragment — small, indexable on its own — and densely linked to its neighbors. Beyond the typed supersedes: back-link, a note cross-references related entries with Obsidian [[wikilinks]], the native interconnection substrate; the engine keeps those links sound (link-integrity discovery across the corpus, notes_link_discovery.py). Linking is first-class: a well-connected fragment is what lets the vector index, and later the V6 knowledge-graph, retrieve over relationships, not just text.
Everything agentm remembers reaches disk through a single storage port — the seam. The discipline is load-bearing: there is exactly one way to storage, and every caller goes through it. Read bottom-to-top — three layers, each with one job:
- The storage seam — the one port. The only layer that talks to a concrete store. `device-local` and `obsidian-vault` are interchangeable adapters; more can be added by implementing the same contract. Nothing above knows whether bytes land on the local filesystem or a synced vault — that ignorance is the point of a port.
- The memory engine — the one set of verbs. `save` · `recall` · `forget` + the cross-cutting logic, every verb reaching the store through the seam.
- Inbound adapters — the ways in. In-process (always present, a plain library call, zero daemon — the simple local case) and the MCP server (opt-in, a thin transport shim forwarding to the same engine).
The MCP server belongs above the seam, as a client of it — never underneath (that would force a daemon onto the simple local case and invert the layering; V5-0 put the mutex + content-hash CAS into the storage primitives precisely so the system needs no daemon). The seam, selector, device-local backend, and MCP shim are genericizable → agentm substrate; the obsidian-vault backend implements the contract one-way up → a crickets backing plugin.
The MCP server carries five load-bearing design choices:
- Singleton streamable-HTTP — one daemon, many sessions (`Mcp-Session-Id`), collapsing the host fan-out to a single writer alongside the CLI. Not stdio: stdio spawns one server per client, giving N OS processes on `vault_mutex` — safe but not single-writer.
- Four snake_case tools: `memory_search` · `memory_recall` · `memory_append` · `memory_forget`. Not dot-names: OpenAI-family MCP hosts reject them.
- Soft-delete: `memory_forget` flips `status → deleted` + stamps `deleted_at`; the file is never unlinked. Not hard-delete: GDrive sync resurrects a hard-deleted file from propagation cache.
- Loopback-first (`127.0.0.1` / Unix socket): the remote tier (cross-device via OAuth 2.1 tunnel) is a deferred v1.1 addition.
- Built on FastMCP, pinned `>=3,<4` — with the official MCP SDK as a named fallback. Not unpinned: a FastMCP major can move the transport surface under the server.
As-built vs. target. Today harness state routes through the seam, but memory entries and the MCP server still write the vault directly — reaching around the port. Routing save/recall/forget through the engine→seam and re-platforming the MCP tools is V5-14 (see References).
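The soft-delete choice among the MCP design calls above can be sketched at the file level. This is an illustration only: it assumes `---`-delimited frontmatter with an existing `status:` line, uses a plain rewrite rather than the seam's write protocol, and `soft_delete` is a hypothetical name rather than the body of `memory_forget`.

```python
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def soft_delete(path: Path, now: datetime) -> None:
    """Flip status to deleted and stamp deleted_at; the file is never unlinked."""
    out = []
    in_frontmatter = False
    closed = False
    for line in path.read_text(encoding="utf-8").splitlines():
        if line.strip() == "---" and not closed:
            if in_frontmatter:
                # Closing delimiter: add the stamp just before it.
                out.append(f"deleted_at: {now.isoformat()}")
                closed = True
            in_frontmatter = not in_frontmatter
            out.append(line)
        elif in_frontmatter and line.startswith("status:"):
            out.append("status: deleted")
        elif in_frontmatter and line.startswith("deleted_at:"):
            continue  # drop an older stamp so a repeat call stays well-formed
        else:
            out.append(line)
    path.write_text("\n".join(out) + "\n", encoding="utf-8")


def demo() -> str:
    with tempfile.TemporaryDirectory() as tmp:
        entry = Path(tmp) / "example.md"
        entry.write_text(
            "---\nkind: fix\nstatus: active\n---\nBody text.\n", encoding="utf-8"
        )
        soft_delete(entry, datetime(2026, 1, 1, tzinfo=timezone.utc))
        assert entry.exists()  # still on disk, so sync has nothing to resurrect
        return entry.read_text(encoding="utf-8")


result = demo()
```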
The goal is concurrency-safe writes: when two sessions save to the same synced backend at once, both land and neither corrupts the other. A memory reaches disk through the V5-0 write protocol — a single atomic_write (temp → fsync → rename), guarded by an advisory per-backend mutex (a mkdir lock with an mtime heartbeat + stale-takeover, living outside the synced store) and, for replace-style files, a content-hash compare-and-swap that re-reads inside the lock before committing. Coordination lives in these primitives, not a daemon. (Honest boundary: this is grounded at the seam — the backend.write body that composes these for the vault lives in the obsidian-vault plugin.)
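The three primitives compose roughly as follows. This is a simplified sketch, not the contents of `vault_lock.py`: the signatures are assumptions, SHA-256 is an assumed hash, the mtime heartbeat and stale-takeover are omitted, and `dir_mutex`, `cas_write`, and `StaleWrite` are hypothetical names.

```python
import hashlib
import os
import tempfile
from contextlib import contextmanager
from pathlib import Path


def content_hash(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def atomic_write(path: Path, data: bytes) -> None:
    """temp -> fsync -> rename, so a reader never sees a partial file."""
    fd, tmp = tempfile.mkstemp(dir=path.parent, prefix=path.name + ".")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp, path)
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise


@contextmanager
def dir_mutex(lock_dir: Path):
    """Advisory lock: mkdir is atomic, so exactly one caller creates the dir."""
    os.mkdir(lock_dir)  # raises FileExistsError while another holder exists
    try:
        yield
    finally:
        os.rmdir(lock_dir)


class StaleWrite(RuntimeError):
    """The file changed since the caller last read it."""


def cas_write(path: Path, lock_dir: Path, expected_hash: str, data: bytes) -> None:
    """Compare-and-swap: re-read inside the lock, commit only if unchanged."""
    with dir_mutex(lock_dir):
        current = path.read_bytes() if path.exists() else b""
        if content_hash(current) != expected_hash:
            raise StaleWrite(f"{path.name} changed since it was read")
        atomic_write(path, data)


def demo() -> tuple:
    # The lock lives in a separate directory, mirroring "outside the synced store".
    with tempfile.TemporaryDirectory() as store, tempfile.TemporaryDirectory() as locks:
        entry = Path(store) / "entry.md"
        lock = Path(locks) / "entry.lock"
        cas_write(entry, lock, content_hash(b""), b"v1")
        try:
            cas_write(entry, lock, content_hash(b""), b"v2")  # stale expectation
            rejected = False
        except StaleWrite:
            rejected = True
        return entry.read_bytes(), rejected, lock.exists()


outcome = demo()
```

The second write is rejected because its expected hash no longer matches what is on disk, which is how a lost update is avoided without a coordinating daemon.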
Supersession is archive-then-replace, not overwrite. A new entry that evolves an old one flips the old to status: superseded and sets a supersedes: back-link from the new — so the history is an auditable link-chain, not a lost update or a numeric confidence score.
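The link-chain can be sketched over plain frontmatter dictionaries. This assumes `supersedes` stores the older entry's slug (the design does not say whether it is a slug or a wikilink); `supersede` and `chain` are hypothetical helpers.

```python
def supersede(old: dict, new: dict) -> tuple:
    """Retire the old entry and point the new one back at it; nothing is overwritten."""
    retired = {**old, "status": "superseded"}
    replacement = {**new, "status": "active", "supersedes": old["slug"]}
    return retired, replacement


def chain(slug: str, entries: dict) -> list:
    """Walk the supersedes back-links from the newest entry to the oldest."""
    history, seen = [], set()
    while slug is not None and slug not in seen:  # the seen set guards against cycles
        seen.add(slug)
        history.append(slug)
        slug = entries.get(slug, {}).get("supersedes")
    return history


v1 = {"slug": "v1", "status": "active"}
v1, v2 = supersede(v1, {"slug": "v2"})
v2, v3 = supersede(v2, {"slug": "v3"})
entries = {entry["slug"]: entry for entry in (v1, v2, v3)}
history = chain("v3", entries)
```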
Archive, decay, and prune. Supersession is one road into the archive; the other is decay — an entry that goes cold over time is retired to its tier's _archive/ rather than left to bloat recall. Two arms run today: heat curation demotes a cold always-load entry to its group root, and /memory evolve archives a superseded entry. The fuller lifecycle — access-reinforced decay, consolidation tiers (episodic → semantic → procedural), the phase-close crystallization digest, and a whole-corpus dreaming pass that compacts supersession chains — is designed-for, framed in V6/V7. Two prerequisites gate any autonomous archive: the revert-log plus a derived-from provenance edge (so undoing a consolidated entry also undoes what was derived from it), and a staging gate — an autonomous job proposes archival to a staging inbox and the operator confirms via the digest before anything goes cold (_dream-staging/, the pre-approval inbox, is a separate place from _archive/, the cold store). Prune resolves to archive; there is no hard delete.
Writing T1 (personal) takes a separate, explicit call. The normal write path reaches T2/T3 inside Agent/. Personal content sits outside vault_path, so writing it goes through a distinct seam call that the operator's request authorizes — the line between agent-controlled and user-controlled space (the memory-storage-seam design).
Memory is injected by two hooks. At session start, the always-load set (entries gated by always_load: true) is assembled under a hard ~500 ms budget. On every prompt, a five-step engine runs under ~300 ms:
- tokenize the prompt;
- embed it (an API embedder, degrading to a local model, then a deterministic stub) and search the local sqlite-vec index for the top-K by cosine similarity;
- keyword-grep (filtering `status: superseded`);
- merge (semantic-weighted: ~0.85 similarity + ~0.05 keyword);
- dedup against always-load, and return the top few.
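The last two steps above (merge, then dedup) might be sketched as below. How the two weights combine is an assumption: this version treats the keyword weight as a flat bonus added to the weighted similarity. `merge_and_dedup` is a hypothetical name, not a function from `recall.py`.

```python
def merge_and_dedup(
    semantic: dict,
    keyword_hits: set,
    always_load: set,
    top_n: int = 3,
    w_sim: float = 0.85,
    w_kw: float = 0.05,
) -> list:
    """Blend vector similarity with keyword hits, drop boot-loaded entries, rank."""
    scores = {slug: w_sim * similarity for slug, similarity in semantic.items()}
    for slug in keyword_hits:
        scores[slug] = scores.get(slug, 0.0) + w_kw
    ranked = sorted(
        (slug for slug in scores if slug not in always_load),
        key=lambda slug: (-scores[slug], slug),  # slug breaks ties deterministically
    )
    return ranked[:top_n]


top = merge_and_dedup(
    semantic={"a": 0.9, "b": 0.8, "c": 0.2},
    keyword_hits={"b", "d"},
    always_load={"a"},  # already injected at session start
    top_n=2,
)
```

Entry `a` scores highest but is skipped because the session-start hook already loaded it, so the per-prompt budget is not spent twice on the same note.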
Recall is token-budgeted and scoped by kind + phase — a phase pulls the kinds its budget allows, not a flat top-K over the whole corpus. The vector index is device-local and never synced (the local-index tier), with mtime-vs-indexed-at drift detection that falls back to grep when an entry changed since it was embedded. Recall skips every tier's _archive/ by default — the cold store stays out of the always-load floor and the per-prompt search, so a growing record does not inflate what each call pays. An --include-archive opt-in (mirroring --include-inbox) widens the walk over the archive for deep research, an explicit ask, or a granted request; the vector index still covers archived entries, so an opened search ranks them on the same budget.
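The drift check and the archive skip can be sketched together. The data shapes here are assumptions (a path-to-mtime map and a path-to-indexed-at map), and `is_stale` and `plan_search` are hypothetical names.

```python
from pathlib import PurePosixPath
from typing import Optional


def is_stale(entry_mtime: float, indexed_at: Optional[float]) -> bool:
    """An entry drifted if it was never indexed or changed after it was embedded."""
    return indexed_at is None or entry_mtime > indexed_at


def plan_search(mtimes: dict, indexed_at: dict, include_archive: bool = False) -> tuple:
    """Split entries into (vector-searchable, grep-fallback); skip _archive by default."""
    vector, grep = [], []
    for path, mtime in mtimes.items():
        if not include_archive and "_archive" in PurePosixPath(path).parts:
            continue
        if is_stale(mtime, indexed_at.get(path)):
            grep.append(path)  # the embedding no longer matches the text
        else:
            vector.append(path)
    return sorted(vector), sorted(grep)


mtimes = {
    "Memory/fix/a.md": 100.0,
    "Memory/fix/b.md": 200.0,  # edited after it was embedded
    "Memory/_archive/c.md": 50.0,
}
indexed = {
    "Memory/fix/a.md": 150.0,
    "Memory/fix/b.md": 150.0,
    "Memory/_archive/c.md": 60.0,
}

default_plan = plan_search(mtimes, indexed)
deep_plan = plan_search(mtimes, indexed, include_archive=True)
```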
The record is built to compound. Three things turn a growing pile of entries into a navigable knowledge base:
- Learning populates it. Reflection mines durable entries from finished sessions, and forward learning + deep research reach approved sources and the web for any task and bring back what's worth keeping — typed and filed like anything else (the Experience design). The more agentm works and reaches outward, the larger and richer the record. This is the growth engine.
- It's interconnected by wikilinks — built. Every entry cross-references its neighbors with `[[wikilinks]]`, and the engine maintains link integrity. The links are the connective tissue: the substrate everything else builds on.
- It becomes navigable by an index over the markdown — built → designed-for. Today a device-local vector index makes the corpus semantically searchable. The designed-for V6 layer extends this without moving the source of truth off markdown: a knowledge-graph (V6-2) extracts typed edges over the wikilinks (deterministic, no LLM) so relationships become a retrieval path — including multi-hop traversal; a SQLite metadata table, chunking, and RRF hybrid retrieval (V6-3/10/11) sharpen recall as the corpus grows past the hundreds. Graph and index are layers over the pages — markdown stays the source of truth; the graph is for navigation and discovery.
That is the trajectory toward a true knowledge base: a typed, densely-linked, indexed record that accumulates and compounds as agentm learns — the brain the other three pillars stand on. The typed entries, the wikilink substrate, and the vector index are built today; the knowledge-graph and the richer index are designed-for, framed in V6 (Risks).
- The seam design — memory-storage-seam holds the deep storage contract (A2-fold, launched 2026-06-24); this pillar points down to it.
- Experience grows the sole-owned space — reflection, heat curation, dreaming write here (Experience design); the heat policy owns the `always_load` gate.
- Opinions keep their learned half here — an opinion's vault supplement is a memory entry (Opinions design).
- Personas draw their state here — Memory is the pseudo-persona beneath all (Personas design).
- crickets backs the store — `obsidian-vault` implements the `StorageBackend` contract, one-way up.
- The runner + the digest — scheduled jobs (the job-runner design, designed-for) write T2/T3 on their own and surface T2 changes to the operator through the reporting capability's digest; both route by the ownership tiers above.
- V6 indexed-recall is designed-for, not built. The recall loop is the pre-V6 form (semantic-weighted merge + grep fallback); the reserved `DerivedMaintenance` extension point is where V6 adds the knowledge-graph layer (typed edges over the wikilinks; multi-hop traversal), hybrid retrieval (RRF over BM25 + vector + graph), a SQLite metadata table, consolidation tiers, and chunking. Graph and index stay layers over the markdown (the pages remain source of truth). Framed designed-for; the spec lives in the V6 plan + the seam design.
- Kind-scoped recall — validate as-built. The `kind` classification and the entry frontmatter are live; confirm the current `recall.py` actually enforces the kind/phase recall budget vs a flat top-K (a check at review).
- The decay/archive lifecycle is mostly designed-for. Heat curation + supersession-archive ship; access-reinforced decay, consolidation tiers, crystallization, self-healing lint, and the dreaming compaction pass are V6/V7, gated on the revert-log + the `derived-from` provenance edge and the staging inbox. Until those land, archival stays manual (`/memory evolve`) or operator-confirmed. The migration to the three-tier folder layout is operator-gated and updates the hardwired `personal/_always-load` constants in `recall.py` / `heat_policy.py` / `save.py`.
- The seam content has migrated — `memory-storage-seam` is launched (A2 ADR-fold, 2026-06-24); this pillar holds the pointer down to it.
- The kernel `storage_vault.py` was deleted — removed in V5-3 (commit d95468b); the vault backend now lives only in the `obsidian-vault` plugin's `storage_vault.py`.
- Re-audit triggers: confirm kind-scoped recall.
- `scripts/harness_memory.py` — `vault_path()` resolver, config readers, `resolve_project()`, backend-aware `*_state_file`
- `scripts/storage_seam.py` + `scripts/backend_selection.py` — the seam contract + selector (deep detail in the seam design)
- `scripts/capability_resolver.py` — capability-availability (the `enhances:` runtime half)
- `scripts/vault_lock.py` — `atomic_write` · `content_hash` (CAS) · `vault_mutex` (the V5-0 primitives)
- `harness/skills/memory/scripts/` — `recall.py` (5-step engine), `vec_index.py` (sqlite-vec index), `save.py` / `evolve.py` (write + supersession), `notes_link_discovery.py` (wikilink integrity)
- `harness/hooks/` — `memory-recall-session-start`, `memory-recall-prompt-submit`
- memory-storage-seam design — the concurrent-write protocol, seam fail-loud selection, V5-3 cutover, routing plane, backend-aware harness state
- V5-14 — storage-convergence (memory-entry seam adoption + MCP re-platform); ROADMAP-MASTER ⑤
2026-06-26 — codified the three ownership tiers + the vault folder layout, the per-tier archive, and the decay pipeline (operator design pass). Reworked §"By durability + ownership" into three explicit tiers by ascending autonomy — T1 personal (the operator's vault above Agent/, written only through a separate explicit seam call), T2 curated/collaborative (the Agent/ shared drive: designs · plans · roadmaps + the operator-directives; autonomous writes reported in the digest, revertable), T3 agentm-sole (Agent/Memory/, fully autonomous) — reconciling the prior "co-owned"/"user-owned"/"agentm-sole" band labels. Added the opinionated vault layout (the T2 projects/<project>/ shape + the prescribed Memory/ internals: <kind>/<slug>.md leaves, the reserved folders, a generated _index.md, the crystallized/procedural consolidation kinds), an _archive/ at every tier with recall skipping it by default (the --include-archive opt-in), and the archive/decay/prune lifecycle (heat curation + supersession ship; the fuller decay → consolidation → crystallization → dreaming-compaction arc is designed-for, gated on the revert-log + provenance edge + a staging inbox; prune resolves to archive, never hard delete). Why not approve-before for curated content: the operator co-owns it, so autonomous-write-with-revert + a digest report is the lighter, correct gate; a non-revertable curated change is the one case that would warrant asking, accepted as a known limitation until agent-memory is git-backed (backlogged). Grounded in the 2026-06 research (R02 lifecycle/consolidation · R06 token-efficiency · R09 content-shapes) + the live save.py/recall.py/evolve.py/heat_policy.py. 
Re-audit triggers: flip the decay/consolidation arms to as-built as V6/V7 land; run the operator-gated vault migration (relocate today's Agent/personal/* learned-memory → Memory/, directives → T2, updating the hardwired personal/_always-load constants); confirm the new reporting capability's digest is the T2-change report surface.
2026-06-24 — folded ADR 0017 (MCP server) into §“How storage is served” (AG ADR-migration tail). Five DC calls, all surfaced into the body: singleton streamable-HTTP + broker property (DC-1); 4 snake_case tools memory_search · memory_recall · memory_append · memory_forget (DC-2, dot-names break OpenAI-family hosts); soft-delete status → deleted + deleted_at never unlinks (DC-3, not hard-delete: GDrive sync resurrects from propagation cache); loopback-first / deferred remote tier via OAuth 2.1 (DC-4); FastMCP >=3,<4 primary / official SDK named fallback (DC-5). Re-audit triggers: MCP spec deprecates streamable-HTTP; spec mandates dot-names; next FastMCP major; hard-delete obligation arises; homelab posture changes.
2026-06-21 — authored, reviewed, and finalized.
Migrated from the agentm HLD, deepened against the live code, and conformed to the abbreviated-design template (Objective / Overview / Design / Dependencies / Risks) with all three diagrams. Documents the Memory pillar: a substrate where everything points inward (engine → resolution → seam), classified on three axes — kind (routes storage + scopes recall), durability, and ownership — with each memory an atomic frontmatter-keyed entry (kind / status / always_load / supersedes; engagement-not-encyclopedia; one-note-one-fragment). Capture is concurrency-safe and archive-then-replace (a supersession link-chain); recall is the kind-scoped, token-budgeted five-step loop over a device-local, never-synced index.
Operator-approved after restoring the content-first axis (the kind taxonomy + the entry contract) the storage-first reframe had dropped. Content-final; status: launched (lifted into tracked wiki/designs/ 2026-06-24, AG Phase 3). Designed-for, not built: the V6 indexed-recall work (hybrid RRF · knowledge-graph · consolidation tiers · chunking) under the DerivedMaintenance reservation, and V5-14 storage-convergence (memory entries + the MCP server still reach around the seam today). Re-audit triggers: flip the V5-14 as-built flags when convergence lands; migrate the seam prose into memory-storage-seam at the A2 fold; confirm recall.py enforces the kind/phase recall budget (vs flat top-K).
2026-06-24 — folded ADR 0007 into this design (AG Phase 4, move-and-retire).
0007 — Auto-context into harness phases (2026-05-22; amended 2026-05-27 / 2026-05-28 / 2026-05-31). harness_memory.py injects recalled memory into each phase at session start. Five design calls: (Q1) per-phase recall budgets to prevent context bloat; (Q2) three-tier slug detection for kind / status / tags; (Q3) graceful-skip if the vault is unavailable; (Q4) confidence-modulated ask before injecting low-confidence entries; (Q5) dual-trigger progress.md (phase-end + explicit save). Amended 2026-05-27: memory files moved to agentm/harness/skills/memory/. Amended 2026-05-28: documenter recall phase added. Amended 2026-05-31: SessionStart hook vault-path resolution order — $MEMORY_VAULT_PATH env var → .agentm-config.json::vault_path → no vault (graceful skip). Why not always-load all entries: per-phase budgets prevent context bloat; graceful skip ensures vault unavailability never breaks a phase. Why not single-trigger progress.md: dual-trigger ensures progress is written even if the phase crashes mid-run. Re-audit trigger: when V6 indexed-recall lands, re-examine whether per-phase budgets are still the right token-control knob; if dual-trigger causes duplicate entries, consolidate to a single write-and-flush.
2026-06-21 — reopened (living-design amendment): backlinking made first-class + the interconnection/brain trajectory named (operator). The design implied the "grows into a knowledge base" direction but never stated it. Made Obsidian [[wikilinks]] first-class in the entry contract (link-integrity discovery, notes_link_discovery.py — built), added a How it grows design subsection tying the growth engine (forward learning + deep research → typed entries), the wikilink substrate (built), and the index-over-markdown (vector built → V6 knowledge-graph + metadata table + chunking + RRF designed-for) into one trajectory, and stated the trajectory in the Objective. Why not leave it implicit: the operator expects the store to grow into a true brain of knowledge, and the design should say so and name the interconnection path, not leave it to inference. The V6 graph/index stay framed designed-for (no source-attribution; markdown remains source of truth). Re-audit trigger: when V6-2 lands, flip the knowledge-graph from designed-for to as-built here.