memory storage seam
---
title: Memory↔Storage Seam (agentm)
status: launched
seeded: 2026-06-13
approved: 2026-06-21
kind: design
scope: arc
area: agentm/storage
parent: agentm-hld.md
governs:
  - scripts/storage_seam.py
  - scripts/backend_selection.py
  - scripts/harness_memory.py
  - scripts/repo_registry.py
  - scripts/storage_device_local.py
  - scripts/vault_lock.py
  - scripts/vault_backend_stub.py
  - scripts/storage_conformance.py
---
This is the living storage-seam design for agentm. It absorbs six predecessor ADRs retired by the AG Phase 2 fold: ADR 0009 (on-host state-mode config), ADR 0012 (vault-write protocol), ADR 0013 (seam and fail-loud selection), ADR 0018 (V5-3 storage cutover, DC-1 reversed by ADR 0020), ADR 0019 (V5-6 routing-plane de-vaulting), and ADR 0020 (backend-aware harness state, the current truth for state routing). Decision history — rationale, why-not-the-alternative, re-audit triggers — is preserved in the Amendment log at the bottom. (AG Phase 2, 2026-06-21; see Design governance.)
The storage seam is agentm's boundary between the memory engine and its backing store. It gives the engine a small, stable set of verbs (resolve / read / write / list / exists / info / mkdir) operating on opaque Locator handles so the backing store is swappable — the engine never holds a pathlib.Path into the store. Selection picks the configured backend on startup; fail-loud refusal prevents silent fall-back to device-local if the configured backend is unavailable.
| Layer | What shipped | Module |
|---|---|---|
| Seam interface | Verbs + `Locator` / `Capabilities` / `Info` + `BackendRegistry` | `storage_seam.py` |
| Device-local backend | `~/.agentm/memory/` markdown root; bytes-mode atomic writes | `storage_device_local.py` |
| Vault backend (plugin) | crickets obsidian-vault plugin, loaded on demand; kernel built-in deleted (V5-3, 2026-06-21) | `crickets/src/obsidian-vault/scripts/storage_vault.py` |
| Vault write protocol | `vault_mutex` + content-hash CAS + atomic fsync→rename; vendored to /memory skill | `vault_lock.py` |
| Backend selection | 3-step chain: `storage.backend` config → `$MEMORY_VAULT_PATH` env → `device-local`; fail-loud on misconfiguration; capability-request matching (`required=`) | `backend_selection.py` |
| Routing plane | `resolve_project` returns `{slug, project_locator, backend, …}` in Locators; `repo_registry` rides the active backend; `state_mode: vault` → `backend` read-alias | `harness_memory.py`, `repo_registry.py` |
| Harness state I/O | Backend-aware: synced backend (`capabilities.sync=True`) → vault `_harness/`; else device-local `<project>/.harness/`. `.project-mode=local` opt-out wins over a synced backend | `harness_memory.py` |
| Gate enforcement | No-Path-leak (AST, return-type scan); no routing-import of capability plugins (LC-8); routing conformance suite | `check-storage-seam-no-path-leak.py`, `check-process-seam-import-direction.sh`, `storage_conformance.py` |
scripts/storage_seam.py defines the seven seam verbs. They operate on, and return, the seam's own Locator type — an opaque, backend-relative key with only namespace operations (child, name, parts) — never a pathlib.Path. All I/O goes back through the verbs; the engine holds no filesystem assumption.
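A minimal sketch of what an opaque, namespace-only handle could look like. The `child`, `name`, and `parts` operations come from the description above; the dataclass layout and the segment validation are assumptions made for illustration, not the contents of `storage_seam.py`.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Locator:
    """Opaque, backend-relative key: namespace operations only, no I/O."""

    parts: Tuple[str, ...] = ()

    def __post_init__(self) -> None:
        for part in self.parts:
            # Assumed rule: reject segments that carry filesystem semantics.
            if not part or part in (".", "..") or "/" in part or "\\" in part:
                raise ValueError(f"invalid locator segment: {part!r}")

    @property
    def name(self) -> str:
        """Last segment, or the empty string for the store root."""
        return self.parts[-1] if self.parts else ""

    def child(self, *segments: str) -> "Locator":
        """Return a new Locator one or more segments deeper."""
        return Locator(self.parts + tuple(segments))


project = Locator(("projects",)).child("demo-slug")
plan = project.child("_harness", "PLAN.md")
```

Because the handle exposes no `open`, `stat`, or string-path conversion, a caller holding `plan` can only hand it back to a backend verb.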
The no-Path-leak rule is gate-enforced: check-storage-seam-no-path-leak.py parses every scripts/storage_*.py file and flags any seam verb whose return annotation is a path type (however nested). A Path in a backend's body is fine; handing one back across the seam is the violation, and the return annotation is precisely where that surfaces.
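The return-annotation scan can be sketched with the standard `ast` module. This is an illustrative approximation, not the actual `check-storage-seam-no-path-leak.py`: the set of path type names and the function names `annotation_mentions_path` and `find_path_leaks` are assumptions.

```python
import ast

SEAM_VERBS = {"resolve", "read", "write", "list", "exists", "info", "mkdir"}
PATH_TYPE_NAMES = {"Path", "PurePath", "PosixPath", "WindowsPath"}  # assumed set


def annotation_mentions_path(node: ast.AST) -> bool:
    """True if a path type appears anywhere inside the annotation tree."""
    for sub in ast.walk(node):
        if isinstance(sub, ast.Name) and sub.id in PATH_TYPE_NAMES:
            return True
        if isinstance(sub, ast.Attribute) and sub.attr in PATH_TYPE_NAMES:
            return True
        # String annotations such as "Optional[Path]" are parsed and rescanned.
        if isinstance(sub, ast.Constant) and isinstance(sub.value, str):
            try:
                inner = ast.parse(sub.value, mode="eval")
            except SyntaxError:
                continue
            if annotation_mentions_path(inner):
                return True
    return False


def find_path_leaks(source: str) -> list:
    """Return (verb name, line number) for each seam verb returning a path type."""
    leaks = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            if node.name in SEAM_VERBS and node.returns is not None:
                if annotation_mentions_path(node.returns):
                    leaks.append((node.name, node.lineno))
    return leaks


SAMPLE = '''
from pathlib import Path
from typing import Optional

class Backend:
    def resolve(self, *parts: str) -> Optional[Path]:
        return Path(*parts)

    def read(self, locator) -> str:
        body_path = Path("fine-inside-the-body")
        return body_path.name
'''

leaks = find_path_leaks(SAMPLE)
```

In the sample, `resolve` is flagged because `Path` is nested inside its return annotation, while `read` passes even though it uses a `Path` in its body.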
Why opaque Locator, not Path: if a verb returned a Path, the leak would be silent and total — downstream code would treat it as a real handle and swapping backends would quietly stop working. The vocabulary mirrors fsspec's method names and named-protocol registry, but imports nothing external. Bare markdown is the floor.
Selection (scripts/backend_selection.py) maps the on-device install config to a concrete, registered backend via a three-step chain:
1. Explicit `storage.backend` value in `.agentm-config.json` → that protocol name.
2. `$MEMORY_VAULT_PATH` env var set (non-empty) → `vault` (per-session escape hatch).
3. Else → `device-local` (fresh-install default).
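The chain above can be sketched as a pure function over a config mapping and an environment mapping. The nested `{"storage": {"backend": ...}}` shape and the name `choose_protocol` are assumptions for illustration; `StorageSelectionError` here is a local stand-in, not the class from `backend_selection.py`.

```python
class StorageSelectionError(Exception):
    """Stand-in for the seam's fail-loud selection error."""


def choose_protocol(config: dict, env: dict) -> str:
    """Walk the three-step chain and return a protocol name."""
    storage = config.get("storage")
    backend = storage.get("backend") if isinstance(storage, dict) else None
    if backend is not None:
        if not isinstance(backend, str):
            # Present-but-not-a-string is a misconfiguration, not "unset".
            raise StorageSelectionError("storage.backend must be a string")
        return backend                   # step 1: explicit config wins
    if env.get("MEMORY_VAULT_PATH"):     # step 2: set and non-empty
        return "vault"
    return "device-local"                # step 3: fresh-install default
```

Passing the environment in as an argument, rather than reading `os.environ` inside the function, keeps the chain easy to exercise in tests.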
When the configured backend cannot be produced, `select_backend` raises `StorageSelectionError` and never falls back silently to device-local. Four refusal cases:
- The named backend's plugin is unregistered (`registry.get → None`; vault is excluded since it loads on demand) → `_install_plugin_message` naming the missing backend and the installed alternatives.
- `vault` selected but the obsidian-vault plugin is undiscoverable → `_install_obsidian_vault_message`.
- `vault` selected but no `vault_path` configured to seed it → loud error.
- Install config corrupt or `storage.backend` present-but-not-a-string → loud error.
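A minimal sketch of the first refusal case, assuming a plain `dict` as the registry. The helper name `select_or_refuse` and the message wording are hypothetical; the point is only that a registry miss raises instead of demoting to another backend.

```python
class StorageSelectionError(Exception):
    """Stand-in for the seam's fail-loud selection error."""


def select_or_refuse(protocol: str, registry: dict) -> object:
    """Return the registered backend for `protocol`, or refuse loudly."""
    backend = registry.get(protocol)   # a miss means "absent", decided here only
    if backend is None:
        installed = ", ".join(sorted(registry)) or "none"
        raise StorageSelectionError(
            f"backend {protocol!r} is configured but not installed; "
            f"installed backends: {installed}"
        )
    return backend


registry = {"device-local": object()}
try:
    select_or_refuse("vault", registry)
    refused, message = False, ""
except StorageSelectionError as exc:
    refused, message = True, str(exc)
```

Note that the `except` path never substitutes `registry["device-local"]`; the caller sees the error and the list of installed alternatives.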
Why never demote: a silent fall-back to device-local is the single failure that mis-writes or orphans the vault. A loud refusal forces the operator to fix the install or change the config — always the correct outcome.
Capability-request matching (V5-7): select_backend(required=Capabilities(...)) raises CapabilityMismatchError (a StorageSelectionError subclass) if the resolved backend doesn't satisfy every True flag. required=None (default) is backward-compatible. --requires in _doctor_main is the pre-flight surface, sharing the code path with the runtime guard.
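The "every True flag must be satisfied" rule can be sketched as follows. Only the `sync` flag is named in this design; the `watch` flag, the dataclass layout, and the helper name `require_capabilities` are assumptions for illustration.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass(frozen=True)
class Capabilities:
    sync: bool = False
    watch: bool = False  # hypothetical second flag, for illustration only


class StorageSelectionError(Exception):
    """Stand-in for the seam's fail-loud selection error."""


class CapabilityMismatchError(StorageSelectionError):
    """Raised when a resolved backend lacks a requested capability."""


def require_capabilities(offered: Capabilities,
                         required: Optional[Capabilities] = None) -> None:
    """Raise unless every flag that is True in `required` is True in `offered`."""
    if required is None:
        return  # backward-compatible default: nothing was requested
    missing = [
        f.name for f in fields(required)
        if getattr(required, f.name) is True and getattr(offered, f.name) is not True
    ]
    if missing:
        raise CapabilityMismatchError(
            "backend lacks required capabilities: " + ", ".join(missing)
        )


device_local = Capabilities(sync=False)
require_capabilities(device_local)  # no request: passes
try:
    require_capabilities(device_local, Capabilities(sync=True))
    mismatch = None
except CapabilityMismatchError as exc:
    mismatch = str(exc)
```

Flags left `False` in the request are treated as "don't care", so a backend offering more than was asked for still passes.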
storage_preview mirrors select_backend: the doctor storage check calls storage_preview() — never-raising, never-mutating — which reuses the resolver's code path and error messages, so the preview an operator sees is byte-identical to the runtime refusal.
Import cycle guard (§2b DC-4): backend_selection.py lazy-imports harness_memory only at call time inside resolve_project / select_backend, never at module top-level. This is the one guarded lazy import this design permits; no other cross-seam top-level cycles are allowed.
Vault plugin discovery (post-V5-3): the kernel built-in storage_vault.py was deleted in V5-3 and replaced by the crickets obsidian-vault plugin. _load_vault_plugin_backend discovers the plugin in this order: $OBSIDIAN_VAULT_SCRIPTS override → sibling checkout → plugin-cache path. It then execs scripts/storage_vault.py in the plugin dir and returns the VaultBackend class. The registry slot is always empty at entry (no built-in pre-registers it); the finally block pops whatever the plugin registered, leaving the slot empty on exit.
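The empty-at-entry, empty-at-exit registry slot can be sketched with a `try`/`finally`. This is a simplified illustration under stated assumptions: the registry is a plain `dict`, the plugin is modelled as a callable that registers itself under `"vault"`, and `load_plugin_backend`, `FakeVaultBackend`, and `fake_plugin` are hypothetical names.

```python
class StorageSelectionError(Exception):
    """Stand-in for the seam's fail-loud selection error."""


def load_plugin_backend(registry: dict, run_plugin) -> type:
    """Run the plugin, take the class it registered, and leave the slot empty."""
    try:
        run_plugin(registry)             # the plugin registers itself as "vault"
        backend_cls = registry.get("vault")
        if backend_cls is None:
            raise StorageSelectionError("plugin ran but registered no vault backend")
        return backend_cls
    finally:
        registry.pop("vault", None)      # slot is empty on exit, success or failure


class FakeVaultBackend:
    """Hypothetical plugin class used only for this demonstration."""


def fake_plugin(registry: dict) -> None:
    registry["vault"] = FakeVaultBackend


registry = {}
loaded = load_plugin_backend(registry, fake_plugin)
```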
harness_state_dir, read_state_file, and write_state_file are backend-aware. A single private helper, _state_backend_target(resolution), encodes the routing rule:
- Synced backend active (`resolution["backend"].capabilities.sync is True` + `project_locator` present) → state targets `<vault>/projects/<slug>/_harness/<file>` via the backend verbs.
- Otherwise (no backend, `sync=False`, missing root, or `.project-mode=local` opt-out) → state stays at `<project_root>/.harness/<file>`.
The .project-mode=local opt-out is checked first and wins over a synced backend — a per-repo or per-device opt-out is an explicit operator choice to keep that checkout's state off the shared vault.
Graceful degradation: on a machine with no vault, select_backend resolves device-local (sync=False) or resolve_project returns backend=None — either way _state_backend_target returns None and state falls to <project_root>/.harness/. This degradation is structural (a property of the discriminator), not a special code path.
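The routing rule and its degradation can be sketched as one small function. This is not the real `_state_backend_target`: the locator is modelled as a tuple of segments, the `project_mode` argument stands in for however the `.project-mode` marker is read, and the backend is any object with a `capabilities.sync` attribute.

```python
from types import SimpleNamespace
from typing import Optional, Tuple


def state_backend_target(resolution: dict, project_mode: Optional[str],
                         filename: str) -> Optional[Tuple[str, ...]]:
    """Return locator segments under the synced store, or None for device-local."""
    if project_mode == "local":
        return None  # the opt-out is checked first and wins over a synced backend
    backend = resolution.get("backend")
    locator = resolution.get("project_locator")
    caps = getattr(backend, "capabilities", None)
    if backend is None or locator is None or getattr(caps, "sync", False) is not True:
        return None  # no backend, no locator, or not synced: stay device-local
    return tuple(locator) + ("_harness", filename)


synced = SimpleNamespace(capabilities=SimpleNamespace(sync=True))
local = SimpleNamespace(capabilities=SimpleNamespace(sync=False))
resolution = {"backend": synced, "project_locator": ("projects", "demo")}
target = state_backend_target(resolution, None, "PLAN.md")
```

A backend that cannot answer its capabilities at all falls through the same `None` branch, which is what makes the degradation structural rather than a special case.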
Writes through the seam: write_state_file on a synced backend calls backend.write(locator, content), composing the full V5-0 vault-write protocol: vault_mutex + content-hash CAS + atomic fsync→rename. The device-local branch keeps the lighter atomic_write-only path.
Why route via capabilities.sync, not isinstance(backend, VaultBackend): harness_memory.py must not import the obsidian-vault plugin (the process-seam import-direction gate forbids it). The capabilities.sync flag is the contract; a backend that can't answer its capabilities degrades to device-local.
Session-start hook co-location: harness-context-session-start.sh/.ps1 derives progress.md as a sibling of the resolved PLAN_PATH (taking dirname), so it co-locates with whatever _harness/ the bridge resolved (vault or device-local). This ensures the locked DC-7 4-line output block fires when both files are present at the vault path.
scripts/vault_lock.py is the one canonical write library: vault_mutex (advisory lock-dir outside the vault at ~/.cache/agentm/locks/<sha256(realpath(vault))>/lock; heartbeat-liveness; bounded-block-with-backoff) + content_hash (sha256 CAS currency) + atomic_write (bytes-mode fsync→rename). It is vendored byte-identically to harness/skills/memory/scripts/vault_lock.py, enforced by check-vault-lock-parity.sh.
Key decisions:
- Lock lives outside the vault (never synced by Drive — a lock inside a synced tree is itself synced, which defeats mutual exclusion).
- CAS keys on content hash, not mtime (Drive re-downloads rewrite mtimes, producing false CAS matches; sha256 of bytes is the only reliable currency).
- One global lock per vault, not per-file (ownership-partitioning keeps real contention near zero; per-file locking adds machinery for contention that partitioning already prevents).
- Plain `fsync`, not `F_FULLFSYNC` (the cloud copy is the durability backstop; plain `fsync` before rename keeps each uploaded snapshot internally consistent at the right cost).
- Cross-device mutual exclusion is out of scope (impossible on Drive; out-of-band writers, such as another device or Obsidian itself, are the Phase-1 broker's problem).
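A minimal sketch of the lock-path derivation, the content-hash CAS, and the fsync→rename writer described above. It is not the vendored `vault_lock.py`: the mutex itself (heartbeat, bounded backoff) is omitted, and `lock_dir_for`, `cas_write`, and `StaleWriteError` are hypothetical names. In the real protocol the CAS check would run while holding `vault_mutex`.

```python
import hashlib
import os
import tempfile
from pathlib import Path
from typing import Optional


class StaleWriteError(Exception):
    """Hypothetical error: the file changed since the caller last read it."""


def lock_dir_for(vault: str, cache_root: str = "~/.cache/agentm/locks") -> Path:
    """Lock path outside the vault, keyed by sha256 of the vault's real path."""
    digest = hashlib.sha256(os.path.realpath(vault).encode("utf-8")).hexdigest()
    return Path(cache_root).expanduser() / digest / "lock"


def content_hash(data: bytes) -> str:
    """sha256 of the bytes: the CAS currency, independent of mtime."""
    return hashlib.sha256(data).hexdigest()


def atomic_write(target: Path, data: bytes) -> None:
    """Write to a temp file in the same directory, fsync, then rename over target."""
    fd, tmp_name = tempfile.mkstemp(dir=str(target.parent), prefix=target.name + ".")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
            handle.flush()
            os.fsync(handle.fileno())  # plain fsync, not F_FULLFSYNC
        os.replace(tmp_name, str(target))
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


def cas_write(target: Path, data: bytes, expected_hash: Optional[str]) -> str:
    """Write only if the file still holds the content the caller last read."""
    current = content_hash(target.read_bytes()) if target.exists() else None
    if current != expected_hash:
        raise StaleWriteError(f"{target.name} changed since it was read")
    atomic_write(target, data)
    return content_hash(data)


with tempfile.TemporaryDirectory() as tmp:
    note = Path(tmp) / "note.md"
    first = cas_write(note, b"v1\n", expected_hash=None)
    second = cas_write(note, b"v2\n", expected_hash=first)
    try:
        cas_write(note, b"v3\n", expected_hash=first)  # stale hash
        stale_rejected = False
    except StaleWriteError:
        stale_rejected = True
    final_bytes = note.read_bytes()
```

The temp file is created in the target's own directory so that `os.replace` stays on one filesystem, which is what makes the rename atomic.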
The state mode (vault-backed vs. device-local) is an on-host configuration:
- Device-level: `state_mode` key in `.agentm-config.json` (set by `install.sh --local-state` or `agentm_config --state-mode`).
- Per-repo override: `<project>/.harness/.project-mode` marker, with higher precedence than the device default.
- Neither: vault-first, guarded by a `ValueError` so a missing/unreachable vault fails loudly.
state_mode: vault → backend read-alias (§2b): the canonical non-local value is "backend". Any "vault" value in config or per-repo marker is returned as "backend" at read time without rewriting the file. "vault" is retained as a deprecated CLI alias and normalized to "backend" at write time. No operator action required.
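The precedence order and the read-time alias can be sketched together. The function names and the convention that an absent marker or key is passed as `None` are assumptions for illustration; the real readers are `_read_config_state_mode` and `_read_project_mode`, whose signatures are not shown in this design.

```python
from typing import Optional


def normalize_state_mode(raw: str) -> str:
    """Map the deprecated "vault" value to the canonical "backend" at read time."""
    value = raw.strip()
    return "backend" if value == "vault" else value


def resolve_state_mode(project_marker: Optional[str],
                       device_state_mode: Optional[str]) -> str:
    """Per-repo marker first, then device config, then the vault-first default."""
    for raw in (project_marker, device_state_mode):
        if raw is not None and raw.strip():
            return normalize_state_mode(raw)
    # Neither is set: vault-first. The loud failure for a missing vault
    # belongs to the caller's guard, not to this lookup.
    return "backend"
```

Nothing here writes back to the config or marker file, which matches the "no file rewrite" property of the alias.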
Why on-host, not in-vault: you cannot read a marker out of a store you do not have. Configuration must be reachable on a vault-less machine. .agentm-config.json is already read vault-free.
Why explicit, not inferred from vault_path == null: a null vault_path is ambiguous between never-configured and transiently-unreachable (Drive not yet mounted). Inferring local from absence causes a transiently-unreachable vault to silently split state across two stores — the V4 #35 split-brain class.
resolve_project returns {slug, project_locator, backend, project_root, layout}:
- `project_locator` is a `Locator`, not a `Path`.
- `backend` is the resolved `StorageBackend` (vault plugin or device-local).
- `_vault_projects_dir(backend: StorageBackend) -> Locator` calls `backend.resolve("projects")`.
repo_registry functions take backend: StorageBackend and call backend.resolve("_meta", "repos.json") for the registry locator. On vault this resolves to <vault>/_meta/repos.json (same bytes as before V5-6); on device-local to ~/.agentm/memory/_meta/repos.json.
LC-8 gate (import direction): check-process-seam-import-direction.sh scans harness_memory.py and repo_registry.py for import storage_vault / from storage_vault import — the routing layer must never import capability plugins.
Routing conformance suite: storage_conformance.py's check_routing_repo_registry(make_backend) proves the register/list/unregister cycle on any conforming backend; exercised by RoutingConformanceReport against both DeviceLocalBackend and vault_backend_stub.VaultBackend.
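The registry routing and the register/list/unregister cycle can be sketched against a toy backend. `registry_locator(backend)` and `backend.resolve("_meta", "repos.json")` come from this design; everything else (the in-memory backend, tuples standing in for `Locator`, the JSON layout, and the function names `register_repo`, `unregister_repo`, `list_repos`, `check_register_list_unregister`) is hypothetical.

```python
import json


class InMemoryBackend:
    """Toy backend: tuples stand in for Locator handles."""

    def __init__(self) -> None:
        self._blobs = {}

    def resolve(self, *parts: str) -> tuple:
        return tuple(parts)

    def exists(self, locator: tuple) -> bool:
        return locator in self._blobs

    def read(self, locator: tuple) -> str:
        return self._blobs[locator]

    def write(self, locator: tuple, content: str) -> None:
        self._blobs[locator] = content


def registry_locator(backend) -> tuple:
    return backend.resolve("_meta", "repos.json")


def _load(backend) -> dict:
    locator = registry_locator(backend)
    return json.loads(backend.read(locator)) if backend.exists(locator) else {}


def register_repo(backend, slug: str, root: str) -> None:
    repos = _load(backend)
    repos[slug] = root
    backend.write(registry_locator(backend), json.dumps(repos, sort_keys=True))


def unregister_repo(backend, slug: str) -> None:
    repos = _load(backend)
    repos.pop(slug, None)
    backend.write(registry_locator(backend), json.dumps(repos, sort_keys=True))


def list_repos(backend) -> list:
    return sorted(_load(backend))


def check_register_list_unregister(make_backend) -> bool:
    """Prove the cycle on a fresh backend built by the given factory."""
    backend = make_backend()
    register_repo(backend, "demo", "/work/demo")
    seen = list_repos(backend) == ["demo"]
    unregister_repo(backend, "demo")
    return seen and list_repos(backend) == []
```

Because the registry functions touch storage only through `resolve`, `exists`, `read`, and `write`, the same check can run unchanged against any backend factory.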
The seam's normal write verb targets the active backend's store — Agent/, which holds the agent-controlled tiers (T2 curated, T3 agentm-sole; the memory-system tier model). The operator's personal space (T1) sits above the Agent/ folder, outside that store, so the normal verbs cannot reach it — its path lies above the store root the seam operates in.
Writing T1 takes a separate, explicit seam call, distinct from write, that targets a path above the store root and runs only when an operator request authorizes it. The separation is the guard: reaching T1 means calling a different verb deliberately, with no tier flag to set wrong — so the seam itself draws the line between agent-controlled space (write) and user-controlled space (the personal-write call, opened only on request).
[PENDING-IMPL — the personal-write call ships with the T1/T2/T3 tier work; the documenter flips this to as-built when the seam method lands. It composes the same V5-0 write protocol (vault_mutex + CAS + atomic fsync→rename) as the normal path; only the target root and the explicit-authorization gate differ.]
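Since the personal-write call is still pending, the following is only a sketch of the separation idea, not the planned API. The class `TieredStore`, the method name `write_personal`, the `operator_authorized` keyword, and `TierGuardError` are all hypothetical; the write protocol (mutex, CAS, atomic rename) is left out.

```python
class TierGuardError(Exception):
    """Hypothetical error for an unauthorized personal-space write."""


def _checked(parts: tuple) -> tuple:
    """Reject segments that would climb out of the target root."""
    if not parts or any(p in ("", ".", "..") or "/" in p for p in parts):
        raise ValueError(f"invalid key: {parts!r}")
    return tuple(parts)


class TieredStore:
    """Toy store with two roots: the agent store and the space above it."""

    def __init__(self) -> None:
        self.agent_store = {}     # stands in for Agent/ (T2/T3)
        self.personal_space = {}  # stands in for T1, above the store root

    def write(self, parts: tuple, content: str) -> None:
        # Keys are always relative to the agent store; there is no tier flag.
        self.agent_store[_checked(parts)] = content

    def write_personal(self, parts: tuple, content: str, *,
                       operator_authorized: bool) -> None:
        # A distinct verb: reaching T1 takes a deliberate call plus an explicit gate.
        if operator_authorized is not True:
            raise TierGuardError("personal-space write needs operator authorization")
        self.personal_space[_checked(parts)] = content


store = TieredStore()
store.write(("notes", "summary.md"), "agent output")
try:
    store.write_personal(("journal.md",), "x", operator_authorized=False)
    blocked = False
except TierGuardError:
    blocked = True
```

No argument to `write` can redirect it into `personal_space`, which is the property the prose above attributes to using a separate verb instead of a tier flag.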
| Gate | What it checks |
|---|---|
| `check-storage-seam-no-path-leak` | Return types of seam verbs in `storage_*.py`: no `Path` escapes |
| `check-process-seam-import-direction` (LC-8) | `harness_memory.py`, `repo_registry.py` never import `storage_vault` |
| `check-vault-lock-parity` | `vault_lock.py` byte-identical to vendored skill copy |
| `storage_conformance` | Universal verb battery + routing contract on any backend |
| Tag | Resolution |
|---|---|
| DC-1 → 0020 | The current truth for `harness_state_dir` / `read_state_file` / `write_state_file` is ADR 0020: backend-aware, not device-local-only. ADR 0018 DC-1 (device-local only) was reversed by ADR 0020. See §3 above. |
| DC-4 import | "No top-level cycle; one guarded lazy import at `resolve_project`." `backend_selection.py` lazy-imports `harness_memory` only inside call bodies, never at module top-level. All other cross-seam top-level imports are forbidden. See §2 above. |
| `state_mode` | The canonical non-local value is `"backend"`. `"vault"` is a deprecated read-alias (normalized at read time, no file rewrite needed). See §5 above. |
| `storage_vault` DELETE | The kernel `storage_vault.py` was deleted in V5-3 (this commit, 2026-06-21). The vault backend now lives exclusively in the crickets obsidian-vault plugin. Tests use `vault_backend_stub.VaultBackend`. See §2 ("Vault plugin discovery") above. |
Cross-design pointer: ADR 0018's "re-audit if personal-private/ → personal/ rename ships" trigger fired in V5-3. See Foundations HLD (vault taxonomy).
This log preserves the decision history from the six retired ADRs. Each entry records the original decision, why-not-the-alternative, re-audit triggers, and any later amendments. Entries appear in chronological order (earliest first); the most recent (0020) is current truth.
Decision: State mode (vault vs. local) is configured on-host only — state_mode in .agentm-config.json, with an optional per-repo <repo>/.harness/.project-mode marker override. No configuration lives in the vault.
Why not in-vault markers: a marker inside the vault is structurally unreachable on a vault-less machine — the exact machine the vault-less feature targets. The retired design read .project-mode from <vault>/_harness/ (deleted); the harness_state_mode registry field (write-only, never read by any resolution path) was deleted as dead weight.
Why not overload the existing mode key: mode means install mode (source vs. release) — a different axis. Conflating "how this harness was installed" with "where it writes state" would require every reader to know which bits mean what.
Why explicit, not inferred from vault_path == null: a null vault_path is ambiguous between never-set and transiently-unreachable (Drive not yet mounted). Silent inference causes the V4 #35 split-brain class; an explicit setting is the only one that distinguishes "I chose local" from "my vault is late to mount."
Resolution order (DC-2): (1) repo-local <repo>/.harness/.project-mode marker; (2) device state_mode in .agentm-config.json; (3) neither → vault-first, guarded ValueError.
Re-audit triggers: .agentm-config.json fragments across multiple files; vault_path == null ambiguity is resolved (inference becomes safe); multi-vault-per-device becomes a real use case; per-repo marker proves to be the common path.
Decision: Ship the write-safety floor for N≥2 concurrent vault writers: per-vault advisory mutex (lock-dir outside vault, mtime heartbeat, bounded-block-with-backoff), content-hash CAS (replacing mtime CAS), atomic fsync→rename writer, broadened conflict-janitor, and operator pin-offline / ownership-partitioning guidance. Not the singleton MCP broker, SQLite-WAL journal, or encryption-at-rest.
Why not build the broker now: R4 sequences those as Phase 1 / V5-9 / optional. The floor unblocks N≥2 writers and is the prerequisite the broker imports — shipping the floor first keeps each diff reviewable and the cutover an expand→contract.
Why one global lock, not per-file: writes are short and rare, so one lock suffices. Ownership-partitioning (agents write different slugs → different files) already keeps contention near zero; per-file locking adds machinery for contention that doesn't exist.
Why content-hash over mtime: Drive re-downloads rewrite mtimes, producing false "changed" and false "unchanged" CAS verdicts. sha256 of bytes is the only reliable currency.
Why plain fsync, not F_FULLFSYNC: the cloud copy is the durability backstop; plain fsync before rename ensures each uploaded snapshot is internally consistent at the right cost for short markdown writes. fsync ≠ durable on macOS is a documented limit, not a bug.
Why vendored, not imported cross-tree (DC-9): the /memory skill scripts are self-contained by construction (three install scopes; top-level scripts/ is not installed into target prefixes). Byte-identity gate (check-vault-lock-parity) makes drift a hard CI failure.
Re-audit triggers: a write path ever approaches seconds (grows the stale window); top-level scripts/ ever ships into target prefixes (collapse the vendored copy to an import); vault runs on a non-synced/local-only backend (re-evaluate F_FULLFSYNC); a second machine ever becomes a writer (broker becomes mandatory).
Decision: Ship the seam interface (Locator-based verbs in storage_seam.py), the device-local backend, fail-loud selection (StorageSelectionError — never demote), and the doctor preview sharing the resolver's code path. The engine cutover (live read/write via the seam) was deferred to V5-3.
Why opaque Locator, not Path: a path-returning verb leaks a filesystem assumption silently and totally. The opaque handle means "swap the backend, the engine doesn't notice" holds by construction.
Why the no-Path-leak rule is AST-gated, not grepped: a filesystem backend legitimately writes Path all through its body. Only handing one back across the seam is the violation; the return annotation is precisely where that shows up.
Why registry.get → None is absence, not corruption: a registry miss is an absent backend (not a malformed key — that's ProtocolError / TypeError immediately). Folding a raise into get would scatter the fail-loud policy across every lookup; concentrating it in selection keeps "what a miss means" in one place.
Why never demote: a silent fall-back to device-local is the single failure that mis-writes or orphans the vault. A loud refusal forces the operator to fix the install — always the correct outcome for a misconfiguration.
Amendment — 2026-06-18 (V5-7): capability-request matching extends DC-4 (same fail-loud principle). select_backend(required=Capabilities(...)) raises CapabilityMismatchError on mismatch. Silent downgrade, partial satisfaction, and return-a-warning were all rejected for the same reason DC-4 carries: a wrong-but-running engine is worse than a stopped one. --requires in _doctor_main extends DC-5's preview/runtime shared path.
Re-audit triggers: a deployment needs a missing backend to degrade rather than block (add a config-gated soft mode); text is no longer the v1 currency (widen the verb contract deliberately); the V6 index needs change-detection finer than mtime survives.
Decisions:
- DC-1 (reversed by ADR 0020; see §3 and the §2b block): make state functions device-local only.
- DC-2: phase_recall → "" (context now V5-9 MCP).
- DC-3: resolve_documenter_context → None.
- DC-4: vault_path() raises StorageBackendNotInstalledError when storage.backend=vault + no vault accessible. This is the load-bearing fail-loud guard.
- DC-5: $MEMORY_VAULT_PATH env override is a graceful-skip (per-session escape hatch, not a durable commitment).
- DC-6: Routing/index layer retained (resolve_project, _vault_projects_dir, repo_registry, etc.; these probe for vault identity and are not state reads).
- DC-7: personal-private/ → personal/ rename shipped in lockstep with V5-3.
Why DC-4 in vault_path(), not only in select_backend(): the guard fires on every call path that reaches vault resolution, not only through explicit selection. Also avoids circular import (harness_memory.py does not import backend_selection.py).
Why DC-2 empty-string return, not removal: phase_recall is part of the frozen process-seam surface; callers get a no-op result, not an import error. V5-9 MCP replaces the recall capability at the process boundary.
DC-1 amendment — 2026-06-19 (ADR 0020 reverses DC-1): making state functions device-local only conflated recall (read-only context bundles, correctly device-local / MCP-owned) with execution state (PLAN.md / progress.md / ROADMAP.md, which a synced vault exists to hold). See ADR 0020 / §3.
Re-audit triggers: V5-9 MCP server is removed (restore a minimal recall path in the kernel); any installer auto-writes storage.backend=vault without ensuring a vault path; a harness state dir move (update harness_state_dir and callers); a proposal to write state through the routing layer (route through the seam instead).
Cross-design pointer: ADR 0018 DC-7's personal-private/ → personal/ rename fires ADR 0010's first load-bearing assumption re-audit trigger — see Foundations HLD — Vault internal taxonomy.
Decisions:
- Task 1: resolve_project / _vault_projects_dir speak Locators: signature change from (vault: Path) -> Path to (backend: StorageBackend) -> Locator; resolve_project returns {slug, project_locator, backend, project_root, layout}.
- Task 2: repo_registry rides the active backend: registry_locator(backend) -> Locator, all five public functions take backend: StorageBackend.
- Task 3: state_mode: vault → "backend" read-alias at both _read_config_state_mode and _read_project_mode; canonical value is "backend"; no file rewrite.
- Task 4: Gate extensions: no-Path-leak Pass 2 on routing functions; LC-8 block on routing-to-plugin imports; storage_conformance.py routing suite.
Why not preserve vault_path: Path in the resolve_project return value: V5-7 removed implicit vault selection from a bare vault_path config key (a configured path ≠ a chosen backend). Honoring a bare path would resurrect implicit inference and route state to a vault the selection chain deliberately did not pick — the exact split-brain ADR 0018 DC-4/DC-5 guard against.
Why state_mode: vault → "backend" at read time, not file rewrite: every operator's config written before V5-6 carries state_mode: vault; a file rewrite would require operator action for zero functional change. The alias makes the rename transparent.
Re-audit triggers: VaultBackend.write() is refactored to not hold vault_mutex internally (update _mutate_registry); "local" state-mode value ever needs a rename; V8 multi-agent / cross-machine support is designed (cross-machine registry coherence is device-specific by assumption today).
Decision: Reverse ADR 0018 DC-1. The three kernel state functions (harness_state_dir, read_state_file, write_state_file) are backend-aware: synced-backend active → vault _harness/; otherwise → device-local <project>/.harness/. Single discriminator: resolution["backend"].capabilities.sync. .project-mode=local opt-out wins over a synced backend.
Why reverse DC-1: ADR 0018 DC-1 conflated recall (read-only context bundles — correctly device-local / MCP-owned, per DC-2/DC-3) with execution state (PLAN.md / progress.md / ROADMAP.md, which the synced vault exists to hold across devices). The concrete failure: a storage.backend=vault project's plan was invisible at session start because the SessionStart hook read the empty device-local .harness/ instead of the vault's _harness/.
Why capabilities.sync, not isinstance(backend, VaultBackend): harness_memory.py must not import the obsidian-vault plugin (process-seam import-direction gate). The capabilities.sync flag is the contract; the implementation detail of which plugin provides it is invisible to the kernel.
Why the discriminator is the live backend, not the vault_path key: ADR 0019 removed vault_path from the resolution shape; V5-7 removed implicit vault selection from a config-supplied vault_path. Honoring a bare path would resurrect the implicit inference ADR 0018 DC-4/DC-5 guard against. The backend object is the single source of truth for "is a synced store active."
Why writes go through backend.write(), not raw path I/O: writing the vault directly would bypass vault_mutex + CAS + atomic_write, reintroducing the torn-write and lost-update hazards ADR 0012 exists to prevent.
Re-audit triggers: a future backend is sync=True but not an appropriate state home (add a more specific capability than sync); any change lets a backend-selection error propagate out of resolve_project (crash session boot instead of degrading); vault layout moves _harness/ out from under projects/<slug>/; the hook gains a second plan-discovery path that doesn't route through the bridge.
Decision: Writing T1 (the operator's personal space, above the Agent/ store root) is a separate seam call, distinct from the normal write verb, run only on an authorized operator request. The normal verbs target the agent-controlled store (Agent/ = T2/T3); T1 is unreachable through them, since its path lies above the store root. Grounds the tier model in memory-system.
Why not a tier flag on write: a flag is fat-fingerable — one wrong argument spills agent output into the operator's notes. A distinct verb makes reaching user space a deliberate act, so the seam itself carries the agent-controlled / user-controlled line; the call is the gate.
Why it still composes the V5-0 protocol: a T1 write is still a vault write — it takes the same vault_mutex + CAS + atomic fsync→rename floor; only the target root (above Agent/) and the authorization gate differ.
Re-audit triggers: the vault becomes git-backed (a non-revertable T1 write stops being a concern — revisit whether the gate can relax); a host exposes a finer permission surface for user space; multi-vault support lands (a per-project shared vault may need its own tier mapping).