
memoryvault

github-actions[bot] edited this page Jun 25, 2026 · 5 revisions

title: "MemoryVault — Permanent agent memory via Obsidian-vault-folder + reflection sidecar"
status: launched
kind: design
area: agentm/vault-taxonomy
parent: agentm-memory-system.md
children: [memoryvault/parts/write-primitives.md, memoryvault/parts/recall-loop.md, memoryvault/parts/reflection-and-recovery.md, memoryvault/parts/idea-ledger.md, memoryvault/parts/seed-pass.md, memoryvault/parts/discovery-mining.md]
visibility: published
author: Alex Herrero
created: 2026-05-15
updated: 2026-06-25

MemoryVault — Permanent agent memory via Obsidian-vault-folder + reflection sidecar

Context

Objective

Built-in agent memory (Claude memories, Gemini personal context, etc.) is per-platform, opaque, lossy, and not composable across tools — bouncing between models fragments context every session. Today, every new Claude Code chat starts cold even on the same project; manual re-priming wastes tokens and loses nuance. MemoryVault is a file-based, agent-curated permanent memory layer that captures durable preferences / workflows / fixes via a reflection sidecar, recalls relevant entries automatically into every new prompt, and adapts the agent's behavior over time without explicit configuration. The goal: compound learning — each conversation makes the next one better, because the agent never forgets what already happened.

Background

The idea evolved through three stages:

  1. Initial framing — persistent memory on a single machine. Frustration with every agent session starting cold on the same project. The first sketch was a local file-based vault on disk: agent writes durable preferences to a known directory; agent reads them at session start. Better than built-in opaque memory, but bounded to one machine.

  2. Cross-model sync. As work moved between different agent surfaces and tools, the single-machine assumption broke — context captured in one tool was invisible in another. The vault shape extended to model-agnostic markdown-on-disk so any tool that could read files (or talk to a thin server fronting the files) could load the same memory.

  3. Cross-machine + integration with the user's own contextual memory. The final pivot: the user already runs a personal note-taking surface for their own thinking, with sync between devices. Rather than maintaining a separate agent-only vault, MemoryVault becomes a folder inside that existing surface — the agent's memory and the user's notes coexist in the same place, the user can inspect and edit what the agent learned, and the cross-device sync the user already has solves the multi-machine problem for free.

Base-skill dependencies (all shipped via prior plans): #3 evaluator (per-step review), #4 kill-switch + steer (long-running execution control), #5 commit-on-stop (crashed-session recovery), #6 design skill (this design doc is the first real dogfood of that skill).

Design

Overview

MemoryVault is a single memory skill in crickets with four sub-commands and four Claude Code hooks. The skill's on-disk format is markdown + YAML frontmatter inside a folder of the user's existing Obsidian vault, synced between devices via the user's existing setup. Writes default to MemoryVault/; reads default to everywhere in the Obsidian vault (rich grounding context from the user's existing notes); writes outside MemoryVault/ happen only on explicit user request or agent-proposed + user-confirmed (the permeable boundary).

Three architectural pillars drive the design:

  1. Reflection sidecar (write loop) — aggressive end-of-session sweep mines the conversation for 3 extraction categories (Successful Workflows / User Preferences / Fixes & Workarounds) → writes MemoryVault entries; a parallel sweep mines for follow-ups / project ideas / research candidates → writes to a single Ideas.md file at the user's vault root. Confidence-rated tri-modal routing: HIGH-confidence candidates auto-save; MEDIUM-confidence go through an interactive approve/edit/reject review prompt; LOW-confidence land in _inbox/ for batch review. Stop-event + idle-time hooks fire the sidecar; idle hook also recovers crashed sessions where Stop didn't fire (via .harness/session-id-<uuid>.{start,reflected} marker files).

  2. Hook-driven recall (read loop) — two-hook recall pattern: SessionStart loads MemoryVault/personal-private/_always-load/ (top ~20 high-priority always-relevant entries — dev-flow conventions, top preferences); UserPromptSubmit does a fresh relevance query against the rest of the vault on every user message and injects matches. Recall mechanism is sqlite-vec primary + grep+frontmatter alongside (merge results) — semantic recall for paraphrased relevance, keyword + frontmatter filter for high-precision queries, dedup before injection. Embeddings computed locally via sentence-transformers on save (async, non-blocking) — see crickets HLD for the v0.9.2 local-only refactor.

  3. Two-tier idea capture (the user-facing dividend) — when the reflection sidecar surfaces a project-idea candidate (not a memory-entry candidate), it writes a 2-sentence summary + Obsidian wikilink to ~/Obsidian/Ideas.md at the vault root, AND simultaneously does deep research (web search + cross-reference against existing MemoryVault entries + scan existing Obsidian notes) writing the rich context to MemoryVault/personal-private/_idea-incubator/<idea-slug>/. When the user decides to progress an idea to a real project, the incubator entry graduates to MemoryVault/personal-projects/<idea-slug>/. Incubator entries get garbage-collected after N months without engagement.

Infrastructure

  • Vault storage: filesystem-direct access to the synced Obsidian vault path. The agent uses Read/Write/Edit on the canonical path. The vault itself remains a user-curated Obsidian vault; MemoryVault is one folder inside it.

  • Vector index: sqlite-vec — a SQLite loadable extension written in C (.so/.dylib/.dll); language-agnostic at its core, callable from any language that can load SQLite extensions. v1 calls it from Python (via the sqlite-vec pip wheel, which ships prebuilt binaries for all 3 OSes — no compile step). Hook scripts that touch the index become Python scripts; Claude Code supports any executable hook. The C-extension origin is load-bearing for future flexibility: if hook latency becomes a problem, we can swap the caller layer to Rust or Go (each has working bindings to load sqlite-vec) without touching the on-disk format. The data is portable across caller languages.

    Index lives at MemoryVault/_meta/vec-index.db (a small binary file, sync-friendly). The index supplements the markdown files rather than replacing them — the .md files in Obsidian remain the canonical, human-readable, human-editable source of truth; the vec-index is a query-acceleration layer storing (path, embedding_vector, last_updated_ts) per file for fast semantic nearest-neighbor lookup. This separation matters in several ways:

    • Index is derivable, content is not. If the index gets corrupted or out of sync, it can be rebuilt deterministically from the .md files via /memory reindex. The reverse isn't true — losing the .md files loses the memory.
    • User-side edits don't need the agent. The user can open Obsidian, edit any entry directly, and the next save / reindex picks up the change. The index updates incrementally on each /memory save and /memory evolve call (async — doesn't block the save UX), and an on-demand /memory reindex handles user-side edits made outside the skill.
    • Sync surface stays simple. Only the .md files need conflict-resolution; the vec-index is local-machine-rebuildable, so cross-device sync conflicts in the binary file are non-fatal (resolution = rebuild on the device that lost the conflict).
    • Obsidian's existing tooling still works. Backlinks, tags, graph view, search — all of Obsidian's native features apply to MemoryVault entries because they're just markdown files in the vault. The vec-index is invisible to Obsidian; it's an agent-only concern.
  • Embeddings: local sentence-transformers is the only production mode as of v0.9.2 (2026-05-20) — see crickets HLD for the rationale (dual-mode API + local was the v1 design; collapsed to local-only because the primary operator is a Claude Ultra subscriber without a separate API key, and modern small-to-mid models deliver near-SOTA quality on desktop-class hardware). Embedding text = entry title + frontmatter tags + first paragraph of body. Default model: BAAI/bge-large-en-v1.5 (1024-d native; ~1.3GB on disk + ~1.5GB RAM; downloads lazily on first invocation; cached at ~/.cache/crickets/sentence-transformers/). Operators on low-spec hosts swap to a smaller model via the AGENT_TOOLKIT_EMBEDDING_MODEL env var escape hatch (still local — no API option). EMBEDDING_DIM = 1024 (bumped from 384 in v0.9.2).

  • Claude Code hooks (the orchestration surface):

    • SessionStart — load _always-load/ entries into the session's initial context.
    • UserPromptSubmit — do a relevance query against the prompt; inject matches; dedup against already-loaded entries.
    • Stop — fire reflection sidecar; mine the session for 3 categories + idea candidates; route via tri-modal logic.
    • Idle-time (a new crickets primitive added by this plan) — fire reflection sidecar if a session went silent > N minutes; also scan .harness/session-id-*.start marker files for orphans (crashed sessions where Stop didn't fire) and run reflection retroactively.
  • Runtime state: .harness/session-id-<uuid>.start written on SessionStart hook; renamed to .reflected after Stop hook reflection completes. All markers in .harness/ (gitignored, runtime-only). Markers GC'd at 30 days.

  • Skill home: crickets/skills/memory/SKILL.md with sub-command bodies; hooks in crickets/hooks/memory-recall-session-start/, crickets/hooks/memory-recall-prompt-submit/, crickets/hooks/memory-reflect-stop/, crickets/hooks/memory-reflect-idle/.
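The "index is derivable, content is not" property can be illustrated with a rebuild sketch. This is a minimal illustration, not the toolkit's /memory reindex: it uses a plain sqlite3 table as a stand-in for the sqlite-vec index, a hypothetical `embed` callable in place of sentence-transformers, a deliberately tiny frontmatter reader instead of a YAML parser, and it assumes the entry title falls back to the file name when no `title` key exists.

```python
from __future__ import annotations

import sqlite3
import struct
import time
from pathlib import Path
from typing import Callable, Sequence


def embedding_text(title: str, tags: Sequence[str], body: str) -> str:
    """Entry title + frontmatter tags + first paragraph of the body."""
    first_paragraph = body.strip().split("\n\n", 1)[0]
    return "\n".join([title, " ".join(tags), first_paragraph])


def split_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Tiny 'key: value' reader (assumption: not a full YAML parser)."""
    if not text.startswith("---\n"):
        return {}, text
    header, _, body = text[4:].partition("\n---\n")
    meta: dict[str, str] = {}
    for line in header.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return meta, body


def reindex(vault: Path, db_path: Path, embed: Callable[[str], list[float]]) -> int:
    """Rebuild the index from scratch; the .md files stay the source of truth."""
    con = sqlite3.connect(db_path)
    con.execute("DROP TABLE IF EXISTS entries")
    # Stand-in schema mirroring (path, embedding_vector, last_updated_ts).
    con.execute(
        "CREATE TABLE entries (path TEXT PRIMARY KEY, embedding BLOB, last_updated_ts REAL)"
    )
    count = 0
    for md in sorted(vault.rglob("*.md")):
        meta, body = split_frontmatter(md.read_text(encoding="utf-8"))
        tags = meta.get("tags", "").strip("[]").replace(",", " ").split()
        vector = embed(embedding_text(meta.get("title", md.stem), tags, body))
        blob = struct.pack(f"{len(vector)}f", *vector)  # float32 bytes
        con.execute(
            "INSERT INTO entries VALUES (?, ?, ?)",
            (str(md.relative_to(vault)), blob, time.time()),
        )
        count += 1
    con.commit()
    con.close()
    return count
```

Because the table is dropped and recreated on every run, a corrupted or conflicted index file can simply be discarded on the losing device and rebuilt from the markdown.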

Detailed Design

Detailed Design splits into 8 subsections; each is a candidate part for /design translate to consider. Subsections 1-7 are #7a (core); subsection 8 is #7b (discovery + mining), shipped after #7a is dogfooded for 1-2 weeks.

1. Write primitives — /memory save + /memory evolve

/memory save <kind> <slug> [--group <group>] [--always-load]: writes an entry at MemoryVault/<group>/<kind>/<slug>.md with frontmatter (kind, status: active, created, updated, tags, supersedes optional). Body = markdown content (free-form). On save: synchronously write file; asynchronously embed + index in sqlite-vec.
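As a rough illustration of the synchronous half of /memory save, the sketch below writes the file with the frontmatter fields listed above. The function name `save_entry` and its signature are hypothetical, the frontmatter is emitted by hand rather than through a YAML library, and the asynchronous embed + index step is intentionally left out.

```python
from __future__ import annotations

from datetime import date
from pathlib import Path


def save_entry(vault: Path, group: str, kind: str, slug: str, body: str,
               tags: list[str] | None = None, supersedes: str | None = None) -> Path:
    """Synchronously write <vault>/<group>/<kind>/<slug>.md with frontmatter."""
    today = date.today().isoformat()
    lines = [
        "---",
        f"kind: {kind}",
        "status: active",
        f"created: {today}",
        f"updated: {today}",
        f"tags: [{', '.join(tags or [])}]",
    ]
    if supersedes:  # optional field, only written when provided
        lines.append(f"supersedes: {supersedes}")
    lines.append("---")
    path = vault / group / kind / f"{slug}.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text("\n".join(lines) + "\n\n" + body.rstrip() + "\n", encoding="utf-8")
    return path
```

Here `vault` stands for the MemoryVault/ folder inside the Obsidian vault; a real implementation would hand the returned path to the background indexing step.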

/memory evolve <old-path> <new-content> <reason>: atomic supersede. Steps: (1) Read the old entry; (2) write new entry with supersedes: <old-path> frontmatter + content from <new-content>; (3) atomically git mv old entry to MemoryVault/personal-private/_archive/<original-path>.YYYYMMDD.md (or filesystem rename equivalent — Obsidian-vault isn't necessarily a git repo); (4) update old entry's frontmatter status: superseded + superseded_by: <new-path> cross-link; (5) trigger vec-index update for both. Recall filter skips status: superseded entries by default.
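The supersede steps can be sketched with plain filesystem renames (the non-git path). Several things here are assumptions rather than the skill's behavior: the explicit `new_rel` argument (the command signature above does not say where the new entry lands), the `evolve_reason` frontmatter key, and the expectation that the old entry contains a literal `status: active` line. Each rename is atomic on a single filesystem, but the sequence as a whole is not crash-safe, and the vec-index update (step 5) is omitted.

```python
from __future__ import annotations

from datetime import date
from pathlib import Path


def evolve(vault: Path, old_rel: str, new_rel: str, new_content: str, reason: str) -> Path:
    """Supersede old_rel with new_rel; returns the archived path of the old entry."""
    if old_rel == new_rel:
        raise ValueError("new entry must not overwrite the old one before archiving")

    old_path = vault / old_rel
    old_text = old_path.read_text(encoding="utf-8")            # (1) read old entry

    new_path = vault / new_rel                                  # (2) write successor
    new_path.parent.mkdir(parents=True, exist_ok=True)
    new_path.write_text(
        f"---\nstatus: active\nsupersedes: {old_rel}\n"
        f"evolve_reason: {reason}\n---\n\n{new_content}\n",    # evolve_reason is hypothetical
        encoding="utf-8",
    )

    stamp = date.today().strftime("%Y%m%d")                     # (3) move old to _archive/
    archive = vault / "personal-private" / "_archive" / f"{old_rel}.{stamp}.md"
    archive.parent.mkdir(parents=True, exist_ok=True)
    old_path.replace(archive)

    updated = old_text.replace(                                 # (4) cross-link the archived copy
        "status: active", f"status: superseded\nsuperseded_by: {new_rel}", 1
    )
    archive.write_text(updated, encoding="utf-8")
    return archive
```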

Tri-modal confidence routing applies when these primitives are called by the reflection sidecar — direct user invocation always writes immediately.

2. Reflection sidecar — /memory reflect + Stop + idle hooks

Three trigger surfaces, all running the same reflection logic:

  • Manual /memory reflect [--session <path>] — user-initiated; runs against the current Claude Code session transcript (default) or a specified transcript path.
  • Stop-event hook — fires automatically on session Stop. Same logic, scoped to the just-ended session's transcript.
  • Idle-time hook — fires when Claude Code has been idle > N minutes (N TBD, default 30); also scans for orphan .harness/session-id-*.start markers (crashed sessions) and runs reflection on those transcripts retroactively.

Reflection logic (shared across all three triggers):

  1. Read the session transcript (Claude Code stores them at ~/.claude/projects/<repo>/<session-id>.jsonl).
  2. Two parallel mining passes:
    • 3-category mine (MemoryVault destination): scan for Successful Workflows ("I noticed the agent successfully did X for Y reason"), User Preferences ("user said 'always do X' / user manually corrected the agent's output"), Fixes & Workarounds ("hit error Z, resolved by W"). Each candidate gets a confidence rating (HIGH/MEDIUM/LOW per heuristics in the tri-modal section).
    • Idea-candidate mine (user-vault destination): scan for follow-ups ("we should also do X later"), future project ideas ("this could be its own project"), research candidates ("worth investigating Z"). Each candidate gets a 2-sentence summary + the agent does deep research (per subsection 5).
  3. Tri-modal routing for 3-category candidates:
    • HIGH-confidence (explicit user signal: "always X" phrasing / locked-design-call / user-corrected-agent) → auto-save via /memory save.
    • MEDIUM-confidence (pattern-inferred: 3+ occurrences) → interactive review prompt with approve / edit / reject / skip / supersede-existing-X options. Controlled by memory.review_mode: interactive (default) | silent.
    • LOW-confidence (single-instance inference) → write to MemoryVault/personal-private/_inbox/<slug>.md for batch review later.
  4. Idea-candidates always go to two destinations simultaneously: 2-sentence summary appended to ~/Obsidian/Ideas.md + deep-research entry written to MemoryVault/personal-private/_idea-incubator/<idea-slug>/. Ideas.md never goes through tri-modal routing — all ideas surface in user-facing inbox.
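The tri-modal routing in step 3 reduces to a small decision function. The sketch below is illustrative only: the `Candidate` fields are hypothetical, a pattern seen exactly twice is treated as LOW (the text only defines "single-instance" and "3+ occurrences"), and the `memory.review_mode` switch is not modeled because its silent behavior is not specified here.

```python
from dataclasses import dataclass
from enum import Enum


class Confidence(Enum):
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"


@dataclass
class Candidate:
    slug: str
    explicit_user_signal: bool  # "always X" phrasing, locked design call, user correction
    occurrences: int            # how many times the pattern was observed


def rate(candidate: Candidate) -> Confidence:
    """Map the heuristics from step 3 onto a confidence level."""
    if candidate.explicit_user_signal:
        return Confidence.HIGH
    if candidate.occurrences >= 3:
        return Confidence.MEDIUM
    return Confidence.LOW  # assumption: 1 or 2 occurrences both count as LOW


def route(candidate: Candidate) -> str:
    """Return the destination for a 3-category candidate."""
    level = rate(candidate)
    if level is Confidence.HIGH:
        return "auto-save"
    if level is Confidence.MEDIUM:
        return "interactive-review"
    return f"MemoryVault/personal-private/_inbox/{candidate.slug}.md"
```

Idea candidates would bypass `route` entirely, since they always go to both Ideas.md and the incubator.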

3. Recall hooks — SessionStart + UserPromptSubmit

Two-hook recall pattern. SessionStart loads the always-load core; UserPromptSubmit injects task-shifted relevance per user prompt.

SessionStart hook:

  1. Glob MemoryVault/personal-private/_always-load/*.md.
  2. Read each, format as a single Markdown block, inject into session context.
  3. Output: a "Loaded N MemoryVault always-load entries" line for transparency.
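A minimal sketch of the three SessionStart steps follows. The function name and the per-entry heading format are assumptions, and how the hook hands the resulting block to Claude Code is not shown.

```python
from pathlib import Path


def load_always_load(memory_vault: Path) -> tuple[str, str]:
    """Return (context_block, status_line) for the always-load core."""
    folder = memory_vault / "personal-private" / "_always-load"
    entries = sorted(folder.glob("*.md"))                 # step 1: glob
    blocks = [
        f"## {p.stem}\n\n{p.read_text(encoding='utf-8').strip()}"
        for p in entries
    ]
    context = "\n\n".join(blocks)                         # step 2: one Markdown block
    status = f"Loaded {len(entries)} MemoryVault always-load entries"  # step 3
    return context, status
```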

UserPromptSubmit hook:

  1. Take the user's prompt as the query.
  2. Run the recall engine (subsection 4) → returns top-K (K=5 default) relevant entries.
  3. Dedup against already-loaded _always-load entries (by path).
  4. Inject the remaining matches as a system message before the agent processes the prompt.
  5. Output: a "Loaded N relevant entries: " line for transparency (this is how the user knows what memory shaped the response).

Both hooks have a hard time-budget (SessionStart 500ms, UserPromptSubmit 300ms) — if exceeded, log warning and proceed with partial results rather than blocking.
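One way to honor a hard time budget while still returning partial results is to consume results lazily and stop once the deadline passes. The helper below is a hypothetical sketch, not the hooks' actual mechanism; the injectable `clock` exists only to make the behavior testable.

```python
import time
from typing import Callable, Iterable, List, Tuple, TypeVar

T = TypeVar("T")


def collect_within_budget(
    items: Iterable[T],
    budget_ms: float,
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[List[T], bool]:
    """Collect items until the budget is spent.

    Returns (collected, overran). When overran is True the caller should log
    a warning and proceed with the partial list instead of blocking.
    """
    deadline = clock() + budget_ms / 1000.0
    collected: List[T] = []
    for item in items:
        collected.append(item)
        if clock() >= deadline:
            return collected, True
    return collected, False
```

With a generator that reads entries one at a time, a SessionStart hook could pass `budget_ms=500` and a UserPromptSubmit hook `budget_ms=300`.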

4. Recall engine — sqlite-vec + grep+frontmatter merge

Query path:

  1. Embed the query (local sentence-transformers only as of v0.9.2; synchronous; on time-budget overrun, fall back to grep-only).
  2. Vec search: query sqlite-vec for top-K nearest entries by cosine similarity. Returns (path, similarity_score).
  3. Grep + frontmatter search (parallel to vec): scan entry titles + tags + first-line content for keyword matches; filter by frontmatter (status: active only; respect group filter if query specifies).
  4. Merge results: union the two result sets; rank by combined score (similarity_score * 0.7 + keyword_match_count * 0.3); dedup.
  5. Return top-K (default K=5).
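The merge step (union, combined score, dedup, top-K) can be sketched as below. It assumes each search returns a mapping keyed by entry path, and that an entry missing from one result set contributes zero for that term; both are illustrative choices.

```python
from __future__ import annotations


def merge_results(
    vec_hits: dict[str, float],      # path -> cosine similarity
    keyword_hits: dict[str, int],    # path -> keyword match count
    k: int = 5,
    w_vec: float = 0.7,
    w_kw: float = 0.3,
) -> list[tuple[str, float]]:
    """Union both result sets, score each path once, return the top-K."""
    paths = set(vec_hits) | set(keyword_hits)   # keying by path dedups the union
    scored = [
        (p, w_vec * vec_hits.get(p, 0.0) + w_kw * keyword_hits.get(p, 0))
        for p in paths
    ]
    scored.sort(key=lambda item: (-item[1], item[0]))  # score desc, path as tie-break
    return scored[:k]
```

Note that the similarity score is bounded while the raw match count is not, so an entry with many keyword hits can outrank a close semantic match; normalizing the count is one possible adjustment when tuning the weights.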

Filtering:

  • Always filter status: superseded (recall never surfaces superseded entries by default).
  • Optional --group <group> flag to scope recall to one group.
  • Optional --include-inbox flag for explicit access to _inbox/ entries (default excluded).

5. Idea ledger — Ideas.md + _idea-incubator/

When the reflection sidecar surfaces an idea candidate:

  1. Surface entry (user-facing): append a section to ~/Obsidian/Ideas.md at the vault root. Section format:
    ## YYYY-MM-DD: <Idea Title>
    <2-sentence summary of the idea>
    See deep research: [[MemoryVault/personal-private/_idea-incubator/<idea-slug>/_index.md]]
    
  2. Deep research (agent-facing): create MemoryVault/personal-private/_idea-incubator/<idea-slug>/ directory with:
    • _index.md — full agent reasoning about the idea, frontmatter (kind: idea, status: incubating, surfaced_in_session: <session-id>).
    • Additional research files (web fetch dumps, cross-references to existing MemoryVault entries, scan of existing Obsidian notes for related content).
  3. Research depth budget: cap research at 5 minutes wall-time / 3 web fetches / 5K tokens per idea (TBD — to settle during /design author walk; conservative defaults to avoid runaway).
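The surface-entry format from step 1 can be produced with a small formatter. The function names are hypothetical, and the blank line written before each appended section is an assumption about how sections are separated in Ideas.md.

```python
from __future__ import annotations

from datetime import date
from pathlib import Path


def ideas_section(title: str, summary: str, slug: str, on: date) -> str:
    """Format one Ideas.md section with a wikilink to the incubator entry."""
    link = f"[[MemoryVault/personal-private/_idea-incubator/{slug}/_index.md]]"
    return f"## {on.isoformat()}: {title}\n{summary}\nSee deep research: {link}\n"


def append_idea(ideas_md: Path, title: str, summary: str, slug: str,
                on: date | None = None) -> None:
    """Append a section to Ideas.md, creating the file if it does not exist."""
    section = ideas_section(title, summary, slug, on or date.today())
    with ideas_md.open("a", encoding="utf-8") as fh:
        fh.write("\n" + section)
```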

Promotion path: when the user decides to progress idea X to a real project:

  • /memory promote idea <slug> command moves _idea-incubator/<slug>/ → personal-projects/<slug>/.
  • Updates Ideas.md section: appends → promoted YYYY-MM-DD to MemoryVault/personal-projects/<slug>/.
  • Recalculates vec-index entries for moved files.

Garbage collection: incubator entries get GC'd after N months without engagement (N default 6; configurable). GC presents the user with a list before deletion: "These ideas haven't been promoted or referenced in 6+ months: . Keep / Archive / Delete?" — never silent deletion.

6. Crash recovery — session-id markers

The idle-time hook recovers crashed sessions where the Stop hook never fired (Claude Code force-quit, OS crash, etc.):

  1. SessionStart hook writes .harness/session-id-<session-uuid>.start (one file per session, contents = session start timestamp + transcript path).
  2. Stop hook (after running reflection successfully) renames .start → .reflected. If reflection fails, the file stays as .start (idle hook will retry).
  3. Idle-time hook scans for .start files older than 1 hour (idle threshold for assuming session is truly dead) → runs reflection retroactively on those transcripts → renames to .reflected on success.
  4. GC: .reflected markers older than 30 days get deleted on next idle pass.

All markers in .harness/ (gitignored, runtime-only).
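The marker lifecycle above can be sketched with four small helpers. The function names are hypothetical, and the sketch judges age by file modification time rather than the timestamp stored inside the marker, which is an assumption; since a rename keeps the modification time, the 30-day clock here effectively starts at session start.

```python
from __future__ import annotations

import time
from pathlib import Path

ORPHAN_AGE_S = 60 * 60             # .start older than 1 hour: assume the session is dead
REFLECTED_TTL_S = 30 * 24 * 3600   # .reflected markers are deleted after 30 days


def write_start(harness: Path, session_id: str, transcript: str) -> Path:
    """SessionStart: one marker per session (start timestamp + transcript path)."""
    harness.mkdir(parents=True, exist_ok=True)
    marker = harness / f"session-id-{session_id}.start"
    marker.write_text(f"{time.time()}\n{transcript}\n", encoding="utf-8")
    return marker


def mark_reflected(marker: Path) -> Path:
    """Stop or idle hook, after reflection succeeds: .start becomes .reflected."""
    return marker.rename(marker.with_suffix(".reflected"))


def find_orphans(harness: Path, now: float | None = None) -> list[Path]:
    """Idle hook: .start markers old enough to be treated as crashed sessions."""
    now = time.time() if now is None else now
    return sorted(
        m for m in harness.glob("session-id-*.start")
        if now - m.stat().st_mtime > ORPHAN_AGE_S
    )


def gc_reflected(harness: Path, now: float | None = None) -> int:
    """Idle hook: delete .reflected markers past the TTL; returns how many."""
    now = time.time() if now is None else now
    removed = 0
    for m in harness.glob("session-id-*.reflected"):
        if now - m.stat().st_mtime > REFLECTED_TTL_S:
            m.unlink()
            removed += 1
    return removed
```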

7. Manual seed pass — task 1 of #7a

The user provides additional context + ideas during this task to seed both MemoryVault AND the user-vault idea ledger before the reflection sidecar's first autonomous run. Task flow:

  1. Seed MemoryVault _always-load/ core (~10-20 entries): distill from ~/.claude/CLAUDE.md (dev-flow conventions) + AGENTS.md sibling-repo imports + locked design calls from plans #3-#6. Each entry hand-written, validated, vec-embedded.
  2. Seed personal-projects/ for in-flight projects: agentm, crickets, plus operator-private siblings. Each gets a project-index entry referencing the locked decisions from each repo's prior plans.
  3. Seed Ideas.md and _idea-incubator/ (user-provided + agent-extracted): user provides loose ideas; agent extracts from recent Claude Code transcripts; co-curated.
  4. Validate by running a sample recall: pose a sample query, confirm the SessionStart + UserPromptSubmit hooks return sensible matches.
  5. Initial migration of ~/ContextVault/ contents (domains/ and projects/ subdirs) into the new MemoryVault structure: this happens here, as part of the seed pass.

This task is genuinely large (probably a full session's worth of work) and is deliberately co-created — the agent can't seed the vault alone, and the quality of the seed determines whether the loop pays off in the first weeks of use.

8. Discovery + mining (plan #7b scope)

Three sub-components, all shipped in plan #7b after #7a has been dogfooded for 1-2 weeks:

  1. Transcript reflection pass (one-time + ongoing): run reflection sidecar against the historical Claude Code transcripts at ~/.claude/projects/*/ to retroactively populate MemoryVault from past sessions. After the one-time pass, the Stop + idle hooks handle ongoing sessions.

  2. Internet skill-discovery: periodic scan (cadence TBD — weekly default) of curated sources for SKILL.md-shaped patterns worth adopting. Sources whitelist (TBD): GitHub trending with claude-code / agent-skills tags + Anthropic Cookbook + specific awesome-lists + named blog feeds. Adapt-don't-import principle: when a relevant pattern is found, the agent writes a personal-skill entry in MemoryVault capturing what to adopt — never wholesale forks the SKILL.md into crickets/. The entry is human-reviewed before any actual skill code is written.

  3. Personal-skills auto-indexer: walks crickets/skills/ + agentm/.claude/skills/ (plus any other repos the user installs) at install time + on /release and writes one MemoryVault entry per SKILL.md to personal-skills/. Pre-hook injection then merges entries from personal-private/ AND personal-skills/ at query time — the agent learns "we have a /design author skill" without being told every session.

Alternatives Considered

  1. Built-in vendor memory only (Claude memories, Gemini personal context, similar offerings from other agent platforms). Rejected: per-platform, opaque, lossy, not composable across tools. Captures some signal but invisible to the user, can't be inspected or edited, doesn't survive vendor changes, fragments when you switch tools. MemoryVault keeps the file-based + user-inspectable property that built-in offerings lack.

  2. Filesystem-only with keyword search, no semantic recall. Rejected: too thin for the compound-learning goal. Markdown + grep is good for "find the entry titled X" but misses paraphrased relevance — "I prefer paragraph-long narratives" won't match "write a long summary". We keep the filesystem substrate (human-inspectable, version-control-friendly) but layer semantic vector recall on top so the agent finds what you meant, not just what you said.

  3. Corporate-scale managed memory architecture (centralized vector database service + retrieval API + team-scoped memory groups). Rejected: infrastructure overkill for personal-scale use. Single user, single machine cluster, no team isolation requirements — most of the heavy infrastructure exists to solve coordination problems we don't have. We adopt the mechanics (multi-phase loop, extraction categories, atomic evolve primitive, separate skill-discovery memory group) but use personal primitives (Obsidian-folder, sqlite-vec, Claude Code hooks).

  4. v1 design — private GitHub repo + three separate skills (context-recall, context-save, context-search). Rejected (superseded 2026-05-15). Two problems: (a) standalone repo means no human-inspectable surface — vault contents only visible via the agent or by manually navigating the repo; (b) three skills creates artificial boundaries — recall happens via hooks (not user-invoked), save and search collapse into the broader memory skill with sub-commands.

  5. Obsidian REST API plugin as access mechanism. Rejected for v1: requires Obsidian running, adds a moving piece, doesn't pay off until cross-surface read (web / mobile access) becomes a felt need. Filesystem-direct is simpler and covers desktop use end-to-end.

  6. Personal MCP server with custom recall logic (recall_context(query), save_context(path, content) primitives). Rejected for v1: real engineering up front; the skill-level abstraction is sufficient for the loop. MCP server becomes a follow-up if cross-surface-read pressure surfaces.

  7. No auto-recall — manual /memory recall invocation only. Rejected: agents forget; humans forget; a memory system that nobody actively loads is worthless. Manual recall is theater; the auto-injection at hook boundaries is what makes the loop pay off.

  8. status: superseded frontmatter only, with no atomic evolve primitive. Rejected (decision C4): the tri-modal interactive review flow needs the atomic-supersede primitive because "supersede existing entry X?" is one of the approve / edit / reject / skip / supersede options at review time. Status-frontmatter-only forces a manual two-step (edit old + write new) that doesn't map cleanly.

Dependencies

Internal (all shipped via prior plans):

  • #3 fresh-context evaluator — consumed by reflection sidecar's optional per-entry grading.
  • #4 kill-switch + steer — long-running execution control (reflection sidecar respects .harness/STOP).
  • #5 commit-on-stop — crashed-session safety branch (orthogonal but complementary to MemoryVault crash recovery — commit-on-stop handles dirty git trees, MemoryVault crash markers handle missed reflection).
  • #6 design skill — this design doc is the first real dogfood.

External:

  • (historical: v1 spec listed Anthropic API for embeddings; v0.9.2 dropped this — see crickets HLD. sentence-transformers below is now the only embedding path.)
  • Obsidian + the user's existing cross-device sync (not installed by this plan — assumed precondition).
  • sqlite-vec — SQLite C loadable extension. Called from Python in v1 via pip install sqlite-vec (prebuilt wheels for all 3 OSes); on-disk format is caller-language-agnostic, so future swap to Rust/Go binary callers is non-breaking.
  • Python 3.10+ — already an implicit dependency of crickets (validate-manifests.py, check-wiki.py, check-no-pii.sh all require it). MemoryVault makes this explicit + adds sqlite-vec + sentence-transformers to the toolkit's pip-install set.
  • Claude Code hook lifecycle — SessionStart, UserPromptSubmit, Stop, idle. SessionStart + UserPromptSubmit + Stop are documented hooks; idle-time may require a new crickets primitive (the commit-on-stop hook already establishes the Stop-event hook pattern; idle is similar shape).

Migrations

  1. ~/ContextVault/ → MemoryVault/ content migration: existing files at ~/ContextVault/domains/ and ~/ContextVault/projects/ migrate to the new vault structure as part of the manual seed pass (task 1 of #7a). The migration is:

    • ~/ContextVault/domains/*.md → MemoryVault/personal-private/domains/*.md (reorganized into personal-private group with kind: domain-reference frontmatter).
    • ~/ContextVault/projects/ai-context-system/conversations/*.md → MemoryVault/personal-projects/memoryvault/conversations/*.md (the prior design conversation lands inside the MemoryVault project itself).
    • After migration, ~/ContextVault/ can be deleted; ROADMAP references to ~/ContextVault/ paths get updated in a follow-up doc pass.

    Additional source paths (flagged for follow-up discussion at the seed-pass task): the ~/ContextVault/ tree is just one source of prior knowledge worth pulling into MemoryVault. There are at least three other sources to inventory + decide what migrates: (a) the user's own Obsidian vault has existing notes that may overlap with what MemoryVault would otherwise auto-capture; (b) a GitHub experimental repo has a README and supporting files describing prior context-system / memory-related explorations; (c) prior decisions / preferences / conventions are scattered across the synced GitHub repos already on this device (CLAUDE.md fragments, AGENTS.md sections, PLAN.archive narratives, ADRs, ROADMAP locked design calls). Defer the full inventory + per-source migration decisions to the seed-pass task — at that point the user walks through each source with the agent, decides what's worth pulling in, what to leave in place, and what to summarize-rather-than-duplicate. Capture as a sub-task list when planning #7a task 1.

  2. No on-disk-format migration: the new vault uses the same markdown + YAML frontmatter shape as the v1 prior design, so content carries over cleanly. Frontmatter schema is extended (new fields: group, kind, always_load, supersedes, superseded_by, confidence_rating_at_capture) — existing entries get reasonable defaults during migration.

  3. Skill name renames: context-recall / context-save / context-search are deprecated (never shipped as actual skills — they were planned in v1 design but pre-empted by v2 pivot). New skill: single memory with sub-commands. No user-visible breakage because v1 skills never existed on disk.

  4. ROADMAP item rename: ContextVault → MemoryVault rename applied globally 2026-05-15 in .harness/ROADMAP.md (the active-plan-tracking file in agentm). #14 (learn skill) folded into #7a scope. Plan #7 split into #7a (core) + #7b (discovery + mining).
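The "reasonable defaults" for the extended frontmatter schema (migration 2) are not specified in this document. The sketch below shows one possible shape for filling the new fields; every default value in it is an assumption chosen for illustration, with `kind: domain-reference` borrowed from the domains/ mapping in migration 1.

```python
# Hypothetical defaults for the fields added by the extended schema.
MIGRATION_DEFAULTS = {
    "group": "personal-private",
    "kind": "domain-reference",
    "always_load": False,
    "supersedes": None,
    "superseded_by": None,
    "confidence_rating_at_capture": None,  # unknown for pre-existing entries
}


def with_defaults(frontmatter: dict, **overrides) -> dict:
    """Fill missing fields; precedence is existing value > override > default."""
    return {**MIGRATION_DEFAULTS, **overrides, **frontmatter}
```

For example, entries coming from the conversations/ folder could be migrated with `with_defaults(meta, group="personal-projects")` so that only missing fields are touched.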

Technical Debt & Risks

  1. Python becomes an explicit toolkit dependency. Today crickets uses Python informally (validate-manifests, check-wiki, check-no-pii) — Python 3.10+ is an implicit requirement we haven't called out in the README. MemoryVault formalizes this + adds sqlite-vec + sentence-transformers pip deps. Mitigation: document Python as a first-class requirement in the toolkit README + Agent M's Use-The-Memory-Skill page; graceful-skip if pip deps are missing — vault stays read-via-grep + write-via-file (no embeddings, no semantic recall) until user installs the deps. Future optimization: if Python hook latency becomes a problem, swap the caller layer to Rust or Go (sqlite-vec is C-extension, language-agnostic on disk — non-breaking swap). Captured as a deferred follow-up.

  2. First-run model download cost (~1.3GB). v0.9.2 ships BGE-large as the default local model (see crickets HLD). On first /memory save or embed.py --mode local invocation, sentence-transformers downloads the BGE-large checkpoint (~1.3GB) into ~/.cache/crickets/sentence-transformers/. Subsequent invocations are offline + fast. Risk: operators on slow / metered connections feel the first-run cost; operators on low-spec hosts may not have disk + RAM headroom. Mitigation: AGENT_TOOLKIT_EMBEDDING_MODEL env var lets operators swap to a smaller model (e.g. all-MiniLM-L6-v2 at 80MB) without code changes; --no-python-deps install flag defers the install entirely; graceful-skip path (no sentence-transformers installed → grep+frontmatter recall only) keeps the toolkit usable until the operator decides to pay the download.

  3. Cloud sync conflicts. If the agent writes to the vault from one device while the user edits from another, the user's cross-device sync layer could create conflict-marker files (most cloud sync providers handle this similarly). Mitigation: single-user-mostly-desktop assumption (user rarely edits MemoryVault contents directly — that's the agent's job; user-side edits typically happen on the human-facing parts of the vault, not the agent-curated MemoryVault folder). If conflicts bite, escalate to a sync-mediating access mechanism (Obsidian REST API plugin or similar) — captured as a deferred alternative.

  4. Interactive review fatigue. Tri-modal routing reduces prompt frequency but the MEDIUM-confidence pool may still feel like noise. Mitigation: memory.review_mode: silent escape hatch + adjustable confidence thresholds in skill config; default interactive is intentional friction during the trust-building phase.

  5. Vault bloat. Aggressive sweep + _inbox/ could accumulate cruft if user doesn't do weekly inbox review. Mitigation: /memory inbox command shows inbox count + age; /memory reflect end-of-session output reminds user when inbox > N entries; incubator GC at 6 months gives a soft cleanup deadline.

  6. Cross-machine config sanitization (the follow-up added 2026-05-15 — see .harness/ROADMAP.md §7 Still open). MemoryVault skill config + hooks + crickets settings ought to be backed up to an operator-private sibling repo, but the vault contents are private. Need a redaction boundary — what's safe to commit (skill source, hook source, schema, templates) vs. device-local (real vault paths on disk, sync-provider identifiers, account emails, any project-specific overrides). Three candidate shapes flagged in the ROADMAP follow-up; decision deferred to a small follow-up plan.

  7. Recall-quality uncertainty. We've never run this loop personally. The relevance heuristic (vec_similarity * 0.7 + keyword_match_count * 0.3) is a guess. Mitigation: ship instrumented — every recall logs which entries were injected + the user can manually inspect via /memory inspect to validate; tune weights based on real use.

  8. Skill-discovery (#7b) "adapt-don't-import" principle is hand-wavy. The line between "adopt this pattern's idea" and "fork their SKILL.md" is fuzzy in practice. Mitigation: ship #7b conservatively — the agent always proposes a personal-skill entry FIRST, human approves the entry, and only then does the user (not the agent) decide whether to author an actual skill in crickets/.

  9. Single-library embedding lock-in (v0.9.2). v0.9.2 narrowed embeddings to a single mode: local sentence-transformers (see crickets HLD). The library is widely used, permissively licensed, and actively maintained, but the toolkit is now coupled to its API + model-loading semantics. Risk: sentence-transformers is abandoned or pivots incompatibly. Swap candidates in that case include direct PyTorch + tokenizers integration or the transformers library. Re-audit trigger: sentence-transformers stops shipping releases for 6+ months OR drops support for the BGE-large family. Mitigation: the abstraction at embed.py is thin (~50 lines wrapping SentenceTransformer.encode()); replacement effort is bounded.
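The relevance heuristic from item 7 can be sketched as below. This is a minimal illustration, not the shipped recall engine: the `Candidate` type, the `rank` helper, and the example paths are hypothetical, and the sketch assumes `vec_similarity` is a cosine similarity in [0, 1] while `keyword_match_count` is a raw, unbounded count.

```python
from dataclasses import dataclass

VEC_WEIGHT = 0.7      # weight taken from the heuristic in item 7
KEYWORD_WEIGHT = 0.3  # weight taken from the heuristic in item 7


@dataclass
class Candidate:
    """Hypothetical recall candidate; field names mirror the heuristic's terms."""
    path: str
    vec_similarity: float     # assumed cosine similarity in [0, 1]
    keyword_match_count: int  # raw count of query keywords found in the entry


def relevance(c: Candidate) -> float:
    """Score as written in the design: vec_similarity * 0.7 + keyword_match_count * 0.3."""
    return c.vec_similarity * VEC_WEIGHT + c.keyword_match_count * KEYWORD_WEIGHT


def rank(candidates: list[Candidate], top_k: int = 5) -> list[Candidate]:
    """Return the top_k candidates, highest relevance first."""
    return sorted(candidates, key=relevance, reverse=True)[:top_k]


pool = [
    Candidate("fixes/sqlite-vec-install.md", vec_similarity=0.82, keyword_match_count=0),
    Candidate("workflows/release-pair.md", vec_similarity=0.40, keyword_match_count=2),
]
for c in rank(pool):
    print(f"{relevance(c):.3f}  {c.path}")
```

Under these assumptions the second entry outranks the first (0.88 vs 0.574), because an unbounded keyword count can outweigh a strong vector match. That is one concrete thing the instrumented recall logs could help tune.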

Quality Attributes

Security

Vault contents are private — they may include PII (project names, internal preferences, fixes that mention real systems). Three layers of access control:

  1. Filesystem permissions: the vault lives at the user's synced storage path; only the user's OS account has read/write access. Agent inherits via the user's session.
  2. Network surface: the agent never makes the vault contents accessible via network. As of v0.9.2, MemoryVault makes zero external network calls during normal operation — embeddings are computed entirely on-device via local sentence-transformers (see crickets HLD). The only network access is the one-time model download (~1.3GB BGE-large from HuggingFace Hub) on first invocation, after which the toolkit is fully offline-capable.
  3. Tool allowlist: the memory skill is allowed Read, Write, Edit, Glob, Grep only — no Bash, no network primitives, no shell exec. Reflection sidecar uses a sub-agent (per the evaluator pattern) which inherits the same restricted allowlist.

PII guardrails: the crickets/ pre-push hook + CI gate covers PII detection on toolkit-committed content (skill source + templates + how-to docs). Vault contents themselves are NOT in crickets — they're in the user's private Obsidian vault. No public surface for vault contents.

Reliability

Failure modes + mitigations:

  • Cloud sync failure (network down, sync paused): agent can still read/write the local cached vault path; changes propagate when sync recovers. No data loss.
  • sqlite-vec index corruption: recall falls back to grep+frontmatter-only (degraded but functional); index can be rebuilt from scratch by re-embedding all entries.
  • Local-embedding failure (rare — sentence-transformers not installed, or PyTorch MPS regression): save still succeeds (file write is unconditional); embedding queue stays pending until deps are restored. UserPromptSubmit hook falls back to grep+frontmatter when sentence-transformers is unavailable. No external network dependency exists post-v0.9.2 (crickets HLD) so there's no rate-limit / API-failure class of incident.
  • Hook crash mid-reflection: .harness/session-id-*.start marker stays in place; idle-time hook will retry reflection retroactively. commit-on-stop covers any dirty git tree from interrupted writes.
  • Vault path missing (sync layer not mounted, drive disconnected): hooks log error + graceful-skip; agent continues without memory injection rather than failing the session.
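The degrade-instead-of-fail behavior in the list above can be sketched as a small guard around recall. The function name, its parameters, and the logger name are hypothetical; the real hooks merge vector and grep results, which this sketch omits to keep the fallback order visible.

```python
import logging
from pathlib import Path
from typing import Callable, Optional

log = logging.getLogger("memoryvault")  # hypothetical logger name

Recall = Callable[[str], list[str]]  # query -> entry paths to inject


def recall_with_fallback(
    query: str,
    vault: Path,
    vec_recall: Optional[Recall],
    grep_recall: Recall,
) -> list[str]:
    """Return entry paths to inject, degrading rather than failing the session."""
    if not vault.is_dir():
        # Vault path missing: log the error and skip memory injection entirely.
        log.error("vault path missing: %s; skipping memory injection", vault)
        return []
    if vec_recall is not None:
        try:
            return vec_recall(query)
        except Exception as exc:  # e.g. corrupted index or missing embedding deps
            log.warning("vector recall failed (%s); falling back to grep", exc)
    # Degraded but functional path: grep+frontmatter only.
    return grep_recall(query)
```

Passing `vec_recall=None` models the case where sentence-transformers is not installed, so the caller never has to special-case missing dependencies.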

Data Integrity

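File writes are treated as atomic at the filesystem level (each entry is written with a single Write call). /memory evolve is a two-step rename + write; the two steps are not atomic as a pair, so transactional integrity relies on the atomicity of each individual filesystem operation (good on macOS APFS).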
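For script-level writers such as hooks, one common way to get an all-or-nothing file update is to write to a temporary file and rename it into place. The helper below is a hypothetical sketch of that pattern, not a description of how the skill's Write tool behaves.

```python
import os
import tempfile
from pathlib import Path


def atomic_write_text(target: Path, text: str) -> None:
    """Write text so readers see either the old file or the new one, never a partial file."""
    target.parent.mkdir(parents=True, exist_ok=True)
    # Create the temp file in the target directory so the rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=target.parent, prefix=".tmp-", suffix=".md")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            fh.write(text)
            fh.flush()
            os.fsync(fh.fileno())  # push bytes to disk before the rename
        os.replace(tmp_name, target)  # atomic rename on POSIX filesystems
    except BaseException:
        # Remove the temp file if anything failed before the rename completed.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```

A cloud sync layer can still observe the temporary file briefly, so this narrows the local partial-write window but does not address cross-device conflicts.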

Frontmatter status field is the supersession discipline: active (default), resolved, superseded. Recall filters skip non-active by default. superseded_by and supersedes cross-link the supersession graph.
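A minimal sketch of the supersession discipline, assuming frontmatter has already been parsed into plain dictionaries keyed by a hypothetical entry id. The function names are illustrative; only the `status`, `superseded_by`, and `supersedes` field names come from the design.

```python
from typing import Optional

# Hypothetical parsed frontmatter for one entry.
Entry = dict[str, Optional[str]]


def is_recallable(entry: Entry, include_inactive: bool = False) -> bool:
    """Recall skips non-active entries by default; a missing status counts as active."""
    return include_inactive or (entry.get("status") or "active") == "active"


def latest_version(entry_id: str, entries: dict[str, Entry]) -> str:
    """Follow superseded_by links to the newest entry, guarding against cycles."""
    seen = set()
    current = entry_id
    while True:
        seen.add(current)
        nxt = entries[current].get("superseded_by")
        if not nxt or nxt in seen or nxt not in entries:
            return current
        current = nxt


entries = {
    "prefer-tabs": {"status": "superseded", "superseded_by": "prefer-spaces"},
    "prefer-spaces": {"status": "active", "supersedes": "prefer-tabs"},
    "old-bug": {"status": "resolved"},
    "no-status": {},
}
print([k for k, v in entries.items() if is_recallable(v)])
print(latest_version("prefer-tabs", entries))
```

The cycle guard is a defensive assumption: two entries that accidentally supersede each other would otherwise loop forever.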

No database transactions (file-based by design). Risk: simultaneous writes to the same entry from two devices via the user's sync layer. Accepted as a known limitation under the single-user assumption; if it bites, escalate to a sync-mediating access mechanism (e.g. Obsidian REST API).

Privacy

Vault contents are PRIVATE — assumed to contain PII. Storage = user's local + synced storage (user's account). Embeddings are computed entirely on-device via local sentence-transformers (crickets HLD) — no external API calls. If sentence-transformers is unavailable (not installed, or PyTorch MPS issue), recall degrades to grep+frontmatter-only.
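The on-device embedding path and its degradation can be sketched as a thin wrapper around `SentenceTransformer.encode()`. This is an assumption-laden illustration rather than the contents of embed.py: `load_model` and `embed` are hypothetical names, and only an import failure is handled here (a PyTorch MPS problem would surface as a different exception).

```python
from typing import Any, Optional, Sequence

DEFAULT_MODEL = "BAAI/bge-large-en-v1.5"  # default model named in the amendment log


def load_model(name: str = DEFAULT_MODEL) -> Optional[Any]:
    """Return a SentenceTransformer, or None when the dependency is not installed."""
    try:
        from sentence_transformers import SentenceTransformer
    except ImportError:
        return None
    # The first call downloads the model; later calls load it from the local cache.
    return SentenceTransformer(name)


def embed(texts: Sequence[str], model: Optional[Any]) -> Optional[list[list[float]]]:
    """Embed texts on-device; None tells the caller to use grep+frontmatter recall."""
    if model is None:
        return None
    vectors = model.encode(list(texts), normalize_embeddings=True)
    return [[float(x) for x in vec] for vec in vectors]
```

Keeping the model as an explicit argument means a stand-in object with an `encode` method can exercise the wrapper in tests without loading the 1.3GB model.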

No analytics, no telemetry, no third-party sharing.

Scalability

sqlite-vec scales to 100K+ entries trivially with sub-second query times. Realistic personal-use estimates: no more than ~20K entries by year 5 (reflection sweep + idea-incubator + personal-skills index combined). Grep scales linearly with vault size; at 10K entries grep walltime is ~1-2 seconds, which is above the 300ms UserPromptSubmit budget, so at that size the hook would log a warning and proceed with partial results (see Latency). Headroom on the vec-index is comfortable: partitioning it by group or by year only becomes worth considering if the vault grows past 20K entries faster than projected.

Latency

Hook time budgets (hard limits — exceed → log warning + proceed with partial results):

  • SessionStart: 500ms. One filesystem walk of _always-load/ (~20 files) + file reads. Achievable.
  • UserPromptSubmit: 300ms. Embed query (local BGE-large on M-series ~50-100ms via PyTorch MPS; CPU-only ~150-300ms — operator-config-dependent per crickets HLD) + vec query (10ms) + grep merge (50ms) + format + inject (10ms). Tight on CPU; consider caching common embeddings (e.g. tokenize prompt → check embedding cache).
  • Stop hook reflection: no time budget (runs in background after session ends; user-perceived latency = 0).
  • Idle hook reflection: no time budget (runs in background).
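The UserPromptSubmit budget above can be checked by adding up the quoted stage estimates. The sketch below only does that arithmetic; the dictionary keys are hypothetical labels, and the numbers are the estimates from the list, not measurements.

```python
BUDGET_MS = 300  # UserPromptSubmit budget from the list above

# Fixed stages in ms, as estimated above: vec query, grep merge, format + inject.
FIXED_MS = {"vec_query": 10, "grep_merge": 50, "format_inject": 10}

# Embedding estimates in ms as (best, worst) per backend.
EMBED_MS = {"mps": (50, 100), "cpu": (150, 300)}


def total_range(backend: str) -> tuple[int, int]:
    """Return (best, worst) end-to-end latency in ms for the given backend."""
    fixed = sum(FIXED_MS.values())
    best, worst = EMBED_MS[backend]
    return best + fixed, worst + fixed


for backend in EMBED_MS:
    best, worst = total_range(backend)
    verdict = "within budget" if worst <= BUDGET_MS else "can exceed budget"
    print(f"{backend}: {best}-{worst} ms ({verdict})")
# prints:
# mps: 120-170 ms (within budget)
# cpu: 220-370 ms (can exceed budget)
```

On these estimates the MPS path leaves roughly 130ms of slack, while the CPU-only worst case overshoots by about 70ms. That matches the "tight on CPU" note and is the case an embedding cache would target.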

Save latency: file write synchronous (<50ms); embedding async (doesn't block agent).

Abuse

N/A: single-user personal tooling. No external surface, no rate-limiting needs, no anti-spam, no malicious-input handling beyond standard Claude Code sandboxing. The vault is trusted-source-only.

Accessibility

N/A: text-only on-disk format; no UI provided by this design. The user accesses the vault via Obsidian (which provides its own accessibility support per Obsidian's WCAG compliance). Agent-side surface is Claude Code's standard text-based UX.

Testability

The skill is documentation + sub-command bodies; tests follow the established crickets pattern:

  • Smoke install tests (existing smoke-install-bash.sh + .ps1): extended to verify memory skill + 4 hooks install correctly at the 2 host destinations (Claude Code + Antigravity; gemini-cli removed in v0.9.0 per ROADMAP item #15 / crickets HLD).
  • Manual end-to-end walks: per established pattern (manual fill-out verification for /design author, manual hook fire for /work step verifications). Each sub-command walked through a synthetic 5-minute scenario.
  • Recall-quality tests: manual via seeded vault — fixture vault with 50 entries, fixed query set, expected recall set; run as periodic regression. Vec-quality regressions surface here.
  • Hook tests: manual via fixture session — write a fixture transcript, fire the Stop hook, inspect the resulting MemoryVault diff. Deterministic enough for CI.
  • Tri-modal routing tests: unit-level per heuristic (HIGH/MEDIUM/LOW) — given a candidate string, assert routing decision. Lives in crickets/scripts/test-memory-routing.py.
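A routing unit test of the kind described in the last bullet might look like the sketch below. Everything specific here is an assumption: the numeric thresholds, the use of a confidence score as input, the `route` function, and the mapping of HIGH/MEDIUM/LOW to auto-save, interactive review, and inbox are illustrative and not taken from test-memory-routing.py.

```python
import unittest

# Hypothetical thresholds; the design only says they are adjustable in skill config.
HIGH_THRESHOLD = 0.8
MEDIUM_THRESHOLD = 0.5


def route(confidence: float) -> str:
    """Map a confidence score to a routing decision (assumed tier-to-action mapping)."""
    if confidence >= HIGH_THRESHOLD:
        return "auto_save"           # HIGH tier
    if confidence >= MEDIUM_THRESHOLD:
        return "interactive_review"  # MEDIUM tier
    return "inbox"                   # LOW tier


class RoutingTest(unittest.TestCase):
    def test_tiers_and_boundaries(self):
        cases = [
            (0.95, "auto_save"),
            (0.80, "auto_save"),           # boundary belongs to HIGH
            (0.65, "interactive_review"),
            (0.50, "interactive_review"),  # boundary belongs to MEDIUM
            (0.10, "inbox"),
        ]
        for confidence, expected in cases:
            with self.subTest(confidence=confidence):
                self.assertEqual(route(confidence), expected)
```

Saved as a module, this would run under `python -m unittest`. Testing the boundary values explicitly is the part most likely to catch a regression when thresholds are retuned.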

Deterministic verification per gate per agentm conventions; LLM-judge augmentation only for recall-relevance gating (not as a primary check).

Internationalization & Localization

N/A: vault content is English-only (single user, English-speaking). No locale-aware date formatting; dates use ISO 8601 (YYYY-MM-DD) which is locale-neutral. Future expansion possible if user wanted to capture content in another language, but no current demand.

Compliance

N/A: personal tooling, user-owned data, no regulatory framework applies. GDPR-style "right to be forgotten" is satisfied trivially by deleting the vault directory; no third-party data processing.

Project management

Work estimates

Plan #7a (MemoryVault Core) — Large, estimated 8-10 tasks, 2-3 weeks calendar:

  1. (L) Manual co-created seed pass + ~/ContextVault/ content migration.
  2. (M) Skill scaffold + memory save write primitive + sqlite-vec dependency wiring.
  3. (M) memory evolve atomic supersede primitive.
  4. (M) Reflection sidecar logic + 3-category mine + tri-modal routing.
  5. (L) Stop-event hook + idle-time hook (new crickets primitive) + crash recovery.
  6. (M) SessionStart + UserPromptSubmit recall hooks + dedup logic.
  7. (L) Recall engine — sqlite-vec + grep+frontmatter merge + local embedding integration (BGE-large via sentence-transformers; see crickets HLD for the v0.9.2 local-only refactor).
  8. (M) Idea ledger — Ideas.md + _idea-incubator/ two-tier capture + permeable boundary enforcement.
  9. (S) Documentation pass — how-to + amendment log + cross-refs.
  10. (M) Release pair crickets v0.9.0 + (if harness integration needed) agentm v2.4.0.

Plan #7b (MemoryVault Discovery + Mining) — Medium, estimated 5-7 tasks, 1-2 weeks calendar, ships after 1-2 weeks of #7a dogfood:

  1. (M) Transcript reflection one-time pass over ~/.claude/projects/*/.
  2. (M) Personal-skills auto-indexer (toolkit + harness SKILL.md → personal-skills/ group).
  3. (L) Internet skill-discovery component with adapt-don't-import workflow.
  4. (S) Documentation pass — how-to update + amendment log.
  5. (M) Release pair crickets v0.10.0 (the version recorded for #7b in the amendment log) + harness if needed.

Documentation Plan

Agent-toolkit wiki additions (#7a):

  • New how-to: crickets/wiki/how-to/Use-The-Memory-Skill.md, a comprehensive page covering 4 sub-commands + worked scenarios (capture flow / recall flow / idea promotion / supersede flow) + tri-modal routing explanation + interactive-review mode setting + troubleshooting (sqlite-vec install / cloud sync issues / local-embedding fallback to grep+frontmatter / vault bloat).
  • Updated: Home.md + _Sidebar.md (add memory skill to reader-intent sections); README.md "What's inside" table (bump version + add memory skill row); Customization-Types.md (add memory as concrete example link in skill row).
  • This design doc itself (memoryvault.md) becomes the canonical "Why we built this" wiki entry point per the locked design call from plan #6.

Agent-toolkit wiki additions (#7b):

  • Update: Use-The-Memory-Skill.md — add transcript-reflection + skill-discovery sections.

Harness wiki additions: None for #7a (toolkit-only). #7b: same. Plan #8 (auto context integration into harness phases) is when harness wiki adds memory references.

Launch Plans

Phased rollout via the locked dev-flow convention:

  1. #7a release: crickets v0.9.0 + agentm v2.4.0 (if any harness integration; likely not — toolkit-only). Coordinated cross-repo if needed; toolkit-first per the locked order from plans #3-#6.
  2. Dogfood window (1-2 weeks): user runs MemoryVault in real Claude Code sessions. Inbox review + interactive-review tuning + recall-quality measurement happen here.
  3. #7b release: crickets v0.10.0 (per the 2026-05-22 amendment-log entry; v0.9.2 went to the local-only embedding refactor). Lands transcript reflection + personal-skills indexer + internet skill-discovery.
  4. Re-audit triggers: after 1 month of real use, re-audit the locked design calls. Capture-threshold + recall-quality + cross-device sync conflicts get re-evaluated; amendment log updated if any decisions flipped.

No feature flags; no phased rollout to user segments (single user). The escape hatches are: memory.review_mode: silent (cuts MEDIUM-tier prompts), AGENT_TOOLKIT_EMBEDDING_MODEL env var (swap default BGE-large for a smaller model on low-spec hosts — added in v0.9.2 per crickets HLD; still local-only — no API option), memory.enabled: false (kill switch — disables all hooks + auto-recall, vault becomes read-only).
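The three escape hatches can be resolved in one place, as in the sketch below. The `MemorySettings` type, the `load_settings` function, and the nested-dictionary shape of the skill config are hypothetical. The sketch also assumes `memory.enabled` defaults to true, with `interactive` as the default review mode per the risk discussion earlier.

```python
import os
from dataclasses import dataclass
from typing import Any, Mapping, Optional

DEFAULT_MODEL = "BAAI/bge-large-en-v1.5"  # default model named in the amendment log


@dataclass(frozen=True)
class MemorySettings:
    enabled: bool         # memory.enabled kill switch
    review_mode: str      # memory.review_mode: "interactive" or "silent"
    embedding_model: str  # local model name, overridable by env var


def load_settings(
    config: Mapping[str, Any],
    env: Optional[Mapping[str, str]] = None,
) -> MemorySettings:
    """Merge skill config and environment into one settings object."""
    env = os.environ if env is None else env
    memory = config.get("memory") or {}
    return MemorySettings(
        enabled=bool(memory.get("enabled", True)),
        review_mode=str(memory.get("review_mode", "interactive")),
        embedding_model=env.get("AGENT_TOOLKIT_EMBEDDING_MODEL") or DEFAULT_MODEL,
    )


def prompts_for_medium(settings: MemorySettings) -> bool:
    """MEDIUM-tier prompts appear only when memory is on and review is interactive."""
    return settings.enabled and settings.review_mode == "interactive"
```

For example, `load_settings({"memory": {"review_mode": "silent"}}, env={})` would yield settings for which `prompts_for_medium` returns false while hooks stay enabled.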

Operations

SLAs

N/A: personal tooling, no external SLA exposure. The hooks have time budgets (SessionStart 500ms, UserPromptSubmit 300ms), but exceeding them is logged and handled by graceful degradation, not paged.

Monitoring and Alerting

Minimal personal-only monitoring:

  • Hook execution log: .harness/memoryvault.log (rotating, gitignored) — one structured JSON line per hook invocation with timestamp, hook name, duration, result (success / partial / error), entries-injected count. User-readable; supports tail -f for debugging.
  • Vault health snapshot: /memory health command outputs entry count per group + last-reflection timestamp + sqlite-vec index size + inbox count + incubator count.
  • Alerts (personal — no PagerDuty): the /memory reflect Stop-event output warns when inbox > 50 entries or incubator > 20 unpromoted entries. Idle-time hook surfaces a "no reflection in 7+ days, something might be broken" notice if vault is silent.
  • Disk + memory usage: BGE-large is ~1.3GB on disk + ~1.5GB RAM at runtime per crickets HLD. Monitor via ~/.cache/crickets/sentence-transformers/ (disk) and process RSS (RAM). If footprint becomes a problem, swap to a smaller model via AGENT_TOOLKIT_EMBEDDING_MODEL env var.
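The alert thresholds above reduce to a few comparisons. The sketch below is illustrative only: `health_warnings` and its parameters are hypothetical, the message wording is invented, and both datetimes are assumed to be timezone-aware.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

INBOX_LIMIT = 50                   # warn when inbox > 50 entries
INCUBATOR_LIMIT = 20               # warn when incubator > 20 unpromoted entries
SILENCE_LIMIT = timedelta(days=7)  # warn after 7+ days without reflection


def health_warnings(
    inbox_count: int,
    incubator_count: int,
    last_reflection: datetime,
    now: Optional[datetime] = None,
) -> list[str]:
    """Return human-readable warnings for the thresholds listed above."""
    now = now or datetime.now(timezone.utc)
    warnings = []
    if inbox_count > INBOX_LIMIT:
        warnings.append(f"inbox has {inbox_count} entries (limit {INBOX_LIMIT})")
    if incubator_count > INCUBATOR_LIMIT:
        warnings.append(
            f"incubator has {incubator_count} unpromoted entries (limit {INCUBATOR_LIMIT})"
        )
    if now - last_reflection >= SILENCE_LIMIT:
        warnings.append("no reflection in 7+ days, something might be broken")
    return warnings
```

Accepting `now` as a parameter keeps the check deterministic, which fits the document's preference for deterministic verification.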

Logging Plan

Structured JSON logs at .harness/memoryvault.log (gitignored, runtime-only):

{"ts": "2026-05-15T18:30:00Z", "hook": "user-prompt-submit", "duration_ms": 245, "entries_injected": 5, "vec_hits": 4, "grep_hits": 3, "deduped_count": 2}
{"ts": "2026-05-15T19:00:00Z", "hook": "stop-reflect", "duration_ms": 12500, "candidates_mined": 8, "auto_saved": 2, "interactive_reviewed": 4, "inboxed": 2}
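Because each log line is a standalone JSON object, the log can be summarized with the standard library alone. The sketch below parses the two sample lines above; the `summarize` and `over_budget` helpers are hypothetical, and only the UserPromptSubmit budget is checked because that is the only budgeted hook name that appears in the samples.

```python
import json
from collections import defaultdict

SAMPLE = """\
{"ts": "2026-05-15T18:30:00Z", "hook": "user-prompt-submit", "duration_ms": 245, "entries_injected": 5, "vec_hits": 4, "grep_hits": 3, "deduped_count": 2}
{"ts": "2026-05-15T19:00:00Z", "hook": "stop-reflect", "duration_ms": 12500, "candidates_mined": 8, "auto_saved": 2, "interactive_reviewed": 4, "inboxed": 2}
"""

BUDGETS_MS = {"user-prompt-submit": 300}  # budget from the Latency section


def summarize(lines):
    """Return {hook: (invocation_count, mean_duration_ms)} from JSON log lines."""
    durations = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        durations[record["hook"]].append(record["duration_ms"])
    return {hook: (len(v), sum(v) / len(v)) for hook, v in durations.items()}


def over_budget(lines):
    """Return the records whose duration exceeds the budget for their hook."""
    records = [json.loads(line) for line in lines if line.strip()]
    return [
        r for r in records
        if r["hook"] in BUDGETS_MS and r["duration_ms"] > BUDGETS_MS[r["hook"]]
    ]


print(summarize(SAMPLE.splitlines()))
print(over_budget(SAMPLE.splitlines()))
```

In the samples, `entries_injected` equals `vec_hits + grep_hits - deduped_count` (4 + 3 - 2 = 5), and `candidates_mined` equals the sum of the three routing outcomes (2 + 4 + 2 = 8). A log checker could assert both identities if those field semantics hold.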

Retention: 30 days, rotated weekly. Log rotation handled by a simple logrotate-style discipline in the hook scripts.

Log levels: per-hook duration (always), errors (always), debug-trace (opt-in via memory.log_level: debug).

Rollback Strategy

Three rollback levels, depending on what's broken:

  1. Soft disable (most common): set memory.enabled: false in skill config → all hooks become no-ops; auto-recall + auto-save stop; vault contents untouched. Reversible by flipping back.

  2. Skill uninstall: bash crickets/install.sh --uninstall memory removes the skill + 4 hooks from the host destinations. Vault contents untouched (intentional — vault is the user's data, not the skill's data).

  3. Vault rollback: vault contents are versioned via Obsidian's file versioning + the user's cloud sync provider's restore-deleted-files history + Obsidian's optional git plugin (if user enables — not required by this design). If the vault gets corrupted by a runaway reflection sidecar, the combination of sync-side restore + Obsidian versioning provides recovery; worst case, restore the vault to a snapshot before the runaway and replay the sidecar with stricter confidence thresholds.

No schema migrations are involved in rollback — the markdown + YAML format is backwards-compatible by design.

Amendment log

2026-05-15 — Initial design locked. Four-group architectural session (A1-A4/B1-B3/C1-C4/D1-D3) settled 13 design calls covering write primitives, reflection sidecar, recall engine, idea ledger, seed pass, and discovery mining. Translated to 6 implementation parts (write-primitives, recall-loop, reflection-and-recovery, idea-ledger, seed-pass, discovery-mining), sequenced topologically.

2026-05-16 — Host scope narrowed to [claude-code, antigravity]. memory skill ships without gemini-cli in supported_hosts; fleet-wide sweep (2026-05-17) aligned all existing customizations to the same 2-host scope. Why not ship with 3 hosts: ROADMAP #15 was already removing gemini-cli — shipping then re-sweeping would be redundant churn. Re-audit: gemini-cli re-added to the toolkit.

2026-05-20 — Embedding-mode collapsed to local-only (v0.9.2). DC C2 superseded: dual-mode API + local → local sentence-transformers only. Default model upgraded all-MiniLM-L6-v2 (384-d, MTEB 56.3) → BAAI/bge-large-en-v1.5 (1024-d, MTEB 64.2); EMBEDDING_DIM 384→1024; AGENT_TOOLKIT_EMBEDDING_MODEL env var escape hatch (still local). Why not retain API path: primary operator is a Claude Ultra subscriber without a separate embedding API key; desktop-class hardware (M-series + 64GB-RAM) makes local quality the better default. Re-audit: host drops below M-series floor; a local model exceeds BGE-large at lower footprint; Anthropic ships a materially better API embedding path.

2026-05-22 — Discovery + mining shipped (crickets v0.10.0, #7b). Five sub-commands: /memory index-skills, /memory reflect corpus, /memory discover-skills, /memory adapt-skills, /memory watchlist. adapt-evaluator sub-agent (write-scoped to _skill-watchlist/) enforces adapt-don't-import; stdlib-only, no new deps. Why not fork-and-import: forking external skills would erode the "no external skill execution" boundary — adapt-don't-import is load-bearing. Re-audit: operator-confirmed 4-source whitelist grows stale; cadence or dry-run defaults need adjustment.

2026-06-25 — Migrated to launched living design (AG Phase 3). status: final → launched; AG graph keys added; Document History table collapsed to amendment log; (former ADR XXXX) chain-pointers stripped; dead TLS/API-key embedding residue removed; internal provenance removed from body.
