-
Notifications
You must be signed in to change notification settings - Fork 1
recall loop
parent: memoryvault.md part_slug: recall-loop title: "Recall hooks + recall engine" status: pending visibility: published author: Alex Herrero contributors: [] created: 2026-05-15 updated: 2026-05-15 last_major_revision: 2026-05-15 dependencies: [write-primitives] estimated_scope: L prd: project:
Parent design: MemoryVault — see Detailed Design §3 (Recall hooks — SessionStart + UserPromptSubmit) and §4 (Recall engine — sqlite-vec + grep+frontmatter merge) for full architectural context.
This part ships the read loop — the two hooks that inject MemoryVault content into the agent's context + the recall engine they call. Together these make the vault actually usable: without recall, save is theater.
Two-hook recall pattern:
-
SessionStart hook — fires once on session boot. Globs
MemoryVault/personal-private/_always-load/*.md, reads each, formats as a single markdown block, injects into session context. Outputs a "Loaded N MemoryVault always-load entries" transparency line. Hard time budget 500ms; degraded-graceful on overrun. -
UserPromptSubmit hook — fires on every user message. Takes the prompt as the recall query and calls the recall engine for top-K (K=5 default) relevant entries. Dedups against the already-loaded
_always-loadentries (by file path), then injects the remaining matches as a system message before the agent processes the prompt. Outputs a "Loaded N relevant entries: " transparency line so the user can see what memory shaped the response. Hard time budget 300ms; degraded-graceful on overrun.
Recall engine (called by both hooks; also exposable as /memory search for manual semantic queries):
-
Embed the query — local
sentence-transformers(BGE-large default; ~1.3GB checkpoint cached at~/.cache/crickets/sentence-transformers/; PyTorch MPS on Apple Silicon for acceleration) as of v0.9.2 (see ADR 0001's 2026-05-20 amendment). Synchronous call; on time-budget overrun, fall back to grep-only. -
Vec search — query sqlite-vec for top-K nearest entries by cosine similarity. Returns
(path, similarity_score). -
Grep + frontmatter search in parallel — scan entry titles + tags + first-line content for keyword matches; filter by frontmatter (
status: activeonly by default; respectgroupfilter if query specifies; respect--include-inboxflag for_inbox/access). -
Merge — union the two result sets; rank by
similarity_score × 0.7 + keyword_match_count × 0.3; dedup. - Return top-K — default K=5.
Filtering invariants: recall never surfaces status: superseded entries by default; _inbox/ excluded unless --include-inbox flag set; _archive/ always excluded.
Local-only as of v0.9.2 (ADR 0001 amendment) — no API path remains. sentence-transformers + BGE-large is the production mode; AGENT_TOOLKIT_EMBEDDING_MODEL env var swaps in a smaller model for low-spec hosts. If sentence-transformers is unavailable (not installed, or PyTorch MPS issue), recall degrades to grep+frontmatter-only.
-
write-primitives— recall reads entries written by/memory save. Without entries, recall returns empty; with entries, recall validates that save's frontmatter + vec-indexing actually work end-to-end. This dependency is the natural sequencing gate: save must ship first.
-
SessionStart hook fires on session start — install the hook at the 2 host destinations; start a Claude Code session against a scratch vault with seeded
_always-load/; verify the hook runs + injects entries + outputs the transparency line. - UserPromptSubmit hook fires on every user message — same scratch session; submit multiple prompts; verify the hook fires per prompt + recall returns relevant matches + dedups against already-loaded entries.
- Recall returns top-K relevant entries — seed a fixture vault with ~20 entries on varied topics; submit queries with both keyword-match and paraphrase semantics; verify top-K returns the expected matches per the rubric (semantic queries should pull paraphrase matches that grep alone would miss).
- Time budgets respected — SessionStart completes in <500ms; UserPromptSubmit completes in <300ms; on overrun, hook logs warning + proceeds with partial results rather than blocking.
-
status: supersededfiltered by default — create an entry, evolve it; verify recall returns only the new entry (not the superseded one). - Offline-capable recall works — verify recall succeeds with network disconnected once BGE-large model is cached locally (v0.9.2 is local-only by default — see ADR 0001 amendment; no API path exists to fall back to).
- Local-embedding failure → grep fallback — simulate sentence-transformers missing; verify recall degrades to grep+frontmatter-only rather than failing entirely.
-
Smoke install verifies hooks land —
smoke-install-bash.sh+.ps1extended to verify both hook scripts install at the 2 host destinations. - All 3 OS CI workflows green on the commit that lands this part.
- The recall engine is shared between SessionStart, UserPromptSubmit, and (eventually)
/memory search. Build it as a single Python module (memory/recall_engine.pyor similar) callable from each hook script + the skill's search sub-command. - The 300ms UserPromptSubmit budget is tight. Consider an embedding cache: tokenize the prompt → check a small LRU cache of recent embeddings → cache hit avoids the API/local call entirely. Defer caching to a follow-up if v1 hits the budget without it.
- The merge formula (
sim × 0.7 + keyword × 0.3) is a guess per Tech Debt #7. Ship instrumented — every recall logs its components so the user can validate via/memory inspectand tune from real use. -
_always-load/is just a path convention; the SessionStart hook doesn't filter by frontmatteralways_load: true— it just loads everything in that subdir. Thealways_load: truefrontmatter is informational only in v1 (might be used by/memory inspectto surface always-load entries differently).