Memory can now answer a question — not just return hits. Plus projects, a query-less browse, deferred embedding, and the pplx embedder
Added
- `recall`: an opt-in LLM read tool. Gemma 4 E2B-it (official QAT GGUF) synthesizes a concise, grounded answer to a query from a project's memories, replying `No relevant memories found.` when none apply (never outside knowledge). Available via MCP and the CLI (`mnemo recall <project> "<query>"`); the write path stays LLM-free, and the generator is transient (loaded on demand, unloaded after).
- `browse`: query-less retrieval. Lists memories by filter (type / tags / scope / `created_after`), newest first, no relevance ranking.
- Projects as first-class entities: `create_project` / `update_project` / `list_projects` / `delete_project` (cascades its memories via `ON DELETE CASCADE`); writes and reads are gated on a registered project (an unknown project errors, with near-match suggestions).
- Deferred embedding: writes return immediately while an async worker pool embeds off the hot path (`MNEMO_EMBED_WORKERS` sizes a bounded instance pool for parallel encodes); pending (vector-less) memories are supported and reconciled.
- pplx-embed-v1-0.6b (int8 ONNX) as the default embedder, with `mnemo reindex` to re-embed on an embedder switch.
- Per-memory content cap (`MNEMO_MAX_MEMORY_TOKENS`, default 512); over-window or empty content is rejected with an actionable error (never truncated).
- A composable-stage pipeline framework underpinning recall (and groundwork for background consolidation); `mnemo stats` now reports pending (un-embedded) memories.
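The deferred-embedding item above can be pictured with a minimal sketch. Everything here is hypothetical and is not mnemo's actual code: `DeferredStore`, `fake_embed`, and the in-memory `rows` dict are illustrative stand-ins, and the `workers` argument only plays the role that `MNEMO_EMBED_WORKERS` is described as playing. It shows a write returning before any vector exists, and a bounded pool of workers filling vectors in afterwards.

```python
import asyncio


def fake_embed(text: str) -> list:
    # Hypothetical stand-in for a real embedder; returns a tiny fake vector.
    return [float(len(text)), float(sum(map(ord, text)) % 97)]


class DeferredStore:
    """Illustrative store: writes enqueue work, workers embed later."""

    def __init__(self, workers: int = 2) -> None:
        self.rows = {}                # id -> {"content": ..., "vector": ...}
        self.queue = asyncio.Queue()  # ids waiting for an embedding
        self.workers = workers        # assumed analogue of MNEMO_EMBED_WORKERS
        self._tasks = []
        self._next_id = 1

    def start(self) -> None:
        # Bounded pool: exactly `workers` concurrent embed tasks.
        self._tasks = [asyncio.create_task(self._worker())
                       for _ in range(self.workers)]

    def remember(self, content: str) -> int:
        # The write returns immediately; the row is pending (vector-less).
        mem_id = self._next_id
        self._next_id += 1
        self.rows[mem_id] = {"content": content, "vector": None}
        self.queue.put_nowait(mem_id)
        return mem_id

    def pending(self) -> int:
        return sum(1 for r in self.rows.values() if r["vector"] is None)

    async def _worker(self) -> None:
        while True:
            mem_id = await self.queue.get()
            try:
                row = self.rows.get(mem_id)
                if row is not None:
                    # Run the (CPU-bound) encode off the event loop.
                    row["vector"] = await asyncio.to_thread(
                        fake_embed, row["content"])
            finally:
                self.queue.task_done()

    async def drain(self) -> None:
        # Wait until every queued memory is embedded, then stop the pool.
        await self.queue.join()
        for task in self._tasks:
            task.cancel()
        await asyncio.gather(*self._tasks, return_exceptions=True)


async def demo() -> tuple:
    store = DeferredStore(workers=2)
    store.start()
    for text in ("alpha", "beta", "gamma"):
        store.remember(text)
    before = store.pending()   # nothing embedded yet
    await store.drain()
    return before, store.pending()
```

Running `asyncio.run(demo())` reports the pending count before and after the drain, which is the same kind of figure that `mnemo stats` is described as exposing for un-embedded memories.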
Changed
- `llmkit`: a new shared inference package with ONNX (embedder / reranker) and llama.cpp (generator) runtimes behind an on-demand residency lifecycle; the embedder, reranker, and generator all run through it.
- Store: SQLite + `sqlite-vec` + FTS5 is the sole backend (the in-memory store is a test double), refactored onto SQL executors; the typed-links table folded into a `supersedes` column; the reranker is off by default.
- Architecture: the repository port segregated (ISP), repositories made stateless and no longer mutate domain entities, interface/implementation naming normalized.
- Service lifecycle: `stop_service`; the service restarts after a `reindex`; the connector waits out a slow start instead of double-spawning, and the proxy reconnects if the service restarts mid-session.
- `remember` returns a status (created / duplicate / superseded); the legacy dedup accounting was removed.
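As a rough illustration of the three `remember` statuses, the sketch below uses Python's standard `sqlite3` module. The schema, the `active` flag, and the function signature are assumptions made for the example and do not reflect mnemo's real tables or API. It also assumes that "superseded" means the new memory replaced an older active one sharing the same `topic_key`, and that exact-content dedup only considers active rows in the same project.

```python
import sqlite3

# Hypothetical schema, for illustration only.
SCHEMA = """
CREATE TABLE memories (
    id         INTEGER PRIMARY KEY,
    project    TEXT NOT NULL,
    content    TEXT NOT NULL,
    topic_key  TEXT,
    supersedes INTEGER REFERENCES memories(id),
    active     INTEGER NOT NULL DEFAULT 1
);
"""


def remember(db: sqlite3.Connection, project: str, content: str,
             topic_key=None) -> str:
    """Store a memory and report 'created', 'duplicate', or 'superseded'."""
    with db:  # one transaction: commit on success, roll back on error
        dup = db.execute(
            "SELECT id FROM memories "
            "WHERE project = ? AND content = ? AND active = 1",
            (project, content),
        ).fetchone()
        if dup:
            return "duplicate"

        prev = None
        if topic_key is not None:
            prev = db.execute(
                "SELECT id FROM memories "
                "WHERE project = ? AND topic_key = ? AND active = 1",
                (project, topic_key),
            ).fetchone()
        if prev:
            # Retire the old head; the new row points back at it.
            db.execute("UPDATE memories SET active = 0 WHERE id = ?",
                       (prev[0],))

        db.execute(
            "INSERT INTO memories (project, content, topic_key, supersedes) "
            "VALUES (?, ?, ?, ?)",
            (project, content, topic_key, prev[0] if prev else None),
        )
        return "superseded" if prev else "created"
```

Keeping the lookup, the retirement of the old row, and the insert inside one transaction is one simple way to get the all-or-nothing behaviour that an atomic `topic_key` supersede implies.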
Fixed
- Every numeric `MNEMO_*` is validated at startup (no opaque later crash).
- Background workers (embed, idle monitor, drain) stay alive and loud on a store fault instead of dying silently; idle-exit can't hang on a fault.
- Deletion heals the supersede chain (promote head, splice interior) across batch / root / whole-chain cases.
- Re-keyed exact duplicates and half-specified retrievals now error explicitly; exact-content dedup is scoped to active rows within a project; `topic_key` supersede is atomic.
- Large multi-id deletes are chunked under SQLite's parameter cap; the reindex rebuild is atomic; a never-binding service spawn is killed (no double-spawn).
- Project-scoped search requires a project; a project filter on an all / global search is rejected; `created_after` is normalized to UTC.
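The chunked-delete fix can be sketched as follows, again with the standard `sqlite3` module. The `memories` table, the `delete_many` helper, and the chunk size of 500 are assumptions for illustration. SQLite limits the number of bound parameters per statement (999 by default before version 3.32.0, 32766 from 3.32.0 on), so a long `IN (...)` list has to be split into batches.

```python
import sqlite3

# Assumed batch size; chosen to stay under the older default cap of 999.
CHUNK = 500


def chunks(ids: list, size: int):
    """Yield consecutive slices of at most `size` ids."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def delete_many(db: sqlite3.Connection, ids, size: int = CHUNK) -> int:
    """Delete rows by id in batches; returns the number of rows removed."""
    deleted = 0
    with db:  # all batches commit together or not at all
        for batch in chunks(list(ids), size):
            marks = ",".join("?" * len(batch))
            cur = db.execute(
                f"DELETE FROM memories WHERE id IN ({marks})", batch)
            deleted += cur.rowcount
    return deleted
```

Wrapping every batch in a single transaction keeps a large delete all-or-nothing even though it is issued as several statements.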
Full Changelog: 0.2.0...0.3.0