Releases: arttttt/mnemo

0.3.0 — recall

22 Jun 00:39
752e273

Memory can now answer a question, not just return hits. Plus projects, a query-less browse, deferred embedding, and the pplx embedder.

Added

  • recall: an opt-in LLM read tool. Gemma 4 E2B-it (official QAT GGUF) synthesizes a concise, grounded answer to a query from a project's memories, replying "No relevant memories found." when none apply (it never draws on outside knowledge). Available via MCP and the CLI (mnemo recall <project> "<query>"); the write path stays LLM-free, and the generator is transient (loaded on demand, unloaded after use).
  • browse — query-less retrieval: list memories by filter (type / tags / scope / created_after), newest first, no relevance ranking.
  • Projects as first-class entities: create_project / update_project / list_projects / delete_project (deleting a project cascades to its memories via ON DELETE CASCADE); writes and reads are gated on a registered project (an unknown project errors, with near-match suggestions).
  • Deferred embedding — writes return immediately while an async worker pool embeds off the hot path (MNEMO_EMBED_WORKERS sizes a bounded instance pool for parallel encodes); pending (vector-less) memories are supported and reconciled.
  • pplx-embed-v1-0.6b (int8 ONNX) as the default embedder, with mnemo reindex to re-embed on an embedder switch.
  • Per-memory content cap (MNEMO_MAX_MEMORY_TOKENS, default 512); over-window or empty content is rejected with an actionable error (never truncated).
  • A composable-stage pipeline framework underpinning recall (and groundwork for background consolidation); mnemo stats now reports pending (un-embedded) memories.
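
The deferred-embedding item above can be pictured with a small sketch. This is not mnemo's code: DeferredStore, fake_embed, and the fallback of 2 workers are hypothetical names and values chosen for illustration. Only MNEMO_EMBED_WORKERS and the notion of pending (vector-less) memories come from the notes.

```python
import os
import queue
import threading


def fake_embed(text):
    """Hypothetical stand-in for the real embedder (not mnemo's model call)."""
    return [float(len(text)), float(sum(map(ord, text)) % 97)]


class DeferredStore:
    """Toy store: a write enqueues work and returns; a bounded pool embeds later."""

    def __init__(self, workers):
        self.vectors = {}   # memory id -> vector, or None while pending
        self._texts = {}
        self._lock = threading.Lock()
        self._queue = queue.Queue()
        self._threads = [
            threading.Thread(target=self._run, daemon=True) for _ in range(workers)
        ]
        for thread in self._threads:
            thread.start()

    def remember(self, text):
        """Store the text as pending and return its id without waiting for a vector."""
        with self._lock:
            mem_id = len(self._texts) + 1
            self._texts[mem_id] = text
            self.vectors[mem_id] = None
        self._queue.put(mem_id)
        return mem_id

    def pending(self):
        """Count memories that do not have a vector yet."""
        with self._lock:
            return sum(1 for vec in self.vectors.values() if vec is None)

    def _run(self):
        while True:
            mem_id = self._queue.get()
            try:
                if mem_id is None:      # shutdown sentinel
                    return
                vec = fake_embed(self._texts[mem_id])
                with self._lock:
                    self.vectors[mem_id] = vec
            finally:
                self._queue.task_done()

    def drain(self):
        """Block until every queued memory has been embedded."""
        self._queue.join()

    def close(self):
        for _ in self._threads:
            self._queue.put(None)
        for thread in self._threads:
            thread.join()


if __name__ == "__main__":
    # The default of 2 is an assumption; the notes do not state mnemo's default.
    workers = int(os.environ.get("MNEMO_EMBED_WORKERS", "2"))
    store = DeferredStore(workers)
    store.remember("first note")
    store.drain()
    print("pending after drain:", store.pending())
    store.close()
```

The point of the sketch is the split: remember never calls the embedder, so its latency does not depend on model speed, and pending() reports the backlog that a stats command could surface.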

Changed

  • llmkit — a new shared inference package: ONNX (embedder / reranker) and llama.cpp (generator) runtimes behind an on-demand residency lifecycle; the embedder, reranker, and generator all run through it.
  • Store — SQLite + sqlite-vec + FTS5 is the sole backend (the in-memory store is a test double), refactored onto SQL executors; the typed-links table folded into a supersedes column; the reranker is off by default.
  • Architecture — the repository port segregated (ISP), repositories made stateless and no longer mutate domain entities, interface/implementation naming normalized.
  • Service lifecycle: stop_service; the service restarts after a reindex; the connector waits out a slow start instead of double-spawning, and the proxy reconnects if the service restarts mid-session.
  • remember returns a status (created / duplicate / superseded); the legacy dedup accounting was removed.
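
As a rough illustration of the on-demand residency lifecycle mentioned for llmkit, the sketch below loads a model on first use and unloads it when the last user releases it. OnDemandResident and its load / unload callables are hypothetical; llmkit's real interface is not described in these notes, and it may unload on a different trigger (for example an idle timer).

```python
import threading
from contextlib import contextmanager


class OnDemandResident:
    """Keeps a model in memory only while at least one caller is using it."""

    def __init__(self, load, unload):
        self._load = load        # callable returning the loaded model (must not be None)
        self._unload = unload    # callable that frees the model
        self._lock = threading.Lock()
        self._model = None
        self._users = 0

    @property
    def resident(self):
        with self._lock:
            return self._model is not None

    @contextmanager
    def use(self):
        # Load lazily on first entry and count concurrent users.
        with self._lock:
            if self._model is None:
                self._model = self._load()
            self._users += 1
            model = self._model
        try:
            yield model
        finally:
            # Unload as soon as the last user leaves (an assumption of this sketch).
            with self._lock:
                self._users -= 1
                if self._users == 0:
                    self._unload(model)
                    self._model = None
```

Used as `with generator.use() as model: ...`, the model occupies memory only for the duration of the block, which matches the "loaded on demand, unloaded after" behaviour described for the recall generator.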

Fixed

  • Every numeric MNEMO_* is validated at startup (no opaque later crash).
  • Background workers (embed, idle monitor, drain) stay alive and loud on a store fault instead of dying silently; idle-exit can't hang on a fault.
  • Deletion heals the supersede chain (promote head, splice interior) across batch / root / whole-chain cases.
  • Re-keyed exact duplicates and half-specified retrievals now error explicitly; exact-content dedup is scoped to active rows within a project; topic_key supersede is atomic.
  • Large multi-id deletes are chunked under SQLite's parameter cap; the reindex rebuild is atomic; a never-binding service spawn is killed (no double-spawn).
  • Project-scoped search requires a project; a project filter on an all / global search is rejected; created_after is normalized to UTC.
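
The chunked-delete fix can be sketched with Python's sqlite3 module. The memories table, the delete_many helper, and the chunk size of 900 are assumptions made for illustration; the notes do not show mnemo's schema or its chosen chunk size. The cap itself is SQLite's SQLITE_MAX_VARIABLE_NUMBER, which defaulted to 999 before SQLite 3.32.0 and 32766 from that version on.

```python
import sqlite3

# Assumed chunk size, kept below the older default cap of 999 bound parameters.
CHUNK = 900


def delete_many(conn, ids, chunk=CHUNK):
    """Delete rows by id in batches so no statement exceeds the parameter cap."""
    deleted = 0
    with conn:  # one transaction: commit if every batch succeeds, else roll back
        for start in range(0, len(ids), chunk):
            batch = ids[start:start + chunk]
            placeholders = ",".join("?" * len(batch))
            cur = conn.execute(
                f"DELETE FROM memories WHERE id IN ({placeholders})", batch
            )
            deleted += cur.rowcount
    return deleted
```

Wrapping all batches in one transaction keeps the multi-id delete all-or-nothing even though it is issued as several statements.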

Full Changelog: 0.2.0...0.3.0

0.2.0 — on-demand lifecycle

14 Jun 21:24
b88e084

Agents now share one on-demand service instead of running a server each.

Added

  • mnemo setup — one-command client wiring for Claude Code, Codex, Kimi CLI,
    Cursor, Windsurf, and opencode. Run it with no args to detect installed
    clients and pick from a list, or mnemo setup <client> / --all / --dry-run.
  • Idle-exit — the service shuts down after a grace period once no connector is
    alive (kernel-backed lock liveness; survives crashes/SIGKILL, immune to PID reuse).
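
One common way to get kernel-backed liveness of the kind the idle-exit item describes is an advisory file lock. The sketch below uses flock on Unix and is an assumption about the technique, not a description of mnemo's implementation; hold_liveness, any_connector_alive, and the lock file name are hypothetical. Each connector holds a shared lock for its lifetime, and the service probes with a non-blocking exclusive lock. The kernel drops a lock when its holder exits, including on a crash or SIGKILL, and no PID is ever consulted.

```python
import fcntl
import os
import tempfile


def hold_liveness(path):
    """Connector side: take a shared lock and keep the descriptor open while alive."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    fcntl.flock(fd, fcntl.LOCK_SH)
    return fd


def any_connector_alive(path):
    """Service side: an exclusive lock is only obtainable when nobody holds a shared one."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        return True
    else:
        fcntl.flock(fd, fcntl.LOCK_UN)
        return False
    finally:
        os.close(fd)


if __name__ == "__main__":
    lock_path = os.path.join(tempfile.mkdtemp(), "liveness.lock")
    connector_fd = hold_liveness(lock_path)
    print("connector alive:", any_connector_alive(lock_path))
    os.close(connector_fd)  # closing the descriptor releases the lock, as process exit would
    print("connector alive:", any_connector_alive(lock_path))
```

A service built this way would start its grace-period timer once any_connector_alive returns False and cancel it if a connector reappears.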

Changed

  • Runtime is now one shared mnemo-service + a thin per-agent mnemo-mcp
    connector that starts it on demand. The embedder and store load once for
    all agents — footprint S + c·N (~170 MB service + ~40 MB/connector), not S·N.
  • Store is thread-safe: one serialized writer + per-thread WAL readers
    (concurrent reads, zero lost writes).
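
The footprint comparison above is simple arithmetic, worked here with the approximate figures from the notes (S ≈ 170 MB, c ≈ 40 MB). The function names are illustrative only.

```python
def shared_footprint_mb(n_agents, service_mb=170, connector_mb=40):
    """One shared service plus a thin connector per agent: S + c·N."""
    return service_mb + connector_mb * n_agents


def per_agent_footprint_mb(n_agents, server_mb=170):
    """A full server per agent, as before 0.2.0: S·N."""
    return server_mb * n_agents


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        print(n, shared_footprint_mb(n), per_agent_footprint_mb(n))
```

With these figures a single agent costs slightly more under the shared design (210 MB against 170 MB), and the shared design wins from two agents onward (250 MB against 340 MB at N = 2, 330 MB against 680 MB at N = 4).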

Full Changelog: 0.1.0...0.2.0