Skip to content

Memory and Knowledge

ankurCES edited this page Jun 6, 2026 · 10 revisions

Memory & Knowledge

blumi is built for long-running and distributed agentic work, which needs three things most assistants lack: memory that survives a crash, memory that learns and travels across machines, and a real understanding of your codebase. blumi ships all of it local-first — a bundled embedding model means no API key and nothing leaves your machines.

This page covers durable execution, semantic long-term memory, SEDM cross-peer diffusion, and the native code knowledge base — plus how to configure, drive, and test each.


At a glance

Capability What it does Drive it with
Durable execution Checkpoints each tool step; resumes mid-turn after a crash/restart automatic
Semantic memory (RAG) Recalls relevant memories each turn (vectors, FTS5 fallback) automatic + memory tool
SEDM governance Dedup on write, utility scoring, consolidation/eviction automatic
Cross-peer diffusion Spreads worth-sharing memories across the grid automatic (grid + diffuse)
Code knowledge base Index a repo; hybrid vector/FTS5 code search blumi knowledge, code_search/code_retrieve, blugo Code tab

Everything is on by default. The first memory/knowledge operation downloads the embedding model (~130 MB, bge-small-en-v1.5) once into ~/.blumi/models; after that it runs fully offline. If a node can't reach the network, memory/knowledge transparently fall back to keyword (FTS5) search.


Durable execution

Every tool step within a turn is checkpointed to SQLite (checkpoints table in blumi.db). If the process crashes or the gateway is restarted mid-turn, the next run resumes from the last completed step instead of replaying the whole turn — the backbone for long autonomous runs (blumi loop, grid-dispatched work). Resume is at-least-once: idempotent tools + the doom-loop guard keep it safe.

Nothing to configure. To see it: start a multi-step task, restart the gateway (launchctl kickstart -k gui/$(id -u)/com.blumi.serve) mid-run, reconnect — it picks up where it left off.


Semantic long-term memory (RAG)

A vector store (memories + memory_vectors + memories_fts in blumi.db) lets blumi recall knowledge across sessions. Each turn, blumi embeds your latest message, finds the most relevant memories, and injects them as background context — appended as a trailing user message so the cached system prompt stays byte-identical (prompt caching keeps working). With embeddings off, it degrades to FTS5 keyword recall.

Namespaces

  • user — durable facts/preferences about you. Private — never diffuses off the node.
  • agent — the agent's own notes/conventions (the default for "remember this…"). Diffusable.
  • project:<hash> — project-scoped knowledge. Diffusable.

Using it from chat — the memory tool:

  • add — remember a fact (mirrored to MEMORY.md/USER.md and embedded, dedup-gated).
  • query — semantic search over everything remembered.
  • read / replace / remove — manage the file-backed entries.

Tip: phrasing decides the namespace. "Remember about me: I prefer dark mode" → user (private). "Remember this project convention: …" → agent/project (shareable across the grid).


SEDM — self-evolving, distributed memory

Memory that only grows becomes noise. blumi applies the governance from the SEDM paper:

  • Write-admission (dedup). Before inserting, blumi checks the closest existing memory; if it's a near-duplicate (cosine ≥ dedup_threshold, default 0.92) it merges and bumps utility instead of piling up.
  • Utility scoring. Every time a memory is recalled or merged, its hits/utility rise — so what actually helps surfaces more.
  • Consolidation & eviction. A periodic sweep folds near-duplicate clusters into the highest-utility member and soft-evicts the weakest once a namespace exceeds max_per_namespace.
  • Cross-peer diffusion. The sweep pushes high-value, non-user memories to live grid peers (POST /api/grid/memory, authenticated by the shared grid secret). Each receiver re-admits the memory through its own dedup gate and tags it with the origin node, so memories never loop (A→B→A). What one machine learns, the others pick up. Your user namespace never leaves the node.

Diffusion needs the grid enabled on the participating nodes and memory.diffuse = true (default). The sweep runs every memory.sweep_secs (default 60s), so a new shareable memory appears on peers within ~a minute.

See diffusion happen (deterministic check on each machine):

sqlite3 ~/.blumi/blumi.db \
  "SELECT namespace, origin, substr(text,1,60) FROM memories WHERE status='active';"

On the node that learned it, the row has origin = ''; on a peer that received it, origin is the sender's node name.


Native code knowledge base

Index a repo into a sibling ~/.blumi/knowledge.db and search it by meaning or keyword (hybrid: FTS5 first for exact/symbol precision, vector fill for semantic recall). Indexing is gitignore-aware and diff-aware — re-running only touches changed files (by SHA). Symbol extraction is lightweight (per-language, no external grammars), and asset/binary files (svg, images, lockfiles…) are skipped.

CLI — blumi knowledge

blumi knowledge ingest ~/code/my-repo     # index a repo/dir (incremental)
blumi knowledge search "where is auth handled"
blumi knowledge status                     # files · symbols · vectors
blumi knowledge list                       # indexed sources
blumi knowledge remove <source>            # drop an indexed source

From chat — agent tools

  • code_search — find relevant functions/types/code by meaning or keywords; returns file:line + snippet. Use it to locate where something is implemented before reading/editing.
  • code_retrieve — fetch a file's indexed symbols (with snippets), or one symbol by name.

From the phone — blugo Code tab

Control center → Code: index a repo (path on the gateway machine, with live progress), list/remove sources, and search code with file:line + snippet result cards.

The index is local to each node (the orchestrator's knowledge.db). Re-ingest is cheap, so keep it fresh as the code changes.


Configuration

In ~/.blumi/settings.json (every field shown is the default — omit to keep it):

{
  "embeddings": {
    "enabled": true,
    "backend": "local",
    "model": "bge-small-en-v1.5",
    "dim": 384
  },
  "memory": {
    "enabled": true,
    "recall_k": 5,
    "dedup_threshold": 0.92,
    "max_per_namespace": 2000,
    "diffuse": true,
    "sweep_secs": 60
  },
  "knowledge": {
    "enabled": true,
    "max_file_kb": 256,
    "exclude": []
  }
}
  • embeddings.backendlocal (bundled ONNX model, fully offline), openai (a configured OpenAI-compatible /embeddings endpoint, e.g. Ollama/llama.cpp/cloud), or grid (offload to a peer).
  • Lean / constrained install: set embeddings.enabled = false to skip the model entirely — memory and code search then run on FTS5 keyword matching, no download.

The bundled-embeddings binary is the default; a --no-default-features build of blumi-llm omits the ONNX runtime for a leaner artifact.


Troubleshooting

  • First memory/code op is slow, then fast. That's the one-time ~130 MB model download (~/.blumi/models). It runs in the background; recall falls back to keyword until it's ready, and a later sweep backfills any vectors. Confirm with du -sh ~/.blumi/models (~130 MB when complete).
  • Search returns irrelevant results. Make sure the repo is indexed (blumi knowledge status). Symbol/keyword queries are answered FTS-first, so exact names land precisely; very broad queries lean on vectors.
  • Diffusion didn't reach a peer. Both nodes must run a build with /api/grid/memory, share the grid secret, and have the grid up; the memory must be in a non-user namespace; allow one sweep (~sweep_secs). The user namespace never diffuses by design.

See also: Grid (distributed) · Configuration · Troubleshooting.

Clone this wiki locally