v0.9.2 — Local-only embeddings; BGE-large default (paired with agentic-harness v2.4.1)

@alexherrero released this 20 May 23:51

Patch — embedding-mode collapse + default model upgrade. Drops the Voyage/Anthropic API embedding mode entirely; local sentence-transformers is now the only production mode. Default model upgraded from all-MiniLM-L6-v2 (384-d, MTEB English 56.3) to BAAI/bge-large-en-v1.5 (1024-d, MTEB English 64.2). EMBEDDING_DIM bumped 384 → 1024. Triggered by ROADMAP item #18 (inserted mid-flight of plan #7a part 5 / seed-pass on 2026-05-20). Implemented as plan #18 (7 tasks across 8 toolkit commits). Paired with agentic-harness v2.4.1 (doc-only on the harness side).

Why this shape: the primary operator is a Claude Ultra subscriber without a separate Anthropic / Voyage API key — the API path was unreachable for the toolkit's actual user. Dual-mode added surface area (mode resolution, env-var contract, dim-truncation, two test paths) without value for the personal-dev-env use case. Modern small-to-mid local models (BGE-large family, mxbai, nomic-embed) deliver near-SOTA MTEB results on desktop-class hardware (M-series + 64GB RAM) — the quality gap that motivated dual-mode is no longer load-bearing. Plan #18 was inserted mid-flight of plan #7a part 5 (seed-pass) because task 6 (validate via sample recalls) needs a worthwhile embedding model for validation signal to be meaningful; seed-pass resumes at task 6 with the new model after this release pair ships.

The decision rationale and 4 load-bearing assumptions, each with re-audit triggers, are recorded in ADR 0001's 2026-05-20 amendment (operator decision: amend rather than write a new ADR 0007). The parent MemoryVault design doc body was rewritten in-place across 12 substantive references to match the v0.9.2 state; Document History row 10 captures the rewrite scope.

Added

  • AGENT_TOOLKIT_EMBEDDING_MODEL env var escape hatch in skills/memory/scripts/embed.py — operators on low-spec hosts swap the BGE-large default for a smaller local model (e.g. all-MiniLM-L6-v2) without code changes. Still local-only — no API option ever.
  • rebuild subcommand in skills/memory/scripts/vec_index.py — drops entries virtual table + entry_meta table + recreates at current EMBEDDING_DIM. Preserves the embedding queue file. Returns stats dict (old_dim, new_dim, entries_dropped, queue_preserved) or {skipped: true, ...} for graceful-skip. Exit 0 on success / exit 2 on graceful-skip (matches size pattern).
  • Dim-mismatch detection in vec_index.py's _open_index() — introspects existing virtual-table schema via sqlite_master + the new _DIM_REGEX; on mismatch prints [vec_index] dim mismatch ... rebuild required: python3 vec_index.py rebuild --vault-path <path> to stderr + closes conn + returns None (graceful-skip; never blocks the prompt). Same path fires from drain_queue so dim-mismatch surfaces there too.
  • requirements.txt at repo root with the canonical Python dep list: pyyaml>=6.0, sqlite-vec>=0.1.0, sentence-transformers>=2.0. Comments document manual install + PEP 668 escape (--break-system-packages) + virtualenv pattern.
  • --no-python-deps / -NoPythonDeps flag in install.sh + install.ps1: escape hatch for operators who manage Python deps via virtualenv / conda / system packages, or for CI to avoid the ~1.3GB sentence-transformers download per workflow run.
  • install_python_deps() function in install.sh (+ Install-PythonDeps in install.ps1) — best-effort pip-install of requirements.txt after the customization install loop. Idempotent quick-path checks importability before attempting install. Non-fatal failure with operator-facing hint for PEP 668 systems.
  • Local-mode integration test in scripts/smoke-install-{bash.sh,pwsh.ps1} guarded by SKIP_LOCAL_MODE_INTEGRATION env var (set by all 3 OS CI workflows to skip the BGE-large download). Operators with sentence-transformers installed can run the test locally: it invokes embed.py --mode local, asserts the output is a 1024-d JSON list, and asserts every value is numeric.
  • ADR 0001's 2026-05-20 amendment (43 new lines) — the v0.9.2 amendment block following the existing 2026-05-17 amendment shape. WHY narrowing + WHY NOT 4 alternatives + 5 operational changes + 4 load-bearing assumptions with re-audit triggers.
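
The dim-mismatch detection and rebuild entries above can be illustrated with a minimal sketch. It is not the toolkit's code: the function names, the regex, and the stderr wording are hypothetical, and plain SQLite tables stand in for the sqlite-vec virtual table so the example runs with the standard library alone. It only shows the general idea of reading the stored dimension from sqlite_master and recreating the tables at the current EMBEDDING_DIM.

```python
import re
import sqlite3
import sys

EMBEDDING_DIM = 1024  # value stated in the release notes

# Hypothetical pattern; the real _DIM_REGEX in vec_index.py may differ.
DIM_REGEX = re.compile(r"float\[(\d+)\]", re.IGNORECASE)


def stored_dim(conn):
    """Read the dimension out of the CREATE statement saved in sqlite_master."""
    row = conn.execute(
        "SELECT sql FROM sqlite_master WHERE name = 'entries'"
    ).fetchone()
    if row is None or row[0] is None:
        return None
    match = DIM_REGEX.search(row[0])
    return int(match.group(1)) if match else None


def check_index(conn):
    """Return conn if usable; on mismatch warn on stderr, close, return None."""
    dim = stored_dim(conn)
    if dim is not None and dim != EMBEDDING_DIM:
        # Illustrative wording only, not the toolkit's exact message.
        print(
            f"[vec_index] dim mismatch (index={dim}, expected={EMBEDDING_DIM}); "
            "rebuild required",
            file=sys.stderr,
        )
        conn.close()
        return None
    return conn


def rebuild(conn):
    """Drop and recreate both tables at EMBEDDING_DIM; return a stats dict."""
    old_dim = stored_dim(conn)
    dropped = 0
    if old_dim is not None:
        dropped = conn.execute("SELECT COUNT(*) FROM entries").fetchone()[0]
    conn.execute("DROP TABLE IF EXISTS entries")
    conn.execute("DROP TABLE IF EXISTS entry_meta")
    # Assumption: plain tables stand in for the sqlite-vec virtual table.
    conn.execute(f"CREATE TABLE entries (embedding float[{EMBEDDING_DIM}])")
    conn.execute("CREATE TABLE entry_meta (id INTEGER PRIMARY KEY, path TEXT)")
    conn.commit()
    return {"old_dim": old_dim, "new_dim": EMBEDDING_DIM, "entries_dropped": dropped}
```

Returning None instead of raising mirrors the graceful-skip behaviour described above: a caller can treat a missing index as "no vector recall available" and carry on.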

Changed

  • skills/memory/scripts/embed.py rewrite (109 ins / 115 del; 216 lines net). Removed: _embed_api() function + _VOYAGE_ENDPOINT / _VOYAGE_MODEL constants + dim-truncation logic + _resolve_mode()'s API branch + MEMORY_USE_API_EMBEDDINGS / VOYAGE_API_KEY / ANTHROPIC_API_KEY env var reads. Added: _DEFAULT_LOCAL_MODEL = "BAAI/bge-large-en-v1.5", EMBEDDING_DIM = 1024, _resolve_model() with AGENT_TOOLKIT_EMBEDDING_MODEL env var override, informative ValueError for "api" invocations pointing at v0.9.2 + ADR amendment. CLI --mode choices reduced from ["api","local","stub"] to ["local","stub"].
  • skills/memory/scripts/vec_index.py (209 ins / 18 del). Schema dim 384 → 1024. _open_index() extended with dim-mismatch detection. drain_queue() switched to use _open_index() as gating probe so dim-mismatch surfaces from drain. New rebuild_index() function + rebuild CLI subcommand.
  • skills/memory/scripts/recall.py + vec_index.py CLI --mode choices reduced from ["api","local","stub"] to ["local","stub"] (alignment with embed.py).
  • install.sh + install.ps1 install Python deps from requirements.txt by default (was: not installed at all; operators followed wiki docs to install manually). Same default-on-with-opt-out pattern as --no-pre-push-hook.
  • MemoryVault design doc body rewritten in-place across 12 substantive references (overview, infrastructure, recall engine, dependencies, tech debt #2 + #9, security network surface, reliability, privacy opt-out, latency budgets, project management § DD #7, operations monitoring). Each rewritten section cross-links to ADR 0001's amendment. Document History row 10 captures the rewrite scope. Old dual-mode narrative preserved only in the pre-existing 2026-05-15 / 2026-05-16 / 2026-05-17 Document History rows as historical record.
  • wiki/how-to/Use-The-Memory-Skill.md updates — Prereqs callout updated; § Embedding mode (was "Embedding modes" plural) rewritten with BGE-large + model swap escape hatch; troubleshooting entries for "embedding skipped" + "embedding unavailable" updated for local-only state; offline-capable recall paragraph updated.
  • Design doc parts files (write-primitives.md, recall-loop.md) updated to match v0.9.2 state — references to memory.use_api_embeddings flag + Anthropic API replaced with single-mode local sentence-transformers narrative + ADR amendment cross-refs.
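
As a rough illustration of the embed.py changes listed above, the sketch below shows one way an env-var model override and a rejected "api" mode could look. The function names, the optional environ parameter, and the error text are assumptions made for this example; only the default model name, the AGENT_TOOLKIT_EMBEDDING_MODEL variable, and the ["local","stub"] choices come from these notes.

```python
import os

# Names taken from the release notes; everything else here is hypothetical.
DEFAULT_LOCAL_MODEL = "BAAI/bge-large-en-v1.5"
MODEL_ENV_VAR = "AGENT_TOOLKIT_EMBEDDING_MODEL"
ALLOWED_MODES = ("local", "stub")


def resolve_model(environ=None):
    """Return the env-var override if set and non-empty, else the default."""
    env = os.environ if environ is None else environ
    override = env.get(MODEL_ENV_VAR, "").strip()
    return override or DEFAULT_LOCAL_MODEL


def resolve_mode(mode):
    """Accept only local or stub; reject the removed api mode with a pointer."""
    if mode == "api":
        # Illustrative message, not the toolkit's exact wording.
        raise ValueError(
            "api embedding mode was removed in v0.9.2; "
            "see ADR 0001's 2026-05-20 amendment"
        )
    if mode not in ALLOWED_MODES:
        raise ValueError(f"unknown mode {mode!r}; expected one of {ALLOWED_MODES}")
    return mode
```

A low-spec host would then export AGENT_TOOLKIT_EMBEDDING_MODEL=all-MiniLM-L6-v2 before invoking the script; passing the environment in as a parameter simply makes the override easy to exercise in tests.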

Internal

  • Smoke install bash + pwsh tests updated: all 5 install.sh + 5 install.ps1 invocations pass --no-python-deps / -NoPythonDeps so CI doesn't pay the ~1.3GB sentence-transformers download × 5 install scenarios × 3 OS per workflow run. Default-mode-resolution test changed from 'api' expectation to 'local'; new tests for v0.9.2 --mode api ValueError, EMBEDDING_DIM=1024, AGENT_TOOLKIT_EMBEDDING_MODEL escape hatch, stub-mode 1024-d output, rebuild subcommand happy + graceful-skip paths, _DIM_REGEX parse correctness.
  • 8 commits across plan #18: 222fea6 (embed.py refactor) + 6f0383b + ce5b110 (task-1 CI fixups) + 18941ae + fb83437 (task-2 vec_index.py + CI fixup) + 4a9c74a (task-3 local-mode integration test) + 6633943 (task-4 install scripts) + 1b956f2 (task-5 ADR amendment) + this v0.9.2 release commit.
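
The output-shape checks mentioned in the smoke tests (a 1024-d JSON list whose values are all numeric) could be sketched as follows. Both helpers are hypothetical: stub_embed is a deterministic stand-in written for this example and is not the toolkit's stub mode, and check_embedding_json only shows the kind of assertion involved.

```python
import hashlib
import json
import numbers

EMBEDDING_DIM = 1024  # value stated in the release notes


def stub_embed(text, dim=EMBEDDING_DIM):
    """Hypothetical deterministic stand-in: hash bytes scaled into [0, 1]."""
    values = []
    counter = 0
    while len(values) < dim:
        digest = hashlib.sha256(f"{counter}:{text}".encode("utf-8")).digest()
        values.extend(byte / 255.0 for byte in digest)
        counter += 1
    return values[:dim]


def check_embedding_json(payload, dim=EMBEDDING_DIM):
    """Parse JSON output and verify it is a dim-length list of numbers."""
    vec = json.loads(payload)
    if not isinstance(vec, list):
        raise ValueError("expected a JSON list")
    if len(vec) != dim:
        raise ValueError(f"expected {dim} values, got {len(vec)}")
    for value in vec:
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, numbers.Real):
            raise ValueError(f"non-numeric value: {value!r}")
    return vec
```

Because the stand-in is deterministic and needs no model download, a check like this can run in CI even when the local-mode integration test is skipped.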