
embeddings: pi spawn-on-miss bug + make openclaw an embedding producer #178

@efenocchi

Description


Follow-up to #168. Surfaced during review by @kaghni, who flagged that pi and openclaw received no or only minimal changes in the embed-daemon fix PR. Investigation found two related gaps that are worth fixing together.

Problem 1 — pi: documented-but-missing spawn-on-miss

`pi/extension-source/hivemind.ts:166-175` documents auto-spawn-on-miss:

If the socket isn't there yet, we spawn the canonical daemon at `~/.hivemind/embed-deps/embed-daemon.js` (deposited by `hivemind embeddings install`) and wait for it to listen, mirroring the auto-spawn-on-miss logic in `src/embeddings/client.ts`. Subsequent agents (codex, CC, cursor, hermes, ...) connect to the SAME daemon — pi pays the cold-start cost only when it's the first user on the box.
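A minimal sketch of the spawning side of that documented behavior. It assumes a pidfile sits next to the socket; the pidfile path and the helper names `isPidAlive`, `readPidState`, `tryClaimPidfile` and `spawnDaemon` are hypothetical and are not taken from `src/embeddings/client.ts`. The `wx` flag makes the claim atomic (the edge-case matrix resolves two racing agents this way), and liveness is checked with signal 0, so no other process is ever signalled.

```typescript
import { spawn } from "node:child_process";
import * as fs from "node:fs";

export type PidState = "absent" | "dead-pid" | "live-pid";

// Signal 0 performs the existence/permission check without delivering a signal.
export function isPidAlive(pid: number): boolean {
  if (!Number.isInteger(pid) || pid <= 0) return false; // 0 and negatives address process groups
  try {
    process.kill(pid, 0);
    return true;
  } catch (err) {
    // EPERM: the process exists but is owned by another user.
    return (err as NodeJS.ErrnoException).code === "EPERM";
  }
}

export function readPidState(pidPath: string): PidState {
  let raw: string;
  try {
    raw = fs.readFileSync(pidPath, "utf8");
  } catch {
    return "absent";
  }
  // Unparseable contents are treated like a dead PID (stale pidfile).
  return isPidAlive(Number.parseInt(raw.trim(), 10)) ? "live-pid" : "dead-pid";
}

// Atomically claim the right to spawn. "wx" fails with EEXIST when another
// agent created the pidfile first; that agent is the race winner.
export function tryClaimPidfile(pidPath: string): boolean {
  try {
    fs.writeFileSync(pidPath, String(process.pid), { flag: "wx" });
    return true;
  } catch (err) {
    if ((err as NodeJS.ErrnoException).code === "EEXIST") return false;
    throw err;
  }
}

// Start the daemon detached so it outlives this agent. Returns the child PID,
// or null when no process was created. Asynchronous spawn errors are swallowed;
// the caller's wait-for-socket then times out and writes NULL.
export function spawnDaemon(daemonPath: string, pidPath: string): number | null {
  const child = spawn(process.execPath, [daemonPath], { detached: true, stdio: "ignore" });
  child.once("error", () => {});
  child.unref();
  if (child.pid === undefined) return null;
  fs.writeFileSync(pidPath, String(child.pid)); // hand the pidfile over to the daemon's PID
  return child.pid;
}
```

In this sketch the claimer writes its own PID first, so while it is starting the daemon other agents see a live PID and wait for the socket instead of spawning a second copy. A caller that finds `dead-pid` would unlink the stale pidfile (and socket) before calling `tryClaimPidfile`.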

But the implementation at lines 183-210 (`tryEmbedOverSocket`) only calls `connect()` and settles `null` on any error; no spawn ever happens. As a result, if pi is the first agent to talk to the daemon on a given box (e.g. after a reboot), it silently writes NULL into `message_embedding` until some other agent spawns the daemon. This is the same regression vector #168 was opened to close, on a different code path.
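The fix hinges on not collapsing every `connect()` failure into `null`. A hedged sketch (the names `probeSocket` and `waitForSocket` are hypothetical): on a Unix socket, `ENOENT` means nothing was ever listening at that path, while `ECONNREFUSED` means the socket file exists but its daemon is gone. Polling the probe gives the bounded wait-for-socket step; the 5s default mirrors the timeout in the edge-case matrix.

```typescript
import * as net from "node:net";

export type ProbeResult = "alive" | "no-socket" | "stale-socket" | "error";

// Try one connection and classify the outcome.
export function probeSocket(socketPath: string, timeoutMs = 500): Promise<ProbeResult> {
  return new Promise((resolve) => {
    const sock = net.createConnection(socketPath);
    const timer = setTimeout(() => finish("error"), timeoutMs);
    const finish = (result: ProbeResult): void => {
      clearTimeout(timer);
      sock.destroy();
      resolve(result); // later calls are no-ops
    };
    sock.once("connect", () => finish("alive"));
    sock.once("error", (err: Error) => {
      const code = (err as NodeJS.ErrnoException).code;
      if (code === "ENOENT") finish("no-socket"); // daemon never started
      else if (code === "ECONNREFUSED") finish("stale-socket"); // file left by a dead daemon
      else finish("error");
    });
  });
}

// Poll until the daemon listens or the deadline passes.
export async function waitForSocket(
  socketPath: string,
  timeoutMs = 5000,
  intervalMs = 100,
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    if ((await probeSocket(socketPath)) === "alive") return true;
    await new Promise((r) => setTimeout(r, intervalMs));
  }
  return false; // startup crash or slow start: caller writes NULL
}
```

In this sketch `no-socket` leads to a spawn, `stale-socket` to cleanup followed by a spawn, and `error` straight to NULL.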

Problem 2 — openclaw doesn't produce embeddings

OpenClaw is MCP-mode (`hivemind_*` tool contracts) and currently writes NULL on every vector column:

  • `upsertRowSql` (memory): `summary_embedding = NULL` on UPDATE, `NULL` literal on INSERT
  • session capture (`dist/index.js:1884`): `message_embedding` omitted from the column list → defaults NULL
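For openclaw, producing an embedding mostly means replacing the hard-coded `NULL` with a literal when a vector is available. A hedged sketch, assuming the SQL is assembled as a string (as the `NULL` literal in `upsertRowSql` suggests) and that the vector columns accept a Postgres-style `ARRAY[...]` float literal; the real column type and escaping should be taken from the existing writers in `src/`. `embeddingSqlLiteral` is a hypothetical name.

```typescript
// Render an embedding for inlining into SQL, falling back to NULL so the
// consumer-only path (no daemon, timeout, old daemon) behaves exactly as today.
export function embeddingSqlLiteral(vec: readonly number[] | null | undefined): string {
  if (!vec || vec.length === 0) return "NULL";
  // NaN and Infinity are not valid SQL numeric literals.
  if (!vec.every((x) => Number.isFinite(x))) return "NULL";
  return `ARRAY[${vec.join(",")}]`;
}
```

Used as `summary_embedding = ${embeddingSqlLiteral(vec)}` on UPDATE, and with `message_embedding` added to the sessions INSERT column list, a missing vector still degrades to NULL.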

It already imports from `../../src/`, already uses `createRequire(import.meta.url)` to bypass the esbuild stub on `node:child_process` (lines 79-80), and already spawns long-lived workers (`spawnOpenclawSkillifyWorker` at line 406). So the capability to spawn the daemon and embed is already there; we just don't use it.

OpenClaw must remain consumer-only if the user hasn't run `hivemind embeddings install`, respecting the opt-in invariant set in `src/user-config.ts` by #168 (no 600MB transformers download as a side effect). It should become a producer only when `~/.hivemind/embed-deps/embed-daemon.js` is already present.
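The opt-in gate reduces to a file-existence check on the canonical daemon path. A minimal sketch; `canonicalDaemonPath` and `isProducerEnabled` are hypothetical names, and `src/user-config.ts` may already expose an equivalent check that should be preferred. The `home` parameter exists only so the check can be tested against a temporary directory.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Daemon deposited by `hivemind embeddings install`.
export function canonicalDaemonPath(home: string = os.homedir()): string {
  return path.join(home, ".hivemind", "embed-deps", "embed-daemon.js");
}

// Producer only when the user has already opted in; this never installs anything.
export function isProducerEnabled(home: string = os.homedir()): boolean {
  return fs.existsSync(canonicalDaemonPath(home));
}
```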

Plan

  1. Extract a shared spawn-on-miss state machine into `src/embeddings/standalone-embed-client.ts` (~80 LOC), used by openclaw (which imports from `src/`). Pi can't import from `src/` (raw .ts ship constraint), so it gets a parallel inline implementation; the test suite covers both via the shared helper.
  2. Fix pi: replace the broken `tryEmbedOverSocket` with the spawn-on-miss version. Bug fix.
  3. Wire openclaw as a producer: use `tryEmbedOverSocket` with spawn-on-miss in `upsertRowSql` and in the sessions INSERT.
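The shared state machine can be reduced to a pure decision over three cheap observations, which keeps the edge-case matrix easy to unit-test row by row. A sketch under assumptions: the observation shape and action names are hypothetical, and the race, spawn-failure, startup-crash, timeout and old-daemon rows are handled by the spawn and request steps rather than by this function.

```typescript
export interface DaemonObservation {
  binaryPresent: boolean; // ~/.hivemind/embed-deps/embed-daemon.js exists (user opted in)
  socket: "alive" | "absent" | "stale"; // result of one connect attempt
  pidfile: "absent" | "dead-pid" | "live-pid"; // liveness via kill(pid, 0)
}

export type Action = "write-null" | "connect" | "spawn" | "cleanup-and-spawn" | "wait-for-socket";

export function decide(o: DaemonObservation): Action {
  if (!o.binaryPresent) return "write-null"; // row 1: not opted in, never spawn
  if (o.socket === "alive") return "connect"; // row 3: happy path
  if (o.pidfile === "live-pid") return "wait-for-socket"; // row 6: respect the PID, never SIGTERM
  if (o.socket === "stale" || o.pidfile === "dead-pid") {
    return "cleanup-and-spawn"; // rows 4 and 5: remove stale files before claiming
  }
  return "spawn"; // row 2: clean slate
}
```

Checking `live-pid` before `stale` is a conservative choice in this sketch: a live PID next to a dead socket may be a daemon that is still starting, or a reused PID, and waiting then timing out to NULL is safer than signalling it.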

Three focused commits, matching the repo's "never >3 src files per commit across different layers" rule.

Edge case matrix (test coverage required)

| # | Scenario | Expected behavior |
|---|---|---|
| 1 | Binary `~/.hivemind/embed-deps/embed-daemon.js` missing | NULL silently; no spawn attempt (user hasn't opted in) |
| 2 | Binary present, socket missing, pidfile missing | Spawn → wait-for-socket (≤5s) → embed |
| 3 | Binary present, socket alive, daemon healthy | Connect directly → embed (happy path) |
| 4 | Socket file present but daemon dead (stale socket) | Clean up socket + pid → spawn → embed |
| 5 | Pidfile points to dead PID | Stale → spawn |
| 6 | Pidfile points to live PID but socket absent (crash post-spawn) | Respect the PID and wait for the socket; on timeout → NULL (no cross-process SIGTERM: PID reuse risk, the same problem #168 just fixed in `client.ts`) |
| 7 | Two agents race to spawn (e.g. claude_code + openclaw concurrent first-write) | Winner decided via the `wx` flag on the pidfile; loser connects to the winner's socket once it's up |
| 8 | Spawn fails (deps deleted between binary check and spawn) | NULL silently |
| 9 | Daemon spawned but never opens socket (startup crash) | 5s timeout → NULL |
| 10 | Embed request times out | NULL |
| 11 | Daemon is an old version → `unknown op` on embed | NULL (graceful, consistent with the #168 narrative) |
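A sketch of the request step that covers the last two rows (request timeout, old daemon answering `unknown op`) by mapping every failure to `null`, so the caller writes NULL. The wire format is an assumption: newline-delimited JSON with an `op` field, suggested only by the `unknown op` wording. The real framing lives in `src/embeddings/client.ts` and should be reused; `embedOverSocket` is a hypothetical name.

```typescript
import * as net from "node:net";

// Send one embed request; resolve with the vector, or null on any failure.
// ASSUMED protocol: {"op":"embed","text":...}\n -> {"embedding":[...]}\n or {"error":"..."}\n
export function embedOverSocket(
  socketPath: string,
  text: string,
  timeoutMs = 2000,
): Promise<number[] | null> {
  return new Promise((resolve) => {
    const sock = net.createConnection(socketPath);
    let buf = "";
    const timer = setTimeout(() => finish(null), timeoutMs); // row 10: request timeout
    const finish = (v: number[] | null): void => {
      clearTimeout(timer);
      sock.destroy();
      resolve(v); // later calls are no-ops
    };
    sock.setEncoding("utf8");
    sock.once("connect", () => sock.write(JSON.stringify({ op: "embed", text }) + "\n"));
    sock.on("data", (chunk: Buffer | string) => {
      buf += String(chunk);
      const nl = buf.indexOf("\n");
      if (nl < 0) return; // wait for a complete line
      try {
        const reply = JSON.parse(buf.slice(0, nl)) as { embedding?: unknown; error?: string };
        const ok =
          Array.isArray(reply.embedding) && reply.embedding.every((x) => typeof x === "number");
        finish(ok ? (reply.embedding as number[]) : null); // row 11: "unknown op" -> null
      } catch {
        finish(null); // malformed reply
      }
    });
    sock.once("error", () => finish(null));
    sock.once("close", () => finish(null)); // daemon hung up without answering
  });
}
```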

Acceptance criteria

  • `src/embeddings/standalone-embed-client.ts` exists with unit tests covering all 11 edge cases above (mock spawn + real socket against a stub daemon)
  • `pi/extension-source/hivemind.ts` actually spawns the daemon when the socket is absent (matches its own documentation)
  • `openclaw/src/index.ts`: `upsertRowSql` writes the document embedding (not NULL) when the daemon is available; session capture INSERT includes `message_embedding` in column list
  • E2E verification on `test_plugin/default/sessions_test` (never prod) showing openclaw writes produce non-NULL embeddings with semantic recall working
  • Coverage targets met (per-file ≥90% on all 3 new/modified files)

Out of scope

  • Auto-installing the transformers deps (`hivemind embeddings install`) from openclaw. This must remain an explicit opt-in per #168 (fix(embeddings): silent NULL embeddings after marketplace upgrades).
  • Recycle-after-hello-mismatch logic in pi or openclaw. The shared canonical daemon binary is updated when the user reruns `embeddings install`; pi/openclaw just consume whatever's there.
