Skip to content

fix(embeddings): silent NULL embeddings after marketplace upgrades#168

Merged
efenocchi merged 27 commits into
mainfrom
fix/embed-daemon-transformers-resolution
May 18, 2026
Merged

fix(embeddings): silent NULL embeddings after marketplace upgrades#168
efenocchi merged 27 commits into
mainfrom
fix/embed-daemon-transformers-resolution

Conversation

@efenocchi
Copy link
Copy Markdown
Collaborator

@efenocchi efenocchi commented May 15, 2026

Summary

Embeddings have been silently failing in production. The capture hook
spawns the embed daemon from the marketplace plugin bundle path
(~/.claude/plugins/cache/hivemind/hivemind/<version>/bundle/embeddings/embed-daemon.js),
but @huggingface/transformers is installed only at
~/.hivemind/embed-deps/node_modules/ — reachable from the bundle via
a one-time symlink that hivemind embeddings install creates. Every
Claude Code marketplace auto-upgrade drops a new versioned cache dir
without the symlink, so the bundle daemon's
import("@huggingface/transformers") returns MODULE_NOT_FOUND,
embed() returns null, and sessions.message_embedding writes through
as NULL forever after. The user-facing log shows ~30 minutes of
embed err: Cannot find package lines.

This PR is a fix-it-forever overhaul of the embed daemon lifecycle:

  1. Explicit transformers resolver in the daemon. nomic.ts now
    loads transformers via
    createRequire(pathToFileURL("~/.hivemind/embed-deps/")).resolve()
    followed by an absolute-URL dynamic import, with a bare-specifier
    fallback for dev trees. The daemon resolves transformers regardless
    of which bundle path it was spawned from.

  2. Persistent opt-in via ~/.deeplake/config.json. The legacy
    HIVEMIND_EMBEDDINGS env var is read ONCE on first run to seed
    embeddings.enabled, then never consulted again. Source of truth
    is now a real file shared across all 4 agents.

  3. New CLI surface. hivemind embeddings install is the heavy
    path (deps + symlinks + enabled:true). enable / disable are
    light config flips. uninstall [--prune] is the heavy counterpart
    (removes symlinks + optionally wipes shared deps). Every agent's
    SessionStart injection advertises the full surface.

  4. Hello handshake + stuck-daemon recycle. The socket is per-UID,
    not per-version, so one stuck daemon poisons every subsequent
    session for up to 10 minutes of idle-out. The client now sends a
    hello on first connect; on path mismatch (typical post-upgrade)
    it SIGTERMs the daemon and clears its sock/pid. Same recycle fires
    if the daemon returns a transformers-missing error during
    embed(). Next session spawns fresh from the current bundle.

  5. Visible one-time notification. On transformers-missing, the
    client enqueues a warn-severity notification routed through the
    existing SessionStart drain so the user sees a clear "run
    hivemind embeddings install" message. Suppressed when the user
    has explicitly disabled embeddings — no nag for opt-outs.

  6. Self-heal symlink across cached versions. Each capture hook
    runs ensurePluginNodeModulesLink({ bundleDir }) at top-level.
    The first capture under a new plugin version (post-marketplace-
    upgrade) creates <pluginDir>/node_modules → ~/.hivemind/embed-deps/node_modules
    atomically; subsequent calls are O(1) no-ops. Conservative — never
    clobbers a real node_modules, never overrides a symlink to a
    different valid target, but cleans up dangling links so it can
    recreate them on the next call. Refuses to act outside the
    production bundle/ layout (guard against tests).

  7. Bundle-scan guards. New tests/claude-code/embeddings-bundle-scan.test.ts
    reads each agent's shipped embed-daemon.js, capture.js, and
    the CLI bundle, and asserts every literal string the runtime
    depends on survives bundling. 30-second reviewer guardrail.

Behavioral migration

Was Now
HIVEMIND_EMBEDDINGS=false env var ~/.deeplake/config.jsonembeddings.enabled: false
EmbeddingsStatus = "env-disabled" EmbeddingsStatus = "user-disabled"
enableEmbeddings() (heavy) installEmbeddings()
disableEmbeddings({ prune }) (heavy) uninstallEmbeddings({ prune })
(n/a) enableEmbeddings() (light) — flip config flag, warn if deps missing
(n/a) disableEmbeddings() (light) — flip flag + SIGTERM daemon
First-run migration env-var read exactly once: false/unset → enabled: false; else enabled: true

Most users have the env var unset, so on first run after this lands
they'll get enabled: false and need to run hivemind embeddings enable
or install to opt back in. That's intentional — explicit consent
instead of implicit on-by-default.

Test plan

  • npm test → 2457 passing (up from ~2371 on main).
  • tsc --noEmit clean.
  • Bundle scan asserts every agent's shipped artifact contains the
    fix.
  • Self-heal helper handles every fs combination (missing target,
    existing real dir, existing symlink-elsewhere, dangling symlink,
    non-bundle layout).
  • Manual sandbox capture test (per CLAUDE.md "never touch prod
    tables")
    :
    • Switch to test org:
      node codex/bundle/commands/auth-login.js org switch test_plugin
    • Point sessionsTableName at sessions_test in src/config.ts,
      rebuild, reinstall plugin.
    • Kill any pre-existing daemon:
      kill $(cat /tmp/hivemind-embed-$(id -u).pid 2>/dev/null) ; rm -f /tmp/hivemind-embed-$(id -u).{sock,pid}
    • Drive a real Claude Code session for a few prompts.
    • SELECT COUNT(*) AS total, COUNT(message_embedding) AS with_embed FROM sessions_test WHERE creation_date > NOW() - INTERVAL '5 minutes';
    • Expect with_embed == total.
    • Revert src/config.ts before merging.
  • Manual disable/enable path:
    • hivemind embeddings disable → next capture writes NULL, no
      daemon spawn, no notification.
    • hivemind embeddings enable → daemon spawns, embedding column
      populated.
  • Manual self-heal:
    • rm <marketplace-cache>/<version>/node_modules
    • Next capture event → symlink recreated.

Out of scope (filed separately)

  • 403 retries from the Deeplake SQL endpoint visible in
    ~/.deeplake/hook-debug.log. Different auth-layer issue,
    orthogonal to embeddings.

Summary by CodeRabbit

  • New Features

    • Added hivemind embeddings install to manage transformer dependencies separately from enablement
    • Added hivemind embeddings uninstall [--prune] for full cleanup with optional dependency pruning
    • Embeddings configuration now persists in ~/.deeplake/config.json for cross-session preference preservation
    • Daemon verification on startup to detect stale bundles and auto-recovery
    • User notifications when required dependencies are missing
  • Bug Fixes

    • Improved handling of stuck or incompatible embedding daemons with automatic recycling
    • Better error detection and messaging for missing transformer dependencies

efenocchi added 7 commits May 15, 2026 00:00
The embed daemon was failing silently after every marketplace plugin
upgrade because Node's bundle-relative module resolution could not find
`@huggingface/transformers`. The package is installed once at
`~/.hivemind/embed-deps/` by `hivemind embeddings install`, but new
versioned plugin cache dirs land without a `node_modules` symlink, so
the daemon's bare `import("@huggingface/transformers")` walks up to a
location that does not have the package. Each embed request then
returned `Cannot find package`, the client coerced it to a null
embedding, and `sessions.message_embedding` columns were written as
NULL with no surface error.

Rework `NomicEmbedder.load()` to resolve the package explicitly via
`createRequire(pathToFileURL("~/.hivemind/embed-deps/")).resolve(...)`
followed by `import(pathToFileURL(absMain))`. This bypasses Node's
upward walk entirely, so the daemon resolves transformers correctly
regardless of which bundle path it was spawned from. The bare-specifier
import remains as a fallback for dev-tree usage where the package is
colocated. If both fail, the thrown error message mentions
`hivemind embeddings install` so the failure is actionable in logs.

Tests use DI to inject the importer so they run identically on
machines that already have `~/.hivemind/embed-deps/` populated (which
would otherwise shadow `vi.mock("@huggingface/transformers")`).
Replace the implicit "embeddings on by default" + `HIVEMIND_EMBEDDINGS`
env-var override with an explicit, persistent opt-in stored on disk.
The new contract:

  - `~/.deeplake/config.json` → `embeddings.enabled: boolean` is the
    sole source of truth, shared across all agents (claude-code, codex,
    cursor, hermes, pi) because they all read the same `~/.deeplake/`.
  - Embeddings run only when `enabled === true`.
  - The legacy `HIVEMIND_EMBEDDINGS` env var is read EXACTLY ONCE — on
    the first run that has no `embeddings.enabled` key — to seed the
    persistent value. Migration rule: env=`false` or unset writes
    `enabled: false`; any truthy value writes `enabled: true`. After
    the seed is written, the env var is never consulted again.

New module `src/user-config.ts` provides `readUserConfig`,
`writeUserConfig` (atomic write + deep merge), `getEmbeddingsEnabled`
(with one-shot migration), and `setEmbeddingsEnabled`. Path is
overridable via `HIVEMIND_CONFIG_PATH` for tests.

`src/embeddings/disable.ts` no longer reads the env var directly.
`EmbeddingsStatus` renames the env-disabled variant to `user-disabled`,
which now reflects both legacy env-disabled and the new config-disabled
cases (both fold into the same user-driven opt-out). The transformers
probe is reordered to match the daemon's import resolution order
(canonical shared-deps first, bundle walk fallback), eliminating the
prior probe/use asymmetry where the probe could succeed and the daemon
still throw MODULE_NOT_FOUND.

A vitest setupFile (`tests/test-setup.ts`) pins `HIVEMIND_CONFIG_PATH`
to a per-process tmp dir so tests never mutate the developer's real
`~/.deeplake/config.json`, and defaults the test environment to
`HIVEMIND_EMBEDDINGS=true` so suites that don't explicitly exercise
the disabled path keep running with embeddings on.

Tests that previously set `HIVEMIND_EMBEDDINGS=false` to exercise the
disabled path now write a throwaway config file with
`embeddings.enabled: false` and point `HIVEMIND_CONFIG_PATH` at it.
Previously `hivemind embeddings install` ≡ `enable` (both did the heavy
deps + symlink work) and `disable` ≡ `uninstall` (both removed symlinks).
With the new persistent-config contract that mapping is wrong:
opting in / out of embeddings should be a lightweight config flip, while
managing the on-disk install is a separate, heavier operation.

New surface (all reflected in `--help` and every agent's SessionStart
injection):

  hivemind embeddings install       (heavy) npm-installs @huggingface/
                                    transformers into ~/.hivemind/embed-deps,
                                    symlinks every detected agent plugin to
                                    it, and sets embeddings.enabled=true.

  hivemind embeddings enable        (light) sets embeddings.enabled=true.
                                    Warns if shared deps missing.

  hivemind embeddings disable       (light) sets embeddings.enabled=false
                                    and SIGTERMs the running daemon +
                                    clears its sock/pid files so the
                                    change takes effect immediately,
                                    instead of waiting 10 min for idle-out.
                                    Shared deps stay on disk.

  hivemind embeddings uninstall     (heavy) removes every agent's symlink
   [--prune]                        into the shared deps, optionally
                                    prunes the shared dir itself, sets
                                    enabled=false, and SIGTERMs the daemon.

  hivemind embeddings status        Extended to show the config flag
                                    state alongside the deps + per-agent
                                    state, and a one-line action hint
                                    when the two disagree.

The CLI dispatcher matches these subcommands exactly (no more
install/enable aliasing), and `--with-embeddings` runs the heavy install
path. Per-agent SessionStart blocks for claude-code, codex, cursor, and
hermes now advertise all five subcommands so the model can suggest the
right one without guessing (per agents-deployment-session-start-injection
skill).

A best-effort `killEmbedDaemon()` helper reads the standard
`/tmp/hivemind-embed-${uid}.pid`, SIGTERMs the process, and unlinks
both the pidfile and socket. Tolerant of every missing-file combination.
…gnal

The embed daemon socket is per-UID, not per-plugin-version. After a
marketplace plugin upgrade replaces the bundle, the older daemon
process keeps its socket alive (up to 10 minutes of idle-out), so every
new session on every newer plugin version connects to the same stuck
daemon. When that stuck daemon can't resolve transformers from its own
(now-orphaned) bundle path, it returns MODULE_NOT_FOUND on every embed
call, and the rest of the session writes NULL into the embedding column
with no surface error. We've now seen this in production: the local
log shows ~30 minutes of `embed err: Cannot find package` lines on a
freshly-upgraded plugin.

Three additions:

1. **Hello handshake** (`protocol.ts` + `daemon.ts` + `client.ts`).
   First connect per `EmbedClient` instance sends `{ op: "hello" }`;
   the daemon answers with its own `daemonPath` (= `process.argv[1]`)
   and `pid`. If the running daemon's path doesn't match the client's
   configured `daemonEntry`, the client SIGTERMs the daemon and clears
   its socket + pidfile so the next call spawns a fresh daemon from
   the current bundle. Verified at most once per EmbedClient.

2. **Stuck-daemon recycle on transformers error** (`client.ts`).
   Embed responses matching `isTransformersMissingError` (the wrapper
   we throw from `defaultImportTransformers`, plus Node's standard
   MODULE_NOT_FOUND form) trigger the same recycle. Process-local
   guard so only the first failing call kills + cleans up.

3. **Visible one-time notification** (`client.ts`).
   On the same transformers-missing trigger, enqueue a `warn`-severity
   notification (`id: "embed-deps-missing"`, dedupKey carries the
   error detail) so the next SessionStart drain tells the user to run
   `hivemind embeddings install`. Suppressed when
   `embeddingsStatus() === "user-disabled"` — users who explicitly
   opted out via config don't get nagged. Process-local dedup so a
   single capture session enqueues at most one notification even if
   the embed call fires many times.

Net effect: a poisoned daemon survives at most one failed embed; the
first session after a plugin upgrade recycles it; the user gets a
clear, actionable notification instead of silent NULLs in the
sessions table.
`hivemind embeddings install` symlinks `<pluginDir>/node_modules` to
`~/.hivemind/embed-deps/node_modules` so Node's standard module
resolution finds @huggingface/transformers from anywhere inside
`<pluginDir>/bundle/…`. That works for the plugin version present at
install time — but Claude Code's marketplace auto-upgrades drop new
versioned cache dirs (`cache/hivemind/hivemind/0.7.27/`,
`0.7.28/`, …) WITHOUT the symlink. The user would have to manually
re-run `hivemind embeddings install` after every upgrade — and most
won't, so embeddings silently degrade.

New helper `src/embeddings/self-heal.ts` runs from each agent's
capture hook on every invocation. The first capture under a new
plugin version creates the symlink atomically (symlink to a `.tmp`
suffix, then rename); subsequent calls are O(1) no-ops once
`already-linked` is observed.

Conservative behavior — the helper NEVER:
- Clobbers an existing real `node_modules` directory.
- Overrides a symlink that points elsewhere to a valid target (user
  installed their own dependency tree).
- Acts when the shared-deps `node_modules` doesn't exist (returns
  `shared-deps-missing`; notification path covers user-facing surface).
- Acts when `bundleDir` basename isn't `bundle` (guards against the
  source-tree path being passed during tests — without this gate, a
  test importing `src/hooks/capture.ts` would symlink `src/node_modules`
  to the user's real shared deps).

What it DOES heal:
- Missing link → create.
- Dangling symlink (target deleted out from under it) → remove +
  next call re-creates.

Wired into all four capture hooks (claude-code, codex, cursor, hermes)
at top-level after the bundleDir is computed. Gated on
`!embeddingsDisabled()` so user-disabled installs don't accumulate
symlinks they don't want.
Per the project testing philosophy: source tests prove the helpers
are correct, bundle tests prove the build didn't drop them, re-inline
an old pattern, or otherwise regress on the shipped artifact. A
30-second reviewer guardrail.

For each of claude-code, codex, cursor, hermes:

- `bundle/embeddings/embed-daemon.js` contains the canonical
  shared-deps path fragments (".hivemind" + "embed-deps"),
  `createRequire` (proving the explicit-path resolver survived
  bundling), and the actionable error string "hivemind embeddings
  install" (proving the error message users will see in logs is in
  the shipped artifact).

- `bundle/capture.js` invokes `ensurePluginNodeModulesLink` (the
  self-heal helper), carries the `embed-deps-missing` notification
  dedupKey, and still names the `user-disabled` status (proving the
  opt-out guard survives bundling).

For the CLI:

- `bundle/cli.js` recognises all five embeddings subcommands
  (install/enable/disable/uninstall/status) and references
  `~/.deeplake/config.json` so the SessionStart injection text and
  the dispatcher agree.
Final step before staging per bundle-rebuild-before-staging skill.
All five source commits in this branch (C1–C5) modify shared
`src/embeddings/` and per-agent capture hooks, so every agent's
`bundle/` artifacts need refreshing:

  claude-code/bundle/{capture,embed-daemon,session-start,…}.js
  codex/bundle/{capture,embed-daemon,stop,…}.js
  cursor/bundle/{capture,embed-daemon,…}.js
  hermes/bundle/{capture,embed-daemon,…}.js
  pi/bundle/wiki-worker.js
  bundle/cli.js
  embeddings/embed-daemon.js (standalone daemon)

No source changes — `npm run build` output only. The companion
bundle-scan guards in tests/claude-code/embeddings-bundle-scan.test.ts
pass against these artifacts.
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 15, 2026

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 12dc047f-564b-45cf-a7ef-e63f94da6a5e

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds a hello handshake to embeddings protocol, client verification and recycle, user-config–backed enablement with env migration, transformers shared-deps resolution, CLI install/enable/disable/uninstall/status, plugin node_modules self-heal, and broad bundle/help/test updates.

Changes

Embeddings lifecycle and CLI split

Layer / File(s) Summary
Protocol hello and daemon identity
src/embeddings/protocol.ts, src/embeddings/daemon.ts, */bundle/embeddings/embed-daemon.js
Adds hello request/response with daemonPath/pid/protocolVersion; daemons implement hello and track daemonPath.
Client verify/recycle and transformers-missing handling
src/embeddings/client.ts, `/bundle/(capture
pre-tool-use
User config, migration, and status gating
src/user-config.ts, src/embeddings/disable.ts, tests
Persists embeddings.enabled in ~/.deeplake/config.json with one-time env migration; status uses user-config; tests updated.
Transformers shared-deps importer
src/embeddings/nomic.ts, */bundle/embeddings/embed-daemon.js
Importer prefers canonical shared-deps and falls back to bare import; Nomic uses importer; actionable errors.
CLI commands and routing
src/cli/embeddings.ts, src/cli/index.ts, bundle/cli.js
Adds install/enable/disable/uninstall/status; install is heavy; enable/disable light; uninstall optional prune; status reflects config.
Self-heal and hook wiring
src/embeddings/self-heal.ts, src/hooks/*/capture.ts, bundles
Adds atomic symlink self-heal for plugin node_modules; hooks perform best-effort self-heal when embeddings enabled.

Sequence Diagram(s)

sequenceDiagram
  participant CLI
  participant UserConfig
  participant Daemon
  participant Client
  CLI->>UserConfig: set embeddings.enabled = true/false
  Client->>Daemon: hello {id}
  Daemon-->>Client: {daemonPath, pid, protocolVersion}
  Client->>Daemon: embed {text}
  Daemon-->>Client: {embedding|error}
  Client-->>UserConfig: enqueue notification (deps missing) [when enabled]
Loading

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120+ minutes

Possibly related PRs

Suggested reviewers

  • kaghni

Poem

A rabbit taps the daemon’s door—hello! what path are you?
If wires cross, it sweeps the burrow, grows the links anew.
It files a note in Deeplake’s nook, “enable if you wish,”
And when transformers hide away, it serves a gentle dish:
“Run embeddings install,” it squeaks—then hops to prune or swish. 🐇

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/embed-daemon-transformers-resolution

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 15, 2026

Coverage Report

Scope: files changed in this PR. Enforced threshold: 90% per metric (per file via vitest.config.ts).

Status Category Percentage Covered / Total
🟢 Lines 92.24% (🎯 90%) 1402 / 1520
🟢 Statements 90.82% (🎯 90%) 1633 / 1798
🟢 Functions 91.59% (🎯 90%) 207 / 226
🔴 Branches 83.47% (🎯 90%) 843 / 1010
File Coverage — 19 files changed
File Stmts Branches Functions Lines
src/cli/embeddings.ts 🔴 49.0% 🔴 50.8% 🔴 75.0% 🔴 48.1%
src/cli/index.ts 🟢 97.4% 🟢 95.9% 🟢 100.0% 🟢 97.3%
src/embeddings/client.ts 🟢 95.2% 🔴 83.3% 🟢 93.5% 🟢 96.6%
src/embeddings/daemon.ts 🟢 92.9% 🔴 75.0% 🔴 78.9% 🟢 97.1%
src/embeddings/disable.ts 🟢 100.0% 🟢 100.0% 🟢 100.0% 🟢 100.0%
src/embeddings/nomic.ts 🟢 97.3% 🟢 94.6% 🟢 100.0% 🟢 100.0%
src/embeddings/protocol.ts 🟢 100.0% 🟢 100.0% 🟢 100.0% 🟢 100.0%
src/embeddings/self-heal.ts 🟢 91.9% 🔴 66.7% 🟢 100.0% 🟢 91.2%
src/hooks/capture.ts 🟢 98.7% 🔴 86.4% 🟢 100.0% 🟢 100.0%
src/hooks/codex/capture.ts 🟢 100.0% 🔴 86.8% 🟢 100.0% 🟢 100.0%
src/hooks/cursor/capture.ts 🟢 94.0% 🟢 90.0% 🟢 100.0% 🟢 95.8%
src/hooks/cursor/session-start.ts 🟢 100.0% 🟢 95.0% 🟢 100.0% 🟢 100.0%
src/hooks/hermes/capture.ts 🔴 86.5% 🔴 83.6% 🟢 100.0% 🔴 86.5%
src/hooks/hermes/session-start.ts 🟢 100.0% 🟢 93.8% 🟢 100.0% 🟢 100.0%
src/hooks/session-start-setup.ts 🟢 100.0% 🟢 95.0% 🟢 100.0% 🟢 100.0%
src/hooks/session-start.ts 🟢 100.0% 🟢 90.5% 🟢 100.0% 🟢 100.0%
src/notifications/queue.ts 🟢 97.2% 🔴 79.2% 🟢 100.0% 🟢 98.4%
src/shell/deeplake-fs.ts 🔴 89.7% 🔴 81.4% 🔴 86.4% 🟢 93.9%
src/user-config.ts 🟢 95.2% 🔴 74.3% 🟢 90.9% 🟢 94.8%

Generated for commit ea0af71.

@coderabbitai coderabbitai Bot requested a review from kaghni May 15, 2026 00:34
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 17

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/cursor/cursor-capture-hook.test.ts (1)

272-289: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Missing cleanup: temp directory leaks on every test run.

The test creates a temp directory and config file but never removes them. This will accumulate garbage in the system temp dir over repeated test runs.

🧹 Proposed fix to clean up temp resources
  it("user-disabled embeddings short-circuit to NULL without invoking EmbedClient", async () => {
    stdinMock.mockResolvedValue({
      conversation_id: "sid-emb-3",
      hook_event_name: "beforeSubmitPrompt",
      prompt: "disabled",
    });
    // Point user-config at a throwaway path that says enabled:false.
    const { writeFileSync, mkdtempSync } = await import("node:fs");
+   const { rmSync } = await import("node:fs");
    const { tmpdir } = await import("node:os");
    const { join } = await import("node:path");
    const dir = mkdtempSync(join(tmpdir(), "cursor-cap-disabled-"));
    const cfgPath = join(dir, "config.json");
    writeFileSync(cfgPath, JSON.stringify({ embeddings: { enabled: false } }), "utf-8");
-   await runHook({ HIVEMIND_CONFIG_PATH: cfgPath });
+   try {
+     await runHook({ HIVEMIND_CONFIG_PATH: cfgPath });
+   } finally {
+     rmSync(dir, { recursive: true, force: true });
+   }
    const sql = queryMock.mock.calls[0][0] as string;
    expect(sql).toContain("'::jsonb, NULL,");
    expect(sql).toMatch(/, message_embedding,/);
  });
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/cursor/cursor-capture-hook.test.ts` around lines 272 - 289, The test
creates a temp directory (mkdtempSync -> dir) and config file (cfgPath) but
never removes them; update the test around runHook({ HIVEMIND_CONFIG_PATH:
cfgPath }) in cursor-capture-hook.test.ts to ensure the temp directory is
cleaned (use fs.rmSync or fs.promises.rm with recursive/force) in a finally
block or after/teardown so the temp dir and config file created by writeFileSync
are removed regardless of test outcome; reference mkdtempSync, writeFileSync,
cfgPath, dir and runHook when implementing the cleanup.
🧹 Nitpick comments (1)
hermes/bundle/shell/deeplake-shell.js (1)

67811-67823: 💤 Low value

deepMerge performs only one level of nested merging, not recursive.

The function merges only one level deep: nested objects are spread ({...baseVal, ...patchVal}) rather than recursively merged. For the current embeddings.enabled use case this is fine, but deeper structures like {a: {b: {c: 1}}} patched with {a: {b: {d: 2}}} would lose c.

Consider renaming to shallowMergeNested or adding a comment clarifying the single-level constraint if deeper configs are anticipated.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@hermes/bundle/shell/deeplake-shell.js` around lines 67811 - 67823, The
function deepMerge only merges nested objects one level deep (it spreads baseVal
and patchVal rather than recursively merging), so rename deepMerge to
shallowMergeNested (or equivalent) and add a concise JSDoc/comment on the
function explaining it intentionally performs only a single-level nested merge;
update any callers referencing deepMerge to use the new name (search for
deepMerge in the bundle) so callers and future maintainers understand the
single-level constraint.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@bundle/cli.js`:
- Around line 3825-3837: The killEmbedDaemon function currently reads a PID from
pidPathFor and sends SIGTERM blindly; instead verify the PID refers to the embed
daemon before signaling: after reading pid (readFileSync8) and before calling
process.kill, check the live process identity (e.g., inspect /proc/<pid>/cmdline
or use a small handshake via socketPathFor to confirm it responds as the embed
daemon) and only signal when that validation succeeds; if validation fails treat
the pidfile/socket as stale and remove pidPath/socketPath rather than killing an
unrelated process.
- Around line 3774-3777: The linkAgent function currently always calls
symlinkForce(join11(install.pluginDir, "node_modules")) which unlinks any
preexisting path and causes EISDIR if an agent already has a real node_modules
directory; before calling symlinkForce, check the fs.lstat/lstatSync on the
target link (the join11(..., "node_modules") path): if it exists and is a real
directory (stats.isDirectory() && !stats.isSymbolicLink()) rename or move it
(e.g., append ".own" or similar) and log that we preserved the agent's own
node_modules, otherwise if it’s a symlink or file remove it and then call
symlinkForce(SHARED_NODE_MODULES, link); update linkAgent to perform this guard
using the same link variable so symlinkForce and SHARED_NODE_MODULES usage
remain unchanged.

In `@claude-code/bundle/capture.js`:
- Around line 1766-1785: The current branch that handles a dangling symlink
(when linkStat.isSymbolicLink() is true) removes the stale symlink with
rmSync(link) and returns without recreating it, leaving self-heal incomplete;
change the catch block inside the dangling-link path so that after successfully
rmSync(link) you immediately recreate the symlink pointing at target using
symlinkSync(target, link[, type]) (choose 'junction' for Windows if needed), and
then return a success kind (e.g., "recreated-link" with link and target) instead
of "stale-link-removed"; update any callers/consumers expecting
"stale-link-removed" to handle the new return kind if necessary.

In `@claude-code/bundle/pre-tool-use.js`:
- Around line 1316-1343: In verifyDaemonOnce, don't set this.helloVerified at
the top; instead only set it after confirming a compatible hello response — move
the assignment to after you've confirmed hello.daemonPath exists and equals
this.daemonEntry (i.e., after the equality check where currently you return on
match), so that failed probes, missing daemonPath, or path mismatches (and
subsequent recycleDaemon calls) do not permanently mark the client as verified;
keep existing behavior for early returns when !this.daemonEntry or on exceptions
by not setting helloVerified in those code paths.

In `@claude-code/bundle/shell/deeplake-shell.js`:
- Around line 68288-68292: resolveEmbedDaemonPath currently builds the daemon
path as join(dirname(fileURLToPath(import.meta.url)), "embeddings",
"embed-daemon.js") which points to bundle/shell/embeddings/embed-daemon.js but
the actual daemon lives in the sibling bundle/embeddings directory; update
resolveEmbedDaemonPath (same function called when spawning EmbedClient) to
include the parent-traversal segment (add ".." into the join parts, mirroring
the pattern used by resolveGrepEmbedDaemonPath) so the join becomes dirname ->
".." -> "embeddings" -> "embed-daemon.js".

In `@codex/bundle/shell/deeplake-shell.js`:
- Around line 68290-68292: resolveEmbedDaemonPath currently constructs the path
"embeddings/embed-daemon.js" relative to the shell bundle directory, which
resolves to codex/bundle/shell/embeddings/embed-daemon.js but the actual bundled
file is at codex/bundle/embeddings/embed-daemon.js; update the path construction
in resolveEmbedDaemonPath so it points to the parent
"embeddings/embed-daemon.js" (i.e., join the dirname of import.meta.url with
".." and "embeddings" then "embed-daemon.js") so daemonEntry references the
correct bundled file.

In `@codex/bundle/stop.js`:
- Around line 1184-1187: enqueueNotification currently always appends the
notification which allows duplicates across processes; change it to honor
n.dedupKey by reading the existing queue via readQueue(), checking if any queued
item has the same dedupKey (and skip push if found), and only push/writeQueue
when dedupKey is absent or unique; reference the enqueueNotification function
and use readQueue/writeQueue and the notification object's dedupKey field to
implement the guard so duplicate "embed-deps-missing" entries are not persisted.

In `@cursor/bundle/shell/deeplake-shell.js`:
- Around line 67910-67918: The embed() flow can send the request on a stale
socket because verifyDaemonOnce() may recycle the daemon; fix by making
verifyDaemonOnce() indicate when it recycled (e.g., return a boolean or the
fresh socket) and then, inside embed(), re-acquire the current socket before
calling sendAndWait() so the request is sent to the new daemon; update embed()
(and the similar block around the 67942-67968 range) to check the return from
verifyDaemonOnce(), refresh/get the socket (the same socket variable passed to
sendAndWait) when a recycle occurred, and only then construct/send the req via
this.sendAndWait().
- Around line 68182-68184: The current isTransformersMissingError function is
too broad because it matches any "MODULE_NOT_FOUND"; change it to only return
true when the missing-module error specifically references transformers-related
packages. Update isTransformersMissingError to check err.code ===
'MODULE_NOT_FOUND' (or the error string) AND test err.message/err for package
names like `@huggingface/transformers`, transformers, hivemind, or accelerate
(e.g. /(`@huggingface`\/transformers|transformers|hivemind|accelerate)/i) so
unrelated missing-module errors are not misclassified.
- Around line 68290-68292: The resolveEmbedDaemonPath function currently builds
a path to "embeddings/embed-daemon.js" relative to the shell bundle directory,
but the emitted daemon actually lives in the parent bundle's embeddings folder;
update resolveEmbedDaemonPath (and the join11/dirname5/fileURLToPath usage) to
ascend one directory before "embeddings" (e.g.
join11(dirname5(fileURLToPath(import.meta.url)), "..", "embeddings",
"embed-daemon.js")) so the resolved path points to
bundle/embeddings/embed-daemon.js instead of shell/embeddings/embed-daemon.js.

In `@hermes/bundle/capture.js`:
- Around line 614-618: enqueueNotification currently always appends
notifications to the persisted queue so cross-process runs re-enqueue “one-time”
warnings; change enqueueNotification to read the queued items (via readQueue),
check for an existing notification with the same unique id/type (e.g.
"embed-deps-missing") or a oneTime flag before pushing, and only push if not
already present, then writeQueue; use the same identifying key the runtime uses
for suppression (the embed-deps-missing id or a new oneTime property) so
_signalledMissingDeps in-memory suppression remains complementary to the
persisted dedupe.
- Around line 1099-1118: The stale-symlink branch currently removes the dangling
link and returns { kind: "stale-link-removed" }, which prevents the caller from
getting a usable node_modules link; instead, after successfully rmSync(link) in
the catch block you should not return immediately but fall through to the same
symlink-creation path used when the link does not exist. Concretely: in the
block under if (linkStat.isSymbolicLink()) where you catch a failed statSync,
remove the early return ({ kind: "stale-link-removed" }); after rmSync(link)
continue execution so the subsequent code that creates the symlink (the same
logic that runs when no link exists) runs and returns the successful "linked"
(or existing success) result; keep use of readlinkSync, statSync, rmSync and the
existing result kinds consistent.

In `@hermes/bundle/wiki-worker.js`:
- Around line 650-651: The current isTransformersMissingError detector is too
broad because it matches any "MODULE_NOT_FOUND"; update
isTransformersMissingError to only treat errors as transformer-missing when the
message explicitly references transformer-related identifiers (for example
'@huggingface/transformers', the word 'transformers' near the MODULE_NOT_FOUND
text, or the specific install hint 'hivemind embeddings install'), rather than
matching bare MODULE_NOT_FOUND; locate the isTransformersMissingError function
and tighten the regex/logic to require transformer-specific context (or prefer
checking for a transformer-specific sentinel error code returned by the daemon
if available) so unrelated packaging/runtime failures are not misclassified.

In `@src/embeddings/client.ts`:
- Around line 128-214: Add focused unit tests for EmbedClient to hit the
conditional branches in verifyDaemonOnce, handleTransformersMissing, and
recycleDaemon: mock sendAndWait to return a HelloResponse with no daemonPath to
assert no recycle; mock sendAndWait to return a mismatched daemonPath and assert
recycleDaemon called once and _recycledStuckDaemon becomes true (and calling
verifyDaemonOnce again does not re-trigger recycle); for
handleTransformersMissing stub embeddingsStatus to return "user-disabled" and
verify enqueueNotification is not called, and stub it to return enabled and
verify enqueueNotification is called once and _signalledMissingDeps is set; for
recycleDaemon mock readFileSync to supply a pid, spy process.kill and unlinkSync
to verify pid kill + socket/pid unlink path and also test the null reportedPid
branch where pidfile is missing. Reset module-level flags (_recycledStuckDaemon,
_signalledMissingDeps) between tests and use spies/mocks for sendAndWait,
recycleDaemon, enqueueNotification, embeddingsStatus, readFileSync, unlinkSync,
and process.kill to assert expected side effects.

In `@tests/claude-code/session-start-setup-hook.test.ts`:
- Around line 230-235: This test sets process.env.EMBEDDINGS_DISABLED_FOR_TEST
but never clears it, causing later tests to inherit the disabled state; after
calling runHook(...) in the "skips warmup..." test, restore or delete
process.env.EMBEDDINGS_DISABLED_FOR_TEST (e.g., delete
process.env.EMBEDDINGS_DISABLED_FOR_TEST or set it back to its original value)
so subsequent calls to runHook() behave deterministically; update the test
around runHook, embedWarmupMock, and debugLogMock assertions to ensure the env
key is cleaned up before the test ends.

In `@tests/hermes/hermes-capture-hook.test.ts`:
- Around line 275-281: Test mutates HIVEMIND_CONFIG_PATH and leaves a temp dir;
wrap the runHook call in a try/finally that saves the original
process.env.HIVEMIND_CONFIG_PATH, sets it to cfgPath, and in finally restores
the original env value (or deletes the key if it was undefined) and removes the
temp directory (use fs.rmSync(dir, { recursive: true, force: true }) or
fs.rmdirSync for compatibility). Locate the code around runHook, cfgPath,
mkdtempSync, writeFileSync and update the test to import/removal functions and
perform cleanup to avoid leaking state between tests.

---

Outside diff comments:
In `@tests/cursor/cursor-capture-hook.test.ts`:
- Around line 272-289: The test creates a temp directory (mkdtempSync -> dir)
and config file (cfgPath) but never removes them; update the test around
runHook({ HIVEMIND_CONFIG_PATH: cfgPath }) in cursor-capture-hook.test.ts to
ensure the temp directory is cleaned (use fs.rmSync or fs.promises.rm with
recursive/force) in a finally block or after/teardown so the temp dir and config
file created by writeFileSync are removed regardless of test outcome; reference
mkdtempSync, writeFileSync, cfgPath, dir and runHook when implementing the
cleanup.

---

Nitpick comments:
In `@hermes/bundle/shell/deeplake-shell.js`:
- Around line 67811-67823: The function deepMerge only merges nested objects one
level deep (it spreads baseVal and patchVal rather than recursively merging), so
rename deepMerge to shallowMergeNested (or equivalent) and add a concise
JSDoc/comment on the function explaining it intentionally performs only a
single-level nested merge; update any callers referencing deepMerge to use the
new name (search for deepMerge in the bundle) so callers and future maintainers
understand the single-level constraint.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: c577da79-afa6-4d82-9398-f8195e33ee25

📥 Commits

Reviewing files that changed from the base of the PR and between bdffdf0 and ed4b564.

📒 Files selected for processing (61)
  • bundle/cli.js
  • claude-code/bundle/capture.js
  • claude-code/bundle/embeddings/embed-daemon.js
  • claude-code/bundle/pre-tool-use.js
  • claude-code/bundle/session-start-setup.js
  • claude-code/bundle/session-start.js
  • claude-code/bundle/shell/deeplake-shell.js
  • claude-code/bundle/wiki-worker.js
  • codex/bundle/capture.js
  • codex/bundle/embeddings/embed-daemon.js
  • codex/bundle/pre-tool-use.js
  • codex/bundle/session-start.js
  • codex/bundle/shell/deeplake-shell.js
  • codex/bundle/stop.js
  • codex/bundle/wiki-worker.js
  • cursor/bundle/capture.js
  • cursor/bundle/embeddings/embed-daemon.js
  • cursor/bundle/pre-tool-use.js
  • cursor/bundle/session-start.js
  • cursor/bundle/shell/deeplake-shell.js
  • cursor/bundle/wiki-worker.js
  • embeddings/embed-daemon.js
  • hermes/bundle/capture.js
  • hermes/bundle/embeddings/embed-daemon.js
  • hermes/bundle/pre-tool-use.js
  • hermes/bundle/session-start.js
  • hermes/bundle/shell/deeplake-shell.js
  • hermes/bundle/wiki-worker.js
  • pi/bundle/wiki-worker.js
  • src/cli/embeddings.ts
  • src/cli/index.ts
  • src/embeddings/client.ts
  • src/embeddings/daemon.ts
  • src/embeddings/disable.ts
  • src/embeddings/nomic.ts
  • src/embeddings/protocol.ts
  • src/embeddings/self-heal.ts
  • src/hooks/capture.ts
  • src/hooks/codex/capture.ts
  • src/hooks/codex/session-start.ts
  • src/hooks/cursor/capture.ts
  • src/hooks/cursor/session-start.ts
  • src/hooks/hermes/capture.ts
  • src/hooks/hermes/session-start.ts
  • src/hooks/session-start-setup.ts
  • src/hooks/session-start.ts
  • src/user-config.ts
  • tests/claude-code/embeddings-bundle-scan.test.ts
  • tests/claude-code/embeddings-client.test.ts
  • tests/claude-code/embeddings-disable.test.ts
  • tests/claude-code/embeddings-nomic.test.ts
  • tests/claude-code/embeddings-self-heal.test.ts
  • tests/claude-code/session-start-setup-hook.test.ts
  • tests/claude-code/user-config.test.ts
  • tests/claude-code/wiki-worker-plugin-version.test.ts
  • tests/cli/cli-embeddings.test.ts
  • tests/cli/cli-index.test.ts
  • tests/cursor/cursor-capture-hook.test.ts
  • tests/hermes/hermes-capture-hook.test.ts
  • tests/test-setup.ts
  • vitest.config.ts

Comment thread bundle/cli.js
Comment thread bundle/cli.js Outdated
Comment thread claude-code/bundle/capture.js
Comment thread claude-code/bundle/pre-tool-use.js Outdated
Comment thread claude-code/bundle/pre-tool-use.js
Comment thread hermes/bundle/capture.js
Comment thread hermes/bundle/wiki-worker.js Outdated
Comment thread src/embeddings/client.ts
Comment thread tests/claude-code/session-start-setup-hook.test.ts
Comment thread tests/hermes/hermes-capture-hook.test.ts
efenocchi added 18 commits May 15, 2026 00:48
Two issues surfaced during real e2e verification against the
test_plugin sandbox table.

1) **CJS default-export unwrap in nomic.ts.**
   `createRequire(base).resolve("@huggingface/transformers")` honors the
   package's `"require"` conditional and returns the path to the CJS
   bundle (`./dist/transformers.node.cjs`). A subsequent dynamic
   `import(pathToFileURL(absMain).href)` wraps the CJS module as
   `{ default: <exports>, __esModule: true }`, so the daemon's
   `mod.env.allowLocalModels = false` line threw
   `Cannot set properties of undefined`. Added a
   `normalizeTransformersModule` helper that returns `mod.default`
   when it carries `pipeline`, else returns the bare module — works
   for both the CJS-resolved-by-require path and the ESM-resolved-by-
   import path (dev tree).

2) **Recycle on `unknown op` from hello.**
   A pre-handshake daemon (i.e. anything before this PR lands) answers
   `{ op: "hello" }` with `{ id, error: "unknown op" }` and no
   `daemonPath`. The previous check skipped the mismatch path in that
   case ("no daemonPath; skipping mismatch check") — meaning a stuck
   older daemon would keep poisoning sessions forever. Now treat a
   missing `daemonPath` the same as a path mismatch: the running
   daemon doesn't speak the current protocol, so it can't be trusted
   and gets recycled.

End-to-end verification on the test_plugin org's `sessions_test`
table: with both fixes in place, a real capture hook run produced a
row with `len=768` for `message_embedding` — the first non-NULL
embedding written to that table since the regression was filed.
…gent thrash)

The hello-handshake recycle was too aggressive. The original logic
fired SIGTERM whenever the running daemon's `daemonPath` differed
from the client's expected `daemonEntry`. On a single-agent machine
that catches the legitimate "marketplace upgrade replaced the
bundle, old daemon still running with old code" case. But on
multi-agent machines (a Hivemind user running claude-code + codex,
or anyone using pi) it causes endless thrash:

  - claude-code spawns daemon at <cc-bundle>/embed-daemon.js
  - codex fires capture, connects, hello returns cc's path,
    codex expects codex's path → MISMATCH → recycle the working
    daemon
  - codex spawns its own daemon at <cod-bundle>/embed-daemon.js
  - claude-code's next capture → mismatch → recycle codex's
  - ...forever

Differentiate "stale (GC'd) bundle" from "different but
functionally-equivalent bundle" via filesystem check:

  - `!hello.daemonPath`           → older daemon, no handshake support → recycle
  - `daemonPath !== entry`
      AND `!existsSync(daemonPath)` → orphaned bundle (GC'd) → recycle
      AND `existsSync(daemonPath)`  → multi-agent share → KEEP DAEMON

The marketplace-upgrade case still works because Claude Code's
plugin-cache-gc eventually prunes old versioned dirs. Until then,
the embed-error trigger (still in place) catches a stuck daemon
whose code is buggy regardless of path.

Verified live: spawned claude-code's daemon, then connected as
codex from a separate client. Codex got its embedding via
claude-code's daemon (vec length 768) without triggering recycle.
Socket and daemon survived intact.

Pi is also covered without changes — it passes no `daemonEntry`
(uses the canonical shared daemon at ~/.hivemind/embed-deps/
embed-daemon.js). When pi runs alongside any agent, its expected
entry differs from the running agent's bundle path, but both files
exist → no recycle, pi happily reuses whatever's warm.

Openclaw is out of scope — it doesn't embed locally (uses MCP
contracts that delegate to the cloud).
Brings in #129 (skillify mine-local) and v0.7.27 plumbing. Conflict
surface was the SessionStart injection blocks where both branches
touched the same per-agent command-list strings.

Resolution:
- src/hooks/codex/session-start.ts → take origin/main's version. Codex's
  hook stdout is rendered user-visible by its harness (see
  AGENT_CHANNELS.md → Codex and tests/claude-code/skillify-session-start-injection.test.ts),
  so the verbose context+command block must NOT live there. The embeddings
  commands stay discoverable to codex via bash + the auto-loaded
  deeplake-memory skill.
- src/hooks/cursor/session-start.ts, src/hooks/hermes/session-start.ts →
  switch the inline SKILLS block to origin/main's ${renderSkillifyCommands()}
  helper and append the existing Embeddings block right after, so both
  surfaces show up in the injected context without duplicating the
  skillify list.
- bundle/cli.js, codex/bundle/session-start.js, cursor/bundle/session-start.js,
  hermes/bundle/session-start.js → resolved by `npm run build` rather than
  hand-merging generated output.

Verification: full vitest run is 2646/2646 green across 130 test files,
including the new skillify drift/injection guards from #129.
CI was failing on the `src/embeddings/nomic.ts` 90% coverage gate:
the three helpers introduced for the canonical-shared-deps resolver
(`importFromCanonicalSharedDeps`, `importFromBareSpecifier`,
`normalizeTransformersModule`) were only exercised indirectly through
`defaultImportTransformers(canonical, bare)` with stubbed DI, leaving
lines 38-62 uncovered (lines 82.53%, functions 68.75%, statements 80.51%,
branches 75% — all below the 90% per-file threshold).

Changes:
- Export the three helpers with `_` prefix so tests can drive them
  directly without going through `defaultImportTransformers`.
- Replace the two `() => defaultImportTransformers()` arrow wrappers
  (initial value of `_importTransformers` and the reset target) with a
  direct reference to `defaultImportTransformers` — both call sites
  pass zero args, the function has all-defaulted params, and the arrow
  was just dead surface area that v8 counted as an extra function.

Tests added in tests/claude-code/embeddings-nomic.test.ts:
- `_normalizeTransformersModule` × 4: CJS-wrapped under `.default`,
  ESM-shape at top level, `.default`-without-pipeline passthrough,
  `.default`-falsy passthrough.
- `_importFromBareSpecifier` × 1: returns the vi.mocked module after
  normalization (also forces `default: undefined` in the mock factory
  so vitest's auto-mock proxy doesn't trip the normalizer's `.default`
  probe).
- `_importFromCanonicalSharedDeps` × 2: real on-disk fixture with a
  minimal `@huggingface/transformers` package under
  `<tmpDir>/node_modules/`, plus the missing-package error path.
- `defaultImportTransformers` × 1: non-Error rejection (string/object)
  flows through the String() fallback in the wrapped error message.

Result on `src/embeddings/nomic.ts`:
  stmts 80.51% → 97.29% | branches 75% → 94.59%
  funcs 68.75% → 100%   | lines    82.53% → 100%
All four metrics now clear the 90% gate.

Bundles regenerated (transformers resolver lives inside the embed
daemon, so each `*/bundle/embeddings/embed-daemon.js` artifact picks
up the renamed-to-`_`-prefix helpers).
…ephew

The shell-bundle daemon resolver in src/shell/deeplake-fs.ts built
`dirname(import.meta.url) + "embeddings/embed-daemon.js"`. The shell
gets bundled to `<agent>/bundle/shell/deeplake-shell.js`, so the
computed daemonEntry pointed at the non-existent path
`<agent>/bundle/shell/embeddings/embed-daemon.js` — the real daemon
lives one level up at `<agent>/bundle/embeddings/embed-daemon.js`.

The capture/wiki-worker hooks always got this right because their
source files live at `src/hooks/` and bundle to `<agent>/bundle/`
(same level as `embeddings/`). Only the shell entry point was off,
which silently broke the pre-tool-use shell embed path on every agent.
`src/shell/grep-interceptor.ts` already had the correct `..` form, so
this was a one-off oversight rather than a systemic resolver bug.

Fix: add the missing `..` so the shell resolver walks up to
`bundle/` before descending into `embeddings/`.

Regression guard in tests/claude-code/embeddings-bundle-scan.test.ts:
for each agent's shipped shell bundle, assert (a) the bundle exists,
(b) the daemon file actually exists at the parent-of-shell location,
and (c) the literal `".."` survived bundling immediately before
`"embeddings"` in the path-join sequence — without it the bundle reads
as `"embeddings", "embed-daemon.js"` (the broken shape).
Before: `verifyDaemonOnce()` set `this.helloVerified = true` *before*
awaiting the probe response. A transient probe failure (socket dies
mid-RTT, daemon hiccup) on the first connect therefore permanently
disabled verification for every subsequent embed call on the same
EmbedClient, exactly the safeguard the handshake exists to provide.

The original comment claimed this was intentional ("mark verified
even on mismatch so we don't re-issue the hello against the next,
fresh connection"). That reasoning only holds for the recycle path,
where we've actively chosen to spawn a fresh daemon from
`this.daemonEntry` — so verifying the newcomer is redundant. It does
NOT hold for a probe failure, which is by definition inconclusive.

Fix: only set `helloVerified = true` once we've observed a compatible
response. All other branches — probe catch, recycle, dedup-skip —
leave the flag false so the next reconnect re-runs verification (one
extra hello round-trip in the recycle path, which is a no-op against
a daemon we just spawned).

Test in tests/claude-code/embeddings-client.test.ts simulates a
genuine transient probe failure (server destroys the socket before
responding to the first hello) and asserts the second embed call
re-issues the probe — previously it would silently skip and proceed
on an unverified socket.
Before: `embed()` called `verifyDaemonOnce(sock)` and then proceeded to
send the embed request on the SAME socket, even when the verify step
had just SIGTERMed the daemon and removed its sock/pid files. The
embed silently dropped onto a dead connection — the very session that
caused the recycle saw a NULL `message_embedding` even though the
recycle freshly fixed the underlying state. That defeats the whole
"first call recovers" promise of the recycle path.

Fix:
- `verifyDaemonOnce()` now returns a boolean: `true` if it killed the
  daemon mid-call ("recycled"), `false` otherwise.
- `embed()` splits into a thin outer wrapper + `embedAttempt()`. When
  the attempt sees the recycled signal, the outer wrapper spawns a
  fresh daemon (autoSpawn) and retries once, polling for the new sock
  via `waitForDaemonReady()`. With autoSpawn off (tests, pi's canonical-
  shared-deps fallback) the outer wrapper just returns null — the caller
  treats it the same as any other transient miss.

Tests:
- "recycled probe + autoSpawn=false returns null cleanly (no hang on
  dead socket)" asserts the dead-socket embed_attempts === 0, i.e. the
  embed request is NOT sent post-recycle. Before the fix this counter
  would increment.
- Three pre-existing tests that relied on the dev-machine
  SHARED_DAEMON_PATH fallback to mask the bug now explicitly opt out
  of verification via `daemonEntry: ""` — they exercise transformers-
  missing / embed semantics, not the handshake, and were never
  meant to drive the recycle path.
Before: `enqueueNotification()` did read-modify-write on the shared
JSON queue at ~/.deeplake/notifications-queue.json with no locking, so
two hook processes producing notifications concurrently would both
read the same starting state, push their own entry, and the second
`rename(2)` would clobber the first writer's addition. That makes the
new persistent queue lossy under any concurrent hook/CLI activity —
exactly the workload it was added to handle.

Fix: wrap the read-modify-write in `withQueueLock()` — an O_EXCL
`.lock` file with bounded retry-with-backoff and a stale-lock reclaim
threshold (5s, well above any plausible enqueue duration). Lock is
unlinked on release; if a holder dies, the next caller reclaims after
5s. If the lock can't be acquired after 50 attempts (~6s total),
we proceed unlocked rather than blocking the hook hot path — degrades
to last-writer-wins under pathological contention, which is what the
old code did unconditionally.

Test: 12 parallel subprocesses each call `enqueueNotification` once,
all targeting the same $HOME-rooted queue. Assert the final queue has
exactly 12 entries and every producer index 0..11 appears once. Before
the fix this count would be 1-3 (sequential overwrites); with the lock
it's stably 12 across runs.
Before: `_signalledMissingDeps` in src/embeddings/client.ts guarded the
"first transformers-missing notification" path, but the flag was a
module-level variable — it only survived inside one hook process.
Every subsequent hook (a new node subprocess for each PostToolUse /
UserPromptSubmit) saw the flag reset to false, hit the transformers-
missing path again, and `enqueueNotification` happily appended another
copy onto the persistent queue. Between SessionStart drains the queue
could accumulate dozens of identical `embed-deps-missing` entries.

Fix: move the dedup gate into `enqueueNotification()` itself. Before
appending, check if a queued notification already has the same
`(id, dedupKey)` — if so, return without modifying the queue. This
runs under the same `withQueueLock()` introduced for #4, so the
read/compare/write is atomic across processes.

Why at the queue layer vs. at every caller:
- It's generic: every producer with a dedupKey now gets cross-process
  one-time semantics without each one having to plumb its own
  persistence.
- The drain layer already dedups by (id, dedupKey) against the
  *shown* state in state.ts; the queue guard is the corresponding
  invariant for the *queued* state.
- Producers that genuinely want N entries can vary the dedupKey
  (timestamp, counter), the same convention the drain layer uses.

Test in tests/claude-code/notifications.test.ts: 3 sequential
subprocesses each enqueue the same `embed-deps-missing` with identical
dedupKey; assert the final queue has exactly 1 entry. Sister parallel-
producer test (unique dedupKey per subprocess) still passes — 12
distinct keys → 12 entries.
Before: when `ensurePluginNodeModulesLink()` found the
`<pluginDir>/node_modules` symlink dangling (its target was GC'd by a
marketplace upgrade), it removed the link and returned
`stale-link-removed`. The current capture run then proceeded without
a working node_modules — only the NEXT hook invocation would actually
re-create the link. That defeats the whole self-heal contract: the
session that exposes the breakage should also fix it.

Fix: after `rmSync(link)` on the dangling-link branch, fall through
to `createSymlinkAtomic(target, link)` so the link is repaired in the
same call. The `stale-link-removed` return variant is preserved (with
the original `danglingTarget`) so callers that log self-heal events
still see "a stale link was repaired" — the semantics shift from
"removed, retry later" to "removed and re-created in-place".

Test updated in tests/claude-code/embeddings-self-heal.test.ts: the
existing assertion `expect(existsSync(link)).toBe(false)` after the
first call is now `toBe(true)` plus `readlinkSync(link) ===
sharedNodeModules`, and the redundant second-call assertion block is
removed. Single-pass recovery is the new contract.
Before: `linkAgent` blindly called `symlinkForce(SHARED_NODE_MODULES,
link)` for every detected hivemind install. `symlinkForce` does
`unlinkSync(link)` to clear the way before the new `symlinkSync`,
which fails with EISDIR if `link` is a real directory. The result:
`hivemind embeddings install` threw on the first agent that already
owned its own `node_modules` and aborted before linking the others.

The state already had a name in `linkStateFor()` — `owns-own-node-modules`
— and `status` already surfaced it. The install path just didn't act
on it. Fix is to call `linkStateFor()` first and skip symlink
replacement when the link path is a real directory; everything else
(no link / stale symlink / link to elsewhere) goes through
`symlinkForce` as before.

Two tests in tests/cli/cli-embeddings.test.ts:
- "skips linking when a real node_modules directory already exists at
  the link path" — sets up a real `node_modules/` with a marker file,
  asserts the call does NOT throw, the dir stays a real directory,
  and the marker file is untouched.
- "still replaces a stale symlink at the link path" — confirms the
  normal install path is unaffected: a pre-existing symlink at the
  link path is replaced (still a symlink after the call).
… PID

Before: `killEmbedDaemon()` read the PID from the pidfile and called
`process.kill(pid, "SIGTERM")` unconditionally. If the daemon had
crashed and left the pidfile behind, the OS may have already recycled
that PID to a totally unrelated user process — so `hivemind
embeddings disable` (or `uninstall`) could silently kill the user's
text editor / shell / browser tab. PID identity has no filesystem
anchor on its own.

Fix: gate the SIGTERM on `_isDaemonAliveOnSocket(sockPath)`, a short
synchronous probe that tries to `net.connect` to the UDS path with a
200 ms timeout. UDS paths are filesystem-rooted, so a successful
connect proves SOME process is bound to the daemon's socket — and the
only producer of that socket is the daemon whose pidfile sits next to
it. Anything else (no socket file, ECONNREFUSED, timeout) means the
daemon isn't actually live, the pidfile is stale, and we skip the
kill. The sock/pid file cleanup still runs unconditionally so
subsequent installs aren't blocked.

Test: writes the test runner's OWN process pid into the daemon
pidfile, removes the socket, then calls `killEmbedDaemon`. Without
the fix the runner would receive SIGTERM and the test process would
die mid-run. With the fix the socket-alive probe returns false, the
SIGTERM is skipped, and the test continues — explicitly asserting
both the probe return value AND that the file cleanup happened.

Implementation note: the probe uses `spawnSync("node", ["-e", …])`
rather than an inline `net.connect` so it doesn't pollute the calling
process's event loop with hanging socket handles when the connect
fails. The child times out cleanly and exits.
…fic msgs

Before: `isTransformersMissingError()` classified any string containing
bare `MODULE_NOT_FOUND` as a transformers-installation issue. That
overshoots — Node's `MODULE_NOT_FOUND` fires for ANY unresolvable
import (onnxruntime-node, sharp, missing peer dep, …). Each false
positive triggered the daemon recycle path AND surfaced "run
`hivemind embeddings install`" to the user, which can't fix
non-transformers packaging problems and just hides the real failure.

Fix: only classify as transformers-missing when the error string
contains either:
  - the literal `hivemind embeddings install` (the actionable wrapper
    we throw from `defaultImportTransformers`, which always names
    transformers in the same message), OR
  - a direct reference to `@huggingface/transformers` (Node's
    `Cannot find module '@huggingface/transformers'` /
    `ERR_MODULE_NOT_FOUND: Cannot find package '@huggingface/transformers'`).

Bare `MODULE_NOT_FOUND` without a package name no longer matches. The
existing positive tests pass against both Node import forms. New
negative tests explicitly assert that `MODULE_NOT_FOUND while loading
onnxruntime-node` / `Cannot find module 'sharp'` are NOT classified as
transformers issues — that surface area was previously broken.
CodeRabbit #16/#17: two test files mutate process env (and one creates
a tmp dir) inside `runHook(env)` calls without restoring them. Since
`runHook()` only updates the env keys passed in, a test that sets e.g.
`EMBEDDINGS_DISABLED_FOR_TEST=1` would leak the disabled state into
every later test in the same vitest worker, making the suite order-
dependent. The hermes test additionally leaves a tmp config dir on disk.

Fix in both files:
- Track the original value of each touched env key the first time
  `runHook()` mutates it (so re-mutation in the same test doesn't
  overwrite the snapshot with an intermediate value).
- In `afterEach`, restore every captured key (delete if it was unset
  before, otherwise reassign) and clear the snapshot map for the
  next test.
- For tests/hermes/hermes-capture-hook.test.ts: also track mkdtemp'd
  config dirs in `_tmpDirsToClean` and `rmSync` them in afterEach.

No behavior change in the tests themselves; this is pure isolation
hardening to remove the order-dependence CodeRabbit flagged.
CI was failing after the queue-lock and retry-after-recycle fixes
because the new branches in src/notifications/queue.ts and
src/embeddings/client.ts dropped both files below the per-file
thresholds in vitest.config.ts.

src/notifications/queue.ts:
- Added `_setLockTimingForTesting` / `_resetLockTimingForTesting` so
  tests can shrink LOCK_RETRY_MAX / LOCK_RETRY_BASE_MS / LOCK_STALE_MS
  to milliseconds. The production values (50 attempts × 5ms backoff,
  5s stale window) make exercising the give-up and reclaim branches
  prohibitively slow under real time.
- New tests in tests/claude-code/notifications-queue-lock.test.ts:
  - stale-lock reclaim: utimesSync ages a pre-existing lock past the
    (test-shrunk) stale window; assert enqueue succeeds and the lock
    is gone.
  - give-up after MAX retries: hold a fresh-mtime lock; assert
    enqueue still persists (degraded last-writer-wins) and the lock
    we held is still present (we didn't own it, didn't unlink it).
  - readQueue malformed JSON / wrong shape branches.
  - sameDedupKey: same (id, dedupKey) collapses; differing id keeps
    both; differing dedupKey keeps both.

Coverage on queue.ts:
  stmts 68.85% → 97.14% | branches 44.44% → 75% (target 70)
  funcs 80%    → 100%   | lines    70.37% → 98.33%

src/embeddings/client.ts:
- Added one test that drives the retry-after-recycle path with
  autoSpawn=true: a fake daemon recycles on first hello, then the
  outer wrapper calls trySpawnDaemon → waitForDaemonReady → retries
  embedAttempt. daemonEntry points at a non-existent file so the
  spawn no-ops and waitForDaemonReady exhausts its deadline; assert
  return is null and no embed was sent on the dead connection. This
  covers the previously-untested waitForDaemonReady poll loop and
  the outer retry composition branches.

Coverage on client.ts:
  branches 78.49% → 81.72% (target 80) — clear without bloat.

Also re-committed pi/bundle/wiki-worker.js: esbuild's local-variable
renaming pass (openSync2 → openSync3, etc.) is non-deterministic across
rebuilds and didn't get re-staged in an earlier commit.
Independent reviewer (subagent with no prior context) flagged 4 real
problems on top of CodeRabbit's set. All fall into the same family of
"my fix introduced or left in a different version of a bug I also
fixed elsewhere."

1. (Critical) `recycleDaemon` in src/embeddings/client.ts SIGTERMs
   `hello.pid` (or the pidfile value) without confirming the daemon is
   still bound to the socket. Same TOCTOU PID-reuse risk that the
   earlier killEmbedDaemon fix already gates against. Add an
   `existsSync(this.socketPath)` precondition: socket file gone →
   daemon gone → skip SIGTERM, only clean up file artifacts.

2. (High) `withQueueLock`'s contention backoff was a synchronous
   `while (Date.now() < end) {}` busy-wait. At production defaults
   (50 attempts × 5 ms × triangular growth) the worst case spins the
   event loop for ~6 s, freezing every other timer/IO callback in the
   hook — including the in-flight embed daemon response. Convert
   `withQueueLock` and `enqueueNotification` to async, swap the busy
   loop for `await sleep(delay)`. The sole src caller in
   `handleTransformersMissing` now fires-and-forgets the enqueue with
   a `.catch` so the capture hot path never blocks on a notification
   write. Subprocess test fixtures and same-process callers updated to
   `await enqueueNotification(...)`.

3. (High) `tests/cli/cli-embeddings.test.ts` killEmbedDaemon test
   wrote to the real per-uid `/tmp/hivemind-embed-<uid>.{sock,pid}`
   paths. On a dev box or CI runner that already has an embed daemon
   running for the same uid, the test would clobber its sock file.
   Added an optional `socketDir` parameter to `killEmbedDaemon()`
   (defaults to `/tmp` via protocol's default) and pointed the test
   at a per-test `mkdtemp` dir.

4. (High) `tests/claude-code/notifications-queue-lock.test.ts`'s
   "writeQueue outside-HOME guard" test hedged in a multi-paragraph
   comment and asserted the happy path. The `throw new Error("write
   blocked: …")` branch was uncovered. Extracted the path/home check
   into `_isQueuePathInsideHome(path, home)` and replaced the hedged
   test with 6 cases that exercise the predicate directly: child of
   home, equals home, trailing-slash home, sibling dir, prefix-match
   attack (`/home/userspace` vs `/home/user`), and a path that
   resolves outside a tmp home.

No regressions in the full suite: 2685/2685 (previously 2680, +5 from
the new guard tests, with the buggy hedged test removed). Build clean.
}
}
async dispatch(req) {
if (req.op === "hello") {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@coderabbitai what hello stands for, here, why is hardcoded?

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kaghni"hello" here is the wire-protocol operation name for a new IPC handshake message, not an arbitrary magic string. It plays the same role as "ping" and "embed" in the daemon's dispatch switch.

Here's what it does end-to-end:

sequenceDiagram
    participant C as EmbedClient
    participant D as EmbedDaemon

    C->>D: { op: "hello", id: "..." }
    D-->>C: { id, daemonPath, pid, protocolVersion }
    Note over C: Does daemonPath match my daemonEntry?
    alt mismatch (stale daemon from old bundle)
        C->>D: SIGTERM
        C->>C: clear sock/pid → spawn fresh daemon
    else match
        C->>C: proceed with embed requests
    end
Loading

Why it exists: After a marketplace plugin auto-upgrade, a new bundle directory is created but the old daemon process can still be running (from the previous bundle path). Without a handshake, the client would silently talk to a stale daemon that may not have the correct @huggingface/transformers path, producing NULL embeddings. The hello op lets the client detect this path mismatch on the very first connection and recycle the poisoned daemon.

Why the string is "hardcoded": This is a compiled/bundled file. In the TypeScript source (src/embeddings/protocol.ts), HelloRequest is declared as a discriminated union literal:

interface HelloRequest { op: "hello"; id: string }
type DaemonRequest = EmbedRequest | PingRequest | HelloRequest;

So "hello" is the typed discriminator value — the same string the client sends as op: "hello" and the daemon checks with req.op === "hello". It's as "hardcoded" as "ping" or "embed" are, which is intentional and correct for a simple line-delimited JSON IPC protocol.

Comment thread src/cli/embeddings.ts
// `owns-own-node-modules`. (Symlinks at this path, including stale ones
// pointing at a defunct shared-deps target, still go through
// `symlinkForce` so we replace them with the canonical target.)
const state = linkStateFor(install);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@coderabbitai, why is there even a problem of overriding node_modules?

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kaghni Great question! The problem comes down to how symlinkForce works internally: it calls unlinkSync to remove the existing path before creating the new symlink. unlinkSync handles files and symlinks just fine, but throws EISDIR when the target is a real directory.

So the danger is:

  1. An agent (especially one installed via a marketplace) may have a real node_modules/ directory at <pluginDir>/node_modules rather than a symlink.
  2. If symlinkForce is called on such a path, unlinkSync throws EISDIR and the entire hivemind embeddings install command crashes mid-loop.
  3. The result is a partially linked state — some agents got their symlinks, others didn't — which is worse than a clean skip.

The fix introduced here checks linkStateFor(install) first:

  • If kind === "owns-own-node-modules" → skip with a warning (it's a real dir; unsafe to unlink)
  • For everything else (no-node-modules, linked-elsewhere, linked-to-shared) → symlinkForce is safe because those are either absent or already a symlink, which unlinkSync can remove without error.

In short: it's not that overriding is philosophically wrong — it's that the underlying unlinkSync operation is incapable of removing a directory, so attempting it would crash the install loop halfway through.

Comment thread src/embeddings/daemon.ts
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@coderabbitai what daemon.ts is for and where is it used? briefily

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kaghni, here's a brief overview:


What daemon.ts is for

src/embeddings/daemon.ts implements EmbedDaemon — a long-lived background process that:

  • Loads the Nomic embedding model into RAM once and keeps it there.
  • Listens on a per-user Unix socket (/tmp/...) for IPC requests from the rest of the plugin.
  • Handles three operations over that socket:
    • ping — readiness/health check.
    • embed — vectorize a text string.
    • hello (added in this PR) — handshake that returns the daemon's file path, PID, and protocol version so the client can detect stale/mismatched daemons after upgrades.
  • Shuts itself down after an idle timeout (~200 MB RAM savings when unused).

Where it is used

Consumer Role
src/embeddings/client.ts (EmbedClient) Spawns the daemon on demand, connects to its socket, and sends hello/embed/ping requests
src/hooks/session-start-setup.ts Warms up the daemon at session start
src/hooks/{claude-code,codex,cursor,hermes}/capture.ts Embeds captured messages via EmbedClient
*/bundle/wiki-worker.js Embeds session summaries before uploading
*/bundle/shell/deeplake-shell.js Powers semantic search in the shell
src/cli/embeddings.ts killEmbedDaemon() SIGTERMs the daemon on disable/uninstall

The bundled copies (claude-code/bundle/embeddings/embed-daemon.js, codex/..., cursor/..., hermes/..., embeddings/embed-daemon.js) are the shipped artifacts — each platform's plugin bundle includes its own copy of this file so it can be spawned as a subprocess.

Comment thread src/embeddings/nomic.ts
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@coderabbitai same here, brief usage and description of nomic.ts

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kaghni, here's a brief overview of src/embeddings/nomic.ts:


src/embeddings/nomic.ts — Overview

Purpose: A lightweight daemon-only wrapper around @huggingface/transformers that drives Nomic-embed text embeddings. It is intentionally isolated so the heavyweight transformer dependency is never pulled into hook bundles.


Transformers Resolution (new in this PR)

Function Role
_importFromCanonicalSharedDeps(sharedDir?) Resolves @huggingface/transformers from the canonical shared-deps directory (~/.hivemind/embed-deps) via createRequire. Handles both ESM (.mjs) and CJS (.cjs) export shapes.
_importFromBareSpecifier() Fallback — bare import("@huggingface/transformers") for dev trees or collocated installs.
_normalizeTransformersModule(mod) Normalizes the two possible module shapes: CJS wraps exports under .default; ESM exposes them at the top level.
defaultImportTransformers(canonical?, bare?) Orchestrates the two-step resolution: tries canonical first, falls back to bare, throws an actionable error (hivemind embeddings install) if both fail.

The indirection through let _importTransformers allows tests to inject a mock without touching real disk paths.


NomicEmbedder Class

Manages the nomic-embed-text model pipeline lifecycle:

  • load() — lazily initializes the feature-extraction pipeline (idempotent, deduplicates concurrent calls via a shared Promise).
  • embed(text, kind) — embeds a single string with a query/document prefix (Nomic's asymmetric retrieval convention), then truncates + re-normalizes to dims dimensions (Matryoshka).
  • embedBatch(texts, kind) — batches multiple strings through the pipeline in one call, then splits and truncates each result.
  • truncate(vec) — slices to dims and re-normalizes the resulting sub-vector to unit length (Matryoshka re-norm; skips if norm is zero).

Test Seam

_setTransformersImporterForTesting / _resetTransformersImporterForTesting replace the module-level _importTransformers pointer so unit tests can bypass real disk resolution entirely without relying on vi.mock ordering.

Copy link
Copy Markdown
Collaborator

@kaghni kaghni May 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@coderabbitai how protocol works for skillify?

@kaghni
Copy link
Copy Markdown
Collaborator

kaghni commented May 18, 2026

@efenocchi, Pi and OpenClaw have no changes, but I'm not sure they should.

@efenocchi
Copy link
Copy Markdown
Collaborator Author

@kaghni Confirmed correct on both — verified the surface:

Pi: pi/extension-source/hivemind.ts ships as raw TS with a 30-LOC tryEmbedOverSocket that fails silent (no EmbeddingsClient, no enqueueNotification, no hello-handshake, no recycle) — none of the surfaces this PR touches exist client-side. The one pi bundle rebuilt (pi/bundle/wiki-worker.js) is the shared wiki-worker, unrelated. Pi does pick up the canonical daemon fix (embeddings/embed-daemon.js: +55/-4) on the next hivemind embeddings install.

OpenClaw: MCP-mode plugin with hivemind_* tool contracts only. It currently doesn't produce embeddings at all — every write explicitly NULLs the vector columns:

  • upsertRowSql: summary_embedding = NULL on UPDATE, NULL literal on INSERT
  • session capture: message_embedding omitted from the column list → defaults NULL

It only consumes embeddings (semantic search takes queryEmbedding as an input parameter and filters WHERE ARRAY_LENGTH(summary_embedding, 1) > 0, i.e. rows written by other agents). So there's no surface for this PR to touch.

Follow-up PR — tracked separately, not in this one. We're going to:

  1. Fix a real bug in pi: the comment at pi/extension-source/hivemind.ts:166-175 promises auto-spawn-on-miss, but the implementation only calls connect() and settles `null` on error. So pi silently degrades to NULL embeddings any time it's the first agent to talk to the daemon on a given box. That's a regression vector identical to what this PR is closing — just on a different code path.
  2. Make OpenClaw an embedding producer: ~60 LOC of `tryEmbedOverSocket` + spawn-on-miss, wired into the two write sites. Talks to the same `/tmp/hivemind-embed-.sock` daemon the other agents already spawn.

Why a separate PR rather than tacking it onto this one:

  • Scope: this PR is "fix silent NULL embeddings after marketplace upgrades" — restoring producers that should have worked. Adding a new producer (openclaw) is a feature, not a fix; bundling them obscures the bisect/revert story if either side regresses.
  • Opt-in invariant: this PR established (`src/user-config.ts`) that embeddings are explicit opt-in — env unset → `enabled: false`, no 600MB download by surprise. OpenClaw becoming a producer needs to respect that: it can only embed if `~/.hivemind/embed-deps/embed-daemon.js` is already present (deposited by `hivemind embeddings install` via some other agent). That decision and its NULL-fallback behavior deserve their own review surface, not a buried sub-commit.
  • Edge case coverage: spawn-on-miss has 11 distinct states I want covered by tests (binary-missing, stale-socket, pidfile-points-to-dead-PID, two-process spawn race, pidfile-alive-but-socket-gone, etc.). Those tests double the test footprint of this PR — cleaner to land them alongside the producer code they exercise.

I'll open the follow-up against `main` once #168 is merged.

…sformers-resolution

# Conflicts:
#	bundle/cli.js
#	claude-code/bundle/capture.js
#	claude-code/bundle/embeddings/embed-daemon.js
#	codex/bundle/embeddings/embed-daemon.js
#	codex/bundle/stop.js
#	cursor/bundle/capture.js
#	cursor/bundle/embeddings/embed-daemon.js
#	embeddings/embed-daemon.js
#	hermes/bundle/capture.js
#	hermes/bundle/embeddings/embed-daemon.js
Copy link
Copy Markdown
Collaborator

@kaghni kaghni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@

…sformers-resolution

# Conflicts:
#	bundle/cli.js
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants