fix(embeddings): silent NULL embeddings after marketplace upgrades#168
Conversation
The embed daemon was failing silently after every marketplace plugin
upgrade because Node's bundle-relative module resolution could not find
`@huggingface/transformers`. The package is installed once at
`~/.hivemind/embed-deps/` by `hivemind embeddings install`, but new
versioned plugin cache dirs land without a `node_modules` symlink, so
the daemon's bare `import("@huggingface/transformers")` walks up to a
location that does not have the package. Each embed request then
returned `Cannot find package`, the client coerced it to a null
embedding, and `sessions.message_embedding` columns were written as
NULL with no surface error.
Rework `NomicEmbedder.load()` to resolve the package explicitly via
`createRequire(pathToFileURL("~/.hivemind/embed-deps/")).resolve(...)`
followed by `import(pathToFileURL(absMain))`. This bypasses Node's
upward walk entirely, so the daemon resolves transformers correctly
regardless of which bundle path it was spawned from. The bare-specifier
import remains as a fallback for dev-tree usage where the package is
colocated. If both fail, the thrown error message mentions
`hivemind embeddings install` so the failure is actionable in logs.
Tests use DI to inject the importer so they run identically on
machines that already have `~/.hivemind/embed-deps/` populated (which
would otherwise shadow `vi.mock("@huggingface/transformers")`).
Replace the implicit "embeddings on by default" + `HIVEMIND_EMBEDDINGS`
env-var override with an explicit, persistent opt-in stored on disk.
The new contract:
- `~/.deeplake/config.json` → `embeddings.enabled: boolean` is the
sole source of truth, shared across all agents (claude-code, codex,
cursor, hermes, pi) because they all read the same `~/.deeplake/`.
- Embeddings run only when `enabled === true`.
- The legacy `HIVEMIND_EMBEDDINGS` env var is read EXACTLY ONCE — on
the first run that has no `embeddings.enabled` key — to seed the
persistent value. Migration rule: env=`false` or unset writes
`enabled: false`; any truthy value writes `enabled: true`. After
the seed is written, the env var is never consulted again.
New module `src/user-config.ts` provides `readUserConfig`,
`writeUserConfig` (atomic write + deep merge), `getEmbeddingsEnabled`
(with one-shot migration), and `setEmbeddingsEnabled`. Path is
overridable via `HIVEMIND_CONFIG_PATH` for tests.
`src/embeddings/disable.ts` no longer reads the env var directly.
`EmbeddingsStatus` renames the env-disabled variant to `user-disabled`,
which now reflects both legacy env-disabled and the new config-disabled
cases (both fold into the same user-driven opt-out). The transformers
probe is reordered to match the daemon's import resolution order
(canonical shared-deps first, bundle walk fallback), eliminating the
prior probe/use asymmetry where the probe could succeed and the daemon
still throw MODULE_NOT_FOUND.
A vitest setupFile (`tests/test-setup.ts`) pins `HIVEMIND_CONFIG_PATH`
to a per-process tmp dir so tests never mutate the developer's real
`~/.deeplake/config.json`, and defaults the test environment to
`HIVEMIND_EMBEDDINGS=true` so suites that don't explicitly exercise
the disabled path keep running with embeddings on.
Tests that previously set `HIVEMIND_EMBEDDINGS=false` to exercise the
disabled path now write a throwaway config file with
`embeddings.enabled: false` and point `HIVEMIND_CONFIG_PATH` at it.
Previously `hivemind embeddings install` ≡ `enable` (both did the heavy
deps + symlink work) and `disable` ≡ `uninstall` (both removed symlinks).
With the new persistent-config contract that mapping is wrong:
opting in / out of embeddings should be a lightweight config flip, while
managing the on-disk install is a separate, heavier operation.
New surface (all reflected in `--help` and every agent's SessionStart
injection):
hivemind embeddings install (heavy) npm-installs @huggingface/
transformers into ~/.hivemind/embed-deps,
symlinks every detected agent plugin to
it, and sets embeddings.enabled=true.
hivemind embeddings enable (light) sets embeddings.enabled=true.
Warns if shared deps missing.
hivemind embeddings disable (light) sets embeddings.enabled=false
and SIGTERMs the running daemon +
clears its sock/pid files so the
change takes effect immediately,
instead of waiting 10 min for idle-out.
Shared deps stay on disk.
hivemind embeddings uninstall (heavy) removes every agent's symlink
[--prune] into the shared deps, optionally
prunes the shared dir itself, sets
enabled=false, and SIGTERMs the daemon.
hivemind embeddings status Extended to show the config flag
state alongside the deps + per-agent
state, and a one-line action hint
when the two disagree.
The CLI dispatcher matches these subcommands exactly (no more
install/enable aliasing), and `--with-embeddings` runs the heavy install
path. Per-agent SessionStart blocks for claude-code, codex, cursor, and
hermes now advertise all five subcommands so the model can suggest the
right one without guessing (per agents-deployment-session-start-injection
skill).
A best-effort `killEmbedDaemon()` helper reads the standard
`/tmp/hivemind-embed-${uid}.pid`, SIGTERMs the process, and unlinks
both the pidfile and socket. Tolerant of every missing-file combination.
…gnal
The embed daemon socket is per-UID, not per-plugin-version. After a
marketplace plugin upgrade replaces the bundle, the older daemon
process keeps its socket alive (up to 10 minutes of idle-out), so every
new session on every newer plugin version connects to the same stuck
daemon. When that stuck daemon can't resolve transformers from its own
(now-orphaned) bundle path, it returns MODULE_NOT_FOUND on every embed
call, and the rest of the session writes NULL into the embedding column
with no surface error. We've now seen this in production: the local
log shows ~30 minutes of `embed err: Cannot find package` lines on a
freshly-upgraded plugin.
Three additions:
1. **Hello handshake** (`protocol.ts` + `daemon.ts` + `client.ts`).
First connect per `EmbedClient` instance sends `{ op: "hello" }`;
the daemon answers with its own `daemonPath` (= `process.argv[1]`)
and `pid`. If the running daemon's path doesn't match the client's
configured `daemonEntry`, the client SIGTERMs the daemon and clears
its socket + pidfile so the next call spawns a fresh daemon from
the current bundle. Verified at most once per EmbedClient.
2. **Stuck-daemon recycle on transformers error** (`client.ts`).
Embed responses matching `isTransformersMissingError` (the wrapper
we throw from `defaultImportTransformers`, plus Node's standard
MODULE_NOT_FOUND form) trigger the same recycle. Process-local
guard so only the first failing call kills + cleans up.
3. **Visible one-time notification** (`client.ts`).
On the same transformers-missing trigger, enqueue a `warn`-severity
notification (`id: "embed-deps-missing"`, dedupKey carries the
error detail) so the next SessionStart drain tells the user to run
`hivemind embeddings install`. Suppressed when
`embeddingsStatus() === "user-disabled"` — users who explicitly
opted out via config don't get nagged. Process-local dedup so a
single capture session enqueues at most one notification even if
the embed call fires many times.
Net effect: a poisoned daemon survives at most one failed embed; the
first session after a plugin upgrade recycles it; the user gets a
clear, actionable notification instead of silent NULLs in the
sessions table.
`hivemind embeddings install` symlinks `<pluginDir>/node_modules` to `~/.hivemind/embed-deps/node_modules` so Node's standard module resolution finds @huggingface/transformers from anywhere inside `<pluginDir>/bundle/…`. That works for the plugin version present at install time — but Claude Code's marketplace auto-upgrades drop new versioned cache dirs (`cache/hivemind/hivemind/0.7.27/`, `0.7.28/`, …) WITHOUT the symlink. The user would have to manually re-run `hivemind embeddings install` after every upgrade — and most won't, so embeddings silently degrade. New helper `src/embeddings/self-heal.ts` runs from each agent's capture hook on every invocation. The first capture under a new plugin version creates the symlink atomically (symlink to a `.tmp` suffix, then rename); subsequent calls are O(1) no-ops once `already-linked` is observed. Conservative behavior — the helper NEVER: - Clobbers an existing real `node_modules` directory. - Overrides a symlink that points elsewhere to a valid target (user installed their own dependency tree). - Acts when the shared-deps `node_modules` doesn't exist (returns `shared-deps-missing`; notification path covers user-facing surface). - Acts when `bundleDir` basename isn't `bundle` (guards against the source-tree path being passed during tests — without this gate, a test importing `src/hooks/capture.ts` would symlink `src/node_modules` to the user's real shared deps). What it DOES heal: - Missing link → create. - Dangling symlink (target deleted out from under it) → remove + next call re-creates. Wired into all four capture hooks (claude-code, codex, cursor, hermes) at top-level after the bundleDir is computed. Gated on `!embeddingsDisabled()` so user-disabled installs don't accumulate symlinks they don't want.
Per the project testing philosophy: source tests prove the helpers
are correct, bundle tests prove the build didn't drop them, re-inline
an old pattern, or otherwise regress on the shipped artifact. A
30-second reviewer guardrail.
For each of claude-code, codex, cursor, hermes:
- `bundle/embeddings/embed-daemon.js` contains the canonical
shared-deps path fragments (".hivemind" + "embed-deps"),
`createRequire` (proving the explicit-path resolver survived
bundling), and the actionable error string "hivemind embeddings
install" (proving the error message users will see in logs is in
the shipped artifact).
- `bundle/capture.js` invokes `ensurePluginNodeModulesLink` (the
self-heal helper), carries the `embed-deps-missing` notification
dedupKey, and still names the `user-disabled` status (proving the
opt-out guard survives bundling).
For the CLI:
- `bundle/cli.js` recognises all five embeddings subcommands
(install/enable/disable/uninstall/status) and references
`~/.deeplake/config.json` so the SessionStart injection text and
the dispatcher agree.
Final step before staging per bundle-rebuild-before-staging skill.
All five source commits in this branch (C1–C5) modify shared
`src/embeddings/` and per-agent capture hooks, so every agent's
`bundle/` artifacts need refreshing:
claude-code/bundle/{capture,embed-daemon,session-start,…}.js
codex/bundle/{capture,embed-daemon,stop,…}.js
cursor/bundle/{capture,embed-daemon,…}.js
hermes/bundle/{capture,embed-daemon,…}.js
pi/bundle/wiki-worker.js
bundle/cli.js
embeddings/embed-daemon.js (standalone daemon)
No source changes — `npm run build` output only. The companion
bundle-scan guards in tests/claude-code/embeddings-bundle-scan.test.ts
pass against these artifacts.
|
Important Review skippedAuto incremental reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Plus Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
📝 WalkthroughWalkthroughAdds a hello handshake to embeddings protocol, client verification and recycle, user-config–backed enablement with env migration, transformers shared-deps resolution, CLI install/enable/disable/uninstall/status, plugin node_modules self-heal, and broad bundle/help/test updates. ChangesEmbeddings lifecycle and CLI split
Sequence Diagram(s)sequenceDiagram
participant CLI
participant UserConfig
participant Daemon
participant Client
CLI->>UserConfig: set embeddings.enabled = true/false
Client->>Daemon: hello {id}
Daemon-->>Client: {daemonPath, pid, protocolVersion}
Client->>Daemon: embed {text}
Daemon-->>Client: {embedding|error}
Client-->>UserConfig: enqueue notification (deps missing) [when enabled]
Estimated code review effort🎯 5 (Critical) | ⏱️ ~120+ minutes Possibly related PRs
Suggested reviewers
Poem
✨ Finishing Touches🧪 Generate unit tests (beta)
|
Coverage ReportScope: files changed in this PR. Enforced threshold: 90% per metric (per file via
File Coverage — 19 files changed
Generated for commit ea0af71. |
There was a problem hiding this comment.
Actionable comments posted: 17
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tests/cursor/cursor-capture-hook.test.ts (1)
272-289:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winMissing cleanup: temp directory leaks on every test run.
The test creates a temp directory and config file but never removes them. This will accumulate garbage in the system temp dir over repeated test runs.
🧹 Proposed fix to clean up temp resources
it("user-disabled embeddings short-circuit to NULL without invoking EmbedClient", async () => { stdinMock.mockResolvedValue({ conversation_id: "sid-emb-3", hook_event_name: "beforeSubmitPrompt", prompt: "disabled", }); // Point user-config at a throwaway path that says enabled:false. const { writeFileSync, mkdtempSync } = await import("node:fs"); + const { rmSync } = await import("node:fs"); const { tmpdir } = await import("node:os"); const { join } = await import("node:path"); const dir = mkdtempSync(join(tmpdir(), "cursor-cap-disabled-")); const cfgPath = join(dir, "config.json"); writeFileSync(cfgPath, JSON.stringify({ embeddings: { enabled: false } }), "utf-8"); - await runHook({ HIVEMIND_CONFIG_PATH: cfgPath }); + try { + await runHook({ HIVEMIND_CONFIG_PATH: cfgPath }); + } finally { + rmSync(dir, { recursive: true, force: true }); + } const sql = queryMock.mock.calls[0][0] as string; expect(sql).toContain("'::jsonb, NULL,"); expect(sql).toMatch(/, message_embedding,/); });🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/cursor/cursor-capture-hook.test.ts` around lines 272 - 289, The test creates a temp directory (mkdtempSync -> dir) and config file (cfgPath) but never removes them; update the test around runHook({ HIVEMIND_CONFIG_PATH: cfgPath }) in cursor-capture-hook.test.ts to ensure the temp directory is cleaned (use fs.rmSync or fs.promises.rm with recursive/force) in a finally block or after/teardown so the temp dir and config file created by writeFileSync are removed regardless of test outcome; reference mkdtempSync, writeFileSync, cfgPath, dir and runHook when implementing the cleanup.
🧹 Nitpick comments (1)
hermes/bundle/shell/deeplake-shell.js (1)
67811-67823: 💤 Low value
deepMergeperforms only one level of nested merging, not recursive.The function merges only one level deep: nested objects are spread (
{...baseVal, ...patchVal}) rather than recursively merged. For the currentembeddings.enableduse case this is fine, but deeper structures like{a: {b: {c: 1}}}patched with{a: {b: {d: 2}}}would losec.Consider renaming to
shallowMergeNestedor adding a comment clarifying the single-level constraint if deeper configs are anticipated.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@hermes/bundle/shell/deeplake-shell.js` around lines 67811 - 67823, The function deepMerge only merges nested objects one level deep (it spreads baseVal and patchVal rather than recursively merging), so rename deepMerge to shallowMergeNested (or equivalent) and add a concise JSDoc/comment on the function explaining it intentionally performs only a single-level nested merge; update any callers referencing deepMerge to use the new name (search for deepMerge in the bundle) so callers and future maintainers understand the single-level constraint.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@bundle/cli.js`:
- Around line 3825-3837: The killEmbedDaemon function currently reads a PID from
pidPathFor and sends SIGTERM blindly; instead verify the PID refers to the embed
daemon before signaling: after reading pid (readFileSync8) and before calling
process.kill, check the live process identity (e.g., inspect /proc/<pid>/cmdline
or use a small handshake via socketPathFor to confirm it responds as the embed
daemon) and only signal when that validation succeeds; if validation fails treat
the pidfile/socket as stale and remove pidPath/socketPath rather than killing an
unrelated process.
- Around line 3774-3777: The linkAgent function currently always calls
symlinkForce(join11(install.pluginDir, "node_modules")) which unlinks any
preexisting path and causes EISDIR if an agent already has a real node_modules
directory; before calling symlinkForce, check the fs.lstat/lstatSync on the
target link (the join11(..., "node_modules") path): if it exists and is a real
directory (stats.isDirectory() && !stats.isSymbolicLink()) rename or move it
(e.g., append ".own" or similar) and log that we preserved the agent's own
node_modules, otherwise if it’s a symlink or file remove it and then call
symlinkForce(SHARED_NODE_MODULES, link); update linkAgent to perform this guard
using the same link variable so symlinkForce and SHARED_NODE_MODULES usage
remain unchanged.
In `@claude-code/bundle/capture.js`:
- Around line 1766-1785: The current branch that handles a dangling symlink
(when linkStat.isSymbolicLink() is true) removes the stale symlink with
rmSync(link) and returns without recreating it, leaving self-heal incomplete;
change the catch block inside the dangling-link path so that after successfully
rmSync(link) you immediately recreate the symlink pointing at target using
symlinkSync(target, link[, type]) (choose 'junction' for Windows if needed), and
then return a success kind (e.g., "recreated-link" with link and target) instead
of "stale-link-removed"; update any callers/consumers expecting
"stale-link-removed" to handle the new return kind if necessary.
In `@claude-code/bundle/pre-tool-use.js`:
- Around line 1316-1343: In verifyDaemonOnce, don't set this.helloVerified at
the top; instead only set it after confirming a compatible hello response — move
the assignment to after you've confirmed hello.daemonPath exists and equals
this.daemonEntry (i.e., after the equality check where currently you return on
match), so that failed probes, missing daemonPath, or path mismatches (and
subsequent recycleDaemon calls) do not permanently mark the client as verified;
keep existing behavior for early returns when !this.daemonEntry or on exceptions
by not setting helloVerified in those code paths.
In `@claude-code/bundle/shell/deeplake-shell.js`:
- Around line 68288-68292: resolveEmbedDaemonPath currently builds the daemon
path as join(dirname(fileURLToPath(import.meta.url)), "embeddings",
"embed-daemon.js") which points to bundle/shell/embeddings/embed-daemon.js but
the actual daemon lives in the sibling bundle/embeddings directory; update
resolveEmbedDaemonPath (same function called when spawning EmbedClient) to
include the parent-traversal segment (add ".." into the join parts, mirroring
the pattern used by resolveGrepEmbedDaemonPath) so the join becomes dirname ->
".." -> "embeddings" -> "embed-daemon.js".
In `@codex/bundle/shell/deeplake-shell.js`:
- Around line 68290-68292: resolveEmbedDaemonPath currently constructs the path
"embeddings/embed-daemon.js" relative to the shell bundle directory, which
resolves to codex/bundle/shell/embeddings/embed-daemon.js but the actual bundled
file is at codex/bundle/embeddings/embed-daemon.js; update the path construction
in resolveEmbedDaemonPath so it points to the parent
"embeddings/embed-daemon.js" (i.e., join the dirname of import.meta.url with
".." and "embeddings" then "embed-daemon.js") so daemonEntry references the
correct bundled file.
In `@codex/bundle/stop.js`:
- Around line 1184-1187: enqueueNotification currently always appends the
notification which allows duplicates across processes; change it to honor
n.dedupKey by reading the existing queue via readQueue(), checking if any queued
item has the same dedupKey (and skip push if found), and only push/writeQueue
when dedupKey is absent or unique; reference the enqueueNotification function
and use readQueue/writeQueue and the notification object's dedupKey field to
implement the guard so duplicate "embed-deps-missing" entries are not persisted.
In `@cursor/bundle/shell/deeplake-shell.js`:
- Around line 67910-67918: The embed() flow can send the request on a stale
socket because verifyDaemonOnce() may recycle the daemon; fix by making
verifyDaemonOnce() indicate when it recycled (e.g., return a boolean or the
fresh socket) and then, inside embed(), re-acquire the current socket before
calling sendAndWait() so the request is sent to the new daemon; update embed()
(and the similar block around the 67942-67968 range) to check the return from
verifyDaemonOnce(), refresh/get the socket (the same socket variable passed to
sendAndWait) when a recycle occurred, and only then construct/send the req via
this.sendAndWait().
- Around line 68182-68184: The current isTransformersMissingError function is
too broad because it matches any "MODULE_NOT_FOUND"; change it to only return
true when the missing-module error specifically references transformers-related
packages. Update isTransformersMissingError to check err.code ===
'MODULE_NOT_FOUND' (or the error string) AND test err.message/err for package
names like `@huggingface/transformers`, transformers, hivemind, or accelerate
(e.g. /(`@huggingface`\/transformers|transformers|hivemind|accelerate)/i) so
unrelated missing-module errors are not misclassified.
- Around line 68290-68292: The resolveEmbedDaemonPath function currently builds
a path to "embeddings/embed-daemon.js" relative to the shell bundle directory,
but the emitted daemon actually lives in the parent bundle's embeddings folder;
update resolveEmbedDaemonPath (and the join11/dirname5/fileURLToPath usage) to
ascend one directory before "embeddings" (e.g.
join11(dirname5(fileURLToPath(import.meta.url)), "..", "embeddings",
"embed-daemon.js")) so the resolved path points to
bundle/embeddings/embed-daemon.js instead of shell/embeddings/embed-daemon.js.
In `@hermes/bundle/capture.js`:
- Around line 614-618: enqueueNotification currently always appends
notifications to the persisted queue so cross-process runs re-enqueue “one-time”
warnings; change enqueueNotification to read the queued items (via readQueue),
check for an existing notification with the same unique id/type (e.g.
"embed-deps-missing") or a oneTime flag before pushing, and only push if not
already present, then writeQueue; use the same identifying key the runtime uses
for suppression (the embed-deps-missing id or a new oneTime property) so
_signalledMissingDeps in-memory suppression remains complementary to the
persisted dedupe.
- Around line 1099-1118: The stale-symlink branch currently removes the dangling
link and returns { kind: "stale-link-removed" }, which prevents the caller from
getting a usable node_modules link; instead, after successfully rmSync(link) in
the catch block you should not return immediately but fall through to the same
symlink-creation path used when the link does not exist. Concretely: in the
block under if (linkStat.isSymbolicLink()) where you catch a failed statSync,
remove the early return ({ kind: "stale-link-removed" }); after rmSync(link)
continue execution so the subsequent code that creates the symlink (the same
logic that runs when no link exists) runs and returns the successful "linked"
(or existing success) result; keep use of readlinkSync, statSync, rmSync and the
existing result kinds consistent.
In `@hermes/bundle/wiki-worker.js`:
- Around line 650-651: The current isTransformersMissingError detector is too
broad because it matches any "MODULE_NOT_FOUND"; update
isTransformersMissingError to only treat errors as transformer-missing when the
message explicitly references transformer-related identifiers (for example
'@huggingface/transformers', the word 'transformers' near the MODULE_NOT_FOUND
text, or the specific install hint 'hivemind embeddings install'), rather than
matching bare MODULE_NOT_FOUND; locate the isTransformersMissingError function
and tighten the regex/logic to require transformer-specific context (or prefer
checking for a transformer-specific sentinel error code returned by the daemon
if available) so unrelated packaging/runtime failures are not misclassified.
In `@src/embeddings/client.ts`:
- Around line 128-214: Add focused unit tests for EmbedClient to hit the
conditional branches in verifyDaemonOnce, handleTransformersMissing, and
recycleDaemon: mock sendAndWait to return a HelloResponse with no daemonPath to
assert no recycle; mock sendAndWait to return a mismatched daemonPath and assert
recycleDaemon called once and _recycledStuckDaemon becomes true (and calling
verifyDaemonOnce again does not re-trigger recycle); for
handleTransformersMissing stub embeddingsStatus to return "user-disabled" and
verify enqueueNotification is not called, and stub it to return enabled and
verify enqueueNotification is called once and _signalledMissingDeps is set; for
recycleDaemon mock readFileSync to supply a pid, spy process.kill and unlinkSync
to verify pid kill + socket/pid unlink path and also test the null reportedPid
branch where pidfile is missing. Reset module-level flags (_recycledStuckDaemon,
_signalledMissingDeps) between tests and use spies/mocks for sendAndWait,
recycleDaemon, enqueueNotification, embeddingsStatus, readFileSync, unlinkSync,
and process.kill to assert expected side effects.
In `@tests/claude-code/session-start-setup-hook.test.ts`:
- Around line 230-235: This test sets process.env.EMBEDDINGS_DISABLED_FOR_TEST
but never clears it, causing later tests to inherit the disabled state; after
calling runHook(...) in the "skips warmup..." test, restore or delete
process.env.EMBEDDINGS_DISABLED_FOR_TEST (e.g., delete
process.env.EMBEDDINGS_DISABLED_FOR_TEST or set it back to its original value)
so subsequent calls to runHook() behave deterministically; update the test
around runHook, embedWarmupMock, and debugLogMock assertions to ensure the env
key is cleaned up before the test ends.
In `@tests/hermes/hermes-capture-hook.test.ts`:
- Around line 275-281: Test mutates HIVEMIND_CONFIG_PATH and leaves a temp dir;
wrap the runHook call in a try/finally that saves the original
process.env.HIVEMIND_CONFIG_PATH, sets it to cfgPath, and in finally restores
the original env value (or deletes the key if it was undefined) and removes the
temp directory (use fs.rmSync(dir, { recursive: true, force: true }) or
fs.rmdirSync for compatibility). Locate the code around runHook, cfgPath,
mkdtempSync, writeFileSync and update the test to import/removal functions and
perform cleanup to avoid leaking state between tests.
---
Outside diff comments:
In `@tests/cursor/cursor-capture-hook.test.ts`:
- Around line 272-289: The test creates a temp directory (mkdtempSync -> dir)
and config file (cfgPath) but never removes them; update the test around
runHook({ HIVEMIND_CONFIG_PATH: cfgPath }) in cursor-capture-hook.test.ts to
ensure the temp directory is cleaned (use fs.rmSync or fs.promises.rm with
recursive/force) in a finally block or after/teardown so the temp dir and config
file created by writeFileSync are removed regardless of test outcome; reference
mkdtempSync, writeFileSync, cfgPath, dir and runHook when implementing the
cleanup.
---
Nitpick comments:
In `@hermes/bundle/shell/deeplake-shell.js`:
- Around line 67811-67823: The function deepMerge only merges nested objects one
level deep (it spreads baseVal and patchVal rather than recursively merging), so
rename deepMerge to shallowMergeNested (or equivalent) and add a concise
JSDoc/comment on the function explaining it intentionally performs only a
single-level nested merge; update any callers referencing deepMerge to use the
new name (search for deepMerge in the bundle) so callers and future maintainers
understand the single-level constraint.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: c577da79-afa6-4d82-9398-f8195e33ee25
📒 Files selected for processing (61)
bundle/cli.jsclaude-code/bundle/capture.jsclaude-code/bundle/embeddings/embed-daemon.jsclaude-code/bundle/pre-tool-use.jsclaude-code/bundle/session-start-setup.jsclaude-code/bundle/session-start.jsclaude-code/bundle/shell/deeplake-shell.jsclaude-code/bundle/wiki-worker.jscodex/bundle/capture.jscodex/bundle/embeddings/embed-daemon.jscodex/bundle/pre-tool-use.jscodex/bundle/session-start.jscodex/bundle/shell/deeplake-shell.jscodex/bundle/stop.jscodex/bundle/wiki-worker.jscursor/bundle/capture.jscursor/bundle/embeddings/embed-daemon.jscursor/bundle/pre-tool-use.jscursor/bundle/session-start.jscursor/bundle/shell/deeplake-shell.jscursor/bundle/wiki-worker.jsembeddings/embed-daemon.jshermes/bundle/capture.jshermes/bundle/embeddings/embed-daemon.jshermes/bundle/pre-tool-use.jshermes/bundle/session-start.jshermes/bundle/shell/deeplake-shell.jshermes/bundle/wiki-worker.jspi/bundle/wiki-worker.jssrc/cli/embeddings.tssrc/cli/index.tssrc/embeddings/client.tssrc/embeddings/daemon.tssrc/embeddings/disable.tssrc/embeddings/nomic.tssrc/embeddings/protocol.tssrc/embeddings/self-heal.tssrc/hooks/capture.tssrc/hooks/codex/capture.tssrc/hooks/codex/session-start.tssrc/hooks/cursor/capture.tssrc/hooks/cursor/session-start.tssrc/hooks/hermes/capture.tssrc/hooks/hermes/session-start.tssrc/hooks/session-start-setup.tssrc/hooks/session-start.tssrc/user-config.tstests/claude-code/embeddings-bundle-scan.test.tstests/claude-code/embeddings-client.test.tstests/claude-code/embeddings-disable.test.tstests/claude-code/embeddings-nomic.test.tstests/claude-code/embeddings-self-heal.test.tstests/claude-code/session-start-setup-hook.test.tstests/claude-code/user-config.test.tstests/claude-code/wiki-worker-plugin-version.test.tstests/cli/cli-embeddings.test.tstests/cli/cli-index.test.tstests/cursor/cursor-capture-hook.test.tstests/hermes/hermes-capture-hook.test.tstests/test-setup.tsvitest.config.ts
Two issues surfaced during real e2e verification against the
test_plugin sandbox table.
1) **CJS default-export unwrap in nomic.ts.**
`createRequire(base).resolve("@huggingface/transformers")` honors the
package's `"require"` conditional and returns the path to the CJS
bundle (`./dist/transformers.node.cjs`). A subsequent dynamic
`import(pathToFileURL(absMain).href)` wraps the CJS module as
`{ default: <exports>, __esModule: true }`, so the daemon's
`mod.env.allowLocalModels = false` line threw
`Cannot set properties of undefined`. Added a
`normalizeTransformersModule` helper that returns `mod.default`
when it carries `pipeline`, else returns the bare module — works
for both the CJS-resolved-by-require path and the ESM-resolved-by-
import path (dev tree).
2) **Recycle on `unknown op` from hello.**
A pre-handshake daemon (i.e. anything before this PR lands) answers
`{ op: "hello" }` with `{ id, error: "unknown op" }` and no
`daemonPath`. The previous check skipped the mismatch path in that
case ("no daemonPath; skipping mismatch check") — meaning a stuck
older daemon would keep poisoning sessions forever. Now treat a
missing `daemonPath` the same as a path mismatch: the running
daemon doesn't speak the current protocol, so it can't be trusted
and gets recycled.
End-to-end verification on the test_plugin org's `sessions_test`
table: with both fixes in place, a real capture hook run produced a
row with `len=768` for `message_embedding` — the first non-NULL
embedding written to that table since the regression was filed.
…gent thrash)
The hello-handshake recycle was too aggressive. The original logic
fired SIGTERM whenever the running daemon's `daemonPath` differed
from the client's expected `daemonEntry`. On a single-agent machine
that catches the legitimate "marketplace upgrade replaced the
bundle, old daemon still running with old code" case. But on
multi-agent machines (a Hivemind user running claude-code + codex,
or anyone using pi) it causes endless thrash:
- claude-code spawns daemon at <cc-bundle>/embed-daemon.js
- codex fires capture, connects, hello returns cc's path,
codex expects codex's path → MISMATCH → recycle the working
daemon
- codex spawns its own daemon at <cod-bundle>/embed-daemon.js
- claude-code's next capture → mismatch → recycle codex's
- ...forever
Differentiate "stale (GC'd) bundle" from "different but
functionally-equivalent bundle" via filesystem check:
- `!hello.daemonPath` → older daemon, no handshake support → recycle
- `daemonPath !== entry`
AND `!existsSync(daemonPath)` → orphaned bundle (GC'd) → recycle
AND `existsSync(daemonPath)` → multi-agent share → KEEP DAEMON
The marketplace-upgrade case still works because Claude Code's
plugin-cache-gc eventually prunes old versioned dirs. Until then,
the embed-error trigger (still in place) catches a stuck daemon
whose code is buggy regardless of path.
Verified live: spawned claude-code's daemon, then connected as
codex from a separate client. Codex got its embedding via
claude-code's daemon (vec length 768) without triggering recycle.
Socket and daemon survived intact.
Pi is also covered without changes — it passes no `daemonEntry`
(uses the canonical shared daemon at ~/.hivemind/embed-deps/
embed-daemon.js). When pi runs alongside any agent, its expected
entry differs from the running agent's bundle path, but both files
exist → no recycle, pi happily reuses whatever's warm.
Openclaw is out of scope — it doesn't embed locally (uses MCP
contracts that delegate to the cloud).
Brings in #129 (skillify mine-local) and v0.7.27 plumbing. Conflict surface was the SessionStart injection blocks where both branches touched the same per-agent command-list strings. Resolution: - src/hooks/codex/session-start.ts → take origin/main's version. Codex's hook stdout is rendered user-visible by its harness (see AGENT_CHANNELS.md → Codex and tests/claude-code/skillify-session-start-injection.test.ts), so the verbose context+command block must NOT live there. The embeddings commands stay discoverable to codex via bash + the auto-loaded deeplake-memory skill. - src/hooks/cursor/session-start.ts, src/hooks/hermes/session-start.ts → switch the inline SKILLS block to origin/main's ${renderSkillifyCommands()} helper and append the existing Embeddings block right after, so both surfaces show up in the injected context without duplicating the skillify list. - bundle/cli.js, codex/bundle/session-start.js, cursor/bundle/session-start.js, hermes/bundle/session-start.js → resolved by `npm run build` rather than hand-merging generated output. Verification: full vitest run is 2646/2646 green across 130 test files, including the new skillify drift/injection guards from #129.
CI was failing on the `src/embeddings/nomic.ts` 90% coverage gate: the three helpers introduced for the canonical-shared-deps resolver (`importFromCanonicalSharedDeps`, `importFromBareSpecifier`, `normalizeTransformersModule`) were only exercised indirectly through `defaultImportTransformers(canonical, bare)` with stubbed DI, leaving lines 38-62 uncovered (lines 82.53%, functions 68.75%, statements 80.51%, branches 75% — all below the 90% per-file threshold). Changes: - Export the three helpers with `_` prefix so tests can drive them directly without going through `defaultImportTransformers`. - Replace the two `() => defaultImportTransformers()` arrow wrappers (initial value of `_importTransformers` and the reset target) with a direct reference to `defaultImportTransformers` — both call sites pass zero args, the function has all-defaulted params, and the arrow was just dead surface area that v8 counted as an extra function. Tests added in tests/claude-code/embeddings-nomic.test.ts: - `_normalizeTransformersModule` × 4: CJS-wrapped under `.default`, ESM-shape at top level, `.default`-without-pipeline passthrough, `.default`-falsy passthrough. - `_importFromBareSpecifier` × 1: returns the vi.mocked module after normalization (also forces `default: undefined` in the mock factory so vitest's auto-mock proxy doesn't trip the normalizer's `.default` probe). - `_importFromCanonicalSharedDeps` × 2: real on-disk fixture with a minimal `@huggingface/transformers` package under `<tmpDir>/node_modules/`, plus the missing-package error path. - `defaultImportTransformers` × 1: non-Error rejection (string/object) flows through the String() fallback in the wrapped error message. Result on `src/embeddings/nomic.ts`: stmts 80.51% → 97.29% | branches 75% → 94.59% funcs 68.75% → 100% | lines 82.53% → 100% All four metrics now clear the 90% gate. Bundles regenerated (transformers resolver lives inside the embed daemon, so each `*/bundle/embeddings/embed-daemon.js` artifact picks up the renamed-to-`_`-prefix helpers).
…ephew The shell-bundle daemon resolver in src/shell/deeplake-fs.ts built `dirname(import.meta.url) + "embeddings/embed-daemon.js"`. The shell gets bundled to `<agent>/bundle/shell/deeplake-shell.js`, so the computed daemonEntry pointed at the non-existent path `<agent>/bundle/shell/embeddings/embed-daemon.js` — the real daemon lives one level up at `<agent>/bundle/embeddings/embed-daemon.js`. The capture/wiki-worker hooks always got this right because their source files live at `src/hooks/` and bundle to `<agent>/bundle/` (same level as `embeddings/`). Only the shell entry point was off, which silently broke the pre-tool-use shell embed path on every agent. `src/shell/grep-interceptor.ts` already had the correct `..` form, so this was a one-off oversight rather than a systemic resolver bug. Fix: add the missing `..` so the shell resolver walks up to `bundle/` before descending into `embeddings/`. Regression guard in tests/claude-code/embeddings-bundle-scan.test.ts: for each agent's shipped shell bundle, assert (a) the bundle exists, (b) the daemon file actually exists at the parent-of-shell location, and (c) the literal `".."` survived bundling immediately before `"embeddings"` in the path-join sequence — without it the bundle reads as `"embeddings", "embed-daemon.js"` (the broken shape).
Before: `verifyDaemonOnce()` set `this.helloVerified = true` *before*
awaiting the probe response. A transient probe failure (socket dies
mid-RTT, daemon hiccup) on the first connect therefore permanently
disabled verification for every subsequent embed call on the same
EmbedClient, exactly the safeguard the handshake exists to provide.
The original comment claimed this was intentional ("mark verified
even on mismatch so we don't re-issue the hello against the next,
fresh connection"). That reasoning only holds for the recycle path,
where we've actively chosen to spawn a fresh daemon from
`this.daemonEntry` — so verifying the newcomer is redundant. It does
NOT hold for a probe failure, which is by definition inconclusive.
Fix: only set `helloVerified = true` once we've observed a compatible
response. All other branches — probe catch, recycle, dedup-skip —
leave the flag false so the next reconnect re-runs verification (one
extra hello round-trip in the recycle path, which is a no-op against
a daemon we just spawned).
Test in tests/claude-code/embeddings-client.test.ts simulates a
genuine transient probe failure (server destroys the socket before
responding to the first hello) and asserts the second embed call
re-issues the probe — previously it would silently skip and proceed
on an unverified socket.
Before: `embed()` called `verifyDaemonOnce(sock)` and then proceeded to
send the embed request on the SAME socket, even when the verify step
had just SIGTERMed the daemon and removed its sock/pid files. The
embed silently dropped onto a dead connection — the very session that
caused the recycle saw a NULL `message_embedding` even though the
recycle freshly fixed the underlying state. That defeats the whole
"first call recovers" promise of the recycle path.
Fix:
- `verifyDaemonOnce()` now returns a boolean: `true` if it killed the
daemon mid-call ("recycled"), `false` otherwise.
- `embed()` splits into a thin outer wrapper + `embedAttempt()`. When
the attempt sees the recycled signal, the outer wrapper spawns a
fresh daemon (autoSpawn) and retries once, polling for the new sock
via `waitForDaemonReady()`. With autoSpawn off (tests, pi's canonical-
shared-deps fallback) the outer wrapper just returns null — the caller
treats it the same as any other transient miss.
Tests:
- "recycled probe + autoSpawn=false returns null cleanly (no hang on
dead socket)" asserts the dead-socket embed_attempts === 0, i.e. the
embed request is NOT sent post-recycle. Before the fix this counter
would increment.
- Three pre-existing tests that relied on the dev-machine
SHARED_DAEMON_PATH fallback to mask the bug now explicitly opt out
of verification via `daemonEntry: ""` — they exercise transformers-
missing / embed semantics, not the handshake, and were never
meant to drive the recycle path.
Before: `enqueueNotification()` did read-modify-write on the shared JSON queue at ~/.deeplake/notifications-queue.json with no locking, so two hook processes producing notifications concurrently would both read the same starting state, push their own entry, and the second `rename(2)` would clobber the first writer's addition. That makes the new persistent queue lossy under any concurrent hook/CLI activity — exactly the workload it was added to handle. Fix: wrap the read-modify-write in `withQueueLock()` — an O_EXCL `.lock` file with bounded retry-with-backoff and a stale-lock reclaim threshold (5s, well above any plausible enqueue duration). Lock is unlinked on release; if a holder dies, the next caller reclaims after 5s. If the lock can't be acquired after 50 attempts (~6s total), we proceed unlocked rather than blocking the hook hot path — degrades to last-writer-wins under pathological contention, which is what the old code did unconditionally. Test: 12 parallel subprocesses each call `enqueueNotification` once, all targeting the same $HOME-rooted queue. Assert the final queue has exactly 12 entries and every producer index 0..11 appears once. Before the fix this count would be 1-3 (sequential overwrites); with the lock it's stably 12 across runs.
Before: `_signalledMissingDeps` in src/embeddings/client.ts guarded the "first transformers-missing notification" path, but the flag was a module-level variable — it only survived inside one hook process. Every subsequent hook (a new node subprocess for each PostToolUse / UserPromptSubmit) saw the flag reset to false, hit the transformers- missing path again, and `enqueueNotification` happily appended another copy onto the persistent queue. Between SessionStart drains the queue could accumulate dozens of identical `embed-deps-missing` entries. Fix: move the dedup gate into `enqueueNotification()` itself. Before appending, check if a queued notification already has the same `(id, dedupKey)` — if so, return without modifying the queue. This runs under the same `withQueueLock()` introduced for #4, so the read/compare/write is atomic across processes. Why at the queue layer vs. at every caller: - It's generic: every producer with a dedupKey now gets cross-process one-time semantics without each one having to plumb its own persistence. - The drain layer already dedups by (id, dedupKey) against the *shown* state in state.ts; the queue guard is the corresponding invariant for the *queued* state. - Producers that genuinely want N entries can vary the dedupKey (timestamp, counter), the same convention the drain layer uses. Test in tests/claude-code/notifications.test.ts: 3 sequential subprocesses each enqueue the same `embed-deps-missing` with identical dedupKey; assert the final queue has exactly 1 entry. Sister parallel- producer test (unique dedupKey per subprocess) still passes — 12 distinct keys → 12 entries.
Before: when `ensurePluginNodeModulesLink()` found the `<pluginDir>/node_modules` symlink dangling (its target was GC'd by a marketplace upgrade), it removed the link and returned `stale-link-removed`. The current capture run then proceeded without a working node_modules — only the NEXT hook invocation would actually re-create the link. That defeats the whole self-heal contract: the session that exposes the breakage should also fix it. Fix: after `rmSync(link)` on the dangling-link branch, fall through to `createSymlinkAtomic(target, link)` so the link is repaired in the same call. The `stale-link-removed` return variant is preserved (with the original `danglingTarget`) so callers that log self-heal events still see "a stale link was repaired" — the semantics shift from "removed, retry later" to "removed and re-created in-place". Test updated in tests/claude-code/embeddings-self-heal.test.ts: the existing assertion `expect(existsSync(link)).toBe(false)` after the first call is now `toBe(true)` plus `readlinkSync(link) === sharedNodeModules`, and the redundant second-call assertion block is removed. Single-pass recovery is the new contract.
Before: `linkAgent` blindly called `symlinkForce(SHARED_NODE_MODULES, link)` for every detected hivemind install. `symlinkForce` does `unlinkSync(link)` to clear the way before the new `symlinkSync`, which fails with EISDIR if `link` is a real directory. The result: `hivemind embeddings install` threw on the first agent that already owned its own `node_modules` and aborted before linking the others. The state already had a name in `linkStateFor()` — `owns-own-node-modules` — and `status` already surfaced it. The install path just didn't act on it. Fix is to call `linkStateFor()` first and skip symlink replacement when the link path is a real directory; everything else (no link / stale symlink / link to elsewhere) goes through `symlinkForce` as before. Two tests in tests/cli/cli-embeddings.test.ts: - "skips linking when a real node_modules directory already exists at the link path" — sets up a real `node_modules/` with a marker file, asserts the call does NOT throw, the dir stays a real directory, and the marker file is untouched. - "still replaces a stale symlink at the link path" — confirms the normal install path is unaffected: a pre-existing symlink at the link path is replaced (still a symlink after the call).
… PID
Before: `killEmbedDaemon()` read the PID from the pidfile and called
`process.kill(pid, "SIGTERM")` unconditionally. If the daemon had
crashed and left the pidfile behind, the OS may have already recycled
that PID to a totally unrelated user process — so `hivemind
embeddings disable` (or `uninstall`) could silently kill the user's
text editor / shell / browser tab. PID identity has no filesystem
anchor on its own.
Fix: gate the SIGTERM on `_isDaemonAliveOnSocket(sockPath)`, a short
synchronous probe that tries to `net.connect` to the UDS path with a
200 ms timeout. UDS paths are filesystem-rooted, so a successful
connect proves SOME process is bound to the daemon's socket — and the
only producer of that socket is the daemon whose pidfile sits next to
it. Anything else (no socket file, ECONNREFUSED, timeout) means the
daemon isn't actually live, the pidfile is stale, and we skip the
kill. The sock/pid file cleanup still runs unconditionally so
subsequent installs aren't blocked.
Test: writes the test runner's OWN process pid into the daemon
pidfile, removes the socket, then calls `killEmbedDaemon`. Without
the fix the runner would receive SIGTERM and the test process would
die mid-run. With the fix the socket-alive probe returns false, the
SIGTERM is skipped, and the test continues — explicitly asserting
both the probe return value AND that the file cleanup happened.
Implementation note: the probe uses `spawnSync("node", ["-e", …])`
rather than an inline `net.connect` so it doesn't pollute the calling
process's event loop with hanging socket handles when the connect
fails. The child times out cleanly and exits.
…fic msgs
Before: `isTransformersMissingError()` classified any string containing
bare `MODULE_NOT_FOUND` as a transformers-installation issue. That
overshoots — Node's `MODULE_NOT_FOUND` fires for ANY unresolvable
import (onnxruntime-node, sharp, missing peer dep, …). Each false
positive triggered the daemon recycle path AND surfaced "run
`hivemind embeddings install`" to the user, which can't fix
non-transformers packaging problems and just hides the real failure.
Fix: only classify as transformers-missing when the error string
contains either:
- the literal `hivemind embeddings install` (the actionable wrapper
we throw from `defaultImportTransformers`, which always names
transformers in the same message), OR
- a direct reference to `@huggingface/transformers` (Node's
`Cannot find module '@huggingface/transformers'` /
`ERR_MODULE_NOT_FOUND: Cannot find package '@huggingface/transformers'`).
Bare `MODULE_NOT_FOUND` without a package name no longer matches. The
existing positive tests pass against both Node import forms. New
negative tests explicitly assert that `MODULE_NOT_FOUND while loading
onnxruntime-node` / `Cannot find module 'sharp'` are NOT classified as
transformers issues — that surface area was previously broken.
CodeRabbit #16/#17: two test files mutate process env (and one creates a tmp dir) inside `runHook(env)` calls without restoring them. Since `runHook()` only updates the env keys passed in, a test that sets e.g. `EMBEDDINGS_DISABLED_FOR_TEST=1` would leak the disabled state into every later test in the same vitest worker, making the suite order- dependent. The hermes test additionally leaves a tmp config dir on disk. Fix in both files: - Track the original value of each touched env key the first time `runHook()` mutates it (so re-mutation in the same test doesn't overwrite the snapshot with an intermediate value). - In `afterEach`, restore every captured key (delete if it was unset before, otherwise reassign) and clear the snapshot map for the next test. - For tests/hermes/hermes-capture-hook.test.ts: also track mkdtemp'd config dirs in `_tmpDirsToClean` and `rmSync` them in afterEach. No behavior change in the tests themselves; this is pure isolation hardening to remove the order-dependence CodeRabbit flagged.
CI was failing after the queue-lock and retry-after-recycle fixes
because the new branches in src/notifications/queue.ts and
src/embeddings/client.ts dropped both files below the per-file
thresholds in vitest.config.ts.
src/notifications/queue.ts:
- Added `_setLockTimingForTesting` / `_resetLockTimingForTesting` so
tests can shrink LOCK_RETRY_MAX / LOCK_RETRY_BASE_MS / LOCK_STALE_MS
to milliseconds. The production values (50 attempts × 5ms backoff,
5s stale window) make exercising the give-up and reclaim branches
prohibitively slow under real time.
- New tests in tests/claude-code/notifications-queue-lock.test.ts:
- stale-lock reclaim: utimesSync ages a pre-existing lock past the
(test-shrunk) stale window; assert enqueue succeeds and the lock
is gone.
- give-up after MAX retries: hold a fresh-mtime lock; assert
enqueue still persists (degraded last-writer-wins) and the lock
we held is still present (we didn't own it, didn't unlink it).
- readQueue malformed JSON / wrong shape branches.
- sameDedupKey: same (id, dedupKey) collapses; differing id keeps
both; differing dedupKey keeps both.
Coverage on queue.ts:
stmts 68.85% → 97.14% | branches 44.44% → 75% (target 70)
funcs 80% → 100% | lines 70.37% → 98.33%
src/embeddings/client.ts:
- Added one test that drives the retry-after-recycle path with
autoSpawn=true: a fake daemon recycles on first hello, then the
outer wrapper calls trySpawnDaemon → waitForDaemonReady → retries
embedAttempt. daemonEntry points at a non-existent file so the
spawn no-ops and waitForDaemonReady exhausts its deadline; assert
return is null and no embed was sent on the dead connection. This
covers the previously-untested waitForDaemonReady poll loop and
the outer retry composition branches.
Coverage on client.ts:
branches 78.49% → 81.72% (target 80) — clear without bloat.
Also re-committed pi/bundle/wiki-worker.js: esbuild's local-variable
renaming pass (openSync2 → openSync3, etc.) is non-deterministic across
rebuilds and didn't get re-staged in an earlier commit.
Independent reviewer (subagent with no prior context) flagged 4 real
problems on top of CodeRabbit's set. All fall into the same family of
"my fix introduced or left in a different version of a bug I also
fixed elsewhere."
1. (Critical) `recycleDaemon` in src/embeddings/client.ts SIGTERMs
`hello.pid` (or the pidfile value) without confirming the daemon is
still bound to the socket. Same TOCTOU PID-reuse risk that the
earlier killEmbedDaemon fix already gates against. Add an
`existsSync(this.socketPath)` precondition: socket file gone →
daemon gone → skip SIGTERM, only clean up file artifacts.
2. (High) `withQueueLock`'s contention backoff was a synchronous
`while (Date.now() < end) {}` busy-wait. At production defaults
(50 attempts × 5 ms × triangular growth) the worst case spins the
event loop for ~6 s, freezing every other timer/IO callback in the
hook — including the in-flight embed daemon response. Convert
`withQueueLock` and `enqueueNotification` to async, swap the busy
loop for `await sleep(delay)`. The sole src caller in
`handleTransformersMissing` now fires-and-forgets the enqueue with
a `.catch` so the capture hot path never blocks on a notification
write. Subprocess test fixtures and same-process callers updated to
`await enqueueNotification(...)`.
3. (High) `tests/cli/cli-embeddings.test.ts` killEmbedDaemon test
wrote to the real per-uid `/tmp/hivemind-embed-<uid>.{sock,pid}`
paths. On a dev box or CI runner that already has an embed daemon
running for the same uid, the test would clobber its sock file.
Added an optional `socketDir` parameter to `killEmbedDaemon()`
(defaults to `/tmp` via protocol's default) and pointed the test
at a per-test `mkdtemp` dir.
4. (High) `tests/claude-code/notifications-queue-lock.test.ts`'s
"writeQueue outside-HOME guard" test hedged in a multi-paragraph
comment and asserted the happy path. The `throw new Error("write
blocked: …")` branch was uncovered. Extracted the path/home check
into `_isQueuePathInsideHome(path, home)` and replaced the hedged
test with 6 cases that exercise the predicate directly: child of
home, equals home, trailing-slash home, sibling dir, prefix-match
attack (`/home/userspace` vs `/home/user`), and a path that
resolves outside a tmp home.
No regressions in the full suite: 2685/2685 (previously 2680, +5 from
the new guard tests, with the buggy hedged test removed). Build clean.
| } | ||
| } | ||
| async dispatch(req) { | ||
| if (req.op === "hello") { |
There was a problem hiding this comment.
@coderabbitai what hello stands for, here, why is hardcoded?
There was a problem hiding this comment.
@kaghni — "hello" here is the wire-protocol operation name for a new IPC handshake message, not an arbitrary magic string. It plays the same role as "ping" and "embed" in the daemon's dispatch switch.
Here's what it does end-to-end:
sequenceDiagram
participant C as EmbedClient
participant D as EmbedDaemon
C->>D: { op: "hello", id: "..." }
D-->>C: { id, daemonPath, pid, protocolVersion }
Note over C: Does daemonPath match my daemonEntry?
alt mismatch (stale daemon from old bundle)
C->>D: SIGTERM
C->>C: clear sock/pid → spawn fresh daemon
else match
C->>C: proceed with embed requests
end
Why it exists: After a marketplace plugin auto-upgrade, a new bundle directory is created but the old daemon process can still be running (from the previous bundle path). Without a handshake, the client would silently talk to a stale daemon that may not have the correct @huggingface/transformers path, producing NULL embeddings. The hello op lets the client detect this path mismatch on the very first connection and recycle the poisoned daemon.
Why the string is "hardcoded": This is a compiled/bundled file. In the TypeScript source (src/embeddings/protocol.ts), HelloRequest is declared as a discriminated union literal:
interface HelloRequest { op: "hello"; id: string }
type DaemonRequest = EmbedRequest | PingRequest | HelloRequest;So "hello" is the typed discriminator value — the same string the client sends as op: "hello" and the daemon checks with req.op === "hello". It's as "hardcoded" as "ping" or "embed" are, which is intentional and correct for a simple line-delimited JSON IPC protocol.
| // `owns-own-node-modules`. (Symlinks at this path, including stale ones | ||
| // pointing at a defunct shared-deps target, still go through | ||
| // `symlinkForce` so we replace them with the canonical target.) | ||
| const state = linkStateFor(install); |
There was a problem hiding this comment.
@coderabbitai, why is there even a problem of overriding node_modules?
There was a problem hiding this comment.
@kaghni Great question! The problem comes down to how symlinkForce works internally: it calls unlinkSync to remove the existing path before creating the new symlink. unlinkSync handles files and symlinks just fine, but throws EISDIR when the target is a real directory.
So the danger is:
- An agent (especially one installed via a marketplace) may have a real
node_modules/directory at<pluginDir>/node_modulesrather than a symlink. - If
symlinkForceis called on such a path,unlinkSyncthrowsEISDIRand the entirehivemind embeddings installcommand crashes mid-loop. - The result is a partially linked state — some agents got their symlinks, others didn't — which is worse than a clean skip.
The fix introduced here checks linkStateFor(install) first:
- If
kind === "owns-own-node-modules"→ skip with a warning (it's a real dir; unsafe tounlink) - For everything else (
no-node-modules,linked-elsewhere,linked-to-shared) →symlinkForceis safe because those are either absent or already a symlink, whichunlinkSynccan remove without error.
In short: it's not that overriding is philosophically wrong — it's that the underlying unlinkSync operation is incapable of removing a directory, so attempting it would crash the install loop halfway through.
There was a problem hiding this comment.
@coderabbitai what daemon.ts is for and where is it used? briefily
There was a problem hiding this comment.
@kaghni, here's a brief overview:
What daemon.ts is for
src/embeddings/daemon.ts implements EmbedDaemon — a long-lived background process that:
- Loads the Nomic embedding model into RAM once and keeps it there.
- Listens on a per-user Unix socket (
/tmp/...) for IPC requests from the rest of the plugin. - Handles three operations over that socket:
ping— readiness/health check.embed— vectorize a text string.hello(added in this PR) — handshake that returns the daemon's file path, PID, and protocol version so the client can detect stale/mismatched daemons after upgrades.
- Shuts itself down after an idle timeout (~200 MB RAM savings when unused).
Where it is used
| Consumer | Role |
|---|---|
src/embeddings/client.ts (EmbedClient) |
Spawns the daemon on demand, connects to its socket, and sends hello/embed/ping requests |
src/hooks/session-start-setup.ts |
Warms up the daemon at session start |
src/hooks/{claude-code,codex,cursor,hermes}/capture.ts |
Embeds captured messages via EmbedClient |
*/bundle/wiki-worker.js |
Embeds session summaries before uploading |
*/bundle/shell/deeplake-shell.js |
Powers semantic search in the shell |
src/cli/embeddings.ts |
killEmbedDaemon() SIGTERMs the daemon on disable/uninstall |
The bundled copies (claude-code/bundle/embeddings/embed-daemon.js, codex/..., cursor/..., hermes/..., embeddings/embed-daemon.js) are the shipped artifacts — each platform's plugin bundle includes its own copy of this file so it can be spawned as a subprocess.
There was a problem hiding this comment.
@coderabbitai same here, brief usage and description of nomic.ts
There was a problem hiding this comment.
@kaghni, here's a brief overview of src/embeddings/nomic.ts:
src/embeddings/nomic.ts — Overview
Purpose: A lightweight daemon-only wrapper around @huggingface/transformers that drives Nomic-embed text embeddings. It is intentionally isolated so the heavyweight transformer dependency is never pulled into hook bundles.
Transformers Resolution (new in this PR)
| Function | Role |
|---|---|
_importFromCanonicalSharedDeps(sharedDir?) |
Resolves @huggingface/transformers from the canonical shared-deps directory (~/.hivemind/embed-deps) via createRequire. Handles both ESM (.mjs) and CJS (.cjs) export shapes. |
_importFromBareSpecifier() |
Fallback — bare import("@huggingface/transformers") for dev trees or collocated installs. |
_normalizeTransformersModule(mod) |
Normalizes the two possible module shapes: CJS wraps exports under .default; ESM exposes them at the top level. |
defaultImportTransformers(canonical?, bare?) |
Orchestrates the two-step resolution: tries canonical first, falls back to bare, throws an actionable error (hivemind embeddings install) if both fail. |
The indirection through let _importTransformers allows tests to inject a mock without touching real disk paths.
NomicEmbedder Class
Manages the nomic-embed-text model pipeline lifecycle:
load()— lazily initializes thefeature-extractionpipeline (idempotent, deduplicates concurrent calls via a sharedPromise).embed(text, kind)— embeds a single string with a query/document prefix (Nomic's asymmetric retrieval convention), then truncates + re-normalizes todimsdimensions (Matryoshka).embedBatch(texts, kind)— batches multiple strings through the pipeline in one call, then splits and truncates each result.truncate(vec)— slices todimsand re-normalizes the resulting sub-vector to unit length (Matryoshka re-norm; skips if norm is zero).
Test Seam
_setTransformersImporterForTesting / _resetTransformersImporterForTesting replace the module-level _importTransformers pointer so unit tests can bypass real disk resolution entirely without relying on vi.mock ordering.
There was a problem hiding this comment.
@coderabbitai how protocol works for skillify?
|
@efenocchi, Pi and OpenClaw have no changes, but I'm not sure they should. |
|
@kaghni Confirmed correct on both — verified the surface: Pi: OpenClaw: MCP-mode plugin with
It only consumes embeddings (semantic search takes Follow-up PR — tracked separately, not in this one. We're going to:
Why a separate PR rather than tacking it onto this one:
I'll open the follow-up against `main` once #168 is merged. |
…sformers-resolution # Conflicts: # bundle/cli.js # claude-code/bundle/capture.js # claude-code/bundle/embeddings/embed-daemon.js # codex/bundle/embeddings/embed-daemon.js # codex/bundle/stop.js # cursor/bundle/capture.js # cursor/bundle/embeddings/embed-daemon.js # embeddings/embed-daemon.js # hermes/bundle/capture.js # hermes/bundle/embeddings/embed-daemon.js
…sformers-resolution # Conflicts: # bundle/cli.js
Summary
Embeddings have been silently failing in production. The capture hook
spawns the embed daemon from the marketplace plugin bundle path
(
~/.claude/plugins/cache/hivemind/hivemind/<version>/bundle/embeddings/embed-daemon.js),but
@huggingface/transformersis installed only at~/.hivemind/embed-deps/node_modules/— reachable from the bundle viaa one-time symlink that
hivemind embeddings installcreates. EveryClaude Code marketplace auto-upgrade drops a new versioned cache dir
without the symlink, so the bundle daemon's
import("@huggingface/transformers")returns MODULE_NOT_FOUND,embed()returns null, andsessions.message_embeddingwrites throughas
NULLforever after. The user-facing log shows ~30 minutes ofembed err: Cannot find packagelines.This PR is a fix-it-forever overhaul of the embed daemon lifecycle:
Explicit transformers resolver in the daemon.
nomic.tsnowloads transformers via
createRequire(pathToFileURL("~/.hivemind/embed-deps/")).resolve()followed by an absolute-URL dynamic import, with a bare-specifier
fallback for dev trees. The daemon resolves transformers regardless
of which bundle path it was spawned from.
Persistent opt-in via
~/.deeplake/config.json. The legacyHIVEMIND_EMBEDDINGSenv var is read ONCE on first run to seedembeddings.enabled, then never consulted again. Source of truthis now a real file shared across all 4 agents.
New CLI surface.
hivemind embeddings installis the heavypath (deps + symlinks +
enabled:true).enable/disablearelight config flips.
uninstall [--prune]is the heavy counterpart(removes symlinks + optionally wipes shared deps). Every agent's
SessionStart injection advertises the full surface.
Hello handshake + stuck-daemon recycle. The socket is per-UID,
not per-version, so one stuck daemon poisons every subsequent
session for up to 10 minutes of idle-out. The client now sends a
helloon first connect; on path mismatch (typical post-upgrade)it SIGTERMs the daemon and clears its sock/pid. Same recycle fires
if the daemon returns a transformers-missing error during
embed(). Next session spawns fresh from the current bundle.Visible one-time notification. On transformers-missing, the
client enqueues a warn-severity notification routed through the
existing SessionStart drain so the user sees a clear "run
hivemind embeddings install" message. Suppressed when the userhas explicitly disabled embeddings — no nag for opt-outs.
Self-heal symlink across cached versions. Each capture hook
runs
ensurePluginNodeModulesLink({ bundleDir })at top-level.The first capture under a new plugin version (post-marketplace-
upgrade) creates
<pluginDir>/node_modules → ~/.hivemind/embed-deps/node_modulesatomically; subsequent calls are O(1) no-ops. Conservative — never
clobbers a real
node_modules, never overrides a symlink to adifferent valid target, but cleans up dangling links so it can
recreate them on the next call. Refuses to act outside the
production
bundle/layout (guard against tests).Bundle-scan guards. New
tests/claude-code/embeddings-bundle-scan.test.tsreads each agent's shipped
embed-daemon.js,capture.js, andthe CLI bundle, and asserts every literal string the runtime
depends on survives bundling. 30-second reviewer guardrail.
Behavioral migration
HIVEMIND_EMBEDDINGS=falseenv var~/.deeplake/config.json→embeddings.enabled: falseEmbeddingsStatus = "env-disabled"EmbeddingsStatus = "user-disabled"enableEmbeddings()(heavy)installEmbeddings()disableEmbeddings({ prune })(heavy)uninstallEmbeddings({ prune })enableEmbeddings()(light) — flip config flag, warn if deps missingdisableEmbeddings()(light) — flip flag + SIGTERM daemonfalse/unset →enabled: false; elseenabled: trueMost users have the env var unset, so on first run after this lands
they'll get
enabled: falseand need to runhivemind embeddings enableor
installto opt back in. That's intentional — explicit consentinstead of implicit on-by-default.
Test plan
npm test→ 2457 passing (up from ~2371 onmain).tsc --noEmitclean.fix.
existing real dir, existing symlink-elsewhere, dangling symlink,
non-bundle layout).
tables"):
node codex/bundle/commands/auth-login.js org switch test_pluginsessionsTableNameatsessions_testinsrc/config.ts,rebuild, reinstall plugin.
kill $(cat /tmp/hivemind-embed-$(id -u).pid 2>/dev/null) ; rm -f /tmp/hivemind-embed-$(id -u).{sock,pid}SELECT COUNT(*) AS total, COUNT(message_embedding) AS with_embed FROM sessions_test WHERE creation_date > NOW() - INTERVAL '5 minutes';with_embed == total.src/config.tsbefore merging.hivemind embeddings disable→ next capture writes NULL, nodaemon spawn, no notification.
hivemind embeddings enable→ daemon spawns, embedding columnpopulated.
rm <marketplace-cache>/<version>/node_modulesOut of scope (filed separately)
~/.deeplake/hook-debug.log. Different auth-layer issue,orthogonal to embeddings.
Summary by CodeRabbit
New Features
hivemind embeddings installto manage transformer dependencies separately from enablementhivemind embeddings uninstall [--prune]for full cleanup with optional dependency pruning~/.deeplake/config.jsonfor cross-session preference preservationBug Fixes