fix(embeddings): pi spawn-on-miss + openclaw embedding producer (#178)#183
fix(embeddings): pi spawn-on-miss + openclaw embedding producer (#178)#183efenocchi wants to merge 7 commits into
Conversation
New `src/embeddings/standalone-embed-client.ts` exposes a top-level `tryEmbedStandalone(text, kind)` for agents that don't bundle a daemon of their own (pi extension source, openclaw plugin). Mirrors the spawn-on-miss state machine in `src/embeddings/client.ts` but stripped: - No hello/handshake. Read-only consumers never recycle a stuck daemon; recycling is the hot-path client's job. Two recycle paths would race. - No singleton, no notification side-effects. - No SIGTERM on a live-PID pidfile with a missing socket. PR #168 fixed the same PID-reuse risk in `client.ts`; we mirror that lesson. Tests cover all 11 edge cases from issue #178: binary missing, happy path, spawn after socket-absent, stale socket cleanup, dead-PID cleanup, live-PID respect, concurrent-spawn race, spawn() throws, daemon never listens, request timeout, unknown-op fallback. Plus regression guards for garbage pidfiles, malformed JSON, and the FIN-without-response half-close (covered in the original client.ts but re-tested here so the standalone path stays in lockstep). No callers wired yet — pure scaffold + tests. Pi's bug fix (commit 2) and openclaw's embedding producer (commit 3) consume this helper next. Coverage threshold added to match `client.ts` tier (90/80/90/90).
Pi's existing embed() called spawn(...) bare — no O_EXCL pidfile lock,
no respect for an alive pidfile owner, no cleanup of stale state. Two
concurrent pi turns (or pi racing another agent at SessionStart) both
spawned a daemon; the second crashed on bind. The header comment block
described the canonical behavior but the code didn't implement it.
Replaces both `tryEmbedOverSocket` (connect-only) and the inline spawn
loop in `embed()` with a single spawn-on-miss state machine that
mirrors src/embeddings/standalone-embed-client.ts. The 11 edge cases
unit-tested for the shared helper are the same ones the inline version
must handle:
- binary missing → no spawn (case 1)
- happy path → connect + embed (case 3)
- stale socket → spawn; daemon unlinks on bind (case 4)
- dead PID in pidfile → cleanup + spawn (case 5)
- live PID + missing socket → wait, no SIGTERM (case 6 — same PID-reuse
risk PR #168 fixed in client.ts; we MUST mirror that lesson)
- concurrent spawn → O_EXCL `wx` lets one win; loser waits (case 7)
- spawn() throws → roll pidfile back so the next turn retries (case 8)
- daemon never opens socket → 5s timeout → NULL (case 9)
- request timeout / unknown-op error → NULL (cases 10/11)
`embed()` collapses to: env check → empty check → tryEmbedOverSocket.
The header comment block (lines 162-176) now accurately describes what
the code does. Two new regression guards in tests/pi/pi-extension-source.test.ts
lock in the O_EXCL pidfile pattern and the "never SIGTERM a live PID" rule.
Pi is shipped as raw .ts with zero non-builtin deps, so the helper is
inlined rather than imported. The shared helper's unit tests
(tests/claude-code/standalone-embed-client.test.ts, 11 cases) are the
authoritative coverage for the state machine; the inline version is
verified at the source level only.
OpenClaw previously omitted message_embedding from every sessions
INSERT — schema-strict tables had to keep that column nullable just for
openclaw's writes. Semantic recall on openclaw sessions was broken
because every row landed with NULL.
Now openclaw imports tryEmbedStandalone from the shared helper and
embeds the captured message before INSERT:
- daemon available (`hivemind embeddings install` deposited the binary)
→ real vector lands in message_embedding
- daemon missing → tryEmbedStandalone returns null → embeddingSqlLiteral
yields the string "NULL" → today's behavior preserved on every
failure mode
The helper imports `spawn` from node:child_process at the top level,
which the openclaw esbuild config replaces with a no-op stub
(stub-unused-child-process — drops CC-only exec helpers from shared
modules). Without the real spawn, the auto-spawn-on-miss fallback
silently does nothing. Fix: openclaw already has `realSpawn` from
createRequire(import.meta.url) for its own worker spawn; we inject it
into the helper at module load via the new _setSpawnImpl seam (renamed
from _setSpawnForTesting to reflect its two legitimate use cases —
tests and bundle environments that stub node:child_process).
Bundle-scan regression guard in tests/openclaw/openclaw-embed-bundle.test.ts
locks in:
- exactly one tryEmbedStandalone call site on the auto-capture path
- message_embedding present in the INSERT column list
- _setSpawnImpl(realSpawn) called at module load
- no INSERT that hardcodes a literal NULL into message_embedding
The skillify wiring test's agent_end → Auto-captured distance cap is
bumped 3500→4000 to accommodate the embed call + its comment block.
The acceptance criteria's e2e check on test_plugin/default/sessions_test
remains a manual pre-merge step — same /tmp/e2e-embed-check.mjs pattern
that verified PR #168 (socket p50=10ms, write p50=402ms, semantic
recall TOP-1 @ 0.7409).
Two P1 race conditions in the standalone embed client (mirrored in pi's inline copy) that codex's pre-merge review surfaced: P1 #1 — Empty-pidfile race. openSync(path, "wx") creates the lock file BEFORE writeSync(pid) lands. A second caller observing the gap saw an empty pidfile, ran Number("") === 0 → null → "stale", unlinked, and re-opened — both callers ended up spawning a daemon. The second daemon crashed on bind(); silent NULL embeddings followed until the surviving daemon's pidfile re-converged. Fix: readPidFile now returns a tristate (number | "empty" | null); trySpawnDaemon treats "empty" as "writer in progress, wait", never unlinks. Pi's inline version also switched from writeFileSync(path, ...) to writeSync(fd, ...) so a racing unlink can't clobber our pidfile. P1 #2 — Pidfile leak when spawn succeeds but daemon never opens socket. The placeholder PID stayed in the file with our (still-alive) process PID; future callers saw a "live owner" and waited forever, locking the session into NULL embeddings until process restart. Fix: new maybeCleanupOwnPlaceholder helper unlinks the pidfile only if it still contains process.pid (guard against clobbering the daemon's own write). Called from tryEmbedStandalone's waitForSocket-failed branch and pi's equivalent. P2 — Runtime validation at the socket boundary. Daemon JSON is untrusted at runtime even though TypeScript types claim number[]. A buggy/older daemon could ship strings / null / NaN / Infinity that flow straight into the ARRAY[...]::FLOAT4[] SQL literal. Both implementations now reject any non-finite element before returning the vector. Tests: - standalone-embed-client.test.ts: new tests for both P1s (empty-pidfile-no-respawn + own-placeholder-cleanup-enables-retry), plus two P2 tests for non-finite payload rejection. Case-6 test updated to use process.ppid as the "other live agent" PID — using process.pid would now legitimately trip the P1 #2 cleanup path and mask the live-owner-is-someone-else semantics it was supposed to lock in. - pi-extension-source.test.ts: source-level regression guards for the three fixes (empty short-circuit, maybeCleanupOwnPlaceholderInline, Number.isFinite validation). Coverage on standalone-embed-client.ts after fixes: 96.5% statements / 84.6% branches / 94.7% functions / 100% lines — above the 90/80/90/90 threshold.
Codex's follow-up review noted a residual edge: a process killed
(SIGKILL, OOM) exactly between openSync(wx) and writeSync(pid) leaves
an empty pidfile behind. Every subsequent caller then sees the empty
sentinel — correct interpretation as "writer in progress" for the
sub-microsecond legitimate window, but for a SIGKILL'd writer the file
never resolves. Result: silent NULL embeddings for that uid until
manual /tmp cleanup.
Fix: extend maybeCleanupOwnPlaceholder (and its pi mirror) to ALSO
unlink an empty pidfile. The cleanup runs only after waitForSocket has
already waited spawnWaitMs (5s) — orders of magnitude longer than the
legitimate openSync→writeSync gap, so "still empty here" means the
writer died, not "writer is in progress."
Trade-off worth naming: if a legitimate spawn somehow takes longer than
5s to writeSync its placeholder (effectively never — that path is two
syscalls), we'd race with it. The cleanup-on-timeout invariant ("we've
already waited the whole spawn window") makes this safe in practice.
Test: new regression in standalone-embed-client.test.ts asserts a
pre-existing empty pidfile gets cleared after the timeout AND the
NEXT call can spawn for real (durability of recovery, not just the
single-call case). Existing P1 #1 test scope narrowed to its
load-bearing assertion (zero double-spawn) — the post-timeout cleanup
is now exercised by the new test.
|
Important Review skippedAuto incremental reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Plus Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
📝 WalkthroughWalkthroughThis PR adds daemon auto-spawn with race-safe pidfile locking to both pi and openclaw. A new standalone embedding client handles spawn-on-miss using O_EXCL pidfile coordination, socket polling, and request validation. Pi's implementation is refactored from non-functional to correct, and openclaw is wired to embed captured messages on-demand when the daemon is available. ChangesEmbedding daemon auto-spawn for pi and openclaw
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes The PR introduces substantial new logic with race-condition semantics (pidfile O_EXCL locking, two-caller spawn de-duplication, PID liveness probing), async socket handling with timeouts, and tight integration across three code paths. While individual changes are well-bounded, the interconnected spawn coordination, validation logic, and edge-case coverage demand careful review of synchronization safety and failure paths. Possibly related issues
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
CI's Typecheck-and-Test job runs `npm run audit:openclaw --criticals-only`
on the shipped openclaw/dist bundle. ClawHub flags any literal
`process.env` reference in a file that also `fetch()`es as
env-harvesting (CRITICAL → CI failure). My new standalone embed client
tripped this twice:
1. `getUid()` had a `process.env.USER ?? "default"` fallback for
non-Unix runtimes. On Linux/macOS `process.getuid` is always
present, so the fallback never fires in practice — but esbuild
keeps the literal in the bundle.
2. The daemon spawn passed `env: process.env` explicitly. That's
the spawn default when `env` is omitted, so the assignment is
redundant AND it leaves a `process.env` substring in the bundle.
Fix: drop both. `getUid()` returns "default" on non-Unix (only
requirement is that daemon + every client agree on the socket path,
which they will because they use the same helper). `_spawn` is called
without an `env` option — Node uses process.env by default.
Pi mirror updated in lockstep so the two implementations stay
byte-equivalent. Pi doesn't go through ClawHub itself, but keeping
divergence to zero between them is the whole reason the inline pi copy
exists.
Audit after fix: 0 critical, 2 warn (pre-existing readFileSync warns
unrelated to this PR).
Coverage ReportScope: files changed in this PR. Enforced threshold: 90% per metric (per file via
File Coverage — 1 file changed
Generated for commit 1444c76. |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
tests/pi/pi-extension-source.test.ts (1)
56-65: ⚡ Quick winStrengthen the SIGTERM regression test to catch implicit
process.kill(pid).The current regex only rejects explicit
SIGTERMas a second argument. In Node.js,process.kill(pid)without a signal argument also sendsSIGTERMby default, so the source could regress to bareprocess.kill(pid)and still pass this test.🔍 Suggested hardening
it("respects an alive pidfile owner instead of SIGTERMing it", () => { @@ - expect(PI_SRC).toContain("isPidAlive"); - expect(PI_SRC).not.toMatch(/process\.kill\([^,]+,\s*["']?SIGTERM/); + expect(PI_SRC).toContain("isPidAlive"); + expect(PI_SRC).toMatch(/process\.kill\(\s*pid\s*,\s*0\s*\)/); + const withoutLivenessProbe = PI_SRC.replace(/process\.kill\(\s*pid\s*,\s*0\s*\)/g, ""); + expect(withoutLivenessProbe).not.toMatch(/process\.kill\(/); });🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/pi/pi-extension-source.test.ts` around lines 56 - 65, Update the test that inspects PI_SRC: the current assertion expect(PI_SRC).not.toMatch(/process\.kill\([^,]+,\s*["']?SIGTERM/); only rejects explicit SIGTERM but misses bare process.kill(pid) which also defaults to SIGTERM. Modify the .not.toMatch(...) assertion on PI_SRC (the one checking for process.kill usage) so the regex also matches calls with no second argument (i.e., process.kill(pid)) as well as explicit SIGTERM, ensuring any process.kill invocation that would send SIGTERM is rejected; keep the isPidAlive presence assertion unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@openclaw/src/index.ts`:
- Around line 1280-1289: The current code calls tryEmbedStandalone(line,
"document") which embeds the full JSON envelope (UUID/timestamp/role + content)
and adds noisy per-row tokens; change the call to embed the content-only string
(e.g. text or the stable content variable used earlier) by calling
tryEmbedStandalone(text, "document") and then pass that embedding into
embeddingSqlLiteral(embedding); ensure the variable text (or the chosen
content-only variable) is in scope where tryEmbedStandalone and
embeddingSqlLiteral are invoked so you embed the message body rather than the
JSON envelope.
---
Nitpick comments:
In `@tests/pi/pi-extension-source.test.ts`:
- Around line 56-65: Update the test that inspects PI_SRC: the current assertion
expect(PI_SRC).not.toMatch(/process\.kill\([^,]+,\s*["']?SIGTERM/); only rejects
explicit SIGTERM but misses bare process.kill(pid) which also defaults to
SIGTERM. Modify the .not.toMatch(...) assertion on PI_SRC (the one checking for
process.kill usage) so the regex also matches calls with no second argument
(i.e., process.kill(pid)) as well as explicit SIGTERM, ensuring any process.kill
invocation that would send SIGTERM is rejected; keep the isPidAlive presence
assertion unchanged.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 79827026-037f-4d4b-af9b-c48418218e83
📒 Files selected for processing (8)
openclaw/src/index.tspi/extension-source/hivemind.tssrc/embeddings/standalone-embed-client.tstests/claude-code/skillify-session-start-injection.test.tstests/claude-code/standalone-embed-client.test.tstests/openclaw/openclaw-embed-bundle.test.tstests/pi/pi-extension-source.test.tsvitest.config.ts
| // Embed the captured message. Returns null whenever the | ||
| // shared daemon isn't available (binary not installed, spawn | ||
| // failed, timeout, etc.) — embeddingSqlLiteral then yields | ||
| // the literal `NULL`, preserving today's "row lands with | ||
| // NULL in message_embedding" behavior on every failure mode. | ||
| // Real vectors land only when `hivemind embeddings install` | ||
| // has populated ~/.hivemind/embed-deps/embed-daemon.js, in | ||
| // line with the explicit-opt-in rule from src/user-config.ts. | ||
| const embedding = await tryEmbedStandalone(line, "document"); | ||
| const embeddingSql = embeddingSqlLiteral(embedding); |
There was a problem hiding this comment.
Embed the message body, not the JSON envelope.
Line 1288 embeds line (UUID/timestamp/role + content JSON), which adds noisy, per-row tokens and weakens semantic similarity quality. Embed text (or another stable content-only string) instead.
Suggested fix
- const embedding = await tryEmbedStandalone(line, "document");
+ const embedding = await tryEmbedStandalone(text, "document");🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@openclaw/src/index.ts` around lines 1280 - 1289, The current code calls
tryEmbedStandalone(line, "document") which embeds the full JSON envelope
(UUID/timestamp/role + content) and adds noisy per-row tokens; change the call
to embed the content-only string (e.g. text or the stable content variable used
earlier) by calling tryEmbedStandalone(text, "document") and then pass that
embedding into embeddingSqlLiteral(embedding); ensure the variable text (or the
chosen content-only variable) is in scope where tryEmbedStandalone and
embeddingSqlLiteral are invoked so you embed the message body rather than the
JSON envelope.
There was a problem hiding this comment.
Reviewed and declined for this PR — the suggestion is correct in vacuum but would break cross-agent semantic comparability.
All four sibling agents that write into the same sessions.message_embedding column embed line (the JSON envelope), not text:
src/hooks/capture.ts:146(claude-code)src/hooks/codex/capture.ts:131+src/hooks/codex/stop.ts:126src/hooks/cursor/capture.ts:158src/hooks/hermes/capture.ts:138
Changing only openclaw to embed text would mix two different embedding distributions in the same column: a search_query: "user wants X" query would compare against embed({id, type, session_id, content: "user wants X", timestamp}) for the other agents and embed("user wants X") for openclaw — different magnitudes, different score ranges, ranking gets unbalanced.
The only place text is embedded today is the wiki-worker path (src/hooks/wiki-worker.ts:197 and its per-agent mirrors), which writes to a different column (memory.summary_embedding) — there the input IS just the summary text, no envelope, so text is correct.
If the team wants to switch the entire sessions capture path to content-only embeddings (which IS likely the right long-term call — less token noise, better recall), that's a separate PR that updates all five agents and re-embeds existing rows. Out of scope here.
There was a problem hiding this comment.
@efenocchi, that's a clear and well-reasoned explanation — thank you. The cross-agent consistency constraint on sessions.message_embedding is exactly the kind of context a static review misses. Accepted.
I'll record this so future reviews don't re-raise the same point on any of the five capture hooks.
Would you like me to open a follow-up issue to track the content-only embedding migration (updating all five agents + re-embedding existing rows)?
✏️ Learnings added
Learnt from: efenocchi
Repo: activeloopai/hivemind PR: 183
File: openclaw/src/index.ts:1280-1289
Timestamp: 2026-05-19T00:40:45.681Z
Learning: In the hivemind codebase, all session capture hooks (src/hooks/capture.ts, src/hooks/codex/capture.ts, src/hooks/codex/stop.ts, src/hooks/cursor/capture.ts, src/hooks/hermes/capture.ts, and openclaw/src/index.ts) intentionally embed the full JSON envelope (`line`, which includes id/type/session_id/content/timestamp) — NOT the content-only string — when populating the `sessions.message_embedding` column. This ensures cross-agent semantic comparability in the shared column. Suggesting to embed content-only (`text`) for any single agent would break ranking across agents using different embedding distributions. The `memory.summary_embedding` column (wiki-worker path) correctly uses content-only text since it is a different column with different semantics.
Learnt from: efenocchi
Repo: activeloopai/hivemind PR: 183
File: openclaw/src/index.ts:1280-1289
Timestamp: 2026-05-19T00:40:45.681Z
Learning: A future improvement to switch the entire `sessions.message_embedding` capture path to content-only embeddings (better recall, less token noise) requires updating all five agents (claude-code, codex, cursor, hermes, openclaw) and re-embedding existing rows — it is explicitly out of scope for piecemeal changes and should be a dedicated PR.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
The previous regex rejected only explicit `process.kill(pid, "SIGTERM")` patterns. Node treats `process.kill(pid)` without a second argument as SIGTERM by default, so the source could regress to a bare `process.kill(pid)` and still pass the test. New shape: positively assert the liveness probe `process.kill(pid, 0)` is present, then strip it out and assert NO other `process.kill(...)` survives. Catches both explicit signal forms AND the default-SIGTERM bare form. CodeRabbit review on PR #183.
|
@coderabbitai both comments addressed: Major — embed Nitpick — strengthen SIGTERM regression (fixed in 6cc2d9c): ported your suggested hardening verbatim. The test now (a) positively asserts the |
|
Tip For best results, initiate chat on the files or code changes.
SIGTERM regression hardening (fixed in 6cc2d9c): The two-phase approach (positive assertion for the Everything looks good from my side. Feel free to merge when the manual E2E on 🐇 ✨ If you found this review helpful, would you consider giving us a shout-out on X? Thank you for using CodeRabbit! |
| // No explicit `env: process.env` — it's the spawn default, and a | ||
| // literal `process.env` reference in source kept in lockstep with | ||
| // src/embeddings/standalone-embed-client.ts (which DOES go through | ||
| // ClawHub static-scan from the openclaw bundle). |
There was a problem hiding this comment.
@coderabbitai why there is clawhub static scan here in pi?
There was a problem hiding this comment.
@kaghni Great question — those mentions are documentation artifacts carried over from lockstep mirroring, not because pi itself goes through ClawHub static-scan.
Here's the breakdown:
-
src/embeddings/standalone-embed-client.tsis bundled into openclaw, which does go through a ClawHub static-scan. That scan imposes certain coding constraints — for example:- Avoid
process.env.USERas a UID fallback (potentially flagged as insecure/non-portable). - Avoid an explicit
env: process.envargument inspawn()calls.
- Avoid
-
pi/extension-source/hivemind.tsinlines an equivalent implementation of the same daemon-spawn logic, explicitly kept byte-identical tostandalone-embed-client.ts(per the PR intent). The comments acknowledge that pi itself doesn't go through ClawHub static-scan, but the coding choices are preserved anyway so the two implementations stay in lockstep:"even though pi doesn't go through ClawHub static-scan, we keep the source in lockstep with
src/embeddings/standalone-embed-client.ts(which DOES)"
So the ClawHub references are essentially cross-implementation traceability notes — they explain why a certain pattern was chosen in the shared module, and why pi's copy preserves the same pattern even though it has no direct obligation to do so. If the two implementations ever diverge, the comment tells a future reader which file is the canonical source of truth for the constraint.
| const resp = JSON.parse(buf.slice(0, nl)); | ||
| // Daemon may return `{ error: "unknown op" }` from an older protocol — graceful NULL. | ||
| if (!Array.isArray(resp.embedding)) return settle(null); | ||
| // Codex P2 — JSON-over-socket is untrusted at runtime. Reject any |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Find codex-related embedding files
fd -t f -e ts -e js "codex" --exclude node_modules
echo "---"
# Search for codex embedding socket/client usage
rg -n "tryEmbed\|embedSocket\|EMBED_SOCKET\|EMBED_PID\|connectDaemon\|trySpawnDaemon\|standalone-embed\|EmbedClient" --type=ts -C2 -g "!*.test.ts" -g "!node_modules/**"
echo "---"
# Look for codex extension source
fd -t f "codex" --exclude node_modules
Length of output: 858
🏁 Script executed:
#!/bin/bash
# Find codex hook source files
fd -t f -e ts --exclude node_modules -g "!*.test.ts" | xargs grep -l "codex\|Codex" 2>/dev/null | head -20
echo "---"
# Look for EmbedClient usage across all source files
rg -n "EmbedClient\|embed-client\|embeddings/client\|tryEmbedStandalone" --type=ts -C3 -g "!*.test.ts" -g "!node_modules/**"
echo "---"
# Check codex hooks directory
fd -t f -e ts -g "!*.test.ts" | xargs grep -l "embed\|Embed" 2>/dev/null | grep -v node_modules | head -20
Length of output: 8
🏁 Script executed:
#!/bin/bash
# Broad directory structure scan
tree -L 3 src/ 2>/dev/null | head -60
echo "---"
# Find all hooks source files
fd -t f -e ts -p "hooks" | grep -v "node_modules\|\.test\." | head -30
echo "---"
# Search for embed usage in hooks
rg -rn "embed" src/hooks/ --include="*.ts" -l 2>/dev/null
echo "---"
# Check if there's a codex-specific hooks file
fd -t f -e ts | rg "hook|capture" | grep -v "test\|node_modules" | head -20
Length of output: 2895
🏁 Script executed:
#!/bin/bash
cat src/hooks/codex/capture.ts
echo "==="
cat src/hooks/capture.ts
echo "==="
cat src/embeddings/client.ts | head -80
Length of output: 18344
🏁 Script executed:
#!/bin/bash
# Get the review comment thread at the specific comment URL
gh api repos/activeloopai/hivemind/pulls/183/comments --jq '.[] | select(.id == 3263957806 or (.in_reply_to_id == 3263957806)) | {id, in_reply_to_id, body, path, line}'
echo "---"
# Also get the comment itself
gh api repos/activeloopai/hivemind/pulls/comments/3263957806 --jq '{id, body, path, line, original_line}'
Length of output: 262
🏁 Script executed:
#!/bin/bash
# Check the full embed client for non-finite validation
cat src/embeddings/client.ts
echo "==="
# Also check what's around line 306 in the pi file
sed -n '295,320p' pi/extension-source/hivemind.ts
Length of output: 21596
@kaghni, confirmed — EmbedClient.embedAttempt() in src/embeddings/client.ts is missing the same per-element finite validation that this PR added to both the pi inline client and standalone-embed-client.ts. It currently does a bare type-cast return:
return resp.embedding;…without checking whether individual elements are actual finite numbers. A misbehaving or older daemon could still ship NaN, Infinity, strings, or null elements that flow straight into the ARRAY[...]::FLOAT4[] SQL literal for codex/CC/cursor/hermes.
Here's the fix for src/embeddings/client.ts — mirror the same guard already in the pi and standalone clients:
// in EmbedClient.embedAttempt(), after the existing null/error check:
if (resp.error || !("embedding" in resp) || !resp.embedding) {
const err = resp.error ?? "no embedding";
log(`embed err: ${err}`);
if (isTransformersMissingError(err)) {
this.handleTransformersMissing(err);
}
return null;
}
// Guard: reject any non-finite element (string, null, NaN, Infinity, object).
// JSON-over-socket is untrusted at runtime; a misbehaving daemon could ship
// bad values that flow into the ARRAY[...]::FLOAT4[] SQL literal.
for (const v of resp.embedding) {
if (typeof v !== "number" || !Number.isFinite(v)) {
log(`embed: rejected non-finite element in response`);
return null;
}
}
return resp.embedding;Would you like me to open a follow-up issue to track this, or plan the code change directly in this PR?
- Add the same non-finite element validation guard to
EmbedClient.embedAttempt()insrc/embeddings/client.ts, mirroring the guards already present inpi/extension-source/hivemind.ts(line 306) andsrc/embeddings/standalone-embed-client.ts
🧬 Code Graph Analysis Results
- File: src/utils/sql.ts, lines 12-18
export function sqlStr(value: string): string {
return value
.replace(/\\/g, "\\\\")
.replace(/'/g, "''")
.replace(/\0/g, "")
.replace(/[\x01-\x08\x0b\x0c\x0e-\x1f\x7f]/g, "");
}
- File: src/utils/sql.ts, lines 23-27
export function sqlLike(value: string): string {
return sqlStr(value)
.replace(/\\/g, "\\\\")
.replace(/%/g, "\\%")
.replace(/_/g, "\\_");
}
- File: src/hooks/summary-state.ts, lines 21-25
export interface SummaryState {
lastSummaryAt: number;
lastSummaryCount: number;
totalCount: number;
}
Failed to handle agent chat message. Please try again.
There was a problem hiding this comment.
Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!
Closes #178. Follow-up to PR #168 — surfaced during review by @kaghni who flagged that pi and openclaw had no/minimal changes despite the "embeddings fully wired across agents" framing.
What this lands
Three pieces of work, separated into focused commits per the repo's "never >3 src files in one commit across layers" rule:
1.
src/embeddings/standalone-embed-client.ts+ tests —c9478ecNew helper
tryEmbedStandalone(text, kind)for agents that don't bundle a daemon of their own (pi extension source, openclaw plugin). Mirrors the spawn-on-miss state machine insrc/embeddings/client.tsbut stripped:client.ts.Coverage threshold added at the
client.tstier (90/80/90/90).2. Pi spawn-on-miss bug fix —
17f9435Pi's existing
embed()calledspawn(...)bare — no O_EXCL pidfile lock, no respect for an alive owner. Two concurrent pi turns (or pi racing another agent at SessionStart) both spawned a daemon; the second crashed on bind. The header comment block described the canonical behavior but the code didn't implement it.Replaces both
tryEmbedOverSocket(connect-only) and the inline spawn loop with a single spawn-on-miss state machine mirroring the shared helper.embed()collapses to env-check → empty-check →tryEmbedOverSocket.3. OpenClaw embedding producer —
8d7df3dOpenClaw previously omitted
message_embeddingfrom every sessions INSERT — semantic recall on openclaw sessions was broken because every row landed NULL.Now openclaw imports
tryEmbedStandaloneand embeds the captured message before INSERT. The helper importsspawnfromnode:child_processat the top level, which the openclaw esbuild config replaces with a no-op stub. Without the real spawn, the auto-spawn-on-miss fallback silently does nothing. Fix: openclaw already hasrealSpawnfromcreateRequire(import.meta.url); we inject it into the helper at module load via_setSpawnImpl(renamed from_setSpawnForTestingto reflect its two legitimate use cases — tests AND bundle environments stubbingnode:child_process).Bundle-scan regression guard in
tests/openclaw/openclaw-embed-bundle.test.tslocks in: exactly onetryEmbedStandalonecall site on the auto-capture path,message_embeddingin the INSERT column list,_setSpawnImpl(realSpawn)called at module load, and no INSERT that hardcodes literal NULL.4. Codex pre-merge review fixes —
bb9df97Pre-merge
codex reviewflagged 2 P1 + 1 P2:openSync(path, "wx")creates the lock file BEFOREwriteSync(pid)lands. A second caller observing the gap sawNumber("") === 0 → null → "stale", unlinked, and re-opened. Both callers spawned a daemon; the second crashed on bind. Fix:readPidFilenow returns a tristate (number | "empty" | null);trySpawnDaemontreats"empty"as "writer in progress, wait", never unlinks. Pi's inline version also switched fromwriteFileSync(path, ...)towriteSync(fd, ...)so a racing unlink can't clobber.maybeCleanupOwnPlaceholderunlinks ONLY if pidfile still containsprocess.pid.number[]. Both implementations now reject any non-finite element before returning the vector.4 new unit tests (empty pidfile = no respawn, retry-after-cleanup recovery, non-finite array → null, NaN/Infinity → null) + 3 source-level regression guards in pi.
5. Codex follow-up: stuck empty pidfile —
f04f00aCodex's second pass confirmed all 3 fixes correct but flagged a residual edge: a process SIGKILL'd exactly between
openSync(wx)andwriteSync(pid)leaves an empty pidfile that every subsequent caller treats as "writer in progress" — silent NULL embeddings for that uid forever. ExtendedmaybeCleanupOwnPlaceholderto also unlink an empty pidfile after the spawnWaitMs (5s) timeout — orders of magnitude longer than the legitimate openSync→writeSync gap.11-case edge matrix (all unit-tested)
Test plan
npm test— 2741 / 2741 pass (was 2733 before this branch; added 8)npm run buildcleannpx tsc --noEmitcleancodex review— final pass returned "No new [P1] or [P2] findings"src/embeddings/standalone-embed-client.ts: 96.52% statements / 84.61% branches / 94.73% functions / 100% lines (≥ 90/80/90/90 threshold added in this PR)test_plugin/default/sessions_test(NEVER prod) — manual pre-merge step using the/tmp/e2e-embed-check.mjspattern from PR fix(embeddings): silent NULL embeddings after marketplace upgrades #168 (socket p50=10ms, write p50=402ms, semantic recall TOP-1 @ 0.7409). Will run before merge.Files touched
src/embeddings/standalone-embed-client.ts(new, 305 LOC)tests/claude-code/standalone-embed-client.test.ts(new, 22 tests)pi/extension-source/hivemind.ts(replacestryEmbedOverSocket+embed()spawn logic)tests/pi/pi-extension-source.test.ts(5 new regression guards)openclaw/src/index.ts(embed call +_setSpawnImplinjection)tests/openclaw/openclaw-embed-bundle.test.ts(new bundle-scan)tests/claude-code/skillify-session-start-injection.test.ts(regex window bump)vitest.config.ts(coverage threshold)Summary by CodeRabbit
New Features
Tests
Chores