Skip to content

fix(embeddings): pi spawn-on-miss + openclaw embedding producer (#178)#183

Open
efenocchi wants to merge 7 commits into
mainfrom
feat/openclaw-embed-producer-178
Open

fix(embeddings): pi spawn-on-miss + openclaw embedding producer (#178)#183
efenocchi wants to merge 7 commits into
mainfrom
feat/openclaw-embed-producer-178

Conversation

@efenocchi
Copy link
Copy Markdown
Collaborator

@efenocchi efenocchi commented May 18, 2026

Closes #178. Follow-up to PR #168 — surfaced during review by @kaghni who flagged that pi and openclaw had no/minimal changes despite the "embeddings fully wired across agents" framing.

What this lands

Three pieces of work, separated into focused commits per the repo's "never >3 src files in one commit across layers" rule:

1. src/embeddings/standalone-embed-client.ts + tests — c9478ec

New helper tryEmbedStandalone(text, kind) for agents that don't bundle a daemon of their own (pi extension source, openclaw plugin). Mirrors the spawn-on-miss state machine in src/embeddings/client.ts but stripped:

  • No hello/handshake. Read-only consumers never recycle a stuck daemon; recycling is the hot-path client's job, two recycle paths would race.
  • No singleton, no notification side-effects.
  • No SIGTERM on a live-PID pidfile with a missing socket — same PID-reuse risk PR fix(embeddings): silent NULL embeddings after marketplace upgrades #168 fixed in client.ts.

Coverage threshold added at the client.ts tier (90/80/90/90).

2. Pi spawn-on-miss bug fix — 17f9435

Pi's existing embed() called spawn(...) bare — no O_EXCL pidfile lock, no respect for an alive owner. Two concurrent pi turns (or pi racing another agent at SessionStart) both spawned a daemon; the second crashed on bind. The header comment block described the canonical behavior but the code didn't implement it.

Replaces both tryEmbedOverSocket (connect-only) and the inline spawn loop with a single spawn-on-miss state machine mirroring the shared helper. embed() collapses to env-check → empty-check → tryEmbedOverSocket.

3. OpenClaw embedding producer — 8d7df3d

OpenClaw previously omitted message_embedding from every sessions INSERT — semantic recall on openclaw sessions was broken because every row landed NULL.

Now openclaw imports tryEmbedStandalone and embeds the captured message before INSERT. The helper imports spawn from node:child_process at the top level, which the openclaw esbuild config replaces with a no-op stub. Without the real spawn, the auto-spawn-on-miss fallback silently does nothing. Fix: openclaw already has realSpawn from createRequire(import.meta.url); we inject it into the helper at module load via _setSpawnImpl (renamed from _setSpawnForTesting to reflect its two legitimate use cases — tests AND bundle environments stubbing node:child_process).

Bundle-scan regression guard in tests/openclaw/openclaw-embed-bundle.test.ts locks in: exactly one tryEmbedStandalone call site on the auto-capture path, message_embedding in the INSERT column list, _setSpawnImpl(realSpawn) called at module load, and no INSERT that hardcodes literal NULL.

4. Codex pre-merge review fixes — bb9df97

Pre-merge codex review flagged 2 P1 + 1 P2:

  • P1 initial virtual fs implementation #1 — Empty-pidfile race. openSync(path, "wx") creates the lock file BEFORE writeSync(pid) lands. A second caller observing the gap saw Number("") === 0 → null → "stale", unlinked, and re-opened. Both callers spawned a daemon; the second crashed on bind. Fix: readPidFile now returns a tristate (number | "empty" | null); trySpawnDaemon treats "empty" as "writer in progress, wait", never unlinks. Pi's inline version also switched from writeFileSync(path, ...) to writeSync(fd, ...) so a racing unlink can't clobber.
  • P1 Feature/enriched capture #2 — Pidfile leak when spawn succeeds but daemon never opens socket. Placeholder PID stayed in the file with our (still-alive) process PID; future callers saw a "live owner" and waited forever. Fix: new maybeCleanupOwnPlaceholder unlinks ONLY if pidfile still contains process.pid.
  • P2 — Runtime validation at the socket boundary. Daemon JSON is untrusted at runtime even though TypeScript types claim number[]. Both implementations now reject any non-finite element before returning the vector.

4 new unit tests (empty pidfile = no respawn, retry-after-cleanup recovery, non-finite array → null, NaN/Infinity → null) + 3 source-level regression guards in pi.

5. Codex follow-up: stuck empty pidfile — f04f00a

Codex's second pass confirmed all 3 fixes correct but flagged a residual edge: a process SIGKILL'd exactly between openSync(wx) and writeSync(pid) leaves an empty pidfile that every subsequent caller treats as "writer in progress" — silent NULL embeddings for that uid forever. Extended maybeCleanupOwnPlaceholder to also unlink an empty pidfile after the spawnWaitMs (5s) timeout — orders of magnitude longer than the legitimate openSync→writeSync gap.

11-case edge matrix (all unit-tested)

# Scenario Expected
1 Binary missing NULL, no spawn
2 Binary present, no socket / pid Spawn → wait → embed
3 Socket alive Connect → embed
4 Stale socket, no daemon Spawn (daemon unlinks on bind)
5 Dead PID in pidfile Cleanup → spawn
6 Live PID, no socket Wait, no SIGTERM
7 Two callers race O_EXCL: one spawns, other waits
8 spawn() throws NULL, pidfile rolled back
9 Daemon never opens socket 5s timeout → NULL + cleanup
10 Embed request times out NULL
11 Daemon returns unknown-op NULL

Test plan

  • npm test — 2741 / 2741 pass (was 2733 before this branch; added 8)
  • npm run build clean
  • npx tsc --noEmit clean
  • codex review — final pass returned "No new [P1] or [P2] findings"
  • Per-file coverage on src/embeddings/standalone-embed-client.ts: 96.52% statements / 84.61% branches / 94.73% functions / 100% lines (≥ 90/80/90/90 threshold added in this PR)
  • E2E on test_plugin/default/sessions_test (NEVER prod) — manual pre-merge step using the /tmp/e2e-embed-check.mjs pattern from PR fix(embeddings): silent NULL embeddings after marketplace upgrades #168 (socket p50=10ms, write p50=402ms, semantic recall TOP-1 @ 0.7409). Will run before merge.

Files touched

  • src/embeddings/standalone-embed-client.ts (new, 305 LOC)
  • tests/claude-code/standalone-embed-client.test.ts (new, 22 tests)
  • pi/extension-source/hivemind.ts (replaces tryEmbedOverSocket + embed() spawn logic)
  • tests/pi/pi-extension-source.test.ts (5 new regression guards)
  • openclaw/src/index.ts (embed call + _setSpawnImpl injection)
  • tests/openclaw/openclaw-embed-bundle.test.ts (new bundle-scan)
  • tests/claude-code/skillify-session-start-injection.test.ts (regex window bump)
  • vitest.config.ts (coverage threshold)

Summary by CodeRabbit

  • New Features

    • Added message embedding to the auto-capture pipeline with automatic daemon spawning and graceful NULL fallback on failures.
    • Implemented improved daemon lifecycle management with race-condition safety and per-user isolation.
  • Tests

    • Added comprehensive test coverage for embedding client functionality and daemon behavior.
    • Added integration tests to prevent regressions in embedding wiring.
  • Chores

    • Updated test configuration for code coverage thresholds.

Review Change Stack

efenocchi added 5 commits May 18, 2026 21:08
New `src/embeddings/standalone-embed-client.ts` exposes a top-level
`tryEmbedStandalone(text, kind)` for agents that don't bundle a daemon
of their own (pi extension source, openclaw plugin). Mirrors the
spawn-on-miss state machine in `src/embeddings/client.ts` but stripped:

- No hello/handshake. Read-only consumers never recycle a stuck daemon;
  recycling is the hot-path client's job. Two recycle paths would race.
- No singleton, no notification side-effects.
- No SIGTERM on a live-PID pidfile with a missing socket. PR #168
  fixed the same PID-reuse risk in `client.ts`; we mirror that lesson.

Tests cover all 11 edge cases from issue #178:
binary missing, happy path, spawn after socket-absent, stale socket
cleanup, dead-PID cleanup, live-PID respect, concurrent-spawn race,
spawn() throws, daemon never listens, request timeout, unknown-op
fallback. Plus regression guards for garbage pidfiles, malformed JSON,
and the FIN-without-response half-close (covered in the original
client.ts but re-tested here so the standalone path stays in lockstep).

No callers wired yet — pure scaffold + tests. Pi's bug fix (commit 2)
and openclaw's embedding producer (commit 3) consume this helper next.

Coverage threshold added to match `client.ts` tier (90/80/90/90).
Pi's existing embed() called spawn(...) bare — no O_EXCL pidfile lock,
no respect for an alive pidfile owner, no cleanup of stale state. Two
concurrent pi turns (or pi racing another agent at SessionStart) both
spawned a daemon; the second crashed on bind. The header comment block
described the canonical behavior but the code didn't implement it.

Replaces both `tryEmbedOverSocket` (connect-only) and the inline spawn
loop in `embed()` with a single spawn-on-miss state machine that
mirrors src/embeddings/standalone-embed-client.ts. The 11 edge cases
unit-tested for the shared helper are the same ones the inline version
must handle:

  - binary missing → no spawn (case 1)
  - happy path → connect + embed (case 3)
  - stale socket → spawn; daemon unlinks on bind (case 4)
  - dead PID in pidfile → cleanup + spawn (case 5)
  - live PID + missing socket → wait, no SIGTERM (case 6 — same PID-reuse
    risk PR #168 fixed in client.ts; we MUST mirror that lesson)
  - concurrent spawn → O_EXCL `wx` lets one win; loser waits (case 7)
  - spawn() throws → roll pidfile back so the next turn retries (case 8)
  - daemon never opens socket → 5s timeout → NULL (case 9)
  - request timeout / unknown-op error → NULL (cases 10/11)

`embed()` collapses to: env check → empty check → tryEmbedOverSocket.

The header comment block (lines 162-176) now accurately describes what
the code does. Two new regression guards in tests/pi/pi-extension-source.test.ts
lock in the O_EXCL pidfile pattern and the "never SIGTERM a live PID" rule.

Pi is shipped as raw .ts with zero non-builtin deps, so the helper is
inlined rather than imported. The shared helper's unit tests
(tests/claude-code/standalone-embed-client.test.ts, 11 cases) are the
authoritative coverage for the state machine; the inline version is
verified at the source level only.
OpenClaw previously omitted message_embedding from every sessions
INSERT — schema-strict tables had to keep that column nullable just for
openclaw's writes. Semantic recall on openclaw sessions was broken
because every row landed with NULL.

Now openclaw imports tryEmbedStandalone from the shared helper and
embeds the captured message before INSERT:

  - daemon available (`hivemind embeddings install` deposited the binary)
    → real vector lands in message_embedding
  - daemon missing → tryEmbedStandalone returns null → embeddingSqlLiteral
    yields the string "NULL" → today's behavior preserved on every
    failure mode

The helper imports `spawn` from node:child_process at the top level,
which the openclaw esbuild config replaces with a no-op stub
(stub-unused-child-process — drops CC-only exec helpers from shared
modules). Without the real spawn, the auto-spawn-on-miss fallback
silently does nothing. Fix: openclaw already has `realSpawn` from
createRequire(import.meta.url) for its own worker spawn; we inject it
into the helper at module load via the new _setSpawnImpl seam (renamed
from _setSpawnForTesting to reflect its two legitimate use cases —
tests and bundle environments that stub node:child_process).

Bundle-scan regression guard in tests/openclaw/openclaw-embed-bundle.test.ts
locks in:
  - exactly one tryEmbedStandalone call site on the auto-capture path
  - message_embedding present in the INSERT column list
  - _setSpawnImpl(realSpawn) called at module load
  - no INSERT that hardcodes a literal NULL into message_embedding

The skillify wiring test's agent_end → Auto-captured distance cap is
bumped 3500→4000 to accommodate the embed call + its comment block.

The acceptance criteria's e2e check on test_plugin/default/sessions_test
remains a manual pre-merge step — same /tmp/e2e-embed-check.mjs pattern
that verified PR #168 (socket p50=10ms, write p50=402ms, semantic
recall TOP-1 @ 0.7409).
Two P1 race conditions in the standalone embed client (mirrored in pi's
inline copy) that codex's pre-merge review surfaced:

P1 #1 — Empty-pidfile race. openSync(path, "wx") creates the lock file
BEFORE writeSync(pid) lands. A second caller observing the gap saw an
empty pidfile, ran Number("") === 0 → null → "stale", unlinked, and
re-opened — both callers ended up spawning a daemon. The second daemon
crashed on bind(); silent NULL embeddings followed until the surviving
daemon's pidfile re-converged. Fix: readPidFile now returns a tristate
(number | "empty" | null); trySpawnDaemon treats "empty" as "writer in
progress, wait", never unlinks. Pi's inline version also switched from
writeFileSync(path, ...) to writeSync(fd, ...) so a racing unlink can't
clobber our pidfile.

P1 #2 — Pidfile leak when spawn succeeds but daemon never opens socket.
The placeholder PID stayed in the file with our (still-alive) process
PID; future callers saw a "live owner" and waited forever, locking the
session into NULL embeddings until process restart. Fix: new
maybeCleanupOwnPlaceholder helper unlinks the pidfile only if it still
contains process.pid (guard against clobbering the daemon's own write).
Called from tryEmbedStandalone's waitForSocket-failed branch and pi's
equivalent.

P2 — Runtime validation at the socket boundary. Daemon JSON is
untrusted at runtime even though TypeScript types claim number[]. A
buggy/older daemon could ship strings / null / NaN / Infinity that flow
straight into the ARRAY[...]::FLOAT4[] SQL literal. Both implementations
now reject any non-finite element before returning the vector.

Tests:
- standalone-embed-client.test.ts: new tests for both P1s
  (empty-pidfile-no-respawn + own-placeholder-cleanup-enables-retry),
  plus two P2 tests for non-finite payload rejection. Case-6 test
  updated to use process.ppid as the "other live agent" PID — using
  process.pid would now legitimately trip the P1 #2 cleanup path and
  mask the live-owner-is-someone-else semantics it was supposed to lock in.
- pi-extension-source.test.ts: source-level regression guards for the
  three fixes (empty short-circuit, maybeCleanupOwnPlaceholderInline,
  Number.isFinite validation).

Coverage on standalone-embed-client.ts after fixes: 96.5% statements /
84.6% branches / 94.7% functions / 100% lines — above the 90/80/90/90
threshold.
Codex's follow-up review noted a residual edge: a process killed
(SIGKILL, OOM) exactly between openSync(wx) and writeSync(pid) leaves
an empty pidfile behind. Every subsequent caller then sees the empty
sentinel — correct interpretation as "writer in progress" for the
sub-microsecond legitimate window, but for a SIGKILL'd writer the file
never resolves. Result: silent NULL embeddings for that uid until
manual /tmp cleanup.

Fix: extend maybeCleanupOwnPlaceholder (and its pi mirror) to ALSO
unlink an empty pidfile. The cleanup runs only after waitForSocket has
already waited spawnWaitMs (5s) — orders of magnitude longer than the
legitimate openSync→writeSync gap, so "still empty here" means the
writer died, not "writer is in progress."

Trade-off worth naming: if a legitimate spawn somehow takes longer than
5s to writeSync its placeholder (effectively never — that path is two
syscalls), we'd race with it. The cleanup-on-timeout invariant ("we've
already waited the whole spawn window") makes this safe in practice.

Test: new regression in standalone-embed-client.test.ts asserts a
pre-existing empty pidfile gets cleared after the timeout AND the
NEXT call can spawn for real (durability of recovery, not just the
single-call case). Existing P1 #1 test scope narrowed to its
load-bearing assertion (zero double-spawn) — the post-timeout cleanup
is now exercised by the new test.
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 18, 2026

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 7b88654e-140e-4d58-b748-c1dc13df39e8

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

This PR adds daemon auto-spawn with race-safe pidfile locking to both pi and openclaw. A new standalone embedding client handles spawn-on-miss using O_EXCL pidfile coordination, socket polling, and request validation. Pi's implementation is refactored from non-functional to correct, and openclaw is wired to embed captured messages on-demand when the daemon is available.

Changes

Embedding daemon auto-spawn for pi and openclaw

Layer / File(s) Summary
Standalone embed client core
src/embeddings/standalone-embed-client.ts
New module with per-UID socket/pidfile paths, O_EXCL pidfile locking, socket connection with timeout, request/response JSON exchange, and embedding validation. Returns null on any failure. Exports tryEmbedStandalone(), StandaloneEmbedOptions, SHARED_DAEMON_PATH, and _setSpawnImpl() test hook.
Standalone client unit tests
tests/claude-code/standalone-embed-client.test.ts
Comprehensive Vitest suite with isolated temp directories and mock daemon covering all 11 edge cases: direct socket connect, spawn-on-miss, stale sockets, dead/live PIDs, two-caller race de-duplication, timeouts, protocol errors, non-finite values, and default options.
Pi hivemind spawn fix
pi/extension-source/hivemind.ts
Replaces non-functional connect-only path with inline spawn-on-miss using per-UID socket/pidfile, O_EXCL locks, and socket polling. Simplifies embed() to single call. Updates node:fs import.
Pi source regression tests
tests/pi/pi-extension-source.test.ts
Adds assertions enforcing UID-keyed socket convention, O_EXCL pidfile lock with wx flag, live-PID respect without SIGTERM, empty-pidfile race handling, own-placeholder cleanup on timeout, and finite-number embedding validation.
OpenClaw embedding integration
openclaw/src/index.ts
Imports tryEmbedStandalone and embeddingSqlLiteral, injects real spawn via _setSpawnImpl() at module load, updates auto-capture session INSERT to include computed message_embedding column.
OpenClaw bundle verification
tests/openclaw/openclaw-embed-bundle.test.ts
Tests built bundle artifact to verify standalone client imports, exactly-once tryEmbedStandalone invocation, message_embedding column in session INSERT, real spawn injection, and absence of hardcoded NULL literals.
Test window adjustment
tests/claude-code/skillify-session-start-injection.test.ts
Loosens regex distance window between agent_end and Auto-captured to accommodate expanded embedding wiring.
Coverage config
vitest.config.ts
Adds per-file coverage thresholds for src/embeddings/standalone-embed-client.ts.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

The PR introduces substantial new logic with race-condition semantics (pidfile O_EXCL locking, two-caller spawn de-duplication, PID liveness probing), async socket handling with timeouts, and tight integration across three code paths. While individual changes are well-bounded, the interconnected spawn coordination, validation logic, and edge-case coverage demand careful review of synchronization safety and failure paths.

Possibly related issues

  • #85: Directly adds embedding wiring for OpenClaw including per-message message_embedding persistence and standalone embed-daemon client integration.
  • #80: Addresses the exact missing-column regression by adding standalone client imports to openclaw and updating auto-capture session INSERT with embeddingSqlLiteral.

Suggested reviewers

  • kaghni

Poem

🐰 Sockets sing in harmony,
Pidfiles locked in unity,
No more NULL from spawn-on-miss—
Pi and Claw both won't dismiss!
Race conditions safely tamed, 🔐

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 41.67% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately describes the main changes: pi spawn-on-miss bug fix and openclaw embedding producer wiring, matching the core objectives.
Description check ✅ Passed The description covers objectives, implementation approach, edge-case matrix, test plan, and files touched but lacks version bump per template.
Linked Issues check ✅ Passed Changes implement all #178 objectives: pi spawn-on-miss fix, openclaw embedding producer, shared standalone-embed-client, full 11-case edge matrix coverage, and opt-in semantics preserved.
Out of Scope Changes check ✅ Passed All changes align with #178 scope: spawn-on-miss logic, embedding producer wiring, tests, and coverage thresholds. No unrelated modifications detected.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/openclaw-embed-producer-178

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

CI's Typecheck-and-Test job runs `npm run audit:openclaw --criticals-only`
on the shipped openclaw/dist bundle. ClawHub flags any literal
`process.env` reference in a file that also `fetch()`es as
env-harvesting (CRITICAL → CI failure). My new standalone embed client
tripped this twice:

  1. `getUid()` had a `process.env.USER ?? "default"` fallback for
     non-Unix runtimes. On Linux/macOS `process.getuid` is always
     present, so the fallback never fires in practice — but esbuild
     keeps the literal in the bundle.
  2. The daemon spawn passed `env: process.env` explicitly. That's
     the spawn default when `env` is omitted, so the assignment is
     redundant AND it leaves a `process.env` substring in the bundle.

Fix: drop both. `getUid()` returns "default" on non-Unix (only
requirement is that daemon + every client agree on the socket path,
which they will because they use the same helper). `_spawn` is called
without an `env` option — Node uses process.env by default.

Pi mirror updated in lockstep so the two implementations stay
byte-equivalent. Pi doesn't go through ClawHub itself, but keeping
divergence to zero between them is the whole reason the inline pi copy
exists.

Audit after fix: 0 critical, 2 warn (pre-existing readFileSync warns
unrelated to this PR).
@coderabbitai coderabbitai Bot requested a review from kaghni May 19, 2026 00:04
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 19, 2026

Coverage Report

Scope: files changed in this PR. Enforced threshold: 90% per metric (per file via vitest.config.ts).

Status Category Percentage Covered / Total
🟢 Lines 100.00% (🎯 90%) 98 / 98
🟢 Statements 96.52% (🎯 90%) 111 / 115
🟢 Functions 94.74% (🎯 90%) 18 / 19
🟢 Branches 94.23% (🎯 90%) 49 / 52
File Coverage — 1 file changed
File Stmts Branches Functions Lines
src/embeddings/standalone-embed-client.ts 🟢 96.5% 🟢 94.2% 🟢 94.7% 🟢 100.0%

Generated for commit 1444c76.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
tests/pi/pi-extension-source.test.ts (1)

56-65: ⚡ Quick win

Strengthen the SIGTERM regression test to catch implicit process.kill(pid).

The current regex only rejects explicit SIGTERM as a second argument. In Node.js, process.kill(pid) without a signal argument also sends SIGTERM by default, so the source could regress to bare process.kill(pid) and still pass this test.

🔍 Suggested hardening
   it("respects an alive pidfile owner instead of SIGTERMing it", () => {
@@
-    expect(PI_SRC).toContain("isPidAlive");
-    expect(PI_SRC).not.toMatch(/process\.kill\([^,]+,\s*["']?SIGTERM/);
+    expect(PI_SRC).toContain("isPidAlive");
+    expect(PI_SRC).toMatch(/process\.kill\(\s*pid\s*,\s*0\s*\)/);
+    const withoutLivenessProbe = PI_SRC.replace(/process\.kill\(\s*pid\s*,\s*0\s*\)/g, "");
+    expect(withoutLivenessProbe).not.toMatch(/process\.kill\(/);
   });
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/pi/pi-extension-source.test.ts` around lines 56 - 65, Update the test
that inspects PI_SRC: the current assertion
expect(PI_SRC).not.toMatch(/process\.kill\([^,]+,\s*["']?SIGTERM/); only rejects
explicit SIGTERM but misses bare process.kill(pid) which also defaults to
SIGTERM. Modify the .not.toMatch(...) assertion on PI_SRC (the one checking for
process.kill usage) so the regex also matches calls with no second argument
(i.e., process.kill(pid)) as well as explicit SIGTERM, ensuring any process.kill
invocation that would send SIGTERM is rejected; keep the isPidAlive presence
assertion unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@openclaw/src/index.ts`:
- Around line 1280-1289: The current code calls tryEmbedStandalone(line,
"document") which embeds the full JSON envelope (UUID/timestamp/role + content)
and adds noisy per-row tokens; change the call to embed the content-only string
(e.g. text or the stable content variable used earlier) by calling
tryEmbedStandalone(text, "document") and then pass that embedding into
embeddingSqlLiteral(embedding); ensure the variable text (or the chosen
content-only variable) is in scope where tryEmbedStandalone and
embeddingSqlLiteral are invoked so you embed the message body rather than the
JSON envelope.

---

Nitpick comments:
In `@tests/pi/pi-extension-source.test.ts`:
- Around line 56-65: Update the test that inspects PI_SRC: the current assertion
expect(PI_SRC).not.toMatch(/process\.kill\([^,]+,\s*["']?SIGTERM/); only rejects
explicit SIGTERM but misses bare process.kill(pid) which also defaults to
SIGTERM. Modify the .not.toMatch(...) assertion on PI_SRC (the one checking for
process.kill usage) so the regex also matches calls with no second argument
(i.e., process.kill(pid)) as well as explicit SIGTERM, ensuring any process.kill
invocation that would send SIGTERM is rejected; keep the isPidAlive presence
assertion unchanged.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 79827026-037f-4d4b-af9b-c48418218e83

📥 Commits

Reviewing files that changed from the base of the PR and between 66ad723 and 76ecd0b.

📒 Files selected for processing (8)
  • openclaw/src/index.ts
  • pi/extension-source/hivemind.ts
  • src/embeddings/standalone-embed-client.ts
  • tests/claude-code/skillify-session-start-injection.test.ts
  • tests/claude-code/standalone-embed-client.test.ts
  • tests/openclaw/openclaw-embed-bundle.test.ts
  • tests/pi/pi-extension-source.test.ts
  • vitest.config.ts

Comment thread openclaw/src/index.ts
Comment on lines +1280 to +1289
// Embed the captured message. Returns null whenever the
// shared daemon isn't available (binary not installed, spawn
// failed, timeout, etc.) — embeddingSqlLiteral then yields
// the literal `NULL`, preserving today's "row lands with
// NULL in message_embedding" behavior on every failure mode.
// Real vectors land only when `hivemind embeddings install`
// has populated ~/.hivemind/embed-deps/embed-daemon.js, in
// line with the explicit-opt-in rule from src/user-config.ts.
const embedding = await tryEmbedStandalone(line, "document");
const embeddingSql = embeddingSqlLiteral(embedding);
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot May 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Embed the message body, not the JSON envelope.

Line 1288 embeds line (UUID/timestamp/role + content JSON), which adds noisy, per-row tokens and weakens semantic similarity quality. Embed text (or another stable content-only string) instead.

Suggested fix
-            const embedding = await tryEmbedStandalone(line, "document");
+            const embedding = await tryEmbedStandalone(text, "document");
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@openclaw/src/index.ts` around lines 1280 - 1289, The current code calls
tryEmbedStandalone(line, "document") which embeds the full JSON envelope
(UUID/timestamp/role + content) and adds noisy per-row tokens; change the call
to embed the content-only string (e.g. text or the stable content variable used
earlier) by calling tryEmbedStandalone(text, "document") and then pass that
embedding into embeddingSqlLiteral(embedding); ensure the variable text (or the
chosen content-only variable) is in scope where tryEmbedStandalone and
embeddingSqlLiteral are invoked so you embed the message body rather than the
JSON envelope.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed and declined for this PR — the suggestion is correct in vacuum but would break cross-agent semantic comparability.

All four sibling agents that write into the same sessions.message_embedding column embed line (the JSON envelope), not text:

  • src/hooks/capture.ts:146 (claude-code)
  • src/hooks/codex/capture.ts:131 + src/hooks/codex/stop.ts:126
  • src/hooks/cursor/capture.ts:158
  • src/hooks/hermes/capture.ts:138

Changing only openclaw to embed text would mix two different embedding distributions in the same column: a search_query: "user wants X" query would compare against embed({id, type, session_id, content: "user wants X", timestamp}) for the other agents and embed("user wants X") for openclaw — different magnitudes, different score ranges, ranking gets unbalanced.

The only place text is embedded today is the wiki-worker path (src/hooks/wiki-worker.ts:197 and its per-agent mirrors), which writes to a different column (memory.summary_embedding) — there the input IS just the summary text, no envelope, so text is correct.

If the team wants to switch the entire sessions capture path to content-only embeddings (which IS likely the right long-term call — less token noise, better recall), that's a separate PR that updates all five agents and re-embeds existing rows. Out of scope here.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@efenocchi, that's a clear and well-reasoned explanation — thank you. The cross-agent consistency constraint on sessions.message_embedding is exactly the kind of context a static review misses. Accepted.

I'll record this so future reviews don't re-raise the same point on any of the five capture hooks.

Would you like me to open a follow-up issue to track the content-only embedding migration (updating all five agents + re-embedding existing rows)?


✏️ Learnings added
Learnt from: efenocchi
Repo: activeloopai/hivemind PR: 183
File: openclaw/src/index.ts:1280-1289
Timestamp: 2026-05-19T00:40:45.681Z
Learning: In the hivemind codebase, all session capture hooks (src/hooks/capture.ts, src/hooks/codex/capture.ts, src/hooks/codex/stop.ts, src/hooks/cursor/capture.ts, src/hooks/hermes/capture.ts, and openclaw/src/index.ts) intentionally embed the full JSON envelope (`line`, which includes id/type/session_id/content/timestamp) — NOT the content-only string — when populating the `sessions.message_embedding` column. This ensures cross-agent semantic comparability in the shared column. Suggesting to embed content-only (`text`) for any single agent would break ranking across agents using different embedding distributions. The `memory.summary_embedding` column (wiki-worker path) correctly uses content-only text since it is a different column with different semantics.

Learnt from: efenocchi
Repo: activeloopai/hivemind PR: 183
File: openclaw/src/index.ts:1280-1289
Timestamp: 2026-05-19T00:40:45.681Z
Learning: A future improvement to switch the entire `sessions.message_embedding` capture path to content-only embeddings (better recall, less token noise) requires updating all five agents (claude-code, codex, cursor, hermes, openclaw) and re-embedding existing rows — it is explicitly out of scope for piecemeal changes and should be a dedicated PR.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

The previous regex rejected only explicit `process.kill(pid, "SIGTERM")`
patterns. Node treats `process.kill(pid)` without a second argument as
SIGTERM by default, so the source could regress to a bare `process.kill(pid)`
and still pass the test.

New shape: positively assert the liveness probe `process.kill(pid, 0)` is
present, then strip it out and assert NO other `process.kill(...)` survives.
Catches both explicit signal forms AND the default-SIGTERM bare form.

CodeRabbit review on PR #183.
@efenocchi
Copy link
Copy Markdown
Collaborator Author

@coderabbitai both comments addressed:

Major — embed text vs line (declined with rationale): replied inline at #discussion_r3262888587. Short version: all four sibling agents (claude-code, codex, cursor, hermes) embed line into the same sessions.message_embedding column; changing only openclaw would mix two embedding distributions in shared storage and unbalance ranking. Switching the whole capture path to content-only embeddings is a separate, codebase-wide PR.

Nitpick — strengthen SIGTERM regression (fixed in 6cc2d9c): ported your suggested hardening verbatim. The test now (a) positively asserts the process.kill(pid, 0) liveness probe is present, then (b) strips it out and asserts no other process.kill(...) survives — catches the bare process.kill(pid) default-SIGTERM regression you flagged. Tests green (19/19 in pi suite).

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 19, 2026

Tip

For best results, initiate chat on the files or code changes.

@efenocchi — thanks for the detailed follow-up on both threads.

text vs line (declined): The consistency argument is compelling — mixing two embedding distributions in a shared ranking column would silently degrade retrieval quality for all agents, not just openclaw. Happy to leave this as-is and revisit it as part of a codebase-wide migration when the time comes.

SIGTERM regression hardening (fixed in 6cc2d9c): The two-phase approach (positive assertion for the process.kill(pid, 0) liveness probe, then strip-and-assert-none for any bare SIGTERM) is exactly the right structure — it makes the test self-documenting and catches both regression directions. 19/19 green is a solid sign-off.

Everything looks good from my side. Feel free to merge when the manual E2E on sessions_test is confirmed. 🎉

🐇 ✨


If you found this review helpful, would you consider giving us a shout-out on X?

Thank you for using CodeRabbit!

// No explicit `env: process.env` — it's the spawn default, and a
// literal `process.env` reference in source kept in lockstep with
// src/embeddings/standalone-embed-client.ts (which DOES go through
// ClawHub static-scan from the openclaw bundle).
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@coderabbitai why there is clawhub static scan here in pi?

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kaghni Great question — those mentions are documentation artifacts carried over from lockstep mirroring, not because pi itself goes through ClawHub static-scan.

Here's the breakdown:

  1. src/embeddings/standalone-embed-client.ts is bundled into openclaw, which does go through a ClawHub static-scan. That scan imposes certain coding constraints — for example:

    • Avoid process.env.USER as a UID fallback (potentially flagged as insecure/non-portable).
    • Avoid an explicit env: process.env argument in spawn() calls.
  2. pi/extension-source/hivemind.ts inlines an equivalent implementation of the same daemon-spawn logic, explicitly kept byte-identical to standalone-embed-client.ts (per the PR intent). The comments acknowledge that pi itself doesn't go through ClawHub static-scan, but the coding choices are preserved anyway so the two implementations stay in lockstep:

    "even though pi doesn't go through ClawHub static-scan, we keep the source in lockstep with src/embeddings/standalone-embed-client.ts (which DOES)"

So the ClawHub references are essentially cross-implementation traceability notes — they explain why a certain pattern was chosen in the shared module, and why pi's copy preserves the same pattern even though it has no direct obligation to do so. If the two implementations ever diverge, the comment tells a future reader which file is the canonical source of truth for the constraint.

const resp = JSON.parse(buf.slice(0, nl));
// Daemon may return `{ error: "unknown op" }` from an older protocol — graceful NULL.
if (!Array.isArray(resp.embedding)) return settle(null);
// Codex P2 — JSON-over-socket is untrusted at runtime. Reject any
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@coderabbitai same for codex

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find codex-related embedding files
fd -t f -e ts -e js "codex" --exclude node_modules
echo "---"
# Search for codex embedding socket/client usage
rg -n "tryEmbed\|embedSocket\|EMBED_SOCKET\|EMBED_PID\|connectDaemon\|trySpawnDaemon\|standalone-embed\|EmbedClient" --type=ts -C2 -g "!*.test.ts" -g "!node_modules/**"
echo "---"
# Look for codex extension source
fd -t f "codex" --exclude node_modules

Length of output: 858


🏁 Script executed:

#!/bin/bash
# Find codex hook source files
fd -t f -e ts --exclude node_modules -g "!*.test.ts" | xargs grep -l "codex\|Codex" 2>/dev/null | head -20
echo "---"
# Look for EmbedClient usage across all source files
rg -n "EmbedClient\|embed-client\|embeddings/client\|tryEmbedStandalone" --type=ts -C3 -g "!*.test.ts" -g "!node_modules/**"
echo "---"
# Check codex hooks directory
fd -t f -e ts -g "!*.test.ts" | xargs grep -l "embed\|Embed" 2>/dev/null | grep -v node_modules | head -20

Length of output: 8


🏁 Script executed:

#!/bin/bash
# Broad directory structure scan
tree -L 3 src/ 2>/dev/null | head -60
echo "---"
# Find all hooks source files
fd -t f -e ts -p "hooks" | grep -v "node_modules\|\.test\." | head -30
echo "---"
# Search for embed usage in hooks
rg -rn "embed" src/hooks/ --include="*.ts" -l 2>/dev/null
echo "---"
# Check if there's a codex-specific hooks file
fd -t f -e ts | rg "hook|capture" | grep -v "test\|node_modules" | head -20

Length of output: 2895


🏁 Script executed:

#!/bin/bash
cat src/hooks/codex/capture.ts
echo "==="
cat src/hooks/capture.ts
echo "==="
cat src/embeddings/client.ts | head -80

Length of output: 18344


🏁 Script executed:

#!/bin/bash
# Get the review comment thread at the specific comment URL
gh api repos/activeloopai/hivemind/pulls/183/comments --jq '.[] | select(.id == 3263957806 or (.in_reply_to_id == 3263957806)) | {id, in_reply_to_id, body, path, line}'
echo "---"
# Also get the comment itself
gh api repos/activeloopai/hivemind/pulls/comments/3263957806 --jq '{id, body, path, line, original_line}'

Length of output: 262


🏁 Script executed:

#!/bin/bash
# Check the full embed client for non-finite validation
cat src/embeddings/client.ts
echo "==="
# Also check what's around line 306 in the pi file
sed -n '295,320p' pi/extension-source/hivemind.ts

Length of output: 21596


@kaghni, confirmed — EmbedClient.embedAttempt() in src/embeddings/client.ts is missing the same per-element finite validation that this PR added to both the pi inline client and standalone-embed-client.ts. It currently does a bare type-cast return:

return resp.embedding;

…without checking whether individual elements are actual finite numbers. A misbehaving or older daemon could still ship NaN, Infinity, strings, or null elements that flow straight into the ARRAY[...]::FLOAT4[] SQL literal for codex/CC/cursor/hermes.

Here's the fix for src/embeddings/client.ts — mirror the same guard already in the pi and standalone clients:

// in EmbedClient.embedAttempt(), after the existing null/error check:

if (resp.error || !("embedding" in resp) || !resp.embedding) {
  const err = resp.error ?? "no embedding";
  log(`embed err: ${err}`);
  if (isTransformersMissingError(err)) {
    this.handleTransformersMissing(err);
  }
  return null;
}
// Guard: reject any non-finite element (string, null, NaN, Infinity, object).
// JSON-over-socket is untrusted at runtime; a misbehaving daemon could ship
// bad values that flow into the ARRAY[...]::FLOAT4[] SQL literal.
for (const v of resp.embedding) {
  if (typeof v !== "number" || !Number.isFinite(v)) {
    log(`embed: rejected non-finite element in response`);
    return null;
  }
}
return resp.embedding;

Would you like me to open a follow-up issue to track this, or plan the code change directly in this PR?

  • Add the same non-finite element validation guard to EmbedClient.embedAttempt() in src/embeddings/client.ts, mirroring the guards already present in pi/extension-source/hivemind.ts (line 306) and src/embeddings/standalone-embed-client.ts

🧬 Code Graph Analysis Results
  • File: src/utils/sql.ts, lines 12-18
export function sqlStr(value: string): string {
  return value
    .replace(/\\/g, "\\\\")
    .replace(/'/g, "''")
    .replace(/\0/g, "")
    .replace(/[\x01-\x08\x0b\x0c\x0e-\x1f\x7f]/g, "");
}
  • File: src/utils/sql.ts, lines 23-27
export function sqlLike(value: string): string {
  return sqlStr(value)
    .replace(/\\/g, "\\\\")
    .replace(/%/g, "\\%")
    .replace(/_/g, "\\_");
}
  • File: src/hooks/summary-state.ts, lines 21-25
export interface SummaryState {
  lastSummaryAt: number;
  lastSummaryCount: number;
  totalCount: number;
}

Failed to handle agent chat message. Please try again.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@efenocchi can you check this one

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

embeddings: pi spawn-on-miss bug + make openclaw an embedding producer

2 participants