Skip to content

refactor(cli): centralize subprocess execution and make shell use explicit#2465

Open
cv wants to merge 42 commits intomainfrom
autoresearch/shellouts-safety-2026-04-24
Open

refactor(cli): centralize subprocess execution and make shell use explicit#2465
cv wants to merge 42 commits intomainfrom
autoresearch/shellouts-safety-2026-04-24

Conversation

@cv
Copy link
Copy Markdown
Contributor

@cv cv commented Apr 25, 2026

Summary

Centralize subprocess execution behind shared helpers and make shell use explicit across the NemoClaw CLI and plugin stack.
This refactor removes implicit shell execution from the default runner APIs, standardizes environment handling, and consolidates shell/remote-command formatting without changing user-facing behavior.

Changes

  • Centralize raw process primitives in src/lib/process-primitives.ts and route CLI subprocess call sites through shared helpers instead of scattered child_process usage.
  • Make shell use explicit in src/lib/runner.ts with runShell(...), runCaptureShell(...), and runInteractiveShell(...), while keeping the default runner APIs argv-only.
  • Tighten subprocess environment handling so helpers default to the allowlisted subprocess environment instead of broad process.env inheritance.
  • Introduce and reuse shared formatting/remote-command helpers (shell-quote.ts, remote-script.ts, find-executable.ts) across deploy, onboarding, config I/O, skill install, agent runtime, and recovery paths.
  • Update CLI and plugin tests to cover the new subprocess behavior, including runner, onboarding, deploy, shields, config, sandbox-state, and plugin inference probe paths.

Type of Change

  • Code change (feature, bug fix, or refactor)
  • Code change with doc updates
  • Doc only (prose changes, no code sample modifications)
  • Doc only (includes code sample changes)

Verification

  • npx prek run --all-files passes
  • npm test passes
  • Tests added or updated for new or changed behavior
  • No secrets, API keys, or credentials committed
  • Docs updated for user-facing behavior changes
  • make docs builds without warnings (doc changes only)
  • Doc pages follow the style guide (doc changes only)
  • New doc pages include SPDX header and frontmatter (new pages only)

AI Disclosure

  • AI-assisted — tool: pi coding agent

Signed-off-by: Carlos Villela cvillela@nvidia.com

Summary by CodeRabbit

  • Refactor
    • Unified subprocess handling to use argv-first execution, improved environment handling, detached process support, and consistent output capture/redaction.
  • New Features
    • Added executable discovery, shell-token/assignment builders, opt-in full-env for probes, warmup runner for local model servers, and structured gateway argv in manifests.
  • Bug Fixes
    • More reliable health/probes with HTTP-status checks, robust JSON/stdout parsing, safer remote script composition, and stricter backup/restore validation.
  • Tests
    • Expanded coverage for argv vs shell paths, env scrubbing/inheritance, probes, and security/regression guards.

@cv cv self-assigned this Apr 25, 2026
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 25, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Replace ad-hoc shell-string subprocess usage with structured argv-first runners, add env-scrubbing with opt-in inheritFullEnv, introduce safe shell-token builders and executable discovery, migrate many callsites/tests to centralized process primitives and argv-based remote command builders, and switch some probe mocks to execaSync usage.

Changes

Cohort / File(s) Summary
Core runner & primitives
src/lib/runner.ts, src/lib/process-primitives.ts, src/lib/runner-argv.test.ts, test/runner.test.ts
Centralize spawn/capture via spawnResult/spawnAndHandle; require argv arrays for run/runCapture; add runShell/runCaptureShell/runInteractiveShell/runDetachedFile; update tests for argv/shell semantics.
Shell token & command builders
src/lib/shell-quote.ts, src/lib/remote-script.ts, src/lib/shell-quote.test.ts, src/lib/remote-script.test.ts
Generalize quoting (ShellQuotable); add formatShellToken, joinShellWords, buildShellAssignment; add buildShellCommand, buildSshScriptCommand, buildDockerExecScriptCommand with validation and tests.
Subprocess env / env helpers
src/lib/subprocess-env.ts, nemoclaw/src/lib/subprocess-env.ts, src/lib/http-probe.ts, src/lib/openshell.ts, src/lib/http-probe.test.ts, src/lib/openshell.test.ts
Add buildEnvForSubprocess(extraEnv, inheritFullEnv); extend allowlist (ALL_PROXY/all_proxy, NEMOCLAW_NON_INTERACTIVE); thread inheritFullEnv through probes/openshell and add tests.
Executable discovery
src/lib/find-executable.ts, src/lib/find-executable.test.ts, src/lib/resolve-openshell.ts, src/lib/services.ts
Add findExecutable/hasExecutable to locate binaries via PATH (with PATHEXT); replace command -v/execSync checks with PATH scanning and checkExecutable gating.
Remote/SSH/Deploy abstractions
src/lib/deploy.ts, src/lib/deploy.test.ts, src/lib/remote-script.ts
Refactor deploy flows to argv-style run/result objects; replace shell remote commands with buildSshScriptCommand/structured assignments; switch rsync/scp/ssh to argv arrays and update tests.
Sandbox, onboard & gateway flows
src/lib/onboard.ts, src/lib/sandbox-create-stream.ts, src/lib/sandbox-*.ts, nemoclaw/src/index.ts, src/nemoclaw.ts, agents/*/manifest.yaml
Migrate start/stop/install/connect flows to argv-based runFile/spawnChild; streamSandboxCreate accepts `string
Debug, preflight & diagnostics
src/lib/debug.ts, src/lib/preflight.ts, src/lib/preflight.test.ts, src/lib/http-probe.test.ts
Introduce runCommand/collectTransformed/RunCaptureLike/defaultRunCapture; unify capture/truncation/redaction and adapt probes to argv execution and controlled env inheritance; update tests.
Local inference, nim & tooling
src/lib/local-inference.ts, src/lib/nim.ts, src/lib/agent-runtime.ts, src/lib/config-io.ts, src/lib/local-inference.test.ts
Replace shell background/curl wrappers with curl argv arrays and detached runner (startOllamaWarmup); use run helpers for docker login; move quoting to token builders and update tests for unquoted assignments.
Tar, state & sandbox safety
src/lib/sandbox-state.ts, src/lib/sandbox-session-state.ts, test/security-sandbox-tar-traversal.test.ts, test/snapshot.test.ts
Centralize runStateCommand over spawnResult; normalize stdout/stderr; add manifest state-dir validation, safe tar extraction and rollback behavior; add tests for unsafe manifests.
Shields, services & timers
src/lib/shields.ts, src/lib/services.ts, test/shields.test.ts
Add execKubectlInSandbox using runFile; replace fork with spawnChild for timers; use hasExecutable and detached spawn for cloudflared; update tests/mocks.
Process/test adjustments & execa switch
nemoclaw/src/index.ts, nemoclaw/src/register.test.ts, various tests (test/*, src/lib/*/*.test.ts)
Switch some exec mocks/usage from execFileSync/spawnSync to execaSync or unified runner runFile/runCapture; update many tests to new runner shapes, argv expectations, and env-scrubbing behavior. Notably: probeOpenShellInference uses execaSync(..., { reject: true }) and parses result.stdout.
New utilities & additions
src/lib/find-executable.ts, src/lib/process-primitives.ts, src/lib/remote-script.ts, src/lib/subprocess-env.ts
Add utilities for locating executables, thin spawn wrappers, building remote shell commands/ssh/docker argument arrays, and subprocess env builder with extended allowlist.
Small tests & manifest updates
agents/hermes/manifest.yaml, agents/openclaw/manifest.yaml, various tests
Manifests: gateway_commandgateway_argv; many tests updated to reflect argv-style commands, env scrubbing, and new helper behaviors.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant CLI as CLI
  participant Runner as Runner (run / runShell)
  participant Env as Env builder (buildEnvForSubprocess)
  participant Spawn as Spawn API (spawnChild / spawnResult)
  participant Remote as Remote process (ssh/docker/bin)

  CLI->>Runner: invoke command (argv or shell)
  Runner->>Env: build environment (extraEnv, inheritFullEnv?)
  Env-->>Runner: prepared env
  Runner->>Spawn: spawn binary with argv + env
  Spawn->>Remote: execute process
  Remote-->>Spawn: stdout/stderr + status
  Spawn-->>Runner: normalized result (output, status, error)
  Runner-->>CLI: return result or throw based on ignoreError
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰 I hopped through args and tamed each quote,

Tokens tidy, pipelines snug as a moat.
Env keys guarded, probes follow the trail,
Spawns now polite — no more brittle shell gale.
🥕 A happy hop to ship this careful patch.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 31.22% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title accurately reflects the main objective: centralizing subprocess execution and making shell use explicit through new runner APIs and consolidated helpers.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch autoresearch/shellouts-safety-2026-04-24

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 8

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
nemoclaw/src/register.test.ts (1)

150-153: ⚠️ Potential issue | 🟡 Minor

Reset mockedExecaSync implementation in the second test suite.

vi.clearAllMocks() does not clear mock implementations, so earlier mockReturnValue(...) can leak into this suite and make tests order-dependent.

Suggested fix
  beforeEach(() => {
    vi.clearAllMocks();
+   mockedExecaSync.mockReset();
    mockedLoadOnboardConfig.mockReturnValue(null);
  });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nemoclaw/src/register.test.ts` around lines 150 - 153, The test suite leaks
mock implementations because vi.clearAllMocks() doesn't reset implementations;
add a reset for the specific mock (call mockedExecaSync.mockReset() or
mockedExecaSync.mockClear() followed by re-applying a default
mockImplementation) in the beforeEach where mockedLoadOnboardConfig is set so
mockedExecaSync's previous mockReturnValue/implementation is cleared; reference
the mockedExecaSync symbol to locate and update the beforeEach block near
mockedLoadOnboardConfig.
src/lib/sandbox-state.ts (1)

778-787: ⚠️ Potential issue | 🟠 Major

Count chown failures as restore failures.

This branch only logs a warning, but restoredDirs was already populated above, so callers can get success: true even though the agent may not be able to read its restored state. Move the success accounting behind the ownership fix, or add these dirs to failedDirs when chown fails.

Suggested fix
-    if (sshResult.status === 0) {
-      restoredDirs.push(...localDirs);
+    if (sshResult.status === 0) {
+      let ownershipOk = true;
 
       // Fix ownership — treat failure as restore failure since wrong
       // ownership means the agent can't read its own state files.
       const openshellBinary = resolveOpenshell();
       if (openshellBinary) {
@@
         );
         if (chownResult.status !== 0) {
+          ownershipOk = false;
           _log(
             `WARNING: chown failed (exit ${chownResult.status}) — agent may not be able to read restored state`,
           );
         }
       }
+
+      if (ownershipOk) {
+        restoredDirs.push(...localDirs);
+      } else {
+        failedDirs.push(...localDirs);
+      }
     } else {
       failedDirs.push(...localDirs);
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lib/sandbox-state.ts` around lines 778 - 787, The chown failure is only
logged but not treated as a restore failure, so update the restore accounting:
after calling runStateCommand (openshellBinary, ["sandbox","exec", sandboxName,
"--","chown","-R","sandbox:sandbox", writableDir]) check chownResult.status and
if non-zero add writableDir to failedDirs (or remove it from restoredDirs)
instead of leaving it marked successful; alternatively move the code that marks
restoredDirs as successful to occur only after verifying chownResult.status ===
0. Reference symbols: runStateCommand, chownResult, restoredDirs, failedDirs,
openshellBinary, sandboxName, writableDir.
src/lib/runner.ts (1)

122-126: ⚠️ Potential issue | 🟠 Major

Add opts.shell guards to runShell() and runInteractiveShell().

runShell() and runInteractiveShell() already hardcode bash -c. If opts.shell=true flows through to spawnAndHandle(), it creates a second shell layer and changes quoting/exit semantics. runCaptureShell() already guards against this; the other two explicit shell wrappers should too.

Proposed fix
 function runShell(cmd: string, opts: RunnerOptions = {}): SpawnResult {
+  if (opts.shell) {
+    throw new Error("runShell does not allow opts.shell=true");
+  }
   const shellCmd = String(cmd);
   const stdio = opts.stdio ?? ["ignore", "pipe", "pipe"];
   return spawnAndHandle("bash", ["-c", shellCmd], opts, stdio, shellCmd);
 }
 
 function runInteractiveShell(cmd: string, opts: RunnerOptions = {}): SpawnResult {
+  if (opts.shell) {
+    throw new Error("runInteractiveShell does not allow opts.shell=true");
+  }
   const stdio = opts.stdio ?? ["inherit", "pipe", "pipe"];
   const shellCmd = String(cmd);
   return spawnAndHandle("bash", ["-c", shellCmd], opts, stdio, shellCmd);
 }

Also applies to: 187-191

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lib/runner.ts` around lines 122 - 126, runShell() and
runInteractiveShell() wrap commands with "bash -c" but currently pass opts
through to spawnAndHandle, so if opts.shell is true spawnAndHandle will create a
second shell; update both functions (runShell and runInteractiveShell) to ensure
opts.shell is explicitly false (or removed) before calling spawnAndHandle—mirror
the guard used in runCaptureShell()—so the hardcoded "bash -c" wrapper is the
only shell layer; make the same change for the corresponding interactive block
(the section around the existing runInteractiveShell implementation).
🧹 Nitpick comments (5)
test/onboard-selection.test.ts (1)

148-148: Consider centralizing the Ollama probe matcher in test helpers.

The same dual-probe condition is duplicated many times; extracting a shared matcher/helper would reduce drift when probe logic changes again.

Also applies to: 330-330, 426-426, 621-621, 723-723, 832-832, 947-947, 3051-3051, 3164-3164, 3293-3293

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/onboard-selection.test.ts` at line 148, The duplicated probe matcher
logic (cmd.includes("command -v ollama") || cmd.includes("ollama --version"))
should be extracted into a shared test helper (e.g., isOllamaProbe or
matchOllamaProbe) and used across tests (replace inline checks in
onboard-selection.test.ts and the other listed locations); create the helper in
your central test helpers module, export it, update mocks or stubs that check
command strings to call that helper (referencing the function name
isOllamaProbe/matchOllamaProbe) and remove the duplicated inline conditions so
future changes to probe logic are made in one place.
src/lib/remote-script.ts (1)

20-33: Make command and commandArgs mutually exclusive.

If both are provided, both are executed (&&), which is easy to do accidentally and can cause unintended side effects. Prefer explicit one-of behavior.

Proposed guard
 export function buildShellCommand(opts: {
   command?: string;
   commandArgs?: string[];
@@
 }): string {
+  if (opts.command && opts.commandArgs && opts.commandArgs.length > 0) {
+    throw new Error("buildShellCommand accepts either command or commandArgs, not both");
+  }
+
   const steps: string[] = [];
@@
-  if (opts.command) {
+  if (opts.command) {
     steps.push(opts.command);
   }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lib/remote-script.ts` around lines 20 - 33, The current logic allows both
opts.command and opts.commandArgs to be provided and concatenates both into
steps; add a mutual-exclusion guard at the start of the routine (before using
joinShellWords/formatShellToken and before pushing into steps) to throw an error
when both opts.command is truthy and opts.commandArgs exists with length > 0,
e.g. throw new Error("buildShellCommand requires either command or commandArgs,
not both"); also update the existing final validation text to reflect the one-of
requirement and keep existing behavior of building the command when exactly one
is provided.
src/lib/openshell.ts (1)

71-86: Consider extracting common env builder pattern.

buildOpenshellEnv is nearly identical to buildProbeEnv in http-probe.ts. Both functions:

  1. Return full env merge when inheritFullEnv is true
  2. Normalize extra env by removing undefined values
  3. Delegate to buildSubprocessEnv

Consider extracting this to a shared helper in subprocess-env.ts to reduce duplication.

♻️ Potential shared helper

In src/lib/subprocess-env.ts:

export function buildEnvForSubprocess(
  extraEnv: NodeJS.ProcessEnv | undefined,
  inheritFullEnv = false,
): NodeJS.ProcessEnv {
  if (inheritFullEnv) {
    return { ...process.env, ...extraEnv };
  }

  const normalizedExtraEnv: Record<string, string> = {};
  for (const [key, value] of Object.entries(extraEnv || {})) {
    if (value !== undefined) {
      normalizedExtraEnv[key] = value;
    }
  }
  return buildSubprocessEnv(normalizedExtraEnv);
}

Then both openshell.ts and http-probe.ts can import and use this shared helper.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lib/openshell.ts` around lines 71 - 86, Extract the duplicated
env-building logic into a shared helper (e.g., buildEnvForSubprocess) in a new
module (subprocess-env.ts) and have buildOpenshellEnv and buildProbeEnv call it;
specifically, move the logic that handles inheritFullEnv, normalizes extraEnv by
removing undefined values, and delegates to buildSubprocessEnv into
buildEnvForSubprocess, export that function, then replace the bodies of
buildOpenshellEnv and buildProbeEnv to simply call
buildEnvForSubprocess(extraEnv, inheritFullEnv) while keeping references to
buildSubprocessEnv intact.
src/lib/deploy.ts (1)

69-74: Keep the injected runner interfaces argv-only.

This type widening removes the static guarantee this refactor is trying to add. A future string command will now type-check in src/lib/deploy.ts and only fail later at runtime in src/lib/runner.ts.

💡 Proposed fix
   run: (
-    command: string | readonly string[],
+    command: readonly string[],
     opts?: ExecLikeOptions & { ignoreError?: boolean; suppressOutput?: boolean },
   ) => ExecResultLike;
-  runInteractive: (command: string | readonly string[]) => void;
+  runInteractive: (command: readonly string[]) => void;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lib/deploy.ts` around lines 69 - 74, The runner interface was widened to
accept string | readonly string[], losing the argv-only static guarantee; revert
the signatures so they accept only readonly string[] (argv form) for run and
runInteractive (leave log as-is) by changing the run signature from "command:
string | readonly string[]" to "command: readonly string[]" and ensuring
runInteractive also uses "readonly string[]"; update any related type references
(ExecLikeOptions / ExecResultLike usages) in the same module to match the
argv-only contract so callers remain type-checked at compile time.
src/lib/runner.ts (1)

155-167: Render argv commands with joinShellWords().

cmd.join(" ") collapses argument boundaries for spaces and quotes, so the redacted failure output here is less accurate than runFile()'s rendering.

💡 Proposed fix
-  const cmdStr = cmd.join(" ");
+  const cmdStr = joinShellWords(cmd);
@@
-      cmd.join(" "),
+      joinShellWords(cmd),

Also applies to: 289-302

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lib/runner.ts` around lines 155 - 167, The failure output currently
builds cmdStr with plain string concatenation (const cmdStr = cmd.join(" "))
which loses argument boundaries; replace this with a safe renderer by calling
joinShellWords(cmd) to produce cmdStr so quoted/space-containing args are
preserved; update both occurrences that construct cmdStr (the one used when
calling spawnAndHandle and the similar block at lines 289-302) and ensure the
module that exports joinShellWords is imported where runner.ts uses cmdStr; keep
the variable name cmdStr and pass it unchanged to spawnAndHandle for accurate
redacted failure output.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/lib/debug.ts`:
- Around line 123-153: collectTransformed currently calls runCommand which
buffers the entire stdout/stderr before transform runs, causing large outputs to
overflow; change collectTransformed to stream the child process stdout/stderr
and maintain bounded buffers (e.g., keep only first N and last N lines) while
the process runs, then call transform with those truncated buffers; update or
add a helper (e.g., spawnStreamingCommand or modify runCommand) so
collectTransformed still uses the same symbols (collectTransformed,
runCommand/spawnStreamingCommand, writeCollectedOutput,
takeFirstLines/takeLastLines) but receives pre-truncated stdout/stderr and
status without ever buffering the full output in memory.

In `@src/lib/onboard.ts`:
- Around line 5318-5324: The branch that installs Ollama unconditionally binds
the server to 0.0.0.0 and always starts the auth proxy, which regresses WSL
behavior; update the logic around installOllamaViaOfficialScript(),
startDetachedOllamaServe(`0.0.0.0:${OLLAMA_PORT}`), and startOllamaAuthProxy()
to mirror the existing "ollama" path: detect WSL (or reuse the existing ollama
install path check) and only bind to 0.0.0.0 and start the auth proxy when not
on WSL (otherwise bind to the default/local socket/localhost and skip starting
the auth proxy), so the install branch behaves the same as the
preinstalled-ollama branch.
- Around line 2608-2616: The runFile call in src/lib/onboard.ts currently sets
inheritFullEnv: true on the installer subprocess (see the runFile invocation
invoking install-openshell.sh) which bypasses buildSubprocessEnv hardening;
remove the inheritFullEnv: true property from that options object so the
subprocess uses the filtered environment, and then add
"NEMOCLAW_NON_INTERACTIVE" to the allowlist in src/lib/subprocess-env.ts (either
as the explicit variable name or by allowing the "NEMOCLAW_" prefix) so the
installer receives HOME, XDG_BIN_HOME and NEMOCLAW_NON_INTERACTIVE while other
host secrets remain filtered.

In `@src/lib/preflight.ts`:
- Around line 748-758: The check for an existing swap entry uses grep -q
"/swapfile", which can match commented lines or paths like /swapfile.bak; update
the fstab presence test (the run call that sets fstabHasSwapfile) to use a
anchored regex that only matches a non-commented /swapfile at the start of a
line—e.g. run grep with -E and a pattern like '^[[:space:]]*/swapfile(\s|$)' (or
equivalent /^\s*\/swapfile\b/) so only real fstab entries prevent appending;
keep the subsequent run([... "tee", "-a", "/etc/fstab"]) logic unchanged.

In `@src/lib/sandbox-create-stream.ts`:
- Around line 53-71: Reject empty argv arrays at the start of the Stream sandbox
creator: detect when the incoming parameter command is an array with length ===
0 and throw a clear, synchronous API error (e.g., "command must not be empty")
before calling spawnImpl; update the branch handling Array.isArray(command) in
the function that uses spawnImpl/spawnChild so callers get a deterministic
validation error instead of passing undefined as command[0] into spawnImpl.

In `@src/lib/shields.ts`:
- Around line 481-490: The child process for the shields-timer created with
spawnChild currently passes the whole environment (env: process.env), exposing
secrets; update that call to use the filtered environment helper by replacing
env: process.env with env: buildSubprocessEnv() so the subprocess only receives
allowed vars (HOME etc.); locate the spawnChild invocation that uses
actualScript, sandboxName, snapshotPath, restoreAt.toISOString(),
target.configPath, target.configDir and change its options accordingly, leaving
child.unref?.() intact.

In `@src/nemoclaw.ts`:
- Around line 1595-1598: The loop in sandboxConnect uses the synchronous
busy-spin helper sleepSeconds which burns CPU; replace the blocking call with an
awaited non-blocking delay and await the async poll so the loop yields the event
loop: change sleepSeconds(sleepFor) to an awaited async timer (e.g. await a
Promise-based setTimeout for sleepFor seconds) and ensure runSandboxList() is
awaited (const poll = await runSandboxList()) so sandboxConnect and
runSandboxList are both used asynchronously rather than blocking the thread.

In `@test/runner.test.ts`:
- Around line 99-103: The test currently uses CommonJS require for runnerPath
and runInteractive which keeps the test file in mixed CJS/ESM mode; replace the
require-based loading with ESM dynamic import by making the test async and using
const { runInteractive } = await import(runnerPath) (remove require.cache and
require.resolve usage), then call runInteractive([...]) as before; ensure the
test file uses ESM syntax (import/export) so the new assertion uses pure ESM
loading.

---

Outside diff comments:
In `@nemoclaw/src/register.test.ts`:
- Around line 150-153: The test suite leaks mock implementations because
vi.clearAllMocks() doesn't reset implementations; add a reset for the specific
mock (call mockedExecaSync.mockReset() or mockedExecaSync.mockClear() followed
by re-applying a default mockImplementation) in the beforeEach where
mockedLoadOnboardConfig is set so mockedExecaSync's previous
mockReturnValue/implementation is cleared; reference the mockedExecaSync symbol
to locate and update the beforeEach block near mockedLoadOnboardConfig.

In `@src/lib/runner.ts`:
- Around line 122-126: runShell() and runInteractiveShell() wrap commands with
"bash -c" but currently pass opts through to spawnAndHandle, so if opts.shell is
true spawnAndHandle will create a second shell; update both functions (runShell
and runInteractiveShell) to ensure opts.shell is explicitly false (or removed)
before calling spawnAndHandle—mirror the guard used in runCaptureShell()—so the
hardcoded "bash -c" wrapper is the only shell layer; make the same change for
the corresponding interactive block (the section around the existing
runInteractiveShell implementation).

In `@src/lib/sandbox-state.ts`:
- Around line 778-787: The chown failure is only logged but not treated as a
restore failure, so update the restore accounting: after calling runStateCommand
(openshellBinary, ["sandbox","exec", sandboxName,
"--","chown","-R","sandbox:sandbox", writableDir]) check chownResult.status and
if non-zero add writableDir to failedDirs (or remove it from restoredDirs)
instead of leaving it marked successful; alternatively move the code that marks
restoredDirs as successful to occur only after verifying chownResult.status ===
0. Reference symbols: runStateCommand, chownResult, restoredDirs, failedDirs,
openshellBinary, sandboxName, writableDir.

---

Nitpick comments:
In `@src/lib/deploy.ts`:
- Around line 69-74: The runner interface was widened to accept string |
readonly string[], losing the argv-only static guarantee; revert the signatures
so they accept only readonly string[] (argv form) for run and runInteractive
(leave log as-is) by changing the run signature from "command: string | readonly
string[]" to "command: readonly string[]" and ensuring runInteractive also uses
"readonly string[]"; update any related type references (ExecLikeOptions /
ExecResultLike usages) in the same module to match the argv-only contract so
callers remain type-checked at compile time.

In `@src/lib/openshell.ts`:
- Around line 71-86: Extract the duplicated env-building logic into a shared
helper (e.g., buildEnvForSubprocess) in a new module (subprocess-env.ts) and
have buildOpenshellEnv and buildProbeEnv call it; specifically, move the logic
that handles inheritFullEnv, normalizes extraEnv by removing undefined values,
and delegates to buildSubprocessEnv into buildEnvForSubprocess, export that
function, then replace the bodies of buildOpenshellEnv and buildProbeEnv to
simply call buildEnvForSubprocess(extraEnv, inheritFullEnv) while keeping
references to buildSubprocessEnv intact.

In `@src/lib/remote-script.ts`:
- Around line 20-33: The current logic allows both opts.command and
opts.commandArgs to be provided and concatenates both into steps; add a
mutual-exclusion guard at the start of the routine (before using
joinShellWords/formatShellToken and before pushing into steps) to throw an error
when both opts.command is truthy and opts.commandArgs exists with length > 0,
e.g. throw new Error("buildShellCommand requires either command or commandArgs,
not both"); also update the existing final validation text to reflect the one-of
requirement and keep existing behavior of building the command when exactly one
is provided.

In `@src/lib/runner.ts`:
- Around line 155-167: The failure output currently builds cmdStr with plain
string concatenation (const cmdStr = cmd.join(" ")) which loses argument
boundaries; replace this with a safe renderer by calling joinShellWords(cmd) to
produce cmdStr so quoted/space-containing args are preserved; update both
occurrences that construct cmdStr (the one used when calling spawnAndHandle and
the similar block at lines 289-302) and ensure the module that exports
joinShellWords is imported where runner.ts uses cmdStr; keep the variable name
cmdStr and pass it unchanged to spawnAndHandle for accurate redacted failure
output.

In `@test/onboard-selection.test.ts`:
- Line 148: The duplicated probe matcher logic (cmd.includes("command -v
ollama") || cmd.includes("ollama --version")) should be extracted into a shared
test helper (e.g., isOllamaProbe or matchOllamaProbe) and used across tests
(replace inline checks in onboard-selection.test.ts and the other listed
locations); create the helper in your central test helpers module, export it,
update mocks or stubs that check command strings to call that helper
(referencing the function name isOllamaProbe/matchOllamaProbe) and remove the
duplicated inline conditions so future changes to probe logic are made in one
place.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 43a7580e-098a-43b4-838f-1b3d8654a8a6

📥 Commits

Reviewing files that changed from the base of the PR and between 6966f4b and 1a1e72e.

📒 Files selected for processing (44)
  • nemoclaw/src/index.ts
  • nemoclaw/src/register.test.ts
  • src/lib/agent-runtime.test.ts
  • src/lib/agent-runtime.ts
  • src/lib/config-io.ts
  • src/lib/credentials.ts
  • src/lib/debug.ts
  • src/lib/deploy.test.ts
  • src/lib/deploy.ts
  • src/lib/find-executable.test.ts
  • src/lib/find-executable.ts
  • src/lib/http-probe.test.ts
  • src/lib/http-probe.ts
  • src/lib/local-inference.ts
  • src/lib/nim.ts
  • src/lib/onboard.ts
  • src/lib/openshell.test.ts
  • src/lib/openshell.ts
  • src/lib/preflight.ts
  • src/lib/process-primitives.ts
  • src/lib/remote-script.ts
  • src/lib/resolve-openshell.ts
  • src/lib/runner-argv.test.ts
  • src/lib/runner.ts
  • src/lib/sandbox-config.ts
  • src/lib/sandbox-create-stream.ts
  • src/lib/sandbox-session-state.ts
  • src/lib/sandbox-state.ts
  • src/lib/sandbox-version.ts
  • src/lib/services.ts
  • src/lib/shell-quote.ts
  • src/lib/shields.ts
  • src/lib/skill-install.test.ts
  • src/lib/skill-install.ts
  • src/lib/version.ts
  • src/nemoclaw.ts
  • test/cli.test.ts
  • test/gateway-cleanup.test.ts
  • test/gateway-liveness-probe.test.ts
  • test/onboard-selection.test.ts
  • test/onboard.test.ts
  • test/runner.test.ts
  • test/security-sandbox-tar-traversal.test.ts
  • test/shields.test.ts

Comment thread src/lib/debug.ts
Comment thread src/lib/onboard.ts
Comment on lines +2608 to 2616
const result = runFile("bash", [path.join(SCRIPTS, "install-openshell.sh")], {
cwd: ROOT,
env: process.env,
stdio: ["ignore", "pipe", "pipe"],
encoding: "utf-8",
timeout: 300_000,
ignoreError: true,
suppressOutput: true,
inheritFullEnv: true,
});
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo '== runFile / env handling =='
rg -n -C3 'inheritFullEnv|buildSubprocessEnv|function runFile|const runFile' src

echo
echo '== install-openshell.sh env usage =='
rg -n -C2 '\$[A-Z_][A-Z0-9_]*|process\.env' scripts/install-openshell.sh

Repository: NVIDIA/NemoClaw

Length of output: 17001


🏁 Script executed:

cat -n src/lib/subprocess-env.ts

Repository: NVIDIA/NemoClaw

Length of output: 2399


🏁 Script executed:

sed -n '2600,2630p' src/lib/onboard.ts

Repository: NVIDIA/NemoClaw

Length of output: 1106


🏁 Script executed:

git log --oneline -n 20 -- src/lib/onboard.ts | head -10

Repository: NVIDIA/NemoClaw

Length of output: 142


🏁 Script executed:

git show --stat HEAD:src/lib/onboard.ts | head -30

Repository: NVIDIA/NemoClaw

Length of output: 1553


🏁 Script executed:

rg -n 'installOpenshell|GH_TOKEN|GITHUB_TOKEN' src/lib/onboard.ts | head -20

Repository: NVIDIA/NemoClaw

Length of output: 308


🏁 Script executed:

sed -n '4410,4470p' src/lib/onboard.ts

Repository: NVIDIA/NemoClaw

Length of output: 3233


🏁 Script executed:

cat scripts/install-openshell.sh

Repository: NVIDIA/NemoClaw

Length of output: 4659


🏁 Script executed:

rg -n 'NEMOCLAW_NON_INTERACTIVE' src/lib/ | head -15

Repository: NVIDIA/NemoClaw

Length of output: 698


🏁 Script executed:

rg -n 'GH_TOKEN|GITHUB_TOKEN' src/lib/onboard.ts | head -10

Repository: NVIDIA/NemoClaw

Length of output: 122


🏁 Script executed:

sed -n '3180,3230p' src/lib/onboard.ts

Repository: NVIDIA/NemoClaw

Length of output: 2410


Remove inheritFullEnv: true and add NEMOCLAW_NON_INTERACTIVE to the environment allowlist.

Line 2615 passes the entire host environment to install-openshell.sh, which bypasses the allowlist-based hardening. This re-exposes secrets (GITHUB_TOKEN, AWS_SECRET_ACCESS_KEY, etc.) that the PR aimed to filter out via buildSubprocessEnv().

The installer needs:

  • HOME and XDG_BIN_HOME (already allowed)
  • NEMOCLAW_NON_INTERACTIVE (line 126: controls whether installation is interactive or non-interactive for CI/CD, currently NOT in the allowlist)
  • GH_TOKEN/GITHUB_TOKEN (optional; the script falls back to unauthenticated curl when gh auth fails)

Fix: Remove inheritFullEnv: true from line 2615. Add NEMOCLAW_NON_INTERACTIVE to the allowlist in src/lib/subprocess-env.ts (either as an explicit name or via a NEMOCLAW_ prefix).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lib/onboard.ts` around lines 2608 - 2616, The runFile call in
src/lib/onboard.ts currently sets inheritFullEnv: true on the installer
subprocess (see the runFile invocation invoking install-openshell.sh) which
bypasses buildSubprocessEnv hardening; remove the inheritFullEnv: true property
from that options object so the subprocess uses the filtered environment, and
then add "NEMOCLAW_NON_INTERACTIVE" to the allowlist in
src/lib/subprocess-env.ts (either as the explicit variable name or by allowing
the "NEMOCLAW_" prefix) so the installer receives HOME, XDG_BIN_HOME and
NEMOCLAW_NON_INTERACTIVE while other host secrets remain filtered.

Comment thread src/lib/onboard.ts Outdated
Comment thread src/lib/preflight.ts Outdated
Comment thread src/lib/sandbox-create-stream.ts Outdated
Comment thread src/lib/shields.ts
Comment thread src/nemoclaw.ts
Comment on lines 1595 to 1598
const sleepFor = Math.min(interval, remainingMs() / 1000);
if (sleepFor <= 0) break;
spawnSync("sleep", [String(sleepFor)]);
sleepSeconds(sleepFor);
const poll = runSandboxList();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

sleepSeconds() will busy-spin this wait loop.

src/lib/wait.ts implements sleepSeconds() with a blocking spin loop, so this replacement makes nemoclaw <name> connect burn a CPU core while waiting for readiness. Since sandboxConnect() is already async, use an awaited timer here instead of a synchronous sleep.

Suggested fix
-      const sleepFor = Math.min(interval, remainingMs() / 1000);
+      const sleepFor = Math.min(interval, remainingMs() / 1000);
       if (sleepFor <= 0) break;
-      sleepSeconds(sleepFor);
+      await new Promise((resolve) => setTimeout(resolve, sleepFor * 1000));
       const poll = runSandboxList();
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
const sleepFor = Math.min(interval, remainingMs() / 1000);
if (sleepFor <= 0) break;
spawnSync("sleep", [String(sleepFor)]);
sleepSeconds(sleepFor);
const poll = runSandboxList();
const sleepFor = Math.min(interval, remainingMs() / 1000);
if (sleepFor <= 0) break;
await new Promise((resolve) => setTimeout(resolve, sleepFor * 1000));
const poll = runSandboxList();
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/nemoclaw.ts` around lines 1595 - 1598, The loop in sandboxConnect uses
the synchronous busy-spin helper sleepSeconds which burns CPU; replace the
blocking call with an awaited non-blocking delay and await the async poll so the
loop yields the event loop: change sleepSeconds(sleepFor) to an awaited async
timer (e.g. await a Promise-based setTimeout for sleepFor seconds) and ensure
runSandboxList() is awaited (const poll = await runSandboxList()) so
sandboxConnect and runSandboxList are both used asynchronously rather than
blocking the thread.

Comment thread test/runner.test.ts
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (2)
src/lib/onboard.ts (2)

2606-2614: ⚠️ Potential issue | 🟠 Major

Still bypassing the subprocess env hardening here.

inheritFullEnv: true re-opens the exact secret leakage path this refactor is trying to close. The earlier fix still applies: run the installer with the filtered env and allowlist NEMOCLAW_NON_INTERACTIVE in src/lib/subprocess-env.ts instead.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lib/onboard.ts` around lines 2606 - 2614, The runFile invocation that
executes install-openshell.sh currently sets inheritFullEnv: true which
re-introduces secret leakage; change the call in onboard.ts (the runFile(...)
that passes SCRIPTS and ROOT) to use the filtered environment instead of
inheritFullEnv by removing inheritFullEnv: true and ensuring runFile is invoked
with the filtered env helper from src/lib/subprocess-env.ts; also add
NEMOCLAW_NON_INTERACTIVE to the allowlist in src/lib/subprocess-env.ts so that
this variable is preserved for the installer while all other env entries remain
filtered.

5316-5326: ⚠️ Potential issue | 🟠 Major

Keep the install-Ollama branch aligned with the existing WSL path.

This branch still binds Ollama to 0.0.0.0 and always starts the auth proxy, while the already-installed Ollama branch skips both on WSL. That reintroduces the WSL startup regression already called out earlier.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lib/onboard.ts` around lines 5316 - 5326, The new install-Ollama branch
unconditionally binds to 0.0.0.0 and always starts the auth proxy, reintroducing
the WSL regression; update the installation path in the onboarding flow so that
when running on WSL (detect existing WSL path/flag consistent with the
already-installed branch) you skip binding Ollama to 0.0.0.0 and do not call
startOllamaAuthProxy(); instead mirror the logic used by the already-installed
Ollama branch (use its WSL-aware condition) around
startDetachedOllamaServe(`0.0.0.0:${OLLAMA_PORT}`) and startOllamaAuthProxy()
and ensure OLLAMA_PORT/OLLAMA_PROXY_PORT behavior matches the existing WSL
branch.
🧹 Nitpick comments (2)
test/runner.test.ts (1)

908-925: Avoid pinning deploy regression guards to exact source spelling.

These new toContain(...) checks lock src/lib/deploy.ts to specific array literal formatting and inline command layout. Small refactors like extracting args into constants or reflowing the array will fail the suite without changing behavior.

♻️ Less brittle option
-      expect(tsSrc).toContain('commandArgs: ["openshell", "sandbox", "connect", sandboxName]');
+      expect(tsSrc).toMatch(
+        /commandArgs:\s*\[\s*"openshell",\s*"sandbox",\s*"connect",\s*sandboxName\s*\]/,
+      );

Also applies to: 955-955

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/runner.test.ts` around lines 908 - 925, The test is brittle because it
asserts exact array/spacing formatting in src/lib/deploy.ts (e.g., the
commandArgs array and literal `"--exclude src"`); update the tests to match
semantics not exact source formatting by using looser checks: replace exact
toContain('"commandArgs: ["openshell", "sandbox", "connect", sandboxName]') with
assertions that the file includes the tokens "openshell", "sandbox", "connect"
and sandboxName (or a regex that matches those elements in any
array/order/spacing), and replace the `"--exclude src"` check with a check that
ensures src is not excluded (e.g., assert absence of the string "--exclude src"
or use a regex/lookaround) while keeping the existing checks for `${rootDir}/`,
"--exclude", "dist", the brevProvider line and the run(["brev","create"...])
intent; apply the same loosening at the similar assertion around line 955.
src/lib/onboard.ts (1)

7180-7185: Reuse getWslHostAddress() here instead of re-parsing hostname -I.

This is now a second copy of the WSL host lookup logic in the same file, so dashboard URL generation can drift from the helper-backed paths.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lib/onboard.ts` around lines 7180 - 7185, The wslAddr assignment
duplicates hostname -I parsing; replace that inline logic with a call to the
existing helper getWslHostAddress() to avoid drifting behavior—i.e., when
isWsl() is true call getWslHostAddress() and assign its result (with same null
handling) instead of re-running runCapture and split logic; update any local
assumptions about trimming/format (getWslHostAddress returns the canonical host
or null) so dashboard URL generation uses the shared helper.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/lib/onboard.ts`:
- Around line 2280-2288: installOllamaViaOfficialScript currently calls
run(["curl", ...]) and run(["sh", ...]) which may exit the process and prevent
the finally cleanup; change these to non-fatal subprocess invocations (e.g., use
spawnSync/execSync or run with a non-fatal option) that return status/errors
instead of exiting, check their exit code/err for failure, and only then
handle/throw an error after the finally block so fs.rmSync(tempDir, ...) in the
finally always runs; keep references to tempDir and installerPath and ensure any
thrown error preserves the original failure details for logging/propagation.

In `@src/lib/preflight.test.ts`:
- Around line 315-320: The test's runCaptureImpl uses String(command) which
turns string[] into comma-joined values and breaks argv-style assertions; update
runCaptureImpl to normalize command by checking if command is an array and
joining with a space (e.g., Array.isArray(command) ? command.join(' ') :
String(command')) before comparing against expected strings, and apply the same
normalization in the other occurrences referenced (around the runCaptureImpl
usages at the other listed lines) so assertions like "docker run" match
correctly.

---

Duplicate comments:
In `@src/lib/onboard.ts`:
- Around line 2606-2614: The runFile invocation that executes
install-openshell.sh currently sets inheritFullEnv: true which re-introduces
secret leakage; change the call in onboard.ts (the runFile(...) that passes
SCRIPTS and ROOT) to use the filtered environment instead of inheritFullEnv by
removing inheritFullEnv: true and ensuring runFile is invoked with the filtered
env helper from src/lib/subprocess-env.ts; also add NEMOCLAW_NON_INTERACTIVE to
the allowlist in src/lib/subprocess-env.ts so that this variable is preserved
for the installer while all other env entries remain filtered.
- Around line 5316-5326: The new install-Ollama branch unconditionally binds to
0.0.0.0 and always starts the auth proxy, reintroducing the WSL regression;
update the installation path in the onboarding flow so that when running on WSL
(detect existing WSL path/flag consistent with the already-installed branch) you
skip binding Ollama to 0.0.0.0 and do not call startOllamaAuthProxy(); instead
mirror the logic used by the already-installed Ollama branch (use its WSL-aware
condition) around startDetachedOllamaServe(`0.0.0.0:${OLLAMA_PORT}`) and
startOllamaAuthProxy() and ensure OLLAMA_PORT/OLLAMA_PROXY_PORT behavior matches
the existing WSL branch.

---

Nitpick comments:
In `@src/lib/onboard.ts`:
- Around line 7180-7185: The wslAddr assignment duplicates hostname -I parsing;
replace that inline logic with a call to the existing helper getWslHostAddress()
to avoid drifting behavior—i.e., when isWsl() is true call getWslHostAddress()
and assign its result (with same null handling) instead of re-running runCapture
and split logic; update any local assumptions about trimming/format
(getWslHostAddress returns the canonical host or null) so dashboard URL
generation uses the shared helper.

In `@test/runner.test.ts`:
- Around line 908-925: The test is brittle because it asserts exact
array/spacing formatting in src/lib/deploy.ts (e.g., the commandArgs array and
literal `"--exclude src"`); update the tests to match semantics not exact source
formatting by using looser checks: replace exact toContain('"commandArgs:
["openshell", "sandbox", "connect", sandboxName]') with assertions that the file
includes the tokens "openshell", "sandbox", "connect" and sandboxName (or a
regex that matches those elements in any array/order/spacing), and replace the
`"--exclude src"` check with a check that ensures src is not excluded (e.g.,
assert absence of the string "--exclude src" or use a regex/lookaround) while
keeping the existing checks for `${rootDir}/`, "--exclude", "dist", the
brevProvider line and the run(["brev","create"...]) intent; apply the same
loosening at the similar assertion around line 955.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 8a815183-d6b0-444c-a46e-f4c7ae68b2d1

📥 Commits

Reviewing files that changed from the base of the PR and between 1a1e72e and 2971418.

📒 Files selected for processing (11)
  • src/lib/config-io.ts
  • src/lib/credentials.ts
  • src/lib/deploy.test.ts
  • src/lib/http-probe.ts
  • src/lib/onboard.ts
  • src/lib/preflight.test.ts
  • src/lib/sandbox-channels.ts
  • src/lib/sandbox-state.ts
  • test/cli.test.ts
  • test/onboard.test.ts
  • test/runner.test.ts
✅ Files skipped from review due to trivial changes (3)
  • src/lib/sandbox-channels.ts
  • src/lib/sandbox-state.ts
  • test/onboard.test.ts
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/lib/deploy.test.ts
  • test/cli.test.ts

Comment thread src/lib/onboard.ts Outdated
Comment thread src/lib/preflight.test.ts
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

🧹 Nitpick comments (2)
src/lib/runner.ts (1)

41-56: Reuse buildEnvForSubprocess() here instead of re-implementing it.

This is effectively a second copy of the helper that now lives in src/lib/subprocess-env.ts. Because the env allowlist is security-sensitive and mirrored in nemoclaw/src/lib/subprocess-env.ts too, keeping another local implementation here makes future drift much more likely.

♻️ Suggested refactor
-const { buildSubprocessEnv } = require("./subprocess-env.js");
+const { buildEnvForSubprocess } = require("./subprocess-env.js");
...
-function buildRunnerEnv(
-  extraEnv: NodeJS.ProcessEnv | undefined,
-  inheritFullEnv = false,
-): NodeJS.ProcessEnv {
-  if (inheritFullEnv) {
-    return { ...process.env, ...extraEnv };
-  }
-
-  const normalizedExtraEnv: Record<string, string> = {};
-  for (const [key, value] of Object.entries(extraEnv || {})) {
-    if (value !== undefined) {
-      normalizedExtraEnv[key] = value;
-    }
-  }
-  return buildSubprocessEnv(normalizedExtraEnv);
-}
+const buildRunnerEnv = buildEnvForSubprocess;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lib/runner.ts` around lines 41 - 56, The buildRunnerEnv function
duplicates environment normalization logic; replace the local normalization and
the call to buildSubprocessEnv with the shared helper buildEnvForSubprocess to
avoid drift: import buildEnvForSubprocess, preserve existing inheritFullEnv
behavior (when true return { ...process.env, ...extraEnv }), and otherwise
return buildEnvForSubprocess(extraEnv); remove the manual normalization loop and
any unused buildSubprocessEnv references and update imports accordingly.
src/nemoclaw.ts (1)

239-261: Consider a shared quiet-capture helper for these new probe callsites.

The same runFile(..., { encoding, stdio, ignoreError, suppressOutput }) bundle is repeated in several places now. A tiny helper for “best-effort captured probe” calls would make future runner-policy changes less error-prone.

Also applies to: 1335-1411, 3112-3132, 3494-3576

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/nemoclaw.ts` around lines 239 - 261, Multiple callsites invoke
runFile(...) with the same "quiet capture" options object (encoding, stdio,
timeout, ignoreError, suppressOutput); extract that repeated bundle into a small
helper function (e.g., runCapturedProbe or runFileQuietCapture) that wraps
runFile(command, args, defaultQuietOptions) and returns the same result; replace
the direct runFile invocations (the one using `openshell-${sandboxName}` and the
other similar probes) to call the new helper so future runner/policy changes
only need to update the single helper; ensure the helper preserves all existing
option keys (encoding, stdio, timeout, ignoreError, suppressOutput) and accepts
overrides for timeout or encoding if needed.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@nemoclaw/src/lib/subprocess-env.ts`:
- Around line 67-82: When inheritFullEnv is false buildEnvForSubprocess
currently drops ALL_PROXY/all_proxy because buildSubprocessEnv's allowlist
excludes them; preserve and forward these proxy vars by ensuring
normalizedExtraEnv includes extraEnv["ALL_PROXY"] and extraEnv["all_proxy"] when
present (or add them to the allowlist inside buildSubprocessEnv) before calling
buildSubprocessEnv; update the logic around buildEnvForSubprocess,
normalizedExtraEnv, and buildSubprocessEnv so curl and other tools will receive
ALL_PROXY/all_proxy even when not inheriting the full process.env.

In `@src/lib/deploy.ts`:
- Around line 425-434: The code currently trusts ssh echo $HOME succeeded if
remoteHomeResult.status === 0 but doesn't verify stdout; ensure you validate the
captured remoteHome from readCommandOutput(remoteHomeResult, "stdout").trim() is
non-empty before using it to build remoteDir. If remoteHome is empty (or only
whitespace) call fail(...) with a clear message (same pattern as the existing
error path) instead of proceeding, so remoteDir = `${remoteHome}/nemoclaw` is
never constructed from an empty string; update the block around run(...),
readCommandOutput(...), remoteHomeResult, remoteHome and remoteDir to perform
this guard.

In `@src/lib/onboard.ts`:
- Around line 4712-4713: The probe currently treats any non-empty runCapture()
output as success; change the logic so hasOllama is set by the command's exit
status instead of truthiness of output. Replace the existing hasOllama
assignment that uses runCapture(["ollama","--version"], { ignoreError: true })
with a call that yields the child process exit code (or use run() if available)
and set hasOllama = (exitCode === 0); ensure you still pass ignoreError to avoid
throwing, and update any downstream logic (e.g., install-ollama gating) to rely
on hasOllama based on the exit status.
- Around line 2666-2675: The helper listGatewayDockerVolumes currently calls
runCapture(..., { ignoreError: true }) and then splits whatever text is
returned, which can include Docker stderr like "Cannot connect to the Docker
daemon"; update listGatewayDockerVolumes to treat errors explicitly: either call
runCapture without ignoreError (so it throws on failure) and catch the exception
to return an empty array, or use a runCapture variant that returns an exit code
and only parse stdout when exitCode === 0; additionally filter parsed names to
those matching the expected prefix `openshell-cluster-${GATEWAY_NAME}` (or a
strict volume-name regex) before returning to avoid feeding error text into
docker volume removal.

In `@src/lib/sandbox-state.ts`:
- Around line 656-660: The remote shell commands (constructed into fullCheckCmd,
tarCmd, rmCmd, and extractCmd and invoked via runStateCommand with sshArgs)
interpolate writableDir and existingDirs/localDirs directly, allowing
shell-expansion attacks; update each command construction to quote or escape
every remote operand using the project's shell-quote.ts or remote-script.ts
helper (e.g., wrap writableDir and each entry of existingDirs/localDirs with the
helper's quoting function or build the remote script with remote-script.ts so
arguments are passed as safe tokens) and ensure the resulting string passed to
runStateCommand/sshArgs contains only safely quoted tokens rather than raw
interpolations.
- Around line 45-50: runStateCommand currently forwards opts directly to
spawnResult causing child processes to inherit the full process.env; change
runStateCommand to call buildSubprocessEnv() (imported from subprocess-env.ts)
and ensure opts.env is set to that allowlisted env before calling spawnResult,
or require callers to supply opts.env and throw if absent. Also audit the use of
existingDirs where it’s interpolated into tarCmd and switch to a safe
construction (e.g., escape each dirname or pass paths as argv rather than a
shell string) to avoid shell injection when building tar/ssh commands. Ensure
references: runStateCommand, buildSubprocessEnv, spawnResult, existingDirs,
tarCmd.

In `@src/lib/subprocess-env.ts`:
- Around line 67-82: The PROXY allowlist is missing the ALL_PROXY and all_proxy
variables causing buildEnvForSubprocess/buildSubprocessEnv to omit curl fallback
proxy settings; update the PROXY constant to include "ALL_PROXY" and "all_proxy"
so environments used by http-probe.ts (curl invocations) pass through those
variables, then run/update any tests referencing PROXY or subprocess env
behavior to cover both uppercase and lowercase all_proxy.

---

Nitpick comments:
In `@src/lib/runner.ts`:
- Around line 41-56: The buildRunnerEnv function duplicates environment
normalization logic; replace the local normalization and the call to
buildSubprocessEnv with the shared helper buildEnvForSubprocess to avoid drift:
import buildEnvForSubprocess, preserve existing inheritFullEnv behavior (when
true return { ...process.env, ...extraEnv }), and otherwise return
buildEnvForSubprocess(extraEnv); remove the manual normalization loop and any
unused buildSubprocessEnv references and update imports accordingly.

In `@src/nemoclaw.ts`:
- Around line 239-261: Multiple callsites invoke runFile(...) with the same
"quiet capture" options object (encoding, stdio, timeout, ignoreError,
suppressOutput); extract that repeated bundle into a small helper function
(e.g., runCapturedProbe or runFileQuietCapture) that wraps runFile(command,
args, defaultQuietOptions) and returns the same result; replace the direct
runFile invocations (the one using `openshell-${sandboxName}` and the other
similar probes) to call the new helper so future runner/policy changes only need
to update the single helper; ensure the helper preserves all existing option
keys (encoding, stdio, timeout, ignoreError, suppressOutput) and accepts
overrides for timeout or encoding if needed.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: c267cb73-3a4a-493c-9680-7bc4efd7a298

📥 Commits

Reviewing files that changed from the base of the PR and between 2971418 and e2085be.

📒 Files selected for processing (20)
  • nemoclaw/src/lib/subprocess-env.ts
  • nemoclaw/src/register.test.ts
  • src/lib/config-io.ts
  • src/lib/debug.ts
  • src/lib/deploy.ts
  • src/lib/http-probe.ts
  • src/lib/onboard.ts
  • src/lib/openshell.ts
  • src/lib/preflight.test.ts
  • src/lib/preflight.ts
  • src/lib/remote-script.ts
  • src/lib/runner.ts
  • src/lib/sandbox-create-stream.test.ts
  • src/lib/sandbox-create-stream.ts
  • src/lib/sandbox-state.ts
  • src/lib/shields.ts
  • src/lib/subprocess-env.ts
  • src/nemoclaw.ts
  • test/onboard-selection.test.ts
  • test/runner.test.ts
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/lib/sandbox-create-stream.ts
  • nemoclaw/src/register.test.ts

Comment thread nemoclaw/src/lib/subprocess-env.ts
Comment thread src/lib/deploy.ts
Comment thread src/lib/onboard.ts Outdated
Comment thread src/lib/onboard.ts Outdated
Comment thread src/lib/sandbox-state.ts Outdated
Comment thread src/lib/sandbox-state.ts
Comment thread src/lib/subprocess-env.ts
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/lib/sandbox-state.ts (1)

796-800: ⚠️ Potential issue | 🔴 Critical

Reject unsafe manifest paths before using localDirs.

localDirs comes straight from rebuild-manifest.json, and the current existsSync(path.join(backupPath, d)) filter at Line 776 still accepts values like "", ".", absolute paths, and ../... escapes. Feeding those into tar -C here can read host files outside the backup directory, and the same list later drives remote rm -rf and extraction. Please validate that every entry is non-empty and stays strictly within backupPath before using it.

🛡️ Suggested guard
   const localDirs = manifest.stateDirs.filter((d) => existsSync(path.join(backupPath, d)));
+  const unsafeLocalDirs = localDirs.filter((d) => {
+    const resolved = path.resolve(backupPath, d);
+    return d.length === 0 || resolved === backupPath || path.isAbsolute(d) || !isWithinRoot(resolved, backupPath);
+  });
+  if (unsafeLocalDirs.length > 0) {
+    _log(`FAILED: backup manifest contains unsafe state dirs: [${unsafeLocalDirs.join(",")}]`);
+    return { success: false, restoredDirs, failedDirs: [...localDirs] };
+  }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lib/sandbox-state.ts` around lines 796 - 800, localDirs from
rebuild-manifest.json must be strictly validated before being passed to
runStateCommand("tar", ...) or later removal/extraction; replace the shallow
existsSync filter with checks that each entry is non-empty, not "." or "..", not
an absolute path (use path.isAbsolute), and that path.resolve(backupPath, entry)
startsWith the resolved backupPath plus path.sep (to prevent ../ escapes), only
then include it in localDirs; apply the same validated list wherever localDirs
is used (e.g., tar invocation, rm -rf, extraction) and log/reject any invalid
entries.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/lib/onboard.ts`:
- Around line 4714-4720: The code currently shells out via "sh -lc" to detect
ollama, which fails on Windows; update the probe to invoke the ollama binary
directly using the runCapture helper: call runCapture(["ollama", "--version"], {
ignoreError: true }) (or another innocuous CLI flag) and set hasOllama based on
the non-empty/ successful result (e.g., const hasOllama = (await
runCapture(...)) !== ""); replace the existing runCapture(["sh","-lc",...])
usage so detection no longer depends on a POSIX shell.

In `@src/lib/sandbox-state.ts`:
- Around line 829-854: Inside the sshResult.status === 0 block, if the chown
step (using resolveOpenshell() + runStateCommand(... "chown" ...)) fails, remove
the freshly extracted directories (the same paths in localDirs under
writableDir) before pushing them into failedDirs so the sandbox is not left in a
half-restored state; implement this by invoking runStateCommand (or equivalent
cleanup logic) to rm -rf each extracted path when ownershipOk becomes false,
then add those paths to failedDirs instead of leaving them on disk.

---

Outside diff comments:
In `@src/lib/sandbox-state.ts`:
- Around line 796-800: localDirs from rebuild-manifest.json must be strictly
validated before being passed to runStateCommand("tar", ...) or later
removal/extraction; replace the shallow existsSync filter with checks that each
entry is non-empty, not "." or "..", not an absolute path (use path.isAbsolute),
and that path.resolve(backupPath, entry) startsWith the resolved backupPath plus
path.sep (to prevent ../ escapes), only then include it in localDirs; apply the
same validated list wherever localDirs is used (e.g., tar invocation, rm -rf,
extraction) and log/reject any invalid entries.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 9ea67e27-3f3e-48e0-a054-9abc4cf2649f

📥 Commits

Reviewing files that changed from the base of the PR and between 9a48c2f and 7301ada.

📒 Files selected for processing (6)
  • nemoclaw/src/lib/subprocess-env.ts
  • src/lib/deploy.ts
  • src/lib/http-probe.test.ts
  • src/lib/onboard.ts
  • src/lib/sandbox-state.ts
  • src/lib/subprocess-env.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/lib/deploy.ts

Comment thread src/lib/onboard.ts Outdated
Comment thread src/lib/sandbox-state.ts
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/lib/sandbox-state.ts (2)

800-825: ⚠️ Potential issue | 🟠 Major

Validate manifest paths before using them in tar and rm.

formatShellToken() only protects the shell surface. It does not stop "", ".", or "../foo". Here, localDirs comes from rebuild-manifest.json, so a tampered snapshot can make Line 800 archive files outside backupPath and Line 812 remove paths outside writableDir.

🔒 Proposed fix
+    const invalidLocalDirs = localDirs.filter((dir) => {
+      const normalized = path.normalize(dir);
+      return (
+        !dir ||
+        normalized === "." ||
+        normalized === ".." ||
+        path.isAbsolute(dir) ||
+        !isWithinRoot(path.resolve(backupPath, dir), backupPath)
+      );
+    });
+    if (invalidLocalDirs.length > 0) {
+      _log(`FAILED: invalid manifest paths: [${invalidLocalDirs.join(",")}]`);
+      return { success: false, restoredDirs, failedDirs: [...localDirs] };
+    }
+
     const tarResult = runStateCommand("tar", ["-cf", "-", "-C", backupPath, ...localDirs], {
       stdio: ["ignore", "pipe", "pipe"],
       timeout: 60000,
       maxBuffer: 256 * 1024 * 1024,
     });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lib/sandbox-state.ts` around lines 800 - 825, LocalDirs from
rebuild-manifest.json is untrusted and formatShellToken() doesn't prevent
entries like "" , "." , ".." , "../foo" or absolute paths; before using
localDirs in the tar invocation or in buildRemoveDirsCommand (and the subsequent
ssh rm/tar commands invoked via runStateCommand), validate and/or sanitize each
entry: reject empty strings, reject absolute paths (leading "/" on POSIX or
drive letters on Windows), normalize with path.posix.normalize and ensure the
normalized path does not start with ".." or contain "/../", and only allow
relative paths that refer inside the intended directory; if any entry fails
validation, either filter it out and record it as failedDirs or abort the
restore with an error. Ensure the same validation is applied both where
localDirs is used to build the tar arguments (the tar call around
runStateCommand("tar", [...localDirs])) and where
buildRemoveDirsCommand(writableDir, localDirs) is created/used so rmResult
cannot remove outside writableDir.

394-403: ⚠️ Potential issue | 🟠 Major

Clean targetDir when extraction fails.

tar -xf can leave partially extracted files behind before returning non-zero. From there, backupSandboxState() still sanitizes the directory and writes a manifest, so a corrupted snapshot can be persisted and later restored. Line 400 should do the same best-effort cleanup you already do for the post-extraction symlink audit.

🧹 Proposed fix
   if (extractResult.status !== 0) {
+    try {
+      rmSync(targetDir, { recursive: true, force: true });
+      mkdirSync(targetDir, { recursive: true, mode: 0o700 });
+    } catch {
+      /* best effort cleanup */
+    }
     return {
       success: false,
-      error: `tar extraction failed (exit ${extractResult.status}): ${(extractResult.stderr?.toString() || "").substring(0, 200)}`,
+      error: `tar extraction failed (exit ${extractResult.status}): ${resultStderrText(extractResult).substring(0, 200)}`,
     };
   }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lib/sandbox-state.ts` around lines 394 - 403, When tar extraction fails
(check extractResult from runStateCommand), perform the same best-effort cleanup
you do after extraction: remove any partially-extracted files under targetDir
(e.g., fs.rmSync(targetDir, { recursive: true, force: true }) or equivalent
recursive unlink/readdir logic), recreate or ensure an empty targetDir, and
swallow/log any errors (do not throw) before returning the failure object so
backupSandboxState() cannot persist a corrupted snapshot; apply this cleanup in
the block that handles extractResult.status !== 0 and mirror the error-handling
pattern used by the post-extraction symlink-audit code.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/lib/deploy.ts`:
- Around line 329-337: The current logic treats any non-zero result from running
brev ls as "instance does not exist", which can misclassify auth/network
failures; update the handling of brevLsResult (returned from run([...]) when
calling brev ls) to first check brevLsResult.status and if it indicates failure
(non-zero) fail fast — e.g., surface/throw an error or return the failure
upstream with a clear message — instead of inspecting stdout/stderr and setting
exists; keep using readCommandOutput and outputHasExactLine for the success path
(when brevLsResult.status === 0) to set exists for the given name.

In `@src/lib/onboard.ts`:
- Around line 5261-5267: The WSL branch currently passes undefined to
startDetachedOllamaServe which lets ollama serve pick its default port and
ignores the configured OLLAMA_PORT; change the call to always pass an explicit
host:port string built from OLLAMA_PORT — e.g. use
startDetachedOllamaServe(isWsl() ? `127.0.0.1:${OLLAMA_PORT}` :
`0.0.0.0:${OLLAMA_PORT}`) so WSL binds to 127.0.0.1 with the configured port;
make the same change for the other occurrence around startDetachedOllamaServe at
the later block (the one referenced in the comment).
- Around line 2275-2278: startDetachedOllamaServe replaces the child env with a
filtered one that only sets OLLAMA_HOST, dropping any existing OLLAMA_*
settings; fix by building the child env from the parent process.env and then
overriding/injecting OLLAMA_HOST when provided. Update startDetachedOllamaServe
to create an env object based on process.env (or a shallow copy) and then set
env.OLLAMA_HOST = hostBinding when hostBinding is defined, and pass that env
into spawnDetachedProcess (reference startDetachedOllamaServe and
spawnDetachedProcess).
- Around line 2428-2435: The subprocess call to runFile("ollama", ["pull",
model]) no longer inherits OLLAMA_HOST due to env scrubbing; modify the call in
src/lib/onboard.ts (the runFile invocation) to pass an env object that forwards
the Ollama connection variable (e.g. env: { ...process.env, OLLAMA_HOST:
process.env.OLLAMA_HOST }) or at minimum include OLLAMA_HOST in the options.env
so the subprocess uses the configured host/port instead of falling back to
127.0.0.1:11434.

In `@src/lib/sandbox-state.ts`:
- Around line 665-670: The pipeline that builds fullCheckCmd currently masks
upstream failures because only the last stage (awk) determines the exit code;
update fullCheckCmd to enable pipefail so remote errors are propagated—for
example, wrap the combined script with bash -o pipefail -c '... | awk ...' or
prepend "set -o pipefail" in the script string so failures in workspaceGlobCmd
or existCheckCmd surface; adjust the string assigned to fullCheckCmd
(referencing workspaceGlobCmd and existCheckCmd) and ensure runStateCommand
invocation still executes that updated fullCheckCmd so existResult.status
reflects any remote probe failures.

---

Outside diff comments:
In `@src/lib/sandbox-state.ts`:
- Around line 800-825: LocalDirs from rebuild-manifest.json is untrusted and
formatShellToken() doesn't prevent entries like "" , "." , ".." , "../foo" or
absolute paths; before using localDirs in the tar invocation or in
buildRemoveDirsCommand (and the subsequent ssh rm/tar commands invoked via
runStateCommand), validate and/or sanitize each entry: reject empty strings,
reject absolute paths (leading "/" on POSIX or drive letters on Windows),
normalize with path.posix.normalize and ensure the normalized path does not
start with ".." or contain "/../", and only allow relative paths that refer
inside the intended directory; if any entry fails validation, either filter it
out and record it as failedDirs or abort the restore with an error. Ensure the
same validation is applied both where localDirs is used to build the tar
arguments (the tar call around runStateCommand("tar", [...localDirs])) and where
buildRemoveDirsCommand(writableDir, localDirs) is created/used so rmResult
cannot remove outside writableDir.
- Around line 394-403: When tar extraction fails (check extractResult from
runStateCommand), perform the same best-effort cleanup you do after extraction:
remove any partially-extracted files under targetDir (e.g., fs.rmSync(targetDir,
{ recursive: true, force: true }) or equivalent recursive unlink/readdir logic),
recreate or ensure an empty targetDir, and swallow/log any errors (do not throw)
before returning the failure object so backupSandboxState() cannot persist a
corrupted snapshot; apply this cleanup in the block that handles
extractResult.status !== 0 and mirror the error-handling pattern used by the
post-extraction symlink-audit code.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 9b66ab4b-9bbf-4dfd-a74f-acc0c6e32b02

📥 Commits

Reviewing files that changed from the base of the PR and between 9a48c2f and ce39629.

📒 Files selected for processing (6)
  • nemoclaw/src/lib/subprocess-env.ts
  • src/lib/deploy.ts
  • src/lib/http-probe.test.ts
  • src/lib/onboard.ts
  • src/lib/sandbox-state.ts
  • src/lib/subprocess-env.ts
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/lib/subprocess-env.ts
  • nemoclaw/src/lib/subprocess-env.ts

Comment thread src/lib/deploy.ts
Comment thread src/lib/onboard.ts Outdated
Comment thread src/lib/onboard.ts Outdated
Comment thread src/lib/onboard.ts
Comment thread src/lib/sandbox-state.ts
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (1)
src/lib/onboard.ts (1)

4734-4737: ⚠️ Potential issue | 🟠 Major

Check the Ollama probe by exit status, not captured text.

Line 4737 regresses to treating any non-empty runCapture() output as “installed”. With ignoreError: true, failures can still produce text, so this can incorrectly surface the local Ollama option and route users into a broken path.

Proposed fix
-  const hasOllama = runCapture(["ollama", "--version"], { ignoreError: true }) !== "";
+  const hasOllama =
+    runFile("ollama", ["--version"], {
+      ignoreError: true,
+      stdio: ["ignore", "ignore", "ignore"],
+      suppressOutput: true,
+    }).status === 0;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lib/onboard.ts` around lines 4734 - 4737, The current probe sets
hasOllama by testing runCapture(...) !== "" which can misreport success when
ignoreError: true still yields text; change to check the process exit status
instead. Call runCapture(["ollama","--version"], { ignoreError: true }) into a
result variable, then set hasOllama = result.exitCode === 0 (or result.success
=== true depending on the runCapture shape); only fall back to inspecting
result.stdout.trim() if an exitCode/success field is not available. Update the
code around the runCapture invocation and the hasOllama assignment to use the
exit-status check rather than non-empty output.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/lib/onboard.ts`:
- Around line 2303-2319: The curl download and installer execution use run(...)
without a timeout, so both download and install can hang; update the two run
invocations that produce the download and install variables (the run call that
writes installerPath and the run call that runs "sh", installerPath) to include
a reasonable timeout option (e.g., a few minutes) passed in the run options, and
treat timeout results as failures by checking download.error or download.status
and install.error or install.status the same as other errors so the existing
cleanup/error path runs; ensure the timeout value is configurable or documented
and that any timeout error message includes context like "Timed out while
downloading Ollama installer" or "Timed out while running Ollama installer".

In `@src/lib/sandbox-state.ts`:
- Around line 73-75: Reject or sanitize manifest entries before using them in
shell commands: ensure each entry from localDirs (read from
rebuild-manifest.json) is a non-empty relative path (not starting with "/" and
not containing ".." or empty path segments), resolve it against the intended
root and verify the resolved path is inside backupPath and writableDir as
applicable; implement a helper like validateManifestPath(path, root) and call it
before building commands in buildRemoveDirsCommand and wherever packageFiles/tar
operations are created (e.g., the code that assembles tar commands for
packageFiles and the places that use writableDir), and if validation fails,
reject the manifest or skip the entry and log an error.

---

Duplicate comments:
In `@src/lib/onboard.ts`:
- Around line 4734-4737: The current probe sets hasOllama by testing
runCapture(...) !== "" which can misreport success when ignoreError: true still
yields text; change to check the process exit status instead. Call
runCapture(["ollama","--version"], { ignoreError: true }) into a result
variable, then set hasOllama = result.exitCode === 0 (or result.success === true
depending on the runCapture shape); only fall back to inspecting
result.stdout.trim() if an exitCode/success field is not available. Update the
code around the runCapture invocation and the hasOllama assignment to use the
exit-status check rather than non-empty output.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 571da644-1490-41ec-8333-84f92e9a496f

📥 Commits

Reviewing files that changed from the base of the PR and between 9a48c2f and 7ea4eee.

📒 Files selected for processing (7)
  • nemoclaw/src/lib/subprocess-env.ts
  • src/lib/deploy.test.ts
  • src/lib/deploy.ts
  • src/lib/http-probe.test.ts
  • src/lib/onboard.ts
  • src/lib/sandbox-state.ts
  • src/lib/subprocess-env.ts

Comment thread src/lib/onboard.ts Outdated
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/nemoclaw.ts (1)

240-262: ⚠️ Potential issue | 🟠 Major

Don’t disable SSH host verification in the sandbox command helper.

This helper still runs ssh with StrictHostKeyChecking=no and UserKnownHostsFile=/dev/null, so every health probe and auto-recovery command can execute against an unverified endpoint. That undoes the host-key pinning added later in this file and turns a stale/MITM target into arbitrary remote-command execution on the wrong host. Please reuse the same pinned known_hosts flow here instead of opting out of verification.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/nemoclaw.ts` around lines 240 - 262, The SSH helper currently disables
host verification by passing "-o StrictHostKeyChecking=no" and "-o
UserKnownHostsFile=/dev/null" to runFile; change it to reuse the pinned
known_hosts flow used elsewhere in this module instead: remove the
StrictHostKeyChecking and /dev/null options, and add the same UserKnownHostsFile
and any related SSH options/known-hosts setup (the pinned known_hosts path and
flags used elsewhere in this file) so runFile("ssh", [...,
`openshell-${sandboxName}`, command], ...) uses the pinned known_hosts rather
than disabling verification; keep other options like ConnectTimeout and LogLevel
intact.
♻️ Duplicate comments (1)
src/lib/onboard.ts (1)

4760-4763: ⚠️ Potential issue | 🟠 Major

Use the probe exit status for hasOllama.

This regresses to the same false-positive pattern as before: with ignoreError: true, a failed probe can still return stderr text, making hasOllama truthy and hiding the install-ollama path.

Proposed fix
-  const hasOllama = runCapture(["ollama", "--version"], { ignoreError: true }) !== "";
+  const hasOllama =
+    runFile("ollama", ["--version"], {
+      ignoreError: true,
+      stdio: "ignore",
+      suppressOutput: true,
+    }).status === 0;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lib/onboard.ts` around lines 4760 - 4763, The current probe sets
hasOllama by checking runCapture(..., { ignoreError: true }) !== "" which can be
truthy from stderr even on failure; change the probe to determine success from
the child process exit status instead of output: call runCapture (or the
equivalent runner) with ignoreError and inspect the returned
exit/exitCode/returnCode field (or use run() and catch errors) and set hasOllama
= (exitCode === 0); update the hasOllama assignment near the runCapture call so
the install-ollama path is taken when the probe fails.
🧹 Nitpick comments (2)
src/lib/remote-script.ts (1)

111-113: Consider whether login shell is always needed for docker exec.

Using sh -lc invokes a login shell, which sources profile scripts. This is appropriate for commands that need the full user environment, but may add overhead or unexpected behavior in some contexts.

If there are use cases where a plain sh -c would suffice, consider adding an optional login?: boolean parameter. However, this is a minor observation—the current behavior is safe and consistent.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lib/remote-script.ts` around lines 111 - 113, The function
buildDockerExecScriptArgs currently always uses a login shell ("sh -lc"); make
it configurable by adding an optional parameter login?: boolean (default true)
so callers can request a plain shell when desired; update
buildDockerExecScriptArgs to choose either "sh", "-lc" (when login===true) or
"sh", "-c" (when login===false) and update any callers of
buildDockerExecScriptArgs to pass the flag where appropriate.
src/lib/skill-install.ts (1)

358-385: Consider deduplicating skill-existence probe logic.

checkExisting and verifyInstall now execute the same probe script and parse the same result. A tiny shared helper would reduce drift risk.

♻️ Optional refactor sketch
+function skillExists(ctx: SshContext, uploadDir: string): boolean {
+  const result = sshExec(
+    ctx,
+    buildShellCommand({
+      steps: [
+        { commandArgs: ["test", "-f", `${uploadDir}/SKILL.md`] },
+        { command: "echo EXISTS" },
+      ],
+    }),
+  );
+  return result !== null && result.stdout === "EXISTS";
+}
+
 export function checkExisting(ctx: SshContext, paths: SkillPaths): boolean {
-  const result = sshExec(
-    ctx,
-    buildShellCommand({
-      steps: [
-        { commandArgs: ["test", "-f", `${paths.uploadDir}/SKILL.md`] },
-        { command: "echo EXISTS" },
-      ],
-    }),
-  );
-  return result !== null && result.stdout === "EXISTS";
+  return skillExists(ctx, paths.uploadDir);
 }
 
 export function verifyInstall(ctx: SshContext, paths: SkillPaths): boolean {
-  const result = sshExec(
-    ctx,
-    buildShellCommand({
-      steps: [
-        { commandArgs: ["test", "-f", `${paths.uploadDir}/SKILL.md`] },
-        { command: "echo EXISTS" },
-      ],
-    }),
-  );
-  return result !== null && result.stdout === "EXISTS";
+  return skillExists(ctx, paths.uploadDir);
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lib/skill-install.ts` around lines 358 - 385, Both checkExisting and
verifyInstall run identical SSH probe logic; extract the duplicate into a small
shared helper (e.g., a function like probeSkillExists or skillFileExists) that
accepts (ctx: SshContext, paths: SkillPaths) and returns the boolean, then have
checkExisting and verifyInstall call that helper; update references to use the
helper and keep the same sshExec/buildShellCommand steps and stdout comparison
(result !== null && result.stdout === "EXISTS") to preserve behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/lib/onboard.ts`:
- Line 2509: The detached warmup is not inheriting custom
OLLAMA_HOST/OLLAMA_PORT, so change startOllamaWarmup to forward the current
OLLAMA_HOST and OLLAMA_PORT into the detached process (via the env override
passed to runDetachedFile); locate the startOllamaWarmup implementation and
ensure it either accepts an env override or builds one from process.env and
passes that map into runDetachedFile(), so the warmup targets the same daemon as
ollama serve/pull.

---

Outside diff comments:
In `@src/nemoclaw.ts`:
- Around line 240-262: The SSH helper currently disables host verification by
passing "-o StrictHostKeyChecking=no" and "-o UserKnownHostsFile=/dev/null" to
runFile; change it to reuse the pinned known_hosts flow used elsewhere in this
module instead: remove the StrictHostKeyChecking and /dev/null options, and add
the same UserKnownHostsFile and any related SSH options/known-hosts setup (the
pinned known_hosts path and flags used elsewhere in this file) so runFile("ssh",
[..., `openshell-${sandboxName}`, command], ...) uses the pinned known_hosts
rather than disabling verification; keep other options like ConnectTimeout and
LogLevel intact.

---

Duplicate comments:
In `@src/lib/onboard.ts`:
- Around line 4760-4763: The current probe sets hasOllama by checking
runCapture(..., { ignoreError: true }) !== "" which can be truthy from stderr
even on failure; change the probe to determine success from the child process
exit status instead of output: call runCapture (or the equivalent runner) with
ignoreError and inspect the returned exit/exitCode/returnCode field (or use
run() and catch errors) and set hasOllama = (exitCode === 0); update the
hasOllama assignment near the runCapture call so the install-ollama path is
taken when the probe fails.

---

Nitpick comments:
In `@src/lib/remote-script.ts`:
- Around line 111-113: The function buildDockerExecScriptArgs currently always
uses a login shell ("sh -lc"); make it configurable by adding an optional
parameter login?: boolean (default true) so callers can request a plain shell
when desired; update buildDockerExecScriptArgs to choose either "sh", "-lc"
(when login===true) or "sh", "-c" (when login===false) and update any callers of
buildDockerExecScriptArgs to pass the flag where appropriate.

In `@src/lib/skill-install.ts`:
- Around line 358-385: Both checkExisting and verifyInstall run identical SSH
probe logic; extract the duplicate into a small shared helper (e.g., a function
like probeSkillExists or skillFileExists) that accepts (ctx: SshContext, paths:
SkillPaths) and returns the boolean, then have checkExisting and verifyInstall
call that helper; update references to use the helper and keep the same
sshExec/buildShellCommand steps and stdout comparison (result !== null &&
result.stdout === "EXISTS") to preserve behavior.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 07a0a4a1-3bcc-499f-81c0-49563c323c5f

📥 Commits

Reviewing files that changed from the base of the PR and between ef06bb4 and 2eed6ab.

📒 Files selected for processing (24)
  • agents/hermes/manifest.yaml
  • agents/openclaw/manifest.yaml
  • src/lib/agent-defs.test.ts
  • src/lib/agent-defs.ts
  • src/lib/agent-onboard.test.ts
  • src/lib/agent-runtime.test.ts
  • src/lib/agent-runtime.ts
  • src/lib/local-inference.test.ts
  • src/lib/local-inference.ts
  • src/lib/onboard.ts
  • src/lib/preflight.test.ts
  • src/lib/preflight.ts
  • src/lib/remote-script.test.ts
  • src/lib/remote-script.ts
  • src/lib/runner-argv.test.ts
  • src/lib/runner.ts
  • src/lib/sandbox-state.ts
  • src/lib/shell-quote.test.ts
  • src/lib/shell-quote.ts
  • src/lib/skill-install.ts
  • src/nemoclaw.ts
  • test/onboard-selection.test.ts
  • test/runner.test.ts
  • test/snapshot.test.ts
✅ Files skipped from review due to trivial changes (1)
  • src/lib/shell-quote.test.ts
🚧 Files skipped from review as they are similar to previous changes (3)
  • src/lib/preflight.test.ts
  • src/lib/local-inference.ts
  • src/lib/shell-quote.ts

Comment thread src/lib/onboard.ts Outdated
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (2)
src/lib/onboard.ts (2)

2509-2510: ⚠️ Potential issue | 🟠 Major

Forward the Ollama connection env into detached warmup.

These warmup calls still use startOllamaWarmup(model)'s default detached runner from src/lib/local-inference.ts:343-351, which does not inject OLLAMA_HOST. With the new env scrubbing, any custom OLLAMA_PORT now warms the wrong daemon.

Proposed fix
+function startDetachedOllamaWarmup(model: string): void {
+  startOllamaWarmup(model, "15m", (file, args) =>
+    spawnDetachedProcess(file, [...args], {
+      env: getOllamaProcessEnv({ OLLAMA_HOST: getOllamaClientHost() }),
+    }),
+  );
+}
+
 function prepareOllamaModel(
   model: string,
   installedModels: string[] = [],
 ): ValidationResult | { ok: false; message: string } {
@@
   }
 
   console.log(`  Loading Ollama model: ${model}`);
-  startOllamaWarmup(model);
+  startDetachedOllamaWarmup(model);
   return validateOllamaModel(model);
 }
@@
     ]);
     console.log(`  Priming Ollama model: ${model}`);
-    startOllamaWarmup(model);
+    startDetachedOllamaWarmup(model);
     const probe = validateOllamaModel(model);

Also applies to: 5706-5707

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lib/onboard.ts` around lines 2509 - 2510, The detached warmup launched by
startOllamaWarmup is not receiving the OLLAMA_HOST env (due to env scrubbing),
so custom OLLAMA_PORTs target the wrong daemon; update the call sites (e.g.,
where you call startOllamaWarmup in src/lib/onboard.ts and the similar site at
the other location) to pass the current process environment (or at least
OLLAMA_HOST and OLLAMA_PORT) into the detached runner, and modify the detached
runner in src/lib/local-inference.ts (the default detached runner used by
startOllamaWarmup) to accept and inject those env vars into the child process so
the warmup uses the correct Ollama host:port.

4761-4764: ⚠️ Potential issue | 🟠 Major

Probe ollama by exit status, not captured text.

This still treats any non-empty runCapture() output as “installed”. On failure paths, captured stderr can make hasOllama truthy and hide the install-ollama option even when the binary is missing or broken.

Proposed fix
   // Detect local inference options.
-  // Direct argv probing avoids a shell dependency on Windows, while the
-  // ignore-error capture path collapses missing-binary failures to "".
-  const hasOllama = runCapture(["ollama", "--version"], { ignoreError: true }) !== "";
+  // Direct argv probing avoids a shell dependency on Windows.
+  const hasOllama =
+    runFile("ollama", ["--version"], {
+      ignoreError: true,
+      stdio: "ignore",
+      suppressOutput: true,
+    }).status === 0;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lib/onboard.ts` around lines 4761 - 4764, The check for Ollama should use
the process exit status, not non-empty captured text: replace the current
hasOllama = runCapture(["ollama","--version"], { ignoreError: true }) !== ""
with a call that runs ["ollama","--version"] and inspects the child process exit
code (success === 0) so that stderr output doesn't make hasOllama truthy; locate
the runCapture invocation and change it to use the runner that returns an exit
status (or check a returned .exitCode/.status field) and set hasOllama =
(exitCode === 0).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@src/lib/onboard.ts`:
- Around line 2509-2510: The detached warmup launched by startOllamaWarmup is
not receiving the OLLAMA_HOST env (due to env scrubbing), so custom OLLAMA_PORTs
target the wrong daemon; update the call sites (e.g., where you call
startOllamaWarmup in src/lib/onboard.ts and the similar site at the other
location) to pass the current process environment (or at least OLLAMA_HOST and
OLLAMA_PORT) into the detached runner, and modify the detached runner in
src/lib/local-inference.ts (the default detached runner used by
startOllamaWarmup) to accept and inject those env vars into the child process so
the warmup uses the correct Ollama host:port.
- Around line 4761-4764: The check for Ollama should use the process exit
status, not non-empty captured text: replace the current hasOllama =
runCapture(["ollama","--version"], { ignoreError: true }) !== "" with a call
that runs ["ollama","--version"] and inspects the child process exit code
(success === 0) so that stderr output doesn't make hasOllama truthy; locate the
runCapture invocation and change it to use the runner that returns an exit
status (or check a returned .exitCode/.status field) and set hasOllama =
(exitCode === 0).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: db225110-26aa-4c18-964e-680c067cc47a

📥 Commits

Reviewing files that changed from the base of the PR and between 2eed6ab and 346d03c.

📒 Files selected for processing (1)
  • src/lib/onboard.ts

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/onboard.test.ts`:
- Line 2529: The mock command extraction currently only unwraps when args[1] is
["-lc", "<cmd>"], which misses runner wrappers that use ["-c", "<cmd>"]; update
the guard used where commands are pushed (the commands.push call that checks
Array.isArray(args[1]) && args[1][0] === "-lc") to accept either "-lc" or "-c"
(e.g., check args[1][0] === "-lc" || args[1][0] === "-c") and then extract the
inner command from args[1][1]; apply the same change to each repeated occurrence
mentioned (lines around the commands.push calls at the other locations).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 29e7c159-fd4c-4191-93af-4387a59cfda0

📥 Commits

Reviewing files that changed from the base of the PR and between 3376954 and 91c3a25.

📒 Files selected for processing (3)
  • test/cli.test.ts
  • test/onboard.test.ts
  • vitest.config.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • test/cli.test.ts

Comment thread test/onboard.test.ts Outdated
@jyaunches jyaunches self-assigned this Apr 26, 2026
@cv cv added the v0.0.28 Release target label Apr 27, 2026
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/lib/onboard.ts (1)

4231-4259: ⚠️ Potential issue | 🟠 Major

Gate the Ollama option off on native Windows, or make proxy lifecycle checks platform-independent.

The Ollama option is unconditionally offered when ollama --version succeeds (line ~4250), but the startup path for native Windows (else branch at line ~4780) calls startOllamaAuthProxy(), which uses Unix-only tools (ps, kill, lsof at lines 1682–1846). These tools are unavailable on stock Windows, causing startup to fail with process.exit(1).

The isWsl() guard correctly skips the proxy on WSL2, but it only protects the startup phase. The option itself is offered on native Windows without checking the platform first. Either add process.platform !== "win32" to the option-push condition (matching the pattern used for install options at lines 4273–4279), or refactor proxy lifecycle checks to use platform-neutral APIs.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lib/onboard.ts` around lines 4231 - 4259, The Ollama option is shown when
hasOllama or ollamaRunning is true but the startup path uses
startOllamaAuthProxy which relies on Unix-only tools and will fail on native
Windows; modify the condition that pushes the "ollama" option (the check that
inspects hasOllama/ollamaRunning) to also ensure the platform is not Windows
(e.g., require process.platform !== "win32" or reuse isWsl logic) so the option
isn't offered on native Windows, or alternatively refactor startOllamaAuthProxy
and its lifecycle checks to use platform-neutral APIs; update the branch that
constructs the options array (where options.push({ key: "ollama", ... }))
accordingly to avoid presenting an option you cannot start on Windows.
♻️ Duplicate comments (2)
src/lib/onboard.ts (1)

1949-1950: ⚠️ Potential issue | 🟠 Major

Forward OLLAMA_HOST into the detached warmup path too.

Line 1950 and Line 5177 still call startOllamaWarmup(model) with the default implementation. Per src/lib/local-inference.ts:343-351, that path delegates to runDetachedFile(...) without the getOllamaProcessEnv() override, so non-default OLLAMA_PORT / OLLAMA_HOST setups warm the wrong daemon even though ollama serve and ollama pull were already fixed.

Suggested fix
+function startDetachedOllamaWarmup(model: string): void {
+  startOllamaWarmup(model, "15m", (file, args) =>
+    spawnDetachedProcess(file, [...args], {
+      env: getOllamaProcessEnv({ OLLAMA_HOST: getOllamaClientHost() }),
+    }),
+  );
+}
+
 function prepareOllamaModel(
   model: string,
   installedModels: string[] = [],
 ): ValidationResult | { ok: false; message: string } {
@@
   console.log(`  Loading Ollama model: ${model}`);
-  startOllamaWarmup(model);
+  startDetachedOllamaWarmup(model);
   return validateOllamaModel(model);
 }
@@
     ]);
     console.log(`  Priming Ollama model: ${model}`);
-    startOllamaWarmup(model);
+    startDetachedOllamaWarmup(model);
     const probe = validateOllamaModel(model);

Also applies to: 5176-5177

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lib/onboard.ts` around lines 1949 - 1950, The warmup call
startOllamaWarmup(model) is being invoked without passing the Ollama env
override so detached warmup uses default host/port; modify calls to
startOllamaWarmup (the ones at the shown locations) to forward the environment
produced by getOllamaProcessEnv() (or otherwise pass OLLAMA_HOST/OLLAMA_PORT)
into the detached path so that runDetachedFile(...) receives the correct env;
update startOllamaWarmup signature/usage accordingly so local-inference.ts's
runDetachedFile gets the override when invoked from startOllamaWarmup.
test/onboard.test.ts (1)

2630-2630: ⚠️ Potential issue | 🟡 Minor

Handle both -c and -lc in spawn mock command extraction.

These mocks still only unwrap ["-lc", "<cmd>"]. ["-c", "<cmd>"] will be mis-parsed, making command assertions brittle.

Suggested patch pattern
- commands.push({ command: _n(Array.isArray(args[1]) && args[1][0] === "-lc" ? args[1][1] : args[1]), env: args[2]?.env || null });
+ const spawnArg = args[1];
+ const extracted =
+   Array.isArray(spawnArg) && (spawnArg[0] === "-lc" || spawnArg[0] === "-c")
+     ? spawnArg[1]
+     : spawnArg;
+ commands.push({ command: _n(extracted), env: args[2]?.env || null });

Also applies to: 2738-2738, 2832-2832, 2961-2961, 3402-3402, 3516-3516, 3773-3773, 3896-3896, 4383-4383, 4757-4757, 4887-4887, 5481-5481

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/onboard.test.ts` at line 2630, The spawn-mock extraction currently
assumes args[1] is ["-lc", "<cmd>"] and only unwraps that; change the
command-selection logic used in the commands.push call (the expression
_n(Array.isArray(args[1]) && args[1][0] === "-lc" ? args[1][1] : args[1])) to
detect both "-lc" and "-c" (e.g., check args[1][0] === "-lc" || args[1][0] ===
"-c" or use a set/regex) and return the inner command when either flag is
present while falling back to args[1] otherwise; apply the same fix to the other
identical commands.push/_n usages in this file so tests correctly parse both
"-c" and "-lc" forms and preserve the env extraction (args[2]?.env).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/lib/credentials.ts`:
- Around line 12-15: Change the local require in src/lib/credentials.ts to drop
the compiled suffix so the module loads in source-mode: replace
require("./runner.js") with require("./runner") while keeping the existing
CommonJS comment and eslint-disable line; ensure the imported symbol runCapture
is still destructured from the require call (const { runCapture } =
require("./runner")).

In `@src/lib/preflight.ts`:
- Around line 153-159: The commandExists function ignores AssessHostOpts.env and
falls back to hasExecutable which uses process.env.PATH; update commandExists to
accept and pass an env object through to both the optional commandExistsImpl and
the hasExecutable call so the injected AssessHostOpts.env is used during
detection, and then propagate this env parameter to every commandExists(...)
invocation in this file (including checks for dockerInstalled, nodeInstalled,
systemctlAvailable, package-manager detection, and nvidia-smi) so assessHost()'s
supplied env is preserved end-to-end.

In `@src/nemoclaw.ts`:
- Around line 2240-2251: The ssh-keyscan invocation in the runFile call uses
only realHost but the later SSH connection honors a port from configFile, so
keyscan may target the wrong endpoint; update the runFile("ssh-keyscan", ...)
invocation (and any variables used to construct its args) to include the
effective port from configFile (e.g., use "host:port" or pass the -p flag as
appropriate) so the scanned endpoint matches the SSH connection, and keep
references to realHost, configFile, runFile, knownHostsFile and sandboxName when
making the change.

---

Outside diff comments:
In `@src/lib/onboard.ts`:
- Around line 4231-4259: The Ollama option is shown when hasOllama or
ollamaRunning is true but the startup path uses startOllamaAuthProxy which
relies on Unix-only tools and will fail on native Windows; modify the condition
that pushes the "ollama" option (the check that inspects
hasOllama/ollamaRunning) to also ensure the platform is not Windows (e.g.,
require process.platform !== "win32" or reuse isWsl logic) so the option isn't
offered on native Windows, or alternatively refactor startOllamaAuthProxy and
its lifecycle checks to use platform-neutral APIs; update the branch that
constructs the options array (where options.push({ key: "ollama", ... }))
accordingly to avoid presenting an option you cannot start on Windows.

---

Duplicate comments:
In `@src/lib/onboard.ts`:
- Around line 1949-1950: The warmup call startOllamaWarmup(model) is being
invoked without passing the Ollama env override so detached warmup uses default
host/port; modify calls to startOllamaWarmup (the ones at the shown locations)
to forward the environment produced by getOllamaProcessEnv() (or otherwise pass
OLLAMA_HOST/OLLAMA_PORT) into the detached path so that runDetachedFile(...)
receives the correct env; update startOllamaWarmup signature/usage accordingly
so local-inference.ts's runDetachedFile gets the override when invoked from
startOllamaWarmup.

In `@test/onboard.test.ts`:
- Line 2630: The spawn-mock extraction currently assumes args[1] is ["-lc",
"<cmd>"] and only unwraps that; change the command-selection logic used in the
commands.push call (the expression _n(Array.isArray(args[1]) && args[1][0] ===
"-lc" ? args[1][1] : args[1])) to detect both "-lc" and "-c" (e.g., check
args[1][0] === "-lc" || args[1][0] === "-c" or use a set/regex) and return the
inner command when either flag is present while falling back to args[1]
otherwise; apply the same fix to the other identical commands.push/_n usages in
this file so tests correctly parse both "-c" and "-lc" forms and preserve the
env extraction (args[2]?.env).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 1ca35b5f-a788-4321-8a2f-a4b1f6aa6dfe

📥 Commits

Reviewing files that changed from the base of the PR and between 91c3a25 and 5e76eef.

📒 Files selected for processing (8)
  • src/lib/credentials.ts
  • src/lib/onboard.ts
  • src/lib/preflight.test.ts
  • src/lib/preflight.ts
  • src/lib/sandbox-config.ts
  • src/nemoclaw.ts
  • test/onboard-selection.test.ts
  • test/onboard.test.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/lib/sandbox-config.ts

Comment thread src/lib/credentials.ts
Comment thread src/lib/preflight.ts
Comment thread src/nemoclaw.ts Outdated
@cv cv enabled auto-merge (squash) April 27, 2026 18:02
@jyaunches
Copy link
Copy Markdown
Contributor

🔴 Blocker 1: Ollama proxy re-inlining reverses prior extraction (src/lib/onboard.ts:1616)

PR #2495 (merged) extracted the Ollama auth proxy code into onboard-ollama-proxy.ts as part of the onboard.ts decomposition effort. This PR re-inlines all ~326 lines of that code back into onboard.ts, removing the require("./onboard-ollama-proxy") import and leaving onboard-ollama-proxy.ts as dead code (zero imports remain).

The re-inlined functions include: ensureOllamaAuthProxy, getOllamaProxyToken, persistProxyToken, startOllamaAuthProxy, promptOllamaModel, printOllamaExposureWarning, pullOllamaModel, prepareOllamaModel, plus new helpers like spawnOllamaAuthProxy, killStaleProxy, installOllamaViaOfficialScript, etc.

This is the single largest contributor to onboard.ts growing from 7,053 → 7,480 lines (+427 net).

Suggested fix: Update onboard-ollama-proxy.ts to use spawnChild from process-primitives.ts and buildSubprocessEnv from subprocess-env.ts instead of re-inlining. Keep the require("./onboard-ollama-proxy") import in onboard.ts. This would bring onboard.ts back to ~7,154 lines (net +101, acceptable for argv conversions).

Reference: Issue #2306 (onboard decomposition).

@jyaunches
Copy link
Copy Markdown
Contributor

🔴 Blocker 2: Duplicated env builder (src/lib/runner.ts:47 vs src/lib/subprocess-env.ts:84)

runner.ts defines a private buildRunnerEnv() that is functionally identical to the exported buildEnvForSubprocess() in subprocess-env.ts. Both do the same thing: check inheritFullEnv, normalize extra env entries (strip undefined values), call buildSubprocessEnv(). The only difference is || vs ??.

Other modules (openshell.ts, http-probe.ts, sandbox-state.ts) already import buildEnvForSubprocess from subprocess-env.ts. Having a private duplicate in runner.ts means future allowlist changes need to be applied in two places.

Suggested fix: Replace the private buildRunnerEnv in runner.ts with an import of buildEnvForSubprocess from subprocess-env.ts:

const { buildEnvForSubprocess } = require("./subprocess-env.js");
// Then replace all buildRunnerEnv calls with buildEnvForSubprocess

@jyaunches
Copy link
Copy Markdown
Contributor

🔴 Blocker 3: Duplicated listGatewayDockerVolumes (onboard.ts:2133 and nemoclaw.ts:174)

Two independent implementations of the same Docker volume listing logic exist in both files. They differ slightly (onboard.ts uses run(), nemoclaw.ts uses _runCapture()), but share the same filter logic (query docker volume ls -q --filter, split lines, filter by gateway name prefix).

This is a maintenance hazard — if the volume naming scheme changes, both need updating.

Suggested fix: Extract to a shared module (e.g., src/lib/gateway-volumes.ts) that both files import. Alternatively, add it to an existing shared module like runner.ts or a gateway-specific helper. The removeGatewayDockerVolumes helper in onboard.ts could also live in the shared module.

@jyaunches
Copy link
Copy Markdown
Contributor

🟡 Warning 1: Overly complex remediation string assembly (src/lib/config-io.ts:50–60)

The error remediation messages now use nested layers of buildShellCommand({ command: ... }) wrapping template literals that contain joinShellWords(...) and buildShellAssignment(...). For example:

`    ${buildShellCommand({ command: \`${joinShellWords(["mkdir", "-p", recoveryHome])} && ${buildShellAssignment("HOME", recoveryHome)} nemoclaw onboard\` })}`

Since buildShellCommand({ command }) just returns the command string unchanged (it's a passthrough), the outer wrapper adds complexity with no safety benefit. These are human-readable help messages printed to the terminal, not executed commands.

Suggestion: Simplify back to template strings with shellQuote() for the dynamic path values. The safety-through-structure goal of buildShellCommand is better served in code that actually executes commands, not display text.

@jyaunches
Copy link
Copy Markdown
Contributor

🟡 Warning 2: IIFE inside ternary in sandbox-create-stream.ts:68

The legacy string-command branch uses an immediately-invoked function expression inside a ternary:

const child: StreamableChildProcess = Array.isArray(command)
    ? spawnImpl(command[0], [...command.slice(1)], { ... })
    : (() => {
        const shellCommand = String(command);
        return spawnImpl("bash", ["-lc", shellCommand], { ... });
      })();

A simple if/else would be clearer:

let child: StreamableChildProcess;
if (Array.isArray(command)) {
  child = spawnImpl(command[0], command.slice(1), { cwd: ROOT, env, stdio: ... });
} else {
  child = spawnImpl("bash", ["-lc", String(command)], { cwd: ROOT, env, stdio: ... });
}

@jyaunches
Copy link
Copy Markdown
Contributor

🟡 Warning 3: Plugin switches to execa while CLI centralizes on process-primitives (nemoclaw/src/index.ts:14)

The plugin now uses execaSync from execa for the openshell inference get --json probe, while the rest of the CLI converges on spawnResult from process-primitives.ts. This creates an inconsistency — the PR's goal is to centralize subprocess execution, but the plugin takes a different path.

Both nemoclaw/src/lib/subprocess-env.ts and src/lib/subprocess-env.ts get the same buildEnvForSubprocess addition, suggesting they should share patterns.

Suggestion: Consider using spawnResult here too for consistency. If execa is preferred in the plugin for its promise API or other features, a brief comment explaining the choice would help future readers.

@jyaunches
Copy link
Copy Markdown
Contributor

🟡 Warning 4: Significant complexity increase in debug output capture (src/lib/debug.ts:116–215)

The new bounded-output capture adds ~100 lines of low-level fd/chunk management: readFirstLinesFromFile, readLastLinesFromFile, readBoundedOutput, and runCommandForTransform with manual openSync/readSync/closeSync at specific byte positions.

This is defensive against huge debug outputs (good!), but the first-lines + last-lines approach with CAPTURE_CHUNK_BYTES = 16KB chunks is complex for what is ultimately a "don't OOM on giant debug dumps" guard.

Suggestion: A brief comment explaining why this approach was chosen over simpler alternatives (e.g., head -n 400 && echo "..." && tail -n 400 in shell, or reading the full file and truncating in-memory up to 1MB) would help future readers. If the concern is that stdout/stderr can exceed 1MB for debug commands, that context is worth preserving.

@jyaunches
Copy link
Copy Markdown
Contributor

🟡 Warning 5: Re-introduced polymorphic dispatch in preflight.ts (src/lib/preflight.ts:119–135)

The new RunCaptureLike type accepts both string and readonly string[], and defaultRunCapture dispatches to runCapture (argv) or runCaptureShell (string) based on Array.isArray(). This re-introduces the same polymorphic pattern that this PR explicitly removes from runner.ts:

function defaultRunCapture(command: string | readonly string[], ...) {
  if (Array.isArray(command)) {
    return runCapture(command, ...);
  }
  return runCaptureShell(command, ...);
}

The callers that pass strings are: docker info, nvidia-smi -L, systemctl is-active docker, systemctl is-enabled docker.

Suggestion: Convert these remaining string callers to argv arrays (e.g., ["nvidia-smi", "-L"], ["systemctl", "is-active", "docker"]) and make RunCaptureLike accept only readonly string[]. The docker info --format one with 2>/dev/null would need runCaptureShell explicitly, but the rest are clean argv candidates.

@jyaunches
Copy link
Copy Markdown
Contributor

🔵 Suggestion 1: Deprecate gateway_command in agent manifests

The new gateway_argv field alongside the legacy gateway_command is a clean migration path (agents/hermes/manifest.yaml:20, agents/openclaw/manifest.yaml:23). The parseLegacyGatewayCommand fallback in agent-defs.ts is well-guarded (rejects non-simple tokens).

Consider adding a # deprecated — use gateway_argv comment in the manifest schema docs or a console warning when gateway_command is used without gateway_argv, to guide third-party agent authors toward the structured form.

@jyaunches
Copy link
Copy Markdown
Contributor

🔵 Suggestion 2: importRunnerFresh module cache busting (test/runner.test.ts:50)

The importRunnerFresh() function uses URL query params to bust the module cache:

async function importRunnerFresh() {
  return import(\`${pathToFileURL(runnerPath).href}?update=${Date.now()}-${Math.random()}\`);
}

This is fragile — it relies on the module loader treating query params as distinct module keys, which may not hold across all Node.js versions or bundlers. Consider using vi.resetModules() from Vitest instead, which is the idiomatic way to get fresh module instances in tests.

@jyaunches
Copy link
Copy Markdown
Contributor

🔵 Suggestion 3: Scoped test timeout (vitest.config.ts:15)

The test timeout bump from 5s → 15s applies to the entire cli test project. While pragmatic for slower CI hosts, a broad timeout increase can mask genuine hangs.

Consider scoping the timeout to the specific slow test files (e.g., test/runner.test.ts) via per-file vi.setConfig({ testTimeout: 15_000 }) or a separate vitest project for integration-heavy tests, keeping the default at 5s for unit-style tests.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

v0.0.28 Release target

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants