Skip to content

fix(cli): stop in-sandbox messaging channels on nemoclaw stop#1977

Merged
brandonpelfrey merged 14 commits into
mainfrom
fix/1825-stop-messaging-2
May 1, 2026
Merged

fix(cli): stop in-sandbox messaging channels on nemoclaw stop#1977
brandonpelfrey merged 14 commits into
mainfrom
fix/1825-stop-messaging-2

Conversation

@brandonpelfrey
Copy link
Copy Markdown
Collaborator

@brandonpelfrey brandonpelfrey commented Apr 16, 2026

Summary

nemoclaw stop only killed the host-side cloudflared process but left the OpenClaw gateway (and its Telegram/Discord/Slack messaging channels) running inside the sandbox. After PR #1081 moved bridges into the sandbox, no stop mechanism was added. This PR adds stopSandboxChannels() which execs into the sandbox via openshell sandbox exec to SIGTERM the gateway process, stopping all messaging channel polling.

Related Issue

Fixes #1825

Changes

  • Add stopSandboxChannels() to src/lib/services.ts — resolves openshell, execs pkill -TERM -f "openclaw gateway" inside the sandbox, and handles all pkill exit codes (0 = stopped, 1 = already stopped, >1 = unreachable)
  • Update stopAll() to call stopSandboxChannels() before stopping host-side services; reads sandbox name from opts, NEMOCLAW_SANDBOX, or SANDBOX_NAME env vars; warns when no sandbox name is available
  • Import spawnSync from node:child_process and resolveOpenshell from ./resolve-openshell
  • Add src/lib/services-sandbox.test.ts with 9 tests covering: successful gateway stop, pkill exit 1 (no match), unreachable sandbox, null status (timeout), openshell not found, stopAll with sandbox name, stopAll without sandbox name, cloudflared cleanup on exec failure, and env var fallback

Type of Change

  • Code change (feature, bug fix, or refactor)
  • Code change with doc updates
  • Doc only (prose changes, no code sample modifications)
  • Doc only (includes code sample changes)

Verification

  • npx prek run --all-files passes
  • npm test passes
  • Tests added or updated for new or changed behavior
  • No secrets, API keys, or credentials committed
  • Docs updated for user-facing behavior changes
  • make docs builds without warnings (doc changes only)
  • Doc pages follow the style guide (doc changes only)
  • New doc pages include SPDX header and frontmatter (new pages only)

AI Disclosure

  • AI-assisted — tool: Claude Code

Signed-off-by: Brandon Pelfrey bpelfrey@nvidia.com

Summary by CodeRabbit

  • New Features

    • Introduced sandbox-aware shutdown flow: attempts in-sandbox gateway cleanup when a validated sandbox name is available, prefers validated env-provided sandbox names (with correct precedence), and logs clearer success/warning outcomes for varied exit statuses and missing in-sandbox tooling.
  • Tests

    • Added tests for sandbox shutdown paths, missing/failed in-sandbox tooling, env-var vs registry name selection and validation (rejecting unsafe names), and ensuring host-side cleanup still runs.

`nemoclaw stop` only killed the host-side `cloudflared` process but left
the OpenClaw gateway (and its Telegram/Discord/Slack messaging channels)
running inside the sandbox. Bots continued responding to messages after
stop.

After PR #1081 moved messaging bridges into the sandbox, no stop
mechanism was added to replace the old host-side process kill.

Add `stopSandboxChannels()` to `services.ts` that execs into the
sandbox via `openshell sandbox exec <name> -- pkill -TERM -f "openclaw
gateway"`. The gateway's own signal handler (`cleanup()` in
nemoclaw-start.sh) forwards SIGTERM to child processes, which stops all
messaging channel polling.

`stopAll()` now calls `stopSandboxChannels()` before stopping host-side
services. When the sandbox name is unavailable or openshell is not
installed, it warns and continues — host-side cleanup still proceeds.

- [x] `npx vitest run --project cli src/lib/services.test.ts` — 14 existing tests pass
- [x] `npx vitest run --project cli src/lib/services-sandbox.test.ts` — 9 new tests pass
- [x] `npx vitest run --project cli src/lib/services-command.test.ts` — 4 tests pass
- [x] `npm test` — full suite (1635 tests) passes
- [x] `npm run typecheck:cli` — clean
- [ ] Manual: `nemoclaw stop` → Telegram/Discord bots stop responding

Signed-off-by: Brandon Pelfrey <bpelfrey@nvidia.com>
@brandonpelfrey brandonpelfrey self-assigned this Apr 16, 2026
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 16, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds an in‑sandbox shutdown for OpenClaw channels via a new exported stopSandboxChannels(sandboxName) and updates stopAll to derive/validate a sandbox name (opts/env) and call it before stopping host services; includes tests covering openshell resolution, spawnSync outcomes, env precedence, and invalid names.

Changes

Cohort / File(s) Summary
Sandbox channel logic
src/lib/services.ts
Adds exported stopSandboxChannels(sandboxName: string) that resolves OpenShell and runs openshell sandbox exec --name <sandbox> -- pkill -TERM -f "openclaw[- ]gateway" (15s timeout); treats pkill exit codes 0 and 1 as success, warns on other failures; updates stopAll to derive/validate sandbox name from opts or env (NEMOCLAW_SANDBOX_NAME, NEMOCLAW_SANDBOX, SANDBOX_NAME) and call the new function before host-side cleanup.
Sandbox-related tests
src/lib/services-sandbox.test.ts
New Vitest suite (targets compiled dist/lib/services) that runtime-overrides resolve-openshell and spies on node:child_process.spawnSync from same Node instance; asserts exact openshell invocation shape (--name <sandbox> and pkill), handling of exit statuses (0,1,>1,null), behavior when openshell is missing, and ensures stopAll PID cleanup even on sandbox exec failure.
Sandbox name resolution tests/logic
src/lib/services-command.ts, src/lib/services-command.test.ts
resolveDefaultSandboxName now prioritizes env vars (NEMOCLAW_SANDBOX_NAME, then NEMOCLAW_SANDBOX, then SANDBOX_NAME) over registry default and validates with SAFE_SANDBOX_RE; tests save/restore env and assert correct precedence and rejection of unsafe names.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant stopAll as stopAll()
    participant Resolver as resolveOpenshell()
    participant stopSandbox as stopSandboxChannels()
    participant Spawn as spawnSync()
    participant Sandbox as Sandbox/OpenClaw
    participant Host as Host services (cloudflared)

    User->>stopAll: run "nemoclaw stop"
    stopAll->>stopAll: derive sandboxName from opts/env
    alt sandboxName present
        stopAll->>stopSandbox: stopSandboxChannels(sandboxName)
        stopSandbox->>Resolver: resolveOpenshell()
        Resolver-->>stopSandbox: openshell path or null
        alt openshell found
            stopSandbox->>Spawn: spawnSync(openshell, ["sandbox","exec","--name", sandboxName, "--","pkill","-TERM","-f","openclaw[- ]gateway"], {timeout:15000})
            Spawn->>Sandbox: execute pkill inside sandbox
            Sandbox-->>Spawn: exit status (0/1/>1/null)
            Spawn-->>stopSandbox: result
            stopSandbox->>stopSandbox: interpret status (0/1 success, else warn)
        else openshell missing
            stopSandbox-->>stopAll: log "openshell not found"
        end
    else no sandboxName
        stopAll-->>stopAll: log "No sandbox name available"
    end
    stopAll->>Host: stopService(..., "cloudflared")
    Host-->>stopAll: stopped
    stopAll-->>User: "All services stopped"
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Poem

🐰 I tiptoe in with gentle paws,
I call the shell and hush the draws,
A pkill whisper, gateway sleeps,
Logs blink soft while cloudflared keeps,
The rabbit hops — the sandbox sleeps.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'fix(cli): stop in-sandbox messaging channels on nemoclaw stop' directly and clearly describes the main change: adding functionality to stop messaging channels running inside the sandbox when nemoclaw stop is executed.
Linked Issues check ✅ Passed The PR implements the core objective from issue #1825 by introducing stopSandboxChannels() function and integrating it into stopAll(), which directly addresses the requirement to stop in-sandbox messaging bridges when nemoclaw stop is executed.
Out of Scope Changes check ✅ Passed All changes align with the issue objective: stopSandboxChannels() adds in-sandbox stop mechanism, stopAll() integration ensures bridges stop on nemoclaw stop, environment variable precedence adjustments support proper sandbox name resolution, and comprehensive tests validate the implementation.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/1825-stop-messaging-2

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
src/lib/services-sandbox.test.ts (1)

182-203: Add a regression test for NEMOCLAW_SANDBOX_NAME fallback.

This suite currently validates NEMOCLAW_SANDBOX fallback only. Please add a sibling case asserting stopAll({ pidDir }) uses process.env.NEMOCLAW_SANDBOX_NAME when opts are empty, so the env compatibility path stays protected.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lib/services-sandbox.test.ts` around lines 182 - 203, Add a sibling test
in src/lib/services-sandbox.test.ts that mirrors the existing "reads sandbox
name from NEMOCLAW_SANDBOX env when not in opts" case but sets
process.env.NEMOCLAW_SANDBOX_NAME = "env-sandbox-name" (saving and restoring the
original value), calls stopAll({ pidDir }), and asserts spawnSyncSpy was invoked
with an argument array containing "env-sandbox-name" (use expect.arrayContaining
and expect.any(Object) like the existing assertion); also mock/restore
console.log (logSpy) as in the original test to avoid noisy output.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/lib/services.ts`:
- Around line 288-295: The shutdown logic for determining sandboxName (variable
sandboxName in stopAll) is missing the environment variable
NEMOCLAW_SANDBOX_NAME used elsewhere; update the fallback chain for sandboxName
(currently opts.sandboxName ?? process.env.NEMOCLAW_SANDBOX ??
process.env.SANDBOX_NAME) to also check process.env.NEMOCLAW_SANDBOX_NAME so
stopSandboxChannels(sandboxName) runs when that env var was set; ensure you only
change the fallback order to include process.env.NEMOCLAW_SANDBOX_NAME and keep
existing warnings the same.

---

Nitpick comments:
In `@src/lib/services-sandbox.test.ts`:
- Around line 182-203: Add a sibling test in src/lib/services-sandbox.test.ts
that mirrors the existing "reads sandbox name from NEMOCLAW_SANDBOX env when not
in opts" case but sets process.env.NEMOCLAW_SANDBOX_NAME = "env-sandbox-name"
(saving and restoring the original value), calls stopAll({ pidDir }), and
asserts spawnSyncSpy was invoked with an argument array containing
"env-sandbox-name" (use expect.arrayContaining and expect.any(Object) like the
existing assertion); also mock/restore console.log (logSpy) as in the original
test to avoid noisy output.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: a7f17d09-5ef6-4950-af19-a09f81b7fe4b

📥 Commits

Reviewing files that changed from the base of the PR and between c02d29e and 718321a.

📒 Files selected for processing (2)
  • src/lib/services-sandbox.test.ts
  • src/lib/services.ts

Comment thread src/lib/services.ts
Copy link
Copy Markdown
Contributor

@ericksoa ericksoa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice fix — the problem is real and the approach is clean. A few things to address before merge:

Bug: Missing NEMOCLAW_SANDBOX_NAME in env var fallback

The new stopAll fallback chain checks NEMOCLAW_SANDBOX and SANDBOX_NAME, but the rest of the codebase (nemoclaw.ts:1746, onboard.ts:1617, deploy.ts:119,251) consistently sets NEMOCLAW_SANDBOX_NAME. This is likely the most common env var to be set in practice. Could you add it to the chain?

const sandboxName =
  opts.sandboxName ??
  process.env.NEMOCLAW_SANDBOX ??
  process.env.NEMOCLAW_SANDBOX_NAME ??
  process.env.SANDBOX_NAME;

(CodeRabbit flagged this too — agree with that finding.)

Minor: npx prek run --all-files

The verification checklist shows this unchecked — could you confirm it passes?

Everything else looks good — pkill exit code handling, the 15s timeout, ordering (sandbox stop before host-side cleanup), and the test coverage are all solid. 👍

brandonpelfrey and others added 2 commits April 16, 2026 11:18
… NEMOCLAW_SANDBOX_NAME fallback

Address review feedback on #1977:

1. Use `--name` flag for `openshell sandbox exec` instead of positional
   arg. The positional form is not supported by current openshell and
   caused exit 127.

2. Narrow pkill pattern from `"openclaw gateway"` to
   `"openclaw gateway run"` so decoy or unrelated sandbox processes
   (e.g. `openclaw gateway decoy`, `openclaw gateway status`) are not
   killed.

3. Add `NEMOCLAW_SANDBOX_NAME` to the env var fallback chain in
   `stopAll()`, between `NEMOCLAW_SANDBOX` and `SANDBOX_NAME`. This is
   the env var set by `nemoclaw.ts`, `onboard.ts`, and `deploy.ts`.

4. Add three regression tests: `--name` flag assertion, precise pkill
   pattern assertion, and `NEMOCLAW_SANDBOX_NAME` env var fallback.

Signed-off-by: Brandon Pelfrey <bpelfrey@nvidia.com>
@brandonpelfrey brandonpelfrey marked this pull request as draft April 16, 2026 18:55
@wscurran wscurran added bug Something isn't working NemoClaw CLI Use this label to identify issues with the NemoClaw command-line interface (CLI). labels Apr 16, 2026
@copy-pr-bot
Copy link
Copy Markdown

copy-pr-bot Bot commented Apr 29, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

…box name

Three fixes for the in-sandbox gateway stop failing to kill the real process:

1. pkill pattern: 'openclaw gateway run' → 'openclaw.gateway.run' regex
   so it matches both 'openclaw-gateway run' (the actual binary) and the
   space-separated form.

2. resolveDefaultSandboxName now checks NEMOCLAW_SANDBOX_NAME /
   NEMOCLAW_SANDBOX / SANDBOX_NAME env vars before falling back to the
   registry default. This ensures explicit env overrides are respected.

3. Log the sandbox name being targeted so mismatches are immediately
   visible in CLI output.
@wscurran wscurran added the VDR Linked to VDR finding label Apr 29, 2026
…inary

The gateway re-execs after startup, changing its argv from
'openclaw gateway run' to 'openclaw-gateway' (dropping the 'run'
subcommand). The previous pattern 'openclaw.gateway.run' never matched
the running process because 'run' was no longer in the cmdline.

Use 'openclaw[- ]gateway' which matches both:
- the launcher form: 'openclaw gateway run' (space-separated)
- the re-exec'd form: 'openclaw-gateway' (hyphenated, no 'run')

This is consistent with the proven pgrep/pkill patterns used in
test/e2e/test-issue-2478-crash-loop-recovery.sh.
@brandonpelfrey brandonpelfrey marked this pull request as ready for review April 29, 2026 21:15
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/lib/services.ts`:
- Around line 296-307: In stopAll, compute the effective sandboxName first using
the same precedence as src/lib/services-command.ts (opts.sandboxName ??
process.env.NEMOCLAW_SANDBOX_NAME ?? process.env.NEMOCLAW_SANDBOX ??
process.env.SANDBOX_NAME), then set opts.sandboxName to that value (if not
already) and only after that call resolvePidDir(opts) so pidDir targets the same
sandbox; keep the stopSandboxChannels(sandboxName) call as-is to stop the
correct channels.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: ed3cdccd-2a16-4bac-b4b5-2330f3d05aa7

📥 Commits

Reviewing files that changed from the base of the PR and between 510c5ba and 0205ca1.

📒 Files selected for processing (4)
  • src/lib/services-command.test.ts
  • src/lib/services-command.ts
  • src/lib/services-sandbox.test.ts
  • src/lib/services.ts
✅ Files skipped from review due to trivial changes (1)
  • src/lib/services-command.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/lib/services-sandbox.test.ts

Comment thread src/lib/services.ts
stopAll() read NEMOCLAW_SANDBOX / NEMOCLAW_SANDBOX_NAME / SANDBOX_NAME
env vars and passed them directly to stopSandboxChannels without
SAFE_NAME_RE validation. Malformed names (spaces, path traversal)
could flow into 'openshell sandbox exec --name ...' causing
inconsistent or mis-targeted shutdown behavior.

Apply the same SAFE_NAME_RE check used by resolvePidDir/
resolveDefaultSandboxName so invalid names are rejected with a
warning instead of being passed to openshell.

Add regression tests for spaces and path-traversal in env var names.
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
src/lib/services.ts (1)

297-317: ⚠️ Potential issue | 🟠 Major

Resolve effective sandbox before PID-dir resolution to avoid target drift.

At Line 297, pidDir is resolved before sandbox selection/validation, and at Line 303-305 env precedence differs from resolveDefaultSandboxName (NEMOCLAW_SANDBOX_NAME should win). This can make host cleanup and in-sandbox shutdown hit different sandboxes.

🔧 Suggested fix
 export function stopAll(opts: ServiceOptions = {}): void {
-  const pidDir = resolvePidDir(opts);
-  ensurePidDir(pidDir);
-
   // Stop the in-sandbox OpenClaw gateway (and its messaging channels).
   const rawSandboxName =
     opts.sandboxName ??
-    process.env.NEMOCLAW_SANDBOX ??
     process.env.NEMOCLAW_SANDBOX_NAME ??
+    process.env.NEMOCLAW_SANDBOX ??
     process.env.SANDBOX_NAME;
   const sandboxName =
     rawSandboxName && SAFE_NAME_RE.test(rawSandboxName) && !rawSandboxName.includes("..")
       ? rawSandboxName
       : undefined;
+  const pidDir = resolvePidDir({ ...opts, sandboxName });
+  ensurePidDir(pidDir);
   if (sandboxName) {
     stopSandboxChannels(sandboxName);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lib/services.ts` around lines 297 - 317, The pidDir is being resolved
before determining the effective sandbox which can cause host cleanup and
in-sandbox shutdown to target different sandboxes; move the sandbox
resolution/validation (the rawSandboxName -> sandboxName logic that uses
SAFE_NAME_RE and the '..' check and calls stopSandboxChannels) to run before
resolvePidDir/ensurePidDir (i.e., determine sandboxName first), and adjust the
environment-variable precedence to match resolveDefaultSandboxName so
NEMOCLAW_SANDBOX_NAME wins over NEMOCLAW_SANDBOX and SANDBOX_NAME; keep using
stopSandboxChannels(sandboxName) when valid and preserve the same warning
behavior when invalid or missing.
🧹 Nitpick comments (1)
src/lib/services-sandbox.test.ts (1)

234-262: Add a dual-env precedence test to lock intended sandbox resolution.

Current tests cover each env var in isolation, but not the case where both are set. Adding that case will catch regressions in precedence handling.

🧪 Suggested test addition
+  it("prefers NEMOCLAW_SANDBOX_NAME over NEMOCLAW_SANDBOX when both are set", () => {
+    const logSpy = vi.spyOn(console, "log").mockImplementation(() => {});
+    const savedNemoclaw = process.env.NEMOCLAW_SANDBOX;
+    const savedNemoclawName = process.env.NEMOCLAW_SANDBOX_NAME;
+    process.env.NEMOCLAW_SANDBOX = "legacy-sandbox";
+    process.env.NEMOCLAW_SANDBOX_NAME = "preferred-sandbox";
+
+    try {
+      stopAll({ pidDir });
+    } finally {
+      if (savedNemoclaw !== undefined) process.env.NEMOCLAW_SANDBOX = savedNemoclaw;
+      else delete process.env.NEMOCLAW_SANDBOX;
+      if (savedNemoclawName !== undefined) process.env.NEMOCLAW_SANDBOX_NAME = savedNemoclawName;
+      else delete process.env.NEMOCLAW_SANDBOX_NAME;
+    }
+
+    expect(spawnSyncSpy).toHaveBeenCalledWith(
+      "/usr/local/bin/openshell",
+      expect.arrayContaining(["preferred-sandbox"]),
+      expect.any(Object),
+    );
+    logSpy.mockRestore();
+  });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lib/services-sandbox.test.ts` around lines 234 - 262, Add a test that
sets both NEMOCLAW_SANDBOX and NEMOCLAW_SANDBOX_NAME, calls stopAll({ pidDir }),
and asserts the resolved sandbox follows the intended precedence (e.g.,
NEMOCLAW_SANDBOX should override NEMOCLAW_SANDBOX_NAME). Use the same pattern as
existing tests: spy console.log, set savedEnv backups, set both env vars (e.g.,
process.env.NEMOCLAW_SANDBOX="direct-sandbox" and
process.env.NEMOCLAW_SANDBOX_NAME="named-sandbox"), call stopAll, then expect
spawnSyncSpy toHaveBeenCalledWith("/usr/local/bin/openshell",
expect.arrayContaining(["direct-sandbox"]), expect.any(Object)), and restore env
and log spies; reference stopAll, NEMOCLAW_SANDBOX, NEMOCLAW_SANDBOX_NAME, and
spawnSyncSpy.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/lib/services-sandbox.test.ts`:
- Line 8: The import symbol realSpawnSync is declared but unused; remove the
unused import from the top of the test module or, if you intend to keep it
intentionally unused, rename it to _realSpawnSync to follow the unused-variable
naming guideline—update the import statement that currently imports
realSpawnSync from "node:child_process" accordingly and run tests/lint to
confirm no unused-import errors remain.

---

Duplicate comments:
In `@src/lib/services.ts`:
- Around line 297-317: The pidDir is being resolved before determining the
effective sandbox which can cause host cleanup and in-sandbox shutdown to target
different sandboxes; move the sandbox resolution/validation (the rawSandboxName
-> sandboxName logic that uses SAFE_NAME_RE and the '..' check and calls
stopSandboxChannels) to run before resolvePidDir/ensurePidDir (i.e., determine
sandboxName first), and adjust the environment-variable precedence to match
resolveDefaultSandboxName so NEMOCLAW_SANDBOX_NAME wins over NEMOCLAW_SANDBOX
and SANDBOX_NAME; keep using stopSandboxChannels(sandboxName) when valid and
preserve the same warning behavior when invalid or missing.

---

Nitpick comments:
In `@src/lib/services-sandbox.test.ts`:
- Around line 234-262: Add a test that sets both NEMOCLAW_SANDBOX and
NEMOCLAW_SANDBOX_NAME, calls stopAll({ pidDir }), and asserts the resolved
sandbox follows the intended precedence (e.g., NEMOCLAW_SANDBOX should override
NEMOCLAW_SANDBOX_NAME). Use the same pattern as existing tests: spy console.log,
set savedEnv backups, set both env vars (e.g.,
process.env.NEMOCLAW_SANDBOX="direct-sandbox" and
process.env.NEMOCLAW_SANDBOX_NAME="named-sandbox"), call stopAll, then expect
spawnSyncSpy toHaveBeenCalledWith("/usr/local/bin/openshell",
expect.arrayContaining(["direct-sandbox"]), expect.any(Object)), and restore env
and log spies; reference stopAll, NEMOCLAW_SANDBOX, NEMOCLAW_SANDBOX_NAME, and
spawnSyncSpy.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 8b7cc99b-f05b-40e5-acdc-7adeae1b1687

📥 Commits

Reviewing files that changed from the base of the PR and between 0205ca1 and 7efb6c3.

📒 Files selected for processing (2)
  • src/lib/services-sandbox.test.ts
  • src/lib/services.ts

Comment thread src/lib/services-sandbox.test.ts
…ndboxName

stopAll() used NEMOCLAW_SANDBOX ?? NEMOCLAW_SANDBOX_NAME ?? SANDBOX_NAME
but resolveDefaultSandboxName() in services-command.ts uses
NEMOCLAW_SANDBOX_NAME ?? NEMOCLAW_SANDBOX ?? SANDBOX_NAME.

Align to NEMOCLAW_SANDBOX_NAME-first so both paths target the same
sandbox when multiple env vars are set.

Add regression test asserting NEMOCLAW_SANDBOX_NAME wins over
NEMOCLAW_SANDBOX.
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
src/lib/services.ts (1)

296-322: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Resolve pidDir from the same sandbox you stop.

stopAll() still calls resolvePidDir(opts) before it derives the effective sandboxName, so host-side cleanup can target a different PID directory than the in-sandbox shutdown when NEMOCLAW_SANDBOX_NAME is set. This is the same mismatch previously flagged and it remains open.

Suggested fix
 export function stopAll(opts: ServiceOptions = {}): void {
-  const pidDir = resolvePidDir(opts);
-  ensurePidDir(pidDir);
-
   // Stop the in-sandbox OpenClaw gateway (and its messaging channels).
   const rawSandboxName =
     opts.sandboxName ??
     process.env.NEMOCLAW_SANDBOX_NAME ??
     process.env.NEMOCLAW_SANDBOX ??
     process.env.SANDBOX_NAME;
   const sandboxName =
     rawSandboxName && SAFE_NAME_RE.test(rawSandboxName) && !rawSandboxName.includes("..")
       ? rawSandboxName
       : undefined;
+  const pidDir = resolvePidDir({ ...opts, sandboxName });
+  ensurePidDir(pidDir);
   if (sandboxName) {
     stopSandboxChannels(sandboxName);
   } else if (rawSandboxName) {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lib/services.ts` around lines 296 - 322, stopAll currently calls
resolvePidDir(opts) before computing the effective sandboxName (rawSandboxName
-> sandboxName), which can cause host-side cleanup to use the wrong PID
directory; fix by deriving sandboxName first (using rawSandboxName, SAFE_NAME_RE
and the same environment precedence logic) and then call resolvePidDir passing
the resolved sandboxName (or adjust resolvePidDir signature to accept
opts.sandboxName) so stopService(pidDir, "cloudflared") uses the PID directory
for the same sandbox you just stopped; update references in stopAll
(rawSandboxName, sandboxName, resolvePidDir, stopService) accordingly and remove
the earlier precomputed pidDir.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@src/lib/services.ts`:
- Around line 296-322: stopAll currently calls resolvePidDir(opts) before
computing the effective sandboxName (rawSandboxName -> sandboxName), which can
cause host-side cleanup to use the wrong PID directory; fix by deriving
sandboxName first (using rawSandboxName, SAFE_NAME_RE and the same environment
precedence logic) and then call resolvePidDir passing the resolved sandboxName
(or adjust resolvePidDir signature to accept opts.sandboxName) so
stopService(pidDir, "cloudflared") uses the PID directory for the same sandbox
you just stopped; update references in stopAll (rawSandboxName, sandboxName,
resolvePidDir, stopService) accordingly and remove the earlier precomputed
pidDir.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 75bb081d-031f-4a71-894b-1303a9639de1

📥 Commits

Reviewing files that changed from the base of the PR and between 7efb6c3 and 75a0ac1.

📒 Files selected for processing (2)
  • src/lib/services-sandbox.test.ts
  • src/lib/services.ts
✅ Files skipped from review due to trivial changes (1)
  • src/lib/services-sandbox.test.ts

Signed-off-by: Brandon Pelfrey <bpelfrey@nvidia.com>
Signed-off-by: Brandon Pelfrey <bpelfrey@nvidia.com>
@brandonpelfrey
Copy link
Copy Markdown
Collaborator Author

Automated PR review summary

Reviewed PR #1977: fix(cli): stop in-sandbox messaging channels on nemoclaw stop

Recommendation

  • Recommendation: PASS
  • Highest observed severity: low
  • Block merge: no
  • Why: The core PR risk was that stop would still fail to reach the sandboxed gateway or would use an overbroad matcher and kill the wrong process. In the real OpenShell sandbox, the stop command removed a live gateway process, did not kill an unrelated helper whose argv merely contained a similar token, and correctly targeted the intended sandbox via env selection.
  • Reviewer summary: Reviewed PR fix(cli): stop in-sandbox messaging channels on nemoclaw stop #1977 against the installed NemoClaw/OpenShell environment. End-to-end sandbox probes confirmed that nemoclaw stop now terminates in-sandbox gateway processes, avoids killing unrelated lookalike processes, and honors env-based sandbox selection in the tested runtime path.

Installation and setup findings

  • NemoClaw was installed from the local checkout and the real NemoClaw-managed sandbox works. I used the repo install.sh flow, verified nemoclaw was installed, identified sandbox my-assistant through NemoClaw CLI, executed 2+2=4 inside that sandbox over SSH, and confirmed in-sandbox model reachability with OpenClaw returning 4. The only wrinkle was the first installer invocation timing out near the end of policy setup after the sandbox was already created.

What was validated

  • The PR revision was checked out in an isolated review environment.
  • The local checkout was installed using the repository installer flow as closely as the environment allowed.
  • Adversarial, PR-specific probes were then run against the installed environment and relevant repository context.
  • Diff summary:
 .../skills/nemoclaw-contributor-create-pr/SKILL.md |    5 +
 .../nemoclaw-user-configure-inference/SKILL.md     |  468 +++-
 .../references/inference-options.md                |    5 +-
 .../references/switch-inference-providers.md       |  177 --
 .../references/use-local-inference.md              |  284 ---
 .../references/best-practices.md                   |   21 +-
 .../references/credential-storage.md               |    2 -
 .../skills/nemoclaw-user-deploy-remote/SKILL.md    |  164 +-
 .../references/install-openclaw-plugins.md         |   93 -
 .agents/skills/nemoclaw-user-get-started/SKILL.md  |  103 +-
 .../references/quickstart-hermes.md                |  146 --
 .../references/windows-preparation.md              |   13 +-
 .../skills/nemoclaw-user-manage-policy/SKILL.md    |  269 ++-
 .../references/customize-network-policy.md         |  280 ---
 .../references/architecture.md                     |    5 -
 .../nemoclaw-user-reference/references/commands.md |  113 +-
 .../references/troubleshooting.md                  |   52 +-
 .agents/skills/nemoclaw-user-workspace/SKILL.md    |    2 -
 .../references/workspace-files.md                  |    1 -
 .coderabbit.yaml      
...[truncated]

Failing tests and unresolved impact

  • No failing adversarial tests were captured.

Passing tests and why they mattered

Passing test 1: nemoclaw stop terminates the real in-sandbox gateway

  • What was tested: With a live openclaw-gateway process in the real OpenShell sandbox, nemoclaw stop should remove it, not just stop host-side services.
  • Why it mattered: If false, messaging channels can continue polling after users believe nemoclaw stop has shut them down.
  • Observed result: Sandbox pre-scan showed PID 205 openclaw-gateway; nemoclaw stop reported in-sandbox gateway stopped; post-scan was empty.
  • Command: bash /tmp/pr1977-test1.sh
  • Recommended follow-up coverage: Add an end-to-end integration/regression test against a real sandboxed gateway process because mocked spawn results do not prove the runtime stop path reaches the sandbox.

Passing test 2: Stop logic does not kill unrelated lookalike processes

  • What was tested: The new process matcher should avoid killing unrelated processes whose argv merely contains openclaw-gateway as a suffix/helper token.
  • Why it mattered: If false, nemoclaw stop can terminate unrelated sandbox workloads via overbroad process matching.
  • Observed result: Injected helper process python3 ... openclaw-gateway-helper; stop reported gateway not running; helper PID 781 remained after stop.
  • Command: bash /tmp/pr1977-test2.sh
  • Recommended follow-up coverage: Add a regression test covering helper/suffix names so future matcher changes do not reintroduce unsafe substring-based kills.

Passing test 3: Env fallback targets the intended sandbox

  • What was tested: nemoclaw stop should honor env-based sandbox selection and successfully stop the target sandbox when NEMOCLAW_SANDBOX is set.
  • Why it mattered: If false, stop can act on the wrong sandbox or fail to stop channels when users rely on env-based selection.
  • Observed result: Spawned a gateway-like target in sandbox my-assistant; with NEMOCLAW_SANDBOX=my-assistant and conflicting SANDBOX_NAME, stop chose my-assistant and removed the target process.
  • Command: bash /tmp/pr1977-test3.sh
  • Recommended follow-up coverage: Keep the unit coverage and add one CLI-level regression test that exercises env precedence through the installed command rather than only direct module calls.

Bottom line

  • Based on the install evidence and adversarial probes, this PR looks reasonable to approve.

@cv cv requested a review from ericksoa May 1, 2026 23:15
@brandonpelfrey brandonpelfrey dismissed ericksoa’s stale review May 1, 2026 23:17

More recent review from Carlos, concerns addressed.

@brandonpelfrey brandonpelfrey merged commit 69403e0 into main May 1, 2026
15 checks passed
cv pushed a commit that referenced this pull request May 2, 2026
## Summary
Daily release-prep documentation refresh for merged PRs from the past 24
hours.
This updates user-facing docs for Telegram mention-only mode, in-sandbox
messaging shutdown, Hermes onboarding/runtime behavior, and
compatible-endpoint smoke validation, then bumps the docs metadata to
0.0.33 after tag v0.0.32.

## Related Issue
None.

## Changes
- #2417 / c7e49ad: Document `TELEGRAM_REQUIRE_MENTION` for Telegram
group-chat replies in `docs/manage-sandboxes/messaging-channels.md` and
`docs/reference/commands.md`.
- #1977 / 69403e0: Update `nemoclaw tunnel stop` and deprecated
`nemoclaw stop` docs to explain that NemoClaw also attempts to stop the
in-sandbox OpenClaw gateway and messaging polling.
- #2781 / b83ffe2, #2859 / 4df8be6, and #2846 / 0968dfd: Refresh the
Hermes quickstart for the default `my-hermes` sandbox name, cross-agent
same-name guard, agent type visibility in `nemoclaw list`, Brave prompt
omission, and supported prebaked Hermes integrations.
- #2849 / fd240ff: Document the Telegram plus OpenAI-compatible
endpoint `inference.local` smoke check in inference options and
troubleshooting.
- Bump `docs/versions1.json` and `docs/project.json` from 0.0.32 to
0.0.33 for daily release preparation.

## Type of Change
- [ ] Code change (feature, bug fix, or refactor)
- [ ] Code change with doc updates
- [ ] Doc only (prose changes, no code sample modifications)
- [x] Doc only (includes code sample changes)

## Verification
- [ ] `npx prek run --all-files` passes
- [ ] `npm test` passes
- [ ] Tests added or updated for new or changed behavior
- [x] No secrets, API keys, or credentials committed
- [x] Docs updated for user-facing behavior changes
- [x] `make docs` builds without warnings (doc changes only)
- [x] Doc pages follow the [style
guide](https://github.com/NVIDIA/NemoClaw/blob/main/docs/CONTRIBUTING.md)
(doc changes only)
- [ ] New doc pages include SPDX header and frontmatter (new pages only)

Additional checks run:
- `python3 scripts/docs-to-skills.py docs/ .agents/skills/ --prefix
nemoclaw-user --dry-run`
- `git diff --check`
- `make docs` passed with the existing local version-switcher read
message.
- Full `npx prek run --all-files` and `npm test` were skipped for this
doc-only automation run. Commit and pre-push hooks otherwise passed
docs, lint, secret, and conversion checks until the local `Test (skills
YAML)` hook failed because `vitest/config` is not installed in this
fresh worktree.

---
Signed-off-by: Miyoung Choi <miyoungc@nvidia.com>


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Documentation**
* Updated Hermes quickstart: default sandbox name is "hermes"; guidance
to use distinct sandbox names, note same-name reuse is prevented, Hermes
wizard does not request Brave Web Search, and sandbox listings now show
agent type.
* Clarified provider onboarding: bounded in-sandbox smoke check runs
when Telegram messaging is enabled.
* Expanded Telegram docs: added TELEGRAM_REQUIRE_MENTION (DMs still
governed by TELEGRAM_ALLOWED_IDS), onboarding examples,
stop-messaging/tunnel behavior, and troubleshooting.
  * Promoted docs to version 0.0.33.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: Miyoung Choi <miyoungc@nvidia.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working NemoClaw CLI Use this label to identify issues with the NemoClaw command-line interface (CLI). VDR Linked to VDR finding

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[All Platform]nemoclaw stop only stops cloudflared but does not stop Telegram/Discord/Slack messaging bridges running inside the sandbox

4 participants