fix(onboard): use effectivePort in dashboard wait and guard forward port collision by Dongni-Yang · Pull Request #2086 · NVIDIA/NemoClaw

Dongni-Yang · 2026-04-20T04:45:49Z

Closes #2063 — nemoclaw onboard --agent hermes fails when an existing OpenClaw sandbox is already running.

Two bugs found and fixed:

Bug 1 (onboard.ts) — The sandbox readiness wait loop hardcoded DASHBOARD_PORT (18789) when probing inside every sandbox. Hermes runs on port 8642, so the loop always timed out after 28 s and printed a misleading "Dashboard taking longer than expected" warning.

Bug 2 (ensureDashboardForward) — No guard against a port already forwarded by a different sandbox. A second onboard using the same port would silently steal the first sandbox's forward, leaving it unreachable.

Changes

File	Change
`src/lib/onboard.ts`	Use `effectivePort` (= `agent.forwardPort` for non-OpenClaw agents) instead of hardcoded `DASHBOARD_PORT` in the readiness wait loop
`src/lib/onboard.ts`	`ensureDashboardForward`: check `openshell forward list` before stop/start; throw a clear error if the port belongs to a different sandbox
`src/lib/gateway-state.test.ts`	20 new unit tests for `isGatewayHealthy`, `getGatewayReuseState`, `isSandboxReady`, `hasStaleGateway`

Root cause

createSandbox already computes effectivePort = agent ? agent.forwardPort : CONTROL_UI_PORT but the readiness loop immediately below used DASHBOARD_PORT directly. One-token fix makes the loop consistent with the rest of the function.

Test plan

npm run build:cli — no TypeScript errors
npm test — 1634 pass, 0 new failures
Manual: nemoclaw onboard then nemoclaw onboard --agent hermes — both complete; openshell forward list shows ports 18789 and 8642 as distinct entries

Signed-off-by: Dongni Yang dongniy@nvidia.com

🤖 Generated with Claude Code

Summary by CodeRabbit

Bug Fixes
- Readiness checks now probe the actual dynamically allocated dashboard port during sandbox startup.
- Added validation to detect and error on port-forward conflicts to prevent routing to the wrong sandbox.
Tests
- Updated test stubs to reflect richer port-forward status output and refined readiness command matching, and added a test for port-forward collision handling.

…nsureDashboardForward against port collision When onboarding a non-OpenClaw agent (e.g. Hermes, forwardPort=8642) next to an existing OpenClaw sandbox (forwardPort=18789), the dashboard readiness loop probed the wrong port (hardcoded DASHBOARD_PORT=18789) inside the Hermes sandbox. This caused a spurious 28-second wait and a misleading "Dashboard taking longer than expected" warning on every Hermes onboard, masking the real health state. Additionally ensureDashboardForward had no guard against a port already forwarded by a different sandbox. If two agents share the same forward port the first sandbox's forward would be silently stolen, leaving it unreachable. - Replace hardcoded DASHBOARD_PORT with effectivePort (already computed as agent.forwardPort for non-OpenClaw agents) in the readiness wait loop - Add a pre-check in ensureDashboardForward: inspect `openshell forward list` before issuing stop/start; throw a clear actionable error if the port is claimed by a different sandbox - Add gateway-state.test.ts (20 tests) covering isGatewayHealthy, getGatewayReuseState, isSandboxReady, hasStaleGateway, and related pure classifiers Fixes NVIDIA#2063 Signed-off-by: Dongni Yang <dongniy@nvidia.com> Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

coderabbitai · 2026-04-20T04:46:18Z

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 5755f9c8-ff3d-44be-8178-68f362ed7d15

📥 Commits

Reviewing files that changed from the base of the PR and between cf1c597 and 2ef0638.

📒 Files selected for processing (1)

test/onboard.test.ts

📝 Walkthrough

Walkthrough

Readiness checks now curl the sandbox-local dashboard URL using the dynamically chosen effectivePort. ensureDashboardForward() parses openshell forward list output to detect port-forward collisions (throws if the local port is forwarded by another sandbox); tests updated to reflect new forward-list output and stricter exec command matching.

Changes

Cohort / File(s)	Summary
Dashboard probing & forward validation `src/lib/onboard.ts`	Readiness loop uses `http://localhost:${effectivePort}/`. `ensureDashboardForward()` runs `openshell forward list`, parses status lines to detect if `portToStop` is forwarded by a different sandbox and errors on collision; otherwise retains stop/start forward behavior.
Test mocking adjustments `test/onboard.test.ts`	Updated `runner.runCapture` mocks so `'forward' 'list'` returns structured status lines (`<sandbox> 127.0.0.1 <port> <localPort> running`) and tightened sandbox exec detection to the tokenized `"'sandbox' 'exec'"`. Added test asserting createSandbox rejects on forward collision.

Sequence Diagram

sequenceDiagram
    participant Tool as NemoClaw Tool
    participant OpenShell as OpenShell CLI
    participant Sandbox as Sandbox Instance

    Tool->>OpenShell: forward list
    OpenShell-->>Tool: list of forward entries (status lines)
    Tool->>Tool: parse entries for portToStop

    alt port forwarded to different sandbox
        Tool->>Tool: throw error (port already forwarded by other sandbox)
    else port free or owned by same sandbox
        Tool->>OpenShell: forward stop (if needed)
        OpenShell-->>Tool: ack
        Tool->>OpenShell: forward start (establish mapping)
        OpenShell-->>Tool: ack
    end

    Tool->>Sandbox: curl http://localhost:{effectivePort}/ (readiness)
    Sandbox-->>Tool: HTTP response (ready / not ready)

Estimated Code Review Effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

fix(onboard): use effectivePort in dashboard wait and guard forward port collision #2086 — same code-level change: switch readiness probe to effectivePort and add parsing/guard for openshell forward list collisions.

Poem

🐰 I sniffed the ports and hopped with glee,
I checked the forwards and kept them free,
A tiny curl to the effective sea,
I warned of clashes loud and spry,
Tests now match — I twitched my ear and cried "Hi!" 🥕

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately describes the main changes: fixing dashboard port usage (effectivePort) and adding guard against port collision in forward setup.
Linked Issues check	✅ Passed	The PR implementation addresses all primary coding objectives from issue `#2063`: using effectivePort for readiness probe and detecting port-forward collisions before starting new forwards.
Out of Scope Changes check	✅ Passed	All changes are directly scoped to fixing the two bugs preventing Hermes sandbox creation alongside OpenClaw; test updates properly support the implementation changes.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

…d wait loop Upstream changed the sandbox readiness check from runCapture (shell string) to runCaptureOpenshell (argv array) for safety. Our fix uses effectivePort instead of CONTROL_UI_PORT. Resolution keeps both: the safer argv form from upstream and the agent-aware port from this branch. Signed-off-by: Dongni Yang <dongniy@nvidia.com> Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 3

🧹 Nitpick comments (1)

src/lib/gateway-state.test.ts (1)

53-73: Add explicit coverage for active-unnamed reuse-state branches.

getGatewayReuseState has active-unnamed paths (see src/lib/gateway-state.ts Lines 90-114), but this suite currently covers "healthy", "foreign-active", "stale", and "missing" only.

Suggested test additions

 describe("getGatewayReuseState", () => {
@@
   it("returns missing when no state is present", () => {
     expect(getGatewayReuseState("", "", "")).toBe("missing");
   });
+
+  it("returns active-unnamed when connected and active gateway matches but metadata is missing", () => {
+    expect(getGatewayReuseState(STATUS_CONNECTED_NEMOCLAW, "", "")).toBe("active-unnamed");
+  });
+
+  it("returns active-unnamed when active gateway info exists without reusable metadata", () => {
+    expect(getGatewayReuseState("", "", ACTIVE_INFO_NEMOCLAW)).toBe("active-unnamed");
+  });
 });

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/lib/gateway-state.test.ts` around lines 53 - 73, Add tests for the
"active-unnamed" branches of getGatewayReuseState: create inputs where an active
gateway is present but its name is empty/unnamed and assert the function returns
"active-unnamed" (e.g., call getGatewayReuseState with a connected status like
STATUS_CONNECTED_NEMOCLAW or STATUS_CONNECTED_OTHER, provide an ACTIVE_INFO
value that represents an unnamed active gateway, and appropriate GW_INFO
values), and also add the complementary case where the local gateway has no
metadata but an unnamed active gateway exists to cover both unnamed-active
paths; use the existing test style and constants (getGatewayReuseState,
STATUS_CONNECTED_NEMOCLAW, STATUS_CONNECTED_OTHER, GW_INFO_NEMOCLAW,
ACTIVE_INFO_NEMOCLAW) to locate where to add these new it(...) cases.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/lib/gateway-state.test.ts`:
- Line 12: The test imports the compiled artifact
("../../dist/lib/gateway-state") which can cause stale builds to be tested;
change the import to point at the TypeScript source module (the gateway-state
source) instead of dist — i.e., replace the "../../dist/lib/gateway-state"
import in src/lib/gateway-state.test.ts with the corresponding source import
(e.g., "../../src/lib/gateway-state" or "./gateway-state" depending on relative
layout) so the tests run against the live TypeScript code, and ensure your test
runner is configured to load TS sources (ts-node/ts-jest or the
tsconfig.cli.json settings).

In `@src/lib/onboard.ts`:
- Around line 4739-4741: The error message that references
NEMOCLAW_DASHBOARD_PORT is misleading for agents that specify an explicit
forwardPort in their manifest; update the string that currently includes `Port
${portToStop} is already forwarded... Set NEMOCLAW_DASHBOARD_PORT...` so it
directs users to change the agent's dashboard port override (the manifest
`forwardPort` / local dashboard URL override) instead of suggesting the
environment variable; locate where `portToStop` is used to build this message
and replace the remediation text to reference `forwardPort`/local dashboard URL
override for Hermes-style agents rather than `NEMOCLAW_DASHBOARD_PORT`.
- Around line 4736-4742: The check using existingForwards.includes is too broad;
instead parse the runCaptureOpenshell(["forward","list"]) output by lines,
locate the single line that corresponds to the port (match the token containing
`:${portToStop}` as the port column), extract that line's sandbox name field and
compare it exactly to `sandboxName`; if the extracted owner exists and is not
equal to `sandboxName`, throw the same error. Use `existingForwards.split("\n")`
(or equivalent), find the line where the port column equals `:${portToStop}`,
parse the owner/sandbox column from that line, and base the collision check on
that exact owner rather than substring matches.

---

Nitpick comments:
In `@src/lib/gateway-state.test.ts`:
- Around line 53-73: Add tests for the "active-unnamed" branches of
getGatewayReuseState: create inputs where an active gateway is present but its
name is empty/unnamed and assert the function returns "active-unnamed" (e.g.,
call getGatewayReuseState with a connected status like STATUS_CONNECTED_NEMOCLAW
or STATUS_CONNECTED_OTHER, provide an ACTIVE_INFO value that represents an
unnamed active gateway, and appropriate GW_INFO values), and also add the
complementary case where the local gateway has no metadata but an unnamed active
gateway exists to cover both unnamed-active paths; use the existing test style
and constants (getGatewayReuseState, STATUS_CONNECTED_NEMOCLAW,
STATUS_CONNECTED_OTHER, GW_INFO_NEMOCLAW, ACTIVE_INFO_NEMOCLAW) to locate where
to add these new it(...) cases.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: d01f7876-76bc-418b-8814-f88697cdb433

📥 Commits

Reviewing files that changed from the base of the PR and between 67db244 and d2e8310.

📒 Files selected for processing (2)

src/lib/gateway-state.test.ts
src/lib/onboard.ts

coderabbitai · 2026-04-20T04:51:16Z

+  hasStaleGateway,
+  getReportedGatewayName,
+  isGatewayConnected,
+} from "../../dist/lib/gateway-state";


⚠️ Potential issue | 🟠 Major

Importing from dist makes these unit tests unreliable.

Line 12 imports compiled output (../../dist/...) instead of source. That can silently test stale build artifacts and miss regressions in src/lib/gateway-state.ts.

Proposed fix

} from "vitest"; import { isGatewayHealthy, getGatewayReuseState, isSandboxReady, hasStaleGateway, getReportedGatewayName, isGatewayConnected, -} from "../../dist/lib/gateway-state"; +} from "./gateway-state";

As per coding guidelines, "Use TypeScript for CLI code in src/lib/ with type-checking via tsconfig.cli.json".

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

} from "../../dist/lib/gateway-state";

} from "vitest";

import {

isGatewayHealthy,

getGatewayReuseState,

isSandboxReady,

hasStaleGateway,

getReportedGatewayName,

isGatewayConnected,

} from "./gateway-state";

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/lib/gateway-state.test.ts` at line 12, The test imports the compiled artifact ("../../dist/lib/gateway-state") which can cause stale builds to be tested; change the import to point at the TypeScript source module (the gateway-state source) instead of dist — i.e., replace the "../../dist/lib/gateway-state" import in src/lib/gateway-state.test.ts with the corresponding source import (e.g., "../../src/lib/gateway-state" or "./gateway-state" depending on relative layout) so the tests run against the live TypeScript code, and ensure your test runner is configured to load TS sources (ts-node/ts-jest or the tsconfig.cli.json settings).

coderabbitai

♻️ Duplicate comments (2)

src/lib/onboard.ts (2)

5564-5566: ⚠️ Potential issue | 🟠 Major

Match the conflicting forward on the exact local-port entry, not with blob-level substring checks.

This guard can still misclassify ownership. With output like 9999 -> other:8642, includes(\:${portToStop}`)fires even though local port8642is free, and any unrelated line containingsandboxNamesuppresses a real collision. Parse the singleforward listrow whose **left-hand local port** equalsportToStop`, then compare that row’s owner exactly before deciding whether to throw.

Suggested fix

   const existingForwards = runCaptureOpenshell(["forward", "list"], { ignoreError: true });
-  if (existingForwards?.includes(`:${portToStop}`) && !existingForwards?.includes(sandboxName)) {
+  const conflictingOwner = String(existingForwards || "")
+    .split(/\r?\n/)
+    .map((line) => line.trim())
+    .map((line) => {
+      const match = line.match(/^(\d+)\s*->\s*([^:]+):(\d+)$/);
+      return match
+        ? { localPort: match[1], owner: match[2], remotePort: match[3] }
+        : null;
+    })
+    .find((entry) => entry && entry.localPort === String(portToStop))?.owner;
+  if (conflictingOwner && conflictingOwner !== sandboxName) {
     throw new Error(
       `Port ${portToStop} is already forwarded for another sandbox. ` +
         `Set NEMOCLAW_DASHBOARD_PORT to a different port before onboarding ` +
         `a second sandbox.`,
     );
   }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/lib/onboard.ts` around lines 5564 - 5566, The current check uses
existingForwards?.includes(`:${portToStop}`) which can match any substring;
instead parse the output of runCaptureOpenshell(["forward", "list"]) into rows,
split each row on the arrow or delimiter to extract the local-port (left-hand)
value and the owner field, find the single row whose local-port equals
portToStop, and only if that row exists and its owner is not exactly equal to
sandboxName throw the Error; update the logic around existingForwards,
runCaptureOpenshell, portToStop and sandboxName to perform exact matching on the
parsed left-hand port and exact owner comparison.

5567-5569: ⚠️ Potential issue | 🟡 Minor

The remediation hint is still wrong for agent-defined dashboard ports.

For agents that set forwardPort, NEMOCLAW_DASHBOARD_PORT does not change the claimed local forward port, so this sends Hermes-style users to a no-op. Point them at the local dashboard URL/port override instead (for example CHAT_UI_URL) or at the agent-specific forward-port setting.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/lib/onboard.ts` around lines 5567 - 5569, The current error message that
references NEMOCLAW_DASHBOARD_PORT is incorrect when an agent has an explicit
forwardPort; update the message generation in the onboarding code (where
portToStop is used and the string is built) to detect agent-defined forward port
(e.g., check the agent's forwardPort/config or the presence of a forwardPort
variable) and show a different remediation: if the agent defines forwardPort,
instruct the user to change the agent's forwardPort or override the local
dashboard URL (e.g., CHAT_UI_URL) or the agent-specific forward-port setting;
otherwise keep the existing suggestion to set NEMOCLAW_DASHBOARD_PORT. Ensure
the conditional logic is applied where the string is composed so the correct
hint is emitted for both cases.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@src/lib/onboard.ts`:
- Around line 5564-5566: The current check uses
existingForwards?.includes(`:${portToStop}`) which can match any substring;
instead parse the output of runCaptureOpenshell(["forward", "list"]) into rows,
split each row on the arrow or delimiter to extract the local-port (left-hand)
value and the owner field, find the single row whose local-port equals
portToStop, and only if that row exists and its owner is not exactly equal to
sandboxName throw the Error; update the logic around existingForwards,
runCaptureOpenshell, portToStop and sandboxName to perform exact matching on the
parsed left-hand port and exact owner comparison.
- Around line 5567-5569: The current error message that references
NEMOCLAW_DASHBOARD_PORT is incorrect when an agent has an explicit forwardPort;
update the message generation in the onboarding code (where portToStop is used
and the string is built) to detect agent-defined forward port (e.g., check the
agent's forwardPort/config or the presence of a forwardPort variable) and show a
different remediation: if the agent defines forwardPort, instruct the user to
change the agent's forwardPort or override the local dashboard URL (e.g.,
CHAT_UI_URL) or the agent-specific forward-port setting; otherwise keep the
existing suggestion to set NEMOCLAW_DASHBOARD_PORT. Ensure the conditional logic
is applied where the string is composed so the correct hint is emitted for both
cases.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 75946eec-47b5-4b07-8da1-39725da3a05e

📥 Commits

Reviewing files that changed from the base of the PR and between d2e8310 and 44c6324.

📒 Files selected for processing (1)

src/lib/onboard.ts

…n check and test mock - ensureDashboardForward: replace blob substring match with line-level parser so the collision check is scoped to the exact port entry (format: "<port> -> <sandbox>:<port>") rather than the full listing - Error message now references CHAT_UI_URL instead of NEMOCLAW_DASHBOARD_PORT — the env var only applies to OpenClaw default port; agents with explicit forwardPort in their manifest need CHAT_UI_URL to override the local listen address - test/onboard.test.ts: fix runCapture mock to match openshellShellCommand output ('\''sandbox'\'' '\''exec'\'') so the dashboard-ready loop exits on the first iteration instead of sleeping 28 s and timing out the test - Remove duplicate src/lib/gateway-state.test.ts (superseded by the existing test/gateway-state.test.ts which covers the same functions with broader scenarios) Signed-off-by: Dongni Yang <dongniy@nvidia.com> Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

coderabbitai

♻️ Duplicate comments (1)

src/lib/onboard.ts (1)

5564-5575: ⚠️ Potential issue | 🟠 Major

Match the forwarded endpoint by port suffix, not bare-token equality.

Line 5572 only matches when openshell forward list prints the local endpoint as the bare port. If the CLI emits 127.0.0.1:${portToStop} or 0.0.0.0:${portToStop}, this guard misses the collision and forward stop can still steal another sandbox’s forward.

Suggested fix

   const portLine = existingForwards
     ?.split("\n")
     .map((l) => l.trim())
     .find((l) => {
-      const localPort = l.split(/\s+/)[0];
-      return localPort === portToStop;
+      const localEndpoint = l.split(/\s+/)[0] || "";
+      return localEndpoint === portToStop || localEndpoint.endsWith(`:${portToStop}`);
     });

What is the output format of `openshell forward list` for local forwards? Does the local endpoint column include the host (for example `127.0.0.1:18789` or `0.0.0.0:8642`) or only the bare port?

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/lib/onboard.ts` around lines 5564 - 5575, The current match logic in the
block using runCaptureOpenshell(["forward", "list"]) only compares the first
token as a bare port (variable portToStop) and misses entries like
"127.0.0.1:1234"; update the find and parsing so you compare the port suffix
instead: inside the .find callback extract the local endpoint token (same
split(/\s+/)[0]), then derive its port by taking the substring after the last
":" (or the whole token if no ":"), and compare that port string to portToStop;
likewise compute portOwner from portLine the same way (keep existing portOwner
extraction from split(/\s+/)[2] but treat that token's sandbox portion the same)
so collisions where host:port is printed are detected (refer to
runCaptureOpenshell, portToStop, portLine, portOwner, sandboxName).

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@src/lib/onboard.ts`:
- Around line 5564-5575: The current match logic in the block using
runCaptureOpenshell(["forward", "list"]) only compares the first token as a bare
port (variable portToStop) and misses entries like "127.0.0.1:1234"; update the
find and parsing so you compare the port suffix instead: inside the .find
callback extract the local endpoint token (same split(/\s+/)[0]), then derive
its port by taking the substring after the last ":" (or the whole token if no
":"), and compare that port string to portToStop; likewise compute portOwner
from portLine the same way (keep existing portOwner extraction from
split(/\s+/)[2] but treat that token's sandbox portion the same) so collisions
where host:port is printed are detected (refer to runCaptureOpenshell,
portToStop, portLine, portOwner, sandboxName).

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: b4f72c64-7cac-45b3-8947-96cc64c7f933

📥 Commits

Reviewing files that changed from the base of the PR and between 44c6324 and bcaa37f.

📒 Files selected for processing (2)

src/lib/onboard.ts
test/onboard.test.ts

…mocks openshell forward list outputs SANDBOX BIND PORT PID STATUS columns. The ensureDashboardForward parser was reading column [0] (SANDBOX name) as the port and column [2] with colon-split for the owner, which never matched any real output — silently disabling the collision guard. Fix: read PORT from column [2] and SANDBOX name from column [0]. Update all test mocks from the fictional "PORT -> NAME:PORT" format to match the real openshell output: "NAME 127.0.0.1 PORT PID STATUS". Regression introduced in this branch; related to open issue NVIDIA#2007. Signed-off-by: Dongni Yang <dongniy@nvidia.com>

wscurran · 2026-04-20T14:28:05Z

✨ Thanks for submitting this PR that proposes a fix to the dashboard wait and forward port collision. The changes and test plan you provided will help us review this further. Possibly related to issue #2063.

Possibly closes:

#2063 nemoclaw fails to create hermes sandbox next to openclaw

Possibly related open issues:

#2063 nemoclaw fails to create hermes sandbox next to openclaw

Exercises the ensureDashboardForward collision path added in this PR: mock `forward list` to return a different sandbox owning port 18789, then assert createSandbox throws with the expected error message.

ericksoa

LGTM — the two bug fixes are correct and well-scoped:

Bug 1 (readiness loop): Trivial one-token fix replacing hardcoded CONTROL_UI_PORT with the already-computed effectivePort. Clearly correct.
Bug 2 (forward collision guard): Line-level parsing of openshell forward list columns (SANDBOX BIND PORT PID STATUS) with exact port match at column 2 and sandbox-name comparison at column 0. Error message correctly references CHAT_UI_URL for agent-specific port overrides.

I pushed one additional commit adding a test that exercises the collision detection path (mock forward list returning a different sandbox name, assert the expected error). All 136 tests pass.

Minor note: PR description still references "20 new unit tests" in gateway-state.test.ts, but those were added then removed within the PR (duplicate of existing test/gateway-state.test.ts). The description could be updated but isn't blocking.

## Summary - TC-SBX-10 fails because PR #2086 added a port-collision guard in `ensureDashboardForward()` but the E2E test was not updated to use a different `CHAT_UI_URL` for sandbox B - Sets `CHAT_UI_URL=http://127.0.0.1:18790` when onboarding the second sandbox ## Test plan - [ ] Nightly E2E `sandbox-operations-e2e` job passes (TC-SBX-10, TC-SBX-11) Fixes regression from #2086.  ## Summary by CodeRabbit * **Tests** * Updated test infrastructure configuration for sandbox operations. --- **Note:** This release contains no user-facing changes. Updates are limited to internal testing procedures.

## Summary Catches up the user-facing reference and troubleshooting docs with the CLI and policy behavior changes that landed in v0.0.21. Drafted via the `nemoclaw-contributor-update-docs` skill against commits in `v0.0.20..v0.0.21`, filtered through `docs/.docs-skip`. ## Changes - **`docs/reference/commands.md`** - `nemoclaw list`: session indicator (●) for connected sandboxes (#2117). - `nemoclaw <name> connect`: active-session note; auto-recovery from SSH identity drift after a host reboot (#2117, #2064). - `nemoclaw <name> status`: three-state Inference line (`healthy` / `unreachable` / `not probed`) covering both local and remote providers; new `Connected` line (#2002, #2117). - `nemoclaw <name> destroy` and `rebuild`: active-session warning with second confirm; rebuild reapplies policy presets to the recreated sandbox (#2117, #2026). - `nemoclaw <name> policy-add` and `policy-remove`: positional preset argument and non-interactive flow via `--yes`/`--force`/`NEMOCLAW_NON_INTERACTIVE=1` (#2070). - `nemoclaw <name> policy-list`: registry-vs-gateway desync detection (#2089). - **`docs/reference/troubleshooting.md`** - `Reconnect after a host reboot`: now reflects automatic stale `known_hosts` pruning on `connect` (#2064). - `Running multiple sandboxes simultaneously`: onboard's forward-port collision guard (#2086). - New section: `openclaw config set` or `unset` is blocked inside the sandbox (#2081). - **`docs/network-policy/customize-network-policy.md`**: non-interactive `policy-add`/`policy-remove` form; preset preservation across rebuild (#2070, #2026). - **`docs/inference/use-local-inference.md`**: NIM section now covers the NGC API key prompt with masked input and `docker login nvcr.io --password-stdin` behavior (#2043). - **Generated skills regenerated** to pick up the source changes (`.agents/skills/nemoclaw-user-reference/references/{commands,troubleshooting}.md`, plus minor heading-flow deltas elsewhere). The pre-commit `Regenerate agent skills from docs` hook ran and confirmed source ↔ generated parity. Commits skipped per `docs/.docs-skip` or no doc impact: `bbbaa0fb` (skip-features), `7cb482cb` (skip-features), `8dee23fd` (skip-terms), plus the usual CI / test / refactor / install-plumbing churn. ## Type of Change - [ ] Code change (feature, bug fix, or refactor) - [ ] Code change with doc updates - [ ] Doc only (prose changes, no code sample modifications) - [x] Doc only (includes code sample changes) ## Verification - [x] `npx prek run --all-files` passes for the modified files (the one failing test, `test/cli.test.ts > unknown command exits 1`, also fails on `origin/main` and is unrelated to these markdown-only changes) - [ ] `npm test` passes — skipped; same pre-existing CLI-dispatch test failure unrelated to docs - [ ] Tests added or updated for new or changed behavior — n/a, doc-only - [x] No secrets, API keys, or credentials committed - [x] Docs updated for user-facing behavior changes - [ ] `make docs` builds without warnings (doc changes only) — not run locally - [x] Doc pages follow the [style guide](https://github.com/NVIDIA/NemoClaw/blob/main/docs/CONTRIBUTING.md) (doc changes only) - [ ] New doc pages include SPDX header and frontmatter (new pages only) — n/a, no new pages ## AI Disclosure - [x] AI-assisted — tool: Claude Code --- Signed-off-by: Miyoung Choi <miyoungc@nvidia.com>  ## Summary by CodeRabbit * **New Features** * Multi-session SSH connections with concurrent session support. * Three-state inference health reporting (healthy/unreachable/not probed) across all providers. * Automatic SSH host key rotation detection and recovery. * Non-interactive policy preset management via positional arguments. * Session indicators in sandbox list view. * **Bug Fixes** * Protected destructive operations with active-session warnings. * Policy presets now preserved during sandbox rebuilds. * **Documentation** * NGC registry authentication requirements for container images. * Multi-sandbox onboarding and reconnection guidance. * Troubleshooting updates for port conflicts and SSH issues.  Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai Bot reviewed Apr 20, 2026

View reviewed changes

Dongni-Yang changed the title ~~fix(#2063): use agent forwardPort in dashboard wait; guard ensureDashboardForward against port collision~~ fix(onboard): use effectivePort in dashboard wait and guard forward port collision Apr 20, 2026

coderabbitai Bot reviewed Apr 20, 2026

View reviewed changes

wscurran added bug Something isn't working fix labels Apr 20, 2026

test(onboard): add coverage for forward port collision guard

2ef0638

Exercises the ensureDashboardForward collision path added in this PR: mock `forward list` to return a different sandbox owning port 18789, then assert createSandbox throws with the expected error message.

ericksoa approved these changes Apr 20, 2026

View reviewed changes

ericksoa merged commit ac9afba into NVIDIA:main Apr 20, 2026
7 of 8 checks passed

oparoz mentioned this pull request Apr 20, 2026

nemoclaw fails to create hermes sandbox next to openclaw #2063

Closed

2 tasks

ericksoa mentioned this pull request Apr 21, 2026

fix(e2e): set alternate dashboard port for multi-sandbox test #2124

Merged

1 task

miyoungc mentioned this pull request Apr 21, 2026

docs: catch up reference and troubleshooting docs for v0.0.21 #2193

Merged

13 tasks

gaveezy mentioned this pull request Apr 22, 2026

nightly-e2e failure: 5 jobs failed on 2026-04-21 (run 24697002568) #2231

Closed

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(onboard): use effectivePort in dashboard wait and guard forward port collision#2086

fix(onboard): use effectivePort in dashboard wait and guard forward port collision#2086
ericksoa merged 5 commits intoNVIDIA:mainfrom
Dongni-Yang:fix/hermes-coexistence-2063

Dongni-Yang commented Apr 20, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot commented Apr 20, 2026 •

edited

Loading

Review failed

Walkthrough

Changes

Sequence Diagram

Estimated Code Review Effort

Possibly related PRs

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot Apr 20, 2026

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot left a comment

Uh oh!

wscurran commented Apr 20, 2026

Uh oh!

ericksoa left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

Dongni-Yang commented Apr 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Changes

Root cause

Test plan

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Apr 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review failed

Walkthrough

Changes

Sequence Diagram

Estimated Code Review Effort

Possibly related PRs

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 20, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

wscurran commented Apr 20, 2026

Uh oh!

ericksoa left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Dongni-Yang commented Apr 20, 2026 •

edited

Loading

coderabbitai Bot commented Apr 20, 2026 •

edited

Loading