fix(onboard): use effectivePort in dashboard wait and guard forward port collision#2086
Conversation
…nsureDashboardForward against port collision When onboarding a non-OpenClaw agent (e.g. Hermes, forwardPort=8642) next to an existing OpenClaw sandbox (forwardPort=18789), the dashboard readiness loop probed the wrong port (hardcoded DASHBOARD_PORT=18789) inside the Hermes sandbox. This caused a spurious 28-second wait and a misleading "Dashboard taking longer than expected" warning on every Hermes onboard, masking the real health state. Additionally ensureDashboardForward had no guard against a port already forwarded by a different sandbox. If two agents share the same forward port the first sandbox's forward would be silently stolen, leaving it unreachable. - Replace hardcoded DASHBOARD_PORT with effectivePort (already computed as agent.forwardPort for non-OpenClaw agents) in the readiness wait loop - Add a pre-check in ensureDashboardForward: inspect `openshell forward list` before issuing stop/start; throw a clear actionable error if the port is claimed by a different sandbox - Add gateway-state.test.ts (20 tests) covering isGatewayHealthy, getGatewayReuseState, isSandboxReady, hasStaleGateway, and related pure classifiers Fixes NVIDIA#2063 Signed-off-by: Dongni Yang <dongniy@nvidia.com> Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
Caution Review failedThe pull request is closed. ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughReadiness checks now curl the sandbox-local dashboard URL using the dynamically chosen Changes
Sequence DiagramsequenceDiagram
participant Tool as NemoClaw Tool
participant OpenShell as OpenShell CLI
participant Sandbox as Sandbox Instance
Tool->>OpenShell: forward list
OpenShell-->>Tool: list of forward entries (status lines)
Tool->>Tool: parse entries for portToStop
alt port forwarded to different sandbox
Tool->>Tool: throw error (port already forwarded by other sandbox)
else port free or owned by same sandbox
Tool->>OpenShell: forward stop (if needed)
OpenShell-->>Tool: ack
Tool->>OpenShell: forward start (establish mapping)
OpenShell-->>Tool: ack
end
Tool->>Sandbox: curl http://localhost:{effectivePort}/ (readiness)
Sandbox-->>Tool: HTTP response (ready / not ready)
Estimated Code Review Effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
…d wait loop Upstream changed the sandbox readiness check from runCapture (shell string) to runCaptureOpenshell (argv array) for safety. Our fix uses effectivePort instead of CONTROL_UI_PORT. Resolution keeps both: the safer argv form from upstream and the agent-aware port from this branch. Signed-off-by: Dongni Yang <dongniy@nvidia.com> Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (1)
src/lib/gateway-state.test.ts (1)
53-73: Add explicit coverage foractive-unnamedreuse-state branches.
getGatewayReuseStatehasactive-unnamedpaths (seesrc/lib/gateway-state.tsLines 90-114), but this suite currently covers"healthy","foreign-active","stale", and"missing"only.Suggested test additions
describe("getGatewayReuseState", () => { @@ it("returns missing when no state is present", () => { expect(getGatewayReuseState("", "", "")).toBe("missing"); }); + + it("returns active-unnamed when connected and active gateway matches but metadata is missing", () => { + expect(getGatewayReuseState(STATUS_CONNECTED_NEMOCLAW, "", "")).toBe("active-unnamed"); + }); + + it("returns active-unnamed when active gateway info exists without reusable metadata", () => { + expect(getGatewayReuseState("", "", ACTIVE_INFO_NEMOCLAW)).toBe("active-unnamed"); + }); });🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/lib/gateway-state.test.ts` around lines 53 - 73, Add tests for the "active-unnamed" branches of getGatewayReuseState: create inputs where an active gateway is present but its name is empty/unnamed and assert the function returns "active-unnamed" (e.g., call getGatewayReuseState with a connected status like STATUS_CONNECTED_NEMOCLAW or STATUS_CONNECTED_OTHER, provide an ACTIVE_INFO value that represents an unnamed active gateway, and appropriate GW_INFO values), and also add the complementary case where the local gateway has no metadata but an unnamed active gateway exists to cover both unnamed-active paths; use the existing test style and constants (getGatewayReuseState, STATUS_CONNECTED_NEMOCLAW, STATUS_CONNECTED_OTHER, GW_INFO_NEMOCLAW, ACTIVE_INFO_NEMOCLAW) to locate where to add these new it(...) cases.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/lib/gateway-state.test.ts`:
- Line 12: The test imports the compiled artifact
("../../dist/lib/gateway-state") which can cause stale builds to be tested;
change the import to point at the TypeScript source module (the gateway-state
source) instead of dist — i.e., replace the "../../dist/lib/gateway-state"
import in src/lib/gateway-state.test.ts with the corresponding source import
(e.g., "../../src/lib/gateway-state" or "./gateway-state" depending on relative
layout) so the tests run against the live TypeScript code, and ensure your test
runner is configured to load TS sources (ts-node/ts-jest or the
tsconfig.cli.json settings).
In `@src/lib/onboard.ts`:
- Around line 4739-4741: The error message that references
NEMOCLAW_DASHBOARD_PORT is misleading for agents that specify an explicit
forwardPort in their manifest; update the string that currently includes `Port
${portToStop} is already forwarded... Set NEMOCLAW_DASHBOARD_PORT...` so it
directs users to change the agent's dashboard port override (the manifest
`forwardPort` / local dashboard URL override) instead of suggesting the
environment variable; locate where `portToStop` is used to build this message
and replace the remediation text to reference `forwardPort`/local dashboard URL
override for Hermes-style agents rather than `NEMOCLAW_DASHBOARD_PORT`.
- Around line 4736-4742: The check using existingForwards.includes is too broad;
instead parse the runCaptureOpenshell(["forward","list"]) output by lines,
locate the single line that corresponds to the port (match the token containing
`:${portToStop}` as the port column), extract that line's sandbox name field and
compare it exactly to `sandboxName`; if the extracted owner exists and is not
equal to `sandboxName`, throw the same error. Use `existingForwards.split("\n")`
(or equivalent), find the line where the port column equals `:${portToStop}`,
parse the owner/sandbox column from that line, and base the collision check on
that exact owner rather than substring matches.
---
Nitpick comments:
In `@src/lib/gateway-state.test.ts`:
- Around line 53-73: Add tests for the "active-unnamed" branches of
getGatewayReuseState: create inputs where an active gateway is present but its
name is empty/unnamed and assert the function returns "active-unnamed" (e.g.,
call getGatewayReuseState with a connected status like STATUS_CONNECTED_NEMOCLAW
or STATUS_CONNECTED_OTHER, provide an ACTIVE_INFO value that represents an
unnamed active gateway, and appropriate GW_INFO values), and also add the
complementary case where the local gateway has no metadata but an unnamed active
gateway exists to cover both unnamed-active paths; use the existing test style
and constants (getGatewayReuseState, STATUS_CONNECTED_NEMOCLAW,
STATUS_CONNECTED_OTHER, GW_INFO_NEMOCLAW, ACTIVE_INFO_NEMOCLAW) to locate where
to add these new it(...) cases.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: d01f7876-76bc-418b-8814-f88697cdb433
📒 Files selected for processing (2)
src/lib/gateway-state.test.tssrc/lib/onboard.ts
| hasStaleGateway, | ||
| getReportedGatewayName, | ||
| isGatewayConnected, | ||
| } from "../../dist/lib/gateway-state"; |
There was a problem hiding this comment.
Importing from dist makes these unit tests unreliable.
Line 12 imports compiled output (../../dist/...) instead of source. That can silently test stale build artifacts and miss regressions in src/lib/gateway-state.ts.
Proposed fix
} from "vitest";
import {
isGatewayHealthy,
getGatewayReuseState,
isSandboxReady,
hasStaleGateway,
getReportedGatewayName,
isGatewayConnected,
-} from "../../dist/lib/gateway-state";
+} from "./gateway-state";As per coding guidelines, "Use TypeScript for CLI code in src/lib/ with type-checking via tsconfig.cli.json".
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| } from "../../dist/lib/gateway-state"; | |
| } from "vitest"; | |
| import { | |
| isGatewayHealthy, | |
| getGatewayReuseState, | |
| isSandboxReady, | |
| hasStaleGateway, | |
| getReportedGatewayName, | |
| isGatewayConnected, | |
| } from "./gateway-state"; |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/lib/gateway-state.test.ts` at line 12, The test imports the compiled
artifact ("../../dist/lib/gateway-state") which can cause stale builds to be
tested; change the import to point at the TypeScript source module (the
gateway-state source) instead of dist — i.e., replace the
"../../dist/lib/gateway-state" import in src/lib/gateway-state.test.ts with the
corresponding source import (e.g., "../../src/lib/gateway-state" or
"./gateway-state" depending on relative layout) so the tests run against the
live TypeScript code, and ensure your test runner is configured to load TS
sources (ts-node/ts-jest or the tsconfig.cli.json settings).
There was a problem hiding this comment.
♻️ Duplicate comments (2)
src/lib/onboard.ts (2)
5564-5566:⚠️ Potential issue | 🟠 MajorMatch the conflicting forward on the exact local-port entry, not with blob-level substring checks.
This guard can still misclassify ownership. With output like
9999 -> other:8642,includes(\:${portToStop}`)fires even though local port8642is free, and any unrelated line containingsandboxNamesuppresses a real collision. Parse the singleforward listrow whose **left-hand local port** equalsportToStop`, then compare that row’s owner exactly before deciding whether to throw.Suggested fix
const existingForwards = runCaptureOpenshell(["forward", "list"], { ignoreError: true }); - if (existingForwards?.includes(`:${portToStop}`) && !existingForwards?.includes(sandboxName)) { + const conflictingOwner = String(existingForwards || "") + .split(/\r?\n/) + .map((line) => line.trim()) + .map((line) => { + const match = line.match(/^(\d+)\s*->\s*([^:]+):(\d+)$/); + return match + ? { localPort: match[1], owner: match[2], remotePort: match[3] } + : null; + }) + .find((entry) => entry && entry.localPort === String(portToStop))?.owner; + if (conflictingOwner && conflictingOwner !== sandboxName) { throw new Error( `Port ${portToStop} is already forwarded for another sandbox. ` + `Set NEMOCLAW_DASHBOARD_PORT to a different port before onboarding ` + `a second sandbox.`, ); }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/lib/onboard.ts` around lines 5564 - 5566, The current check uses existingForwards?.includes(`:${portToStop}`) which can match any substring; instead parse the output of runCaptureOpenshell(["forward", "list"]) into rows, split each row on the arrow or delimiter to extract the local-port (left-hand) value and the owner field, find the single row whose local-port equals portToStop, and only if that row exists and its owner is not exactly equal to sandboxName throw the Error; update the logic around existingForwards, runCaptureOpenshell, portToStop and sandboxName to perform exact matching on the parsed left-hand port and exact owner comparison.
5567-5569:⚠️ Potential issue | 🟡 MinorThe remediation hint is still wrong for agent-defined dashboard ports.
For agents that set
forwardPort,NEMOCLAW_DASHBOARD_PORTdoes not change the claimed local forward port, so this sends Hermes-style users to a no-op. Point them at the local dashboard URL/port override instead (for exampleCHAT_UI_URL) or at the agent-specific forward-port setting.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/lib/onboard.ts` around lines 5567 - 5569, The current error message that references NEMOCLAW_DASHBOARD_PORT is incorrect when an agent has an explicit forwardPort; update the message generation in the onboarding code (where portToStop is used and the string is built) to detect agent-defined forward port (e.g., check the agent's forwardPort/config or the presence of a forwardPort variable) and show a different remediation: if the agent defines forwardPort, instruct the user to change the agent's forwardPort or override the local dashboard URL (e.g., CHAT_UI_URL) or the agent-specific forward-port setting; otherwise keep the existing suggestion to set NEMOCLAW_DASHBOARD_PORT. Ensure the conditional logic is applied where the string is composed so the correct hint is emitted for both cases.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@src/lib/onboard.ts`:
- Around line 5564-5566: The current check uses
existingForwards?.includes(`:${portToStop}`) which can match any substring;
instead parse the output of runCaptureOpenshell(["forward", "list"]) into rows,
split each row on the arrow or delimiter to extract the local-port (left-hand)
value and the owner field, find the single row whose local-port equals
portToStop, and only if that row exists and its owner is not exactly equal to
sandboxName throw the Error; update the logic around existingForwards,
runCaptureOpenshell, portToStop and sandboxName to perform exact matching on the
parsed left-hand port and exact owner comparison.
- Around line 5567-5569: The current error message that references
NEMOCLAW_DASHBOARD_PORT is incorrect when an agent has an explicit forwardPort;
update the message generation in the onboarding code (where portToStop is used
and the string is built) to detect agent-defined forward port (e.g., check the
agent's forwardPort/config or the presence of a forwardPort variable) and show a
different remediation: if the agent defines forwardPort, instruct the user to
change the agent's forwardPort or override the local dashboard URL (e.g.,
CHAT_UI_URL) or the agent-specific forward-port setting; otherwise keep the
existing suggestion to set NEMOCLAW_DASHBOARD_PORT. Ensure the conditional logic
is applied where the string is composed so the correct hint is emitted for both
cases.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 75946eec-47b5-4b07-8da1-39725da3a05e
📒 Files selected for processing (1)
src/lib/onboard.ts
…n check and test mock
- ensureDashboardForward: replace blob substring match with line-level
parser so the collision check is scoped to the exact port entry
(format: "<port> -> <sandbox>:<port>") rather than the full listing
- Error message now references CHAT_UI_URL instead of
NEMOCLAW_DASHBOARD_PORT — the env var only applies to OpenClaw default
port; agents with explicit forwardPort in their manifest need
CHAT_UI_URL to override the local listen address
- test/onboard.test.ts: fix runCapture mock to match
openshellShellCommand output ('\''sandbox'\'' '\''exec'\'')
so the dashboard-ready loop exits on the first iteration instead
of sleeping 28 s and timing out the test
- Remove duplicate src/lib/gateway-state.test.ts (superseded by
the existing test/gateway-state.test.ts which covers the same
functions with broader scenarios)
Signed-off-by: Dongni Yang <dongniy@nvidia.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
♻️ Duplicate comments (1)
src/lib/onboard.ts (1)
5564-5575:⚠️ Potential issue | 🟠 MajorMatch the forwarded endpoint by port suffix, not bare-token equality.
Line 5572 only matches when
openshell forward listprints the local endpoint as the bare port. If the CLI emits127.0.0.1:${portToStop}or0.0.0.0:${portToStop}, this guard misses the collision andforward stopcan still steal another sandbox’s forward.Suggested fix
const portLine = existingForwards ?.split("\n") .map((l) => l.trim()) .find((l) => { - const localPort = l.split(/\s+/)[0]; - return localPort === portToStop; + const localEndpoint = l.split(/\s+/)[0] || ""; + return localEndpoint === portToStop || localEndpoint.endsWith(`:${portToStop}`); });What is the output format of `openshell forward list` for local forwards? Does the local endpoint column include the host (for example `127.0.0.1:18789` or `0.0.0.0:8642`) or only the bare port?🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/lib/onboard.ts` around lines 5564 - 5575, The current match logic in the block using runCaptureOpenshell(["forward", "list"]) only compares the first token as a bare port (variable portToStop) and misses entries like "127.0.0.1:1234"; update the find and parsing so you compare the port suffix instead: inside the .find callback extract the local endpoint token (same split(/\s+/)[0]), then derive its port by taking the substring after the last ":" (or the whole token if no ":"), and compare that port string to portToStop; likewise compute portOwner from portLine the same way (keep existing portOwner extraction from split(/\s+/)[2] but treat that token's sandbox portion the same) so collisions where host:port is printed are detected (refer to runCaptureOpenshell, portToStop, portLine, portOwner, sandboxName).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@src/lib/onboard.ts`:
- Around line 5564-5575: The current match logic in the block using
runCaptureOpenshell(["forward", "list"]) only compares the first token as a bare
port (variable portToStop) and misses entries like "127.0.0.1:1234"; update the
find and parsing so you compare the port suffix instead: inside the .find
callback extract the local endpoint token (same split(/\s+/)[0]), then derive
its port by taking the substring after the last ":" (or the whole token if no
":"), and compare that port string to portToStop; likewise compute portOwner
from portLine the same way (keep existing portOwner extraction from
split(/\s+/)[2] but treat that token's sandbox portion the same) so collisions
where host:port is printed are detected (refer to runCaptureOpenshell,
portToStop, portLine, portOwner, sandboxName).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: b4f72c64-7cac-45b3-8947-96cc64c7f933
📒 Files selected for processing (2)
src/lib/onboard.tstest/onboard.test.ts
…mocks openshell forward list outputs SANDBOX BIND PORT PID STATUS columns. The ensureDashboardForward parser was reading column [0] (SANDBOX name) as the port and column [2] with colon-split for the owner, which never matched any real output — silently disabling the collision guard. Fix: read PORT from column [2] and SANDBOX name from column [0]. Update all test mocks from the fictional "PORT -> NAME:PORT" format to match the real openshell output: "NAME 127.0.0.1 PORT PID STATUS". Regression introduced in this branch; related to open issue NVIDIA#2007. Signed-off-by: Dongni Yang <dongniy@nvidia.com>
|
✨ Thanks for submitting this PR that proposes a fix to the dashboard wait and forward port collision. The changes and test plan you provided will help us review this further. Possibly related to issue #2063. Possibly closes: Possibly related open issues: |
Exercises the ensureDashboardForward collision path added in this PR: mock `forward list` to return a different sandbox owning port 18789, then assert createSandbox throws with the expected error message.
ericksoa
left a comment
There was a problem hiding this comment.
LGTM — the two bug fixes are correct and well-scoped:
-
Bug 1 (readiness loop): Trivial one-token fix replacing hardcoded
CONTROL_UI_PORTwith the already-computedeffectivePort. Clearly correct. -
Bug 2 (forward collision guard): Line-level parsing of
openshell forward listcolumns (SANDBOX BIND PORT PID STATUS) with exact port match at column 2 and sandbox-name comparison at column 0. Error message correctly referencesCHAT_UI_URLfor agent-specific port overrides.
I pushed one additional commit adding a test that exercises the collision detection path (mock forward list returning a different sandbox name, assert the expected error). All 136 tests pass.
Minor note: PR description still references "20 new unit tests" in gateway-state.test.ts, but those were added then removed within the PR (duplicate of existing test/gateway-state.test.ts). The description could be updated but isn't blocking.
## Summary - TC-SBX-10 fails because PR #2086 added a port-collision guard in `ensureDashboardForward()` but the E2E test was not updated to use a different `CHAT_UI_URL` for sandbox B - Sets `CHAT_UI_URL=http://127.0.0.1:18790` when onboarding the second sandbox ## Test plan - [ ] Nightly E2E `sandbox-operations-e2e` job passes (TC-SBX-10, TC-SBX-11) Fixes regression from #2086. <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **Tests** * Updated test infrastructure configuration for sandbox operations. --- **Note:** This release contains no user-facing changes. Updates are limited to internal testing procedures. <!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Summary Catches up the user-facing reference and troubleshooting docs with the CLI and policy behavior changes that landed in v0.0.21. Drafted via the `nemoclaw-contributor-update-docs` skill against commits in `v0.0.20..v0.0.21`, filtered through `docs/.docs-skip`. ## Changes - **`docs/reference/commands.md`** - `nemoclaw list`: session indicator (●) for connected sandboxes (#2117). - `nemoclaw <name> connect`: active-session note; auto-recovery from SSH identity drift after a host reboot (#2117, #2064). - `nemoclaw <name> status`: three-state Inference line (`healthy` / `unreachable` / `not probed`) covering both local and remote providers; new `Connected` line (#2002, #2117). - `nemoclaw <name> destroy` and `rebuild`: active-session warning with second confirm; rebuild reapplies policy presets to the recreated sandbox (#2117, #2026). - `nemoclaw <name> policy-add` and `policy-remove`: positional preset argument and non-interactive flow via `--yes`/`--force`/`NEMOCLAW_NON_INTERACTIVE=1` (#2070). - `nemoclaw <name> policy-list`: registry-vs-gateway desync detection (#2089). - **`docs/reference/troubleshooting.md`** - `Reconnect after a host reboot`: now reflects automatic stale `known_hosts` pruning on `connect` (#2064). - `Running multiple sandboxes simultaneously`: onboard's forward-port collision guard (#2086). - New section: `openclaw config set` or `unset` is blocked inside the sandbox (#2081). - **`docs/network-policy/customize-network-policy.md`**: non-interactive `policy-add`/`policy-remove` form; preset preservation across rebuild (#2070, #2026). - **`docs/inference/use-local-inference.md`**: NIM section now covers the NGC API key prompt with masked input and `docker login nvcr.io --password-stdin` behavior (#2043). - **Generated skills regenerated** to pick up the source changes (`.agents/skills/nemoclaw-user-reference/references/{commands,troubleshooting}.md`, plus minor heading-flow deltas elsewhere). The pre-commit `Regenerate agent skills from docs` hook ran and confirmed source ↔ generated parity. Commits skipped per `docs/.docs-skip` or no doc impact: `bbbaa0fb` (skip-features), `7cb482cb` (skip-features), `8dee23fd` (skip-terms), plus the usual CI / test / refactor / install-plumbing churn. ## Type of Change - [ ] Code change (feature, bug fix, or refactor) - [ ] Code change with doc updates - [ ] Doc only (prose changes, no code sample modifications) - [x] Doc only (includes code sample changes) ## Verification - [x] `npx prek run --all-files` passes for the modified files (the one failing test, `test/cli.test.ts > unknown command exits 1`, also fails on `origin/main` and is unrelated to these markdown-only changes) - [ ] `npm test` passes — skipped; same pre-existing CLI-dispatch test failure unrelated to docs - [ ] Tests added or updated for new or changed behavior — n/a, doc-only - [x] No secrets, API keys, or credentials committed - [x] Docs updated for user-facing behavior changes - [ ] `make docs` builds without warnings (doc changes only) — not run locally - [x] Doc pages follow the [style guide](https://github.com/NVIDIA/NemoClaw/blob/main/docs/CONTRIBUTING.md) (doc changes only) - [ ] New doc pages include SPDX header and frontmatter (new pages only) — n/a, no new pages ## AI Disclosure - [x] AI-assisted — tool: Claude Code --- Signed-off-by: Miyoung Choi <miyoungc@nvidia.com> <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **New Features** * Multi-session SSH connections with concurrent session support. * Three-state inference health reporting (healthy/unreachable/not probed) across all providers. * Automatic SSH host key rotation detection and recovery. * Non-interactive policy preset management via positional arguments. * Session indicators in sandbox list view. * **Bug Fixes** * Protected destructive operations with active-session warnings. * Policy presets now preserved during sandbox rebuilds. * **Documentation** * NGC registry authentication requirements for container images. * Multi-sandbox onboarding and reconnection guidance. * Troubleshooting updates for port conflicts and SSH issues. <!-- end of auto-generated comment: release notes by coderabbit.ai --> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes #2063 —
nemoclaw onboard --agent hermesfails when an existing OpenClaw sandbox is already running.Two bugs found and fixed:
Bug 1 (
onboard.ts) — The sandbox readiness wait loop hardcodedDASHBOARD_PORT(18789) when probing inside every sandbox. Hermes runs on port 8642, so the loop always timed out after 28 s and printed a misleading "Dashboard taking longer than expected" warning.Bug 2 (
ensureDashboardForward) — No guard against a port already forwarded by a different sandbox. A second onboard using the same port would silently steal the first sandbox's forward, leaving it unreachable.Changes
src/lib/onboard.tseffectivePort(=agent.forwardPortfor non-OpenClaw agents) instead of hardcodedDASHBOARD_PORTin the readiness wait loopsrc/lib/onboard.tsensureDashboardForward: checkopenshell forward listbefore stop/start; throw a clear error if the port belongs to a different sandboxsrc/lib/gateway-state.test.tsisGatewayHealthy,getGatewayReuseState,isSandboxReady,hasStaleGatewayRoot cause
createSandboxalready computeseffectivePort = agent ? agent.forwardPort : CONTROL_UI_PORTbut the readiness loop immediately below usedDASHBOARD_PORTdirectly. One-token fix makes the loop consistent with the rest of the function.Test plan
npm run build:cli— no TypeScript errorsnpm test— 1634 pass, 0 new failuresnemoclaw onboardthennemoclaw onboard --agent hermes— both complete;openshell forward listshows ports 18789 and 8642 as distinct entriesSigned-off-by: Dongni Yang dongniy@nvidia.com
🤖 Generated with Claude Code
Summary by CodeRabbit
Bug Fixes
Tests