fix(onboard): preserve concurrent instance gateway and dashboard during onboard #4598

Merged
cv merged 33 commits into main from fix/4422-refuse-gateway-drift-on-live-sandbox on Jun 5, 2026

Conversation

@laitingsheng
Contributor

@laitingsheng laitingsheng commented Jun 1, 2026

Summary

Two preflight cleanup paths assumed the OpenShell gateway and dashboard forward were process-wide singletons. When a second NemoClaw onboard ran with NEMOCLAW_GATEWAY_PORT=N, the preflight retired the existing per-port gateway as "legacy" and killed the first sandbox's dashboard SSH forward — leaving the first sandbox unreachable. This PR scopes both cleanups so the second instance starts its own gateway alongside the first instead of replacing it.

Related Issue

Fixes #4422 · Refs #3053

#4422 is the specific SIGKILL-on-second-onboard symptom: a second onboard with NEMOCLAW_GATEWAY_PORT=N destroyed the previous instance's per-port gateway and dashboard forward. This PR fixes both preflight cleanups so concurrent onboards no longer step on each other.

#3053 is the broader ask — full multi-instance segregation of registry, credentials, snapshots, messaging, and lifecycle behind a configurable NEMOCLAW_INSTANCE identity. That work is out of scope here and tracked separately; this PR removes the destructive cross-talk that previously prevented two NemoClaw-managed sandboxes from coexisting at all, but does not yet introduce the instance identity primitive.

Changes

Type of Change

  • Code change (feature, bug fix, or refactor)
  • Code change with doc updates
  • Doc only (prose changes, no code sample modifications)
  • Doc only (includes code sample changes)

Verification

  • npx prek run --all-files passes
  • npm test passes
  • Tests added or updated for new or changed behavior
  • No secrets, API keys, or credentials committed
  • Docs updated for user-facing behavior changes
  • npm run docs builds without warnings (doc changes only)
  • Doc pages follow the style guide (doc changes only)
  • New doc pages include SPDX header and frontmatter (new pages only)

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>

Summary by CodeRabbit

  • New Features

    • Manage concurrent gateway ports safely across multiple sandboxes on the same host.
  • Bug Fixes

    • Improved cleanup for orphaned SSH port-forwards that block dashboard ports.
  • Tests

    • Added E2E test validating concurrent gateway-port scenarios.
    • Added/updated unit tests for gateway-state and orphaned-forward handling.
  • Chores

    • Added nightly E2E workflow job for concurrent gateway port testing and integrated it into reporting.
  • Documentation

    • Expanded nightly E2E job documentation for related tests.

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
@coderabbitai
Contributor

coderabbitai Bot commented Jun 1, 2026

Review Change Stack

📝 Walkthrough

Walkthrough

Gateway handler and preflight logic updated to handle concurrent gateways (foreign-active case) and delegate orphaned OpenShell forward cleanup to a new helper; unit tests added; a comprehensive E2E script validates concurrent gateway/dashboard allocation; nightly workflow runs the test and uploads failure artifacts.
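
The foreign-active handling described above can be sketched as a small pure function. This is only an illustration, not the handler in `src/lib/onboard/machine/handlers/gateway.ts`: the `"legacy"` state name, the `GatewayPlan` shape, and `planGatewayStart` are assumptions; only `foreign-active` and `missing` come from the walkthrough.

```typescript
// Hypothetical state names: "foreign-active" and "missing" appear in the
// walkthrough; "legacy" is assumed here for illustration.
type GatewayReuseState = "missing" | "legacy" | "foreign-active";

// Hypothetical plan shape describing what the handler would do next.
interface GatewayPlan {
  retireLegacyMetadata: boolean;
  effectiveState: GatewayReuseState;
}

function planGatewayStart(state: GatewayReuseState): GatewayPlan {
  if (state === "foreign-active") {
    // Another instance owns a live gateway on a different port: leave its
    // metadata alone and treat our own gateway as not yet started.
    return { retireLegacyMetadata: false, effectiveState: "missing" };
  }
  // Any other state keeps its value; only "legacy" triggers retirement.
  return { retireLegacyMetadata: state === "legacy", effectiveState: state };
}
```

The point of normalizing to `missing` is that the rest of the start path needs no special case for a concurrent instance.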

Changes

Concurrent Gateway Ports Support

| Layer / File(s) | Summary |
| --- | --- |
| Gateway handler foreign-active state handling: src/lib/onboard/machine/handlers/gateway.ts, src/lib/onboard/machine/handlers/gateway.test.ts | When gatewayReuseState is foreign-active, legacy metadata retirement is skipped and the state is normalized to missing before starting the Docker-driver gateway. Unit test verifies retirement and legacy-replacement note are bypassed while the gateway starts. |
| Orphaned dashboard forward helper and tests: src/lib/onboard/orphaned-dashboard-forward.ts, src/lib/onboard/orphaned-dashboard-forward.test.ts | Adds tryCleanupOrphanedDashboardForward plus DI types and runners; implements outcome classifications (not-openshell, list-failed, owned-by-live, killed-cleared, killed-still-blocked) and tests covering each classification and control flow. |
| Preflight port conflict awareness and wiring: src/lib/onboard.ts | Preflight now calls tryCleanupOrphanedDashboardForward when a dashboard port is blocked by an SSH listener, refreshing port checks only if cleanup freed the port and otherwise following the existing port-unavailable error path. |
| Concurrent gateway ports E2E test: test/e2e/test-concurrent-gateway-ports.sh | New bash E2E that starts a fake OpenAI server, ensures no default-install sandbox, onboards Sandbox A on 8080 and Sandbox B on a separate gateway port, verifies distinct dashboard ports, listeners, nemoclaw list contents, and that destroying B preserves A. |
| Nightly E2E workflow integration: .github/workflows/nightly-e2e.yaml | Workflow docs and workflow_dispatch inputs updated; new job concurrent-gateway-ports-e2e added with repo-guard and selective-dispatch; job runs checkout, Docker Hub auth, NemoClaw install, invokes the E2E script, and uploads sandbox A/B onboard and sandbox B destroy artifacts on failure; job wired into notify-on-failure, report-to-pr, and scorecard needs. |
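
The five outcome classifications listed for the orphaned-forward helper map naturally onto a discriminated union. The sketch below shows one way such a classifier could look with injected dependencies. Everything except the outcome names is an assumption: the `CleanupDeps` fields, the `classifyOrphanedForward` name, and the substring check for "openshell" are hypothetical and do not reproduce the real `tryCleanupOrphanedDashboardForward` signature.

```typescript
type ForwardCleanupOutcome =
  | { kind: "not-openshell" }
  | { kind: "list-failed" }
  | { kind: "owned-by-live"; sandbox: string }
  | { kind: "killed-cleared" }
  | { kind: "killed-still-blocked" };

// Hypothetical dependency-injection surface so the logic is testable
// without touching real processes or ports.
interface CleanupDeps {
  listenerCommand: string;                         // command line of the process holding the port
  listForwards: () => Map<number, string> | null;  // port -> sandbox name, null when listing failed
  liveSandboxNames: () => Set<string>;
  killListener: () => void;
  isPortFree: () => boolean;
}

function classifyOrphanedForward(port: number, deps: CleanupDeps): ForwardCleanupOutcome {
  // Assumed heuristic: only touch listeners that look like OpenShell forwards.
  if (!deps.listenerCommand.includes("openshell")) return { kind: "not-openshell" };

  const forwards = deps.listForwards();
  // Unknown ownership: never kill when the forward list could not be read.
  if (forwards === null) return { kind: "list-failed" };

  const owner = forwards.get(port);
  if (owner !== undefined && deps.liveSandboxNames().has(owner)) {
    return { kind: "owned-by-live", sandbox: owner };
  }

  // No live owner: treat the listener as orphaned and try to free the port.
  deps.killListener();
  return deps.isPortFree() ? { kind: "killed-cleared" } : { kind: "killed-still-blocked" };
}
```

Under the behavior summarized in the table, the caller would refresh its port check only for `killed-cleared`.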

Sequence Diagram(s)

sequenceDiagram
  participant Workflow as Nightly CI
  participant Job as concurrent-gateway-ports-e2e
  participant Test as test-concurrent-gateway-ports.sh
  participant FakeServer as Fake_OpenAI_Server
  participant SandboxA as Sandbox_A
  participant SandboxB as Sandbox_B

  Workflow->>Job: Trigger (schedule or workflow_dispatch)
  Job->>Job: Authenticate to Docker Hub
  Job->>Test: Execute test script
  Test->>FakeServer: Start fake OpenAI server
  Test->>SandboxA: Onboard (gateway port 8080)
  SandboxA->>SandboxA: Allocate dashboard port (e.g. 18789)
  Test->>SandboxB: Onboard (alternate gateway port)
  SandboxB->>SandboxB: Allocate distinct dashboard port
  Test->>Test: Verify both listeners and sandbox phases
  Test->>SandboxB: Destroy sandbox B
  Test->>SandboxA: Verify sandbox A remains healthy
  Test->>Job: Exit with results
  Job->>Workflow: Upload artifacts on failure

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested labels

fix, Sandbox, Docker, E2E, platform: container

Suggested reviewers

  • prekshivyas
  • cv

Poem

🐰 Two gateways hop, each on its own port,
Dashboards find homes, no collisions to thwart.
Tests spin a fake server to keep the beat,
Nightly CI listens and gathers each log sheet.
A rabbit cheers: concurrent and neat!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 24.14%, which is below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly summarizes the main change: preventing concurrent instance gateway/dashboard destruction during onboard when using different gateway ports. |
| Linked Issues check | ✅ Passed | The PR implements core requirements from #4422: skip retiring legacy gateway when gatewayReuseState is 'foreign-active' so concurrent instances preserve each other's gateways; preserve dashboard ports by checking if SSH forward is owned by another live sandbox before killing it; and adds E2E verification. |
| Out of Scope Changes check | ✅ Passed | All changes directly address the linked issue #4422: gateway.ts/onboard.ts handle concurrent gateway/dashboard preservation, orphaned-dashboard-forward.ts/test implements cleanup logic, E2E test validates concurrent instances, and CI workflow registers the test. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/4422-refuse-gateway-drift-on-live-sandbox

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Contributor

github-actions Bot commented Jun 1, 2026

PR Review Advisor

Findings: 2 needs attention, 4 worth checking, 0 nice ideas
Since last review: 1 prior item resolved, 5 still apply, 0 new items found

Review findings

🛠️ Needs attention

🔎 Worth checking

  • Source-of-truth review needed: src/lib/onboard/orphaned-dashboard-forward.ts: The advisor marked localized patch analysis as needs_followup.
    • Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
    • Evidence: The helper documents outcome meanings and safe skip behavior, but no comment or tracking reference identifies the source-level fix or removal condition.
  • Clarify strict behavior for explicit dashboard-port conflicts (src/lib/onboard.ts:2083): The orphaned-forward cleanup path treats `owned-by-live` and `list-failed` as a continue path from preflight, relying on later dashboard auto-allocation. That is safe for default auto-allocation, but the same branch is reached when `--control-ui-port` or a present `NEMOCLAW_DASHBOARD_PORT` selected a specific port, where users may expect that exact port to be honored or the command to fail.
    • Recommendation: Define and test the contract for explicit dashboard ports held by a live foreign sandbox or when `openshell forward list` fails. If explicit ports must be strict, fall through to the generic port-blocked error instead of continuing; if auto-allocation is intended, document it and add tests.
    • Evidence: `_preflightDashboardPort = opts.controlUiPort ?? (process.env.NEMOCLAW_DASHBOARD_PORT != null ? DASHBOARD_PORT : null)` makes explicit env/CLI ports enter this preflight check. The new `outcome.kind !== "not-openshell"` branch continues for `owned-by-live` and `list-failed`.
  • Document the removal condition for orphaned-forward cleanup (src/lib/onboard/orphaned-dashboard-forward.ts:42): The new helper is a localized recovery/workaround around OpenShell forward state. It documents the invalid states and has useful negative-path tests, but it does not state when the workaround can be removed, which risks permanent source-of-truth drift instead of fixing forward ownership/lifecycle at the source.
    • Recommendation: Add a short comment or tracking reference explaining the source-level fix and removal condition, such as when OpenShell/NemoClaw records per-sandbox dashboard-forward ownership and reconciles stale forwards at create/destroy time rather than inferring ownership from process command lines.
    • Evidence: `orphaned-dashboard-forward.ts` explains `not-openshell`, `list-failed`, `owned-by-live`, and kill outcomes, and tests cover those branches. No code comment identifies the upstream/source fix or removal condition.
  • Add a workflow guard for target_ref secret withholding (.github/workflows/nightly-e2e.yaml:2091): The workflow already encodes a trusted-code boundary for Docker Hub credentials, but the new dispatchable job bypasses that pattern for NVIDIA_API_KEY. Without a static guard, future jobs can repeat the same pattern even after this job is fixed.
    • Recommendation: Add or extend workflow lint/static tests to assert that jobs using the target-ref checkout do not pass repository secrets to run steps when workflow_dispatch has a non-empty `target_ref`, unless the job checks out trusted workflow-ref scripts and documents the boundary.
    • Evidence: The `dockerhub-auth-step` is gated with `if: ${{ github.event_name != 'workflow_dispatch' || inputs.target_ref == '' }}`, while this new job directly injects `NVIDIA_API_KEY` into checked-out code.
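
One possible contract for the explicit dashboard-port question raised above, written as a decision function. This is a sketch of the strict option only, not the behavior in `src/lib/onboard.ts`: `decidePreflightAction`, the `PreflightAction` names, and the `portIsExplicit` flag are hypothetical.

```typescript
type OutcomeKind =
  | "not-openshell"
  | "list-failed"
  | "owned-by-live"
  | "killed-cleared"
  | "killed-still-blocked";

// Hypothetical action names for what preflight does next.
type PreflightAction = "recheck-port" | "continue-to-auto-allocation" | "fail-port-blocked";

function decidePreflightAction(kind: OutcomeKind, portIsExplicit: boolean): PreflightAction {
  switch (kind) {
    case "killed-cleared":
      // Cleanup freed the port, so the port check can be refreshed.
      return "recheck-port";
    case "owned-by-live":
    case "list-failed":
      // Strict variant: an explicitly requested port is honored or the
      // command fails; only the default port falls back to auto-allocation.
      return portIsExplicit ? "fail-port-blocked" : "continue-to-auto-allocation";
    case "not-openshell":
    case "killed-still-blocked":
      // Existing generic port-unavailable path.
      return "fail-port-blocked";
  }
}
```

If auto-allocation is the intended behavior for explicit ports as well, the `portIsExplicit` branch collapses to `continue-to-auto-allocation` and the contract should be documented instead.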

🌱 Nice ideas

  • None.
Consider writing more tests for
Since last review details

Current findings:

  • Source-of-truth review needed: src/lib/onboard/orphaned-dashboard-forward.ts: The advisor marked localized patch analysis as needs_followup.
    • Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
    • Evidence: The helper documents outcome meanings and safe skip behavior, but no comment or tracking reference identifies the source-level fix or removal condition.
  • Withhold NVIDIA_API_KEY from checked-out target_ref code (.github/workflows/nightly-e2e.yaml:2096): The new concurrent-gateway-ports job reuses the workflow's target-ref checkout, which can run code from `${{ inputs.target_ref || github.ref }}` during workflow_dispatch, but still passes the repository NVIDIA_API_KEY secret to both checked-out `install.sh` and the checked-out E2E script. A selected PR/head ref could modify either file and exfiltrate the secret.
    • Recommendation: Remove NVIDIA_API_KEY from this job if the fake OpenAI endpoint is sufficient, or gate it the same way as Docker Hub credentials so it is empty when `github.event_name == 'workflow_dispatch' && inputs.target_ref != ''`. Add a static workflow guard for this trusted-code boundary.
    • Evidence: The checkout anchor uses `ref: ${{ inputs.target_ref || github.ref }}` and nearby comments already say explicit target_ref can execute untrusted PR-head code. The new job passes `NVIDIA_API_KEY: ${{ secrets.NVIDIA_API_KEY }}` to `bash install.sh` and to `bash test/e2e/test-concurrent-gateway-ports.sh`, while the script uses a local fake endpoint with `COMPATIBLE_API_KEY=dummy`.
  • Exercise the remaining #4422 coexistence clauses (test/e2e/test-concurrent-gateway-ports.sh:319): The new E2E validates core coexistence, distinct gateway/dashboard ports, list output, and that destroying B leaves A healthy, but it does not exercise #4422's explicit `connect` requirement or independent agent-state/no-cross-talk requirement. Because the PR is marked Fixes #4422, those untested clauses leave acceptance incomplete.
  • Clarify strict behavior for explicit dashboard-port conflicts (src/lib/onboard.ts:2083): The orphaned-forward cleanup path treats `owned-by-live` and `list-failed` as a continue path from preflight, relying on later dashboard auto-allocation. That is safe for default auto-allocation, but the same branch is reached when `--control-ui-port` or a present `NEMOCLAW_DASHBOARD_PORT` selected a specific port, where users may expect that exact port to be honored or the command to fail.
    • Recommendation: Define and test the contract for explicit dashboard ports held by a live foreign sandbox or when `openshell forward list` fails. If explicit ports must be strict, fall through to the generic port-blocked error instead of continuing; if auto-allocation is intended, document it and add tests.
    • Evidence: `_preflightDashboardPort = opts.controlUiPort ?? (process.env.NEMOCLAW_DASHBOARD_PORT != null ? DASHBOARD_PORT : null)` makes explicit env/CLI ports enter this preflight check. The new `outcome.kind !== "not-openshell"` branch continues for `owned-by-live` and `list-failed`.
  • Document the removal condition for orphaned-forward cleanup (src/lib/onboard/orphaned-dashboard-forward.ts:42): The new helper is a localized recovery/workaround around OpenShell forward state. It documents the invalid states and has useful negative-path tests, but it does not state when the workaround can be removed, which risks permanent source-of-truth drift instead of fixing forward ownership/lifecycle at the source.
    • Recommendation: Add a short comment or tracking reference explaining the source-level fix and removal condition, such as when OpenShell/NemoClaw records per-sandbox dashboard-forward ownership and reconciles stale forwards at create/destroy time rather than inferring ownership from process command lines.
    • Evidence: `orphaned-dashboard-forward.ts` explains `not-openshell`, `list-failed`, `owned-by-live`, and kill outcomes, and tests cover those branches. No code comment identifies the upstream/source fix or removal condition.
  • Add a workflow guard for target_ref secret withholding (.github/workflows/nightly-e2e.yaml:2091): The workflow already encodes a trusted-code boundary for Docker Hub credentials, but the new dispatchable job bypasses that pattern for NVIDIA_API_KEY. Without a static guard, future jobs can repeat the same pattern even after this job is fixed.
    • Recommendation: Add or extend workflow lint/static tests to assert that jobs using the target-ref checkout do not pass repository secrets to run steps when workflow_dispatch has a non-empty `target_ref`, unless the job checks out trusted workflow-ref scripts and documents the boundary.
    • Evidence: The `dockerhub-auth-step` is gated with `if: ${{ github.event_name != 'workflow_dispatch' || inputs.target_ref == '' }}`, while this new job directly injects `NVIDIA_API_KEY` into checked-out code.
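
The static workflow guard recommended above could start as a check over an already-parsed workflow. The sketch below assumes a simplified step model and does no YAML parsing. `WorkflowStep`, `TRUSTED_REF_GUARD`, and `findUnguardedSecretSteps` are hypothetical names; only the guard expression itself is quoted from the evidence.

```typescript
// Simplified, hypothetical model of a workflow step after YAML parsing.
interface WorkflowStep {
  name: string;
  if?: string;
  env?: Record<string, string>;
}

// Guard expression quoted from the existing dockerhub-auth-step.
const TRUSTED_REF_GUARD =
  "github.event_name != 'workflow_dispatch' || inputs.target_ref == ''";

// Returns the names of steps that receive a repository secret through env
// without being gated on the trusted-ref condition.
function findUnguardedSecretSteps(steps: WorkflowStep[]): string[] {
  return steps
    .filter((step) => {
      const usesSecret = Object.values(step.env ?? {}).some((value) =>
        value.includes("secrets."),
      );
      const guarded = (step.if ?? "").includes(TRUSTED_REF_GUARD);
      return usesSecret && !guarded;
    })
    .map((step) => step.name);
}
```

A real guard would also need to cover job-level `env` and secrets passed through `with:` inputs; this version only inspects step-level `env`.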

Workflow run details

This is an automated advisory review. A human maintainer must make the final merge decision.

@github-actions
Contributor

github-actions Bot commented Jun 1, 2026

E2E Advisor Recommendation

Required E2E: concurrent-gateway-ports-e2e
Optional E2E: double-onboard-e2e, tunnel-lifecycle-e2e, sandbox-survival-e2e

Dispatch hint: concurrent-gateway-ports-e2e

Workflow run

Full advisor summary

E2E Recommendation Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required E2E

  • concurrent-gateway-ports-e2e (medium): Direct coverage for the changed behavior: two sandboxes on one host using distinct NEMOCLAW_GATEWAY_PORT values, live dashboard forward segregation, no collateral cleanup of another sandbox's gateway/forward, and destroying one sandbox without breaking the other.

Optional E2E

  • double-onboard-e2e (medium): Adjacent confidence for repeated onboard/re-onboard lifecycle behavior on a single host; useful because this PR changes onboard gateway reuse and preflight cleanup paths, but it does not specifically exercise concurrent gateway ports.
  • tunnel-lifecycle-e2e (medium): Adjacent confidence for OpenShell forward/tunnel lifecycle behavior, relevant to the dashboard SSH forward cleanup change but less directly targeted than the new concurrent gateway ports E2E.
  • sandbox-survival-e2e (medium): Optional sandbox lifecycle regression check around gateway stop/start and sandbox survival. Helpful because gateway reuse/recreation logic changed, but not merge-blocking when the targeted concurrent gateway test passes.

New E2E recommendations

  • None.

Dispatch hint

  • Workflow: .github/workflows/nightly-e2e.yaml
  • jobs input: concurrent-gateway-ports-e2e

@github-actions
Contributor

github-actions Bot commented Jun 1, 2026

E2E Scenario Advisor Recommendation

Required scenario E2E: ubuntu-repo-cloud-openclaw-double-same-provider
Optional scenario E2E: ubuntu-repo-cloud-openclaw, ubuntu-repo-cloud-openclaw-resume, wsl-repo-cloud-openclaw

Dispatch required scenario E2E:

  • gh workflow run e2e-scenarios.yaml --ref <pr-head-ref> --field scenarios=ubuntu-repo-cloud-openclaw-double-same-provider

Workflow run

Full scenario advisor summary

E2E Scenario Advisor

Base: origin/main
Head: HEAD
Confidence: medium

Required scenario E2E

  • ubuntu-repo-cloud-openclaw-double-same-provider: Onboarding gateway reuse/startup logic and dashboard-forward preflight cleanup changed. This routed scenario is the closest scenario-suite coverage for repeated OpenClaw onboarding against an existing gateway/sandbox state on Ubuntu.
    • Dispatch: gh workflow run e2e-scenarios.yaml --ref <pr-head-ref> --field scenarios=ubuntu-repo-cloud-openclaw-double-same-provider

Optional scenario E2E

  • ubuntu-repo-cloud-openclaw: Provides baseline Ubuntu OpenClaw onboarding coverage for the changed preflight/gateway path, though it does not specifically exercise concurrent non-default gateway ports.
    • Dispatch: gh workflow run e2e-scenarios.yaml --ref <pr-head-ref> --field scenarios=ubuntu-repo-cloud-openclaw
  • ubuntu-repo-cloud-openclaw-resume: Adjacent lifecycle coverage for gateway state handling during resume paths touched by the gateway handler changes.
    • Dispatch: gh workflow run e2e-scenarios.yaml --ref <pr-head-ref> --field scenarios=ubuntu-repo-cloud-openclaw-resume
  • wsl-repo-cloud-openclaw: Optional platform-adjacent coverage because orphaned dashboard-forward cleanup relies on SSH/process/port behavior that can differ under WSL. Special-runner scenario, so not primary.
    • Dispatch: gh workflow run e2e-scenarios.yaml --ref <pr-head-ref> --field scenarios=wsl-repo-cloud-openclaw

Relevant changed files

  • src/lib/onboard.ts
  • src/lib/onboard/machine/handlers/gateway.ts
  • src/lib/onboard/orphaned-dashboard-forward.ts

Contributor

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (1)
src/lib/state/gateway.ts (1)

54-82: ⚡ Quick win

Consider extracting the shared liveness predicate.

The row-liveness check (cols.includes("Ready") || cols.includes("Running")) && !cols.includes("NotReady") is now duplicated in isSandboxReady (Line 57) and listLiveSandboxNames (Line 77). Extracting a small isLiveSandboxRow(cols: string[]) helper keeps the two call sites from drifting if the OpenShell status vocabulary changes.

♻️ Proposed extraction
+function isLiveSandboxRow(cols: string[]): boolean {
+  return (cols.includes("Ready") || cols.includes("Running")) && !cols.includes("NotReady");
+}
+
 export function isSandboxReady(output: string, sandboxName: string): boolean {
   const cols = parseSandboxRow(output, sandboxName);
   if (!cols) return false;
-  return (cols.includes("Ready") || cols.includes("Running")) && !cols.includes("NotReady");
+  return isLiveSandboxRow(cols);
 }
@@
   for (const line of clean.split("\n")) {
     const cols = line.trim().split(/\s+/);
     if (cols.length < 2) continue;
     const name = cols[0];
     if (!name) continue;
-    if ((cols.includes("Ready") || cols.includes("Running")) && !cols.includes("NotReady")) {
+    if (isLiveSandboxRow(cols)) {
       names.push(name);
     }
   }
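
For reference, a self-contained version of the proposed extraction that can be run outside the repository. The predicate is the one from the diff; `listLiveSandboxNames` here takes already-cleaned output and omits whatever ANSI stripping the real function performs, which is an assumption about code not shown in the diff.

```typescript
// Shared liveness predicate, as proposed in the diff above.
function isLiveSandboxRow(cols: string[]): boolean {
  return (cols.includes("Ready") || cols.includes("Running")) && !cols.includes("NotReady");
}

// Simplified stand-in for the real function: expects plain text with one
// whitespace-separated row per sandbox, name in the first column.
function listLiveSandboxNames(cleanOutput: string): string[] {
  const names: string[] = [];
  for (const line of cleanOutput.split("\n")) {
    const cols = line.trim().split(/\s+/);
    if (cols.length < 2) continue; // blank or single-token lines carry no status
    const name = cols[0];
    if (!name) continue;
    if (isLiveSandboxRow(cols)) {
      names.push(name);
    }
  }
  return names;
}
```

Because the predicate matches whole column tokens, a `NotReady` row is rejected even though the string contains "Ready".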
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/lib/state/gateway.ts` around lines 54 - 82, The liveness predicate is
duplicated between isSandboxReady and listLiveSandboxNames; extract a helper
like isLiveSandboxRow(cols: string[]): boolean that implements
(cols.includes("Ready") || cols.includes("Running")) &&
!cols.includes("NotReady"), then replace the duplicated checks in isSandboxReady
(which calls parseSandboxRow) and listLiveSandboxNames (which splits lines into
cols) to call isLiveSandboxRow(cols) so both sites share the single canonical
predicate.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@src/lib/state/gateway.ts`:
- Around line 54-82: The liveness predicate is duplicated between isSandboxReady
and listLiveSandboxNames; extract a helper like isLiveSandboxRow(cols:
string[]): boolean that implements (cols.includes("Ready") ||
cols.includes("Running")) && !cols.includes("NotReady"), then replace the
duplicated checks in isSandboxReady (which calls parseSandboxRow) and
listLiveSandboxNames (which splits lines into cols) to call
isLiveSandboxRow(cols) so both sites share the single canonical predicate.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: b3649375-6721-4481-9cea-45c8b1ea1c68

📥 Commits

Reviewing files that changed from the base of the PR and between d5b6670 and 6737a62.

📒 Files selected for processing (6)
  • docs/reference/troubleshooting.mdx
  • src/lib/onboard.ts
  • src/lib/onboard/preflight-gateway-cleanup-decision.test.ts
  • src/lib/onboard/preflight-gateway-cleanup-decision.ts
  • src/lib/state/gateway.ts
  • test/gateway-state.test.ts

@github-actions
Contributor

github-actions Bot commented Jun 1, 2026

Selective E2E Results — ❌ Some jobs failed

Run: 26734588554
Target ref: 6737a6229a21db222e3d6ba37d3bd1a8e4d5d822
Workflow ref: main
Requested jobs: double-onboard-e2e,onboard-negative-paths-e2e,sandbox-survival-e2e
Summary: 2 passed, 1 failed, 0 skipped

Job Result
double-onboard-e2e ✅ success
onboard-negative-paths-e2e ❌ failure
sandbox-survival-e2e ✅ success

Failed jobs: onboard-negative-paths-e2e. Check run artifacts for logs.

…omments)

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
@github-actions
Contributor

github-actions Bot commented Jun 1, 2026

Selective E2E Results — ❌ Some jobs failed

Run: 26736515114
Target ref: ab5648a2f091b6796a0a5285cfe70e42f81ea48d
Workflow ref: main
Requested jobs: double-onboard-e2e,onboard-negative-paths-e2e
Summary: 1 passed, 1 failed, 0 skipped

Job Result
double-onboard-e2e ✅ success
onboard-negative-paths-e2e ❌ failure

Failed jobs: onboard-negative-paths-e2e. Check run artifacts for logs.

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
@github-actions
Contributor

github-actions Bot commented Jun 1, 2026

Selective E2E Results — ❌ Some jobs failed

Run: 26740349870
Target ref: 84bdda87bcbbd3ada9ef5238c80fd8755cba2dfb
Workflow ref: main
Requested jobs: double-onboard-e2e,onboard-negative-paths-e2e
Summary: 1 passed, 1 failed, 0 skipped

Job Result
double-onboard-e2e ✅ success
onboard-negative-paths-e2e ❌ failure

Failed jobs: onboard-negative-paths-e2e. Check run artifacts for logs.

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
…ay groundwork

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
@laitingsheng laitingsheng changed the title from "fix(onboard): refuse gateway recreate when live sandboxes exist" to "refactor(state): introduce getGatewayName resolver for parallel-gateway groundwork" Jun 1, 2026
Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
@github-actions
Contributor

github-actions Bot commented Jun 1, 2026

Selective E2E Results — ✅ All requested jobs passed

Run: 26745027988
Target ref: 0ef1f56470297468a85768165430568b21f4ad4c
Workflow ref: main
Requested jobs: cloud-onboard-e2e
Summary: 1 passed, 0 failed, 0 skipped

Job Result
cloud-onboard-e2e ✅ success

@github-actions
Contributor

github-actions Bot commented Jun 1, 2026

Selective E2E Results — ✅ All requested jobs passed

Run: 26745392857
Target ref: a1c55fecdb28bd62a3c436410c4773629228c2f4
Workflow ref: main
Requested jobs: cloud-onboard-e2e,sandbox-survival-e2e
Summary: 2 passed, 0 failed, 0 skipped

Job Result
cloud-onboard-e2e ✅ success
sandbox-survival-e2e ✅ success

…cessor

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
@github-actions
Contributor

github-actions Bot commented Jun 1, 2026

Selective E2E Results — ✅ All requested jobs passed

Run: 26746811574
Target ref: fa4550762968788c27181380a13ab8983833ee9c
Workflow ref: main
Requested jobs: cloud-onboard-e2e,sandbox-operations-e2e,snapshot-commands-e2e
Summary: 3 passed, 0 failed, 0 skipped

Job Result
cloud-onboard-e2e ✅ success
sandbox-operations-e2e ✅ success
snapshot-commands-e2e ✅ success

…ULT_GATEWAY_NAME

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
@github-actions
Contributor

github-actions Bot commented Jun 1, 2026

Selective E2E Results — ✅ All requested jobs passed

Run: 26748043704
Target ref: fdddf5366f0aa16d42abbb0ccbddc137e84e11f2
Workflow ref: main
Requested jobs: sandbox-operations-e2e,inference-routing-e2e,snapshot-commands-e2e
Summary: 3 passed, 0 failed, 0 skipped

Job Result
inference-routing-e2e ✅ success
sandbox-operations-e2e ✅ success
snapshot-commands-e2e ✅ success

@github-actions
Contributor

github-actions Bot commented Jun 1, 2026

Selective E2E Results — ✅ All requested jobs passed

Run: 26749185051
Target ref: 48e58ae6fd800b4610f2099f9b671827a750930c
Workflow ref: main
Requested jobs: sandbox-operations-e2e,openclaw-inference-switch-e2e,sandbox-survival-e2e,snapshot-commands-e2e,onboard-resume-e2e
Summary: 4 passed, 0 failed, 0 skipped

Job Result
onboard-resume-e2e ✅ success
openclaw-inference-switch-e2e ✅ success
sandbox-operations-e2e ⚠️ cancelled
sandbox-survival-e2e ✅ success
snapshot-commands-e2e ✅ success

@github-actions
Contributor

github-actions Bot commented Jun 1, 2026

Selective E2E Results — ✅ All requested jobs passed

Run: 26749747697
Target ref: b9e28babae3df68b1c8ac6ee2c8379bd28e33449
Workflow ref: main
Requested jobs: cloud-onboard-e2e,sandbox-operations-e2e,openclaw-inference-switch-e2e,snapshot-commands-e2e,onboard-resume-e2e
Summary: 5 passed, 0 failed, 0 skipped

Job Result
cloud-onboard-e2e ✅ success
onboard-resume-e2e ✅ success
openclaw-inference-switch-e2e ✅ success
sandbox-operations-e2e ✅ success
snapshot-commands-e2e ✅ success

…es + stricter writes

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
@laitingsheng laitingsheng added the refactor label (PR restructures code without intended behavior change) and removed the fix label Jun 1, 2026
@github-actions
Contributor

github-actions Bot commented Jun 5, 2026

Selective E2E Results — ❌ Some jobs failed

Run: 26997817230
Target ref: fix/4422-refuse-gateway-drift-on-live-sandbox
Requested jobs: concurrent-gateway-ports-e2e
Summary: 0 passed, 1 failed, 0 skipped

Job Result
concurrent-gateway-ports-e2e ❌ failure

Failed jobs: concurrent-gateway-ports-e2e. Check run artifacts for logs.

…erminal

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
@github-actions
Contributor

github-actions Bot commented Jun 5, 2026

Selective E2E Results — ❌ Some jobs failed

Run: 26998381153
Target ref: fix/4422-refuse-gateway-drift-on-live-sandbox
Requested jobs: concurrent-gateway-ports-e2e
Summary: 0 passed, 1 failed, 0 skipped

Job Result
concurrent-gateway-ports-e2e ❌ failure

Failed jobs: concurrent-gateway-ports-e2e. Check run artifacts for logs.

@github-actions
Contributor

github-actions Bot commented Jun 5, 2026

Selective E2E Results — ✅ All requested jobs passed

Run: 26999184001
Target ref: fix/4422-refuse-gateway-drift-on-live-sandbox
Requested jobs: concurrent-gateway-ports-e2e
Summary: 1 passed, 0 failed, 0 skipped

Job Result
concurrent-gateway-ports-e2e ✅ success

@laitingsheng laitingsheng marked this pull request as ready for review June 5, 2026 06:43
@laitingsheng laitingsheng added the labels nightly-e2e (Nightly E2E test failures), area: sandbox (OpenShell sandbox lifecycle, runtime, config, or recovery), bug-fix (PR fixes a bug or regression), and area: e2e (End-to-end tests, nightly failures, or validation infrastructure), and removed the nightly-e2e label Jun 5, 2026
@laitingsheng laitingsheng changed the title from "refactor(state): introduce getGatewayName resolver for parallel-gateway groundwork" to "fix(onboard): preserve concurrent instance gateway and dashboard during onboard" Jun 5, 2026
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/lib/onboard.ts`:
- Around line 2089-2100: The code currently calls
runCaptureOpenshell(["forward","list"], {ignoreError: true, ...}) so
failures/timeouts are treated as empty output and may cause an incorrect kill;
change this to surface command failures and bail on unknown ownership: call
runCaptureOpenshell without ignoreError (or wrap it in try/catch), check for
errors/timeouts and if the command failed log a warning and skip killing the SSH
forward for that port (i.e. do not fall through to the PID kill), otherwise
proceed to call getOccupiedPorts(forwardListOutput) as before; update the
runCaptureOpenshell usage and surrounding control flow in the block that
references runCaptureOpenshell and getOccupiedPorts so ownership is only assumed
on successful command output.
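A minimal sketch of the control flow this comment asks for, under stated assumptions. `listForwards` is a hypothetical stand-in for the `runCaptureOpenshell(["forward","list"])` call and is assumed to throw on failure or timeout. `parseOccupiedPorts` is a hypothetical stand-in for `getOccupiedPorts`, and the one `<port> <sandbox>` pair per line output format is invented for illustration. Neither reflects the real signatures in `src/lib/onboard.ts`.

```typescript
// Hypothetical decision type: what the caller may do with the SSH forward on a port.
type ForwardDecision =
  | { kind: "skip-kill"; reason: "list-failed" | "owned-by-live" }
  | { kind: "may-kill" };

// Assumed output format, for illustration only: one "<port> <sandbox>" pair per line.
function parseOccupiedPorts(output: string): Map<number, string> {
  const owners = new Map<number, string>();
  for (const line of output.split("\n")) {
    const [portText, sandbox] = line.trim().split(/\s+/);
    const port = Number(portText);
    if (sandbox && Number.isInteger(port) && port > 0) owners.set(port, sandbox);
  }
  return owners;
}

// listForwards stands in for the real command runner; it must throw on failure.
function decideForwardCleanup(port: number, listForwards: () => string): ForwardDecision {
  let output: string;
  try {
    output = listForwards();
  } catch (err) {
    // Ownership is unknown here, so never fall through to the PID kill.
    console.warn(`forward list failed; leaving port ${port} alone: ${String(err)}`);
    return { kind: "skip-kill", reason: "list-failed" };
  }
  // Only a successful listing is allowed to prove that nobody owns the port.
  return parseOccupiedPorts(output).has(port)
    ? { kind: "skip-kill", reason: "owned-by-live" }
    : { kind: "may-kill" };
}
```

The point of the design is that an enumeration failure and an empty listing are different answers: only the second one permits a kill.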

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: a79fe900-00d9-45be-90a6-125ebe458b06

📥 Commits

Reviewing files that changed from the base of the PR and between 40042b8 and beb9014.

📒 Files selected for processing (5)
  • .github/workflows/nightly-e2e.yaml
  • src/lib/onboard.ts
  • src/lib/onboard/machine/handlers/gateway.test.ts
  • src/lib/onboard/machine/handlers/gateway.ts
  • test/e2e/test-concurrent-gateway-ports.sh
🚧 Files skipped from review as they are similar to previous changes (1)
  • .github/workflows/nightly-e2e.yaml

Comment thread src/lib/onboard.ts Outdated
@github-actions
Contributor

github-actions Bot commented Jun 5, 2026

Selective E2E Results — ✅ All requested jobs passed

Run: 26999971280
Target ref: beb901495aae1048285cfcc1814755e1e909890f
Workflow ref: main
Requested jobs: sandbox-survival-e2e
Summary: 1 passed, 0 failed, 0 skipped

Job Result
sandbox-survival-e2e ✅ success

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/lib/onboard.ts`:
- Around line 2083-2089: The guard currently only invokes
tryCleanupOrphanedDashboardForward when port === DASHBOARD_PORT, which skips
orphan-forward handling when a custom --control-ui-port is used; update the
condition so any ssh listener for the dashboard check is handled regardless of
the numeric DASHBOARD_PORT (for example check portCheck.process === "ssh" &&
(port === DASHBOARD_PORT || label === "dashboard") or otherwise detect the
dashboard check by its label) and then call tryCleanupOrphanedDashboardForward
with the same args; if outcome.kind === "killed-still-blocked" replace portCheck
with outcome.portCheck, else if outcome.kind !== "not-openshell" continue — keep
the existing outcome handling but remove the strict numeric DASHBOARD_PORT
requirement so createSandbox can auto-allocate a different dashboard port later.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 25898d10-0f69-49a3-a11a-9ac86433b02f

📥 Commits

Reviewing files that changed from the base of the PR and between beb9014 and 69dcdef.

📒 Files selected for processing (6)
  • .github/workflows/nightly-e2e.yaml
  • src/lib/onboard.ts
  • src/lib/onboard/machine/handlers/gateway.test.ts
  • src/lib/onboard/orphaned-dashboard-forward.test.ts
  • src/lib/onboard/orphaned-dashboard-forward.ts
  • test/e2e/test-concurrent-gateway-ports.sh
🚧 Files skipped from review as they are similar to previous changes (3)
  • src/lib/onboard/machine/handlers/gateway.test.ts
  • .github/workflows/nightly-e2e.yaml
  • test/e2e/test-concurrent-gateway-ports.sh

Comment thread src/lib/onboard.ts
@github-actions
Contributor

github-actions Bot commented Jun 5, 2026

Selective E2E Results — ✅ All requested jobs passed

Run: 27001186985
Target ref: 69dcdefee627aadbd9268eeaaa4902645ef36b71
Workflow ref: main
Requested jobs: tunnel-lifecycle-e2e,sandbox-survival-e2e
Summary: 2 passed, 0 failed, 0 skipped

Job Result
sandbox-survival-e2e ✅ success
tunnel-lifecycle-e2e ✅ success

@github-actions
Contributor

github-actions Bot commented Jun 5, 2026

Selective E2E Results — ✅ All requested jobs passed

Run: 27001064634
Target ref: fix/4422-refuse-gateway-drift-on-live-sandbox
Requested jobs: concurrent-gateway-ports-e2e
Summary: 1 passed, 0 failed, 0 skipped

Job Result
concurrent-gateway-ports-e2e ✅ success

@laitingsheng laitingsheng added the v0.0.60 Release target label Jun 5, 2026
@laitingsheng laitingsheng requested a review from prekshivyas June 5, 2026 07:52
@laitingsheng laitingsheng removed the feature PR adds or expands user-visible functionality label Jun 5, 2026
Contributor

@prekshivyas prekshivyas left a comment


Re-reviewed against the current head (69dcdef) — refreshing my earlier approval, which predated a substantial rework. Verified the fix end to end, including the primitives it depends on:

  • Gateway skip: gateway.ts skips retireLegacyGatewayForDockerDriverUpgrade when gatewayReuseState === "foreign-active" and normalizes to missing, so a second onboard starts its own per-port gateway alongside instead of retiring the neighbor's. The foreign-active state is real production logic — getGatewayReuseState (src/lib/state/gateway.ts) returns it when a live gateway with a different name exists ((connected || activeInfo) && activeGatewayName !== gatewayName), i.e. exactly the concurrent-instance case. Not dead code, and it's a pure, separately-tested function.
  • Dashboard forward: the extracted tryCleanupOrphanedDashboardForward helper only kills a forward when there is no live owner. Ownership comes from getOccupiedPorts, which maps port→sandbox but only for live forwards — so a stale entry is killable and a live foreign owner is protected. list-failed and owned-by-live both skip the kill (and forward list is deliberately allowed to throw rather than swallow to empty, preventing a wrongful kill on enumeration failure).
  • Integration: the onboard.ts caller, inside the for (const {port} of requiredPorts) loop, handles every outcome correctly: killed-cleared/owned-by-live/list-failed → continue (proceed, auto-allocate a different dashboard port); not-openshell/killed-still-blocked → fall through to the port-blocked error with refreshed diagnostics.
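
The outcome handling described in the last bullet can be sketched as a discriminated union. The five `kind` values are the ones named in this review; the `PortCheck` fields, the `NextStep` type, and the `nextStepFor` function are hypothetical and only model the branching, not the real code in `src/lib/onboard.ts`.

```typescript
// Hypothetical shape of a port probe result; only the fields used here are modeled.
interface PortCheck {
  ok: boolean;
  process?: string;
  pid?: number;
}

// Outcome kinds as named in the review; the payloads are assumptions.
type CleanupOutcome =
  | { kind: "killed-cleared" }
  | { kind: "owned-by-live" }
  | { kind: "list-failed" }
  | { kind: "not-openshell" }
  | { kind: "killed-still-blocked"; portCheck: PortCheck };

// Hypothetical summary of what the caller's loop does next.
type NextStep =
  | { action: "continue" }
  | { action: "report-blocked"; portCheck: PortCheck };

function nextStepFor(outcome: CleanupOutcome, original: PortCheck): NextStep {
  switch (outcome.kind) {
    case "killed-cleared":
    case "owned-by-live":
    case "list-failed":
      // Move on to the next required port; a different dashboard port
      // can be allocated later.
      return { action: "continue" };
    case "killed-still-blocked":
      // Report the blocked port using the refreshed probe result.
      return { action: "report-blocked", portCheck: outcome.portCheck };
    case "not-openshell":
      // The listener is not an OpenShell forward, so the original probe stands.
      return { action: "report-blocked", portCheck: original };
  }
}
```

Because the switch is exhaustive over `kind`, adding a sixth outcome later would fail type checking until the caller decides how to handle it.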

Unit tests cover the foreign-active no-retire branch and the helper outcomes; the concurrent-gateway-ports E2E asserts two sandboxes reach Ready on distinct gateway/dashboard ports and survive each other's teardown. CI green (28 pass / 1 skip). Good to merge.
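
For readers who want the foreign-active condition from the gateway bullet in isolation, here is a minimal sketch. Only the boolean expression `(connected || activeInfo) && activeGatewayName !== gatewayName` comes from the review. The `GatewayProbe` shape, modeling `activeInfo` as a boolean, the `planPreflight` helper, and the gateway names in the usage note are all assumptions made for illustration.

```typescript
// Hypothetical inputs; the real getGatewayReuseState likely takes richer data.
interface GatewayProbe {
  connected: boolean;
  activeInfo: boolean; // assumption: modeled as a flag for "active gateway info exists"
  activeGatewayName: string | null;
}

// The condition quoted in the review: a live gateway exists under a different name.
function isForeignActive(probe: GatewayProbe, gatewayName: string): boolean {
  return (probe.connected || probe.activeInfo) && probe.activeGatewayName !== gatewayName;
}

// Hypothetical preflight plan: a neighbor's gateway is never retired, and this
// instance proceeds as if it had no gateway of its own yet.
function planPreflight(probe: GatewayProbe, gatewayName: string) {
  return isForeignActive(probe, gatewayName)
    ? { mayRunLegacyRetirement: false, treatAs: "missing" as const }
    : { mayRunLegacyRetirement: true, treatAs: "unchanged" as const };
}
```

With the invented names `gw-a` and `gw-b`, a second onboard that targets `gw-b` while `gw-a` is connected gets `{ mayRunLegacyRetirement: false, treatAs: "missing" }`, which corresponds to starting its own gateway alongside the neighbor's instead of replacing it.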

…ay-drift-on-live-sandbox

# Conflicts:
#	.github/workflows/nightly-e2e.yaml
Contributor

@coderabbitai coderabbitai Bot left a comment


♻️ Duplicate comments (1)
src/lib/onboard.ts (1)

2083-2089: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Handle non-default dashboard ports in the orphan-forward path.

This still gates the helper on port === DASHBOARD_PORT, so --control-ui-port <non-default> skips the cleanup/ownership check and falls straight into the fatal port-blocked path even though createSandbox() later auto-allocates a different dashboard port. That keeps the concurrent-instance bug alive for custom dashboard ports.

💡 Suggested fix
-      if (port === DASHBOARD_PORT && portCheck.process === "ssh" && portCheck.pid) {
+      if (envVar === "NEMOCLAW_DASHBOARD_PORT" && portCheck.process === "ssh" && portCheck.pid) {
         const outcome = await tryCleanupOrphanedDashboardForward({
           port, pid: portCheck.pid, label, portCheckOptions,
           captureProcessArgs, runCaptureOpenshell, run, sleepSeconds, checkPortAvailable,
         });
         if (outcome.kind === "killed-still-blocked") portCheck = outcome.portCheck;
         else if (outcome.kind !== "not-openshell") continue;
       }

As per coding guidelines, src/lib/onboard.ts: "This file contains core onboarding logic. Changes here affect the full sandbox creation and configuration flow."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/lib/onboard.ts` around lines 2083 - 2089, The code currently only calls
tryCleanupOrphanedDashboardForward when port === DASHBOARD_PORT, which skips
orphan-forward cleanup for custom dashboard ports; replace that numeric gate
with a check that identifies the dashboard entry itself (for example
envVar === "NEMOCLAW_DASHBOARD_PORT") and invoke
tryCleanupOrphanedDashboardForward (passing the current port, pid, label,
portCheckOptions, captureProcessArgs, runCaptureOpenshell, run, sleepSeconds,
checkPortAvailable) so any non-default dashboard port is checked/cleaned before
falling into the fatal blocked-port path; keep the existing outcome handling
(use outcome.portCheck when kind === "killed-still-blocked", and continue for
every other kind except "not-openshell").

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 135a8518-cc25-4a94-b715-3b97eeef124d

📥 Commits

Reviewing files that changed from the base of the PR and between 69dcdef and c1e28bd.

📒 Files selected for processing (2)
  • .github/workflows/nightly-e2e.yaml
  • src/lib/onboard.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • .github/workflows/nightly-e2e.yaml

@cv cv merged commit 668f2a1 into main Jun 5, 2026
33 checks passed
@cv cv deleted the fix/4422-refuse-gateway-drift-on-live-sandbox branch June 5, 2026 17:54
miyoungc added a commit that referenced this pull request Jun 6, 2026
## Summary
- Adds the `v0.0.60` section to `docs/about/release-notes.mdx` using the
dev announcement from discussion #4877.
- Fills the source-doc gaps found during release-prep review across
inference, policy tiers, command behavior, security boundaries, Hermes
dashboard/tooling, runtime context, and troubleshooting.
- Refreshes generated agent skills under `.agents/skills/` from the
current Fern docs output and upgrades Fern from `5.44.3` to `5.45.0`.

## Source summary
- #4037 -> `docs/reference/architecture.mdx`,
`docs/about/how-it-works.mdx`, `docs/about/release-notes.mdx`: Documents
system-only runtime context that stays out of visible chat.
- #4875 -> `docs/reference/architecture.mdx`,
`docs/about/how-it-works.mdx`, `docs/about/release-notes.mdx`: Documents
try-first sandbox network/filesystem guidance and clearer failure
classification.
- #4788 -> `docs/security/best-practices.mdx`,
`docs/about/release-notes.mdx`: Documents shared OpenClaw
device-approval policy for startup and connect.
- #4768 -> `docs/reference/network-policies.mdx`,
`docs/network-policy/integration-policy-examples.mdx`,
`docs/get-started/quickstart.mdx`,
`docs/get-started/quickstart-hermes.mdx`, `docs/reference/commands.mdx`:
Documents `weather`, `public-reference`, and Hermes managed-tool gateway
preset behavior.
- #3788 and #4864 -> `docs/reference/network-policies.mdx`,
`docs/reference/commands.mdx`: Documents non-interactive policy-tier
fail-fast behavior and interactive prompt fallback.
- #4756 and #4866 -> `docs/reference/commands.mdx`: Documents env-aware
default sandbox resolution for `list`, `status`, and `tunnel` commands.
- #4320 -> `docs/reference/commands.mdx`: Documents `$$nemoclaw tunnel
status` behavior.
- #4328 -> `docs/reference/commands.mdx`: Documents line-scoped policy
preset descriptions in `policy-list`.
- #4580 and #4748 -> `docs/reference/architecture.mdx`: Documents
package-managed OpenShell gateway service and Docker-driver
gateway-marker behavior.
- #4598 -> `docs/manage-sandboxes/lifecycle.mdx`: Documents concurrent
gateway/dashboard cleanup isolation by sandbox name and port.
- #4777 -> `docs/reference/troubleshooting.mdx`: Documents Docker GPU
patch rollback behavior.
- #4610 -> `docs/reference/troubleshooting.mdx`,
`docs/reference/commands.mdx`: Keeps mutable OpenClaw config permission
guidance aligned and removes skipped experimental wording.
- #4868 -> `docs/reference/commands.mdx`: Keeps `.dockerignore` handling
for custom `onboard --from <Dockerfile>` contexts in generated skills.
- #4870 -> `docs/reference/commands.mdx`,
`docs/manage-sandboxes/runtime-controls.mdx`: Documents
`NEMOCLAW_MINIMAL_BOOTSTRAP` and generated skill coverage.
- #4641 -> `docs/inference/inference-options.mdx`,
`docs/reference/troubleshooting.mdx`: Documents local NVIDIA NIM
platform-digest pulls and served-model id adoption.
- #4810 and #4867 -> `docs/inference/inference-options.mdx`: Documents
stable NGC managed-vLLM image lineage and DGX Station DeepSeek V4 Flash
coverage.
- #4852 -> `docs/inference/use-local-inference.mdx`,
`docs/reference/troubleshooting.mdx`: Documents Ollama model fit
filtering, 16K context floor, cold-load retry, and failed-model
exclusion.
- #4847 -> `docs/inference/switch-inference-providers.mdx`: Documents
API-family sync, Hermes `api_mode`, and Bedrock Runtime exception.
- #4800 -> `docs/inference/tool-calling-reliability.mdx`: Documents
Nemotron managed-inference native tool-search fallback.
- #4333 -> `docs/inference/switch-inference-providers.mdx`: Documents
interactive multimodal input prompting.
- #4086 -> `docs/reference/troubleshooting.mdx`: Keeps proxy bypass
normalization in generated troubleshooting coverage.
- #4811 and #4855 -> `docs/get-started/quickstart-hermes.mdx`: Documents
prebuilt Hermes dashboard assets and TUI recovery without runtime
rebuilds.
- #4854 -> `docs/inference/switch-inference-providers.mdx`,
`docs/reference/commands.mdx`: Documents Hermes proxy API-key
placeholder preservation during inference switches.
- #4248 -> `docs/manage-sandboxes/messaging-channels.mdx`,
`.agents/skills/`: Keeps messaging enrollment behavior aligned with
manifest-hook implementation.
- #4771 -> `docs/security/best-practices.mdx`,
`docs/security/credential-storage.mdx`: Documents Hermes
placeholder-only secret boundary for sandbox-visible runtime files.
- #4787 -> `docs/security/best-practices.mdx`,
`docs/about/release-notes.mdx`: Documents expanded memory scanner
examples for OpenAI project keys and Slack app-level tokens.
- #4848 -> `docs/reference/commands.mdx`: Documents OpenClaw skill
install mirroring into the agent home directory.
- #4790 -> `docs/about/release-notes.mdx`: Uses the prior release-prep
structure and generated `.agents/skills/` refresh as the template for
this release.

## Verification
- `python3 scripts/docs-to-skills.py docs/ .agents/skills/ --prefix
nemoclaw-user --doc-platform fern-mdx`
- `python3 scripts/docs-to-skills.py docs/ .agents/skills/ skills/
--prefix nemoclaw-user --doc-platform fern-mdx --dry-run`
- `npm run docs`
- `git diff --check`
- skip-term scan across `docs/`, `.agents/skills/`, and `skills/`
- `npm run build:cli`
- `npm run typecheck:cli`
- Commit and pre-push hook suites, including markdownlint, gitleaks,
env-var docs gate, docs-to-skills verification, and skills YAML tests

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

## Release Notes

* **New Features**
  * DeepSeek-V4-Flash now available as default inference model for DGX Station.
  * Hermes dashboard improved with dedicated port and OAuth-authenticated tool gateway selection.
  * Added weather and public-reference policy presets for expanded agent capabilities.
  * Enhanced Ollama model selection with GPU memory filtering and automatic retry for timeouts.

* **Bug Fixes**
  * Improved policy tier validation to prevent invalid configurations.
  * Better sandbox cleanup scoping by port to prevent conflicts across deployments.
  * Added GPU patch failure recovery with automatic rollback.

* **Documentation**
  * Expanded troubleshooting guides for inference, security, and sandbox lifecycle.
  * Added .dockerignore best practices for custom deployments.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: Carlos Villela <cvillela@nvidia.com>

Labels

area: e2e End-to-end tests, nightly failures, or validation infrastructure area: sandbox OpenShell sandbox lifecycle, runtime, config, or recovery bug-fix PR fixes a bug or regression v0.0.60 Release target

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[WSL2 x86_64][Sandbox] NEMOCLAW_GATEWAY_PORT=N onboard recreates global gateway and destroys previous sandbox — concurrent instances unsupported

5 participants