Skip to content

test(e2e): add dashboard reachability coverage#2123

Open
evantakahashi wants to merge 5 commits intoNVIDIA:mainfrom
evantakahashi:test/e2e-dashboard-reachability
Open

test(e2e): add dashboard reachability coverage#2123
evantakahashi wants to merge 5 commits intoNVIDIA:mainfrom
evantakahashi:test/e2e-dashboard-reachability

Conversation

@evantakahashi
Copy link
Copy Markdown

@evantakahashi evantakahashi commented Apr 21, 2026

Summary

Adds E2E coverage that the OpenClaw dashboard is actually reachable from the host on the forwarded port (default 127.0.0.1:18789) after nemoclaw onboard. Wires the script into nightly-e2e.yaml as a new top-level job with a 30-minute timeout.

Related Issue

Closes #2100

Changes

  • New: test/e2e/test-dashboard-reachability.sh with three test cases:
    • TC-DASH-01 — port bound on host (polled 30×1s; openshell forward start --background is async per src/lib/onboard.ts:5620)
    • TC-DASH-02 — dashboard returns HTTP 200 (polled 30×1s)
    • TC-DASH-03 — response body signature (hard fail on non-HTML; soft warn-pass when HTML lacks OpenClaw/Control-UI marker, since the dashboard may be an SPA shell)
  • Workflow: added dashboard-reachability-e2e job in .github/workflows/nightly-e2e.yaml (modeled on sandbox-operations-e2e) and wired it into notify-on-failure's needs: and failure if: expression.
  • Script follows the conventions of test/e2e/test-sandbox-operations.sh (timeout re-exec, preflight, onboard, teardown via EXIT trap, summary).

Type of Change

  • Code change (feature, bug fix, or refactor)
  • Code change with doc updates
  • Doc only (prose changes, no code sample modifications)
  • Doc only (includes code sample changes)

Verification

  • npx prek run --all-files passes (excluding pre-existing flaky test/onboard.test.ts:3940 test, which reproduces on unmodified main — unrelated to this PR)
  • npm test — same note: flaky onboard.test.ts:3940 is pre-existing
  • Tests added for new behavior
  • No secrets, API keys, or credentials committed
  • Docs updated for user-facing behavior changes (N/A — test-only PR)

Local validation notes

  • Assertion logic validated via stub HTTP server across 5 scenarios (HTML with marker / SPA shell / JSON / HTTP 500 / nothing listening): all 15 assertions produced expected pass/fail.
  • Full end-to-end run could not be completed locally due to Docker-Desktop-for-macOS image-upload timeout during nemoclaw onboard. First real integration execution will be the next nightly run after merge. Happy to iterate on any issues surfaced by notify-on-failure.

AI Disclosure

  • AI-assisted — tool: Claude Code

Notes for reviewers

Things intentionally not covered by this PR (possible follow-ups):

  • Dashboard reachability after gateway restart / sandbox recreate
  • Multi-sandbox dashboards on different ports
  • Auth token attachment to the dashboard URL
  • CHAT_UI_URL env-var override path (test assumes default port 18789; CI does not set CHAT_UI_URL)
  • Gzip/chunked response bodies (curl default, uncompressed; low risk on localhost)

The TC-DASH-03 marker-check regex (openclaw|control[- ]?ui|nemoclaw) is intentionally soft — passes on HTML-structure alone when no marker is present — because the real dashboard HTML could not be verified from this repo (served by upstream OpenClaw). Happy to tighten this if maintainers confirm the expected markers.

Summary by CodeRabbit

  • Tests

    • Added an end-to-end dashboard reachability test that verifies port availability, HTTP 200 responses, and page content markers; produces timestamped logs, tracks pass/fail counts, enforces CI timeouts, and fails the job on test failures.
  • Chores

    • Added a scheduled CI job to run the dashboard reachability test, upload failure logs as artifacts, and updated failure-notification logic to include this job.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 21, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds a new E2E Bash test test-dashboard-reachability.sh and a GitHub Actions job dashboard-reachability-e2e that runs it (restricted to github.repository == 'NVIDIA/NemoClaw'); the workflow's notify-on-failure now depends on and checks this job's failure status. Tests verify host→pod dashboard reachability and HTML signature.

Changes

Cohort / File(s) Summary
Workflow Configuration
.github/workflows/nightly-e2e.yaml
Added dashboard-reachability-e2e job (repo-restricted) that runs test/e2e/test-dashboard-reachability.sh and uploads test-dashboard-reachability-*.log on failure; updated notify-on-failure needs and failure condition to include this job.
E2E Test Implementation
test/e2e/test-dashboard-reachability.sh
New strict-mode executable script with optional CI-wide timeout. Performs preflight (Docker, env vars, install), non-interactive sandbox onboarding, starts openshell forward background port-forward, runs three tests (port LISTEN via lsof, HTTP 200 polling via curl, HTML/marker body checks), logs results, performs teardown, and exits non-zero on failures.

Sequence Diagram

sequenceDiagram
    autonumber
    participant CI as CI/Test Runner
    participant Env as Docker\nNemoClaw Env
    participant Gateway as OpenClaw\nGateway
    participant Dashboard as Dashboard\nService
    participant Host as Host\nTCP Stack

    CI->>Env: preflight (Docker, env vars, install)
    CI->>Env: `nemoclaw onboard` sandbox
    Env-->>CI: sandbox ready
    CI->>Gateway: `openshell forward start --background` (port 18789)
    Gateway-->>CI: forward established
    CI->>Host: check LISTEN on 127.0.0.1:18789 (`lsof`)
    Host-->>CI: port bound
    CI->>Host: HTTP GET http://127.0.0.1:18789/ (`curl`)
    Host->>Gateway: forwarded request
    Gateway->>Dashboard: route request
    Dashboard-->>Gateway: HTTP 200 + HTML
    Gateway-->>Host: response
    Host-->>CI: HTTP 200 + body
    CI->>CI: verify HTML markers → pass/fail
    CI->>Gateway: stop forward
    CI->>Env: destroy sandbox / cleanup
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I hopped through ports and logs tonight,
I nudged the dashboard till it shone bright.
HTTP two-oh-oh and HTML unfurled,
Port-forwards held to show the world,
A tiny hop for tests that guard the site 🌿🐇

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'test(e2e): add dashboard reachability coverage' directly and specifically summarizes the main change: adding E2E test coverage for dashboard reachability.
Linked Issues check ✅ Passed The PR implements all primary objectives from issue #2100: E2E script with pre-cleanup, onboarding, port-forward via openshell, HTTP 200 polling (30×1s), and body-signature validation.
Out of Scope Changes check ✅ Passed All changes are in-scope: new E2E test script and workflow integration to run the test nightly, directly supporting issue #2100's objectives.
Docstring Coverage ✅ Passed Docstring coverage is 85.71% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
.github/workflows/nightly-e2e.yaml (1)

365-377: Add timeout buffer so failure artifact upload is more reliable.

The job timeout (30m) matches the script’s default timeout (1800s). If both hit at the edge, the job can terminate before the artifact step runs.

Proposed tweak
       - name: Run dashboard reachability E2E test
         env:
           NVIDIA_API_KEY: ${{ secrets.NVIDIA_API_KEY }}
           NEMOCLAW_NON_INTERACTIVE: "1"
           NEMOCLAW_ACCEPT_THIRD_PARTY_SOFTWARE: "1"
           NEMOCLAW_POLICY_TIER: "open"
+          NEMOCLAW_E2E_TIMEOUT_SECONDS: "1500"
           GITHUB_TOKEN: ${{ github.token }}
         run: bash test/e2e/test-dashboard-reachability.sh
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/nightly-e2e.yaml around lines 365 - 377, The job timeout
equals the script timeout so the workflow can end before artifact upload runs;
update the job's timeout-minutes to a higher value than the script default
(e.g., change timeout-minutes from 30 to 35) or reduce the script's internal
timeout in test/e2e/test-dashboard-reachability.sh so the "Run dashboard
reachability E2E test" step has a buffer to complete and allow artifact upload
to execute.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/e2e/test-dashboard-reachability.sh`:
- Around line 275-281: TC-DASH-03 currently hard-fails when HTML is served but
no OpenClaw marker is found; change the failure to a soft-pass by replacing the
fail call with a warning plus a pass (or call the existing warn helper then
pass) so the test records a warning but does not fail; update the block that
checks variable "body" (the grep condition) to call warn "TC-DASH-03: Body
signature missing — HTML served but no OpenClaw/Control-UI marker" and then pass
"TC-DASH-03: Soft-pass: HTML served without marker" instead of invoking fail,
referencing the TC-DASH-03 check and the pass/fail/warn helper functions used in
the script.
- Around line 222-227: The test currently does a single lsof probe for
$DASHBOARD_PORT and can race-fail; replace that single check in the TC-DASH-01
block with a polling loop (like TC-DASH-02) that retries lsof
-iTCP:"$DASHBOARD_PORT" -sTCP:LISTEN with a short sleep and overall timeout, and
only call pass "TC-DASH-01: Port $DASHBOARD_PORT is bound" when the loop
succeeds; otherwise call fail "TC-DASH-01: Dashboard port bound" with the same
timeout failure message. Use the same retry interval and timeout semantics as
TC-DASH-02 so the behavior matches and reference the existing pass/fail helpers
and the DASHBOARD_PORT variable.

---

Nitpick comments:
In @.github/workflows/nightly-e2e.yaml:
- Around line 365-377: The job timeout equals the script timeout so the workflow
can end before artifact upload runs; update the job's timeout-minutes to a
higher value than the script default (e.g., change timeout-minutes from 30 to
35) or reduce the script's internal timeout in
test/e2e/test-dashboard-reachability.sh so the "Run dashboard reachability E2E
test" step has a buffer to complete and allow artifact upload to execute.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 8b7cc14d-d36e-4c85-b7c5-3ed68d0395af

📥 Commits

Reviewing files that changed from the base of the PR and between 1d4b973 and 4e4a05d.

📒 Files selected for processing (2)
  • .github/workflows/nightly-e2e.yaml
  • test/e2e/test-dashboard-reachability.sh

Comment thread test/e2e/test-dashboard-reachability.sh Outdated
Comment thread test/e2e/test-dashboard-reachability.sh Outdated
@wscurran wscurran added Integration: OpenClaw Support for OpenClaw enhancement: testing Use this label to identify requests to improve NemoClaw test coverage. E2E End-to-end testing — Brev infrastructure, test cases, nightly failures, and coverage gaps labels Apr 21, 2026
@hunglp6d
Copy link
Copy Markdown
Contributor

@evantakahashi you should make all the commits to be "Verified" to meet the PR requirement, see other PR with done for reference

@evantakahashi evantakahashi force-pushed the test/e2e-dashboard-reachability branch from 299b312 to e2f142c Compare April 21, 2026 06:32
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/e2e/test-dashboard-reachability.sh`:
- Around line 81-84: The presence check uses substring/regex matching via
`nemoclaw list | grep -q "$name"`, which allows false positives; replace these
with exact literal-line matching such as `nemoclaw list 2>/dev/null | grep -xFq
-- "$name"` (or equivalent anchored grep `grep -q "^${name}$"`) wherever `grep
-q "$name"` is used (e.g., the `nemoclaw list | grep -q "$name"` checks), so the
sandbox name must match the whole line exactly and avoid treating similarly
named sandboxes as a match.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: ceded0de-2952-42ba-9c57-ff54da933da3

📥 Commits

Reviewing files that changed from the base of the PR and between 299b312 and e2f142c.

📒 Files selected for processing (2)
  • .github/workflows/nightly-e2e.yaml
  • test/e2e/test-dashboard-reachability.sh
🚧 Files skipped from review as they are similar to previous changes (1)
  • .github/workflows/nightly-e2e.yaml

Comment thread test/e2e/test-dashboard-reachability.sh Outdated
evantakahashi added a commit to evantakahashi/NemoClaw that referenced this pull request Apr 21, 2026
Harden `nemoclaw list` existence checks with `grep -Fqw --` so a
sandbox name cannot be satisfied by a substring/prefix match
(e.g. `test-dash` matching `test-dashboard-foo`). Addresses
CodeRabbit review feedback on PR NVIDIA#2123.

Signed-off-by: Evan Takahashi <evan10takahashi@gmail.com>
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
test/e2e/test-dashboard-reachability.sh (1)

81-82: ⚠️ Potential issue | 🟡 Minor

grep -Fqw is still not a true exact sandbox-name match.

At Lines [81], [139], [170], and [287], grep -Fqw -- "$name" can still match hyphenated prefixes (e.g., test-dash vs test-dash-old). That can mask existence checks.

#!/bin/bash
set -euo pipefail

cat > /tmp/sandbox-names.txt <<'EOF'
test-dash
test-dash-old
EOF

echo "grep -Fqw match for test-dash (shows false-positive behavior):"
grep -nFqw -- "test-dash" /tmp/sandbox-names.txt || true

echo
echo "exact first-field match (preferred behavior):"
awk -v n="test-dash" '$1==n { print NR ":" $0 }' /tmp/sandbox-names.txt
Expected: `grep -Fqw` reports both lines as matches in some cases with punctuation boundaries, while exact field logic should only match `test-dash`.

Also applies to: 139-139, 170-170, 287-287

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/test-dashboard-reachability.sh` around lines 81 - 82, The sandbox
existence checks using "grep -Fqw -- \"$name\"" can false-match hyphenated
prefixes; update each occurrence (the if checks around nemoclaw list using the
variable name) to perform an exact first-field match instead of word-boundary
grep. Replace the grep call with an awk test that compares the first field to
the variable (e.g., use awk -v n="$name" '$1==n { found=1; exit } END { exit
!found }' semantics) for the checks referenced (the if blocks around the
nemoclaw list call at the shown site and the other occurrences noted) so only
exact sandbox names in the first column count as matches.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/e2e/test-dashboard-reachability.sh`:
- Around line 137-142: The current cleanup defaults install_sandbox to
"my-assistant" and will destroy that sandbox if present, risking deletion of a
user's real sandbox; change install_sandbox handling so it never defaults to a
real-sounding name—require NEMOCLAW_SANDBOX_NAME to be explicitly set or
generate and use an ephemeral test-only name (e.g., include a test
prefix/UUID/$$) and only attempt destroy when that env var or ephemeral pattern
is present; update the check around nemoclaw and the destroy invocation
(references: variable install_sandbox and command nemoclaw "$install_sandbox"
destroy) to skip deletion when install_sandbox is unset or does not match the
test-only naming convention.

---

Duplicate comments:
In `@test/e2e/test-dashboard-reachability.sh`:
- Around line 81-82: The sandbox existence checks using "grep -Fqw -- \"$name\""
can false-match hyphenated prefixes; update each occurrence (the if checks
around nemoclaw list using the variable name) to perform an exact first-field
match instead of word-boundary grep. Replace the grep call with an awk test that
compares the first field to the variable (e.g., use awk -v n="$name" '$1==n {
found=1; exit } END { exit !found }' semantics) for the checks referenced (the
if blocks around the nemoclaw list call at the shown site and the other
occurrences noted) so only exact sandbox names in the first column count as
matches.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 7d16ec1e-2939-4d6b-b1c1-bc7676beae4d

📥 Commits

Reviewing files that changed from the base of the PR and between e2f142c and f32e110.

📒 Files selected for processing (1)
  • test/e2e/test-dashboard-reachability.sh

Comment thread test/e2e/test-dashboard-reachability.sh Outdated
evantakahashi added a commit to evantakahashi/NemoClaw that referenced this pull request Apr 21, 2026
Replace `grep -Fqw` existence checks with an awk first-field match helper
(`sandbox_exists`) so hyphenated prefixes can no longer false-positive
(e.g. `test-dash` matching `test-dash-old`). Also set an ephemeral
`NEMOCLAW_SANDBOX_NAME=test-dash-install-$$` before running install.sh
so cleanup can never destroy a user's real `my-assistant` sandbox when
this script is run locally. Addresses CodeRabbit review feedback on
PR NVIDIA#2123.

Signed-off-by: Evan Takahashi <evan10takahashi@gmail.com>
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/e2e/test-dashboard-reachability.sh`:
- Around line 29-30: Replace the hardcoded SANDBOX value with an ephemeral name
so the test won't destroy an unrelated sandbox: change the SANDBOX assignment
(the SANDBOX variable used later by the cleanup/destroy calls) to generate a
unique name (e.g., include PID, timestamp, random suffix or use mktemp -u) like
SANDBOX="${SANDBOX:-test-dash-$(date +%s)-$$-$RANDOM}" and ensure the existing
teardown/destroy code that references SANDBOX uses that generated value.
- Around line 205-207: The test currently swallows failures from the openshell
forward start command via "|| log ...", allowing the suite to continue with a
stale listener; change the behavior so the script fails fast if the forward
cannot be established: replace the "openshell forward start --background
\"$DASHBOARD_PORT\" \"$SANDBOX\" >>\"$LOG_FILE\" 2>&1 || log ..." fallback with
a check that runs openshell, captures its exit status (or uses set -e), logs the
full output to $LOG_FILE on error using log and then exits non-zero; reference
the openshell forward start invocation and the log function/variables
DASHBOARD_PORT, SANDBOX and LOG_FILE when making the change.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: a2810495-720a-4c6b-9847-b3072ed2a5bd

📥 Commits

Reviewing files that changed from the base of the PR and between f32e110 and a5997c3.

📒 Files selected for processing (1)
  • test/e2e/test-dashboard-reachability.sh

Comment thread test/e2e/test-dashboard-reachability.sh Outdated
Comment thread test/e2e/test-dashboard-reachability.sh Outdated
Adds test/e2e/test-dashboard-reachability.sh validating that the
OpenClaw dashboard is reachable from the host on the forwarded port
after onboard: port bound (polled), HTTP 200 (polled), HTML body
signature (soft marker check). Wires it into nightly-e2e.yaml as a
new top-level job with a 30-minute timeout and adds it to
notify-on-failure.

Closes NVIDIA#2100

Signed-off-by: Evan Takahashi <evan10takahashi@gmail.com>
Poll port-bound check (TC-DASH-01) since `openshell forward start
--background` forks and returns before the port is bound. Soften
body-marker check (TC-DASH-03) to WARN+pass on missing marker so
SPA shells don't trip the assertion while still hard-failing on
non-HTML responses. Remove dead TIMEOUT_CMD detection.

Signed-off-by: Evan Takahashi <evan10takahashi@gmail.com>
Harden `nemoclaw list` existence checks with `grep -Fqw --` so a
sandbox name cannot be satisfied by a substring/prefix match
(e.g. `test-dash` matching `test-dashboard-foo`). Addresses
CodeRabbit review feedback on PR NVIDIA#2123.

Signed-off-by: Evan Takahashi <evan10takahashi@gmail.com>
Replace `grep -Fqw` existence checks with an awk first-field match helper
(`sandbox_exists`) so hyphenated prefixes can no longer false-positive
(e.g. `test-dash` matching `test-dash-old`). Also set an ephemeral
`NEMOCLAW_SANDBOX_NAME=test-dash-install-$$` before running install.sh
so cleanup can never destroy a user's real `my-assistant` sandbox when
this script is run locally. Addresses CodeRabbit review feedback on
PR NVIDIA#2123.

Signed-off-by: Evan Takahashi <evan10takahashi@gmail.com>
Make SANDBOX an ephemeral per-run name (NEMOCLAW_E2E_SANDBOX_NAME
override, defaulting to `test-dash-$$`) so cleanup can never destroy
a user's unrelated `test-dash` sandbox on local runs — same class of
fix already applied to the install sandbox.

Harden the defensive forward re-establishment: stop any existing
forward first, then hard-fail if `openshell forward start` fails.
Previously we swallowed the non-zero exit with `|| log ...`, which
meant TC-DASH-0{1,2,3} could pass spuriously against a stale
listener from another process.

Signed-off-by: Evan Takahashi <evan10takahashi@gmail.com>
@evantakahashi evantakahashi force-pushed the test/e2e-dashboard-reachability branch from 76dfbf5 to 20f7c4c Compare April 21, 2026 20:52
cv pushed a commit that referenced this pull request Apr 21, 2026
…#2196)

## Summary
Reverts #2186 (which made Brev E2E failures 10× slower) and replaces
\`sudo npm link\` with a direct \`sudo ln -sf\` symlink in both the
launchable setup and the in-test bootstrap path. \`npm link\` is
overkill for what we actually need and hangs indefinitely on cold CPU
Brev instances. Direct symlink is O(1) and deterministic.

## Related Issue
Regression from my own #2186. Surfaced while validating #2183 (ollama
E2E).

## Changes
This PR contains two commits:

1. **Revert #2186.** Restores the launchable setup to its pre-\`sudo npm
link\` state. This alone would leave non-full suites broken by the
original pre-existing #1888 issue, but without burning a 20-min Brev
instance per run.
2. **Replace \`sudo npm link\` with direct symlink.**
- \`scripts/brev-launchable-ci-cpu.sh\` — after plugin build, create
\`/usr/local/bin/nemoclaw → \$NEMOCLAW_CLONE_DIR/bin/nemoclaw.js\` via
\`sudo ln -sf\`. Drop \`sudo chown -R\` (only the single symlink is
root-owned now, not node_modules).
- \`test/e2e/brev-e2e.test.ts\` — replace the in-test \`sudo npm link\`
with the same direct-symlink approach. Idempotent re-link so local dev
runs that skip the launchable still work.

## Evidence
- **Before #2186 (2026-04-14):** non-full suites failed at ~2 min with a
clear \`sudo npm link ETIMEDOUT\` (pre-existing #1888 regression, but
cheap to fail).
- **After #2186 (today):** non-full suites hang on \`sudo npm link\`
inside the launchable until the 20-min outer cap trips. 10× longer to
fail, 10× more Brev credit per failed run. Evidence: [run
24737205166](https://github.com/NVIDIA/NemoClaw/actions/runs/24737205166)
— stuck on \`Linking nemoclaw CLI globally...\` from 808s to 1198s with
no progress.
- \`npm link\` does two things: \`/usr/local/lib/node_modules/<name>\`
symlink + \`/usr/local/bin/<bin>\` symlink. We only need the second one.
Replacing with \`sudo ln -sf\` bypasses npm's global-prefix housekeeping
and the \`sudo chown -R node_modules\` traversal that likely drove the
hang.

## Type of Change
- [x] Code change (bug fix)

## Verification
- [x] \`bash -n\` on the launchable script — syntax OK
- [x] \`npm run typecheck:cli\` — passes
- [x] Pre-push hooks — passes (modulo the pre-existing flaky
\`test/install-preflight.test.ts:107\`, resolves on retry, unrelated)
- [ ] **End-to-end Brev E2E run** — to be validated via \`gh workflow
run e2e-brev.yaml --ref fix/brev-cpu-npm-link-hang --field
test_suite=credential-sanitization\` (any non-\`full\` suite proves the
bootstrap now completes). Will attach run URL before merge.

## Retro note
Shipped #2186 without running the Brev suite end-to-end — exact failure
mode I had flagged on #2123 earlier the same day. This PR is validated
the right way before merge.

## AI Disclosure
- [x] AI-assisted — tool: Claude Code

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Chores**
* Made CLI installation deterministic and more reliable with explicit
build and idempotent system-wide linking so the tool is consistently
available and executable.

* **Tests**
* Hardened remote test setup and reduced a CLI-linking timeout for
faster feedback.
* On workflow failures, automatically collect VM diagnostics and upload
them as a debug artifact for easier troubleshooting.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: Charan Jagwani <cjagwani@nvidia.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

E2E End-to-end testing — Brev infrastructure, test cases, nightly failures, and coverage gaps enhancement: testing Use this label to identify requests to improve NemoClaw test coverage. Integration: OpenClaw Support for OpenClaw

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature] add coverage for OpenClaw dashboard reachability

3 participants