test(e2e): add dashboard reachability coverage#2123
test(e2e): add dashboard reachability coverage#2123evantakahashi wants to merge 5 commits intoNVIDIA:mainfrom
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds a new E2E Bash test Changes
Sequence DiagramsequenceDiagram
autonumber
participant CI as CI/Test Runner
participant Env as Docker\nNemoClaw Env
participant Gateway as OpenClaw\nGateway
participant Dashboard as Dashboard\nService
participant Host as Host\nTCP Stack
CI->>Env: preflight (Docker, env vars, install)
CI->>Env: `nemoclaw onboard` sandbox
Env-->>CI: sandbox ready
CI->>Gateway: `openshell forward start --background` (port 18789)
Gateway-->>CI: forward established
CI->>Host: check LISTEN on 127.0.0.1:18789 (`lsof`)
Host-->>CI: port bound
CI->>Host: HTTP GET http://127.0.0.1:18789/ (`curl`)
Host->>Gateway: forwarded request
Gateway->>Dashboard: route request
Dashboard-->>Gateway: HTTP 200 + HTML
Gateway-->>Host: response
Host-->>CI: HTTP 200 + body
CI->>CI: verify HTML markers → pass/fail
CI->>Gateway: stop forward
CI->>Env: destroy sandbox / cleanup
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
.github/workflows/nightly-e2e.yaml (1)
365-377: Add timeout buffer so failure artifact upload is more reliable.The job timeout (30m) matches the script’s default timeout (1800s). If both hit at the edge, the job can terminate before the artifact step runs.
Proposed tweak
- name: Run dashboard reachability E2E test env: NVIDIA_API_KEY: ${{ secrets.NVIDIA_API_KEY }} NEMOCLAW_NON_INTERACTIVE: "1" NEMOCLAW_ACCEPT_THIRD_PARTY_SOFTWARE: "1" NEMOCLAW_POLICY_TIER: "open" + NEMOCLAW_E2E_TIMEOUT_SECONDS: "1500" GITHUB_TOKEN: ${{ github.token }} run: bash test/e2e/test-dashboard-reachability.sh🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/nightly-e2e.yaml around lines 365 - 377, The job timeout equals the script timeout so the workflow can end before artifact upload runs; update the job's timeout-minutes to a higher value than the script default (e.g., change timeout-minutes from 30 to 35) or reduce the script's internal timeout in test/e2e/test-dashboard-reachability.sh so the "Run dashboard reachability E2E test" step has a buffer to complete and allow artifact upload to execute.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@test/e2e/test-dashboard-reachability.sh`:
- Around line 275-281: TC-DASH-03 currently hard-fails when HTML is served but
no OpenClaw marker is found; change the failure to a soft-pass by replacing the
fail call with a warning plus a pass (or call the existing warn helper then
pass) so the test records a warning but does not fail; update the block that
checks variable "body" (the grep condition) to call warn "TC-DASH-03: Body
signature missing — HTML served but no OpenClaw/Control-UI marker" and then pass
"TC-DASH-03: Soft-pass: HTML served without marker" instead of invoking fail,
referencing the TC-DASH-03 check and the pass/fail/warn helper functions used in
the script.
- Around line 222-227: The test currently does a single lsof probe for
$DASHBOARD_PORT and can race-fail; replace that single check in the TC-DASH-01
block with a polling loop (like TC-DASH-02) that retries lsof
-iTCP:"$DASHBOARD_PORT" -sTCP:LISTEN with a short sleep and overall timeout, and
only call pass "TC-DASH-01: Port $DASHBOARD_PORT is bound" when the loop
succeeds; otherwise call fail "TC-DASH-01: Dashboard port bound" with the same
timeout failure message. Use the same retry interval and timeout semantics as
TC-DASH-02 so the behavior matches and reference the existing pass/fail helpers
and the DASHBOARD_PORT variable.
---
Nitpick comments:
In @.github/workflows/nightly-e2e.yaml:
- Around line 365-377: The job timeout equals the script timeout so the workflow
can end before artifact upload runs; update the job's timeout-minutes to a
higher value than the script default (e.g., change timeout-minutes from 30 to
35) or reduce the script's internal timeout in
test/e2e/test-dashboard-reachability.sh so the "Run dashboard reachability E2E
test" step has a buffer to complete and allow artifact upload to execute.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 8b7cc14d-d36e-4c85-b7c5-3ed68d0395af
📒 Files selected for processing (2)
.github/workflows/nightly-e2e.yamltest/e2e/test-dashboard-reachability.sh
|
@evantakahashi you should make all the commits to be "Verified" to meet the PR requirement, see other PR with done for reference |
299b312 to
e2f142c
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@test/e2e/test-dashboard-reachability.sh`:
- Around line 81-84: The presence check uses substring/regex matching via
`nemoclaw list | grep -q "$name"`, which allows false positives; replace these
with exact literal-line matching such as `nemoclaw list 2>/dev/null | grep -xFq
-- "$name"` (or equivalent anchored grep `grep -q "^${name}$"`) wherever `grep
-q "$name"` is used (e.g., the `nemoclaw list | grep -q "$name"` checks), so the
sandbox name must match the whole line exactly and avoid treating similarly
named sandboxes as a match.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: ceded0de-2952-42ba-9c57-ff54da933da3
📒 Files selected for processing (2)
.github/workflows/nightly-e2e.yamltest/e2e/test-dashboard-reachability.sh
🚧 Files skipped from review as they are similar to previous changes (1)
- .github/workflows/nightly-e2e.yaml
Harden `nemoclaw list` existence checks with `grep -Fqw --` so a sandbox name cannot be satisfied by a substring/prefix match (e.g. `test-dash` matching `test-dashboard-foo`). Addresses CodeRabbit review feedback on PR NVIDIA#2123. Signed-off-by: Evan Takahashi <evan10takahashi@gmail.com>
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
test/e2e/test-dashboard-reachability.sh (1)
81-82:⚠️ Potential issue | 🟡 Minor
grep -Fqwis still not a true exact sandbox-name match.At Lines [81], [139], [170], and [287],
grep -Fqw -- "$name"can still match hyphenated prefixes (e.g.,test-dashvstest-dash-old). That can mask existence checks.#!/bin/bash set -euo pipefail cat > /tmp/sandbox-names.txt <<'EOF' test-dash test-dash-old EOF echo "grep -Fqw match for test-dash (shows false-positive behavior):" grep -nFqw -- "test-dash" /tmp/sandbox-names.txt || true echo echo "exact first-field match (preferred behavior):" awk -v n="test-dash" '$1==n { print NR ":" $0 }' /tmp/sandbox-names.txtExpected: `grep -Fqw` reports both lines as matches in some cases with punctuation boundaries, while exact field logic should only match `test-dash`.Also applies to: 139-139, 170-170, 287-287
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/test-dashboard-reachability.sh` around lines 81 - 82, The sandbox existence checks using "grep -Fqw -- \"$name\"" can false-match hyphenated prefixes; update each occurrence (the if checks around nemoclaw list using the variable name) to perform an exact first-field match instead of word-boundary grep. Replace the grep call with an awk test that compares the first field to the variable (e.g., use awk -v n="$name" '$1==n { found=1; exit } END { exit !found }' semantics) for the checks referenced (the if blocks around the nemoclaw list call at the shown site and the other occurrences noted) so only exact sandbox names in the first column count as matches.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@test/e2e/test-dashboard-reachability.sh`:
- Around line 137-142: The current cleanup defaults install_sandbox to
"my-assistant" and will destroy that sandbox if present, risking deletion of a
user's real sandbox; change install_sandbox handling so it never defaults to a
real-sounding name—require NEMOCLAW_SANDBOX_NAME to be explicitly set or
generate and use an ephemeral test-only name (e.g., include a test
prefix/UUID/$$) and only attempt destroy when that env var or ephemeral pattern
is present; update the check around nemoclaw and the destroy invocation
(references: variable install_sandbox and command nemoclaw "$install_sandbox"
destroy) to skip deletion when install_sandbox is unset or does not match the
test-only naming convention.
---
Duplicate comments:
In `@test/e2e/test-dashboard-reachability.sh`:
- Around line 81-82: The sandbox existence checks using "grep -Fqw -- \"$name\""
can false-match hyphenated prefixes; update each occurrence (the if checks
around nemoclaw list using the variable name) to perform an exact first-field
match instead of word-boundary grep. Replace the grep call with an awk test that
compares the first field to the variable (e.g., use awk -v n="$name" '$1==n {
found=1; exit } END { exit !found }' semantics) for the checks referenced (the
if blocks around the nemoclaw list call at the shown site and the other
occurrences noted) so only exact sandbox names in the first column count as
matches.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 7d16ec1e-2939-4d6b-b1c1-bc7676beae4d
📒 Files selected for processing (1)
test/e2e/test-dashboard-reachability.sh
Replace `grep -Fqw` existence checks with an awk first-field match helper (`sandbox_exists`) so hyphenated prefixes can no longer false-positive (e.g. `test-dash` matching `test-dash-old`). Also set an ephemeral `NEMOCLAW_SANDBOX_NAME=test-dash-install-$$` before running install.sh so cleanup can never destroy a user's real `my-assistant` sandbox when this script is run locally. Addresses CodeRabbit review feedback on PR NVIDIA#2123. Signed-off-by: Evan Takahashi <evan10takahashi@gmail.com>
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@test/e2e/test-dashboard-reachability.sh`:
- Around line 29-30: Replace the hardcoded SANDBOX value with an ephemeral name
so the test won't destroy an unrelated sandbox: change the SANDBOX assignment
(the SANDBOX variable used later by the cleanup/destroy calls) to generate a
unique name (e.g., include PID, timestamp, random suffix or use mktemp -u) like
SANDBOX="${SANDBOX:-test-dash-$(date +%s)-$$-$RANDOM}" and ensure the existing
teardown/destroy code that references SANDBOX uses that generated value.
- Around line 205-207: The test currently swallows failures from the openshell
forward start command via "|| log ...", allowing the suite to continue with a
stale listener; change the behavior so the script fails fast if the forward
cannot be established: replace the "openshell forward start --background
\"$DASHBOARD_PORT\" \"$SANDBOX\" >>\"$LOG_FILE\" 2>&1 || log ..." fallback with
a check that runs openshell, captures its exit status (or uses set -e), logs the
full output to $LOG_FILE on error using log and then exits non-zero; reference
the openshell forward start invocation and the log function/variables
DASHBOARD_PORT, SANDBOX and LOG_FILE when making the change.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: a2810495-720a-4c6b-9847-b3072ed2a5bd
📒 Files selected for processing (1)
test/e2e/test-dashboard-reachability.sh
Adds test/e2e/test-dashboard-reachability.sh validating that the OpenClaw dashboard is reachable from the host on the forwarded port after onboard: port bound (polled), HTTP 200 (polled), HTML body signature (soft marker check). Wires it into nightly-e2e.yaml as a new top-level job with a 30-minute timeout and adds it to notify-on-failure. Closes NVIDIA#2100 Signed-off-by: Evan Takahashi <evan10takahashi@gmail.com>
Poll port-bound check (TC-DASH-01) since `openshell forward start --background` forks and returns before the port is bound. Soften body-marker check (TC-DASH-03) to WARN+pass on missing marker so SPA shells don't trip the assertion while still hard-failing on non-HTML responses. Remove dead TIMEOUT_CMD detection. Signed-off-by: Evan Takahashi <evan10takahashi@gmail.com>
Harden `nemoclaw list` existence checks with `grep -Fqw --` so a sandbox name cannot be satisfied by a substring/prefix match (e.g. `test-dash` matching `test-dashboard-foo`). Addresses CodeRabbit review feedback on PR NVIDIA#2123. Signed-off-by: Evan Takahashi <evan10takahashi@gmail.com>
Replace `grep -Fqw` existence checks with an awk first-field match helper (`sandbox_exists`) so hyphenated prefixes can no longer false-positive (e.g. `test-dash` matching `test-dash-old`). Also set an ephemeral `NEMOCLAW_SANDBOX_NAME=test-dash-install-$$` before running install.sh so cleanup can never destroy a user's real `my-assistant` sandbox when this script is run locally. Addresses CodeRabbit review feedback on PR NVIDIA#2123. Signed-off-by: Evan Takahashi <evan10takahashi@gmail.com>
Make SANDBOX an ephemeral per-run name (NEMOCLAW_E2E_SANDBOX_NAME
override, defaulting to `test-dash-$$`) so cleanup can never destroy
a user's unrelated `test-dash` sandbox on local runs — same class of
fix already applied to the install sandbox.
Harden the defensive forward re-establishment: stop any existing
forward first, then hard-fail if `openshell forward start` fails.
Previously we swallowed the non-zero exit with `|| log ...`, which
meant TC-DASH-0{1,2,3} could pass spuriously against a stale
listener from another process.
Signed-off-by: Evan Takahashi <evan10takahashi@gmail.com>
76dfbf5 to
20f7c4c
Compare
…#2196) ## Summary Reverts #2186 (which made Brev E2E failures 10× slower) and replaces \`sudo npm link\` with a direct \`sudo ln -sf\` symlink in both the launchable setup and the in-test bootstrap path. \`npm link\` is overkill for what we actually need and hangs indefinitely on cold CPU Brev instances. Direct symlink is O(1) and deterministic. ## Related Issue Regression from my own #2186. Surfaced while validating #2183 (ollama E2E). ## Changes This PR contains two commits: 1. **Revert #2186.** Restores the launchable setup to its pre-\`sudo npm link\` state. This alone would leave non-full suites broken by the original pre-existing #1888 issue, but without burning a 20-min Brev instance per run. 2. **Replace \`sudo npm link\` with direct symlink.** - \`scripts/brev-launchable-ci-cpu.sh\` — after plugin build, create \`/usr/local/bin/nemoclaw → \$NEMOCLAW_CLONE_DIR/bin/nemoclaw.js\` via \`sudo ln -sf\`. Drop \`sudo chown -R\` (only the single symlink is root-owned now, not node_modules). - \`test/e2e/brev-e2e.test.ts\` — replace the in-test \`sudo npm link\` with the same direct-symlink approach. Idempotent re-link so local dev runs that skip the launchable still work. ## Evidence - **Before #2186 (2026-04-14):** non-full suites failed at ~2 min with a clear \`sudo npm link ETIMEDOUT\` (pre-existing #1888 regression, but cheap to fail). - **After #2186 (today):** non-full suites hang on \`sudo npm link\` inside the launchable until the 20-min outer cap trips. 10× longer to fail, 10× more Brev credit per failed run. Evidence: [run 24737205166](https://github.com/NVIDIA/NemoClaw/actions/runs/24737205166) — stuck on \`Linking nemoclaw CLI globally...\` from 808s to 1198s with no progress. - \`npm link\` does two things: \`/usr/local/lib/node_modules/<name>\` symlink + \`/usr/local/bin/<bin>\` symlink. We only need the second one. Replacing with \`sudo ln -sf\` bypasses npm's global-prefix housekeeping and the \`sudo chown -R node_modules\` traversal that likely drove the hang. ## Type of Change - [x] Code change (bug fix) ## Verification - [x] \`bash -n\` on the launchable script — syntax OK - [x] \`npm run typecheck:cli\` — passes - [x] Pre-push hooks — passes (modulo the pre-existing flaky \`test/install-preflight.test.ts:107\`, resolves on retry, unrelated) - [ ] **End-to-end Brev E2E run** — to be validated via \`gh workflow run e2e-brev.yaml --ref fix/brev-cpu-npm-link-hang --field test_suite=credential-sanitization\` (any non-\`full\` suite proves the bootstrap now completes). Will attach run URL before merge. ## Retro note Shipped #2186 without running the Brev suite end-to-end — exact failure mode I had flagged on #2123 earlier the same day. This PR is validated the right way before merge. ## AI Disclosure - [x] AI-assisted — tool: Claude Code <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **Chores** * Made CLI installation deterministic and more reliable with explicit build and idempotent system-wide linking so the tool is consistently available and executable. * **Tests** * Hardened remote test setup and reduced a CLI-linking timeout for faster feedback. * On workflow failures, automatically collect VM diagnostics and upload them as a debug artifact for easier troubleshooting. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Signed-off-by: Charan Jagwani <cjagwani@nvidia.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Adds E2E coverage that the OpenClaw dashboard is actually reachable from the host on the forwarded port (default
127.0.0.1:18789) afternemoclaw onboard. Wires the script intonightly-e2e.yamlas a new top-level job with a 30-minute timeout.Related Issue
Closes #2100
Changes
test/e2e/test-dashboard-reachability.shwith three test cases:TC-DASH-01— port bound on host (polled 30×1s;openshell forward start --backgroundis async persrc/lib/onboard.ts:5620)TC-DASH-02— dashboard returns HTTP 200 (polled 30×1s)TC-DASH-03— response body signature (hard fail on non-HTML; soft warn-pass when HTML lacks OpenClaw/Control-UI marker, since the dashboard may be an SPA shell)dashboard-reachability-e2ejob in.github/workflows/nightly-e2e.yaml(modeled onsandbox-operations-e2e) and wired it intonotify-on-failure'sneeds:and failureif:expression.test/e2e/test-sandbox-operations.sh(timeout re-exec, preflight, onboard, teardown via EXIT trap, summary).Type of Change
Verification
npx prek run --all-filespasses (excluding pre-existing flakytest/onboard.test.ts:3940test, which reproduces on unmodifiedmain— unrelated to this PR)npm test— same note: flakyonboard.test.ts:3940is pre-existingLocal validation notes
nemoclaw onboard. First real integration execution will be the next nightly run after merge. Happy to iterate on any issues surfaced bynotify-on-failure.AI Disclosure
Notes for reviewers
Things intentionally not covered by this PR (possible follow-ups):
CHAT_UI_URLenv-var override path (test assumes default port 18789; CI does not setCHAT_UI_URL)The
TC-DASH-03marker-check regex (openclaw|control[- ]?ui|nemoclaw) is intentionally soft — passes on HTML-structure alone when no marker is present — because the real dashboard HTML could not be verified from this repo (served by upstream OpenClaw). Happy to tighten this if maintainers confirm the expected markers.Summary by CodeRabbit
Tests
Chores