test(e2e): migrate sandbox lifecycle coverage#3902
Conversation
PR Review AdvisorRecommendation: info only This is an automated advisory review. A human maintainer must make the final merge decision. Limitations: Advisor execution failed: Could not configure advisor model openai/openai/gpt-5.5 Full advisor summaryPR Review AdvisorBase: PR review advisor failed: Could not configure advisor model openai/openai/gpt-5.5 Gate status
🔴 Blockers
🟡 Warnings
🔵 Suggestions
Acceptance coverage
Security review
Test / E2E status
✅ What looks good
Review completeness
|
E2E Advisor RecommendationRequired E2E: Dispatch hint: Full advisor summaryE2E Recommendation AdvisorBase: Required E2E
Optional E2E
New E2E recommendations
Dispatch hint
|
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughMigrates sandbox lifecycle E2E coverage into the scenario framework by adding a shared assertion library, per-check validation scripts, suite wiring with explicit requires_state, parity-map migrations to ChangesSandbox Lifecycle E2E Coverage Migration
Estimated code review effort🎯 4 (Complex) | ⏱️ ~40 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 7
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
test/e2e/validation_suites/suites.yaml (1)
1-1:⚠️ Potential issue | 🟠 Major | ⚡ Quick winAdd SPDX license header to this YAML source file.
This file is missing the required SPDX copyright and license header.
Proposed fix
+# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +# SPDX-License-Identifier: Apache-2.0 suites:As per coding guidelines,
**/*.{js,ts,tsx,jsx,sh,yaml,yml,json,md,mdx}: Every source file must include an SPDX license header for copyright and Apache-2.0 license.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/e2e/validation_suites/suites.yaml` at line 1, This YAML is missing the required SPDX license header; add the standard SPDX header lines at the very top of this file (above the existing top-level key "suites:") including the SPDX copyright text entry and the SPDX-License-Identifier set to Apache-2.0 so the file complies with the project's licensing guideline.
♻️ Duplicate comments (1)
test/e2e/validation_suites/lib/sandbox_lifecycle.sh (1)
52-52:⚠️ Potential issue | 🟠 Major | ⚡ Quick winQuote
"$@"in the fallback command substitution.At Line 52, unquoted
$@can re-split arguments and change command behavior on the non-timeoutpath.Proposed fix
- SANDBOX_LIFECYCLE_LAST_OUTPUT="$($@ 2>&1)" || { + SANDBOX_LIFECYCLE_LAST_OUTPUT="$("$@" 2>&1)" || {#!/bin/bash shellcheck -s bash test/e2e/validation_suites/lib/sandbox_lifecycle.sh rg -n '\$\(\$@' test/e2e/validation_suites/lib/sandbox_lifecycle.shAs per coding guidelines,
**/*.sh: Shell scripts must be enforced by ShellCheck (.shellcheckrc) and formatted withshfmt.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/e2e/validation_suites/lib/sandbox_lifecycle.sh` at line 52, The assignment to SANDBOX_LIFECYCLE_LAST_OUTPUT uses an unquoted $@ in the fallback command substitution which can re-split arguments and change behavior; modify the command substitution to use quoted "$@" instead so the exact arguments are preserved (update the line that sets SANDBOX_LIFECYCLE_LAST_OUTPUT="$($@ 2>&1)" to use "$@" within the substitution), ensuring the non-timeout fallback path receives the same arguments as the timeout path.
🧹 Nitpick comments (2)
test/e2e/scenario-framework-tests/e2e-lib-helpers.test.ts (1)
470-470: ⚡ Quick winUse the repo’s PATH fallback pattern in mocked command env.
Line 470 should use
PATH: \${bin}:${process.env.PATH || ""}`to avoid brittle concatenation whenPATH` is unset in isolated test environments.Suggested patch
- const r = runBash(`set -euo pipefail; . "${VALIDATION_SUITES}/lib/sandbox_lifecycle.sh"; sandbox_lifecycle_load_context; sandbox_lifecycle_assert_nemoclaw_list_contains_sandbox; sandbox_lifecycle_assert_status_fields_present; sandbox_lifecycle_assert_logs_available; sandbox_lifecycle_assert_openshell_exec_ok`, { E2E_CONTEXT_DIR: tmp, PATH: `${bin}:${process.env.PATH}` }); + const r = runBash(`set -euo pipefail; . "${VALIDATION_SUITES}/lib/sandbox_lifecycle.sh"; sandbox_lifecycle_load_context; sandbox_lifecycle_assert_nemoclaw_list_contains_sandbox; sandbox_lifecycle_assert_status_fields_present; sandbox_lifecycle_assert_logs_available; sandbox_lifecycle_assert_openshell_exec_ok`, { E2E_CONTEXT_DIR: tmp, PATH: `${bin}:${process.env.PATH || ""}` });Based on learnings: In this repo’s tests, prefer
PATH: \${fakeBin}:${process.env.PATH || ""}`with POSIX:` separator.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/e2e/scenario-framework-tests/e2e-lib-helpers.test.ts` at line 470, Update the runBash invocation so the mocked command environment uses the repo's PATH fallback pattern: when calling runBash (the call that sources "${VALIDATION_SUITES}/lib/sandbox_lifecycle.sh" and runs sandbox_lifecycle_* assertions) set the PATH env to use the fallback `${bin}:${process.env.PATH || ""}` (i.e. include the empty-string fallback) instead of `${bin}:${process.env.PATH}` to avoid failures when PATH is unset in isolated test environments.test/e2e/scenario-framework-tests/e2e-parity-map.test.ts (1)
92-92: ⚡ Quick winHard-coded retirement date makes this test unnecessarily brittle.
Line 92 enforces
approved_at: 2026-05-20exactly. Any valid future metadata update will fail this test even when classification is correct. Prefer checking presence and date format instead of a fixed date literal.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/e2e/scenario-framework-tests/e2e-parity-map.test.ts` at line 92, The test currently asserts a hard-coded approval date using expect(entry).toMatch(...), which is brittle; update the assertion in the e2e-parity-map.test to check presence and valid date format instead of the fixed literal: replace the exact-date regex with one that matches an ISO date (e.g., YYYY-MM-DD) allowing optional quotes and whitespace, or alternatively parse the captured value with Date.parse to assert it's a valid date; keep using the same expect(entry).toMatch / expect(...) pattern so the change is localized to the assertion for entry.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@test/e2e/docs/parity-map.yaml`:
- Around line 9782-9787: The mapping uses the existing id
"validation.sandbox_operations.openshell_exec_ok" which collides with the
exec/chat coverage; replace that id with a restart-specific identifier (for
example "validation.sandbox_operations.restart_pod_ready" or
"validation.sandbox_operations.restart_pod_not_ready" depending on whether the
status is OK or representing a gap) so the entry "Sandbox pod did not reach
Running/Ready after restart" is tracked separately from the exec path; update
the id value on the YAML block containing status: mapped, layer: validation,
gap_domain: sandbox-lifecycle, owner: e2e-maintainers to the chosen
restart-specific id.
- Around line 10648-10661: The two parity-map entries mapping the legacy
messages "No credentials in snapshot directories" and "Credentials found:
$CRED_LEAKS" to id validation.sandbox_snapshot.create_succeeds are incorrect;
remove or change those mappings so credential-leak assertions are not marked as
covered by create_succeeds—either set their status to deferred (leave them
unmapped) or remap them to a dedicated snapshot leak/no-credentials assertion
(e.g., a future validation.sandbox_snapshot.no_credentials) once that test
exists; update the entries that reference the legacy strings and the id
validation.sandbox_snapshot.create_succeeds accordingly so leak checks remain
separate from create/list/restore coverage.
- Around line 4640-4646: The YAML remaps are incorrectly mapping prerequisite
checks (e.g., the legacy note "nemoclaw on PATH", "Docker is running", "NemoClaw
installed") to behavior IDs like validation.sandbox_lifecycle.gateway_health and
marker_written; instead, change those entries so their status is "deferred" or
replace them with dedicated preflight IDs (create new IDs such as
preflight.sandbox.nemoclaw_present or similar) rather than reusing behavior IDs;
update the specific entries that reference
validation.sandbox_lifecycle.gateway_health and marker_written (and the similar
blocks around the other occurrences noted) to use "deferred" or the new
preflight IDs and ensure owner/reusable metadata remains consistent.
In `@test/e2e/scenario-framework-tests/e2e-coverage-report.test.ts`:
- Around line 121-127: The test named
"test_should_report_scoped_lifecycle_parity_at_or_above_100_percent" only checks
for section presence (using loadMetadataFromDir and renderCoverageReport into
md) so it doesn't enforce the 100% threshold; update the test to parse the
lifecycle parity percentage out of md (e.g., with a regex against md) and assert
the numeric value is >= 100, referencing the existing md variable and the
renderCoverageReport output (or alternatively rename the test to reflect
"section presence" if you prefer not to assert the numeric threshold).
In `@test/e2e/scenario-framework-tests/e2e-lib-helpers.test.ts`:
- Around line 458-461: The test currently only asserts runBash(...) returns a
non-zero status, which allows unrelated failures to pass; update the
"test_should_apply_timeout_to_command_execution" case to assert timeout-specific
semantics from the runBash result: call runBash with `.
"${VALIDATION_SUITES}/lib/sandbox_lifecycle.sh";
sandbox_lifecycle_run_with_timeout 1 bash -c 'sleep 5'` as before, then assert
either the canonical timeout exit code (e.g., r.status === 124) or that
r.stdout/r.stderr contains a timeout marker (e.g., matches /timeout|timed out/)
so the test verifies sandbox_lifecycle_run_with_timeout actually timed out
rather than failing for another reason.
In `@test/e2e/validation_suites/lib/sandbox_lifecycle.sh`:
- Around line 112-120: In
sandbox_lifecycle_assert_snapshot_create_list_restore_marker, the "marker
written" and "marker rolled back" assertions are meaningless because no marker
is written or checked; fix by explicitly creating a marker in the sandbox before
taking the snapshot and verifying its state after restore: use the sandbox
manipulation commands already used in this script (the same nemoclaw/sandbox
invocation pattern) to write a sentinel (e.g., create a file or set a flag) in
the sandbox prior to calling "nemoclaw snapshot create" and assert its presence
with sandbox_lifecycle_pass; after "nemoclaw snapshot restore ... latest" verify
the marker has been removed or reverted as expected and call
sandbox_lifecycle_pass or sandbox_lifecycle_fail accordingly so the existing
messages ("marker written" and "marker rolled back") reflect real checks.
- Around line 63-66: Several assertion functions call sandbox_lifecycle_fail via
|| but then continue execution, causing both FAIL and PASS to be reported;
update each affected function
(sandbox_lifecycle_assert_nemoclaw_list_contains_sandbox,
sandbox_lifecycle_assert_status_fields_present,
sandbox_lifecycle_assert_logs_available,
sandbox_lifecycle_assert_openshell_exec_ok,
sandbox_lifecycle_assert_gateway_health,
sandbox_lifecycle_assert_snapshot_create_list_restore_marker) to immediately
exit after calling sandbox_lifecycle_fail by adding an explicit "return 1" (or
equivalent early return) right after each sandbox_lifecycle_fail invocation so
the function stops and does not proceed to sandbox_lifecycle_pass. Ensure you
add the return in every branch where sandbox_lifecycle_fail is used.
---
Outside diff comments:
In `@test/e2e/validation_suites/suites.yaml`:
- Line 1: This YAML is missing the required SPDX license header; add the
standard SPDX header lines at the very top of this file (above the existing
top-level key "suites:") including the SPDX copyright text entry and the
SPDX-License-Identifier set to Apache-2.0 so the file complies with the
project's licensing guideline.
---
Duplicate comments:
In `@test/e2e/validation_suites/lib/sandbox_lifecycle.sh`:
- Line 52: The assignment to SANDBOX_LIFECYCLE_LAST_OUTPUT uses an unquoted $@
in the fallback command substitution which can re-split arguments and change
behavior; modify the command substitution to use quoted "$@" instead so the
exact arguments are preserved (update the line that sets
SANDBOX_LIFECYCLE_LAST_OUTPUT="$($@ 2>&1)" to use "$@" within the substitution),
ensuring the non-timeout fallback path receives the same arguments as the
timeout path.
---
Nitpick comments:
In `@test/e2e/scenario-framework-tests/e2e-lib-helpers.test.ts`:
- Line 470: Update the runBash invocation so the mocked command environment uses
the repo's PATH fallback pattern: when calling runBash (the call that sources
"${VALIDATION_SUITES}/lib/sandbox_lifecycle.sh" and runs sandbox_lifecycle_*
assertions) set the PATH env to use the fallback `${bin}:${process.env.PATH ||
""}` (i.e. include the empty-string fallback) instead of
`${bin}:${process.env.PATH}` to avoid failures when PATH is unset in isolated
test environments.
In `@test/e2e/scenario-framework-tests/e2e-parity-map.test.ts`:
- Line 92: The test currently asserts a hard-coded approval date using
expect(entry).toMatch(...), which is brittle; update the assertion in the
e2e-parity-map.test to check presence and valid date format instead of the fixed
literal: replace the exact-date regex with one that matches an ISO date (e.g.,
YYYY-MM-DD) allowing optional quotes and whitespace, or alternatively parse the
captured value with Date.parse to assert it's a valid date; keep using the same
expect(entry).toMatch / expect(...) pattern so the change is localized to the
assertion for entry.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 708860d0-25b2-4c54-b107-1a6711143381
📒 Files selected for processing (12)
test/e2e/docs/README.mdtest/e2e/docs/parity-map.yamltest/e2e/scenario-framework-tests/e2e-coverage-report.test.tstest/e2e/scenario-framework-tests/e2e-lib-helpers.test.tstest/e2e/scenario-framework-tests/e2e-parity-map.test.tstest/e2e/validation_suites/lib/sandbox_lifecycle.shtest/e2e/validation_suites/sandbox/lifecycle/00-gateway-health.shtest/e2e/validation_suites/sandbox/lifecycle/01-gateway-recovery.shtest/e2e/validation_suites/sandbox/operations/00-list-and-status.shtest/e2e/validation_suites/sandbox/operations/01-logs-and-exec.shtest/e2e/validation_suites/sandbox/snapshot/00-create-list-restore.shtest/e2e/validation_suites/suites.yaml
…ndbox-lifecycle-coverage
|
✨ Related open issues: |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@test/e2e/docs/parity-map.yaml`:
- Around line 10647-10660: The two deferred parity entries whose legacy messages
are "No credentials in snapshot directories" and "Credentials found:
$CRED_LEAKS" currently put the coverage explanation into runner_requirement;
move the explanatory text back into the reason field (keeping "snapshot
credential-leak coverage is not asserted by
validation.sandbox_snapshot.create_succeeds") and replace runner_requirement
with the actual execution environment used elsewhere in this file (i.e., set
runner_requirement to the canonical runner name used for sandbox snapshot checks
instead of "dedicated snapshot credential leak assertion").
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: e299b67d-c111-476c-a122-0d6b01cdd1ee
📒 Files selected for processing (5)
.github/actions/basic-checks/action.yamltest/e2e/docs/parity-map.yamltest/e2e/scenario-framework-tests/e2e-coverage-report.test.tstest/e2e/scenario-framework-tests/e2e-lib-helpers.test.tstest/e2e/validation_suites/lib/sandbox_lifecycle.sh
Summary
Migrates sandbox lifecycle E2E coverage into the scenario validation suite so lifecycle, operations, and snapshot behavior are covered by reusable plan-driven assertions. This also expands parity metadata and framework tests so migrated lifecycle coverage stays visible and enforced.
Related Issue
Fixes #3813
Changes
test/e2e/validation_suites/lib/sandbox_lifecycle.sh.test/e2e/validation_suites/suites.yaml.test/e2e/docs/parity-map.yamlwith migrated lifecycle coverage metadata.Type of Change
Verification
npx prek run --all-filespassesnpm testpassesmake docsbuilds without warnings (doc changes only)Additional targeted validation run during implementation:
npx vitest run test/e2e/scenario-framework-tests/e2e-context-helper.test.ts test/e2e/scenario-framework-tests/e2e-convention-lint.test.ts test/e2e/scenario-framework-tests/e2e-coverage-report.test.ts test/e2e/scenario-framework-tests/e2e-expected-failure.test.ts test/e2e/scenario-framework-tests/e2e-expected-state-validator.test.ts test/e2e/scenario-framework-tests/e2e-lib-helpers.test.ts test/e2e/scenario-framework-tests/e2e-parity-map.test.tspassed: 7 files, 66 tests.test/e2e/runtime/run-scenario.sh ubuntu-repo-cloud-openclaw --plan-onlypassed.npx tsx scripts/e2e/check-parity-map.ts --strictpassed.Signed-off-by: Julie Yaunches jyaunches@nvidia.com
Summary by CodeRabbit