test(e2e): migrate sandbox lifecycle coverage#3902

Merged
jyaunches merged 35 commits into main from issue-3813-migrate-sandbox-lifecycle-coverage on May 20, 2026

Conversation

@jyaunches jyaunches commented May 20, 2026

Summary

Migrates sandbox lifecycle E2E coverage into the scenario validation suite so lifecycle, operations, and snapshot behavior are covered by reusable plan-driven assertions. This also expands parity metadata and framework tests so migrated lifecycle coverage stays visible and enforced.

Related Issue

Fixes #3813

Changes

  • Added reusable sandbox lifecycle helper assertions under test/e2e/validation_suites/lib/sandbox_lifecycle.sh.
  • Added sandbox lifecycle, operations, and snapshot validation suite scripts.
  • Registered lifecycle-related suites in test/e2e/validation_suites/suites.yaml.
  • Expanded test/e2e/docs/parity-map.yaml with migrated lifecycle coverage metadata.
  • Updated scenario framework tests for parity-map strictness, coverage reporting, and helper behavior.
  • Updated E2E docs to reference the lifecycle migration coverage.

Type of Change

  • Code change (feature, bug fix, or refactor)
  • Code change with doc updates
  • Doc only (prose changes, no code sample modifications)
  • Doc only (includes code sample changes)

Verification

  • npx prek run --all-files passes
  • npm test passes
  • Tests added or updated for new or changed behavior
  • No secrets, API keys, or credentials committed
  • Docs updated for user-facing behavior changes
  • make docs builds without warnings (doc changes only)
  • Doc pages follow the style guide (doc changes only)
  • New doc pages include SPDX header and frontmatter (new pages only)

Additional targeted validation run during implementation:

  • npx vitest run test/e2e/scenario-framework-tests/e2e-context-helper.test.ts test/e2e/scenario-framework-tests/e2e-convention-lint.test.ts test/e2e/scenario-framework-tests/e2e-coverage-report.test.ts test/e2e/scenario-framework-tests/e2e-expected-failure.test.ts test/e2e/scenario-framework-tests/e2e-expected-state-validator.test.ts test/e2e/scenario-framework-tests/e2e-lib-helpers.test.ts test/e2e/scenario-framework-tests/e2e-parity-map.test.ts passed: 7 files, 66 tests.
  • test/e2e/runtime/run-scenario.sh ubuntu-repo-cloud-openclaw --plan-only passed.
  • npx tsx scripts/e2e/check-parity-map.ts --strict passed.

Signed-off-by: Julie Yaunches <jyaunches@nvidia.com>

Summary by CodeRabbit

  • Tests
    • Added comprehensive sandbox lifecycle, operations, and snapshot end-to-end scripts, a reusable validation harness and helpers, parity-map updates for validation mappings/metadata, and a coverage-report test.
  • Documentation
    • Clarified README instructions for PASS/FAIL log-line formatting.
  • Chores
    • Fixed CI action artifact naming to ensure the lint tool downloads correctly.

@jyaunches jyaunches self-assigned this May 20, 2026

github-actions Bot commented May 20, 2026

PR Review Advisor

Recommendation: info only
Confidence: low
Analyzed HEAD: d7a0c7f97c620c3798c4b8f8b114e4b0d1f757a9
Findings: 0 blocker(s), 1 warning(s), 0 suggestion(s)

This is an automated advisory review. A human maintainer must make the final merge decision.

Limitations: Advisor execution failed: Could not configure advisor model openai/openai/gpt-5.5

Workflow run

Full advisor summary

PR Review Advisor

Base: origin/main
Head: HEAD
Analyzed SHA: d7a0c7f97c620c3798c4b8f8b114e4b0d1f757a9
Recommendation: info only
Confidence: low

PR review advisor failed: Could not configure advisor model openai/openai/gpt-5.5

Gate status

  • CI: pending — 7 status context(s) appear pending.
  • Mergeability: fail — mergeStateStatus=BLOCKED
  • Review threads: pass — 12 review thread(s), all resolved.
  • Risky code tested: pass — No risky code areas detected by path heuristics.

🔴 Blockers

  • None.

🟡 Warnings

  • PR review advisor unavailable: The automated advisor could not complete: Could not configure advisor model openai/openai/gpt-5.5
    • Recommendation: Re-run the PR Review Advisor or perform a manual review.
    • Evidence: Could not configure advisor model openai/openai/gpt-5.5

🔵 Suggestions

  • None.

Acceptance coverage

  • No linked acceptance clauses were analyzed.

Security review

  • warning — Secrets and Credentials: Advisor unavailable; human review required.
  • warning — Input Validation and Data Sanitization: Advisor unavailable; human review required.
  • warning — Authentication and Authorization: Advisor unavailable; human review required.
  • warning — Dependencies and Third-Party Libraries: Advisor unavailable; human review required.
  • warning — Error Handling and Logging: Advisor unavailable; human review required.
  • warning — Cryptography and Data Protection: Advisor unavailable; human review required.
  • warning — Configuration and Security Headers: Advisor unavailable; human review required.
  • warning — Security Testing: Advisor unavailable; human review required.
  • warning — Holistic Security Posture: Advisor unavailable; human review required.

Test / E2E status

  • Test depth: e2e_required — Runtime/sandbox/infrastructure paths need real execution coverage: .github/actions/basic-checks/action.yaml.
  • E2E Advisor: not_found (not found)

✅ What looks good

  • No positives were identified by the advisor.

Review completeness

  • Advisor execution failed: Could not configure advisor model openai/openai/gpt-5.5
  • Human maintainer review required: yes

Comment thread test/e2e/validation_suites/lib/sandbox_lifecycle.sh Fixed
Comment thread test/e2e/validation_suites/lib/sandbox_lifecycle.sh Fixed

github-actions Bot commented May 20, 2026

E2E Advisor Recommendation

Required E2E: scenario:ubuntu-repo-cloud-openclaw:suites=sandbox-lifecycle,sandbox-operations, scenario:ubuntu-repo-cloud-openclaw:suites=snapshot-lifecycle, parity-compare:bucket=lifecycle
Optional E2E: scenario:ubuntu-repo-cloud-openclaw:full-default-suites, branch-validation:full

Dispatch hint: Run workflow_dispatch twice for scenario=ubuntu-repo-cloud-openclaw: first with suite_filter=sandbox-lifecycle,sandbox-operations, then with suite_filter=snapshot-lifecycle.

Workflow run

Full advisor summary

E2E Recommendation Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required E2E

  • scenario:ubuntu-repo-cloud-openclaw:suites=sandbox-lifecycle,sandbox-operations (medium; live Ubuntu sandbox with NVIDIA_API_KEY): The PR changes live sandbox lifecycle and operations suites and their shared helper. Run the migrated scenario with the new suite filter to validate gateway health/recovery, sandbox listing/status, logs, and openshell exec against a real sandbox.
  • scenario:ubuntu-repo-cloud-openclaw:suites=snapshot-lifecycle (medium; live Ubuntu sandbox with destructive snapshot restore): The PR adds destructive snapshot create/list/restore validation and wires the opt-in snapshot-lifecycle suite. This should run separately because restore mutates sandbox state.
  • parity-compare:bucket=lifecycle (medium; parity workflow with live scenario/legacy comparison when scenario and legacy_script are provided): The parity map remaps lifecycle, sandbox operations, survival, and snapshot legacy assertions to the new validation IDs. Run the existing parity workflow for the lifecycle bucket to catch mapped assertion divergence and strict map issues.

Optional E2E

  • scenario:ubuntu-repo-cloud-openclaw:full-default-suites (medium; live Ubuntu sandbox with NVIDIA_API_KEY): Useful broader confidence that the added suites and parity metadata did not regress the baseline smoke, inference, credentials, or baseline-onboarding flow for the canonical Ubuntu OpenClaw scenario.
  • branch-validation:full (medium; Brev CPU instance plus NVIDIA_API_KEY): Provides clean-machine install/onboard/sandbox validation on Brev. Optional because this PR primarily changes migrated E2E validation assets rather than production installer or runtime code.

New E2E recommendations

  • sandbox-lifecycle (high): The new gateway recovery helper only probes health and exec; parity-map entries still defer crash-loop respawn, guard-chain retention, missing proxy-env warning, and soak assertions. Consider adding a dedicated migrated lifecycle recovery suite for those behaviors.
    • Suggested test: Add a scenario validation suite that deliberately restarts/kills the gateway process, verifies PID change, guard/preload chain retention, warning behavior for missing proxy-env, and repeated inference health during a bounded soak.
  • sandbox-snapshot-security (high): Snapshot parity entries for credential leak checks were explicitly deferred because the new snapshot lifecycle suite covers marker rollback but not credential sanitization in snapshot/backup directories.
    • Suggested test: Add a snapshot security validation step that creates/restores snapshots and scans snapshot/backup directories for NVIDIA_API_KEY, provider tokens, auth profiles, and other raw credentials.
  • sandbox-operations (medium): New operations coverage validates list/status/logs/exec for one sandbox, while parity-map entries for multi-sandbox metadata, registry rebuild, process recovery, destroy cleanup, and A/B isolation remain deferred.
    • Suggested test: Add an opt-in multi-sandbox operations suite covering two sandboxes, registry rebuild, destroy cleanup, metadata presence, process recovery, and cross-sandbox isolation checks.
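
The snapshot security suggestion above (scanning snapshot and backup directories for raw credentials) could look roughly like the sketch below. Everything in it is an assumption made for illustration: the function name `scan_for_credentials`, the directory layout, the `DEMO_PROVIDER_TOKEN` variable, and the idea that a leaked credential appears as a `NAME=value` line. It is not the suite's actual check.

```shell
#!/usr/bin/env bash
# Sketch only: the directory layout, patterns, and function name are assumptions.
set -u

# Print files under a directory whose contents look like raw credentials.
# Returns 0 when nothing is found, 1 when at least one file matches.
scan_for_credentials() {
  local dir="$1" matches
  matches="$(grep -rlE '(NVIDIA_API_KEY|DEMO_PROVIDER_TOKEN)=.+' "$dir" 2>/dev/null || true)"
  if [ -n "$matches" ]; then
    printf '%s\n' "$matches"
    return 1
  fi
  return 0
}

# Throwaway fixture: one clean directory and one with a fake leaked key.
snap_dir="$(mktemp -d)"
mkdir -p "$snap_dir/clean" "$snap_dir/leaky"
printf 'marker=1\n' >"$snap_dir/clean/state.txt"
printf 'NVIDIA_API_KEY=example-not-a-real-key\n' >"$snap_dir/leaky/env.bak"

clean_rc=0
scan_for_credentials "$snap_dir/clean" >/dev/null || clean_rc=$?
leaky_rc=0
leaky_files="$(scan_for_credentials "$snap_dir/leaky")" || leaky_rc=$?
leaky_name="$(basename "$leaky_files")"

echo "clean=$clean_rc leaky=$leaky_rc file=$leaky_name"
rm -rf "$snap_dir"
```

A real check would need the actual snapshot and backup paths and a pattern list agreed with the maintainers; a plain `grep` like this would miss encoded or structured secrets.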

Dispatch hint

  • Workflow: .github/workflows/e2e-scenarios.yaml
  • jobs input: Run workflow_dispatch twice for scenario=ubuntu-repo-cloud-openclaw: first with suite_filter=sandbox-lifecycle,sandbox-operations, then with suite_filter=snapshot-lifecycle.


coderabbitai Bot commented May 20, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

Migrates sandbox lifecycle E2E coverage into the scenario framework by adding a shared assertion library, per-check validation scripts, suite wiring with explicit requires_state, parity-map migrations to validation.* IDs, and tests validating helpers and coverage reporting.
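
The shape described in the walkthrough (a per-check script that sources a shared library, loads context, and calls one assertion helper) might look roughly like this. It is a minimal sketch, not the PR's code: the `demo_*` function names, the `SANDBOX_NAME` key inside `context.env`, and the stubbed list command are all hypothetical, and the library functions are inlined so the example runs on its own.

```shell
#!/usr/bin/env bash
# Sketch only: all names below are hypothetical stand-ins, not the PR's library.
set -u

# Load KEY=VALUE pairs from a context file (assumed format) into the environment.
demo_load_context() {
  local ctx_file="${1:?context file required}"
  [ -f "$ctx_file" ] || { echo "FAIL: context file missing: $ctx_file" >&2; return 1; }
  set -a
  # shellcheck disable=SC1090
  . "$ctx_file"
  set +a
}

demo_pass() { echo "PASS: $*"; }
demo_fail() { echo "FAIL: $*" >&2; }

# Stub standing in for a real "list sandboxes" CLI call.
demo_list_sandboxes() { printf '%s\n' "alpha" "demo-sandbox"; }

# Assert that the sandbox named in the context appears in the listing.
demo_assert_listed() {
  local listing
  listing="$(demo_list_sandboxes)"
  if grep -Fxq -- "$SANDBOX_NAME" <<<"$listing"; then
    demo_pass "sandbox listed: $SANDBOX_NAME"
  else
    demo_fail "sandbox not listed: $SANDBOX_NAME"
    return 1
  fi
}

# Per-check script body: write a throwaway context, load it, run one assertion.
ctx_dir="$(mktemp -d)"
printf 'SANDBOX_NAME=demo-sandbox\n' >"$ctx_dir/context.env"
demo_load_context "$ctx_dir/context.env"
check_output="$(demo_assert_listed)"
echo "$check_output"
rm -rf "$ctx_dir"
```

Keeping each script down to a single assertion call lets a suite definition refer to individual steps by file path.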

Changes

Sandbox Lifecycle E2E Coverage Migration

| Layer | File(s) | Summary |
|---|---|---|
| Parity-map metadata migration | test/e2e/docs/README.md, test/e2e/docs/parity-map.yaml | Legacy parity entries for crash-loop recovery, sandbox operations, sandbox survival, and snapshot commands were migrated to validation.* IDs with layer: validation and gap_domain: sandbox-lifecycle; some legacy prerequisites were reclassified as deferred. README formatting for assertion logging was adjusted. |
| Sandbox lifecycle assertion library | test/e2e/validation_suites/lib/sandbox_lifecycle.sh | Adds context loading from context.env, SANDBOX_LIFECYCLE_LAST_OUTPUT, sandbox_lifecycle_pass/fail, sandbox_lifecycle_run_with_timeout (dry-run aware), and assertion helpers for nemoclaw list/status/logs, openshell exec, gateway health/recovery, and snapshot create/list/restore. |
| Lifecycle validation scripts | test/e2e/validation_suites/sandbox/lifecycle/00-gateway-health.sh, test/e2e/validation_suites/sandbox/lifecycle/01-gateway-recovery.sh, test/e2e/validation_suites/sandbox/operations/00-list-and-status.sh, test/e2e/validation_suites/sandbox/operations/01-logs-and-exec.sh, test/e2e/validation_suites/sandbox/snapshot/00-create-list-restore.sh | Per-check executable scripts that source the lifecycle library, load context, and invoke specific assertion helpers to validate sandbox and gateway behavior. |
| Suite orchestration and wiring | test/e2e/validation_suites/suites.yaml | Defines explicit sandbox-lifecycle, sandbox-operations, snapshot, and snapshot-lifecycle suites with gateway/sandbox health requires_state conditions and step references to the new validation scripts. |
| Framework test validation | test/e2e/scenario-framework-tests/e2e-coverage-report.test.ts, test/e2e/scenario-framework-tests/e2e-lib-helpers.test.ts | Adds coverage-report test asserting lifecycle scope appears in the rendered report and helper tests validating context loading, PASS/FAIL emission, timeout enforcement, and mocked external-CLI assertion flows with expected validation markers. |
| CI action: hadolint asset fix | .github/actions/basic-checks/action.yaml | Fix asset filename casing for hadolint download URL in the composite action. |
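
The library summary above mentions a dry-run aware `sandbox_lifecycle_run_with_timeout`. As a rough illustration of how such a wrapper can behave, here is a sketch under stated assumptions: `demo_run_with_timeout`, `DEMO_DRY_RUN`, and `DEMO_LAST_OUTPUT` are hypothetical names, and it relies on the GNU coreutils `timeout` command, which exits with status 124 when the limit is hit. The real helper may differ.

```shell
#!/usr/bin/env bash
# Sketch only: demo_run_with_timeout and DEMO_DRY_RUN are hypothetical names.
set -u

DEMO_LAST_OUTPUT=""

# Run "$@" under a time limit, capturing combined stdout and stderr.
# In dry-run mode, record the command instead of executing it.
demo_run_with_timeout() {
  local seconds="$1"
  shift
  if [ "${DEMO_DRY_RUN:-0}" = "1" ]; then
    DEMO_LAST_OUTPUT="[dry-run] $*"
    return 0
  fi
  local rc=0
  DEMO_LAST_OUTPUT="$(timeout "$seconds" "$@" 2>&1)" || rc=$?
  return "$rc"
}

# A fast command finishes normally and its output is captured.
fast_rc=0
demo_run_with_timeout 5 echo "hello" || fast_rc=$?
fast_out="$DEMO_LAST_OUTPUT"

# A slow command is stopped; GNU timeout reports 124 in that case.
slow_rc=0
demo_run_with_timeout 1 sleep 5 || slow_rc=$?

# Dry-run mode records the command without running it.
dry_rc=0
DEMO_DRY_RUN=1 demo_run_with_timeout 1 sleep 5 || dry_rc=$?
dry_out="$DEMO_LAST_OUTPUT"

echo "fast=$fast_rc slow=$slow_rc dry=$dry_rc"
```

Because `timeout` launches an external program, a wrapper shaped like this cannot run shell functions directly. A test of it can assert status 124 specifically rather than any non-zero status.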

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~40 minutes

Possibly related PRs

  • NVIDIA/NemoClaw#3800: Both PRs touch the E2E parity/coverage-reporting layer by updating test/e2e/scenario-framework-tests/e2e-coverage-report.test.ts to assert parity-related report sections (and accompanying parity-map/README guidance).

Suggested labels

Sandbox, v0.0.46

Suggested reviewers

  • cv
  • cjagwani

Poem

🐰 I hopped through scripts and parity lines,
I sourced context.env and checked the signs;
PASS on stdout, FAIL on the side,
Gateway probes and snapshots tried,
A little rabbit cheers the tests that shine.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'test(e2e): migrate sandbox lifecycle coverage' directly and concisely describes the main change: migrating sandbox lifecycle E2E coverage into the scenario framework. |
| Linked Issues check | ✅ Passed | The PR fully addresses issue #3813 requirements: added sandbox_lifecycle.sh library with reusable helpers [#3813], migrated legacy assertions to validation suite scripts [#3813], registered suites in suites.yaml [#3813], updated parity-map.yaml with stable IDs and metadata [#3813], and added scenario framework tests [#3813]. |
| Out of Scope Changes check | ✅ Passed | All changes are in-scope: sandbox lifecycle library and test suites address #3813, parity-map updates document the migration, scenario framework tests validate new helpers, and the hadolint URL fix is a supporting maintenance change. |

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
test/e2e/validation_suites/suites.yaml (1)

1-1: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add SPDX license header to this YAML source file.

This file is missing the required SPDX copyright and license header.

Proposed fix
+# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
 suites:

As per coding guidelines, **/*.{js,ts,tsx,jsx,sh,yaml,yml,json,md,mdx}: Every source file must include an SPDX license header for copyright and Apache-2.0 license.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e/validation_suites/suites.yaml` at line 1, This YAML is missing the
required SPDX license header; add the standard SPDX header lines at the very top
of this file (above the existing top-level key "suites:") including the SPDX
copyright text entry and the SPDX-License-Identifier set to Apache-2.0 so the
file complies with the project's licensing guideline.
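
A check for the header described in this finding could be sketched as follows. This is an illustration only: `has_spdx_header` is a hypothetical helper rather than a script from the repository, and looking only at the first five lines is an assumption made here.

```shell
#!/usr/bin/env bash
# Sketch only: has_spdx_header is a hypothetical helper, not a repo script.
set -u

# Succeed when both SPDX lines appear within the first five lines of a file.
has_spdx_header() {
  local file="$1" head_lines
  head_lines="$(head -n 5 "$file")"
  grep -q 'SPDX-FileCopyrightText:' <<<"$head_lines" &&
    grep -q 'SPDX-License-Identifier: Apache-2.0' <<<"$head_lines"
}

# Throwaway fixtures: one file with the header, one without.
work_dir="$(mktemp -d)"
printf '%s\n' \
  '# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.' \
  '# SPDX-License-Identifier: Apache-2.0' \
  'suites:' >"$work_dir/with-header.yaml"
printf 'suites:\n' >"$work_dir/without-header.yaml"

with_rc=0
has_spdx_header "$work_dir/with-header.yaml" || with_rc=$?
without_rc=0
has_spdx_header "$work_dir/without-header.yaml" || without_rc=$?

echo "with=$with_rc without=$without_rc"
rm -rf "$work_dir"
```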
♻️ Duplicate comments (1)
test/e2e/validation_suites/lib/sandbox_lifecycle.sh (1)

52-52: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Quote "$@" in the fallback command substitution.

At Line 52, unquoted $@ can re-split arguments and change command behavior on the non-timeout path.

Proposed fix
-    SANDBOX_LIFECYCLE_LAST_OUTPUT="$($@ 2>&1)" || {
+    SANDBOX_LIFECYCLE_LAST_OUTPUT="$("$@" 2>&1)" || {
#!/bin/bash
shellcheck -s bash test/e2e/validation_suites/lib/sandbox_lifecycle.sh
rg -n '\$\(\$@' test/e2e/validation_suites/lib/sandbox_lifecycle.sh

As per coding guidelines, **/*.sh: Shell scripts must be enforced by ShellCheck (.shellcheckrc) and formatted with shfmt.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e/validation_suites/lib/sandbox_lifecycle.sh` at line 52, The
assignment to SANDBOX_LIFECYCLE_LAST_OUTPUT uses an unquoted $@ in the fallback
command substitution which can re-split arguments and change behavior; modify
the command substitution to use quoted "$@" instead so the exact arguments are
preserved (update the line that sets SANDBOX_LIFECYCLE_LAST_OUTPUT="$($@ 2>&1)"
to use "$@" within the substitution), ensuring the non-timeout fallback path
receives the same arguments as the timeout path.
🧹 Nitpick comments (2)
test/e2e/scenario-framework-tests/e2e-lib-helpers.test.ts (1)

470-470: ⚡ Quick win

Use the repo’s PATH fallback pattern in mocked command env.

Line 470 should use PATH: `${bin}:${process.env.PATH || ""}` to avoid brittle concatenation when `PATH` is unset in isolated test environments.

Suggested patch
-      const r = runBash(`set -euo pipefail; . "${VALIDATION_SUITES}/lib/sandbox_lifecycle.sh"; sandbox_lifecycle_load_context; sandbox_lifecycle_assert_nemoclaw_list_contains_sandbox; sandbox_lifecycle_assert_status_fields_present; sandbox_lifecycle_assert_logs_available; sandbox_lifecycle_assert_openshell_exec_ok`, { E2E_CONTEXT_DIR: tmp, PATH: `${bin}:${process.env.PATH}` });
+      const r = runBash(`set -euo pipefail; . "${VALIDATION_SUITES}/lib/sandbox_lifecycle.sh"; sandbox_lifecycle_load_context; sandbox_lifecycle_assert_nemoclaw_list_contains_sandbox; sandbox_lifecycle_assert_status_fields_present; sandbox_lifecycle_assert_logs_available; sandbox_lifecycle_assert_openshell_exec_ok`, { E2E_CONTEXT_DIR: tmp, PATH: `${bin}:${process.env.PATH || ""}` });

Based on learnings: In this repo’s tests, prefer PATH: `${fakeBin}:${process.env.PATH || ""}` with the POSIX `:` separator.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e/scenario-framework-tests/e2e-lib-helpers.test.ts` at line 470,
Update the runBash invocation so the mocked command environment uses the repo's
PATH fallback pattern: when calling runBash (the call that sources
"${VALIDATION_SUITES}/lib/sandbox_lifecycle.sh" and runs sandbox_lifecycle_*
assertions) set the PATH env to use the fallback `${bin}:${process.env.PATH ||
""}` (i.e. include the empty-string fallback) instead of
`${bin}:${process.env.PATH}` to avoid failures when PATH is unset in isolated
test environments.
test/e2e/scenario-framework-tests/e2e-parity-map.test.ts (1)

92-92: ⚡ Quick win

Hard-coded retirement date makes this test unnecessarily brittle.

Line 92 enforces approved_at: 2026-05-20 exactly. Any valid future metadata update will fail this test even when classification is correct. Prefer checking presence and date format instead of a fixed date literal.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e/scenario-framework-tests/e2e-parity-map.test.ts` at line 92, The
test currently asserts a hard-coded approval date using
expect(entry).toMatch(...), which is brittle; update the assertion in the
e2e-parity-map.test to check presence and valid date format instead of the fixed
literal: replace the exact-date regex with one that matches an ISO date (e.g.,
YYYY-MM-DD) allowing optional quotes and whitespace, or alternatively parse the
captured value with Date.parse to assert it's a valid date; keep using the same
expect(entry).toMatch / expect(...) pattern so the change is localized to the
assertion for entry.
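
The idea of checking presence and format rather than a fixed date can be illustrated in shell. This is an analogue for illustration only, since the real assertion lives in a TypeScript test; `has_valid_approved_at` and the sample entries are made up here. It checks the `YYYY-MM-DD` shape, not calendar validity.

```shell
#!/usr/bin/env bash
# Sketch only: a shell analogue of the suggested assertion; the real test is TypeScript.
set -u

# Succeed when the entry has an approved_at field holding a YYYY-MM-DD date,
# optionally quoted, without pinning a specific day.
has_valid_approved_at() {
  local entry="$1"
  local pattern="approved_at:[[:space:]]*[\"']?[0-9]{4}-[0-9]{2}-[0-9]{2}[\"']?[[:space:]]*\$"
  grep -Eq "$pattern" <<<"$entry"
}

# Made-up sample entries for the demonstration.
entry_pinned=$'legacy: example\napproved_at: 2026-05-20'
entry_quoted=$'legacy: example\napproved_at: "2027-01-03"'
entry_bad=$'legacy: example\napproved_at: soon'

pinned_rc=0
has_valid_approved_at "$entry_pinned" || pinned_rc=$?
quoted_rc=0
has_valid_approved_at "$entry_quoted" || quoted_rc=$?
bad_rc=0
has_valid_approved_at "$entry_bad" || bad_rc=$?

echo "pinned=$pinned_rc quoted=$quoted_rc bad=$bad_rc"
```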
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/e2e/docs/parity-map.yaml`:
- Around line 9782-9787: The mapping uses the existing id
"validation.sandbox_operations.openshell_exec_ok" which collides with the
exec/chat coverage; replace that id with a restart-specific identifier (for
example "validation.sandbox_operations.restart_pod_ready" or
"validation.sandbox_operations.restart_pod_not_ready" depending on whether the
status is OK or representing a gap) so the entry "Sandbox pod did not reach
Running/Ready after restart" is tracked separately from the exec path; update
the id value on the YAML block containing status: mapped, layer: validation,
gap_domain: sandbox-lifecycle, owner: e2e-maintainers to the chosen
restart-specific id.
- Around line 10648-10661: The two parity-map entries mapping the legacy
messages "No credentials in snapshot directories" and "Credentials found:
$CRED_LEAKS" to id validation.sandbox_snapshot.create_succeeds are incorrect;
remove or change those mappings so credential-leak assertions are not marked as
covered by create_succeeds—either set their status to deferred (leave them
unmapped) or remap them to a dedicated snapshot leak/no-credentials assertion
(e.g., a future validation.sandbox_snapshot.no_credentials) once that test
exists; update the entries that reference the legacy strings and the id
validation.sandbox_snapshot.create_succeeds accordingly so leak checks remain
separate from create/list/restore coverage.
- Around line 4640-4646: The YAML remaps are incorrectly mapping prerequisite
checks (e.g., the legacy note "nemoclaw on PATH", "Docker is running", "NemoClaw
installed") to behavior IDs like validation.sandbox_lifecycle.gateway_health and
marker_written; instead, change those entries so their status is "deferred" or
replace them with dedicated preflight IDs (create new IDs such as
preflight.sandbox.nemoclaw_present or similar) rather than reusing behavior IDs;
update the specific entries that reference
validation.sandbox_lifecycle.gateway_health and marker_written (and the similar
blocks around the other occurrences noted) to use "deferred" or the new
preflight IDs and ensure owner/reusable metadata remains consistent.

In `@test/e2e/scenario-framework-tests/e2e-coverage-report.test.ts`:
- Around line 121-127: The test named
"test_should_report_scoped_lifecycle_parity_at_or_above_100_percent" only checks
for section presence (using loadMetadataFromDir and renderCoverageReport into
md) so it doesn't enforce the 100% threshold; update the test to parse the
lifecycle parity percentage out of md (e.g., with a regex against md) and assert
the numeric value is >= 100, referencing the existing md variable and the
renderCoverageReport output (or alternatively rename the test to reflect
"section presence" if you prefer not to assert the numeric threshold).

In `@test/e2e/scenario-framework-tests/e2e-lib-helpers.test.ts`:
- Around line 458-461: The test currently only asserts runBash(...) returns a
non-zero status, which allows unrelated failures to pass; update the
"test_should_apply_timeout_to_command_execution" case to assert timeout-specific
semantics from the runBash result: call runBash with `.
"${VALIDATION_SUITES}/lib/sandbox_lifecycle.sh";
sandbox_lifecycle_run_with_timeout 1 bash -c 'sleep 5'` as before, then assert
either the canonical timeout exit code (e.g., r.status === 124) or that
r.stdout/r.stderr contains a timeout marker (e.g., matches /timeout|timed out/)
so the test verifies sandbox_lifecycle_run_with_timeout actually timed out
rather than failing for another reason.

In `@test/e2e/validation_suites/lib/sandbox_lifecycle.sh`:
- Around line 112-120: In
sandbox_lifecycle_assert_snapshot_create_list_restore_marker, the "marker
written" and "marker rolled back" assertions are meaningless because no marker
is written or checked; fix by explicitly creating a marker in the sandbox before
taking the snapshot and verifying its state after restore: use the sandbox
manipulation commands already used in this script (the same nemoclaw/sandbox
invocation pattern) to write a sentinel (e.g., create a file or set a flag) in
the sandbox prior to calling "nemoclaw snapshot create" and assert its presence
with sandbox_lifecycle_pass; after "nemoclaw snapshot restore ... latest" verify
the marker has been removed or reverted as expected and call
sandbox_lifecycle_pass or sandbox_lifecycle_fail accordingly so the existing
messages ("marker written" and "marker rolled back") reflect real checks.
- Around line 63-66: Several assertion functions call sandbox_lifecycle_fail via
|| but then continue execution, causing both FAIL and PASS to be reported;
update each affected function
(sandbox_lifecycle_assert_nemoclaw_list_contains_sandbox,
sandbox_lifecycle_assert_status_fields_present,
sandbox_lifecycle_assert_logs_available,
sandbox_lifecycle_assert_openshell_exec_ok,
sandbox_lifecycle_assert_gateway_health,
sandbox_lifecycle_assert_snapshot_create_list_restore_marker) to immediately
exit after calling sandbox_lifecycle_fail by adding an explicit "return 1" (or
equivalent early return) right after each sandbox_lifecycle_fail invocation so
the function stops and does not proceed to sandbox_lifecycle_pass. Ensure you
add the return in every branch where sandbox_lifecycle_fail is used.
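
The fall-through problem in the last finding, and the early-return fix, can be reproduced with a small sketch. The `demo_*` functions are hypothetical stand-ins for the library helpers, and both messages go to stdout here only so the demonstration can capture them.

```shell
#!/usr/bin/env bash
# Sketch only: demo_* names are hypothetical stand-ins for the library helpers.
set -u

demo_pass() { echo "PASS: $*"; }
demo_fail() { echo "FAIL: $*"; }

# Buggy shape: the failure is reported, but execution falls through to PASS.
demo_assert_buggy() {
  "$@" >/dev/null 2>&1 || demo_fail "command failed: $*"
  demo_pass "command ok: $*"
}

# Fixed shape: report the failure and return immediately.
demo_assert_fixed() {
  "$@" >/dev/null 2>&1 || {
    demo_fail "command failed: $*"
    return 1
  }
  demo_pass "command ok: $*"
}

buggy_out="$(demo_assert_buggy false)"
fixed_rc=0
fixed_out="$(demo_assert_fixed false)" || fixed_rc=$?

printf 'buggy:\n%s\nfixed:\n%s\n' "$buggy_out" "$fixed_out"
```

The buggy variant emits both a FAIL and a PASS line for the same check, while the fixed variant emits only FAIL and returns a non-zero status that the caller can act on.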

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 708860d0-25b2-4c54-b107-1a6711143381

📥 Commits

Reviewing files that changed from the base of the PR and between ca045a9 and f621825.

📒 Files selected for processing (12)
  • test/e2e/docs/README.md
  • test/e2e/docs/parity-map.yaml
  • test/e2e/scenario-framework-tests/e2e-coverage-report.test.ts
  • test/e2e/scenario-framework-tests/e2e-lib-helpers.test.ts
  • test/e2e/scenario-framework-tests/e2e-parity-map.test.ts
  • test/e2e/validation_suites/lib/sandbox_lifecycle.sh
  • test/e2e/validation_suites/sandbox/lifecycle/00-gateway-health.sh
  • test/e2e/validation_suites/sandbox/lifecycle/01-gateway-recovery.sh
  • test/e2e/validation_suites/sandbox/operations/00-list-and-status.sh
  • test/e2e/validation_suites/sandbox/operations/01-logs-and-exec.sh
  • test/e2e/validation_suites/sandbox/snapshot/00-create-list-restore.sh
  • test/e2e/validation_suites/suites.yaml

Comment thread test/e2e/docs/parity-map.yaml Outdated
Comment thread test/e2e/docs/parity-map.yaml Outdated
Comment thread test/e2e/docs/parity-map.yaml Outdated
Comment thread test/e2e/scenario-framework-tests/e2e-coverage-report.test.ts Outdated
Comment thread test/e2e/scenario-framework-tests/e2e-lib-helpers.test.ts
Comment thread test/e2e/validation_suites/lib/sandbox_lifecycle.sh Outdated
Comment thread test/e2e/validation_suites/lib/sandbox_lifecycle.sh
@wscurran wscurran added the E2E, enhancement: testing, and fix labels May 20, 2026
@jyaunches jyaunches added the v0.0.47 Release target label May 20, 2026
Comment thread test/e2e/validation_suites/lib/sandbox_lifecycle.sh Fixed
@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/e2e/docs/parity-map.yaml`:
- Around line 10647-10660: The two deferred parity entries whose legacy messages
are "No credentials in snapshot directories" and "Credentials found:
$CRED_LEAKS" currently put the coverage explanation into runner_requirement;
move the explanatory text back into the reason field (keeping "snapshot
credential-leak coverage is not asserted by
validation.sandbox_snapshot.create_succeeds") and replace runner_requirement
with the actual execution environment used elsewhere in this file (i.e., set
runner_requirement to the canonical runner name used for sandbox snapshot checks
instead of "dedicated snapshot credential leak assertion").
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: e299b67d-c111-476c-a122-0d6b01cdd1ee

📥 Commits

Reviewing files that changed from the base of the PR and between 53f7222 and 973fd12.

📒 Files selected for processing (5)
  • .github/actions/basic-checks/action.yaml
  • test/e2e/docs/parity-map.yaml
  • test/e2e/scenario-framework-tests/e2e-coverage-report.test.ts
  • test/e2e/scenario-framework-tests/e2e-lib-helpers.test.ts
  • test/e2e/validation_suites/lib/sandbox_lifecycle.sh

Comment thread test/e2e/docs/parity-map.yaml Outdated
Comment thread test/e2e/validation_suites/lib/sandbox_lifecycle.sh Fixed
@jyaunches jyaunches merged commit e122450 into main May 20, 2026
26 checks passed

Labels

  • E2E: End-to-end testing (Brev infrastructure, test cases, nightly failures, and coverage gaps)
  • enhancement: testing: Use this label to identify requests to improve NemoClaw test coverage.
  • fix
  • v0.0.47: Release target

Projects

None yet

Development

Successfully merging this pull request may close these issues.

test(e2e): migrate sandbox lifecycle coverage to scenario suites

4 participants