test(e2e): migrate sandbox lifecycle coverage by jyaunches · Pull Request #3902 · NVIDIA/NemoClaw

jyaunches · 2026-05-20T12:39:31Z

Summary

Migrates sandbox lifecycle E2E coverage into the scenario validation suite so lifecycle, operations, and snapshot behavior are covered by reusable plan-driven assertions. This also expands parity metadata and framework tests so migrated lifecycle coverage stays visible and enforced.

Related Issue

Fixes #3813

Changes

- Added reusable sandbox lifecycle helper assertions under `test/e2e/validation_suites/lib/sandbox_lifecycle.sh`.
- Added sandbox lifecycle, operations, and snapshot validation suite scripts.
- Registered lifecycle-related suites in `test/e2e/validation_suites/suites.yaml`.
- Expanded `test/e2e/docs/parity-map.yaml` with migrated lifecycle coverage metadata.
- Updated scenario framework tests for parity-map strictness, coverage reporting, and helper behavior.
- Updated E2E docs to reference the lifecycle migration coverage.

Type of Change

Code change (feature, bug fix, or refactor)
Code change with doc updates
Doc only (prose changes, no code sample modifications)
Doc only (includes code sample changes)

Verification

npx prek run --all-files passes
npm test passes
Tests added or updated for new or changed behavior
No secrets, API keys, or credentials committed
Docs updated for user-facing behavior changes
make docs builds without warnings (doc changes only)
Doc pages follow the style guide (doc changes only)
New doc pages include SPDX header and frontmatter (new pages only)

Additional targeted validation run during implementation:

- `npx vitest run test/e2e/scenario-framework-tests/e2e-context-helper.test.ts test/e2e/scenario-framework-tests/e2e-convention-lint.test.ts test/e2e/scenario-framework-tests/e2e-coverage-report.test.ts test/e2e/scenario-framework-tests/e2e-expected-failure.test.ts test/e2e/scenario-framework-tests/e2e-expected-state-validator.test.ts test/e2e/scenario-framework-tests/e2e-lib-helpers.test.ts test/e2e/scenario-framework-tests/e2e-parity-map.test.ts` passed: 7 files, 66 tests.
- `test/e2e/runtime/run-scenario.sh ubuntu-repo-cloud-openclaw --plan-only` passed.
- `npx tsx scripts/e2e/check-parity-map.ts --strict` passed.

Signed-off-by: Julie Yaunches <jyaunches@nvidia.com>

Summary by CodeRabbit

Tests
- Added comprehensive sandbox lifecycle, operations, and snapshot end-to-end scripts, a reusable validation harness and helpers, parity-map updates for validation mappings/metadata, and a coverage-report test.
Documentation
- Clarified README instructions for PASS/FAIL log-line formatting.
Chores
- Fixed CI action artifact naming to ensure the lint tool downloads correctly.

github-actions · 2026-05-20T12:40:05Z

PR Review Advisor

Recommendation: info only
Confidence: low
Analyzed HEAD: d7a0c7f97c620c3798c4b8f8b114e4b0d1f757a9
Findings: 0 blocker(s), 1 warning(s), 0 suggestion(s)

This is an automated advisory review. A human maintainer must make the final merge decision.

Limitations: Advisor execution failed: Could not configure advisor model openai/openai/gpt-5.5

Workflow run

Full advisor summary

PR Review Advisor

Base: origin/main
Head: HEAD
Analyzed SHA: d7a0c7f97c620c3798c4b8f8b114e4b0d1f757a9
Recommendation: info only
Confidence: low

PR review advisor failed: Could not configure advisor model openai/openai/gpt-5.5

Gate status

CI: pending — 7 status context(s) appear pending.
Mergeability: fail — mergeStateStatus=BLOCKED
Review threads: pass — 12 review thread(s), all resolved.
Risky code tested: pass — No risky code areas detected by path heuristics.

🔴 Blockers

None.

🟡 Warnings

PR review advisor unavailable: The automated advisor could not complete: Could not configure advisor model openai/openai/gpt-5.5
- Recommendation: Re-run the PR Review Advisor or perform a manual review.
- Evidence: Could not configure advisor model openai/openai/gpt-5.5

🔵 Suggestions

None.

Acceptance coverage

No linked acceptance clauses were analyzed.

Security review

warning — Secrets and Credentials: Advisor unavailable; human review required.
warning — Input Validation and Data Sanitization: Advisor unavailable; human review required.
warning — Authentication and Authorization: Advisor unavailable; human review required.
warning — Dependencies and Third-Party Libraries: Advisor unavailable; human review required.
warning — Error Handling and Logging: Advisor unavailable; human review required.
warning — Cryptography and Data Protection: Advisor unavailable; human review required.
warning — Configuration and Security Headers: Advisor unavailable; human review required.
warning — Security Testing: Advisor unavailable; human review required.
warning — Holistic Security Posture: Advisor unavailable; human review required.

Test / E2E status

Test depth: e2e_required — Runtime/sandbox/infrastructure paths need real execution coverage: .github/actions/basic-checks/action.yaml.
E2E Advisor: not_found (not found)

✅ What looks good

No positives were identified by the advisor.

Review completeness

Advisor execution failed: Could not configure advisor model openai/openai/gpt-5.5
Human maintainer review required: yes

github-actions · 2026-05-20T12:41:41Z

E2E Advisor Recommendation

Required E2E: scenario:ubuntu-repo-cloud-openclaw:suites=sandbox-lifecycle,sandbox-operations, scenario:ubuntu-repo-cloud-openclaw:suites=snapshot-lifecycle, parity-compare:bucket=lifecycle
Optional E2E: scenario:ubuntu-repo-cloud-openclaw:full-default-suites, branch-validation:full

Dispatch hint: Run workflow_dispatch twice for scenario=ubuntu-repo-cloud-openclaw: first with suite_filter=sandbox-lifecycle,sandbox-operations, then with suite_filter=snapshot-lifecycle.

Workflow run

Full advisor summary

E2E Recommendation Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required E2E

scenario:ubuntu-repo-cloud-openclaw:suites=sandbox-lifecycle,sandbox-operations (medium; live Ubuntu sandbox with NVIDIA_API_KEY): The PR changes live sandbox lifecycle and operations suites and their shared helper. Run the migrated scenario with the new suite filter to validate gateway health/recovery, sandbox listing/status, logs, and openshell exec against a real sandbox.
scenario:ubuntu-repo-cloud-openclaw:suites=snapshot-lifecycle (medium; live Ubuntu sandbox with destructive snapshot restore): The PR adds destructive snapshot create/list/restore validation and wires the opt-in snapshot-lifecycle suite. This should run separately because restore mutates sandbox state.
parity-compare:bucket=lifecycle (medium; parity workflow with live scenario/legacy comparison when scenario and legacy_script are provided): The parity map remaps lifecycle, sandbox operations, survival, and snapshot legacy assertions to the new validation IDs. Run the existing parity workflow for the lifecycle bucket to catch mapped assertion divergence and strict map issues.

Optional E2E

scenario:ubuntu-repo-cloud-openclaw:full-default-suites (medium; live Ubuntu sandbox with NVIDIA_API_KEY): Useful broader confidence that the added suites and parity metadata did not regress the baseline smoke, inference, credentials, or baseline-onboarding flow for the canonical Ubuntu OpenClaw scenario.
branch-validation:full (medium; Brev CPU instance plus NVIDIA_API_KEY): Provides clean-machine install/onboard/sandbox validation on Brev. Optional because this PR primarily changes migrated E2E validation assets rather than production installer or runtime code.

New E2E recommendations

sandbox-lifecycle (high): The new gateway recovery helper only probes health and exec; parity-map entries still defer crash-loop respawn, guard-chain retention, missing proxy-env warning, and soak assertions. Consider adding a dedicated migrated lifecycle recovery suite for those behaviors.
- Suggested test: Add a scenario validation suite that deliberately restarts/kills the gateway process, verifies PID change, guard/preload chain retention, warning behavior for missing proxy-env, and repeated inference health during a bounded soak.
sandbox-snapshot-security (high): Snapshot parity entries for credential leak checks were explicitly deferred because the new snapshot lifecycle suite covers marker rollback but not credential sanitization in snapshot/backup directories.
- Suggested test: Add a snapshot security validation step that creates/restores snapshots and scans snapshot/backup directories for NVIDIA_API_KEY, provider tokens, auth profiles, and other raw credentials.
sandbox-operations (medium): New operations coverage validates list/status/logs/exec for one sandbox, while parity-map entries for multi-sandbox metadata, registry rebuild, process recovery, destroy cleanup, and A/B isolation remain deferred.
- Suggested test: Add an opt-in multi-sandbox operations suite covering two sandboxes, registry rebuild, destroy cleanup, metadata presence, process recovery, and cross-sandbox isolation checks.

Dispatch hint

Workflow: .github/workflows/e2e-scenarios.yaml
jobs input: Run workflow_dispatch twice for scenario=ubuntu-repo-cloud-openclaw: first with suite_filter=sandbox-lifecycle,sandbox-operations, then with suite_filter=snapshot-lifecycle.
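
The two dispatches described in the hint could be issued from the GitHub CLI roughly as follows. This is a minimal sketch that only prints the commands. It assumes the workflow exposes `workflow_dispatch` inputs literally named `scenario` and `suite_filter`, as the hint suggests; the function name `print_dispatch_commands` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints (does not execute) one `gh workflow run`
# command per suite filter named in the dispatch hint.
# Assumption: the workflow inputs are named `scenario` and `suite_filter`.
print_dispatch_commands() {
  local workflow="e2e-scenarios.yaml"
  local scenario="ubuntu-repo-cloud-openclaw"
  local filter
  for filter in "sandbox-lifecycle,sandbox-operations" "snapshot-lifecycle"; do
    echo "gh workflow run ${workflow} -f scenario=${scenario} -f suite_filter=${filter}"
  done
}

print_dispatch_commands
```

Keeping the snapshot run as a separate dispatch mirrors the advisor's reasoning that snapshot restore mutates sandbox state.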

coderabbitai · 2026-05-20T12:42:35Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.


📝 Walkthrough

Migrates sandbox lifecycle E2E coverage into the scenario framework by adding a shared assertion library, per-check validation scripts, suite wiring with explicit requires_state, parity-map migrations to validation.* IDs, and tests validating helpers and coverage reporting.

Changes

Sandbox Lifecycle E2E Coverage Migration

| Layer / File(s) | Summary |
|---|---|
| Parity-map metadata migration: `test/e2e/docs/README.md`, `test/e2e/docs/parity-map.yaml` | Legacy parity entries for crash-loop recovery, sandbox operations, sandbox survival, and snapshot commands were migrated to `validation.*` IDs with `layer: validation` and `gap_domain: sandbox-lifecycle`; some legacy prerequisites were reclassified as `deferred`. README formatting for assertion logging was adjusted. |
| Sandbox lifecycle assertion library: `test/e2e/validation_suites/lib/sandbox_lifecycle.sh` | Adds context loading from `context.env`, `SANDBOX_LIFECYCLE_LAST_OUTPUT`, `sandbox_lifecycle_pass`/`fail`, `sandbox_lifecycle_run_with_timeout` (dry-run aware), and assertion helpers for nemoclaw list/status/logs, openshell exec, gateway health/recovery, and snapshot create/list/restore. |
| Lifecycle validation scripts: `test/e2e/validation_suites/sandbox/lifecycle/00-gateway-health.sh`, `test/e2e/validation_suites/sandbox/lifecycle/01-gateway-recovery.sh`, `test/e2e/validation_suites/sandbox/operations/00-list-and-status.sh`, `test/e2e/validation_suites/sandbox/operations/01-logs-and-exec.sh`, `test/e2e/validation_suites/sandbox/snapshot/00-create-list-restore.sh` | Per-check executable scripts that source the lifecycle library, load context, and invoke specific assertion helpers to validate sandbox and gateway behavior. |
| Suite orchestration and wiring: `test/e2e/validation_suites/suites.yaml` | Defines explicit `sandbox-lifecycle`, `sandbox-operations`, `snapshot`, and `snapshot-lifecycle` suites with gateway/sandbox health `requires_state` conditions and step references to the new validation scripts. |
| Framework test validation: `test/e2e/scenario-framework-tests/e2e-coverage-report.test.ts`, `test/e2e/scenario-framework-tests/e2e-lib-helpers.test.ts` | Adds coverage-report test asserting lifecycle scope appears in the rendered report and helper tests validating context loading, PASS/FAIL emission, timeout enforcement, and mocked external-CLI assertion flows with expected validation markers. |
| CI action, hadolint asset fix: `.github/actions/basic-checks/action.yaml` | Fix asset filename casing for hadolint download URL in the composite action. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~40 minutes

Possibly related PRs

NVIDIA/NemoClaw#3800: Both PRs touch the E2E parity/coverage-reporting layer by updating test/e2e/scenario-framework-tests/e2e-coverage-report.test.ts to assert parity-related report sections (and accompanying parity-map/README guidance).

Suggested labels

Sandbox, v0.0.46

Suggested reviewers

cv
cjagwani

Poem

🐰 I hopped through scripts and parity lines,
I sourced context.env and checked the signs;
PASS on stdout, FAIL on the side,
Gateway probes and snapshots tried,
A little rabbit cheers the tests that shine.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'test(e2e): migrate sandbox lifecycle coverage' directly and concisely describes the main change: migrating sandbox lifecycle E2E coverage into the scenario framework. |
| Linked Issues check | ✅ Passed | The PR fully addresses issue `#3813` requirements: added sandbox_lifecycle.sh library with reusable helpers [`#3813`], migrated legacy assertions to validation suite scripts [`#3813`], registered suites in suites.yaml [`#3813`], updated parity-map.yaml with stable IDs and metadata [`#3813`], and added scenario framework tests [`#3813`]. |
| Out of Scope Changes check | ✅ Passed | All changes are in-scope: sandbox lifecycle library and test suites address `#3813`, parity-map updates document the migration, scenario framework tests validate new helpers, and the hadolint URL fix is a supporting maintenance change. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch issue-3813-migrate-sandbox-lifecycle-coverage

Comment `@coderabbitai help` to get the list of available commands and usage tips.

coderabbitai

Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

test/e2e/validation_suites/suites.yaml (1)
1-1: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add SPDX license header to this YAML source file.

This file is missing the required SPDX copyright and license header.
Proposed fix
+# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
 suites:
As per coding guidelines, **/*.{js,ts,tsx,jsx,sh,yaml,yml,json,md,mdx}: Every source file must include an SPDX license header for copyright and Apache-2.0 license.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e/validation_suites/suites.yaml` at line 1, This YAML is missing the
required SPDX license header; add the standard SPDX header lines at the very top
of this file (above the existing top-level key "suites:") including the SPDX
copyright text entry and the SPDX-License-Identifier set to Apache-2.0 so the
file complies with the project's licensing guideline.

♻️ Duplicate comments (1)

test/e2e/validation_suites/lib/sandbox_lifecycle.sh (1)

52-52: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Quote "$@" in the fallback command substitution.

At Line 52, unquoted $@ can re-split arguments and change command behavior on the non-timeout path.

Proposed fix

-    SANDBOX_LIFECYCLE_LAST_OUTPUT="$($@ 2>&1)" || {
+    SANDBOX_LIFECYCLE_LAST_OUTPUT="$("$@" 2>&1)" || {

#!/bin/bash
shellcheck -s bash test/e2e/validation_suites/lib/sandbox_lifecycle.sh
rg -n '\$\(\$@' test/e2e/validation_suites/lib/sandbox_lifecycle.sh

As per coding guidelines, **/*.sh: Shell scripts must be enforced by ShellCheck (.shellcheckrc) and formatted with shfmt.
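
The re-splitting risk can be reproduced in isolation. The sketch below is illustrative only: `count_args`, `run_unquoted`, and `run_quoted` are hypothetical names and are not part of `sandbox_lifecycle.sh`. It shows how an argument containing a space changes the argument count when `$@` is left unquoted inside a command substitution.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration; not taken from the PR.

# Prints the number of arguments it receives.
count_args() {
  echo "$#"
}

# Unquoted $@ undergoes word splitting, so "a b" becomes two arguments.
run_unquoted() {
  local out
  # shellcheck disable=SC2068
  out="$($@ 2>&1)"
  echo "$out"
}

# Quoted "$@" passes each original argument through unchanged.
run_quoted() {
  local out
  out="$("$@" 2>&1)"
  echo "$out"
}

run_unquoted count_args "a b" c   # prints 3
run_quoted count_args "a b" c     # prints 2
```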

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e/validation_suites/lib/sandbox_lifecycle.sh` at line 52, The
assignment to SANDBOX_LIFECYCLE_LAST_OUTPUT uses an unquoted $@ in the fallback
command substitution which can re-split arguments and change behavior; modify
the command substitution to use quoted "$@" instead so the exact arguments are
preserved (update the line that sets SANDBOX_LIFECYCLE_LAST_OUTPUT="$($@ 2>&1)"
to use "$@" within the substitution), ensuring the non-timeout fallback path
receives the same arguments as the timeout path.

🧹 Nitpick comments (2)

test/e2e/scenario-framework-tests/e2e-lib-helpers.test.ts (1)

470-470: ⚡ Quick win

Use the repo’s PATH fallback pattern in mocked command env.

Line 470 should use ``PATH: `${bin}:${process.env.PATH || ""}` `` to avoid brittle concatenation when `PATH` is unset in isolated test environments.

Suggested patch

-      const r = runBash(`set -euo pipefail; . "${VALIDATION_SUITES}/lib/sandbox_lifecycle.sh"; sandbox_lifecycle_load_context; sandbox_lifecycle_assert_nemoclaw_list_contains_sandbox; sandbox_lifecycle_assert_status_fields_present; sandbox_lifecycle_assert_logs_available; sandbox_lifecycle_assert_openshell_exec_ok`, { E2E_CONTEXT_DIR: tmp, PATH: `${bin}:${process.env.PATH}` });
+      const r = runBash(`set -euo pipefail; . "${VALIDATION_SUITES}/lib/sandbox_lifecycle.sh"; sandbox_lifecycle_load_context; sandbox_lifecycle_assert_nemoclaw_list_contains_sandbox; sandbox_lifecycle_assert_status_fields_present; sandbox_lifecycle_assert_logs_available; sandbox_lifecycle_assert_openshell_exec_ok`, { E2E_CONTEXT_DIR: tmp, PATH: `${bin}:${process.env.PATH || ""}` });

Based on learnings: In this repo’s tests, prefer ``PATH: `${fakeBin}:${process.env.PATH || ""}` `` with POSIX `:` separator.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e/scenario-framework-tests/e2e-lib-helpers.test.ts` at line 470,
Update the runBash invocation so the mocked command environment uses the repo's
PATH fallback pattern: when calling runBash (the call that sources
"${VALIDATION_SUITES}/lib/sandbox_lifecycle.sh" and runs sandbox_lifecycle_*
assertions) set the PATH env to use the fallback `${bin}:${process.env.PATH ||
""}` (i.e. include the empty-string fallback) instead of
`${bin}:${process.env.PATH}` to avoid failures when PATH is unset in isolated
test environments.

test/e2e/scenario-framework-tests/e2e-parity-map.test.ts (1)

92-92: ⚡ Quick win

Hard-coded retirement date makes this test unnecessarily brittle.

Line 92 enforces approved_at: 2026-05-20 exactly. Any valid future metadata update will fail this test even when classification is correct. Prefer checking presence and date format instead of a fixed date literal.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e/scenario-framework-tests/e2e-parity-map.test.ts` at line 92, The
test currently asserts a hard-coded approval date using
expect(entry).toMatch(...), which is brittle; update the assertion in the
e2e-parity-map.test to check presence and valid date format instead of the fixed
literal: replace the exact-date regex with one that matches an ISO date (e.g.,
YYYY-MM-DD) allowing optional quotes and whitespace, or alternatively parse the
captured value with Date.parse to assert it's a valid date; keep using the same
expect(entry).toMatch / expect(...) pattern so the change is localized to the
assertion for entry.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/e2e/docs/parity-map.yaml`:
- Around line 9782-9787: The mapping uses the existing id
"validation.sandbox_operations.openshell_exec_ok" which collides with the
exec/chat coverage; replace that id with a restart-specific identifier (for
example "validation.sandbox_operations.restart_pod_ready" or
"validation.sandbox_operations.restart_pod_not_ready" depending on whether the
status is OK or representing a gap) so the entry "Sandbox pod did not reach
Running/Ready after restart" is tracked separately from the exec path; update
the id value on the YAML block containing status: mapped, layer: validation,
gap_domain: sandbox-lifecycle, owner: e2e-maintainers to the chosen
restart-specific id.
- Around line 10648-10661: The two parity-map entries mapping the legacy
messages "No credentials in snapshot directories" and "Credentials found:
$CRED_LEAKS" to id validation.sandbox_snapshot.create_succeeds are incorrect;
remove or change those mappings so credential-leak assertions are not marked as
covered by create_succeeds—either set their status to deferred (leave them
unmapped) or remap them to a dedicated snapshot leak/no-credentials assertion
(e.g., a future validation.sandbox_snapshot.no_credentials) once that test
exists; update the entries that reference the legacy strings and the id
validation.sandbox_snapshot.create_succeeds accordingly so leak checks remain
separate from create/list/restore coverage.
- Around line 4640-4646: The YAML remaps are incorrectly mapping prerequisite
checks (e.g., the legacy note "nemoclaw on PATH", "Docker is running", "NemoClaw
installed") to behavior IDs like validation.sandbox_lifecycle.gateway_health and
marker_written; instead, change those entries so their status is "deferred" or
replace them with dedicated preflight IDs (create new IDs such as
preflight.sandbox.nemoclaw_present or similar) rather than reusing behavior IDs;
update the specific entries that reference
validation.sandbox_lifecycle.gateway_health and marker_written (and the similar
blocks around the other occurrences noted) to use "deferred" or the new
preflight IDs and ensure owner/reusable metadata remains consistent.

In `@test/e2e/scenario-framework-tests/e2e-coverage-report.test.ts`:
- Around line 121-127: The test named
"test_should_report_scoped_lifecycle_parity_at_or_above_100_percent" only checks
for section presence (using loadMetadataFromDir and renderCoverageReport into
md) so it doesn't enforce the 100% threshold; update the test to parse the
lifecycle parity percentage out of md (e.g., with a regex against md) and assert
the numeric value is >= 100, referencing the existing md variable and the
renderCoverageReport output (or alternatively rename the test to reflect
"section presence" if you prefer not to assert the numeric threshold).

In `@test/e2e/scenario-framework-tests/e2e-lib-helpers.test.ts`:
- Around line 458-461: The test currently only asserts runBash(...) returns a
non-zero status, which allows unrelated failures to pass; update the
"test_should_apply_timeout_to_command_execution" case to assert timeout-specific
semantics from the runBash result: call runBash with `.
"${VALIDATION_SUITES}/lib/sandbox_lifecycle.sh";
sandbox_lifecycle_run_with_timeout 1 bash -c 'sleep 5'` as before, then assert
either the canonical timeout exit code (e.g., r.status === 124) or that
r.stdout/r.stderr contains a timeout marker (e.g., matches /timeout|timed out/)
so the test verifies sandbox_lifecycle_run_with_timeout actually timed out
rather than failing for another reason.
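
The exit-code check suggested above can be sketched in shell. This assumes the helper delegates to GNU coreutils `timeout`, which exits with status 124 when the limit is reached. `run_with_timeout` is a hypothetical stand-in, and the real `sandbox_lifecycle_run_with_timeout` may behave differently.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper for illustration; assumes GNU coreutils `timeout`.
run_with_timeout() {
  local seconds="$1"
  shift
  # `timeout` exits with 124 if the command is still running at the limit.
  timeout "$seconds" "$@"
}

# Capture the status without tripping `set -e` in a calling script.
status=0
run_with_timeout 1 sleep 5 || status=$?

if [ "$status" -eq 124 ]; then
  echo "timed out as expected"
else
  echo "unexpected status: $status"
fi
```

Asserting on 124 rather than on any non-zero status distinguishes a real timeout from, for example, a missing binary (status 127).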

In `@test/e2e/validation_suites/lib/sandbox_lifecycle.sh`:
- Around line 112-120: In
sandbox_lifecycle_assert_snapshot_create_list_restore_marker, the "marker
written" and "marker rolled back" assertions are meaningless because no marker
is written or checked; fix by explicitly creating a marker in the sandbox before
taking the snapshot and verifying its state after restore: use the sandbox
manipulation commands already used in this script (the same nemoclaw/sandbox
invocation pattern) to write a sentinel (e.g., create a file or set a flag) in
the sandbox prior to calling "nemoclaw snapshot create" and assert its presence
with sandbox_lifecycle_pass; after "nemoclaw snapshot restore ... latest" verify
the marker has been removed or reverted as expected and call
sandbox_lifecycle_pass or sandbox_lifecycle_fail accordingly so the existing
messages ("marker written" and "marker rolled back") reflect real checks.
- Around line 63-66: Several assertion functions call sandbox_lifecycle_fail via
|| but then continue execution, causing both FAIL and PASS to be reported;
update each affected function
(sandbox_lifecycle_assert_nemoclaw_list_contains_sandbox,
sandbox_lifecycle_assert_status_fields_present,
sandbox_lifecycle_assert_logs_available,
sandbox_lifecycle_assert_openshell_exec_ok,
sandbox_lifecycle_assert_gateway_health,
sandbox_lifecycle_assert_snapshot_create_list_restore_marker) to immediately
exit after calling sandbox_lifecycle_fail by adding an explicit "return 1" (or
equivalent early return) right after each sandbox_lifecycle_fail invocation so
the function stops and does not proceed to sandbox_lifecycle_pass. Ensure you
add the return in every branch where sandbox_lifecycle_fail is used.
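
The double-reporting problem and the early-return fix can be shown with a small sketch. It assumes the fail helper only logs and does not exit on its own. `demo_pass`, `demo_fail`, and both assert functions are hypothetical names, not the library's actual helpers.

```shell
#!/usr/bin/env bash
# Hypothetical stand-ins for the library's pass/fail reporters.
demo_pass() { echo "PASS: $1"; }
demo_fail() { echo "FAIL: $1" >&2; }

# Buggy shape: the failure is reported, then execution falls through to PASS.
assert_without_return() {
  "$@" >/dev/null 2>&1 || demo_fail "command failed"
  demo_pass "command ok"
}

# Fixed shape: stop the function right after reporting the failure.
assert_with_return() {
  "$@" >/dev/null 2>&1 || {
    demo_fail "command failed"
    return 1
  }
  demo_pass "command ok"
}
```

With a failing command such as `false`, `assert_without_return false` emits both a FAIL and a PASS line and returns 0, while `assert_with_return false` emits only the FAIL line and returns 1.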

---

Outside diff comments:
In `@test/e2e/validation_suites/suites.yaml`:
- Line 1: This YAML is missing the required SPDX license header; add the
standard SPDX header lines at the very top of this file (above the existing
top-level key "suites:") including the SPDX copyright text entry and the
SPDX-License-Identifier set to Apache-2.0 so the file complies with the
project's licensing guideline.

---

Duplicate comments:
In `@test/e2e/validation_suites/lib/sandbox_lifecycle.sh`:
- Line 52: The assignment to SANDBOX_LIFECYCLE_LAST_OUTPUT uses an unquoted $@
in the fallback command substitution which can re-split arguments and change
behavior; modify the command substitution to use quoted "$@" instead so the
exact arguments are preserved (update the line that sets
SANDBOX_LIFECYCLE_LAST_OUTPUT="$($@ 2>&1)" to use "$@" within the substitution),
ensuring the non-timeout fallback path receives the same arguments as the
timeout path.

---

Nitpick comments:
In `@test/e2e/scenario-framework-tests/e2e-lib-helpers.test.ts`:
- Line 470: Update the runBash invocation so the mocked command environment uses
the repo's PATH fallback pattern: when calling runBash (the call that sources
"${VALIDATION_SUITES}/lib/sandbox_lifecycle.sh" and runs sandbox_lifecycle_*
assertions) set the PATH env to use the fallback `${bin}:${process.env.PATH ||
""}` (i.e. include the empty-string fallback) instead of
`${bin}:${process.env.PATH}` to avoid failures when PATH is unset in isolated
test environments.

In `@test/e2e/scenario-framework-tests/e2e-parity-map.test.ts`:
- Line 92: The test currently asserts a hard-coded approval date using
expect(entry).toMatch(...), which is brittle; update the assertion in the
e2e-parity-map.test to check presence and valid date format instead of the fixed
literal: replace the exact-date regex with one that matches an ISO date (e.g.,
YYYY-MM-DD) allowing optional quotes and whitespace, or alternatively parse the
captured value with Date.parse to assert it's a valid date; keep using the same
expect(entry).toMatch / expect(...) pattern so the change is localized to the
assertion for entry.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 708860d0-25b2-4c54-b107-1a6711143381

📥 Commits

Reviewing files that changed from the base of the PR and between ca045a9 and f621825.

📒 Files selected for processing (12)

test/e2e/docs/README.md
test/e2e/docs/parity-map.yaml
test/e2e/scenario-framework-tests/e2e-coverage-report.test.ts
test/e2e/scenario-framework-tests/e2e-lib-helpers.test.ts
test/e2e/scenario-framework-tests/e2e-parity-map.test.ts
test/e2e/validation_suites/lib/sandbox_lifecycle.sh
test/e2e/validation_suites/sandbox/lifecycle/00-gateway-health.sh
test/e2e/validation_suites/sandbox/lifecycle/01-gateway-recovery.sh
test/e2e/validation_suites/sandbox/operations/00-list-and-status.sh
test/e2e/validation_suites/sandbox/operations/01-logs-and-exec.sh
test/e2e/validation_suites/sandbox/snapshot/00-create-list-restore.sh
test/e2e/validation_suites/suites.yaml


wscurran · 2026-05-20T16:01:25Z

✨ Related open issues:

#3813 test(e2e): migrate sandbox lifecycle coverage to scenario suites

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/e2e/docs/parity-map.yaml`:
- Around line 10647-10660: The two deferred parity entries whose legacy messages
are "No credentials in snapshot directories" and "Credentials found:
$CRED_LEAKS" currently put the coverage explanation into runner_requirement;
move the explanatory text back into the reason field (keeping "snapshot
credential-leak coverage is not asserted by
validation.sandbox_snapshot.create_succeeds") and replace runner_requirement
with the actual execution environment used elsewhere in this file (i.e., set
runner_requirement to the canonical runner name used for sandbox snapshot checks
instead of "dedicated snapshot credential leak assertion").


ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: e299b67d-c111-476c-a122-0d6b01cdd1ee

📥 Commits

Reviewing files that changed from the base of the PR and between 53f7222 and 973fd12.

📒 Files selected for processing (5)

.github/actions/basic-checks/action.yaml
test/e2e/docs/parity-map.yaml
test/e2e/scenario-framework-tests/e2e-coverage-report.test.ts
test/e2e/scenario-framework-tests/e2e-lib-helpers.test.ts
test/e2e/validation_suites/lib/sandbox_lifecycle.sh

jyaunches added 23 commits May 20, 2026 07:36

docs(spec): apply lifecycle review decisions

635dd2e

test: Add failing tests for Phase 1

2b28811

feat: Implement Phase 1 - lifecycle parity classification

9e809ae

Mark Phase 1 as completed [9e809ae]

f41e05c

test: Add failing tests for Phase 2

66c3e7e

feat: Implement Phase 2 - sandbox lifecycle helpers

1752bb4

Mark Phase 2 as completed [1752bb4]

065c42c

test: Add failing tests for Phase 3

55c2604

feat: Implement Phase 3 - lifecycle suite scripts

c757bff

Mark Phase 3 as completed [c757bff]

6246b96

test: Add failing tests for Phase 4

04cea05

feat: Implement Phase 4 - parity coverage visibility

7f8e1cc

Mark Phase 4 as completed [04cea05]

b88069a

test: Add failing tests for Phase 5

a4d37e7

feat: Implement Phase 5 - integration verification notes

ff7f73b

Mark Phase 5 as completed [7f8e1cc]

aa5ddd8

test: Add failing tests for Phase 6

07558da

feat: Implement Phase 6 - lifecycle suite docs cleanup

7b3ab1b

Mark Phase 6 as completed [ff7f73b]

12c6b3f

fix: Correct lifecycle helper context path

2b65694

fix: Restore lifecycle helper context path

365e80c

test(e2e): validate sandbox lifecycle migration

b0dec96

chore: remove committed spec artifacts

f621825

jyaunches self-assigned this May 20, 2026

github-advanced-security AI found potential problems May 20, 2026


Comment thread test/e2e/validation_suites/lib/sandbox_lifecycle.sh Fixed

Comment thread test/e2e/validation_suites/lib/sandbox_lifecycle.sh Fixed

coderabbitai Bot reviewed May 20, 2026


Merge remote-tracking branch 'origin/main' into issue-3813-migrate-sandbox-lifecycle-coverage

58c4fad

fix(e2e): address sandbox lifecycle shellcheck

ec0c36f

wscurran added the E2E, enhancement: testing, and fix labels May 20, 2026

test(e2e): remove source-shape parity assertions

53f7222

jyaunches added the v0.0.47 Release target label May 20, 2026

jyaunches added 2 commits May 20, 2026 14:00

fix(e2e): address sandbox lifecycle review feedback

38fc224

fix(e2e): restore sandbox lifecycle executable bit

cc7ff10

github-advanced-security AI found potential problems May 20, 2026


Comment thread test/e2e/validation_suites/lib/sandbox_lifecycle.sh Fixed

fix(ci): update hadolint asset name

973fd12

cv approved these changes May 20, 2026


coderabbitai Bot reviewed May 20, 2026


Comment thread test/e2e/docs/parity-map.yaml Outdated

jyaunches added 2 commits May 20, 2026 14:09

fix(e2e): avoid shellcheck marker warning

36bbcbc

fix(e2e): keep sandbox lifecycle executable

586bb85

github-advanced-security AI found potential problems May 20, 2026


Comment thread test/e2e/validation_suites/lib/sandbox_lifecycle.sh Fixed

jyaunches added 4 commits May 20, 2026 14:18

fix(e2e): simplify marker restore assertion

b5f4269

fix(e2e): preserve lifecycle executable mode

3015444

fix(e2e): clarify snapshot leak runner metadata

bed60b9

merge main

d7a0c7f

jyaunches merged commit e122450 into main May 20, 2026
26 checks passed
