test(e2e): run onboarding assertions in scenario runner by cv · Pull Request #4657 · NVIDIA/NemoClaw

cv · 2026-06-02T07:35:29Z

Summary

Runs declared YAML onboarding assertions from the shell scenario runner after setup/onboarding and before expected-state validation. Dry-run mode traces and reports the assertions without executing live assertion scripts.

Related Issue

Refs #3588.

Changes

Resolve onboarding_assertions from plan.json and nemoclaw_scenarios/scenarios.yaml inside runtime/run-scenario.sh.
Execute positive and negative preflight onboarding assertions at the correct point in the runner flow.
Add dry-run tracing and regression expectations for the baseline OpenClaw scenario.

Type of Change

Code change (feature, bug fix, or refactor)
Code change with doc updates
Doc only (prose changes, no code sample modifications)
Doc only (includes code sample changes)

Verification

npx prek run --all-files passes
npm test passes
Tests added or updated for new or changed behavior
No secrets, API keys, or credentials committed
Docs updated for user-facing behavior changes
npm run docs builds without warnings (doc changes only)
Doc pages follow the style guide (doc changes only)
New doc pages include SPDX header and frontmatter (new pages only)

Additional verification run:

bash test/e2e-scenario/runtime/run-scenario.sh ubuntu-repo-cloud-openclaw --dry-run
E2E_CONTEXT_DIR=$(mktemp -d) bash test/e2e-scenario/runtime/run-scenario.sh ubuntu-no-docker-preflight-negative --dry-run
npx vitest run --project e2e-scenario-framework test/e2e-scenario/framework-tests/e2e-scenario-first-migration.test.ts test/e2e-scenario/framework-tests/e2e-scenario-resolver.test.ts --silent=false --reporter=default
npx prek run --files test/e2e-scenario/runtime/run-scenario.sh test/e2e-scenario/framework-tests/e2e-scenario-first-migration.test.ts

Signed-off-by: Carlos Villela cvillela@nvidia.com

copy-pr-bot · 2026-06-02T07:35:33Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

coderabbitai · 2026-06-02T07:35:36Z

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: be62e5d6-9c14-4aaf-a1ee-495da0b6f681

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

🔍 Trigger review

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch test/e2e-onboarding-assertions-runner

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

github-actions · 2026-06-02T07:39:35Z

E2E Advisor Recommendation

Required E2E: e2e-scenarios:ubuntu-repo-cloud-openclaw, e2e-scenarios:ubuntu-no-docker-preflight-negative
Optional E2E: e2e-scenarios-all

Dispatch hint: ubuntu-repo-cloud-openclaw,ubuntu-no-docker-preflight-negative

Workflow run

Full advisor summary

E2E Recommendation Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required E2E

e2e-scenarios:ubuntu-repo-cloud-openclaw (medium): Primary positive scenario affected by the resolver, plan schema, run-scenario onboarding assertion execution, and preflight-passed assertion changes. It verifies the happy-path onboarding assertion flow before suites.
e2e-scenarios:ubuntu-no-docker-preflight-negative (medium): Negative preflight scenario is explicitly affected by the new onboarding assertion execution path and expected-failure/preflight assertion handling.

Optional E2E

e2e-scenarios-all (high): Optional broad confidence because resolver/schema and run-scenario changes are shared by all typed scenarios, but the two required Ubuntu scenarios cover the directly modified positive and negative onboarding assertion paths.

New E2E recommendations

None.

Dispatch hint

Workflow: .github/workflows/e2e-scenarios.yaml
jobs input: ubuntu-repo-cloud-openclaw,ubuntu-no-docker-preflight-negative

github-actions · 2026-06-02T07:39:36Z

E2E Scenario Advisor Recommendation

Required scenario E2E: e2e-scenarios-all
Optional scenario E2E: None

Dispatch required scenario E2E:

gh workflow run e2e-scenarios-all.yaml --ref <pr-head-ref>

Workflow run

Full scenario advisor summary

E2E Scenario Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required scenario E2E

e2e-scenarios-all: Changes touch shared scenario runtime/runner and resolver code under test/e2e-scenario/runtime, including plan resolution, schema, and run-scenario behavior. Per policy, runtime/runner changes require the full scenario fan-out.
- Dispatch: gh workflow run e2e-scenarios-all.yaml --ref <pr-head-ref>

Optional scenario E2E

None.

Relevant changed files

test/e2e-scenario/framework-tests/e2e-scenario-first-migration.test.ts
test/e2e-scenario/onboarding_assertions/preflight/00-preflight-passed.sh
test/e2e-scenario/runtime/resolver/plan.ts
test/e2e-scenario/runtime/resolver/schema.ts
test/e2e-scenario/runtime/run-scenario.sh

github-actions · 2026-06-02T07:39:50Z

PR Review Advisor

Findings: 0 needs attention, 2 worth checking, 0 nice ideas
Since last review: 4 prior items resolved, 1 still applies, 1 new item found

Review findings

🛠️ Needs attention

None.

🔎 Worth checking

Replace or harden the Node-to-shell assertion row handoff (test/e2e-scenario/runtime/run-scenario.sh:323): The prior exploitable row injection through unvalidated stable assertion IDs appears fixed: assertion IDs and stable IDs are now allowlisted in the resolver and revalidated before execution. However, `run_onboarding_assertions` still serializes validated Node data into a tab/newline-delimited string and then shell-parses that string before `bash "${full_path}"`. This is currently mitigated by token validation plus path containment, but it remains a brittle security boundary for future assertion metadata or script-path changes.
- Recommendation: Prefer a non-delimited handoff or keep validation and execution in the same runtime. If the bash handoff remains, reject control characters in every emitted field, including script paths/realpaths, and add a regression test that a tampered `plan.json` cannot inject additional rows or alter the executed path.
- Evidence: The embedded Node validator writes rows with `${id}\t${real}\t${stableId}\n`; the shell loop parses them with `while IFS=$'\t' read -r assertion_id full_path stable_id` and then executes `bash "${full_path}"`. Current tests cover control characters in IDs/stable IDs, but the protocol itself remains delimiter-based.
Coordinate this runner expansion with overlapping E2E runner deletion work (test/e2e-scenario/runtime/run-scenario.sh:1): This PR expands the bash scenario runner, resolver plan/schema, and framework tests while deterministic drift context reports open PR test(e2e): execute real shell assertions; delete dry-run, --validate-only, and the bash runner #4380 touching the same five core files and directionally deleting the bash runner, dry-run, and validate-only paths. That overlap creates a high risk that one branch invalidates the other's source-of-truth assumptions for onboarding assertion execution.
- Recommendation: Coordinate the intended runner source of truth before landing these changes, or rebase/reconcile once the overlapping runner work is resolved so onboarding assertion validation and execution are implemented in the runner path that will remain.
- Evidence: Open overlap context lists PR test(e2e): execute real shell assertions; delete dry-run, --validate-only, and the bash runner #4380 with the same framework test, preflight assertion, resolver plan/schema, and `runtime/run-scenario.sh` files; this PR adds `run_onboarding_assertions()` and expands resolver output consumed by that bash runner.

🌱 Nice ideas

None.

Consider writing more tests for

**Runtime validation** — run_scenario_should_reject_plan_json_onboarding_assertion_steps_that_do_not_match_declared_order. The PR adds good resolver and runner regression coverage for the changed behavior, including negative-path tests for the prior injection issue. Additional runtime-defense tests would further protect the remaining shell handoff and plan-artifact trust boundary.
**Runtime validation** — run_scenario_should_not_execute_additional_rows_from_tampered_onboarding_assertion_plan_data. The PR adds good resolver and runner regression coverage for the changed behavior, including negative-path tests for the prior injection issue. Additional runtime-defense tests would further protect the remaining shell handoff and plan-artifact trust boundary.
**Runtime validation** — run_scenario_should_reject_onboarding_assertion_script_paths_or_realpaths_with_tab_newline_or_control_characters. The PR adds good resolver and runner regression coverage for the changed behavior, including negative-path tests for the prior injection issue. Additional runtime-defense tests would further protect the remaining shell handoff and plan-artifact trust boundary.
**Acceptance clause:** Refs Implement layered E2E scenario model #3588. — add test evidence or identify existing coverage. The deterministic context did not provide trusted issue Implement layered E2E scenario model #3588 body or comments, and `linkedIssues` is empty, so literal issue acceptance clauses could not be extracted or mapped.

Since last review details

Current findings:

Replace or harden the Node-to-shell assertion row handoff (test/e2e-scenario/runtime/run-scenario.sh:323): The prior exploitable row injection through unvalidated stable assertion IDs appears fixed: assertion IDs and stable IDs are now allowlisted in the resolver and revalidated before execution. However, `run_onboarding_assertions` still serializes validated Node data into a tab/newline-delimited string and then shell-parses that string before `bash "${full_path}"`. This is currently mitigated by token validation plus path containment, but it remains a brittle security boundary for future assertion metadata or script-path changes.
- Recommendation: Prefer a non-delimited handoff or keep validation and execution in the same runtime. If the bash handoff remains, reject control characters in every emitted field, including script paths/realpaths, and add a regression test that a tampered `plan.json` cannot inject additional rows or alter the executed path.
- Evidence: The embedded Node validator writes rows with `${id}\t${real}\t${stableId}\n`; the shell loop parses them with `while IFS=$'\t' read -r assertion_id full_path stable_id` and then executes `bash "${full_path}"`. Current tests cover control characters in IDs/stable IDs, but the protocol itself remains delimiter-based.
Coordinate this runner expansion with overlapping E2E runner deletion work (test/e2e-scenario/runtime/run-scenario.sh:1): This PR expands the bash scenario runner, resolver plan/schema, and framework tests while deterministic drift context reports open PR test(e2e): execute real shell assertions; delete dry-run, --validate-only, and the bash runner #4380 touching the same five core files and directionally deleting the bash runner, dry-run, and validate-only paths. That overlap creates a high risk that one branch invalidates the other's source-of-truth assumptions for onboarding assertion execution.
- Recommendation: Coordinate the intended runner source of truth before landing these changes, or rebase/reconcile once the overlapping runner work is resolved so onboarding assertion validation and execution are implemented in the runner path that will remain.
- Evidence: Open overlap context lists PR test(e2e): execute real shell assertions; delete dry-run, --validate-only, and the bash runner #4380 with the same framework test, preflight assertion, resolver plan/schema, and `runtime/run-scenario.sh` files; this PR adds `run_onboarding_assertions()` and expands resolver output consumed by that bash runner.

Workflow run details

This is an automated advisory review. A human maintainer must make the final merge decision.

wscurran · 2026-06-03T18:11:54Z

✨
Related open issues:

#3588 Implement layered E2E scenario model

cv · 2026-06-08T06:28:30Z

Maintainer direction update: I think this PR should be closed or superseded rather than merged as-is.

After reviewing #3588, #4380, and this branch, the intended end state should be a single scenario runner, not a permanent hybrid. This PR strengthens the YAML/bash runtime/run-scenario.sh bridge, which is useful as design evidence but no longer aligned as the place to land new durable behavior.

The requirements this branch proves should move into the single-runner implementation before legacy runner pieces are removed:

executable onboarding assertions, including positive preflight and no-Docker negative preflight;
stable assertion IDs and script/path validation;
ordering and failure propagation tests;
no plan-artifact-to-shell injection boundary.

I opened #4939 to record this direction in repo docs.

cv · 2026-06-08T15:12:51Z

Closing per decisions in #4941

cv added 5 commits June 1, 2026 23:42

test(e2e): add scenario aliases for layered plans

dbc672f

docs(e2e): reconcile scenario migration docs

c66d2a3

test(e2e): guard scenario workflow drift

401c631

ci(e2e): clarify scenario dry-run summary

a93d8cb

test(e2e): run onboarding assertions in scenario runner

30b6d2a

cv self-assigned this Jun 2, 2026

wscurran added area: e2e End-to-end tests, nightly failures, or validation infrastructure feature PR adds or expands user-visible functionality labels Jun 3, 2026

cv requested a review from jyaunches June 7, 2026 03:22

Base automatically changed from ci/e2e-scenario-dry-run-summary to main June 7, 2026 21:22

cv added 3 commits June 7, 2026 14:32

merge(main): resolve PR 4657 conflicts and harden assertions

8b302c7

test(e2e): cover onboarding assertion execution

c49067b

fix(e2e): reject assertion metadata control chars

8b14b93

cv added the v0.0.61 Release target label Jun 8, 2026

This was referenced Jun 8, 2026

Implement layered E2E scenario model #3588

Open

test(e2e): execute real shell assertions; delete dry-run, --validate-only, and the bash runner #4380

Merged

Adopt Vitest fixtures as the E2E scenario execution model #4941

Open

cv closed this Jun 8, 2026

Conversation

cv commented Jun 2, 2026

Summary

Related Issue

Changes

Type of Change

Verification

Uh oh!

copy-pr-bot Bot commented Jun 2, 2026

Uh oh!

coderabbitai Bot commented Jun 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review skipped

Uh oh!

github-actions Bot commented Jun 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

E2E Advisor Recommendation

E2E Recommendation Advisor

Required E2E

Optional E2E

New E2E recommendations

Dispatch hint

Uh oh!

github-actions Bot commented Jun 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

E2E Scenario Advisor Recommendation

E2E Scenario Advisor

Required scenario E2E

Optional scenario E2E

Relevant changed files

Uh oh!

github-actions Bot commented Jun 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review Advisor

🛠️ Needs attention

🔎 Worth checking

🌱 Nice ideas

Uh oh!

wscurran commented Jun 3, 2026

Uh oh!

cv commented Jun 8, 2026

Uh oh!

cv commented Jun 8, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

coderabbitai Bot commented Jun 2, 2026 •

edited

Loading

github-actions Bot commented Jun 2, 2026 •

edited

Loading

github-actions Bot commented Jun 2, 2026 •

edited

Loading

github-actions Bot commented Jun 2, 2026 •

edited

Loading