Skip to content

test(e2e): run onboarding assertions in scenario runner#4657

Closed
cv wants to merge 8 commits into
mainfrom
test/e2e-onboarding-assertions-runner
Closed

test(e2e): run onboarding assertions in scenario runner#4657
cv wants to merge 8 commits into
mainfrom
test/e2e-onboarding-assertions-runner

Conversation

@cv

@cv cv commented Jun 2, 2026

Copy link
Copy Markdown
Collaborator

Summary

Runs declared YAML onboarding assertions from the shell scenario runner after setup/onboarding and before expected-state validation. Dry-run mode traces and reports the assertions without executing live assertion scripts.

Related Issue

Refs #3588.

Changes

  • Resolve onboarding_assertions from plan.json and nemoclaw_scenarios/scenarios.yaml inside runtime/run-scenario.sh.
  • Execute positive and negative preflight onboarding assertions at the correct point in the runner flow.
  • Add dry-run tracing and regression expectations for the baseline OpenClaw scenario.

Type of Change

  • Code change (feature, bug fix, or refactor)
  • Code change with doc updates
  • Doc only (prose changes, no code sample modifications)
  • Doc only (includes code sample changes)

Verification

  • npx prek run --all-files passes
  • npm test passes
  • Tests added or updated for new or changed behavior
  • No secrets, API keys, or credentials committed
  • Docs updated for user-facing behavior changes
  • npm run docs builds without warnings (doc changes only)
  • Doc pages follow the style guide (doc changes only)
  • New doc pages include SPDX header and frontmatter (new pages only)

Additional verification run:

  • bash test/e2e-scenario/runtime/run-scenario.sh ubuntu-repo-cloud-openclaw --dry-run
  • E2E_CONTEXT_DIR=$(mktemp -d) bash test/e2e-scenario/runtime/run-scenario.sh ubuntu-no-docker-preflight-negative --dry-run
  • npx vitest run --project e2e-scenario-framework test/e2e-scenario/framework-tests/e2e-scenario-first-migration.test.ts test/e2e-scenario/framework-tests/e2e-scenario-resolver.test.ts --silent=false --reporter=default
  • npx prek run --files test/e2e-scenario/runtime/run-scenario.sh test/e2e-scenario/framework-tests/e2e-scenario-first-migration.test.ts

Signed-off-by: Carlos Villela cvillela@nvidia.com

@cv cv self-assigned this Jun 2, 2026
@copy-pr-bot

copy-pr-bot Bot commented Jun 2, 2026

Copy link
Copy Markdown

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@coderabbitai

coderabbitai Bot commented Jun 2, 2026

Copy link
Copy Markdown
Contributor

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: be62e5d6-9c14-4aaf-a1ee-495da0b6f681

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch test/e2e-onboarding-assertions-runner

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions

github-actions Bot commented Jun 2, 2026

Copy link
Copy Markdown
Contributor

E2E Advisor Recommendation

Required E2E: e2e-scenarios:ubuntu-repo-cloud-openclaw, e2e-scenarios:ubuntu-no-docker-preflight-negative
Optional E2E: e2e-scenarios-all

Dispatch hint: ubuntu-repo-cloud-openclaw,ubuntu-no-docker-preflight-negative

Workflow run

Full advisor summary

E2E Recommendation Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required E2E

  • e2e-scenarios:ubuntu-repo-cloud-openclaw (medium): Primary positive scenario affected by the resolver, plan schema, run-scenario onboarding assertion execution, and preflight-passed assertion changes. It verifies the happy-path onboarding assertion flow before suites.
  • e2e-scenarios:ubuntu-no-docker-preflight-negative (medium): Negative preflight scenario is explicitly affected by the new onboarding assertion execution path and expected-failure/preflight assertion handling.

Optional E2E

  • e2e-scenarios-all (high): Optional broad confidence because resolver/schema and run-scenario changes are shared by all typed scenarios, but the two required Ubuntu scenarios cover the directly modified positive and negative onboarding assertion paths.

New E2E recommendations

  • None.

Dispatch hint

  • Workflow: .github/workflows/e2e-scenarios.yaml
  • jobs input: ubuntu-repo-cloud-openclaw,ubuntu-no-docker-preflight-negative

@github-actions

github-actions Bot commented Jun 2, 2026

Copy link
Copy Markdown
Contributor

E2E Scenario Advisor Recommendation

Required scenario E2E: e2e-scenarios-all
Optional scenario E2E: None

Dispatch required scenario E2E:

  • gh workflow run e2e-scenarios-all.yaml --ref <pr-head-ref>

Workflow run

Full scenario advisor summary

E2E Scenario Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required scenario E2E

  • e2e-scenarios-all: Changes touch shared scenario runtime/runner and resolver code under test/e2e-scenario/runtime, including plan resolution, schema, and run-scenario behavior. Per policy, runtime/runner changes require the full scenario fan-out.
    • Dispatch: gh workflow run e2e-scenarios-all.yaml --ref <pr-head-ref>

Optional scenario E2E

  • None.

Relevant changed files

  • test/e2e-scenario/framework-tests/e2e-scenario-first-migration.test.ts
  • test/e2e-scenario/onboarding_assertions/preflight/00-preflight-passed.sh
  • test/e2e-scenario/runtime/resolver/plan.ts
  • test/e2e-scenario/runtime/resolver/schema.ts
  • test/e2e-scenario/runtime/run-scenario.sh

@github-actions

github-actions Bot commented Jun 2, 2026

Copy link
Copy Markdown
Contributor

PR Review Advisor

Findings: 0 needs attention, 2 worth checking, 0 nice ideas
Since last review: 4 prior items resolved, 1 still applies, 1 new item found

Review findings

🛠️ Needs attention

  • None.

🔎 Worth checking

  • Replace or harden the Node-to-shell assertion row handoff (test/e2e-scenario/runtime/run-scenario.sh:323): The prior exploitable row injection through unvalidated stable assertion IDs appears fixed: assertion IDs and stable IDs are now allowlisted in the resolver and revalidated before execution. However, `run_onboarding_assertions` still serializes validated Node data into a tab/newline-delimited string and then shell-parses that string before `bash "${full_path}"`. This is currently mitigated by token validation plus path containment, but it remains a brittle security boundary for future assertion metadata or script-path changes.
    • Recommendation: Prefer a non-delimited handoff or keep validation and execution in the same runtime. If the bash handoff remains, reject control characters in every emitted field, including script paths/realpaths, and add a regression test that a tampered `plan.json` cannot inject additional rows or alter the executed path.
    • Evidence: The embedded Node validator writes rows with `${id}\t${real}\t${stableId}\n`; the shell loop parses them with `while IFS=$'\t' read -r assertion_id full_path stable_id` and then executes `bash "${full_path}"`. Current tests cover control characters in IDs/stable IDs, but the protocol itself remains delimiter-based.
  • Coordinate this runner expansion with overlapping E2E runner deletion work (test/e2e-scenario/runtime/run-scenario.sh:1): This PR expands the bash scenario runner, resolver plan/schema, and framework tests while deterministic drift context reports open PR test(e2e): execute real shell assertions; delete dry-run, --validate-only, and the bash runner #4380 touching the same five core files and directionally deleting the bash runner, dry-run, and validate-only paths. That overlap creates a high risk that one branch invalidates the other's source-of-truth assumptions for onboarding assertion execution.
    • Recommendation: Coordinate the intended runner source of truth before landing these changes, or rebase/reconcile once the overlapping runner work is resolved so onboarding assertion validation and execution are implemented in the runner path that will remain.
    • Evidence: Open overlap context lists PR test(e2e): execute real shell assertions; delete dry-run, --validate-only, and the bash runner #4380 with the same framework test, preflight assertion, resolver plan/schema, and `runtime/run-scenario.sh` files; this PR adds `run_onboarding_assertions()` and expands resolver output consumed by that bash runner.

🌱 Nice ideas

  • None.
Consider writing more tests for
  • **Runtime validation** — run_scenario_should_reject_plan_json_onboarding_assertion_steps_that_do_not_match_declared_order. The PR adds good resolver and runner regression coverage for the changed behavior, including negative-path tests for the prior injection issue. Additional runtime-defense tests would further protect the remaining shell handoff and plan-artifact trust boundary.
  • **Runtime validation** — run_scenario_should_not_execute_additional_rows_from_tampered_onboarding_assertion_plan_data. The PR adds good resolver and runner regression coverage for the changed behavior, including negative-path tests for the prior injection issue. Additional runtime-defense tests would further protect the remaining shell handoff and plan-artifact trust boundary.
  • **Runtime validation** — run_scenario_should_reject_onboarding_assertion_script_paths_or_realpaths_with_tab_newline_or_control_characters. The PR adds good resolver and runner regression coverage for the changed behavior, including negative-path tests for the prior injection issue. Additional runtime-defense tests would further protect the remaining shell handoff and plan-artifact trust boundary.
  • **Acceptance clause:** Refs Implement layered E2E scenario model #3588. — add test evidence or identify existing coverage. The deterministic context did not provide trusted issue Implement layered E2E scenario model #3588 body or comments, and `linkedIssues` is empty, so literal issue acceptance clauses could not be extracted or mapped.
Since last review details

Current findings:

  • Replace or harden the Node-to-shell assertion row handoff (test/e2e-scenario/runtime/run-scenario.sh:323): The prior exploitable row injection through unvalidated stable assertion IDs appears fixed: assertion IDs and stable IDs are now allowlisted in the resolver and revalidated before execution. However, `run_onboarding_assertions` still serializes validated Node data into a tab/newline-delimited string and then shell-parses that string before `bash "${full_path}"`. This is currently mitigated by token validation plus path containment, but it remains a brittle security boundary for future assertion metadata or script-path changes.
    • Recommendation: Prefer a non-delimited handoff or keep validation and execution in the same runtime. If the bash handoff remains, reject control characters in every emitted field, including script paths/realpaths, and add a regression test that a tampered `plan.json` cannot inject additional rows or alter the executed path.
    • Evidence: The embedded Node validator writes rows with `${id}\t${real}\t${stableId}\n`; the shell loop parses them with `while IFS=$'\t' read -r assertion_id full_path stable_id` and then executes `bash "${full_path}"`. Current tests cover control characters in IDs/stable IDs, but the protocol itself remains delimiter-based.
  • Coordinate this runner expansion with overlapping E2E runner deletion work (test/e2e-scenario/runtime/run-scenario.sh:1): This PR expands the bash scenario runner, resolver plan/schema, and framework tests while deterministic drift context reports open PR test(e2e): execute real shell assertions; delete dry-run, --validate-only, and the bash runner #4380 touching the same five core files and directionally deleting the bash runner, dry-run, and validate-only paths. That overlap creates a high risk that one branch invalidates the other's source-of-truth assumptions for onboarding assertion execution.
    • Recommendation: Coordinate the intended runner source of truth before landing these changes, or rebase/reconcile once the overlapping runner work is resolved so onboarding assertion validation and execution are implemented in the runner path that will remain.
    • Evidence: Open overlap context lists PR test(e2e): execute real shell assertions; delete dry-run, --validate-only, and the bash runner #4380 with the same framework test, preflight assertion, resolver plan/schema, and `runtime/run-scenario.sh` files; this PR adds `run_onboarding_assertions()` and expands resolver output consumed by that bash runner.

Workflow run details

This is an automated advisory review. A human maintainer must make the final merge decision.

@wscurran wscurran added area: e2e End-to-end tests, nightly failures, or validation infrastructure feature PR adds or expands user-visible functionality labels Jun 3, 2026
@wscurran

wscurran commented Jun 3, 2026

Copy link
Copy Markdown
Contributor


Related open issues:

@cv cv requested a review from jyaunches June 7, 2026 03:22
Base automatically changed from ci/e2e-scenario-dry-run-summary to main June 7, 2026 21:22
@cv cv added the v0.0.61 Release target label Jun 8, 2026
@cv

cv commented Jun 8, 2026

Copy link
Copy Markdown
Collaborator Author

Maintainer direction update: I think this PR should be closed or superseded rather than merged as-is.

After reviewing #3588, #4380, and this branch, the intended end state should be a single scenario runner, not a permanent hybrid. This PR strengthens the YAML/bash runtime/run-scenario.sh bridge, which is useful as design evidence but no longer aligned as the place to land new durable behavior.

The requirements this branch proves should move into the single-runner implementation before legacy runner pieces are removed:

  • executable onboarding assertions, including positive preflight and no-Docker negative preflight;
  • stable assertion IDs and script/path validation;
  • ordering and failure propagation tests;
  • no plan-artifact-to-shell injection boundary.

I opened #4939 to record this direction in repo docs.

@cv

cv commented Jun 8, 2026

Copy link
Copy Markdown
Collaborator Author

Closing per decisions in #4941

@cv cv closed this Jun 8, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area: e2e End-to-end tests, nightly failures, or validation infrastructure feature PR adds or expands user-visible functionality v0.0.61 Release target

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants