test(e2e-scenario): delete obsolete run.ts framework, switch CI to run-scenario.sh by jyaunches · Pull Request #4379 · NVIDIA/NemoClaw

jyaunches · 2026-05-28T01:02:48Z

Summary

The hybrid scenario suite under test/e2e-scenario/ had two runner entrypoints:

test/e2e-scenario/scenarios/run.ts — OLD: TypeScript DSL + builders
test/e2e-scenario/runtime/run-scenario.sh — NEW: YAML metadata + bash runtime

The new bash runner (via runtime/resolver/*) is self-contained and does not import anything from scenarios/. All recently-developed tests target the new path. The OLD TS framework was kept only to satisfy CI (which still called run.ts) and a handful of framework-tests that exercised the OLD DSL.

This PR deletes the OLD path and migrates CI to the new bash entrypoint.

What's deleted

OLD framework — only consumed by run.ts:

test/e2e-scenario/scenarios/ — entire TS DSL (33 files: run.ts, compiler.ts, registry.ts, builder.ts, manifests.ts, matrix.ts, migration-inventory.ts, types.ts, js-yaml.d.ts, orchestrators/, scenarios/baseline.ts, assertions/, clients/)
test/e2e-scenario/manifests/ — only consumed by scenarios/manifests.ts
test/e2e-scenario/onboarding_assertions/ — dead in NEW path (no helper sources or runs them; YAML declares string IDs but the bash runner never invokes the .sh files)
6 framework-tests that imported scenarios/*:
- e2e-assertion-modules.test.ts
- e2e-manifests.test.ts
- e2e-migration-inventory-lock.test.ts
- e2e-phase-orchestrators.test.ts
- e2e-plan-compiler.test.ts
- e2e-scenario-registry.test.ts

What's updated

.github/workflows/e2e-scenarios.yaml: replaced 3 npx tsx ...run.ts invocations with bash test/e2e-scenario/runtime/run-scenario.sh <id>, looping over comma-separated IDs in both the linux and WSL paths.
tools/e2e-scenarios/workflow-boundary.mts: assert the bash entrypoint instead of run.ts.
test/e2e-scenario/framework-tests/e2e-scenarios-workflow.test.ts: update fixture to the new entrypoint.
tools/e2e-advisor/scenarios.mts (bridge edit): swap listScenarios() (OLD registry) for loadMetadataFromDir() + resolveScenario() (NEW YAML resolver), so the deterministic scenario advisor still compiles after the deletion.

Follow-up

test/e2e-scenario-advisor.test.ts: 2 cases skipped pending #4378.

While doing the bridge edit on scenarios.mts, we discovered that nemoclaw_scenarios/scenarios.yaml's setup_scenarios: is missing user-friendly aliases for 13 layered test_plans (telegram, discord, slack, brave, resume, repair, double-same-provider, double-provider-switch, token-rotation, openai-compatible, hermes-discord, hermes-slack). The OLD TS registry was the only thing that knew those IDs. #4378 (child of epic #3588 Phase 4) tracks adding the aliases. A separate session is also overhauling the deterministic scenario advisor and may obsolete this code path entirely.

Verification

267 framework + advisor tests pass; 2 skipped with #4378 reference.
bash test/e2e-scenario/runtime/run-scenario.sh <id> --plan-only succeeds for all 10 scenarios currently in setup_scenarios:.

Stats

69 files changed, 42 insertions(+), 3210 deletions(-)

Refs: #3588, #4378

Summary by CodeRabbit

Chores
- CI now runs scenarios via a shell-based runner and uploads the full e2e output directory for artifacts.
Tests
- Large portion of legacy end-to-end test suites, scenario manifests, and assertion groups removed or temporarily disabled; workflow boundary validations updated to use the new runner.
Refactor
- Scenario metadata resolution moved to runtime-based discovery.

…n-scenario.sh The hybrid scenario suite had two runner entrypoints: - test/e2e-scenario/scenarios/run.ts (OLD: TS DSL + builders) - test/e2e-scenario/runtime/run-scenario.sh (NEW: YAML metadata + bash) The new bash runner via runtime/resolver/* is self-contained and does not import from scenarios/. All recently-developed tests target the new path. The OLD TS framework was kept only to satisfy CI (which was still calling run.ts) and a handful of framework-tests that exercised the OLD DSL. This commit removes the OLD path and migrates CI. Deleted (OLD framework, only used by run.ts): - test/e2e-scenario/scenarios/ (entire TS DSL, 33 files) - test/e2e-scenario/manifests/ (only consumed by scenarios/manifests.ts) - test/e2e-scenario/onboarding_assertions/ (dead in NEW path) - 6 framework-tests that imported scenarios/*: e2e-assertion-modules, e2e-manifests, e2e-migration-inventory-lock, e2e-phase-orchestrators, e2e-plan-compiler, e2e-scenario-registry Workflow and validator updates: - .github/workflows/e2e-scenarios.yaml: replace 3 npx tsx ...run.ts calls with bash test/e2e-scenario/runtime/run-scenario.sh <id>, looping over comma-separated IDs in linux + WSL paths. - tools/e2e-scenarios/workflow-boundary.mts: assert the bash entrypoint. - test/e2e-scenario/framework-tests/e2e-scenarios-workflow.test.ts: update fixture to the new entrypoint. Bridge edit: - tools/e2e-advisor/scenarios.mts: swap listScenarios() (OLD registry) for loadMetadataFromDir() + resolveScenario() (NEW YAML resolver), so the deterministic scenario advisor still compiles after the deletion. Follow-up: - test/e2e-scenario-advisor.test.ts: 2 cases skipped pending #4378. nemoclaw_scenarios/scenarios.yaml setup_scenarios is missing user-friendly aliases for 13 layered test_plans (telegram, discord, slack, brave, resume, repair, double-*, token-rotation, openai-compatible, hermes-discord, hermes-slack). The OLD TS registry was the only thing that knew those IDs. #4378 (child of epic #3588 Phase 4) tracks adding the aliases. A separate session is overhauling the deterministic scenario advisor, which may obsolete this path entirely. Verification: - 267 framework + advisor tests pass; 2 skipped with #4378 reference. - bash test/e2e-scenario/runtime/run-scenario.sh <id> --plan-only succeeds for all 10 scenarios currently in setup_scenarios. Refs: #3588, #4378

coderabbitai · 2026-05-28T01:03:00Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 0fc17d40-fa82-4d69-aae3-901bb1811596

📥 Commits

Reviewing files that changed from the base of the PR and between 3abc9ef and 464dbac.

📒 Files selected for processing (1)

.github/workflows/e2e-scenarios.yaml

🚧 Files skipped from review as they are similar to previous changes (1)

.github/workflows/e2e-scenarios.yaml

📝 Walkthrough

Walkthrough

Migrates e2e scenario execution from a TypeScript registry/runner to a shell script dispatcher, updating GH Actions to call test/e2e-scenario/runtime/run-scenario.sh per scenario, adjusting tools/validators, skipping two advisor tests, and removing legacy assertion/client exports and one manifest.

Changes

E2E Scenario Execution Modernization

Layer / File(s)	Summary
Workflow dispatch to shell runner `.github/workflows/e2e-scenarios.yaml`, `test/e2e-scenario/framework-tests/e2e-scenarios-workflow.test.ts`	Workflow now invokes `bash test/e2e-scenario/runtime/run-scenario.sh` per scenario id (`--plan-only` in resolver, `--dry-run` in run steps), splits comma-separated `SCENARIOS`, sets WSL `E2E_CONTEXT_DIR` to `${{ github.workspace }}/.e2e`, and uploads the full `.e2e/` directory to artifacts. Workflow test updated to expect shell invocation.
Tools integration with runtime resolver `tools/e2e-advisor/scenarios.mts`, `tools/e2e-scenarios/workflow-boundary.mts`	E2E advisor builds catalog by loading metadata and resolving plans (`loadMetadataFromDir` + `resolveScenario`) and skips failing scenarios; workflow-boundary validator updated to require shell runner and `--dry-run`.
Test suite adaptation to new dispatch `test/e2e-scenario-advisor.test.ts`	Two advisor test cases are converted to `it.skip(...)` with comments referencing `#4378` (missing scenario alias resolution).
Deprecated module re-export cleanup `test/e2e-scenario/scenarios/assertions/`, `test/e2e-scenario/scenarios/clients/`, `test/e2e-scenario/manifests/hermes-nvidia.yaml`	Remove re-exports of `validationSuiteGroups`, remove small client observation stubs (`agent`, `provider`), and delete one manifest file as part of trimming the old TS-based scenario framework surface.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related issues

test(e2e): add setup_scenarios aliases for 13 layered test_plans (epic #3588 Phase 4) #4378: This PR references missing scenario alias resolution; reopening or addressing that issue is related to re-enabling skipped advisor tests.

Possibly related PRs

NVIDIA/NemoClaw#4004: Related changes to the scenario-advisor pipeline and advisor tests.
NVIDIA/NemoClaw#4081: Changes to e2e-advisor/scenarios sourcing and mapping logic.
NVIDIA/NemoClaw#4283: Prior PR that modified the typed scenario execution pipeline which this PR replaces.

Suggested labels

E2E, CI/CD, refactor, enhancement: testing

Suggested reviewers

cv
ericksoa

Poem

🐰 I hopped through manifests, tests, and YAML trails,
Swapped TS for shell so each scenario sails,
Split the lists, wrapped each id in a loop,
Artifacts in .e2e — a neat little stoop,
Hope the aliases return and we’ll re-enable the tales.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately describes the main change: deletion of the old TypeScript e2e-scenario framework (run.ts) and migration to the bash-based runner.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch chore/remove-e2e-scenario-run-ts

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

github-actions · 2026-05-28T01:05:28Z

E2E Advisor Recommendation

Required E2E: None
Optional E2E: None

Workflow run

Full advisor summary

E2E Recommendation Advisor

Failed: Could not parse JSON from advisor output; see /home/runner/work/NemoClaw/NemoClaw/artifacts/e2e-advisor/e2e-advisor-raw-output.txt

github-actions · 2026-05-28T01:05:30Z

E2E Scenario Advisor Recommendation

Required scenario E2E: e2e-scenarios-all
Optional scenario E2E: None

Dispatch required scenario E2E:

gh workflow run e2e-scenarios-all.yaml --ref <pr-head-ref>

Workflow run

Full scenario advisor summary

E2E Scenario Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required scenario E2E

e2e-scenarios-all: the reusable single-scenario workflow changed
- Dispatch: gh workflow run e2e-scenarios-all.yaml --ref <pr-head-ref>

Optional scenario E2E

None.

Relevant changed files

.github/workflows/e2e-scenarios.yaml

coderabbitai

🧹 Nitpick comments (1)

tools/e2e-advisor/scenarios.mts (1)

287-295: ⚡ Quick win

Surface skipped scenario resolution failures instead of swallowing them.

At Line 293, the empty catch makes advisor output silently incomplete when resolution breaks. Please at least record skipped IDs/errors so the result is auditable.

Proposed patch

 function loadScenarios(root: string): Record<string, ScenarioEntry> {
   const meta = loadMetadataFromDir(path.join(root, "test/e2e-scenario"));
   const out: Record<string, ScenarioEntry> = {};
+  const skipped: string[] = [];
   for (const id of Object.keys(meta.scenarios.setup_scenarios)) {
     try {
       const plan = resolveScenario(id, meta);
       out[id] = {
         suites: plan.suites.map((s) => s.id),
         runner_requirements: plan.runner_requirements ?? [],
       };
-    } catch {
-      // Skip scenarios that fail to resolve; they are not advisable targets.
+    } catch (error: unknown) {
+      skipped.push(
+        `${id}: ${error instanceof Error ? error.message : String(error)}`,
+      );
     }
   }
+  if (skipped.length > 0) {
+    console.warn(
+      `e2e-advisor: skipped ${skipped.length} unresolved scenario(s)\n- ${skipped.join("\n- ")}`,
+    );
+  }
   return out;
 }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tools/e2e-advisor/scenarios.mts` around lines 287 - 295, The empty catch
currently swallows failures from resolveScenario(id, meta); change it to catch
the error (e.g., catch (err)) and record the failure into the output so skipped
scenarios are auditable — for example, set out[id] to include an error field
(stringified error message/stack) and a flag like skipped: true or add the
id+error to a skipped_scenarios list; update the existing try block that builds
out[id] (using resolveScenario, plan.suites, plan.runner_requirements) so
failures populate out with diagnostic info rather than being ignored.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@tools/e2e-advisor/scenarios.mts`:
- Around line 287-295: The empty catch currently swallows failures from
resolveScenario(id, meta); change it to catch the error (e.g., catch (err)) and
record the failure into the output so skipped scenarios are auditable — for
example, set out[id] to include an error field (stringified error message/stack)
and a flag like skipped: true or add the id+error to a skipped_scenarios list;
update the existing try block that builds out[id] (using resolveScenario,
plan.suites, plan.runner_requirements) so failures populate out with diagnostic
info rather than being ignored.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 0c87d069-8a34-44e5-b663-3f9fd532a80a

📥 Commits

Reviewing files that changed from the base of the PR and between 1daf081 and 3abc9ef.

📒 Files selected for processing (69)

.github/workflows/e2e-scenarios.yaml
test/e2e-scenario-advisor.test.ts
test/e2e-scenario/framework-tests/e2e-assertion-modules.test.ts
test/e2e-scenario/framework-tests/e2e-manifests.test.ts
test/e2e-scenario/framework-tests/e2e-migration-inventory-lock.test.ts
test/e2e-scenario/framework-tests/e2e-phase-orchestrators.test.ts
test/e2e-scenario/framework-tests/e2e-plan-compiler.test.ts
test/e2e-scenario/framework-tests/e2e-scenario-registry.test.ts
test/e2e-scenario/framework-tests/e2e-scenarios-workflow.test.ts
test/e2e-scenario/manifests/hermes-nvidia-discord.yaml
test/e2e-scenario/manifests/hermes-nvidia-slack.yaml
test/e2e-scenario/manifests/hermes-nvidia.yaml
test/e2e-scenario/manifests/openclaw-nvidia-brave.yaml
test/e2e-scenario/manifests/openclaw-nvidia-brev-launchable.yaml
test/e2e-scenario/manifests/openclaw-nvidia-custom-policies.yaml
test/e2e-scenario/manifests/openclaw-nvidia-discord.yaml
test/e2e-scenario/manifests/openclaw-nvidia-double-provider-switch.yaml
test/e2e-scenario/manifests/openclaw-nvidia-double-same-provider.yaml
test/e2e-scenario/manifests/openclaw-nvidia-gateway-port-conflict.yaml
test/e2e-scenario/manifests/openclaw-nvidia-invalid-key.yaml
test/e2e-scenario/manifests/openclaw-nvidia-macos.yaml
test/e2e-scenario/manifests/openclaw-nvidia-no-docker-negative.yaml
test/e2e-scenario/manifests/openclaw-nvidia-repair.yaml
test/e2e-scenario/manifests/openclaw-nvidia-resume.yaml
test/e2e-scenario/manifests/openclaw-nvidia-slack.yaml
test/e2e-scenario/manifests/openclaw-nvidia-telegram.yaml
test/e2e-scenario/manifests/openclaw-nvidia-token-rotation.yaml
test/e2e-scenario/manifests/openclaw-nvidia-wsl.yaml
test/e2e-scenario/manifests/openclaw-nvidia.yaml
test/e2e-scenario/manifests/openclaw-ollama-gpu.yaml
test/e2e-scenario/manifests/openclaw-openai-compatible.yaml
test/e2e-scenario/onboarding_assertions/base/00-cli-installed.sh
test/e2e-scenario/onboarding_assertions/preflight/00-preflight-expected-failed.sh
test/e2e-scenario/onboarding_assertions/preflight/00-preflight-passed.sh
test/e2e-scenario/scenarios/assertions/diagnostics.ts
test/e2e-scenario/scenarios/assertions/environment.ts
test/e2e-scenario/scenarios/assertions/hermes.ts
test/e2e-scenario/scenarios/assertions/inference.ts
test/e2e-scenario/scenarios/assertions/lifecycle.ts
test/e2e-scenario/scenarios/assertions/messaging.ts
test/e2e-scenario/scenarios/assertions/negative.ts
test/e2e-scenario/scenarios/assertions/onboarding.ts
test/e2e-scenario/scenarios/assertions/platform.ts
test/e2e-scenario/scenarios/assertions/registry.ts
test/e2e-scenario/scenarios/assertions/runtime.ts
test/e2e-scenario/scenarios/assertions/security.ts
test/e2e-scenario/scenarios/builder.ts
test/e2e-scenario/scenarios/clients/agent.ts
test/e2e-scenario/scenarios/clients/gateway.ts
test/e2e-scenario/scenarios/clients/host-cli.ts
test/e2e-scenario/scenarios/clients/provider.ts
test/e2e-scenario/scenarios/clients/sandbox.ts
test/e2e-scenario/scenarios/clients/state.ts
test/e2e-scenario/scenarios/compiler.ts
test/e2e-scenario/scenarios/js-yaml.d.ts
test/e2e-scenario/scenarios/manifests.ts
test/e2e-scenario/scenarios/matrix.ts
test/e2e-scenario/scenarios/migration-inventory.ts
test/e2e-scenario/scenarios/orchestrators/environment.ts
test/e2e-scenario/scenarios/orchestrators/onboarding.ts
test/e2e-scenario/scenarios/orchestrators/phase.ts
test/e2e-scenario/scenarios/orchestrators/runner.ts
test/e2e-scenario/scenarios/orchestrators/runtime.ts
test/e2e-scenario/scenarios/registry.ts
test/e2e-scenario/scenarios/run.ts
test/e2e-scenario/scenarios/scenarios/baseline.ts
test/e2e-scenario/scenarios/types.ts
tools/e2e-advisor/scenarios.mts
tools/e2e-scenarios/workflow-boundary.mts

💤 Files with no reviewable changes (64)

test/e2e-scenario/scenarios/assertions/platform.ts
test/e2e-scenario/manifests/openclaw-ollama-gpu.yaml
test/e2e-scenario/scenarios/assertions/hermes.ts
test/e2e-scenario/manifests/openclaw-nvidia-repair.yaml
test/e2e-scenario/scenarios/clients/host-cli.ts
test/e2e-scenario/manifests/openclaw-nvidia-double-same-provider.yaml
test/e2e-scenario/manifests/openclaw-nvidia-gateway-port-conflict.yaml
test/e2e-scenario/manifests/hermes-nvidia-slack.yaml
test/e2e-scenario/scenarios/builder.ts
test/e2e-scenario/framework-tests/e2e-assertion-modules.test.ts
test/e2e-scenario/manifests/openclaw-openai-compatible.yaml
test/e2e-scenario/scenarios/clients/gateway.ts
test/e2e-scenario/scenarios/orchestrators/environment.ts
test/e2e-scenario/manifests/openclaw-nvidia-telegram.yaml
test/e2e-scenario/scenarios/orchestrators/runner.ts
test/e2e-scenario/scenarios/clients/sandbox.ts
test/e2e-scenario/manifests/hermes-nvidia.yaml
test/e2e-scenario/framework-tests/e2e-plan-compiler.test.ts
test/e2e-scenario/scenarios/orchestrators/runtime.ts
test/e2e-scenario/scenarios/orchestrators/phase.ts
test/e2e-scenario/manifests/openclaw-nvidia-invalid-key.yaml
test/e2e-scenario/scenarios/assertions/messaging.ts
test/e2e-scenario/manifests/openclaw-nvidia.yaml
test/e2e-scenario/manifests/openclaw-nvidia-no-docker-negative.yaml
test/e2e-scenario/scenarios/manifests.ts
test/e2e-scenario/manifests/openclaw-nvidia-brev-launchable.yaml
test/e2e-scenario/scenarios/assertions/negative.ts
test/e2e-scenario/scenarios/clients/agent.ts
test/e2e-scenario/manifests/openclaw-nvidia-token-rotation.yaml
test/e2e-scenario/framework-tests/e2e-manifests.test.ts
test/e2e-scenario/framework-tests/e2e-scenario-registry.test.ts
test/e2e-scenario/onboarding_assertions/base/00-cli-installed.sh
test/e2e-scenario/manifests/openclaw-nvidia-macos.yaml
test/e2e-scenario/scenarios/assertions/lifecycle.ts
test/e2e-scenario/scenarios/scenarios/baseline.ts
test/e2e-scenario/scenarios/assertions/environment.ts
test/e2e-scenario/scenarios/clients/provider.ts
test/e2e-scenario/scenarios/assertions/runtime.ts
test/e2e-scenario/scenarios/types.ts
test/e2e-scenario/manifests/openclaw-nvidia-wsl.yaml
test/e2e-scenario/manifests/hermes-nvidia-discord.yaml
test/e2e-scenario/framework-tests/e2e-migration-inventory-lock.test.ts
test/e2e-scenario/scenarios/clients/state.ts
test/e2e-scenario/framework-tests/e2e-phase-orchestrators.test.ts
test/e2e-scenario/manifests/openclaw-nvidia-brave.yaml
test/e2e-scenario/scenarios/migration-inventory.ts
test/e2e-scenario/scenarios/matrix.ts
test/e2e-scenario/scenarios/assertions/registry.ts
test/e2e-scenario/scenarios/assertions/diagnostics.ts
test/e2e-scenario/manifests/openclaw-nvidia-custom-policies.yaml
test/e2e-scenario/manifests/openclaw-nvidia-double-provider-switch.yaml
test/e2e-scenario/manifests/openclaw-nvidia-resume.yaml
test/e2e-scenario/onboarding_assertions/preflight/00-preflight-passed.sh
test/e2e-scenario/onboarding_assertions/preflight/00-preflight-expected-failed.sh
test/e2e-scenario/scenarios/js-yaml.d.ts
test/e2e-scenario/scenarios/assertions/security.ts
test/e2e-scenario/scenarios/registry.ts
test/e2e-scenario/manifests/openclaw-nvidia-slack.yaml
test/e2e-scenario/scenarios/run.ts
test/e2e-scenario/scenarios/compiler.ts
test/e2e-scenario/scenarios/assertions/onboarding.ts
test/e2e-scenario/scenarios/assertions/inference.ts
test/e2e-scenario/manifests/openclaw-nvidia-discord.yaml
test/e2e-scenario/scenarios/orchestrators/onboarding.ts

github-actions · 2026-05-28T01:13:02Z

PR Review Advisor

Findings: 2 needs attention, 7 worth checking, 0 nice ideas
Since last review: 0 prior items resolved, 7 still apply, 0 new items found

Review findings

🛠️ Needs attention

WSL scenario artifacts are written outside the uploaded workspace .e2e directory (.github/workflows/e2e-scenarios.yaml:162): The Linux path now writes run-scenario.sh outputs under `${{ github.workspace }}/.e2e`, but the WSL path overrides `E2E_CONTEXT_DIR` to `${WSL_WORKDIR}` inside the WSL VM. The later summary and upload steps still read the host workspace `.e2e/`, and there is no copy-back from `${WSL_WORKDIR}` to `${WSL_CHECKOUT_DIR}/.e2e`. WSL runs can therefore complete without publishing the `plan.json` and runtime logs reviewers need.
- Recommendation: After the WSL run, copy `${WSL_WORKDIR}` artifacts back into `${WSL_CHECKOUT_DIR}/.e2e` or set WSL `E2E_CONTEXT_DIR` to a path that maps to the checked-out workspace `.e2e`. Add a workflow-boundary or workflow fixture test that covers the WSL artifact path and specifically asserts `plan.json` is available for summary/upload.
- Evidence: Workflow sets job `E2E_CONTEXT_DIR: ${{ github.workspace }}/.e2e`, but the WSL bash block exports `E2E_CONTEXT_DIR="${WSL_WORKDIR}"`; the summary checks `[ -f .e2e/plan.json ]` and upload path is `.e2e/`.
Targeted scenario advisor coverage is skipped while the bridge only enumerates setup_scenarios (test/e2e-scenario-advisor.test.ts:46): The advisor now builds recommendations from YAML metadata, but `loadScenarios()` only iterates `setup_scenarios`. The layered `test_plans` include targeted messaging/security-style plans that are not exposed as setup aliases yet, and the two tests that would catch targeted validation-suite recommendations are skipped. A validation-suite change can therefore lose its targeted scenario recommendation without an active regression test.
- Recommendation: Either add the missing setup aliases in this PR or have the advisor enumerate resolvable `test_plans` directly. Re-enable the skipped targeted-suite tests and make resolver failures visible as diagnostics instead of silently dropping scenarios.
- Evidence: `test/e2e-scenario-advisor.test.ts` marks the targeted validation-suite tests with `it.skip`; `tools/e2e-advisor/scenarios.mts` iterates `Object.keys(meta.scenarios.setup_scenarios)`; `scenarios.yaml` contains layered plans such as `ubuntu-repo-docker__cloud-nvidia-openclaw-telegram` with `messaging-telegram` but no corresponding setup scenario alias for `ubuntu-repo-cloud-openclaw-telegram`.

🔎 Worth checking

Source-of-truth review needed: Skipped targeted scenario advisor tests pending test(e2e): add setup_scenarios aliases for 13 layered test_plans (epic #3588 Phase 4) #4378: The advisor marked localized patch analysis as needs_followup.
- Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
- Evidence: `test/e2e-scenario-advisor.test.ts` skips two targeted recommendation tests; `loadScenarios()` iterates only `meta.scenarios.setup_scenarios`.
Source-of-truth review needed: Advisor loadScenarios() silently skips resolver failures: The advisor marked localized patch analysis as needs_followup.
- Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
- Evidence: `tools/e2e-advisor/scenarios.mts` catches all errors around `resolveScenario(id, meta)` and does not record the skipped id or error.
Source-of-truth review needed: Deleted onboarding assertion scripts with remaining metadata script references: The advisor marked localized patch analysis as missing.
- Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
- Evidence: Deleted `test/e2e-scenario/onboarding_assertions/**` files are still referenced at the bottom of `test/e2e-scenario/nemoclaw_scenarios/scenarios.yaml`.
Source-of-truth review needed: Raw-secret metadata guard after manifest deletion: The advisor marked localized patch analysis as missing.
- Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
- Evidence: Deleted `SECRET_KEY_PATTERN` in `scenarios/manifests.ts`; no equivalent pattern or scan is present in `runtime/resolver/load.ts`.
Advisor silently skips scenarios whose YAML fails to resolve (tools/e2e-advisor/scenarios.mts:282): `loadScenarios()` catches all resolver errors and omits the scenario with no diagnostic. That turns metadata corruption into a missing recommendation instead of a visible advisor problem, which is risky while the YAML catalog is becoming the source of truth.
- Recommendation: Collect skipped scenario ids and resolver error messages in the advisor result/summary, or fail closed for malformed scenario metadata. Add a negative test with a broken scenario reference proving the advisor surfaces the problem.
- Evidence: `loadScenarios()` wraps `resolveScenario(id, meta)` in `try/catch` and the catch block only comments: `Skip scenarios that fail to resolve; they are not advisable targets.`
Metadata still references deleted onboarding assertion scripts (test/e2e-scenario/nemoclaw_scenarios/scenarios.yaml:553): This PR deletes the onboarding assertion shell scripts, but `scenarios.yaml` still declares `onboarding_assertions.*.script` paths pointing to those deleted files. The current runner may treat onboarding assertions as IDs only, but the YAML source of truth still advertises executable scripts and the replacement hygiene test only validates validation-suite scripts.
- Recommendation: Make the metadata match the new runtime contract: remove the stale `script` fields if onboarding assertions are ID-only, or restore the scripts and add a metadata hygiene test that validates every declared `onboarding_assertions.*.script` exists.
- Evidence: Deleted files include `test/e2e-scenario/onboarding_assertions/base/00-cli-installed.sh` and the two preflight scripts; `scenarios.yaml` still contains `script: onboarding_assertions/base/00-cli-installed.sh`, `script: onboarding_assertions/preflight/00-preflight-passed.sh`, and `script: onboarding_assertions/preflight/00-preflight-expected-failed.sh`; `e2e-metadata-final-hygiene.test.ts` only checks `meta.suites.suites[*].steps[*].script`.
Raw-secret metadata guard was deleted without an equivalent YAML guard (test/e2e-scenario/scenarios/manifests.ts:22): The old manifest loader rejected secret-looking literal fields and the old test covered that negative path. This PR deletes that loader and test while moving the active source of truth to YAML metadata, but the current YAML loader only validates shape, expected-failure blocks, and suite step fields. This weakens defense-in-depth against accidental API key/token/password commits in scenario metadata.
- Recommendation: Add an equivalent recursive raw-secret scan to `runtime/resolver/load.ts` or a dedicated metadata hygiene test over `nemoclaw_scenarios/*.yaml` and `validation_suites/suites.yaml`, with a negative fixture such as `apiKey: nvapi-literal-secret` or token/password-looking values outside allowed credential-reference names.
- Evidence: Deleted `manifests.ts` defined `SECRET_KEY_PATTERN` and rejected raw secret-looking values; deleted `e2e-manifests.test.ts` included `test_should_reject_raw_secret_values_in_manifest`; `runtime/resolver/load.ts` lacks an equivalent recursive secret-literal scan.

🌱 Nice ideas

None.

Since last review details

Current findings:

Source-of-truth review needed: Skipped targeted scenario advisor tests pending test(e2e): add setup_scenarios aliases for 13 layered test_plans (epic #3588 Phase 4) #4378: The advisor marked localized patch analysis as needs_followup.
- Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
- Evidence: `test/e2e-scenario-advisor.test.ts` skips two targeted recommendation tests; `loadScenarios()` iterates only `meta.scenarios.setup_scenarios`.
Source-of-truth review needed: Advisor loadScenarios() silently skips resolver failures: The advisor marked localized patch analysis as needs_followup.
- Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
- Evidence: `tools/e2e-advisor/scenarios.mts` catches all errors around `resolveScenario(id, meta)` and does not record the skipped id or error.
Source-of-truth review needed: Deleted onboarding assertion scripts with remaining metadata script references: The advisor marked localized patch analysis as missing.
- Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
- Evidence: Deleted `test/e2e-scenario/onboarding_assertions/**` files are still referenced at the bottom of `test/e2e-scenario/nemoclaw_scenarios/scenarios.yaml`.
Source-of-truth review needed: Raw-secret metadata guard after manifest deletion: The advisor marked localized patch analysis as missing.
- Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
- Evidence: Deleted `SECRET_KEY_PATTERN` in `scenarios/manifests.ts`; no equivalent pattern or scan is present in `runtime/resolver/load.ts`.
WSL scenario artifacts are written outside the uploaded workspace .e2e directory (.github/workflows/e2e-scenarios.yaml:162): The Linux path now writes run-scenario.sh outputs under `${{ github.workspace }}/.e2e`, but the WSL path overrides `E2E_CONTEXT_DIR` to `${WSL_WORKDIR}` inside the WSL VM. The later summary and upload steps still read the host workspace `.e2e/`, and there is no copy-back from `${WSL_WORKDIR}` to `${WSL_CHECKOUT_DIR}/.e2e`. WSL runs can therefore complete without publishing the `plan.json` and runtime logs reviewers need.
- Recommendation: After the WSL run, copy `${WSL_WORKDIR}` artifacts back into `${WSL_CHECKOUT_DIR}/.e2e` or set WSL `E2E_CONTEXT_DIR` to a path that maps to the checked-out workspace `.e2e`. Add a workflow-boundary or workflow fixture test that covers the WSL artifact path and specifically asserts `plan.json` is available for summary/upload.
- Evidence: Workflow sets job `E2E_CONTEXT_DIR: ${{ github.workspace }}/.e2e`, but the WSL bash block exports `E2E_CONTEXT_DIR="${WSL_WORKDIR}"`; the summary checks `[ -f .e2e/plan.json ]` and upload path is `.e2e/`.
Targeted scenario advisor coverage is skipped while the bridge only enumerates setup_scenarios (test/e2e-scenario-advisor.test.ts:46): The advisor now builds recommendations from YAML metadata, but `loadScenarios()` only iterates `setup_scenarios`. The layered `test_plans` include targeted messaging/security-style plans that are not exposed as setup aliases yet, and the two tests that would catch targeted validation-suite recommendations are skipped. A validation-suite change can therefore lose its targeted scenario recommendation without an active regression test.
- Recommendation: Either add the missing setup aliases in this PR or have the advisor enumerate resolvable `test_plans` directly. Re-enable the skipped targeted-suite tests and make resolver failures visible as diagnostics instead of silently dropping scenarios.
- Evidence: `test/e2e-scenario-advisor.test.ts` marks the targeted validation-suite tests with `it.skip`; `tools/e2e-advisor/scenarios.mts` iterates `Object.keys(meta.scenarios.setup_scenarios)`; `scenarios.yaml` contains layered plans such as `ubuntu-repo-docker__cloud-nvidia-openclaw-telegram` with `messaging-telegram` but no corresponding setup scenario alias for `ubuntu-repo-cloud-openclaw-telegram`.
Advisor silently skips scenarios whose YAML fails to resolve (tools/e2e-advisor/scenarios.mts:282): `loadScenarios()` catches all resolver errors and omits the scenario with no diagnostic. That turns metadata corruption into a missing recommendation instead of a visible advisor problem, which is risky while the YAML catalog is becoming the source of truth.
- Recommendation: Collect skipped scenario ids and resolver error messages in the advisor result/summary, or fail closed for malformed scenario metadata. Add a negative test with a broken scenario reference proving the advisor surfaces the problem.
- Evidence: `loadScenarios()` wraps `resolveScenario(id, meta)` in `try/catch` and the catch block only comments: `Skip scenarios that fail to resolve; they are not advisable targets.`
Metadata still references deleted onboarding assertion scripts (test/e2e-scenario/nemoclaw_scenarios/scenarios.yaml:553): This PR deletes the onboarding assertion shell scripts, but `scenarios.yaml` still declares `onboarding_assertions.*.script` paths pointing to those deleted files. The current runner may treat onboarding assertions as IDs only, but the YAML source of truth still advertises executable scripts and the replacement hygiene test only validates validation-suite scripts.
- Recommendation: Make the metadata match the new runtime contract: remove the stale `script` fields if onboarding assertions are ID-only, or restore the scripts and add a metadata hygiene test that validates every declared `onboarding_assertions.*.script` exists.
- Evidence: Deleted files include `test/e2e-scenario/onboarding_assertions/base/00-cli-installed.sh` and the two preflight scripts; `scenarios.yaml` still contains `script: onboarding_assertions/base/00-cli-installed.sh`, `script: onboarding_assertions/preflight/00-preflight-passed.sh`, and `script: onboarding_assertions/preflight/00-preflight-expected-failed.sh`; `e2e-metadata-final-hygiene.test.ts` only checks `meta.suites.suites[*].steps[*].script`.
Raw-secret metadata guard was deleted without an equivalent YAML guard (test/e2e-scenario/scenarios/manifests.ts:22): The old manifest loader rejected secret-looking literal fields and the old test covered that negative path. This PR deletes that loader and test while moving the active source of truth to YAML metadata, but the current YAML loader only validates shape, expected-failure blocks, and suite step fields. This weakens defense-in-depth against accidental API key/token/password commits in scenario metadata.
- Recommendation: Add an equivalent recursive raw-secret scan to `runtime/resolver/load.ts` or a dedicated metadata hygiene test over `nemoclaw_scenarios/*.yaml` and `validation_suites/suites.yaml`, with a negative fixture such as `apiKey: nvapi-literal-secret` or token/password-looking values outside allowed credential-reference names.
- Evidence: Deleted `manifests.ts` defined `SECRET_KEY_PATTERN` and rejected raw secret-looking values; deleted `e2e-manifests.test.ts` included `test_should_reject_raw_secret_values_in_manifest`; `runtime/resolver/load.ts` lacks an equivalent recursive secret-literal scan.

Workflow run details

This is an automated advisory review. A human maintainer must make the final merge decision.

…ired flag Closes two of the three sub-points from the PR Review Advisor's 'Validate scenario IDs against ROUTES' finding. The remaining sub-point (membership-in-ROUTES allowlist) is deferred to a follow-up that lands after #4379, since ROUTES will move with that PR and the runtime resolve-runner job already rejects unknown ids at dispatch. In tools/e2e-advisor/scenarios.mts: - e2e-scenarios-all.yaml fan-out is the only valid pairing for the synthetic id 'e2e-scenarios-all'. Drop incoherent pairings in either direction so the sticky comment never renders a fan-out command for a single scenario id, or a single-scenario --field scenarios= line carrying the synthetic fan-out id. - Authority for the required boolean is array position: items in required[] are required; items in optional[] are optional. The model's per-item value is ignored so it cannot promote/demote a recommendation against its bucket. In test/e2e-scenario-advisor.test.ts: - Negative test: workflow/id pairing mismatches in both directions are dropped; only the valid (e2e-scenarios-all.yaml, e2e-scenarios-all) pairing survives. - Negative test: model-supplied required boolean cannot override the array position. Signed-off-by: Julie Yaunches <jyaunches@nvidia.com>

After switching CI from npx tsx run.ts to bash run-scenario.sh in this PR, the workflow's summary and upload steps still referenced the deleted TS runner's artifact names (plan.txt, run-plan.json, *.result.json). With E2E_CONTEXT_DIR overridden to ${{ github.workspace }} (workspace root), plan.json was not even landing under .e2e/, so the upload step warned 'No files were found with the provided path: .e2e/run-plan.json' on the failing scenario run, publishing no useful plan artifact for reviewers. Changes: - Set E2E_CONTEXT_DIR to ${{ github.workspace }}/.e2e so run-scenario.sh writes plan.json/expected-state-report.json/expected-vs-actual.json and install.log/onboard.log/etc. under .e2e/ as the upload glob expects. - Update the plan summary step to read plan.json (the YAML resolver's actual output) instead of the deleted plan.txt. - Trim the upload list to .e2e/ (covers all current runtime artifacts) plus test/e2e/logs/, removing references to deleted-runner files. Refs: #4378, advisor finding from PR #4379 review-advisor run. Signed-off-by: Justine Yaunches <jyaunches@nvidia.com>

coderabbitai Bot reviewed May 28, 2026

View reviewed changes

jyaunches added the v0.0.55 Release target label May 28, 2026

Conversation

jyaunches commented May 28, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

What's deleted

What's updated

Follow-up

Verification

Stats

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented May 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related issues

Possibly related PRs

Suggested labels

Suggested reviewers

Poem

❌ Failed checks (1 warning)

Uh oh!

github-actions Bot commented May 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

E2E Advisor Recommendation

E2E Recommendation Advisor

Uh oh!

github-actions Bot commented May 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

E2E Scenario Advisor Recommendation

E2E Scenario Advisor

Required scenario E2E

Optional scenario E2E

Relevant changed files

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

github-actions Bot commented May 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review Advisor

🛠️ Needs attention

🔎 Worth checking

🌱 Nice ideas

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

jyaunches commented May 28, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 28, 2026 •

edited

Loading

github-actions Bot commented May 28, 2026 •

edited

Loading

github-actions Bot commented May 28, 2026 •

edited

Loading

github-actions Bot commented May 28, 2026 •

edited

Loading