Skip to content

test(e2e-scenario): delete obsolete run.ts framework, switch CI to run-scenario.sh#4379

Open
jyaunches wants to merge 2 commits into
mainfrom
chore/remove-e2e-scenario-run-ts
Open

test(e2e-scenario): delete obsolete run.ts framework, switch CI to run-scenario.sh#4379
jyaunches wants to merge 2 commits into
mainfrom
chore/remove-e2e-scenario-run-ts

Conversation

@jyaunches
Copy link
Copy Markdown
Contributor

@jyaunches jyaunches commented May 28, 2026

Summary

The hybrid scenario suite under test/e2e-scenario/ had two runner entrypoints:

  • test/e2e-scenario/scenarios/run.ts — OLD: TypeScript DSL + builders
  • test/e2e-scenario/runtime/run-scenario.sh — NEW: YAML metadata + bash runtime

The new bash runner (via runtime/resolver/*) is self-contained and does not import anything from scenarios/. All recently-developed tests target the new path. The OLD TS framework was kept only to satisfy CI (which still called run.ts) and a handful of framework-tests that exercised the OLD DSL.

This PR deletes the OLD path and migrates CI to the new bash entrypoint.

What's deleted

OLD framework — only consumed by run.ts:

  • test/e2e-scenario/scenarios/ — entire TS DSL (33 files: run.ts, compiler.ts, registry.ts, builder.ts, manifests.ts, matrix.ts, migration-inventory.ts, types.ts, js-yaml.d.ts, orchestrators/, scenarios/baseline.ts, assertions/, clients/)
  • test/e2e-scenario/manifests/ — only consumed by scenarios/manifests.ts
  • test/e2e-scenario/onboarding_assertions/ — dead in NEW path (no helper sources or runs them; YAML declares string IDs but the bash runner never invokes the .sh files)
  • 6 framework-tests that imported scenarios/*:
    • e2e-assertion-modules.test.ts
    • e2e-manifests.test.ts
    • e2e-migration-inventory-lock.test.ts
    • e2e-phase-orchestrators.test.ts
    • e2e-plan-compiler.test.ts
    • e2e-scenario-registry.test.ts

What's updated

  • .github/workflows/e2e-scenarios.yaml: replaced 3 npx tsx ...run.ts invocations with bash test/e2e-scenario/runtime/run-scenario.sh <id>, looping over comma-separated IDs in both the linux and WSL paths.
  • tools/e2e-scenarios/workflow-boundary.mts: assert the bash entrypoint instead of run.ts.
  • test/e2e-scenario/framework-tests/e2e-scenarios-workflow.test.ts: update fixture to the new entrypoint.
  • tools/e2e-advisor/scenarios.mts (bridge edit): swap listScenarios() (OLD registry) for loadMetadataFromDir() + resolveScenario() (NEW YAML resolver), so the deterministic scenario advisor still compiles after the deletion.

Follow-up

test/e2e-scenario-advisor.test.ts: 2 cases skipped pending #4378.

While doing the bridge edit on scenarios.mts, we discovered that nemoclaw_scenarios/scenarios.yaml's setup_scenarios: is missing user-friendly aliases for 13 layered test_plans (telegram, discord, slack, brave, resume, repair, double-same-provider, double-provider-switch, token-rotation, openai-compatible, hermes-discord, hermes-slack). The OLD TS registry was the only thing that knew those IDs. #4378 (child of epic #3588 Phase 4) tracks adding the aliases. A separate session is also overhauling the deterministic scenario advisor and may obsolete this code path entirely.

Verification

  • 267 framework + advisor tests pass; 2 skipped with #4378 reference.
  • bash test/e2e-scenario/runtime/run-scenario.sh <id> --plan-only succeeds for all 10 scenarios currently in setup_scenarios:.

Stats

69 files changed, 42 insertions(+), 3210 deletions(-)

Refs: #3588, #4378

Summary by CodeRabbit

  • Chores
    • CI now runs scenarios via a shell-based runner and uploads the full e2e output directory for artifacts.
  • Tests
    • Large portion of legacy end-to-end test suites, scenario manifests, and assertion groups removed or temporarily disabled; workflow boundary validations updated to use the new runner.
  • Refactor
    • Scenario metadata resolution moved to runtime-based discovery.

Review Change Stack

…n-scenario.sh

The hybrid scenario suite had two runner entrypoints:
  - test/e2e-scenario/scenarios/run.ts        (OLD: TS DSL + builders)
  - test/e2e-scenario/runtime/run-scenario.sh (NEW: YAML metadata + bash)

The new bash runner via runtime/resolver/* is self-contained and does not
import from scenarios/. All recently-developed tests target the new path.
The OLD TS framework was kept only to satisfy CI (which was still calling
run.ts) and a handful of framework-tests that exercised the OLD DSL.

This commit removes the OLD path and migrates CI.

Deleted (OLD framework, only used by run.ts):
  - test/e2e-scenario/scenarios/              (entire TS DSL, 33 files)
  - test/e2e-scenario/manifests/              (only consumed by scenarios/manifests.ts)
  - test/e2e-scenario/onboarding_assertions/  (dead in NEW path)
  - 6 framework-tests that imported scenarios/*:
      e2e-assertion-modules, e2e-manifests, e2e-migration-inventory-lock,
      e2e-phase-orchestrators, e2e-plan-compiler, e2e-scenario-registry

Workflow and validator updates:
  - .github/workflows/e2e-scenarios.yaml: replace 3 npx tsx ...run.ts calls
    with bash test/e2e-scenario/runtime/run-scenario.sh <id>, looping over
    comma-separated IDs in linux + WSL paths.
  - tools/e2e-scenarios/workflow-boundary.mts: assert the bash entrypoint.
  - test/e2e-scenario/framework-tests/e2e-scenarios-workflow.test.ts:
    update fixture to the new entrypoint.

Bridge edit:
  - tools/e2e-advisor/scenarios.mts: swap listScenarios() (OLD registry)
    for loadMetadataFromDir() + resolveScenario() (NEW YAML resolver),
    so the deterministic scenario advisor still compiles after the deletion.

Follow-up:
  - test/e2e-scenario-advisor.test.ts: 2 cases skipped pending #4378.
    nemoclaw_scenarios/scenarios.yaml setup_scenarios is missing user-friendly
    aliases for 13 layered test_plans (telegram, discord, slack, brave,
    resume, repair, double-*, token-rotation, openai-compatible,
    hermes-discord, hermes-slack). The OLD TS registry was the only thing
    that knew those IDs. #4378 (child of epic #3588 Phase 4) tracks adding
    the aliases. A separate session is overhauling the deterministic
    scenario advisor, which may obsolete this path entirely.

Verification:
  - 267 framework + advisor tests pass; 2 skipped with #4378 reference.
  - bash test/e2e-scenario/runtime/run-scenario.sh <id> --plan-only succeeds
    for all 10 scenarios currently in setup_scenarios.

Refs: #3588, #4378
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 28, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 0fc17d40-fa82-4d69-aae3-901bb1811596

📥 Commits

Reviewing files that changed from the base of the PR and between 3abc9ef and 464dbac.

📒 Files selected for processing (1)
  • .github/workflows/e2e-scenarios.yaml
🚧 Files skipped from review as they are similar to previous changes (1)
  • .github/workflows/e2e-scenarios.yaml

📝 Walkthrough

Walkthrough

Migrates e2e scenario execution from a TypeScript registry/runner to a shell script dispatcher, updating GH Actions to call test/e2e-scenario/runtime/run-scenario.sh per scenario, adjusting tools/validators, skipping two advisor tests, and removing legacy assertion/client exports and one manifest.

Changes

E2E Scenario Execution Modernization

Layer / File(s) Summary
Workflow dispatch to shell runner
.github/workflows/e2e-scenarios.yaml, test/e2e-scenario/framework-tests/e2e-scenarios-workflow.test.ts
Workflow now invokes bash test/e2e-scenario/runtime/run-scenario.sh per scenario id (--plan-only in resolver, --dry-run in run steps), splits comma-separated SCENARIOS, sets WSL E2E_CONTEXT_DIR to ${{ github.workspace }}/.e2e, and uploads the full .e2e/ directory to artifacts. Workflow test updated to expect shell invocation.
Tools integration with runtime resolver
tools/e2e-advisor/scenarios.mts, tools/e2e-scenarios/workflow-boundary.mts
E2E advisor builds catalog by loading metadata and resolving plans (loadMetadataFromDir + resolveScenario) and skips failing scenarios; workflow-boundary validator updated to require shell runner and --dry-run.
Test suite adaptation to new dispatch
test/e2e-scenario-advisor.test.ts
Two advisor test cases are converted to it.skip(...) with comments referencing #4378 (missing scenario alias resolution).
Deprecated module re-export cleanup
test/e2e-scenario/scenarios/assertions/*, test/e2e-scenario/scenarios/clients/*, test/e2e-scenario/manifests/hermes-nvidia.yaml
Remove re-exports of validationSuiteGroups, remove small client observation stubs (agent, provider), and delete one manifest file as part of trimming the old TS-based scenario framework surface.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related issues

Possibly related PRs

Suggested labels

E2E, CI/CD, refactor, enhancement: testing

Suggested reviewers

  • cv
  • ericksoa

Poem

🐰 I hopped through manifests, tests, and YAML trails,
Swapped TS for shell so each scenario sails,
Split the lists, wrapped each id in a loop,
Artifacts in .e2e — a neat little stoop,
Hope the aliases return and we’ll re-enable the tales.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: deletion of the old TypeScript e2e-scenario framework (run.ts) and migration to the bash-based runner.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch chore/remove-e2e-scenario-run-ts

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 28, 2026

E2E Advisor Recommendation

Required E2E: None
Optional E2E: None

Workflow run

Full advisor summary

E2E Recommendation Advisor

Failed: Could not parse JSON from advisor output; see /home/runner/work/NemoClaw/NemoClaw/artifacts/e2e-advisor/e2e-advisor-raw-output.txt

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 28, 2026

E2E Scenario Advisor Recommendation

Required scenario E2E: e2e-scenarios-all
Optional scenario E2E: None

Dispatch required scenario E2E:

  • gh workflow run e2e-scenarios-all.yaml --ref <pr-head-ref>

Workflow run

Full scenario advisor summary

E2E Scenario Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required scenario E2E

  • e2e-scenarios-all: the reusable single-scenario workflow changed
    • Dispatch: gh workflow run e2e-scenarios-all.yaml --ref <pr-head-ref>

Optional scenario E2E

  • None.

Relevant changed files

  • .github/workflows/e2e-scenarios.yaml

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
tools/e2e-advisor/scenarios.mts (1)

287-295: ⚡ Quick win

Surface skipped scenario resolution failures instead of swallowing them.

At Line 293, the empty catch makes advisor output silently incomplete when resolution breaks. Please at least record skipped IDs/errors so the result is auditable.

Proposed patch
 function loadScenarios(root: string): Record<string, ScenarioEntry> {
   const meta = loadMetadataFromDir(path.join(root, "test/e2e-scenario"));
   const out: Record<string, ScenarioEntry> = {};
+  const skipped: string[] = [];
   for (const id of Object.keys(meta.scenarios.setup_scenarios)) {
     try {
       const plan = resolveScenario(id, meta);
       out[id] = {
         suites: plan.suites.map((s) => s.id),
         runner_requirements: plan.runner_requirements ?? [],
       };
-    } catch {
-      // Skip scenarios that fail to resolve; they are not advisable targets.
+    } catch (error: unknown) {
+      skipped.push(
+        `${id}: ${error instanceof Error ? error.message : String(error)}`,
+      );
     }
   }
+  if (skipped.length > 0) {
+    console.warn(
+      `e2e-advisor: skipped ${skipped.length} unresolved scenario(s)\n- ${skipped.join("\n- ")}`,
+    );
+  }
   return out;
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tools/e2e-advisor/scenarios.mts` around lines 287 - 295, The empty catch
currently swallows failures from resolveScenario(id, meta); change it to catch
the error (e.g., catch (err)) and record the failure into the output so skipped
scenarios are auditable — for example, set out[id] to include an error field
(stringified error message/stack) and a flag like skipped: true or add the
id+error to a skipped_scenarios list; update the existing try block that builds
out[id] (using resolveScenario, plan.suites, plan.runner_requirements) so
failures populate out with diagnostic info rather than being ignored.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@tools/e2e-advisor/scenarios.mts`:
- Around line 287-295: The empty catch currently swallows failures from
resolveScenario(id, meta); change it to catch the error (e.g., catch (err)) and
record the failure into the output so skipped scenarios are auditable — for
example, set out[id] to include an error field (stringified error message/stack)
and a flag like skipped: true or add the id+error to a skipped_scenarios list;
update the existing try block that builds out[id] (using resolveScenario,
plan.suites, plan.runner_requirements) so failures populate out with diagnostic
info rather than being ignored.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 0c87d069-8a34-44e5-b663-3f9fd532a80a

📥 Commits

Reviewing files that changed from the base of the PR and between 1daf081 and 3abc9ef.

📒 Files selected for processing (69)
  • .github/workflows/e2e-scenarios.yaml
  • test/e2e-scenario-advisor.test.ts
  • test/e2e-scenario/framework-tests/e2e-assertion-modules.test.ts
  • test/e2e-scenario/framework-tests/e2e-manifests.test.ts
  • test/e2e-scenario/framework-tests/e2e-migration-inventory-lock.test.ts
  • test/e2e-scenario/framework-tests/e2e-phase-orchestrators.test.ts
  • test/e2e-scenario/framework-tests/e2e-plan-compiler.test.ts
  • test/e2e-scenario/framework-tests/e2e-scenario-registry.test.ts
  • test/e2e-scenario/framework-tests/e2e-scenarios-workflow.test.ts
  • test/e2e-scenario/manifests/hermes-nvidia-discord.yaml
  • test/e2e-scenario/manifests/hermes-nvidia-slack.yaml
  • test/e2e-scenario/manifests/hermes-nvidia.yaml
  • test/e2e-scenario/manifests/openclaw-nvidia-brave.yaml
  • test/e2e-scenario/manifests/openclaw-nvidia-brev-launchable.yaml
  • test/e2e-scenario/manifests/openclaw-nvidia-custom-policies.yaml
  • test/e2e-scenario/manifests/openclaw-nvidia-discord.yaml
  • test/e2e-scenario/manifests/openclaw-nvidia-double-provider-switch.yaml
  • test/e2e-scenario/manifests/openclaw-nvidia-double-same-provider.yaml
  • test/e2e-scenario/manifests/openclaw-nvidia-gateway-port-conflict.yaml
  • test/e2e-scenario/manifests/openclaw-nvidia-invalid-key.yaml
  • test/e2e-scenario/manifests/openclaw-nvidia-macos.yaml
  • test/e2e-scenario/manifests/openclaw-nvidia-no-docker-negative.yaml
  • test/e2e-scenario/manifests/openclaw-nvidia-repair.yaml
  • test/e2e-scenario/manifests/openclaw-nvidia-resume.yaml
  • test/e2e-scenario/manifests/openclaw-nvidia-slack.yaml
  • test/e2e-scenario/manifests/openclaw-nvidia-telegram.yaml
  • test/e2e-scenario/manifests/openclaw-nvidia-token-rotation.yaml
  • test/e2e-scenario/manifests/openclaw-nvidia-wsl.yaml
  • test/e2e-scenario/manifests/openclaw-nvidia.yaml
  • test/e2e-scenario/manifests/openclaw-ollama-gpu.yaml
  • test/e2e-scenario/manifests/openclaw-openai-compatible.yaml
  • test/e2e-scenario/onboarding_assertions/base/00-cli-installed.sh
  • test/e2e-scenario/onboarding_assertions/preflight/00-preflight-expected-failed.sh
  • test/e2e-scenario/onboarding_assertions/preflight/00-preflight-passed.sh
  • test/e2e-scenario/scenarios/assertions/diagnostics.ts
  • test/e2e-scenario/scenarios/assertions/environment.ts
  • test/e2e-scenario/scenarios/assertions/hermes.ts
  • test/e2e-scenario/scenarios/assertions/inference.ts
  • test/e2e-scenario/scenarios/assertions/lifecycle.ts
  • test/e2e-scenario/scenarios/assertions/messaging.ts
  • test/e2e-scenario/scenarios/assertions/negative.ts
  • test/e2e-scenario/scenarios/assertions/onboarding.ts
  • test/e2e-scenario/scenarios/assertions/platform.ts
  • test/e2e-scenario/scenarios/assertions/registry.ts
  • test/e2e-scenario/scenarios/assertions/runtime.ts
  • test/e2e-scenario/scenarios/assertions/security.ts
  • test/e2e-scenario/scenarios/builder.ts
  • test/e2e-scenario/scenarios/clients/agent.ts
  • test/e2e-scenario/scenarios/clients/gateway.ts
  • test/e2e-scenario/scenarios/clients/host-cli.ts
  • test/e2e-scenario/scenarios/clients/provider.ts
  • test/e2e-scenario/scenarios/clients/sandbox.ts
  • test/e2e-scenario/scenarios/clients/state.ts
  • test/e2e-scenario/scenarios/compiler.ts
  • test/e2e-scenario/scenarios/js-yaml.d.ts
  • test/e2e-scenario/scenarios/manifests.ts
  • test/e2e-scenario/scenarios/matrix.ts
  • test/e2e-scenario/scenarios/migration-inventory.ts
  • test/e2e-scenario/scenarios/orchestrators/environment.ts
  • test/e2e-scenario/scenarios/orchestrators/onboarding.ts
  • test/e2e-scenario/scenarios/orchestrators/phase.ts
  • test/e2e-scenario/scenarios/orchestrators/runner.ts
  • test/e2e-scenario/scenarios/orchestrators/runtime.ts
  • test/e2e-scenario/scenarios/registry.ts
  • test/e2e-scenario/scenarios/run.ts
  • test/e2e-scenario/scenarios/scenarios/baseline.ts
  • test/e2e-scenario/scenarios/types.ts
  • tools/e2e-advisor/scenarios.mts
  • tools/e2e-scenarios/workflow-boundary.mts
💤 Files with no reviewable changes (64)
  • test/e2e-scenario/scenarios/assertions/platform.ts
  • test/e2e-scenario/manifests/openclaw-ollama-gpu.yaml
  • test/e2e-scenario/scenarios/assertions/hermes.ts
  • test/e2e-scenario/manifests/openclaw-nvidia-repair.yaml
  • test/e2e-scenario/scenarios/clients/host-cli.ts
  • test/e2e-scenario/manifests/openclaw-nvidia-double-same-provider.yaml
  • test/e2e-scenario/manifests/openclaw-nvidia-gateway-port-conflict.yaml
  • test/e2e-scenario/manifests/hermes-nvidia-slack.yaml
  • test/e2e-scenario/scenarios/builder.ts
  • test/e2e-scenario/framework-tests/e2e-assertion-modules.test.ts
  • test/e2e-scenario/manifests/openclaw-openai-compatible.yaml
  • test/e2e-scenario/scenarios/clients/gateway.ts
  • test/e2e-scenario/scenarios/orchestrators/environment.ts
  • test/e2e-scenario/manifests/openclaw-nvidia-telegram.yaml
  • test/e2e-scenario/scenarios/orchestrators/runner.ts
  • test/e2e-scenario/scenarios/clients/sandbox.ts
  • test/e2e-scenario/manifests/hermes-nvidia.yaml
  • test/e2e-scenario/framework-tests/e2e-plan-compiler.test.ts
  • test/e2e-scenario/scenarios/orchestrators/runtime.ts
  • test/e2e-scenario/scenarios/orchestrators/phase.ts
  • test/e2e-scenario/manifests/openclaw-nvidia-invalid-key.yaml
  • test/e2e-scenario/scenarios/assertions/messaging.ts
  • test/e2e-scenario/manifests/openclaw-nvidia.yaml
  • test/e2e-scenario/manifests/openclaw-nvidia-no-docker-negative.yaml
  • test/e2e-scenario/scenarios/manifests.ts
  • test/e2e-scenario/manifests/openclaw-nvidia-brev-launchable.yaml
  • test/e2e-scenario/scenarios/assertions/negative.ts
  • test/e2e-scenario/scenarios/clients/agent.ts
  • test/e2e-scenario/manifests/openclaw-nvidia-token-rotation.yaml
  • test/e2e-scenario/framework-tests/e2e-manifests.test.ts
  • test/e2e-scenario/framework-tests/e2e-scenario-registry.test.ts
  • test/e2e-scenario/onboarding_assertions/base/00-cli-installed.sh
  • test/e2e-scenario/manifests/openclaw-nvidia-macos.yaml
  • test/e2e-scenario/scenarios/assertions/lifecycle.ts
  • test/e2e-scenario/scenarios/scenarios/baseline.ts
  • test/e2e-scenario/scenarios/assertions/environment.ts
  • test/e2e-scenario/scenarios/clients/provider.ts
  • test/e2e-scenario/scenarios/assertions/runtime.ts
  • test/e2e-scenario/scenarios/types.ts
  • test/e2e-scenario/manifests/openclaw-nvidia-wsl.yaml
  • test/e2e-scenario/manifests/hermes-nvidia-discord.yaml
  • test/e2e-scenario/framework-tests/e2e-migration-inventory-lock.test.ts
  • test/e2e-scenario/scenarios/clients/state.ts
  • test/e2e-scenario/framework-tests/e2e-phase-orchestrators.test.ts
  • test/e2e-scenario/manifests/openclaw-nvidia-brave.yaml
  • test/e2e-scenario/scenarios/migration-inventory.ts
  • test/e2e-scenario/scenarios/matrix.ts
  • test/e2e-scenario/scenarios/assertions/registry.ts
  • test/e2e-scenario/scenarios/assertions/diagnostics.ts
  • test/e2e-scenario/manifests/openclaw-nvidia-custom-policies.yaml
  • test/e2e-scenario/manifests/openclaw-nvidia-double-provider-switch.yaml
  • test/e2e-scenario/manifests/openclaw-nvidia-resume.yaml
  • test/e2e-scenario/onboarding_assertions/preflight/00-preflight-passed.sh
  • test/e2e-scenario/onboarding_assertions/preflight/00-preflight-expected-failed.sh
  • test/e2e-scenario/scenarios/js-yaml.d.ts
  • test/e2e-scenario/scenarios/assertions/security.ts
  • test/e2e-scenario/scenarios/registry.ts
  • test/e2e-scenario/manifests/openclaw-nvidia-slack.yaml
  • test/e2e-scenario/scenarios/run.ts
  • test/e2e-scenario/scenarios/compiler.ts
  • test/e2e-scenario/scenarios/assertions/onboarding.ts
  • test/e2e-scenario/scenarios/assertions/inference.ts
  • test/e2e-scenario/manifests/openclaw-nvidia-discord.yaml
  • test/e2e-scenario/scenarios/orchestrators/onboarding.ts

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 28, 2026

PR Review Advisor

Findings: 2 needs attention, 7 worth checking, 0 nice ideas
Since last review: 0 prior items resolved, 7 still apply, 0 new items found

Review findings

🛠️ Needs attention

  • WSL scenario artifacts are written outside the uploaded workspace .e2e directory (.github/workflows/e2e-scenarios.yaml:162): The Linux path now writes run-scenario.sh outputs under `${{ github.workspace }}/.e2e`, but the WSL path overrides `E2E_CONTEXT_DIR` to `${WSL_WORKDIR}` inside the WSL VM. The later summary and upload steps still read the host workspace `.e2e/`, and there is no copy-back from `${WSL_WORKDIR}` to `${WSL_CHECKOUT_DIR}/.e2e`. WSL runs can therefore complete without publishing the `plan.json` and runtime logs reviewers need.
    • Recommendation: After the WSL run, copy `${WSL_WORKDIR}` artifacts back into `${WSL_CHECKOUT_DIR}/.e2e` or set WSL `E2E_CONTEXT_DIR` to a path that maps to the checked-out workspace `.e2e`. Add a workflow-boundary or workflow fixture test that covers the WSL artifact path and specifically asserts `plan.json` is available for summary/upload.
    • Evidence: Workflow sets job `E2E_CONTEXT_DIR: ${{ github.workspace }}/.e2e`, but the WSL bash block exports `E2E_CONTEXT_DIR="${WSL_WORKDIR}"`; the summary checks `[ -f .e2e/plan.json ]` and upload path is `.e2e/`.
  • Targeted scenario advisor coverage is skipped while the bridge only enumerates setup_scenarios (test/e2e-scenario-advisor.test.ts:46): The advisor now builds recommendations from YAML metadata, but `loadScenarios()` only iterates `setup_scenarios`. The layered `test_plans` include targeted messaging/security-style plans that are not exposed as setup aliases yet, and the two tests that would catch targeted validation-suite recommendations are skipped. A validation-suite change can therefore lose its targeted scenario recommendation without an active regression test.
    • Recommendation: Either add the missing setup aliases in this PR or have the advisor enumerate resolvable `test_plans` directly. Re-enable the skipped targeted-suite tests and make resolver failures visible as diagnostics instead of silently dropping scenarios.
    • Evidence: `test/e2e-scenario-advisor.test.ts` marks the targeted validation-suite tests with `it.skip`; `tools/e2e-advisor/scenarios.mts` iterates `Object.keys(meta.scenarios.setup_scenarios)`; `scenarios.yaml` contains layered plans such as `ubuntu-repo-docker__cloud-nvidia-openclaw-telegram` with `messaging-telegram` but no corresponding setup scenario alias for `ubuntu-repo-cloud-openclaw-telegram`.

🔎 Worth checking

  • Source-of-truth review needed: Skipped targeted scenario advisor tests pending test(e2e): add setup_scenarios aliases for 13 layered test_plans (epic #3588 Phase 4) #4378: The advisor marked localized patch analysis as needs_followup.
    • Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
    • Evidence: `test/e2e-scenario-advisor.test.ts` skips two targeted recommendation tests; `loadScenarios()` iterates only `meta.scenarios.setup_scenarios`.
  • Source-of-truth review needed: Advisor loadScenarios() silently skips resolver failures: The advisor marked localized patch analysis as needs_followup.
    • Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
    • Evidence: `tools/e2e-advisor/scenarios.mts` catches all errors around `resolveScenario(id, meta)` and does not record the skipped id or error.
  • Source-of-truth review needed: Deleted onboarding assertion scripts with remaining metadata script references: The advisor marked localized patch analysis as missing.
    • Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
    • Evidence: Deleted `test/e2e-scenario/onboarding_assertions/**` files are still referenced at the bottom of `test/e2e-scenario/nemoclaw_scenarios/scenarios.yaml`.
  • Source-of-truth review needed: Raw-secret metadata guard after manifest deletion: The advisor marked localized patch analysis as missing.
    • Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
    • Evidence: Deleted `SECRET_KEY_PATTERN` in `scenarios/manifests.ts`; no equivalent pattern or scan is present in `runtime/resolver/load.ts`.
  • Advisor silently skips scenarios whose YAML fails to resolve (tools/e2e-advisor/scenarios.mts:282): `loadScenarios()` catches all resolver errors and omits the scenario with no diagnostic. That turns metadata corruption into a missing recommendation instead of a visible advisor problem, which is risky while the YAML catalog is becoming the source of truth.
    • Recommendation: Collect skipped scenario ids and resolver error messages in the advisor result/summary, or fail closed for malformed scenario metadata. Add a negative test with a broken scenario reference proving the advisor surfaces the problem.
    • Evidence: `loadScenarios()` wraps `resolveScenario(id, meta)` in `try/catch` and the catch block only comments: `Skip scenarios that fail to resolve; they are not advisable targets.`
  • Metadata still references deleted onboarding assertion scripts (test/e2e-scenario/nemoclaw_scenarios/scenarios.yaml:553): This PR deletes the onboarding assertion shell scripts, but `scenarios.yaml` still declares `onboarding_assertions.*.script` paths pointing to those deleted files. The current runner may treat onboarding assertions as IDs only, but the YAML source of truth still advertises executable scripts and the replacement hygiene test only validates validation-suite scripts.
    • Recommendation: Make the metadata match the new runtime contract: remove the stale `script` fields if onboarding assertions are ID-only, or restore the scripts and add a metadata hygiene test that validates every declared `onboarding_assertions.*.script` exists.
    • Evidence: Deleted files include `test/e2e-scenario/onboarding_assertions/base/00-cli-installed.sh` and the two preflight scripts; `scenarios.yaml` still contains `script: onboarding_assertions/base/00-cli-installed.sh`, `script: onboarding_assertions/preflight/00-preflight-passed.sh`, and `script: onboarding_assertions/preflight/00-preflight-expected-failed.sh`; `e2e-metadata-final-hygiene.test.ts` only checks `meta.suites.suites[*].steps[*].script`.
  • Raw-secret metadata guard was deleted without an equivalent YAML guard (test/e2e-scenario/scenarios/manifests.ts:22): The old manifest loader rejected secret-looking literal fields and the old test covered that negative path. This PR deletes that loader and test while moving the active source of truth to YAML metadata, but the current YAML loader only validates shape, expected-failure blocks, and suite step fields. This weakens defense-in-depth against accidental API key/token/password commits in scenario metadata.
    • Recommendation: Add an equivalent recursive raw-secret scan to `runtime/resolver/load.ts` or a dedicated metadata hygiene test over `nemoclaw_scenarios/*.yaml` and `validation_suites/suites.yaml`, with a negative fixture such as `apiKey: nvapi-literal-secret` or token/password-looking values outside allowed credential-reference names.
    • Evidence: Deleted `manifests.ts` defined `SECRET_KEY_PATTERN` and rejected raw secret-looking values; deleted `e2e-manifests.test.ts` included `test_should_reject_raw_secret_values_in_manifest`; `runtime/resolver/load.ts` lacks an equivalent recursive secret-literal scan.

🌱 Nice ideas

  • None.
Since last review details

Current findings:

  • Source-of-truth review needed: Skipped targeted scenario advisor tests pending test(e2e): add setup_scenarios aliases for 13 layered test_plans (epic #3588 Phase 4) #4378: The advisor marked localized patch analysis as needs_followup.
    • Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
    • Evidence: `test/e2e-scenario-advisor.test.ts` skips two targeted recommendation tests; `loadScenarios()` iterates only `meta.scenarios.setup_scenarios`.
  • Source-of-truth review needed: Advisor loadScenarios() silently skips resolver failures: The advisor marked localized patch analysis as needs_followup.
    • Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
    • Evidence: `tools/e2e-advisor/scenarios.mts` catches all errors around `resolveScenario(id, meta)` and does not record the skipped id or error.
  • Source-of-truth review needed: Deleted onboarding assertion scripts with remaining metadata script references: The advisor marked localized patch analysis as missing.
    • Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
    • Evidence: Deleted `test/e2e-scenario/onboarding_assertions/**` files are still referenced at the bottom of `test/e2e-scenario/nemoclaw_scenarios/scenarios.yaml`.
  • Source-of-truth review needed: Raw-secret metadata guard after manifest deletion: The advisor marked localized patch analysis as missing.
    • Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
    • Evidence: Deleted `SECRET_KEY_PATTERN` in `scenarios/manifests.ts`; no equivalent pattern or scan is present in `runtime/resolver/load.ts`.
  • WSL scenario artifacts are written outside the uploaded workspace .e2e directory (.github/workflows/e2e-scenarios.yaml:162): The Linux path now writes run-scenario.sh outputs under `${{ github.workspace }}/.e2e`, but the WSL path overrides `E2E_CONTEXT_DIR` to `${WSL_WORKDIR}` inside the WSL VM. The later summary and upload steps still read the host workspace `.e2e/`, and there is no copy-back from `${WSL_WORKDIR}` to `${WSL_CHECKOUT_DIR}/.e2e`. WSL runs can therefore complete without publishing the `plan.json` and runtime logs reviewers need.
    • Recommendation: After the WSL run, copy `${WSL_WORKDIR}` artifacts back into `${WSL_CHECKOUT_DIR}/.e2e` or set WSL `E2E_CONTEXT_DIR` to a path that maps to the checked-out workspace `.e2e`. Add a workflow-boundary or workflow fixture test that covers the WSL artifact path and specifically asserts `plan.json` is available for summary/upload.
    • Evidence: Workflow sets job `E2E_CONTEXT_DIR: ${{ github.workspace }}/.e2e`, but the WSL bash block exports `E2E_CONTEXT_DIR="${WSL_WORKDIR}"`; the summary checks `[ -f .e2e/plan.json ]` and upload path is `.e2e/`.
  • Targeted scenario advisor coverage is skipped while the bridge only enumerates setup_scenarios (test/e2e-scenario-advisor.test.ts:46): The advisor now builds recommendations from YAML metadata, but `loadScenarios()` only iterates `setup_scenarios`. The layered `test_plans` include targeted messaging/security-style plans that are not exposed as setup aliases yet, and the two tests that would catch targeted validation-suite recommendations are skipped. A validation-suite change can therefore lose its targeted scenario recommendation without an active regression test.
    • Recommendation: Either add the missing setup aliases in this PR or have the advisor enumerate resolvable `test_plans` directly. Re-enable the skipped targeted-suite tests and make resolver failures visible as diagnostics instead of silently dropping scenarios.
    • Evidence: `test/e2e-scenario-advisor.test.ts` marks the targeted validation-suite tests with `it.skip`; `tools/e2e-advisor/scenarios.mts` iterates `Object.keys(meta.scenarios.setup_scenarios)`; `scenarios.yaml` contains layered plans such as `ubuntu-repo-docker__cloud-nvidia-openclaw-telegram` with `messaging-telegram` but no corresponding setup scenario alias for `ubuntu-repo-cloud-openclaw-telegram`.
  • Advisor silently skips scenarios whose YAML fails to resolve (tools/e2e-advisor/scenarios.mts:282): `loadScenarios()` catches all resolver errors and omits the scenario with no diagnostic. That turns metadata corruption into a missing recommendation instead of a visible advisor problem, which is risky while the YAML catalog is becoming the source of truth.
    • Recommendation: Collect skipped scenario ids and resolver error messages in the advisor result/summary, or fail closed for malformed scenario metadata. Add a negative test with a broken scenario reference proving the advisor surfaces the problem.
    • Evidence: `loadScenarios()` wraps `resolveScenario(id, meta)` in `try/catch` and the catch block only comments: `Skip scenarios that fail to resolve; they are not advisable targets.`
  • Metadata still references deleted onboarding assertion scripts (test/e2e-scenario/nemoclaw_scenarios/scenarios.yaml:553): This PR deletes the onboarding assertion shell scripts, but `scenarios.yaml` still declares `onboarding_assertions.*.script` paths pointing to those deleted files. The current runner may treat onboarding assertions as IDs only, but the YAML source of truth still advertises executable scripts and the replacement hygiene test only validates validation-suite scripts.
    • Recommendation: Make the metadata match the new runtime contract: remove the stale `script` fields if onboarding assertions are ID-only, or restore the scripts and add a metadata hygiene test that validates every declared `onboarding_assertions.*.script` exists.
    • Evidence: Deleted files include `test/e2e-scenario/onboarding_assertions/base/00-cli-installed.sh` and the two preflight scripts; `scenarios.yaml` still contains `script: onboarding_assertions/base/00-cli-installed.sh`, `script: onboarding_assertions/preflight/00-preflight-passed.sh`, and `script: onboarding_assertions/preflight/00-preflight-expected-failed.sh`; `e2e-metadata-final-hygiene.test.ts` only checks `meta.suites.suites[*].steps[*].script`.
  • Raw-secret metadata guard was deleted without an equivalent YAML guard (test/e2e-scenario/scenarios/manifests.ts:22): The old manifest loader rejected secret-looking literal fields and the old test covered that negative path. This PR deletes that loader and test while moving the active source of truth to YAML metadata, but the current YAML loader only validates shape, expected-failure blocks, and suite step fields. This weakens defense-in-depth against accidental API key/token/password commits in scenario metadata.
    • Recommendation: Add an equivalent recursive raw-secret scan to `runtime/resolver/load.ts` or a dedicated metadata hygiene test over `nemoclaw_scenarios/*.yaml` and `validation_suites/suites.yaml`, with a negative fixture such as `apiKey: nvapi-literal-secret` or token/password-looking values outside allowed credential-reference names.
    • Evidence: Deleted `manifests.ts` defined `SECRET_KEY_PATTERN` and rejected raw secret-looking values; deleted `e2e-manifests.test.ts` included `test_should_reject_raw_secret_values_in_manifest`; `runtime/resolver/load.ts` lacks an equivalent recursive secret-literal scan.

Workflow run details

This is an automated advisory review. A human maintainer must make the final merge decision.

@jyaunches jyaunches added the v0.0.55 Release target label May 28, 2026
jyaunches added a commit that referenced this pull request May 28, 2026
…ired flag

Closes two of the three sub-points from the PR Review Advisor's
'Validate scenario IDs against ROUTES' finding. The remaining
sub-point (membership-in-ROUTES allowlist) is deferred to a follow-up
that lands after #4379, since ROUTES will move with that PR and the
runtime resolve-runner job already rejects unknown ids at dispatch.

In tools/e2e-advisor/scenarios.mts:
- e2e-scenarios-all.yaml fan-out is the only valid pairing for the
  synthetic id 'e2e-scenarios-all'. Drop incoherent pairings in either
  direction so the sticky comment never renders a fan-out command for
  a single scenario id, or a single-scenario --field scenarios= line
  carrying the synthetic fan-out id.
- Authority for the required boolean is array position: items in
  required[] are required; items in optional[] are optional. The
  model's per-item value is ignored so it cannot promote/demote a
  recommendation against its bucket.

In test/e2e-scenario-advisor.test.ts:
- Negative test: workflow/id pairing mismatches in both directions
  are dropped; only the valid (e2e-scenarios-all.yaml,
  e2e-scenarios-all) pairing survives.
- Negative test: model-supplied required boolean cannot override the
  array position.

Signed-off-by: Julie Yaunches <jyaunches@nvidia.com>
After switching CI from npx tsx run.ts to bash run-scenario.sh in this PR,
the workflow's summary and upload steps still referenced the deleted TS
runner's artifact names (plan.txt, run-plan.json, *.result.json). With
E2E_CONTEXT_DIR overridden to ${{ github.workspace }} (workspace root),
plan.json was not even landing under .e2e/, so the upload step warned
'No files were found with the provided path: .e2e/run-plan.json' on the
failing scenario run, publishing no useful plan artifact for reviewers.

Changes:
  - Set E2E_CONTEXT_DIR to ${{ github.workspace }}/.e2e so run-scenario.sh
    writes plan.json/expected-state-report.json/expected-vs-actual.json
    and install.log/onboard.log/etc. under .e2e/ as the upload glob expects.
  - Update the plan summary step to read plan.json (the YAML resolver's
    actual output) instead of the deleted plan.txt.
  - Trim the upload list to .e2e/ (covers all current runtime artifacts)
    plus test/e2e/logs/, removing references to deleted-runner files.

Refs: #4378, advisor finding from PR #4379 review-advisor run.
Signed-off-by: Justine Yaunches <jyaunches@nvidia.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

v0.0.55 Release target

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant