feat(e2e): generate scenario fan-out matrix from typed registry by jyaunches · Pull Request #4359 · NVIDIA/NemoClaw

jyaunches · 2026-05-27T22:10:44Z

Why

The e2e-scenarios-all.yaml fan-out has 8 hand-wired jobs, but the typed scenario registry in baseline.ts already defines 22 scenarios. The other 14 are defined-but-not-dispatched, and every new scenario today requires a YAML edit that's easy to forget. The hand-wired job names also no longer reflect what each scenario actually tests (manifest, suites, expectedState).

This PR makes the fan-out generated from the existing typed registry so the workflow can never drift from baseline.ts again.

What changed

`--emit-matrix` flag on `run.ts`

Walks listScenarios() (the existing registry — no new registry created) and prints a single-line JSON array of GHA matrix include entries:

[
  { "id": "ubuntu-repo-cloud-openclaw",
    "runner": "ubuntu-latest",
    "label": "ubuntu-local · ubuntu-repo-cloud-openclaw · smoke+inference+credentials",
    "platform": "ubuntu-local",
    "suites": ["smoke", "inference", "credentials"] },
  ...
]
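As a rough illustration of the output contract above, the sketch below builds one matrix entry and serializes the array on a single line. The `SketchScenario` and `SketchMatrixEntry` types, the body of `buildLabel`, and `toMatrixEntry` are assumptions made for this example; the label format is only inferred from the sample entry, and the real `run.ts` may differ.

```typescript
// Hypothetical, simplified scenario shape; the real registry type is richer.
interface SketchScenario {
  id: string;
  platform: string;
  suiteIds: string[];
}

interface SketchMatrixEntry {
  id: string;
  runner: string;
  label: string;
  platform: string;
  suites: string[];
}

// Label format inferred from the sample entry: "<platform> · <id> · <suite>+<suite>".
function buildLabel(s: SketchScenario): string {
  return [s.platform, s.id, s.suiteIds.join("+")].join(" · ");
}

// The runner is passed in here; in the PR it is derived by runner-routing.ts.
function toMatrixEntry(s: SketchScenario, runner: string): SketchMatrixEntry {
  return { id: s.id, runner, label: buildLabel(s), platform: s.platform, suites: s.suiteIds };
}

// JSON.stringify without an indent argument emits no raw newlines,
// so the result fits on one line.
function emitMatrix(entries: SketchMatrixEntry[]): string {
  return JSON.stringify(entries);
}

const sample: SketchScenario = {
  id: "ubuntu-repo-cloud-openclaw",
  platform: "ubuntu-local",
  suiteIds: ["smoke", "inference", "credentials"],
};
const line = emitMatrix([toMatrixEntry(sample, "ubuntu-latest")]);
console.log(line);
```

The pretty-printed array shown above is for readability only; the emitted form is the compact one printed by this sketch.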

`runner-routing.ts` helper

Derives the runs-on label from ScenarioEnvironment.platform, with a runs-on:<label> override path via runnerRequirements. Throws on unknown platforms so missing mappings fail loudly during matrix generation rather than silently falling back to ubuntu-latest (which used to mask routing bugs in the legacy bash ROUTES map).
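The routing rule described here could look roughly like the following. This is a minimal sketch, not the PR's file: `SketchScenario` is a simplified stand-in for the real scenario type, and every platform key other than `ubuntu-local` is an assumption. The runner labels are the ones listed under Verification.

```typescript
// Hypothetical, simplified scenario shape for this sketch.
interface SketchScenario {
  id: string;
  environment?: { platform?: string };
  runnerRequirements?: string[];
}

interface ResolvedRunner {
  runner: string;
  reason: string;
}

// Assumed platform keys: only "ubuntu-local" appears in the PR description.
const PLATFORM_DEFAULT_RUNNERS = new Map<string, string>([
  ["ubuntu-local", "ubuntu-latest"],
  ["macos-local", "macos-26"],
  ["wsl-local", "windows-latest"],
  ["gpu-local", "linux-amd64-gpu-rtxpro6000-latest-1"],
]);

function resolveRunnerForScenario(scenario: SketchScenario): ResolvedRunner {
  // An explicit "runs-on:<label>" requirement takes precedence over the platform default.
  const explicit = (scenario.runnerRequirements ?? []).find((req) => req.startsWith("runs-on:"));
  if (explicit) {
    return { runner: explicit.slice("runs-on:".length), reason: "runnerRequirements override" };
  }

  const platform = scenario.environment?.platform;
  const runner = platform === undefined ? undefined : PLATFORM_DEFAULT_RUNNERS.get(platform);
  if (runner === undefined) {
    // Fail loudly instead of silently falling back to ubuntu-latest.
    throw new Error(
      `No runner mapping for platform '${platform ?? "unknown"}' (scenario '${scenario.id}')`,
    );
  }
  return { runner, reason: "platform default" };
}

const resolved = resolveRunnerForScenario({
  id: "ubuntu-repo-cloud-openclaw",
  environment: { platform: "ubuntu-local" },
});
console.log(resolved.runner, "via", resolved.reason);
```

Using a `Map` rather than an object literal makes the "unknown platform" case an explicit `undefined` check, which is what lets the function throw rather than return a default.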

`e2e-scenarios-all.yaml` refactor

Two-job shape:

generate-matrix — checks out, installs deps, runs --emit-matrix, and writes the JSON to $GITHUB_OUTPUT. Also renders a Markdown table of all matrix entries in the step summary so you can see the full plan at a glance.
run-scenario — uses strategy.matrix.include: ${{ fromJson(needs.generate-matrix.outputs.matrix) }} and calls the existing e2e-scenarios.yaml reusable workflow per scenario. Tile names use ${{ matrix.label }} so the sidebar reflects what's actually being tested.
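To make the data flow between the two jobs concrete, here is a small sketch of what `generate-matrix` has to produce: one `matrix=<json>` line for `$GITHUB_OUTPUT` and a Markdown table for the step summary. The workflow itself presumably does this in shell; the function names and the table columns below are assumptions chosen for illustration.

```typescript
interface MatrixEntry {
  id: string;
  runner: string;
  label: string;
  platform: string;
  suites: string[];
}

// The simple "name=value" form of $GITHUB_OUTPUT only works when the value
// has no newline, which is why the matrix is serialized as single-line JSON.
function toGithubOutputLine(name: string, entries: MatrixEntry[]): string {
  if (entries.length === 0) {
    // An empty matrix would leave the fan-out job with nothing to run.
    throw new Error("Refusing to emit an empty matrix");
  }
  return `${name}=${JSON.stringify(entries)}`;
}

// Escape pipes so a cell value cannot break the table layout.
function escapeCell(text: string): string {
  return text.replace(/\|/g, "\\|");
}

// Column choice is an assumption; the PR only says a table of entries is rendered.
function renderSummaryTable(entries: MatrixEntry[]): string {
  const header = ["| Scenario | Platform | Runner | Suites |", "| --- | --- | --- | --- |"];
  const rows = entries.map(
    (e) =>
      `| ${escapeCell(e.id)} | ${escapeCell(e.platform)} | ${escapeCell(e.runner)} | ${escapeCell(e.suites.join(", "))} |`,
  );
  return [...header, ...rows].join("\n");
}

const entries: MatrixEntry[] = [
  {
    id: "ubuntu-repo-cloud-openclaw",
    runner: "ubuntu-latest",
    label: "ubuntu-local · ubuntu-repo-cloud-openclaw · smoke+inference+credentials",
    platform: "ubuntu-local",
    suites: ["smoke", "inference", "credentials"],
  },
];

const outputLine = toGithubOutputLine("matrix", entries);
const summary = renderSummaryTable(entries);
console.log(outputLine);
console.log(summary);
```

On the consuming side, `fromJson(needs.generate-matrix.outputs.matrix)` turns that single line back into the `include` list.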

Adding a scenario from now on

Add the entry to canonicalScenarioInputs in baseline.ts.
That's it. The next workflow run automatically picks it up as a new tile.

What didn't change

e2e-scenarios.yaml (the single-scenario reusable workflow) is untouched. Its resolve-runner job continues to be the authoritative runner-selection path during execution; the runner field in the matrix is informational and used for the sidebar label.
The bash ROUTES map in e2e-scenarios.yaml is left in place for now to keep this PR focused. It becomes effectively dead code once this lands and can be removed in a follow-up.
No new registry. Everything reads from test/e2e-scenario/scenarios/registry.ts.

Verification

Unit tests in e2e-scenario-matrix.test.ts cover:

matrix entry per registered scenario
runner resolves for every scenario
platform-default routing (ubuntu/macos/wsl/gpu)
explicit runs-on:<label> override
loud failure on unknown platform
single-line JSON output suitable for $GITHUB_OUTPUT

Local run:

✓ test/e2e-scenario/framework-tests/e2e-scenario-matrix.test.ts (6 tests)
✓ test/e2e-scenario/framework-tests/e2e-scenario-registry.test.ts (6 tests)
✓ test/e2e-scenario/framework-tests/e2e-scenarios-workflow.test.ts (2 tests)
✓ test/e2e-scenario/framework-tests/e2e-plan-compiler.test.ts
✓ test/e2e-scenario/framework-tests/e2e-scenario-resolver.test.ts
58 tests passed

Manual smoke:

$ npx tsx test/e2e-scenario/scenarios/run.ts --emit-matrix | jq 'length'
22
$ npx tsx test/e2e-scenario/scenarios/run.ts --emit-matrix | jq -r '.[].runner' | sort -u
linux-amd64-gpu-rtxpro6000-latest-1
macos-26
ubuntu-latest
windows-latest

Test depth

Unit tests are sufficient — this is a build/CI plumbing change, not a behavioral change to the test runner or scenarios themselves. The actual scenario execution path (e2e-scenarios.yaml) is unchanged. After merge, the next manual workflow_dispatch of E2E / Scenario Runner / All will validate the matrix end-to-end.

Follow-ups (separate PRs)

Delete the now-dead ROUTES bash map in e2e-scenarios.yaml and have its resolve-runner job consume runner-routing.ts too.
Consider sharding e2e-scenarios-all.yaml by platform once we cross ~80 scenarios (currently at 22, plenty of headroom against the 256-job GHA ceiling).

Summary by CodeRabbit

Tests
- Added comprehensive test coverage for E2E scenario matrix generation and runner routing validation.
Chores
- CI/CD workflow now dynamically generates end-to-end test scenarios at runtime instead of static job definitions.
- Enhanced scenario runner to support emitting test matrix configuration for GitHub Actions.

Replace the hand-wired job list in e2e-scenarios-all.yaml with a generated matrix sourced from the existing typed scenario registry. Adding a scenario in test/e2e-scenario/scenarios/scenarios/baseline.ts now automatically produces a tile in the fan-out workflow on the next run, with no workflow edits required. This closes the drift gap that left ~14 of 22 registered scenarios undispatched.

Changes:

- Add --emit-matrix flag to test/e2e-scenario/scenarios/run.ts that walks listScenarios() and prints a single-line JSON array of GHA matrix include entries: { id, runner, label, platform, suites }.
- Extract runner-routing.ts that derives the runs-on label from ScenarioEnvironment.platform, with a runs-on:<label> override path via runnerRequirements. Throws on unknown platforms so missing mappings fail loudly during matrix generation.
- Refactor .github/workflows/e2e-scenarios-all.yaml to a two-job shape: generate-matrix emits the JSON, run-scenario fans out via strategy.matrix.include and calls the existing e2e-scenarios.yaml reusable workflow. Tile names come from the typed label so the sidebar reflects what is actually being tested.
- Guard the run.ts CLI bootstrap so importing buildScenarioMatrix from tests does not trigger the side-effecting main() path.
- Add e2e-scenario-matrix.test.ts covering matrix shape, registry parity, runner overrides, unknown-platform errors, and CLI output contract.
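The CLI bootstrap guard mentioned in the commit message can be sketched as a comparison between the module's own URL and the script Node was started with. This is one common way to write such a guard, not necessarily how `run.ts` does it; the two-argument signature is an assumption made here so the function can be exercised without `import.meta`.

```typescript
import { resolve } from "node:path";
import { pathToFileURL } from "node:url";

// True when the module at `moduleUrl` is the script Node was started with.
// In an ESM entry point, `moduleUrl` would typically be import.meta.url and
// `entryPath` would be process.argv[1].
function isInvokedDirectly(moduleUrl: string, entryPath: string | undefined): boolean {
  if (!entryPath) return false;
  return moduleUrl === pathToFileURL(resolve(entryPath)).href;
}

// Simulate both situations with plain paths.
const scriptPath = resolve("test/e2e-scenario/scenarios/run.ts");
const scriptUrl = pathToFileURL(scriptPath).href;

// Started as the CLI: argv[1] points at the script itself.
console.log(isInvokedDirectly(scriptUrl, scriptPath));

// Imported from a test: argv[1] points at some other entry file (hypothetical path).
console.log(isInvokedDirectly(scriptUrl, resolve("some/test-runner/entry.mjs")));
```

With a guard like this, `main()` only runs in the first case, so a test can import `buildScenarioMatrix` without triggering argument parsing or process exit handling. A plain string comparison does not follow symlinks, so a script launched through a symlinked path may need the real path resolved first.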

copy-pr-bot · 2026-05-27T22:10:48Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

coderabbitai · 2026-05-27T22:10:57Z

Walkthrough

This PR replaces static GitHub Actions job enumeration with a runtime-generated matrix by introducing typed runner resolution logic, extending the scenario script with a --emit-matrix CLI flag, and updating the workflow to invoke the matrix generator and fan out jobs dynamically via GitHub's matrix strategy.

Changes

Dynamic Scenario Matrix Generation

Layer / File(s)	Summary
Runner routing foundation `test/e2e-scenario/scenarios/runner-routing.ts`	Exports `ResolvedRunner` interface and platform-to-runner default mapping, then implements `resolveRunnerForScenario()` to select a runner via explicit `runs-on:` override precedence or platform-based fallback, throwing on unmapped platforms.
Scenario matrix construction `test/e2e-scenario/scenarios/run.ts` (imports, CLI, build functions)	Adds `--emit-matrix` CLI option and imports runner resolution; introduces `buildScenarioMatrix()` to map all registry scenarios to typed `ScenarioMatrixEntry` objects with resolved runner labels and formatted labels combining platform, scenario id, suite grouping, and expected-failure class.
Matrix emission and module guarding `test/e2e-scenario/scenarios/run.ts` (emission, module guards)	Implements `emitMatrix()` to output JSON to stdout; wraps main execution with `isInvokedDirectly()` comparison so the script can be imported by tests; adds try/catch error handling and `process.exitCode` control.
GitHub Actions workflow integration `.github/workflows/e2e-scenarios-all.yaml`	Updates header comment; replaces hardcoded per-scenario jobs with a `generate-matrix` job that runs the scenario script, validates non-empty output, renders a summary table, and a single `run-scenario` reusable-workflow job that iterates over the dynamic matrix to fan out one execution per scenario.
E2E matrix tests `test/e2e-scenario/framework-tests/e2e-scenario-matrix.test.ts`	Complete test suite that spawns the scenario runner script and validates coverage of all registered scenarios, correct runner/label structure, platform-to-runner routing with explicit overrides, single-line JSON output compatibility with `$GITHUB_OUTPUT`, and error handling for unmapped platforms.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested labels

CI/CD, E2E

Suggested reviewers

cv

Poem

🐰 A matrix rises from the registry code,
No hardcoded jobs on the testing road!
Platform to runner, scenario to stage—
Dynamic fan-out writes the new page! ✨
The test validates the flow, precise and bright,
Let the scenarios run—overnight by night!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The PR title accurately summarizes the main change: generating a scenario fan-out matrix from a typed registry instead of static YAML configuration. It clearly reflects the primary objective described in the PR objectives.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint skipped: no ESLint configuration detected in root package.json. To enable, add eslint to devDependencies.

github-actions · 2026-05-27T22:12:21Z

E2E Advisor Recommendation

Required E2E: e2e-scenarios-all
Optional E2E: e2e-scenarios:ubuntu-repo-cloud-openclaw

Dispatch hint: workflow_dispatch; no inputs

Workflow run

Full advisor summary

E2E Recommendation Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required E2E

e2e-scenarios-all (high; fans out all registered typed scenarios and may use GPU, macOS, Windows/WSL, and Ubuntu runners): Required because the all-scenarios fan-out workflow and generated scenario matrix are the primary changed surface. Running it validates matrix emission from the typed registry and the reusable scenario workflow path for every registered scenario, including platform-specific runner routing.

Optional E2E

e2e-scenarios:ubuntu-repo-cloud-openclaw (medium): Useful quick smoke of the reusable single-scenario workflow and the updated scenario runner CLI path before or while the full all-scenarios fan-out runs. Not a substitute for the required all-scenarios workflow because it does not cover special-runner matrix routing.

New E2E recommendations

runner-routing-consistency (high): The new TypeScript runner resolver is used to emit the all-scenarios matrix, while the reusable e2e-scenarios.yaml workflow still resolves runners internally. Add coverage that proves the generated matrix routes and the reusable workflow runner selection stay consistent, or refactor the workflow to consume the emitted runner directly.
- Suggested test: Add a workflow-boundary or scenario E2E integration check that compares every typed scenario's emitted runner with the runner selected by e2e-scenarios.yaml before dispatch.

Dispatch hint

Workflow: e2e-scenarios-all.yaml
jobs input: workflow_dispatch; no inputs

github-actions · 2026-05-27T22:12:22Z

E2E Scenario Advisor Recommendation

Required scenario E2E: e2e-scenarios-all
Optional scenario E2E: None

Dispatch required scenario E2E:

gh workflow run e2e-scenarios-all.yaml --ref <pr-head-ref>

Workflow run

Full scenario advisor summary

E2E Scenario Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required scenario E2E

e2e-scenarios-all: the all-scenarios fan-out workflow changed
- Dispatch: gh workflow run e2e-scenarios-all.yaml --ref <pr-head-ref>

Optional scenario E2E

None.

Relevant changed files

.github/workflows/e2e-scenarios-all.yaml

github-actions · 2026-05-27T22:14:24Z

PR Review Advisor

Findings: 1 needs attention, 3 worth checking, 0 nice ideas
Top item: Generated fan-out includes scenarios child workflow cannot route

Review findings

🛠️ Needs attention

Generated fan-out includes scenarios the called workflow still rejects (.github/workflows/e2e-scenarios-all.yaml:80): The all-scenarios workflow now builds its matrix from every typed registry entry and passes each `matrix.id` to `.github/workflows/e2e-scenarios.yaml`. That called workflow remains the authoritative execution path and still has a hardcoded `ROUTES` map that lacks current registry IDs such as `ubuntu-repo-cloud-openclaw-custom-policies`, `ubuntu-invalid-nvidia-key-negative`, and `ubuntu-gateway-port-conflict-negative`. Those generated jobs will reach the child workflow's `No runner route for scenario` error, so the PR's acceptance claim that adding to `baseline.ts` is sufficient is not currently true.
- Recommendation: Make the executable routing source match the generated matrix before expanding the fan-out: either update/remove the child workflow `ROUTES` map to consume the same validated typed router, or filter the all-workflow matrix to only IDs the child workflow can currently execute. Add a static workflow contract test that verifies every generated `listScenarios()` ID is accepted by the called workflow.
- Evidence: `e2e-scenarios-all.yaml` uses `include: ${{ fromJson(needs.generate-matrix.outputs.matrix) }}` and passes `scenarios: ${{ matrix.id }}`. `run.ts` builds one entry per `listScenarios()`. The unchanged child workflow's `ROUTES` map includes `ubuntu-repo-cloud-openclaw-token-rotation` and `ubuntu-repo-openai-compatible-openclaw` but not the later registry IDs at `baseline.ts` lines 230, 238, and 252.
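The recommended contract test boils down to a set difference between the generated matrix IDs and the IDs the called workflow can route. The sketch below shows only that comparison; how the `ROUTES` keys would be extracted from `e2e-scenarios.yaml` is left out, and the sample sets are an illustrative subset built from the IDs named in this finding, not the real contents of either source.

```typescript
// Returns the scenario IDs the generated matrix would dispatch
// but the called workflow's route table does not recognize.
function findUnroutedScenarioIds(
  matrixIds: readonly string[],
  routedIds: ReadonlySet<string>,
): string[] {
  return matrixIds.filter((id) => !routedIds.has(id)).sort();
}

// Illustrative subset only.
const routedIds = new Set<string>([
  "ubuntu-repo-cloud-openclaw-token-rotation",
  "ubuntu-repo-openai-compatible-openclaw",
]);

const matrixIds = [
  "ubuntu-repo-cloud-openclaw-token-rotation",
  "ubuntu-repo-openai-compatible-openclaw",
  "ubuntu-repo-cloud-openclaw-custom-policies",
  "ubuntu-invalid-nvidia-key-negative",
  "ubuntu-gateway-port-conflict-negative",
];

const unrouted = findUnroutedScenarioIds(matrixIds, routedIds);
console.log(unrouted);
```

A test built on this would assert that the returned array is empty, and would print the offending IDs when it is not.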

🔎 Worth checking

Source-of-truth review needed: workflow runner routing source of truth: The advisor marked localized patch analysis as needs_followup.
- Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
- Evidence: The PR body says the child `ROUTES` map is left in place; code now generates all registry IDs and calls the child workflow for each.
Expanded fan-out does not pass all secrets required by generated scenarios (.github/workflows/e2e-scenarios-all.yaml:85): The generated matrix now includes every registry scenario, but the reusable workflow call still passes only `NVIDIA_API_KEY`. The typed registry includes scenarios whose `requiredSecrets` metadata names other secrets such as `OPENAI_COMPATIBLE_API_KEY`, `BRAVE_API_KEY`, `TELEGRAM_BOT_TOKEN`, `DISCORD_BOT_TOKEN`, and `SLACK_BOT_TOKEN`. Even if some current dry-run paths do not consume them, the all-workflow contract is now broader than its secret plumbing.
- Recommendation: Either pass and declare the optional secrets required by the generated scenarios in the reusable workflow contract, or restrict this all-scenarios workflow to scenarios whose required secrets are available. Add a test that compares generated matrix IDs against the secrets exposed to the called workflow.
- Evidence: `e2e-scenarios-all.yaml` passes only `NVIDIA_API_KEY`. `baseline.ts` declares `requiredSecrets` for OpenAI-compatible, Brave, Telegram, Discord, and Slack scenarios.
Runner override accepts arbitrary `runs-on:` labels without an allowlist (test/e2e-scenario/scenarios/runner-routing.ts:39): `resolveRunnerForScenario()` gives precedence to any `runnerRequirements` entry beginning with `runs-on:` and returns the suffix as the runner label. The changed workflow currently treats `matrix.runner` as informational and does not use it as `runs-on`, which limits immediate impact. However, the PR introduces and tests this field as workflow routing data, so using it directly later would allow branch-controlled scenario metadata to route secret-bearing jobs to unexpected runner labels.
- Recommendation: Validate `runs-on:` overrides against a narrow allowlist of approved GitHub-hosted and self-hosted labels before emitting them, or remove arbitrary overrides from the workflow-facing matrix until the trusted routing contract is finalized. Add a negative test for rejected runner labels.
- Evidence: `runner-routing.ts` returns `explicit.slice("runs-on:".length)` without validation. `e2e-scenario-matrix.test.ts` asserts that `runs-on:custom-self-hosted` is accepted.
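One way to act on the allowlist recommendation is to validate the override suffix before it is emitted. This is a sketch under assumptions: the function name `parseRunsOnOverride` is hypothetical, and the allowlist simply reuses the four runner labels from the PR's manual smoke output; a real list would be a maintainer decision.

```typescript
// Assumed allowlist: the runner labels shown in the PR's manual smoke output.
const APPROVED_RUNNER_LABELS: ReadonlySet<string> = new Set([
  "ubuntu-latest",
  "macos-26",
  "windows-latest",
  "linux-amd64-gpu-rtxpro6000-latest-1",
]);

// Parses a "runs-on:<label>" requirement and rejects empty or unapproved labels.
function parseRunsOnOverride(
  requirement: string,
  approved: ReadonlySet<string> = APPROVED_RUNNER_LABELS,
): string {
  const prefix = "runs-on:";
  if (!requirement.startsWith(prefix)) {
    throw new Error(`Not a runs-on override: '${requirement}'`);
  }
  const label = requirement.slice(prefix.length).trim();
  if (label === "") {
    throw new Error("Invalid runs-on override: empty runner label");
  }
  if (!approved.has(label)) {
    throw new Error(`Runner label '${label}' is not on the allowlist`);
  }
  return label;
}

console.log(parseRunsOnOverride("runs-on:macos-26"));

try {
  parseRunsOnOverride("runs-on:custom-self-hosted");
} catch (err) {
  console.log((err as Error).message);
}
```

Adopting this would mean the existing test that accepts `runs-on:custom-self-hosted` has to change, either by adding that label to the allowlist or by turning the case into a negative test.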

🌱 Nice ideas

None.

Workflow run details

This is an automated advisory review. A human maintainer must make the final merge decision.

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

test/e2e-scenario/scenarios/run.ts (1)

104-119: ⚡ Quick win

Apply the documented id sort before matrix emission.

The doc says matrix entries are sorted by id, but the implementation currently preserves registry order. Sorting here makes output deterministically diffable as intended.

Proposed fix

 export function buildScenarioMatrix(): ScenarioMatrixEntry[] {
-  return listScenarios().map((scenario): ScenarioMatrixEntry => {
+  return [...listScenarios()]
+    .sort((a, b) => a.id.localeCompare(b.id))
+    .map((scenario): ScenarioMatrixEntry => {
     const { runner } = resolveRunnerForScenario(scenario);
     return {
       id: scenario.id,
       runner,
       label: buildLabel(scenario),
       platform: scenario.environment?.platform ?? "unknown",
       suites: scenario.suiteIds ?? [],
     };
   });
 }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e-scenario/scenarios/run.ts` around lines 104 - 119, The
buildScenarioMatrix function currently maps listScenarios() without sorting, so
update buildScenarioMatrix to sort the scenarios by their id before mapping
(e.g., call .sort(...) on the array returned by listScenarios()) so the returned
ScenarioMatrixEntry[] is deterministically ordered by scenario.id; locate
buildScenarioMatrix and adjust the pipeline that uses listScenarios() and
resolveRunnerForScenario(scenario) to operate on a sorted array (use a string
compare/localeCompare on scenario.id).

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 48621b06-d779-4cd5-a9ce-513bb8c854ab

📥 Commits

Reviewing files that changed from the base of the PR and between b14fd76 and b573c81.

📒 Files selected for processing (4)

.github/workflows/e2e-scenarios-all.yaml
test/e2e-scenario/framework-tests/e2e-scenario-matrix.test.ts
test/e2e-scenario/scenarios/run.ts
test/e2e-scenario/scenarios/runner-routing.ts

coderabbitai · 2026-05-27T22:15:40Z

+  const explicit = (scenario.runnerRequirements ?? []).find((req) => req.startsWith("runs-on:"));
+  if (explicit) {
+    return { runner: explicit.slice("runs-on:".length), reason: "runnerRequirements override" };
+  }


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Reject empty runs-on: overrides to prevent invalid runner labels.

runs-on: with no label currently resolves to an empty string and propagates into the matrix. Fail fast here with a clear error.

Proposed fix

 export function resolveRunnerForScenario(scenario: ScenarioDefinition): ResolvedRunner {
   const explicit = (scenario.runnerRequirements ?? []).find((req) => req.startsWith("runs-on:"));
   if (explicit) {
-    return { runner: explicit.slice("runs-on:".length), reason: "runnerRequirements override" };
+    const runner = explicit.slice("runs-on:".length).trim();
+    if (!runner) {
+      throw new Error(`Cannot resolve runner for scenario '${scenario.id}': empty runs-on override.`);
+    }
+    return { runner, reason: "runnerRequirements override" };
   }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e-scenario/scenarios/runner-routing.ts` around lines 39 - 42, The code currently accepts a `runs-on:` override that can slice to an empty string; update the block that finds `explicit` from `scenario.runnerRequirements` so it rejects empty labels; e.g., after computing `const label = explicit.slice("runs-on:".length)` (or by ensuring `req.length > "runs-on:".length` when matching) validate `label !== ""` and if empty throw a clear Error like "Invalid runs-on: override: empty runner label" (or return a failing result) instead of returning `{ runner: "", ... }`; refer to the `explicit` variable and the `runnerRequirements` lookup when making this change.

coderabbitai Bot reviewed May 27, 2026

cv added the v0.0.55 Release target label May 27, 2026

wscurran added the E2E and enhancement: testing labels May 27, 2026

jyaunches mentioned this pull request May 28, 2026

test(e2e): execute real shell assertions; delete dry-run, --validate-only, and the bash runner #4380

Open
