feat(e2e): generate scenario fan-out matrix from typed registry #4359

Open
jyaunches wants to merge 1 commit into main from feat/dynamic-e2e-scenario-matrix

jyaunches commented May 27, 2026

Why

The e2e-scenarios-all.yaml fan-out has 8 hand-wired jobs, but the typed scenario registry in baseline.ts already defines 22 scenarios. The other 14 are defined-but-not-dispatched, and every new scenario today requires a YAML edit that's easy to forget. The hand-wired job names also no longer reflect what each scenario actually tests (manifest, suites, expectedState).

This PR makes the fan-out generated from the existing typed registry so the workflow can never drift from baseline.ts again.

What changed

--emit-matrix flag on run.ts

Walks listScenarios() (the existing registry — no new registry created) and prints a single-line JSON array of GHA matrix include entries:

[
  { "id": "ubuntu-repo-cloud-openclaw",
    "runner": "ubuntu-latest",
    "label": "ubuntu-local · ubuntu-repo-cloud-openclaw · smoke+inference+credentials",
    "platform": "ubuntu-local",
    "suites": ["smoke", "inference", "credentials"] },
  ...
]
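As a rough illustration of the shape above, the following sketch builds matrix entries from a list of scenarios and serializes them on one line. It is not the code in run.ts: the `ScenarioSketch` and `MatrixEntrySketch` types, the function names, and the injected `resolveRunner` callback are all hypothetical, and the label format is inferred only from the example entry shown above.

```typescript
// Hypothetical, simplified shapes; the real registry types carry more fields.
interface ScenarioSketch {
  id: string;
  platform: string;
  suiteIds: string[];
}

interface MatrixEntrySketch {
  id: string;
  runner: string;
  label: string;
  platform: string;
  suites: string[];
}

// Label format inferred from the example entry: "<platform> · <id> · <suite+suite>".
function buildLabelSketch(scenario: ScenarioSketch): string {
  return [scenario.platform, scenario.id, scenario.suiteIds.join("+")].join(" · ");
}

// One matrix entry per scenario; runner resolution is injected so the sketch
// stays independent of any particular routing table.
function buildMatrixSketch(
  scenarios: readonly ScenarioSketch[],
  resolveRunner: (scenario: ScenarioSketch) => string,
): MatrixEntrySketch[] {
  return scenarios.map((scenario) => ({
    id: scenario.id,
    runner: resolveRunner(scenario),
    label: buildLabelSketch(scenario),
    platform: scenario.platform,
    suites: [...scenario.suiteIds],
  }));
}

// JSON.stringify without an indent argument produces a single line.
function emitMatrixSketch(entries: readonly MatrixEntrySketch[]): string {
  return JSON.stringify(entries);
}

const exampleMatrixJson: string = emitMatrixSketch(
  buildMatrixSketch(
    [
      {
        id: "ubuntu-repo-cloud-openclaw",
        platform: "ubuntu-local",
        suiteIds: ["smoke", "inference", "credentials"],
      },
    ],
    () => "ubuntu-latest", // stand-in for the real runner resolver
  ),
);
```

Passing the runner resolver in as a callback is a choice made for this sketch only; it keeps the example testable without a real registry.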

runner-routing.ts helper

Derives the runs-on label from ScenarioEnvironment.platform, with a runs-on:<label> override path via runnerRequirements. Throws on unknown platforms so missing mappings fail loudly during matrix generation rather than silently falling back to ubuntu-latest (which used to mask routing bugs in the legacy bash ROUTES map).
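The routing rule described here can be sketched roughly as follows. This is an illustration under stated assumptions, not the contents of runner-routing.ts: the `RoutedScenarioSketch` type and the function name are hypothetical, and of the platform keys only "ubuntu-local" appears in this description (the "macos-local" key is invented for the example).

```typescript
// Hypothetical, simplified scenario shape for illustration.
interface RoutedScenarioSketch {
  id: string;
  platform: string;
  runnerRequirements?: string[];
}

// Assumed platform-to-runner defaults. A Map is used so that unknown keys
// reliably return undefined (no prototype lookups as with a plain object).
const PLATFORM_DEFAULT_RUNNERS: ReadonlyMap<string, string> = new Map([
  ["ubuntu-local", "ubuntu-latest"],
  ["macos-local", "macos-26"], // hypothetical platform key
]);

const RUNS_ON_PREFIX = "runs-on:";

function resolveRunnerSketch(scenario: RoutedScenarioSketch): string {
  // An explicit "runs-on:<label>" requirement takes precedence.
  const explicit = (scenario.runnerRequirements ?? []).find((req) =>
    req.startsWith(RUNS_ON_PREFIX),
  );
  if (explicit !== undefined) {
    return explicit.slice(RUNS_ON_PREFIX.length);
  }

  // Otherwise fall back to the platform default, failing loudly when the
  // platform has no mapping instead of silently choosing a default runner.
  const runner = PLATFORM_DEFAULT_RUNNERS.get(scenario.platform);
  if (runner === undefined) {
    throw new Error(
      `No runner mapping for platform '${scenario.platform}' (scenario '${scenario.id}')`,
    );
  }
  return runner;
}
```

Throwing at resolution time means a scenario with an unmapped platform stops matrix generation before any job is dispatched.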

e2e-scenarios-all.yaml refactor

Two-job shape:

  1. generate-matrix — checks out, installs deps, runs --emit-matrix, and writes the JSON to $GITHUB_OUTPUT. Also renders a Markdown table of all matrix entries in the step summary so you can see the full plan at a glance.
  2. run-scenario — uses strategy.matrix.include: ${{ fromJson(needs.generate-matrix.outputs.matrix) }} and calls the existing e2e-scenarios.yaml reusable workflow per scenario. Tile names use ${{ matrix.label }} so the sidebar reflects what's actually being tested.
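The two outputs of generate-matrix can be illustrated with a small sketch: one `name=value` line of the kind appended to the file at $GITHUB_OUTPUT, and a Markdown table for the step summary. The function names, the `SummaryEntrySketch` type, and the table columns are assumptions made for this example; the actual workflow step may format things differently.

```typescript
// Hypothetical subset of a matrix entry, enough for the summary table.
interface SummaryEntrySketch {
  id: string;
  runner: string;
  platform: string;
  suites: string[];
}

// Builds the single "name=value" line for a step output.
// An empty matrix is rejected so the fan-out job never starts with nothing to run.
function toGithubOutputLine(name: string, entries: readonly SummaryEntrySketch[]): string {
  if (entries.length === 0) {
    throw new Error("Refusing to emit an empty scenario matrix");
  }
  return `${name}=${JSON.stringify(entries)}`;
}

// Renders one Markdown table row per matrix entry (column choice is assumed).
function renderSummaryTable(entries: readonly SummaryEntrySketch[]): string {
  const header = ["| Scenario | Platform | Runner | Suites |", "| --- | --- | --- | --- |"];
  const rows = entries.map(
    (entry) =>
      `| ${entry.id} | ${entry.platform} | ${entry.runner} | ${entry.suites.join(", ")} |`,
  );
  return [...header, ...rows].join("\n");
}

const exampleEntries: SummaryEntrySketch[] = [
  { id: "ubuntu-repo-cloud-openclaw", runner: "ubuntu-latest", platform: "ubuntu-local", suites: ["smoke"] },
];
const exampleOutputLine: string = toGithubOutputLine("matrix", exampleEntries);
const exampleSummary: string = renderSummaryTable(exampleEntries);
```

Because the JSON is serialized without indentation, the value fits the single-line `name=value` form and needs no multiline delimiter.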

Adding a scenario from now on

  1. Add the entry to canonicalScenarioInputs in baseline.ts.
  2. That's it. Next workflow run automatically picks it up as a new tile.

What didn't change

  • e2e-scenarios.yaml (the single-scenario reusable workflow) is untouched. Its resolve-runner job continues to be the authoritative runner-selection path during execution; the runner field in the matrix is informational and used for the sidebar label.
  • The bash ROUTES map in e2e-scenarios.yaml is left in place for now to keep this PR focused. It becomes effectively dead code once this lands and can be removed in a follow-up.
  • No new registry. Everything reads from test/e2e-scenario/scenarios/registry.ts.

Verification

Unit tests in e2e-scenario-matrix.test.ts cover:

  • matrix entry per registered scenario
  • runner resolves for every scenario
  • platform-default routing (ubuntu/macos/wsl/gpu)
  • explicit runs-on:<label> override
  • loud failure on unknown platform
  • single-line JSON output suitable for $GITHUB_OUTPUT

Local run:

✓ test/e2e-scenario/framework-tests/e2e-scenario-matrix.test.ts (6 tests)
✓ test/e2e-scenario/framework-tests/e2e-scenario-registry.test.ts (6 tests)
✓ test/e2e-scenario/framework-tests/e2e-scenarios-workflow.test.ts (2 tests)
✓ test/e2e-scenario/framework-tests/e2e-plan-compiler.test.ts
✓ test/e2e-scenario/framework-tests/e2e-scenario-resolver.test.ts
58 tests passed

Manual smoke:

$ npx tsx test/e2e-scenario/scenarios/run.ts --emit-matrix | jq 'length'
22
$ npx tsx test/e2e-scenario/scenarios/run.ts --emit-matrix | jq -r '.[].runner' | sort -u
linux-amd64-gpu-rtxpro6000-latest-1
macos-26
ubuntu-latest
windows-latest

Test depth

Unit tests are sufficient — this is a build/CI plumbing change, not a behavioral change to the test runner or scenarios themselves. The actual scenario execution path (e2e-scenarios.yaml) is unchanged. After merge, the next manual workflow_dispatch of E2E / Scenario Runner / All will validate the matrix end-to-end.

Follow-ups (separate PRs)

  • Delete the now-dead ROUTES bash map in e2e-scenarios.yaml and have its resolve-runner job consume runner-routing.ts too.
  • Consider sharding e2e-scenarios-all.yaml by platform once we cross ~80 scenarios (currently at 22, plenty of headroom against the 256-job GHA ceiling).

Summary by CodeRabbit

  • Tests

    • Added comprehensive test coverage for E2E scenario matrix generation and runner routing validation.
  • Chores

    • CI/CD workflow now dynamically generates end-to-end test scenarios at runtime instead of static job definitions.
    • Enhanced scenario runner to support emitting test matrix configuration for GitHub Actions.

Review Change Stack

Replace the hand-wired job list in e2e-scenarios-all.yaml with a
generated matrix sourced from the existing typed scenario registry.
Adding a scenario in test/e2e-scenario/scenarios/scenarios/baseline.ts
now automatically produces a tile in the fan-out workflow on the next
run, with no workflow edits required. This closes the drift gap that
left 14 of 22 registered scenarios undispatched.

Changes:
- Add --emit-matrix flag to test/e2e-scenario/scenarios/run.ts that
  walks listScenarios() and prints a single-line JSON array of GHA
  matrix include entries: { id, runner, label, platform, suites }.
- Extract runner-routing.ts that derives the runs-on label from
  ScenarioEnvironment.platform, with a runs-on:<label> override path
  via runnerRequirements. Throws on unknown platforms so missing
  mappings fail loudly during matrix generation.
- Refactor .github/workflows/e2e-scenarios-all.yaml to a two-job
  shape: generate-matrix emits the JSON, run-scenario fans out via
  strategy.matrix.include and calls the existing e2e-scenarios.yaml
  reusable workflow. Tile names come from the typed label so the
  sidebar reflects what is actually being tested.
- Guard the run.ts CLI bootstrap so importing buildScenarioMatrix
  from tests does not trigger the side-effecting main() path.
- Add e2e-scenario-matrix.test.ts covering matrix shape, registry
  parity, runner overrides, unknown-platform errors, and CLI output
  contract.

copy-pr-bot Bot commented May 27, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.


coderabbitai Bot commented May 27, 2026

Walkthrough

This PR replaces static GitHub Actions job enumeration with a runtime-generated matrix by introducing typed runner resolution logic, extending the scenario script with a --emit-matrix CLI flag, and updating the workflow to invoke the matrix generator and fan out jobs dynamically via GitHub's matrix strategy.

Changes

Dynamic Scenario Matrix Generation

| Layer | File(s) | Summary |
| --- | --- | --- |
| Runner routing foundation | `test/e2e-scenario/scenarios/runner-routing.ts` | Exports ResolvedRunner interface and platform-to-runner default mapping, then implements resolveRunnerForScenario() to select a runner via explicit runs-on: override precedence or platform-based fallback, throwing on unmapped platforms. |
| Scenario matrix construction | `test/e2e-scenario/scenarios/run.ts` (imports, CLI, build functions) | Adds --emit-matrix CLI option and imports runner resolution; introduces buildScenarioMatrix() to map all registry scenarios to typed ScenarioMatrixEntry objects with resolved runner labels and formatted labels combining platform, scenario id, suite grouping, and expected-failure class. |
| Matrix emission and module guarding | `test/e2e-scenario/scenarios/run.ts` (emission, module guards) | Implements emitMatrix() to output JSON to stdout; wraps main execution with isInvokedDirectly() comparison so the script can be imported by tests; adds try/catch error handling and process.exitCode control. |
| GitHub Actions workflow integration | `.github/workflows/e2e-scenarios-all.yaml` | Updates header comment; replaces hardcoded per-scenario jobs with a generate-matrix job that runs the scenario script, validates non-empty output, renders a summary table, and a single run-scenario reusable-workflow job that iterates over the dynamic matrix to fan out one execution per scenario. |
| E2E matrix tests | `test/e2e-scenario/framework-tests/e2e-scenario-matrix.test.ts` | Complete test suite that spawns the scenario runner script and validates coverage of all registered scenarios, correct runner/label structure, platform-to-runner routing with explicit overrides, single-line JSON output compatibility with $GITHUB_OUTPUT, and error handling for unmapped platforms. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested labels

CI/CD, E2E

Suggested reviewers

  • cv

Poem

🐰 A matrix rises from the registry code,
No hardcoded jobs on the testing road!
Platform to runner, scenario to stage—
Dynamic fan-out writes the new page! ✨
The test validates the flow, precise and bright,
Let the scenarios run—overnight by night!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title accurately summarizes the main change: generating a scenario fan-out matrix from a typed registry instead of static YAML configuration. It clearly reflects the primary objective described in the PR objectives. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint skipped: no ESLint configuration detected in root package.json. To enable, add eslint to devDependencies.



E2E Advisor Recommendation

Required E2E: e2e-scenarios-all
Optional E2E: e2e-scenarios:ubuntu-repo-cloud-openclaw

Dispatch hint: workflow_dispatch; no inputs

Workflow run

Full advisor summary

E2E Recommendation Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required E2E

  • e2e-scenarios-all (high; fans out all registered typed scenarios and may use GPU, macOS, Windows/WSL, and Ubuntu runners): Required because the all-scenarios fan-out workflow and generated scenario matrix are the primary changed surface. Running it validates matrix emission from the typed registry and the reusable scenario workflow path for every registered scenario, including platform-specific runner routing.

Optional E2E

  • e2e-scenarios:ubuntu-repo-cloud-openclaw (medium): Useful quick smoke of the reusable single-scenario workflow and the updated scenario runner CLI path before or while the full all-scenarios fan-out runs. Not a substitute for the required all-scenarios workflow because it does not cover special-runner matrix routing.

New E2E recommendations

  • runner-routing-consistency (high): The new TypeScript runner resolver is used to emit the all-scenarios matrix, while the reusable e2e-scenarios.yaml workflow still resolves runners internally. Add coverage that proves the generated matrix routes and the reusable workflow runner selection stay consistent, or refactor the workflow to consume the emitted runner directly.
    • Suggested test: Add a workflow-boundary or scenario E2E integration check that compares every typed scenario's emitted runner with the runner selected by e2e-scenarios.yaml before dispatch.

Dispatch hint

  • Workflow: e2e-scenarios-all.yaml
  • jobs input: workflow_dispatch; no inputs


E2E Scenario Advisor Recommendation

Required scenario E2E: e2e-scenarios-all
Optional scenario E2E: None

Dispatch required scenario E2E:

  • gh workflow run e2e-scenarios-all.yaml --ref <pr-head-ref>

Workflow run

Full scenario advisor summary

E2E Scenario Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required scenario E2E

  • e2e-scenarios-all: the all-scenarios fan-out workflow changed
    • Dispatch: gh workflow run e2e-scenarios-all.yaml --ref <pr-head-ref>

Optional scenario E2E

  • None.

Relevant changed files

  • .github/workflows/e2e-scenarios-all.yaml


PR Review Advisor

Findings: 1 needs attention, 3 worth checking, 0 nice ideas
Top item: Generated fan-out includes scenarios child workflow cannot route

Review findings

🛠️ Needs attention

  • Generated fan-out includes scenarios the called workflow still rejects (.github/workflows/e2e-scenarios-all.yaml:80): The all-scenarios workflow now builds its matrix from every typed registry entry and passes each `matrix.id` to `.github/workflows/e2e-scenarios.yaml`. That called workflow remains the authoritative execution path and still has a hardcoded `ROUTES` map that lacks current registry IDs such as `ubuntu-repo-cloud-openclaw-custom-policies`, `ubuntu-invalid-nvidia-key-negative`, and `ubuntu-gateway-port-conflict-negative`. Those generated jobs will reach the child workflow's `No runner route for scenario` error, so the PR's acceptance claim that adding to `baseline.ts` is sufficient is not currently true.
    • Recommendation: Make the executable routing source match the generated matrix before expanding the fan-out: either update/remove the child workflow `ROUTES` map to consume the same validated typed router, or filter the all-workflow matrix to only IDs the child workflow can currently execute. Add a static workflow contract test that verifies every generated `listScenarios()` ID is accepted by the called workflow.
    • Evidence: `e2e-scenarios-all.yaml` uses `include: ${{ fromJson(needs.generate-matrix.outputs.matrix) }}` and passes `scenarios: ${{ matrix.id }}`. `run.ts` builds one entry per `listScenarios()`. The unchanged child workflow's `ROUTES` map includes `ubuntu-repo-cloud-openclaw-token-rotation` and `ubuntu-repo-openai-compatible-openclaw` but not the later registry IDs at `baseline.ts` lines 230, 238, and 252.
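The static contract test recommended above could look roughly like the following sketch. It assumes the generated scenario IDs and the keys of the child workflow's `ROUTES` map are both already available as string arrays; how the `ROUTES` keys would be extracted from the workflow file is not shown, and the function name is hypothetical. The example IDs are the ones named in this finding.

```typescript
// Returns the generated scenario IDs that the called workflow has no route for.
// An empty result means every matrix entry can be dispatched.
function findUnroutedScenarioIds(
  matrixIds: readonly string[],
  routedIds: readonly string[],
): string[] {
  const routed = new Set(routedIds);
  return matrixIds.filter((id) => !routed.has(id)).sort();
}

// IDs taken from the finding above; the two lists are illustrative, not complete.
const exampleUnrouted: string[] = findUnroutedScenarioIds(
  [
    "ubuntu-repo-cloud-openclaw-token-rotation",
    "ubuntu-repo-cloud-openclaw-custom-policies",
    "ubuntu-invalid-nvidia-key-negative",
  ],
  ["ubuntu-repo-cloud-openclaw-token-rotation", "ubuntu-repo-openai-compatible-openclaw"],
);
```

A test built on this helper would fail whenever the result is non-empty, which would surface the mismatch before any job reaches the child workflow.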

🔎 Worth checking

  • Source-of-truth review needed: workflow runner routing source of truth: The advisor marked localized patch analysis as needs_followup.
    • Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
    • Evidence: The PR body says the child `ROUTES` map is left in place; code now generates all registry IDs and calls the child workflow for each.
  • Expanded fan-out does not pass all secrets required by generated scenarios (.github/workflows/e2e-scenarios-all.yaml:85): The generated matrix now includes every registry scenario, but the reusable workflow call still passes only `NVIDIA_API_KEY`. The typed registry includes scenarios whose `requiredSecrets` metadata names other secrets such as `OPENAI_COMPATIBLE_API_KEY`, `BRAVE_API_KEY`, `TELEGRAM_BOT_TOKEN`, `DISCORD_BOT_TOKEN`, and `SLACK_BOT_TOKEN`. Even if some current dry-run paths do not consume them, the all-workflow contract is now broader than its secret plumbing.
    • Recommendation: Either pass and declare the optional secrets required by the generated scenarios in the reusable workflow contract, or restrict this all-scenarios workflow to scenarios whose required secrets are available. Add a test that compares generated matrix IDs against the secrets exposed to the called workflow.
    • Evidence: `e2e-scenarios-all.yaml` passes only `NVIDIA_API_KEY`. `baseline.ts` declares `requiredSecrets` for OpenAI-compatible, Brave, Telegram, Discord, and Slack scenarios.
  • Runner override accepts arbitrary `runs-on:` labels without an allowlist (test/e2e-scenario/scenarios/runner-routing.ts:39): `resolveRunnerForScenario()` gives precedence to any `runnerRequirements` entry beginning with `runs-on:` and returns the suffix as the runner label. The changed workflow currently treats `matrix.runner` as informational and does not use it as `runs-on`, which limits immediate impact. However, the PR introduces and tests this field as workflow routing data, so using it directly later would allow branch-controlled scenario metadata to route secret-bearing jobs to unexpected runner labels.
    • Recommendation: Validate `runs-on:` overrides against a narrow allowlist of approved GitHub-hosted and self-hosted labels before emitting them, or remove arbitrary overrides from the workflow-facing matrix until the trusted routing contract is finalized. Add a negative test for rejected runner labels.
    • Evidence: `runner-routing.ts` returns `explicit.slice("runs-on:".length)` without validation. `e2e-scenario-matrix.test.ts` asserts that `runs-on:custom-self-hosted` is accepted.
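One way the allowlist recommendation could be applied is sketched below. The set of approved labels is an assumption for illustration (it reuses the four runner labels printed by the manual smoke run in the PR description), and `parseRunsOnOverride` is a hypothetical helper, not a function in runner-routing.ts.

```typescript
// Assumed allowlist; a real one would be maintained by the repository owners.
const APPROVED_RUNNER_LABELS: ReadonlySet<string> = new Set([
  "ubuntu-latest",
  "macos-26",
  "windows-latest",
  "linux-amd64-gpu-rtxpro6000-latest-1",
]);

const OVERRIDE_PREFIX = "runs-on:";

// Extracts the label from a "runs-on:<label>" requirement and rejects
// empty or unapproved labels instead of passing them into the matrix.
function parseRunsOnOverride(requirement: string): string {
  if (!requirement.startsWith(OVERRIDE_PREFIX)) {
    throw new Error(`Not a runs-on override: '${requirement}'`);
  }
  const label = requirement.slice(OVERRIDE_PREFIX.length).trim();
  if (label === "") {
    throw new Error("Invalid runs-on override: empty runner label");
  }
  if (!APPROVED_RUNNER_LABELS.has(label)) {
    throw new Error(`Runner label '${label}' is not on the approved list`);
  }
  return label;
}
```

With a check like this in place, the existing test that expects `runs-on:custom-self-hosted` to be accepted would need either an allowlist entry or to become a negative test.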

🌱 Nice ideas

  • None.

Workflow run details

This is an automated advisory review. A human maintainer must make the final merge decision.

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
test/e2e-scenario/scenarios/run.ts (1)

104-119: ⚡ Quick win

Apply the documented id sort before matrix emission.

The doc says matrix entries are sorted by id, but the implementation currently preserves registry order. Sorting here makes output deterministically diffable as intended.

Proposed fix
 export function buildScenarioMatrix(): ScenarioMatrixEntry[] {
-  return listScenarios().map((scenario): ScenarioMatrixEntry => {
+  return [...listScenarios()]
+    .sort((a, b) => a.id.localeCompare(b.id))
+    .map((scenario): ScenarioMatrixEntry => {
     const { runner } = resolveRunnerForScenario(scenario);
     return {
       id: scenario.id,
       runner,
       label: buildLabel(scenario),
       platform: scenario.environment?.platform ?? "unknown",
       suites: scenario.suiteIds ?? [],
     };
   });
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e-scenario/scenarios/run.ts` around lines 104 - 119, The
buildScenarioMatrix function currently maps listScenarios() without sorting, so
update buildScenarioMatrix to sort the scenarios by their id before mapping
(e.g., call .sort(...) on the array returned by listScenarios()) so the returned
ScenarioMatrixEntry[] is deterministically ordered by scenario.id; locate
buildScenarioMatrix and adjust the pipeline that uses listScenarios() and
resolveRunnerForScenario(scenario) to operate on a sorted array (use a string
compare/localeCompare on scenario.id).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 48621b06-d779-4cd5-a9ce-513bb8c854ab

📥 Commits

Reviewing files that changed from the base of the PR and between b14fd76 and b573c81.

📒 Files selected for processing (4)
  • .github/workflows/e2e-scenarios-all.yaml
  • test/e2e-scenario/framework-tests/e2e-scenario-matrix.test.ts
  • test/e2e-scenario/scenarios/run.ts
  • test/e2e-scenario/scenarios/runner-routing.ts

Comment on lines +39 to +42
const explicit = (scenario.runnerRequirements ?? []).find((req) => req.startsWith("runs-on:"));
if (explicit) {
  return { runner: explicit.slice("runs-on:".length), reason: "runnerRequirements override" };
}

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Reject empty runs-on: overrides to prevent invalid runner labels.

runs-on: with no label currently resolves to an empty string and propagates into the matrix. Fail fast here with a clear error.

Proposed fix
 export function resolveRunnerForScenario(scenario: ScenarioDefinition): ResolvedRunner {
   const explicit = (scenario.runnerRequirements ?? []).find((req) => req.startsWith("runs-on:"));
   if (explicit) {
-    return { runner: explicit.slice("runs-on:".length), reason: "runnerRequirements override" };
+    const runner = explicit.slice("runs-on:".length).trim();
+    if (!runner) {
+      throw new Error(`Cannot resolve runner for scenario '${scenario.id}': empty runs-on override.`);
+    }
+    return { runner, reason: "runnerRequirements override" };
   }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e-scenario/scenarios/runner-routing.ts` around lines 39 - 42, The code
currently accepts a `runs-on:` override that can slice to an empty string;
update the block that finds `explicit` from `scenario.runnerRequirements` so it
rejects empty labels — e.g., after computing `const label =
explicit.slice("runs-on:".length)` (or by ensuring `req.length >
"runs-on:".length` when matching) validate `label !== ""` and if empty throw a
clear Error like "Invalid runs-on: override: empty runner label" (or return a
failing result) instead of returning `{ runner: "", ... }`; refer to the
`explicit` variable and the `runnerRequirements` lookup when making this change.

@cv added the v0.0.55 label May 27, 2026
@wscurran added the E2E and enhancement: testing labels May 27, 2026