Skip to content

feat(evals): Ring 1 scenario — react-router-7/data#575

Closed
kaiapeacock-eng wants to merge 4 commits into
ba-104-layer3-and-cifrom
ba-104-scenario-rr7-data
Closed

feat(evals): Ring 1 scenario — react-router-7/data#575
kaiapeacock-eng wants to merge 4 commits into
ba-104-layer3-and-cifrom
ba-104-scenario-rr7-data

Conversation

@kaiapeacock-eng
Copy link
Copy Markdown
Collaborator

@kaiapeacock-eng kaiapeacock-eng commented May 6, 2026

Summary

Distinct from /framework: data mode keeps the router as a config object passed to RouterProvider, so init lands at the SPA entry (src/main.tsx) before the router is mounted, not in entry.client.tsx. The pair (framework + data) catches confusion between the two RR7 shapes — the regression class this Ring 1 slot exists for.

Pristine + golden mirror the same shape as nextjs and rr7-framework. Golden adds @amplitude/unified, .env.local with VITE_AMPLITUDE_API_KEY, init() in main.tsx, and track() calls in the route handlers.

Test plan

  • pnpm evals:run react-router-7/data → 50/50, contractOk, hardFailed=false
  • All Layer 0 + Layer 1 scorers pass on the golden

Stacked: base is ba-104-layer3-and-ci (#570) so the diff shows only the scenario files. Full chain: this → #570#567#560 → main.

🤖 Generated with Claude Code


Note

Medium Risk
Moderate risk due to new CI workflow and adding a build runner/scorer that can change PR gating behavior and introduce flaky/timeouts; changes are confined to the eval harness and fixtures.

Overview
Adds a new PR-gate GitHub Actions workflow (evals-pr-scenarios.yml) that runs eval unit tests plus a Ring 1 scenario matrix and uploads per-run reports.

Introduces Layer 3 build/typecheck signal to evals: a runBuild runner that captures redacted stderr tails and exit codes, extends the eval Artifact to carry optional buildResult (including golden replay support), and updates scoring orchestration to run all non-L0 layers (including the new L3-build-passes scorer).

Adds a new Ring 1 scenario react-router-7/data with pristine and golden fixtures/NDJSON, and includes tests for the new Layer 3 scorer behavior.

Reviewed by Cursor Bugbot for commit 25f29ab. Bugbot is set up for automated code reviews on this repo. Configure here.

@kaiapeacock-eng kaiapeacock-eng requested a review from a team as a code owner May 6, 2026 22:38
Copy link
Copy Markdown
Contributor

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Autofix Details

Bugbot Autofix prepared a fix for the issue found in the latest run.

  • ✅ Fixed: Golden .gitignore missing .env.local entry from pristine
    • Restored the .env.local entry to the golden working .gitignore to match pristine, preventing scorer hash mismatches and modeling correct secret-protection behavior.

Create PR

Or push these changes by commenting:

@cursor push d329a7ef19
Preview (d329a7ef19)
diff --git a/evals/scenarios/react-router-7/data/golden/working/.gitignore b/evals/scenarios/react-router-7/data/golden/working/.gitignore
--- a/evals/scenarios/react-router-7/data/golden/working/.gitignore
+++ b/evals/scenarios/react-router-7/data/golden/working/.gitignore
@@ -1,2 +1,3 @@
 node_modules
 dist
+.env.local

You can send follow-ups to the cloud agent here.

Comment thread evals/scenarios/react-router-7/data/golden/working/.gitignore
@kaiapeacock-eng kaiapeacock-eng force-pushed the ba-104-layer3-and-ci branch 2 times, most recently from 411d2d3 to d06d185 Compare May 8, 2026 16:26
@kaiapeacock-eng kaiapeacock-eng force-pushed the ba-104-scenario-rr7-data branch from 6a85932 to 36fa899 Compare May 8, 2026 16:26
Copy link
Copy Markdown
Contributor

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Autofix Details

Bugbot Autofix prepared a fix for the issue found in the latest run.

  • ✅ Fixed: Invalid integrationHint causes yargs rejection at runtime
    • Changed integrationHint from 'react-router-7-data' to 'react-router', which is a valid CLI choice in the yargs choices array.

Create PR

Or push these changes by commenting:

@cursor push 8462a91fe9
Preview (8462a91fe9)
diff --git a/evals/scenarios/react-router-7/data/scenario.json b/evals/scenarios/react-router-7/data/scenario.json
--- a/evals/scenarios/react-router-7/data/scenario.json
+++ b/evals/scenarios/react-router-7/data/scenario.json
@@ -1,7 +1,7 @@
 {
   "name": "react-router-7/data",
   "ring": 1,
-  "integrationHint": "react-router-7-data",
+  "integrationHint": "react-router",
   "buildCommand": ["pnpm", "build"],
   "expectedEnvPrefix": "VITE_",
   "expectedInitFile": "src/main.tsx",

You can send follow-ups to the cloud agent here.

Comment thread evals/scenarios/react-router-7/data/scenario.json Outdated
Copy link
Copy Markdown
Member

@kelsonpw kelsonpw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two blockers before this can land:

  1. HIGH: integrationHint "react-router-7-data" is not a valid --integration choice. evals/scenarios/react-router-7/data/scenario.json:4 sets integrationHint to "react-router-7-data", but the wizard's yargs choices in src/commands/default.ts only accept: nextjs, vue, react-router, django, flask, fastapi, javascript_web, javascript_node, python. Since useDetection defaults to true (per #571), the runner forwards --integration react-router-7-data to the wizard spawn, which yargs will reject before the wizard even starts. The 50/50 replay score in your test plan only exercises the golden replay path, which doesn't actually invoke the wizard binary. The first live run against this scenario will hard-fail at spawn. Either:

    • change integrationHint to "react-router" (the closest valid choice — RR7 still routes through the react-router family detector), or
    • add "useDetection": false to the scenario and let the wizard auto-detect from the pristine fixture.
  2. MEDIUM (already fixed by 36fa899 but worth confirming): golden/working/.gitignore was missing .env.local entry; the fix commit restored it. Verify diff.modified vs run.ndjson is now consistent.

Fixture content otherwise looks faithful to the data-mode RR7 shape. Stacked on #570. Will re-review once the integrationHint is sorted.

kaiapeacock-eng and others added 4 commits May 8, 2026 10:34
Phase 4 of Week 2 — Layer 3 grades whether the integration still
compiles after the wizard runs, and a workflow gates PRs on Layers
0-3 with a Ring 1 matrix.

  * runner/build.ts — runBuild() spawns `pnpm install` then the
    scenario.buildCommand inside the working tree, captures stderr,
    redacts via the same observability/redact helper used for stdout.
    Throws only on timeouts/spawn errors; non-zero exit codes pass
    through to the BuildResult so the scorer can grade.
  * scorers/layer3-build/build-passes.ts — heavy 10pts. Skip-passes
    with weight 0 when no buildResult is on the artifact (current
    state on the existing golden); passes with weight 10 when the
    build exits 0; fails with the redacted stderr tail otherwise.
    Distinguishes install failure (likely lockfile drift) from
    build failure (likely a wizard regression) in the detail.
  * runner/types.ts — new BuildResult type; Artifact gains optional
    buildResult field.
  * runner/invoke-wizard.ts — runReplay loads optional
    golden/build-result.json so fixtures can pin a recorded outcome.
    runLive wiring is deferred — live runs require the eval-only
    Amplitude project (open decision #2) and pnpm install costs are
    not worth incurring on every dev-checkout run today.
  * runner/score.ts — orchestrator's downstream loop already covered
    Layer 3 via the layer > 0 filter; no change to the loop, just
    the scorer registration.
  * .github/workflows/evals-pr.yml — pull_request workflow,
    paths-filtered to the wizard agent surface + skills + evals,
    runs vitest on evals/ and a Ring 1 scenario matrix in parallel.
    Concurrency cancels stale runs. Artifacts uploaded for triage.
    Actions pinned to commit SHAs per the repo's other workflows.

Verified:
  * pnpm exec vitest run evals/ — 31/31 across 5 files
  * pnpm evals:run nextjs-app-router/vanilla — 50/50, contractOk
  * pnpm lint clean

Stacked on PR #567 (ba-104-runner-infra); will rebase onto main
after #567 lands.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Distinct from /framework: data mode keeps the router as a config
object passed to RouterProvider, so init lands at the SPA entry
(src/main.tsx) before the router is mounted, not in entry.client.tsx.
The pair (framework + data) catches confusion between the two RR7
shapes — the regression class this Ring 1 slot exists for.

Pristine + golden mirror the same shape as nextjs and rr7-framework:
package.json, tsconfig, vite.config, index.html, two route files
(home + signup), and a minimal main.tsx. Golden adds @amplitude/unified,
.env.local with VITE_AMPLITUDE_API_KEY, init() in main.tsx, and
track() calls in the route handlers.

Verified: pnpm evals:run react-router-7/data → 50/50, contractOk,
hardFailed=false. All Layer 0 + Layer 1 scorers pass.

Stacked on PR #567 (ba-104-runner-infra); will rebase onto main
after #567 lands.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Same fixture inconsistency the react-vite golden had: the working
.gitignore drops the .env.local entry that pristine carries, but
golden/run.ndjson doesn't record .gitignore as modified. Scorers see a
hash mismatch with no corresponding manifest line, and the fixture
models the wrong end state — .env.local must stay gitignored so the
API key the wizard writes there isn't committed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per kelsonpw's review on #575: integrationHint='react-router-7-data'
isn't a valid --integration choice (yargs only accepts: nextjs, vue,
react-router, django, flask, fastapi, javascript_web, javascript_node,
python). With useDetection defaulting to true, the runner forwards
--integration react-router-7-data to the wizard spawn, which yargs
rejects before the wizard starts. The 50/50 replay score didn't catch
this because golden replay doesn't invoke the wizard binary.

Use 'react-router' — RR7 routes through the same react-router family
detector, so this is the correct hint regardless of data-vs-framework
mode.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@kaiapeacock-eng kaiapeacock-eng force-pushed the ba-104-scenario-rr7-data branch from 36fa899 to 25f29ab Compare May 8, 2026 17:36
@kaiapeacock-eng kaiapeacock-eng force-pushed the ba-104-layer3-and-ci branch from d06d185 to 624a524 Compare May 8, 2026 17:36
@kaiapeacock-eng
Copy link
Copy Markdown
Collaborator Author

Fixed in 25f29ab — integrationHint changed to 'react-router' per your suggestion (RR7 routes through the same react-router family detector). Also rebased onto fresh main to pick up the DataIngestionCheckScreen test fix that was breaking Node 22/24 unit tests across the eval stack.

Copy link
Copy Markdown
Contributor

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 3 potential issues.

Fix All in Cursor

Bugbot Autofix prepared fixes for all 3 issues found in the latest run.

  • ✅ Fixed: Scenario fails validation: integrationHint missing from SDK table
    • Added 'react-router': '@amplitude/unified' entry to FRAMEWORK_TO_SDK table to satisfy the schema refinement check
  • ✅ Fixed: Workflow path trigger references wrong filename for self
    • Changed workflow path trigger from 'evals-pr.yml' to 'evals-pr-scenarios.yml' so config changes trigger CI
  • ✅ Fixed: New scenario missing from CI workflow matrix
    • Added 'react-router-7/data' to the scenario matrix so the new Ring 1 scenario runs in CI

Create PR

Or push these changes by commenting:

@cursor push fc127263ab
Preview (fc127263ab)
diff --git a/.github/workflows/evals-pr-scenarios.yml b/.github/workflows/evals-pr-scenarios.yml
--- a/.github/workflows/evals-pr-scenarios.yml
+++ b/.github/workflows/evals-pr-scenarios.yml
@@ -20,7 +20,7 @@
       - 'skills/integration/**'
       - 'evals/**'
       - 'package.json'
-      - '.github/workflows/evals-pr.yml'
+      - '.github/workflows/evals-pr-scenarios.yml'
 
 # Run-once-per-ref semantics: a new push to the PR cancels the prior
 # run. PR-gate evals are deterministic against golden replays today,
@@ -58,6 +58,7 @@
         # logic).
         scenario:
           - nextjs-app-router/vanilla
+          - react-router-7/data
     steps:
       - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4
       - uses: pnpm/action-setup@fc06bc1257f339d1d5d8b3a19a8cae5388b55320 # v5.0.0

diff --git a/evals/runner/framework-sdk-table.ts b/evals/runner/framework-sdk-table.ts
--- a/evals/runner/framework-sdk-table.ts
+++ b/evals/runner/framework-sdk-table.ts
@@ -23,6 +23,7 @@
   vue: '@amplitude/unified',
   react: '@amplitude/unified',
   'react-vite': '@amplitude/unified',
+  'react-router': '@amplitude/unified',
   'react-router-6': '@amplitude/unified',
   'react-router-7-data': '@amplitude/unified',
   'react-router-7-declarative': '@amplitude/unified',

You can send follow-ups to the cloud agent here.

Reviewed by Cursor Bugbot for commit 25f29ab. Configure here.

{
"name": "react-router-7/data",
"ring": 1,
"integrationHint": "react-router",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Scenario fails validation: integrationHint missing from SDK table

High Severity

The scenario sets "integrationHint": "react-router" but the FRAMEWORK_TO_SDK table in framework-sdk-table.ts has no 'react-router' key — only 'react-router-6', 'react-router-7-data', 'react-router-7-declarative', and 'react-router-7-framework'. Since expectedSdkPackage is not provided in the scenario, the Zod schema's .refine() check (s.integrationHint in FRAMEWORK_TO_SDK) will fail at load time, throwing a ZodError before scoring ever begins. Running pnpm evals:run react-router-7/data will exit with an error immediately.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 25f29ab. Configure here.

- 'skills/integration/**'
- 'evals/**'
- 'package.json'
- '.github/workflows/evals-pr.yml'
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Workflow path trigger references wrong filename for self

Low Severity

The workflow file is named evals-pr-scenarios.yml but its paths trigger references .github/workflows/evals-pr.yml, which is a different pre-existing file. Edits to this workflow won't trigger it to re-run on the PR, defeating the self-check pattern used to validate CI config changes.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 25f29ab. Configure here.

# NDJSON contract violation (per evals/bin/run-eval.ts exit
# logic).
scenario:
- nextjs-app-router/vanilla
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New scenario missing from CI workflow matrix

Medium Severity

The scenario matrix only lists nextjs-app-router/vanilla but not the newly-added react-router-7/data scenario. Despite this PR introducing the Ring 1 scenario with a passing golden, it won't actually run in CI. The comment says "Add new Ring 1 scenarios here as they land" — and it's landing in this PR.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 25f29ab. Configure here.

@kaiapeacock-eng
Copy link
Copy Markdown
Collaborator Author

Subsumed by #670. The scenario directory was cherry-picked verbatim — git diff between this branch and #670 shows zero changes under evals/scenarios/react-router-7/data/. Closing in favor of the consolidated PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants