feat(evals): Ring 1 scenario — react-router-7/data#575
Conversation
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Autofix Details
Bugbot Autofix prepared a fix for the issue found in the latest run.
- ✅ Fixed: Golden
.gitignoremissing.env.localentry from pristine- Restored the
.env.localentry to the golden working.gitignoreto match pristine, preventing scorer hash mismatches and modeling correct secret-protection behavior.
- Restored the
Or push these changes by commenting:
@cursor push d329a7ef19
Preview (d329a7ef19)
diff --git a/evals/scenarios/react-router-7/data/golden/working/.gitignore b/evals/scenarios/react-router-7/data/golden/working/.gitignore
--- a/evals/scenarios/react-router-7/data/golden/working/.gitignore
+++ b/evals/scenarios/react-router-7/data/golden/working/.gitignore
@@ -1,2 +1,3 @@
node_modules
dist
+.env.localYou can send follow-ups to the cloud agent here.
411d2d3 to
d06d185
Compare
6a85932 to
36fa899
Compare
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Autofix Details
Bugbot Autofix prepared a fix for the issue found in the latest run.
- ✅ Fixed: Invalid
integrationHintcauses yargs rejection at runtime- Changed integrationHint from 'react-router-7-data' to 'react-router', which is a valid CLI choice in the yargs choices array.
Or push these changes by commenting:
@cursor push 8462a91fe9
Preview (8462a91fe9)
diff --git a/evals/scenarios/react-router-7/data/scenario.json b/evals/scenarios/react-router-7/data/scenario.json
--- a/evals/scenarios/react-router-7/data/scenario.json
+++ b/evals/scenarios/react-router-7/data/scenario.json
@@ -1,7 +1,7 @@
{
"name": "react-router-7/data",
"ring": 1,
- "integrationHint": "react-router-7-data",
+ "integrationHint": "react-router",
"buildCommand": ["pnpm", "build"],
"expectedEnvPrefix": "VITE_",
"expectedInitFile": "src/main.tsx",You can send follow-ups to the cloud agent here.
kelsonpw
left a comment
There was a problem hiding this comment.
Two blockers before this can land:
-
HIGH:
integrationHint"react-router-7-data" is not a valid--integrationchoice.evals/scenarios/react-router-7/data/scenario.json:4setsintegrationHintto"react-router-7-data", but the wizard's yargs choices insrc/commands/default.tsonly accept:nextjs,vue,react-router,django,flask,fastapi,javascript_web,javascript_node,python. SinceuseDetectiondefaults totrue(per #571), the runner forwards--integration react-router-7-datato the wizard spawn, which yargs will reject before the wizard even starts. The 50/50 replay score in your test plan only exercises the golden replay path, which doesn't actually invoke the wizard binary. The first live run against this scenario will hard-fail at spawn. Either:- change
integrationHintto"react-router"(the closest valid choice — RR7 still routes through the react-router family detector), or - add
"useDetection": falseto the scenario and let the wizard auto-detect from the pristine fixture.
- change
-
MEDIUM (already fixed by 36fa899 but worth confirming):
golden/working/.gitignorewas missing.env.localentry; the fix commit restored it. Verifydiff.modifiedvsrun.ndjsonis now consistent.
Fixture content otherwise looks faithful to the data-mode RR7 shape. Stacked on #570. Will re-review once the integrationHint is sorted.
Phase 4 of Week 2 — Layer 3 grades whether the integration still
compiles after the wizard runs, and a workflow gates PRs on Layers
0-3 with a Ring 1 matrix.
* runner/build.ts — runBuild() spawns `pnpm install` then the
scenario.buildCommand inside the working tree, captures stderr,
redacts via the same observability/redact helper used for stdout.
Throws only on timeouts/spawn errors; non-zero exit codes pass
through to the BuildResult so the scorer can grade.
* scorers/layer3-build/build-passes.ts — heavy 10pts. Skip-passes
with weight 0 when no buildResult is on the artifact (current
state on the existing golden); passes with weight 10 when the
build exits 0; fails with the redacted stderr tail otherwise.
Distinguishes install failure (likely lockfile drift) from
build failure (likely a wizard regression) in the detail.
* runner/types.ts — new BuildResult type; Artifact gains optional
buildResult field.
* runner/invoke-wizard.ts — runReplay loads optional
golden/build-result.json so fixtures can pin a recorded outcome.
runLive wiring is deferred — live runs require the eval-only
Amplitude project (open decision #2) and pnpm install costs are
not worth incurring on every dev-checkout run today.
* runner/score.ts — orchestrator's downstream loop already covered
Layer 3 via the layer > 0 filter; no change to the loop, just
the scorer registration.
* .github/workflows/evals-pr.yml — pull_request workflow,
paths-filtered to the wizard agent surface + skills + evals,
runs vitest on evals/ and a Ring 1 scenario matrix in parallel.
Concurrency cancels stale runs. Artifacts uploaded for triage.
Actions pinned to commit SHAs per the repo's other workflows.
Verified:
* pnpm exec vitest run evals/ — 31/31 across 5 files
* pnpm evals:run nextjs-app-router/vanilla — 50/50, contractOk
* pnpm lint clean
Stacked on PR #567 (ba-104-runner-infra); will rebase onto main
after #567 lands.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Distinct from /framework: data mode keeps the router as a config object passed to RouterProvider, so init lands at the SPA entry (src/main.tsx) before the router is mounted, not in entry.client.tsx. The pair (framework + data) catches confusion between the two RR7 shapes — the regression class this Ring 1 slot exists for. Pristine + golden mirror the same shape as nextjs and rr7-framework: package.json, tsconfig, vite.config, index.html, two route files (home + signup), and a minimal main.tsx. Golden adds @amplitude/unified, .env.local with VITE_AMPLITUDE_API_KEY, init() in main.tsx, and track() calls in the route handlers. Verified: pnpm evals:run react-router-7/data → 50/50, contractOk, hardFailed=false. All Layer 0 + Layer 1 scorers pass. Stacked on PR #567 (ba-104-runner-infra); will rebase onto main after #567 lands. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Same fixture inconsistency the react-vite golden had: the working .gitignore drops the .env.local entry that pristine carries, but golden/run.ndjson doesn't record .gitignore as modified. Scorers see a hash mismatch with no corresponding manifest line, and the fixture models the wrong end state — .env.local must stay gitignored so the API key the wizard writes there isn't committed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per kelsonpw's review on #575: integrationHint='react-router-7-data' isn't a valid --integration choice (yargs only accepts: nextjs, vue, react-router, django, flask, fastapi, javascript_web, javascript_node, python). With useDetection defaulting to true, the runner forwards --integration react-router-7-data to the wizard spawn, which yargs rejects before the wizard starts. The 50/50 replay score didn't catch this because golden replay doesn't invoke the wizard binary. Use 'react-router' — RR7 routes through the same react-router family detector, so this is the correct hint regardless of data-vs-framework mode. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
36fa899 to
25f29ab
Compare
d06d185 to
624a524
Compare
|
Fixed in 25f29ab — integrationHint changed to 'react-router' per your suggestion (RR7 routes through the same react-router family detector). Also rebased onto fresh main to pick up the DataIngestionCheckScreen test fix that was breaking Node 22/24 unit tests across the eval stack. |
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 3 potential issues.
Bugbot Autofix prepared fixes for all 3 issues found in the latest run.
- ✅ Fixed: Scenario fails validation:
integrationHintmissing from SDK table- Added 'react-router': '@amplitude/unified' entry to FRAMEWORK_TO_SDK table to satisfy the schema refinement check
- ✅ Fixed: Workflow path trigger references wrong filename for self
- Changed workflow path trigger from 'evals-pr.yml' to 'evals-pr-scenarios.yml' so config changes trigger CI
- ✅ Fixed: New scenario missing from CI workflow matrix
- Added 'react-router-7/data' to the scenario matrix so the new Ring 1 scenario runs in CI
Or push these changes by commenting:
@cursor push fc127263ab
Preview (fc127263ab)
diff --git a/.github/workflows/evals-pr-scenarios.yml b/.github/workflows/evals-pr-scenarios.yml
--- a/.github/workflows/evals-pr-scenarios.yml
+++ b/.github/workflows/evals-pr-scenarios.yml
@@ -20,7 +20,7 @@
- 'skills/integration/**'
- 'evals/**'
- 'package.json'
- - '.github/workflows/evals-pr.yml'
+ - '.github/workflows/evals-pr-scenarios.yml'
# Run-once-per-ref semantics: a new push to the PR cancels the prior
# run. PR-gate evals are deterministic against golden replays today,
@@ -58,6 +58,7 @@
# logic).
scenario:
- nextjs-app-router/vanilla
+ - react-router-7/data
steps:
- uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4
- uses: pnpm/action-setup@fc06bc1257f339d1d5d8b3a19a8cae5388b55320 # v5.0.0
diff --git a/evals/runner/framework-sdk-table.ts b/evals/runner/framework-sdk-table.ts
--- a/evals/runner/framework-sdk-table.ts
+++ b/evals/runner/framework-sdk-table.ts
@@ -23,6 +23,7 @@
vue: '@amplitude/unified',
react: '@amplitude/unified',
'react-vite': '@amplitude/unified',
+ 'react-router': '@amplitude/unified',
'react-router-6': '@amplitude/unified',
'react-router-7-data': '@amplitude/unified',
'react-router-7-declarative': '@amplitude/unified',You can send follow-ups to the cloud agent here.
Reviewed by Cursor Bugbot for commit 25f29ab. Configure here.
| { | ||
| "name": "react-router-7/data", | ||
| "ring": 1, | ||
| "integrationHint": "react-router", |
There was a problem hiding this comment.
Scenario fails validation: integrationHint missing from SDK table
High Severity
The scenario sets "integrationHint": "react-router" but the FRAMEWORK_TO_SDK table in framework-sdk-table.ts has no 'react-router' key — only 'react-router-6', 'react-router-7-data', 'react-router-7-declarative', and 'react-router-7-framework'. Since expectedSdkPackage is not provided in the scenario, the Zod schema's .refine() check (s.integrationHint in FRAMEWORK_TO_SDK) will fail at load time, throwing a ZodError before scoring ever begins. Running pnpm evals:run react-router-7/data will exit with an error immediately.
Reviewed by Cursor Bugbot for commit 25f29ab. Configure here.
| - 'skills/integration/**' | ||
| - 'evals/**' | ||
| - 'package.json' | ||
| - '.github/workflows/evals-pr.yml' |
There was a problem hiding this comment.
Workflow path trigger references wrong filename for self
Low Severity
The workflow file is named evals-pr-scenarios.yml but its paths trigger references .github/workflows/evals-pr.yml, which is a different pre-existing file. Edits to this workflow won't trigger it to re-run on the PR, defeating the self-check pattern used to validate CI config changes.
Reviewed by Cursor Bugbot for commit 25f29ab. Configure here.
| # NDJSON contract violation (per evals/bin/run-eval.ts exit | ||
| # logic). | ||
| scenario: | ||
| - nextjs-app-router/vanilla |
There was a problem hiding this comment.
New scenario missing from CI workflow matrix
Medium Severity
The scenario matrix only lists nextjs-app-router/vanilla but not the newly-added react-router-7/data scenario. Despite this PR introducing the Ring 1 scenario with a passing golden, it won't actually run in CI. The comment says "Add new Ring 1 scenarios here as they land" — and it's landing in this PR.
Reviewed by Cursor Bugbot for commit 25f29ab. Configure here.
624a524 to
6ee0853
Compare



Summary
Distinct from
/framework: data mode keeps the router as a config object passed toRouterProvider, so init lands at the SPA entry (src/main.tsx) before the router is mounted, not inentry.client.tsx. The pair (framework + data) catches confusion between the two RR7 shapes — the regression class this Ring 1 slot exists for.Pristine + golden mirror the same shape as nextjs and rr7-framework. Golden adds
@amplitude/unified,.env.localwithVITE_AMPLITUDE_API_KEY,init()inmain.tsx, andtrack()calls in the route handlers.Test plan
pnpm evals:run react-router-7/data→ 50/50, contractOk, hardFailed=falseStacked: base is
ba-104-layer3-and-ci(#570) so the diff shows only the scenario files. Full chain: this → #570 → #567 → #560 → main.🤖 Generated with Claude Code
Note
Medium Risk
Moderate risk due to new CI workflow and adding a build runner/scorer that can change PR gating behavior and introduce flaky/timeouts; changes are confined to the eval harness and fixtures.
Overview
Adds a new PR-gate GitHub Actions workflow (
evals-pr-scenarios.yml) that runs eval unit tests plus a Ring 1 scenario matrix and uploads per-run reports.Introduces Layer 3 build/typecheck signal to evals: a
runBuildrunner that captures redacted stderr tails and exit codes, extends the evalArtifactto carry optionalbuildResult(including golden replay support), and updates scoring orchestration to run all non-L0 layers (including the newL3-build-passesscorer).Adds a new Ring 1 scenario
react-router-7/datawith pristine and golden fixtures/NDJSON, and includes tests for the new Layer 3 scorer behavior.Reviewed by Cursor Bugbot for commit 25f29ab. Bugbot is set up for automated code reviews on this repo. Configure here.