feat(evals): Ring 1 scenario — react-vite/vanilla#576
Conversation
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Autofix Details
Bugbot Autofix prepared a fix for the issue found in the latest run.
- ✅ Fixed: Golden
.gitignoreincorrectly drops.env.localentry- Restored the missing
.env.localline to the golden working.gitignoreto match the pristine version and maintain filesystem consistency with the ndjson log.
- Restored the missing
Or push these changes by commenting:
@cursor push 3f843b0287
Preview (3f843b0287)
diff --git a/evals/scenarios/react-vite/vanilla/golden/working/.gitignore b/evals/scenarios/react-vite/vanilla/golden/working/.gitignore
--- a/evals/scenarios/react-vite/vanilla/golden/working/.gitignore
+++ b/evals/scenarios/react-vite/vanilla/golden/working/.gitignore
@@ -1,2 +1,3 @@
node_modules
dist
+.env.localYou can send follow-ups to the cloud agent here.
411d2d3 to
d06d185
Compare
6a153bd to
216edee
Compare
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Autofix Details
Bugbot Autofix prepared a fix for the issue found in the latest run.
- ✅ Fixed: Invalid
integrationHint"react-vite" not a valid CLI choice- Changed integrationHint from 'react-vite' to 'javascript_web', which is a valid CLI choice that maps to the same SDK package (@amplitude/unified).
Or push these changes by commenting:
@cursor push cf4ca2185f
Preview (cf4ca2185f)
diff --git a/evals/scenarios/react-vite/vanilla/scenario.json b/evals/scenarios/react-vite/vanilla/scenario.json
--- a/evals/scenarios/react-vite/vanilla/scenario.json
+++ b/evals/scenarios/react-vite/vanilla/scenario.json
@@ -1,7 +1,7 @@
{
"name": "react-vite/vanilla",
"ring": 1,
- "integrationHint": "react-vite",
+ "integrationHint": "javascript_web",
"buildCommand": ["pnpm", "build"],
"expectedEnvPrefix": "VITE_",
"expectedInitFile": "src/main.tsx",
@@ -13,5 +13,5 @@
"webpack.config.js",
"babel.config.js"
],
- "notes": "Vanilla React + Vite SPA Ring 1 scenario — the 'plain SPA' cohort with no meta-framework router. Tests VITE_ env prefix and the standard src/main.tsx entry. SDK family defaults from FRAMEWORK_TO_SDK (react-vite → @amplitude/unified)."
+ "notes": "Vanilla React + Vite SPA Ring 1 scenario — the 'plain SPA' cohort with no meta-framework router. Tests VITE_ env prefix and the standard src/main.tsx entry. SDK family defaults from FRAMEWORK_TO_SDK (javascript_web → @amplitude/unified)."
}You can send follow-ups to the cloud agent here.
kelsonpw
left a comment
There was a problem hiding this comment.
Same blocker as #575: HIGH: integrationHint "react-vite" is not a valid --integration choice.\n\nevals/scenarios/react-vite/vanilla/scenario.json:4 sets integrationHint="react-vite", but src/commands/default.ts yargs choices only accept: nextjs, vue, react-router, django, flask, fastapi, javascript_web, javascript_node, python. With useDetection defaulting to true (introduced in #571), the runner forwards \ to the wizard spawn → yargs rejects → live runs hard-fail at spawn time. The 50/50 replay score doesn't exercise the wizard binary, so this passed your test plan.\n\nFix: either set integrationHint to "javascript_web" (the matching fallback for vanilla React + Vite) or add "useDetection": false to let the wizard auto-detect from the pristine fixture.\n\nThe 216edee .gitignore restoration fix is good — golden/working/.gitignore now matches pristine, so the captureFsSnapshot vs run.ndjson manifest will line up.\n\nFixture itself looks fine — minimal create-vite-style shape, init in main.tsx, track() in App.tsx. Will re-review once the integrationHint is fixed.
kelsonpw
left a comment
There was a problem hiding this comment.
Same blocker as #575: HIGH: integrationHint "react-vite" is not a valid --integration choice.
evals/scenarios/react-vite/vanilla/scenario.json:4 sets integrationHint="react-vite", but src/commands/default.ts yargs choices only accept: nextjs, vue, react-router, django, flask, fastapi, javascript_web, javascript_node, python. With useDetection defaulting to true (introduced in #571), the runner forwards --integration react-vite to the wizard spawn → yargs rejects → live runs hard-fail at spawn time. The 50/50 replay score doesn't exercise the wizard binary, so this passed your test plan.
Fix: either set integrationHint to "javascript_web" (the matching fallback for vanilla React + Vite) or add "useDetection": false to let the wizard auto-detect from the pristine fixture.
The 216edee .gitignore restoration fix is good — golden/working/.gitignore now matches pristine, so the captureFsSnapshot vs run.ndjson manifest will line up.
Fixture itself looks fine — minimal create-vite-style shape, init in main.tsx, track() in App.tsx. Will re-review once the integrationHint is fixed.
Phase 4 of Week 2 — Layer 3 grades whether the integration still
compiles after the wizard runs, and a workflow gates PRs on Layers
0-3 with a Ring 1 matrix.
* runner/build.ts — runBuild() spawns `pnpm install` then the
scenario.buildCommand inside the working tree, captures stderr,
redacts via the same observability/redact helper used for stdout.
Throws only on timeouts/spawn errors; non-zero exit codes pass
through to the BuildResult so the scorer can grade.
* scorers/layer3-build/build-passes.ts — heavy 10pts. Skip-passes
with weight 0 when no buildResult is on the artifact (current
state on the existing golden); passes with weight 10 when the
build exits 0; fails with the redacted stderr tail otherwise.
Distinguishes install failure (likely lockfile drift) from
build failure (likely a wizard regression) in the detail.
* runner/types.ts — new BuildResult type; Artifact gains optional
buildResult field.
* runner/invoke-wizard.ts — runReplay loads optional
golden/build-result.json so fixtures can pin a recorded outcome.
runLive wiring is deferred — live runs require the eval-only
Amplitude project (open decision #2) and pnpm install costs are
not worth incurring on every dev-checkout run today.
* runner/score.ts — orchestrator's downstream loop already covered
Layer 3 via the layer > 0 filter; no change to the loop, just
the scorer registration.
* .github/workflows/evals-pr.yml — pull_request workflow,
paths-filtered to the wizard agent surface + skills + evals,
runs vitest on evals/ and a Ring 1 scenario matrix in parallel.
Concurrency cancels stale runs. Artifacts uploaded for triage.
Actions pinned to commit SHAs per the repo's other workflows.
Verified:
* pnpm exec vitest run evals/ — 31/31 across 5 files
* pnpm evals:run nextjs-app-router/vanilla — 50/50, contractOk
* pnpm lint clean
Stacked on PR #567 (ba-104-runner-infra); will rebase onto main
after #567 lands.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Vanilla React + Vite SPA — the "plain SPA" cohort with no
meta-framework router. Tests VITE_ env prefix and the standard
src/main.tsx entry. Distinct from RR7-data: no router config, init
lands directly in main.tsx alongside createRoot.
Pristine: minimal create-vite-style React+TS starter (package.json,
tsconfig, vite.config, index.html, src/main.tsx, src/App.tsx).
Golden adds @amplitude/unified, .env.local with VITE_AMPLITUDE_API_KEY,
init() in main.tsx, and a track('Sign Up Submitted') call in App.tsx.
Verified: pnpm evals:run react-vite/vanilla → 50/50, contractOk,
hardFailed=false. All Layer 0 + Layer 1 scorers pass.
Stacked on PR #570 (ba-104-layer3-and-ci); will rebase down the
chain once #560 → #567 → #570 land.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The golden working .gitignore was missing the .env.local entry that the pristine fixture has. Two problems: 1. Internal inconsistency — captureFsSnapshot would include .gitignore in diff.modified because the hash differs from pristine, but golden/run.ndjson doesn't list it as modified. Scorers comparing the fs diff against the ndjson manifest would see a mismatch. 2. Models incorrect agent behavior — removing .env.local from .gitignore is exactly the regression we want scorers to catch (criterion 8: API key never hardcoded; .env.local holding the key must stay gitignored). The vanilla nextjs golden correctly leaves .gitignore identical between pristine and golden. Restore the entry so the fixture models the correct end state. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per kelsonpw's review on #576: integrationHint='react-vite' isn't a valid --integration choice (yargs only accepts the registered framework names). With useDetection defaulting to true, the runner forwards --integration react-vite to the wizard spawn → yargs rejects → live runs hard-fail at spawn time. Use 'javascript_web' — the matching fallback for vanilla React + Vite which doesn't have its own integration entry. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
216edee to
9bec9ba
Compare
d06d185 to
624a524
Compare
|
Fixed in 9bec9ba — integrationHint changed to 'javascript_web' per your suggestion (matching fallback for vanilla React + Vite). Also rebased onto fresh main. |
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 2 potential issues.
Bugbot Autofix prepared fixes for both issues found in the latest run.
- ✅ Fixed: CI matrix missing newly added Ring 1 scenario
- Added react-vite/vanilla to the workflow matrix alongside nextjs-app-router/vanilla so the Ring 1 scenario is now exercised in PR-gate CI.
- ✅ Fixed: Workflow paths trigger references wrong workflow filename
- Changed the paths trigger from evals-pr.yml to evals-pr-scenarios.yml so the workflow correctly triggers when it is modified.
Or push these changes by commenting:
@cursor push 856db6c10e
Preview (856db6c10e)
diff --git a/.github/workflows/evals-pr-scenarios.yml b/.github/workflows/evals-pr-scenarios.yml
--- a/.github/workflows/evals-pr-scenarios.yml
+++ b/.github/workflows/evals-pr-scenarios.yml
@@ -20,7 +20,7 @@
- 'skills/integration/**'
- 'evals/**'
- 'package.json'
- - '.github/workflows/evals-pr.yml'
+ - '.github/workflows/evals-pr-scenarios.yml'
# Run-once-per-ref semantics: a new push to the PR cancels the prior
# run. PR-gate evals are deterministic against golden replays today,
@@ -58,6 +58,7 @@
# logic).
scenario:
- nextjs-app-router/vanilla
+ - react-vite/vanilla
steps:
- uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4
- uses: pnpm/action-setup@fc06bc1257f339d1d5d8b3a19a8cae5388b55320 # v5.0.0You can send follow-ups to the cloud agent here.
Reviewed by Cursor Bugbot for commit 9bec9ba. Configure here.
| # NDJSON contract violation (per evals/bin/run-eval.ts exit | ||
| # logic). | ||
| scenario: | ||
| - nextjs-app-router/vanilla |
There was a problem hiding this comment.
CI matrix missing newly added Ring 1 scenario
Medium Severity
The workflow matrix only includes nextjs-app-router/vanilla but omits the react-vite/vanilla scenario being added in this PR. The comment on line 55 explicitly says "Add new Ring 1 scenarios here as they land," and scenario.json declares "ring": 1. This means the new scenario won't be exercised by the PR-gate CI, defeating the purpose of this workflow.
Reviewed by Cursor Bugbot for commit 9bec9ba. Configure here.
| - 'skills/integration/**' | ||
| - 'evals/**' | ||
| - 'package.json' | ||
| - '.github/workflows/evals-pr.yml' |
There was a problem hiding this comment.
Workflow paths trigger references wrong workflow filename
Low Severity
The paths trigger references .github/workflows/evals-pr.yml (the call-site gate workflow) instead of .github/workflows/evals-pr-scenarios.yml (this file's own name). Changes to this workflow won't trigger itself to re-run, so workflow modifications can't be validated before merge.
Reviewed by Cursor Bugbot for commit 9bec9ba. Configure here.
624a524 to
6ee0853
Compare



Summary
Vanilla React + Vite SPA — the "plain SPA" cohort with no meta-framework router. Tests
VITE_env prefix and the standardsrc/main.tsxentry. Distinct from RR7-data: no router config, init lands directly inmain.tsxalongsidecreateRoot.Pristine: minimal create-vite-style React+TS starter. Golden adds
@amplitude/unified,.env.localwithVITE_AMPLITUDE_API_KEY,init()inmain.tsx, and atrack('Sign Up Submitted')call inApp.tsx.Test plan
pnpm evals:run react-vite/vanilla→ 50/50, contractOk, hardFailed=falseStacked: base is
ba-104-layer3-and-ci(#570). Full chain: this → #570 → #567 → #560 → main.🤖 Generated with Claude Code
Note
Medium Risk
Introduces new CI gating and new eval runner/scoring surfaces (including spawning
pnpm install/build), which can affect PR signal quality and CI stability/timeouts but does not touch production runtime logic.Overview
Adds a new PR-gate GitHub Actions workflow (
evals-pr-scenarios.yml) that runs eval unit tests plus a Ring 1 scenario matrix and uploads eval reports.Extends the eval framework with Layer 3 build validation: introduces
runBuildto runpnpm install+ a scenariobuildCommand, stores a redacted stderr tail + exit codes in a newArtifact.buildResult, and adds anL3-build-passesscorer (with tests) wired into the scorer stack.Introduces a new Ring 1 scenario
react-vite/vanillawith pristine and golden fixtures and a manifest assertingVITE_env usage, init insrc/main.tsx, and aSign Up Submittedtracking call.Reviewed by Cursor Bugbot for commit 9bec9ba. Bugbot is set up for automated code reviews on this repo. Configure here.