Skip to content

feat(evals): Ring 1 scenario — react-vite/vanilla#576

Closed
kaiapeacock-eng wants to merge 4 commits into
ba-104-layer3-and-cifrom
ba-104-scenario-react-vite
Closed

feat(evals): Ring 1 scenario — react-vite/vanilla#576
kaiapeacock-eng wants to merge 4 commits into
ba-104-layer3-and-cifrom
ba-104-scenario-react-vite

Conversation

@kaiapeacock-eng
Copy link
Copy Markdown
Collaborator

@kaiapeacock-eng kaiapeacock-eng commented May 6, 2026

Summary

Vanilla React + Vite SPA — the "plain SPA" cohort with no meta-framework router. Tests VITE_ env prefix and the standard src/main.tsx entry. Distinct from RR7-data: no router config, init lands directly in main.tsx alongside createRoot.

Pristine: minimal create-vite-style React+TS starter. Golden adds @amplitude/unified, .env.local with VITE_AMPLITUDE_API_KEY, init() in main.tsx, and a track('Sign Up Submitted') call in App.tsx.

Test plan

  • pnpm evals:run react-vite/vanilla → 50/50, contractOk, hardFailed=false
  • All Layer 0 + Layer 1 scorers pass on the golden

Stacked: base is ba-104-layer3-and-ci (#570). Full chain: this → #570#567#560 → main.

🤖 Generated with Claude Code


Note

Medium Risk
Introduces new CI gating and new eval runner/scoring surfaces (including spawning pnpm install/build), which can affect PR signal quality and CI stability/timeouts but does not touch production runtime logic.

Overview
Adds a new PR-gate GitHub Actions workflow (evals-pr-scenarios.yml) that runs eval unit tests plus a Ring 1 scenario matrix and uploads eval reports.

Extends the eval framework with Layer 3 build validation: introduces runBuild to run pnpm install + a scenario buildCommand, stores a redacted stderr tail + exit codes in a new Artifact.buildResult, and adds an L3-build-passes scorer (with tests) wired into the scorer stack.

Introduces a new Ring 1 scenario react-vite/vanilla with pristine and golden fixtures and a manifest asserting VITE_ env usage, init in src/main.tsx, and a Sign Up Submitted tracking call.

Reviewed by Cursor Bugbot for commit 9bec9ba. Bugbot is set up for automated code reviews on this repo. Configure here.

@kaiapeacock-eng kaiapeacock-eng requested a review from a team as a code owner May 6, 2026 22:45
Copy link
Copy Markdown
Contributor

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Autofix Details

Bugbot Autofix prepared a fix for the issue found in the latest run.

  • ✅ Fixed: Golden .gitignore incorrectly drops .env.local entry
    • Restored the missing .env.local line to the golden working .gitignore to match the pristine version and maintain filesystem consistency with the ndjson log.

Create PR

Or push these changes by commenting:

@cursor push 3f843b0287
Preview (3f843b0287)
diff --git a/evals/scenarios/react-vite/vanilla/golden/working/.gitignore b/evals/scenarios/react-vite/vanilla/golden/working/.gitignore
--- a/evals/scenarios/react-vite/vanilla/golden/working/.gitignore
+++ b/evals/scenarios/react-vite/vanilla/golden/working/.gitignore
@@ -1,2 +1,3 @@
 node_modules
 dist
+.env.local

You can send follow-ups to the cloud agent here.

Comment thread evals/scenarios/react-vite/vanilla/golden/working/.gitignore
@kaiapeacock-eng kaiapeacock-eng force-pushed the ba-104-layer3-and-ci branch 2 times, most recently from 411d2d3 to d06d185 Compare May 8, 2026 16:26
@kaiapeacock-eng kaiapeacock-eng force-pushed the ba-104-scenario-react-vite branch from 6a153bd to 216edee Compare May 8, 2026 16:27
Copy link
Copy Markdown
Contributor

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Autofix Details

Bugbot Autofix prepared a fix for the issue found in the latest run.

  • ✅ Fixed: Invalid integrationHint "react-vite" not a valid CLI choice
    • Changed integrationHint from 'react-vite' to 'javascript_web', which is a valid CLI choice that maps to the same SDK package (@amplitude/unified).

Create PR

Or push these changes by commenting:

@cursor push cf4ca2185f
Preview (cf4ca2185f)
diff --git a/evals/scenarios/react-vite/vanilla/scenario.json b/evals/scenarios/react-vite/vanilla/scenario.json
--- a/evals/scenarios/react-vite/vanilla/scenario.json
+++ b/evals/scenarios/react-vite/vanilla/scenario.json
@@ -1,7 +1,7 @@
 {
   "name": "react-vite/vanilla",
   "ring": 1,
-  "integrationHint": "react-vite",
+  "integrationHint": "javascript_web",
   "buildCommand": ["pnpm", "build"],
   "expectedEnvPrefix": "VITE_",
   "expectedInitFile": "src/main.tsx",
@@ -13,5 +13,5 @@
     "webpack.config.js",
     "babel.config.js"
   ],
-  "notes": "Vanilla React + Vite SPA Ring 1 scenario — the 'plain SPA' cohort with no meta-framework router. Tests VITE_ env prefix and the standard src/main.tsx entry. SDK family defaults from FRAMEWORK_TO_SDK (react-vite → @amplitude/unified)."
+  "notes": "Vanilla React + Vite SPA Ring 1 scenario — the 'plain SPA' cohort with no meta-framework router. Tests VITE_ env prefix and the standard src/main.tsx entry. SDK family defaults from FRAMEWORK_TO_SDK (javascript_web → @amplitude/unified)."
 }

You can send follow-ups to the cloud agent here.

Comment thread evals/scenarios/react-vite/vanilla/scenario.json Outdated
Copy link
Copy Markdown
Member

@kelsonpw kelsonpw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same blocker as #575: HIGH: integrationHint "react-vite" is not a valid --integration choice.\n\nevals/scenarios/react-vite/vanilla/scenario.json:4 sets integrationHint="react-vite", but src/commands/default.ts yargs choices only accept: nextjs, vue, react-router, django, flask, fastapi, javascript_web, javascript_node, python. With useDetection defaulting to true (introduced in #571), the runner forwards \ to the wizard spawn → yargs rejects → live runs hard-fail at spawn time. The 50/50 replay score doesn't exercise the wizard binary, so this passed your test plan.\n\nFix: either set integrationHint to "javascript_web" (the matching fallback for vanilla React + Vite) or add "useDetection": false to let the wizard auto-detect from the pristine fixture.\n\nThe 216edee .gitignore restoration fix is good — golden/working/.gitignore now matches pristine, so the captureFsSnapshot vs run.ndjson manifest will line up.\n\nFixture itself looks fine — minimal create-vite-style shape, init in main.tsx, track() in App.tsx. Will re-review once the integrationHint is fixed.

Copy link
Copy Markdown
Member

@kelsonpw kelsonpw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same blocker as #575: HIGH: integrationHint "react-vite" is not a valid --integration choice.

evals/scenarios/react-vite/vanilla/scenario.json:4 sets integrationHint="react-vite", but src/commands/default.ts yargs choices only accept: nextjs, vue, react-router, django, flask, fastapi, javascript_web, javascript_node, python. With useDetection defaulting to true (introduced in #571), the runner forwards --integration react-vite to the wizard spawn → yargs rejects → live runs hard-fail at spawn time. The 50/50 replay score doesn't exercise the wizard binary, so this passed your test plan.

Fix: either set integrationHint to "javascript_web" (the matching fallback for vanilla React + Vite) or add "useDetection": false to let the wizard auto-detect from the pristine fixture.

The 216edee .gitignore restoration fix is good — golden/working/.gitignore now matches pristine, so the captureFsSnapshot vs run.ndjson manifest will line up.

Fixture itself looks fine — minimal create-vite-style shape, init in main.tsx, track() in App.tsx. Will re-review once the integrationHint is fixed.

kaiapeacock-eng and others added 4 commits May 8, 2026 10:35
Phase 4 of Week 2 — Layer 3 grades whether the integration still
compiles after the wizard runs, and a workflow gates PRs on Layers
0-3 with a Ring 1 matrix.

  * runner/build.ts — runBuild() spawns `pnpm install` then the
    scenario.buildCommand inside the working tree, captures stderr,
    redacts via the same observability/redact helper used for stdout.
    Throws only on timeouts/spawn errors; non-zero exit codes pass
    through to the BuildResult so the scorer can grade.
  * scorers/layer3-build/build-passes.ts — heavy 10pts. Skip-passes
    with weight 0 when no buildResult is on the artifact (current
    state on the existing golden); passes with weight 10 when the
    build exits 0; fails with the redacted stderr tail otherwise.
    Distinguishes install failure (likely lockfile drift) from
    build failure (likely a wizard regression) in the detail.
  * runner/types.ts — new BuildResult type; Artifact gains optional
    buildResult field.
  * runner/invoke-wizard.ts — runReplay loads optional
    golden/build-result.json so fixtures can pin a recorded outcome.
    runLive wiring is deferred — live runs require the eval-only
    Amplitude project (open decision #2) and pnpm install costs are
    not worth incurring on every dev-checkout run today.
  * runner/score.ts — orchestrator's downstream loop already covered
    Layer 3 via the layer > 0 filter; no change to the loop, just
    the scorer registration.
  * .github/workflows/evals-pr.yml — pull_request workflow,
    paths-filtered to the wizard agent surface + skills + evals,
    runs vitest on evals/ and a Ring 1 scenario matrix in parallel.
    Concurrency cancels stale runs. Artifacts uploaded for triage.
    Actions pinned to commit SHAs per the repo's other workflows.

Verified:
  * pnpm exec vitest run evals/ — 31/31 across 5 files
  * pnpm evals:run nextjs-app-router/vanilla — 50/50, contractOk
  * pnpm lint clean

Stacked on PR #567 (ba-104-runner-infra); will rebase onto main
after #567 lands.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Vanilla React + Vite SPA — the "plain SPA" cohort with no
meta-framework router. Tests VITE_ env prefix and the standard
src/main.tsx entry. Distinct from RR7-data: no router config, init
lands directly in main.tsx alongside createRoot.

Pristine: minimal create-vite-style React+TS starter (package.json,
tsconfig, vite.config, index.html, src/main.tsx, src/App.tsx).
Golden adds @amplitude/unified, .env.local with VITE_AMPLITUDE_API_KEY,
init() in main.tsx, and a track('Sign Up Submitted') call in App.tsx.

Verified: pnpm evals:run react-vite/vanilla → 50/50, contractOk,
hardFailed=false. All Layer 0 + Layer 1 scorers pass.

Stacked on PR #570 (ba-104-layer3-and-ci); will rebase down the
chain once #560#567#570 land.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The golden working .gitignore was missing the .env.local entry that
the pristine fixture has. Two problems:

1. Internal inconsistency — captureFsSnapshot would include .gitignore
   in diff.modified because the hash differs from pristine, but
   golden/run.ndjson doesn't list it as modified. Scorers comparing
   the fs diff against the ndjson manifest would see a mismatch.

2. Models incorrect agent behavior — removing .env.local from
   .gitignore is exactly the regression we want scorers to catch
   (criterion 8: API key never hardcoded; .env.local holding the key
   must stay gitignored). The vanilla nextjs golden correctly leaves
   .gitignore identical between pristine and golden.

Restore the entry so the fixture models the correct end state.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per kelsonpw's review on #576: integrationHint='react-vite' isn't a
valid --integration choice (yargs only accepts the registered
framework names). With useDetection defaulting to true, the runner
forwards --integration react-vite to the wizard spawn → yargs
rejects → live runs hard-fail at spawn time.

Use 'javascript_web' — the matching fallback for vanilla React + Vite
which doesn't have its own integration entry.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@kaiapeacock-eng kaiapeacock-eng force-pushed the ba-104-scenario-react-vite branch from 216edee to 9bec9ba Compare May 8, 2026 17:36
@kaiapeacock-eng kaiapeacock-eng force-pushed the ba-104-layer3-and-ci branch from d06d185 to 624a524 Compare May 8, 2026 17:36
@kaiapeacock-eng
Copy link
Copy Markdown
Collaborator Author

Fixed in 9bec9ba — integrationHint changed to 'javascript_web' per your suggestion (matching fallback for vanilla React + Vite). Also rebased onto fresh main.

Copy link
Copy Markdown
Contributor

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Fix All in Cursor

Bugbot Autofix prepared fixes for both issues found in the latest run.

  • ✅ Fixed: CI matrix missing newly added Ring 1 scenario
    • Added react-vite/vanilla to the workflow matrix alongside nextjs-app-router/vanilla so the Ring 1 scenario is now exercised in PR-gate CI.
  • ✅ Fixed: Workflow paths trigger references wrong workflow filename
    • Changed the paths trigger from evals-pr.yml to evals-pr-scenarios.yml so the workflow correctly triggers when it is modified.

Create PR

Or push these changes by commenting:

@cursor push 856db6c10e
Preview (856db6c10e)
diff --git a/.github/workflows/evals-pr-scenarios.yml b/.github/workflows/evals-pr-scenarios.yml
--- a/.github/workflows/evals-pr-scenarios.yml
+++ b/.github/workflows/evals-pr-scenarios.yml
@@ -20,7 +20,7 @@
       - 'skills/integration/**'
       - 'evals/**'
       - 'package.json'
-      - '.github/workflows/evals-pr.yml'
+      - '.github/workflows/evals-pr-scenarios.yml'
 
 # Run-once-per-ref semantics: a new push to the PR cancels the prior
 # run. PR-gate evals are deterministic against golden replays today,
@@ -58,6 +58,7 @@
         # logic).
         scenario:
           - nextjs-app-router/vanilla
+          - react-vite/vanilla
     steps:
       - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4
       - uses: pnpm/action-setup@fc06bc1257f339d1d5d8b3a19a8cae5388b55320 # v5.0.0

You can send follow-ups to the cloud agent here.

Reviewed by Cursor Bugbot for commit 9bec9ba. Configure here.

# NDJSON contract violation (per evals/bin/run-eval.ts exit
# logic).
scenario:
- nextjs-app-router/vanilla
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CI matrix missing newly added Ring 1 scenario

Medium Severity

The workflow matrix only includes nextjs-app-router/vanilla but omits the react-vite/vanilla scenario being added in this PR. The comment on line 55 explicitly says "Add new Ring 1 scenarios here as they land," and scenario.json declares "ring": 1. This means the new scenario won't be exercised by the PR-gate CI, defeating the purpose of this workflow.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 9bec9ba. Configure here.

- 'skills/integration/**'
- 'evals/**'
- 'package.json'
- '.github/workflows/evals-pr.yml'
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Workflow paths trigger references wrong workflow filename

Low Severity

The paths trigger references .github/workflows/evals-pr.yml (the call-site gate workflow) instead of .github/workflows/evals-pr-scenarios.yml (this file's own name). Changes to this workflow won't trigger itself to re-run, so workflow modifications can't be validated before merge.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 9bec9ba. Configure here.

@kaiapeacock-eng
Copy link
Copy Markdown
Collaborator Author

Subsumed by #670. The scenario directory was cherry-picked verbatim — git diff between this branch and #670 shows zero changes under evals/scenarios/react-vite/. Closing in favor of the consolidated PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants