Skip to content

chore(tooling): enforce GitHub agent workflow#2

Merged
tataihono merged 14 commits into
mainfrom
fix/1-enforce-agent-workflow
Feb 15, 2026
Merged

chore(tooling): enforce GitHub agent workflow#2
tataihono merged 14 commits into
mainfrom
fix/1-enforce-agent-workflow

Conversation

@tataihono
Copy link
Copy Markdown
Contributor

@tataihono tataihono commented Feb 15, 2026

Resolves #1

Summary

Enforce mandatory GitHub workflow for agents: issue first, branch, incremental conventional commits, PR, checks pass. Adds AGENTS.md workflow section, gh-workflow.mdc rule, commitlint+husky for conventional commits, and CI job to validate PR commits.

Contracts Changed

  • yes
  • no

Regeneration Required

  • yes
  • no

Validation

  • Contracts validated
  • Generated code verified (no manual edits)
  • Tests and build passed
  • Terraform plan reviewed (if infra change)

tataihono and others added 4 commits February 16, 2026 10:49
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Feb 15, 2026

Warning

Rate limit exceeded

@tataihono has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 2 minutes and 4 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

Walkthrough

Adds mandatory workflow documentation and enforces conventional commit messages via Husky/Commitlint locally and a new GitHub Actions "commits" job that validates PR commit messages; updates ESLint config and package scripts for tooling setup.

Changes

Cohort / File(s) Summary
Workflow docs
.cursor/rules/gh-workflow.mdc, AGENTS.md
New mandatory GitHub workflow specification and AGENTS.md subsection describing issue-first flow, branch/commit/PR rules, and required PR metadata.
Commit message enforcement
commitlint.config.js, .husky/commit-msg, package.json
Adds commitlint config, Husky commit-msg hook, prepare script, and devDependencies to enforce conventional commits locally. Check hook path and installation step.
CI validation
.github/workflows/ci.yml
Introduces a "commits" job on pull_request that runs commitlint between PR base and head SHAs (sets up pnpm and Node.js). Verify cache/fetch-depth and pnpm install flags.
Frontend tooling
apps/web/eslint.config.mjs, apps/web/package.json
Adds an ES module ESLint config for Next and switches lint script to run eslint . with new ESLint devDependencies. Confirm integration with existing linting pipeline.

Sequence Diagram(s)

sequenceDiagram
  participant Dev as Developer
  participant Issues as Issue Tracker
  participant Git as Local Git + Husky
  participant Commitlint as commitlint
  participant GH as GitHub (PR + Actions)

  Dev->>Issues: Create structured issue (bounded context, outcome, impacted folders)
  Dev->>Git: Create branch (name per rules)
  Dev->>Git: Make atomic commits
  Git->>Commitlint: commit-msg hook runs commitlint
  Commitlint-->>Git: Accept/Reject commit
  Dev->>GH: Push branch and open PR (template, "Resolves #")
  GH->>GH: Trigger workflows incl. "commits" job
  GH->>Commitlint: Run commitlint across PR commits
  Commitlint-->>GH: Pass/Fail job
  GH-->>Dev: CI results (fix/re-run if failures)
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 5 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Out of Scope Changes check ⚠️ Warning Changes to apps/web/eslint.config.mjs and apps/web/package.json for ESLint setup are not part of the linked issue #1 objectives on enforcing agent workflow. Remove or justify ESLint configuration changes in apps/web/, or create a separate issue to track this unrelated tooling enhancement.
✅ Passed checks (5 passed)
Check name Status Explanation
Linked Issues check ✅ Passed All coding requirements from issue #1 are met: AGENTS.md workflow section added, gh-workflow.mdc rule created, commitlint+husky configured, and CI job added to validate commits.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Merge Conflict Detection ✅ Passed ✅ No merge conflicts detected when merging into main
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title 'chore(tooling): enforce GitHub agent workflow' accurately summarizes the main changes, which include adding workflow documentation, commitlint/Husky tooling, and CI validation for conventional commits.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch fix/1-enforce-agent-workflow

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@AGENTS.md`:
- Line 19: The markdownlint MD038 warning is caused by a trailing space inside
the code span "`[Context] `"; remove the trailing space inside the backticks so
the prefix becomes "`[Context]`" or, if the space is intentional, keep the
current backticks but add an explicit note immediately after the code span
explaining why the trailing space is required (e.g., "Note: trailing space after
the prefix is intentional to separate from title text"). Update the AGENTS.md
text where the prefix appears to use one of these fixes and ensure any examples
or title instructions reference the corrected code span.
🧹 Nitpick comments (1)
.husky/commit-msg (1)

1-1: Quote $1 to prevent word splitting.

Shell best practice — if the temp file path ever contains spaces, this would break.

Proposed fix
-pnpm exec commitlint --edit $1
+pnpm exec commitlint --edit "$1"

Comment thread AGENTS.md Outdated
Co-authored-by: Cursor <cursoragent@cursor.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
apps/web/package.json (1)

18-22: Consider adding @types/react-dom.

react-dom is listed as a dependency (Line 16), but only @types/react is included in devDependencies. Adding @types/react-dom would provide proper type coverage for DOM-related React APIs.

Proposed fix
   "devDependencies": {
     "@eslint/eslintrc": "^3.0.0",
     "@types/react": "^19.0.0",
+    "@types/react-dom": "^19.0.0",
     "eslint": "^9.0.0",
     "eslint-config-next": "^15.0.0"
   }

@tataihono tataihono changed the title fix: enforce GitHub agent workflow chore(tooling): enforce GitHub agent workflow Feb 15, 2026
Co-authored-by: Cursor <cursoragent@cursor.com>
Comment thread .github/workflows/issue-labels.yml Fixed
Comment thread .github/workflows/issue-labels.yml Fixed
tataihono and others added 3 commits February 16, 2026 11:10
…ble solutions

Co-authored-by: Cursor <cursoragent@cursor.com>
…sue template

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
@tataihono
Copy link
Copy Markdown
Contributor Author

PR feedback resolution

Workflow permissions (github-advanced-security): Fixed. Added permissions: issues: write to issue-labels.yml.

Quote $1 in .husky/commit-msg (CodeRabbit): Fixed. Changed to pnpm exec commitlint --edit "$1" to prevent word splitting.

@types/react-dom (CodeRabbit): Fixed. Added to apps/web devDependencies for proper type coverage.

AGENTS.md MD038 / [Context] (CodeRabbit): Obsolete—template now uses type(scope): format. No action needed.

tataihono and others added 3 commits February 16, 2026 11:16
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
@tataihono tataihono self-assigned this Feb 15, 2026
@tataihono tataihono merged commit a1bbb9d into main Feb 15, 2026
8 checks passed
Ur-imazing added a commit that referenced this pull request Apr 20, 2026
- ContentRail: pass `item` directly through the renderItem hooks
  callback instead of looking it up via `data[index]` at callback
  time. FlatList fires focus callbacks asynchronously and an Apollo
  cache update can shrink `data` between render and callback, making
  `data[index]` undefined and crashing consumers that read
  `item.documentId` (review finding P1 #2).
- ContentRail: remove the redundant wrapper `<View onFocus>`. Both
  the wrapper and `hooks.onFocus` resolved to the same
  `handleItemFocus(index)` call and fired per focus event on
  platforms where View focus bubbles, double-dispatching the
  debounce timer and any analytics (P2 #5).
- ContentRail: drop the dead `focusMemory` module-level Map. It was
  written on every focus event but never read anywhere in the
  codebase (P3 #11).
- HomeHero: move `HeroEntry` type to module scope — it was defined
  inside the component body for no reason (P3 #16).
- HomeScreen: replace the local `COLORS` constant that duplicated
  `src/lib/colors.ts`. Every other component imports the canonical
  tokens; `index.tsx` was silently drifting (P3 #13).
- HomeScreen: derive `effectiveCommittedId = committedId ??
  homepageExperience?.documentId ?? null` so the hero renders on
  first paint rather than waiting for the `useEffect`-seeded
  `committedId`. Previously a blank hero surface flashed for
  ~50-100ms on cold mount while the effect fired (P2 #9).
- HomeScreen: depend the accessibility-announce effect on
  `hero?.id`/title/subtitle rather than the `hero` object reference,
  so Apollo cache re-normalisations that produce a new object
  identity for the same experience don't trigger spurious re-runs
  (P3 #14).
- queries.ts: remove orphaned comment referencing GET_WATCH_EXPERIENCE
  that trailed LIST_EXPERIENCES after the surrounding context was
  moved (P3 #12).
Ur-imazing added a commit that referenced this pull request Apr 20, 2026
…803)

* feat(tv): include VideoHero block in LIST_EXPERIENCES for focus-driven hero

Extend the home-screen listing query to carry each experience's first
ComponentSectionsVideoHero block alongside its lightweight fields so
the rail-driven hero can swap without a second round-trip per focus
change. Non-VideoHero blocks are returned with __typename only.

Plan: docs/plans/2026-04-17-001-feat-tv-focus-driven-hero-plan.md (Unit 1)

* feat(tv): add onItemFocus prop to ContentRail

Surface per-item focus events to rail consumers so the home screen
can drive a focus-driven hero. Preserves existing focusMemory write
behavior; the new prop is optional and additive.

Plan: docs/plans/2026-04-17-001-feat-tv-focus-driven-hero-plan.md (Unit 2)

* feat(tv): focus-driven hero swaps with rail card focus

Rewrite HomeHero to accept a single hero prop and cross-dissolve
between previous and current media via two stacked layers. The Explore
CTA lives in a stable text overlay (not a crossfading layer) so its
first-mount focus claim and identity survive hero swaps. Honors
AccessibilityInfo reduce-motion by snapping between states.

In HomeScreen, replace the dual LIST_EXPERIENCES + GET_WATCH_EXPERIENCE
queries with the extended LIST_EXPERIENCES (now includes each
experience's first VideoHero block). A 300ms trailing-only debounce
timer holds a committed-experience id; rail focus resets the timer,
commit fires on timeout. Initial render seeds the committed id to
isHomepage. Explore CTA targets whichever experience the hero
currently reflects, never a transiently-focused card.

Accessibility: after commit, dispatch AccessibilityInfo.announceForAccessibility
with the new hero's title + subtitle, guarded against duplicate
announcements when focus returns to the already-committed card.

Plan: docs/plans/2026-04-17-001-feat-tv-focus-driven-hero-plan.md (Units 3-6)

* fix(tv): wire rail focus through FocusableCard, not wrapper View

The wrapper View's onFocus does not reliably fire on react-native-tvos
when a nested Pressable inside FocusableCard gains focus, so the
focus-driven hero swap never triggered on real hardware. Pass an
explicit focus hook into renderItem and plumb it straight into
FocusableCard's onFocus prop, where it fires deterministically. Keep
the wrapper View's onFocus as a fallback so existing callers that
don't consume the hook still get focusMemory updates.

* fix(tv): show poster during HLS init to prevent black hero flash

When the focus-driven hero swaps to a new experience's streaming URL,
the native VideoView surface renders black while HLS loads the
manifest, estimates bandwidth, and decodes the first frame — often
200–800ms on TV hardware, and unmaskable on Android TV where the
VideoView punches through the RN view hierarchy.

Render the poster image as a base layer that always paints first, and
only mount the VideoView once the player reports readyToPlay. When
the video is ready, fade it in over the poster in 200ms. The user
never sees a black hero during a swap.

* fix(tv): freeze outgoing video and hold new poster for a seamless swap

Replace the prev/current slot pattern with a keyed layer stack so
each MediaLayer stays mounted across a hero commit. The outgoing
layer now stays where it was — its VideoView keeps painting the
video's last frame during the fade instead of re-mounting against
the outgoing experience's poster image (which was the jarring
mid-transition still the user was seeing).

Pause the outgoing player on deactivate so the painted frame
freezes instead of continuing to animate during the fade.

On the incoming layer, after the player reports readyToPlay hold
the poster visible for 1s, then crossfade to the video over 500ms.
This gives the eye a single stable still between the outgoing and
incoming videos rather than a rapid-fire video→still→still→video
sequence. Reduce Motion skips the hold and snaps.

* chore(tv): shorten hero poster hold to 500ms

* fix(tv): keep focus out of home hero video, add silent focus target on detail

Home: propagate pointerEvents="none" to every wrapper around the
VideoView inside MediaLayer so the TV focus engine doesn't stop on
the native video surface when the user D-pads up out of the rail.
Focus now consistently lands on the Explore CTA whether the hero's
video is playing or still loading as a poster.

Experience detail: the hero was non-focusable, which prevented the
user from scrolling back to the top of the screen once they moved
down into the blocks. Add a full-bleed Pressable behind the text
overlay as a silent focus target — invisible focus state, but
acceptable as a D-pad UP landing spot so ScrollView can scroll the
hero back into view.

* fix(tv): let hero-wide TVFocusGuide redirect rail UP-focus to Explore

The previous guide only wrapped the text container at the bottom of
the hero. When D-padding UP from the rail, focus tried to land
somewhere in the video region above the guide — the native VideoView
caught it and the Explore button was never reached while the video
was mounted.

Wrap the entire hero container in TVFocusGuideView so any upward
focus attempt into the hero area is redirected to the Explore
Pressable, regardless of whether the video is mounted or not.

* fix(tv): let rail receive focus directly on first DOWN from Explore

trapFocusDown=false on the hero TVFocusGuideView so DOWN from Explore
exits the hero region in a single press instead of bouncing off the
guide's bounds once.

* fix(tv): make Explore a sibling of the hero focus guide, not a descendant

Explore Pressable was inside the TVFocusGuideView, so DOWN from
Explore bounced off the guide's bounds once and required a second
press to reach the rail. Move the guide to wrap only the media
layers and gradient so Explore sits as a sibling in the view
hierarchy. The guide still catches upward focus attempts into the
media region and redirects them to Explore, but no longer traps
Explore's own outbound DOWN movement.

* fix(tv): single-press DOWN from Explore to rail via nextFocusDown

trapFocusDown on the hero's TVFocusGuideView didn't prevent Explore's
outbound DOWN from bouncing off the guide's bounds, so the rail
required two presses. Plumb the rail's TVFocusGuideView handle up to
HomeScreen, pass it into HomeHero as `nextFocusDownHandle`, and bind
it to Explore's `nextFocusDown` prop. Focus now crosses directly from
Explore into the rail on a single D-pad press without any guide
bounce. Verified end-to-end via keystroke-driven screenshots on the
tvOS simulator: DOWN → rail focus, UP → Explore focus, DOWN again →
rail focus on first press.

Also switch Explore's ref to a callback ref + state-backed node
handle so the hero's focus-guide destinations always resolve on
first render (React commits refs after render, so a render-time
read of `exploreRef.current` was null initially and left the guide
without a destination).

Route Explore's node handle through a callback-ref so the hero focus
guide has a valid destination from first render.

* docs(tv): add focus-driven hero brainstorm and plan

Brainstorm requirements and implementation plan that drove the
focus-driven hero feature shipped in the preceding tv commits.

* fix(tv): explore wins initial focus and never drops into limbo

Pre-fix: ContentRail's TVFocusGuideView autoFocus claimed initial
focus for the first rail card, overriding the Explore Pressable's
hasTVPreferredFocus. Pressing UP or DOWN from Explore could leave
focus stranded in the hero's non-focusable video region.

Changes:
- Drop autoFocus from ContentRail's TVFocusGuideView so Explore's
  hasTVPreferredFocus wins on first mount.
- Wrap the hero's text overlay in a TVFocusGuideView with
  trapFocusUp + autoFocus so UP from Explore keeps Explore focused
  instead of leaking focus into the video area.
- Plumb the Explore Pressable's native handle up through
  onExploreHandleChange so each rail card gets nextFocusUp routed
  directly to Explore.
- Keep nextFocusDown on Explore so DOWN moves into the rail.
- Make Explore's focused state visible: add a white border + glow
  on top of the 1.08x scale, since tvOS Pressable shadows in the
  button's own primary color were nearly invisible against the
  hero gradient.
- Disable ScrollView scroll on the home screen (content fits and
  scroll attempts were blurring Explore on UP).

* fix(tv): single-press DOWN from Explore while keeping UP trapped

Move the `trapFocusUp` TVFocusGuideView from the text container up
to the hero container. Wrapping only the text container made the
guide's frame tight enough that the first DOWN press got absorbed
by the guide instead of routing through Explore's `nextFocusDown`
to the rail.

Wrapping the entire hero keeps UP trapped inside the hero (no
focusable above Explore in the hero — so UP becomes a no-op), while
DOWN from Explore exits the hero boundary in a single press and
lands on the rail via the `nextFocusDown` handle.

Verified on tvOS simulator via keystroke-driven screenshots:
- Initial: Explore visibly focused
- UP from Explore: stays on Explore
- DOWN from Explore: first press focuses a rail card
- UP from rail: returns to Explore

* fix(tv): harden hero against VideoView hijacking focus while playing

When the hero's native VideoView is actively painting, the tvOS
focus engine treats it as a focus candidate even with
focusable={false} on VideoView itself. This caused UP/DOWN from
Explore to blur into limbo while the video played.

Layer the guards so tvOS reliably skips over the video surface:
- isTVSelectable={false} on every View/Animated.View wrapping the
  media layers inside the hero (outer and inner).
- Self-referencing nextFocusUp on the Explore Pressable so UP from
  Explore resolves back to Explore instead of the video above.
- Retain the outer hero TVFocusGuideView trapFocusUp as a second
  line of defense.

DOWN from Explore now reaches a rail card in a single press even
with the video playing. UP from rail into Explore and initial focus
on Explore are preserved.

* refactor(tv): make the home hero non-interactive, rail owns all focus

Remove the Explore CTA and all the focus-routing plumbing that tried
to coordinate focus between it and the rail. The native VideoView
kept hijacking focus away from any interactive element placed above
the rail, which forced an ever-growing list of guards (nextFocusUp
self-references, TVFocusGuideView trapFocusUp wrappers,
isTVSelectable={false} on every wrapper, state-backed node handles,
etc.) and still wasn't robust once the video was painting.

New model:
- HomeHero is now purely presentational — title, subtitle, and the
  crossfading video/poster. No Pressable, no focus concerns.
- ContentRail's TVFocusGuideView autoFocus claims initial focus on
  the first card. The user navigates experiences by D-padding the
  rail and pressing Select to open the focused experience.
- Drop the Explore-related props from HomeHero/ContentRail/FocusableCard
  (nextFocusUp, nextFocusDown, onExploreHandleChange,
  onFocusHandleChange, itemNextFocusUp).
- Re-enable ScrollView scroll on the home screen.

Net effect: something is always visibly focused (the rail auto-focuses
on mount), the video hero no longer competes for focus, and the
whole focus system is dramatically simpler.

* fix(tv): apply code review safe_auto findings

- ContentRail: pass `item` directly through the renderItem hooks
  callback instead of looking it up via `data[index]` at callback
  time. FlatList fires focus callbacks asynchronously and an Apollo
  cache update can shrink `data` between render and callback, making
  `data[index]` undefined and crashing consumers that read
  `item.documentId` (review finding P1 #2).
- ContentRail: remove the redundant wrapper `<View onFocus>`. Both
  the wrapper and `hooks.onFocus` resolved to the same
  `handleItemFocus(index)` call and fired per focus event on
  platforms where View focus bubbles, double-dispatching the
  debounce timer and any analytics (P2 #5).
- ContentRail: drop the dead `focusMemory` module-level Map. It was
  written on every focus event but never read anywhere in the
  codebase (P3 #11).
- HomeHero: move `HeroEntry` type to module scope — it was defined
  inside the component body for no reason (P3 #16).
- HomeScreen: replace the local `COLORS` constant that duplicated
  `src/lib/colors.ts`. Every other component imports the canonical
  tokens; `index.tsx` was silently drifting (P3 #13).
- HomeScreen: derive `effectiveCommittedId = committedId ??
  homepageExperience?.documentId ?? null` so the hero renders on
  first paint rather than waiting for the `useEffect`-seeded
  `committedId`. Previously a blank hero surface flashed for
  ~50-100ms on cold mount while the effect fired (P2 #9).
- HomeScreen: depend the accessibility-announce effect on
  `hero?.id`/title/subtitle rather than the `hero` object reference,
  so Apollo cache re-normalisations that produce a new object
  identity for the same experience don't trigger spurious re-runs
  (P3 #14).
- queries.ts: remove orphaned comment referencing GET_WATCH_EXPERIENCE
  that trailed LIST_EXPERIENCES after the surrounding context was
  moved (P3 #12).

* feat(tv): focus-driven hero swaps to selected experience's video

Home hero now renders the VideoHero block of whichever Experience card
is focused in the rail, so the background poster/video tracks D-pad
selection instead of being hard-coded. Adds compile-time asserts that
gql.tada's discriminated union for the blocks dynamic zone did not
collapse to `never`, since that failure mode is silent under tsc.
Documents the intentional divergence from the mobile LIST_EXPERIENCES
shape so future sync passes don't clobber the per-experience hero data.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(solutions): capture TV focus-driven hero patterns and refresh the cluster

Adds a new best-practices learning documenting the patterns that came out
of PR #803 (non-interactive hero, rail-owns-focus, poster-hold during HLS
source swap, compile-time `never`-collapse assert for gql.tada dynamic
zones), and refreshes four adjacent docs whose guidance was either
superseded or incomplete in light of it:

- ui-bugs/tv-videoview-steals-dpad-focus — Prevention superseded the
  "wrap the hero in TVFocusGuideView" recommendation for hero-above-rail
  layouts; now points at the new learning.
- ui-bugs/tv-video-hero-blank-autoplay — adds a "Source swap on focus
  change" section covering the poster-hold technique.
- best-practices/react-native-tvos-porting-pitfalls — adds Pitfall 6
  (background VideoView + focusable siblings).
- best-practices/expo-tv-platform-setup-sdui-monorepo — Section 3 notes
  the LIST_EXPERIENCES divergence; Section 6 adds the rail-owns-focus
  pattern. Also drops a duplicate last_updated key.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Ur-imazing added a commit that referenced this pull request May 7, 2026
…nit 3) (#902)

* chore(admin): add seed-easter and run-experience-dump dev scripts

- seed-easter: insert Easter experience into admin's local Postgres,
  translated block-for-block from apps/cms/src/bootstrap/seed-easter.ts
  to admin's BlockSchema. Local-dev parity with Strapi's Easter; UI/E2E
  fixture without depending on Strapi or the R3 dump pipeline.
- run-experience-dump: workstation wrapper around the R3
  experience-content-dump service. Bypasses the GraphQL trigger +
  workflow runtime for local dev. Mirrors run-embeds.ts pattern.

Both are dev tooling alongside existing run-embeds, run-sync,
pull-mapping. Neither ships to production runtime.

* docs(brainstorms,plans): consumer-migration brief + Unit 3 dual-client plan

Brief and plan that this PR implements:

- docs/brainstorms/2026-05-05-consumer-migration-implementer-brief-requirements.md
  Web-implementer's brief for the Strapi to admin consumer-side migration.
  Captures the 7-unit migration shape, dual-client decision, parity-driven
  per-route flag rollout, and (as of 2026-05-07) the single-owner shape
  after tatai handed full execution + decision authority over.

- docs/plans/2026-05-05-001-feat-dual-client-codegen-unit-3-plan.md
  Implementation plan covering brief Unit 2 (admin SDL emit + drift CI)
  and Unit 3 (packages/graphql multi-schema codegen + adminGraphql factory
  + type-isolation test). All five technical decisions resolved inline
  (factory naming, schema-source convention, scalar mappings, CI shape,
  Turbo wiring). Implementation lands as the subsequent commits on this
  branch.

* feat(admin): emit deterministic SDL via Pothos printSchema (U2)

- apps/admin/src/scripts/print-schema.ts: emits admin's Pothos schema
  to apps/admin/schema.graphql via printSchema(lexicographicSortSchema(
  builder.toSchema())). Strips Pothos plugin directives (e.g.
  scope-auth's @authScopes) post-print so gql.tada's tsconfig parser
  can consume the SDL — auth is enforced at the resolver, not declared
  in the SDL surface consumers see.
- apps/admin/schema.graphql: initial committed SDL artifact, 865 lines.
  Mirrors apps/cms/schema.graphql's role: the contract handoff between
  admin's TypeScript schema (Pothos) and packages/graphql's typed
  client (gql.tada).
- apps/admin/package.json: adds schema:print script.
- turbo.json: adds schema:print task with outputs declaration.

Output verified byte-identical across consecutive runs (deterministic).
The committed SDL is consumed by packages/graphql's gql.tada codegen
in U3; consumers never introspect admin's running server (production
introspection is disabled via @envelop/disable-introspection).

Per docs/plans/2026-05-05-001-feat-dual-client-codegen-unit-3-plan.md
(U2).

* fix(admin): exclude schema.graphql from prettier (U2)

apps/cms/schema.graphql is already in .prettierignore for the same
reason — Strapi's GraphQL plugin auto-emits canonical SDL, and prettier
reformatting GraphQL docstrings (single-line to multi-line) breaks the
re-generation byte-equality the schema:print drift CI check depends on.

Apply the same exclusion to:
- apps/admin/schema.graphql: emitted by schema:print
- packages/graphql/src/admin-graphql-env.d.ts: will be emitted by
  gql.tada codegen in U3 (added preemptively to mirror the existing
  graphql-env.d.ts entry)

Re-emitted apps/admin/schema.graphql to its canonical (un-prettied)
shape so the drift CI check (next commit) reads zero.

* ci(admin): add admin-schema-drift job (U2)

Mirror the existing graphql-generate job (lines 78-95) for admin's
SDL artifact. Job runs:

  pnpm turbo run schema:print --filter=@forge/admin
  git diff --exit-code apps/admin/schema.graphql

Gated on @forge/admin appearing in the affected outputs from Turbo's
--affected detection, so the job only fires on PRs that actually touch
admin. Catches the case where a Pothos schema change is merged without
regenerating apps/admin/schema.graphql, which would silently make
consumer codegen wrong.

Per docs/plans/2026-05-05-001-feat-dual-client-codegen-unit-3-plan.md
(U2).

* feat(graphql): multi-schema codegen for Strapi + admin (U3)

Configure packages/graphql for two introspection targets via gql.tada's
multi-schema config shape — `schemas: [...]` array in tsconfig with
per-entry name, schema path, and tadaOutputLocation. Each schema is
emitted to its own .d.ts file with a structurally-distinct `name`
property in the introspection type, which is what gives downstream
factories compile-time type isolation (U5 verifies).

Changes:

- packages/graphql/tsconfig.json: switches the gql.tada plugin entry
  from single `schema` to `schemas: [{ name: 'strapi', ... }, { name:
  'admin', ... }]`.
- packages/graphql/package.json: adds @forge/admin: workspace:* to
  devDependencies, mirroring the existing @forge/cms entry. Codegen
  reads admin's committed SDL via the workspace-relative path; admin
  itself is not a runtime dependency.
- packages/graphql/src/admin-graphql-env.d.ts: NEW. Generated admin
  introspection types (43KB). Committed alongside the source change
  per repo convention (graphql-env.d.ts is committed too).
- packages/graphql/src/graphql-env.d.ts: regenerated against the
  upstream Strapi schema refresh that landed in main; behavior-
  preserving for existing callsites.
- turbo.json: extends the existing `generate` task — adds
  apps/admin/schema.graphql to inputs (alongside apps/cms's) and
  src/admin-graphql-env.d.ts to outputs. NO dependsOn edge: matches
  the existing Strapi pattern where the SDL author runs schema:print
  manually and CI's admin-schema-drift job catches drift if they
  forget. dependsOn would re-build admin's full Pothos schema on every
  local generate run.
- pnpm-lock.yaml: registers the new workspace link for @forge/admin.

R4 verified: apps/web typecheck passes against the new dual-target
codegen output. The adminGraphql() factory itself lands in U4.

Per docs/plans/2026-05-05-001-feat-dual-client-codegen-unit-3-plan.md
(U3).

* feat(graphql): adminGraphql factory + dual-client docs (U4)

- packages/graphql/src/admin.ts: new factory module mirroring graphql.ts.
  Exports adminGraphql = initGraphQLTada<{ introspection }>() bound to
  admin's introspection types. Re-exports gql.tada's FragmentOf /
  ResultOf / VariablesOf as AdminFragmentOf / AdminResultOf /
  AdminVariablesOf so call sites visually distinguish admin types from
  Strapi types.
- packages/graphql/src/index.ts: re-exports both factories side-by-side.
  Strapi side keeps bare names (graphql, ResultOf, ...); admin side
  carries the Admin* prefix.
- packages/graphql/package.json: adds ./admin subpath export, parity
  with the existing ./graphql subpath.
- packages/graphql/AGENTS.md, CLAUDE.md: rewritten to cover the
  dual-client conventions — which factory to use when, the auth-posture
  convention with the acknowledged compile-time-guard gap, the
  multi-schema generation flow (Strapi + admin), the U5 type-isolation
  test path as the AE1 enforcement mechanism, and the live-introspection
  rationale (admin disables it via @envelop/disable-introspection).

The dual-client is temporary scaffolding. When Strapi is decommissioned
and all consumer routes have moved to admin, this package collapses
back to single-target admin (adminGraphql renames to graphql; the
Strapi factory + types + graphql-env.d.ts get deleted in one PR).

Per docs/plans/2026-05-05-001-feat-dual-client-codegen-unit-3-plan.md
(U4).

* test(graphql): type-isolation test enforcing AE1 (U5)

Compile-time enforcement that mixing Strapi-typed and admin-typed
GraphQL result values fails TypeScript compilation (the AE1 origin
acceptance criterion).

The test runs as part of pnpm --filter @forge/graphql typecheck — no
runtime test runner needed; it's a pure type-level check via
@ts-expect-error directives. Two negative cases assert cross-schema
assignment fails; two positive cases assert same-schema assignment
compiles clean. If a directive is misplaced — missing from a negative
case OR present on a positive case — typecheck fails by design, which
is what makes the test meaningful (not a vacuous tautology).

Query selection (critical): the two negative cases use queries against
schema-EXCLUSIVE root fields — bibleBook(documentId) on Strapi (admin
has no bibleBook root query) and experienceBySlug(locale, slug) on
admin (Strapi has no experienceBySlug root query). The resulting
ResultOf<...> shapes have NO overlapping property names, so structural
typing rejects the cross-assignment. An inline file-header rule warns
future contributors against substituting structurally-overlapping
queries — that would silently make the directives unused and drop the
real guard.

Mutation-tested locally: deleting the @ts-expect-error on the negative
case re-surfaces a real 'Property X is missing in type Y' error,
confirming the directive is gating a real check.

Per docs/plans/2026-05-05-001-feat-dual-client-codegen-unit-3-plan.md
(U5).

* fix(graphql,admin): address review findings — Phase A safe_auto fixes

Applied seven mechanical fixes from the multi-agent code review on this
branch:

- ci.yml: extend graphql-generate's git diff --exit-code to also check
  packages/graphql/src/admin-graphql-env.d.ts (review finding #1).
  Without this, an admin Pothos change that regenerates one .d.ts but
  not the other lands silently.

- packages/graphql/src/admin.ts: re-export readFragment so the ./admin
  subpath surface matches ./graphql (review finding #3). readFragment
  is schema-agnostic in gql.tada — works for both factories.

- apps/admin/src/scripts/print-schema.ts: anchor the output path to the
  script's own location via fileURLToPath(import.meta.url), not
  process.cwd() (review finding #5). Wrap writeFileSync in try/catch
  with structured stderr output and exit(1) on failure (review finding
  #7). cwd-based resolution silently misplaced schema.graphql when
  invoked outside pnpm/Turbo; an unhandled write error produced a
  confusing git-diff failure rather than a named cause.

- turbo.json: declare explicit inputs on the schema:print task so it
  invalidates only on Pothos source / printer changes, not on every
  apps/admin file touch (review finding #8). Mirrors the existing
  generate task's inputs declaration.

- apps/admin/package.json: wire run-experience-dump as a pnpm script
  alongside run-embeds, run-sync, etc. (review finding #12). The script
  was committed earlier without a discoverable invocation path.

- CLAUDE.md (root): split the GraphQL Change Flow into separate Strapi
  and admin sub-flows (review finding #9). The single-flow version still
  documented only the Strapi path post-PR, so an agent reading root
  context wouldn't know schema:print exists.

- apps/admin/AGENTS.md: add 'SDL emission for consumer codegen' and
  'Local-dev scripts' sections so an agent scoped to the admin package
  can discover schema:print and the other tsx-script tools (review
  finding #9). Mirrors how packages/graphql/AGENTS.md was updated in
  the U4 commit.

R4 verification: packages/graphql + apps/admin typecheck pass.
schema:print output unchanged (drift-clean after script-relative path
change, confirmed by repeated invocation).

* fix(graphql,admin): address review findings — Phase B engineering decisions

Three findings that needed engineering judgment, all from the
multi-agent code review:

- print-schema.ts (review finding #2): replace regex-based directive
  stripping with AST round-trip via graphql-js parse/visit/print. The
  regex had three concrete failure modes flagged across six reviewers:
  (a) [^)]* in @authScopes(...) substring strips DESCRIPTION text that
  legitimately contains the pattern, silently corrupting committed SDL;
  (b) only handled single-line directive declarations + a narrow
  two-line on continuation; (c) broke on nested parens in directive
  args. The AST visitor removes nodes by kind+name, structurally aware,
  with no formatting heuristics. Pothos directive set is a single
  POTHOS_DIRECTIVE_NAMES Set extended as new plugins land.
  apps/admin/schema.graphql regenerated against the new stripper —
  output is byte-deterministic across runs (verified twice). Output
  shrinks from 25419 to 25369 chars due to graphql-js print's canonical
  formatting (no semantic difference).

- dual-client.types.ts (review finding #4 + #11): expand AE1 coverage
  and clarify what the test actually exercises. Adds positive +
  negative cross-assignment cases for AdminFragmentOf/FragmentOf and
  AdminVariablesOf/VariablesOf — the original test only covered ResultOf,
  but R1's 'both factories independently typed' invariant should hold
  across all three utility types. File header rewritten to be honest
  about the mechanism: structural distinctness via schema-exclusive
  fields, NOT nominal factory branding (the AdminResultOf / etc.
  aliases carry no nominal information beyond the underlying ResultOf;
  AdminResultOf<typeof STRAPI_QUERY> would compile cleanly today).
  Header also explicitly notes what the test does NOT exercise:
  factory-level query rejection (which gql.tada represents via an
  error variant of the document type and only surfaces at consumer
  call sites), and 'as' casts (which deliberately bypass the structural
  check; documented as an as-cast ban in AGENTS.md / CLAUDE.md with an
  ESLint rule as a follow-up).

Mutation-tested: deleting any of the six @ts-expect-error directives
in negative cases re-surfaces the underlying type error, confirming
the directives gate real checks.

Review finding #6 (eager imports of the full mutation graph at SDL
emit time) is deferred — the architectural restructure of builder.ts
to lazy-load services is out of scope for this fix pass. Current setup
works because skipValidation: !!process.env.CI bypasses zod env
validation in CI; documented as residual risk in the review report
and surfaces if env validation tightens or local-dev friction
increases.

* docs(roadmap): feat-120 decouple admin SDL emit from runtime graph

Defers review finding #6 from the multi-agent code review on
feat/dual-client-codegen-unit-3 to a tracked roadmap ticket. The
finding flags that print-schema.ts triggers Prisma client construction
+ env validation at SDL emit time via the side-effect import chain
through @/graphql/schema.

Current behavior is correct because:

- CI sets CI=true, which triggers skipValidation: !!process.env.CI in
  apps/admin/src/config/env.ts:154 (zod skips).
- Local dev contributors keep a populated .env via pnpm fetch-secrets.

Two latent risks the ticket captures:

1. Future side-effect imports propagate. A new service that validates
   a third-party API key on module load gets triggered every time
   schema:print runs in CI, including in PRs unrelated to that
   service.
2. The skipValidation bypass is brittle. If env-validation policy
   tightens, the admin-schema-drift CI job breaks until the decoupling
   lands.

The ticket sketches two candidate strategies (dedicated build-schema
module vs conditional builder construction) and locks the verification
criteria, including byte-identical SDL output and a regression test
that asserts no Prisma client construction at module load.

* docs(solutions): compound learning — dual-client gql.tada multi-schema pattern

Captures the reusable architectural pattern surfaced by this PR's work:
how to bridge a Pothos-defined TypeScript schema in one app to gql.tada-typed
consumer code in other apps via a committed SDL artifact, including the six
sub-decisions and one compile-time test that compose the pattern.

Sub-decisions documented:

1. Multi-schema gql.tada config (`schemas: [...]` array with named
   entries; `name` discriminator is load-bearing for type isolation).
2. Two factories, one package (graphql + adminGraphql side-by-side;
   subpath exports for narrow imports; readFragment shared as
   schema-agnostic).
3. Committed SDL artifact via Pothos printSchema (lexicographicSortSchema
   for determinism; .prettierignore exclusion to avoid drift fights;
   script anchored via fileURLToPath, not process.cwd).
4. AST-based directive stripping via graphql-js parse/visit/print, NOT
   regex (regex has three concrete failure modes: description corruption,
   multi-line directive declarations, nested parens).
5. Inputs-based Turbo wiring, NOT dependsOn (avoids re-running the full
   Pothos schema build on every local generate; relies on schema-author
   discipline + CI drift gate).
6. Drift CI job mirroring graphql-generate (gated on Turbo affected
   detection; covers both directions: stale committed SDL AND missing
   regen of consumer .d.ts files).
7. Compile-time AE1 type-isolation test using @ts-expect-error on
   schema-exclusive selection sets, with three pitfalls called out:
   prose containing the literal pattern accidentally creates spurious
   directives (use block comments instead); typeof-only bindings need
   explicit eslint-disable in this repo's config; queries must select
   schema-exclusive fields or the structural distinctness mechanism
   doesn't fire.

Cross-references three related docs (expo-graphql-schema-drift,
mocked-shape-vs-real-contract-discipline, nextjs-route-shape-migration);
follow-up cross-reference additions in those docs are noted as
recommended next steps but deliberately not included in this commit
to keep the doc-write atomic.

Origin: PR #902 (this branch). Brief: docs/brainstorms/2026-05-05-...
Plan: docs/plans/2026-05-05-001-feat-dual-client-codegen-unit-3-plan.md

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Kneesal added a commit that referenced this pull request May 12, 2026
* feat(admin): semantic-search eval harness Phase 1 (foundation)

First phase of the eval harness from
docs/plans/2026-05-07-001-feat-semantic-search-eval-harness-plan.md.

Lays the foundation modules every later unit compiles against:
- Hard-coded HARNESS_LOCALES (top 30 by Core labeledVideoCounts) +
  QUICK_LOCALES + per-locale judge-confidence tier
- Shared types: Verdict (six values), Outcome (discriminated-union),
  Fingerprint, Baseline, RunReport — re-exports admin's SearchResult
- Search client: GET /api/search wrapper with codepoint-safe snippet
  truncation (200 cap), typed SearchClientError discriminating
  rate_limited / validation / server_error / transport / timeout /
  validation_failed
- Judge client: OpenRouter chat-completions wrapper with
  response_format json_schema + Zod re-parse, 5xx/429/transport
  retry honoring Retry-After (capped 30s), per-attempt 45s
  AbortSignal.timeout. Default model anthropic/claude-haiku-4-5
  override via OPENROUTER_JUDGE_MODEL
- Fingerprint reader: single combined prisma.$queryRaw for
  video_scene_locale + video_transcript_chunk + experience_locale
  with embedding-IS-NOT-NULL gates. compareFingerprints() returns
  human-readable drift summary
- Env vars added to src/config/env.ts: OPENROUTER_JUDGE_MODEL,
  EVAL_JUDGE_CONCURRENCY, EVAL_SEARCH_CONCURRENCY, ADMIN_BASE_URL

49 unit tests across 4 modules. Typecheck + lint clean.

Origin: docs/brainstorms/2026-05-06-semantic-search-eval-harness-requirements.md

* feat(admin): semantic-search eval harness Phase 2 (data inputs)

Adds the input-layer modules that feed the eval engine:

- query-generator.ts: corpus-blind synthetic query generation via
  OpenRouter. createQueryGenerator() wraps a chat-completions call
  with response_format json_schema + Zod re-parse, returns deduped/
  trimmed query lists. createSyntheticQueryLoader() persists
  generated queries to apps/admin/eval/synthetic-queries/{locale}.json
  so re-runs use the same set against the same baseline.
  Explicit regenerate command — never silent rebuild.
- regressions.ts: loader for apps/admin/eval/regressions.json.
  Hand-edited entries: [{locale, query, notes, addedAt, addedBy}].
  Zod-validated; allowMissing=true defaults to [] when no
  regressions are authored yet. Friction-free compounding loop —
  every bug found in the wild becomes a permanent test entry by
  appending one JSON object.
- apps/admin/eval/regressions.json: starter file with a $schema-doc
  comment header documenting the schema.

20 new unit tests (11 query-generator + 9 regressions). All 69
search-eval tests across 6 modules pass. Typecheck + lint clean.

* feat(admin): semantic-search eval harness Phase 3 (engine)

Adds the eval engine that composes all the foundation pieces:

- baseline.ts: load/save/getQueriesForRun/detectDrift. Atomic writes
  via .tmp + rename so a crash mid-write never leaves a half-written
  baseline. Zod-validated on both read and write so a save can never
  produce a baseline that wouldn't load.
- calibration.ts + apps/admin/eval/calibration.json: 10 hand-labeled
  cases (4 obvious-A-wins, 3 ties, 2 both-irrelevant, 1 prodigal-son).
  runCalibration() returns CalibrationReport with passed/matched/
  total + per-case detail. ≥80% match threshold; sub-threshold runs
  emit `event=judge_calibration_failure` log line at warn level.
  Judge errors count as case failures (don't abort).
- runner.ts: end-to-end orchestrator. Loads baseline → reads
  fingerprint → detects drift → runs calibration → searches admin
  for current results (p-limit search pool, default 4) → judges
  pairwise with A/B swap (p-limit judge pool, default 8) → collapses
  to win/loss/tie/both-irrelevant/judge-disagreement → aggregates
  totals + per-locale + cost + snippet-improvement heuristic.
  Search failures are logged as ties (run continues).
- types.ts: re-export Tier from locales.ts so consumers don't have
  to know which file owns it.

40 new tests (13 baseline, 12 calibration, 15 runner). All 109
search-eval tests across 9 modules pass. Typecheck + lint clean.

* feat(admin): semantic-search eval harness Phases 4+5 (reporter + CLI)

Completes the harness end-to-end:

- reporter.ts: writeRunJson() persists per-run reports under
  apps/admin/.tmp/eval/runs/{runId}.json. renderConsoleSummary()
  produces the user-facing summary: header, headline net win rate,
  drift warning (if any), calibration PASS/FAIL, per-locale table
  sorted by |net win rate|, top-10 regressions sorted by judge
  confidence (▼▼ = clearly, ▼ = slightly), snippet-improvement
  caveat (if heuristic triggered), cost, JSON file path. 15 tests.
- eval-search.ts: thin CLI entry. Subcommands run | rebaseline
  | regenerate-queries | calibrate. Mirrors run-embeds.ts shape
  (lazy imports, JSON-line stdout logs, SIGINT/SIGTERM handler
  with prisma disconnect, exit 130 on signal). pnpm scripts:
  eval:search, eval:search:quick, eval:search:full,
  eval:search:rebaseline, eval:search:regenerate-queries,
  eval:search:calibrate.
- paths.ts: shared anchor module that resolves all default file
  paths via import.meta.url instead of process.cwd(). Lets the
  harness work whether invoked from repo root or apps/admin/.
- runner.ts: added baselineOverride injectable so tests can skip
  the disk read.

15 reporter tests + CLI smoke tested. All 124 search-eval tests
pass across 10 modules. Typecheck + lint clean.

Harness is now runnable end-to-end. Next steps (post-merge):
1) eval:search:rebaseline to capture baseline against a stable admin
2) iterate on synthetic-query prompts per locale
3) empirically validate the --quick locale set
4) hand-tune calibration cases as drift appears

* docs(plan): mark eval-harness plan completed

* refactor(admin): apply ce-review safe_auto fixes to eval harness

Applies ~10 review findings from /ce:review (interactive mode,
gated_auto P1s deferred for human decision):

P2 #3 — Wire EVAL_JUDGE_CONCURRENCY + EVAL_SEARCH_CONCURRENCY through
the CLI to runEval (previously decorative). Add EVAL_GIT_SHA to env.ts
and replace process.env.GIT_SHA reads (AGENTS.md compliance).

P2 #4 — Remove dead code: RunEvalOptions.syntheticQueryLoader and
loadRegressionsImpl options + their imports were never read. Remove
baseline.ts getQueriesForRun + RunFilter (runner reimplements with
same logic). Delete the 4 corresponding tests.

P2 #5 — JSON-Schema vs Zod drift: judge.ts schema now declares
rationale.minLength=1, maxLength=1000 to match Zod. query-generator.ts
schema now declares minItems=1, maxItems=200, items.minLength=1,
items.maxLength=200 to match Zod, and Zod relaxed from .min(2) to
.min(1) to accept single-codepoint CJK queries.

P2 #6 — Extract shared SearchResultSchema, SearchResponseSchema,
FingerprintSchema into src/services/search-eval/schemas.ts. baseline.ts
and calibration.ts now import the canonical schema. Replace
search-client.ts hand-rolled isSearchResponse/isSearchResult with
SearchResponseSchema.safeParse — three representations of one contract
collapse to one source of truth.

P2 #7 — Widen CalibrationCase.locale from HarnessLocale to string
(operator-authored cases legitimately target out-of-set locales).
Drop the `as unknown as CalibrationCase[]` cast that papered over the
type mismatch.

P2 #8 — Add sanitizeForLog() in runner.ts that strips CR/LF/TAB. Apply
to query, locale, and error message in event=search_error log line +
event=fingerprint_drift_detected. Per the project's documented
log-injection-sanitizer learning.

P2 #9 — Add SQL invariant test in fingerprint.test.ts: asserts
WHERE embedding IS NOT NULL appears 6× and experience_locale uses
status='published'. Catches a refactor that drops the gates.

P2 #11 — Rebaseline now uses pLimit + Promise.allSettled for
loadOrGenerate fan-out across 30 locales. One transient OpenRouter
failure no longer aborts the whole batch.

P3 cleanups — Three `as Judge`/`as PrismaClient` casts in runner.ts
replaced with narrowing local consts + a readCurrentFingerprint helper.
Token accumulation now collected onto JudgeAttempts and reduced once
(removes shared mutable state across pLimit workers). reporter.ts
truncate() now codepoint-safe (Array.from). Type predicate replaces
broad cast in renderTopRegressions. SearchClientError code
"validation_failed" → "response_invalid" for clarity. saveBaseline
unlinks .tmp on rename failure. Add .strict() to JudgeResponseSchema.
Extract extractMessageContent + extractTokenCounts + safeReadBody to
shared openrouter-helpers.ts (was duplicated between judge.ts and
query-generator.ts). LOCALE_TIER and QUICK_LOCALES now use
`as const satisfies` for compile-time typo safety. CLI rejects
conflicting flag combinations (--quick + --locale, --full + --quick)
instead of silently picking one. Add the exact curl + jq refresh
command to HARNESS_LOCALES doc-comment. Add 2026-05-07 date stamp to
Haiku pricing constants.

Tests pass (121 search-eval, 1625 admin total). Typecheck + lint
clean. Deferred for human decision: P1 search-client rate-limit
retry, P1 path-traversal validation, P2 judgeDisagreements counter
double-count, P2 Promise.all → allSettled in runner, P2 search-error
synthetic-tie kind.

* fix(admin): apply ce-review P1 fixes (rate-limit retry + locale validation)

P1 #1 — Search-client now retries on 429 / 5xx / transport / timeout
to match judge.ts. Mirrors the same retry shape: max 3 attempts,
Retry-After honored capped at 30s, exponential backoff (500ms /
1s / 2s), per-attempt AbortSignal.timeout (45s default). Fixes the
documented rate-limit mismatch where a full run would have flooded
admin's 30/min cap and silently corrupted baseline data with empty
search results. 400 / 4xx other than 429 remain non-retryable.
Structured retry log: `event=search.retry attempt=N status=N wait_ms=N`.
Added 8 tests covering retry success on 5xx + 429, Retry-After
honoring + 30s cap, transport-error retry, no-retry-on-400, single-shot
mode, structured log assertion.

P1 #2 — Path-traversal guard for `--locale=<arg>`: query-generator
now validates locale against a strict BCP-47 regex before joining it
into a filename. `../foo`, `../../etc/passwd`, `en/..`, `en\foo`,
whitespace, control characters, and shell metacharacters all reject
with `QueryGeneratorError(code: "validation")` BEFORE any filesystem
op. Canonical forms (`en`, `pt-PT`, `zh-Hans`, `es-419`, `fil`) pass.
Added 4 tests including a regression for "never invokes the
generator when locale is invalid" — confirms the guard short-circuits
before any external work.

133 search-eval tests across 10 modules pass. Typecheck + lint clean.
Both P1s now closed; remaining gated_auto findings (P2 judgeDisagreements
counter, Promise.all → allSettled, search-error outcome kind) still
deferred for human decision.

* docs(solutions): capture external-client retry-parity learning from PR #922

Documents the bug class caught during /ce:review: when >=2 external
clients participate in the same runner fan-out (pLimit, Promise.all),
they MUST share the same retry contract. If one peer has retry and
another doesn't, the runner's per-item try/catch absorbs failures
silently, the run reports success with exit 0, and the persisted data
is corrupted with no loud error.

Concrete instance: search-client.ts had no retry on 429/5xx/transport
while sibling judge.ts had full retry. Both invoked under pLimit() from
runner.ts. Admin's /api/search is 30/min/IP rate-limited; default
search concurrency (4) on a 1500-query run = 40x over cap. Without
retry, ~98% of queries would have hit 429; runner records searchError,
judge evaluates empty current-results list, run exits 0 with thousands
of meaningless ties. Fix in commit 803af81.

Adds back-reference one-liners to two related docs:
- parallel-workflow-error-robustness-20260420.md (aggregation axis)
- bounded-parallelism-per-target-workflow-pattern-20260505.md (pLimit
  primitive)

* docs(admin): add eval-harness operational docs

Two complementary surfaces, matching existing admin doc patterns:

apps/admin/CLAUDE.md — new "Semantic search eval harness" section
slotted between manager-enrichment trigger and "Common pitfalls".
Covers architecture (module map + one-paragraph data flow), the
HARNESS_LOCALES decision, full day-zero + day-N runbook with
commands, CLI exit codes, env vars, calibration semantics, drift
detection, and gotchas (read-only against admin, shared OpenRouter
key, retry parity invariant). Mirrors the R1-R5 section pattern.

apps/admin/eval/README.md — collocated quick-reference for anyone
poking around the eval/ directory. Explains each data file
(baselines/, synthetic-queries/, regressions.json, calibration.json),
how operators interact with each, what NOT to commit, and cost
guard rails per run mode.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

chore(tooling): enforce GitHub agent workflow

2 participants