feat(web): consumer migration — web cutover branch + verification + runbook (Units 4-9 + UB7)#933
Merged
Ur-imazing merged 14 commits intoMay 13, 2026
Conversation
13fa6f8 to
e94c0b0
Compare
…snapshot (Unit 4)
First commit of PR-B. Per the test-first regression discipline at
docs/solutions/best-practices/test-first-regression-snapshot-byte-identical-default-20260429.md,
the regression snapshot lands first so subsequent units (U5-U9) cannot
silently change strapi-mode behavior.
Changes:
- apps/web/src/env.ts:
* Add WEB_ADMIN_API_KEYS (optional CSV string). Web's SSR sends the
first entry as `Authorization: Bearer` so admin buckets web's
traffic as `consumer:<key>` rather than `public:<railway-egress-ip>`.
Optional per the institutional learning about required-without-
default env vars bricking Railway deploys.
* Keep FORGE_CONTENT_API enum permissive over the four historical
values (strapi / dual-read / admin-with-fallback / admin). The
SOFT-REMOVAL of dual-read and admin-with-fallback happens at the
runtime narrower, NOT the schema validator — stale Doppler configs
don't brick boot.
- apps/web/src/lib/content-api-mode.ts:
* ContentApiMode collapses to `"strapi" | "admin"` (was strapi|dual-read).
* RECOGNIZED_MODES = ["strapi", "admin"]. LEGACY_SOFT_REMOVED_MODES =
["dual-read", "admin-with-fallback"] — these get a distinct
"soft-removed; update your Doppler config" warn before falling back
to strapi. Operators reading logs differentiate "I have a stale
config" from "I have a typo".
* Deletion-checklist docstring updated to point at plan-003 + reflect
that ADMIN_GRAPHQL_URL + WEB_ADMIN_API_KEYS STAY post-Strapi-removal
(they become required, not deleted).
- apps/web/src/lib/content-api-mode.test.ts:
* Rewrote tests for the new closed set. "admin" is now a valid mode
(no warn). "dual-read" and "admin-with-fallback" fall back with the
soft-removed warn. Added a differentiation test verifying the
soft-removed warn message differs from the unknown-value warn.
- apps/web/src/lib/__tests__/content-mode-regression.test.ts (NEW):
* Source-of-truth input/output table. Each row locks "given env value
X, getContentApiMode() returns Y." U5-U9 cannot silently change
these outputs without updating the test in the same commit.
* Type-level contract: ContentApiMode union has exactly 2 members.
Compile-time exhaustiveness check via switch + `never` assertion.
Note: fetchSlugExperience's branch table is NOT touched by U4. The
existing dual-read code path becomes unreachable when `mode === "strapi"`
(legacy values normalize to strapi) and continues to fire as canary-like
behavior when `mode === "admin"`. U6 replaces the admin branch with the
real cutover code (admin-only fetch + WatchPageAdminError throws).
Between U4 merge and U6 merge, the "admin" mode produces
admin-with-fallback-like behavior — acceptable because the env flip
doesn't happen until U6 ships.
Verification:
- pnpm --filter @forge/web typecheck: clean
- pnpm --filter @forge/web test: 32 files, 352 tests + 3 todo, 0 fails
- Regression snapshot covers 5-input matrix: undefined / strapi / admin /
dual-read / admin-with-fallback (the last two soft-removed to strapi).
…rity infra (Unit 5) Ship admin-shape `WatchExperience` fragments in `packages/graphql` for future mobile/TV reuse, extend the renderer dispatch with admin typenames, and update the parity infrastructure (bridge mode guard + normalizer type narrowing + fixture rewrites) for PR-A's typed-blocks schema. **Fragments (20 new files, 739 LOC in `packages/graphql/src/fragments/admin/`):** - Root `WatchExperience` fragment on `ExperienceLocale` using `adminGraphql()`. - 17 per-block-kind fragments + 2 nested-union fragments. Each selects fields available on the matching Pothos type from PR-A. - Re-exported through `packages/graphql/src/index.ts`; new `./admin/fragments` package.json exports entry. **Renderer dispatch (`apps/web/src/components/sections/index.tsx`):** - Adds 17 admin-typename cases alongside the existing Strapi cases. Both paths route through the existing per-kind renderers via a `renderAdminBlock(...)` helper. Additive (not replacement) so the branch stays shippable between U5 and U6 — U6 may remove Strapi cases when `content.ts` flips, or keep both for the cutover window. - `CardBlock` and `VideoRecommendationsBlock` cases emit dev-warn placeholders: CardBlock has no Strapi precedent / no renderer yet; VideoRecommendationsBlock needs videoId→slug hydration that's U6's scope. **Prop-shape audit:** `grep` for `.data?.` / `.attributes?.` in `apps/web/src/components/sections/` returned ZERO matches — Strapi v5's GraphQL plugin already flattens the response, so the wrapped-envelope problem doesn't exist on the renderer side. Relation-accessor diffs surfaced for video / images / imageOverride (`enrichment.ts`, `CarouselVideo.tsx`, `Video.tsx`); admin returns flat `videoId` instead of a joined Video row. All renderer code uses optional chaining with sane fallbacks (`titleOverride`, `imageUrl`), so admin payloads degrade rather than crash. U6 will wire videoId→slug hydration. **`admin-experience.ts` (web): updated, not deleted.** `content.ts` still imports `adminExperienceBySlugOperation` for the dual-fetch orchestration shape. Composed `AdminWatchExperience` so the query typechecks against PR-A's typed-union `blocks` field. U6 may delete when `content.ts` no longer needs the dual-fetch. **Parity-bridge mode guard (`apps/web/src/lib/parity-bridge.ts`):** - Added `isCanaryEmissionEnabled()` helper comparing `getContentApiMode() as string === "dual-read"`. The cast is deliberate: U4 collapsed `ContentApiMode` to `"strapi" | "admin"` so the literal `"dual-read"` can never match by type — the guard intentionally kills canary emission post-cutover. - Guard fires at the entry of `runDualReadComparison(...)`. Admin-mode and strapi-mode requests now emit ZERO `forge.parity.*` log events, so U9's monitoring section doesn't have to disambiguate "real admin failure" from "leftover canary noise." - 2 new tests in `parity-bridge.test.ts` assert admin/strapi modes short-circuit; existing 16 dual-read tests run under a mocked-mode pattern. **Parity normalizer (`packages/graphql/src/parity/normalize-admin.ts`):** - `AdminExperienceLocaleInput['blocks']` narrows from `unknown` to `ReadonlyArray<Block>` (admin's Zod-derived domain type). After PR-A, admin returns typed blocks by construction; the existing `BlocksSchema.safeParse(rawBlocks)` becomes defensive belt-and- suspenders — kept with an explanatory comment for runtime drift. - `normalize-admin.test.ts` — rewrote 3 happy-path block fixtures (mediaCollection, text, videoRecommendations) to typed-union shape matching `BlockSchema.options[N]` exactly. Added `LooseAdminInput` + `toInput()` helper for adversarial fixtures (intentional shape drift to exercise the defensive `safeParse` path). **Cross-package verification:** - pnpm --filter @forge/web typecheck: clean - pnpm --filter @forge/web test: 32 files, 354 tests + 3 todo, 0 fails - pnpm --filter @forge/graphql typecheck: clean - pnpm --filter @forge/graphql test: 8 files, 113 tests, 0 fails - pnpm --filter @forge/graphql generate: both schemas regenerated cleanly - pnpm --filter @forge/admin typecheck (regression check): clean
…earer + cache re-throw (Unit 6)
Wire the actual cutover branch. When `FORGE_CONTENT_API === "admin"`,
web fetches from admin via a bearer-aware Apollo client; admin failures
throw `WatchPageAdminError`; `unstable_cache` re-throws (not caches)
admin errors so they reach UB7's segment error boundary. Strapi sentinel
errors keep their current inline-rendered path.
**admin-client.ts:**
- Module-scope bearer read: `env.WEB_ADMIN_API_KEYS?.split(",")[0]?.trim()`.
Inside `typeof window === "undefined"` guard so client bundles don't
trip the server-env Proxy.
- timeoutFetch override constructs a fresh `Headers(init?.headers)` per
request and sets `Authorization: Bearer ${ADMIN_BEARER}` only when
defined. Bearer undefined → header omitted entirely (admin treats
request as anonymous PUBLIC). Fresh Headers also replaces any
caller-supplied Authorization rather than merging — defense-in-depth
against accidental key echo.
- Per-call AbortSignal.timeout(3000) preserved (outbound-timeout-shorter-
than-caller-budget discipline).
**content.ts changes:**
1. `WatchPageAdminError` class (near WatchVideoError) — mirrors that
precedent:
```
class WatchPageAdminError extends Error {
readonly code: "NOT_FOUND" | "UNAVAILABLE"
constructor(code, { cause }: { cause?: Error } = {}) { ... }
}
```
`code` field (not `kind` as plan text suggested) — matches existing
WatchVideoError precedent in the same file. UB7 dispatches on
`error.code`.
2. fetchSlugExperience 2-case branch table:
- `mode === "strapi"`: byte-identical to current main.
- `mode === "admin"` AND `!env.WEB_ADMIN_API_KEYS` (runtime safety
net for deploy-order errors): log `forge.parity.consumer_bearer_missing`,
fall back to strapi semantics for THIS request. Inline TODO
references the post-Strapi-removal switch to throwing UNAVAILABLE
per plan-003 Key Technical Decisions.
- `mode === "admin"`: call `fetchAdminSlugExperience`. Outcome
classification (`error.name`-based, never message-substring per the
AWS NoSuchKey learning):
* `ok:true, response==null` → log `admin_null` + throw `NOT_FOUND`
* `ok:true, response!=null` → return
* `ok:"timeout"` → log `admin_timeout` + throw `UNAVAILABLE`
* `ok:"error"` → log `admin_fetch_error` + throw `UNAVAILABLE`
with original as `cause`
3. `unstable_cache` re-throw inside `fetchResolvedWatchPage`'s catch:
```
if (error instanceof WatchPageAdminError) throw error
```
Goes BEFORE the existing sentinel return. The prior plan flagged
this as P0 — without it, the cache wrapper swallows the throw and
the error boundary never fires. `unstable_cache` re-throws errors
from its inner function (verified by the existing comment).
`resolveWatchPage` is wrapped in React `cache()` which propagates
the throw to `[slug]/page.tsx`. Strapi-mode generic Errors keep the
sentinel path → page.tsx renders `<ExperienceError>` inline.
4. Removed `runDualReadComparison` import + `fetchStrapiSlugExperience`
function — dead code after the cutover. The parity-bridge module
itself stays (U5 added the mode guard; U5 deletion follow-up PR
retires it). content.ts no longer calls into the bridge.
**Tests:**
- admin-client.test.ts: 6 new — bearer present, bearer absent
(`Authorization` header NOT set), whitespace trim on first CSV entry,
console-scrub invariant (mutation-tested: removing scrub locally
fails), singleton preservation, per-call AbortSignal.
- content.test.ts: ~14 cutover tests covering happy paths admin + strapi,
NOT_FOUND, UNAVAILABLE via Apollo error / timeout / abort,
consumer_bearer_missing safety net, cache re-throw + sentinel
preservation, end-to-end propagation chain.
- Mocked-shape discipline: Apollo errors thrown via
`Object.assign(new Error("..."), { name: "ApolloError", networkError: ... })`
— REAL typed shape, not generic Error.
**Note on `kind` vs `code`:** plan-003 text used `kind` for the
discriminator field. Existing `WatchVideoError` precedent uses `code`.
Followed the precedent for codebase consistency. UB7's classifier
will dispatch on `error.code`.
Verification:
- pnpm --filter @forge/web typecheck: clean
- pnpm --filter @forge/web test: 32 files, 353 + 3 todo, 0 fails
- Operator manual smoke (post-deploy): FORGE_CONTENT_API=admin
WEB_ADMIN_API_KEYS=<key> ADMIN_GRAPHQL_URL=<url> + curl canary slug;
verify admin called, Strapi not, log stream contains no bearer string.
Slug-page error boundary that catches typed WatchPageAdminError from U6's admin-mode fetch. Renders one of two static UX shapes matching the Strapi-mode inline rendering — end users see no behavior difference between admin-mode and Strapi-mode failures. **Mode-aware:** the boundary only handles `WatchPageAdminError`. Strapi- mode sentinel errors continue through the existing inline path in `page.tsx` and never reach this boundary (it's additive for admin mode). **Information-disclosure discipline:** `error.message` is NEVER rendered as visible text. The classifier dispatches on `error.code`: - `NOT_FOUND` → `<ExperienceEmpty>` (matches Strapi-mode's NO_EXPERIENCE_FOUND_MESSAGE inline render) - `UNAVAILABLE` → `<ExperienceError>` with hardcoded "Service temporarily unavailable" message + reset button (mirrors locale-level error boundary pattern). ExperienceError additionally sanitizes via its KNOWN_ERRORS table; the stable string maps to its generic fallback so admin-mode UNAVAILABLE and Strapi-mode generic-error renderings stay visually consistent. **Catch-all:** non-typed errors (or future codes not in the union) re-throw to Next.js's segment-default error boundary, which emits a generic 500 page without echoing the underlying error.message. Safe contract: this boundary handles two static cases and re-throws everything else. Pattern follows `apps/web/src/app/[slug]/[locale]/error.tsx` precedent (the 2-segment error boundary that the locale-level route uses). Tests (7): - Happy path NOT_FOUND renders `<ExperienceEmpty>` (no reset button — matches Strapi-mode behavior on empty content). - Happy path UNAVAILABLE renders `<ExperienceError>` + reset button. - Reset callback fires on Try again click. - Information-disclosure: NOT_FOUND error.message NEVER appears in DOM (mutation-tested with a planted secret fragment via both `cause` and the error's own message field). - Information-disclosure: UNAVAILABLE error.message NEVER appears in DOM (same planted-secret pattern). - Catch-all: generic Error (non-WatchPageAdminError) re-throws. - Catch-all: off-band code on WatchPageAdminError (forged via Object.defineProperty bypassing the TS readonly) re-throws — guards against a future widening of the code union without updating this boundary. Verification: - pnpm --filter @forge/web typecheck: clean - pnpm --filter @forge/web test: 33 files, 360 + 3 todo, 0 fails
THE cutover gate. Offline harness that fetches every published slug from
Strapi + admin in parallel, diffs the responses via existing parity
primitives, applies allow-list rules, and produces a structured per-slug
report. Gate passes iff every remaining diff is allow-listed; the
operator never flips FORGE_CONTENT_API to admin without a green gate.
**Files:**
- packages/graphql/src/parity/batch-verification.ts (886 LOC):
Core harness. Module-level so it's both typechecked AND unit-testable.
Exports the orchestrator (`runBatchVerification`), the per-slug
comparator, args parser, env-bearer reader, stratified sampler, gate
logic, JSON-report builder, and supporting helpers (`postGraphQL`
with 429 backoff, `sanitizeError` for bearer redaction).
- packages/graphql/scripts/run-batch-verification.ts (345 LOC):
Thin CLI shim. Parses env + args, builds real GraphQL fetchers against
Strapi + admin, calls runBatchVerification, writes report, emits exit
code. Same scripts/-outside-tsconfig pattern as capture-parity-fixture.
- packages/graphql/src/parity/batch-verification.test.ts (49 new tests):
parseArgs, readBearerFromEnv hard-fail, sanitizeError redaction,
allow-list combiner, stratified sample determinism + bucket
distribution, per-slug compare across happy/admin-missing/strapi-
missing/both-failed paths, bearer-redaction in errors, value-diff,
allow-list application, runBatchVerification PASSED/FAILED gate paths,
CONCURRENCY CAP ENFORCEMENT (instrumented counter), --since filter,
no-crash-on-throwing-fetcher, backoffDelayMs cap, postGraphQL
429-then-200 retry, max-retries RateLimitExhaustedError, bearer header
presence/absence, and a LOCKED JSON inline snapshot for the downstream
report shape.
**CLI surface:**
--sample <n> representative-sample-first count (default null
= full corpus). Stratified by updatedAt:
oldest 30 / middle 40 / newest 30.
--concurrency <n> bounded-parallel admin/strapi fetches (default 5).
--out <path> report JSON output (default .tmp/batch-
verification-<UTC-YYYYMMDD-HHmmss>.json).
--allow-list <path> operator-supplied additional allow-list entries
(JSON) — combined with DEFAULT_ALLOW_LIST from
packages/graphql/src/parity/allow-list.ts.
--since <iso> delta filter (only slugs with updatedAt > ISO)
— for editorial-freeze workflows where the
operator re-runs immediately before env-flip.
--anonymous explicit opt-out of the bearer auto-read for
local-dev use. Without this AND without
WEB_ADMIN_API_KEYS set, the CLI HARD-FAILS with
a clear error explaining the self-DoS risk
(anonymous traffic hits the public:${ip} bucket
that real end-user SSR is about to share).
--help / -h
**Bearer discipline:**
- Auto-read from WEB_ADMIN_API_KEYS env (first CSV entry). Hard-fail
when unset unless --anonymous opt-out.
- sanitizeError strips the bearer token from all error messages before
they hit the report or stderr.
- Per-fetch AbortSignal.timeout(3000ms) matches U6's outbound budget.
- HTTP 429 retry: 500ms base, exponential backoff capped at 30s, 3
attempts before RateLimitExhaustedError.
**Gate + exit code:**
- PASSED (exit 0): all four diff channels + errors total ZERO across
the corpus. Allow-listed diffs don't count.
- FAILED (exit 1): any non-allow-listed diff or error.
- Misconfig (bad args, missing env, malformed allow-list JSON,
unparseable --since) → exit 2.
**Report JSON shape (downstream contract — locked by inline snapshot):**
{
generatedAt: ISO,
totals: { slugs, withStructural, withValue, withOrder, withSemantic,
withErrors, allowListed },
gate: "PASSED" | "FAILED",
slugs: [{
slug, locale,
structural / value / order / semantic / allowListed:
{ count, paths[] },
timingMs: { strapi, admin, compare },
error?: { side, message },
}]
}
**Dependencies:** added `p-limit ^7.3.0` (matches workspace versions in
apps/admin and apps/manager).
**Unresolved (operator-side):**
- Strapi corpus query uses `filters: { isTemplate: { eq: false },
publishedAt: { notNull: true } }`. Not gql.tada-typechecked (scripts/
outside tsconfig). Operator validates on first run; if the Strapi
deployment uses different filter literals (e.g. `$notNull`), the
inline query string is patched in-place. The orchestration loop is
decoupled via the Fetchers interface so the patch is one edit.
- `I18NLocaleCode` GraphQL variable type hardcoded; same patch path.
**Editorial-freeze coordination** is documented in U9's runbook (not
this CLI). The --since flag exists for the alternative path where a
freeze isn't operationally possible.
Verification:
- pnpm --filter @forge/graphql typecheck: clean
- pnpm --filter @forge/graphql test: 9 files, 162 tests (49 new), 0 fails
- pnpm --filter @forge/graphql lint: clean
…re flag (Unit 9)
Final PR-B unit. Ships the cutover runbook + emergency route-disable
mechanism (rollback layer 1).
**docs/admin-core-migration/cutover-runbook.md (315 lines):**
Comprehensive operator-facing doc. Status header marks
`draft until measured` until the MTTR numbers are observed. Sections:
1. Pre-cutover checklist (8 items: PR-A deployed, WEB_ADMIN_API_KEYS
set on both Doppler projects, ADMIN_GRAPHQL_URL healthy, SDL
regenerated, batch verification gate green, editorial freeze
coordinated for the 24-48h window between gate-green and env-flip).
2. Concurrent-backend exposure note (admin for slug-page, Strapi for
homepage + watch-video during the cutover window; both must outlast
TV burn-in completion before Strapi shuts down).
3. Mean-time-to-rollback measurement procedure (TODO placeholders until
operator runs two test flips on forge-web; measures deploy timing +
user-visible 5xx rate + cache thrash duration + maintenance-fallback
response time; escalates if worst-case deploy > 10min OR user-impact
5xx > 5%).
4. Cutover procedure (5 sequential steps with shell-block commands).
5. Rollback layers in escalation order (4 layers):
- Layer 1: FORGE_DISABLE_WATCH_ROUTES on forge-web Doppler (seconds).
- Layer 2: revert FORGE_CONTENT_API to strapi (if Strapi live).
- Layer 3: code-revert PR-B + redeploy (5-15 min).
- Layer 4: revert admin SDL — last resort; sequenced ONLY after
Layer 3 reverts web to Strapi so admin's blast radius is contained.
6. No-degraded-hybrid-mode escalation reminder.
7. Monitoring queries (forge.parity.* events, Apollo error rate, 5xx).
8. Planned bearer-key rotation R8a (90-day cadence, additive-then-
remove-old, 7 steps).
9. Emergency bearer-key revocation (remove-first ordering, 5 steps,
distinct from rotation).
10. Unbounded-cycles contingency (R6 + T-7 threshold; 3 ordered
contingencies; explicitly NOT phased ramp).
11. TODO(U7) markers (canonical-plan U7: no-redeploy mechanism,
parity-diff CI gate, GraphQL Armor recalibration).
12. Document history table.
**apps/web/src/env.ts:**
Adds `FORGE_DISABLE_WATCH_ROUTES: z.string().optional()` to server
schema + runtimeEnv. Docstring: EMERGENCY ONLY, runbook cross-ref.
**apps/web/src/components/MaintenanceFallback.tsx (26 lines):**
Static Client Component mirroring ExperienceEmpty/ExperienceError
layout shape. Zero fetching, zero client interactivity, zero imports
beyond JSX. Failsafe contract.
**apps/web/src/app/[slug]/page.tsx (45 lines added):**
Module-scope IIFE parses CSV into `DISABLED_ROUTES: ReadonlySet<string>`.
Whitespace tolerance, empty filter, warn-and-continue on entries missing
leading `/` (well-formed siblings on the same CSV still take effect).
At top of SlugPage, before resolveWatchPage is awaited:
`DISABLED_ROUTES.has(/${slug})` returns <MaintenanceFallback />. Reads
from module-scope set — no headers()/cookies() (would defeat ISR per
the institutional learning).
**apps/web/src/app/[slug]/page.test.tsx (7 tests):**
Match → short-circuits (resolveWatchPage NOT called); whitespace
tolerance; non-match falls through; unset env; empty string;
malformed CSV warns + falls through; malformed sibling does not
break well-formed entry.
Note: test environment switched to node (not jsdom). @t3-oss/env-nextjs
throws when window is defined, so the page module can't be imported
under jsdom. react-dom/server's renderToStaticMarkup works fine in
node. Documented inline in the test header.
**MTTR numbers remain TODO** until operator runs two real flips and
records observed timing. The runbook stays at status: draft until measured.
Verification:
- pnpm --filter @forge/web typecheck: clean
- pnpm --filter @forge/web test: 34 files, 367 + 3 todo, 0 fails
Address Group A findings from the consumer-migration ce-code-review:
- REL-03 (run-batch-verification.ts): add pagination: { limit: -1 } to
STRAPI_EXPERIENCE_QUERY's blocks selection. Strapi v5 silently caps
nested relations at 10 rows, which would surface as false structural
diffs in the cutover gate for any experience with >10 blocks.
- PERF-01 (batch-verification.ts compareSlug): fetch Strapi and admin
in parallel via Promise.allSettled instead of sequential await chains.
Halves per-slug wall time; materially shortens the editorial-freeze
window across a 1000-slug corpus.
- REL-01 (batch-verification.ts postGraphQL): add isTimeoutOrAbortError
discriminator before retry. AbortError/TimeoutError indicate the
request hit the outbound budget — a persistent admin slowness is
not a transient blip, so 3×3s + backoff is wasted gate-convergence
time. Now fast-fails the per-slug fetch.
- REL-02 (batch-verification.ts backoffDelayMs): add full-jitter via
injectable RNG (defaults Math.random). Kills the thundering-herd
where N concurrent workers hitting 429 all resume at the same ms.
Test rewritten to bound delay within the exponential envelope and
assert distribution non-determinism.
Verification: pnpm --filter @forge/graphql typecheck clean, 9 test files,
164 tests pass (added 2 jitter tests). No api shape changes.
Address Group B (security) findings:
- C2 (correctness): align bearer-presence check in fetchSlugExperience's
safety-net branch with admin-client.ts. Both must classify whitespace-
only or empty-first-CSV-entry values as 'unset' so the safety net
fires consistently. admin-client uses
`.split(",")[0]?.trim() || undefined`; safety net now uses the
same parse via local `bearerFirstEntry`.
- sec-002 (security): scrub the bearer key from causeError.message
BEFORE logging forge.parity.admin_fetch_error. Apollo's response-error
formatter can include downstream-server-controlled response body
text in error.message — a hostile or misconfigured admin echoing
the Authorization header in a 500 response would leak the bearer
to Railway logs without this scrub. Mutation-tested by the cross-
reviewer agreement between security and kieran-typescript.
Verification: pnpm --filter @forge/web typecheck clean,
34 test files, 396 tests + 3 todo, 0 fails.
- AC-04 (api-contract + kieran-typescript, cross-reviewer): add `as const satisfies Record<Block["t"] | SectionContentBlock["t"] | ContainerContentBlock["t"], string>` annotation on T_TO_TYPENAME in apps/admin/src/graphql/types/blocks.ts. Now tsc enforces that every Zod `t` discriminator across all 3 unions has a matching Pothos typename entry at compile time. Pairs with drift-CI's runtime three-way bijection (typo-detection on typename values). - M-01 (maintainability): comment on WatchPageAdminError clarifying the discriminator field is `code` (matches WatchVideoError precedent; plan-003 used "kind" interchangeably). - M-04 (maintainability): update parity-bridge deletion checklist — the "every callsite of runDualReadComparison in content.ts" item is now zero callsites (U6's cutover removed the import). - W2 (agent-native): add `run-batch-verification` script alias to packages/graphql/package.json so the cutover runbook's `pnpm --filter @forge/graphql run-batch-verification` invocation works. Verification: pnpm --filter @forge/admin typecheck clean.
…e-review) principal.test.ts case table previously covered null/PUBLIC/VIEWER/EDITOR/ ADMIN/SYSTEM/WORKFLOW_TRIGGER but not CONSUMER_BEARER. Per the testing reviewer, this gap meant a future typo widening isEditorOrAdmin to include CONSUMER_BEARER would silently pass every test — and silently leak templates / drafts to anonymous web SSR traffic via the rate-limit bearer. Now every bearer role is tested explicitly. A regression surfaces here at test time, not at the rate-limit dashboard.
62a7e83 to
e6ac034
Compare
Apply the project's inline-comment contract across PR-B's surface: default to no comment; when warranted, lead with WHY in one short line; cut filler; remove U-ID / PR-ID references that belong in PR descriptions and git blame, not in code. Net: 21 files, ~616 lines of comment cruft removed (821 deletions, 205 new — mostly tightened header docstrings). Zero functional code changes. Preserved (load-bearing): - Zod construct audit in blocks.ts header (institutional learning for future drift) - T_TO_TYPENAME satisfies annotation rationale - Security notes (timing-safe assertion, log-scrub threat model, hostile admin echo, info-disclosure invariant) - Deletion checklists in parity-bridge.ts + content-api-mode.ts (the fast-follow deletion PR uses them as a source of truth) - TODO(post-strapi-removal) lifecycle marker on the runtime safety net - snapshot-locked report contract on SlugReport - Cross-references to docs/solutions/ and docs/plans/ that cite load-bearing rationale Verification: - pnpm --filter @forge/admin typecheck: clean - pnpm --filter @forge/graphql typecheck: clean - pnpm --filter @forge/web typecheck: clean - pnpm --filter @forge/graphql test: 164/164 pass - pnpm --filter @forge/web test: 396/396 pass - pnpm --filter @forge/admin test: clean (one pre-existing fs-race flake in unrelated s3.test.ts not introduced here)
F1 — ADMIN_EXPERIENCE_QUERY was broken in two ways that would cause
every slug to fail with a GraphQL parse error:
1. Selected `description` (admin's field is `metaDescription`).
2. Selected bare `blocks` on a [ExperienceBlock!]! union — invalid
GraphQL syntax for non-leaf types.
Rewritten with:
- `description: metaDescription` alias at top level so the response
shape matches AdminExperienceLocaleInput's Strapi-vocab `description`
key (the normalizer's adapted input type).
- Full inline fragments on all 17 block kinds + nested
SectionContentBlock and ContainerContentBlock unions, using admin's
NATIVE field names (not the Strapi-vocab aliases the renderer
fragments use). Required because normalize-admin.ts runs
BlocksSchema.safeParse(), which validates against admin's Zod
discriminated union with original field names.
The harness query now parses cleanly. Schema-level field validity is
enforced server-side when the harness runs against a live admin.
F20 — compareSlug captured tStrapiStart=tAdminStart=Date.now() outside
Promise.allSettled and tStrapiEnd=tAdminEnd=Date.now() after the join.
Both timingMs values were identical = max(strapi, admin). Operators
couldn't diagnose which side regressed. Timings now captured INSIDE
each promise via tagged outcomes — strapiDurationMs and adminDurationMs
reflect actual per-side wall-clock.
Findings from the external review pass on PR #933. F1 was gate-blocking
(harness would have failed loudly on every slug); F20 was a regression
introduced in the earlier PERF-01 fix.
Five high-value findings from the external review, fixed before PR-B lifts from draft. - F2 fetchResolvedWatchPage now takes mode as an arg so unstable_cache derives the cache key from it. Was a stable cross-mode key before, which would serve stale cross-mode hits during the flip. - F6 ADMIN_BLOCK_TYPENAMES split into typed list const plus derived ReadonlySet. Default switch branch dev-warns when typename is in the set but not handled. - F7 error.tsx checks WatchPageAdminError shape by duck-type rather than instanceof. Next SSR to client serialization loses class identity. - F10 runbook Layer 1 wording corrected. Module-scope reads need a Railway redeploy; advantage is surgical scope not raw speed. - F12 experienceToMetadata handles admin flat ogImageUrl alongside Strapi nested ogImage. Without this branch admin-mode pages lose open-graph imagery. - F15 negative test where Apollo error message echoes the bearer; asserts log payload has redacted and zero bearer occurrences. Skipped: F14 (17 per-kind smoke tests) and F9 (cache-test rewrite needs integration setup) — too large for one batch. Verified: web typecheck, web test (15/15 content.test.ts), graphql typecheck + test (164/164). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pre-draft-lift cleanup batch on PR-B. - F13 bearer scrub now redacts EVERY CSV entry, not just the first. Mid- rotation both keys are live and either can show up in an echoed error body. ~10 LOC swap from split/join-once to a reduce over all entries. - F15 test extended to construct an error message containing both keys and assert neither survives. 2x aaa + 1x bbb = 3 redactions. - A3 added a post-rotation test where WEB_ADMIN_API_KEYS holds only the second key (operator removed the leaked first entry). Asserts admin path authenticates normally with no consumer_bearer_missing fallback. The Emergency revocation runbook section already exists at line 258 and covers the operator-facing procedure; A3 collapses to the test. - F18 MTTR gate removed from the runbook. Staging is decommissioned so the pre-flip-measurement-in-staging story doesn't apply. First prod flip doubles as the measurement flip; operator captures observed timings into the table during the flip and commits the doc same session. Status flips from "draft until measured" to "live". - Document-history table updated with F13 + F18 entries. Verified: web typecheck, content.test.ts 16/16. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
6 tasks
Ur-imazing
added a commit
that referenced
this pull request
May 13, 2026
…unbook (Units 4-9 + UB7) (#933) * feat(web): consumer migration — ContentApiMode collapse + regression snapshot (Unit 4) First commit of PR-B. Per the test-first regression discipline at docs/solutions/best-practices/test-first-regression-snapshot-byte-identical-default-20260429.md, the regression snapshot lands first so subsequent units (U5-U9) cannot silently change strapi-mode behavior. Changes: - apps/web/src/env.ts: * Add WEB_ADMIN_API_KEYS (optional CSV string). Web's SSR sends the first entry as `Authorization: Bearer` so admin buckets web's traffic as `consumer:<key>` rather than `public:<railway-egress-ip>`. Optional per the institutional learning about required-without- default env vars bricking Railway deploys. * Keep FORGE_CONTENT_API enum permissive over the four historical values (strapi / dual-read / admin-with-fallback / admin). The SOFT-REMOVAL of dual-read and admin-with-fallback happens at the runtime narrower, NOT the schema validator — stale Doppler configs don't brick boot. - apps/web/src/lib/content-api-mode.ts: * ContentApiMode collapses to `"strapi" | "admin"` (was strapi|dual-read). * RECOGNIZED_MODES = ["strapi", "admin"]. LEGACY_SOFT_REMOVED_MODES = ["dual-read", "admin-with-fallback"] — these get a distinct "soft-removed; update your Doppler config" warn before falling back to strapi. Operators reading logs differentiate "I have a stale config" from "I have a typo". * Deletion-checklist docstring updated to point at plan-003 + reflect that ADMIN_GRAPHQL_URL + WEB_ADMIN_API_KEYS STAY post-Strapi-removal (they become required, not deleted). - apps/web/src/lib/content-api-mode.test.ts: * Rewrote tests for the new closed set. "admin" is now a valid mode (no warn). "dual-read" and "admin-with-fallback" fall back with the soft-removed warn. Added a differentiation test verifying the soft-removed warn message differs from the unknown-value warn. - apps/web/src/lib/__tests__/content-mode-regression.test.ts (NEW): * Source-of-truth input/output table. Each row locks "given env value X, getContentApiMode() returns Y." U5-U9 cannot silently change these outputs without updating the test in the same commit. * Type-level contract: ContentApiMode union has exactly 2 members. Compile-time exhaustiveness check via switch + `never` assertion. Note: fetchSlugExperience's branch table is NOT touched by U4. The existing dual-read code path becomes unreachable when `mode === "strapi"` (legacy values normalize to strapi) and continues to fire as canary-like behavior when `mode === "admin"`. U6 replaces the admin branch with the real cutover code (admin-only fetch + WatchPageAdminError throws). Between U4 merge and U6 merge, the "admin" mode produces admin-with-fallback-like behavior — acceptable because the env flip doesn't happen until U6 ships. Verification: - pnpm --filter @forge/web typecheck: clean - pnpm --filter @forge/web test: 32 files, 352 tests + 3 todo, 0 fails - Regression snapshot covers 5-input matrix: undefined / strapi / admin / dual-read / admin-with-fallback (the last two soft-removed to strapi). * feat(web): consumer migration — admin-shape fragments + dispatch + parity infra (Unit 5) Ship admin-shape `WatchExperience` fragments in `packages/graphql` for future mobile/TV reuse, extend the renderer dispatch with admin typenames, and update the parity infrastructure (bridge mode guard + normalizer type narrowing + fixture rewrites) for PR-A's typed-blocks schema. **Fragments (20 new files, 739 LOC in `packages/graphql/src/fragments/admin/`):** - Root `WatchExperience` fragment on `ExperienceLocale` using `adminGraphql()`. - 17 per-block-kind fragments + 2 nested-union fragments. Each selects fields available on the matching Pothos type from PR-A. - Re-exported through `packages/graphql/src/index.ts`; new `./admin/fragments` package.json exports entry. **Renderer dispatch (`apps/web/src/components/sections/index.tsx`):** - Adds 17 admin-typename cases alongside the existing Strapi cases. Both paths route through the existing per-kind renderers via a `renderAdminBlock(...)` helper. Additive (not replacement) so the branch stays shippable between U5 and U6 — U6 may remove Strapi cases when `content.ts` flips, or keep both for the cutover window. - `CardBlock` and `VideoRecommendationsBlock` cases emit dev-warn placeholders: CardBlock has no Strapi precedent / no renderer yet; VideoRecommendationsBlock needs videoId→slug hydration that's U6's scope. **Prop-shape audit:** `grep` for `.data?.` / `.attributes?.` in `apps/web/src/components/sections/` returned ZERO matches — Strapi v5's GraphQL plugin already flattens the response, so the wrapped-envelope problem doesn't exist on the renderer side. Relation-accessor diffs surfaced for video / images / imageOverride (`enrichment.ts`, `CarouselVideo.tsx`, `Video.tsx`); admin returns flat `videoId` instead of a joined Video row. All renderer code uses optional chaining with sane fallbacks (`titleOverride`, `imageUrl`), so admin payloads degrade rather than crash. U6 will wire videoId→slug hydration. **`admin-experience.ts` (web): updated, not deleted.** `content.ts` still imports `adminExperienceBySlugOperation` for the dual-fetch orchestration shape. Composed `AdminWatchExperience` so the query typechecks against PR-A's typed-union `blocks` field. U6 may delete when `content.ts` no longer needs the dual-fetch. **Parity-bridge mode guard (`apps/web/src/lib/parity-bridge.ts`):** - Added `isCanaryEmissionEnabled()` helper comparing `getContentApiMode() as string === "dual-read"`. The cast is deliberate: U4 collapsed `ContentApiMode` to `"strapi" | "admin"` so the literal `"dual-read"` can never match by type — the guard intentionally kills canary emission post-cutover. - Guard fires at the entry of `runDualReadComparison(...)`. Admin-mode and strapi-mode requests now emit ZERO `forge.parity.*` log events, so U9's monitoring section doesn't have to disambiguate "real admin failure" from "leftover canary noise." - 2 new tests in `parity-bridge.test.ts` assert admin/strapi modes short-circuit; existing 16 dual-read tests run under a mocked-mode pattern. **Parity normalizer (`packages/graphql/src/parity/normalize-admin.ts`):** - `AdminExperienceLocaleInput['blocks']` narrows from `unknown` to `ReadonlyArray<Block>` (admin's Zod-derived domain type). After PR-A, admin returns typed blocks by construction; the existing `BlocksSchema.safeParse(rawBlocks)` becomes defensive belt-and- suspenders — kept with an explanatory comment for runtime drift. - `normalize-admin.test.ts` — rewrote 3 happy-path block fixtures (mediaCollection, text, videoRecommendations) to typed-union shape matching `BlockSchema.options[N]` exactly. Added `LooseAdminInput` + `toInput()` helper for adversarial fixtures (intentional shape drift to exercise the defensive `safeParse` path). **Cross-package verification:** - pnpm --filter @forge/web typecheck: clean - pnpm --filter @forge/web test: 32 files, 354 tests + 3 todo, 0 fails - pnpm --filter @forge/graphql typecheck: clean - pnpm --filter @forge/graphql test: 8 files, 113 tests, 0 fails - pnpm --filter @forge/graphql generate: both schemas regenerated cleanly - pnpm --filter @forge/admin typecheck (regression check): clean * feat(web): consumer migration â�� Unit 6 (cutover + admin bearer + cache re-throw) Wire the actual cutover branch. When `FORGE_CONTENT_API === "admin"`, web fetches from admin via a bearer-aware Apollo client; admin failures throw `WatchPageAdminError`; `unstable_cache` re-throws (not caches) admin errors so they reach UB7's segment error boundary. Strapi sentinel errors keep their current inline-rendered path. **admin-client.ts:** - Module-scope bearer read: `env.WEB_ADMIN_API_KEYS?.split(",")[0]?.trim()`. Inside `typeof window === "undefined"` guard so client bundles don't trip the server-env Proxy. - timeoutFetch override constructs a fresh `Headers(init?.headers)` per request and sets `Authorization: Bearer ${ADMIN_BEARER}` only when defined. Bearer undefined → header omitted entirely (admin treats request as anonymous PUBLIC). Fresh Headers also replaces any caller-supplied Authorization rather than merging — defense-in-depth against accidental key echo. - Per-call AbortSignal.timeout(3000) preserved (outbound-timeout-shorter- than-caller-budget discipline). **content.ts changes:** 1. `WatchPageAdminError` class (near WatchVideoError) — mirrors that precedent: ``` class WatchPageAdminError extends Error { readonly code: "NOT_FOUND" | "UNAVAILABLE" constructor(code, { cause }: { cause?: Error } = {}) { ... } } ``` `code` field (not `kind` as plan text suggested) — matches existing WatchVideoError precedent in the same file. UB7 dispatches on `error.code`. 2. fetchSlugExperience 2-case branch table: - `mode === "strapi"`: byte-identical to current main. - `mode === "admin"` AND `!env.WEB_ADMIN_API_KEYS` (runtime safety net for deploy-order errors): log `forge.parity.consumer_bearer_missing`, fall back to strapi semantics for THIS request. Inline TODO references the post-Strapi-removal switch to throwing UNAVAILABLE per plan-003 Key Technical Decisions. - `mode === "admin"`: call `fetchAdminSlugExperience`. Outcome classification (`error.name`-based, never message-substring per the AWS NoSuchKey learning): * `ok:true, response==null` → log `admin_null` + throw `NOT_FOUND` * `ok:true, response!=null` → return * `ok:"timeout"` → log `admin_timeout` + throw `UNAVAILABLE` * `ok:"error"` → log `admin_fetch_error` + throw `UNAVAILABLE` with original as `cause` 3. `unstable_cache` re-throw inside `fetchResolvedWatchPage`'s catch: ``` if (error instanceof WatchPageAdminError) throw error ``` Goes BEFORE the existing sentinel return. The prior plan flagged this as P0 — without it, the cache wrapper swallows the throw and the error boundary never fires. `unstable_cache` re-throws errors from its inner function (verified by the existing comment). `resolveWatchPage` is wrapped in React `cache()` which propagates the throw to `[slug]/page.tsx`. Strapi-mode generic Errors keep the sentinel path → page.tsx renders `<ExperienceError>` inline. 4. Removed `runDualReadComparison` import + `fetchStrapiSlugExperience` function — dead code after the cutover. The parity-bridge module itself stays (U5 added the mode guard; U5 deletion follow-up PR retires it). content.ts no longer calls into the bridge. **Tests:** - admin-client.test.ts: 6 new — bearer present, bearer absent (`Authorization` header NOT set), whitespace trim on first CSV entry, console-scrub invariant (mutation-tested: removing scrub locally fails), singleton preservation, per-call AbortSignal. - content.test.ts: ~14 cutover tests covering happy paths admin + strapi, NOT_FOUND, UNAVAILABLE via Apollo error / timeout / abort, consumer_bearer_missing safety net, cache re-throw + sentinel preservation, end-to-end propagation chain. - Mocked-shape discipline: Apollo errors thrown via `Object.assign(new Error("..."), { name: "ApolloError", networkError: ... })` — REAL typed shape, not generic Error. **Note on `kind` vs `code`:** plan-003 text used `kind` for the discriminator field. Existing `WatchVideoError` precedent uses `code`. Followed the precedent for codebase consistency. UB7's classifier will dispatch on `error.code`. Verification: - pnpm --filter @forge/web typecheck: clean - pnpm --filter @forge/web test: 32 files, 353 + 3 todo, 0 fails - Operator manual smoke (post-deploy): FORGE_CONTENT_API=admin WEB_ADMIN_API_KEYS=<key> ADMIN_GRAPHQL_URL=<url> + curl canary slug; verify admin called, Strapi not, log stream contains no bearer string. * feat(web): consumer migration — [slug]/error.tsx Client Component (UB7) Slug-page error boundary that catches typed WatchPageAdminError from U6's admin-mode fetch. Renders one of two static UX shapes matching the Strapi-mode inline rendering — end users see no behavior difference between admin-mode and Strapi-mode failures. **Mode-aware:** the boundary only handles `WatchPageAdminError`. Strapi- mode sentinel errors continue through the existing inline path in `page.tsx` and never reach this boundary (it's additive for admin mode). **Information-disclosure discipline:** `error.message` is NEVER rendered as visible text. The classifier dispatches on `error.code`: - `NOT_FOUND` → `<ExperienceEmpty>` (matches Strapi-mode's NO_EXPERIENCE_FOUND_MESSAGE inline render) - `UNAVAILABLE` → `<ExperienceError>` with hardcoded "Service temporarily unavailable" message + reset button (mirrors locale-level error boundary pattern). ExperienceError additionally sanitizes via its KNOWN_ERRORS table; the stable string maps to its generic fallback so admin-mode UNAVAILABLE and Strapi-mode generic-error renderings stay visually consistent. **Catch-all:** non-typed errors (or future codes not in the union) re-throw to Next.js's segment-default error boundary, which emits a generic 500 page without echoing the underlying error.message. Safe contract: this boundary handles two static cases and re-throws everything else. Pattern follows `apps/web/src/app/[slug]/[locale]/error.tsx` precedent (the 2-segment error boundary that the locale-level route uses). Tests (7): - Happy path NOT_FOUND renders `<ExperienceEmpty>` (no reset button — matches Strapi-mode behavior on empty content). - Happy path UNAVAILABLE renders `<ExperienceError>` + reset button. - Reset callback fires on Try again click. - Information-disclosure: NOT_FOUND error.message NEVER appears in DOM (mutation-tested with a planted secret fragment via both `cause` and the error's own message field). - Information-disclosure: UNAVAILABLE error.message NEVER appears in DOM (same planted-secret pattern). - Catch-all: generic Error (non-WatchPageAdminError) re-throws. - Catch-all: off-band code on WatchPageAdminError (forged via Object.defineProperty bypassing the TS readonly) re-throws — guards against a future widening of the code union without updating this boundary. Verification: - pnpm --filter @forge/web typecheck: clean - pnpm --filter @forge/web test: 33 files, 360 + 3 todo, 0 fails * feat(graphql): consumer migration — batch verification harness (Unit 8) THE cutover gate. Offline harness that fetches every published slug from Strapi + admin in parallel, diffs the responses via existing parity primitives, applies allow-list rules, and produces a structured per-slug report. Gate passes iff every remaining diff is allow-listed; the operator never flips FORGE_CONTENT_API to admin without a green gate. **Files:** - packages/graphql/src/parity/batch-verification.ts (886 LOC): Core harness. Module-level so it's both typechecked AND unit-testable. Exports the orchestrator (`runBatchVerification`), the per-slug comparator, args parser, env-bearer reader, stratified sampler, gate logic, JSON-report builder, and supporting helpers (`postGraphQL` with 429 backoff, `sanitizeError` for bearer redaction). - packages/graphql/scripts/run-batch-verification.ts (345 LOC): Thin CLI shim. Parses env + args, builds real GraphQL fetchers against Strapi + admin, calls runBatchVerification, writes report, emits exit code. Same scripts/-outside-tsconfig pattern as capture-parity-fixture. - packages/graphql/src/parity/batch-verification.test.ts (49 new tests): parseArgs, readBearerFromEnv hard-fail, sanitizeError redaction, allow-list combiner, stratified sample determinism + bucket distribution, per-slug compare across happy/admin-missing/strapi- missing/both-failed paths, bearer-redaction in errors, value-diff, allow-list application, runBatchVerification PASSED/FAILED gate paths, CONCURRENCY CAP ENFORCEMENT (instrumented counter), --since filter, no-crash-on-throwing-fetcher, backoffDelayMs cap, postGraphQL 429-then-200 retry, max-retries RateLimitExhaustedError, bearer header presence/absence, and a LOCKED JSON inline snapshot for the downstream report shape. **CLI surface:** --sample <n> representative-sample-first count (default null = full corpus). Stratified by updatedAt: oldest 30 / middle 40 / newest 30. --concurrency <n> bounded-parallel admin/strapi fetches (default 5). --out <path> report JSON output (default .tmp/batch- verification-<UTC-YYYYMMDD-HHmmss>.json). --allow-list <path> operator-supplied additional allow-list entries (JSON) — combined with DEFAULT_ALLOW_LIST from packages/graphql/src/parity/allow-list.ts. --since <iso> delta filter (only slugs with updatedAt > ISO) — for editorial-freeze workflows where the operator re-runs immediately before env-flip. --anonymous explicit opt-out of the bearer auto-read for local-dev use. Without this AND without WEB_ADMIN_API_KEYS set, the CLI HARD-FAILS with a clear error explaining the self-DoS risk (anonymous traffic hits the public:${ip} bucket that real end-user SSR is about to share). --help / -h **Bearer discipline:** - Auto-read from WEB_ADMIN_API_KEYS env (first CSV entry). Hard-fail when unset unless --anonymous opt-out. - sanitizeError strips the bearer token from all error messages before they hit the report or stderr. - Per-fetch AbortSignal.timeout(3000ms) matches U6's outbound budget. - HTTP 429 retry: 500ms base, exponential backoff capped at 30s, 3 attempts before RateLimitExhaustedError. **Gate + exit code:** - PASSED (exit 0): all four diff channels + errors total ZERO across the corpus. Allow-listed diffs don't count. - FAILED (exit 1): any non-allow-listed diff or error. - Misconfig (bad args, missing env, malformed allow-list JSON, unparseable --since) → exit 2. **Report JSON shape (downstream contract — locked by inline snapshot):** { generatedAt: ISO, totals: { slugs, withStructural, withValue, withOrder, withSemantic, withErrors, allowListed }, gate: "PASSED" | "FAILED", slugs: [{ slug, locale, structural / value / order / semantic / allowListed: { count, paths[] }, timingMs: { strapi, admin, compare }, error?: { side, message }, }] } **Dependencies:** added `p-limit ^7.3.0` (matches workspace versions in apps/admin and apps/manager). **Unresolved (operator-side):** - Strapi corpus query uses `filters: { isTemplate: { eq: false }, publishedAt: { notNull: true } }`. Not gql.tada-typechecked (scripts/ outside tsconfig). Operator validates on first run; if the Strapi deployment uses different filter literals (e.g. `$notNull`), the inline query string is patched in-place. The orchestration loop is decoupled via the Fetchers interface so the patch is one edit. - `I18NLocaleCode` GraphQL variable type hardcoded; same patch path. **Editorial-freeze coordination** is documented in U9's runbook (not this CLI). The --since flag exists for the alternative path where a freeze isn't operationally possible. Verification: - pnpm --filter @forge/graphql typecheck: clean - pnpm --filter @forge/graphql test: 9 files, 162 tests (49 new), 0 fails - pnpm --filter @forge/graphql lint: clean * feat(web): consumer migration — cutover runbook + route-disable feature flag (Unit 9) Final PR-B unit. Ships the cutover runbook + emergency route-disable mechanism (rollback layer 1). **docs/admin-core-migration/cutover-runbook.md (315 lines):** Comprehensive operator-facing doc. Status header marks `draft until measured` until the MTTR numbers are observed. Sections: 1. Pre-cutover checklist (8 items: PR-A deployed, WEB_ADMIN_API_KEYS set on both Doppler projects, ADMIN_GRAPHQL_URL healthy, SDL regenerated, batch verification gate green, editorial freeze coordinated for the 24-48h window between gate-green and env-flip). 2. Concurrent-backend exposure note (admin for slug-page, Strapi for homepage + watch-video during the cutover window; both must outlast TV burn-in completion before Strapi shuts down). 3. Mean-time-to-rollback measurement procedure (TODO placeholders until operator runs two test flips on forge-web; measures deploy timing + user-visible 5xx rate + cache thrash duration + maintenance-fallback response time; escalates if worst-case deploy > 10min OR user-impact 5xx > 5%). 4. Cutover procedure (5 sequential steps with shell-block commands). 5. Rollback layers in escalation order (4 layers): - Layer 1: FORGE_DISABLE_WATCH_ROUTES on forge-web Doppler (seconds). - Layer 2: revert FORGE_CONTENT_API to strapi (if Strapi live). - Layer 3: code-revert PR-B + redeploy (5-15 min). - Layer 4: revert admin SDL — last resort; sequenced ONLY after Layer 3 reverts web to Strapi so admin's blast radius is contained. 6. No-degraded-hybrid-mode escalation reminder. 7. Monitoring queries (forge.parity.* events, Apollo error rate, 5xx). 8. Planned bearer-key rotation R8a (90-day cadence, additive-then- remove-old, 7 steps). 9. Emergency bearer-key revocation (remove-first ordering, 5 steps, distinct from rotation). 10. Unbounded-cycles contingency (R6 + T-7 threshold; 3 ordered contingencies; explicitly NOT phased ramp). 11. TODO(U7) markers (canonical-plan U7: no-redeploy mechanism, parity-diff CI gate, GraphQL Armor recalibration). 12. Document history table. **apps/web/src/env.ts:** Adds `FORGE_DISABLE_WATCH_ROUTES: z.string().optional()` to server schema + runtimeEnv. Docstring: EMERGENCY ONLY, runbook cross-ref. **apps/web/src/components/MaintenanceFallback.tsx (26 lines):** Static Client Component mirroring ExperienceEmpty/ExperienceError layout shape. Zero fetching, zero client interactivity, zero imports beyond JSX. Failsafe contract. **apps/web/src/app/[slug]/page.tsx (45 lines added):** Module-scope IIFE parses CSV into `DISABLED_ROUTES: ReadonlySet<string>`. Whitespace tolerance, empty filter, warn-and-continue on entries missing leading `/` (well-formed siblings on the same CSV still take effect). At top of SlugPage, before resolveWatchPage is awaited: `DISABLED_ROUTES.has(/${slug})` returns <MaintenanceFallback />. Reads from module-scope set — no headers()/cookies() (would defeat ISR per the institutional learning). **apps/web/src/app/[slug]/page.test.tsx (7 tests):** Match → short-circuits (resolveWatchPage NOT called); whitespace tolerance; non-match falls through; unset env; empty string; malformed CSV warns + falls through; malformed sibling does not break well-formed entry. Note: test environment switched to node (not jsdom). @t3-oss/env-nextjs throws when window is defined, so the page module can't be imported under jsdom. react-dom/server's renderToStaticMarkup works fine in node. Documented inline in the test header. **MTTR numbers remain TODO** until operator runs two real flips and records observed timing. The runbook stays at status: draft until measured. Verification: - pnpm --filter @forge/web typecheck: clean - pnpm --filter @forge/web test: 34 files, 367 + 3 todo, 0 fails * fix(graphql): harness reliability + perf fixes from ce-code-review Address Group A findings from the consumer-migration ce-code-review: - REL-03 (run-batch-verification.ts): add pagination: { limit: -1 } to STRAPI_EXPERIENCE_QUERY's blocks selection. Strapi v5 silently caps nested relations at 10 rows, which would surface as false structural diffs in the cutover gate for any experience with >10 blocks. - PERF-01 (batch-verification.ts compareSlug): fetch Strapi and admin in parallel via Promise.allSettled instead of sequential await chains. Halves per-slug wall time; materially shortens the editorial-freeze window across a 1000-slug corpus. - REL-01 (batch-verification.ts postGraphQL): add isTimeoutOrAbortError discriminator before retry. AbortError/TimeoutError indicate the request hit the outbound budget — a persistent admin slowness is not a transient blip, so 3×3s + backoff is wasted gate-convergence time. Now fast-fails the per-slug fetch. - REL-02 (batch-verification.ts backoffDelayMs): add full-jitter via injectable RNG (defaults Math.random). Kills the thundering-herd where N concurrent workers hitting 429 all resume at the same ms. Test rewritten to bound delay within the exponential envelope and assert distribution non-determinism. Verification: pnpm --filter @forge/graphql typecheck clean, 9 test files, 164 tests pass (added 2 jitter tests). No api shape changes. * fix(web): bearer alignment + Apollo error log scrub from ce-code-review Address Group B (security) findings: - C2 (correctness): align bearer-presence check in fetchSlugExperience's safety-net branch with admin-client.ts. Both must classify whitespace- only or empty-first-CSV-entry values as 'unset' so the safety net fires consistently. admin-client uses `.split(",")[0]?.trim() || undefined`; safety net now uses the same parse via local `bearerFirstEntry`. - sec-002 (security): scrub the bearer key from causeError.message BEFORE logging forge.parity.admin_fetch_error. Apollo's response-error formatter can include downstream-server-controlled response body text in error.message — a hostile or misconfigured admin echoing the Authorization header in a 500 response would leak the bearer to Railway logs without this scrub. Mutation-tested by the cross- reviewer agreement between security and kieran-typescript. Verification: pnpm --filter @forge/web typecheck clean, 34 test files, 396 tests + 3 todo, 0 fails. * fix: type safety + docs cleanup from ce-code-review - AC-04 (api-contract + kieran-typescript, cross-reviewer): add `as const satisfies Record<Block["t"] | SectionContentBlock["t"] | ContainerContentBlock["t"], string>` annotation on T_TO_TYPENAME in apps/admin/src/graphql/types/blocks.ts. Now tsc enforces that every Zod `t` discriminator across all 3 unions has a matching Pothos typename entry at compile time. Pairs with drift-CI's runtime three-way bijection (typo-detection on typename values). - M-01 (maintainability): comment on WatchPageAdminError clarifying the discriminator field is `code` (matches WatchVideoError precedent; plan-003 used "kind" interchangeably). - M-04 (maintainability): update parity-bridge deletion checklist — the "every callsite of runDualReadComparison in content.ts" item is now zero callsites (U6's cutover removed the import). - W2 (agent-native): add `run-batch-verification` script alias to packages/graphql/package.json so the cutover runbook's `pnpm --filter @forge/graphql run-batch-verification` invocation works. Verification: pnpm --filter @forge/admin typecheck clean. * test(admin): add CONSUMER_BEARER case to isEditorOrAdmin (T-01 ce-code-review) principal.test.ts case table previously covered null/PUBLIC/VIEWER/EDITOR/ ADMIN/SYSTEM/WORKFLOW_TRIGGER but not CONSUMER_BEARER. Per the testing reviewer, this gap meant a future typo widening isEditorOrAdmin to include CONSUMER_BEARER would silently pass every test — and silently leak templates / drafts to anonymous web SSR traffic via the rate-limit bearer. Now every bearer role is tested explicitly. A regression surfaces here at test time, not at the rate-limit dashboard. * chore: trim verbose inline comments per the comment contract Apply the project's inline-comment contract across PR-B's surface: default to no comment; when warranted, lead with WHY in one short line; cut filler; remove U-ID / PR-ID references that belong in PR descriptions and git blame, not in code. Net: 21 files, ~616 lines of comment cruft removed (821 deletions, 205 new — mostly tightened header docstrings). Zero functional code changes. Preserved (load-bearing): - Zod construct audit in blocks.ts header (institutional learning for future drift) - T_TO_TYPENAME satisfies annotation rationale - Security notes (timing-safe assertion, log-scrub threat model, hostile admin echo, info-disclosure invariant) - Deletion checklists in parity-bridge.ts + content-api-mode.ts (the fast-follow deletion PR uses them as a source of truth) - TODO(post-strapi-removal) lifecycle marker on the runtime safety net - snapshot-locked report contract on SlugReport - Cross-references to docs/solutions/ and docs/plans/ that cite load-bearing rationale Verification: - pnpm --filter @forge/admin typecheck: clean - pnpm --filter @forge/graphql typecheck: clean - pnpm --filter @forge/web typecheck: clean - pnpm --filter @forge/graphql test: 164/164 pass - pnpm --filter @forge/web test: 396/396 pass - pnpm --filter @forge/admin test: clean (one pre-existing fs-race flake in unrelated s3.test.ts not introduced here) * fix(graphql): unbreak batch-verification gate query + per-side timings F1 — ADMIN_EXPERIENCE_QUERY was broken in two ways that would cause every slug to fail with a GraphQL parse error: 1. Selected `description` (admin's field is `metaDescription`). 2. Selected bare `blocks` on a [ExperienceBlock!]! union — invalid GraphQL syntax for non-leaf types. Rewritten with: - `description: metaDescription` alias at top level so the response shape matches AdminExperienceLocaleInput's Strapi-vocab `description` key (the normalizer's adapted input type). - Full inline fragments on all 17 block kinds + nested SectionContentBlock and ContainerContentBlock unions, using admin's NATIVE field names (not the Strapi-vocab aliases the renderer fragments use). Required because normalize-admin.ts runs BlocksSchema.safeParse(), which validates against admin's Zod discriminated union with original field names. The harness query now parses cleanly. Schema-level field validity is enforced server-side when the harness runs against a live admin. F20 — compareSlug captured tStrapiStart=tAdminStart=Date.now() outside Promise.allSettled and tStrapiEnd=tAdminEnd=Date.now() after the join. Both timingMs values were identical = max(strapi, admin). Operators couldn't diagnose which side regressed. Timings now captured INSIDE each promise via tagged outcomes — strapiDurationMs and adminDurationMs reflect actual per-side wall-clock. Findings from the external review pass on PR #933. F1 was gate-blocking (harness would have failed loudly on every slug); F20 was a regression introduced in the earlier PERF-01 fix. * fix(web): apply ce-code-review batch (F2 F6 F7 F10 F12 F15) Five high-value findings from the external review, fixed before PR-B lifts from draft. - F2 fetchResolvedWatchPage now takes mode as an arg so unstable_cache derives the cache key from it. Was a stable cross-mode key before, which would serve stale cross-mode hits during the flip. - F6 ADMIN_BLOCK_TYPENAMES split into typed list const plus derived ReadonlySet. Default switch branch dev-warns when typename is in the set but not handled. - F7 error.tsx checks WatchPageAdminError shape by duck-type rather than instanceof. Next SSR to client serialization loses class identity. - F10 runbook Layer 1 wording corrected. Module-scope reads need a Railway redeploy; advantage is surgical scope not raw speed. - F12 experienceToMetadata handles admin flat ogImageUrl alongside Strapi nested ogImage. Without this branch admin-mode pages lose open-graph imagery. - F15 negative test where Apollo error message echoes the bearer; asserts log payload has redacted and zero bearer occurrences. Skipped: F14 (17 per-kind smoke tests) and F9 (cache-test rewrite needs integration setup) — too large for one batch. Verified: web typecheck, web test (15/15 content.test.ts), graphql typecheck + test (164/164). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(web): apply A3 F13 F18 from ce-code-review Pre-draft-lift cleanup batch on PR-B. - F13 bearer scrub now redacts EVERY CSV entry, not just the first. Mid- rotation both keys are live and either can show up in an echoed error body. ~10 LOC swap from split/join-once to a reduce over all entries. - F15 test extended to construct an error message containing both keys and assert neither survives. 2x aaa + 1x bbb = 3 redactions. - A3 added a post-rotation test where WEB_ADMIN_API_KEYS holds only the second key (operator removed the leaked first entry). Asserts admin path authenticates normally with no consumer_bearer_missing fallback. The Emergency revocation runbook section already exists at line 258 and covers the operator-facing procedure; A3 collapses to the test. - F18 MTTR gate removed from the runbook. Staging is decommissioned so the pre-flip-measurement-in-staging story doesn't apply. First prod flip doubles as the measurement flip; operator captures observed timings into the table during the flip and commits the doc same session. Status flips from "draft until measured" to "live". - Document-history table updated with F13 + F18 entries. Verified: web typecheck, content.test.ts 16/16. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Ur-imazing
added a commit
that referenced
this pull request
May 13, 2026
…rification harness (PR-A + PR-B + PR-C) (#937) * feat(admin): consumer migration — admin-side prereqs (Units 1-3) (#932) * docs(plans): admin-core consumer migration u5b — direct cutover plan + brainstorm Drive the architecture pivot from phased ramp to direct cutover, motivated by the compressed Strapi-removal timeline (~1-2 weeks). Three artifacts: - Brainstorm 2026-05-11-consumer-migration-u5b-strapi-sunset-strategy-requirements.md establishes the working assumption (Strapi removal in ~1-2 weeks) and resolves scope to web-only cutover. Mobile/TV inherit the foundation in future brainstorms. - Plan 2026-05-11-003-feat-web-admin-direct-cutover-plan.md ships the cutover via two PRs: PR-A (admin prereqs — CONSUMER_BEARER bearer, isTemplate filter, Pothos block types) deploys first; PR-B (web cutover — mode collapse, fragments, cutover branch, error boundary, batch verification, runbook) opens after. Cutover gate is the batch-verification report (empty diff or allow-listed). - Plan 2026-05-11-002-feat-consumer-migration-unit-5b-web-admin-rendering-plan.md preserved as superseded for decision-history traceability. * feat(admin): admin-core consumer migration — CONSUMER_BEARER principal (Unit 1) Add a bearer-recognized identity to admin's auth system whose sole purpose is to bucket consumer SSR rate-limit traffic separately from anonymous-IP. The principal carries ZERO permissions beyond PUBLIC — it is an identity label for rate-limiting, not an authorization credential. Web's SSR fanout (Unit 6, future PR) will send `Authorization: Bearer <key>` so admin can bucket it as `consumer:<key>` instead of `public:<ip>`. Without this, web's shared Railway egress IP causes per-IP rate-limit collisions across all concurrent SSR requests. Changes: - consumer-bearer.ts mirrors workflow-bearer.ts (timing-safe comparison via `timingSafeEqual` from node:crypto; never naive `===`) - CONSUMER_BEARER role added to permissions.Role union with empty permission set; hasPermission early-return + meetsTier guard ensure the role never satisfies any gate - CONSUMER_BEARER_PRINCIPAL factory carries rateLimitBucketKey so rate-limit identifyFn reads it without re-inspecting headers - Principal-resolution chain: session → workflow-bearer → consumer-bearer → PUBLIC. Session wins to prevent accidental privilege downgrade if an editor sends a bearer alongside their session cookie. - rate-limit identifyFn returns `consumer:${bucketKey}` for CONSUMER_BEARER principals; existing public/workflow buckets unchanged. - env.WEB_ADMIN_API_KEYS added as optional CSV (per the institutional learning about required-without-default env vars breaking Railway deploys). Tests: - consumer-bearer.test.ts: 15 tests covering happy/edge/timing-safe structural assertion (source-file regex), UTF-8 byte-length guard, log-scrub assertion across all console methods. - permissions.test.ts: CONSUMER_BEARER describe block enumerates every PermissionKey + every bucket-key permutation; asserts source-file regex that WEB_ADMIN_API_KEYS and WORKFLOW_API_KEYS are distinct. - context.test.ts: 4 resolution-ordering tests + log-scrub. - rate-limit.test.ts (new file): per-key isolation across IPs (CGNAT scenario), authenticated-overrides-bearer precedence, defensive fallback when bucket key absent. Verification: - pnpm --filter @forge/admin typecheck: clean - pnpm --filter @forge/admin test: 117 files, 1720 tests, 1 todo, 0 fails * feat(admin): admin-core consumer migration — experienceBySlug isTemplate filter (Unit 2) Tighten `experienceBySlug` so PUBLIC and CONSUMER_BEARER callers cannot discover template experiences by slug. R9 of the consumer migration plan — without this filter, web's `asNonTemplateExperience` check would have to round-trip an extra field to differentiate templates from real experiences. VIEWER is intentionally NOT in the filtered cohort. Editorial-tier read-only callers keep template visibility — the filter is narrowly scoped to public-equivalent identities (no session, no session-derived role, or the CONSUMER_BEARER service principal added in Unit 1). Implementation: extends the existing `!isEditorOrAdmin` branch of `getBySlug` with an additional `user === null || role === CONSUMER_BEARER` gate that adds `isTemplate: false` to the existing `experience` filter object alongside `archivedAt: null`. EDITOR / ADMIN paths unchanged. Tests (in experience.service.test.ts): - Updated existing PUBLIC archived test: where.experience now includes isTemplate: false alongside archivedAt: null. - Added VIEWER assertion: where.experience does NOT include isTemplate (templates remain visible to read-only editorial-tier). - New "R9 template filter" describe block: per-principal where-shape assertions (PUBLIC, CONSUMER_BEARER, VIEWER, EDITOR, ADMIN). Manual smoke (operator follow-up): seed a template Experience, curl experienceBySlug via PUBLIC / CONSUMER_BEARER / EDITOR; PUBLIC + bearer should return null on the template, EDITOR should see the row. Verification: - pnpm --filter @forge/admin typecheck: clean - pnpm --filter @forge/admin test: 117 files, 1725 tests, 1 todo, 0 fails * feat(admin): admin-core consumer migration — Pothos block types + drift-CI (Unit 3) Replace admin's `ExperienceLocale.blocks: JSON` with a typed discriminated union at the GraphQL surface. Consumer apps (web slug-page in PR-B, mobile and TV in future brainstorms) can now select per-block-kind fields with gql.tada-typed `... on MediaCollectionBlock { ... }` spreads. Input/output asymmetry: mutations (`createExperience`, `updateExperienceLocale`) keep `blocks: t.arg({ type: "JSON" })` as input — editors continue passing opaque JSON that's Zod-validated server-side. Only the QUERY output side of `ExperienceLocale.blocks` changes shape. Admin's existing editorial UI keeps working unchanged. New file `src/graphql/types/blocks.ts` (~940 LOC): - 19 Pothos `builder.objectRef<Block>(name)` types (17 top-level kinds + quizButton + containerSlot). objectRef (not prismaObject) because the underlying value is a POJO projected from a JSON column, not a Prisma model — prismaObject would fail at schema-build time when the Pothos prisma plugin tries to validate against Prisma model expectations. - 7 leaf object types (BibleQuoteItem, InfoBlockItem, MediaCollectionItem, NavigationCarouselItem, RelatedQuestionItem, VideoCarouselItem, ContainerSlotSpans). - 10 enums (CardVariant, MediaCollectionVariant, ItemsSource — collapses MediaCollectionItemsSource and VideoCarouselItemsSource since the option sets are identical — plus CtaVariant, TextVariant, and four heading / subheading source enums). - 3 unions: `ExperienceBlock` (17 members), `SectionContentBlock` (13), `ContainerContentBlock` (10). Each uses `resolveType: (value) => T_TO_TYPENAME[value.t]` for JSON-to-typename dispatch. - Exported `T_TO_TYPENAME: Record<BlockKind, BlockTypename>` and inverse `TYPENAME_TO_T` as first-class typed artifacts. Drift-CI asserts bijection (typos in the lookup table fail the test; pure set-equality alone would silently pass). - Discriminator field `t: String!` exposed alongside `__typename` so consumers can dispatch on either. - `UnknownBlockKindError` thrown when stored JSON carries an unknown `t` value (data integrity issue worth surfacing as GraphQL error, not silently dropping the block). Pre-implementation Zod construct audit ran `grep` over `domain/blocks.ts` for `.regex/.transform/.refine/z.custom/.url/.email/z.union`: - 25 `.url()` → `String` (no admin URL scalar; validation at write seam). - 1 `.regex(/^https:\/\/.../)` on `QuizButtonBlockSchema.iframeSrc` → `String`. - 1 `z.custom<never>()` on `ContainerBlockSchema.slots` (legacy nested- slot payload) → `JSON` scalar with consumer-should-ignore docstring. - Zero `.transform`, `.refine`, `.email`, non-discriminated `z.union`. Drift-CI test (`blocks.drift.test.ts`, 11 tests) uses Zod 4 PUBLIC API: `BlockSchema.options.map(o => o.shape.t.value)` — NOT the Zod 3 `_def.options` / `_def.shape()` access that would break under a Zod major upgrade. Three assertions per union: 1. Zod `t` literal set ↔ Pothos union member set. 2. Zod `t` literals ↔ T_TO_TYPENAME keys. 3. T_TO_TYPENAME values ↔ Pothos type names registered in the schema. Plus a vacuous-pass guard asserting `BlockSchema.options.length > 0` so a future Zod major upgrade that empties the introspection fails loudly instead of silently passing with empty sets. Unit tests (`blocks.test.ts`, 47 tests): per-kind round-trip for all 19 kinds across all 3 unions; mixed-array nested-union dispatch (Section → Container → leaf); empty `blocks: []`; unknown discriminator throws `UnknownBlockKindError`; the error carries the unknown `t` value. Real-DB integration test (`describe.todo` in drift test): seeded ExperienceLocale + live-server query is deferred — admin has no existing test infra for end-to-end GraphQL-against-Prisma tests. Post-deploy smoke from U3's Verification covers it for now. Modified files: - `experience.ts` — `ExperienceLocale.blocks` switched from `t.field({ type: "JSON" })` to `t.field({ type: [ExperienceBlock], nullable: false })`. Mutation input shapes untouched (input/output asymmetry). - `schema.ts` — side-effect import of `./types/blocks` after `reference.ts` per admin's import-ordering rule. - `schema.test.ts:267` — assertion flipped from `/JSON/` to `/ExperienceBlock/` + negative `.not.toMatch(/JSON/)`. Without this, the test breaks at PR-A merge. - `apps/admin/schema.graphql` — regenerated via `pnpm --filter @forge/admin schema:print`. +430 lines: 3 unions, 19 block types, 10 enums, 7 leaves. - `packages/graphql/src/admin-graphql-env.d.ts` — regenerated via `pnpm --filter @forge/graphql generate`. +41 lines: union types + possibleTypes lists for consumer-side typing. Verification: - pnpm --filter @forge/admin typecheck: clean - pnpm --filter @forge/admin test: 119 files, 2023 tests, 1 todo, 0 fails - pnpm --filter @forge/admin schema:print: idempotent (regenerated cleanly) - pnpm --filter @forge/graphql generate: clean - pnpm --filter @forge/graphql test: 113 tests pass (downstream consumers in the parity harness continue working) * docs(solutions): consumer-bearer rate-limit-identity pattern * feat(web): consumer migration — web cutover branch + verification + runbook (Units 4-9 + UB7) (#933) * feat(web): consumer migration — ContentApiMode collapse + regression snapshot (Unit 4) First commit of PR-B. Per the test-first regression discipline at docs/solutions/best-practices/test-first-regression-snapshot-byte-identical-default-20260429.md, the regression snapshot lands first so subsequent units (U5-U9) cannot silently change strapi-mode behavior. Changes: - apps/web/src/env.ts: * Add WEB_ADMIN_API_KEYS (optional CSV string). Web's SSR sends the first entry as `Authorization: Bearer` so admin buckets web's traffic as `consumer:<key>` rather than `public:<railway-egress-ip>`. Optional per the institutional learning about required-without- default env vars bricking Railway deploys. * Keep FORGE_CONTENT_API enum permissive over the four historical values (strapi / dual-read / admin-with-fallback / admin). The SOFT-REMOVAL of dual-read and admin-with-fallback happens at the runtime narrower, NOT the schema validator — stale Doppler configs don't brick boot. - apps/web/src/lib/content-api-mode.ts: * ContentApiMode collapses to `"strapi" | "admin"` (was strapi|dual-read). * RECOGNIZED_MODES = ["strapi", "admin"]. LEGACY_SOFT_REMOVED_MODES = ["dual-read", "admin-with-fallback"] — these get a distinct "soft-removed; update your Doppler config" warn before falling back to strapi. Operators reading logs differentiate "I have a stale config" from "I have a typo". * Deletion-checklist docstring updated to point at plan-003 + reflect that ADMIN_GRAPHQL_URL + WEB_ADMIN_API_KEYS STAY post-Strapi-removal (they become required, not deleted). - apps/web/src/lib/content-api-mode.test.ts: * Rewrote tests for the new closed set. "admin" is now a valid mode (no warn). "dual-read" and "admin-with-fallback" fall back with the soft-removed warn. Added a differentiation test verifying the soft-removed warn message differs from the unknown-value warn. - apps/web/src/lib/__tests__/content-mode-regression.test.ts (NEW): * Source-of-truth input/output table. Each row locks "given env value X, getContentApiMode() returns Y." U5-U9 cannot silently change these outputs without updating the test in the same commit. * Type-level contract: ContentApiMode union has exactly 2 members. Compile-time exhaustiveness check via switch + `never` assertion. Note: fetchSlugExperience's branch table is NOT touched by U4. The existing dual-read code path becomes unreachable when `mode === "strapi"` (legacy values normalize to strapi) and continues to fire as canary-like behavior when `mode === "admin"`. U6 replaces the admin branch with the real cutover code (admin-only fetch + WatchPageAdminError throws). Between U4 merge and U6 merge, the "admin" mode produces admin-with-fallback-like behavior — acceptable because the env flip doesn't happen until U6 ships. Verification: - pnpm --filter @forge/web typecheck: clean - pnpm --filter @forge/web test: 32 files, 352 tests + 3 todo, 0 fails - Regression snapshot covers 5-input matrix: undefined / strapi / admin / dual-read / admin-with-fallback (the last two soft-removed to strapi). * feat(web): consumer migration — admin-shape fragments + dispatch + parity infra (Unit 5) Ship admin-shape `WatchExperience` fragments in `packages/graphql` for future mobile/TV reuse, extend the renderer dispatch with admin typenames, and update the parity infrastructure (bridge mode guard + normalizer type narrowing + fixture rewrites) for PR-A's typed-blocks schema. **Fragments (20 new files, 739 LOC in `packages/graphql/src/fragments/admin/`):** - Root `WatchExperience` fragment on `ExperienceLocale` using `adminGraphql()`. - 17 per-block-kind fragments + 2 nested-union fragments. Each selects fields available on the matching Pothos type from PR-A. - Re-exported through `packages/graphql/src/index.ts`; new `./admin/fragments` package.json exports entry. **Renderer dispatch (`apps/web/src/components/sections/index.tsx`):** - Adds 17 admin-typename cases alongside the existing Strapi cases. Both paths route through the existing per-kind renderers via a `renderAdminBlock(...)` helper. Additive (not replacement) so the branch stays shippable between U5 and U6 — U6 may remove Strapi cases when `content.ts` flips, or keep both for the cutover window. - `CardBlock` and `VideoRecommendationsBlock` cases emit dev-warn placeholders: CardBlock has no Strapi precedent / no renderer yet; VideoRecommendationsBlock needs videoId→slug hydration that's U6's scope. **Prop-shape audit:** `grep` for `.data?.` / `.attributes?.` in `apps/web/src/components/sections/` returned ZERO matches — Strapi v5's GraphQL plugin already flattens the response, so the wrapped-envelope problem doesn't exist on the renderer side. Relation-accessor diffs surfaced for video / images / imageOverride (`enrichment.ts`, `CarouselVideo.tsx`, `Video.tsx`); admin returns flat `videoId` instead of a joined Video row. All renderer code uses optional chaining with sane fallbacks (`titleOverride`, `imageUrl`), so admin payloads degrade rather than crash. U6 will wire videoId→slug hydration. **`admin-experience.ts` (web): updated, not deleted.** `content.ts` still imports `adminExperienceBySlugOperation` for the dual-fetch orchestration shape. Composed `AdminWatchExperience` so the query typechecks against PR-A's typed-union `blocks` field. U6 may delete when `content.ts` no longer needs the dual-fetch. **Parity-bridge mode guard (`apps/web/src/lib/parity-bridge.ts`):** - Added `isCanaryEmissionEnabled()` helper comparing `getContentApiMode() as string === "dual-read"`. The cast is deliberate: U4 collapsed `ContentApiMode` to `"strapi" | "admin"` so the literal `"dual-read"` can never match by type — the guard intentionally kills canary emission post-cutover. - Guard fires at the entry of `runDualReadComparison(...)`. Admin-mode and strapi-mode requests now emit ZERO `forge.parity.*` log events, so U9's monitoring section doesn't have to disambiguate "real admin failure" from "leftover canary noise." - 2 new tests in `parity-bridge.test.ts` assert admin/strapi modes short-circuit; existing 16 dual-read tests run under a mocked-mode pattern. **Parity normalizer (`packages/graphql/src/parity/normalize-admin.ts`):** - `AdminExperienceLocaleInput['blocks']` narrows from `unknown` to `ReadonlyArray<Block>` (admin's Zod-derived domain type). After PR-A, admin returns typed blocks by construction; the existing `BlocksSchema.safeParse(rawBlocks)` becomes defensive belt-and- suspenders — kept with an explanatory comment for runtime drift. - `normalize-admin.test.ts` — rewrote 3 happy-path block fixtures (mediaCollection, text, videoRecommendations) to typed-union shape matching `BlockSchema.options[N]` exactly. Added `LooseAdminInput` + `toInput()` helper for adversarial fixtures (intentional shape drift to exercise the defensive `safeParse` path). **Cross-package verification:** - pnpm --filter @forge/web typecheck: clean - pnpm --filter @forge/web test: 32 files, 354 tests + 3 todo, 0 fails - pnpm --filter @forge/graphql typecheck: clean - pnpm --filter @forge/graphql test: 8 files, 113 tests, 0 fails - pnpm --filter @forge/graphql generate: both schemas regenerated cleanly - pnpm --filter @forge/admin typecheck (regression check): clean * feat(web): consumer migration â�� Unit 6 (cutover + admin bearer + cache re-throw) Wire the actual cutover branch. When `FORGE_CONTENT_API === "admin"`, web fetches from admin via a bearer-aware Apollo client; admin failures throw `WatchPageAdminError`; `unstable_cache` re-throws (not caches) admin errors so they reach UB7's segment error boundary. Strapi sentinel errors keep their current inline-rendered path. **admin-client.ts:** - Module-scope bearer read: `env.WEB_ADMIN_API_KEYS?.split(",")[0]?.trim()`. Inside `typeof window === "undefined"` guard so client bundles don't trip the server-env Proxy. - timeoutFetch override constructs a fresh `Headers(init?.headers)` per request and sets `Authorization: Bearer ${ADMIN_BEARER}` only when defined. Bearer undefined → header omitted entirely (admin treats request as anonymous PUBLIC). Fresh Headers also replaces any caller-supplied Authorization rather than merging — defense-in-depth against accidental key echo. - Per-call AbortSignal.timeout(3000) preserved (outbound-timeout-shorter- than-caller-budget discipline). **content.ts changes:** 1. `WatchPageAdminError` class (near WatchVideoError) — mirrors that precedent: ``` class WatchPageAdminError extends Error { readonly code: "NOT_FOUND" | "UNAVAILABLE" constructor(code, { cause }: { cause?: Error } = {}) { ... } } ``` `code` field (not `kind` as plan text suggested) — matches existing WatchVideoError precedent in the same file. UB7 dispatches on `error.code`. 2. fetchSlugExperience 2-case branch table: - `mode === "strapi"`: byte-identical to current main. - `mode === "admin"` AND `!env.WEB_ADMIN_API_KEYS` (runtime safety net for deploy-order errors): log `forge.parity.consumer_bearer_missing`, fall back to strapi semantics for THIS request. Inline TODO references the post-Strapi-removal switch to throwing UNAVAILABLE per plan-003 Key Technical Decisions. - `mode === "admin"`: call `fetchAdminSlugExperience`. Outcome classification (`error.name`-based, never message-substring per the AWS NoSuchKey learning): * `ok:true, response==null` → log `admin_null` + throw `NOT_FOUND` * `ok:true, response!=null` → return * `ok:"timeout"` → log `admin_timeout` + throw `UNAVAILABLE` * `ok:"error"` → log `admin_fetch_error` + throw `UNAVAILABLE` with original as `cause` 3. `unstable_cache` re-throw inside `fetchResolvedWatchPage`'s catch: ``` if (error instanceof WatchPageAdminError) throw error ``` Goes BEFORE the existing sentinel return. The prior plan flagged this as P0 — without it, the cache wrapper swallows the throw and the error boundary never fires. `unstable_cache` re-throws errors from its inner function (verified by the existing comment). `resolveWatchPage` is wrapped in React `cache()` which propagates the throw to `[slug]/page.tsx`. Strapi-mode generic Errors keep the sentinel path → page.tsx renders `<ExperienceError>` inline. 4. Removed `runDualReadComparison` import + `fetchStrapiSlugExperience` function — dead code after the cutover. The parity-bridge module itself stays (U5 added the mode guard; U5 deletion follow-up PR retires it). content.ts no longer calls into the bridge. **Tests:** - admin-client.test.ts: 6 new — bearer present, bearer absent (`Authorization` header NOT set), whitespace trim on first CSV entry, console-scrub invariant (mutation-tested: removing scrub locally fails), singleton preservation, per-call AbortSignal. - content.test.ts: ~14 cutover tests covering happy paths admin + strapi, NOT_FOUND, UNAVAILABLE via Apollo error / timeout / abort, consumer_bearer_missing safety net, cache re-throw + sentinel preservation, end-to-end propagation chain. - Mocked-shape discipline: Apollo errors thrown via `Object.assign(new Error("..."), { name: "ApolloError", networkError: ... })` — REAL typed shape, not generic Error. **Note on `kind` vs `code`:** plan-003 text used `kind` for the discriminator field. Existing `WatchVideoError` precedent uses `code`. Followed the precedent for codebase consistency. UB7's classifier will dispatch on `error.code`. Verification: - pnpm --filter @forge/web typecheck: clean - pnpm --filter @forge/web test: 32 files, 353 + 3 todo, 0 fails - Operator manual smoke (post-deploy): FORGE_CONTENT_API=admin WEB_ADMIN_API_KEYS=<key> ADMIN_GRAPHQL_URL=<url> + curl canary slug; verify admin called, Strapi not, log stream contains no bearer string. * feat(web): consumer migration — [slug]/error.tsx Client Component (UB7) Slug-page error boundary that catches typed WatchPageAdminError from U6's admin-mode fetch. Renders one of two static UX shapes matching the Strapi-mode inline rendering — end users see no behavior difference between admin-mode and Strapi-mode failures. **Mode-aware:** the boundary only handles `WatchPageAdminError`. Strapi- mode sentinel errors continue through the existing inline path in `page.tsx` and never reach this boundary (it's additive for admin mode). **Information-disclosure discipline:** `error.message` is NEVER rendered as visible text. The classifier dispatches on `error.code`: - `NOT_FOUND` → `<ExperienceEmpty>` (matches Strapi-mode's NO_EXPERIENCE_FOUND_MESSAGE inline render) - `UNAVAILABLE` → `<ExperienceError>` with hardcoded "Service temporarily unavailable" message + reset button (mirrors locale-level error boundary pattern). ExperienceError additionally sanitizes via its KNOWN_ERRORS table; the stable string maps to its generic fallback so admin-mode UNAVAILABLE and Strapi-mode generic-error renderings stay visually consistent. **Catch-all:** non-typed errors (or future codes not in the union) re-throw to Next.js's segment-default error boundary, which emits a generic 500 page without echoing the underlying error.message. Safe contract: this boundary handles two static cases and re-throws everything else. Pattern follows `apps/web/src/app/[slug]/[locale]/error.tsx` precedent (the 2-segment error boundary that the locale-level route uses). Tests (7): - Happy path NOT_FOUND renders `<ExperienceEmpty>` (no reset button — matches Strapi-mode behavior on empty content). - Happy path UNAVAILABLE renders `<ExperienceError>` + reset button. - Reset callback fires on Try again click. - Information-disclosure: NOT_FOUND error.message NEVER appears in DOM (mutation-tested with a planted secret fragment via both `cause` and the error's own message field). - Information-disclosure: UNAVAILABLE error.message NEVER appears in DOM (same planted-secret pattern). - Catch-all: generic Error (non-WatchPageAdminError) re-throws. - Catch-all: off-band code on WatchPageAdminError (forged via Object.defineProperty bypassing the TS readonly) re-throws — guards against a future widening of the code union without updating this boundary. Verification: - pnpm --filter @forge/web typecheck: clean - pnpm --filter @forge/web test: 33 files, 360 + 3 todo, 0 fails * feat(graphql): consumer migration — batch verification harness (Unit 8) THE cutover gate. Offline harness that fetches every published slug from Strapi + admin in parallel, diffs the responses via existing parity primitives, applies allow-list rules, and produces a structured per-slug report. Gate passes iff every remaining diff is allow-listed; the operator never flips FORGE_CONTENT_API to admin without a green gate. **Files:** - packages/graphql/src/parity/batch-verification.ts (886 LOC): Core harness. Module-level so it's both typechecked AND unit-testable. Exports the orchestrator (`runBatchVerification`), the per-slug comparator, args parser, env-bearer reader, stratified sampler, gate logic, JSON-report builder, and supporting helpers (`postGraphQL` with 429 backoff, `sanitizeError` for bearer redaction). - packages/graphql/scripts/run-batch-verification.ts (345 LOC): Thin CLI shim. Parses env + args, builds real GraphQL fetchers against Strapi + admin, calls runBatchVerification, writes report, emits exit code. Same scripts/-outside-tsconfig pattern as capture-parity-fixture. - packages/graphql/src/parity/batch-verification.test.ts (49 new tests): parseArgs, readBearerFromEnv hard-fail, sanitizeError redaction, allow-list combiner, stratified sample determinism + bucket distribution, per-slug compare across happy/admin-missing/strapi- missing/both-failed paths, bearer-redaction in errors, value-diff, allow-list application, runBatchVerification PASSED/FAILED gate paths, CONCURRENCY CAP ENFORCEMENT (instrumented counter), --since filter, no-crash-on-throwing-fetcher, backoffDelayMs cap, postGraphQL 429-then-200 retry, max-retries RateLimitExhaustedError, bearer header presence/absence, and a LOCKED JSON inline snapshot for the downstream report shape. **CLI surface:** --sample <n> representative-sample-first count (default null = full corpus). Stratified by updatedAt: oldest 30 / middle 40 / newest 30. --concurrency <n> bounded-parallel admin/strapi fetches (default 5). --out <path> report JSON output (default .tmp/batch- verification-<UTC-YYYYMMDD-HHmmss>.json). --allow-list <path> operator-supplied additional allow-list entries (JSON) — combined with DEFAULT_ALLOW_LIST from packages/graphql/src/parity/allow-list.ts. --since <iso> delta filter (only slugs with updatedAt > ISO) — for editorial-freeze workflows where the operator re-runs immediately before env-flip. --anonymous explicit opt-out of the bearer auto-read for local-dev use. Without this AND without WEB_ADMIN_API_KEYS set, the CLI HARD-FAILS with a clear error explaining the self-DoS risk (anonymous traffic hits the public:${ip} bucket that real end-user SSR is about to share). --help / -h **Bearer discipline:** - Auto-read from WEB_ADMIN_API_KEYS env (first CSV entry). Hard-fail when unset unless --anonymous opt-out. - sanitizeError strips the bearer token from all error messages before they hit the report or stderr. - Per-fetch AbortSignal.timeout(3000ms) matches U6's outbound budget. - HTTP 429 retry: 500ms base, exponential backoff capped at 30s, 3 attempts before RateLimitExhaustedError. **Gate + exit code:** - PASSED (exit 0): all four diff channels + errors total ZERO across the corpus. Allow-listed diffs don't count. - FAILED (exit 1): any non-allow-listed diff or error. - Misconfig (bad args, missing env, malformed allow-list JSON, unparseable --since) → exit 2. **Report JSON shape (downstream contract — locked by inline snapshot):** { generatedAt: ISO, totals: { slugs, withStructural, withValue, withOrder, withSemantic, withErrors, allowListed }, gate: "PASSED" | "FAILED", slugs: [{ slug, locale, structural / value / order / semantic / allowListed: { count, paths[] }, timingMs: { strapi, admin, compare }, error?: { side, message }, }] } **Dependencies:** added `p-limit ^7.3.0` (matches workspace versions in apps/admin and apps/manager). **Unresolved (operator-side):** - Strapi corpus query uses `filters: { isTemplate: { eq: false }, publishedAt: { notNull: true } }`. Not gql.tada-typechecked (scripts/ outside tsconfig). Operator validates on first run; if the Strapi deployment uses different filter literals (e.g. `$notNull`), the inline query string is patched in-place. The orchestration loop is decoupled via the Fetchers interface so the patch is one edit. - `I18NLocaleCode` GraphQL variable type hardcoded; same patch path. **Editorial-freeze coordination** is documented in U9's runbook (not this CLI). The --since flag exists for the alternative path where a freeze isn't operationally possible. Verification: - pnpm --filter @forge/graphql typecheck: clean - pnpm --filter @forge/graphql test: 9 files, 162 tests (49 new), 0 fails - pnpm --filter @forge/graphql lint: clean * feat(web): consumer migration — cutover runbook + route-disable feature flag (Unit 9) Final PR-B unit. Ships the cutover runbook + emergency route-disable mechanism (rollback layer 1). **docs/admin-core-migration/cutover-runbook.md (315 lines):** Comprehensive operator-facing doc. Status header marks `draft until measured` until the MTTR numbers are observed. Sections: 1. Pre-cutover checklist (8 items: PR-A deployed, WEB_ADMIN_API_KEYS set on both Doppler projects, ADMIN_GRAPHQL_URL healthy, SDL regenerated, batch verification gate green, editorial freeze coordinated for the 24-48h window between gate-green and env-flip). 2. Concurrent-backend exposure note (admin for slug-page, Strapi for homepage + watch-video during the cutover window; both must outlast TV burn-in completion before Strapi shuts down). 3. Mean-time-to-rollback measurement procedure (TODO placeholders until operator runs two test flips on forge-web; measures deploy timing + user-visible 5xx rate + cache thrash duration + maintenance-fallback response time; escalates if worst-case deploy > 10min OR user-impact 5xx > 5%). 4. Cutover procedure (5 sequential steps with shell-block commands). 5. Rollback layers in escalation order (4 layers): - Layer 1: FORGE_DISABLE_WATCH_ROUTES on forge-web Doppler (seconds). - Layer 2: revert FORGE_CONTENT_API to strapi (if Strapi live). - Layer 3: code-revert PR-B + redeploy (5-15 min). - Layer 4: revert admin SDL — last resort; sequenced ONLY after Layer 3 reverts web to Strapi so admin's blast radius is contained. 6. No-degraded-hybrid-mode escalation reminder. 7. Monitoring queries (forge.parity.* events, Apollo error rate, 5xx). 8. Planned bearer-key rotation R8a (90-day cadence, additive-then- remove-old, 7 steps). 9. Emergency bearer-key revocation (remove-first ordering, 5 steps, distinct from rotation). 10. Unbounded-cycles contingency (R6 + T-7 threshold; 3 ordered contingencies; explicitly NOT phased ramp). 11. TODO(U7) markers (canonical-plan U7: no-redeploy mechanism, parity-diff CI gate, GraphQL Armor recalibration). 12. Document history table. **apps/web/src/env.ts:** Adds `FORGE_DISABLE_WATCH_ROUTES: z.string().optional()` to server schema + runtimeEnv. Docstring: EMERGENCY ONLY, runbook cross-ref. **apps/web/src/components/MaintenanceFallback.tsx (26 lines):** Static Client Component mirroring ExperienceEmpty/ExperienceError layout shape. Zero fetching, zero client interactivity, zero imports beyond JSX. Failsafe contract. **apps/web/src/app/[slug]/page.tsx (45 lines added):** Module-scope IIFE parses CSV into `DISABLED_ROUTES: ReadonlySet<string>`. Whitespace tolerance, empty filter, warn-and-continue on entries missing leading `/` (well-formed siblings on the same CSV still take effect). At top of SlugPage, before resolveWatchPage is awaited: `DISABLED_ROUTES.has(/${slug})` returns <MaintenanceFallback />. Reads from module-scope set — no headers()/cookies() (would defeat ISR per the institutional learning). **apps/web/src/app/[slug]/page.test.tsx (7 tests):** Match → short-circuits (resolveWatchPage NOT called); whitespace tolerance; non-match falls through; unset env; empty string; malformed CSV warns + falls through; malformed sibling does not break well-formed entry. Note: test environment switched to node (not jsdom). @t3-oss/env-nextjs throws when window is defined, so the page module can't be imported under jsdom. react-dom/server's renderToStaticMarkup works fine in node. Documented inline in the test header. **MTTR numbers remain TODO** until operator runs two real flips and records observed timing. The runbook stays at status: draft until measured. Verification: - pnpm --filter @forge/web typecheck: clean - pnpm --filter @forge/web test: 34 files, 367 + 3 todo, 0 fails * fix(graphql): harness reliability + perf fixes from ce-code-review Address Group A findings from the consumer-migration ce-code-review: - REL-03 (run-batch-verification.ts): add pagination: { limit: -1 } to STRAPI_EXPERIENCE_QUERY's blocks selection. Strapi v5 silently caps nested relations at 10 rows, which would surface as false structural diffs in the cutover gate for any experience with >10 blocks. - PERF-01 (batch-verification.ts compareSlug): fetch Strapi and admin in parallel via Promise.allSettled instead of sequential await chains. Halves per-slug wall time; materially shortens the editorial-freeze window across a 1000-slug corpus. - REL-01 (batch-verification.ts postGraphQL): add isTimeoutOrAbortError discriminator before retry. AbortError/TimeoutError indicate the request hit the outbound budget — a persistent admin slowness is not a transient blip, so 3×3s + backoff is wasted gate-convergence time. Now fast-fails the per-slug fetch. - REL-02 (batch-verification.ts backoffDelayMs): add full-jitter via injectable RNG (defaults Math.random). Kills the thundering-herd where N concurrent workers hitting 429 all resume at the same ms. Test rewritten to bound delay within the exponential envelope and assert distribution non-determinism. Verification: pnpm --filter @forge/graphql typecheck clean, 9 test files, 164 tests pass (added 2 jitter tests). No api shape changes. * fix(web): bearer alignment + Apollo error log scrub from ce-code-review Address Group B (security) findings: - C2 (correctness): align bearer-presence check in fetchSlugExperience's safety-net branch with admin-client.ts. Both must classify whitespace- only or empty-first-CSV-entry values as 'unset' so the safety net fires consistently. admin-client uses `.split(",")[0]?.trim() || undefined`; safety net now uses the same parse via local `bearerFirstEntry`. - sec-002 (security): scrub the bearer key from causeError.message BEFORE logging forge.parity.admin_fetch_error. Apollo's response-error formatter can include downstream-server-controlled response body text in error.message — a hostile or misconfigured admin echoing the Authorization header in a 500 response would leak the bearer to Railway logs without this scrub. Mutation-tested by the cross- reviewer agreement between security and kieran-typescript. Verification: pnpm --filter @forge/web typecheck clean, 34 test files, 396 tests + 3 todo, 0 fails. * fix: type safety + docs cleanup from ce-code-review - AC-04 (api-contract + kieran-typescript, cross-reviewer): add `as const satisfies Record<Block["t"] | SectionContentBlock["t"] | ContainerContentBlock["t"], string>` annotation on T_TO_TYPENAME in apps/admin/src/graphql/types/blocks.ts. Now tsc enforces that every Zod `t` discriminator across all 3 unions has a matching Pothos typename entry at compile time. Pairs with drift-CI's runtime three-way bijection (typo-detection on typename values). - M-01 (maintainability): comment on WatchPageAdminError clarifying the discriminator field is `code` (matches WatchVideoError precedent; plan-003 used "kind" interchangeably). - M-04 (maintainability): update parity-bridge deletion checklist — the "every callsite of runDualReadComparison in content.ts" item is now zero callsites (U6's cutover removed the import). - W2 (agent-native): add `run-batch-verification` script alias to packages/graphql/package.json so the cutover runbook's `pnpm --filter @forge/graphql run-batch-verification` invocation works. Verification: pnpm --filter @forge/admin typecheck clean. * test(admin): add CONSUMER_BEARER case to isEditorOrAdmin (T-01 ce-code-review) principal.test.ts case table previously covered null/PUBLIC/VIEWER/EDITOR/ ADMIN/SYSTEM/WORKFLOW_TRIGGER but not CONSUMER_BEARER. Per the testing reviewer, this gap meant a future typo widening isEditorOrAdmin to include CONSUMER_BEARER would silently pass every test — and silently leak templates / drafts to anonymous web SSR traffic via the rate-limit bearer. Now every bearer role is tested explicitly. A regression surfaces here at test time, not at the rate-limit dashboard. * chore: trim verbose inline comments per the comment contract Apply the project's inline-comment contract across PR-B's surface: default to no comment; when warranted, lead with WHY in one short line; cut filler; remove U-ID / PR-ID references that belong in PR descriptions and git blame, not in code. Net: 21 files, ~616 lines of comment cruft removed (821 deletions, 205 new — mostly tightened header docstrings). Zero functional code changes. Preserved (load-bearing): - Zod construct audit in blocks.ts header (institutional learning for future drift) - T_TO_TYPENAME satisfies annotation rationale - Security notes (timing-safe assertion, log-scrub threat model, hostile admin echo, info-disclosure invariant) - Deletion checklists in parity-bridge.ts + content-api-mode.ts (the fast-follow deletion PR uses them as a source of truth) - TODO(post-strapi-removal) lifecycle marker on the runtime safety net - snapshot-locked report contract on SlugReport - Cross-references to docs/solutions/ and docs/plans/ that cite load-bearing rationale Verification: - pnpm --filter @forge/admin typecheck: clean - pnpm --filter @forge/graphql typecheck: clean - pnpm --filter @forge/web typecheck: clean - pnpm --filter @forge/graphql test: 164/164 pass - pnpm --filter @forge/web test: 396/396 pass - pnpm --filter @forge/admin test: clean (one pre-existing fs-race flake in unrelated s3.test.ts not introduced here) * fix(graphql): unbreak batch-verification gate query + per-side timings F1 — ADMIN_EXPERIENCE_QUERY was broken in two ways that would cause every slug to fail with a GraphQL parse error: 1. Selected `description` (admin's field is `metaDescription`). 2. Selected bare `blocks` on a [ExperienceBlock!]! union — invalid GraphQL syntax for non-leaf types. Rewritten with: - `description: metaDescription` alias at top level so the response shape matches AdminExperienceLocaleInput's Strapi-vocab `description` key (the normalizer's adapted input type). - Full inline fragments on all 17 block kinds + nested SectionContentBlock and ContainerContentBlock unions, using admin's NATIVE field names (not the Strapi-vocab aliases the renderer fragments use). Required because normalize-admin.ts runs BlocksSchema.safeParse(), which validates against admin's Zod discriminated union with original field names. The harness query now parses cleanly. Schema-level field validity is enforced server-side when the harness runs against a live admin. F20 — compareSlug captured tStrapiStart=tAdminStart=Date.now() outside Promise.allSettled and tStrapiEnd=tAdminEnd=Date.now() after the join. Both timingMs values were identical = max(strapi, admin). Operators couldn't diagnose which side regressed. Timings now captured INSIDE each promise via tagged outcomes — strapiDurationMs and adminDurationMs reflect actual per-side wall-clock. Findings from the external review pass on PR #933. F1 was gate-blocking (harness would have failed loudly on every slug); F20 was a regression introduced in the earlier PERF-01 fix. * fix(web): apply ce-code-review batch (F2 F6 F7 F10 F12 F15) Five high-value findings from the external review, fixed before PR-B lifts from draft. - F2 fetchResolvedWatchPage now takes mode as an arg so unstable_cache derives the cache key from it. Was a stable cross-mode key before, which would serve stale cross-mode hits during the flip. - F6 ADMIN_BLOCK_TYPENAMES split into typed list const plus derived ReadonlySet. Default switch branch dev-warns when typename is in the set but not handled. - F7 error.tsx checks WatchPageAdminError shape by duck-type rather than instanceof. Next SSR to client serialization loses class identity. - F10 runbook Layer 1 wording corrected. Module-scope reads need a Railway redeploy; advantage is surgical scope not raw speed. - F12 experienceToMetadata handles admin flat ogImageUrl alongside Strapi nested ogImage. Without this branch admin-mode pages lose open-graph imagery. - F15 negative test where Apollo error message echoes the bearer; asserts log payload has redacted and zero bearer occurrences. Skipped: F14 (17 per-kind smoke tests) and F9 (cache-test rewrite needs integration setup) — too large for one batch. Verified: web typecheck, web test (15/15 content.test.ts), graphql typecheck + test (164/164). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(web): apply A3 F13 F18 from ce-code-review Pre-draft-lift cleanup batch on PR-B. - F13 bearer scrub now redacts EVERY CSV entry, not just the first. Mid- rotation both keys are live and either can show up in an echoed error body. ~10 LOC swap from split/join-once to a reduce over all entries. - F15 test extended to construct an error message containing both keys and assert neither survives. 2x aaa + 1x bbb = 3 redactions. - A3 added a post-rotation test where WEB_ADMIN_API_KEYS holds only the second key (operator removed the leaked first entry). Asserts admin path authenticates normally with no consumer_bearer_missing fallback. The Emergency revocation runbook section already exists at line 258 and covers the operator-facing procedure; A3 collapses to the test. - F18 MTTR gate removed from the runbook. Staging is decommissioned so the pre-flip-measurement-in-staging story doesn't apply. First prod flip doubles as the measurement flip; operator captures observed timings into the table during the flip and commits the doc same session. Status flips from "draft until measured" to "live". - Document-history table updated with F13 + F18 entries. Verified: web typecheck, content.test.ts 16/16. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(admin): add parity-bearer role + experienceTemplates query (PR-C, A1) (#935) * feat(admin): add parity-bearer role + experienceTemplates query (PR-C, A1) R9 hides template Experiences from CONSUMER_BEARER at the service layer so web/mobile/TV SSR can never render a template as a real page. The pre-cutover batch-verification harness needs to enumerate templates to prove parity with Strapi, so it needs an identity that CAN see templates without inheriting CONSUMER_BEARER's surface. This PR adds a narrow R9 carve-out: - New PARITY_BEARER role + factory in apps/admin/src/auth/principal.ts. Mirrors CONSUMER_BEARER bearer shape; minted at request time when Authorization matches PARITY_API_KEYS (new env var, distinct from WEB_ADMIN_API_KEYS and WORKFLOW_API_KEYS). - New permission key read:experience-templates. Granted to PARITY_BEARER via PARITY_BEARER_PERMISSIONS (single-entry set, CI-asserted) and to EDITOR+ via the tier ladder. PUBLIC and CONSUMER_BEARER stay locked out so R9's defense-in-depth holds. - New Pothos field experienceTemplates(locale: String!) returning ExperienceLocale list of published non-archived template locales. Service method listTemplateLocales mirrors getBySlug's shape. - R9 hide-rule in getBySlug widened to allow PARITY_BEARER to see templates by direct slug too (the harness needs to fetch each template's full content for comparison, not just enumerate slugs). - Context auth chain extended: workflow -> parity -> consumer -> public. Session wins all bearers. - Three CI surfaces enforce role isolation: 1. permissions.test.ts walks every PermissionKey and asserts PARITY_BEARER returns true ONLY for read:experience-templates. 2. Same test asserts the three bearer source files reference only their own env vars (PARITY != CONSUMER != WORKFLOW invariant). 3. parity-bearer.test.ts mirrors consumer-bearer.test.ts: timing- safe matching, Buffer.byteLength UTF-8 guard, log scrubbing. - Regenerated apps/admin/schema.graphql and packages/graphql/src/admin-graphql-env.d.ts. Operator setup before the harness can use this against admin: generate a fresh key (openssl rand -base64 32) and set PARITY_API_KEYS on forge-admin Doppler. Symmetric setup on whatever machine runs the harness. Out of scope for this PR: the batch-verification harness consumer wiring lives on feat/admin-consumer-migration-pr-b, not on the integration branch. One small follow-up commit on PR-B once both land will wire run-batch-verification.ts to use this new field. Verified: admin typecheck, admin test (2189/2189 across 137 files), graphql typecheck, graphql test (113/113). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(graphql): wire batch-verification harness to use parity-bearer + template corpus (PR-C, A1) Closes the A1 follow-up identified when PR-C was opened. PR-C's admin side shipped the PARITY_BEARER principal + the experienceTemplates field; this commit teaches the harness to consume them. - readBearerFromEnv now returns { bearer, templatesPermitted, source }. Precedence: PARITY_API_KEYS (templates included) -> WEB_ADMIN_API_KEYS (templates excluded, fallback) -> error. - Two corpus queries: STRAPI_CORPUS_QUERY_NO_TEMPLATES (current shape) vs STRAPI_CORPUS_QUERY_WITH_TEMPLATES (drops the isTemplate eq false filter). buildFetchers picks based on templatesPermitted. - main() destructures the new return shape, surfaces the bearer source to stderr at startup so operators know whether templates are covered, and warns explicitly when running in CONSUMER fallback mode. - Tests cover the precedence, both happy paths, both empty-CSV throw cases, and the BOTH-vars-named error message. Backward-compat path: running with only WEB_ADMIN_API_KEYS still works, emits a warning that templates are excluded, otherwise behaves identically. Set PARITY_API_KEYS to unlock template coverage with no other config change. Verified: graphql typecheck, graphql lint, graphql test (168/168 across 9 files — was 113 before this commit; +55 in batch-verification.test.ts includes the new readBearerFromEnv branch coverage). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(admin): quick wins from PR-C code review (P3-15/17/20/21, P2-9) Documentation-only / inline-comment / trivial-test additions from the multi-persona code review on PR-C. No behavior change. - P3-15: add WEB_ADMIN_API_KEYS + PARITY_API_KEYS placeholders to apps/admin/.env.example with the three-disjoint-CSVs invariant documented inline. - P3-17: update HELP_TEXT in batch-verification to name both bearer env vars and the template-coverage implication of each. - P3-20: principal.ts rateLimitBucketKey JSDoc was stale (said "set only on CONSUMER_BEARER" — PARITY_BEARER also sets it). Rewritten to cover both, with the per-role bucket-namespace contract called out so future readers see consumer:/parity: are independent quotas. - P3-21: PARITY_BEARER added to the isEditorOrAdmin negative-test matrix. Defense-in-depth: a typo widening editorial-tier to include the parity bearer would silently leak drafts via Experience.locales relation paths. - P2-9: R9 carve-out comment in experience.service.ts getBySlug rewritten to enumerate WHICH roles pass through and why (VIEWER for editorial staff inspection; PARITY_BEARER for the harness's full- record comparison). Removed WORKFLOW_TRIGGER from the explanation — it cannot reach this code path anyway. Verified: admin typecheck, principal.test.ts 10/10. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(admin): p1 fixes from PR-C code review - P1-1: rate-limit identifyFn now buckets PARITY_BEARER as parity:<key>. Without this branch the principal fell through to public:<ip> and the harness would self-DoS — exactly the scenario BearerMissingError exists to prevent. Distinct namespace from consumer:<key> keeps quotas separate as defense-in-depth against the P1-2 invariant ever breaking. - P1-2: runtime CSV-value disjointness via new exported assertBearerCsvsDisjoint(). The auth chain is workflow → parity → consumer → public, so a shared key value silently widens to the higher-tier role. Fires at module load on every env import; error message names the conflicting env vars but redacts the key value. - P1-3: schema description fixed from "EDITOR+" to "VIEWER+". The read:experience-templates key is registered at VIEWER tier (consistent with read:experiences), so VIEWER staff sessions are legitimately allowed. Prefixed with "INTERNAL — pre-cutover parity verification only; will be removed at R8 cutover." (clearer than @deprecated which would mislead the harness team). - P1-4a: createContext parity branch tests. Existing tests passed only because env.PARITY_API_KEYS was undefined; mocked isValidParityBearer and added four cases — mints PARITY_BEARER on valid header, workflow wins over parity on shared key, parity wins over consumer on shared key, session wins over parity. - P1-4c: listTemplateLocales service tests. Auth gate proven at the service layer (PUBLIC + CONSUMER_BEARER → ForbiddenError, PARITY/VIEWER/EDITOR/ADMIN pass through). Where-clause and orderBy locked in. - R9 getBySlug carve-out test for PARITY_BEARER: PUBLISHED + archivedAt:null still apply, but isTemplate filter is dropped. - P1-4b descoped: Pothos end-to-end integration test would require new test infrastructure (no createTestContext helper exists in admin's GraphQL test surface). Three layers of coverage — permissions.test.ts matrix, service-layer hasPermission, and scope-auth library enforcement — provide sufficient defense. Adding a Pothos integration harness is a separate ce-work scope. Verified: admin typecheck, six affected test suites 193/193 (env 13, context 15, experience.service 50, rate-limit 12, permissions 93, principal 10). Schema regenerated. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(admin): p2 fixes from PR-C code review - P2-5: experienceTemplates is now paginated. limit (default 50, max 200) + offset (default 0) mirror the sibling experiences field. An authenticated PARITY/VIEWER caller can no longer drive an unbounded scan; costLimitPlugin caps field count not rows, so this is the load-bearing bound. - P2-6: harness now actually uses experienceTemplates. After corpus enumeration, when running in PARITY mode, iterates each locale present in the corpus and calls admin's experienceTemplates(locale) to surface admin-only template slugs that the Strapi-sourced corpus comparison cannot catch. Informational only (logged to stderr); does not change gate PASS/FAIL — content-drift is still caught by the per-slug compare path. - P2-8: outer error catch now routes through sanitizeError with the best-effort bearer reconstruction from process.env. Inner catches in main() already sanitize; this is the last-resort safety net for unanticipated throws. - P2-11: experienceTemplates list is now non-null ([ExperienceLocale!]!). Empty corpus and null become distinguishable for the harness; service always returns an array so this matches reality. Note: the sibling experiences field still has the nullable-list asymmetry — fixing it is a follow-up to avoid scope creep here. - P2-14: parity-bearer.test.ts beefed up to match consumer-bearer coverage — uppercase BEARER prefix, multi-whitespace between prefix and key, whitespace-only key, empty-string env (distinct from undefined), and the byteLength-guard test now asserts both no-throw AND {valid:false}. - P3-16 (folded in): schema description prefixed with INTERNAL — pre-cutover parity verification only; will be removed at R8 cutover. Clearer than @deprecated which would mislead the harness team. Verified: admin typecheck, graphql typecheck, two affected admin suites 67/67 (parity-bearer 14, experience.service 53), graphql suite 168/168. Schema regenerated; diff is the field signature only. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(solutions): compound the parity-bearer pattern + soften consumer-bearer doc Captures the architecture decision in PR-C as a durable learning. ce-compound ran in Full mode with session-history search; three research subagents + the session historian converged on: - HIGH overlap (5/5 dimensions) with consumer-bearer-rate-limit-identity- pattern-20260513.md — the external code reviewer's prompt anticipated PARITY_BEARER as a "second worked instance" of that doc. - Three pieces deserve their own home rather than an inline append: (1) three-way disjointness invariant across N bearer CSVs (the existing doc's two-way !== assertion doesn't generalize), (2) narrow non-empty permission allowlist (breaks the existing doc's load-bearing "literally empty set" invariant — needs explicit re-derivation), (3) distinct rate-limit namespace per role (parity:<key> vs consumer:<key>) so siblings don't share quotas. New doc: parity-bearer-narrow-carveout-pattern-20260513.md. 8-section guidance with code quoted from the actual PR-C files. Covers test surface, operational follow-through (receiver-first deploy, rotation, threat-model delta), and a counter-example. Updated doc: consumer-bearer-rate-limit-identity-pattern-20260513.md. - Added last_updated: 2026-05-13. - Softened "Why This Matters" first bullet — empty-set is specific to CONSUMER_BEARER's rate-limit-bucketing purpose, not a universal property of the bearer-identity pattern. - Softened "When to Apply" condition 3 — now mentions minting a new bearer role with a narrow allowlist as a third option alongside extending WORKFLOW_TRIGGER_PERMISSIONS or using a different existing role. - Added PARITY pattern as the first entry in Related Patterns with the "second worked instance" framing. - Notes-section entry recording the 2026-05-13 update. - Cleaned up trailing </content></invoke> XML garbage from the original Write tool invocation. Discoverability check passed without edit — root CLAUDE.md already references docs/solutions/ 18 times in its Known Patterns section. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(solutions): compound learning on local admin web UI being impractical for dev Captures the multi-symptom local-dev arc hit during the consumer-migration debug: - /dashboard infinite-redirects because admin's proxy treats localhost:3003 as both auth-host and admin-host (in prod they're separate hostnames). - /api/* returns 404 from the same proxy host-collision (original U5 smoke finding, now expanded). - The 127.0.0.1 workaround triggers Next.js dev-origin guard, blocking the JS bundle, causing the login form to fall back to HTML GET method with credentials in the URL. Browser history + admin server log both capture the password. - ADMIN_AUTH_MODE=oauth compounds the issue: even successful login at apps/auth on :3004 lands back on /dashboard which still loops. Net resolution: local admin UI is impractical for dev work without divergent local-only patches. Use the in-process CLI workflows (pnpm run-sync, run-experience-dump, run-embeds, trigger-enrichment) which bypass the auth surface entirely, or point at prod admin for visual QA needs. Supersedes the earlier framing in the personal auto-memory entry that recommended 127.0.0.1 as a workaround without warning about the credential-leak failure mode. Cross-references the local-embed-pipeline and admin-sso-oauth docs that supply the positive alternatives. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Ur-imazing
added a commit
that referenced
this pull request
May 14, 2026
…n scaffolding revert + Video widenings) (#939) * docs: capture web→admin rebuild brainstorm, plan, and harness defects writeup Three durable artifacts that motivate and structure the rebuild: - Brainstorm requirements doc (R1-R20 + deferred Open Questions from ce-doc-review) — adopts admin's schema as the only data source for apps/web, with admin's prod widening untouched and packages/graphql frozen for mobile/TV continuing Strapi access. - Plan (22 implementation units across 7 phases) — branch foundation, 4 admin widenings inside the branch (deviation from R17), new packages/admin-graphql package, web data-layer rebuild, local fixture seeding, admin→web revalidation webhook, final verification. - Parity-harness defects writeup — the three stacked blockers from the 2026-05-14 prod-cutover smoke attempt that triggered the pivot away from migration framing. * feat(ce): pause apps/web UI feature work on main during rebuild (U1) Adds an Active Freeze section at the top of CLAUDE.md so contributors see the freeze before reading project overview. References the plan file for scope and names the rebase trigger (critical fixes touching apps/web/src/lib/, apps/web/src/app/, shared types, or packages/graphql). * feat(web): revert migration scaffolding (U2) Strips the web-side scaffolding that exists only because of the Strapi→admin migration framing. After this commit, web is back on Strapi-only reads through packages/graphql's graphql() factory — clean baseline before the rebuild begins. Deleted: - parity-bridge + tests (canary log emitter) - content-api-mode + tests (FORGE_CONTENT_API mode reader) - admin-client + tests (the canary's separate Apollo client; rebuilt in U13 against the new admin-only package) - fragments/admin-experience.ts (the only admin-bound web fragment) - __tests__/content-mode-regression.test.ts - docs/admin-core-migration/cutover-runbook.md (origin R4) Also removes files orphaned by the env-var + class deletions: - MaintenanceFallback.tsx + slug-page test that exercised FORGE_DISABLE_WATCH_ROUTES - slug-segment error.tsx + test (added in PR #933 specifically for WatchPageAdminError; dead with that class gone) Modified: - content.ts — strips fetchAdminSlugExperience, fetchSlugExperience dispatcher, WatchPageAdminError, mode-keyed unstable_cache, related event logging; collapses resolveSlugPage back to direct Strapi reads - content.test.ts — drops admin mocks and the cutover describe block - fragments/index.ts — drops adminExperienceBySlugOperation re-export - [slug]/page.tsx — drops dual-read branching + maintenance fallback - env.ts — removes FORGE_CONTENT_API, FORGE_PARITY_DEBUG, FORGE_DISABLE_WATCH_ROUTES blocks and host-allowlist helpers; ADMIN_GRAPHQL_URL and WEB_ADMIN_API_KEYS stay .optional() for now (flipped to required in U13) apps/admin/src/domain/package.json (the ESM/CJS workaround) stays untracked; it deletes in U3 alongside the parity directory that depends on it (doc-review F1). * feat(graphql): trim to Strapi-only (U3) Strips every admin-side artifact and the parity harness from the shared GraphQL package. After this commit, packages/graphql emits only the Strapi graphql() factory and its types. Mobile and TV continue consuming it unchanged; web has nothing in here for now (it gets the new packages/admin-graphql in U9). Deleted: - src/parity/ entire directory (15 source files + fixtures + tests) - scripts/capture-parity-fixture.ts, scripts/run-batch-verification.ts - src/admin.ts (the adminGraphql() factory) - src/admin-graphql-env.d.ts (admin gql.tada introspection output) - src/fragments/admin/ entire directory (19 admin block fragments + barrel + watch-experience root; relifted into the new package in U9 from git history) - src/__tests__/dual-client.types.ts (type-isolation guard; replaced by single-package equivalent in U11) - apps/admin/src/domain/package.json (ESM/CJS workaround; its only consumer was src/parity/normalize-admin.ts, gone above) - Now-empty parent directories (scripts/, src/__tests__/, src/fragments/admin parent) cleaned for tidiness Modified: - src/index.ts — collapsed to Strapi exports only (graphql, readFragment, FragmentOf, ResultOf, VariablesOf) - package.json — dropped ./admin, ./admin/fragments, ./parity exports; dropped @forge/admin devDep, zod devDep, p-limit dep; dropped run-batch-verification script. @forge/cms kept (build- graph signal even though no direct imports) - tsconfig.json — dropped the 'admin' schema entry from the gql.tada plugin config; only 'strapi' remains - CLAUDE.md, AGENTS.md — rewritten for single-schema state with forward pointers to the upcoming @forge/admin-graphql Pre-flight grep confirmed mobile and TV import only the Strapi graphql() factory + ResultOf type from @forge/graphql — no admin references anywhere. R18 holds. Verified: pnpm --filter @forge/graphql typecheck passes; codegen regenerates only graphql-env.d.ts; mobile, TV, web all typecheck against the trimmed package. Out of scope for U3 (handled later): - turbo.json generate task still lists admin schema inputs/outputs (U10 owns the per-package CI split) - apps/admin/AGENTS.md and src/scripts/print-schema.ts still reference adminGraphql / admin-graphql-env (retargeted to the new package in U9-U12) * feat(admin): tear down PARITY_BEARER scaffolding (U4) With the parity harness deleted in U3, PARITY_BEARER has no consumer. Admin's posture toward CONSUMER_BEARER, PUBLIC, EDITOR, and ADMIN principals stays exactly the same — this commit removes dead code authenticating a caller that no longer exists. Deleted: - src/auth/parity-bearer.ts + parity-bearer.test.ts Modified: - src/auth/principal.ts — dropped PARITY_BEARER role + PARITY_BEARER_PRINCIPAL factory; refreshed JSDoc - src/auth/permissions.ts — dropped read:experience-templates key, matrix entry, PARITY_BEARER_PERMISSIONS set, and the PARITY_BEARER early-return branch - src/auth/permissions.test.ts, principal.test.ts — drop PARITY_BEARER coverage - src/graphql/context.ts — drop parity-bearer branch from resolution chain (now workflow → consumer → public) - src/graphql/context.test.ts — drop 5 PARITY_BEARER tests - src/graphql/plugins/rate-limit.ts — drop the role === 'PARITY_BEARER' branch. CONSUMER_BEARER routing to consumer:<key> is unchanged (doc-review S1 verification) - src/graphql/plugins/rate-limit.test.ts — drop 3 PARITY_BEARER tests; CONSUMER_BEARER + anonymous + authenticated coverage preserved (9 tests pass) - src/graphql/types/experience.ts — drop the experienceTemplates Pothos field (PARITY-only) - src/services/experience.service.ts — drop listTemplateLocales method + PARITY_BEARER bypass comment in getBySlug - src/services/experience.service.test.ts — drop PARITY_BEARER getBySlug test + listTemplateLocales describe block - src/config/env.ts — drop PARITY_API_KEYS from Zod schema + runtimeEnv; revert assertBearerCsvsDisjoint to two-way (WORKFLOW_API_KEYS vs WEB_ADMIN_API_KEYS) - src/config/env.test.ts — collapse three-way disjointness tests to two-way - .env.example — drop PARITY_API_KEYS references - schema.graphql — regenerated; experienceTemplates field gone Doppler follow-up (operator runs manually): - doppler secrets delete PARITY_API_KEYS --project forge-admin --config dev - doppler secrets delete PARITY_API_KEYS --project forge-admin --config prd Verified: pnpm --filter @forge/admin typecheck passes; pnpm --filter @forge/admin test passes (137 files, 2186 tests + 1 todo); zero PARITY_BEARER references in apps/admin/src. Reference: docs/solutions/architecture-patterns/parity-bearer-narrow-carveout-pattern-20260513.md (the pattern doc framed this as throwaway scaffolding and is the canonical teardown recipe). * feat(admin): widen Video.parents, Video.children, and Video.locales(locale) (U5, U6) Three additive PUBLIC-surface widenings on the Video type, all needed by the upcoming web rebuild: - Video.parents: [VideoRelation!] — new relation through the VideoRelation join table. Anonymous callers see only relations whose target Video has at least one PUBLISHED locale and deletedAt IS NULL; EDITOR/ADMIN see all. Powers the web watch- page sibling carousel. - Video.children: [VideoRelation!] — same shape as parents. - Video.locales(locale: String) — added optional locale arg to the existing relation. Filters to a single locale when provided; default behavior (all locales) preserved when omitted. Lets web fetch single-locale fields (description, snippet, imageAlt) without overfetching every locale. Chose Option A (relation arg) over Option B (new videoBySlug(slug, locale) overload) — Option A composed cleanly with no nullable-shape friction. Pattern: mirrors the principal-aware relation filter at apps/admin/src/graphql/types/experience.ts:92-99 (Experience.locales). Filter logic extracted to exported helpers (videoLocalesFilter, videoParentsFilter, videoChildrenFilter) so the new principal- aware regression test can exercise them as pure functions — Pothos doesn't expose query callbacks via introspection. New: VideoRelation Pothos prismaObject — minimal projection (id, order, parent, child) of the join table. Preserves relation ordering for downstream renderers. Classified as public-shape. Changes: - src/graphql/types/video.ts — add parents/children relations + VideoRelation type; locale arg on locales; export filter helpers - src/graphql/classification.test.ts — register the new public- shape relations and VideoRelation in RELATION_TARGETS - schema.graphql — regenerated (additive only) - src/graphql/types/video.principal-filter.test.ts (new) — 18 unit tests per principal × per relation × per locale-arg variant Reference: docs/solutions/graphql/pothos-public-widening-multi-layer-coordination-20260511.md Doc-review S1 verification: identifyForRateLimit unchanged; CONSUMER_BEARER still routes to consumer:<key>; all 9 rate-limit tests pass. Verified: pnpm --filter @forge/admin typecheck passes; pnpm --filter @forge/admin test passes (138 files, 2219 tests, 1 todo). Public-resolvers regression manifest unchanged (field-level relations don't appear in the top-level resolver manifest). * feat(web): refactor template/homepage routing to watchSetting (U7, U8) Two web-side cleanups in apps/web/src/lib/content.ts that remove web's reliance on the isTemplate and isHomepage Experience fields. Web's user-facing rendering is unchanged — only the field used to decide template-vs-experience routing and homepage resolution. U7 — Option B (selected for security posture preservation): - resolveSlugPage now reads watchSetting first and uses watchSetting.defaultTemplateExperience.slug as the single source of truth for template-route decisions. The previous asNonTemplateExperience / asTemplateExperience helpers and the INVALID_DEFAULT_TEMPLATE / INVALID_HOMEPAGE_EXPERIENCE validation throws are gone — we trust admin's data model rather than re-asserting field-flag invariants. - isTemplate dropped from the WatchExperience fragment selection. - Admin-side Experience.isTemplate stays STRIPPED_FOR_PUBLIC (verified via grep) — admin's auth posture is preserved per doc-review S3. U8 — Drop the legacy isHomepage fallback in resolveHomepage: - resolveHomepage reads watchSetting.homepageExperience as the only source. The getExperienceByFilters(locale, isHomepage: eq: true) fallback path is gone. - isHomepage dropped from the WatchExperience fragment selection. Behavior invariant: a slug that previously rendered as a template still does; the home route renders the same homepage Experience. Only the routing-decision code path changes. Tests rewritten in content.test.ts: - New: homepage Experience returned from watchSetting - New: missing-homepage error when watchSetting.homepageExperience is null - New: explicit-experience match when slug differs from template - New: template-route routing when slug matches defaultTemplateExperience.slug - Removed: INVALID_DEFAULT_TEMPLATE / INVALID_HOMEPAGE validation tests (no longer applicable) Surfaced but intentionally left alone: - apps/web/src/app/api/revalidate/route.ts retains an isTemplate? field on the Strapi webhook payload type. Not used by any logic; the whole route gets reshaped to admin's webhook in U21. Out of scope here. - apps/web/src/components/ExperienceError.tsx still has dictionary entries for the deleted error strings. Dead text, no cost; not worth widening scope to remove. Verified: pnpm --filter @forge/web typecheck passes; pnpm --filter @forge/web test passes (400 tests, 33 files, 3 todo). * fix: address ce-code-review P1 findings + small cleanups Two P1 findings from ce-code-review on the branch + three small cleanups bundled together. P1 — fix CI: stale admin-graphql-env.d.ts references would fail the graphql-generate job (project-standards ps-001, anchor 100): - turbo.json: drop apps/admin/schema.graphql + admin-graphql-env.d.ts from the generate task's inputs/outputs. The Strapi side stays; admin codegen returns when packages/admin-graphql lands in U9-U10. - .github/workflows/ci.yml: drop admin-graphql-env.d.ts from the graphql-generate diff check. CI now matches the trimmed package shape post-U3. P1 — fix security: web still reads Strapi, which has no read-side filter for isTemplate. U7's Option B refactor removed the client-side asNonTemplateExperience guard (security sec-001 + adversarial adv-001, cross-reviewer agreement, anchor 75): - apps/web/src/lib/content.ts: re-introduce template-exclusion at the Strapi query layer via isTemplate: { eq: false } in the experiences filter. Defense-in-depth lives until U13 cuts web over to admin (which strips isTemplate from PUBLIC). P3 cleanups (maintainability M-001, M-002): - apps/admin/src/services/experience.service.ts: stale comment referenced the deleted asNonTemplateExperience helper. Rewritten to describe the current contract (templates hidden from PUBLIC + CONSUMER_BEARER via the service-layer where-clause). - apps/web/src/app/api/revalidate/route.ts: drop unused isTemplate field from StrapiWebhookPayload type. No logic ever read it; left over from the dual-read era. Deferred to U13 (advisory, not blocking): - adv-002 (WEB_ADMIN_API_KEYS declared but unused) — env var stays for U13's Apollo wiring; the existing comment in env.ts documents its intended consumer. Removing now and re-adding in U13 is churn. Verified: pnpm --filter @forge/web typecheck passes; pnpm --filter @forge/web test passes (400 tests, 33 files, 3 todo). Admin tests unaffected; admin SDL unchanged. * chore: regenerate pnpm-lock.yaml after U3 dep removals packages/graphql/package.json dropped @forge/admin, zod, p-limit in U3 (parity harness removal) but the lockfile wasn't regenerated, causing CI's pnpm install --frozen-lockfile to fail with ERR_PNPM_OUTDATED_LOCKFILE across commit-lint, format, and affected jobs. Pure lockfile-only diff: removes the 9 entries those three packages contributed to packages/graphql's resolved deps. * fix: address ce-code-review findings (proper run) ce-code-review (proper Stage 1-6 flow this time) on the branch surfaced 20 actionable findings + 7 advisory acknowledgements. Auto-resolve with best-judgment applied 13 fixes; 7 acknowledged as intentional design. P1 fixes: - apps/web/src/env.ts — ADMIN_GRAPHQL_URL and WEB_ADMIN_API_KEYS now carry inline 'Placeholder — wired in U13' comments so they don't read as dead code (M-01) - apps/web/src/lib/content.ts — WatchExperience type anchored to FragmentOf<typeof watchExperienceFragment>. Both GET_WATCH_EXPERIENCE and GET_WATCH_SETTINGS query projections now reference the same fragment-derived type, eliminating gql.tada cast drift between two ResultOf paths (KT-01) P2 fixes — documentation drift: - apps/admin/AGENTS.md — SDL-emission section rewritten to drop the stale admin-graphql-env.d.ts regen step; notes the admin codegen consumer lands in U9 (PS-001) - CLAUDE.md (root) — 'Admin-side change flow' rewritten to reflect packages/graphql is Strapi-only; admin codegen will live in future packages/admin-graphql under U9 (PS-002) - packages/graphql/CLAUDE.md + AGENTS.md — forward references to @forge/admin-graphql annotated as '(future — lands in U9)' (adv-003) P2 fixes — runtime correctness + safety: - apps/web/src/lib/fragments/watch-experience.ts — isTemplate re-added to the fragment for defensive validation - apps/web/src/lib/content.ts — INVALID_HOMEPAGE_EXPERIENCE_MESSAGE and INVALID_DEFAULT_TEMPLATE_MESSAGE runtime guards restored in resolveHomepage and resolveSlugPage. The watchSetting-based routing logic stays; isTemplate is consulted defensively only. Prevents silent template-as-homepage / non-template-as-template misconfigurations from rendering (adv-002) - apps/web/src/lib/content.ts — TODO(U14) prepended to the isTemplate Strapi-filter workaround comment so the deletion trigger is in-code (M-03) - apps/web/src/lib/content.ts — templateSlug comparison normalized via toLowerCase() on both sides; case-mismatched template slugs no longer silently mis-route (adv-005) P2 fixes — type safety: - apps/admin/src/graphql/types/video.ts — explicit return type annotations on videoLocalesFilter / videoParentsFilter / videoChildrenFilter using Prisma.VideoLocaleWhereInput / Prisma.VideoRelationWhereInput unioned with Record<string, never>. Prisma where shape now machine-checked (KT-04) P3 fixes: - apps/admin/src/graphql/types/video.ts — videoLocalesFilter locale predicate tightened to typeof+length>0 so empty-string locale arg behaves like 'no filter' (C3) Test additions: - apps/web/src/lib/content.test.ts — two new tests covering resolveSlugPage's null-template and null-streamingUrl return branches that were untested after U7/U8 (T-01, T-02) - apps/admin/src/graphql/types/video.principal-filter.test.ts — ADMIN+locale test case + empty-string locale edge case (T-03) Acknowledged (no code change — intentional): - adv-001: deploy-ordering CONSUMER_BEARER fall-through (operational, documented in plan U13) - KT-03: admin-graphql-env.d.ts CI drop (intentional per U3; new CI job lands in U10) - M-04: filter helpers co-located with Pothos schema (observational) - M-02: unconditional getWatchSettings prefetch (intentional simplicity; cached at route level) - AC-004: codegen gap until U9 (temporary, documented) - C2: slug==template-slug Experience-lookup skip (intentional; related concern fixed by adv-005) - adv-004: Video.parents cross-locale enumeration (pre-existing pattern consistent with Video root resolver gates) Verified: pnpm --filter @forge/web typecheck passes; pnpm --filter @forge/web test passes (402 tests, 33 files, 3 todo); pnpm --filter @forge/admin test passes (2221 tests + 1 todo). Pre-existing finding KT-02 (Apollo result cast bypasses Apollo v4 types) carried forward as a known pre-existing pattern across the codebase; not addressed in this PR. Review artifact: /tmp/compound-engineering/ce-code-review/20260514-152723-dc41ab8f/ * docs(solutions): compound gql.tada cast-drift learning When two queries spread the same gql.tada fragment, the consumer type alias should anchor on FragmentOf<typeof fragment> rather than ResultOf<typeof QueryA>[path]. gql.tada projects fragment types through each query's selection set independently, so query anchored aliases produce structurally-equivalent but nominally distinct types — and the as <Type> cast that bridges them silently masks future query-level drift. Cross-package applicability: same risk re-emerges in U9's @forge/admin-graphql when admin fragments spread across multiple admin queries. Session history (4 prior sessions) showed the fragment-anchoring principle was established in packages/graphql/src/parity/ normalizers but never applied to content.ts; U5b adapter work where correct anchoring belonged was deferred out of PR #915. Surfaced by ce-code-review (KT-01 finding) and fixed in commit 37c5e2d on this branch. File path: docs/solutions/logic-errors/ gql-tada-fragment-anchor-cast-drift-same-fragment-multi-query-20260514.md Related: dual-client-gql-tada-multi-schema-codegen-pattern (narrow additive refresh recommended).
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
PR-B of the web slug-page cutover from Strapi to admin (plan-003). Stacked on top of #932 (PR-A) and targets the `feat/admin-consumer-migration` integration branch.
Opened as DRAFT until PR-A (#932) merges into the integration branch. Then this branch gets rebased (which drops PR-A's commits from this PR's diff, leaving only U4-U9's ~5.3K LOC) and marked ready-for-review.
What this ships
U4 — ContentApiMode collapse + regression snapshot (`apps/web/src/env.ts`, `apps/web/src/lib/content-api-mode.ts`, regression test). `ContentApiMode` collapses to `"strapi" | "admin"`. Legacy values (`dual-read`, `admin-with-fallback`) soft-fallback to strapi with a distinct warn. Adds `WEB_ADMIN_API_KEYS` env (optional, CSV). First commit is the regression snapshot test that locks strapi-mode behavior byte-identical across U4-U9.
U5 — admin-shape fragments + dispatch + parity infra (20 new files under `packages/graphql/src/fragments/admin/`, ~740 LOC). Root `WatchExperience` fragment + 17 per-block-kind fragments + 2 nested-union fragments. Renderer dispatch in `apps/web/src/components/sections/index.tsx` adds 17 admin-typename cases alongside Strapi (additive — keeps every commit shippable). Parity normalizer narrows blocks type from `unknown` to `ReadonlyArray`. Parity-bridge mode guard kills canary emission post-cutover.
U6 — fetchSlugExperience cutover + bearer-aware admin client + cache re-throw (`apps/web/src/lib/admin-client.ts`, `apps/web/src/lib/content.ts`). Bearer auto-read at module scope from `WEB_ADMIN_API_KEYS`. `WatchPageAdminError` class with NOT_FOUND/UNAVAILABLE codes. `unstable_cache` re-throw inside catch block (prior plan flagged P0 if missing). Runtime safety net for bearer-missing case (logs `forge.parity.consumer_bearer_missing` + serves strapi for that request).
UB7 — `[slug]/error.tsx` Client Component (mode-aware). Catches only `WatchPageAdminError`. NOT_FOUND → ``; UNAVAILABLE → `` + reset button. Information-disclosure invariant: `error.message` NEVER renders.
U8 — batch verification harness (`packages/graphql/scripts/run-batch-verification.ts` + `packages/graphql/src/parity/batch-verification.ts` ~1.8K LOC + 992 LOC of tests). The cutover gate. Stratified-sample-first (100 slugs), concurrency cap 5, 429 backoff, JSON report. Bearer auto-read from env (hard-fails without `--anonymous` opt-out — self-DoS guard). Locked JSON report shape via inline snapshot.
U9 — cutover runbook + route-disable feature flag (`docs/admin-core-migration/cutover-runbook.md` 315 lines + `FORGE_DISABLE_WATCH_ROUTES` env + ``). Pre-cutover checklist, 4-layer rollback (route-disable → mode-flip → code-revert → admin-SDL-revert), MTTR procedure with user-impact metrics, planned + emergency bearer rotation procedures, unbounded-cycles contingency.
Pre-merge gates
Verification
Cutover is NOT automatic
Merging this PR does NOT change user behavior. `FORGE_CONTENT_API` stays at `strapi` by default. The cutover is a separate operator action (flip the env var to `admin` on forge-web Doppler) gated on the batch verification report being green. Full procedure in the runbook.
Open Questions (carry forward)
Plan: `docs/plans/2026-05-11-003-feat-web-admin-direct-cutover-plan.md` (committed in PR-A)