feat(admin): classify NoSuchKey as artifact_missing + emit missingArtifacts list (feat-119 PR1)#892
Merged
Conversation
…ifacts list (feat-119 PR1)
Replace the regex-on-error-message classifier in
apps/admin/src/services/manager-artifacts.service.ts with a typed-error
helper (`isArtifactMissing`) that branches on AWS SDK v3 typed surface
first (`error.name === "NoSuchKey" | "NotFound"`), legacy
`error.Code` second, then a tightened regex backstop
(`/not found|does not exist|ENOENT/i`). Drops the `missing` and
`no such key | NoSuchKey` regex tokens that over-matched (e.g.,
`"missing field 'foo'"` was silently demoted to skipped).
Add a deduped, sorted `missingArtifacts: ReadonlyArray<{ assetId, coreId, kind }>`
field to both R1 (sceneEmbeddingBackfill) and R2 (transcriptEmbeddingBackfill)
workflow reports. Pure projection over `outcomes[]` filtered to
`skipped { reason: "artifact_missing" }`, deduped by assetId via
Map<id, Entry> first-seen-wins, sorted ascending. The cascade still
emits L outcomes per missing (video, edition) for L locales; the
projection collapses those L copies into 1 entry per unique upstream
gap. PR2 of feat-119 will introduce a `triggerManagerEnrichment`
mutation that consumes this list directly.
Add `--report-out=<path>` flag to `pnpm run-embeds` so PR2's
`pnpm trigger-enrichment --from-report=<path>` has a stable input
file. Side-channel write failure does NOT alter exit code; report is
already on stdout.
Quality gates: 100 admin test files / 1564 tests pass (+29 new
across classifier coverage, missingArtifacts projection, --report-out
helpers). Typecheck + lint clean.
Local smoke against real S3 (manager bucket, read-only):
- 2_0-Crushing (cmsVideoId=790): pre-fix baseline 12 R2 `failed`,
post-fix 12 `skipped { artifact_missing }`, missingArtifacts.length=1.
- 2_0-ComingHome (cmsVideoId=787): same shape, distinct entry.
- Mixed run: 2 entries sorted ascending (787, 790), 24 R1 succeed
unaffected.
- Idempotency proven: re-run produces byte-identical missingArtifacts.
Behavior change worth flagging operationally: production embed-backfill
report's `failed` count drops sharply on first run after deploy (was
dominated by NoSuchKey false-failures), `skipped` rises correspondingly.
Any external dashboard or alert gated on `failed > N` should be
re-pointed to consider `failed + (skipped where artifact_missing)`.
Compounding: also captures three new solutions docs and updates two
existing ones with bidirectional cross-references:
- docs/solutions/runtime-errors/aws-s3-nosuchkey-classification-pattern-20260506.md
(PR1's typed-error pattern at the storage seam)
- docs/solutions/best-practices/workflow-report-operator-actionable-projection-pattern-20260506.md
(the deduped-projection-alongside-count-triple pattern, generalizable
beyond embed-backfill)
- docs/solutions/best-practices/mocked-shape-vs-real-contract-discipline-20260506.md
(META doc consolidating the "tests must throw the real typed shape"
rule across 4 prior worked instances: pgvector bulk-insert lesson,
verify-infra-writes-via-independent-read-path, parallel-workflow-error-robustness,
and PR1's own AWS error classification)
Refs: feat-119 (https://github.com/JesusFilm/forge/blob/main/docs/roadmap/content-discovery/feat-119-embed-backfill-artifact-missing-classification-and-opt-in-enrichment.md)
🤖 Generated with [Claude Code](https://claude.com/claude-code) + Compound Engineering v2.52.0
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
🚅 Deployed to the forge-pr-892 environment in forge 4 services not affected by this PR
|
lumberman
added a commit
that referenced
this pull request
May 6, 2026
* fix(manager): prevent stale live jobs poll overwrites (#835)
* docs: capture manager live jobs sse brainstorm
* feat(manager): add live jobs sse fallback
* fix(manager): preserve live jobs resync state
* fix(manager): prevent stale live jobs poll overwrites
* fix(manager): remove redundant live jobs refresh UI
* feat(admin): add media asset library (#877)
* feat(web): redesign watch download UX + same-origin streaming proxy (#868)
* feat(web): redesign watch download UX + same-origin streaming proxy
Watch-page download experience is rebuilt around a same-origin proxy
that hands the file to the browser's download manager (instead of
opening it in a new tab) and a redesigned modal with poster preview,
3-tier dropdown, ToS-gated download button, and Bible Quotes
always-on promo CTA.
Highlights:
- New `/watch/api/download` Node-runtime route that streams the
upstream `ReadableStream` directly into the response with
`Content-Disposition: attachment`. Hardened: redirect: "manual" so
the allowlist can't be bypassed via a 30x on an allowlisted host;
`AbortController` + `request.signal` so a slow CDN can't pin a Node
worker and a client-cancel actually frees the upstream connection;
filename sanitization strips control chars, RTL-override codepoints,
shell-meta chars, and forces a media-extension allowlist; conditional
request headers (Range / If-Range / If-None-Match / etc.) forwarded
as a unit so resumable downloads work; Content-Range round-tripped
on 206; JSON error bodies; origin+path-only error logging so signed
URL JWTs don't bleed into log retention; `maxDuration = 600` caps
the worst-case route lifetime.
- DownloadModal redesign: thumbnail (Mux Image API + Cloudflare
cinematic-still cascade), title, language pill, dropdown that
buckets the 8 CMS quality keys into Highest / High / Low (sized to
available count: 1 -> [Highest], 2 -> [Highest, Low], 3+ -> all
three). ToS checkbox is a circular pill with overlaid white check
on filled state. Click-outside + Escape-first close on the
dropdown; aria-controls/listbox-id wired; double-click guard on
the Download button; selectedTier reset only on modal-open
transition (preserves manual pick across parent re-renders).
- Bible Quotes carousel always renders, even with empty
bibleCitations; trailing promo card has a "Join our Bible study"
CTA to https://join.bsfinternational.org/?utm_source=jesusfilm-watch.
- Pill button system unified via Button variant="pill" with a
compoundVariants override so the default size doesn't clobber the
pill's chunky padding.
- WatchBody header restructured: Download button at the top of the
left column, vertically aligned with RELATED QUESTIONS / ASK YOURS.
- Watch-video fragment: added `duration` to variants and `thumbnail`,
`mobileCinematicHigh`, `mobileCinematicLow` to images. The
`mobileCinematicHigh` field is what other parts of the app use as
the editorial poster (the bare `images[].url` is a misshaped
Cloudflare Images URL that 400s).
- buildBibleQuotesBlock now always returns a block (was nullable)
so the carousel surfaces on every video page.
Tests added: 14 route handler tests covering allowlist 403, missing
url 400, upstream 4xx pass-through, 502 on fetch throw, redirect
rejection, 206 + Content-Range round-trip, conditional-header
forwarding, filename CRLF/RTL-override/extension-spoof handling,
response header allowlist (Set-Cookie stripped). Plus 3 new
DownloadModal tests pinning bucketing 1/2/3+ count, ToU link
target/rel, modal-closes-after-download. Plus updated content-merge
tests for the always-on BibleQuotes block.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs(solutions): capture SSRF defense + CodeQL FP handling from PR #868
Documents the 6-layer SSRF defense pattern for the watch download proxy
and the CodeQL js/request-forgery false-positive remediation matrix for
GitHub-hosted Default Setup. Includes a paste-ready dismissal-comment
template for alert #51.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(web): fix English variant selection + redesign watch share modal + harden hero loading (#878)
* feat(web): fix English variant selection + redesign watch share modal + harden hero loading
Search results were silently landing on non-English variants because Strapi
GraphQL caps relation pagination at 10 by default; videos with 100+ variants
(e.g. mary-visit-to-elizabeth has 242) had English beyond that window so the
watch page fell back to "first playable" non-English. Add `pagination:
{ limit: -1 }` to the watch-video fragment + GET_ROUTE_VIDEO query so all
variants are returned, then strip per-variant `downloads` / `muxVideo` /
`duration` from non-selected entries before RSC serialization to keep the
payload reasonable.
Watch hero (HeroPlayer) was collapsing to ~200px during the buffer phase
because the wrapper had no intrinsic height. Lock it with `aspect-video`,
add an `onCanPlay`-driven loading spinner, and clear the spinner from
`onError` so playback failures don't leave a permanent black box.
`videoReady` resets when `variant.documentId` changes (language switch).
Move the ResizeObserver setup to `useLayoutEffect` so the sticky-top calc
doesn't flash to `0px` on the first paint.
Share modal now mirrors the DownloadModal style: thumbnail + title +
description, Facebook + X social buttons (X uses an inline brand glyph,
not lucide's deprecated bird icon), segmented Share Link / Embed Code
tabs with an auto-fitting textarea (capped at 40vh), and a
white-with-red-hover Copy pill button. Localhost canonical origin is
substituted with `https://jesusfilm.org` for share intents only so
Facebook can crawl the URL; Copy Link still copies the configured
origin. FB + X buttons render disabled with a "share works once
deployed" hint when the origin isn't publicly shareable.
Embed snippet builder validates `playbackId` against a Mux-shaped
regex before interpolating into the iframe `src` (prevents stored XSS
on partner sites that paste the snippet) and uses inline
`aspect-ratio:16/9` styles instead of a `<style>` block so partners
can paste the snippet inside their own `<style>` tag without breakout.
Hardened `NEXT_PUBLIC_CANONICAL_ORIGIN` validation in env.ts with a
warn-only `z.refine` against a soft allowlist (jesusfilm.org subdomains,
localhost, *.local, *.railway.app) so misconfigured/leaked env vars
surface in logs without blocking unrelated deployments.
Refactors: extract `SpinnerIcon` to ui/spinner.tsx (shared with
SearchOverlay), URL helpers to lib/url.ts, share builders to
lib/share.ts. The fallback chain for `posterUrl` is extracted in
WatchPageClient. `variantsForLanguagePicker` is memoized.
5 new tests: HeroPlayer spinner lifecycle, GET_ROUTE_VIDEO pagination
assertion, F20 disabled-button on localhost origin, F22 embed-snippet
shape (aspect-ratio:16/9, no `<style>` block). 245 total passing.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(web): reset videoReady via render-phase setState, not useEffect
The useEffect-driven reset on `[variant.documentId, playbackId]` tripped
the React Compiler ESLint rule ("Calling setState synchronously within
an effect can trigger cascading renders"). Switch to the canonical
"adjust state during render" pattern: track the last-rendered variant
key and reset videoReady inline when it changes. The new state is
queued before commit, so no cascade.
Drops the now-unused useEffect import.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(web): clear search query when JFP logo is clicked
Both the desktop floating-bar logo (FloatingSearchProvider) and the
mobile in-overlay logo (SearchOverlay) now call `search("")` on click,
which clears the query state, strips the `?q=` URL param, and resets
cached results. The Link's navigation to "/" still fires — the click
handler runs first, so the home page lands on a fresh search bar
instead of inheriting the previous query.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs(solutions): capture watch-page pagination fix + refresh mux chrome pattern
Two docs from PR #878's compound-engineering follow-up.
New: docs/solutions/logic-errors/strapi-graphql-pagination-cap-
wrong-language-watch-page-20260504.md captures the watch-page
wrong-language-playback bug from a logic-error angle. Same root
cause as the existing manager-side learning (Strapi v5 GraphQL
silently caps relations at 10 rows when no `pagination` arg is
supplied; REST `maxLimit` does not apply). The manager doc's
symptom was missing coverage data; this doc's symptom is
wrong-language video playback (37/95 search hits before fix; 0
after). Cross-references the manager doc as the canonical
root-cause explanation.
Refresh: docs/solutions/design-patterns/mux-player-custom-react-
chrome-pattern-20260430.md.
- Section 5a: useEffect to useLayoutEffect for the ResizeObserver
setup (now load-bearing because aspect-video provides real
height before first paint), and add aspect-video to the wrapper
className example with the rationale.
- New section 5f: full loading-spinner state machine pattern --
videoReady + onCanPlay + onError recovery + render-phase reset
on prop change. Covers the three load-bearing pieces, the right
event choice (canplay vs loadedmetadata vs playing), the
stale-spinner-on-language-switch trap, and the React Compiler
cascading-renders rule that mandates render-phase setState over
useEffect-driven reset.
- Related: pointer to the new logic-errors doc + PR #878.
- Frontmatter last_updated: 2026-05-01 to 2026-05-04.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* refactor(admin): harden Core dub-sync after flat-pagination cutover (#879)
* fix(admin): keyword-first recall on CamelCased brand queries
Two coordinated changes to admin's keyword-first hybrid search
retrievers so q="the bible project" returns the full BibleProject
series (was 6 hits, now 20 — matching Algolia's stg corpus).
Root cause: descriptions write the brand as one word
("BibleProject"), Postgres `to_tsvector('simple', ...)` keeps it as
the single lexeme `bibleproject`, and `websearch_to_tsquery('simple',
'the bible project')` produces `'the' & 'bible' & 'project'` (three
tokens, ANDed) which never matches the lexeme. 14 of 20 BibleProject
series videos (Lord's Prayer, Shema, YHWH, Beatitudes, Sermon on the
Mount, etc.) silently failed.
Fix 1 — CamelCase-split before tokenizing. Generated columns
`title_tsv` / `description_tsv` now wrap `to_tsvector` around
`regexp_replace(coalesce(<col>, ''), '([a-z])([A-Z])', '\1 \2', 'g')`.
`BibleProject` → `Bible Project` → tokens `bible` + `project`. Both
joined-form and split-form spellings match. The `[a-z][A-Z]` form is
conservative — preserves all-caps acronyms like YHWH/LORD intact.
Fix 2 — Trigram retriever extended to title+description. Adds
`video_locale_description_trgm_idx` (operator-class GIN with
`gin_trgm_ops`); `searchByTrigram` now UNIONs title-side and
description-side `%>` matches, ranked by GREATEST similarity.
Hybrid mode stays byte-identical — the legacy R4
`video_locale_fulltext_search_idx` is untouched (it's expression-based
on raw `title`/`description`, not on the regenerated `*_tsv` columns).
`hybrid-search.regression.test.ts` confirms hybrid mode's
byte-identity-against-mocks. The CamelCase recall improvement is
keyword-first-mode-only.
Migration 0010 follows the documented DROP COLUMN CASCADE + ADD
COLUMN GENERATED pattern; byte-parity invariant between TS
`*_GENERATED_EXPR` constants and the migration SQL preserved
(`hybrid-search-sql.test.ts` enforces).
Verified locally:
- 1282/1282 tests green (incl. hybrid regression)
- psql returns 25 keyword-weighted matches for "the bible project"
(was 6 pre-fix)
- /watch/demo-keyword-search canary shows 20 keyword-first results,
"Algolia ∩ Keyword" tile jumps from ~4 to 14, "In all 3" 0 → 6
Plan: docs/plans/2026-05-02-001-fix-keyword-first-camelcase-recall-plan.md
🤖 Generated with Claude Opus 4.7 (1M context, extended thinking) via [Claude Code](https://claude.com/claude-code) + Compound Engineering v2.52.0
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs(admin): address ce:review findings on PR #872 follow-up (#873)
Eight clarity / documentation cleanups surfaced by ce:review on the
keyword-first CamelCase recall fix (PR #872). All non-behavioural —
test count goes from 156 -> 168 from the new regex behaviour cases.
CLAUDE.md keyword-first section (maint-001, conf 0.92):
- Schema bullet now names 0010 as the live owner of the generated
columns + weighted index expression
- searchByTrigram bullet rewritten — was "Title-only (description
trigram index would balloon)", now describes title+description
with GREATEST ranking and the new description trigram index
- Operational runbook updated: applies both 0009 + 0010, EXPLAIN
probe shows BitmapOr over both trigram indexes, post-R0 warning
on manual re-application
Migration 0010 comment expansions:
- Document ASCII-only regex limit (Cyrillic/Greek/etc. CamelCase
not split; future POSIX [[:lower:]] alternative noted)
- Document description trigram size revisit threshold (~500MB or
>20% INSERT/UPDATE latency regression at R0)
- Document AccessExclusiveLock posture for any future
same-shape migration on a populated table
Code clarity:
- TrigramResult.similarity now has a JSDoc clarifying it's
GREATEST(title, description) post-0010, not title-only (kt-2)
- searchByTrigram docstring reconciles "UNION" wording with
the actual OR + DISTINCT ON shape (testing finding)
- VIDEO_LOCALE_DESCRIPTION_TRGM_INDEX_NAME JSDoc honest about
having no production consumer (test-only export) (maint-003)
- Drop unused __isAttribution marker on AttributionFixture (kt-1)
New tests:
- 12 behavioural cases for the CamelCase-split regex
(BibleProject, JesusFilm, MacOS, iPhone, multi-segment, YHWH /
LORD preserved, iOS, ABCDef leading all-caps run, idempotent
on already-split, empty, Cyrillic ASCII-only-limit lock)
Verified: 168 / 168 tests green, lint + typecheck clean.
🤖 Generated with Claude Opus 4.7 (1M context, extended thinking) via [Claude Code](https://claude.com/claude-code) + Compound Engineering v2.52.0
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* refactor(admin): harden Core dub-sync after flat-pagination cutover
The previous nested videos { variants {} } batched query tripped Core's
~50s statement_timeout on megavideos like JFP-Classic and only synced
~125-175 of 1,088 videos before aborting. This change refactors the
video-dubs phase onto Core's flat top-level videoVariants(offset, limit,
input) query unified across since/full branches, fixes a soft-delete
bind-variable overflow that aborted the cleanup tail on a 209k-row
catalogue, and hardens the periphery (per-page error isolation, circuit
breaker, structured GraphQL error logs, schema field-mapping fix).
Validated: 1,088 / 1,088 videos with dubs, 209,297 dub rows, 0 page
errors on local sync (~35 min wall time, JFP-Classic dominates). All
53 core-sync tests + typecheck + lint clean.
Key fixes:
- Soft-delete via $executeRaw + toPgArray + ::text[] cast (avoids
Postgres's 32,767 prepared-statement parameter cap).
- Enum literal case ('core' not 'CORE') — Prisma high-level API maps
TS variant names to @map'd DB values, raw SQL doesn't.
- @map("video_source") on Video.videoSource (column was created snake
case in 0001_init; field was camelCase, breaking video upserts).
- Per-page try/catch + max-consecutive-error circuit breaker so one
Core hiccup doesn't kill the whole phase.
- CoreGraphQLError type widened with path / extensions for diagnosis;
per-page logger surfaces extensions.code and path[].
Capture:
- docs/plans/2026-05-04-001-refactor-harden-core-dub-sync-after-flat-pagination-cutover-plan.md
- docs/solutions/platform/core-graphql-unbounded-relation-fan-out-20260504.md
- docs/solutions/database-issues/postgres-prepared-statement-bind-variable-limit-32767-20260504.md
- docs/solutions/database-issues/prisma-raw-sql-enum-mapping-seam-20260504.md
- apps/admin/CLAUDE.md gains a Core sync — video-dubs phase subsection.
Generated with Claude Opus 4.7 (1M context) via Claude Code +
Compound Engineering pipeline (/ce:plan -> /ce:work -> /ce:review ->
/ce:compound).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* chore(roadmap): mark feat-109 keyword-first search complete (#853)
PR #852 shipped to main. Flipping the roadmap ticket from in-progress
to complete.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs(roadmap): add embed-backfill performance tickets feat-115..118 (#881)
Captures the four-PR sequence to make R1 (scene) + R2 (transcript)
embed-backfill workflows fast at 200k-variant scale and incremental
on re-runs.
- feat-115 (P0, 1d) — bounded parallelism on per-target loop
(p-limit + Promise.allSettled per the workflow-robustness
solutions doc; expected 5-10× R1, 10-20× R2)
- feat-116 (P0, 2d) — S3 artifact memoization across locales
(~85% S3 read reduction) + batched OpenRouter calls per
(video, locale) (20-50× fewer API calls on R1)
- feat-117 (P0, 2d) — bulk DB writes via $executeRaw with
INSERT … ON CONFLICT DO UPDATE (10-50× write throughput;
references the bind-var and enum-mapping solutions docs)
- feat-118 (P1, 3d) — content-hash skip on (videoSceneId, locale)
ports the R3 cms_content_hash pattern; re-runs of unchanged
content cost $0 OpenRouter and finish in seconds
Bidirectional dependency chain wired: 115 → 116 → 117 → 118.
Each ticket body includes Problem framing, Entry Points, Grep These,
What To Build (with code sketches), Constraints (cross-referencing the
parallel-workflow-error-robustness, bind-var, and enum-mapping seam
solutions docs), and Verification gates.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(web): show chapter carousel on watch pages with chapter-aware data (#880)
* feat(web): show chapter carousel on watch pages with chapter-aware data
Resurface the SiblingCarousel that was previously stubbed with `return
null` "until thumbnail-image plumbing is restored." Now wired correctly
for parent videos (JESUS surfaces its 61 chapter segments) and chapter
videos (siblings within the parent).
Data flow
- watchVideoFragment gains a top-level `children(pagination: limit:-1)`
projection so parent/collection videos can show their own children
in the carousel. The existing nested `parents[].children` selection
also gains the same `pagination: limit:-1` argument so the chapter
fallback path doesn't silently truncate at Strapi's 10-row default
for collections with more than 10 segments. Image variant fields
(thumbnail, mobileCinematicHigh, mobileCinematicLow) are added to
both projections.
- buildSiblingCarouselBlock prefers `video.children` when the current
video is itself a parent (>=2 own children); otherwise falls back to
the canonical parent's children. The synthesized virtualParent uses
a new `CarouselParent` structural subtype instead of the full
WatchParent — drops the `as WatchParent` cast and the cross-path
filter predicate that silently bypassed the type checker.
UI
- 16:9 cards with full-bleed thumbnail, "CHAPTER" caption + title
overlay, gradient for legibility. Active card carries `border-4
border-red-600` (drawn inside the box so CarouselContent's
overflow-hidden doesn't clip it). Inactive cards have no border so
the body-zone backdrop doesn't read as a halo. Hover on inactive
cards reveals a centered red Play overlay.
- Carousel moved out of TOP_ZONE_KINDS into the body zone so it sits
in normal flow under the sticky hero rather than co-pinning with it.
Body-zone padding tightened (gap-10 -> gap-6, pt-4 -> pt-2) and the
carousel section uses pt-2/pb-2 for tight visual coupling.
- href shape switched from 3-segment `/{parent}/{child}/{locale}` to
the 2-segment `/{slug}/{locale}` route shape that matches the flat
watch route. Both segments are URL-encoded defensively. Children
without a `slug` are filtered out — an unclickable card is worse
than an omitted one.
- Parent-page mode (current video has no chapter to highlight) shows
"{N} chapters" instead of misleading "Clip 1 of N" and exposes a
`data-mode="parent"|"chapter"` attribute on the section root for
agents.
- `<Image priority={index < 5}>` on the first 5 above-the-fold cards
to LCP-optimize the visible strip.
Image URL handling
- New `resolvePosterUrl(image, playbackId?)` helper in lib/url.ts
centralizes the cinematic-still > thumbnail > Mux fallback chain
shared by SiblingCarousel and WatchPageClient. The legacy `url`
field is dropped from the chain entirely — it returns a misshaped
Cloudflare Images URL that 400s, so falling through to it only
ever produces broken thumbnails.
Reactivity
- Snap-to-active effect now depends on `[api, activeIndex]` instead
of `[api]` only, so a variant switch that flips the active chapter
re-snaps the carousel rather than leaving it on the previous one.
Tests (251 passing, +6 new)
- Image priority chain regression: mobileCinematicHigh > Low >
thumbnail; placeholder when only legacy `url` is present.
- buildSiblingCarouselBlock virtualParent branch (parent-mode): block
carries video identity, currentVideoDocumentId can never match a
child, no Playing-now badge fires.
- Fragment regression for top-level `children(pagination: limit:-1)`.
- DOM-containment assertion: rendered carousel is a descendant of
watch-body-zone (not a sibling of hero-player-wrapper).
- WatchSectionRenderer makeVideo fixture gains the new `children: []`
default so the fixture matches WatchVideoRecord's required shape.
Reviewed via /compound-engineering:ce-code-review (10 reviewer
agents). All applied findings landed; 4 policy decisions skipped per
maintainer preference (chapter-label hardcode policy, 60s cache
rollout window, CMS image-host trust, RSC payload fan-out).
Run artifact at /tmp/compound-engineering/ce-code-review/
20260504-163701-1a235887.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(web): smooth click-and-drag timeline scrubbing on watch hero
Adds pointer-based timeline scrubbing with rAF coalescing so high-frequency
pointermove events latch to <=1 seek per frame, plus decoupled visual state
(scrubPct/displayTime) so the thumb tracks the cursor while the underlying
seek catches up. Mirrors the volume slider state machine, recovers from
lostPointerCapture, and resumes play on drag end if it was playing.
Includes 13 review-driven hardenings: atomic snapshot+zero of
wasPlayingBeforeScrubRef to defend against the releasePointerCapture
re-fire path, synchronous ref writes to close the pointermove drop window,
displayTime-driven aria-valuenow/valuetext, rect snapshot at pointerdown,
touch-pan-y so vertical page scroll still works, keyboard early-return
during drag, and unmount cleanup that resumes play. Six new tests cover
data-dragging transitions, displayTime, pause/resume, rAF coalescing,
lostPointerCapture recovery, and no-stray-seek on unmount.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(web): always-render Related Questions section with placeholder fallback
The right column was previously gated on prompts.length > 0, which hid the
entire section -- including the always-relevant Ask Yours CTA -- on videos
with no editorial study questions in the CMS. WatchStudyQuestions now
falls back to a single placeholder row ("If you could ask the creator of
this video a question, what would it be?") when prompts is empty, and the
section always renders.
Bundles the review-driven cleanups: removed the now-dead
data-has-right-column attribute, dropped a stale doc comment that
referenced a col-span-8 wrapper torn out during layout iteration,
refreshed the test-file docblock, renamed PLACEHOLDER_PROMPT to
PLACEHOLDER_QUESTION for domain consistency, and pinned the placeholder
branch against the same UX-regression contract (no false-affordance
chevrons, decorative SVG only, singleton button count).
Captures the pattern in
docs/solutions/design-patterns/always-render-cta-section-with-placeholder-row-20260505.md
so future work that combines editorial CMS content with a CTA can gate
on the CTA's relevance, not on the editorial content's presence.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(web): align Download/Related Questions row with watch-page title
Move the Download pill into a flex row with the h1 title (instead of
sitting alone above the SEGMENT label) and tune the right column's
WatchStudyQuestions section pt to land its header on the same Y axis.
With Download out of the left column's vertical flex flow, increasing
pt on a sibling can no longer push it out of alignment with the title.
The right column section pt now scales with breakpoints to track the
h1 size scale: pt-0 mobile (columns stack, no extra gap), md:pt-9 for
md:text-4xl, xl:pt-11 for xl:text-5xl. Mobile previously had ~84 px of
unwanted whitespace between the stacked columns (pt-11 + grid gap-10);
pt-0 closes that.
Bundles review-driven hardenings: data-testid on the new title-row
flex wrapper plus tests asserting Title and Download share that
parent, min-w-0 on the h1 so long titles wrap cleanly inside the
justify-between row, and tests pinning pt-0 md:pt-9 xl:pt-11 + mb-4
as alignment-load-bearing tokens. Verified live via getBoundingClientRect
through the Chrome MCP -- title/Download/Related Questions/Ask Yours
all within 1 px of the same Y at xl=1920.
Captures the iteration approach in
docs/solutions/developer-experience/measurement-driven-layout-iteration-chrome-mcp-20260505.md
so future alignment work skips the eyeball-then-guess loop and goes
straight to getBoundingClientRect measurement.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(web): widen watch-page gutters and inter-column gap (#883)
* feat(web): widen watch-page gutters and inter-column gap
Add xl/2xl tiers to CONTENT_WIDTH_CLASSES and the matching
CAROUSEL_BLEED/CONTENT_PADDING/END_SPACER constants so wide
viewports get proportional left/right gutters. Bump WatchBody's
md+ grid gap (md:gap-12 / xl:gap-16 / 2xl:gap-20) so the
title/description and Related Questions columns breathe at >=md.
Move BibleQuotesSection's previously-hardcoded right-trim onto
the same ladder so the carousel stays symmetric at lg/xl/2xl.
CONTENT_WIDTH_CLASSES is shared with every Experience section,
so this widens gutters site-wide -- intentional for cross-section
horizontal alignment.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs(solutions): grep inline tier copies before bumping shared layout tokens
Captures the practice surfaced by code review on PR #883: when the
four-constant lockstep tuple in apps/web/src/lib/content-width.ts is
bumped, grep for the OLD per-breakpoint tier values across apps/web/src
to catch inline open-coded copies (e.g. BibleQuotesSection.tsx's
trailing pr-* ladder) that won't follow via the constants. The cost
of drift is silent visual asymmetry; the cost of the grep is one second.
Cross-links the same-day measurement-driven-layout-iteration doc and
the prior nextjs-route-shape contract-drift doc (same family, different
domain).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* perf(admin): parallelize embed-backfill per-target loop (feat-115) (#882)
* perf(admin): parallelize embed-backfill per-target loop (feat-115)
R1 (sceneEmbeddingBackfill) and R2 (transcriptEmbeddingBackfill) now
fan out their per-target loops via `pLimit(N) + Promise.allSettled`
instead of sequential `for…of`, with concurrency tunable via the new
`SCENE_EMBEDDING_CONCURRENCY` and `TRANSCRIPT_EMBEDDING_CONCURRENCY`
env vars (default 10). Per-target error isolation is preserved — the
documented "no Promise.all" rule (parallel-workflow-error-robustness-
20260420.md) is enforced by added regression tests on both workflows.
Stage 1 of the embed-backfill performance plan
(docs/plans/2026-05-04-002-refactor-admin-embed-backfill-performance-plan.md).
🤖 Generated with Claude Opus 4.7 (1M context, extended thinking) via [Claude Code](https://claude.com/claude-code) + Compound Engineering v2.52.0
Co-Authored-By: Claude Opus 4.7 (1M context, extended thinking) <noreply@anthropic.com>
* refactor(admin): address ce-review findings on bounded-parallelism PR
- Lower default concurrency 10 → 5 on both R1 and R2 to leave headroom
on admin's documented `connection_limit=10` Prisma pool. Aligns code
with CLAUDE.md's "start prod at 5" rollout posture.
- Stream `logOutcome` per completion (inside the `limit()` callback)
instead of bursting after `Promise.allSettled` resolves. Restores
the operational visibility the sequential `for…of` had so operators
see real-time progress on long-running backfills.
- Track real elapsed `durationMs` on the synthetic `failed` outcome
produced when a settled `Promise.allSettled` result rejects. Avoids
polluting `outcomes[].durationMs` dashboards with `0`s.
- Emit a structured `event=start` log per workflow at dispatch
carrying the resolved concurrency, total targets, and locale /
language filter. Closes the agent-native gap where only the CLI
logged this (now observable from the GraphQL trigger path too).
- Strengthen tests: the prior `Promise.allSettled` regression guard
was a no-op (rejections were caught inside the step before reaching
the workflow boundary). Replaced with a `vi.spyOn(_internals,
"stepIndexEditionLocale")` test that genuinely exercises the
synthetic-failed branch — a `Promise.all` regression now fails this
test. Concurrency-cap test now asserts `observedMaxInFlight === 2`
(catches a sequential regression too) and drops the flaky
wall-clock timing assertion.
- Export `concurrencyEnvSchema` from env.ts as a shared zod fragment;
bind env.test.ts to that real export instead of duplicating the
schema literal.
- Drop `CLI_DEFAULT_CONCURRENCY` and `parsePositiveInt` from
run-embeds.ts. The CLI now imports `DEFAULT_*_EMBEDDING_CONCURRENCY`
from the workflow modules and resolves the effective value via the
validated `env` directly — single source of truth.
- Fix CLAUDE.md note: `Promise.allSettled` preserves input order, so
`outcomes[i]` IS index-aligned to `targets[i]`. Update both R1 and
R2 sections to default to `?? 5` and document the streaming-log
shape.
Resolves ce-review findings 1–10 on PR #882.
🤖 Generated with Claude Opus 4.7 (1M context, extended thinking) via [Claude Code](https://claude.com/claude-code) + Compound Engineering v2.52.0
Co-Authored-By: Claude Opus 4.7 (1M context, extended thinking) <noreply@anthropic.com>
* docs(roadmap): add feat-119 for NoSuchKey classification + opt-in enrichment
Surfaced by the feat-115 smoke run (PR #882): admin's
manager-artifacts service classifies S3 NoSuchKey errors as
artifact_read_failed (→ failed outcome) instead of artifact_missing
(→ skipped outcome) because the regex matches against the error
message rather than the AWS SDK's typed error name. Result: 4,169
"failures" in a 10-minute scoped smoke run that were really benign
"manager hasn't enriched this asset yet" data-readiness signals.
Ticket structure:
- Phase 1 (P2, ~1 day): replace inline regex with a typed-error-aware
isArtifactMissing(error) helper. Branches on error.name === "NoSuchKey"
with regex fallback for non-AWS error sources.
- Phase 2 (design-grade, ~3 days, split into its own ticket if pursued):
opt-in --auto-enrich-on-missing flag that dispatches manager-side
scene-analysis when the embed job hits a missing artifact, polls for
completion, then resumes embedding. Default stays opt-out to preserve
cost/blast-radius and the architectural seam.
Architectural context captured: scene-analysis is manager's
enrichment pipeline, NOT the embed job. The embed job is a downstream
consumer reading via S3. NoSuchKey means "manager hasn't enriched
yet" — re-running the embed job alone never fixes it.
feat-118's frontmatter updated to depend_on feat-119 because Stage 4's
skipped_unchanged outcome accuracy depends on accurate skipped accounting.
🤖 Generated with Claude Opus 4.7 (1M context, extended thinking) via [Claude Code](https://claude.com/claude-code) + Compound Engineering v2.52.0
Co-Authored-By: Claude Opus 4.7 (1M context, extended thinking) <noreply@anthropic.com>
* docs(solutions): capture bounded-parallelism pattern from feat-115
New solutions doc:
docs/solutions/best-practices/bounded-parallelism-per-target-workflow-pattern-20260505.md
Captures the canonical `pLimit(N) + Promise.allSettled` shape that
emerged from PR #882 (feat-115 / Stage 1 of the embed-backfill
performance plan). Complements the existing 04/20
parallel-workflow-error-robustness doc — that one establishes the WHY
(don't let one rejection kill the batch); this one specifies the HOW.
Key reusable knowledge:
- Bounded-parallelism workflow body shape with concurrency env var
sized below the documented Prisma `connection_limit=10` pool to
leave headroom for concurrent GraphQL/REST traffic.
- `_internals.stepX` indirection from the workflow body so tests can
`vi.spyOn` to genuinely trip the outer `Promise.allSettled` rejected
branch — without it, a `Promise.all` regression silently passes the
per-target isolation tests because the inner step's try/catch
absorbs every test-injected rejection before the outer `await` sees it.
- Streaming `logOutcome` per-completion inside the `limit()` callback
preserves operator visibility on long runs (instead of bursting at
the end after `allSettled` resolves).
- Synthetic-failed defensive branch on settled `rejected` results
carries real elapsed `durationMs` (not 0) so latency dashboards
don't get polluted.
- Concurrency-cap test asserts `observedMaxInFlight === N` (exact
equality, not `<= N`) to catch BOTH a regression to sequential
(`=== 1`) AND a regression that drops the cap (`>= N+1`).
- "When NOT to apply" section calls out R3 (experienceContentDump.ts)
as the deliberate sequential counter-example.
Cross-link added from 04/20 doc → new doc so future readers find
both the WHY and the HOW from either entry point.
🤖 Generated with Claude Opus 4.7 (1M context, extended thinking) via [Claude Code](https://claude.com/claude-code) + Compound Engineering v2.52.0
Co-Authored-By: Claude Opus 4.7 (1M context, extended thinking) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context, extended thinking) <noreply@anthropic.com>
* feat(admin): add localized image enrichment workflow (#884)
* feat(admin): add localized image enrichment workflow
* fix(admin): harden image text model fallback
* perf(admin): per-(video, edition) S3 cache + batched OpenRouter (feat-116) (#885)
* perf(admin): per-(video, edition) S3 cache + batched OpenRouter (feat-116)
Stage 2 of the embed-backfill performance plan
(docs/plans/2026-05-04-002-refactor-admin-embed-backfill-performance-plan.md,
ticket feat-116). Internal reshape of R1 (sceneEmbeddingBackfill) and
R2 (transcriptEmbeddingBackfill); GraphQL trigger mutations + CLI
output stay byte-identical to Stage 1 modulo outcome ordering.
Two compounding wins on top of Stage 1's bounded parallelism:
1. **Per-(video, edition) artifact memoization.** Workflow groups flat
(video, edition, locale) targets by (video, edition), fetches the
manager-artifacts S3 JSON ONCE per group, and threads the loaded
artifact into each per-locale indexer via a new first-class
`loadedArtifact` parameter (renamed from the previously test-only
`artifactOverride`). S3 reads collapse from N×L to N. Group-level
load failures cascade to per-locale outcomes preserving the
artifact_missing → skipped / other → failed classification.
2. **Batched OpenRouter (R1 only).** New
`generateExperienceEmbeddings(inputs[])` issues ONE provider call
per (video, locale) target with input-position-stable output
contract (`embeddings[i]` ↔ `inputs[i]`). Typed `EmbeddingsBatchError`
with 7-code literal union for caller branching. Singular
`generateExperienceEmbedding(text)` delegates with [normalizedText]
and preserves the back-compat error message for hybrid-search /
experience-embedding / search-health callers. R2 keeps vector reuse
(no provider call).
pLimit boundary moves up one level: cap is now over (video, edition)
GROUPS, not flat targets. Per-locale work runs sequentially within a
group so the loaded artifact stays scoped to one stack frame.
Smoke (local Postgres, 2 videos × 8 locales × 2 editions = 24 targets):
- R1: 24 succeeded / 4 groups / 0 failures / 14.7s wall-time. 4 S3
reads + 24 batched OpenRouter calls (vs Stage 1's 24 + ~84). DB:
60/60 + 24/24 locale rows have non-NULL embeddings.
- R2: group-level cascade verified — identical durationMs across all
6/8 outcomes per group (479ms × 5/7 one edition, 492ms × 5/7 the
other). Pre-existing feat-119 NoSuchKey gap classifies as failed
(out of scope; will be fixed before feat-118 ships).
Two new solutions docs capture the reusable patterns:
- per-parent-child-memoization-loadedartifact-pattern-20260505.md
- batched-provider-input-position-stable-contract-20260505.md
With bidirectional cross-refs to the bounded-parallelism + parallel-
workflow-error-robustness docs and both R1/R2 platform docs.
feat-115 ticket flipped to complete with Resolution section. feat-116
flipped to in-progress (will go complete on merge). README rows updated.
Tests: 99 files, 1504 tests + 1 todo, all green.
Residual risks (deferred per plan / review):
- EMBEDDING_REQUEST_TIMEOUT_MS=30s sized for singular calls; batched
payloads may need a higher ceiling — needs empirical p99 data.
- No retry/backoff on transient OpenRouter failures; one 429/5xx now
fails the whole target. Net call-count reduction (~20×) means net
rate-limit pressure dropped, but the failure-mode shift is real.
- processGroup's S3 read is NOT inside a "use step" boundary
(deliberate — avoids persisting ~250 KB JSON per group); replay
re-fetches the artifact. Documented inline.
- feat-119: NoSuchKey doesn't always classify as artifact_missing for
embeddings.json; group cascade makes the gap noisier (visible in R2
smoke). Out of scope for Stage 2; blocks feat-118.
🤖 Generated with Claude Opus 4.7 (1M context, extended thinking) via [Claude Code](https://claude.com/claude-code) + Compound Engineering v2.52.0
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(admin): satisfy useworkflow build plugin for Stage 2 group reshape
CI build was failing — useworkflow's `workflow-node-module-error`
plugin rejected `s3.ts` Node-module use because Stage 2's
`processGroup` (called from workflow scope as a plain function)
transitively reached `s3.ts` through `_internals.stepIndexEditionLocale`
→ `indexEditionScenes` → `readSceneAnalysisArtifact` → `s3.ts`.
The plugin treats any module reachable from workflow scope via plain
(non-step) functions as workflow scope. Stage 1 worked because the
workflow body called `_internals.stepIndexEditionLocale` directly —
that step boundary cleanly demarcated scope. Stage 2 introduced
`processGroup` between the workflow body and the per-locale step,
which the plugin saw as workflow scope.
Fix:
- Make `processGroup` a `"use step"` boundary so the entire group
worker (artifact load + per-locale fan-out) runs in step scope.
The build plugin's reachability check stops at step boundaries.
- Move artifact-load step wrappers (`stepLoadSceneAnalysisArtifact`
/ `stepLoadEmbeddingsArtifact`) into a separate
`_steps/load-manager-artifact.ts` module so the workflow files
don't directly import the underlying readers — keeps the workflow
module's import graph clean.
Test impact: none — `_internals.stepIndexEditionLocale` is still
spy-able (vi.spyOn intercepts at the JS property-access level; the
"use step" directive is a build-time hint that's a no-op in vitest).
All 1,504 tests still pass.
Production impact: the per-group artifact + outcomes[] now get
journaled by useworkflow at the step boundary. Operators monitoring
journal size should expect ~280 KB per group on top of the per-step
results that already existed in Stage 1.
🤖 Generated with Claude Opus 4.7 (1M context, extended thinking) via [Claude Code](https://claude.com/claude-code) + Compound Engineering v2.52.0
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* chore(roadmap): mark feat-116 complete (PR #885 merged) (#888)
Stage 2 of the embed-backfill performance plan shipped via PR #885
(commit d5e8ec0d on main). Roadmap closure:
- feat-116 status: in-progress → complete
- Resolution section added referencing PR #885, what shipped (per-(video,
edition) S3 memoization + batched OpenRouter), the CI fix (processGroup
as "use step" + separate _steps/load-manager-artifact.ts module),
smoke evidence, the two new compound docs, and residual risks
- README row flipped
feat-117 (Stage 3 — bulk SQL writes) auto-unblocks per the viewer's
blocked-by-deps rule. feat-119 (NoSuchKey classification gap) became
more noisy under Stage 2's group cascade and should land before
feat-118 Stage 4 ships.
🤖 Generated with Claude Opus 4.7 (1M context, extended thinking) via [Claude Code](https://claude.com/claude-code) + Compound Engineering v2.52.0
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* perf(admin): bulk DB writes via INSERT … unnest(...) ON CONFLICT (feat-117) (#889)
* perf(admin): bulk DB writes via INSERT … unnest(...) ON CONFLICT (feat-117)
Stage 3 of the four-stage embed-backfill performance plan
(docs/plans/2026-05-04-002-...). Collapses per-row Prisma upserts +
per-row $executeRaw vector UPDATEs into one bulk INSERT … unnest(...)
ON CONFLICT … DO UPDATE per (video, edition, locale) write batch.
GraphQL trigger surface byte-identical to Stage 2.
R1 (scene-embedding.service.ts) — three round-trips per target:
1. Bulk INSERT video_scene rows (ids client-generated via
crypto.randomUUID; @default(cuid()) only fires when id is omitted)
ON CONFLICT (video_edition_id, scene_index) DO NOTHING.
2. One follow-up SELECT id, scene_index recovers the full
scene_index → id mapping for both new and pre-existing parents
(DO NOTHING doesn't return rows for existing matches; rerun path
needs them).
3. Bulk INSERT video_scene_locale rows ON CONFLICT
(video_scene_id, locale) DO UPDATE SET … with Way A discipline:
bind text[] arrays via toPgArray, cast per-row at the SELECT
seam (u.embedding_text::vector(1536), and ARRAY(SELECT
json_array_elements_text(u.themes_json::jsonb)) for the multi-
value text[] columns). Avoids the PG18 ?::vector(1536)[] /
?::jsonb[] array-param cast paths that aren't well-trodden in
pgvector or Postgres.
R2 (transcript-embedding.service.ts) — one bulk chunk INSERT per target:
Shape A: 12 parallel typed arrays (text/int/double precision/...)
unnested at the SELECT seam, with per-row ::vector(1536) cast
on the embedding column. Parent videoTranscript.upsert stays as
a Prisma call (one row per target — bulk wouldn't save a round-
trip and complicates the parent-id type story).
toPgArray (db/pgvector.ts):
Extended to accept readonly (string | null)[] and emit the unquoted
NULL token for nullish elements. PG18 § 8.15.6 confirms quoted
"NULL" is the literal three-char string, not SQL NULL — necessary
for R2's nullable startSeconds / endSeconds / chunkId columns.
Existing string[] callers are unaffected (additive widening).
Length-equality preflight (load-bearing):
Per PG18 functions-array.html, unnest(arr1, arr2, ...) silently
NULL-pads unequal-length arrays — no error. Both R1 and R2 assert
parallel-array length equality BEFORE issuing $executeRaw and throw
a typed `artifact_invalid` error otherwise. Regression test mocks
$executeRaw and asserts it is NEVER called on a length mismatch.
Tests (53 net new + rewritten across 3 files):
- SQL-shape invariants for R1 parent INSERT, R1 locale INSERT,
R2 chunk INSERT (assert each contains INSERT INTO,
unnest(, ON CONFLICT, DO UPDATE / DO NOTHING, the right
type casts, json_array_elements_text for Way A unfold,
EXCLUDED.embedding for the locale path, etc.).
- Bind-count regression (mock.calls[0].args.length is constant
at N=3 vs N=30 for all three INSERTs — guards against re-
introduction of per-row binding which would re-trip the 32,767
prepared-statement parameter cap).
- Length-equality preflight (mismatch throws BEFORE $executeRaw).
- Vector position-stability through the bulk INSERT (Stage 2's
position-stable contract carries through Stage 3 unchanged).
- Mixed insert + update fixture (half new, half pre-existing).
- toPgArray nullable round-trip + brace rejection.
- R2 nullable timecode binding via the unquoted NULL token.
Plan deepening: docs/plans/2026-05-04-002-...-plan.md gained the
locked-in Stage 3 decisions inline (deepened: 2026-05-05 in
frontmatter): JSONB framing correction (themes/etc. are text[],
not jsonb), R1 parent collapse two-statement shape, R2 Shape A,
toPgArray nullable extension, length-equality preflight rule,
Way A vector cast, and three new risk entries.
Solutions doc: docs/solutions/database-issues/pgvector-bulk-insert-
on-conflict-pattern-20260505.md captures the bulk-insert + bind-as-
array pattern, per-row Way A ::vector(1536) cast, length-equality
preflight, JSONB-vs-text[] correction, brace rejection, parent
DO NOTHING + follow-up SELECT rationale, and the explicit "HNSW
maintenance is per-row internally; operator-mode DROP INDEX → bulk
INSERT → CREATE INDEX is a future enhancement only."
apps/admin/CLAUDE.md: R1 + R2 sections updated with Stage 3 bullets.
Roadmap: feat-117 flipped to in-progress (will flip to complete on
merge).
Verification:
pnpm --filter @forge/admin typecheck ✓
pnpm --filter @forge/admin lint ✓
vitest run on the 3 touched test files: 53/53 passing in 1.68s
Full admin test suite: 1,517 tests passing (1 todo unchanged)
Stage 4 (feat-118 — content-hash skip) auto-unblocks. Note: feat-119
(NoSuchKey classification gap) became more visible under Stage 2's
group cascade and should land before feat-118 ships.
🤖 Generated with Claude Opus 4.7 (1M context) via Claude Code
+ Compound Engineering v2.52.0
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(admin): address feat-117 review findings (sanitize errors, lift helper, test gaps)
Apply 9 gated_auto / safe_auto review fixes to the Stage 3 bulk SQL
write reshape (feat-117):
1. Hoist `assertParallelArrayLengthsMatch` from both indexer services to
the shared `apps/admin/src/db/pgvector.ts`. Takes an error factory
`(message: string) => Error` so each caller throws its own typed
error class. Direct unit tests added in `pgvector.test.ts` covering
the equal-lengths happy path, mismatch-with-factory path, custom
error class propagation, first-mismatch short-circuit, and empty-
arrays no-op.
2. Wrap the scene-embedding service `prisma.$transaction(...)` in a
try/catch that remaps Prisma runtime errors to
`SceneIndexError("storage_failed", ...)`. Mirrors the transcript
service's posture so the bound vector literal + per-row text[]
payloads cannot leak through `error.message` into per-target
outcomes / GraphQL response bodies. New `storage_failed` arm added
to `SceneIndexError`'s code union (additive widening; workflow
classifier doesn't switch exhaustively on the union). The
`isPrismaRuntimeError` + `sanitizePrismaErrorMessage` helpers move
to a shared `apps/admin/src/db/prisma-errors.ts` module so neither
service duplicates the implementation. Regression test mirrors the
transcript-side sanitization test.
3. Add a `parent video_scene id not found` regression test: the parent
SELECT stub returns a partial id set (one sceneIndex omitted); the
indexer must throw `SceneIndexError("artifact_invalid", ...)` and
the locale `$executeRaw` must NEVER fire.
4. Rename the mislabeled "length-equality preflight throws BEFORE
$executeRaw when batched provider returns wrong count" test —
the body asserts the embeddings.length-vs-scenes.length check, NOT
`assertParallelArrayLengthsMatch`. Renamed to
"embedding count mismatch from provider throws BEFORE $executeRaw".
Direct tests of the lifted helper land in `pgvector.test.ts`.
5. Add a `$transaction` 30s-timeout assertion to the scene service test
file mirroring the transcript service's existing test.
6. Add a `generateExperienceEmbedding`/`generateExperienceEmbeddings`
not-called assertion to the transcript empty-artifact test for
structural symmetry with the scene-embedding empty-artifact test.
7. Drop the redundant `Number(row.scene_index)` coercion — Prisma maps
`Int` columns to `number` honestly via the typed `$queryRaw` shape;
the coercion implied a runtime distrust the type assertion didn't
admit.
8. Update `apps/admin/CLAUDE.md`'s R2 memory-budget paragraph to
reflect Stage 3's transient bulk-INSERT footprint
(~6-12 MB transient per active language, ~30-60 MB at concurrency=5)
instead of the prior steady-state ~1.25 MB figure.
9. Solutions-doc cross-references:
- Append `feat-117`-related siblings to
`docs/solutions/database-issues/pgvector-bulk-insert-on-conflict-pattern-20260505.md`'s
`related:` block.
- Add the bulk-insert pattern path to the `related:` block of the
three Stage 1/2 best-practice docs (per-parent memoization,
batched-provider, bounded-parallelism).
- Append `feat-117` to `related_features:` on the batched-provider doc.
- Add a "Stage 3 (feat-117) update" intro section to both
`admin-scene-embeddings-indexer-pattern.md` and
`admin-transcript-embeddings-vector-reuse-pattern.md` describing
the bulk SQL reshape and pointing at the canonical pattern doc.
Bonus (low-priority cleanup): hoist the `videoSceneIds.map` invariant
check into a named helper `resolveVideoSceneIds(prepared,
sceneIndexToId)` so the parent-id resolution is separated from the
bulk-INSERT array-build pass.
GraphQL trigger surface byte-identical (no schema changes). Per-target
`prisma.$transaction` boundary preserved. The brace-rejection P1
finding in `toPgArray` was deliberately NOT touched — it is deferred
for human judgment.
Verification:
pnpm --filter @forge/admin typecheck ✓
pnpm --filter @forge/admin lint ✓
Targeted vitest run (db/pgvector + scene-embedding + transcript-
embedding services): 61/61 passing.
Workflow tests (scene + transcript backfill): 46/46 passing.
🤖 Generated with Claude Opus 4.7 (1M context) via Claude Code
+ Compound Engineering v2.52.0
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(admin): use jsonb_array_elements_text in feat-117 bulk INSERT
PG has json_array_elements_text(json) and jsonb_array_elements_text(jsonb)
— but NOT json_array_elements_text(jsonb). The R1 bulk INSERT cast each
text[] payload to ::jsonb and then called json_array_elements_text(...),
which threw 42883 "function json_array_elements_text(jsonb) does not exist"
on every scene-embedding write.
Caught by local smoke against forge_admin (PR #889 verification step):
2/2 R1 targets failed pre-fix; 2/2 succeeded post-fix with text[] columns
round-tripping correctly through Way A unfold.
Updated all sites:
- apps/admin/src/services/scene-embedding.service.ts (4 SQL sites + 2 comments)
- apps/admin/src/services/scene-embedding.service.test.ts (1 assertion + 1 comment)
- apps/admin/CLAUDE.md (1 reference)
- docs/plans/2026-05-04-002-...-plan.md (4 SQL sketch sites + 1 §Resolved)
- docs/solutions/database-issues/pgvector-bulk-insert-on-conflict-pattern-20260505.md (5 sites)
- docs/solutions/platform/admin-scene-embeddings-indexer-pattern.md (1 reference)
🤖 Generated with Claude Opus 4.7 (1M context) via Claude Code
+ Compound Engineering v2.52.0
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs: compound the json_array_elements_text trap from feat-117 smoke
Local smoke against forge_admin caught a parse-error 42883 regression
in the Stage 3 R1 bulk INSERT — `json_array_elements_text(jsonb)` does
NOT exist in Postgres. PG has two DISTINCT functions, NOT overloaded:
- `json_array_elements_text(json)`
- `jsonb_array_elements_text(jsonb)`
The Stage 3 PR shipped with `json_array_elements_text(... ::jsonb)`,
all mocked SQL-shape tests passed (the regex `/jsonb?_array_elements_text/`
matched both spellings), and the smoke run was the first signal.
Compounded into the existing solutions doc as:
- "What didn't work" entry calling out the function-name trap
- Prevention bullet stating real-DB smoke is mandatory before merging
any new bulk-INSERT site (mocked tests catch SHAPE, not PG function
resolution / enum case / NULL-pad behavior)
Also added a back-link from root CLAUDE.md's "Known Patterns" entry on
PG18 jsonb cast to the new bulk-insert pattern doc, plus a sibling
"Known Pattern" entry capturing the json_array_elements_text trap so
future agents grepping CLAUDE.md hit it before writing raw SQL.
Cross-references already in place from prior commits:
- Stage 2 sibling docs (per-parent-child memoization, batched-provider,
bounded-parallelism) all back-link to the new pattern doc
- Platform indexer docs (admin-scene-embeddings, admin-transcript-
embeddings) carry Stage 3 update notes pointing at the new doc
🤖 Generated with Claude Opus 4.7 (1M context) via Claude Code
+ Compound Engineering v2.52.0
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* chore(devcontainer): add Codex workspace defaults (#890)
Updates the devcontainer identity for Forge, installs the ChatGPT VS Code extension alongside Claude Code, and locks the GitHub CLI devcontainer feature to a resolved digest.
* chore(roadmap): mark feat-117 complete (PR #889 merged) (#891)
Stage 3 of the embed-backfill performance plan landed in PR #889
(merged ed2c1b65). This commit updates the roadmap surface to match:
- docs/roadmap/content-discovery/feat-117-...md: status flipped
in-progress → complete; Resolution section appended documenting
the 3-round-trip parent + locale write shape, Way A vector cast,
text[] unfold via jsonb_array_elements_text, length-equality
preflight, smoke-caught json_array_elements_text(jsonb) doesn't-
exist trap (parse error 42883), and the residual risks deferred
for follow-up (P1 toPgArray brace rejection, R2 prod-readiness
smoke, 30s txn timeout sizing).
- docs/roadmap/README.md: feat-117 row flipped in-progress → complete.
feat-118 (Stage 4 — content-hash skip) auto-unblocks; feat-119
(NoSuchKey classification gap) should ship before feat-118.
Mirrors the closure-PR shape used for feat-115 (PR #882) and
feat-116 (PR #888).
🤖 Generated with Claude Opus 4.7 (1M context) via Claude Code
+ Compound Engineering v2.52.0
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(admin): classify NoSuchKey as artifact_missing + emit missingArtifacts list (feat-119 PR1) (#892)
Replace the regex-on-error-message classifier in
apps/admin/src/services/manager-artifacts.service.ts with a typed-error
helper (`isArtifactMissing`) that branches on AWS SDK v3 typed surface
first (`error.name === "NoSuchKey" | "NotFound"`), legacy
`error.Code` second, then a tightened regex backstop
(`/not found|does not exist|ENOENT/i`). Drops the `missing` and
`no such key | NoSuchKey` regex tokens that over-matched (e.g.,
`"missing field 'foo'"` was silently demoted to skipped).
Add a deduped, sorted `missingArtifacts: ReadonlyArray<{ assetId, coreId, kind }>`
field to both R1 (sceneEmbeddingBackfill) and R2 (transcriptEmbeddingBackfill)
workflow reports. Pure projection over `outcomes[]` filtered to
`skipped { reason: "artifact_missing" }`, deduped by assetId via
Map<id, Entry> first-seen-wins, sorted ascending. The cascade still
emits L outcomes per missing (video, edition) for L locales; the
projection collapses those L copies into 1 entry per unique upstream
gap. PR2 of feat-119 will introduce a `triggerManagerEnrichment`
mutation that consumes this list directly.
Add `--report-out=<path>` flag to `pnpm run-embeds` so PR2's
`pnpm trigger-enrichment --from-report=<path>` has a stable input
file. Side-channel write failure does NOT alter exit code; report is
already on stdout.
Quality gates: 100 admin test files / 1564 tests pass (+29 new
across classifier coverage, missingArtifacts projection, --report-out
helpers). Typecheck + lint clean.
Local smoke against real S3 (manager bucket, read-only):
- 2_0-Crushing (cmsVideoId=790): pre-fix baseline 12 R2 `failed`,
post-fix 12 `skipped { artifact_missing }`, missingArtifacts.length=1.
- 2_0-ComingHome (cmsVideoId=787): same shape, distinct entry.
- Mixed run: 2 entries sorted ascending (787, 790), 24 R1 succeed
unaffected.
- Idempotency proven: re-run produces byte-identical missingArtifacts.
Behavior change worth flagging operationally: production embed-backfill
report's `failed` count drops sharply on first run after deploy (was
dominated by NoSuchKey false-failures), `skipped` rises correspondingly.
Any external dashboard or alert gated on `failed > N` should be
re-pointed to consider `failed + (skipped where artifact_missing)`.
Compounding: also captures three new solutions docs and updates two
existing ones with bidirectional cross-references:
- docs/solutions/runtime-errors/aws-s3-nosuchkey-classification-pattern-20260506.md
(PR1's typed-error pattern at the storage seam)
- docs/solutions/best-practices/workflow-report-operator-actionable-projection-pattern-20260506.md
(the deduped-projection-alongside-count-triple pattern, generalizable
beyond embed-backfill)
- docs/solutions/best-practices/mocked-shape-vs-real-contract-discipline-20260506.md
(META doc consolidating the "tests must throw the real typed shape"
rule across 4 prior worked instances: pgvector bulk-insert lesson,
verify-infra-writes-via-independent-read-path, parallel-workflow-error-robustness,
and PR1's own AWS error classification)
Refs: feat-119 (https://github.com/JesusFilm/forge/blob/main/docs/roadmap/content-discovery/feat-119-embed-backfill-artifact-missing-classification-and-opt-in-enrichment.md)
🤖 Generated with [Claude Code](https://claude.com/claude-code) + Compound Engineering v2.52.0
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(manager): polish dashboard Tailwind surfaces (#887)
* feat(manager): polish dashboard tailwind surfaces
* fix(manager): compact user menu panel
* fix(manager): tighten user menu row text
* fix(manager): balance user menu line spacing
* chore(graphql): update generated introspection
* docs(manager): plan admin backend migration
* feat(manager): route backend contracts through admin
* fix(manager): harden admin backend auth
---------
Co-authored-by: Tataihono Nikora <tataihono.nikora@gmail.com>
Co-authored-by: Urim Chae <urim027@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Nisal Cottingham <114973713+Kneesal@users.noreply.github.com>
Kneesal
added a commit
that referenced
this pull request
May 6, 2026
feat-119 shipped across three stacked PRs: - PR #892 (PR1): NoSuchKey reclassification + missingArtifacts projection + run-embeds --report-out flag - PR #893 (PR2): decoupled enrichment-trigger endpoint (admin GraphQL + manager REST + CLI). First admin → manager outbound dispatch in the repo. - PR #894: compound docs follow-up — four solutions docs from formal /ce:compound Roadmap closure: status → complete, Resolution section added with PR-by-PR scope table + 8 compounded patterns + feat-118 unblock note, README row flipped. 🤖 Generated with [Claude Code](https://claude.com/claude-code) + Compound Engineering v2.52.0 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
apps/admin/src/services/manager-artifacts.service.tsnow classify asskipped { reason: "artifact_missing" }instead offailed, via a new typed-error helper (isArtifactMissing) — typederror.namefirst, legacyerror.Codesecond, tightened regex/not found|does not exist|ENOENT/ias backstop.missingArtifacts: ReadonlyArray<{ assetId, coreId, kind }>field — deduped byassetId(cascade emits L outcomes per missing group; projection collapses to 1 entry), sorted ascending. Pure projection overoutcomes[], no new state.pnpm run-embedsgains--report-out=<path>so PR2'spnpm trigger-enrichment --from-report=<path>has a stable input file. Side-channel write failure does NOT alter exit code.PR1 of feat-119. PR2 (decoupled
triggerManagerEnrichmentmutation + manager REST endpoints + admin CLI) will stack on this branch.Why
Surfaced by feat-115's smoke run: 4,169
scene_index_failedoutcomes vs 1,780scene_index_complete— every single failure was the same NoSuchKey shape, meaning manager hadn't yet enriched those assets. The legacy regex matched the error MESSAGE (/not found|missing|no such key|ENOENT|NoSuchKey/i) but AWS SDK v3's textual is"The specified key does not exist."— none of those tokens match. Every NoSuchKey fell through toartifact_read_failed.Two consequences fixed:
report.failedis now meaningful (real failures only);report.skippedrises with deduped upstream gaps surfaced viamissingArtifacts.skipped_unchangedoutcome can now roll into askippedbucket that's not dishonestly empty.Key technical decisions (full plan: docs/plans/2026-05-05-001-feat-embed-backfill-classification-and-enrichment-trigger-plan.md)
instanceof ManagerArtifactError && error.code === \"artifact_missing\". The literal-unioncodeis unchanged — no new code variant needed.missing(over-matched "missing field 'foo'" bugs),no such key,NoSuchKey(typed branch covers AWS verbatim).missingArtifactsfrom outcomes; don't track separately. Pure projection — no new state, no consistency-with-outcomes question.JSONscalar return. Additive-only contract; no new Pothos type, no classification JSDoc, no leak-guard surgery.Quality gates
pnpm --filter @forge/admin typecheckcleanpnpm --filter @forge/admin lintcleanpnpm --filter @forge/admin test: 100 test files / 1564 tests pass (+29 new across classifier coverage, missingArtifacts projection, --report-out helpers)Local smoke evidence (real S3, manager bucket read-only)
2_0-Crushing(cmsVideoId=790)2_0-ComingHome(cmsVideoId=787)Pre-fix baseline for
2_0-Crushingwas 12 R2failedoutcomes with reason\"failed to read embeddings artifact for assetId=790: The specified key does not exist.\"Post-fix: those 12 cascade asskipped { reason: \"artifact_missing\" }, dedup to 1missingArtifactsentry. Classification flip proven end-to-end against real S3.Both confirmed-missing assets are missing only
embeddings.json(R2 transcript), notscene-analysis.json(R1) — so all 24 R1 targets succeed cleanly while all 24 R2 targets surface as upstream gaps with deduped, sortable entries ready for PR2's--from-reportconsumer.Smoke artifacts captured in
apps/admin/.tmp/smoke-pr1-{crushing,cominghome,mixed,mixed-2}.json(gitignored).Code review (1 round)
Spawned 7 parallel reviewers (correctness, testing, maintainability, reliability, api-contract, kieran-typescript, learnings-researcher) + agent-native-reviewer.
Fixed inline (P2 + P3 safe_auto/verifiable gated_auto):
Server returned HTTP 500withname: NoSuchKey,HTTP 404withname: NotFound) so deleting Tier 1 fails tests. Plus a Tier 3-only case and a regression-pin for the droppedno such keytoken.deriveMissingArtifacts, properly typedMockInstance<typeof s3.readManagerArtifact>(droppedany+ eslint-disable), tightened ENOTDIR/EEXIST assertion, refactored brittlemockRejectedValueOncechains to assetId-keyedmockImplementation, added "Events emitted" JSDoc section torun-embeds.ts, JSDoc clarifyingassetId === BackfillTarget.cmsVideoId.Deferred (advisory / out-of-scope):
isStorageMissingErrorincore-id-mapping.service.ts(real consolidation opportunity but out-of-scope for this PR; flag as follow-up).writeReportToPath(theoretical NFS hang; report is on stdout anyway).MissingArtifactto a shared module — defer to PR2, which is the consumer that will tell us if shared shape is ergonomic.Production embed-backfill report's
failedcount will drop sharply on first run after deploy (was dominated by NoSuchKey false-failures),skippedrises correspondingly. If any external dashboard or alert is gated onreport.failed > N(especially auto-retry or paging), re-point it to considerfailed + (skipped where reason=artifact_missing)to preserve old semantics during the rollout.In-repo: searched
apps/manager(the trigger proxy callers) and the GraphQL mutation layer — no in-repo threshold logic onfailedis wired up today. The risk is in operator-built dashboards outside the repo.Compounding (3 new docs + 5 cross-link updates)
/ce:compoundran during this PR and captured the patterns while fresh:docs/solutions/runtime-errors/aws-s3-nosuchkey-classification-pattern-20260506.md— typed-name first / legacy Code second / regex backstop third, tests-must-throw-real-shape rule, mocked-vs-real lesson.docs/solutions/best-practices/workflow-report-operator-actionable-projection-pattern-20260506.md— generalizable beyond embed-backfill: when a count triple accumulates duplicate signals via cascade, surface a deduped+sorted projection by stable id alongside the count.docs/solutions/best-practices/mocked-shape-vs-real-contract-discipline-20260506.md— META doc consolidating the "mocks prove BRANCH SHAPE; real fixtures prove PRODUCTION CONTRACT" rule across 4 worked instances (PG function resolution, AWS error shapes, in-house typed errors, Railway MCP staging).CLAUDE.mdKnown Patterns updated with both new top-level entries.Pre-merge prod-readiness checklist
schema.test.tsembed|vector|similaritleak guard stays greenreport.failed > Nreviewed (seePost-Deploy Monitoring & Validation
event=scene_index_failedandevent=transcript_index_failedcounts in admin's structured logs; expect dramatic drop (was dominated by NoSuchKey).triggerSceneEmbeddingBackfill/triggerTranscriptEmbeddingBackfillreturn value should now includemissingArtifacts: [...].gh pr view <pr> --json title,bodyto confirm PR description landed.event=startwithgroupCount,event=scene_index_skipped/transcript_index_skippedwithreason: \"artifact_missing\"(NOT*_failed).missingArtifactsfield; dedup proven bylength <= unique-cmsVideoId-count.report.failednear zero (only genuine errors);report.skippedmatches the corpus of assets manager hasn't yet enriched.missingArtifacts.lengthdeduped per asset across all locales/languages.report.failedrises unexpectedly post-deploy → revert PR (pure code revert).missingArtifactsis undefined or empty whenskipped > 0 with reason=artifact_missing→ projection bug, revert.Test plan
pnpm --filter @forge/admin typecheck && lint && testclean (1564 tests, +29 new)2_0-Crushing(confirmed-missing):failed=0,skipped=12,missingArtifacts.length=12_0-ComingHome(confirmed-missing): same shapemissingArtifacts--report-out=<path>produces a JSON file consumable by PR2's--from-report🤖 Generated with Claude Opus 4.7 (1M context, extended thinking) via Claude Code