feat(cms): add experiences to search results (feat-086) by Kneesal · Pull Request #777 · JesusFilm/forge

Kneesal · 2026-04-15T11:34:52Z

Summary

Extends the hybrid search API to return experience results alongside videos. Searching Easter no longer just returns videos about Easter bunnies — the dedicated Easter experience page surfaces too.

- 4-list RRF fusion: the orchestrator runs up to 4 parallel retrievals (video semantic, video keyword, experience semantic, experience keyword) via `Promise.allSettled`. Empty lists are filtered before fusion so RRF normalization stays accurate.
- Compound identity key: `fuseRankedLists` now keys by `${resultType}:${resultId}` instead of `videoId`, so a video with id=4 and an experience with id=4 no longer collide.
- Type guard on dedup: the 3-layer video dedup (`core_id` prefix, title match, embedding similarity) skips non-video results, so experiences pass through untouched.
- Type filter parameter on REST (`?type=video|experience`) and GraphQL (`semanticSearch(type: ...)`). Omitted = both. Invalid = 400 / `BAD_USER_INPUT`. When a single type is requested, only its retrievals fire.
- No backward-compat breakage: existing consumers see no breaking changes. `startSeconds` and `playbackId` were already nullable on `SearchResult`.
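
To illustrate the compound key and the empty-list filter described above, here is a minimal sketch. It is not the code in `fusion.ts`: the item shapes, the `RRF_K = 60` constant, and the normalization (dividing by the best achievable score across the lists that remain) are assumptions made for the example.

```typescript
// Hypothetical shapes for illustration; not the actual types in fusion.ts.
type ResultType = "video" | "experience";

interface RankedItem {
  resultType: ResultType;
  resultId: number;
  title: string;
}

interface FusedItem extends RankedItem {
  score: number;
}

// Conventional RRF constant; the PR does not state the value it uses.
const RRF_K = 60;

function fuseRankedLists(lists: RankedItem[][]): FusedItem[] {
  // Drop empty lists first so they do not inflate the normalization divisor.
  const nonEmpty = lists.filter((list) => list.length > 0);
  if (nonEmpty.length === 0) return [];

  const fused = new Map<string, FusedItem>();
  for (const list of nonEmpty) {
    list.forEach((item, index) => {
      // Compound key: video 4 and experience 4 stay separate entries.
      const key = `${item.resultType}:${item.resultId}`;
      const contribution = 1 / (RRF_K + index + 1);
      const existing = fused.get(key);
      if (existing) {
        existing.score += contribution;
      } else {
        fused.set(key, { ...item, score: contribution });
      }
    });
  }

  // Assumed normalization: divide by the score of an item ranked first in every list.
  const maxScore = nonEmpty.length / (RRF_K + 1);
  return [...fused.values()]
    .map((item) => ({ ...item, score: item.score / maxScore }))
    .sort((a, b) => b.score - a.score);
}

const demo = fuseRankedLists([
  [{ resultType: "video", resultId: 4, title: "Easter video" }],
  [{ resultType: "experience", resultId: 4, title: "Easter experience" }],
  [], // a retrieval that returned nothing
]);
console.log(demo.map((r) => `${r.resultType}:${r.resultId}`));
```

Keying by `resultId` alone would merge the two demo items into one entry; with the compound key both survive.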

Plan: docs/plans/2026-04-15-001-feat-experience-search-integration-plan.md
Roadmap: feat-086 (status → complete)

Testing

- 113 search-related tests pass (33 fusion + 20 orchestrator + 8 experience-semantic + 10 experience-keyword + 13 controller + 16 GraphQL + 7 video keyword + 6 video semantic)
- Full CMS suite: 284/284 passing, no regressions
- TypeScript: clean
- ESLint: clean

Key new test coverage:

- `fusion.test.ts`: compound key prevents video↔experience id collision; dedup type guard preserves experiences with shared titles; mixed video+experience dedup still drops video duplicates
- `search.test.ts`: `contentTypes` filter (video-only, experience-only, both, empty array fallback); empty lists filtered before fusion; mixed result mapping; collision case
- `controllers/search.test.ts`: `?type=video`, `?type=experience`, omitted, invalid (400), empty string (defaults)
- `graphql/search.test.ts`: `type` argument forwarded; invalid type throws `GraphQLError` with `BAD_USER_INPUT`

Manual verification (post-deploy)

```bash
# Both content types (default)
curl 'https://cms.jesusfilm.org/api/search?q=Easter&locale=en'

# Videos only: backward-compatible behavior matches v1
curl 'https://cms.jesusfilm.org/api/search?q=Easter&locale=en&type=video'

# Experiences only
curl 'https://cms.jesusfilm.org/api/search?q=Easter&locale=en&type=experience'

# Invalid type: should return 400
curl 'https://cms.jesusfilm.org/api/search?q=Easter&locale=en&type=invalid'

# GraphQL
curl -X POST https://cms.jesusfilm.org/graphql -H 'Content-Type: application/json' \
  -d '{"query":"{ semanticSearch(query:\"Easter\", locale:\"en\", type:\"experience\") { results { type id slug title } } }"}'
```

Expected: experience search returns the `easter` and `christmas` experiences (the two embedded by feat-095/096) when the query is thematically relevant; videos still return as before; mixed results interleave by RRF score.

Out of scope

- Experience image URL is `null` in v1. `og_image` is a Strapi media relation requiring a multi-table join through `files_related_morphs`. The plan defers this; downstream consumers can treat `imageUrl` as nullable, as already documented in the contract.
- Within-experience dedup (currently skipped because only 2 experiences exist; revisit if volume grows).
- `schema.graphql` is auto-generated by Strapi at build time and will reflect the new `type` argument and updated `SearchResult` descriptions on next deploy. It is not committed here because regeneration requires a live DB connection.

Post-Deploy Monitoring & Validation

What to monitor/search
- Logs: `[search]` prefixed log lines in CMS Railway logs. Watch for `semantic-experience` and `keyword-experience` retrieval failures.
- Metrics/Dashboards: CMS request latency for `/api/search` (p95 < 500ms target).
Validation checks (queries/commands)
- `curl 'https://cms.jesusfilm.org/api/search?q=Easter&locale=en'`: should include `type: "experience"` results
- `curl 'https://cms.jesusfilm.org/api/search?q=Easter&locale=en&type=video'`: should be byte-equivalent to the pre-merge response
- `curl 'https://cms.jesusfilm.org/api/search?q=Easter&locale=en&type=invalid'`: should return 400
Expected healthy behavior
- "Easter" / "Christmas" queries return the corresponding experience as a top result alongside thematically relevant videos
- Response time p95 stays < 500ms (4 parallel queries should not exceed 2-query baseline by much; experience SQL is simpler than video SQL)
- `type=video` returns identical results to pre-feat-086 (regression check)
Failure signal(s) / rollback trigger
- p95 latency > 1s sustained for 5+ minutes → rollback
- `semantic-experience` or `keyword-experience` retrieval failure rate > 5% over 10 minutes → investigate (may indicate `experience_embeddings` table missing or pgvector index issue)
Validation window & owner
- Window: 24h post-deploy
- Owner: nisal

🤖 Generated with Claude Opus 4.6 (1M context, extended thinking) via Claude Code

Extends the hybrid search API to return experience results alongside videos.

The orchestrator now runs up to 4 parallel retrievals (video semantic, video keyword, experience semantic, experience keyword) and fuses them via RRF with a compound `${resultType}:${resultId}` identity key so video id=4 and experience id=4 no longer collide. The 3-layer video dedup (core_id prefix, title match, embedding similarity) skips non-video results, so experiences pass through untouched.

Adds an optional `type` filter to both REST (`?type=video|experience`) and GraphQL (`semanticSearch(type: ...)`). Omitted = both. Invalid = 400 / `BAD_USER_INPUT`. When a single type is requested only its retrievals fire, and empty result lists are filtered before fusion so RRF score normalization stays accurate.

Plan: docs/plans/2026-04-15-001-feat-experience-search-integration-plan.md

🤖 Generated with Claude Opus 4.6 (1M context, extended thinking) via [Claude Code](https://claude.com/claude-code) + Compound Engineering v2.52.0

Co-Authored-By: Claude Opus 4.6 (1M context, extended thinking) <noreply@anthropic.com>
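
The retrieval step in this commit message can be sketched as follows. This is an illustrative outline rather than the orchestrator in `search.ts`: the `Retrieval` shape, the `retrieveRankedLists` name, and the video retrieval names used in the usage note are hypothetical, while `Promise.allSettled`, the `[search]` log prefix, and the `semantic-experience` / `keyword-experience` names come from the PR text.

```typescript
type ContentType = "video" | "experience";

interface Hit {
  resultType: ContentType;
  resultId: number;
}

// Hypothetical descriptor for one of the up-to-4 retrievals.
interface Retrieval {
  name: string;
  contentType: ContentType;
  run: (query: string) => Promise<Hit[]>;
}

async function retrieveRankedLists(
  query: string,
  retrievals: Retrieval[],
  contentTypes: ContentType[],
): Promise<Hit[][]> {
  // Only fire the retrievals that belong to a requested content type.
  const active = retrievals.filter((r) => contentTypes.includes(r.contentType));

  // allSettled isolates failures: one rejected retrieval does not sink the rest.
  const settled = await Promise.allSettled(active.map((r) => r.run(query)));

  const lists: Hit[][] = [];
  settled.forEach((outcome, i) => {
    if (outcome.status === "fulfilled") {
      // Empty lists are dropped here so they never reach fusion.
      if (outcome.value.length > 0) lists.push(outcome.value);
    } else {
      const name = active[i]?.name ?? "unknown";
      console.warn(`[search] ${name} retrieval failed`, outcome.reason);
    }
  });
  return lists;
}
```

Called with `["experience"]`, only the two experience retrievals run; with both types, all four run in parallel and any that fail or return nothing are left out of the lists handed to fusion.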

railway-app · 2026-04-15T11:35:07Z

🚅 Deployed to the forge-pr-777 environment in forge

| Service | Status | Web | Updated (UTC) |
| --- | --- | --- | --- |
| @forge/cms | ✅ Success | | Apr 15, 2026 at 10:23 pm |

2 services not affected by this PR

@forge/web
@forge/manager

…ation

Discovered while validating feat-086 against production: every search query returns rank-1 score = 0.500 exactly, with no scene-level data. That's the mathematical signature of single-list RRF when keyword contributes alone (semantic returning empty or non-overlapping). Local runs against the same code return rich semantic results, so the issue is environment-specific (likely OPENROUTER_API_KEY in Railway).

Tracked: #778
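
One way to read the 0.500 figure, as a sketch only: the commit does not spell out the scoring formula, so assume the fused score is the RRF sum divided by the best achievable score over the $N$ lists counted in fusion, where $r_i(d)$ is the rank of result $d$ in list $i$, $L(d)$ is the set of lists that contain $d$, and $k$ is the RRF constant:

$$
s(d) = \frac{\sum_{i \in L(d)} \frac{1}{k + r_i(d)}}{N \cdot \frac{1}{k + 1}}
$$

Under that assumption, a result ranked first in exactly one of $N = 2$ counted lists scores

$$
s(d) = \frac{\frac{1}{k + 1}}{\frac{2}{k + 1}} = \frac{1}{2}
$$

for any value of $k$, which would explain a rank-1 score of exactly 0.500 when only the keyword list contributes to the top result.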

Round 1 of /ce:review surfaced 9 safe_auto items; round 2 surfaced 1. All applied here. Substantive findings (discriminated union refactor, HNSW EXPLAIN ANALYZE, graphql-env.d.ts regeneration) are tracked as follow-ups, not blockers for this PR.

- fusion.ts: rename inner-loop `key` to `propKey` to avoid shadowing the compound identity key (correctness P3)
- experience-keyword-search.ts: fix misleading comment that cited UNIQUE constraint on the wrong table (correctness P3)
- search.ts: extract `isContentType` and `ALL_CONTENT_TYPES` so REST + GraphQL share one source of truth (maintainability P2)
- controllers/search.ts + graphql/search.ts: import the shared guard instead of redefining locally
- new tests: empty-string GraphQL `type` arg, experience-only + embedQuery failure combo, experience-semantic knex.raw rejection propagation, score rounding with non-rounded value, resultType / resultId survival on property merge
- test mocks: use vi.mock with importOriginal so isContentType (newly shared) is preserved alongside the search() mock
- rename misleading test (round 2 fix)

All 287 CMS tests pass. Lint and typecheck clean.
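
A shared guard of the kind this commit describes might look like the sketch below. The names `isContentType` and `ALL_CONTENT_TYPES` come from the commit message; their bodies here, and the `resolveContentTypes` helper, are assumptions for illustration, including the choice to treat an empty string like an omitted value on both transports.

```typescript
const ALL_CONTENT_TYPES = ["video", "experience"] as const;
type ContentType = (typeof ALL_CONTENT_TYPES)[number];

// Type guard shared by the REST controller and the GraphQL resolver.
function isContentType(value: unknown): value is ContentType {
  return (
    typeof value === "string" &&
    (ALL_CONTENT_TYPES as readonly string[]).includes(value)
  );
}

// Hypothetical helper: maps a raw `type` parameter to the content types to search.
// Returns null for invalid input so the caller can answer 400 / BAD_USER_INPUT.
function resolveContentTypes(raw: unknown): ContentType[] | null {
  if (raw === undefined || raw === null || raw === "") {
    return [...ALL_CONTENT_TYPES]; // omitted or empty: search both types
  }
  return isContentType(raw) ? [raw] : null;
}

console.log(resolveContentTypes("experience"));
console.log(resolveContentTypes("invalid"));
```

Keeping the guard in one module means the REST and GraphQL layers cannot drift apart on which values count as valid.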

The previous shape, `WHERE locale = ? ORDER BY embedding <=> ?::vector LIMIT ?` directly on `experience_embeddings`, defeats the planner's HNSW cost model. Locally on a 10K-row synthetic table, the planner picked Seq Scan + Top-N Sort (19.8ms) instead of the HNSW index (1.5ms when forced). Adding a JOIN to `experiences` made it worse (178ms). This was flagged as a P2 gated_auto finding in /ce:review and confirmed empirically with EXPLAIN ANALYZE.

Two coordinated changes fix it without changing the SQL itself:

1. `bootstrap/ensure-pgvector.ts`: add per-locale partial HNSW indexes for the Phase 1 locales (en, es, fr). The planner picks these automatically when the WHERE clause matches, so no SQL hints are needed. The existing global HNSW is kept as a fallback for unknown locales (graceful degradation to seqscan + still functional).
2. `config/database.ts`: extend the knex `afterCreate` hook to set `hnsw.iterative_scan = relaxed_order` and `hnsw.max_scan_tuples = 20000`. These let the partial index keep fetching past the default ef_search window so LIMIT can be satisfied even when the inner HNSW pass returns fewer rows than requested. Set once per connection, zero per-query overhead.

Verified with `EXPLAIN ANALYZE` against a 10K-row table + JOIN to experiences:

- locale=en (with partial index): Index Scan + Nested Loop, 1.90ms
- locale=de (no partial index): Seq Scan + Hash Join, 17.55ms
- baseline (before fix): Seq Scan + Sort, 178.00ms

To support a new locale efficiently, add another partial index in ensure-pgvector.ts.
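
The two changes above could take roughly the following shape. This is a sketch, not the contents of `ensure-pgvector.ts` or `config/database.ts`: the index names, the `partialHnswIndexSql` helper, the `RawConnection` interface, and the choice of `vector_cosine_ops` (inferred from the `<=>` cosine-distance operator in the query) are assumptions. The table, column, locales, and GUC values are taken from the commit message.

```typescript
const PHASE_1_LOCALES = ["en", "es", "fr"] as const;

// Builds the DDL for one per-locale partial HNSW index (hypothetical index name).
function partialHnswIndexSql(locale: string): string {
  // Locale is interpolated into DDL, so accept only plain two-letter codes.
  if (!/^[a-z]{2}$/.test(locale)) {
    throw new Error(`unexpected locale: ${locale}`);
  }
  return (
    `CREATE INDEX IF NOT EXISTS experience_embeddings_hnsw_${locale} ` +
    `ON experience_embeddings USING hnsw (embedding vector_cosine_ops) ` +
    `WHERE locale = '${locale}'`
  );
}

// Minimal view of the raw driver connection that knex passes to afterCreate.
interface RawConnection {
  query(sql: string, callback: (err: Error | null) => void): void;
}

// knex pool hook: runs once per new connection, then hands the connection back.
function afterCreate(
  conn: RawConnection,
  done: (err: Error | null, conn: RawConnection) => void,
): void {
  conn.query("SET hnsw.iterative_scan = relaxed_order", (err) => {
    if (err) {
      done(err, conn);
      return;
    }
    conn.query("SET hnsw.max_scan_tuples = 20000", (err2) => done(err2, conn));
  });
}

for (const locale of PHASE_1_LOCALES) {
  console.log(partialHnswIndexSql(locale));
}
```

In a knex configuration the hook would be registered as `pool: { afterCreate }`, so the settings apply to every pooled connection without touching individual queries.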

… + HNSW filter perf

Two compounding learnings from PR #777 (feat-086):

- best-practices/rrf-fusion-heterogeneous-content-types-20260415.md: two non-obvious traps when extending RRF from one content type to many: ID collisions across types (e.g., video id=4 vs experience id=4) and score dilution from empty input lists. Both produce silent ranking bugs. Fix: compound `${resultType}:${resultId}` Map key, filter empty lists before fusion. Generalizes to any N-way score aggregator (Borda, Comb-SUM).
- performance-issues/pgvector-hnsw-index-bypass-with-where-filter-20260415.md: pgvector's HNSW index is silently bypassed when WHERE filters on the same table because the planner cost model is too pessimistic. Documents 5 strategies tested with EXPLAIN ANALYZE on a 10K-row synthetic table; the working fix is per-locale partial HNSW indexes + `hnsw.iterative_scan = relaxed_order` GUC at connection level. 94× faster than baseline, no SQL changes needed.

Both surface refresh candidates in the existing pgvector best-practice docs that recommend HNSW without discussing filter degradation. A follow-up `ce:compound-refresh pgvector` would add cross-references.

… caveats

Cross-reference and caveat sweep across 4 existing pgvector docs to incorporate the HNSW + WHERE filter learning shipped in PR #777. None of these docs were wrong; they were incomplete in a way that would mislead future readers building locale-filtered (or otherwise column-filtered) HNSW queries.

Updates applied:

- best-practices/pgvector-embedding-indexing-strapi-v5.md: Section 5 ("Use HNSW over IVFFlat") gained a "Caveat — filtered queries" subsection pointing to the partial-index fix. Adds the new perf doc to `related:`.
- best-practices/pgvector-recommendation-query-locale-graphql-strapi-v5.md: the "45ms ... HNSW handles efficiently" claim now explains *why* it stays efficient (locale filter is on a JOINed table, not the embedding table) and adds an explicit watch-out for schemas where locale lives on the embedding row.
- best-practices/experience-embedding-pipeline-pgvector-strapi-v5-20260414.md: the HNSW index note now explains that the global index is sufficient for the write side but query-side locale filtering needs partial indexes; references the perf doc and the RRF heterogeneous-types doc.
- best-practices/hybrid-semantic-search-api-strapi-v5-pgvector.md: the "Locale-aware via link-table join chain" decision now explains why the JOIN approach keeps HNSW happy and what changes when a new content type stores locale on the row. Updates the related-docs section with both new compound learnings.

All 4 docs gained `last_updated: 2026-04-15` in frontmatter.

github-actions Bot added cms feat labels Apr 15, 2026

github-actions Bot assigned Kneesal Apr 15, 2026

railway-app Bot temporarily deployed to forge / forge-pr-777 April 15, 2026 11:35 Destroyed

Kneesal mentioned this pull request Apr 15, 2026

[feat-097] Production semantic search query embedding silently degraded — keyword-only fallback #778

Open

Kneesal and others added 2 commits April 15, 2026 21:35

railway-app Bot temporarily deployed to forge / forge-pr-777 April 15, 2026 21:58 Destroyed

railway-app Bot temporarily deployed to forge / forge-pr-777 April 15, 2026 22:19 Destroyed

Kneesal and others added 2 commits April 15, 2026 22:29

Kneesal merged commit 65aa99d into main Apr 15, 2026
28 checks passed

Kneesal deleted the feat/experience-search-integration branch April 15, 2026 22:47

Kneesal merged 6 commits into main from feat/experience-search-integration
