feat(cms): add experiences to search results (feat-086) #777
Merged
Extends the hybrid search API to return experience results alongside
videos. The orchestrator now runs up to 4 parallel retrievals (video
semantic, video keyword, experience semantic, experience keyword) and
fuses them via RRF with a compound `${resultType}:${resultId}` identity
key so video id=4 and experience id=4 no longer collide. The 3-layer
video dedup (core_id prefix, title match, embedding similarity) skips
non-video results — experiences pass through untouched.
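The compound-key fusion described above can be sketched roughly as follows. This is an illustration, not the code in `fusion.ts`: the `RankedItem` shape, the sample titles, and the smoothing constant `k = 60` are assumptions, and only the `${resultType}:${resultId}` key format and the `fuseRankedLists` name come from this PR.

```typescript
// Illustrative sketch only; the real fusion.ts is not reproduced in this PR text.
type ContentType = "video" | "experience";

interface RankedItem {
  resultType: ContentType;
  resultId: number;
  title: string;
}

interface FusedItem extends RankedItem {
  score: number;
}

// Assumed smoothing constant; the PR does not state the value actually used.
const RRF_K = 60;

function fuseRankedLists(lists: RankedItem[][], k: number = RRF_K): FusedItem[] {
  const fused = new Map<string, FusedItem>();
  for (const list of lists) {
    list.forEach((item, index) => {
      // Compound identity key: video 4 and experience 4 get distinct entries.
      const key = `${item.resultType}:${item.resultId}`;
      const contribution = 1 / (k + index + 1); // rank is 1-based
      const existing = fused.get(key);
      if (existing) {
        existing.score += contribution;
      } else {
        fused.set(key, { ...item, score: contribution });
      }
    });
  }
  // Highest fused score first.
  return [...fused.values()].sort((a, b) => b.score - a.score);
}

// Hypothetical sample data: the same numeric id on two different content types.
const videoSemantic: RankedItem[] = [
  { resultType: "video", resultId: 4, title: "Video four" },
];
const experienceKeyword: RankedItem[] = [
  { resultType: "experience", resultId: 4, title: "Experience four" },
];

const fusedExample = fuseRankedLists([videoSemantic, experienceKeyword]);
console.log(fusedExample.map((r) => `${r.resultType}:${r.resultId}`));
```

Keying the map on the numeric id alone would merge the two sample rows into one entry and add their scores together, which is the silent collision this change avoids.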
Adds an optional `type` filter to both REST (`?type=video|experience`)
and GraphQL (`semanticSearch(type: ...)`). Omitted = both. Invalid =
400 / `BAD_USER_INPUT`. When a single type is requested only its
retrievals fire, and empty result lists are filtered before fusion so
RRF score normalization stays accurate.
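A minimal sketch of the `type` handling on the REST side, assuming the parameter arrives as an optional string. `isContentType` and `ALL_CONTENT_TYPES` are the shared names introduced in this PR; `resolveTypeParam`, `dropEmptyLists`, and the `Resolution` shape are hypothetical helpers used only for illustration.

```typescript
// Illustrative sketch only; the helper names other than isContentType and
// ALL_CONTENT_TYPES are assumptions, not the PR's actual code.
const ALL_CONTENT_TYPES = ["video", "experience"] as const;
type ContentType = (typeof ALL_CONTENT_TYPES)[number];

// Type guard intended to be shared by the REST controller and the GraphQL resolver.
function isContentType(value: unknown): value is ContentType {
  return (
    typeof value === "string" &&
    (ALL_CONTENT_TYPES as readonly string[]).includes(value)
  );
}

type Resolution =
  | { ok: true; contentTypes: readonly ContentType[] }
  | { ok: false; status: 400; message: string };

// Omitted or empty => both types; a valid value => that type only; anything else => 400.
function resolveTypeParam(raw: string | undefined): Resolution {
  if (raw === undefined || raw === "") {
    return { ok: true, contentTypes: ALL_CONTENT_TYPES };
  }
  if (isContentType(raw)) {
    return { ok: true, contentTypes: [raw] };
  }
  return { ok: false, status: 400, message: `Invalid type "${raw}"` };
}

// Drop empty retrieval results so they are not counted during fusion.
function dropEmptyLists<T>(lists: T[][]): T[][] {
  return lists.filter((list) => list.length > 0);
}

console.log(resolveTypeParam("experience"));
console.log(dropEmptyLists([[1, 2], [], [3]]).length);
```

A GraphQL resolver could reuse the same guard and raise a `GraphQLError` with code `BAD_USER_INPUT` in place of the 400 branch.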
Plan: docs/plans/2026-04-15-001-feat-experience-search-integration-plan.md
🤖 Generated with Claude Opus 4.6 (1M context, extended thinking) via [Claude Code](https://claude.com/claude-code) + Compound Engineering v2.52.0
Co-Authored-By: Claude Opus 4.6 (1M context, extended thinking) <noreply@anthropic.com>
🚅 Deployed to the forge-pr-777 environment in forge
2 services not affected by this PR
…ation

Discovered while validating feat-086 against production: every search query returns rank-1 score = 0.500 exactly, with no scene-level data. That's the mathematical signature of single-list RRF when keyword contributes alone (semantic returning empty or non-overlapping). Local runs against the same code return rich semantic results, so the issue is environment-specific (likely OPENROUTER_API_KEY in Railway).

Tracked: #778
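To see why 0.500 points at a single contributing list, here is a small worked example. It assumes scores are normalized by the best possible RRF score across the expected number of lists (rank 1 in every list); that normalization scheme and `k = 60` are assumptions, since the PR text does not show the actual formula.

```typescript
// Illustrative sketch only; the normalization below is an assumed scheme.
const K = 60;

// ranks: the 1-based rank of one item in each list where it appears.
// expectedLists: how many lists the normalizer assumes took part in fusion.
function normalizedRrf(ranks: number[], expectedLists: number, k: number = K): number {
  const raw = ranks.reduce((sum, rank) => sum + 1 / (k + rank), 0);
  const best = expectedLists * (1 / (k + 1)); // rank 1 in every expected list
  return raw / best;
}

// Rank 1 in the keyword list only, while two lists are expected.
console.log(normalizedRrf([1], 2).toFixed(3)); // prints "0.500"

// Rank 1 in both the keyword and the semantic list.
console.log(normalizedRrf([1, 1], 2).toFixed(3)); // prints "1.000"
```

Under this assumption the ratio is (1/(k+1)) / (2/(k+1)) = 1/2 for any `k`, so a top score of exactly 0.500 on every query suggests that one of the two lists never contributes to the top result.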
Round 1 of /ce:review surfaced 9 safe_auto items; round 2 surfaced 1. All applied here. Substantive findings (discriminated union refactor, HNSW EXPLAIN ANALYZE, graphql-env.d.ts regeneration) are tracked as follow-ups, not blockers for this PR.

- fusion.ts: rename inner-loop `key` to `propKey` to avoid shadowing the compound identity key (correctness P3)
- experience-keyword-search.ts: fix misleading comment that cited UNIQUE constraint on the wrong table (correctness P3)
- search.ts: extract `isContentType` and `ALL_CONTENT_TYPES` so REST + GraphQL share one source of truth (maintainability P2)
- controllers/search.ts + graphql/search.ts: import the shared guard instead of redefining locally
- new tests: empty-string GraphQL `type` arg, experience-only + embedQuery failure combo, experience-semantic knex.raw rejection propagation, score rounding with non-rounded value, resultType / resultId survival on property merge
- test mocks: use vi.mock with importOriginal so isContentType (newly shared) is preserved alongside the search() mock
- rename misleading test (round 2 fix)

All 287 CMS tests pass. Lint and typecheck clean.
The previous shape (`WHERE locale = ? ORDER BY embedding <=> ?::vector LIMIT ?` directly on `experience_embeddings`) defeats the planner's HNSW cost model. Locally on a 10K-row synthetic table, the planner picked Seq Scan + Top-N Sort (19.8ms) instead of the HNSW index (1.5ms when forced). Adding a JOIN to `experiences` made it worse (178ms). This was flagged as a P2 gated_auto finding in /ce:review and confirmed empirically with EXPLAIN ANALYZE.

Two coordinated changes fix it without changing the SQL itself:

1. `bootstrap/ensure-pgvector.ts`: add per-locale partial HNSW indexes for the Phase 1 locales (en, es, fr). The planner picks these automatically when the WHERE clause matches, so no SQL hints are needed. The existing global HNSW is kept as a fallback for unknown locales (these degrade gracefully to a sequential scan but still work).
2. `config/database.ts`: extend the knex `afterCreate` hook to set `hnsw.iterative_scan = relaxed_order` and `hnsw.max_scan_tuples = 20000`. These let the partial index keep fetching past the default ef_search window so LIMIT can be satisfied even when the inner HNSW pass returns fewer rows than requested. Set once per connection, zero per-query overhead.

Verified with `EXPLAIN ANALYZE` against a 10K-row table + JOIN to experiences:

- locale=en (with partial index): Index Scan + Nested Loop, 1.90ms
- locale=de (no partial index): Seq Scan + Hash Join, 17.55ms
- baseline (before fix): Seq Scan + Sort, 178.00ms

To support a new locale efficiently, add another partial index in ensure-pgvector.ts.
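The two changes might look roughly like the sketch below. The index names, the `vector_cosine_ops` operator class (chosen to match the `<=>` cosine-distance operator in the query), the `RawConnection` interface, and both function names are assumptions for illustration; only the table name, the `embedding` and `locale` columns, the locale list, and the two `hnsw.*` settings come from this PR.

```typescript
// Illustrative sketch only; not the contents of ensure-pgvector.ts or database.ts.
const PHASE_1_LOCALES = ["en", "es", "fr"] as const;

// Builds the DDL for one per-locale partial HNSW index.
function partialHnswIndexSql(locale: string): string {
  // Locales are interpolated into DDL, so accept only two lowercase letters.
  if (!/^[a-z]{2}$/.test(locale)) {
    throw new Error(`Unexpected locale: ${locale}`);
  }
  return (
    `CREATE INDEX IF NOT EXISTS experience_embeddings_hnsw_${locale} ` +
    `ON experience_embeddings USING hnsw (embedding vector_cosine_ops) ` +
    `WHERE locale = '${locale}'`
  );
}

// Session-level pgvector settings applied once per pooled connection.
const HNSW_SESSION_SETTINGS = [
  "SET hnsw.iterative_scan = relaxed_order",
  "SET hnsw.max_scan_tuples = 20000",
];

// Minimal stand-in for the raw driver connection that the pool hands to afterCreate.
interface RawConnection {
  query(sql: string, callback: (err: Error | null) => void): void;
}

// Same shape as a knex pool `afterCreate(conn, done)` hook.
function applyHnswSessionSettings(
  conn: RawConnection,
  done: (err: Error | null, conn: RawConnection) => void,
): void {
  const runNext = (index: number): void => {
    if (index === HNSW_SESSION_SETTINGS.length) {
      done(null, conn);
      return;
    }
    conn.query(HNSW_SESSION_SETTINGS[index], (err) => {
      if (err) {
        done(err, conn); // surface the failure to the pool
      } else {
        runNext(index + 1);
      }
    });
  };
  runNext(0);
}

for (const locale of PHASE_1_LOCALES) {
  console.log(partialHnswIndexSql(locale));
}
```

In a knex config the hook would be passed as `pool: { afterCreate: applyHnswSessionSettings }`, so the settings are applied when each connection is created and not on every query.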
… + HNSW filter perf

Two compounding learnings from PR #777 (feat-086):

- best-practices/rrf-fusion-heterogeneous-content-types-20260415.md
  Two non-obvious traps when extending RRF from one content type to many: ID collisions across types (e.g., video id=4 vs experience id=4) and score dilution from empty input lists. Both produce silent ranking bugs. Fix: compound `${resultType}:${resultId}` Map key, filter empty lists before fusion. Generalizes to any N-way score aggregator (Borda, Comb-SUM).
- performance-issues/pgvector-hnsw-index-bypass-with-where-filter-20260415.md
  pgvector's HNSW index is silently bypassed when WHERE filters on the same table, because the planner cost model is too pessimistic. Documents 5 strategies tested with EXPLAIN ANALYZE on a 10K-row synthetic table; the working fix is per-locale partial HNSW indexes + `hnsw.iterative_scan = relaxed_order` GUC at connection level. 94× faster than baseline, no SQL changes needed.

Both surface refresh candidates in the existing pgvector best-practice docs that recommend HNSW without discussing filter degradation. A follow-up `ce:compound-refresh pgvector` would add cross-references.
… caveats

Cross-reference and caveat sweep across 4 existing pgvector docs to incorporate the HNSW + WHERE filter learning shipped in PR #777. None of these docs were wrong; they were incomplete in a way that would mislead future readers building locale-filtered (or otherwise column-filtered) HNSW queries.

Updates applied:

- best-practices/pgvector-embedding-indexing-strapi-v5.md
  Section 5 ("Use HNSW over IVFFlat") gained a "Caveat: filtered queries" subsection pointing to the partial-index fix. Adds the new perf doc to `related:`.
- best-practices/pgvector-recommendation-query-locale-graphql-strapi-v5.md
  The "45ms ... HNSW handles efficiently" claim now explains *why* it stays efficient (locale filter is on a JOINed table, not the embedding table) and adds an explicit watch-out for schemas where locale lives on the embedding row.
- best-practices/experience-embedding-pipeline-pgvector-strapi-v5-20260414.md
  The HNSW index note now explains that the global index is sufficient for the write side but query-side locale filtering needs partial indexes; references the perf doc and the RRF heterogeneous-types doc.
- best-practices/hybrid-semantic-search-api-strapi-v5-pgvector.md
  The "Locale-aware via link-table join chain" decision now explains why the JOIN approach keeps HNSW happy and what changes when a new content type stores locale on the row. Updates the related-docs section with both new compound learnings.

All 4 docs gained `last_updated: 2026-04-15` in frontmatter.
Summary
Extends the hybrid search API to return experience results alongside videos. Searching `Easter` no longer just returns videos about Easter bunnies; the dedicated Easter experience page surfaces too.

- … `Promise.allSettled`. Empty lists are filtered before fusion so RRF normalization stays accurate.
- `fuseRankedLists` now keys by `${resultType}:${resultId}` instead of `videoId`, so a video with id=4 and an experience with id=4 no longer collide.
- Optional `type` filter on REST (`?type=video|experience`) and GraphQL (`semanticSearch(type: ...)`). Omitted = both. Invalid = 400 / `BAD_USER_INPUT`. When a single type is requested only its retrievals fire.
- … `startSeconds` and `playbackId` were already nullable on `SearchResult`.

Plan: docs/plans/2026-04-15-001-feat-experience-search-integration-plan.md
Roadmap: feat-086 (status → complete)
Testing
Key new test coverage:
- fusion.test.ts: compound key prevents video↔experience id collision; dedup type guard preserves experiences with shared titles; mixed video+experience dedup still drops video duplicates
- search.test.ts: `contentTypes` filter (video-only, experience-only, both, empty array fallback); empty lists filtered before fusion; mixed result mapping; collision case
- controllers/search.test.ts: `?type=video`, `?type=experience`, omitted, invalid (400), empty string (defaults)
- graphql/search.test.ts: `type` argument forwarded; invalid type throws `GraphQLError` with `BAD_USER_INPUT`

Manual verification (post-deploy)
```bash
# Both content types (default)
curl 'https://cms.jesusfilm.org/api/search?q=Easter&locale=en'
# Videos only (backward-compatible behavior matches v1)
curl 'https://cms.jesusfilm.org/api/search?q=Easter&locale=en&type=video'
# Experiences only
curl 'https://cms.jesusfilm.org/api/search?q=Easter&locale=en&type=experience'
# Invalid type: should return 400
curl 'https://cms.jesusfilm.org/api/search?q=Easter&locale=en&type=invalid'
# GraphQL
curl -X POST https://cms.jesusfilm.org/graphql -H 'Content-Type: application/json' \
-d '{"query":"{ semanticSearch(query:\"Easter\", locale:\"en\", type:\"experience\") { results { type id slug title } } }"}'
```
Expected: experience search returns the `easter` and `christmas` experiences (the two embedded by feat-095/096) when the query is thematically relevant; videos still return as before; mixed results interleave by RRF score.
Out of scope
Post-Deploy Monitoring & Validation