
feat(graphile-pg-textsearch-plugin): auto-discover BM25 indexes with condition, score, and orderBy #769

Merged
pyramation merged 3 commits into main from devin/1772486156-pg-textsearch-plugin
Mar 2, 2026


@pyramation commented Mar 2, 2026

feat(graphile-pg-textsearch-plugin): auto-discover BM25 indexes with condition, score, and orderBy

Summary

New PostGraphile v5 plugin package (graphile-pg-textsearch-plugin) for pg_textsearch BM25 ranked full-text search. Follows the same architecture pattern as graphile-search-plugin (tsvector) and graphile-pgvector-plugin (vector).

Gather phase (Bm25CodecPlugin):

  • Registers a codec for the bm25query PostgreSQL type
  • Runs a discovery query against pg_am/pg_index to find all columns with BM25 indexes (WHERE am.amname = 'bm25')
  • Stores results in a module-level Map<string, Bm25IndexInfo> bridging gather → schema build
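
The gather steps above can be sketched roughly as follows. This is a minimal illustration, not the plugin's actual code: the fields of `Bm25IndexInfo`, the exact discovery SQL, the `QueryRunner` type, and the `schema.table.column` key format are all assumptions made for the example. Only the `pg_am`/`pg_index` join, the `am.amname = 'bm25'` filter, the module-level `Map`, and the error-catching behavior come from this description.

```typescript
// Hypothetical shape of one discovered index; the real Bm25IndexInfo may differ.
interface Bm25IndexInfo {
  schemaName: string;
  tableName: string;
  columnName: string;
  indexName: string;
}

// Assumed discovery SQL: find every column covered by an index whose
// access method is named 'bm25'.
const DISCOVERY_SQL = `
  SELECT n.nspname AS "schemaName",
         t.relname AS "tableName",
         a.attname AS "columnName",
         i.relname AS "indexName"
  FROM pg_index ix
  JOIN pg_class i ON i.oid = ix.indexrelid
  JOIN pg_class t ON t.oid = ix.indrelid
  JOIN pg_namespace n ON n.oid = t.relnamespace
  JOIN pg_am am ON am.oid = i.relam
  JOIN pg_attribute a ON a.attrelid = t.oid AND a.attnum = ANY(ix.indkey)
  WHERE am.amname = 'bm25'
`;

// Hypothetical abstraction over "run this SQL and give me rows".
type QueryRunner = (sql: string) => Promise<Bm25IndexInfo[]>;

// Module-level store bridging the gather phase and the schema build phase.
const bm25IndexStore = new Map<string, Bm25IndexInfo>();

function storeKey(info: Bm25IndexInfo): string {
  return `${info.schemaName}.${info.tableName}.${info.columnName}`;
}

// Returns the number of indexes found; returns 0 (and stores nothing)
// if the query fails, e.g. because the database is unreachable.
async function discoverBm25Indexes(runQuery: QueryRunner): Promise<number> {
  try {
    const rows = await runQuery(DISCOVERY_SQL);
    for (const row of rows) {
      bm25IndexStore.set(storeKey(row), row);
    }
    return rows.length;
  } catch {
    // Skip silently so schema generation can continue without BM25 fields.
    return 0;
  }
}
```

Passing the query runner in as a parameter keeps the sketch independent of how the real plugin obtains a database connection.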

Schema build phase (Bm25SearchPlugin):

  • Adds bm25<Column> condition fields on connections (accepts { query, threshold? })
  • Adds bm25<Column>Score computed fields on output types (returns BM25 score when condition active, null otherwise)
  • Adds BM25_<COLUMN>_SCORE_ASC/DESC orderBy enum values
  • Uses WeakMap-based score slot pattern (matching tsvector/pgvector plugins) to bridge SQL generation phase to execution phase
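
To make the naming scheme and the score slot idea concrete, here is a small sketch. The helper names (`bm25Names`, `recordScoreSlot`, `readScore`) and the simple snake_case conversion are hypothetical; the real plugin presumably relies on PostGraphile's inflection and its own slot bookkeeping, which may behave differently for unusual column names.

```typescript
// Simplified snake_case -> camelCase conversion (an assumption; real inflection may differ).
function camelCase(name: string): string {
  return name.replace(/_+([a-zA-Z0-9])/g, (_m, c: string) => c.toUpperCase());
}

function upperFirst(s: string): string {
  return s.charAt(0).toUpperCase() + s.slice(1);
}

// Derive the GraphQL names described above for one BM25-indexed column.
function bm25Names(column: string) {
  const pascal = upperFirst(camelCase(column));
  const constant = column.toUpperCase();
  return {
    condition: `bm25${pascal}`,
    score: `bm25${pascal}Score`,
    orderByAsc: `BM25_${constant}_SCORE_ASC`,
    orderByDesc: `BM25_${constant}_SCORE_DESC`,
  };
}

// Score slots: keyed by a per-query object so entries disappear with the query.
const bm25ScoreSlots = new WeakMap<object, Map<string, number>>();

// SQL generation phase: remember which SELECT position holds the score.
function recordScoreSlot(queryKey: object, fieldName: string, selectIndex: number): void {
  let slots = bm25ScoreSlots.get(queryKey);
  if (!slots) {
    slots = new Map<string, number>();
    bm25ScoreSlots.set(queryKey, slots);
  }
  slots.set(fieldName, selectIndex);
}

// Execution phase: read the score from the row, or null if no condition was active.
function readScore(queryKey: object, fieldName: string, row: readonly unknown[]): number | null {
  const index = bm25ScoreSlots.get(queryKey)?.get(fieldName);
  if (index === undefined) return null;
  const value = row[index];
  return typeof value === 'number' ? value : null;
}
```

For a column named `body`, `bm25Names('body')` yields `bm25Body`, `bm25BodyScore`, `BM25_BODY_SCORE_ASC`, and `BM25_BODY_SCORE_DESC`, matching the names listed in the test plan.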

Integration:

  • Wired into ConstructivePreset in graphile-settings
  • Includes integration tests (require pyramation/postgres:17 Docker image with pg_textsearch pre-installed)

Zero config — just add Bm25SearchPreset() to your preset and all BM25-indexed columns automatically get search capabilities.

Updates since last revision

  • Added logo header and badges to README: README now follows the same pattern as graphile-search-plugin with centered Constructive logo, CI status badge, license badge, and npm version badge. Also expanded with full usage examples (preset, plugin, GraphQL queries with variables, orderBy), requirements section with BM25 index creation SQL, and testing instructions.
  • Removed global bm25Matches connection filter operator (previous revision): The earlier version registered a bm25Matches filter operator globally on String type, which added it to StringFilter for ALL string columns — not just BM25-indexed ones. This broke introspection snapshots in graphql/test and graphql/server-test. Removed entirely; the bm25<Column> condition fields are the proper BM25 search interface and are correctly scoped to only BM25-indexed columns.
  • CI passes: All 42 checks green.

Review & Testing Checklist for Human

⚠️ Critical items to verify:

  • pg_textsearch SQL syntax correctness: Verify that column <@> to_bm25query(query, index_name) matches the actual pg_textsearch extension API. Check that BM25 scores are indeed negative (lower = more relevant) and the operator behavior matches assumptions in bm25-search.ts lines 395-460.

  • Gather phase DB access pattern: The plugin accesses the database in pgIntrospection_introspection hook via info.resolvedPreset?.pgServices (lines 110-112 in bm25-codec.ts). This is a non-standard approach not used by other codec plugins — verify it works across PostGraphile v5 environments.

  • Condition field behavior without threshold: The bm25<Column> condition field does NOT add a WHERE clause when threshold is omitted — it only adds the score to the SELECT list. This means ALL rows are returned (with scores). Is this the desired behavior, or should it filter to only matching documents by default?

  • Module-level state safety: bm25IndexStore and bm25ScoreSlots are module-level singletons. Verify these don't cause issues in multi-tenant or test environments where multiple PostGraphile instances may share the same module.
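
For reviewers weighing the threshold question, the behavior described in the checklist can be sketched as below. This is an illustration only: `buildBm25Sql` and its return shape are hypothetical, the `<@>` / `to_bm25query` form is copied from the checklist item above and is itself flagged as unverified, and the `<` comparison against the threshold is an assumption based on scores being negative.

```typescript
interface Bm25ConditionInput {
  query: string;
  threshold?: number;
}

interface SqlParts {
  select: string;        // expression added to the SELECT list
  where: string | null;  // null means no filtering: every row is returned
  params: unknown[];
}

// NOTE: identifiers are interpolated directly for readability; real code
// must quote or escape them rather than concatenate strings.
function buildBm25Sql(column: string, indexName: string, input: Bm25ConditionInput): SqlParts {
  const scoreExpr = `${column} <@> to_bm25query($1, $2)`;
  const params: unknown[] = [input.query, indexName];
  const select = `${scoreExpr} AS score`;

  if (input.threshold === undefined) {
    // No threshold: the score is selected, but no WHERE clause is added.
    return { select, where: null, params };
  }

  // Assumed semantics: keep rows scoring below (more relevant than) the threshold.
  params.push(input.threshold);
  return { select, where: `${scoreExpr} < $3`, params };
}
```

Under these assumptions, filtering to matching documents by default would mean always emitting a `where` clause, which is the design choice the checklist item asks about.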

Test Plan

  1. CI verification: All 42 checks passing against pyramation/postgres:17 Docker image.
  2. Manual smoke test (if you have pg_textsearch locally):
    • Create a table with a text column + BM25 index: CREATE INDEX articles_body_idx ON articles USING bm25(body) WITH (text_config='english');
    • Verify GraphQL schema includes bm25Body condition field, bm25BodyScore computed field, and BM25_BODY_SCORE_ASC/DESC orderBy values
    • Run a search query and verify scores are negative and results are ranked correctly
  3. Multi-instance safety: If using PostGraphile in multi-tenant or test environments, verify module-level state (bm25IndexStore) doesn't leak between instances
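
The multi-instance concern in step 3 can be reproduced in isolation. The sketch below contrasts a module-level `Map` (the pattern this PR describes) with a store scoped per instance through a `WeakMap`. The per-instance variant is a hypothetical alternative shown for comparison, not something this PR implements, and the names are invented for the example.

```typescript
// key: "schema.table.column", value: index name
type IndexStore = Map<string, string>;

// Module-level singleton: every instance that imports this module shares it.
const sharedStore: IndexStore = new Map<string, string>();

// Hypothetical alternative: one store per instance object.
const storesByInstance = new WeakMap<object, IndexStore>();

function storeFor(instance: object): IndexStore {
  let store = storesByInstance.get(instance);
  if (!store) {
    store = new Map<string, string>();
    storesByInstance.set(instance, store);
  }
  return store;
}

function registerShared(key: string, indexName: string): void {
  sharedStore.set(key, indexName);
}

function registerScoped(instance: object, key: string, indexName: string): void {
  storeFor(instance).set(key, indexName);
}
```

If two instances register different indexes, `sharedStore` ends up holding both, while `storeFor(instanceA)` and `storeFor(instanceB)` each see only their own entries. A test for leakage can assert exactly that difference.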

Notes

  • BM25 scores are negative values (more negative = more relevant). Vector cosine distance is non-negative, but in both cases lower values mean better matches, so OrderBy defaults to ASC (most negative first = best matches first).
  • The plugin gracefully skips if pg_textsearch is not installed (catches errors in gather phase).
  • pnpm-lock.yaml has massive formatting changes (~600K lines) due to YAML style normalization — this is noise from adding a new workspace package.
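
As a quick illustration of the score ordering noted above, the following sketch sorts rows client-side the way an ASC ordering on the score would. The `rankByBm25` helper and the choice to place null scores last are assumptions for the example; the database's own NULL ordering may differ.

```typescript
// Sort ascending by score: the most negative (most relevant) rows come first.
// Rows without a score (condition not active) are placed last by assumption.
function rankByBm25<T extends { score: number | null }>(rows: readonly T[]): T[] {
  return [...rows].sort((a, b) => {
    if (a.score === null) return b.score === null ? 0 : 1;
    if (b.score === null) return -1;
    return a.score - b.score;
  });
}

const ranked = rankByBm25([
  { id: 1, score: -0.5 },
  { id: 2, score: -3.2 },
  { id: 3, score: null },
  { id: 4, score: -1.1 },
]);
// ranked ids: 2, 4, 1, 3
```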

Session: https://app.devin.ai/sessions/7f9b6b69895d49b7b3f7fddf52f5d4f6
Requested by: @pyramation

…condition, score, orderBy, and filter

New PostGraphile v5 plugin for pg_textsearch (BM25 ranked full-text search).

Auto-discovers all text columns with BM25 indexes and provides:
- bm25<Column> condition fields on connections (BM25 ranked search)
- bm25<Column>Score computed fields on output types (negative BM25 scores)
- BM25_<COLUMN>_SCORE_ASC/DESC orderBy enum values
- bm25Matches connection filter operator for String scalar
- Bm25SearchPreset for zero-config integration

Also wires the plugin into ConstructivePreset in graphile-settings.
@devin-ai-integration

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring

@devin-ai-integration (bot) changed the title from "feat(graphile-pg-textsearch-plugin): auto-discover BM25 indexes with condition, score, orderBy, and filter" to "feat(graphile-pg-textsearch-plugin): auto-discover BM25 indexes with condition, score, and orderBy" on Mar 2, 2026
@pyramation merged commit c432832 into main on Mar 2, 2026
44 checks passed
@pyramation deleted the devin/1772486156-pg-textsearch-plugin branch on March 2, 2026 22:17