
feat: add graphile-pg-trgm-plugin for pg_trgm fuzzy text matching#809

Open
pyramation wants to merge 30 commits into main from devin/1773379394-pg-trgm-plugin


@pyramation (Contributor) commented Mar 13, 2026

Summary

This PR adds graphile-pg-trgm-plugin, a new PostGraphile v5 plugin that provides pg_trgm trigram-based fuzzy text matching capabilities to the connection filter system. It's the 7th plugin type integrated into the ConstructivePreset, joining tsvector, BM25, pgvector, PostGIS, relation filters, and scalar filters.

New Package: graphile-pg-trgm-plugin

  • similarTo / wordSimilarTo operators on StringFilter for fuzzy matching with configurable thresholds
  • trgm<Column> filter fields directly on connection filter types (e.g., trgmName, trgmDescription)
  • <column>Similarity computed fields returning match quality scores (0-1 range, null when no trgm filter is active)
  • SIMILARITY_<COLUMN>_ASC/DESC orderBy enum values for ranking by match quality
  • connectionFilterTrgmRequireIndex option (default: false) — optionally restrict trgm operators to GIN-indexed columns only
  • 14 dedicated tests covering operators, score fields, composability, and NULL semantics
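To make the 0-1 score concrete: assuming similarTo and the <column>Similarity fields delegate to pg_trgm's own similarity() function (the PR does not state this explicitly), the score is the number of shared trigrams divided by the number of distinct trigrams across both strings. The TypeScript sketch below approximates that computation. trigrams, similarity, and isSimilar are hypothetical helpers, word splitting is ASCII-only (pg_trgm uses locale-aware alphanumerics), and the 0.3 default mirrors pg_trgm.similarity_threshold.

```typescript
// Extract the trigram set pg_trgm-style (ASCII approximation, for illustration only).
function trigrams(input: string): Set<string> {
  const result = new Set<string>();
  // Lowercase, then treat runs of non-alphanumeric characters as word separators.
  const words = input.toLowerCase().match(/[a-z0-9]+/g) ?? [];
  for (const word of words) {
    // Each word is padded with two leading spaces and one trailing space.
    const padded = `  ${word} `;
    for (let i = 0; i + 3 <= padded.length; i++) {
      result.add(padded.slice(i, i + 3));
    }
  }
  return result;
}

// Jaccard similarity over trigram sets: shared / (|a| + |b| - shared), in [0, 1].
function similarity(a: string, b: string): number {
  const ta = trigrams(a);
  const tb = trigrams(b);
  if (ta.size === 0 || tb.size === 0) return 0;
  let shared = 0;
  ta.forEach((t) => {
    if (tb.has(t)) shared += 1;
  });
  return shared / (ta.size + tb.size - shared);
}

// Mirrors the `a % b` operator: true when similarity exceeds the threshold.
function isSimilar(a: string, b: string, threshold = 0.3): boolean {
  return similarity(a, b) > threshold;
}
```

For example, similarity("word", "two words") is 4/11 (about 0.364), the same value the pg_trgm documentation gives. wordSimilarTo would correspond to word_similarity(), which this sketch does not model.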

Integration:

  • Wired into ConstructivePreset via TrgmSearchPreset()
  • Added pg_trgm extension + GIN trigram index to integration test seed SQL
  • Updated mega query test to include trgmName filter as 7th plugin type — now exercises all 7 filter plugin types in a single GraphQL query

Plugin Architecture:

  • Follows the same pattern as graphile-pg-textsearch-plugin (BM25)
  • Uses Grafast meta system (setMeta/getMeta) to pass similarity score indices between filter apply and computed field plan phases
  • Registers operators via addConnectionFilterOperator API from graphile-connection-filter
  • Auto-discovers text columns and registers filter fields + computed fields dynamically
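Auto-discovery can be pictured as one pass over a table's columns that derives the filter field, score field, and orderBy names described in the summary. This is a simplified model, not the plugin's code: ColumnInfo and discoverTrgmFields are hypothetical, the set of text-like Postgres types is an assumption, and camelCasing of multi-word column names is assumed to follow PostGraphile's usual inflection. The requireIndex flag stands in for connectionFilterTrgmRequireIndex.

```typescript
interface ColumnInfo {
  name: string; // raw Postgres column name, e.g. "display_name"
  pgType: string; // e.g. "text", "varchar", "int4"
  hasTrgmGinIndex: boolean; // hypothetical: a GIN gin_trgm_ops index covers this column
}

interface TrgmFields {
  filterField: string;
  scoreField: string;
  orderByValues: [string, string];
}

// Assumed set of text-like types eligible for trigram matching.
const TEXT_TYPES = new Set(["text", "varchar", "citext", "bpchar"]);

function camelCase(name: string): string {
  return name.replace(/_([a-z0-9])/g, (_match, ch: string) => ch.toUpperCase());
}

function pascalCase(name: string): string {
  const camel = camelCase(name);
  return camel.charAt(0).toUpperCase() + camel.slice(1);
}

function discoverTrgmFields(columns: ColumnInfo[], requireIndex = false): TrgmFields[] {
  return columns
    .filter((c) => TEXT_TYPES.has(c.pgType))
    // With requireIndex, skip columns a trigram GIN index cannot serve.
    .filter((c) => !requireIndex || c.hasTrgmGinIndex)
    .map((c): TrgmFields => {
      const constant = c.name.toUpperCase();
      return {
        filterField: `trgm${pascalCase(c.name)}`,
        scoreField: `${camelCase(c.name)}Similarity`,
        orderByValues: [`SIMILARITY_${constant}_ASC`, `SIMILARITY_${constant}_DESC`],
      };
    });
}
```

Gating on requireIndex at discovery time keeps unindexed columns out of the schema entirely, which is also how a behavior flag such as the one suggested in the review checklist could be applied.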

Updates since last revision

  • Multi-signal orderBy added to mega query test — the mega query now uses orderBy: [BM25_BODY_SCORE_ASC, SIMILARITY_NAME_DESC] to demonstrate combining multiple scoring signals in a single query, with an assertion verifying BM25 ASC ordering.
  • Comprehensive JSDoc documentation added to the mega query test (~80 lines) explaining all 7 plugin types, the 2-phase meta system for orderBy, and score field semantics.
  • Documented an important ordering subtlety: ORDER BY priority in multi-signal queries follows the schema's internal field processing order (which filter apply runs first), not the orderBy array order. The array determines which signals are active and their direction (ASC/DESC), but the SQL clause sequence depends on which filter plugin's apply function executes first during schema evaluation. This is inherent to the 2-phase meta architecture and is thoroughly documented in the test comments.
  • Schema snapshots updated in graphql/test and graphql/server-test — the trgm plugin adds similarTo/wordSimilarTo to every StringFilter, *Similarity computed fields to every type with text columns, and SIMILARITY_* orderBy enums.
  • CI is 42/42 green.
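The ordering subtlety can be reproduced with a small model of the two phases: the orderBy argument only records which signals are active and their direction, while each plugin's filter apply appends its clause in schema-processing order. ScoreSignal, buildOrderBy, and the SQL expressions are illustrative stand-ins, not the plugins' actual code.

```typescript
type Direction = "ASC" | "DESC";

// Hypothetical description of a scoring plugin; `signal` is the orderBy enum prefix.
interface ScoreSignal {
  signal: string;
  scoreExpr: string;
}

function buildOrderBy(signalsInApplyOrder: ScoreSignal[], orderByArg: string[]): string {
  // Phase 1 (orderBy arg): record which signals are requested and in which direction.
  const requested = new Map<string, Direction>();
  for (const value of orderByArg) {
    const match = /^(.+)_(ASC|DESC)$/.exec(value);
    if (match) requested.set(match[1], match[2] as Direction);
  }
  // Phase 2 (filter apply): each plugin appends its clause when its apply runs,
  // so clause order follows plugin order, not the position in orderByArg.
  const clauses: string[] = [];
  for (const { signal, scoreExpr } of signalsInApplyOrder) {
    const direction = requested.get(signal);
    if (direction) clauses.push(`${scoreExpr} ${direction}`);
  }
  return clauses.length > 0 ? `ORDER BY ${clauses.join(", ")}` : "";
}
```

Respecting array order would instead mean sorting the collected clauses by each signal's index in orderByArg before joining them; that is one possible redesign the checklist asks reviewers to weigh.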

Review & Testing Checklist for Human

⚠️ Dependency Chain: This PR is based on PR #807, which builds on #797, #801, #804, #805, and #806. The cumulative diff includes all of those changes, but the core new work in this PR is the graphile-pg-trgm-plugin package and its integration into the mega query.

  • Schema bloat from auto-discovery: The plugin adds similarTo/wordSimilarTo to StringFilter (global) and <col>Similarity computed fields + trgm<Col> filter fields + SIMILARITY_* orderBy enums to every table with text columns. Review whether this level of auto-discovery is appropriate or if it should be gated by a behavior flag (e.g., attributeTrgmSimilarity). The updated snapshots show the full extent of schema additions.

  • Multi-signal orderBy semantics: The mega query demonstrates combining BM25 + pg_trgm orderBy signals, but note the documented subtlety: the orderBy array order does NOT determine SQL ORDER BY priority. Instead, priority follows the schema's internal field processing order (which filter apply runs first). Review whether this behavior is acceptable or if the 2-phase meta system should be redesigned to respect array order. Test by running queries with different orderBy array orders and verifying SQL generation.

  • Meta system correctness: Verify that trgm-search.ts correctly uses setMeta / getMeta to pass similarity score indices. Check that multiple trgm filters in a single query don't collide (e.g., filtering on both name and description at once).

  • Performance impact: The default connectionFilterTrgmRequireIndex: false means any text column can be fuzzy-matched without a GIN index. On large tables, this could be slow. Review whether the default should be true for production.
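The collision concern can be checked against a simplified model of the meta handoff: the filter apply phase adds a score expression to the selection list and stores its index under a meta key, and the computed-field phase reads that index back. MockQueryBuilder is a stand-in rather than the Grafast or @dataplan/pg API, and keying the meta entry per column is an assumption about how trgm-search.ts avoids overwrites.

```typescript
// Stand-in for the query builder; not the real Grafast / @dataplan/pg API.
class MockQueryBuilder {
  private readonly selects: string[] = [];
  private readonly meta = new Map<string, unknown>();

  // Adds a selection and returns its position in the result row.
  addSelection(sqlExpr: string): number {
    this.selects.push(sqlExpr);
    return this.selects.length - 1;
  }

  setMeta(key: string, value: unknown): void {
    this.meta.set(key, value);
  }

  getMeta(key: string): unknown {
    return this.meta.get(key);
  }

  get selections(): readonly string[] {
    return this.selects;
  }
}

// Hypothetical per-column key, so filters on two columns keep separate entries.
const metaKey = (column: string): string => `trgmSimilarity:${column}`;

// Filter apply phase: select the score and remember its row index.
function applyTrgmFilter(qb: MockQueryBuilder, column: string, paramRef: string): void {
  const index = qb.addSelection(`similarity(${column}, ${paramRef})`);
  qb.setMeta(metaKey(column), index);
}

// Computed-field phase: read the score from the row, or null when no filter ran.
function readSimilarity(qb: MockQueryBuilder, column: string, row: unknown[]): number | null {
  const index = qb.getMeta(metaKey(column));
  return typeof index === "number" ? (row[index] as number) : null;
}
```

With a single shared key, the description filter would overwrite the index stored for name, and nameSimilarity would silently return the description score.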

Recommended Test Plan:

# 1. Run dedicated pg_trgm tests
eval $(pgpm env) && cd graphile/graphile-pg-trgm-plugin && npx jest

# 2. Run integration tests (includes mega query with multi-signal orderBy)
cd ../graphile-settings && npx jest preset-integration

# 3. (Optional) Manually verify multi-signal orderBy behavior via GraphiQL
# Run queries with different orderBy array orders and inspect the SQL to
# confirm ORDER BY clause sequence follows schema field processing order
# rather than array order.

Notes

Implements a from-scratch PostGraphile v5 native connection filter plugin,
replacing the upstream postgraphile-plugin-connection-filter dependency.

New package: graphile/graphile-connection-filter/

Plugin architecture (7 plugins):
- ConnectionFilterInflectionPlugin: filter type naming conventions
- ConnectionFilterTypesPlugin: registers per-table and per-scalar filter types
- ConnectionFilterArgPlugin: injects filter arg on connections via applyPlan
- ConnectionFilterAttributesPlugin: adds per-column filter fields
- ConnectionFilterOperatorsPlugin: standard/sort/pattern/jsonb/inet/array/range operators
- ConnectionFilterCustomOperatorsPlugin: addConnectionFilterOperator API for satellite plugins
- ConnectionFilterLogicalOperatorsPlugin: and/or/not logical composition

Key features:
- Full v5 native: uses Grafast planning, PgCondition, codec system, behavior registry
- EXPORTABLE pattern for schema caching
- Preserves addConnectionFilterOperator API for PostGIS, search, pgvector, textsearch plugins
- No relation filter plugins (simplifies configuration vs upstream)
- Preset factory: ConnectionFilterPreset(options)

Also updates graphile-settings to use the new workspace package.
…Operator

filterType is for table-level filter types (UserFilter), while filterFieldType
is for scalar operator types (StringFilter). Satellite plugins pass scalar type
names, so the lookup must use filterFieldType to match the registration in
ConnectionFilterTypesPlugin. Previously worked by coincidence since both
inflections produce the same output, but would silently fail if a consumer
overrode one inflection but not the other.
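The failure mode described above is easy to reproduce with two inflectors that agree by default. FilterInflection and the helper functions are hypothetical simplifications of the inflection and type registry used by ConnectionFilterTypesPlugin and addConnectionFilterOperator.

```typescript
interface FilterInflection {
  filterType(typeName: string): string; // table-level filter types, e.g. UserFilter
  filterFieldType(typeName: string): string; // scalar operator types, e.g. StringFilter
}

// By default both inflectors happen to produce the same output.
const defaultInflection: FilterInflection = {
  filterType: (typeName) => `${typeName}Filter`,
  filterFieldType: (typeName) => `${typeName}Filter`,
};

// Registration: scalar operator types are named with filterFieldType.
function registerScalarFilterTypes(inflection: FilterInflection, scalars: string[]): Set<string> {
  return new Set(scalars.map((scalar) => inflection.filterFieldType(scalar)));
}

// Lookup: a satellite plugin passes a scalar name such as "String".
function lookupScalarFilterType(
  inflection: FilterInflection,
  registry: Set<string>,
  scalar: string,
  inflector: keyof FilterInflection,
): string | undefined {
  const typeName = inflection[inflector](scalar);
  return registry.has(typeName) ? typeName : undefined;
}
```

If a consumer overrides only filterFieldType (say, to produce StringOperatorFilter), a lookup through filterType returns undefined and the operator is silently dropped, while a lookup through filterFieldType still finds the registered type.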
Adds computed column filter support — allows filtering on PostgreSQL functions
that take a table row as their first argument and return a scalar.

Controlled by connectionFilterComputedColumns schema option. The preset factory
includes the plugin only when the option is truthy (default in preset: true,
but constructive-preset sets it to false).
- Remove phantom postgraphile-plugin-connection-filter dep from graphile-pgvector-plugin (never used)
- Remove phantom postgraphile-plugin-connection-filter dep from graphile-pg-textsearch-plugin (never used)
- Update graphile-plugin-connection-filter-postgis to use graphile-connection-filter workspace dep with typed imports
- Update graphile-search-plugin to use graphile-connection-filter workspace dep with typed imports
- Replace (build as any).addConnectionFilterOperator casts with properly typed build.addConnectionFilterOperator
…on-filter

- Update search plugin, pgvector, and postgis test files to import from
  graphile-connection-filter instead of postgraphile-plugin-connection-filter
- Use ConnectionFilterPreset() factory instead of PostGraphileConnectionFilterPreset
- Import ConnectionFilterOperatorSpec type from graphile-connection-filter
- Fix smart quote characters in filter descriptions to match existing snapshots
…ion filter tests

- Add graphile-connection-filter as devDependency in graphile-pgvector-plugin
  (test file imports ConnectionFilterPreset but package had no dependency)
- Skip connectionFilterRelations tests in search plugin (relation filters
  are intentionally not included in the v5-native plugin; they were disabled
  in production via disablePlugins with the old plugin)
…toggle

- ConnectionFilterForwardRelationsPlugin: filter by FK parent relations
- ConnectionFilterBackwardRelationsPlugin: filter by backward relations (one-to-one + one-to-many with some/every/none)
- connectionFilterRelations toggle in preset (default: false)
- Un-skip relation filter tests in search plugin
- Updated augmentations, types, and exports
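The some/every/none semantics of backward one-to-many filters map naturally onto correlated EXISTS subqueries. The string builder below is a hedged sketch of that mapping, not the SQL the plugin actually emits; the table, join, and filter strings are hypothetical, and real code would compose parameterized SQL fragments rather than concatenate strings.

```typescript
type Quantifier = "some" | "every" | "none";

function backwardRelationCondition(
  quantifier: Quantifier,
  childTable: string,
  joinCondition: string,
  childFilter: string,
): string {
  // Correlated subquery over the child rows that belong to the current parent row.
  const subquery = (where: string): string =>
    `SELECT 1 FROM ${childTable} WHERE ${joinCondition} AND ${where}`;
  if (quantifier === "some") {
    return `EXISTS (${subquery(`(${childFilter})`)})`;
  }
  if (quantifier === "none") {
    return `NOT EXISTS (${subquery(`(${childFilter})`)})`;
  }
  // "every": no related row fails the filter (vacuously true with zero children).
  // Caveat: rows where the filter evaluates to NULL are not counted as failures here.
  return `NOT EXISTS (${subquery(`NOT (${childFilter})`)})`;
}
```

Expressing "every" as "no child fails" keeps all three quantifiers as a single EXISTS probe, which an index on the foreign key column can serve.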
… at runtime

The preset factory now always includes relation plugins in the plugin list.
Each plugin checks build.options.connectionFilterRelations at runtime and
early-returns if disabled. This allows the toggle to be set by any preset
in the chain, not just the ConnectionFilterPreset() call.
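Why runtime gating matters: options contributed by every preset in the chain are merged before schema hooks run, so a hook that reads the toggle at that point sees the final value, whereas a decision taken inside ConnectionFilterPreset() only sees the arguments passed to that call. The sketch below models this with plain objects; forwardRelationFields and mergeOptions are hypothetical, and the last-layer-wins merge order is an assumption for illustration.

```typescript
interface FilterOptions {
  connectionFilterRelations?: boolean;
}

// A field hook that is always registered but checks the toggle when it runs.
function forwardRelationFields(options: FilterOptions, fields: string[]): string[] {
  if (!options.connectionFilterRelations) return fields; // early return when disabled
  return [...fields, "clientByClientId"];
}

// Options from every preset layer are merged before hooks run; later layers win
// (assumed merge order, for illustration).
function mergeOptions(...layers: FilterOptions[]): FilterOptions {
  return Object.assign({}, ...layers);
}
```

Because the hook consults the merged options, a downstream preset can enable relation filters without re-invoking the preset factory.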
Enables relation filter fields in the production schema:
- Forward: filter by FK parent (e.g. clientByClientId on OrderFilter)
- Backward: filter by children with some/every/none
- Codegen will pick up the new filter fields automatically
- Search plugin: isPgCondition → isPgConnectionFilter scope
- BM25 plugin: isPgCondition → isPgConnectionFilter scope
- Disable PgConditionArgumentPlugin and PgConditionCustomFieldsPlugin in preset
- Update all tests from condition: {...} to filter: {...}
- Add graphile-connection-filter devDependency to BM25 plugin
- Update search plugin graceful degradation tests to use filter

BREAKING CHANGE: The condition argument has been removed entirely.
All filtering now uses the filter argument exclusively.
- Search plugin plugin.test.ts: condition → filter syntax, add ConnectionFilterPreset
- Server-test: condition → filter in query with equalTo operator
- Clear stale snapshots (schema-snapshot, introspection) for regeneration
- Search plugin: update snapshot keys to match renamed filter-based tests
- Schema snapshot: remove all condition arguments and XxxCondition input types
- Introspection snapshot: remove condition arg and UserCondition type
- Kept conditionType in _meta schema (unrelated to deprecated condition arg)
… behavior for pgCodecRelation, update schema snapshot with relation filter types
…y filter at applyPlan level

Top-level empty filter {} is now treated as 'no filter' (skipped) instead of
throwing an error. Nested empty objects in and/or/not and relation filters are
still rejected. This removes the need for the connectionFilterAllowEmptyObjectInput
workaround in pgvector tests.
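The new empty-filter rule can be expressed as a small normalization step: an absent or empty top-level filter means "no filter", while an empty object nested below it is an error. normalizeFilter is a hypothetical helper; the plugin performs this check inside applyPlan, and this sketch rejects every nested empty object (in and/or lists, not, or a relation field), which may be stricter than the plugin for scalar fields.

```typescript
type FilterInput = Record<string, unknown>;

function isPlainObject(value: unknown): value is FilterInput {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Walk nested values and reject any empty object below the top level.
function assertNoEmptyNested(obj: FilterInput, path: string): void {
  for (const [key, value] of Object.entries(obj)) {
    const items: unknown[] = Array.isArray(value) ? value : [value];
    for (const item of items) {
      if (!isPlainObject(item)) continue;
      const itemPath = `${path}.${key}`;
      if (Object.keys(item).length === 0) {
        throw new Error(`Empty object is not allowed at ${itemPath}`);
      }
      assertNoEmptyNested(item, itemPath);
    }
  }
}

// Returns null when there is nothing to filter on; otherwise the validated input.
function normalizeFilter(filter: unknown): FilterInput | null {
  if (filter === null || filter === undefined) return null;
  if (!isPlainObject(filter)) throw new Error("filter must be an object");
  if (Object.keys(filter).length === 0) return null; // top-level {} means "no filter"
  assertNoEmptyNested(filter, "filter");
  return filter;
}
```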
- Extract shared getQueryBuilder utility into graphile-connection-filter/src/utils.ts
- Remove duplicate getQueryBuilder from search, BM25, and pgvector plugins
- Replace (build as any).dataplanPg with build.dataplanPg (already typed on Build)
- Replace (build as any).behavior with build.behavior (already typed on Build)
- Replace (build as any).input.pgRegistry with build.input.pgRegistry (already typed)
- Remove scope destructuring as any casts (pgCodec already typed on ScopeInputObject)
- Add pgCodec comment to augmentations.ts noting it's already declared by graphile-build-pg
- Export getQueryBuilder from graphile-connection-filter for satellite plugin use
Adds index safety check for relation filter fields. When enabled (default: true),
relation filter fields are only created for FKs with supporting indexes.
This prevents generating EXISTS subqueries that would cause sequential scans
on large tables.

Uses PgIndexBehaviorsPlugin's existing relation.extensions.isIndexed metadata
which is set at gather time. The check runs at schema build time with zero
runtime cost.

Applied to both forward and backward relation filter plugins.
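The index safety check amounts to a filter over relations at schema build time. RelationInfo mirrors the relation.extensions.isIndexed flag named above, but its shape and the requireIndex parameter name are hypothetical; treating a missing flag as unindexed is a conservative choice made for this sketch.

```typescript
interface RelationInfo {
  fieldName: string;
  // Stand-in for relation.extensions.isIndexed, populated at gather time.
  isIndexed?: boolean;
}

// Runs once per schema build: only relations backed by an index get filter fields.
function relationFilterFieldNames(relations: RelationInfo[], requireIndex = true): string[] {
  return relations
    .filter((relation) => !requireIndex || relation.isIndexed === true)
    .map((relation) => relation.fieldName);
}
```

Because this runs while the schema is built, skipped relations never appear in the schema and so can never produce an unindexed EXISTS subquery at query time.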
Comprehensive test coverage using graphile-test infrastructure:
- Scalar operators: equalTo, notEqualTo, distinctFrom, isNull, in/notIn,
  lessThan, greaterThan, like, iLike, includes, startsWith, endsWith
- Logical operators: and, or, not, nested combinations
- Relation filters: forward (child->parent), backward one-to-one,
  backward one-to-many (some/every/none), exists fields
- Computed column filters
- Schema introspection: filter types, operator fields, relation fields
- Options toggles: connectionFilterRelations, connectionFilterComputedColumns,
  connectionFilterLogicalOperators, connectionFilterAllowedOperators,
  connectionFilterOperatorNames

Also adds graphile/graphile-connection-filter to CI matrix (41 jobs).
Exercises multiple plugins working together in a single test database:
- Connection filter (scalar operators, logical operators, relation filters)
- PostGIS spatial filters (geometry column)
- pgvector (vector column, search function, distance ordering)
- tsvector search plugin (fullText matches, rank, orderBy)
- BM25 search (pg_textsearch body index, score, orderBy)
- Kitchen sink queries combining multiple plugins

34 test cases across 8 describe blocks, all passing locally.
Added postgres-plus CI job for tests requiring PostGIS/pgvector/pg_textsearch.
… test

The mega query now exercises all SIX plugin types in a single filter:
- tsvector (fullTextTsv)
- BM25 (bm25Body)
- relation filter (category name)
- scalar filter (isActive)
- pgvector (vectorEmbedding nearby)
- PostGIS (geom intersects polygon bbox)

Also validates returned coordinates fall within the bounding box.
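The bounding-box assertion can be sketched as a plain check over the returned points. The sketch assumes the geometry comes back as GeoJSON Point objects with [x, y] (longitude, latitude) coordinates and that the box is axis-aligned; the types and helper names are hypothetical. Bounds are inclusive because a point on a polygon's boundary still satisfies ST_Intersects.

```typescript
interface BBox {
  minX: number;
  minY: number;
  maxX: number;
  maxY: number;
}

interface GeoJsonPoint {
  type: string;
  coordinates: [number, number]; // [x, y], i.e. [longitude, latitude]
}

// Inclusive bounds: a point on the edge still intersects the box polygon.
function pointInBBox([x, y]: [number, number], box: BBox): boolean {
  return x >= box.minX && x <= box.maxX && y >= box.minY && y <= box.maxY;
}

function allPointsInBBox(points: GeoJsonPoint[], box: BBox): boolean {
  return points.every((p) => p.type === "Point" && pointInBBox(p.coordinates, box));
}
```

Array.prototype.every returns true for an empty array, so pairing this check with a non-empty result assertion avoids a vacuous pass.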
New package: graphile-pg-trgm-plugin — a PostGraphile v5 plugin for pg_trgm
trigram-based fuzzy text matching. Zero config, works on any text column.

Features:
- similarTo / wordSimilarTo filter operators on StringFilter
- trgm<Column> direct filter fields on connection filter types
- <column>Similarity computed score fields (0-1, null when inactive)
- SIMILARITY_<COLUMN>_ASC/DESC orderBy enum values
- TrgmSearchPreset for easy composition into presets
- connectionFilterTrgmRequireIndex option (default: false)
- 14 dedicated tests + integrated into mega query as 7th plugin type

Mega query now exercises ALL 7 plugin types in one GraphQL query:
tsvector + BM25 + pgvector + PostGIS + pg_trgm + relation filter + scalar
@devin-ai-integration (Contributor) commented:

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.


Updated introspection and SDL snapshots to include new fields from
TrgmSearchPlugin: similarTo/wordSimilarTo operators on StringFilter,
*Similarity computed fields, trgm* filter fields, and SIMILARITY_*
orderBy enum values.
- orderBy: [BM25_BODY_SCORE_ASC, SIMILARITY_NAME_DESC] demonstrates
  multi-signal relevance ranking in a single query
- Added comprehensive JSDoc explaining all 7 plugin types, the 2-phase
  meta system, and ORDER BY priority semantics
- Inline GraphQL comments explain each filter and score field
- Assertion verifies BM25 ASC ordering (primary sort)
- Documents important subtlety: ORDER BY priority follows schema field
  processing order, not the orderBy array order