feat(graphile-pgvector-plugin): auto-discover vector columns; remove old postgraphile-plugin-pgvector#768

Merged: pyramation merged 5 commits into main from devin/1772455386-pgvector-auto-discover on Mar 3, 2026

@pyramation pyramation commented Mar 2, 2026

Summary

Adds a new VectorSearchPlugin to graphile-pgvector-plugin that auto-discovers all vector columns across all tables — no per-table configuration required. This follows the same patterns as graphile-search-plugin (for tsvector columns).

Previously, using pgvector similarity search through PostGraphile required either:

  • Manually configuring collections in postgraphile-plugin-pgvector (the old plugin)
  • Doing cosine similarity math client-side in JavaScript after fetching all embeddings
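
For context, the client-side approach in the second bullet looks roughly like the sketch below: fetch every row, compute cosine distance in application code, then filter and sort. The Row shape and the function names are hypothetical and are not taken from any package in this PR.

```typescript
// Hypothetical row shape: every embedding has already been fetched to the client.
interface Row {
  id: number;
  embedding: number[];
}

// Cosine distance = 1 - cosine similarity; 0 means the vectors point the same way.
function cosineDistance(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Cosine distance is undefined when either vector has zero length.
  if (normA === 0 || normB === 0) return NaN;
  return 1 - dot / Math.sqrt(normA * normB);
}

// Filter by a distance threshold and sort closest-first, all in application code.
function nearestClientSide(
  rows: Row[],
  query: number[],
  maxDistance: number
): Array<{ id: number; distance: number }> {
  return rows
    .map((row) => ({ id: row.id, distance: cosineDistance(row.embedding, query) }))
    .filter((r) => r.distance <= maxDistance) // NaN distances are dropped here
    .sort((x, y) => x.distance - y.distance);
}
```

The cost of this approach grows with the table size, because every embedding crosses the network before any filtering happens.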

Now, any table with a vector(n) column automatically gets:

  1. vectorEmbedding condition field on connection condition inputs — accepts { vector, metric?, distance? } to filter by distance threshold
  2. embeddingDistance computed field on output types — returns the distance when a nearby condition is active (null otherwise)
  3. EMBEDDING_DISTANCE_ASC/DESC orderBy enum values — sort results by vector distance
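
To make the three additions concrete, here is a rough sketch of how a { vector, metric?, distance? } input could translate into a pgvector query. The operators `<=>` (cosine distance), `<->` (L2 distance) and `<#>` (negative inner product) are pgvector's own. Everything else is an assumption for illustration: the L2 and IP metric names, the COSINE default, and the string-building helper, which is not how the plugin itself composes SQL.

```typescript
// COSINE appears in the PR text; L2 and IP are assumed enum values.
type VectorMetric = "COSINE" | "L2" | "IP";

// pgvector distance operators for each (assumed) metric name.
const PGVECTOR_OPERATORS: Record<VectorMetric, string> = {
  COSINE: "<=>",
  L2: "<->",
  IP: "<#>",
};

interface VectorNearbyInput {
  vector: number[];
  metric?: VectorMetric; // assumed to default to COSINE
  distance?: number; // optional maximum distance
}

// Quote an SQL identifier by doubling any embedded double quotes.
function quoteIdent(name: string): string {
  return '"' + name.replace(/"/g, '""') + '"';
}

// Hypothetical helper: builds a parameterized query with a distance column,
// an optional threshold filter, and closest-first ordering.
function buildNearbySql(
  table: string,
  column: string,
  input: VectorNearbyInput
): { text: string; values: unknown[] } {
  const op = PGVECTOR_OPERATORS[input.metric ?? "COSINE"];
  const distanceExpr = `${quoteIdent(column)} ${op} $1::vector`;
  // pgvector accepts the text form '[1,2,3]', which JSON.stringify produces.
  const values: unknown[] = [JSON.stringify(input.vector)];
  let where = "";
  if (input.distance !== undefined) {
    values.push(input.distance);
    where = ` WHERE (${distanceExpr}) <= $2`;
  }
  const text =
    `SELECT *, ${distanceExpr} AS distance FROM ${quoteIdent(table)}` +
    `${where} ORDER BY distance ASC`;
  return { text, values };
}
```

In this sketch the distance expression serves all three roles at once: the WHERE clause corresponds to the condition field, the selected column to embeddingDistance, and the ORDER BY to EMBEDDING_DISTANCE_ASC.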

The plugin is wired into ConstructivePreset via graphile-settings so it's active by default.

The old postgraphile-plugin-pgvector package (which required manual per-collection config) has been deleted entirely, along with its CI job. It was fully superseded by the auto-discover approach — nothing unique to preserve.

Files changed

  • graphile/graphile-pgvector-plugin/src/vector-search.ts — New: the main plugin (~480 lines)
  • graphile/graphile-pgvector-plugin/src/types.ts — New: VectorSearchPluginOptions, VectorMetric types
  • graphile/graphile-pgvector-plugin/src/index.ts — Updated exports
  • graphile/graphile-pgvector-plugin/src/vector-codec.ts — Minor comment addition
  • graphile/graphile-pgvector-plugin/package.json — Added @dataplan/pg peer dep, optional postgraphile-plugin-connection-filter
  • graphile/graphile-settings/src/presets/constructive-preset.ts — Wires VectorSearchPlugin into ConstructivePreset
  • graphile/graphile-settings/src/plugins/index.ts — Re-exports new plugin + types
  • graphile/graphile-pgvector-plugin/src/__tests__/vector-search.test.ts — New: integration tests (~370 lines)
  • graphile/postgraphile-plugin-pgvector/ — Deleted: entire old plugin package
  • .github/workflows/run-tests.yaml — Removed old plugin CI job
  • GRAPHILE.md — Updated plugin listing

Updates since last revision

  • Deleted postgraphile-plugin-pgvector — The old plugin required manual per-collection config (collections: [{ schema, table, embeddingColumn }]), which is the opposite of the auto-discover approach. It was fully superseded by graphile-pgvector-plugin and had no unique code worth preserving. Removed from CI workflow and docs as well.
  • Fixed duplicate VectorMetric enum type error — The VectorMetric enum was being registered twice: once inline via new GraphQLEnumType() inside the VectorNearbyInput fields closure, and again via build.registerEnumType(). Fix: register the enum type first, then reference it via build.getTypeByName('VectorMetric').
  • Removed global closeTo filter operator — The closeTo operator was a placeholder that modified the global Vector filter type (hardcoded distance <= 1.0). Removed entirely; the vectorEmbedding condition field provides full control over metric and threshold.
  • Merged main to resolve conflicts with the recently merged BM25 plugin (PR #769, feat(graphile-pg-textsearch-plugin): auto-discover BM25 indexes with condition, score, and orderBy). Both plugins are now included in ConstructivePreset.
  • CI is green: 41/41 jobs passing.
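
The duplicate VectorMetric fix above follows a register-once, look-up-by-name pattern. The sketch below shows that pattern with a small stand-in registry. TypeRegistry is hypothetical and only mimics the method names mentioned in this PR; it is not the graphile-build API, and the real signatures may differ. The GraphQL field types and the L2 and IP enum values are assumptions as well.

```typescript
interface EnumSpec {
  name: string;
  values: string[];
}

// Stand-in registry (hypothetical): rejects a second registration of the same name,
// which is the kind of error the duplicate enum produced.
class TypeRegistry {
  private readonly types = new Map<string, EnumSpec>();

  registerEnumType(name: string, values: string[]): void {
    if (this.types.has(name)) {
      throw new Error(`Type '${name}' is already registered`);
    }
    this.types.set(name, { name, values });
  }

  getTypeByName(name: string): EnumSpec {
    const found = this.types.get(name);
    if (!found) throw new Error(`Type '${name}' is not registered`);
    return found;
  }
}

// The input type's fields look the enum up by name instead of constructing
// a second enum instance inline.
function buildVectorNearbyInputFields(registry: TypeRegistry) {
  return {
    vector: { type: "[Float!]!" }, // assumed GraphQL type
    metric: { type: registry.getTypeByName("VectorMetric") },
    distance: { type: "Float" }, // assumed GraphQL type
  };
}

const registry = new TypeRegistry();
// Register first (COSINE is from the PR text; L2 and IP are assumed values) ...
registry.registerEnumType("VectorMetric", ["COSINE", "L2", "IP"]);
// ... then reference the single registered instance.
const nearbyFields = buildVectorNearbyInputFields(registry);
```

Because every consumer resolves the enum through the registry, the schema ends up with exactly one VectorMetric type however many fields reference it.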

Review & Testing Checklist for Human

⚠️ High Risk — 4 critical items to verify:

  • Integration tests pass in CI ✅ — All 60 tests in graphile-pgvector-plugin passing, including new vector-search integration tests against live PostgreSQL 17 + pgvector
  • Verify build.registerInputObjectType and build.registerEnumType APIs — The plugin uses these in the init hook. CI passing suggests they work, but double-check these are stable PostGraphile v5 RC APIs and won't break in future RC updates.
  • Test with real application data — CI tests use synthetic 3D vectors. Test with actual embeddings (e.g., 1536D from OpenAI) and realistic queries to verify WeakMap distance tracking works correctly at scale.
  • Review any type assertions — The plugin uses any type assertions throughout (e.g., const $parent = $condition.dangerouslyGetParent();). These access internal PostGraphile APIs that may not be stable. Verify these work with real schema builds or consider adding stronger typing.

Test Plan

  1. CI Tests: ✅ Done — integration tests pass against PostgreSQL 17 + pgvector
  2. Local Manual Test (recommended):
    • Create a test table with a vector(1536) column and some OpenAI embeddings
    • Start the GraphQL server with ConstructivePreset
    • Query with allItems(condition: { vectorEmbedding: { vector: [...], metric: COSINE, distance: 0.5 } }) { nodes { id embeddingDistance } }
    • Verify the distance field returns non-null numbers and filtering works correctly
    • Try ordering: allItems(... orderBy: EMBEDDING_DISTANCE_ASC)
    • Verify results are sorted by distance (closest first)
  3. Schema introspection: Run cnc get-graphql-schema and verify that VectorNearbyInput, VectorMetric enum, vectorEmbedding condition field, embeddingDistance output field, and EMBEDDING_DISTANCE_ASC/DESC orderBy values are present without duplicate type errors
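
For step 2 of the test plan, a small script along these lines can build the query and check the response shape. It is a sketch only: the allItems field comes from the example above, while the ItemNode shape, the helper names, and the assumption that id is numeric are hypothetical. Sending the request is left out because the server URL depends on the local setup.

```typescript
// Builds the manual-test query with the vector and threshold inlined,
// combining the condition and orderBy from the test plan.
function buildNearbyQuery(vector: number[], distance: number): string {
  return `query {
  allItems(
    condition: { vectorEmbedding: { vector: ${JSON.stringify(vector)}, metric: COSINE, distance: ${distance} } }
    orderBy: EMBEDDING_DISTANCE_ASC
  ) {
    nodes { id embeddingDistance }
  }
}`;
}

// Hypothetical shape of one returned node.
interface ItemNode {
  id: number;
  embeddingDistance: number | null;
}

// Returns a list of problems; an empty list means the response matches the
// expectations in the test plan (non-null, within threshold, closest first).
function checkNearbyResult(nodes: ItemNode[], maxDistance: number): string[] {
  const problems: string[] = [];
  let previous = -Infinity;
  for (const node of nodes) {
    const d = node.embeddingDistance;
    if (d === null) {
      problems.push(`node ${node.id}: embeddingDistance is null`);
      continue;
    }
    if (d > maxDistance) {
      problems.push(`node ${node.id}: distance ${d} exceeds ${maxDistance}`);
    }
    if (d < previous) {
      problems.push(`node ${node.id}: out of order (${d} after ${previous})`);
    }
    previous = d;
  }
  return problems;
}
```

Running checkNearbyResult on the nodes array of the response turns the manual "verify" steps into a repeatable check.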

Known Limitations

  • Tests use 3D vectors; real-world embeddings (1536D from OpenAI, 768D from sentence transformers, etc.) are untested in this PR
  • The WeakMap-based distance tracking relies on internal PostGraphile step APIs (selectAndReturnIndex, getMetaRaw, dangerouslyGetParent) that may not be stable across RC versions
  • Combining vectorEmbedding condition with standard column conditions in complex ways (e.g., nested and/or) may produce unexpected results — use separate filters if needed
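
The WeakMap-based distance tracking mentioned above could take roughly the following shape. This is a guess at the general idea, not the plugin's code: FakeSelectStep is a stand-in class that borrows only the selectAndReturnIndex name from the list above, and the other function names are hypothetical.

```typescript
// Stand-in for a select step; the real PostGraphile step class is NOT used here.
class FakeSelectStep {
  readonly selected: string[] = [];

  // Adds an expression to the select list and returns its column index.
  selectAndReturnIndex(sqlFragment: string): number {
    return this.selected.push(sqlFragment) - 1;
  }
}

// Maps each select step to the index of its distance column. A WeakMap lets
// entries be garbage-collected together with the step they belong to.
const distanceIndexByStep = new WeakMap<FakeSelectStep, number>();

// Called when the vectorEmbedding condition is applied: select the distance
// expression and remember where it landed for this particular step.
function applyNearbyCondition($select: FakeSelectStep, distanceSql: string): void {
  distanceIndexByStep.set($select, $select.selectAndReturnIndex(distanceSql));
}

// Called when resolving embeddingDistance: null unless a condition was active.
function readDistance($select: FakeSelectStep, row: unknown[]): number | null {
  const index = distanceIndexByStep.get($select);
  if (index === undefined) return null;
  const raw = row[index];
  return raw == null ? null : Number(raw);
}
```

This also illustrates why embeddingDistance is null without an active condition: no index was ever recorded for that step, so there is no column to read.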


…dition, distance, orderBy, and filter

Adds VectorSearchPlugin that auto-discovers all vector columns across all tables
and adds the following capabilities with zero configuration:

- Condition fields: vectorEmbedding nearby condition on connections
  (accepts query vector, metric, optional distance threshold)
- Computed distance fields: embeddingDistance on output types
  (returns distance when nearby condition is active, null otherwise)
- OrderBy enum values: EMBEDDING_DISTANCE_ASC/DESC for sorting by distance
- Connection filter operator: closeTo for Vector scalar

Follows the same patterns as graphile-search-plugin (for tsvector columns).
Wired into ConstructivePreset via graphile-settings.
@devin-ai-integration

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.


@devin-ai-integration devin-ai-integration Bot changed the title feat(graphile-pgvector-plugin): auto-discover vector columns with condition, distance, orderBy, and filter feat(graphile-pgvector-plugin): auto-discover vector columns with condition, distance, and orderBy Mar 2, 2026
@devin-ai-integration devin-ai-integration Bot changed the title feat(graphile-pgvector-plugin): auto-discover vector columns with condition, distance, and orderBy feat(graphile-pgvector-plugin): auto-discover vector columns; remove old postgraphile-plugin-pgvector Mar 2, 2026

@devin-ai-integration devin-ai-integration Bot left a comment

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 6 additional findings.

@pyramation pyramation merged commit 1506166 into main Mar 3, 2026
43 checks passed
@pyramation pyramation deleted the devin/1772455386-pgvector-auto-discover branch March 3, 2026 04:01