feat(codegen): pluggable embedding system with --auto-embed flag [CI] by pyramation · Pull Request #924 · constructive-io/constructive

pyramation · 2026-03-28T01:25:21Z

Summary

Adds a pluggable text-to-vector embedding system for generated CLI commands. When a table has pgvector embedding fields (Vector or [Float] gqlType), the codegen now:

Generates an embedder.ts runtime module using @agentic-kit/ollama. It provides resolveEmbedder() (env vars → appstash config → null) and autoEmbedWhere(), which converts text values in vector where-clause fields to embedding vectors.
Injects --auto-embed flag into list and search handlers via a new buildAutoEmbedBlock() AST builder. When passed, the CLI resolves an embedder and converts text queries to vectors before sending to the GraphQL server.
Extends --auto-embed to create and update mutations via autoEmbedInput() and buildAutoEmbedInputBlock(). When --auto-embed is passed on a create/update command, text strings in vector fields are converted to embeddings before the ORM call. This is a CLI-only convenience until server-side triggers/job queues handle it.
Generates field-specific search examples in docs (README, skills) showing --auto-embed usage for vector fields (read and write paths), replacing the previous generic search examples.
Adds CLI e2e tests: one verifies the error raised when EMBEDDER_PROVIDER is unset, and conditional integration tests run against a real Ollama instance with nomic-embed-text.
Renames --fields to --select across all CLI codegen, templates, docs, and e2e tests for consistency with the TS SDK's select parameter naming.
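
A minimal sketch of the autoEmbedWhere() idea follows. It assumes the where-clause shape { <field>: { vector, distance } } used by the pgvector flags in the generated docs, plus a hypothetical Embedder interface with a single embed(text) method. The interface, the vectorFields parameter, the function name autoEmbedWhereSketch, and the copy-rather-than-mutate behavior are illustrative assumptions, not the generated module's actual signature.

```typescript
// Hypothetical embedder contract (assumption): one text in, one vector out.
interface Embedder {
  embed(text: string): Promise<number[]>;
}

// Assumed per-field filter shape, mirroring
// --where.<field>.vector '[...]' --where.<field>.distance 1.0
type VectorFilter = { vector?: unknown; distance?: number };

/**
 * Sketch: for each vector field, replace a text `vector` value with its
 * embedding. Values that are not strings (e.g. numeric arrays) pass through.
 * Returns a shallow copy; the caller's where object is left unchanged.
 */
async function autoEmbedWhereSketch(
  where: Record<string, unknown>,
  vectorFields: string[],
  embedder: Embedder,
): Promise<Record<string, unknown>> {
  const result: Record<string, unknown> = { ...where };
  for (const field of vectorFields) {
    const filter = result[field] as VectorFilter | undefined;
    if (filter && typeof filter.vector === "string") {
      const vector = await embedder.embed(filter.vector);
      result[field] = { ...filter, vector };
    }
  }
  return result;
}
```

With a stub embedder such as { embed: async (t) => [t.length] }, the where clause { embedding: { vector: "hello", distance: 1 } } becomes { embedding: { vector: [5], distance: 1 } }, and fields not listed in vectorFields are untouched.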

Embedding is opt-in via --auto-embed: there is no implicit behavior and no clobbering of field names. Configure it with the EMBEDDER_PROVIDER, EMBEDDER_MODEL, and EMBEDDER_BASE_URL env vars (or their appstash config equivalents).
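
The lookup order (env vars → appstash config → null) can be sketched as below. This assumes the appstash config stores the same three settings under the hypothetical keys provider, model, and baseUrl; the real config key names are not stated in this PR. The sketch returns plain settings rather than a client so it stays independent of the @agentic-kit/ollama API.

```typescript
interface EmbedderConfig {
  provider: string;
  model: string | undefined;
  baseUrl: string | undefined;
}

// Hypothetical shape of the appstash-stored settings (key names are assumptions).
type StoredEmbedderConfig = Partial<EmbedderConfig>;

/**
 * Sketch of the lookup order: EMBEDDER_* env vars first, then stored
 * appstash config, otherwise null (no embedder configured).
 * Whichever source supplies the provider also supplies model and base URL.
 */
function resolveEmbedderConfig(
  env: Record<string, string | undefined>,
  stored: StoredEmbedderConfig | null,
): EmbedderConfig | null {
  const envProvider = env.EMBEDDER_PROVIDER;
  if (envProvider) {
    return { provider: envProvider, model: env.EMBEDDER_MODEL, baseUrl: env.EMBEDDER_BASE_URL };
  }
  if (stored && stored.provider) {
    return { provider: stored.provider, model: stored.model, baseUrl: stored.baseUrl };
  }
  return null;
}
```

A null result from something like resolveEmbedderConfig(process.env, storedConfig) is the point where --auto-embed would report the missing-provider error that the e2e test checks.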

Updates since last revision

--fields → --select rename: All CLI usage text, doc examples, runtime parseSelectFlag() (now reads argv.select), generated snapshots, and e2e tests updated for TS SDK consistency. The internal variable names (select, parseSelectFlag) were already correct — only the user-facing flag name changed.
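
For illustration, a flag value such as --select id,name (which minimist-style parsing delivers as { select: 'id,name' }) could be turned into a TS SDK-style select object as sketched here. The { field: true } output shape is an assumption about the SDK's select parameter, and parseSelectSketch is a hypothetical stand-in, not the generated parseSelectFlag().

```typescript
type Select = Record<string, true>;

// Sketch: read argv.select (e.g. "id,name") and build { id: true, name: true }.
// Returns undefined when the flag is absent or empty so callers can fall back
// to their default field set.
function parseSelectSketch(argv: { select?: unknown }): Select | undefined {
  if (typeof argv.select !== "string" || argv.select.trim() === "") return undefined;
  const select: Select = {};
  for (const raw of argv.select.split(",")) {
    const field = raw.trim();
    if (field) select[field] = true; // skip empty segments like "a,,b"
  }
  return select;
}
```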

Review & Testing Checklist for Human

Verify @agentic-kit/ollama API: embedder.ts imports OllamaClient as the default export and calls client.generateEmbedding(text, model). If this doesn't match the actual @agentic-kit/ollama@1.0.3 API, every generated CLI embedder will break at runtime. This is the highest-risk item because the code was written without runtime verification.
Validate { vector: query } key for embedding fields: buildSearchHandler changed the where-clause key from value to vector for embedding category fields (~line 693 of table-command-generator.ts). Confirm the graphile pgvector plugin expects vector not value as the input key for similarity search.
Verify autoEmbedInput mutation integration: The auto-embed block is injected into the mutation handler's tryBody after cleanedData is created but before the ORM call. autoEmbedInput mutates data in-place (converts string values in vector fields to embedding arrays). Confirm the generated code positions the block correctly and that mutating cleanedData before passing it to the ORM is safe with the TypeScript type assertions (as CreateXxxInput[...] / as XxxPatch).
Run codegen on a table with vector fields and inspect output: Verify the generated if (argv['auto-embed']) { ... } block appears in handleList, handleSearch, handleCreate, and handleUpdate — and that handleDelete does not include it.
Spot-check --select rename: Verify that --select id,name works in a generated CLI command (replaces the old --fields flag). The runtime reads argv.select now — confirm minimist parses --select foo into { select: 'foo' } without conflicts.
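
The in-place behavior described for autoEmbedInput can be sketched as follows. The Embedder interface, the parameter order, the vectorFields list, and the name autoEmbedInputSketch are hypothetical; the point is that string values in vector fields are replaced on the same object, so the later ORM call that receives cleanedData sees the embedded arrays.

```typescript
// Hypothetical embedder contract (assumption).
interface Embedder {
  embed(text: string): Promise<number[]>;
}

/**
 * Sketch: mutate `data` in place, replacing string values in vector fields
 * with embeddings. Existing arrays (already vectors) and all other fields
 * are left as they are.
 */
async function autoEmbedInputSketch(
  data: Record<string, unknown>,
  vectorFields: string[],
  embedder: Embedder,
): Promise<void> {
  for (const field of vectorFields) {
    const value = data[field];
    if (typeof value === "string") {
      data[field] = await embedder.embed(value);
    }
  }
}
```

Because the object is mutated rather than copied, any TypeScript assertion applied to it earlier is not re-checked after the string becomes a number array, which is why the checklist asks reviewers to confirm the ordering and the assertions are safe.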

Notes

The Ollama e2e tests skip gracefully when Ollama isn't running — only the "no provider" error test runs unconditionally.
The lockfile diff is large due to @agentic-kit/ollama resolution cascading through pnpm deduplication — not a code concern but worth a glance.
autoEmbedInput is intentionally a temporary CLI-only convenience. The long-term plan is server-side triggers/job queues for embedding generation.
buildSearchExamples / buildSearchExamplesMarkdown in docs-utils.ts now include create/update examples alongside the existing search examples.
The --fields → --select rename was done for consistency with the TS SDK's select parameter. All 7 affected snapshots were regenerated and pass.

Link to Devin session: https://app.devin.ai/sessions/c92c3a11450342f8875625a60fa1be28
Requested by: @pyramation

Tests generated CLI commands (codegen → transpile → execute) against a running PostgreSQL database with a real GraphQL server.
- Uses ts.transpileModule to strip types without resolving imports
- Uses async spawn (not execFileSync) to keep event loop unblocked
- Sets up appstash context pointing at test server endpoint
- Resolves NODE_PATH for pnpm's strict module isolation

5 focused corner-case tests:
1. Paginated list with --where (dot-notation) + --fields
2. Cursor-based forward pagination (--after)
3. find-first with --where.name.equalTo
4. Combined --where + --orderBy + --fields
5. Empty result set handling

…SE_DIR, bump appstash 0.7.0
- Type buildAnimalsTable() with proper Table interface from codegen
- Replace internal path.join require hacks with proper package exports
- Use APPSTASH_BASE_DIR env var instead of overriding HOME
- Bump appstash to ^0.7.0 for APPSTASH_BASE_DIR support
- Fix cliEntryPoint -> entryPoint config property
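
The environment handed to each spawned CLI process can be sketched like this: APPSTASH_BASE_DIR points appstash at a temp directory instead of overriding HOME, and NODE_PATH lists extra node_modules directories so pnpm's isolated layout still resolves the generated code's imports. The helper name buildCliEnv and its parameters are hypothetical.

```typescript
import * as path from "node:path";

/**
 * Sketch: build the env for a child CLI process without touching HOME.
 * Extra node_modules dirs are prepended to any existing NODE_PATH and joined
 * with the platform delimiter (":" on POSIX, ";" on Windows).
 */
function buildCliEnv(
  base: NodeJS.ProcessEnv,
  tmpAppstashDir: string,
  nodeModulesDirs: string[],
): NodeJS.ProcessEnv {
  const existing = base.NODE_PATH ? base.NODE_PATH.split(path.delimiter) : [];
  return {
    ...base,
    APPSTASH_BASE_DIR: tmpAppstashDir,
    NODE_PATH: [...nodeModulesDirs, ...existing].join(path.delimiter),
  };
}
```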

Suite 2 (Search CLI, 6 tests):
- tsvector search via --where.tsvTsv (dot-notation passthrough)
- trgm fuzzy matching via --where.trgmTitle
- composite fullTextSearch filter
- search + pagination (--limit)
- pgvector similarity (conditional, skip if unavailable)
- _meta query from live server (MetaSchemaPlugin verification)

Uses search-seed fixture (5 articles with tsvector, pg_trgm, optional pgvector). All search tests use list --where dot-notation to pass filter field names directly to the server, testing the full pipeline: codegen -> transpile -> spawn child process -> ORM findMany -> GraphQL -> real PostgreSQL.

- pgvector test: vector arrays can't be passed via CLI dot-notation (they become strings, not JSON arrays). Changed test to verify the CLI reports a clear GraphQL type error rather than crashing silently.
- _meta test: replaced with schema introspection test since the search-seed server doesn't load MetaSchemaPlugin (enableServicesApi is false). New test verifies Article type exposes expected search fields (tsvRank, titleTrgmSimilarity, bodyTrgmSimilarity, searchScore, and conditionally pgvector fields).
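
To see why vector arrays arrive as strings, here is a simplified, hypothetical version of minimist-style dotted-flag expansion (flag/value pairs only, no booleans or aliases). Like minimist, it turns numeric-looking values into numbers but leaves everything else as a string, so '[0.1,0.2]' reaches the ORM as text and the server rejects it with a type error.

```typescript
type Nested = { [key: string]: Nested | string | number };

// Sketch: expand ["--where.embedding.vector", "[0.1,0.2]"] into
// { where: { embedding: { vector: "[0.1,0.2]" } } }.
function expandDotFlags(args: string[]): Nested {
  const out: Nested = {};
  for (let i = 0; i + 1 < args.length; i += 2) {
    const keys = args[i].replace(/^--/, "").split(".");
    const raw = args[i + 1];
    // Numeric-looking values become numbers; "[0.1,0.2]" is NaN, so it stays text.
    const value: string | number = raw.trim() !== "" && !Number.isNaN(Number(raw)) ? Number(raw) : raw;
    let node: Nested = out;
    for (const key of keys.slice(0, -1)) {
      const next = node[key];
      if (typeof next === "object") {
        node = next;
      } else {
        const child: Nested = {};
        node[key] = child;
        node = child;
      }
    }
    node[keys[keys.length - 1]] = value;
  }
  return out;
}
```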

The CLI exits with code 0 even on GraphQL errors, returning { ok: false, errors: [...] }. Updated the test to check the response content instead of expecting the promise to reject.
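
Because the process exits 0, the assertion has to inspect stdout. A sketch of that check, assuming the CLI prints a single JSON document of the form { ok: false, errors: [...] } as quoted above; the helper name expectCliFailure and the message field on each error are hypothetical.

```typescript
interface CliResult {
  ok: boolean;
  errors?: Array<{ message?: string }>;
}

// Sketch: parse CLI stdout and return the reported errors, throwing if the
// command unexpectedly succeeded or reported an empty errors list.
function expectCliFailure(stdout: string): Array<{ message?: string }> {
  const result = JSON.parse(stdout) as CliResult;
  if (result.ok !== false) throw new Error("expected ok: false");
  if (!result.errors || result.errors.length === 0) throw new Error("expected a non-empty errors[]");
  return result.errors;
}
```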

…eral for runner script
- Moved runCli() from inline per-suite to a shared module-level function that takes (distDir, tmpHome, ...args) (eliminates 60 lines of duplication)
- Replaced string[] array joined with newlines with a readable template literal (RUNNER_SCRIPT constant)
- Fixed duplicate JSDoc comment on setupAppstashContext
- Removed stale Suite 3 reference from file header
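
A shared runCli(distDir, tmpHome, ...args) built on async spawn might look like the sketch below. The entry file name cli.js, the RunResult shape, and passing tmpHome as APPSTASH_BASE_DIR are assumptions for illustration. It collects stdout and stderr and resolves on close regardless of exit code, so callers can inspect the output (as the ok: false check requires) without blocking the event loop the way execFileSync would.

```typescript
import { spawn } from "node:child_process";
import * as path from "node:path";

interface RunResult {
  code: number | null;
  stdout: string;
  stderr: string;
}

/**
 * Sketch: run the transpiled CLI entry in a child Node process.
 * "cli.js" is a hypothetical entry file name.
 */
function runCli(distDir: string, tmpHome: string, ...args: string[]): Promise<RunResult> {
  return new Promise((resolve, reject) => {
    const child = spawn(process.execPath, [path.join(distDir, "cli.js"), ...args], {
      env: { ...process.env, APPSTASH_BASE_DIR: tmpHome },
    });
    let stdout = "";
    let stderr = "";
    child.stdout.on("data", (chunk: Buffer) => (stdout += chunk.toString()));
    child.stderr.on("data", (chunk: Buffer) => (stderr += chunk.toString()));
    child.on("error", reject); // e.g. the node binary could not be started
    child.on("close", (code) => resolve({ code, stdout, stderr }));
  });
}
```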

When a table has search-capable fields (tsvector, trgm, BM25, pgvector), the generated README and skill references now include concrete CLI examples showing the exact dot-notation flags for each search type:
- tsvector: --where.<field> "query"
- trgm: --where.trgm<Base>.value "query" --where.trgm<Base>.threshold 0.3
- BM25: --where.bm25<Base>.query "query"
- pgvector: --where.<field>.vector '[...]' --where.<field>.distance 1.0
- composite: --where.fullTextSearch "query"

Also adds a combined search + pagination example. Field-name derivation mirrors buildSearchHandler so examples always match the generated code. Integrated into all four generators: single-target README, single-target skills, multi-target README, and multi-target skills.
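
The per-type flag patterns listed above can be expressed as a small generator. This is a hypothetical sketch: the SearchKind union, the capitalize helper, and the base-name derivation (for example title → trgmTitle) are inferred from the examples in this PR such as --where.trgmTitle, not copied from buildSearchHandler.

```typescript
type SearchKind = "tsvector" | "trgm" | "bm25" | "pgvector" | "composite";

const capitalize = (s: string): string => s.charAt(0).toUpperCase() + s.slice(1);

// Sketch: produce the example CLI flags for one searchable field.
// For tsvector/pgvector, `field` is the filter field name itself;
// for trgm/bm25 it is the base column name that receives a prefix.
function searchExampleFlags(kind: SearchKind, field: string): string {
  switch (kind) {
    case "tsvector":
      return `--where.${field} "query"`;
    case "trgm":
      return `--where.trgm${capitalize(field)}.value "query" --where.trgm${capitalize(field)}.threshold 0.3`;
    case "bm25":
      return `--where.bm25${capitalize(field)}.query "query"`;
    case "pgvector":
      return `--where.${field}.vector '[...]' --where.${field}.distance 1.0`;
    case "composite":
      return `--where.fullTextSearch "query"`;
  }
}
```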

Adds a pluggable text-to-vector embedding system for CLI search commands:
- New embedder.ts template using @agentic-kit/ollama for Ollama provider
- resolveEmbedder() resolves from env vars or appstash config
- autoEmbedWhere() converts text values to vector embeddings in where clauses
- --auto-embed flag on list and search commands for tables with vector fields
- Embedder module conditionally generated when tables have embedding fields
- Updated docs-generator with --auto-embed examples for vector fields
- CLI e2e tests for embedder: error without provider, real Ollama integration
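
The conditional-generation rule (emit embedder.ts only when some table has an embedding field, detected by a Vector or [Float] gqlType per the summary) can be sketched as a predicate. The TableLike and FieldLike shapes and both helper names are hypothetical stand-ins for codegen's Table interface.

```typescript
// Hypothetical, trimmed-down stand-ins for codegen's table/field types.
interface FieldLike {
  name: string;
  type: { gqlType: string };
}
interface TableLike {
  name: string;
  fields: FieldLike[];
}

const EMBEDDING_GQL_TYPES = new Set(["Vector", "[Float]"]);

// Names of fields on this table that would get --auto-embed handling.
function embeddingFields(table: TableLike): string[] {
  return table.fields.filter((f) => EMBEDDING_GQL_TYPES.has(f.type.gqlType)).map((f) => f.name);
}

// Generate embedder.ts only if at least one table needs it.
function shouldGenerateEmbedder(tables: TableLike[]): boolean {
  return tables.some((t) => embeddingFields(t).length > 0);
}
```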

devin-ai-integration · 2026-03-28T01:25:23Z

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.


… config
Replace old grant_roles/grant_privileges/policy_* fields with unified:
- grants[]: array of { roles: string[], privileges: unknown[] } objects
- policies[]: array of { $type, data, privileges, policy_role, permissive } objects

Updated files:
- generate-types.ts: BlueprintEntityTableProvision + BlueprintTable
- relation-many-to-many.ts: parameter schema for junction table
- export-utils.ts: secure_table_provision column types
- export.test.ts: test data + snapshots

Companion to constructive-db PR #929 (grants[]) and PR #924 (policies[]).
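
The unified shapes can be written as TypeScript types. roles and privileges on grants follow the description above; the types of $type, data, privileges (on policies), policy_role, and permissive are not stated here, so string, Record<string, unknown>, unknown[], string, and boolean are assumptions, as is the hypothetical isGrant runtime guard.

```typescript
interface Grant {
  roles: string[];
  privileges: unknown[];
}

interface Policy {
  $type: string; // assumed: policy kind identifier
  data: Record<string, unknown>; // assumed: policy-specific parameters
  privileges: unknown[]; // assumed array, mirroring Grant
  policy_role: string; // assumed
  permissive: boolean; // assumed
}

// Hypothetical runtime guard for a single grants[] entry.
function isGrant(value: unknown): value is Grant {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    Array.isArray(v.roles) &&
    v.roles.every((r) => typeof r === "string") &&
    Array.isArray(v.privileges)
  );
}
```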

pyramation added 10 commits March 27, 2026 13:17

Merge branch 'main' into devin/1774614615-cli-e2e-tests

57a6366

Merge branch 'main' into devin/1774614615-cli-e2e-tests

f74576a

fix(server-test): pgvector test checks ok:false instead of rejection

a4ea714


devin-ai-integration Bot assigned pyramation Mar 28, 2026

pyramation added 2 commits March 28, 2026 01:34

feat(codegen): add --auto-embed support for create/update mutations

ceba37c

refactor(codegen): rename --fields to --select for TS SDK consistency

6059956

pyramation merged commit aa11b42 into main Mar 28, 2026
44 checks passed

pyramation deleted the devin/1774660034-pluggable-embeddings branch March 28, 2026 02:02

pyramation mentioned this pull request Apr 20, 2026

feat!: unify grants[] and policies[] in node type registry and export config #1016

Merged

3 tasks

pyramation merged 12 commits into main from devin/1774660034-pluggable-embeddings

pyramation commented Mar 28, 2026 • edited by devin-ai-integration Bot
