feat(codegen): pluggable embedding system with --auto-embed flag [CI] #924

Merged
pyramation merged 12 commits into main from devin/1774660034-pluggable-embeddings
Mar 28, 2026

@pyramation pyramation commented Mar 28, 2026

Summary

Adds a pluggable text-to-vector embedding system for generated CLI commands. When a table has pgvector embedding fields (Vector or [Float] gqlType), the codegen now:

  1. Generates an embedder.ts runtime module using @agentic-kit/ollama — provides resolveEmbedder() (env vars → appstash config → null) and autoEmbedWhere() to convert text values in vector where-clause fields to real embedding vectors.

  2. Injects --auto-embed flag into list and search handlers via a new buildAutoEmbedBlock() AST builder. When passed, the CLI resolves an embedder and converts text queries to vectors before sending to the GraphQL server.

  3. Extends --auto-embed to create and update mutations via autoEmbedInput() and buildAutoEmbedInputBlock(). When --auto-embed is passed on a create/update command, text strings in vector fields are converted to embeddings before the ORM call. This is a CLI-only convenience until server-side triggers/job queues handle it.

  4. Generates field-specific search examples in docs (README, skills) showing --auto-embed usage for vector fields (read and write paths), replacing the previous generic search examples.

  5. Adds CLI e2e tests — verifies error without EMBEDDER_PROVIDER, plus conditional real Ollama/nomic-embed-text integration tests.

  6. Renames --fields to --select across all CLI codegen, templates, docs, and e2e tests for consistency with the TS SDK's select parameter naming.

Embedding is opt-in via --auto-embed — no implicit behavior or field name clobbering. Configuration via EMBEDDER_PROVIDER, EMBEDDER_MODEL, EMBEDDER_BASE_URL env vars (or appstash config equivalents).
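As a rough illustration of the resolution order and the where-clause conversion described above, here is a minimal sketch. It is not the generated embedder.ts: `EmbedFn`, `EmbedderConfig`, and `resolveEmbedderConfig` are hypothetical names, the per-setting fallback from env vars to appstash config is an assumption, and the embedding call is injected as a function so that no @agentic-kit/ollama API is assumed.

```typescript
// Hypothetical shapes for illustration only; the generated embedder.ts may differ.
type EmbedFn = (text: string) => Promise<number[]>;

interface EmbedderConfig {
  provider: string;
  model?: string;
  baseUrl?: string;
}

// Resolution order from the summary: env vars first, then appstash config, else null.
// Falling back setting by setting is an assumption of this sketch.
function resolveEmbedderConfig(
  env: Record<string, string | undefined>,
  stashConfig: Partial<EmbedderConfig> = {},
): EmbedderConfig | null {
  const provider = env['EMBEDDER_PROVIDER'] ?? stashConfig.provider;
  if (!provider) return null; // no provider configured: the caller reports an error
  return {
    provider,
    model: env['EMBEDDER_MODEL'] ?? stashConfig.model,
    baseUrl: env['EMBEDDER_BASE_URL'] ?? stashConfig.baseUrl,
  };
}

// Replace string `vector` values in the where clause with embedding arrays.
// Only the listed vector fields are touched; everything else passes through.
async function autoEmbedWhere(
  where: Record<string, unknown>,
  vectorFields: string[],
  embed: EmbedFn,
): Promise<Record<string, unknown>> {
  for (const field of vectorFields) {
    const clause = where[field];
    if (clause !== null && typeof clause === 'object') {
      const c = clause as Record<string, unknown>;
      const text = c['vector'];
      if (typeof text === 'string') {
        c['vector'] = await embed(text);
      }
    }
  }
  return where;
}
```

With a stub `embed` function, a clause such as `{ embedding: { vector: 'some text', distance: 1.0 } }` would come back with `vector` replaced by a number array while `distance` is left unchanged.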

Updates since last revision

  • --fields → --select rename: All CLI usage text, doc examples, runtime parseSelectFlag() (now reads argv.select), generated snapshots, and e2e tests updated for TS SDK consistency. The internal variable names (select, parseSelectFlag) were already correct; only the user-facing flag name changed.
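To make the rename concrete, here is a hedged sketch of what a helper that reads argv.select could look like. Only the name parseSelectFlag and the fact that it reads argv.select come from this PR; the signature, the comma-separated format handling, and the `{ field: true }` return shape are assumptions for illustration.

```typescript
// Hypothetical sketch: the real parseSelectFlag() signature and return shape may differ.
type SelectMap = Record<string, true>;

function parseSelectFlag(argv: { select?: unknown }, defaults: SelectMap): SelectMap {
  const raw = argv.select;
  // Fall back to the defaults when --select is absent or empty.
  if (typeof raw !== 'string' || raw.trim() === '') return defaults;
  const selected: SelectMap = {};
  for (const part of raw.split(',')) {
    const name = part.trim();
    if (name) selected[name] = true;
  }
  return selected;
}
```

Under these assumptions, `--select id,name` would arrive as `{ select: 'id,name' }` and be turned into a select object with `id` and `name` set.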

Review & Testing Checklist for Human

  • Verify @agentic-kit/ollama API: embedder.ts imports OllamaClient as default export and calls client.generateEmbedding(text, model). If this doesn't match the actual @agentic-kit/ollama@1.0.3 API, every generated CLI embedder will break at runtime. This is the highest-risk item since it was written without runtime verification.
  • Validate { vector: query } key for embedding fields: buildSearchHandler changed the where-clause key from value to vector for embedding category fields (~line 693 of table-command-generator.ts). Confirm the graphile pgvector plugin expects vector not value as the input key for similarity search.
  • Verify autoEmbedInput mutation integration: The auto-embed block is injected into the mutation handler's tryBody after cleanedData is created but before the ORM call. autoEmbedInput mutates data in-place (converts string values in vector fields to embedding arrays). Confirm the generated code positions the block correctly and that mutating cleanedData before passing it to the ORM is safe with the TypeScript type assertions (as CreateXxxInput[...] / as XxxPatch).
  • Run codegen on a table with vector fields and inspect output: Verify the generated if (argv['auto-embed']) { ... } block appears in handleList, handleSearch, handleCreate, and handleUpdate — and that handleDelete does not include it.
  • Spot-check --select rename: Verify that --select id,name works in a generated CLI command (replaces the old --fields flag). The runtime reads argv.select now — confirm minimist parses --select foo into { select: 'foo' } without conflicts.
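For reviewers checking the mutation path, the following sketch shows the in-place behaviour described in the checklist: string values in vector fields are replaced with embedding arrays before the data reaches the ORM call. `prepareCreateInput`, the `EmbedFn` type, and the error message are hypothetical; the generated handlers may be structured differently.

```typescript
// Hypothetical embedding function type; the real embedder is resolved at runtime.
type EmbedFn = (text: string) => Promise<number[]>;

// Mutates `data` in place: string values in vector fields become number arrays.
async function autoEmbedInput(
  data: Record<string, unknown>,
  vectorFields: string[],
  embed: EmbedFn,
): Promise<void> {
  for (const field of vectorFields) {
    const value = data[field];
    if (typeof value === 'string') {
      data[field] = await embed(value);
    }
  }
}

// Hypothetical stand-in for the injected block: it runs after cleanedData is
// built and before the ORM call, and only when --auto-embed was passed.
async function prepareCreateInput(
  argv: Record<string, unknown>,
  cleanedData: Record<string, unknown>,
  vectorFields: string[],
  embedder: EmbedFn | null,
): Promise<Record<string, unknown>> {
  if (argv['auto-embed']) {
    if (!embedder) {
      // The message text is illustrative, not the CLI's actual wording.
      throw new Error('--auto-embed requires an embedder (set EMBEDDER_PROVIDER)');
    }
    await autoEmbedInput(cleanedData, vectorFields, embedder);
  }
  return cleanedData; // the same object, now safe to hand to the ORM call
}
```

Because the object is mutated rather than copied, any type assertion applied afterwards sees number arrays in the vector fields, which is the property the checklist item asks reviewers to confirm.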

Notes

  • The Ollama e2e tests skip gracefully when Ollama isn't running — only the "no provider" error test runs unconditionally.
  • The lockfile diff is large due to @agentic-kit/ollama resolution cascading through pnpm deduplication — not a code concern but worth a glance.
  • autoEmbedInput is intentionally a temporary CLI-only convenience. The long-term plan is server-side triggers/job queues for embedding generation.
  • buildSearchExamples / buildSearchExamplesMarkdown in docs-utils.ts now include create/update examples alongside the existing search examples.
  • The --fields → --select rename was done for consistency with the TS SDK's select parameter. All 7 affected snapshots were regenerated and pass.

Link to Devin session: https://app.devin.ai/sessions/c92c3a11450342f8875625a60fa1be28
Requested by: @pyramation

Tests generated CLI commands (codegen → transpile → execute) against a
running PostgreSQL database with a real GraphQL server.

- Uses ts.transpileModule to strip types without resolving imports
- Uses async spawn (not execFileSync) to keep event loop unblocked
- Sets up appstash context pointing at test server endpoint
- Resolves NODE_PATH for pnpm's strict module isolation

5 focused corner-case tests:
1. Paginated list with --where (dot-notation) + --fields
2. Cursor-based forward pagination (--after)
3. find-first with --where.name.equalTo
4. Combined --where + --orderBy + --fields
5. Empty result set handling
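
The async spawn approach mentioned in this commit can be sketched roughly as follows. This is not the test suite's actual helper: the `runCli` signature and the `CliRun` result shape here are assumptions, and the real harness also transpiles the generated code and sets up the appstash context first.

```typescript
import { spawn } from 'node:child_process';

// Hypothetical result shape for one CLI invocation.
interface CliRun {
  code: number | null;
  stdout: string;
  stderr: string;
}

// Run a Node script as a child process without blocking the event loop,
// so a GraphQL server living in the same test process can keep responding.
function runCli(
  script: string,
  args: string[],
  env: Record<string, string> = {},
): Promise<CliRun> {
  return new Promise((resolve, reject) => {
    const child = spawn(process.execPath, [script, ...args], {
      env: { ...process.env, ...env },
    });
    let stdout = '';
    let stderr = '';
    child.stdout.on('data', (chunk: Buffer) => { stdout += chunk.toString(); });
    child.stderr.on('data', (chunk: Buffer) => { stderr += chunk.toString(); });
    child.on('error', reject);
    child.on('close', (code) => resolve({ code, stdout, stderr }));
  });
}
```

A synchronous call such as execFileSync would block the parent's event loop for the duration of the child process, which would be a problem whenever the server under test runs in that same process.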
…SE_DIR, bump appstash 0.7.0

- Type buildAnimalsTable() with proper Table interface from codegen
- Replace internal path.join require hacks with proper package exports
- Use APPSTASH_BASE_DIR env var instead of overriding HOME
- Bump appstash to ^0.7.0 for APPSTASH_BASE_DIR support
- Fix cliEntryPoint -> entryPoint config property

Suite 2 — Search CLI (6 tests):
- tsvector search via --where.tsvTsv (dot-notation passthrough)
- trgm fuzzy matching via --where.trgmTitle
- composite fullTextSearch filter
- search + pagination (--limit)
- pgvector similarity (conditional, skip if unavailable)
- _meta query from live server (MetaSchemaPlugin verification)

Uses search-seed fixture (5 articles with tsvector, pg_trgm, optional pgvector).
All search tests use list --where dot-notation to pass filter field names
directly to the server, testing the full pipeline: codegen -> transpile ->
spawn child process -> ORM findMany -> GraphQL -> real PostgreSQL.

- pgvector test: vector arrays can't be passed via CLI dot-notation
  (they become strings, not JSON arrays). Changed test to verify the
  CLI reports a clear GraphQL type error rather than crashing silently.
- _meta test: replaced with schema introspection test since the
  search-seed server doesn't load MetaSchemaPlugin (enableServicesApi
  is false). New test verifies Article type exposes expected search
  fields (tsvRank, titleTrgmSimilarity, bodyTrgmSimilarity, searchScore,
  and conditionally pgvector fields).
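
The string-versus-array issue noted in the pgvector bullet can be seen in a simplified dot-notation parser. This sketch is a stand-in for the real argument parser, not its implementation: `parseDotFlags` is a hypothetical name, and unlike a full parser it keeps every value as a string and only handles `--flag value` pairs.

```typescript
// Simplified stand-in for dot-notation flag parsing: "--a.b.c value" pairs
// become nested objects, and every value stays a string.
function parseDotFlags(args: string[]): Record<string, unknown> {
  const root: Record<string, unknown> = {};
  for (let i = 0; i + 1 < args.length; i += 2) {
    const flag = args[i];
    const value = args[i + 1];
    if (flag === undefined || value === undefined || !flag.startsWith('--')) continue;
    const path = flag.slice(2).split('.');
    const last = path[path.length - 1];
    if (last === undefined) continue;
    let node = root;
    for (const key of path.slice(0, -1)) {
      const next = node[key];
      if (next === null || typeof next !== 'object') {
        node[key] = {};
      }
      node = node[key] as Record<string, unknown>;
    }
    node[last] = value; // '[0.1,0.2]' is stored as text, not parsed as JSON
  }
  return root;
}

const parsed = parseDotFlags(['--where.embedding.vector', '[0.1,0.2]']);
const where = parsed['where'] as Record<string, Record<string, unknown>>;
console.log(typeof where['embedding']?.['vector']); // prints "string"
```

Since the value reaches the GraphQL layer as a string rather than a list of floats, a type error from the server is the expected outcome, which is what the revised test checks for.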
The CLI exits with code 0 even on GraphQL errors, returning
{ ok: false, errors: [...] }. Updated the test to check the
response content instead of expecting the promise to reject.
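
A content-based check of this kind might look like the sketch below. It assumes the CLI prints its result as a single JSON document on stdout and that each error carries a `message` string; both are assumptions, and `expectGraphQLError` is a hypothetical helper rather than the test suite's own code.

```typescript
// Assumed response envelope; only `ok` and `errors` are mentioned in the commit.
interface CliResponse {
  ok: boolean;
  errors?: Array<{ message: string }>;
}

function parseCliResponse(stdout: string): CliResponse {
  const parsed: unknown = JSON.parse(stdout);
  if (
    parsed === null ||
    typeof parsed !== 'object' ||
    typeof (parsed as { ok?: unknown }).ok !== 'boolean'
  ) {
    throw new Error('unexpected CLI output shape');
  }
  return parsed as CliResponse;
}

// The exit code is 0 even on GraphQL errors, so the payload is inspected
// instead of waiting for a rejected promise or a non-zero exit.
function expectGraphQLError(exitCode: number | null, stdout: string): string[] {
  if (exitCode !== 0) throw new Error(`expected exit code 0, got ${String(exitCode)}`);
  const response = parseCliResponse(stdout);
  if (response.ok) throw new Error('expected ok: false');
  return (response.errors ?? []).map((e) => e.message);
}
```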
…eral for runner script

- Moved runCli() from inline per-suite to a shared module-level function
  that takes (distDir, tmpHome, ...args) — eliminates 60 lines of duplication
- Replaced string[] array joined with newlines with a readable template
  literal (RUNNER_SCRIPT constant)
- Fixed duplicate JSDoc comment on setupAppstashContext
- Removed stale Suite 3 reference from file header

When a table has search-capable fields (tsvector, trgm, BM25, pgvector),
the generated README and skill references now include concrete CLI
examples showing the exact dot-notation flags for each search type:

- tsvector:  --where.<field> "query"
- trgm:     --where.trgm<Base>.value "query" --where.trgm<Base>.threshold 0.3
- BM25:     --where.bm25<Base>.query "query"
- pgvector: --where.<field>.vector '[...]' --where.<field>.distance 1.0
- composite: --where.fullTextSearch "query"

Also adds a combined search + pagination example. Field-name derivation
mirrors buildSearchHandler so examples always match the generated code.

Integrated into all four generators: single-target README, single-target
skills, multi-target README, and multi-target skills.

Adds a pluggable text-to-vector embedding system for CLI search commands:

- New embedder.ts template using @agentic-kit/ollama for Ollama provider
- resolveEmbedder() resolves from env vars or appstash config
- autoEmbedWhere() converts text values to vector embeddings in where clauses
- --auto-embed flag on list and search commands for tables with vector fields
- Embedder module conditionally generated when tables have embedding fields
- Updated docs-generator with --auto-embed examples for vector fields
- CLI e2e tests for embedder: error without provider, real Ollama integration

@devin-ai-integration

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

@pyramation pyramation merged commit aa11b42 into main Mar 28, 2026
44 checks passed
@pyramation pyramation deleted the devin/1774660034-pluggable-embeddings branch March 28, 2026 02:02
devin-ai-integration Bot pushed a commit that referenced this pull request Apr 20, 2026
… config

Replace old grant_roles/grant_privileges/policy_* fields with unified:
- grants[]: array of { roles: string[], privileges: unknown[] } objects
- policies[]: array of { $type, data, privileges, policy_role, permissive } objects

Updated files:
- generate-types.ts: BlueprintEntityTableProvision + BlueprintTable
- relation-many-to-many.ts: parameter schema for junction table
- export-utils.ts: secure_table_provision column types
- export.test.ts: test data + snapshots

Companion to constructive-db PR #929 (grants[]) and PR #924 (policies[]).