Phase 5: in-core vector search (vector type + HNSW) + composition #37

Merged
rezabaita12 merged 1 commit into main from claude/admiring-pasteur-so1pq3
Jun 21, 2026
rezabaita12 (Contributor) commented Jun 21, 2026

Implements the Capabilities phase (spec 12 / roadmap §Phase 5). The one
capability built INTO the engine is vector search; better-auth/PostgREST/DuckDB
are composed AROUND it and demonstrated as glue, never welded into the core.

Built in (the headline):

  • vector(N) type — fixed-length f32, dimension declared at column time and
    validated on insert (value.rs / exec.rs::check_vector_dims).
  • HNSW access method (vector.rs) — multi-layer navigable small world,
    dependency-free. It is a DERIVED structure over the column's vectors, rebuilt
    from the WAL on open exactly like the MVCC row store — so it rides the existing
    Storage seam (STORAGE_TRAIT_VERSION stays 2). That is why branching branches the
    index and scale-to-zero re-warms it for free; no side file, no separate
    durability path.
  • Distance operators <-> (L2), <=> (cosine), <#> (inner product), usable in
    projection, WHERE, and ORDER BY.
  • Top-k nearest-neighbour answered by the index (exec.rs::knn_select): recognizes
    ORDER BY <col> <dist-op> <q> ASC LIMIT k, searches the HNSW graph, then
    over-fetches and MVCC-filters the candidates; when no index exists, it falls
    back to a brute-force scan + sort.
  • SQL surface: CREATE INDEX … USING hnsw (col) WITH (m, ef_construction, ef_search, metric) (+ pgvector opclass), DROP INDEX, vector literals [..].
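
For readers new to the operators, here is a minimal Rust sketch of the semantics listed above: the dimension check, the three distance metrics, the brute-force fallback, and the over-fetch-then-filter step. It is not the code in value.rs, exec.rs, or vector.rs. Every name (`Metric`, `check_dims`, `distance`, `brute_force_knn`, `filter_visible`) is hypothetical. It also assumes pgvector's convention that `<#>` yields the negated inner product so that ascending order puts the best match first; the PR does not state which convention the engine uses.

```rust
/// Metrics matching the three operators (names are illustrative only).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Metric {
    L2,           // <->
    Cosine,       // <=>
    InnerProduct, // <#>
}

/// Reject a value whose length differs from the declared vector(N).
pub fn check_dims(declared: usize, value: &[f32]) -> Result<(), String> {
    if value.len() == declared {
        Ok(())
    } else {
        Err(format!("expected vector({declared}), got {} dimensions", value.len()))
    }
}

fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Smaller is closer for every metric, so ORDER BY ... ASC works uniformly.
pub fn distance(metric: Metric, a: &[f32], b: &[f32]) -> f32 {
    match metric {
        Metric::L2 => a
            .iter()
            .zip(b)
            .map(|(x, y)| (x - y) * (x - y))
            .sum::<f32>()
            .sqrt(),
        // Cosine distance = 1 - cosine similarity (NaN if either vector is all zeros).
        Metric::Cosine => 1.0 - dot(a, b) / (dot(a, a).sqrt() * dot(b, b).sqrt()),
        // Assumption: negated, as in pgvector.
        Metric::InnerProduct => -dot(a, b),
    }
}

/// Fallback path: score every row, sort ascending by distance, keep k.
pub fn brute_force_knn(
    rows: &[(u64, Vec<f32>)],
    query: &[f32],
    metric: Metric,
    k: usize,
) -> Vec<(u64, f32)> {
    let mut scored: Vec<(u64, f32)> = rows
        .iter()
        .map(|(id, v)| (*id, distance(metric, v, query)))
        .collect();
    scored.sort_by(|a, b| a.1.total_cmp(&b.1));
    scored.truncate(k);
    scored
}

/// Index path, second half: given more than k candidates (already sorted by
/// distance), drop rows the current snapshot cannot see and keep the first k.
pub fn filter_visible(
    candidates: Vec<(u64, f32)>,
    is_visible: impl Fn(u64) -> bool,
    k: usize,
) -> Vec<(u64, f32)> {
    candidates
        .into_iter()
        .filter(|(id, _)| is_visible(*id))
        .take(k)
        .collect()
}
```

Over-fetching matters because an approximate index knows nothing about row visibility: if it returned exactly k candidates and some were invisible to the reading transaction, the query would come back short.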

Composed around (placement only, no core code): better-auth-style auth state
persisted in the embedded engine, and a DuckDB-readable columnar snapshot (the
materialization job) — clients/bun/examples/{vector-memory,compose}.ts. PostgREST
attaches over server mode (Phase 3) unchanged.

ABI: ENGINE_ABI_VERSION 2 -> 3 (vector type, HNSW, distance operators, the v…
bind encoding; no C symbols added or removed). Bun EXPECTED_ABI_VERSION and
engine.h updated.
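
As an illustration of why the client-side constant has to move in lockstep with the engine, a version handshake might look roughly like the following. This is a hypothetical Rust sketch (the real check lives in the Bun client); `check_abi` and its error text are assumptions, and only the value 3 comes from this PR.

```rust
/// ABI version the client was built against (3 after this PR).
pub const EXPECTED_ABI_VERSION: u32 = 3;

/// Hypothetical handshake: refuse to talk to an engine whose reported ABI
/// version differs from the one the client expects.
pub fn check_abi(reported: u32) -> Result<(), String> {
    if reported == EXPECTED_ABI_VERSION {
        Ok(())
    } else {
        Err(format!(
            "engine ABI version {reported} does not match expected {EXPECTED_ABI_VERSION}"
        ))
    }
}
```

A strict equality check like this fails loudly when a stale client meets a newer engine, which is the safer default when value encodings change even though no C symbols do.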

Tests: crates/engine/tests/vector.rs (type + dim validation, distance operators,
HNSW-vs-brute-force parity, WHERE-filtered/MVCC KNN, branch isolation, rebuild
from WAL, rollback tombstoning, exact-nearest over a larger set), the C-ABI
vector test in ffi.rs, and clients/bun/test/vector.test.ts. Docs: PHASE5.md +
README/CLAUDE updates.
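
One common way to express index-vs-brute-force parity is recall@k: the fraction of the exact top-k ids that the approximate search also returned. The sketch below is an assumption about how such a check could be written, not a description of what crates/engine/tests/vector.rs actually asserts; `recall_at_k` is a hypothetical helper.

```rust
use std::collections::HashSet;

/// Fraction of the exact top-k ids that also appear in the approximate result.
/// Returns 1.0 when there is no ground truth to miss.
pub fn recall_at_k(approx: &[u64], exact: &[u64]) -> f64 {
    if exact.is_empty() {
        return 1.0;
    }
    let truth: HashSet<u64> = exact.iter().copied().collect();
    let hits = approx.iter().filter(|id| truth.contains(*id)).count();
    hits as f64 / exact.len() as f64
}
```

A test on a small dataset can demand a recall of exactly 1.0, while a larger one would typically accept a threshold slightly below it, since HNSW is approximate by design.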

Linked issues — Phase 5 (epic #5)

Part of epic #5.

🤖 Generated with Claude Code

https://claude.ai/code/session_01QGAZRf27jwaktS1WBiswSQ

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01QGAZRf27jwaktS1WBiswSQ