Phase 5: in-core vector search (vector type + HNSW) + composition #37
Merged
Implements the Capabilities phase (spec 12 / roadmap §Phase 5). The one
capability built INTO the engine is vector search; better-auth/PostgREST/DuckDB
are composed AROUND it and demonstrated as glue, never welded into the core.
Built in (the headline):
- `vector(N)` type — fixed-length f32, dimension declared at column time and
validated on insert (value.rs / exec.rs::check_vector_dims).
- HNSW access method (vector.rs): multi-layer navigable small world,
dependency-free. It is a DERIVED structure over the column's vectors, rebuilt from the
WAL on open exactly like the MVCC row store — so it rides the existing
Storage seam (STORAGE_TRAIT_VERSION stays 2). That is why branching branches
the index and scale-to-zero re-warms it for free; no side file, no separate
durability path.
- Distance operators `<->` (L2), `<=>` (cosine), `<#>` (inner product), usable in
projection, WHERE, and ORDER BY.
- Top-k nearest-neighbour answered by the index (exec.rs::knn_select): recognizes
`ORDER BY <col> <dist-op> <q> ASC LIMIT k`, searches the HNSW graph, over-fetches
and MVCC-filters; falls back to a brute-force scan + sort when no index exists.
- SQL surface: `CREATE INDEX … USING hnsw (col) WITH (m, ef_construction,
ef_search, metric)` (+ pgvector opclass), `DROP INDEX`, vector literals `[..]`.
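
As a rough illustration of the built-in pieces listed above, here is a minimal, standalone sketch of the three distance operators, the dimension check, the brute-force fallback, and an over-fetch loop with a visibility filter. It is not the engine's code: every function name and signature below is hypothetical, `<#>` is assumed to return the negated inner product (pgvector's convention, so that `ASC` ordering puts the best match first), and the visibility predicate is only a stand-in for the real MVCC check.

```rust
/// Signature shared by the three distance functions below (hypothetical).
pub type Dist = fn(&[f32], &[f32]) -> f32;

/// `<->`: Euclidean (L2) distance.
pub fn l2(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum::<f32>().sqrt()
}

/// `<=>`: cosine distance, 1 - cos(a, b). Yields NaN for a zero vector.
pub fn cosine_distance(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|y| y * y).sum::<f32>().sqrt();
    1.0 - dot / (na * nb)
}

/// `<#>`: ASSUMED to be the negated inner product, as in pgvector.
pub fn neg_inner_product(a: &[f32], b: &[f32]) -> f32 {
    -a.iter().zip(b).map(|(x, y)| x * y).sum::<f32>()
}

/// Insert-time check: the value must match the declared `vector(N)` length.
pub fn check_dims(declared: usize, v: &[f32]) -> Result<(), String> {
    if v.len() == declared {
        Ok(())
    } else {
        Err(format!("expected {} dimensions, got {}", declared, v.len()))
    }
}

/// Fallback without an index: score every row, sort ascending, keep k.
pub fn brute_force_knn(rows: &[(u64, Vec<f32>)], q: &[f32], k: usize, dist: Dist) -> Vec<(u64, f32)> {
    let mut scored: Vec<(u64, f32)> = rows.iter().map(|(id, v)| (*id, dist(v, q))).collect();
    scored.sort_by(|a, b| a.1.total_cmp(&b.1));
    scored.truncate(k);
    scored
}

/// Over-fetch loop: request more than k candidates from `search`, drop the
/// ones `visible` rejects, and widen the request until k visible rows are
/// found or all `indexed` entries have been requested.
pub fn knn_visible(
    k: usize,
    indexed: usize,
    search: impl Fn(usize) -> Vec<(u64, f32)>,
    visible: impl Fn(u64) -> bool,
) -> Vec<(u64, f32)> {
    let mut fetch = (k * 2).max(1);
    loop {
        let hits: Vec<(u64, f32)> = search(fetch)
            .into_iter()
            .filter(|(id, _)| visible(*id))
            .take(k)
            .collect();
        if hits.len() >= k || fetch >= indexed {
            return hits;
        }
        fetch = (fetch * 2).min(indexed);
    }
}
```

In this sketch `search` stands in for the HNSW graph search; passing a closure over `brute_force_knn` exercises the same loop without an index.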
Composed around (placement only, no core code): better-auth-style auth state
persisted in the embedded engine, and a DuckDB-readable columnar snapshot
(the materialization job) — clients/bun/examples/{vector-memory,compose}.ts.
PostgREST attaches over server mode (Phase 3) unchanged.
ABI: ENGINE_ABI_VERSION 2 -> 3 (vector type, HNSW, distance operators, the
`v…` bind encoding; no C symbols added or removed). Bun EXPECTED_ABI_VERSION
and engine.h updated.
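
For readers unfamiliar with the version handshake, the sketch below shows the kind of check a client could run before using the engine: compare the version it was built against with the one the loaded library reports, and refuse to proceed on a mismatch. Only the constant name and the value 3 come from this PR; `check_abi` and the way the reported version is obtained are hypothetical.

```rust
/// Version the client was built against (this PR bumps it from 2 to 3).
pub const EXPECTED_ABI_VERSION: u32 = 3;

/// Hypothetical guard: `reported` is whatever version the loaded engine
/// library announces. Any mismatch is treated as fatal, because the vector
/// type and bind encoding changed without any C symbol changing.
pub fn check_abi(reported: u32) -> Result<(), String> {
    if reported == EXPECTED_ABI_VERSION {
        Ok(())
    } else {
        Err(format!(
            "engine ABI mismatch: client expects {}, engine reports {}",
            EXPECTED_ABI_VERSION, reported
        ))
    }
}
```

A symbol-level check would not catch this bump, since no C symbols were added or removed; an explicit version comparison does.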
Tests: crates/engine/tests/vector.rs (type + dim validation, distance operators,
HNSW-vs-brute-force parity, WHERE-filtered/MVCC KNN, branch isolation, rebuild
from WAL, rollback tombstoning, exact-nearest over a larger set), the C-ABI
vector test in ffi.rs, and clients/bun/test/vector.test.ts. Docs: PHASE5.md +
README/CLAUDE updates.
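
One common way to express an HNSW-vs-brute-force parity check is recall@k: the fraction of the exact top-k that the approximate search also returned. The helper below is a sketch of that measure only; it is not taken from crates/engine/tests/vector.rs, the name `recall_at_k` is hypothetical, and it assumes row ids are unique within each result list.

```rust
use std::collections::HashSet;

/// Fraction of the exact (brute-force) ids that also appear in the
/// approximate (index) result. Returns 1.0 when there is nothing to find.
pub fn recall_at_k(approx: &[u64], exact: &[u64]) -> f64 {
    if exact.is_empty() {
        return 1.0;
    }
    let truth: HashSet<u64> = exact.iter().copied().collect();
    let hits = approx.iter().filter(|id| truth.contains(*id)).count();
    hits as f64 / exact.len() as f64
}
```

A recall of 1.0 means the index returned the same row set as the scan, regardless of order; an exact-nearest test would additionally compare the first id of each list.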
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01QGAZRf27jwaktS1WBiswSQ
This was linked to issues on Jun 21, 2026.
This was unlinked from issues on Jun 21, 2026.
This was linked to issues on Jun 21, 2026.
Linked issues — Phase 5 (epic #5)
Part of epic #5.
- `Storage` trait). Fully implemented and tested: branching branches the index, and
  scale-to-zero re-warms it from the WAL.
- state persisted in the embedded engine, branching with the DB) in
  clients/bun/examples/compose.ts. The better-auth library itself is adopted
  unmodified, not vendored; referenced here, not auto-closed.
- with no engine changes; documented in docs/PHASE5.md. No PostgREST process is
  shipped in-repo; referenced, not auto-closed.
- DuckDB-readable columnar snapshot, published atomically) ships in
  clients/bun/examples/compose.ts; DuckDB and Parquet/Iceberg are adopted
  unmodified; referenced, not auto-closed.
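
"Published atomically" usually means readers see either the old snapshot or the new one, never a partial file. A common way to get that on a single filesystem is to write to a temporary sibling and rename it over the destination. The sketch below shows that pattern in Rust; it is not the code in clients/bun/examples/compose.ts, and the function name, the `.tmp` suffix, and the file name used in the usage note are all hypothetical.

```rust
use std::fs;
use std::io::Write;
use std::path::Path;

/// Write `bytes` to a temporary sibling of `dest`, flush it to disk, then
/// rename it into place. The rename replaces any previous snapshot in one
/// step, assuming both paths are on the same filesystem.
pub fn publish_atomically(dest: &Path, bytes: &[u8]) -> std::io::Result<()> {
    let tmp = dest.with_extension("tmp");
    {
        let mut file = fs::File::create(&tmp)?;
        file.write_all(bytes)?;
        file.sync_all()?; // make sure the data is durable before the swap
    }
    fs::rename(&tmp, dest)
}
```

A materialization job would call something like `publish_atomically(Path::new("snapshot.parquet"), &encoded)` once the columnar bytes are fully encoded, so a concurrent reader never opens a half-written snapshot.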
🤖 Generated with Claude Code
https://claude.ai/code/session_01QGAZRf27jwaktS1WBiswSQ