
refactor(kvs/lance): unify arrow type tree (drop lance::deps workaround)#4

Merged
AdaWorldAPI merged 1 commit into main from claude/phase-2-multibucket on May 16, 2026

@AdaWorldAPI (Owner)

Summary

Cleanup: drop the lance::deps::arrow_* re-export workaround introduced during the lance-1.0.4 era and unify on plain arrow_array / arrow_schema imports. Resolves the "Arrow type-tree split (workaround in place)" entry in .claude/lance-backend/KNOWN_DIFFERENCES.md.

Why now

Sprint L (#1) pinned lance = "4.0" and both arrow-array and arrow-schema to "57". Verified against Cargo.lock and the lance-4.0.0 source:

  • lance 4.0 depends on arrow-array/arrow-schema = "57.0.0"
  • Our pin resolves to 57.3.1, which satisfies lance's "57.0.0" requirement (same major version)
  • Cargo therefore links a single copy of each crate, shared by lance and by us → same TypeIds → fully interoperable at the Rust type level
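Cargo reads a bare requirement such as "57" or "57.0.0" as a caret requirement, and it will not link two semver-compatible versions of the same crate into one build. The sketch below models only the non-zero-major caret rule with a hypothetical `caret_matches` helper (an illustration, not Cargo's resolver) to show why lance 4.0's "57.0.0" and our "57" can share 57.3.1, while the older "55" and "56" could not.

```rust
/// A `MAJOR.MINOR.PATCH` version; pre-release and build metadata are not modeled.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct Version {
    major: u64,
    minor: u64,
    patch: u64,
}

/// Parses "57", "57.0" or "57.3.1"; missing components default to 0.
fn parse(s: &str) -> Option<Version> {
    let mut parts = s.split('.').map(|p| p.parse::<u64>().ok());
    let major = parts.next()??;
    let minor = parts.next().unwrap_or(Some(0))?;
    let patch = parts.next().unwrap_or(Some(0))?;
    if parts.next().is_some() {
        return None; // more than three components
    }
    Some(Version { major, minor, patch })
}

/// Hypothetical helper: caret rule for a non-zero major version,
/// i.e. "X.Y.Z" accepts any version >= X.Y.Z and < (X+1).0.0.
/// Returns None for 0.x requirements, whose caret rules differ.
fn caret_matches(req: &str, candidate: &str) -> Option<bool> {
    let r = parse(req)?;
    let c = parse(candidate)?;
    if r.major == 0 {
        return None;
    }
    // Same major version, and not older than the requirement.
    Some(c.major == r.major && c >= r)
}

fn main() {
    // lance 4.0 requires "57.0.0", the workspace pins "57": both accept 57.3.1.
    assert_eq!(caret_matches("57.0.0", "57.3.1"), Some(true));
    assert_eq!(caret_matches("57", "57.3.1"), Some(true));
    // lance 1.0.4 era: a "55" requirement cannot be satisfied by any 56.x release.
    assert_eq!(caret_matches("55", "56.0.0"), Some(false));
    println!("ok");
}
```

Because every requirement in play has a non-zero major version, the simplified rule is enough here; 0.x crates would need the stricter minor-version rule.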

The workaround was needed when:

  • Our pin: arrow = "55"
  • lance 1.0.4 internally: arrow = "56"
  • Two arrow copies in the build → distinct TypeIds → couldn't pass our types to lance's APIs without conversion → routed through lance::deps::arrow_* re-exports (which gave us lance's v56 types)

After Sprint L's bump, both are v57. The indirection is dead code.
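The split that made the workaround necessary can be reproduced with `std::any` alone. In this sketch the modules `arrow_v55` and `arrow_v56` are hypothetical stand-ins for the two copies of arrow-array in the lance 1.0.4-era build, and `column_from_lance` stands in for a column lance hands back already erased to `dyn Any` (as arrow-rs's `Array::as_any` does). Identically defined types from different copies get different `TypeId`s, so a downcast to the wrong copy compiles and then returns `None` at run time.

```rust
use std::any::{Any, TypeId};

// Hypothetical stand-ins for two semver-incompatible copies of arrow-array
// linked into one binary; each copy defines its own `BinaryArray`.
#[allow(dead_code)]
mod arrow_v55 {
    pub struct BinaryArray(pub Vec<Vec<u8>>);
}
mod arrow_v56 {
    pub struct BinaryArray(pub Vec<Vec<u8>>);
}

/// Hypothetical: a `val` column built by lance from its own arrow copy,
/// already type-erased the way `Array::as_any()` erases it.
fn column_from_lance() -> Box<dyn Any> {
    Box::new(arrow_v56::BinaryArray(vec![b"val".to_vec()]))
}

fn main() {
    // Same name, same layout, different crate copy: different TypeIds.
    assert_ne!(
        TypeId::of::<arrow_v55::BinaryArray>(),
        TypeId::of::<arrow_v56::BinaryArray>()
    );

    let col = column_from_lance();
    // Downcasting to "our" copy compiles, but fails at run time.
    assert!(col.downcast_ref::<arrow_v55::BinaryArray>().is_none());
    // Downcasting to lance's copy (what lance::deps::arrow_array exposed) succeeds.
    let vals = col
        .downcast_ref::<arrow_v56::BinaryArray>()
        .expect("same TypeId, so the downcast succeeds");
    assert_eq!(vals.0[0], b"val".to_vec());
    println!("ok");
}
```

Passing our `BinaryArray` where lance expected its own would instead fail at compile time with a type mismatch; the downcast case is the quieter failure of the two.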

What changes (one file, six sites)

surrealdb/core/src/kvs/lance/mod.rs:

| Site | Before | After |
|---|---|---|
| Datastore::new (empty RecordBatch builder) | lance::deps::arrow_schema::Schema::new(...) + lance::deps::arrow_array::RecordBatchIterator::new(...) | arrow_schema::Schema::new(...) + arrow_array::RecordBatchIterator::new(...) |
| Transaction::commit (merge_insert source) | lance::deps::arrow_array::RecordBatchIterator::new(...) | arrow_array::RecordBatchIterator::new(...) |
| Transaction::get (val downcast) | .downcast_ref::<lance::deps::arrow_array::BinaryArray>() | .downcast_ref::<arrow_array::BinaryArray>() |
| Transaction::build_write_batch_lance (return + body imports) | lance::deps::arrow_array::* + lance::deps::arrow_schema::* | arrow_array::* + arrow_schema::* |
| Transaction::scan_impl (2× downcasts, key + val) | same as get | same as get |

The doc-comment on build_write_batch_lance is updated to record the unification; the remaining mentions of lance::deps are commentary about the prior workaround for future readers.
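The downcast sites in the table (`get` and the two in `scan_impl`) share one shape: take a column as an `ArrayRef`, go through `as_any()`, and `downcast_ref` to `BinaryArray`. A minimal sketch of that shape follows. `Array`, `ArrayRef`, `BinaryArray` and `Int64Array` here are simplified local stand-ins for the arrow-rs types (so the block runs without dependencies), and `read_val` is a hypothetical helper, not the actual `Transaction::get` code.

```rust
use std::any::Any;
use std::sync::Arc;

// Simplified local stand-ins for arrow-rs types. The real
// `arrow_array::Array` trait also provides `as_any(&self) -> &dyn Any`.
trait Array {
    fn as_any(&self) -> &dyn Any;
}
type ArrayRef = Arc<dyn Array>;

struct BinaryArray {
    values: Vec<Vec<u8>>,
}
impl BinaryArray {
    fn value(&self, i: usize) -> &[u8] {
        &self.values[i]
    }
}
impl Array for BinaryArray {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

#[allow(dead_code)]
struct Int64Array(Vec<i64>);
impl Array for Int64Array {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

/// Hypothetical helper with the same shape as the get / scan_impl sites:
/// downcast the column to BinaryArray, then copy out one row.
/// Returns None if the column is not a BinaryArray.
fn read_val(col: &ArrayRef, row: usize) -> Option<Vec<u8>> {
    let bin = col.as_any().downcast_ref::<BinaryArray>()?;
    Some(bin.value(row).to_vec())
}

fn main() {
    let val_col: ArrayRef = Arc::new(BinaryArray { values: vec![b"hello".to_vec()] });
    assert_eq!(read_val(&val_col, 0), Some(b"hello".to_vec()));

    // A column of the wrong type yields None rather than a compile error.
    let wrong: ArrayRef = Arc::new(Int64Array(vec![42]));
    assert_eq!(read_val(&wrong, 0), None);
    println!("ok");
}
```

After this PR, the type named in `downcast_ref::<...>()` and the type lance actually stored in the column come from the same arrow_array crate, so the downcast succeeds for a binary column.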

Risk

Minimal. lance::deps::arrow_array::T is a re-export of arrow_array::T, and with a single arrow-array 57.3.1 in the build both paths name the same type (same crate, same version). The change is local to import paths; no semantics shift.
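The risk argument rests on lance::deps::arrow_* being re-exports: a `pub use` re-export is an alias for the original item, not a copy, so once only one arrow-array is in the build both paths name the same type. The sketch below models that with hypothetical local modules named after the real paths; the `identity` function only compiles because the two paths denote one type.

```rust
use std::any::TypeId;

// Hypothetical stand-in for the arrow-array crate.
pub mod arrow_array {
    pub struct BinaryArray;
}

// Hypothetical stand-in for lance's `deps` module, which re-exports the
// arrow crate lance itself builds against.
pub mod lance {
    pub mod deps {
        pub use crate::arrow_array;
    }
}

// Compiles only because both paths denote one type:
// a `pub use` re-export is an alias, not a copy.
fn identity(x: lance::deps::arrow_array::BinaryArray) -> arrow_array::BinaryArray {
    x
}

fn main() {
    assert_eq!(
        TypeId::of::<lance::deps::arrow_array::BinaryArray>(),
        TypeId::of::<arrow_array::BinaryArray>()
    );
    let _value: arrow_array::BinaryArray = identity(lance::deps::arrow_array::BinaryArray);
    println!("ok");
}
```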

Test plan

  • CI runs cargo check --features kv-lance --no-default-features: expect 0 errors. The local target directory was cleaned before this commit, so, per the maintainer's call, the full rebuild and test sweep are left to CI.
  • CI runs cargo test --features "kv-lance kv-mem" --no-default-features --lib kvs::lance: expect the existing 50+ kv-lance tests to still pass (no semantic change).
  • CI default build untouched (no changes to feature gates or feature-default surfaces).

Followups

The remaining items in KNOWN_DIFFERENCES.md after this PR:

  • Mixed writes+deletes commit still non-atomic (Dataset::delete separate from MergeInsertBuilder).
  • ScanLimit::Bytes proper byte accounting.
  • Concurrent-transaction property test.
  • Run wider upstream integration suite under SURREAL_TEST_KV=lance.
  • Multi-bucket BindSpace sharding (Phase 2 — kv-lance throughput).
  • lance-graph Cypher engine as SurrealQL function (Phase 3).
  • blasgraph GraphBLAS algebra (Phase 3).
  • Benchmarks vs RocksDB / SurrealKv.

The vector-index SIMD work from #2 + #3 (cosine, euclidean, manhattan, chebyshev, pearson via ndarray-hpc) is independent and already merged.


Generated by Claude Code

…ps workaround)

Sprint L pinned lance = "4.0" + arrow-array = "57" / arrow-schema = "57".
Lance 4.0 internally uses arrow 57.0.0, which resolves to the same
57.3.1 as our pin. The two are the SAME crate-version at the Rust
type level — same TypeIds, fully interoperable.

Previously (lance 1.0.4 era), our pin was arrow = "55" while lance
used v56 internally — distinct TypeIds, so we couldn't pass our arrow
types to lance's APIs without conversion. The workaround was to route
six sites through `lance::deps::arrow_*` re-exports.

That workaround is no longer needed. Sprint R drops it:

1. Datastore::new — empty RecordBatchIterator construction.
2. Transaction::commit — RecordBatchIterator for merge_insert source.
3. Transaction::get — BinaryArray downcast for val column extraction.
4. Transaction::build_write_batch_lance — RecordBatch + Schema + arrays.
5,6. Transaction::scan_impl — 2× BinaryArray downcasts.

All 6 now use plain `arrow_array::*` / `arrow_schema::*` imports
(both are already kv-lance feature deps in Cargo.toml). The
doc-comment on build_write_batch_lance is updated to record the
unification; references to lance::deps survive only as commentary
about the prior workaround for future readers.

Risk: very low. Both import paths name the same types (same crate,
same version); the change is local to import paths.

Resolves the 'Arrow type-tree split' deferred item in
.claude/lance-backend/KNOWN_DIFFERENCES.md.
@AdaWorldAPI AdaWorldAPI marked this pull request as ready for review May 16, 2026 01:31
@AdaWorldAPI AdaWorldAPI merged commit 3170f03 into main May 16, 2026