refactor(kvs/lance): unify arrow type tree (drop lance::deps workaround)#4
Merged
Merged
Conversation
…ps workaround) Sprint L pinned lance = "4.0" + arrow-array = "57" / arrow-schema = "57". Lance 4.0 internally uses arrow 57.0.0, which resolves to the same 57.3.1 as our pin. The two are the SAME crate-version at the Rust type level — same TypeIds, fully interoperable. Previously (lance 1.0.4 era), our pin was arrow = "55" while lance used v56 internally — distinct TypeIds, so we couldn't pass our arrow types to lance's APIs without conversion. The workaround was to route six sites through `lance::deps::arrow_*` re-exports. That workaround is no longer needed. Sprint R drops it: 1. Datastore::new — empty RecordBatchIterator construction. 2. Transaction::commit — RecordBatchIterator for merge_insert source. 3. Transaction::get — BinaryArray downcast for val column extraction. 4. Transaction::build_write_batch_lance — RecordBatch + Schema + arrays. 5,6. Transaction::scan_impl — 2× BinaryArray downcasts. All 6 now use plain `arrow_array::*` / `arrow_schema::*` imports (both are already kv-lance feature deps in Cargo.toml). The doc-comment on build_write_batch_lance is updated to record the unification; references to lance::deps survive only as commentary about the prior workaround for future readers. Risk: very low. The types are byte-identical at the binary level (same crate, same version); the type-system change is local. Resolves the 'Arrow type-tree split' deferred item in .claude/lance-backend/KNOWN_DIFFERENCES.md.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Cleanup: drop the
lance::deps::arrow_*re-export workaround introduced during the lance-1.0.4 era and unify on plainarrow_array/arrow_schemaimports. Resolves the "Arrow type-tree split (workaround in place)" entry in.claude/lance-backend/KNOWN_DIFFERENCES.md.Why now
Sprint L (#1) pinned
lance = "4.0"andarrow-array = arrow-schema = "57". Verified viaCargo.lock+ lance-4.0.0 source:TypeIds → fully interoperable at the Rust type levelThe workaround was needed when:
lance::deps::arrow_*re-exports (which gave us v56 types)After Sprint L's bump, both are v57. The indirection is dead code.
What changes (one file, six sites)
surrealdb/core/src/kvs/lance/mod.rs:Datastore::new(empty RecordBatch builder)lance::deps::arrow_schema::Schema::new(...)+lance::deps::arrow_array::RecordBatchIterator::new(...)arrow_schema::Schema::new(...)+arrow_array::RecordBatchIterator::new(...)Transaction::commit(merge_insert source)lance::deps::arrow_array::RecordBatchIterator::new(...)arrow_array::RecordBatchIterator::new(...)Transaction::get(val downcast).downcast_ref::<lance::deps::arrow_array::BinaryArray>().downcast_ref::<arrow_array::BinaryArray>()Transaction::build_write_batch_lance(return + body imports)lance::deps::arrow_array::*+lance::deps::arrow_schema::*arrow_array::*+arrow_schema::*Transaction::scan_impl(2× downcasts for key + val)getgetThe doc-comment on
build_write_batch_lanceis updated to record the unification; the remaining mentions oflance::depsare commentary about the prior workaround for future readers.Risk
Minimal.
lance::deps::arrow_array::Tandarrow_array::Tare byte-identical at the binary level (same crate, same version). The change is local to import paths; no semantics shift.Test plan
cargo check --features kv-lance --no-default-features— expect 0 errors (target was cleaned locally before this commit; trusting CI for the full rebuild + test sweep per maintainer's call).cargo test --features "kv-lance kv-mem" --no-default-features --lib kvs::lance— expect the existing 50+ kv-lance tests still pass (no semantic change).Followups
The remaining items in
KNOWN_DIFFERENCES.mdafter this PR:Dataset::deleteseparate fromMergeInsertBuilder).ScanLimit::Bytesproper byte accounting.SURREAL_TEST_KV=lance.The vector-index SIMD work from #2 + #3 (cosine, euclidean, manhattan, chebyshev, pearson via
ndarray-hpc) is independent and already merged.Generated by Claude Code