GH-1142: Add VectorOps utility and fix addVector/removeVector with safe shared semantics#1149
Open
John-W-Lewis wants to merge 2 commits into
Open
GH-1142: Add VectorOps utility and fix addVector/removeVector with safe shared semantics#1149John-W-Lewis wants to merge 2 commits into
John-W-Lewis wants to merge 2 commits into
Conversation
Provides shareCopy (shared memory), transferCopy (move ownership), and deepCopy (independent clone) for both FieldVector and VectorSchemaRoot, implemented purely via getFieldBuffers/loadFieldBuffers without depending on TransferPair.
…semantics Use the new VectorOps.shareCopy to fix the unsafe shared-reference bug in addVector/removeVector while preserving the original intended semantics: both source and result roots remain usable with the same data, and memory is only released when all sharing roots are closed. Also applies Spotless formatting fixes. Co-authored-by: Cursor <cursoragent@cursor.com>
|
Thank you for opening a pull request! Please label the PR with one or more of:
Also, add the 'breaking-change' label if appropriate. See CONTRIBUTING.md for details. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Introduces
VectorOps, a new utility class inorg.apache.arrow.vector.utilthat provides three generic whole-vector operations:shareCopy-- creates a new vector sharing the same underlying memory allocations via reference counting. Both source and result remain usable; memory is released only when all sharing vectors are closed.transferCopy-- creates a new vector by transferring buffer ownership. The source is left empty and can be reused viaallocateNew().deepCopy-- creates a fully independent clone with its own buffer allocations.These operations work generically across all vector types via
getFieldBuffers()/loadFieldBuffers(), requiring no per-type implementation -- unlikeTransferPair, which must be implemented by every vector type.VectorOpscan replaceTransferPairfor whole-vector operations;TransferPairremains necessary for sub-range splitting/slicing (splitAndTransfer).This PR also uses
shareCopyto fix the unsafe shared-reference bug inVectorSchemaRoot.addVector()andremoveVector()(see #1142). The original implementation shared raw object references between source and result roots, meaning closing one would invalidate the other. The fix preserves the original intended semantics (both roots remain readable) while making it safe through proper reference counting.Closes #1142
Test plan
TestVectorOps: 12 tests covering all three operations on IntVector, VarCharVector, VectorSchemaRoot, with/without explicit allocator, and source-closed-before-shared-copy semanticsTestVectorSchemaRoot: Updated existing tests + 2 new ownership tests verifying that closing the source root afteraddVector/removeVectordoes not affect the resultMade with Cursor