[SPARK-56000][BUILD] Upgrade arrow-java to 19.0.0
#54820
Closed
After making this modification, all tests pass. However, I haven't yet examined the specific changes in version 19.0.0 that necessitated it, so I can't confirm whether version 18.3.0 is truly issue-free or whether problems simply haven't surfaced yet. Let's hold off for a while. If this turns out to be a pre-existing issue, I will submit a PR to fix it first.

Personally, I'm inclined to think that this is a pre-existing issue.
1. Problem Statement
After upgrading `arrow-java` from 18.3.0 to 19.0.0, Spark Connect client tests (e.g. `CatalogSuite`, `DataFrameTableValuedFunctionsSuite`) fail in `afterAll()` when `allocator.close()` is called. The stack trace points to `allocator.close()` throwing an `IllegalStateException` for buffers that were never released.
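The failure shape can be reproduced with a minimal, self-contained simulation (toy classes with illustrative names such as `ToyAllocator`, not Arrow's real API): an allocator that tracks outstanding allocations and refuses to close while any remain unreleased, the same contract Arrow's `BaseAllocator` enforces.

```java
import java.util.HashSet;
import java.util.Set;

// Toy stand-in for Arrow's BufferAllocator: tracks outstanding allocations
// and, like Arrow's BaseAllocator, throws IllegalStateException on close()
// if anything is still unreleased. All names here are illustrative only.
final class ToyAllocator implements AutoCloseable {
    private final Set<Integer> outstanding = new HashSet<>();
    private int nextId = 0;

    int buffer(int size) {          // allocate and track a buffer id
        int id = nextId++;
        outstanding.add(id);
        return id;
    }

    void release(int id) {          // what a proper close() path would do
        outstanding.remove(id);
    }

    @Override
    public void close() {
        if (!outstanding.isEmpty()) {
            throw new IllegalStateException(
                "Memory leaked: " + outstanding.size() + " buffer(s) outstanding");
        }
    }
}

public class AllocatorCloseDemo {
    // Returns true if close() failed because a buffer was never released,
    // mimicking the afterAll() failure once empty batches allocate real memory.
    static boolean leakDetected() {
        ToyAllocator allocator = new ToyAllocator();
        allocator.buffer(4); // like the 4-byte offset buffer body in v19.0.0
        try {
            allocator.close(); // buffer never released -> throws
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("leak detected: " + leakDetected());
    }
}
```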
2. Root Cause
2.1 Pre-existing bug in
SparkResult.processResponses()In
SparkResult.processResponses(), when a deserialized Arrow batch contains 0 rows (numRecordsInBatch == 0), theArrowMessageobjects are neither stored inresultMapnor closed:SparkResultCloseable.close()only releases messages stored inresultMap. Empty-batch messages fall through and their underlying Arrow buffers are never released.2.2 Arrow GH-343 made it observable
Arrow-Java GH-343 fixed offset-buffer IPC serialization for empty vectors (`valueCount == 0`). This fix, included in v19.0.0, changed the IPC body size of empty batches from 0 bytes to a non-zero value, which turned the previously silent Spark bug into a visible allocator failure.

The relevant commits between v18.3.0 and v19.0.0:

- `0f8a0808f` (PR #967): `ListVector`/`LargeListVector` offset buffer when `valueCount == 0`
- `77df3ecb2` (PR #989): `BaseVariableWidthVector`/`BaseLargeVariableWidthVector` offset buffer when `valueCount == 0`

What changed: when `valueCount == 0`, `setReaderAndWriterIndex()` previously set `offsetBuffer.writerIndex(0)`, making `readableBytes() == 0` and writing 0 bytes to the IPC stream. The Arrow spec requires that offset buffers always contain at least one entry (`[0]`), so GH-343 changed this to `offsetBuffer.writerIndex(OFFSET_WIDTH)`, making `readableBytes() == 4`.

| `setReaderAndWriterIndex()` when `valueCount == 0` | v18.3.0 | v19.0.0 |
|---|---|---|
| offset buffer | `offsetBuffer.writerIndex(0)` → `readableBytes() = 0` | `offsetBuffer.writerIndex(OFFSET_WIDTH = 4)` → `readableBytes() = 4` |

3. Detailed Causal Chain
1. The server returns an empty Arrow batch (`valueCount=0`); the client reads its body via `readMessageBody(in, bodyLength, allocator)`.
   - v18.3.0: `allocator.buffer(0)` → returns the singleton `getEmpty()`, backed by `EmptyReferenceManager` (not tracked by the allocator).
   - v19.0.0: `allocator.buffer(bodyLength > 0)` → allocates a real `ArrowBuf` (tracked by the allocator).
2. `deserializeRecordBatch` calls `body.slice()` per field buffer.
   - v18.3.0: slices share the `EmptyReferenceManager`; `retain()`/`release()` are no-ops.
   - v19.0.0: slices share the `BufferLedger`; `retain()` increments the refcount.
3. The `ArrowRecordBatch` constructor calls `retain()` per slice, followed by `body.getReferenceManager().release()`.
4. `ArrowRecordBatch.close()` is never called (the Spark bug).
5. `allocator.close()` throws `IllegalStateException` (on v19.0.0 only).

Key mechanism: `BaseAllocator.buffer(0)` returns an untracked empty buffer.
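This key mechanism can be illustrated with a toy reference manager (hypothetical names; Arrow's real `EmptyReferenceManager` and `BufferLedger` are more involved): a zero-length allocation hands back a shared no-op manager the allocator never tracks, while a non-zero allocation gets real reference counting, so the same missing `close()` is harmless in one path and a leak in the other.

```java
// Toy sketch (illustrative names, not Arrow's API) of the key mechanism.
final class ToyRefManager {
    static final ToyRefManager EMPTY = new ToyRefManager(true); // shared no-op singleton
    private final boolean noOp;
    private int refCount = 1;

    private ToyRefManager(boolean noOp) { this.noOp = noOp; }
    static ToyRefManager tracked() { return new ToyRefManager(false); }

    void retain()  { if (!noOp) refCount++; }   // no-op on the empty singleton
    void release() { if (!noOp) refCount--; }
    int refCount() { return noOp ? 0 : refCount; }
}

public class EmptyBufferDemo {
    // v18.3.0 path: body length 0 -> shared empty manager, refcounting is a no-op.
    static int emptyPathOutstanding() {
        ToyRefManager rm = ToyRefManager.EMPTY;
        rm.retain();  // slice per field buffer
        rm.release(); // body release
        // the missing ArrowRecordBatch.close() doesn't matter: nothing is tracked
        return rm.refCount();
    }

    // v19.0.0 path: body length > 0 -> tracked manager; the missing close()
    // leaves a positive refcount behind.
    static int trackedPathOutstanding() {
        ToyRefManager rm = ToyRefManager.tracked(); // refCount = 1 (body buffer)
        rm.retain();  // ArrowRecordBatch retains the slice
        rm.release(); // body.getReferenceManager().release()
        // ArrowRecordBatch.close() is never called -> one reference leaks
        return rm.refCount();
    }

    public static void main(String[] args) {
        System.out.println("v18.3.0 outstanding refs: " + emptyPathOutstanding());
        System.out.println("v19.0.0 outstanding refs: " + trackedPathOutstanding());
    }
}
```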
In v18.3.0, the empty-batch IPC body is 0 bytes → `allocator.buffer(0)` → `getEmpty()` (untracked). All downstream `slice()`, `retain()`, and `release()` calls are no-ops, so the missing `close()` in `SparkResult` is harmless.

In v19.0.0, the empty-batch IPC body is > 0 bytes → `allocator.buffer(n)` → a real, tracked buffer. The missing `close()` becomes a real off-heap memory leak.

4. Does v18.3.0 have an actual memory leak?
No. Under v18.3.0, empty batches cause no memory leak at all:

- `allocator.buffer(0)` returns the pre-allocated singleton empty buffer. No additional off-heap memory is allocated for empty batches, so there is nothing to leak.
- The `ArrowRecordBatch`/`ArrowBuf` wrapper objects hold no strong references after `processResponses()` returns and are collected by normal GC.
- `EmptyReferenceManager` is a no-op singleton. The allocator never registers these buffers, so `allocator.close()` sees no outstanding allocations.

The bug in `SparkResult` is logically present in v18.3.0, but it is structurally impossible for it to cause any resource leak, because the entire empty-buffer path (from allocation through slicing to reference counting) operates on untracked no-op objects.

Under v19.0.0 without the fix, the situation is different:

- `allocator.buffer(bodyLength > 0)` allocates real off-heap memory.
- The `ArrowRecordBatch` is never `close()`-d, so the `BufferLedger` refcount never reaches 0.
- `ArrowBuf` has no finalizer or `Cleaner`, so GC of the Java wrapper does not decrement the off-heap refcount.
- `allocator.close()` detects and reports the leak.

5. The Fix
When `numRecordsInBatch == 0`, the deserialized `ArrowMessage` objects are now explicitly closed. This calls `ArrowRecordBatch.close()`, which invokes `release()` on each sliced buffer, allowing the `BufferLedger` refcount to reach 0 and the off-heap memory to be freed.
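The fix pattern can be sketched as follows (hypothetical Java types; the real change lives in Spark's Scala `SparkResult.processResponses`, and `Message` here is a stand-in for `ArrowMessage`): empty-batch messages are closed on the spot, while non-empty ones are stored and closed later by the result's own `close()`.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the fix pattern, not Spark's actual code:
// messages for empty batches are closed immediately instead of being
// dropped without a close(), which is what leaked tracked buffers.
public class ProcessResponsesSketch {
    /** Stand-in for ArrowMessage: closing it releases its buffers. */
    static final class Message implements AutoCloseable {
        final int numRecords;
        boolean closed = false;
        Message(int numRecords) { this.numRecords = numRecords; }
        @Override public void close() { closed = true; }
    }

    /**
     * Before the fix, messages with numRecords == 0 were neither stored
     * (so the later bulk close() never saw them) nor closed. After the
     * fix, they are closed as soon as they are identified as empty.
     */
    static Map<Integer, Message> processResponses(Message... messages) {
        Map<Integer, Message> resultMap = new HashMap<>();
        int idx = 0;
        for (Message m : messages) {
            if (m.numRecords == 0) {
                m.close(); // the fix: release empty-batch buffers immediately
            } else {
                resultMap.put(idx++, m); // closed later with the result
            }
        }
        return resultMap;
    }

    public static void main(String[] args) {
        Message empty = new Message(0);
        Message full = new Message(10);
        Map<Integer, Message> map = processResponses(empty, full);
        System.out.println("empty closed: " + empty.closed);
        System.out.println("stored: " + map.size());
    }
}
```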
After conducting some research, I believe this issue has no material impact on version 18.3.0, so I personally prefer to address it with a fix in the current PR.