[GLUTEN-11133][VL] Refactor batch serialization API to defer the buffer copy from C++ code to Java code #11127
Merged
zhztheplayer merged 4 commits into apache:main on Nov 21, 2025
Run Gluten Clickhouse CI on x86
Member
Author
cc @zjuwangg @jinchengchenghh Thanks.
Contributor
jinchengchenghh
left a comment
Overall looks good, only small nits
  virtual std::shared_ptr<arrow::Buffer> serializeColumnarBatches(
      const std::vector<std::shared_ptr<ColumnarBatch>>& batches) = 0;
  virtual void addForSerialization(const std::shared_ptr<ColumnarBatch>& batch) = 0;
Contributor
This function calls the Velox function append, so I would prefer to name it append.
      const std::vector<std::shared_ptr<ColumnarBatch>>& batches) = 0;
  virtual void addForSerialization(const std::shared_ptr<ColumnarBatch>& batch) = 0;

  virtual int64_t serializedSize() = 0;
Contributor
Same here: I would prefer the name maxSerializedSize.
Run Gluten Clickhouse CI on x86
Member
Author
@jinchengchenghh Let me know if you have further comments. Thanks.
jinchengchenghh approved these changes on Nov 20, 2025
Member
Author
Thanks for reviewing @zjuwangg @jinchengchenghh
WangGuangxin pushed a commit to WangGuangxin/gluten that referenced this pull request on Dec 23, 2025
This patch prepares for moving the memory allocation triggered by this code from on-heap to off-heap.
It defers the off-heap-to-on-heap buffer copy during columnar batch serialization from the C++ code to the Java code.
In the next PR(s), we'll refactor the Java / Scala code to remove the copy and store the off-heap binary data directly when spark.gluten.velox.offHeapBroadcastBuildRelation.enabled=true.

Related issue: #11133
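As a rough illustration of the API shape discussed in this PR, here is a minimal, self-contained C++ sketch of a two-phase serializer: batches are accumulated first, the caller then asks for the size, allocates the destination itself, and only then requests the write. The names addForSerialization and serializedSize come from the diff above; FakeBatch, SketchSerializer, serializeTo, and copyOut are hypothetical stand-ins invented for this example and are not Gluten or Velox APIs. The real code works on ColumnarBatch and crosses JNI, which this sketch does not attempt to model.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a columnar batch: just a run of raw bytes.
using FakeBatch = std::vector<uint8_t>;

// Hypothetical two-phase serializer. It never allocates the output buffer;
// the caller owns that decision (on-heap, off-heap, pooled, ...).
class SketchSerializer {
 public:
  // Phase 1: accumulate a batch for later serialization.
  void addForSerialization(const FakeBatch& batch) {
    pending_.insert(pending_.end(), batch.begin(), batch.end());
  }

  // Number of bytes that serializeTo() will write.
  int64_t serializedSize() const {
    return static_cast<int64_t>(pending_.size());
  }

  // Phase 2 (assumed method, not shown in the diff): write into memory the
  // caller allocated. Returns bytes written, or -1 if capacity is too small.
  int64_t serializeTo(uint8_t* dest, int64_t capacity) const {
    assert(dest != nullptr);
    const int64_t size = serializedSize();
    if (capacity < size) {
      return -1;
    }
    if (size > 0) {
      std::memcpy(dest, pending_.data(), static_cast<std::size_t>(size));
    }
    return size;
  }

 private:
  std::vector<uint8_t> pending_;
};

// Plays the role of the caller (the Java side in the PR): query the size,
// allocate the destination, then ask the serializer to fill it.
std::vector<uint8_t> copyOut(const SketchSerializer& serializer) {
  // Allocate at least one byte so data() is never null for empty input.
  std::vector<uint8_t> out(
      static_cast<std::size_t>(serializer.serializedSize()) + 1);
  const int64_t written =
      serializer.serializeTo(out.data(), static_cast<int64_t>(out.size()));
  out.resize(static_cast<std::size_t>(written));
  return out;
}
```

With this split, switching the destination from a heap array to off-heap memory only changes the caller's allocation step; the serializer itself stays the same, which is the kind of follow-up the description anticipates.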