Skip to content

[GLUTEN-11133][VL] Refactor batch serialization API to defer the buffer copy from C++ code to Java code#11127

Merged
zhztheplayer merged 4 commits intoapache:mainfrom
zhztheplayer:wip-offheap-serialize-1
Nov 21, 2025
Merged

[GLUTEN-11133][VL] Refactor batch serialization API to defer the buffer copy from C++ code to Java code#11127
zhztheplayer merged 4 commits intoapache:mainfrom
zhztheplayer:wip-offheap-serialize-1

Conversation

@zhztheplayer
Copy link
Copy Markdown
Member

@zhztheplayer zhztheplayer commented Nov 19, 2025

In preparation for moving the memory allocation triggered by this code from heap to off-heap.

The patch defers the off-heap-to-on-heap buffer copy during columnar batch serialization from here to here.

In the next PR(s), we'll refactor the Java / Scala code to remove the copy and store the off-heap binary data directly when spark.gluten.velox.offHeapBroadcastBuildRelation.enabled=true.

Related issue: #11133

@github-actions github-actions Bot added CORE works for Gluten Core VELOX labels Nov 19, 2025
@github-actions
Copy link
Copy Markdown

Run Gluten Clickhouse CI on x86

@zhztheplayer zhztheplayer marked this pull request as ready for review November 20, 2025 10:21
@zhztheplayer zhztheplayer changed the title [VL] Refactor batch serialization API to defer the buffer copy from C++ code to Java code [GLUTEN-11133][VL] Refactor batch serialization API to defer the buffer copy from C++ code to Java code Nov 20, 2025
@zhztheplayer
Copy link
Copy Markdown
Member Author

cc @zjuwangg @jinchengchenghh Thanks.

Copy link
Copy Markdown
Contributor

@jinchengchenghh jinchengchenghh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good, only small nits


virtual std::shared_ptr<arrow::Buffer> serializeColumnarBatches(
const std::vector<std::shared_ptr<ColumnarBatch>>& batches) = 0;
virtual void addForSerialization(const std::shared_ptr<ColumnarBatch>& batch) = 0;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function calls Velox function append, so I would prefer function name append

const std::vector<std::shared_ptr<ColumnarBatch>>& batches) = 0;
virtual void addForSerialization(const std::shared_ptr<ColumnarBatch>& batch) = 0;

virtual int64_t serializedSize() = 0;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same maxSerializedSize

Copy link
Copy Markdown
Contributor

@zjuwangg zjuwangg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@github-actions
Copy link
Copy Markdown

Run Gluten Clickhouse CI on x86

@zhztheplayer
Copy link
Copy Markdown
Member Author

@jinchengchenghh Let me know if you have further comments. Thanks.

@zhztheplayer
Copy link
Copy Markdown
Member Author

Thanks for reviewing @zjuwangg @jinchengchenghh

@zhztheplayer zhztheplayer merged commit 1ed01a5 into apache:main Nov 21, 2025
61 checks passed
WangGuangxin pushed a commit to WangGuangxin/gluten that referenced this pull request Dec 23, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CORE works for Gluten Core VELOX

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants