[GLUTEN-10920][VL] Allow disabling hash/sort shuffle reader buffer #10922

Merged
wForget merged 1 commit into apache:main from wForget:GLUTEN-10920 on Oct 23, 2025

@wForget
Member

@wForget wForget commented Oct 22, 2025

What changes are proposed in this pull request?

Add a buffer to the shuffle read input stream only if readerBufferSize is greater than 0; setting it to 0 disables the reader buffer.
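
For illustration only, the conditional wrapping could look roughly like the sketch below. This is not the code from this patch: the class `ReaderBufferSketch` and the helper `maybeBuffer` are hypothetical names, and plain `java.io` streams stand in for the real shuffle read stream.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ReaderBufferSketch {
    // Hypothetical helper (not Gluten's API): wrap the stream in a buffer
    // only when a positive buffer size is configured; otherwise return the
    // original stream untouched so no extra buffer is allocated.
    static InputStream maybeBuffer(InputStream in, int readerBufferSize) {
        if (readerBufferSize > 0) {
            return new BufferedInputStream(in, readerBufferSize);
        }
        return in;
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = {1, 2, 3, 4};

        // Assumed stand-ins for readerBufferSize=1MB and readerBufferSize=0.
        InputStream buffered = maybeBuffer(new ByteArrayInputStream(payload), 1024 * 1024);
        InputStream direct = maybeBuffer(new ByteArrayInputStream(payload), 0);

        System.out.println(buffered instanceof BufferedInputStream);
        System.out.println(direct instanceof BufferedInputStream);
        // Both paths yield the same bytes; only the buffering differs.
        System.out.println(buffered.read() == direct.read());
    }
}
```

With a size of 0 the caller gets back the same stream object it passed in, so there is no intermediate copy into a buffer.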

How was this patch tested?

Tested manually with an internal test case:

With conf spark.gluten.sql.columnar.shuffle.readerBufferSize=1MB:

[image]

With conf spark.gluten.sql.columnar.shuffle.readerBufferSize=0:

[image]

Related issue: #10920

Member

@zuston zuston left a comment

LGTM.

@FelixYBW
Contributor

@marin-ma why is it an on-heap copy? Shouldn't the reducer use Netty to load data into direct memory?

image

@FelixYBW
Contributor

@wForget did you set spark.shuffle.io.preferDirectBufs=False?

@zuston
Member

zuston commented Oct 23, 2025

> @wForget did you set spark.shuffle.io.preferDirectBufs=False?

The issue occurs when using Uniffle, rather than the vanilla shuffle.

@wForget
Member Author

wForget commented Oct 23, 2025

> @marin-ma why is it an on-heap copy? Shouldn't the reducer use Netty to load data into direct memory?

I filed #10923 for this issue.

@wForget wForget merged commit 7b7ef95 into apache:main Oct 23, 2025
139 of 143 checks passed
@wForget
Member Author

wForget commented Oct 23, 2025

Thanks @zuston and @FelixYBW for the review; merged to main.
