[WIP][SPARK-44243][CORE] Add a parameter to determine the locality of local shuffle reader #41786
wankunde wants to merge 1 commit into apache:master from
Conversation
If you want locality for shuffle, enable

cc @maryannxue FYI

What's the use of the new conf? How does it improve locality? Isn't it enough to just do https://github.com/apache/spark/pull/41786/files#diff-a3b15298f97577c1fadcc2d76d015eebd6343e246c6717417d33f3c458847f46L1133?

Thanks @mridulm @maryannxue for your review. If a query contains shuffle A and shuffle B, there are many PartialReducerPartitions after the OptimizeSkewedJoin optimization, and shuffle B is read by a local shuffle reader. Enable
What changes were proposed in this pull request?
Follow-up to #40339.
A local shuffle reader can achieve better performance when it has preferred locations. If SHUFFLE_REDUCE_LOCALITY_ENABLE is disabled in a query that includes both reduce-side shuffle reads and local shuffle reads, the local shuffle readers cannot get preferred locations either.
Add a new parameter, LOCAL_SHUFFLE_LOCALITY_ENABLE, to determine whether to get the preferred locations of the current partitionSpec.
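The intended split between the two flags can be illustrated with a small standalone model. This is only a sketch under assumptions: `PartitionSpec`, `CoalescedSpec`, `MapperSpec`, `LocalityConf`, and `LocalitySketch` are hypothetical names invented for illustration, not Spark classes, and the actual decision logic in this PR may differ.

```scala
// Hypothetical model; none of these names come from the Spark code base.
sealed trait PartitionSpec
// Reduce-side read: a range of reduce partitions fetched from all map outputs.
final case class CoalescedSpec(startReducer: Int, endReducer: Int) extends PartitionSpec
// Local (mapper-side) read: the shuffle blocks written by a single map task.
final case class MapperSpec(mapIndex: Int) extends PartitionSpec

final case class LocalityConf(
    reduceLocalityEnabled: Boolean,        // stands in for SHUFFLE_REDUCE_LOCALITY_ENABLE
    localShuffleLocalityEnabled: Boolean)  // stands in for LOCAL_SHUFFLE_LOCALITY_ENABLE

object LocalitySketch {
  // Returns preferred hosts for a spec, gating each kind of read on its own flag.
  def preferredLocations(
      spec: PartitionSpec,
      conf: LocalityConf,
      mapHosts: Map[Int, String],
      reducerHosts: Map[Int, Seq[String]]): Seq[String] = spec match {
    // Local reads consult only the local-shuffle flag (assumed behaviour).
    case MapperSpec(i) if conf.localShuffleLocalityEnabled =>
      mapHosts.get(i).toList
    // Reduce-side reads keep following the existing reduce-locality flag.
    case CoalescedSpec(start, end) if conf.reduceLocalityEnabled =>
      (start until end).flatMap(r => reducerHosts.getOrElse(r, Nil)).distinct
    // Otherwise no locality preference is reported.
    case _ => Nil
  }
}
```

In this model, a configuration with `reduceLocalityEnabled = false` and `localShuffleLocalityEnabled = true` still yields a preferred host for a `MapperSpec`, while a `CoalescedSpec` gets none, which is the scenario the description above is concerned with.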
Why are the changes needed?
Improves Spark local shuffle reads by letting local shuffle readers obtain preferred locations.
Does this PR introduce any user-facing change?
No
How was this patch tested?
Existing unit tests.