Skip to content

[SPARK-43586][SQL] Use the smaller value of Range.numElements and Range.numSlices as numSlices of RangeExec#41230

Closed
LuciferYang wants to merge 3 commits intoapache:masterfrom
LuciferYang:range-num-slices
Closed

[SPARK-43586][SQL] Use the smaller value of Range.numElements and Range.numSlices as numSlices of RangeExec#41230
LuciferYang wants to merge 3 commits intoapache:masterfrom
LuciferYang:range-num-slices

Conversation

@LuciferYang
Copy link
Contributor

@LuciferYang LuciferYang commented May 19, 2023

What changes were proposed in this pull request?

This pr change RangeExec#numSlices to use smaller value in Range.numElements and Range.numSlices to avoid scheduling unnecessary tasks.

Why are the changes needed?

Avoid scheduling unnecessary tasks when Range.numSlices > Range.numElements.

Does this PR introduce any user-facing change?

Yes, when Range.numSlices > Range.numElements is, for the result of queryExecution.debug, the splits value in the printed Physical Plan will change.

How was this patch tested?

  • Existing UT
  • Manual check:

start a spark-shell with --master "local[100]", then run

spark.range(10).map(_ + 1).reduce(_ + _)

Before

image

After

image

@LuciferYang LuciferYang marked this pull request as draft May 19, 2023 05:02
@github-actions github-actions bot added the SQL label May 19, 2023
@LuciferYang LuciferYang marked this pull request as ready for review May 19, 2023 10:09
@HyukjinKwon
Copy link
Member

cc @wangyum @cloud-fan

@cloud-fan
Copy link
Contributor

I feel it's better to always respect the user-specified num slice parameter. If the num slice is not specified, I agree that we can make it not larger than the num elements.

@LuciferYang
Copy link
Contributor Author

I feel it's better to always respect the user-specified num slice parameter. If the num slice is not specified, I agree that we can make it not larger than the num elements.

This may require adding a status to record whether it is a user-specified slice or use default? Sounds like it will increase the complexity of the code. I don't think it's worth because this just a minor case, maybe keep it as it is is better

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants