-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Avoid skew in Roundrobin repartition #18880
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| } | ||
| assert_eq!(partition_row_counts.len(), 3); | ||
| assert_eq!(partition_row_counts[0], 2); | ||
| assert_eq!(partition_row_counts[0], 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This shows the improvement - all output partitions get at least 1 batch, instead of skewing them to the left side.
026513e to
7923e9c
Compare
|
Couldn't show perf difference on my 10 core machine. |
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense to me -- thank you @Dandandan
I have scheduled a benchmark run just to make sure there isn't something we missed but I don't think so
|
🤖 |
|
🤖: Benchmark completed Details
|
Looks like mostly noise. |
Which issue does this PR close?
Rationale for this change
This makes roundrobin repartition more fairly distributed.
The benchmarks probably don't reflect this as much (maybe on very high core counts?), as the partitioning already mostly happens at the source side.
What changes are included in this PR?
Set start partition based on input partition.
Are these changes tested?
Existing tests
Are there any user-facing changes?