feat: integrate batch coalescer with repartition exec #19002
| Ok(Box::pin(RecordBatchStreamAdapter::new( | ||
| self.schema(), | ||
| futures::stream::iter(self.batches.clone().into_iter().map(Ok)), | ||
| futures::stream::iter(self.batches.clone().into_iter().map(move |batch| { |
Please see #18782 (comment) for my thoughts/reason on updating this test.
run benchmark tpch tpch_mem tpch10
🤖 Hi @Dandandan, thanks for the request (#19002 (comment)).
run benchmark tpch tpch_mem tpch10
🤖 Hi @Dandandan, thanks for the request (#19002 (comment)).
run benchmark tpch
run benchmark tpch10
run benchmark clickbench_partitioned
I will update my scraper to support this syntax FYI
🤖: Benchmark completed
| break; | ||
| } | ||
| let inner_poll = self.poll_next_inner(cx); | ||
| let _timer = cloned_time.timer(); |
Shouldn't this come before the poll_next_inner() call?
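To illustrate why the placement matters, here is a minimal sketch using only the standard library. `ScopedTimer`, `do_work`, `timed_after`, and `timed_before` are hypothetical stand-ins, not DataFusion's metrics API; the sketch assumes the timer is a guard that records elapsed time when dropped, and that the intent is to include the inner poll in the measurement.

```rust
use std::cell::Cell;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for a metrics timer guard:
/// adds the time since construction to `total` when dropped.
struct ScopedTimer<'a> {
    start: Instant,
    total: &'a Cell<Duration>,
}

impl<'a> ScopedTimer<'a> {
    fn new(total: &'a Cell<Duration>) -> Self {
        Self { start: Instant::now(), total }
    }
}

impl Drop for ScopedTimer<'_> {
    fn drop(&mut self) {
        self.total.set(self.total.get() + self.start.elapsed());
    }
}

/// Hypothetical stand-in for the inner poll.
fn do_work() {
    std::thread::sleep(Duration::from_millis(20));
}

/// Guard created after the work: the work itself is not measured.
fn timed_after(total: &Cell<Duration>) {
    do_work();
    let _timer = ScopedTimer::new(total);
}

/// Guard created before the work: the work is included in the measurement.
fn timed_before(total: &Cell<Duration>) {
    let _timer = ScopedTimer::new(total);
    do_work();
}
```

Whether the inner poll should count toward this metric is a design decision for the PR; the sketch only shows that a guard started after the call cannot include it.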
| } | ||
| Poll::Ready(None) => { | ||
| completed = true; | ||
| coalescer.finish()?; |
| coalescer.finish()?; | |
| if let Err(e) = coalescer.finish() { | |
| self.batch_coalescer = Some(coalescer); | |
| return self.baseline_metrics.record_poll(Poll::Ready(Some(Err(e)))); | |
| } |
Otherwise, if finish() returns an error, self.batch_coalescer won't be restored.
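The concern can be reproduced with a small standalone sketch. `Coalescer`, `Stream`, `push_lossy`, and `push_restoring` are hypothetical names invented for illustration, and the sketch assumes the stream temporarily takes the coalescer out of an `Option` field, as the suggested change implies. The same reasoning would apply to the push_batch call discussed further down.

```rust
/// Hypothetical stand-in for a coalescer whose operations can fail.
#[derive(Debug, Default)]
struct Coalescer {
    rows: usize,
}

impl Coalescer {
    fn push(&mut self, rows: usize) -> Result<(), String> {
        if rows == 0 {
            return Err("empty batch".to_string());
        }
        self.rows += rows;
        Ok(())
    }
}

/// Hypothetical stream that owns the coalescer in an `Option`.
#[derive(Debug)]
struct Stream {
    coalescer: Option<Coalescer>,
}

impl Stream {
    /// Early return through `?` skips the restore step,
    /// so `self.coalescer` stays `None` after an error.
    fn push_lossy(&mut self, rows: usize) -> Result<(), String> {
        let mut c = self.coalescer.take().expect("coalescer present");
        c.push(rows)?;
        self.coalescer = Some(c);
        Ok(())
    }

    /// Put the coalescer back first, then propagate the result.
    fn push_restoring(&mut self, rows: usize) -> Result<(), String> {
        let mut c = self.coalescer.take().expect("coalescer present");
        let result = c.push(rows);
        self.coalescer = Some(c);
        result
    }
}
```

Capturing the result, restoring the field, and only then returning keeps the state consistent on both the success and the error path without duplicating the restore line.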
| } | ||
|
|
||
| impl PerPartitionStream { | ||
| #[allow(clippy::too_many_arguments)] |
| #[allow(clippy::too_many_arguments)] | |
| #[expect(clippy::too_many_arguments)] |
Using expect instead of allow triggers a warning once the suppression is no longer needed, so the developer has to remove it. Otherwise the attribute may linger after it becomes obsolete.
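A small sketch of the difference, assuming Rust 1.81 or newer (where `#[expect]` is stable). It uses the built-in `unused_variables` lint rather than the Clippy lint from the diff so that it behaves the same under plain `rustc`; the function names are made up for illustration.

```rust
/// The expectation is fulfilled: `x` really is unused, so nothing is reported.
/// If `x` were used later, the compiler would warn that the expectation
/// is unfulfilled, prompting removal of the attribute.
#[expect(unused_variables)]
fn ignores_its_argument(x: u32) -> u32 {
    42
}

/// `#[allow]` stays silent whether or not the lint would fire,
/// so this now-pointless attribute can linger unnoticed.
#[allow(unused_variables)]
fn uses_its_argument(x: u32) -> u32 {
    x + 1
}
```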
| coalescer.finish()?; | ||
| } | ||
| Poll::Ready(Some(Ok(batch))) => { | ||
| coalescer.push_batch(batch)?; |
| coalescer.push_batch(batch)?; | |
| if let Err(e) = coalescer.push_batch(batch) { | |
| self.batch_coalescer = Some(coalescer); | |
| return self.baseline_metrics.record_poll(Poll::Ready(Some(Err(e)))); | |
| } |
🤖: Benchmark completed
🤖: Benchmark completed
run benchmark clickbench_partitioned
🤖: Benchmark completed
Thank you for this PR @jizezhang 🙏 My reading of the benchmark results so far is that this PR may be slightly faster and doesn't cause any regressions. I'll try to patch up my ability to run tpch SF10 tests and then give this one a good review.
Which issue does this PR close?
BatchCoalescer into RepartitionExec and remove from CoalesceBatches optimization rule #18782.
Rationale for this change
RepartitionExec has two cases: sort-preserving and non-sort-preserving. This change integrates LimitedBatchCoalescer with the latter. For the former, it seems that SortPreservingMergeStream, which builds on top of PerPartitionStreams, has batching logic built in:
datafusion/datafusion/physical-plan/src/sorts/merge.rs
Lines 279 to 289 in e4dcf0c
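For readers unfamiliar with batch coalescing, the general idea can be sketched as follows. This is a simplified, hypothetical model and not the `LimitedBatchCoalescer` used in the PR: a "batch" is just a `Vec<i32>` of rows, and `SimpleCoalescer` with its method names is made up for illustration (the names `push_batch` and `finish` merely echo the calls visible in the diff).

```rust
use std::collections::VecDeque;

/// Hypothetical, simplified coalescer: buffers rows from small input
/// batches and emits batches of exactly `target_rows` rows, except
/// possibly the last one produced by `finish`.
struct SimpleCoalescer {
    target_rows: usize,
    buffer: Vec<i32>,
    completed: VecDeque<Vec<i32>>,
}

impl SimpleCoalescer {
    fn new(target_rows: usize) -> Self {
        assert!(target_rows > 0, "target_rows must be positive");
        Self {
            target_rows,
            buffer: Vec::new(),
            completed: VecDeque::new(),
        }
    }

    /// Append the rows of `batch`; whenever the buffer reaches the
    /// target size, move it to the queue of completed batches.
    fn push_batch(&mut self, batch: Vec<i32>) {
        for row in batch {
            self.buffer.push(row);
            if self.buffer.len() == self.target_rows {
                self.completed.push_back(std::mem::take(&mut self.buffer));
            }
        }
    }

    /// Flush any remaining buffered rows as a final, smaller batch.
    fn finish(&mut self) {
        if !self.buffer.is_empty() {
            self.completed.push_back(std::mem::take(&mut self.buffer));
        }
    }

    /// Pop the oldest completed batch, if any.
    fn next_completed_batch(&mut self) -> Option<Vec<i32>> {
        self.completed.pop_front()
    }
}
```

With a target of 4 rows, pushing batches of 2, 1, and 3 rows yields one full batch of 4 rows, and the remaining 2 rows are only emitted after `finish` is called at end of input.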
What changes are included in this PR?
Are these changes tested?
Yes
Are there any user-facing changes?
No