Don't optimize AnalyzeExec (#6379) (try 2) #6494
| ))); | ||
| // Gather futures that will run each input partition using a | ||
| // JoinSet to cancel outstanding futures on drop | ||
| let mut set = JoinSet::new(); |
This uses the cool JoinSet I learned about from @nvartolomei and @Darksonn on #6449 ❤️
this logic was just extracted into its own function
| #[cfg_attr(tarpaulin, ignore)] | ||
| async fn csv_explain_analyze_order_by() { | ||
| let ctx = SessionContext::new(); | ||
| register_aggregate_csv_by_sql(&ctx).await; |
| // Turn the tasks in the JoinSet into a stream of | ||
| // Result<usize> representing the counts of each output | ||
| // partition. | ||
| let counts_stream = futures::stream::unfold(set, |mut set| async { |
I think looking at https://github.com/apache/arrow-datafusion/pull/6494/files?w=1 makes it clearer what I did -- which was to change the plumbing to use futures and stream fu rather than channels
| let test2 = UnaryTestCase { | ||
| source_type: SourceType::Unbounded, | ||
| expect_fail: true, | ||
| expect_fail: false, |
| // Turn the tasks in the JoinSet into a stream of | ||
| // Result<usize> representing the counts of each output | ||
| // partition. | ||
| let counts_stream = futures::stream::unfold(set, |mut set| async { |
I confirmed that the tokio stream adapters don't appear to have a JoinSet impl - https://docs.rs/tokio-stream/latest/tokio_stream/wrappers/index.html?search=
| let end = Instant::now(); | ||
| // future that gathers the input counts into an overall output | ||
| // count, and makes an output batch | ||
| let output = counts_stream |
FWIW you could just use a regular async move here, instead of needing the futures adapters
That is an excellent point -- I got so carried away with being clever I lost sight of that.
I rewrote this logic to use a single future and also moved the output record batch creation into a function to separate the business logic from the async orchestration: 5521a70
Test failures do not appear to be related
Test failures I think are due to #6495
Which issue does this PR close?
Closes #6380
Closes #6379
Rationale for this change
See #6379
What changes are included in this PR?
This is based off #6380 from @tustvold but it was getting a little big to just push to his branch so I decided to make a new PR
AnalyzeExec
AnalyzeExec to handle multiple input streams using futures-fu
Are these changes tested?
Yes
Are there any user-facing changes?
Less confusing EXPLAIN ANALYZE results