Skip to content

Conversation

@isidentical
Copy link
Contributor

Which issue does this PR close?

Closes #3747.

Rationale for this change

This code removes the assumption that the unsorted array's size and the sorted array's size is always the same or less (as discussed in #3747). Now we always do a final check on whether there is any size mismatch, and if we find one we record it so that the next iterations can decide whether to spill or not in a more reliable fashion.

What changes are included in this PR?

Fixes the broken check that assumes size(sorted_arr) <= size(input_arr). We also now record the changes in size no matter whether there is a propagated limit or not, since the size might actually be different without any change in the underlying data.

Are there any user-facing changes?

This is a fix/correction of two problems, the first one is the wrong check mentioned in #3747 and the second one is the wrong calculation of in-memory sorted arrays where the final result actually takes more space than we initially assumed it would.

@isidentical isidentical marked this pull request as ready for review October 7, 2022 13:26
@andygrove
Copy link
Member

Thanks @isidentical and @Dandandan

@andygrove andygrove merged commit 0ab0e0f into apache:master Oct 7, 2022
@ursabot
Copy link

ursabot commented Oct 7, 2022

Benchmark runs are scheduled for baseline = d863853 and contender = 0ab0e0f. 0ab0e0f is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

DataFusionError(Internal("The size of the sorted batch is larger than the size of the input batch: 2120 > 2312"))

4 participants