Always track the final size of the in-mem sorted arrays #3753

isidentical · 2022-10-07T13:15:58Z

Which issue does this PR close?

Closes #3747.

Rationale for this change

This code removes the assumption that the unsorted array's size and the sorted array's size is always the same or less (as discussed in #3747). Now we always do a final check on whether there is any size mismatch, and if we find one we record it so that the next iterations can decide whether to spill or not in a more reliable fashion.

What changes are included in this PR?

Fixes the broken check that assumes size(sorted_arr) <= size(input_arr). We also now record the changes in size no matter whether there is a propagated limit or not, since the size might actually be different without any change in the underlying data.

Are there any user-facing changes?

This is a fix/correction of two problems, the first one is the wrong check mentioned in #3747 and the second one is the wrong calculation of in-memory sorted arrays where the final result actually takes more space than we initially assumed it would.

andygrove · 2022-10-07T16:42:37Z

Thanks @isidentical and @Dandandan

ursabot · 2022-10-07T16:52:07Z

Benchmark runs are scheduled for baseline = d863853 and contender = 0ab0e0f. 0ab0e0f is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

Always track the final size of the in-mem sorted arrays

12871c6

isidentical force-pushed the gh-3747 branch from e037980 to 12871c6 Compare October 7, 2022 13:20

isidentical marked this pull request as ready for review October 7, 2022 13:26

Dandandan approved these changes Oct 7, 2022

View reviewed changes

andygrove merged commit 0ab0e0f into apache:master Oct 7, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Always track the final size of the in-mem sorted arrays #3753

Always track the final size of the in-mem sorted arrays #3753

Uh oh!

isidentical commented Oct 7, 2022

Uh oh!

andygrove commented Oct 7, 2022

Uh oh!

ursabot commented Oct 7, 2022

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Always track the final size of the in-mem sorted arrays #3753

Always track the final size of the in-mem sorted arrays #3753

Uh oh!

Conversation

isidentical commented Oct 7, 2022

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

Are there any user-facing changes?

Uh oh!

andygrove commented Oct 7, 2022

Uh oh!

ursabot commented Oct 7, 2022

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants