Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sort_primitive result is capped to the min of limit or values.len #236

Merged
merged 3 commits into from
May 1, 2021

Conversation

medwards
Copy link
Contributor

@medwards medwards commented Apr 28, 2021

Closes #235

What changes are included in this PR?

The result slice in sort_primitive is the size of all the values being sorted even when a limit is provided. Change the size to be the maximum possible result (thus avoiding the panic in #235)

There are other slice size errors as well (such as when the number of nulls exceeds the limit) that are included.

Are there any user-facing changes?

No

Notes

I've provided a reproducing test, but skipped other variants of limits /w nulls. Those should be easy so let me know if they're of interest. Additionally it is possible that result is intentionally over-large in which case I can try alternative fixes.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice find @medwards -- thank you for the contribution. I am not sure about the use of saturating_sub but I have another idea too (see medwards#1).

👍

FYI @sundy-li who I think contributed the original limit code in apache/arrow#9602

arrow/src/compute/kernels/sort.rs Show resolved Hide resolved
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry -- I think this PR is good. Any other tests you are willing to write would be appreciated.

BTW if you rebase this PR now to pick up the latest changes on master it should pass the CI checks.

Copy link
Member

@jorgecarleitao jorgecarleitao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, @medwards , thanks a lot for this PR! It looks great to me 💯

I left two comments that I think we may aim to, but please do challenge them if you disagree :)

arrow/src/compute/kernels/sort.rs Show resolved Hide resolved
arrow/src/compute/kernels/sort.rs Show resolved Hide resolved
@codecov-commenter
Copy link

codecov-commenter commented Apr 30, 2021

Codecov Report

Merging #236 (e5840d7) into master (d008f31) will increase coverage by 0.00%.
The diff coverage is 100.00%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master     #236   +/-   ##
=======================================
  Coverage   82.51%   82.52%           
=======================================
  Files         162      162           
  Lines       43655    43672   +17     
=======================================
+ Hits        36022    36039   +17     
  Misses       7633     7633           
Impacted Files Coverage Δ
arrow/src/compute/kernels/sort.rs 94.41% <100.00%> (+0.08%) ⬆️
parquet/src/encodings/encoding.rs 94.86% <0.00%> (-0.20%) ⬇️
arrow/src/array/transform/fixed_binary.rs 84.21% <0.00%> (+5.26%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d008f31...e5840d7. Read the comment docs.

@alamb
Copy link
Contributor

alamb commented Apr 30, 2021

BTW we are working on the Dev PR / Process failing check in #242

Copy link
Member

@jorgecarleitao jorgecarleitao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great! Thanks a lot for solving this and for your contribution 👍

@alamb
Copy link
Contributor

alamb commented May 1, 2021

Thanks again @medwards !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Sort with limit panics for the limit includes some but not all nulls, for large arrays
4 participants