use sort_unstable_by in primitive sorting #552

Merged
merged 1 commit into from
Jul 20, 2021

Conversation

jimexist
Member

@jimexist jimexist commented Jul 14, 2021

Which issue does this PR close?

use sort_unstable_by in primitive sorting

Closes #553

Rationale for this change

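  1. Lower memory usage: the unstable sort works in place and does not allocate auxiliary memory.
  2. Likely faster.
  3. When a limit is given, the k-select is already unstable, so sorting without a limit should behave consistently.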
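
To illustrate the third point, here is a minimal sketch using only the Rust standard library. It is not the arrow kernel: `sort_to_indices_sketch` and `by_value` are hypothetical names, and std's `select_nth_unstable_by` stands in for whatever k-select the crate actually uses. Both the limited and unlimited paths end in `sort_unstable_by`, so neither promises any particular order among equal values.

```rust
use std::cmp::Ordering;

/// Compare (index, value) pairs by value only; ties are left unresolved.
fn by_value(a: &(u32, i32), b: &(u32, i32)) -> Ordering {
    a.1.cmp(&b.1)
}

/// Hypothetical helper (not the arrow implementation): returns the indices
/// of the `limit` smallest values, ordered by ascending value.
fn sort_to_indices_sketch(values: &[i32], limit: Option<usize>) -> Vec<u32> {
    let mut pairs: Vec<(u32, i32)> = values
        .iter()
        .enumerate()
        .map(|(i, v)| (i as u32, *v))
        .collect();

    let k = limit.unwrap_or(pairs.len()).min(pairs.len());
    if k > 0 && k < pairs.len() {
        // k-select: afterwards the first k entries hold the k smallest
        // values, in no particular order. This step is unstable.
        let _ = pairs.select_nth_unstable_by(k - 1, by_value);
    }
    pairs.truncate(k);

    // Unstable, in-place sort of the surviving prefix (or of everything
    // when no limit is given).
    pairs.sort_unstable_by(by_value);
    pairs.into_iter().map(|(i, _)| i).collect()
}
```

With `values = [5, 1, 4, 1, 3]` and a limit of 2, the result contains indices 1 and 3, but which comes first is not guaranteed.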

What changes are included in this PR?

Are there any user-facing changes?

@github-actions github-actions bot added the arrow Changes to the arrow crate label Jul 14, 2021
@codecov-commenter

Codecov Report

Merging #552 (8418f1c) into master (fc78af6) will not change coverage.
The diff coverage is 90.90%.

@@           Coverage Diff           @@
##           master     #552   +/-   ##
=======================================
  Coverage   82.47%   82.47%           
=======================================
  Files         167      167           
  Lines       46142    46142           
=======================================
  Hits        38056    38056           
  Misses       8086     8086           

| Impacted Files | Coverage Δ |
|---|---|
| arrow/src/compute/kernels/sort.rs | 94.15% <90.90%> (ø) |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@jimexist
Member Author

sort 2^10               time:   [110.68 us 111.64 us 112.55 us]
                        change: [-14.710% -13.112% -11.406%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe

sort 2^12               time:   [516.32 us 519.97 us 523.58 us]
                        change: [-10.932% -9.5913% -8.2390%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe

sort nulls 2^10         time:   [88.568 us 89.197 us 89.907 us]
                        change: [-20.378% -18.966% -17.484%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  4 (4.00%) high mild

sort nulls 2^12         time:   [372.34 us 375.56 us 378.70 us]
                        change: [-25.293% -24.019% -22.641%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  2 (2.00%) low mild
  1 (1.00%) high mild
  4 (4.00%) high severe

bool sort 2^12          time:   [184.53 us 185.82 us 187.17 us]
                        change: [-60.022% -59.238% -58.451%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  3 (3.00%) high mild
  3 (3.00%) high severe

bool sort nulls 2^12    time:   [210.24 us 213.40 us 216.47 us]
                        change: [-54.063% -53.122% -52.165%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe

sort 2^12 limit 10      time:   [61.881 us 62.244 us 62.614 us]
                        change: [-8.6258% -6.5015% -4.3522%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  3 (3.00%) high mild
  5 (5.00%) high severe

sort 2^12 limit 100     time:   [67.314 us 67.729 us 68.205 us]
                        change: [-5.5755% -3.5509% -1.4872%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  5 (5.00%) high mild
  3 (3.00%) high severe

sort 2^12 limit 1000    time:   [178.07 us 179.37 us 180.82 us]
                        change: [-2.1034% +0.0621% +2.3026%] (p = 0.96 > 0.05)
                        No change in performance detected.
Found 13 outliers among 100 measurements (13.00%)
  3 (3.00%) high mild
  10 (10.00%) high severe

sort 2^12 limit 2^12    time:   [459.51 us 462.14 us 464.90 us]
                        change: [-25.238% -23.036% -21.081%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  3 (3.00%) high mild
  3 (3.00%) high severe

sort nulls 2^12 limit 10
                        time:   [117.28 us 118.29 us 119.29 us]
                        change: [+7.7135% +9.3966% +11.186%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 13 outliers among 100 measurements (13.00%)
  6 (6.00%) low mild
  2 (2.00%) high mild
  5 (5.00%) high severe

sort nulls 2^12 limit 100
                        time:   [106.53 us 107.89 us 109.50 us]
                        change: [-6.4134% -4.0486% -1.6589%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe

sort nulls 2^12 limit 1000
                        time:   [113.71 us 114.30 us 114.92 us]
                        change: [-5.9266% -4.5141% -3.0736%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  5 (5.00%) high mild
  2 (2.00%) high severe

sort nulls 2^12 limit 2^12
                        time:   [336.84 us 340.39 us 344.08 us]
                        change: [-31.546% -30.224% -28.822%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) high mild
  2 (2.00%) high severe

@jorgecarleitao jorgecarleitao added the api-change Changes to the arrow API label Jul 19, 2021
@jorgecarleitao
Member

I think this is backward incompatible, since the sorted indices now differ between implementations. It is a semantic, not an API, backward incompatibility. See https://stackoverflow.com/questions/1517793/what-is-stability-in-sorting-algorithms-and-why-is-it-important if you are interested in why it is important.

@jimexist
Member Author

> I think this is backward incompatible, since the sorted indices now differ between implementations. It is a semantic, not an API, backward incompatibility. See https://stackoverflow.com/questions/1517793/what-is-stability-in-sorting-algorithms-and-why-is-it-important if you are interested in why it is important.

Agreed that it's an API change. It also improves consistency, because the sort with a limit was already unstable. If needed, we can add a stable sort alternative later.
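
One way such a stable alternative could look, as a sketch only: keep using `sort_unstable_by`, but break ties on the original index so that equal values come out in input order. `stable_indices_sketch` is a hypothetical name, not something this PR adds.

```rust
/// Hypothetical helper: sorts (index, value) pairs by value and breaks
/// ties on the original index. Because every pair is then distinct under
/// the comparator, the unstable sort yields the same order a stable sort
/// by value would.
fn stable_indices_sketch(values: &[i32]) -> Vec<u32> {
    let mut pairs: Vec<(u32, i32)> = values
        .iter()
        .enumerate()
        .map(|(i, v)| (i as u32, *v))
        .collect();

    pairs.sort_unstable_by(|a, b| a.1.cmp(&b.1).then_with(|| a.0.cmp(&b.0)));
    pairs.into_iter().map(|(i, _)| i).collect()
}
```

For `values = [2, 1, 2, 1]` this returns `[1, 3, 0, 2]`: the two 1s and the two 2s each keep their input order. The extra index comparison costs some time on inputs with many duplicates, which is the trade-off against the plain unstable sort.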

Contributor

@alamb alamb left a comment

The change looks good to me. I think we should also update the docstring so it no longer says the sort is stable:

/// Sort the `ArrayRef` using `SortOptions`.
///
/// Performs a stable sort on values and indices. Nulls are ordered according to the `nulls_first` flag in `options`.
/// Floats are sorted using IEEE 754 totalOrder
///
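
For readers unfamiliar with the two ordering rules that docstring mentions, here is a small sketch of a comparator that places nulls according to a `nulls_first` flag and orders floats with a total order. It assumes std's `f64::total_cmp` as a stand-in for the crate's own totalOrder comparison, and `cmp_nullable` is a hypothetical name.

```rust
use std::cmp::Ordering;

/// Hypothetical comparator: `None` plays the role of a null slot.
/// Nulls go first or last depending on `nulls_first`; non-null floats
/// use the IEEE 754 total order, under which -0.0 sorts before +0.0.
fn cmp_nullable(a: &Option<f64>, b: &Option<f64>, nulls_first: bool) -> Ordering {
    match (a, b) {
        (None, None) => Ordering::Equal,
        (None, Some(_)) => {
            if nulls_first { Ordering::Less } else { Ordering::Greater }
        }
        (Some(_), None) => {
            if nulls_first { Ordering::Greater } else { Ordering::Less }
        }
        (Some(x), Some(y)) => x.total_cmp(y),
    }
}
```

It can be passed to either sort, for example `v.sort_unstable_by(|a, b| cmp_nullable(a, b, true))`.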

@alamb
Contributor

alamb commented Jul 19, 2021

Thanks @jimexist

@alamb
Contributor

alamb commented Jul 20, 2021

I will update the docstring in a follow on PR -- I am trying to keep the merge queue down

@alamb alamb merged commit 99ae88c into apache:master Jul 20, 2021
@alamb
Contributor

alamb commented Jul 20, 2021

Thanks again @jimexist

@alamb
Contributor

alamb commented Jul 20, 2021

Doc update proposed in #572

Labels
api-change Changes to the arrow API arrow Changes to the arrow crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

primitive sorting can be improved and more consistent with and without limit if sorted unstably
6 participants