Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use partition for bool sort #448

Merged
merged 2 commits into from
Jun 19, 2021

Conversation

jimexist
Copy link
Member

@jimexist jimexist commented Jun 11, 2021

Which issue does this PR close?

Closes #447

Rationale for this change

when no limit is present, the best way to sort boolean array is to partition in place, the second best way is to partition.

we are currently using stable sort which will allocate O(n/2) additional memory so this new change might be faster because even if partition_in_place is not used (due to nightly features), the memory allocated is also linear, resulting similar memory overhead, and the sorting happens quicker;

however if the original sort had been unstable (quicksort, in place), this change is not so obvious to be faster, because for quick sort the first pivot sweep would most definitely have the booleans separated.

What changes are included in this PR?

Are there any user-facing changes?

@alamb
Copy link
Contributor

alamb commented Jun 13, 2021

I will try and review this tomorrow

@jimexist jimexist force-pushed the use-partition-for-bool-sort branch from 403fe20 to b3e3dcd Compare June 13, 2021 15:11
@alamb alamb mentioned this pull request Jun 14, 2021
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does indeed appear that sort_by allocates auxiliary memory according to the documentation: https://doc.rust-lang.org/std/vec/struct.Vec.html#method.sort_by

I tested the performance of this change using the benchmark in #457

I ran the following tests:

(arrow_dev) alamb@ip-10-0-0-124:~/Software/arrow-rs$ cargo bench -p arrow --bench sort_kernel -- 'bool sort' --save-baseline master

(arrow_dev) alamb@ip-10-0-0-124:~/Software/arrow-rs$ cargo bench -p arrow --bench sort_kernel -- 'bool sort' --save-baseline bool-partition

Critcmp implies there isn't much difference with this implementation vs what is on master

critcmp master bool-partition
group                   bool-partition                         master
-----                   --------------                         ------
bool sort 2^12          1.02   422.0±16.63µs        ? ?/sec    1.00   414.9±15.86µs        ? ?/sec
bool sort nulls 2^12    1.01   416.4±16.02µs        ? ?/sec    1.00    410.6±9.14µs        ? ?/sec
alamb@ip-10-0-0-124:~/Software/arrow-rs$

So looks good to me 👍

@jimexist
Copy link
Member Author

i guess 1024 is too short an array so it's dominated by the memory allocation rather than sorting

@alamb
Copy link
Contributor

alamb commented Jun 15, 2021

i guess 1024 is too short an array so it's dominated by the memory allocation rather than sorting

I also was under the impression that the partition sort was aiming to improve memory usage rather than speed

@jimexist
Copy link
Member Author

i guess 1024 is too short an array so it's dominated by the memory allocation rather than sorting

I also was under the impression that the partition sort was aiming to improve memory usage rather than speed

Both but the memory consumption is not so obvious unless partition in place is stabilized which is currently in nightly

@alamb
Copy link
Contributor

alamb commented Jun 16, 2021

Ok, I'll increase the size of the sort benchmark and see if I can see any difference

@alamb
Copy link
Contributor

alamb commented Jun 19, 2021

I finally got a chance to re-run the tests

Using 2^14 = 16384

critcmp master bool-partition
group                   bool-partition                         master
-----                   --------------                         ------
bool sort 2^14          1.00  1912.9±52.24µs        ? ?/sec    1.09      2.1±0.16ms        ? ?/sec
bool sort nulls 2^12    1.00   404.6±11.62µs        ? ?/sec    1.06   426.9±25.21µs        ? ?/sec

So looks like a nice change to me

Here is the test change in case anyone else wants to try:

diff --git a/arrow/benches/sort_kernel.rs b/arrow/benches/sort_kernel.rs
index f9f5f24c1..f517afede 100644
--- a/arrow/benches/sort_kernel.rs
+++ b/arrow/benches/sort_kernel.rs
@@ -80,9 +80,9 @@ fn add_benchmark(c: &mut Criterion) {
         b.iter(|| bench_sort(&arr_a, &arr_b, None))
     });
 
-    let arr_a = create_bool_array(2u64.pow(12) as usize, false);
-    let arr_b = create_bool_array(2u64.pow(12) as usize, false);
-    c.bench_function("bool sort 2^12", |b| {
+    let arr_a = create_bool_array(2u64.pow(14) as usize, false);
+    let arr_b = create_bool_array(2u64.pow(14) as usize, false);
+    c.bench_function("bool sort 2^14", |b| {
         b.iter(|| bench_sort(&arr_a, &arr_b, None))
     });

Sorry for the delay @jimexist -- it has been a busy few days for me

@alamb alamb merged commit 1f1c637 into apache:master Jun 19, 2021
@jimexist jimexist deleted the use-partition-for-bool-sort branch June 19, 2021 12:03
alamb pushed a commit that referenced this pull request Jun 22, 2021
* optimize boolean sort using parition

* add docs
alamb pushed a commit that referenced this pull request Jun 23, 2021
* optimize boolean sort using parition

* add docs
alamb added a commit that referenced this pull request Jun 23, 2021
* optimize boolean sort using parition

* add docs

Co-authored-by: Jiayu Liu <Jimexist@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

sort kernel boolean sort can be O(n)
2 participants