
Vectorized lexicographical_partition_ranges (~80% faster) #4575

Merged
merged 6 commits into from
Aug 3, 2023

Conversation


@tustvold tustvold commented Jul 27, 2023

Which issue does this PR close?

Closes #4614

Rationale for this change

lexicographical_partition_ranges(u8) 2^10
                        time:   [1.5420 µs 1.5426 µs 1.5432 µs]
                        change: [-81.810% -81.793% -81.778%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

lexicographical_partition_ranges(u8) 2^12
                        time:   [2.3049 µs 2.3062 µs 2.3076 µs]
                        change: [-85.514% -85.454% -85.393%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) low mild
  2 (2.00%) high mild

lexicographical_partition_ranges(u8) 2^10 with nulls
                        time:   [2.3843 µs 2.3858 µs 2.3873 µs]
                        change: [-67.819% -67.794% -67.767%] (p = 0.00 < 0.05)
                        Performance has improved.

lexicographical_partition_ranges(u8) 2^12 with nulls
                        time:   [3.2080 µs 3.2098 µs 3.2117 µs]
                        change: [-77.353% -77.335% -77.319%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  7 (7.00%) high mild
  1 (1.00%) high severe

lexicographical_partition_ranges(f64) 2^10
                        time:   [3.2753 µs 3.2774 µs 3.2799 µs]
                        change: [-82.556% -82.542% -82.528%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  6 (6.00%) high mild
  3 (3.00%) high severe

lexicographical_partition_ranges(low cardinality) 1024
                        time:   [388.79 ns 389.09 ns 389.40 ns]
                        change: [-22.807% -22.726% -22.644%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  3 (3.00%) low mild
  3 (3.00%) high mild
  1 (1.00%) high severe

What changes are included in this PR?

Are there any user-facing changes?

@github-actions github-actions bot added the arrow Changes to the arrow crate label Jul 27, 2023
@tustvold tustvold changed the title Faster lexicographical_partition_ranges (~80% faster) Vectorized lexicographical_partition_ranges (~80% faster) Jul 27, 2023
partition_point(start + bound / 2, end.min(start + bound + 1), |idx| {
comparator.compare(idx, target) != Ordering::Greater
})
Ok(out.into_iter())
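For context, the `partition_point` call above is a predicate-based binary search over row indices. A minimal standalone sketch of such a helper (hypothetical, for illustration — not the PR's actual implementation):

```rust
// Returns the first index in [lo, hi) for which `pred` is false, assuming
// `pred` is true on a prefix of the range and false on the rest.
fn partition_point(mut lo: usize, mut hi: usize, pred: impl Fn(usize) -> bool) -> usize {
    while lo < hi {
        let mid = lo + (hi - lo) / 2;
        if pred(mid) {
            lo = mid + 1; // predicate still holds, answer is to the right
        } else {
            hi = mid; // predicate fails, answer is mid or to the left
        }
    }
    lo
}

fn main() {
    let sorted = [1, 1, 2, 2, 2, 5];
    // First index whose value is greater than 2.
    let idx = partition_point(0, sorted.len(), |i| sorted[i] <= 2);
    println!("{idx}"); // → 5
}
```

The `end.min(start + bound + 1)` bound in the quoted code additionally gallops outward from the previous partition point, so runs of short partitions don't pay a full `log n` search each time.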
Contributor Author

I opted to preserve the existing function signature for now; I can definitely see a future incarnation returning the computed bitmask somehow to allow for more optimal processing

Contributor

Maybe worth a ticket (I can also update the docs in #4615)

Some(n) => {
let n1 = n.inner().slice(0, slice_len);
let n2 = n.inner().slice(1, slice_len);
&(&n1 ^ &n2) | &values_ne
Contributor Author
@tustvold Jul 27, 2023

There is quite possibly some more clever way to extract bit transitions from a bitmask; however, this is already likely sufficiently fast as to be irrelevant

Contributor

Agree. It took me a while to follow the logic though; a comment could help: "values are either not-equal (and both non-null), or exactly one value is null"
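That rule can be sketched with plain `bool` slices instead of Arrow's packed bitmaps (a hypothetical `boundaries` helper for illustration, not the PR's code):

```rust
// A partition boundary exists between rows i and i+1 when the null-ness
// flips (exactly one of the two slots is null), or when both slots are
// valid and the values differ. Values under null slots are ignored.
fn boundaries(values: &[i64], valid: &[bool]) -> Vec<bool> {
    (0..values.len() - 1)
        .map(|i| {
            let null_transition = valid[i] ^ valid[i + 1]; // exactly one null
            let both_valid_ne =
                valid[i] && valid[i + 1] && values[i] != values[i + 1];
            null_transition || both_valid_ne
        })
        .collect()
}

fn main() {
    // Logical rows: [1, 1, null, null, 3]
    let values = [1, 1, 2, 2, 3];
    let valid = [true, true, false, false, true];
    println!("{:?}", boundaries(&values, &valid));
    // → [false, true, false, true]: boundaries after rows 1 and 3
}
```

In the PR itself the same computation is done on whole buffers at once — the XOR of the null buffer against itself shifted by one, OR'd with a vectorized not-equal kernel — which is what makes it fast.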

Contributor

this is quite clever

@alamb
Contributor

alamb commented Aug 1, 2023

I filed #4614 to track this. I am reviewing this PR as well. Thank you @tustvold -- looks very exciting

Contributor
@alamb alamb left a comment

I think this code looks very nice and clever @tustvold well done 👏

cc @crepererum as I think you also used this approach while working on the deduplication logic in IOx.

The only thing I worry about with this change is the relative lack of test coverage. Specifically, I didn't see tests for the following cases, which seem important:

  1. partitioning of arrays with 0 and 1 elements -- given the new slicing we should add a test to verify the correct behavior in these scenarios
  2. arrays where the values change but the slots are marked null, so they shouldn't start new partitions (to ensure the null mask handling works properly)
  3. multi-column partitioning where both arrays have null values
  4. arrays with nulls that are greater than 2 in length (aka where there are more than 2 partitions)


}

/// Returns the number of partitions
pub fn len(&self) -> usize {
Contributor Author

This will allow a very quick check for the case where all partitions have a length of 1, which may allow for a more efficient special case. In the case of IOx this will allow it to avoid calling into the dedup logic at all

) -> Result<impl Iterator<Item = Range<usize>> + '_, ArrowError> {
LexicographicalPartitionIterator::try_new(columns)
}
pub fn partition(columns: &[ArrayRef]) -> Result<Partitions, ArrowError> {
Contributor Author

The API no longer takes SortColumn as it doesn't actually matter what the sort order is, just that the data is sorted

previous_partition_point: usize,
partition_point: usize,
}
match num_rows {
Contributor Author

@alamb Your testing suggestions were on the money 👍

@tustvold
Contributor Author

tustvold commented Aug 2, 2023

I have updated this PR with more tests and a cleaner API, PTAL

Contributor
@alamb alamb left a comment

I think this looks really nice to me -- great work @tustvold

cc @wolfcm

Some(n) => {
let n1 = n.inner().slice(0, slice_len);
let n2 = n.inner().slice(1, slice_len);
&(&n1 ^ &n2) | &values_ne
Contributor

this is quite clever

},
Arc::new(Int64Array::new(vec![1; 9].into(), None)) as _,
Arc::new(Int64Array::new(
vec![1, 1, 2, 2, 2, 3, 3, 3, 3].into(),
Contributor

I think it would help to also have a test like this where the value under the null slot is actually wrong / different:

Suggested change
vec![1, 1, 2, 2, 2, 3, 3, 3, 3].into(),
vec![1, 1, 2, 2, 2, 3, 0, 3, 3].into(),

However I also made that change and it passed so 👍

///
/// Consecutive ranges will be contiguous: i.e. [`(a, b)` and `(b, c)`], and
/// `start = 0` and `end = self.len()` for the first and last range respectively
pub fn ranges(&self) -> Vec<Range<usize>> {
Contributor

One potential difference with this implementation that I realized compared to main is that it requires memory (a Vec) to store the partition ranges where the previous implementation just iterated over them. I think it would be possible to implement an iterator for this as well to avoid that regression. I'll see if I can make a PR to do so
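The `ranges()` contract quoted above (contiguous ranges covering `0..len`) can be illustrated with a standalone sketch that derives ranges from a boundary mask (hypothetical helper, assuming `boundary[i]` marks a new partition starting at `i + 1`):

```rust
use std::ops::Range;

// Convert a boundary mask over num_rows rows into contiguous partition
// ranges: each true bit at position i closes the current partition at i + 1.
fn ranges_from_boundaries(num_rows: usize, boundary: &[bool]) -> Vec<Range<usize>> {
    let mut out = Vec::new();
    let mut start = 0;
    for (i, &is_boundary) in boundary.iter().enumerate() {
        if is_boundary {
            out.push(start..i + 1);
            start = i + 1;
        }
    }
    out.push(start..num_rows); // the final partition always runs to the end
    out
}

fn main() {
    println!("{:?}", ranges_from_boundaries(5, &[false, true, false, true]));
    // → [0..2, 2..4, 4..5]
}
```

An iterator-returning variant would walk the mask lazily instead of collecting into a `Vec`, which is the regression discussed in this thread.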

Contributor

I messed around with trying to make this work but got stymied by the borrow checker -- specifically, the BitIndexIterator held a reference, so I couldn't embed it in another iterator.

@alamb alamb merged commit 841a6a9 into apache:master Aug 3, 2023
22 checks passed
Labels
arrow Changes to the arrow crate
Successfully merging this pull request may close these issues.

Improve speed of lexicographical_partition_ranges
3 participants