Cleanup sort #4613

Merged 4 commits into apache:master · Aug 4, 2023
Conversation

@tustvold (Contributor) commented on Aug 1, 2023

Which issue does this PR close?

Part of #4545

Rationale for this change

Cleans up the sort kernel to make the code easier to maintain.

Also changes list sorting to use the same rank-based approach used for dictionaries, which should be substantially faster, although I don't have benchmarks to quantify this.
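
For context, a minimal plain-Rust sketch of the rank-based idea (names such as `child_ranks` are illustrative, not the arrow API): the child values are ranked once up front, and each list row is then compared as a slice of integer ranks rather than through a dynamically dispatched comparator for every element.

```rust
use std::cmp::Ordering;

/// Dense-rank the child values once (equal values get equal ranks).
/// Illustrative only; the real kernel ranks arbitrary Arrow arrays.
fn child_ranks(values: &[i64]) -> Vec<u32> {
    let mut order: Vec<usize> = (0..values.len()).collect();
    order.sort_unstable_by_key(|&i| values[i]);
    let mut ranks = vec![0u32; values.len()];
    let mut rank = 0u32;
    for w in 0..order.len() {
        if w > 0 && values[order[w]] != values[order[w - 1]] {
            rank += 1;
        }
        ranks[order[w]] = rank;
    }
    ranks
}

fn main() {
    // Two "list" rows defined by offsets into a flat child array.
    let child = vec![30i64, 10, 20, 10];
    let offsets = [0usize, 2, 4]; // row 0 = child[0..2], row 1 = child[2..4]

    let ranks = child_ranks(&child);

    // Comparing rows is now a cheap lexicographic comparison of rank slices,
    // with no per-comparison dispatch or allocation.
    let row = |i: usize| &ranks[offsets[i]..offsets[i + 1]];
    assert_eq!(row(0).cmp(row(1)), Ordering::Greater); // [30, 10] sorts after [20, 10]
}
```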

The benchmarks are quite noisy, but this appears to yield a slight performance improvement.

sort i64 2^10           time:   [9.1265 µs 9.1303 µs 9.1344 µs]
                        change: [-0.1108% -0.0301% +0.0483%] (p = 0.47 > 0.05)
                        No change in performance detected.
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe

sort i64 2^12           time:   [30.098 µs 30.180 µs 30.264 µs]
                        change: [+0.0343% +0.3620% +0.6738%] (p = 0.02 < 0.05)
                        Change within noise threshold.
Found 10 outliers among 100 measurements (10.00%)
  8 (8.00%) high mild
  2 (2.00%) high severe

sort i64 nulls 2^10     time:   [80.763 µs 80.801 µs 80.844 µs]
                        change: [-0.5475% -0.3889% -0.1707%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 8 outliers among 100 measurements (8.00%)
  5 (5.00%) high mild
  3 (3.00%) high severe

sort i64 nulls 2^12     time:   [30.081 µs 30.134 µs 30.195 µs]
                        change: [-3.5573% -2.4943% -1.5011%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  3 (3.00%) high mild
  7 (7.00%) high severe

sort f32 to indices 2^12
                        time:   [129.59 µs 129.64 µs 129.70 µs]
                        change: [-5.1583% -5.0872% -5.0257%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  3 (3.00%) high mild
  3 (3.00%) high severe

sort f32 nulls to indices 2^12
                        time:   [93.133 µs 93.219 µs 93.321 µs]
                        change: [-4.6517% -4.5602% -4.4676%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) high mild
  3 (3.00%) high severe

sort string[10] to indices 2^12
                        time:   [371.99 µs 372.25 µs 372.56 µs]
                        change: [+0.3681% +0.4194% +0.4676%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) high mild
  3 (3.00%) high severe

sort string[10] nulls to indices 2^12
                        time:   [208.87 µs 208.96 µs 209.05 µs]
                        change: [-3.9053% -3.8462% -3.7828%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  4 (4.00%) high mild
  4 (4.00%) high severe

sort string[10] dict to indices 2^12
                        time:   [501.59 µs 501.75 µs 501.92 µs]
                        change: [+1.2853% +1.3713% +1.4949%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
  5 (5.00%) high mild
  3 (3.00%) high severe

sort string[10] dict nulls to indices 2^12
                        time:   [271.88 µs 271.96 µs 272.06 µs]
                        change: [-2.3687% -2.3245% -2.2848%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  1 (1.00%) low mild
  5 (5.00%) high mild
  4 (4.00%) high severe

What changes are included in this PR?

Are there any user-facing changes?

@github-actions bot added the arrow (Changes to the arrow crate) label on Aug 1, 2023
@@ -799,6 +799,15 @@ pub trait AsArray: private::Sealed {
self.as_list_opt().expect("list array")
}

/// Downcast this to a [`FixedSizeBinaryArray`] returning `None` if not possible
fn as_fixed_size_binary_opt(&self) -> Option<&FixedSizeBinaryArray>;

tustvold (author):
This trait is sealed so this isn't a breaking change


/// Compare two `Array`s based on the ordering defined in [build_compare]
fn cmp_array(a: &dyn Array, b: &dyn Array) -> Ordering {
let cmp_op = build_compare(a, b).unwrap();

tustvold (author):
This was getting called per array comparison, doing all this dispatch and allocation logic every time! The performance would have been catastrophic
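
As a plain-Rust illustration of the cost pattern (the `build_comparator` function below is a stand-in for `build_compare`, not the actual arrow code), the fix is simply to construct the comparator once, outside the sort, instead of once per comparison:

```rust
use std::cmp::Ordering;

// Stand-in for `build_compare`: imagine the dispatch and allocation it performs.
fn build_comparator(values: &[i64]) -> impl Fn(usize, usize) -> Ordering + '_ {
    let _scratch = values.to_vec(); // placeholder for the setup cost
    move |a, b| values[a].cmp(&values[b])
}

fn main() {
    let values = vec![3i64, 1, 2];
    let mut indices: Vec<usize> = (0..values.len()).collect();

    // Before: the comparator was rebuilt for every single comparison,
    // i.e. O(n log n) dispatch/allocation instead of O(1).
    indices.sort_by(|&a, &b| build_comparator(&values)(a, b));

    // After: build it once, outside the sort loop, and reuse it.
    let cmp = build_comparator(&values);
    indices.sort_by(|&a, &b| cmp(a, b));

    assert_eq!(indices, [1, 2, 0]);
}
```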

Contributor:
Looks like it was called as part of sorting ListArray -- so that might become substantially faster

}
},
DataType::Dictionary(_, _) => {
let value_null_first = if options.descending {

tustvold (author):
This logic is moved into a child_rank function
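
Roughly, the shared helper ranks the child values once, with null placement derived from the sort options. A plain-Rust sketch (the shape of `child_rank` and the specific null-placement rule below are assumptions for illustration, not the kernel's exact logic):

```rust
use std::cmp::Ordering;

#[derive(Clone, Copy)]
struct SortOptions {
    descending: bool,
    nulls_first: bool,
}

/// Rank child values once; where nulls land among the ranks is derived from
/// the outer sort options. The inversion rule below is an assumption made for
/// the sake of the example.
fn child_rank(values: &[Option<i64>], options: SortOptions) -> Vec<u32> {
    // Ranking itself is ascending, so a descending outer sort flips where
    // nulls should end up among the ranks.
    let nulls_low = options.nulls_first != options.descending;

    let mut order: Vec<usize> = (0..values.len()).collect();
    order.sort_by(|&a, &b| match (&values[a], &values[b]) {
        (None, None) => Ordering::Equal,
        (None, Some(_)) if nulls_low => Ordering::Less,
        (None, Some(_)) => Ordering::Greater,
        (Some(_), None) if nulls_low => Ordering::Greater,
        (Some(_), None) => Ordering::Less,
        (Some(a), Some(b)) => a.cmp(b),
    });

    let mut ranks = vec![0u32; values.len()];
    for (rank, idx) in order.into_iter().enumerate() {
        ranks[idx] = rank as u32;
    }
    ranks
}

fn main() {
    let values = [Some(2i64), None, Some(1)];
    let opts = SortOptions { descending: false, nulls_first: true };
    assert_eq!(child_rank(&values, opts), vec![2, 0, 1]); // the null ranks lowest
}
```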

) -> Result<UInt32Array, ArrowError> {
let rank = child_rank(array.values().as_ref(), options)?;
let offsets = array.value_offsets();
let mut valids = value_indices

tustvold (author):
The duplication in extracting these valids arrays is somewhat unfortunate. I have some ideas for how to fix it, but wanted to keep the PR size down.
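
The repeated step being referred to looks roughly like the following plain-Rust sketch (the real code pairs Arrow row indices with per-type sort keys; here the rank slices from the earlier sketch stand in for them):

```rust
fn main() {
    // Per-child ranks plus list offsets, as in the earlier rank sketch.
    let ranks = [2u32, 0, 1, 0];
    let offsets = [0usize, 2, 4];
    let value_indices: Vec<u32> = vec![0, 1]; // indices of the non-null list rows

    // The step each sort path currently repeats: pair every valid row index
    // with the key it will be sorted by (here, its slice of ranks).
    let mut valids: Vec<(u32, &[u32])> = value_indices
        .into_iter()
        .map(|index| {
            let i = index as usize;
            (index, &ranks[offsets[i]..offsets[i + 1]])
        })
        .collect();

    valids.sort_by(|a, b| a.1.cmp(b.1));
    assert_eq!(valids[0].0, 1); // row 1 (ranks [1, 0]) sorts before row 0 ([2, 0])
}
```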

@@ -1980,7 +1526,7 @@ mod tests {
nulls_first: false,
}),
None,
vec![2, 3, 1, 4, 5, 0],
vec![2, 3, 1, 4, 0, 5],

tustvold (author):
This applies #4603 to boolean arrays, which I had missed by accident; yet another benefit of consolidating the code 😄

Contributor:
0 and 5 here are the indexes of nulls. Makes sense
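
A speculative plain-Rust reading of what the updated expectation encodes (the input values and sort options below are assumed for illustration, not copied from the test): with `nulls_first: false` the null positions 0 and 5 now follow the valid rows in their original order.

```rust
fn main() {
    // Assumed input shaped like the test: booleans with nulls at positions 0 and 5.
    let values: Vec<Option<bool>> = vec![
        None, Some(false), Some(true), Some(true), Some(false), None,
    ];

    // Partition row indices into valid and null, preserving original order.
    let (valids, nulls): (Vec<u32>, Vec<u32>) =
        (0..values.len() as u32).partition(|&i| values[i as usize].is_some());

    // Descending, stable sort of the valid rows: trues (2, 3) before falses (1, 4).
    let mut sorted = valids;
    sorted.sort_by_key(|&i| std::cmp::Reverse(values[i as usize]));

    // nulls_first: false, so the nulls go last, in their original order (0, 5);
    // that ordering is what the updated expected vector encodes.
    sorted.extend(nulls);

    assert_eq!(sorted, vec![2, 3, 1, 4, 0, 5]);
}
```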

@alamb (Contributor) left a comment

This is a really nice consolidation @tustvold -- 👍

I found the GitHub diff hard to review, but in an editor it is quite clear that the implementations were consolidated into a single type-specialized function (sort_impl) rather than manually copied and pasted.
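
A plain-Rust sketch of the shape of that consolidation (the signature is an assumption modelled on the call sites visible in this diff, not the exact one in the kernel): one generic function handles direction, null placement, and the limit, and each type-specific path only supplies `(index, key)` pairs and a comparator.

```rust
use std::cmp::Ordering;

#[derive(Clone, Copy)]
struct SortOptions {
    descending: bool,
    nulls_first: bool,
}

/// Generic core shared by every type-specific sort path (assumed shape,
/// modelled on the call sites visible in this diff).
fn sort_impl<T: Copy>(
    options: SortOptions,
    valids: &mut [(u32, T)],
    nulls: &[u32],
    limit: Option<usize>,
    cmp: impl Fn(T, T) -> Ordering,
) -> Vec<u32> {
    // Sort only the valid rows; the sort direction is handled once, here.
    valids.sort_by(|a, b| {
        let ord = cmp(a.1, b.1);
        if options.descending { ord.reverse() } else { ord }
    });

    // Null placement and the limit are also handled once, here.
    let valid_indices = valids.iter().map(|(i, _)| *i);
    let null_indices = nulls.iter().copied();
    let mut out: Vec<u32> = if options.nulls_first {
        null_indices.chain(valid_indices).collect()
    } else {
        valid_indices.chain(null_indices).collect()
    };
    out.truncate(limit.unwrap_or(out.len()));
    out
}

fn main() {
    // A type-specific caller only prepares (index, key) pairs and a comparator.
    let mut valids = vec![(1u32, 5i64), (2, 3), (4, 9)];
    let nulls = vec![0u32, 3];
    let options = SortOptions { descending: true, nulls_first: false };
    let indices = sort_impl(options, &mut valids, &nulls, None, |a, b| a.cmp(&b));
    assert_eq!(indices, vec![4, 1, 2, 0, 3]);
}
```

Keeping those concerns in one place is also why behavioural fixes now apply to every type at once, as noted above for boolean arrays.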

}
DataType::Duration(TimeUnit::Second) => {
sort_primitive::<DurationSecondType, _>(values, v, n, cmp, &options, limit)
let (v, n) = partition_validity(array);

Contributor:
since partition_validity uses u32 for indices, doesn't that mean this code cannot sort arrays with more than 4B values? Maybe for things like LargeList this could be a problem 🤔

However, it seems like this was the case prior to this PR as well, so it probably isn't an issue

tustvold (author):
This is a good point, but I think it is somewhat inherent to the current design. FWIW acero also only makes use of 32-bit indices.
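
For reference, a plain-Rust sketch of the pattern under discussion, over `Option` values rather than an Arrow array (the name `partition_validity` mirrors the kernel; the body is illustrative): row indices are materialised as `u32`, which is what caps a single sortable array at `u32::MAX` rows.

```rust
/// Split row indices into (valid, null); both sides are u32, hence the cap.
fn partition_validity(values: &[Option<i64>]) -> (Vec<u32>, Vec<u32>) {
    (0..values.len() as u32).partition(|&i| values[i as usize].is_some())
}

fn main() {
    let values = [Some(1i64), None, Some(3)];
    let (valid, nulls) = partition_validity(&values);
    assert_eq!(valid, vec![0, 2]);
    assert_eq!(nulls, vec![1]);
}
```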

.into_iter()
.map(|index| (index, values.value(index as usize)))
.collect::<Vec<(u32, bool)>>();
sort_impl(options, &mut valids, &null_indices, limit, |a, b| a.cmp(&b)).into()

Contributor:
👌 sort_impl is very nice


@tustvold merged commit a81da6c into apache:master on Aug 4, 2023
25 checks passed