Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-29887: [C++] Implement dictionary array sorting #35280

Merged
merged 11 commits into from
May 22, 2023

Conversation

benibus
Copy link
Collaborator

@benibus benibus commented Apr 21, 2023

Rationale for this change

Sorting for DictionaryArrays is not currently supported.

What changes are included in this PR?

  • Adds support for dictionaries in the array_sort_indices kernel
  • Adds tests and benchmarks
  • Alters the internal ArraySortFunc definition to return an error status and accept the caller's ExecContext as an argument

Are these changes tested?

Yes

Are there any user-facing changes?

Yes

Notes

This picks up where #13334 left off. Those commits have been squashed and included in this PR.

@github-actions
Copy link

@github-actions
Copy link

⚠️ GitHub issue #29887 has been automatically assigned in GitHub to PR creator.

@benibus benibus force-pushed the GH-29887-dict-sorting branch 2 times, most recently from 94181ab to e347db5 Compare April 24, 2023 20:38
@benibus benibus marked this pull request as ready for review April 24, 2023 21:21
Copy link
Member

@westonpace westonpace left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few questions but looks good overall

cpp/src/arrow/compute/kernels/vector_array_sort.cc Outdated Show resolved Hide resolved
cpp/src/arrow/compute/kernels/vector_array_sort.cc Outdated Show resolved Hide resolved
cpp/src/arrow/compute/kernels/vector_sort_benchmark.cc Outdated Show resolved Hide resolved
cpp/src/arrow/compute/kernels/vector_sort_test.cc Outdated Show resolved Hide resolved
@github-actions github-actions bot added awaiting changes Awaiting changes awaiting change review Awaiting change review and removed awaiting review Awaiting review awaiting changes Awaiting changes labels May 1, 2023
@benibus benibus force-pushed the GH-29887-dict-sorting branch 2 times, most recently from c81b0d4 to bfd81c0 Compare May 4, 2023 17:19
@benibus
Copy link
Collaborator Author

benibus commented May 4, 2023

I did some refactoring/reorganizing of the benchmarks so the new ones should hopefully make more sense. The original intent should still be intact (although I changed some of the string length parameters to be consistent with the existing non-dictionary string benchmarks).

@benibus benibus requested a review from westonpace May 4, 2023 17:38
@pitrou pitrou self-requested a review May 16, 2023 16:30
Copy link
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very neat, thanks. A couple comments and questions below.

cpp/src/arrow/compute/kernels/vector_array_sort.cc Outdated Show resolved Hide resolved
cpp/src/arrow/compute/kernels/vector_sort_benchmark.cc Outdated Show resolved Hide resolved
cpp/src/arrow/compute/kernels/vector_sort_internal.h Outdated Show resolved Hide resolved
@pitrou
Copy link
Member

pitrou commented May 16, 2023

Just for the record, it's a pity that the all-or-mostly-null case is a bit slower:

ArraySortIndicesInt64WideDict/32768/10000         254 us          254 us         2756 bytes_per_second=122.985M/s items_per_second=16.1199M/s null_percent=0.01 size=32.768k
ArraySortIndicesInt64WideDict/32768/100           272 us          272 us         2615 bytes_per_second=115.005M/s items_per_second=15.0739M/s null_percent=1 size=32.768k
ArraySortIndicesInt64WideDict/32768/10            324 us          324 us         2162 bytes_per_second=96.5235M/s items_per_second=12.6515M/s null_percent=10 size=32.768k
ArraySortIndicesInt64WideDict/32768/2             386 us          386 us         1811 bytes_per_second=80.9335M/s items_per_second=10.6081M/s null_percent=50 size=32.768k
ArraySortIndicesInt64WideDict/32768/1             357 us          357 us         1956 bytes_per_second=87.483M/s items_per_second=11.4666M/s null_percent=100 size=32.768k
ArraySortIndicesInt64WideDict/32768/0             254 us          254 us         2751 bytes_per_second=122.957M/s items_per_second=16.1163M/s null_percent=0 size=32.768k
ArraySortIndicesInt64WideDict/1048576/100        3398 us         3396 us          205 bytes_per_second=294.442M/s items_per_second=38.593M/s null_percent=1 size=1048.58k
ArraySortIndicesInt64WideDict/8388608/100       31369 us        31341 us           22 bytes_per_second=255.253M/s items_per_second=33.4566M/s null_percent=1 size=8.38861M

This can be left to another issue or PR, though. Overall the improvement is very nice, especially for strings.

@benibus benibus force-pushed the GH-29887-dict-sorting branch 2 times, most recently from 502fe15 to 7d1c591 Compare May 18, 2023 21:12
@benibus benibus requested a review from pitrou May 18, 2023 22:30
@benibus
Copy link
Collaborator Author

benibus commented May 18, 2023

After some ad-hoc testing, the benchmark' array lengths/sizes should be essentially the same now between the dict and non-dict versions. For strings, the dict versions slightly overshoot the others (in terms of bytes) due to some null overlap between the index/value arrays.

I also added some special-casing for the 100% null benchmarks - primarily to ensure that the input DictionaryArray was actually 0 bytes (like the StringArray equivalents). Caveat: it only tests the best-case scenario - i.e. where the index and/or value array is entirely null.

Copy link
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update @benibus . Looks mostly good to me, just one more suggestion. Also, can you rebase?

ArianaVillegas and others added 4 commits May 22, 2023 11:29
Draft sort indices on array dictionary

Update sort indices

Update sorter

Address feedback

Address feedback

Add tests and benchmarks

Some nits re:benchmarking

Some nits

Use the faster algorithm (rank-then-take-then-sort)

Remove unused code

More TODOs
Alters ArraySortFunc to allow for error propagation and passing through
the caller's ExecContext to use in additional kernels
@pitrou
Copy link
Member

pitrou commented May 22, 2023

I will note that the string benchmark numbers are lower with the latest benchmarking changes (which reduce the actual string array length), but that reflects the non-trivial fixed costs in the sorting routine:

ArraySortIndicesStringWide/32768/10000            585 us          584 us         1202 bytes_per_second=53.4765M/s items_per_second=1.75232M/s null_percent=0.01 size=32.768k
ArraySortIndicesStringWide/32768/100              606 us          605 us         1169 bytes_per_second=51.627M/s items_per_second=1.70824M/s null_percent=1 size=32.768k
ArraySortIndicesStringWide/32768/10               610 us          610 us         1145 bytes_per_second=51.2671M/s items_per_second=1.86694M/s null_percent=10 size=32.768k
ArraySortIndicesStringWide/32768/2                653 us          652 us         1072 bytes_per_second=47.8979M/s items_per_second=3.13904M/s null_percent=50 size=32.768k
ArraySortIndicesStringWide/32768/1               79.4 us         79.4 us         8561 bytes_per_second=393.738M/s items_per_second=12.902M/s null_percent=100 size=32.768k
ArraySortIndicesStringWide/32768/0                580 us          579 us         1203 bytes_per_second=53.9836M/s items_per_second=1.76893M/s null_percent=0 size=32.768k
ArraySortIndicesStringWide/1048576/100          25200 us        25175 us           28 bytes_per_second=39.7218M/s items_per_second=1.31475M/s null_percent=1 size=1048.58k
ArraySortIndicesStringWide/8388608/100         239121 us       238851 us            3 bytes_per_second=33.4937M/s items_per_second=1.10861M/s null_percent=1 size=8.38861M
ArraySortIndicesStringWideDict/32768/10000        190 us          190 us         3683 bytes_per_second=164.852M/s items_per_second=5.40187M/s null_percent=0.01 size=32.768k
ArraySortIndicesStringWideDict/32768/100          195 us          195 us         3591 bytes_per_second=160.284M/s items_per_second=5.30347M/s null_percent=1 size=32.768k
ArraySortIndicesStringWideDict/32768/10           216 us          216 us         3244 bytes_per_second=144.701M/s items_per_second=5.26945M/s null_percent=10 size=32.768k
ArraySortIndicesStringWideDict/32768/2            278 us          278 us         2504 bytes_per_second=112.403M/s items_per_second=7.36645M/s null_percent=50 size=32.768k
ArraySortIndicesStringWideDict/32768/1            108 us          108 us         6456 bytes_per_second=288.614M/s items_per_second=9.45732M/s null_percent=100 size=32.768k
ArraySortIndicesStringWideDict/32768/0            190 us          190 us         3676 bytes_per_second=164.204M/s items_per_second=5.38062M/s null_percent=0 size=32.768k
ArraySortIndicesStringWideDict/1048576/100       1005 us         1005 us          695 bytes_per_second=995.211M/s items_per_second=32.9405M/s null_percent=1 size=1048.58k
ArraySortIndicesStringWideDict/8388608/100       6826 us         6823 us          102 bytes_per_second=1.14501G/s items_per_second=38.8083M/s null_percent=1 size=8.38861M

@pitrou pitrou merged commit 6ceb12f into apache:main May 22, 2023
33 checks passed
@ursabot
Copy link

ursabot commented May 30, 2023

Benchmark runs are scheduled for baseline = f3500f6 and contender = 6ceb12f. 6ceb12f is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Finished ⬇️0.56% ⬆️0.36%] test-mac-arm
[Finished ⬇️2.94% ⬆️0.0%] ursa-i9-9960x
[Finished ⬇️0.42% ⬆️0.24%] ursa-thinkcentre-m75q
Buildkite builds:
[Finished] 6ceb12f7 ec2-t3-xlarge-us-east-2
[Finished] 6ceb12f7 test-mac-arm
[Finished] 6ceb12f7 ursa-i9-9960x
[Finished] 6ceb12f7 ursa-thinkcentre-m75q
[Finished] f3500f65 ec2-t3-xlarge-us-east-2
[Finished] f3500f65 test-mac-arm
[Finished] f3500f65 ursa-i9-9960x
[Finished] f3500f65 ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

@ursabot
Copy link

ursabot commented May 31, 2023

['Python', 'R'] benchmarks have high level of regressions.
ursa-i9-9960x

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[C++] Sorting dictionary array not implemented
6 participants