GH-14937: [C++] String Sort / Rank Benchmarks #34811

felipecrv · 2023-03-31T00:56:42Z

This is #15041 with the last review comments addressed.

Closes: [C++] Add rank kernel benchmarks #14937

github-actions · 2023-03-31T00:57:05Z

Closes: [C++] Add rank kernel benchmarks #14937

felipecrv · 2023-03-31T01:04:36Z

@WillAyd @cyb70289 @benibus @zeroshade

felipecrv · 2023-03-31T14:51:50Z

Python issue in CI is unrelated.

wgtmac

Could you please provide some results? Just curious about the

wgtmac · 2023-03-31T15:42:35Z

cpp/src/arrow/compute/kernels/vector_sort_benchmark.cc

Is n_chunks worth being a parameter instead of a constant value?

The other benchmarks also chunk the arrays by 10, so to keep things comparable, we should use 10 everywhere.
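For readers unfamiliar with what "chunk the arrays by 10" means in practice, here is a minimal sketch in plain C++. It does not use the Arrow API, and `ChunkBounds` is a hypothetical helper, not a function from the benchmark file. It only shows how a fixed chunk count turns one contiguous array into slices of near-equal length.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper (not taken from the PR): returns (offset, length) pairs
// that split `length` elements into `n_chunks` contiguous slices. The first
// `length % n_chunks` slices receive one extra element.
std::vector<std::pair<std::size_t, std::size_t>> ChunkBounds(std::size_t length,
                                                             std::size_t n_chunks) {
  assert(n_chunks > 0);
  std::vector<std::pair<std::size_t, std::size_t>> bounds;
  bounds.reserve(n_chunks);
  const std::size_t base = length / n_chunks;
  const std::size_t extra = length % n_chunks;
  std::size_t offset = 0;
  for (std::size_t i = 0; i < n_chunks; ++i) {
    const std::size_t chunk_length = base + (i < extra ? 1 : 0);
    bounds.emplace_back(offset, chunk_length);
    offset += chunk_length;
  }
  return bounds;
}
```

Keeping the chunk count constant across benchmarks means that differences in timing come from the data type and distribution rather than from a different number of slices to merge.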

wgtmac · 2023-03-31T15:44:32Z

cpp/src/arrow/compute/kernels/vector_sort_benchmark.cc

I am not familiar with the sort function here. Does it sort null values first or last? Does the null distribution affect the benchmark results? It would be good to provide some benchmark results as well.

I pasted them below. Nulls in variable-size binary arrays reduce the amount of data in the data buffer, making CPU caching more effective.
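The two points in this exchange (where nulls end up, and why nulls shrink the data buffer) can be illustrated with a small sketch in plain C++. This is not Arrow's implementation. It assumes nulls are placed at the end, models a nullable string as `std::optional<std::string>`, and uses the hypothetical names `SortIndicesNullsLast` and `DataBufferBytes`.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <optional>
#include <string>
#include <vector>

using MaybeString = std::optional<std::string>;

// Hypothetical sketch: returns indices that order `values` ascending, with
// the indices of nulls (std::nullopt) placed after all non-null entries.
std::vector<std::size_t> SortIndicesNullsLast(const std::vector<MaybeString>& values) {
  std::vector<std::size_t> indices(values.size());
  std::iota(indices.begin(), indices.end(), std::size_t{0});
  // Move the indices of non-null values to the front, keeping relative order.
  auto nulls_begin = std::stable_partition(
      indices.begin(), indices.end(),
      [&](std::size_t i) { return values[i].has_value(); });
  // Only the non-null prefix needs comparisons.
  std::stable_sort(indices.begin(), nulls_begin,
                   [&](std::size_t a, std::size_t b) { return *values[a] < *values[b]; });
  return indices;
}

// Hypothetical sketch: character bytes a variable-size layout would store.
// A null contributes no bytes to the data buffer.
std::size_t DataBufferBytes(const std::vector<MaybeString>& values) {
  std::size_t total = 0;
  for (const auto& v : values) {
    if (v.has_value()) total += v->size();
  }
  return total;
}
```

Under these assumptions, a higher null proportion means fewer string comparisons and fewer character bytes to touch, which is consistent with the caching effect described above.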


westonpace · 2023-03-31T16:07:17Z

cpp/src/arrow/compute/kernels/vector_sort_benchmark.cc

Given that you went to the trouble of making this configurable, should we have an ArraySortIndicesStringShort and ArraySortIndicesStringLong similar to how we have Narrow and Wide versions of the integer benchmarks?

I just got this from @WillAyd's code in #15041 :D

But sure, I will create different versions. What would be a wide string range and a narrow string range in terms of constants?


felipecrv · 2023-03-31T20:08:40Z

@wgtmac here it is. The numbers might be noisy, as I run a lot of stuff on my dev machine:

2023-03-31T17:06:28-03:00
Running ./release/arrow-compute-vector-sort-benchmark
Run on (20 X 3600 MHz CPU s)
CPU Caches:
  L1 Data 48 KiB (x10)
  L1 Instruction 32 KiB (x10)
  L2 Unified 1280 KiB (x10)
  L3 Unified 24576 KiB (x1)
Load Average: 14.67, 9.38, 5.20
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
---------------------------------------------------------------------------------------------------------------
Benchmark                                                     Time             CPU   Iterations UserCounters...
---------------------------------------------------------------------------------------------------------------
ArraySortIndicesInt64Narrow/49152/10000                    29.7 us         29.7 us        22973 bytes_per_second=1.54182G/s items_per_second=206.94M/s null_percent=0.01 size=49.152k
ArraySortIndicesInt64Narrow/49152/100                      51.2 us         51.2 us        13009 bytes_per_second=915.044M/s items_per_second=119.937M/s null_percent=1 size=49.152k
ArraySortIndicesInt64Narrow/49152/10                       88.7 us         88.7 us         7757 bytes_per_second=528.268M/s items_per_second=69.2411M/s null_percent=10 size=49.152k
ArraySortIndicesInt64Narrow/49152/2                         137 us          137 us         5035 bytes_per_second=341.966M/s items_per_second=44.8222M/s null_percent=50 size=49.152k
ArraySortIndicesInt64Narrow/49152/1                        20.4 us         20.4 us        34265 bytes_per_second=2.24275G/s items_per_second=301.017M/s null_percent=100 size=49.152k
ArraySortIndicesInt64Narrow/49152/0                        29.3 us         29.3 us        23818 bytes_per_second=1.56083G/s items_per_second=209.491M/s null_percent=0 size=49.152k
ArraySortIndicesInt64Narrow/1048576/100                    1023 us         1023 us          692 bytes_per_second=977.378M/s items_per_second=128.107M/s null_percent=1 size=1048.58k
ArraySortIndicesInt64Narrow/8388608/100                    9737 us         9737 us           69 bytes_per_second=821.639M/s items_per_second=107.694M/s null_percent=1 size=8.38861M
ArraySortIndicesInt64Wide/49152/10000                       576 us          576 us         1204 bytes_per_second=81.3665M/s items_per_second=10.6649M/s null_percent=0.01 size=49.152k
ArraySortIndicesInt64Wide/49152/100                         579 us          579 us         1185 bytes_per_second=80.9528M/s items_per_second=10.6106M/s null_percent=1 size=49.152k
ArraySortIndicesInt64Wide/49152/10                          547 us          547 us         1288 bytes_per_second=85.7428M/s items_per_second=11.2385M/s null_percent=10 size=49.152k
ArraySortIndicesInt64Wide/49152/2                           339 us          339 us         2050 bytes_per_second=138.155M/s items_per_second=18.1082M/s null_percent=50 size=49.152k
ArraySortIndicesInt64Wide/49152/1                          20.2 us         20.2 us        34674 bytes_per_second=2.26193G/s items_per_second=303.591M/s null_percent=100 size=49.152k
ArraySortIndicesInt64Wide/49152/0                           574 us          574 us         1218 bytes_per_second=81.6507M/s items_per_second=10.7021M/s null_percent=0 size=49.152k
ArraySortIndicesInt64Wide/1048576/100                     17807 us        17807 us           39 bytes_per_second=56.1582M/s items_per_second=7.36077M/s null_percent=1 size=1048.58k
ArraySortIndicesInt64Wide/8388608/100                    176650 us       176640 us            4 bytes_per_second=45.2899M/s items_per_second=5.93624M/s null_percent=1 size=8.38861M
ArraySortIndicesBool/49152/10000                           1393 us         1393 us          491 bytes_per_second=33.6513M/s items_per_second=282.288M/s null_percent=0.01 size=49.152k
ArraySortIndicesBool/49152/100                             1727 us         1727 us          399 bytes_per_second=27.1404M/s items_per_second=227.67M/s null_percent=1 size=49.152k
ArraySortIndicesBool/49152/10                              2681 us         2680 us          263 bytes_per_second=17.4892M/s items_per_second=146.71M/s null_percent=10 size=49.152k
ArraySortIndicesBool/49152/2                               3924 us         3924 us          178 bytes_per_second=11.947M/s items_per_second=100.219M/s null_percent=50 size=49.152k
ArraySortIndicesBool/49152/1                                509 us          509 us         1378 bytes_per_second=92.1565M/s items_per_second=773.065M/s null_percent=100 size=49.152k
ArraySortIndicesBool/49152/0                               1191 us         1191 us          590 bytes_per_second=39.3436M/s items_per_second=330.038M/s null_percent=0 size=49.152k
ArraySortIndicesBool/1048576/100                          50282 us        50268 us           13 bytes_per_second=19.8935M/s items_per_second=166.879M/s null_percent=1 size=1048.58k
ArraySortIndicesBool/8388608/100                         406805 us       406792 us            2 bytes_per_second=19.6661M/s items_per_second=164.971M/s null_percent=1 size=8.38861M
ArraySortIndicesString/49152/10000                          541 us          541 us         1284 bytes_per_second=86.5857M/s generated_array_size=48.968k items_per_second=5.67448M/s null_percent=0.01 size=49.152k
ArraySortIndicesString/49152/100                            542 us          542 us         1284 bytes_per_second=86.5375M/s generated_array_size=48.957k items_per_second=5.72855M/s null_percent=1 size=49.152k
ArraySortIndicesString/49152/10                             560 us          560 us         1242 bytes_per_second=83.7744M/s generated_array_size=49.854k items_per_second=6.09967M/s null_percent=10 size=49.152k
ArraySortIndicesString/49152/2                              598 us          598 us         1172 bytes_per_second=78.3507M/s generated_array_size=50.222k items_per_second=10.2696M/s null_percent=50 size=49.152k
ArraySortIndicesString/49152/1                             10.6 us         10.6 us        66427 bytes_per_second=4.33839G/s generated_array_size=0 items_per_second=291.145M/s null_percent=100 size=49.152k
ArraySortIndicesString/49152/0                              541 us          541 us         1291 bytes_per_second=86.6558M/s generated_array_size=48.968k items_per_second=5.67908M/s null_percent=0 size=49.152k
ArraySortIndicesString/1048576/100                        16768 us        16768 us           42 bytes_per_second=59.6388M/s generated_array_size=1047.78k items_per_second=3.94797M/s null_percent=1 size=1048.58k
ArraySortIndicesString/8388608/100                       164434 us       164419 us            4 bytes_per_second=48.6561M/s generated_array_size=8.39039M items_per_second=3.22094M/s null_percent=1 size=8.38861M
ChunkedArraySortIndicesInt64Narrow/49152/10000             1022 us         1022 us          684 bytes_per_second=45.8473M/s items_per_second=6.00538M/s null_percent=0.01 size=49.152k
ChunkedArraySortIndicesInt64Narrow/49152/100               1050 us         1050 us          660 bytes_per_second=44.6223M/s items_per_second=5.84492M/s null_percent=1 size=49.152k
ChunkedArraySortIndicesInt64Narrow/49152/10                 997 us          997 us          726 bytes_per_second=46.996M/s items_per_second=6.15586M/s null_percent=10 size=49.152k
ChunkedArraySortIndicesInt64Narrow/49152/2                  595 us          595 us         1173 bytes_per_second=78.728M/s items_per_second=10.3123M/s null_percent=50 size=49.152k
ChunkedArraySortIndicesInt64Narrow/49152/1                 19.8 us         19.8 us        35207 bytes_per_second=2.31382G/s items_per_second=310.353M/s null_percent=100 size=49.152k
ChunkedArraySortIndicesInt64Narrow/49152/0                 1023 us         1023 us          677 bytes_per_second=45.8237M/s items_per_second=6.00229M/s null_percent=0 size=49.152k
ChunkedArraySortIndicesInt64Narrow/1048576/100            12184 us        12184 us           57 bytes_per_second=82.0732M/s items_per_second=10.7573M/s null_percent=1 size=1048.58k
ChunkedArraySortIndicesInt64Narrow/8388608/100           102996 us       102992 us            7 bytes_per_second=77.6756M/s items_per_second=10.181M/s null_percent=1 size=8.38861M
ChunkedArraySortIndicesInt64Wide/49152/10000               1190 us         1190 us          588 bytes_per_second=39.404M/s items_per_second=5.1614M/s null_percent=0.01 size=49.152k
ChunkedArraySortIndicesInt64Wide/49152/100                 1213 us         1213 us          575 bytes_per_second=38.6591M/s items_per_second=5.06383M/s null_percent=1 size=49.152k
ChunkedArraySortIndicesInt64Wide/49152/10                  1115 us         1115 us          631 bytes_per_second=42.0403M/s items_per_second=5.50672M/s null_percent=10 size=49.152k
ChunkedArraySortIndicesInt64Wide/49152/2                    657 us          657 us         1064 bytes_per_second=71.3973M/s items_per_second=9.3521M/s null_percent=50 size=49.152k
ChunkedArraySortIndicesInt64Wide/49152/1                   20.0 us         20.0 us        35122 bytes_per_second=2.28723G/s items_per_second=306.787M/s null_percent=100 size=49.152k
ChunkedArraySortIndicesInt64Wide/49152/0                   1218 us         1218 us          578 bytes_per_second=38.4828M/s items_per_second=5.04073M/s null_percent=0 size=49.152k
ChunkedArraySortIndicesInt64Wide/1048576/100              32409 us        32404 us           22 bytes_per_second=30.8605M/s items_per_second=4.04488M/s null_percent=1 size=1048.58k
ChunkedArraySortIndicesInt64Wide/8388608/100             331134 us       331123 us            2 bytes_per_second=24.1602M/s items_per_second=3.16671M/s null_percent=1 size=8.38861M
ChunkedArraySortIndicesString/49152/10000                   784 us          784 us          887 bytes_per_second=59.775M/s generated_array_size=48.449k items_per_second=3.91487M/s null_percent=0.01 size=49.152k
ChunkedArraySortIndicesString/49152/100                     805 us          805 us          881 bytes_per_second=58.2652M/s generated_array_size=48.34k items_per_second=3.85327M/s null_percent=1 size=49.152k
ChunkedArraySortIndicesString/49152/10                      793 us          793 us          880 bytes_per_second=59.1309M/s generated_array_size=48.389k items_per_second=4.30158M/s null_percent=10 size=49.152k
ChunkedArraySortIndicesString/49152/2                       849 us          849 us          813 bytes_per_second=55.1923M/s generated_array_size=49.338k items_per_second=7.22945M/s null_percent=50 size=49.152k
ChunkedArraySortIndicesString/49152/1                      11.8 us         11.8 us        59466 bytes_per_second=3.87754G/s generated_array_size=0 items_per_second=260.048M/s null_percent=100 size=49.152k
ChunkedArraySortIndicesString/49152/0                       771 us          771 us          901 bytes_per_second=60.7886M/s generated_array_size=48.449k items_per_second=3.98125M/s null_percent=0 size=49.152k
ChunkedArraySortIndicesString/1048576/100                 21498 us        21498 us           33 bytes_per_second=46.5166M/s generated_array_size=1050.85k items_per_second=3.0794M/s null_percent=1 size=1048.58k
ChunkedArraySortIndicesString/8388608/100                206858 us       206850 us            3 bytes_per_second=38.6754M/s generated_array_size=8.38237M items_per_second=2.56021M/s null_percent=1 size=8.38861M
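A note on reading the output above. Judging from the `null_percent` counters, the second benchmark argument appears to be an inverse null proportion (10000 gives 0.01%, 2 gives 50%, 1 gives 100%, 0 gives no nulls). The sketch below encodes that reading and reproduces the items-per-second arithmetic for the Int64 rows. Both function names are hypothetical and the mapping is inferred from the printed counters, not taken from the benchmark source.

```cpp
#include <cassert>
#include <cstdint>

// Inferred from the counters above (an assumption, not the benchmark's code):
// argument 0 means "no nulls", otherwise the null share is 1 / argument.
double NullPercent(std::int64_t inverse_null_proportion) {
  if (inverse_null_proportion == 0) return 0.0;
  return 100.0 / static_cast<double>(inverse_null_proportion);
}

// Items processed per second for a fixed-width type: the `size` argument is
// in bytes, so an Int64 run of 49152 bytes holds 49152 / 8 = 6144 items.
double ItemsPerSecond(std::int64_t size_bytes, std::int64_t bytes_per_item,
                      double seconds_per_iteration) {
  assert(bytes_per_item > 0 && seconds_per_iteration > 0.0);
  const std::int64_t items = size_bytes / bytes_per_item;
  return static_cast<double>(items) / seconds_per_iteration;
}
```

For example, `ItemsPerSecond(49152, 8, 29.7e-6)` comes out at roughly 207 million items per second, in line with the `items_per_second=206.94M/s` counter on the first ArraySortIndicesInt64Narrow row (the small gap is due to the rounded 29.7 us time).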

zeroshade

This all LGTM. I think the other comments already addressed the questions I would have asked.

felipecrv · 2023-04-11T23:37:51Z

@zeroshade I added the Narrow and Wide versions based on @westonpace's request above. I picked 16 chars for the narrow version and 64 chars for the wide one. The CI failure is unrelated, so feel free to merge if the last commit is OK.
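To make the narrow/wide distinction concrete, here is a minimal sketch of generating the two kinds of input in plain C++. It assumes that 16 and 64 are maximum string lengths (the comment above does not say whether they are maxima or fixed lengths), and `RandomStrings` and both constants are hypothetical names rather than identifiers from the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <string>
#include <utility>
#include <vector>

// Assumed interpretation of the values mentioned above: maximum lengths.
constexpr std::size_t kNarrowMaxLength = 16;
constexpr std::size_t kWideMaxLength = 64;

// Hypothetical generator: `count` strings of lowercase letters, each with a
// length drawn uniformly from [0, max_length]. A fixed seed keeps the input
// reproducible between benchmark runs.
std::vector<std::string> RandomStrings(std::size_t count, std::size_t max_length,
                                       std::uint32_t seed) {
  assert(count > 0);
  std::mt19937 rng(seed);
  std::uniform_int_distribution<std::size_t> length_dist(0, max_length);
  std::uniform_int_distribution<int> char_dist('a', 'z');
  std::vector<std::string> out;
  out.reserve(count);
  for (std::size_t i = 0; i < count; ++i) {
    std::string s(length_dist(rng), '\0');
    for (char& c : s) c = static_cast<char>(char_dist(rng));
    out.push_back(std::move(s));
  }
  return out;
}
```

Calling `RandomStrings(n, kNarrowMaxLength, seed)` and `RandomStrings(n, kWideMaxLength, seed)` would give two inputs with the same element count but different average comparison cost and data-buffer size, which is the contrast the Narrow and Wide benchmarks are meant to expose.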

WillAyd · 2023-04-17T18:24:31Z

Thanks @felipecrv for seeing this through

ursabot · 2023-04-19T09:23:24Z

Benchmark runs are scheduled for baseline = 8c91434 and contender = a09201b. a09201b is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Failed] test-mac-arm
[Finished ⬇️1.28% ⬆️0.0%] ursa-i9-9960x
[Finished ⬇️0.24% ⬆️0.0%] ursa-thinkcentre-m75q
Buildkite builds:
[Finished] a09201b5 ec2-t3-xlarge-us-east-2
[Failed] a09201b5 test-mac-arm
[Finished] a09201b5 ursa-i9-9960x
[Finished] a09201b5 ursa-thinkcentre-m75q
[Finished] 8c914343 ec2-t3-xlarge-us-east-2
[Failed] 8c914343 test-mac-arm
[Finished] 8c914343 ursa-i9-9960x
[Finished] 8c914343 ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

ursabot · 2023-04-19T09:24:07Z

['Python', 'R'] benchmarks have high level of regressions.
ursa-i9-9960x

This is apache#15041 with the last review comments addressed.

* Closes: apache#14937

Lead-authored-by: Felipe Oliveira Carvalho <felipekde@gmail.com>
Co-authored-by: Will Ayd <william.ayd@icloud.com>
Signed-off-by: Matt Topol <zotthewizard@gmail.com>

github-actions bot added Component: C++ awaiting review Awaiting review labels Mar 31, 2023

felipecrv force-pushed the string-sort-benchmark branch from 4cf771b to 6d85624 Compare March 31, 2023 01:00

felipecrv marked this pull request as ready for review March 31, 2023 13:32

felipecrv requested a review from westonpace as a code owner March 31, 2023 13:32

felipecrv force-pushed the string-sort-benchmark branch from 7722db1 to 2caf530 Compare March 31, 2023 13:37

wgtmac reviewed Mar 31, 2023


github-actions bot added awaiting committer review Awaiting committer review and removed awaiting review Awaiting review labels Mar 31, 2023

westonpace reviewed Mar 31, 2023


github-actions bot added awaiting changes Awaiting changes and removed awaiting committer review Awaiting committer review labels Mar 31, 2023

WillAyd and others added 4 commits March 31, 2023 16:46

Added String Benchmarks for Sorting / Ranking

efe6b8c

Consider null proportions and don't count offsets/nulls as bytes processed

aa34a0a

Remove generated_array_size counter

930184b

Fix casts

f02c06d

felipecrv force-pushed the string-sort-benchmark branch from 2caf530 to f02c06d Compare March 31, 2023 20:13

github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Mar 31, 2023

zeroshade approved these changes Apr 4, 2023


github-actions bot added awaiting merge Awaiting merge and removed awaiting change review Awaiting change review labels Apr 4, 2023

Add Narrow and Wide version of the new string benchmarks

8e9f554

felipecrv mentioned this pull request Apr 13, 2023

GH-14937: [C++] String Sort / Rank Benchmarks #15041

Closed

zeroshade merged commit a09201b into apache:main Apr 17, 2023

felipecrv deleted the string-sort-benchmark branch April 17, 2023 18:20
