@felipecrv
Contributor

@felipecrv felipecrv commented Mar 31, 2023

This is #15041 with the last review comments addressed.

@felipecrv felipecrv marked this pull request as ready for review March 31, 2023 13:32
@felipecrv felipecrv requested a review from westonpace as a code owner March 31, 2023 13:32
@felipecrv felipecrv force-pushed the string-sort-benchmark branch from 7722db1 to 2caf530 Compare March 31, 2023 13:37
@felipecrv
Contributor Author

The Python issue in CI is unrelated.

Member

@wgtmac wgtmac left a comment

Could you please provide some results? Just curious about the

Member

Is n_chunks worth being a parameter instead of a constant value?

Contributor Author

The other benchmarks also split the arrays into 10 chunks, so to keep things comparable, we should use 10 everywhere.

Member

I am not familiar with the sort function here. Does it place null values first or last? Does the null distribution affect the benchmark results? It would be good to provide some benchmark results as well.

Contributor Author

I pasted the results below. Nulls in variable-size binary arrays reduce the amount of data in the data buffer, making CPU caching more effective.
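
As a rough illustration of the point about nulls and the data buffer, here is a minimal sketch of a variable-size binary layout: a validity flag per slot, n + 1 offsets, and one contiguous data buffer. `BinaryColumn` and `BuildColumn` are hypothetical names invented for this sketch, not Arrow classes, and the sketch assumes a null slot is written with zero length (which is what builders typically do, although the format does not require it).

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Simplified stand-in for a variable-size binary column (hypothetical type,
// not Arrow's): one validity flag per slot, n + 1 offsets, one data buffer.
struct BinaryColumn {
  std::vector<bool> valid;
  std::vector<int32_t> offsets{0};
  std::string data;
};

// Append each value in order. A null slot repeats the previous offset, so it
// contributes no bytes to the data buffer (assumption: zero-length nulls).
BinaryColumn BuildColumn(const std::vector<std::optional<std::string>>& values) {
  BinaryColumn col;
  for (const auto& v : values) {
    col.valid.push_back(v.has_value());
    if (v.has_value()) col.data += *v;
    col.offsets.push_back(static_cast<int32_t>(col.data.size()));
  }
  assert(col.offsets.size() == values.size() + 1);
  return col;
}
```

With four 4-byte values the data buffer holds 16 bytes; replacing two of them with nulls leaves 8 bytes, so fewer bytes have to travel through the cache while comparing.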

@github-actions github-actions bot added awaiting committer review Awaiting committer review and removed awaiting review Awaiting review labels Mar 31, 2023
Comment on lines +226 to +219
Member

Given that you went to the trouble of making this configurable, should we have an ArraySortIndicesStringShort and an ArraySortIndicesStringLong, similar to how we have Narrow and Wide versions of the integer benchmarks?

Contributor Author

I just got this from @WillAyd's code in #15041 :D

But sure, I will create different versions. What would be a wide string range and a narrow string range in terms of constants?

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting committer review Awaiting committer review labels Mar 31, 2023
@felipecrv
Contributor Author

@wgtmac here it is. The numbers might be noisy, as I run a lot of stuff on my dev machine:

2023-03-31T17:06:28-03:00
Running ./release/arrow-compute-vector-sort-benchmark
Run on (20 X 3600 MHz CPU s)
CPU Caches:
  L1 Data 48 KiB (x10)
  L1 Instruction 32 KiB (x10)
  L2 Unified 1280 KiB (x10)
  L3 Unified 24576 KiB (x1)
Load Average: 14.67, 9.38, 5.20
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
---------------------------------------------------------------------------------------------------------------
Benchmark                                                     Time             CPU   Iterations UserCounters...
---------------------------------------------------------------------------------------------------------------
ArraySortIndicesInt64Narrow/49152/10000                    29.7 us         29.7 us        22973 bytes_per_second=1.54182G/s items_per_second=206.94M/s null_percent=0.01 size=49.152k
ArraySortIndicesInt64Narrow/49152/100                      51.2 us         51.2 us        13009 bytes_per_second=915.044M/s items_per_second=119.937M/s null_percent=1 size=49.152k
ArraySortIndicesInt64Narrow/49152/10                       88.7 us         88.7 us         7757 bytes_per_second=528.268M/s items_per_second=69.2411M/s null_percent=10 size=49.152k
ArraySortIndicesInt64Narrow/49152/2                         137 us          137 us         5035 bytes_per_second=341.966M/s items_per_second=44.8222M/s null_percent=50 size=49.152k
ArraySortIndicesInt64Narrow/49152/1                        20.4 us         20.4 us        34265 bytes_per_second=2.24275G/s items_per_second=301.017M/s null_percent=100 size=49.152k
ArraySortIndicesInt64Narrow/49152/0                        29.3 us         29.3 us        23818 bytes_per_second=1.56083G/s items_per_second=209.491M/s null_percent=0 size=49.152k
ArraySortIndicesInt64Narrow/1048576/100                    1023 us         1023 us          692 bytes_per_second=977.378M/s items_per_second=128.107M/s null_percent=1 size=1048.58k
ArraySortIndicesInt64Narrow/8388608/100                    9737 us         9737 us           69 bytes_per_second=821.639M/s items_per_second=107.694M/s null_percent=1 size=8.38861M
ArraySortIndicesInt64Wide/49152/10000                       576 us          576 us         1204 bytes_per_second=81.3665M/s items_per_second=10.6649M/s null_percent=0.01 size=49.152k
ArraySortIndicesInt64Wide/49152/100                         579 us          579 us         1185 bytes_per_second=80.9528M/s items_per_second=10.6106M/s null_percent=1 size=49.152k
ArraySortIndicesInt64Wide/49152/10                          547 us          547 us         1288 bytes_per_second=85.7428M/s items_per_second=11.2385M/s null_percent=10 size=49.152k
ArraySortIndicesInt64Wide/49152/2                           339 us          339 us         2050 bytes_per_second=138.155M/s items_per_second=18.1082M/s null_percent=50 size=49.152k
ArraySortIndicesInt64Wide/49152/1                          20.2 us         20.2 us        34674 bytes_per_second=2.26193G/s items_per_second=303.591M/s null_percent=100 size=49.152k
ArraySortIndicesInt64Wide/49152/0                           574 us          574 us         1218 bytes_per_second=81.6507M/s items_per_second=10.7021M/s null_percent=0 size=49.152k
ArraySortIndicesInt64Wide/1048576/100                     17807 us        17807 us           39 bytes_per_second=56.1582M/s items_per_second=7.36077M/s null_percent=1 size=1048.58k
ArraySortIndicesInt64Wide/8388608/100                    176650 us       176640 us            4 bytes_per_second=45.2899M/s items_per_second=5.93624M/s null_percent=1 size=8.38861M
ArraySortIndicesBool/49152/10000                           1393 us         1393 us          491 bytes_per_second=33.6513M/s items_per_second=282.288M/s null_percent=0.01 size=49.152k
ArraySortIndicesBool/49152/100                             1727 us         1727 us          399 bytes_per_second=27.1404M/s items_per_second=227.67M/s null_percent=1 size=49.152k
ArraySortIndicesBool/49152/10                              2681 us         2680 us          263 bytes_per_second=17.4892M/s items_per_second=146.71M/s null_percent=10 size=49.152k
ArraySortIndicesBool/49152/2                               3924 us         3924 us          178 bytes_per_second=11.947M/s items_per_second=100.219M/s null_percent=50 size=49.152k
ArraySortIndicesBool/49152/1                                509 us          509 us         1378 bytes_per_second=92.1565M/s items_per_second=773.065M/s null_percent=100 size=49.152k
ArraySortIndicesBool/49152/0                               1191 us         1191 us          590 bytes_per_second=39.3436M/s items_per_second=330.038M/s null_percent=0 size=49.152k
ArraySortIndicesBool/1048576/100                          50282 us        50268 us           13 bytes_per_second=19.8935M/s items_per_second=166.879M/s null_percent=1 size=1048.58k
ArraySortIndicesBool/8388608/100                         406805 us       406792 us            2 bytes_per_second=19.6661M/s items_per_second=164.971M/s null_percent=1 size=8.38861M
ArraySortIndicesString/49152/10000                          541 us          541 us         1284 bytes_per_second=86.5857M/s generated_array_size=48.968k items_per_second=5.67448M/s null_percent=0.01 size=49.152k
ArraySortIndicesString/49152/100                            542 us          542 us         1284 bytes_per_second=86.5375M/s generated_array_size=48.957k items_per_second=5.72855M/s null_percent=1 size=49.152k
ArraySortIndicesString/49152/10                             560 us          560 us         1242 bytes_per_second=83.7744M/s generated_array_size=49.854k items_per_second=6.09967M/s null_percent=10 size=49.152k
ArraySortIndicesString/49152/2                              598 us          598 us         1172 bytes_per_second=78.3507M/s generated_array_size=50.222k items_per_second=10.2696M/s null_percent=50 size=49.152k
ArraySortIndicesString/49152/1                             10.6 us         10.6 us        66427 bytes_per_second=4.33839G/s generated_array_size=0 items_per_second=291.145M/s null_percent=100 size=49.152k
ArraySortIndicesString/49152/0                              541 us          541 us         1291 bytes_per_second=86.6558M/s generated_array_size=48.968k items_per_second=5.67908M/s null_percent=0 size=49.152k
ArraySortIndicesString/1048576/100                        16768 us        16768 us           42 bytes_per_second=59.6388M/s generated_array_size=1047.78k items_per_second=3.94797M/s null_percent=1 size=1048.58k
ArraySortIndicesString/8388608/100                       164434 us       164419 us            4 bytes_per_second=48.6561M/s generated_array_size=8.39039M items_per_second=3.22094M/s null_percent=1 size=8.38861M
ChunkedArraySortIndicesInt64Narrow/49152/10000             1022 us         1022 us          684 bytes_per_second=45.8473M/s items_per_second=6.00538M/s null_percent=0.01 size=49.152k
ChunkedArraySortIndicesInt64Narrow/49152/100               1050 us         1050 us          660 bytes_per_second=44.6223M/s items_per_second=5.84492M/s null_percent=1 size=49.152k
ChunkedArraySortIndicesInt64Narrow/49152/10                 997 us          997 us          726 bytes_per_second=46.996M/s items_per_second=6.15586M/s null_percent=10 size=49.152k
ChunkedArraySortIndicesInt64Narrow/49152/2                  595 us          595 us         1173 bytes_per_second=78.728M/s items_per_second=10.3123M/s null_percent=50 size=49.152k
ChunkedArraySortIndicesInt64Narrow/49152/1                 19.8 us         19.8 us        35207 bytes_per_second=2.31382G/s items_per_second=310.353M/s null_percent=100 size=49.152k
ChunkedArraySortIndicesInt64Narrow/49152/0                 1023 us         1023 us          677 bytes_per_second=45.8237M/s items_per_second=6.00229M/s null_percent=0 size=49.152k
ChunkedArraySortIndicesInt64Narrow/1048576/100            12184 us        12184 us           57 bytes_per_second=82.0732M/s items_per_second=10.7573M/s null_percent=1 size=1048.58k
ChunkedArraySortIndicesInt64Narrow/8388608/100           102996 us       102992 us            7 bytes_per_second=77.6756M/s items_per_second=10.181M/s null_percent=1 size=8.38861M
ChunkedArraySortIndicesInt64Wide/49152/10000               1190 us         1190 us          588 bytes_per_second=39.404M/s items_per_second=5.1614M/s null_percent=0.01 size=49.152k
ChunkedArraySortIndicesInt64Wide/49152/100                 1213 us         1213 us          575 bytes_per_second=38.6591M/s items_per_second=5.06383M/s null_percent=1 size=49.152k
ChunkedArraySortIndicesInt64Wide/49152/10                  1115 us         1115 us          631 bytes_per_second=42.0403M/s items_per_second=5.50672M/s null_percent=10 size=49.152k
ChunkedArraySortIndicesInt64Wide/49152/2                    657 us          657 us         1064 bytes_per_second=71.3973M/s items_per_second=9.3521M/s null_percent=50 size=49.152k
ChunkedArraySortIndicesInt64Wide/49152/1                   20.0 us         20.0 us        35122 bytes_per_second=2.28723G/s items_per_second=306.787M/s null_percent=100 size=49.152k
ChunkedArraySortIndicesInt64Wide/49152/0                   1218 us         1218 us          578 bytes_per_second=38.4828M/s items_per_second=5.04073M/s null_percent=0 size=49.152k
ChunkedArraySortIndicesInt64Wide/1048576/100              32409 us        32404 us           22 bytes_per_second=30.8605M/s items_per_second=4.04488M/s null_percent=1 size=1048.58k
ChunkedArraySortIndicesInt64Wide/8388608/100             331134 us       331123 us            2 bytes_per_second=24.1602M/s items_per_second=3.16671M/s null_percent=1 size=8.38861M
ChunkedArraySortIndicesString/49152/10000                   784 us          784 us          887 bytes_per_second=59.775M/s generated_array_size=48.449k items_per_second=3.91487M/s null_percent=0.01 size=49.152k
ChunkedArraySortIndicesString/49152/100                     805 us          805 us          881 bytes_per_second=58.2652M/s generated_array_size=48.34k items_per_second=3.85327M/s null_percent=1 size=49.152k
ChunkedArraySortIndicesString/49152/10                      793 us          793 us          880 bytes_per_second=59.1309M/s generated_array_size=48.389k items_per_second=4.30158M/s null_percent=10 size=49.152k
ChunkedArraySortIndicesString/49152/2                       849 us          849 us          813 bytes_per_second=55.1923M/s generated_array_size=49.338k items_per_second=7.22945M/s null_percent=50 size=49.152k
ChunkedArraySortIndicesString/49152/1                      11.8 us         11.8 us        59466 bytes_per_second=3.87754G/s generated_array_size=0 items_per_second=260.048M/s null_percent=100 size=49.152k
ChunkedArraySortIndicesString/49152/0                       771 us          771 us          901 bytes_per_second=60.7886M/s generated_array_size=48.449k items_per_second=3.98125M/s null_percent=0 size=49.152k
ChunkedArraySortIndicesString/1048576/100                 21498 us        21498 us           33 bytes_per_second=46.5166M/s generated_array_size=1050.85k items_per_second=3.0794M/s null_percent=1 size=1048.58k
ChunkedArraySortIndicesString/8388608/100                206858 us       206850 us            3 bytes_per_second=38.6754M/s generated_array_size=8.38237M items_per_second=2.56021M/s null_percent=1 size=8.38861M
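
To help read the table above, here is a small sketch of the arithmetic that appears to connect its columns. It assumes (inferred from the numbers, not confirmed in the thread) that the first benchmark argument is the array size in bytes, that the second is an inverse null proportion, and that `bytes_per_second` uses 1024-based prefixes while `items_per_second` uses 1000-based ones. `NullPercent` and `PerSecond` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstdint>

// Second benchmark argument -> null percentage, as implied by the table:
// 10000 -> 0.01, 100 -> 1, 10 -> 10, 2 -> 50, 1 -> 100, 0 -> 0.
double NullPercent(int64_t inverse_null_proportion) {
  assert(inverse_null_proportion >= 0);
  if (inverse_null_proportion == 0) return 0.0;
  return 100.0 / static_cast<double>(inverse_null_proportion);
}

// Rate per second from a per-iteration count and a per-iteration time in
// microseconds (the unit used in the Time and CPU columns).
double PerSecond(double count_per_iteration, double micros_per_iteration) {
  assert(micros_per_iteration > 0.0);
  return count_per_iteration / (micros_per_iteration * 1e-6);
}
```

For `ArraySortIndicesInt64Narrow/49152/10000` at 29.7 us, `PerSecond(49152, 29.7)` is about 1.655e9 bytes per second, or roughly 1.54 GiB/s, and 49152 / 8 = 6144 int64 values gives `PerSecond(6144, 29.7)` of about 206.9M items per second. Both agree with the reported counters up to the rounding of the printed time.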

@felipecrv felipecrv force-pushed the string-sort-benchmark branch from 2caf530 to f02c06d Compare March 31, 2023 20:13
@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Mar 31, 2023
Member

@zeroshade zeroshade left a comment

This all LGTM. I think the other comments already addressed the questions I would have asked.

@github-actions github-actions bot added awaiting merge Awaiting merge and removed awaiting change review Awaiting change review labels Apr 4, 2023
@felipecrv
Contributor Author

@zeroshade I added the Narrow and Wide versions based on @westonpace's request above. I picked 16 chars for the narrow version and 64 chars for the wide one. The CI failure is unrelated, so feel free to merge if the last commit is OK.
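
For readers who want a feel for what a narrow versus wide string sort-indices workload looks like, here is a standalone sketch using only the C++ standard library. It is not the benchmark code from this PR and not Arrow's sort kernel: `RandomStrings`, `SortIndices`, and the two constants are hypothetical, and treating 16 and 64 as maximum lengths (rather than fixed lengths) is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <random>
#include <string>
#include <vector>

// Hypothetical constants mirroring the lengths mentioned in the comment.
constexpr std::size_t kNarrowMaxLength = 16;
constexpr std::size_t kWideMaxLength = 64;

// Generate n random lowercase strings with lengths in [min_length, max_length].
std::vector<std::string> RandomStrings(std::size_t n, std::size_t min_length,
                                       std::size_t max_length, uint32_t seed) {
  assert(min_length <= max_length);
  std::mt19937 rng(seed);
  std::uniform_int_distribution<std::size_t> length_dist(min_length, max_length);
  std::uniform_int_distribution<int> char_dist('a', 'z');
  std::vector<std::string> out(n);
  for (auto& s : out) {
    s.resize(length_dist(rng));
    for (auto& c : s) c = static_cast<char>(char_dist(rng));
  }
  return out;
}

// Return the permutation that orders `values` ascending without moving them.
std::vector<std::size_t> SortIndices(const std::vector<std::string>& values) {
  std::vector<std::size_t> indices(values.size());
  std::iota(indices.begin(), indices.end(), std::size_t{0});
  std::stable_sort(indices.begin(), indices.end(),
                   [&](std::size_t a, std::size_t b) { return values[a] < values[b]; });
  return indices;
}
```

Calling `SortIndices(RandomStrings(n, 0, kNarrowMaxLength, seed))` and the same with `kWideMaxLength` gives two inputs that differ only in how many bytes each comparison may need to touch, which is the contrast the Narrow and Wide variants are meant to expose.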

@felipecrv felipecrv deleted the string-sort-benchmark branch April 17, 2023 18:20
@WillAyd
Contributor

WillAyd commented Apr 17, 2023

Thanks @felipecrv for seeing this through

@ursabot

ursabot commented Apr 19, 2023

Benchmark runs are scheduled for baseline = 8c91434 and contender = a09201b. a09201b is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Failed] test-mac-arm
[Finished ⬇️1.28% ⬆️0.0%] ursa-i9-9960x
[Finished ⬇️0.24% ⬆️0.0%] ursa-thinkcentre-m75q
Buildkite builds:
[Finished] a09201b5 ec2-t3-xlarge-us-east-2
[Failed] a09201b5 test-mac-arm
[Finished] a09201b5 ursa-i9-9960x
[Finished] a09201b5 ursa-thinkcentre-m75q
[Finished] 8c914343 ec2-t3-xlarge-us-east-2
[Failed] 8c914343 test-mac-arm
[Finished] 8c914343 ursa-i9-9960x
[Finished] 8c914343 ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

@ursabot

ursabot commented Apr 19, 2023

['Python', 'R'] benchmarks have high level of regressions.
ursa-i9-9960x

liujiacheng777 pushed a commit to LoongArch-Python/arrow that referenced this pull request May 11, 2023
This is apache#15041 with the last review comments addressed.
* Closes: apache#14937

Lead-authored-by: Felipe Oliveira Carvalho <felipekde@gmail.com>
Co-authored-by: Will Ayd <william.ayd@icloud.com>
Signed-off-by: Matt Topol <zotthewizard@gmail.com>
ArgusLi pushed a commit to Bit-Quill/arrow that referenced this pull request May 15, 2023
This is apache#15041 with the last review comments addressed.
* Closes: apache#14937

Lead-authored-by: Felipe Oliveira Carvalho <felipekde@gmail.com>
Co-authored-by: Will Ayd <william.ayd@icloud.com>
Signed-off-by: Matt Topol <zotthewizard@gmail.com>
Development

Successfully merging this pull request may close these issues.

[C++] Add rank kernel benchmarks