Commit 95faa02

mrk-andreev authored and MaxGekk committed
[SPARK-49490][SQL] Add benchmarks for initCap
### What changes were proposed in this pull request?
Add benchmarks for all codepaths of initCap, namely, paths that call:
- execBinaryICU
- execBinary
- execLowercase
- execICU

### Why are the changes needed?
Requested by jira ticket SPARK-49490.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
The benchmark was tested locally by performing a manual run.

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #48501 from mrk-andreev/SPARK-49490.

Authored-by: Mark Andreev <mark.andreev@gmail.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
1 parent 3bc374d commit 95faa02

5 files changed, +324 -144 lines changed

Lines changed: 70 additions & 36 deletions
@@ -1,54 +1,88 @@
-OpenJDK 64-Bit Server VM 21.0.5+11-LTS on Linux 6.5.0-1025-azure
-AMD EPYC 7763 64-Core Processor
+OpenJDK 64-Bit Server VM 21.0.5+11-LTS on Linux 6.8.0-1018-aws
+Intel(R) Xeon(R) Platinum 8252C CPU @ 3.80GHz
 collation unit benchmarks - equalsFunction: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative time
 --------------------------------------------------------------------------------------------------------------------------
-UTF8_BINARY 1353 1357 5 0.1 13532.2 1.0X
-UTF8_LCASE 2601 2602 2 0.0 26008.0 1.9X
-UNICODE 16745 16756 16 0.0 167450.9 12.4X
-UNICODE_CI 16590 16627 52 0.0 165904.8 12.3X
+UTF8_BINARY 1193 1194 1 0.1 11929.0 1.0X
+UTF8_LCASE 2717 2721 6 0.0 27168.5 2.3X
+UNICODE 17991 17993 2 0.0 179913.6 15.1X
+UNICODE_CI 17837 17842 7 0.0 178369.9 15.0X

-OpenJDK 64-Bit Server VM 21.0.5+11-LTS on Linux 6.5.0-1025-azure
-AMD EPYC 7763 64-Core Processor
+OpenJDK 64-Bit Server VM 21.0.5+11-LTS on Linux 6.8.0-1018-aws
+Intel(R) Xeon(R) Platinum 8252C CPU @ 3.80GHz
 collation unit benchmarks - compareFunction: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative time
 ---------------------------------------------------------------------------------------------------------------------------
-UTF8_BINARY 1746 1746 0 0.1 17462.6 1.0X
-UTF8_LCASE 2629 2630 1 0.0 26294.8 1.5X
-UNICODE 16744 16744 0 0.0 167438.6 9.6X
-UNICODE_CI 16518 16521 4 0.0 165180.2 9.5X
+UTF8_BINARY 1523 1523 0 0.1 15233.9 1.0X
+UTF8_LCASE 2441 2441 0 0.0 24407.9 1.6X
+UNICODE 17875 17884 13 0.0 178749.6 11.7X
+UNICODE_CI 17701 17703 2 0.0 177013.8 11.6X

-OpenJDK 64-Bit Server VM 21.0.5+11-LTS on Linux 6.5.0-1025-azure
-AMD EPYC 7763 64-Core Processor
+OpenJDK 64-Bit Server VM 21.0.5+11-LTS on Linux 6.8.0-1018-aws
+Intel(R) Xeon(R) Platinum 8252C CPU @ 3.80GHz
 collation unit benchmarks - hashFunction: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative time
 ------------------------------------------------------------------------------------------------------------------------
-UTF8_BINARY 2808 2808 1 0.0 28076.2 1.0X
-UTF8_LCASE 5409 5410 0 0.0 54093.0 1.9X
-UNICODE 67930 67957 38 0.0 679296.7 24.2X
-UNICODE_CI 56004 56005 1 0.0 560044.2 19.9X
+UTF8_BINARY 2660 2666 9 0.0 26601.1 1.0X
+UTF8_LCASE 5013 5016 3 0.0 50134.0 1.9X
+UNICODE 75622 75623 1 0.0 756217.3 28.4X
+UNICODE_CI 63036 63042 9 0.0 630360.9 23.7X

-OpenJDK 64-Bit Server VM 21.0.5+11-LTS on Linux 6.5.0-1025-azure
-AMD EPYC 7763 64-Core Processor
+OpenJDK 64-Bit Server VM 21.0.5+11-LTS on Linux 6.8.0-1018-aws
+Intel(R) Xeon(R) Platinum 8252C CPU @ 3.80GHz
 collation unit benchmarks - contains: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative time
 ------------------------------------------------------------------------------------------------------------------------
-UTF8_BINARY 1612 1614 2 0.1 16118.8 1.0X
-UTF8_LCASE 14509 14526 23 0.0 145092.7 9.0X
-UNICODE 308136 308631 700 0.0 3081364.6 191.2X
-UNICODE_CI 314612 314846 330 0.0 3146120.0 195.2X
+UTF8_BINARY 2121 2122 0 0.0 21214.2 1.0X
+UTF8_LCASE 27635 27636 1 0.0 276347.7 13.0X
+UNICODE 523746 524012 376 0.0 5237460.5 246.9X
+UNICODE_CI 520134 520227 131 0.0 5201343.3 245.2X

-OpenJDK 64-Bit Server VM 21.0.5+11-LTS on Linux 6.5.0-1025-azure
-AMD EPYC 7763 64-Core Processor
+OpenJDK 64-Bit Server VM 21.0.5+11-LTS on Linux 6.8.0-1018-aws
+Intel(R) Xeon(R) Platinum 8252C CPU @ 3.80GHz
 collation unit benchmarks - startsWith: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative time
 ------------------------------------------------------------------------------------------------------------------------
-UTF8_BINARY 1913 1914 1 0.1 19131.3 1.0X
-UTF8_LCASE 9785 9788 5 0.0 97847.7 5.1X
-UNICODE 311517 311580 89 0.0 3115167.2 162.8X
-UNICODE_CI 316517 316660 201 0.0 3165173.7 165.4X
+UTF8_BINARY 2767 2769 4 0.0 27666.3 1.0X
+UTF8_LCASE 26861 26861 1 0.0 268606.4 9.7X
+UNICODE 518540 518815 389 0.0 5185401.3 187.4X
+UNICODE_CI 521156 521261 148 0.0 5211559.5 188.4X

-OpenJDK 64-Bit Server VM 21.0.5+11-LTS on Linux 6.5.0-1025-azure
-AMD EPYC 7763 64-Core Processor
+OpenJDK 64-Bit Server VM 21.0.5+11-LTS on Linux 6.8.0-1018-aws
+Intel(R) Xeon(R) Platinum 8252C CPU @ 3.80GHz
 collation unit benchmarks - endsWith: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative time
 ------------------------------------------------------------------------------------------------------------------------
-UTF8_BINARY 1891 1891 0 0.1 18912.1 1.0X
-UTF8_LCASE 10089 10093 5 0.0 100893.6 5.3X
-UNICODE 336905 336931 36 0.0 3369051.8 178.1X
-UNICODE_CI 339944 340585 907 0.0 3399439.0 179.7X
+UTF8_BINARY 2919 2921 3 0.0 29190.2 1.0X
+UTF8_LCASE 26862 26862 1 0.0 268618.0 9.2X
+UNICODE 504534 504927 556 0.0 5045340.3 172.8X
+UNICODE_CI 506542 506565 32 0.0 5065423.0 173.5X
+
+OpenJDK 64-Bit Server VM 21.0.5+11-LTS on Linux 6.8.0-1018-aws
+Intel(R) Xeon(R) Platinum 8252C CPU @ 3.80GHz
+collation unit benchmarks - initCap using impl execICU: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative time
+--------------------------------------------------------------------------------------------------------------------------------------
+UNICODE 419 425 5 0.2 4189.2 1.0X
+UNICODE_CI 416 426 6 0.2 4163.2 1.0X
+
+OpenJDK 64-Bit Server VM 21.0.5+11-LTS on Linux 6.8.0-1018-aws
+Intel(R) Xeon(R) Platinum 8252C CPU @ 3.80GHz
+collation unit benchmarks - initCap using impl execBinaryICU: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative time
+--------------------------------------------------------------------------------------------------------------------------------------------
+UTF8_BINARY 575 576 0 0.2 5754.0 1.0X
+UTF8_LCASE 575 576 1 0.2 5747.8 1.0X
+UNICODE 576 576 0 0.2 5761.5 1.0X
+UNICODE_CI 576 578 2 0.2 5758.0 1.0X
+
+OpenJDK 64-Bit Server VM 21.0.5+11-LTS on Linux 6.8.0-1018-aws
+Intel(R) Xeon(R) Platinum 8252C CPU @ 3.80GHz
+collation unit benchmarks - initCap using impl execBinary: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative time
+-----------------------------------------------------------------------------------------------------------------------------------------
+UTF8_BINARY 159 159 1 0.6 1587.6 1.0X
+UTF8_LCASE 159 159 0 0.6 1586.6 1.0X
+UNICODE 158 159 1 0.6 1584.9 1.0X
+UNICODE_CI 159 160 1 0.6 1586.1 1.0X
+
+OpenJDK 64-Bit Server VM 21.0.5+11-LTS on Linux 6.8.0-1018-aws
+Intel(R) Xeon(R) Platinum 8252C CPU @ 3.80GHz
+collation unit benchmarks - initCap using impl execLowercase: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative time
+--------------------------------------------------------------------------------------------------------------------------------------------
+UTF8_BINARY 397 405 5 0.3 3974.4 1.0X
+UTF8_LCASE 401 405 5 0.2 4009.5 1.0X
+UNICODE 395 399 3 0.3 3953.9 1.0X
+UNICODE_CI 395 400 3 0.3 3952.0 1.0X

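
For readers checking the tables by hand, the following is a minimal Python sketch of how the "Relative time" column appears to relate to "Per Row(ns)", using the new equalsFunction rows from this commit. It is an assumption drawn from the printed numbers, not from the benchmark source: each row's per-row time is divided by the first row's and printed to one decimal. The helper names `relative_times` and `implied_row_count` are hypothetical. The second helper infers the number of rows per run from best time and per-row time; the result of roughly 100,000 is an inference and is not stated anywhere in the commit.

```python
# Rows copied from the new equalsFunction results in this commit.
# Each tuple: (collation, best time in ms, per-row time in ns).
ROWS = [
    ("UTF8_BINARY", 1193, 11929.0),
    ("UTF8_LCASE", 2717, 27168.5),
    ("UNICODE", 17991, 179913.6),
    ("UNICODE_CI", 17837, 178369.9),
]


def relative_times(rows):
    """Per-row time of each case divided by the first (baseline) case.

    Assumption: this mirrors how the "Relative time" column is derived;
    it is inferred from the printed values only.
    """
    baseline_ns = rows[0][2]
    return {name: f"{per_row_ns / baseline_ns:.1f}X" for name, _, per_row_ns in rows}


def implied_row_count(best_ms, per_row_ns):
    """Rows per run implied by best time (ms) and per-row time (ns).

    The best time is printed as a rounded integer, so the result is approximate.
    """
    return best_ms * 1_000_000 / per_row_ns


if __name__ == "__main__":
    for name, rel in relative_times(ROWS).items():
        print(name, rel)
    print("implied rows per run:", round(implied_row_count(1193, 11929.0)))
```

Under these assumptions the computed ratios agree with the 1.0X, 2.3X, 15.1X, and 15.0X values shown in the equalsFunction table.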
