[C++] Investigate recent regressions in some utf8 kernel benchmarks #30040
Comments
Eduardo Ponce / @edponce:
Now, #3 should not have a noticeable impact, but #1 and #2 can, because of the loop iteration and the extra function call.
Antoine Pitrou / @pitrou:
Non-regressions: (26)
benchmark baseline contender change % counters
Utf8Lower 644.618 MiB/sec 814.359 MiB/sec 26.332 {'family_index': 11, 'per_family_instance_index': 0, 'run_name': 'Utf8Lower', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 29}
Utf8Upper 647.102 MiB/sec 779.122 MiB/sec 20.402 {'family_index': 12, 'per_family_instance_index': 0, 'run_name': 'Utf8Upper', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 29}
IsAlphaNumericAscii 506.334 MiB/sec 577.621 MiB/sec 14.079 {'family_index': 2, 'per_family_instance_index': 0, 'run_name': 'IsAlphaNumericAscii', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 22}
MatchLike 808.991 MiB/sec 863.400 MiB/sec 6.725 {'family_index': 7, 'per_family_instance_index': 0, 'run_name': 'MatchLike', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 38}
TrimManyUtf8 669.685 MiB/sec 714.090 MiB/sec 6.631 {'family_index': 15, 'per_family_instance_index': 0, 'run_name': 'TrimManyUtf8', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 29}
BinaryJoinArrayArray 1.105 GiB/sec 1.131 GiB/sec 2.336 {'family_index': 17, 'per_family_instance_index': 0, 'run_name': 'BinaryJoinArrayArray', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 6817}
BinaryJoinElementWiseArrayScalar/64 1.018 GiB/sec 1.038 GiB/sec 1.894 {'family_index': 18, 'per_family_instance_index': 2, 'run_name': 'BinaryJoinElementWiseArrayScalar/64', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 102}
BinaryJoinElementWiseArrayArray/8 754.305 MiB/sec 768.563 MiB/sec 1.890 {'family_index': 19, 'per_family_instance_index': 1, 'run_name': 'BinaryJoinElementWiseArrayArray/8', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 564}
MatchLikePrefix 3.605 GiB/sec 3.665 GiB/sec 1.663 {'family_index': 9, 'per_family_instance_index': 0, 'run_name': 'MatchLikePrefix', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 161}
AsciiUpper 7.936 GiB/sec 8.051 GiB/sec 1.441 {'family_index': 1, 'per_family_instance_index': 0, 'run_name': 'AsciiUpper', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 359}
MatchLikeSuffix 3.637 GiB/sec 3.678 GiB/sec 1.132 {'family_index': 10, 'per_family_instance_index': 0, 'run_name': 'MatchLikeSuffix', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 165}
TrimSingleAscii 1.233 GiB/sec 1.245 GiB/sec 0.970 {'family_index': 5, 'per_family_instance_index': 0, 'run_name': 'TrimSingleAscii', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 57}
AsciiLower 7.933 GiB/sec 8.009 GiB/sec 0.957 {'family_index': 0, 'per_family_instance_index': 0, 'run_name': 'AsciiLower', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 365}
TrimManyAscii 955.260 MiB/sec 962.251 MiB/sec 0.732 {'family_index': 6, 'per_family_instance_index': 0, 'run_name': 'TrimManyAscii', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 42}
BinaryJoinElementWiseArrayScalar/8 816.534 MiB/sec 822.348 MiB/sec 0.712 {'family_index': 18, 'per_family_instance_index': 1, 'run_name': 'BinaryJoinElementWiseArrayScalar/8', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 612}
BinaryJoinElementWiseArrayArray/2 591.531 MiB/sec 594.633 MiB/sec 0.525 {'family_index': 19, 'per_family_instance_index': 0, 'run_name': 'BinaryJoinElementWiseArrayArray/2', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 1767}
MatchSubstring 577.765 MiB/sec 579.194 MiB/sec 0.247 {'family_index': 3, 'per_family_instance_index': 0, 'run_name': 'MatchSubstring', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 25}
BinaryJoinElementWiseArrayScalar/2 747.787 MiB/sec 748.576 MiB/sec 0.106 {'family_index': 18, 'per_family_instance_index': 0, 'run_name': 'BinaryJoinElementWiseArrayScalar/2', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 2232}
IsAlphaNumericUnicode 1005.992 MiB/sec 1005.034 MiB/sec -0.095 {'family_index': 13, 'per_family_instance_index': 0, 'run_name': 'IsAlphaNumericUnicode', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 44}
BinaryJoinArrayScalar 1.238 GiB/sec 1.236 GiB/sec -0.195 {'family_index': 16, 'per_family_instance_index': 0, 'run_name': 'BinaryJoinArrayScalar', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 7191}
BinaryJoinElementWiseArrayArray/128 1.225 GiB/sec 1.222 GiB/sec -0.246 {'family_index': 19, 'per_family_instance_index': 3, 'run_name': 'BinaryJoinElementWiseArrayArray/128', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 58}
BinaryJoinElementWiseArrayArray/64 1.014 GiB/sec 1.008 GiB/sec -0.594 {'family_index': 19, 'per_family_instance_index': 2, 'run_name': 'BinaryJoinElementWiseArrayArray/64', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 94}
BinaryJoinElementWiseArrayScalar/128 1.244 GiB/sec 1.232 GiB/sec -1.014 {'family_index': 18, 'per_family_instance_index': 3, 'run_name': 'BinaryJoinElementWiseArrayScalar/128', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 60}
SplitPattern 458.727 MiB/sec 451.400 MiB/sec -1.597 {'family_index': 4, 'per_family_instance_index': 0, 'run_name': 'SplitPattern', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 20}
MatchLikeSubstring 566.577 MiB/sec 548.685 MiB/sec -3.158 {'family_index': 8, 'per_family_instance_index': 0, 'run_name': 'MatchLikeSubstring', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 25}
TrimSingleUtf8 982.468 MiB/sec 942.122 MiB/sec -4.107 {'family_index': 14, 'per_family_instance_index': 0, 'run_name': 'TrimSingleUtf8', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 44}
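For readers checking the table by hand, the "change %" column appears to be the relative difference between contender and baseline throughput. The following is a minimal Python sketch of that arithmetic, not the tool's actual implementation. The helper names (`parse_throughput`, `percent_change`, `is_regression`) are hypothetical, and the 5% regression threshold is an assumption that happens to be consistent with every row above being listed as a non-regression.

```python
import re

# Binary unit prefixes as they appear in the table (MiB/sec, GiB/sec).
_UNIT_FACTORS = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3}


def parse_throughput(text):
    """Convert a string such as '644.618 MiB/sec' to bytes per second."""
    match = re.fullmatch(r"\s*([0-9.]+)\s+(B|KiB|MiB|GiB)/sec\s*", text)
    if match is None:
        raise ValueError(f"unrecognized throughput: {text!r}")
    value, unit = match.groups()
    return float(value) * _UNIT_FACTORS[unit]


def percent_change(baseline, contender):
    """Relative change of contender versus baseline, in percent."""
    base = parse_throughput(baseline)
    return (parse_throughput(contender) - base) / base * 100.0


def is_regression(change_percent, threshold_percent=5.0):
    """Assumed rule: a drop larger than the threshold counts as a regression."""
    return change_percent < -threshold_percent


if __name__ == "__main__":
    # Utf8Lower row from the table above.
    print(round(percent_change("644.618 MiB/sec", "814.359 MiB/sec"), 2))
```

Recomputing from the printed values reproduces the MiB/sec rows closely (about 26.33 for Utf8Lower). Rows printed in GiB/sec carry only three decimals, so a recomputed percentage can differ from the reported one in the first decimal place.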
See https://conbench.ursa.dev/benchmarks/6ccff6887e7c47148a09fe46f18c8688/
Some seemingly unrelated commits have caused the performance of a few string kernels to plummet. We should try to reproduce this locally.
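One way to attempt a local reproduction is to run the affected benchmarks several times and look at the spread, since a single repetition can be dominated by noise. The sketch below assumes a locally built Google Benchmark binary; the path in `BENCHMARK_BINARY` and the helper names are hypothetical, and it assumes the binary reports `bytes_per_second` in its JSON output. The command-line flags are standard Google Benchmark options.

```python
import json
import statistics
import subprocess

# Assumption: location of a locally built benchmark binary; adjust to your build tree.
BENCHMARK_BINARY = "./release/arrow-compute-scalar-string-benchmark"


def build_command(binary, name_filter, repetitions):
    """Build a Google Benchmark command line that prints JSON to stdout."""
    return [
        binary,
        f"--benchmark_filter={name_filter}",
        f"--benchmark_repetitions={repetitions}",
        "--benchmark_format=json",
    ]


def summarize(report):
    """Map each run name to (median bytes/sec, relative spread across repetitions)."""
    samples = {}
    for entry in report["benchmarks"]:
        # Skip the mean/median/stddev rows appended when repetitions > 1.
        if entry.get("run_type") == "aggregate":
            continue
        if "bytes_per_second" not in entry:
            continue
        name = entry.get("run_name", entry["name"])
        samples.setdefault(name, []).append(entry["bytes_per_second"])
    summary = {}
    for name, values in samples.items():
        median = statistics.median(values)
        summary[name] = (median, (max(values) - min(values)) / median)
    return summary


def run_locally(name_filter="Utf8", repetitions=10):
    """Run the benchmark binary and summarize its JSON report."""
    cmd = build_command(BENCHMARK_BINARY, name_filter, repetitions)
    completed = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return summarize(json.loads(completed.stdout))
```

Running `run_locally()` once on the baseline commit and once on the suspect commit gives two summaries to compare. If the spread within one build is as large as the difference between builds, the apparent regression may just be measurement noise.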
Reporter: David Li / @lidavidm
Note: This issue was originally created as ARROW-14481. Please see the migration documentation for further details.