[C++] Investigate recent regressions in some utf8 kernel benchmarks #30040

asfimport · 2021-10-26T22:09:28Z

See https://conbench.ursa.dev/benchmarks/6ccff6887e7c47148a09fe46f18c8688/

Some (on the surface) unrelated commits have caused performance for a few string kernels to plummet. We should try to replicate locally.

Reporter: David Li / @lidavidm

Related issues:

[C++] Mixed support for binary types in regex functions (Discovered while testing)

Note: This issue was originally created as ARROW-14481. Please see the migration documentation for further details.

asfimport · 2021-10-26T23:00:33Z

Eduardo Ponce / @edponce:
Looking more carefully, ARROW-13879 does modify code used by the unary string transform and predicate functions:

1. Kernel registration for the string UTF8 transforms was slightly modified here: 2 explicit statements were converted into a for-loop that calls a generator dispatcher.
2. A similar change is here for the unary string predicates.
3. Also, the exec functor for string predicates was modified here to take the predicate as a template parameter instead of as a function parameter.

Now, #3 should not have a noticeable impact, but #1 and #2 can because of the loop iteration and extra function call.
Nevertheless, these are code paths that are not critical and should only be called once.
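
To illustrate the shape of the changes described above, here is a minimal sketch. It is not Arrow's code: `TypeId`, `Kernel`, `IsAsciiUpper`, `ExecPredicate`, `GenerateKernel` and `RegisterPredicate` are all hypothetical names made up for this example. It only shows a registration loop that goes through a dispatcher function, and a predicate passed as a template parameter.

```cpp
#include <functional>
#include <initializer_list>
#include <map>
#include <string>

// Hypothetical stand-ins for illustration; none of these are Arrow's API.
enum class TypeId { kString, kLargeString };

using Kernel = std::function<bool(const std::string&)>;

// A predicate expressed as a type, so it can be a template parameter.
struct IsAsciiUpper {
  static bool Call(const std::string& s) {
    for (unsigned char c : s) {
      if (c < 'A' || c > 'Z') return false;
    }
    return true;
  }
};

// Exec function: the predicate is known at compile time, so the call to
// Predicate::Call can be resolved statically.
template <typename Predicate>
bool ExecPredicate(const std::string& input) {
  return Predicate::Call(input);
}

// "Generator dispatcher": returns the kernel to use for a given type id.
// In this sketch every type id maps to the same exec function.
template <typename Predicate>
Kernel GenerateKernel(TypeId /*id*/) {
  return &ExecPredicate<Predicate>;
}

// Registration: a loop over the supported types plus one dispatcher call
// per type, instead of one explicit statement per type.
template <typename Predicate>
std::map<TypeId, Kernel> RegisterPredicate() {
  std::map<TypeId, Kernel> kernels;
  for (TypeId id : {TypeId::kString, TypeId::kLargeString}) {
    kernels.emplace(id, GenerateKernel<Predicate>(id));
  }
  return kernels;
}
```

In this sketch `RegisterPredicate<IsAsciiUpper>()` runs once and builds the table. The per-value work happens only when a stored kernel is invoked, which is why the loop and the extra call at registration time would not be expected to show up in a throughput benchmark.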

asfimport · 2021-11-03T18:08:55Z

Antoine Pitrou / @pitrou:
I do not see any regression locally between git master and a random commit from October 21st.

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Non-regressions: (26)
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                           benchmark         baseline        contender  change %                                                                                                                                                                          counters
                           Utf8Lower  644.618 MiB/sec  814.359 MiB/sec    26.332                            {'family_index': 11, 'per_family_instance_index': 0, 'run_name': 'Utf8Lower', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 29}
                           Utf8Upper  647.102 MiB/sec  779.122 MiB/sec    20.402                            {'family_index': 12, 'per_family_instance_index': 0, 'run_name': 'Utf8Upper', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 29}
                 IsAlphaNumericAscii  506.334 MiB/sec  577.621 MiB/sec    14.079                   {'family_index': 2, 'per_family_instance_index': 0, 'run_name': 'IsAlphaNumericAscii', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 22}
                           MatchLike  808.991 MiB/sec  863.400 MiB/sec     6.725                             {'family_index': 7, 'per_family_instance_index': 0, 'run_name': 'MatchLike', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 38}
                        TrimManyUtf8  669.685 MiB/sec  714.090 MiB/sec     6.631                         {'family_index': 15, 'per_family_instance_index': 0, 'run_name': 'TrimManyUtf8', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 29}
                BinaryJoinArrayArray    1.105 GiB/sec    1.131 GiB/sec     2.336               {'family_index': 17, 'per_family_instance_index': 0, 'run_name': 'BinaryJoinArrayArray', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 6817}
 BinaryJoinElementWiseArrayScalar/64    1.018 GiB/sec    1.038 GiB/sec     1.894 {'family_index': 18, 'per_family_instance_index': 2, 'run_name': 'BinaryJoinElementWiseArrayScalar/64', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 102}
   BinaryJoinElementWiseArrayArray/8  754.305 MiB/sec  768.563 MiB/sec     1.890   {'family_index': 19, 'per_family_instance_index': 1, 'run_name': 'BinaryJoinElementWiseArrayArray/8', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 564}
                     MatchLikePrefix    3.605 GiB/sec    3.665 GiB/sec     1.663                      {'family_index': 9, 'per_family_instance_index': 0, 'run_name': 'MatchLikePrefix', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 161}
                          AsciiUpper    7.936 GiB/sec    8.051 GiB/sec     1.441                           {'family_index': 1, 'per_family_instance_index': 0, 'run_name': 'AsciiUpper', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 359}
                     MatchLikeSuffix    3.637 GiB/sec    3.678 GiB/sec     1.132                     {'family_index': 10, 'per_family_instance_index': 0, 'run_name': 'MatchLikeSuffix', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 165}
                     TrimSingleAscii    1.233 GiB/sec    1.245 GiB/sec     0.970                       {'family_index': 5, 'per_family_instance_index': 0, 'run_name': 'TrimSingleAscii', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 57}
                          AsciiLower    7.933 GiB/sec    8.009 GiB/sec     0.957                           {'family_index': 0, 'per_family_instance_index': 0, 'run_name': 'AsciiLower', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 365}
                       TrimManyAscii  955.260 MiB/sec  962.251 MiB/sec     0.732                         {'family_index': 6, 'per_family_instance_index': 0, 'run_name': 'TrimManyAscii', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 42}
  BinaryJoinElementWiseArrayScalar/8  816.534 MiB/sec  822.348 MiB/sec     0.712  {'family_index': 18, 'per_family_instance_index': 1, 'run_name': 'BinaryJoinElementWiseArrayScalar/8', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 612}
   BinaryJoinElementWiseArrayArray/2  591.531 MiB/sec  594.633 MiB/sec     0.525  {'family_index': 19, 'per_family_instance_index': 0, 'run_name': 'BinaryJoinElementWiseArrayArray/2', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 1767}
                      MatchSubstring  577.765 MiB/sec  579.194 MiB/sec     0.247                        {'family_index': 3, 'per_family_instance_index': 0, 'run_name': 'MatchSubstring', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 25}
  BinaryJoinElementWiseArrayScalar/2  747.787 MiB/sec  748.576 MiB/sec     0.106 {'family_index': 18, 'per_family_instance_index': 0, 'run_name': 'BinaryJoinElementWiseArrayScalar/2', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 2232}
               IsAlphaNumericUnicode 1005.992 MiB/sec 1005.034 MiB/sec    -0.095                {'family_index': 13, 'per_family_instance_index': 0, 'run_name': 'IsAlphaNumericUnicode', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 44}
               BinaryJoinArrayScalar    1.238 GiB/sec    1.236 GiB/sec    -0.195              {'family_index': 16, 'per_family_instance_index': 0, 'run_name': 'BinaryJoinArrayScalar', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 7191}
 BinaryJoinElementWiseArrayArray/128    1.225 GiB/sec    1.222 GiB/sec    -0.246  {'family_index': 19, 'per_family_instance_index': 3, 'run_name': 'BinaryJoinElementWiseArrayArray/128', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 58}
  BinaryJoinElementWiseArrayArray/64    1.014 GiB/sec    1.008 GiB/sec    -0.594   {'family_index': 19, 'per_family_instance_index': 2, 'run_name': 'BinaryJoinElementWiseArrayArray/64', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 94}
BinaryJoinElementWiseArrayScalar/128    1.244 GiB/sec    1.232 GiB/sec    -1.014 {'family_index': 18, 'per_family_instance_index': 3, 'run_name': 'BinaryJoinElementWiseArrayScalar/128', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 60}
                        SplitPattern  458.727 MiB/sec  451.400 MiB/sec    -1.597                          {'family_index': 4, 'per_family_instance_index': 0, 'run_name': 'SplitPattern', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 20}
                  MatchLikeSubstring  566.577 MiB/sec  548.685 MiB/sec    -3.158                    {'family_index': 8, 'per_family_instance_index': 0, 'run_name': 'MatchLikeSubstring', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 25}
                      TrimSingleUtf8  982.468 MiB/sec  942.122 MiB/sec    -4.107                       {'family_index': 14, 'per_family_instance_index': 0, 'run_name': 'TrimSingleUtf8', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 44}

asfimport · 2021-11-03T18:18:14Z

Jonathan Keane / @jonkeane:
Oddly, it looks like these have in fact reverted to their previous performance. Here's a recent run: https://conbench.ursa.dev/benchmarks/673597e52dd24e4e9c04ffcd8570ea99/ at 695 MiB/s (up from 611 MiB/s before, and much closer to the ~710-720 MiB/s we were seeing before the regression).

asfimport · 2021-11-03T18:19:41Z

Jonathan Keane / @jonkeane:
Though there is still quite a bit of variability in that plot. Is that kind of variability expected for this benchmark? (We don't have enough history after the release to have a good measure of that yet.)

asfimport · 2021-11-03T18:23:41Z

Antoine Pitrou / @pitrou:
I don't think variability is expected for this particular benchmark, but as we've already discussed there may be variability in any benchmark that goes out of the L2 cache.

asfimport · 2021-11-10T21:19:18Z

David Li / @lidavidm:
Maybe we can close this then? (Sorry, I've been unable to actually go look at this.)

asfimport · 2021-11-23T18:11:23Z

Antoine Pitrou / @pitrou:
Closing as invalid for now.

asfimport closed this as completed Nov 23, 2021

asfimport mentioned this issue Jan 11, 2023

[C++] Mixed support for binary types in regex functions #29497

Closed
