CAMEL-16354: camel-core - Optimize Splitters using Scanner #8570

essobedo · 2022-10-18T13:04:07Z

Fix for https://issues.apache.org/jira/browse/CAMEL-16354

Motivation:

Splitting with Scanner could be optimized for the simpler case of splitting on a single character, as we already do for commas.

The Scanner allocates many objects (regular expression patterns and so on), which is overkill for this kind of basic splitting.

Modifications:

Fix warnings in ObjectHelperTest
Add new tests to improve the code coverage of ObjectHelper.createIterable
Improve performance when the separator is a literal or a pattern
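
To illustrate the idea behind the literal-separator case, here is a minimal sketch of a lazy splitter that relies on `String.indexOf` instead of `Scanner`. It is not the code of this PR: the class `LiteralSplitSketch` and its methods `split` and `toList` are hypothetical names, and edge cases (empty input, trailing separators) may be handled differently in `ObjectHelper.createIterable`.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical illustration only; not the implementation from this PR.
class LiteralSplitSketch {

    /** Lazily yields the tokens of value separated by the literal separator. */
    static Iterator<String> split(final String value, final String separator) {
        if (separator.isEmpty()) {
            throw new IllegalArgumentException("separator must not be empty");
        }
        return new Iterator<String>() {
            // Index where the next token starts, or -1 once the input is exhausted.
            private int start = 0;

            @Override
            public boolean hasNext() {
                return start >= 0;
            }

            @Override
            public String next() {
                if (start < 0) {
                    throw new NoSuchElementException();
                }
                // Plain substring search: no regex compilation, no intermediate buffers.
                int end = value.indexOf(separator, start);
                String token;
                if (end < 0) {
                    token = value.substring(start);
                    start = -1;
                } else {
                    token = value.substring(start, end);
                    start = end + separator.length();
                }
                return token;
            }
        };
    }

    /** Convenience helper that drains the iterator into a list. */
    static List<String> toList(String value, String separator) {
        List<String> tokens = new ArrayList<>();
        Iterator<String> it = split(value, separator);
        while (it.hasNext()) {
            tokens.add(it.next());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(toList("a,b,c", ","));
    }
}
```

Each call to `next()` costs one `indexOf` and one `substring`, so the work stays proportional to the input length. For a pattern separator, a precompiled `java.util.regex.Pattern` could presumably play a similar role, but that is an assumption rather than a description of the actual change.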

Results:

JMH Settings/Environment

# JMH version: 1.35
# VM version: JDK 11, OpenJDK 64-Bit Server VM, 11+28
# VM invoker: /usr/local/jdk-11/bin/java
# VM options: <none>
# Blackhole mode: full + dont-inline hint (auto-detected, use -Djmh.blackhole.autoDetect=false to disable)
# Warmup: 5 iterations, 10 s each
# Measurement: 5 iterations, 10 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Average time, time/op

Literal Separator

Benchmark                                          (type)  Mode  Cnt       Score     Error  Units
MyBenchmark.newSplitterWithSpecificSeparator         TINY  avgt   15      98.854 ±   0.592  ns/op
MyBenchmark.newSplitterWithSpecificSeparator        SMALL  avgt   15     321.351 ±  13.413  ns/op
MyBenchmark.newSplitterWithSpecificSeparator       MEDIUM  avgt   15    2905.660 ±  17.381  ns/op
MyBenchmark.newSplitterWithSpecificSeparator          BIG  avgt   15   28820.933 ±  81.483  ns/op

Benchmark                                         (type)  Mode  Cnt       Score      Error  Units
MyBenchmark.currentSplitterWithSpecificSeparator    TINY  avgt   15    1722.947 ±    7.436  ns/op
MyBenchmark.currentSplitterWithSpecificSeparator   SMALL  avgt   15    3795.835 ±   22.725  ns/op
MyBenchmark.currentSplitterWithSpecificSeparator  MEDIUM  avgt   15   29253.195 ±  118.111  ns/op
MyBenchmark.currentSplitterWithSpecificSeparator     BIG  avgt   15  281669.154 ± 4146.332  ns/op

=> The new approach is about 10 times faster (around 17 times for TINY), with a more predictable average time regardless of the number of tokens to extract

Pattern Separator

Benchmark                                          (type)  Mode  Cnt        Score     Error  Units
MyBenchmark.newSplitterWithSpecificSeparator         TINY  avgt   15      419.647 ±   6.646  ns/op
MyBenchmark.newSplitterWithSpecificSeparator        SMALL  avgt   15      956.858 ±   8.325  ns/op
MyBenchmark.newSplitterWithSpecificSeparator       MEDIUM  avgt   15     7799.689 ± 111.446  ns/op
MyBenchmark.newSplitterWithSpecificSeparator          BIG  avgt   15    76060.763 ± 216.367  ns/op


Benchmark                                         (type)  Mode  Cnt       Score       Error  Units
MyBenchmark.currentSplitterWithSpecificSeparator    TINY  avgt   15    2085.248 ±     3.928  ns/op
MyBenchmark.currentSplitterWithSpecificSeparator   SMALL  avgt   15    5187.362 ±    17.942  ns/op
MyBenchmark.currentSplitterWithSpecificSeparator  MEDIUM  avgt   15   44677.586 ±   771.814  ns/op
MyBenchmark.currentSplitterWithSpecificSeparator     BIG  avgt   15  444297.273 ± 10360.311  ns/op

=> The new approach is about 5 to 6 times faster, with a more predictable average time regardless of the number of tokens to extract

Where:

TINY has 3 tokens to extract
SMALL has 10 tokens to extract
MEDIUM has 100 tokens to extract
BIG has 1000 tokens to extract

NB: Lower score is better as it is expressed in nanoseconds per operation
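
The speedup figures quoted above can be rechecked directly from the reported scores. The following sketch divides the current score by the new score for each input size; the class name `SpeedupSketch` is hypothetical, and the numbers are copied from the tables in this description. The ratios come out at roughly 17.4, 11.8, 10.1 and 9.8 for the literal separator, and 5.0, 5.4, 5.7 and 5.8 for the pattern separator.

```java
// Hypothetical helper to recompute the speedups from the JMH scores reported above.
class SpeedupSketch {

    /** Ratio of the current implementation's ns/op to the new implementation's ns/op. */
    static double speedup(double currentNsPerOp, double newNsPerOp) {
        return currentNsPerOp / newNsPerOp;
    }

    public static void main(String[] args) {
        String[] types = {"TINY", "SMALL", "MEDIUM", "BIG"};

        // Each row is {current score, new score} in ns/op, copied from the tables.
        double[][] literal = {
            {1722.947, 98.854},
            {3795.835, 321.351},
            {29253.195, 2905.660},
            {281669.154, 28820.933},
        };
        double[][] pattern = {
            {2085.248, 419.647},
            {5187.362, 956.858},
            {44677.586, 7799.689},
            {444297.273, 76060.763},
        };

        for (int i = 0; i < types.length; i++) {
            System.out.printf("%-6s literal: x%.1f  pattern: x%.1f%n",
                    types[i],
                    speedup(literal[i][0], literal[i][1]),
                    speedup(pattern[i][0], pattern[i][1]));
        }
    }
}
```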

github-actions · 2022-10-18T13:06:06Z

🌟 Thank you for your contribution to the Apache Camel project! 🌟

⚠️ Please note that the changes on this PR may be tested automatically.

If necessary Apache Camel Committers may access logs and test results in the job summaries!

github-actions · 2022-10-18T14:14:04Z