@essobedo essobedo commented Oct 18, 2022

Fix for https://issues.apache.org/jira/browse/CAMEL-16354

Motivation

Splitting with a Scanner could be optimized for basic cases, such as splitting on a single character, as is already done for commas.

The Scanner allocates a lot of objects (regular expression patterns and so on), which is overkill for such simple separators.

Modifications:

  • Fix warnings in ObjectHelperTest
  • Add new tests to improve the code coverage of ObjectHelper.createIterable
  • Improve performance when the separator is a literal or a pattern
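
To illustrate the literal-separator case, here is a minimal sketch of how a string can be split lazily with `String.indexOf` instead of a `Scanner`. It is not the code from this PR: the class name `LiteralSplitSketch` and the method `split` are hypothetical, and the choice to keep empty tokens is an assumption made for simplicity.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical illustration only, not the ObjectHelper implementation.
final class LiteralSplitSketch {

    private LiteralSplitSketch() {
    }

    // Lazily splits value on a literal separator, without any regular expression.
    // Assumption: empty tokens are kept ("a,,b" yields "a", "", "b").
    static Iterable<String> split(String value, String separator) {
        if (separator.isEmpty()) {
            throw new IllegalArgumentException("separator must not be empty");
        }
        return () -> new Iterator<String>() {
            // Index where the next token begins; -1 once the input is exhausted.
            private int start;

            @Override
            public boolean hasNext() {
                return start >= 0;
            }

            @Override
            public String next() {
                if (start < 0) {
                    throw new NoSuchElementException();
                }
                int end = value.indexOf(separator, start);
                String token;
                if (end < 0) {
                    // No more separators: the rest of the input is the last token.
                    token = value.substring(start);
                    start = -1;
                } else {
                    token = value.substring(start, end);
                    start = end + separator.length();
                }
                return token;
            }
        };
    }
}
```

Each call to `next()` allocates only the token itself, which is the kind of saving the motivation above is after.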

Results:

JMH Settings/Environment

# JMH version: 1.35
# VM version: JDK 11, OpenJDK 64-Bit Server VM, 11+28
# VM invoker: /usr/local/jdk-11/bin/java
# VM options: <none>
# Blackhole mode: full + dont-inline hint (auto-detected, use -Djmh.blackhole.autoDetect=false to disable)
# Warmup: 5 iterations, 10 s each
# Measurement: 5 iterations, 10 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Average time, time/op

Literal Separator

Benchmark                                          (type)  Mode  Cnt       Score     Error  Units
MyBenchmark.newSplitterWithSpecificSeparator         TINY  avgt   15      98.854 ±   0.592  ns/op
MyBenchmark.newSplitterWithSpecificSeparator        SMALL  avgt   15     321.351 ±  13.413  ns/op
MyBenchmark.newSplitterWithSpecificSeparator       MEDIUM  avgt   15    2905.660 ±  17.381  ns/op
MyBenchmark.newSplitterWithSpecificSeparator          BIG  avgt   15   28820.933 ±  81.483  ns/op

Benchmark                                         (type)  Mode  Cnt       Score      Error  Units
MyBenchmark.currentSplitterWithSpecificSeparator    TINY  avgt   15    1722.947 ±    7.436  ns/op
MyBenchmark.currentSplitterWithSpecificSeparator   SMALL  avgt   15    3795.835 ±   22.725  ns/op
MyBenchmark.currentSplitterWithSpecificSeparator  MEDIUM  avgt   15   29253.195 ±  118.111  ns/op
MyBenchmark.currentSplitterWithSpecificSeparator     BIG  avgt   15  281669.154 ± 4146.332  ns/op

=> The new approach is roughly 10 to 17 times faster, with a more predictable average time regardless of the number of tokens to extract

Pattern Separator

Benchmark                                          (type)  Mode  Cnt        Score     Error  Units
MyBenchmark.newSplitterWithSpecificSeparator         TINY  avgt   15      419.647 ±   6.646  ns/op
MyBenchmark.newSplitterWithSpecificSeparator        SMALL  avgt   15      956.858 ±   8.325  ns/op
MyBenchmark.newSplitterWithSpecificSeparator       MEDIUM  avgt   15     7799.689 ± 111.446  ns/op
MyBenchmark.newSplitterWithSpecificSeparator          BIG  avgt   15    76060.763 ± 216.367  ns/op


Benchmark                                         (type)  Mode  Cnt       Score       Error  Units
MyBenchmark.currentSplitterWithSpecificSeparator    TINY  avgt   15    2085.248 ±     3.928  ns/op
MyBenchmark.currentSplitterWithSpecificSeparator   SMALL  avgt   15    5187.362 ±    17.942  ns/op
MyBenchmark.currentSplitterWithSpecificSeparator  MEDIUM  avgt   15   44677.586 ±   771.814  ns/op
MyBenchmark.currentSplitterWithSpecificSeparator     BIG  avgt   15  444297.273 ± 10360.311  ns/op

=> The new approach is roughly 5 to 6 times faster, with a more predictable average time regardless of the number of tokens to extract
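
For the pattern-separator case, one plausible way to avoid the `Scanner` overhead is to compile the `Pattern` once and walk the input with a `Matcher`. The sketch below only illustrates that idea and is not the code from this PR: `PatternSplitSketch` and `split` are hypothetical names, and it assumes the pattern never matches the empty string.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only, not the ObjectHelper implementation.
final class PatternSplitSketch {

    private PatternSplitSketch() {
    }

    // Splits value on every match of an already compiled separator pattern.
    // Assumption: the pattern does not match the empty string.
    static List<String> split(CharSequence value, Pattern separator) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = separator.matcher(value);
        int start = 0;
        while (matcher.find()) {
            // Text between the previous separator and this one is a token.
            tokens.add(value.subSequence(start, matcher.start()).toString());
            start = matcher.end();
        }
        // Whatever follows the last separator is the final token.
        tokens.add(value.subSequence(start, value.length()).toString());
        return tokens;
    }
}
```

A caller would typically keep the compiled pattern in a constant, for example `Pattern.compile("\\s*;\\s*")`, so that compilation is not repeated on every split.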

Where:

  • TINY has 3 tokens to extract
  • SMALL has 10 tokens to extract
  • MEDIUM has 100 tokens to extract
  • BIG has 1000 tokens to extract

NB: A lower score is better, as scores are expressed in nanoseconds per operation.

@github-actions github-actions bot added the core label Oct 18, 2022
@github-actions
Contributor

🌟 Thank you for your contribution to the Apache Camel project! 🌟

⚠️ Please note that the changes on this PR may be tested automatically.

If necessary Apache Camel Committers may access logs and test results in the job summaries!

@essobedo essobedo force-pushed the CAMEL-16354/optimize-splitter branch 3 times, most recently from 7568207 to d33bb02 Compare October 18, 2022 13:55
@essobedo essobedo force-pushed the CAMEL-16354/optimize-splitter branch from d33bb02 to 7c8abac Compare October 18, 2022 13:56
@github-actions
Contributor

🚫 There are (likely) no components to be tested in this PR

@essobedo
Contributor Author

@davsclaus Is this what you had in mind when you first created the ticket?

@davsclaus
Contributor

Yeah this is great - the increased performance is great, and so are the reduced object allocations.
Most use cases for splitting are new lines or some basic delimiter chars. So many users can benefit from this.

@essobedo essobedo merged commit a5afe55 into main Oct 18, 2022
@essobedo essobedo deleted the CAMEL-16354/optimize-splitter branch October 18, 2022 15:49