
Use larger fixed length prefix extractor #12476

Closed · wants to merge 1 commit

Conversation

Zelldon
Member

@Zelldon Zelldon commented Apr 18, 2023

Description

In my recent benchmarks and profiling, we have again seen that most of the performance impact comes from iterating over data (or column families), even if no data exists.

See related comment here

I have recently checked some configurations and wiki pages and stumbled over #useFixedLengthPrefixExtractor. I think this was mentioned once by @oleschoenburg as well.

We have already configured the prefix extractor to the length of a long. When we iterate, we normally do so via a prefix, mostly a long (e.g. scopeKey), which needs to be considered. The column family prefix itself is already a long, which means that if we use another long in the fixed-length extractor, we can cover the additional key part as well.

The prefix extractor is used during iteration; if more bytes can be considered during seek and iteration, the internal search becomes more performant. RocksDB applies certain data structures and logic based on the size of the extractor and how the data is organized.
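To illustrate the idea, here is a minimal, self-contained sketch (not Zeebe's actual code, and without the RocksDB dependency) of what a 16-byte fixed-length prefix extractor would conceptually do, assuming keys are laid out as big-endian longs in the order [columnFamily | scopeKey | elementKey]. With useFixedLengthPrefixExtractor(2 * Long.BYTES), all entries of one scope in one column family share the same prefix, so a prefix seek can skip unrelated data:

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Illustrative sketch only: the key layout and class name are assumptions,
// not Zeebe's real key encoding.
final class PrefixSketch {

  static final int PREFIX_LENGTH = 2 * Long.BYTES; // 16 bytes

  // Compose a key: [columnFamily | scopeKey | elementKey], all big-endian longs.
  static byte[] key(final long columnFamily, final long scopeKey, final long elementKey) {
    return ByteBuffer.allocate(3 * Long.BYTES)
        .putLong(columnFamily)
        .putLong(scopeKey)
        .putLong(elementKey)
        .array();
  }

  // What a fixed-length extractor of 16 bytes conceptually does: take the
  // first 16 bytes of the key as the prefix used for bloom filters and seeks.
  static byte[] prefix(final byte[] key) {
    return Arrays.copyOf(key, PREFIX_LENGTH);
  }

  public static void main(final String[] args) {
    final byte[] a = key(1, 42, 1);
    final byte[] b = key(1, 42, 2);
    final byte[] c = key(1, 43, 1);
    // Same column family and scope -> same prefix; different scope -> different prefix.
    System.out.println(Arrays.equals(prefix(a), prefix(b))); // true
    System.out.println(Arrays.equals(prefix(a), prefix(c))); // false
  }
}
```

With the current extractor length of a single long, only the column-family part would form the prefix, so entries of different scopes within one column family could not be distinguished at the prefix level.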

Based on the JMH benchmarks I did with #12241, I was able to show that it improves performance significantly:

Result "io.camunda.zeebe.engine.perf.EnginePerformanceTest.measureProcessExecutionTime":
  552.536 ±(99.9%) 89.015 ops/s [Average]
  (min, avg, max) = (177.826, 552.536, 1122.175), stdev = 376.894
  CI (99.9%): [463.521, 641.550] (assumes normal distribution)


# Run complete. Total time: 00:05:09

Benchmark                                           Mode  Cnt    Score    Error  Units
EnginePerformanceTest.measureProcessExecutionTime  thrpt  200  552.536 ± 89.015  ops/s

Where the baseline is ~230 ops/s; see other results here.

We need to run some more benchmarks to better understand the performance impact and possible downsides, but it looks like a good thing to do.

I might still be missing some knowledge about RocksDB, so please take this with a grain of salt. I would be happy about any input you have @romansmirnov @oleschoenburg @npepinpe

Zeebe Benchmarks:

Related resources:

Related issues

closes #

Definition of Done

Not all items need to be done depending on the issue and the pull request.

Code changes:

  • The changes are backwards compatible with previous versions
  • If it fixes a bug then PRs are created to backport the fix to the last two minor versions. You can trigger a backport by assigning labels (e.g. backport stable/1.3) to the PR, in case that fails you need to create backports manually.

Testing:

  • There are unit/integration tests that verify all acceptance criteria of the issue
  • New tests are written to ensure backwards compatibility with future versions
  • The behavior is tested manually
  • The change has been verified by a QA run
  • The impact of the changes is verified by a benchmark

Documentation:

  • The documentation is updated (e.g. BPMN reference, configuration, examples, get-started guides, etc.)
  • If the PR changes how BPMN processes are validated (e.g. support new BPMN element) then the Camunda modeling team should be informed to adjust the BPMN linting.

Other teams:
If the change impacts another team an issue has been created for this team, explaining what they need to do to support this change.

Please refer to our review guidelines.

The prefix extractor is used during iteration; if more bytes can be
considered during seek and iteration, it makes the internal search more
performant.

If we iterate, we normally do this via a prefix, mostly a long (e.g.
scopeKey), which needs to be considered. The column family itself is
already a long, which means if we use another long in the fixed
extractor we can consider the additional key.
@Zelldon Zelldon added kind/research Marks an issue as part of a research or investigation area/performance Marks an issue as performance related and removed benchmark labels Apr 19, 2023
@Zelldon
Member Author

Zelldon commented Apr 19, 2023

Not sure whether this works as expected. The problem is that the key has to start with at least a long prefix (which seems to be the case most of the time; at least most of the tests and the benchmarks are running). Our ZeebeDB tests fail because they use strings as keys, and interestingly this no longer works as expected, especially when using foreach without a prefix.

If we could enforce via the API that the first part of the key is always a long, then this might work and would give us a performance boost; if not, this approach might not work.
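One way such an API-level guarantee could look is sketched below. This is purely hypothetical (the class and method names do not exist in ZeebeDB): a key wrapper whose only constructor requires an 8-byte long as the first component, so string-only keys like the ones in the ZeebeDB tests simply could not be built, and a fixed-length prefix extractor covering that component would always be valid:

```java
import java.nio.ByteBuffer;

// Hypothetical sketch, not existing ZeebeDB API: every key is forced to
// start with a fixed 8-byte long component.
final class LongPrefixedKey {

  private final byte[] bytes;

  private LongPrefixedKey(final byte[] bytes) {
    this.bytes = bytes;
  }

  // The only way to build a key: the first component must be a long.
  static LongPrefixedKey of(final long prefix, final byte[] suffix) {
    final ByteBuffer buffer = ByteBuffer.allocate(Long.BYTES + suffix.length);
    buffer.putLong(prefix).put(suffix);
    return new LongPrefixedKey(buffer.array());
  }

  // Serialized form handed to the key-value store.
  byte[] toBytes() {
    return bytes.clone();
  }

  // The guaranteed long prefix, recoverable from any key.
  long prefix() {
    return ByteBuffer.wrap(bytes).getLong();
  }

  public static void main(final String[] args) {
    final LongPrefixedKey key = LongPrefixedKey.of(42L, new byte[] {1, 2});
    System.out.println(key.prefix()); // 42
    System.out.println(key.toBytes().length); // 10
  }
}
```

The type system then carries the invariant that the extractor relies on, instead of it being an implicit convention of each column family's key layout.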

@Zelldon Zelldon changed the title Use larger fixed length prefix extractor [POC]: Use larger fixed length prefix extractor Apr 28, 2023
@Zelldon Zelldon changed the title [POC]: Use larger fixed length prefix extractor Use larger fixed length prefix extractor Apr 28, 2023
@Zelldon
Member Author

Zelldon commented May 5, 2023

I see right now no need to investigate this further, especially with the results here: #12033 (comment)

@Zelldon Zelldon closed this May 5, 2023
@Zelldon Zelldon deleted the zell-set-prefix-filter branch March 28, 2024 15:51