
Use larger fixed length prefix extractor #12476

Closed · wants to merge 1 commit

Conversation

Zelldon
Member

@Zelldon Zelldon commented Apr 18, 2023

Description

In my recent benchmarks and profiling, we have again seen that most of the performance impact comes from iterating over data (or column families), even if no data exists.

See related comment here

I have recently checked some configurations and wiki pages and stumbled over #useFixedLengthPrefixExtractor. I think this was mentioned once by @oleschoenburg as well.

We have already configured the prefix extractor to the length of a long. When we iterate, we normally do so via a prefix, mostly a long (e.g. scopeKey), which needs to be considered. The column family prefix itself is already a long, which means that if we use another long in the fixed-length extractor, we can cover the additional key part as well.

The prefix extractor is used during iteration; if more bytes can be considered during seek and iteration, the internal search becomes more performant. RocksDB applies certain data structures and logic based on the size of the extractor and how the data is organized.
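To illustrate the idea, here is a minimal, self-contained sketch (not Zeebe's actual code, and without the RocksDB dependency) of what a 16-byte fixed-length prefix extractor would conceptually do, assuming keys are laid out as big-endian longs in the order [columnFamily | scopeKey | elementKey]. With useFixedLengthPrefixExtractor(2 * Long.BYTES), all entries of one scope in one column family share the same prefix, so a prefix seek can skip unrelated data:

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Illustrative sketch only: the key layout and class name are assumptions,
// not Zeebe's real key encoding.
final class PrefixSketch {

  static final int PREFIX_LENGTH = 2 * Long.BYTES; // 16 bytes

  // Compose a key: [columnFamily | scopeKey | elementKey], all big-endian longs.
  static byte[] key(final long columnFamily, final long scopeKey, final long elementKey) {
    return ByteBuffer.allocate(3 * Long.BYTES)
        .putLong(columnFamily)
        .putLong(scopeKey)
        .putLong(elementKey)
        .array();
  }

  // What a fixed-length extractor of 16 bytes conceptually does: take the
  // first 16 bytes of the key as the prefix used for bloom filters and seeks.
  static byte[] prefix(final byte[] key) {
    return Arrays.copyOf(key, PREFIX_LENGTH);
  }

  public static void main(final String[] args) {
    final byte[] a = key(1, 42, 1);
    final byte[] b = key(1, 42, 2);
    final byte[] c = key(1, 43, 1);
    // Same column family and scope -> same prefix; different scope -> different prefix.
    System.out.println(Arrays.equals(prefix(a), prefix(b))); // true
    System.out.println(Arrays.equals(prefix(a), prefix(c))); // false
  }
}
```

With the current extractor length of a single long, only the column-family part would form the prefix, so entries of different scopes within one column family could not be distinguished at the prefix level.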

Based on the JMH benchmarks I did with #12241, I was able to show that it improves performance significantly:

Result "io.camunda.zeebe.engine.perf.EnginePerformanceTest.measureProcessExecutionTime":
  552.536 ±(99.9%) 89.015 ops/s [Average]
  (min, avg, max) = (177.826, 552.536, 1122.175), stdev = 376.894
  CI (99.9%): [463.521, 641.550] (assumes normal distribution)


# Run complete. Total time: 00:05:09

Benchmark                                           Mode  Cnt    Score    Error  Units
EnginePerformanceTest.measureProcessExecutionTime  thrpt  200  552.536 ± 89.015  ops/s

Where the baseline is ~230 ops/s; see other results here.

We need to run some more benchmarks to better understand the performance impact and possible downsides, but it looks like a good thing to do.

I might still be missing some knowledge about RocksDB, so please take this with a grain of salt. I would be happy about any input you have @romansmirnov @oleschoenburg @npepinpe

Zeebe Benchmarks:

Related resources:

Related issues

closes #

Definition of Done

Not all items need to be done depending on the issue and the pull request.

Code changes:

  • The changes are backwards compatible with previous versions
  • If it fixes a bug then PRs are created to backport the fix to the last two minor versions. You can trigger a backport by assigning labels (e.g. backport stable/1.3) to the PR, in case that fails you need to create backports manually.

Testing:

  • There are unit/integration tests that verify all acceptance criteria of the issue
  • New tests are written to ensure backwards compatibility with future versions
  • The behavior is tested manually
  • The change has been verified by a QA run
  • The impact of the changes is verified by a benchmark

Documentation:

  • The documentation is updated (e.g. BPMN reference, configuration, examples, get-started guides, etc.)
  • If the PR changes how BPMN processes are validated (e.g. support new BPMN element) then the Camunda modeling team should be informed to adjust the BPMN linting.

Other teams:
If the change impacts another team an issue has been created for this team, explaining what they need to do to support this change.

Please refer to our review guidelines.

The prefix extractor is used during iteration; if more bytes can be
considered during seek and iteration, it makes the internal search more
performant.

If we iterate, we normally do this via a prefix, mostly a long (e.g.
scopeKey), which needs to be considered. The column family itself is
already a long, which means if we use another long in the fixed
extractor we can consider the additional key.
@Zelldon Zelldon added kind/research Marks an issue as part of a research or investigation area/performance Marks an issue as performance related and removed benchmark labels Apr 19, 2023
@Zelldon
Member Author

Zelldon commented Apr 19, 2023

Not sure whether this works as expected. The problem is that the key has to start with at least a long prefix (which seems to be the case most of the time; at least most of the tests and the benchmarks are running). Our ZeebeDB tests fail because they use strings as keys, and interestingly this no longer works as expected, especially when using foreach without a prefix.

If we could enforce via the API that the first part of the key is always a long, then this might work and would give us a performance boost; if not, this approach might not work.
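One way such an API-level guarantee could look is sketched below. This is purely hypothetical (the class and method names do not exist in ZeebeDB): a key wrapper whose only constructor requires an 8-byte long as the first component, so string-only keys like the ones in the ZeebeDB tests simply could not be built, and a fixed-length prefix extractor covering that component would always be valid:

```java
import java.nio.ByteBuffer;

// Hypothetical sketch, not existing ZeebeDB API: every key is forced to
// start with a fixed 8-byte long component.
final class LongPrefixedKey {

  private final byte[] bytes;

  private LongPrefixedKey(final byte[] bytes) {
    this.bytes = bytes;
  }

  // The only way to build a key: the first component must be a long.
  static LongPrefixedKey of(final long prefix, final byte[] suffix) {
    final ByteBuffer buffer = ByteBuffer.allocate(Long.BYTES + suffix.length);
    buffer.putLong(prefix).put(suffix);
    return new LongPrefixedKey(buffer.array());
  }

  // Serialized form handed to the key-value store.
  byte[] toBytes() {
    return bytes.clone();
  }

  // The guaranteed long prefix, recoverable from any key.
  long prefix() {
    return ByteBuffer.wrap(bytes).getLong();
  }

  public static void main(final String[] args) {
    final LongPrefixedKey key = LongPrefixedKey.of(42L, new byte[] {1, 2});
    System.out.println(key.prefix()); // 42
    System.out.println(key.toBytes().length); // 10
  }
}
```

The type system then carries the invariant that the extractor relies on, instead of it being an implicit convention of each column family's key layout.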

@Zelldon Zelldon changed the title Use larger fixed length prefix extractor [POC]: Use larger fixed length prefix extractor Apr 28, 2023
@Zelldon Zelldon changed the title [POC]: Use larger fixed length prefix extractor Use larger fixed length prefix extractor Apr 28, 2023
@Zelldon
Member Author

Zelldon commented May 5, 2023

I see right now no need to investigate this further, especially with the results here: #12033 (comment)

@Zelldon Zelldon closed this May 5, 2023
@Zelldon Zelldon deleted the zell-set-prefix-filter branch March 28, 2024 15:51