Use larger fixed length prefix extractor #12476
Closed
Description
In my recent benchmarks and profiling, we have again seen that most of our performance impact comes from iterating over data (or column families), even when no data exists.
See related comment here
I recently went through some configurations and wiki pages and stumbled over #useFixedLengthPrefixExtractor. I think this was mentioned once by @oleschoenburg as well. We have already configured the prefix extractor to the length of a long. When we iterate, we normally do so via a prefix, mostly a long (e.g. a scopeKey), which needs to be considered. The column family prefix itself is already a long, which means that if we extend the fixed-length extractor by another long, the additional key part can be covered as well.
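To illustrate the key layout argument above, here is a small sketch. The actual layout lives in Zeebe's key wrappers, so the `composeKey`/`extractPrefix` helpers below are hypothetical; the assumption is simply an 8-byte column-family key followed by an 8-byte scope key, so a 16-byte fixed-length prefix covers both longs:

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class PrefixLayout {
    // Hypothetical key layout: [columnFamilyKey: 8 bytes][scopeKey: 8 bytes][remainder]
    static byte[] composeKey(long columnFamilyKey, long scopeKey, byte[] remainder) {
        return ByteBuffer.allocate(Long.BYTES * 2 + remainder.length)
                .putLong(columnFamilyKey)
                .putLong(scopeKey)
                .put(remainder)
                .array();
    }

    // What a fixed-length prefix extractor of the given length would return
    static byte[] extractPrefix(byte[] key, int prefixLength) {
        return Arrays.copyOf(key, prefixLength);
    }

    public static void main(String[] args) {
        byte[] key = composeKey(7L, 42L, new byte[] {1, 2, 3});

        // With an 8-byte extractor, only the column family is part of the prefix...
        byte[] cfOnly = extractPrefix(key, Long.BYTES);
        // ...with 16 bytes, the scope key is covered as well, so seeks over
        // (columnFamily, scopeKey) can use the prefix machinery too.
        byte[] cfAndScope = extractPrefix(key, 2 * Long.BYTES);

        System.out.println(cfOnly.length + " " + cfAndScope.length);
        System.out.println(ByteBuffer.wrap(cfAndScope).getLong(Long.BYTES));
    }
}
```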
The prefix extractor is used during iteration: the more bytes it can consider during seeks and iteration, the more efficient the internal search becomes. RocksDB applies several data structures and optimizations based on the extractor length and on how the data is organized.
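For reference, a minimal configuration sketch using the RocksJava API (this is not the actual Zeebe wiring; the database path and empty prefix are placeholders, and the snippet needs the rocksdbjni dependency):

```java
import org.rocksdb.Options;
import org.rocksdb.ReadOptions;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;
import org.rocksdb.RocksIterator;

public class PrefixSeekSketch {
  public static void main(String[] args) throws RocksDBException {
    RocksDB.loadLibrary();
    try (final Options options = new Options()
            .setCreateIfMissing(true)
            // cover both the column-family long and the scope key long
            .useFixedLengthPrefixExtractor(2 * Long.BYTES);
         final RocksDB db = RocksDB.open(options, "/tmp/prefix-seek-demo");
         // restrict the iterator to keys sharing the seek prefix
         final ReadOptions readOptions = new ReadOptions().setPrefixSameAsStart(true);
         final RocksIterator it = db.newIterator(readOptions)) {
      final byte[] prefix = new byte[2 * Long.BYTES]; // e.g. columnFamilyKey + scopeKey
      for (it.seek(prefix); it.isValid(); it.next()) {
        // only entries under the 16-byte prefix are visited
      }
    }
  }
}
```

With `setPrefixSameAsStart(true)`, the iterator stops once it leaves the seek prefix, which is what makes the "iterate even if no data exists" case cheap.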
Based on the JMH benchmarks I did with #12241, I was able to show that it improves performance significantly.
The baseline is ~230; see the other results here.
We need to run some more benchmarks to better understand the performance impact and possible downsides, but it looks like a worthwhile change.
I might still be missing some knowledge about RocksDB, so please take this with a grain of salt. I would be happy about any input you have @romansmirnov @oleschoenburg @npepinpe
Zeebe Benchmarks:
Related resources:
Related issues
closes #
Definition of Done
Not all items need to be done depending on the issue and the pull request.
Code changes:
backport stable/1.3) to the PR; in case that fails, you need to create backports manually.
Testing:
Documentation:
Other teams:
If the change impacts another team an issue has been created for this team, explaining what they need to do to support this change.
Please refer to our review guidelines.