Added an AppendingDeltaPackedLongBuffer-based storage format to single value field data #5706
Conversation
… value field data The AppendingDeltaPackedLongBuffer uses delta compression in a paged fashion. For data which is roughly monotonic this results in a reduced memory signature. By default we use the storage format expected to use the least memory. You can force a choice using a new field data setting `memory_storage_hint`, which can be set to `ORDINALS`, `PACKED` or `PAGED`
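To illustrate why per-page delta compression helps for roughly monotonic data, here is a self-contained sketch (this is not the Lucene AppendingDeltaPackedLongBuffer itself; the page size and data are made up for the demo). Each page stores deltas from its own minimum, so the bits needed per value depend on the spread within a page rather than on the absolute values:

```java
// Demo: bits per value for raw storage vs. per-page deltas from the page minimum.
public class DeltaPageDemo {
    static int bitsRequired(long maxValue) {
        return maxValue == 0 ? 1 : 64 - Long.numberOfLeadingZeros(maxValue);
    }

    public static void main(String[] args) {
        int pageSize = 1024;
        long[] timestamps = new long[8192];
        for (int i = 0; i < timestamps.length; i++) {
            timestamps[i] = 1_400_000_000_000L + i * 10L; // roughly monotonic time data
        }

        long max = 0;
        for (long v : timestamps) max = Math.max(max, v);
        int rawBits = bitsRequired(max); // bits if values are packed as-is

        int deltaBits = 0; // worst page: bits needed for (value - page minimum)
        for (int start = 0; start < timestamps.length; start += pageSize) {
            long min = Long.MAX_VALUE, pageMax = Long.MIN_VALUE;
            for (int i = start; i < start + pageSize; i++) {
                min = Math.min(min, timestamps[i]);
                pageMax = Math.max(pageMax, timestamps[i]);
            }
            deltaBits = Math.max(deltaBits, bitsRequired(pageMax - min));
        }
        System.out.println("raw bits/value:   " + rawBits);   // 41 for this data
        System.out.println("delta bits/value: " + deltaBits); // 14 for this data
    }
}
```

For this simulated time series, deltas need 14 bits per value instead of 41, which is in the same ballpark as the memory reduction reported at the bottom of this PR.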
if (s != null) {
    return "always".equals(s) ? MemoryStorageFormat.ORDINALS : null;
}
return MemoryStorageFormat.fromString(fieldDataType.getSettings().get(SETTING_MEMORY_STORAGE_HINT));
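For context, a hypothetical sketch of the enum the snippet above parses (the real MemoryStorageFormat lives in Elasticsearch's field data code and may differ in detail). Returning null for an absent or unrecognized hint is what lets the caller fall back to picking the format expected to use the least memory:

```java
// Hypothetical sketch, not the actual Elasticsearch class.
public enum MemoryStorageFormat {
    ORDINALS, PACKED, PAGED;

    // null means "no valid hint": the caller decides based on estimated memory use.
    public static MemoryStorageFormat fromString(String s) {
        if (s == null) {
            return null;
        }
        for (MemoryStorageFormat format : values()) {
            if (format.name().equalsIgnoreCase(s)) {
                return format;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(fromString("packed")); // prints PACKED
    }
}
```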
It would be nice to be able to get the default value from the settings in order to be able to randomize it in our integration tests.
Not sure I follow? The default value is null, which means the code is allowed to decide based on memory size.
For example, you can look at how the `cache.recycler.page.type` setting is set in TestCluster.getRandomNodeSettings: it is either not set, in order to make sure things work fine with default settings, or set to a random value to make sure all recycler types get tested by our integration tests. I was thinking that doing something similar here would help make sure that our integration tests pass with any of these formats.
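The randomization pattern being described can be sketched as follows (names and values here are illustrative, not the actual TestCluster code): sometimes leave the setting unset to exercise the default, otherwise pick one of the legal values at random so every format gets covered across test runs.

```java
import java.util.Random;

// Illustrative sketch of randomizing a node setting in integration tests.
public class RandomSettingDemo {
    // null means "leave memory_storage_hint unset", exercising the default choice.
    static String randomMemoryStorageHint(Random random) {
        String[] choices = {null, "ordinals", "packed", "paged"};
        return choices[random.nextInt(choices.length)];
    }

    public static void main(String[] args) {
        String hint = randomMemoryStorageHint(new Random());
        if (hint == null) {
            System.out.println("leaving memory_storage_hint unset (default behaviour)");
        } else {
            System.out.println("memory_storage_hint=" + hint);
        }
    }
}
```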
I see - the problem is that I don't have easy access to node-level settings from that part of the code, and I can't see how to easily add it.
…bitset which allows choosing a better placeholder; also made the page size configurable
@jpountz I pushed another commit. Thx for the feedback
Thanks Boaz, this looks great. +1 to merge
… value field data The AppendingDeltaPackedLongBuffer uses delta compression in a paged fashion. For data which is roughly monotonic this results in a reduced memory signature. By default we use the storage format expected to use the least memory. You can force a choice using a new field data setting `memory_storage_hint`, which can be set to `ORDINALS`, `PACKED` or `PAGED`. Closes #5706
When we load sparse single valued data, we automatically assign a missing value to represent a document that has none. We try to find a value that will not increase the number of bits needed to represent the data. If that missing value happens to be 0, we do not properly initialize the value array. This commit solves this problem and also cleans up the code even more to make spotting such issues easier in the future.
The AppendingDeltaPackedLongBuffer uses delta compression in a paged fashion. For data which is roughly monotonic this results in a reduced memory signature.
By default we use the storage format expected to use the least memory. You can force a choice using a new field data setting `memory_storage_hint`, which can be set to `ORDINALS`, `PACKED` or `PAGED`.
Running some benchmarks on simulated time based data shows a 25-30% reduction in memory usage with a very small performance overhead (the current implementation uses PACKED as a memory format with a 0.5 acceptable overhead ratio):
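The "storage format expected to use the least memory" default can be sketched like this. The per-format size formulas below are crude assumptions made up for the demo (real field data accounting in Elasticsearch is considerably more involved); the point is only the selection logic of estimating each candidate and picking the cheapest:

```java
// Illustrative sketch: estimate memory per format, then pick the smallest.
public class FormatChoiceDemo {
    // Assumed formulas, for demonstration only:
    static long ordinalsBytes(int docCount, int ordinalBits) {
        return (long) docCount * ordinalBits / 8;            // ordinals array, simplified
    }
    static long packedBytes(int docCount, int valueBits) {
        return (long) docCount * valueBits / 8;              // raw values, packed
    }
    static long pagedBytes(int docCount, int deltaBits, int pages) {
        return (long) docCount * deltaBits / 8 + pages * 8L; // deltas + one long base per page
    }

    public static void main(String[] args) {
        int docCount = 1_000_000;
        long ords = ordinalsBytes(docCount, 20);
        long packed = packedBytes(docCount, 41);
        long paged = pagedBytes(docCount, 14, docCount / 1024);

        String best = "ORDINALS";
        long min = ords;
        if (packed < min) { min = packed; best = "PACKED"; }
        if (paged < min) { min = paged; best = "PAGED"; }
        System.out.println("chosen format: " + best + " (~" + min + " bytes)");
    }
}
```

With these made-up numbers the paged delta format wins, which matches the memory reduction the benchmark above reports for roughly monotonic time-based data.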