
Added an AppendingDeltaPackedLongBuffer-based storage format to single-value field data #5706

Closed

Conversation

@bleskes (Contributor) commented Apr 7, 2014

The AppendingDeltaPackedLongBuffer uses delta compression in a paged fashion. For data that is roughly monotonic, this results in a reduced memory footprint.
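
A minimal sketch of the idea (illustrative only, not the Lucene class itself; the page size and timestamp data are made up): with per-page delta encoding, each value only needs enough bits to cover its distance from the page minimum, which stays small when the data is roughly monotonic.

```java
// Illustrative sketch only -- not AppendingDeltaPackedLongBuffer. Each page stores
// its minimum once; the remaining values only need bits for (value - pageMin).
final class PagedDeltaSketch {
    static final int PAGE_SIZE = 1024; // hypothetical page size

    // Bits needed per value once the page is delta-encoded against its minimum.
    static int bitsPerValue(long[] page) {
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        for (long v : page) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        return 64 - Long.numberOfLeadingZeros(Math.max(1L, max - min));
    }

    public static void main(String[] args) {
        long[] timestamps = new long[PAGE_SIZE];
        long t = 1396828800000L; // made-up epoch millis
        for (int i = 0; i < timestamps.length; i++) {
            t += 1000 + (i % 7); // roughly monotonic, small increments
            timestamps[i] = t;
        }
        // ~20 bits per value after per-page delta encoding vs. 64 bits raw.
        System.out.println("bits per value: " + bitsPerValue(timestamps));
    }
}
```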

By default we use the storage format expected to use the least memory. You can force a choice using a new field data setting, `memory_storage_hint`, which can be set to `ORDINALS`, `PACKED`, or `PAGED`.
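
For illustration, the hint might be set per field in the mapping along these lines (the exact nesting under `fielddata` and the field used here are assumptions, not taken from this PR):

```js
{
  "properties": {
    "timestamp": {
      "type": "long",
      "fielddata": {
        "memory_storage_hint": "paged"
      }
    }
  }
}
```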

Running some benchmarks on simulated time-based data shows a 25-30% reduction in memory usage with a very small performance overhead (the current implementation uses PACKED as the memory format with an acceptable overhead ratio of 0.5):


------------------ SUMMARY -------------------------------
docs: 5000000
match percentage: 0.1
memory format hint: PACKED
acceptable_overhead_ratio: 0.5
field data: 19mb
                     name      took    millis
                   hist_l     16.9s        33
------------------ SUMMARY -------------------------------

------------------ SUMMARY -------------------------------
docs: 5000000
match percentage: 0.1
memory format hint: PAGED
acceptable_overhead_ratio: 0.5
field data: 14.6mb
                     name      took    millis
                   hist_l     18.2s        36
------------------ SUMMARY -------------------------------

------------------ SUMMARY -------------------------------
docs: 5000000
match percentage: 0.1
memory format hint: PACKED
acceptable_overhead_ratio: 0.0
field data: 16mb
                     name      took    millis
                   hist_l     17.4s        34
------------------ SUMMARY -------------------------------

------------------ SUMMARY -------------------------------
docs: 5000000
match percentage: 0.1
memory format hint: PAGED
acceptable_overhead_ratio: 0.0
field data: 10.8mb
                     name      took    millis
                   hist_l       21s        42
------------------ SUMMARY -------------------------------

… value field data

The AppendingDeltaPackedLongBuffer uses delta compression in a paged fashion. For data that is roughly monotonic, this results in a reduced memory footprint.

By default we use the storage format expected to use the least memory. You can force a choice using a new field data setting, `memory_storage_hint`, which can be set to `ORDINALS`, `PACKED`, or `PAGED`.

if (s != null) {
    return "always".equals(s) ? MemoryStorageFormat.ORDINALS : null;
}
return MemoryStorageFormat.fromString(fieldDataType.getSettings().get(SETTING_MEMORY_STORAGE_HINT));
@jpountz (Contributor) commented on the diff:

It would be nice to be able to get the default value from the settings in order to be able to randomize it in our integration tests.

@bleskes (Contributor Author) replied:

Not sure I follow? The default value is null, which means the code is allowed to decide based on memory size.

@jpountz (Contributor) replied:

For example, you can look at how the `cache.recycler.page.type` setting is set in `TestCluster.getRandomNodeSettings`: it is either not set, in order to make sure things work fine with default settings, or set to a random value to make sure all recycler types get tested by our integration tests. I was thinking that doing something similar here would help make sure our integration tests pass with any of these formats.
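
As a rough sketch of that randomization pattern (this is not the actual TestCluster code; the setting key and class name below are hypothetical):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

final class RandomFieldDataHint {
    // Hypothetical setting key, used only for this illustration.
    static final String HINT_KEY = "index.fielddata.memory_storage_hint";

    // Either leave the hint unset (exercising the default) or pick a random value,
    // mirroring how cache.recycler.page.type is randomized for node settings.
    static Map<String, String> randomSettings(Random random) {
        Map<String, String> settings = new HashMap<String, String>();
        if (random.nextBoolean()) {
            String[] hints = {"ordinals", "packed", "paged"};
            settings.put(HINT_KEY, hints[random.nextInt(hints.length)]);
        }
        return settings;
    }

    public static void main(String[] args) {
        System.out.println(randomSettings(new Random()));
    }
}
```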

@bleskes (Contributor Author) replied:

I see - the problem is that I don't have easy access to node-level settings from that part of the code, and I can't see how to easily add them.

…bitset which allows choosing a better placeholder

also made page size configurable
@bleskes (Contributor Author) commented Apr 11, 2014

@jpountz I pushed another commit. Thx for the feedback

@jpountz (Contributor) commented Apr 11, 2014

Thanks Boaz, this looks great. +1 to merge

@bleskes closed this in 1d1ca3b Apr 11, 2014
bleskes added a commit that referenced this pull request Apr 11, 2014
… value field data

The AppendingDeltaPackedLongBuffer uses delta compression in a paged fashion. For data that is roughly monotonic, this results in a reduced memory footprint.

By default we use the storage format expected to use the least memory. You can force a choice using a new field data setting, `memory_storage_hint`, which can be set to `ORDINALS`, `PACKED`, or `PAGED`.

Closes #5706
bleskes added a commit that referenced this pull request Apr 11, 2014
When we load sparse single-valued data, we automatically assign a missing value to represent a document that has none. We try to find a value that will not increase the number of bits needed to represent the data. If that missing value happens to be 0, we do not properly initialize the value array.

This commit solves this problem and also cleans up the code further to make spotting such issues easier in the future.
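
A hedged illustration of the "does not increase the number of bits" idea (arithmetic sketch only, not the selection logic of this commit):

```java
final class MissingValueBits {
    // Bits needed to pack values in [min, max] relative to min.
    static int bitsRequired(long min, long max) {
        return 64 - Long.numberOfLeadingZeros(Math.max(1L, max - min));
    }

    public static void main(String[] args) {
        System.out.println(bitsRequired(100, 227)); // real values: 7 bits per value
        // An unused placeholder inside [100, 227] keeps it at 7 bits, while one
        // outside the range, e.g. 0, widens it to 8 bits:
        System.out.println(bitsRequired(0, 227));   // 8
    }
}
```
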
bleskes added a commit that referenced this pull request Apr 11, 2014
When we load sparse single-valued data, we automatically assign a missing value to represent a document that has none. We try to find a value that will not increase the number of bits needed to represent the data. If that missing value happens to be 0, we do not properly initialize the value array.

This commit solves this problem and also cleans up the code further to make spotting such issues easier in the future.
@bleskes deleted the exp/single_paged_compressed_fielddata branch May 19, 2014 09:49
@clintongormley added the :Search/Search label and removed the :Fielddata label Feb 14, 2018
Labels
>enhancement, :Search/Search, v1.2.0, v2.0.0-beta1

3 participants