
Added an AppendingDeltaPackedLongBuffer-based storage format to single-value field data #5706

Closed

Conversation

@bleskes (Contributor) commented Apr 7, 2014

The AppendingDeltaPackedLongBuffer uses delta compression in a paged fashion. For data that is roughly monotonic, this results in a reduced memory footprint.
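
A minimal sketch of the idea (illustrative only, not the Lucene class itself; the page size and timestamp data are made up): with per-page delta encoding, each value only needs enough bits to cover its distance from the page minimum, which stays small when the data is roughly monotonic.

```java
// Illustrative sketch only -- not AppendingDeltaPackedLongBuffer. Each page stores
// its minimum once; the remaining values only need bits for (value - pageMin).
final class PagedDeltaSketch {
    static final int PAGE_SIZE = 1024; // hypothetical page size

    // Bits needed per value once the page is delta-encoded against its minimum.
    static int bitsPerValue(long[] page) {
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        for (long v : page) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        return 64 - Long.numberOfLeadingZeros(Math.max(1L, max - min));
    }

    public static void main(String[] args) {
        long[] timestamps = new long[PAGE_SIZE];
        long t = 1396828800000L; // made-up epoch millis
        for (int i = 0; i < timestamps.length; i++) {
            t += 1000 + (i % 7); // roughly monotonic, small increments
            timestamps[i] = t;
        }
        // ~20 bits per value after per-page delta encoding vs. 64 bits raw.
        System.out.println("bits per value: " + bitsPerValue(timestamps));
    }
}
```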

By default we use the storage format expected to use the least memory. You can force a choice using a new field data setting, `memory_storage_hint`, which can be set to `ORDINALS`, `PACKED`, or `PAGED`.
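
For illustration, the hint might be set per field in the mapping along these lines (the exact nesting under `fielddata` and the field used here are assumptions, not taken from this PR):

```js
{
  "properties": {
    "timestamp": {
      "type": "long",
      "fielddata": {
        "memory_storage_hint": "paged"
      }
    }
  }
}
```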

Running some benchmarks on simulated time-based data shows a 25-30% reduction in memory usage with a very small performance overhead (the current implementation uses PACKED as the memory format with an acceptable overhead ratio of 0.5):


------------------ SUMMARY -------------------------------
docs: 5000000
match percentage: 0.1
memory format hint: PACKED
acceptable_overhead_ratio: 0.5
field data: 19mb
                     name      took    millis
                   hist_l     16.9s        33
------------------ SUMMARY -------------------------------

------------------ SUMMARY -------------------------------
docs: 5000000
match percentage: 0.1
memory format hint: PAGED
acceptable_overhead_ratio: 0.5
field data: 14.6mb
                     name      took    millis
                   hist_l     18.2s        36
------------------ SUMMARY -------------------------------

------------------ SUMMARY -------------------------------
docs: 5000000
match percentage: 0.1
memory format hint: PACKED
acceptable_overhead_ratio: 0.0
field data: 16mb
                     name      took    millis
                   hist_l     17.4s        34
------------------ SUMMARY -------------------------------

------------------ SUMMARY -------------------------------
docs: 5000000
match percentage: 0.1
memory format hint: PAGED
acceptable_overhead_ratio: 0.0
field data: 10.8mb
                     name      took    millis
                   hist_l       21s        42
------------------ SUMMARY -------------------------------

… value field data

The AppendingDeltaPackedLongBuffer uses delta compression in a paged fashion. For data that is roughly monotonic, this results in a reduced memory footprint.

By default we use the storage format expected to use the least memory. You can force a choice using a new field data setting, `memory_storage_hint`, which can be set to `ORDINALS`, `PACKED`, or `PAGED`.

if (s != null) {
    return "always".equals(s) ? MemoryStorageFormat.ORDINALS : null;
}
return MemoryStorageFormat.fromString(fieldDataType.getSettings().get(SETTING_MEMORY_STORAGE_HINT));
@jpountz (Contributor) commented on the diff:

It would be nice to be able to get the default value from the settings in order to be able to randomize it in our integration tests.

@bleskes (Contributor Author) replied:

Not sure I follow? The default value is null, which means the code is allowed to decide based on memory size.

@jpountz (Contributor) replied:

For example, you can look at how the `cache.recycler.page.type` setting is set in `TestCluster.getRandomNodeSettings`: it is either not set, in order to make sure things work fine with default settings, or set to a random value to make sure all recycler types get tested by our integration tests. I was thinking that doing something similar here would help make sure our integration tests pass with any of these formats.
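
As a rough sketch of that randomization pattern (this is not the actual TestCluster code; the setting key and class name below are hypothetical):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

final class RandomFieldDataHint {
    // Hypothetical setting key, used only for this illustration.
    static final String HINT_KEY = "index.fielddata.memory_storage_hint";

    // Either leave the hint unset (exercising the default) or pick a random value,
    // mirroring how cache.recycler.page.type is randomized for node settings.
    static Map<String, String> randomSettings(Random random) {
        Map<String, String> settings = new HashMap<String, String>();
        if (random.nextBoolean()) {
            String[] hints = {"ordinals", "packed", "paged"};
            settings.put(HINT_KEY, hints[random.nextInt(hints.length)]);
        }
        return settings;
    }

    public static void main(String[] args) {
        System.out.println(randomSettings(new Random()));
    }
}
```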

@bleskes (Contributor Author) replied:

I see - the problem is that I don't have easy access to node-level settings from that part of the code, and I can't see how to easily add them.

…bitset which allows choosing a better placeholder

also made page size configurable
@bleskes (Contributor Author) commented Apr 11, 2014

@jpountz I pushed another commit. Thx for the feedback

@jpountz (Contributor) commented Apr 11, 2014

Thanks Boaz, this looks great. +1 to merge

@bleskes closed this in 1d1ca3b Apr 11, 2014
bleskes added a commit that referenced this pull request Apr 11, 2014
… value field data

The AppendingDeltaPackedLongBuffer uses delta compression in a paged fashion. For data that is roughly monotonic, this results in a reduced memory footprint.

By default we use the storage format expected to use the least memory. You can force a choice using a new field data setting, `memory_storage_hint`, which can be set to `ORDINALS`, `PACKED`, or `PAGED`.

Closes #5706
bleskes added a commit that referenced this pull request Apr 11, 2014
When we load sparse single-valued data, we automatically assign a missing value to represent a document that has none. We try to find a value that will not increase the number of bits needed to represent the data. If that missing value happens to be 0, we do not properly initialize the value array.

This commit solves this problem and also cleans up the code further to make spotting such issues easier in the future.
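
A hedged illustration of the "does not increase the number of bits" idea (arithmetic sketch only, not the selection logic of this commit):

```java
final class MissingValueBits {
    // Bits needed to pack values in [min, max] relative to min.
    static int bitsRequired(long min, long max) {
        return 64 - Long.numberOfLeadingZeros(Math.max(1L, max - min));
    }

    public static void main(String[] args) {
        System.out.println(bitsRequired(100, 227)); // real values: 7 bits per value
        // An unused placeholder inside [100, 227] keeps it at 7 bits, while one
        // outside the range, e.g. 0, widens it to 8 bits:
        System.out.println(bitsRequired(0, 227));   // 8
    }
}
```
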
bleskes added a commit that referenced this pull request Apr 11, 2014
When we load sparse single-valued data, we automatically assign a missing value to represent a document that has none. We try to find a value that will not increase the number of bits needed to represent the data. If that missing value happens to be 0, we do not properly initialize the value array.

This commit solves this problem and also cleans up the code further to make spotting such issues easier in the future.
@bleskes deleted the exp/single_paged_compressed_fielddata branch May 19, 2014 09:49
@clintongormley added the :Search/Search label and removed the :Fielddata label Feb 14, 2018
Labels
>enhancement, :Search/Search, v1.2.0, v2.0.0-beta1

3 participants