Merge integer field data implementations together #3220
Comments
This commit merges the field data implementations for byte, short, int and long data into PackedArrayAtomicFieldData, which uses Lucene's PackedInts API to store data. Close elastic#3220
With the help of @martijnvg, I ran a few benchmarks to compare the new implementation against the old ones. Loading times are similar, memory usage is between 1x and 2x lower, and faceting runs at similar speeds (there are small differences across datasets due to CPU caching effects). For example, here are the results of HistogramFacetSearchBenchmark on a 20M-document index for fields of type byte (b_value), short (s_value), int (i_value) and long (l_value):
And here are the results on a 5M-document index:
Regarding memory and loading time, here are reports from LongFieldDataBenchmark on 1M documents.
More information about the data sets:
More information about the columns:
Explanation of the memory reduction:
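As a rough illustration of where the reduction comes from (the maximum value of 1,000,000 below is an assumed example, not taken from the benchmark data): a plain long[] always spends 64 bits per value, while a packed representation only needs as many bits as the largest value requires.

```java
// Hypothetical illustration of the memory reduction from packing.
// A plain long[] costs 64 bits per value regardless of the data; a packed
// representation only needs enough bits for the largest stored value.
public class MemoryReduction {
    public static void main(String[] args) {
        long maxValue = 1_000_000L;                                   // assumed example max
        int bitsRequired = 64 - Long.numberOfLeadingZeros(maxValue);  // 20 bits
        double ratio = 64.0 / bitsRequired;                           // 3.2x fewer bits
        System.out.println(bitsRequired + " bits instead of 64 (" + ratio + "x)");
    }
}
```

The same reasoning applies per field: the smaller the largest value, the fewer bits per document the packed array needs.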
Elasticsearch has four similar field data implementations for its integer types: byte, short, int and long. These implementations could be merged and even made a little more memory-efficient by using Lucene's PackedInts API.
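To make the idea concrete, here is a minimal sketch of the bit-packing technique behind Lucene's PackedInts: each non-negative value is stored in only as many bits as the largest value requires, packed contiguously into long blocks. This is an illustration of the concept only, not Elasticsearch's actual PackedArrayAtomicFieldData or Lucene's real API; the class name and methods are hypothetical.

```java
// Minimal sketch of the bit-packing idea behind Lucene's PackedInts
// (hypothetical class, not the real API). Assumes non-negative values.
public class PackedSketch {
    private final long[] blocks;
    final int bitsPerValue;

    public PackedSketch(long[] values) {
        long max = 0;
        for (long v : values) max = Math.max(max, v);
        // bits needed to represent the largest value (at least 1)
        this.bitsPerValue = Math.max(1, 64 - Long.numberOfLeadingZeros(max));
        this.blocks = new long[(values.length * bitsPerValue + 63) / 64];
        for (int i = 0; i < values.length; i++) set(i, values[i]);
    }

    private void set(int index, long value) {
        long bitPos = (long) index * bitsPerValue;
        int block = (int) (bitPos >>> 6);   // which 64-bit block
        int shift = (int) (bitPos & 63);    // offset within the block
        blocks[block] |= value << shift;
        if (shift + bitsPerValue > 64) {    // value spans two blocks
            blocks[block + 1] |= value >>> (64 - shift);
        }
    }

    public long get(int index) {
        long bitPos = (long) index * bitsPerValue;
        int block = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        long mask = bitsPerValue == 64 ? -1L : (1L << bitsPerValue) - 1;
        long v = blocks[block] >>> shift;
        if (shift + bitsPerValue > 64) {
            v |= blocks[block + 1] << (64 - shift);
        }
        return v & mask;
    }

    public static void main(String[] args) {
        long[] values = {3, 7, 1, 6, 2, 5, 0, 4};   // max = 7, so 3 bits each
        PackedSketch packed = new PackedSketch(values);
        for (int i = 0; i < values.length; i++) {
            if (packed.get(i) != values[i]) throw new AssertionError("mismatch at " + i);
        }
        // 8 values * 3 bits = 24 bits: one long block instead of eight longs
        System.out.println(packed.bitsPerValue + " bits/value, "
                + packed.blocks.length + " block(s)");
    }
}
```

The four per-type implementations (byte, short, int, long) all reduce to the same storage once values are packed this way, which is what makes merging them into one class natural.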