The current precision_step value for all numeric fields is 4. This equates to 8 terms per value for 32-bit types, and 16 terms per value for 64-bit types. This means these fields are 8x or 16x more costly than a "regular" field (in terms of space/indexing), in order to accelerate range queries.
The benefit of a lower precision step is that range queries visit fewer terms, and these terms are denser (so the postings lists hold smaller integers, but the lists themselves are large).
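To make the arithmetic above concrete, here is a minimal sketch of how the term count follows from the bit width and the precision step. The function name `terms_per_value` is hypothetical and only for illustration. It assumes one term is indexed per `precision_step`-bit shift of the value, which matches the 8 and 16 figures quoted above.

```python
import math


def terms_per_value(bits: int, precision_step: int) -> int:
    """Terms indexed for one numeric value.

    Assumption: one term per precision_step-bit shift, so the
    count is ceil(bits / precision_step).
    """
    return math.ceil(bits / precision_step)


# Current default: precision_step = 4 for every numeric type.
for bits in (32, 64):
    print(f"{bits}-bit, precision_step=4: {terms_per_value(bits, 4)} terms")
```

A larger step means fewer terms per value at index time, at the price of visiting more terms in a range query.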
Several things in Lucene have changed in the past few years: visiting more terms at query-time is less costly because this is executed per-segment, postings list compression has improved, the term dictionary is faster in general, and terms with only one posting are inlined into the term dictionary.
Furthermore, the speedup has much less benefit when using filters, because in that case it only impacts "uncached" filters. Once cached, it's the same either way, since it is just a bitset.
I think we should change the default: it's just a default. The current default IMO is too aggressive.
Change the default numeric precision_step to 16 for 64-bit types,
8 for 32-bit and 16-bit types. Disable precision_step for the 8-bit
byte type.
Closes #5905
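As a rough sketch of what this change means per value, the snippet below compares the term counts under the old default (4 everywhere) with the defaults in the commit message. The names `NEW_STEPS` and `compare_defaults` are hypothetical. It assumes that a disabled precision step means a single full-precision term is indexed, and that the count is otherwise the bit width divided by the step, rounded up.

```python
OLD_STEP = 4

# Bit width -> new default precision_step (None = disabled), per the commit message.
NEW_STEPS = {64: 16, 32: 8, 16: 8, 8: None}


def terms_per_value(bits, precision_step):
    """Terms indexed per value; assumes a disabled step yields one term."""
    if precision_step is None:
        return 1
    return -(-bits // precision_step)  # ceiling division


def compare_defaults():
    """Map bit width -> (terms under old default, terms under new default)."""
    return {
        bits: (terms_per_value(bits, OLD_STEP), terms_per_value(bits, step))
        for bits, step in NEW_STEPS.items()
    }


for bits, (old, new) in compare_defaults().items():
    print(f"{bits}-bit: {old} terms -> {new} terms")
```

Under these assumptions, 64-bit fields drop from 16 terms per value to 4, and 32-bit fields from 8 to 4.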
mikemccand pushed a commit to mikemccand/elasticsearch that referenced this issue on Apr 24, 2014
There are some benchmarks posted here: https://issues.apache.org/jira/browse/LUCENE-5609