This issue adds the option to configure a PostingsFormat and assign it to a field in the mapping. This is an expert-level feature; in almost all cases Elasticsearch's defaults will suit your needs.
Configuring a postings format per field
Several postings formats are configured by default and can be used in your mapping:
pulsing - A postings format that encodes the postings list for terms with a low document frequency in the term dictionary.
direct - A codec that wraps the default postings format at write time, but at read time loads the terms and postings lists directly into memory as raw arrays. This postings format is exceptionally memory intensive, but can give a substantial increase in search performance.
memory - A codec that loads and stores terms and postings lists in memory using an FST. Acts like a cached postings list.
bloom_default - Maintains a bloom filter for the indexed terms, which is stored on disk, and builds on top of the default postings format. This postings format is useful for low document frequency terms and lets seeks to terms that don't exist fail fast.
bloom_pulsing - Similar to the bloom_default postings format, but builds on top of the pulsing postings format.
default - The default postings format, used if none is specified.
On all fields it is possible to configure a postings_format attribute. Example mapping:
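A minimal sketch of such a mapping, assuming a hypothetical `person` type with a string field named `second_person_id` (both names are illustrative only and not part of this change):

```json
{
  "person": {
    "properties": {
      "second_person_id": {
        "type": "string",
        "postings_format": "pulsing"
      }
    }
  }
}
```

Any of the postings format names listed above should be usable as the value of `postings_format`.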
Configuring a custom postings format
It is possible to instantiate custom postings formats. This can be specified via the index settings.
With freq_cut_off set to 5 (it defaults to 1), the pulsing postings format inlines the postings list of terms with a document frequency lower than or equal to 5 in the term dictionary.
Note: when we doc this, we need to properly document and expose all the configuration options for all codecs.
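As a rough illustration of a pulsing postings format with freq_cut_off set to 5, the index settings might look like the sketch below. The format name `my_format` is hypothetical, and the `codec.postings_format` nesting is an assumption about how the settings are laid out, so check it against the final documentation:

```json
{
  "codec": {
    "postings_format": {
      "my_format": {
        "type": "pulsing",
        "freq_cut_off": "5"
      }
    }
  }
}
```

Under that assumption, a field would opt in by setting its `postings_format` attribute to `my_format` in the mapping.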