To fetch large numbers of messages from Elasticsearch efficiently, we will have to use search-after queries in the future. Pagination or scrolling is not efficient and can lead to high load situations in Elasticsearch.
Search-after queries require a tie-breaker field for sorting the results (see the search-after documentation). This tie-breaker field must have a unique value per document in Elasticsearch. The _id field has a unique value per document, but using it as a tie-breaker is not recommended because it is stored in fielddata rather than as a doc value. That means sorting on _id requires Elasticsearch to load lots of data into memory. To make this efficient, it is recommended to use a doc-value-based field with a unique value.
Since we don't have such a field at the moment, we have to add one now so that we can use search-after queries in the future.
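To make the role of the tie-breaker concrete, here is a minimal sketch of how the request body for successive search-after pages could be built. The helper names, the match_all query, and the use of a timestamp field as the primary sort key are assumptions for illustration; only the gl2_message_id tie-breaker comes from this proposal.

```python
from typing import Any, Dict, List, Optional


def build_search_after_query(
    page_size: int,
    last_sort_values: Optional[List[Any]] = None,
) -> Dict[str, Any]:
    """Build the request body for one page of a search-after query."""
    body: Dict[str, Any] = {
        "size": page_size,
        "query": {"match_all": {}},
        # Primary sort key (assumed field name) followed by the unique,
        # doc-value-backed tie-breaker field proposed in this issue.
        "sort": [
            {"timestamp": "asc"},
            {"gl2_message_id": "asc"},
        ],
    }
    if last_sort_values is not None:
        # Continue after the last hit of the previous page.
        body["search_after"] = last_sort_values
    return body


def next_sort_values(response: Dict[str, Any]) -> Optional[List[Any]]:
    """Return the sort values of the last hit, or None if the page is empty."""
    hits = response["hits"]["hits"]
    return hits[-1]["sort"] if hits else None
```

A caller would send the first body without search_after, then pass the result of next_sort_values into the next call until it returns None. Without a unique tie-breaker, documents sharing the same primary sort value could be skipped or repeated across pages.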
Implementation Notes
The new message ID field needs a gl2_ prefix to make sure we don't overwrite user-specific fields
The suggestion is to use gl2_message_id
The value should be mapped as "keyword" in Elasticsearch, using doc values
Instead of using the _id value for the gl2_message_id value, we plan to use a ULID
This results in shorter IDs (26 characters for a ULID vs. 36 for a UUID) and thus reduced storage usage
ULIDs are lexicographically sortable (time-based UUIDs are as well, but you have to use the correct variant, which we sometimes do and sometimes don't)
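The notes above can be sketched as follows. This is an illustration under stated assumptions, not the planned implementation: new_ulid is a hypothetical helper that follows the public ULID layout (48-bit millisecond timestamp, 80 bits of randomness, Crockford base32), and the mapping fragment only shows the "keyword" type with doc values enabled for the proposed field.

```python
import os
import time
from typing import Optional

# Crockford base32 alphabet (no I, L, O, U), in ascending ASCII order,
# so string comparison agrees with numeric comparison.
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"

# Assumed mapping fragment for the new field; doc values are already the
# default for keyword fields and are spelled out here only for clarity.
GL2_MESSAGE_ID_MAPPING = {
    "properties": {
        "gl2_message_id": {"type": "keyword", "doc_values": True},
    }
}


def new_ulid(timestamp_ms: Optional[int] = None) -> str:
    """Return a 26-character ULID string (hypothetical helper)."""
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)
    if not 0 <= timestamp_ms < 2**48:
        raise ValueError("timestamp must fit into 48 bits")
    # 80 random bits make IDs created in the same millisecond distinct
    # with very high probability.
    randomness = int.from_bytes(os.urandom(10), "big")
    value = (timestamp_ms << 80) | randomness
    chars = []
    # 26 characters of 5 bits each cover the 128-bit value.
    for _ in range(26):
        chars.append(CROCKFORD[value & 0x1F])
        value >>= 5
    return "".join(reversed(chars))
```

Because the timestamp occupies the most significant bits, IDs generated in different milliseconds sort in creation order as plain strings. This sketch does not guarantee ordering between IDs generated within the same millisecond.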