New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make field data changes immediately taken into account and add the ability to disallow field data loading. #4432
Conversation
…ility to disallow field data loading. This commit changes field data configuration updates so that they are immediately taken into account for loading new segments. The way it works is that field data configuration is now cached separately from the field data cache, meaning that it is now possible to clear the field data configuration from IndexFieldDataService while the cache will stay around. On the next time that Elasticsearch will reload field data configuration, it will check if there is already a cache entry, and reuse it if it exists. To disable field data loading, all that is required is to change the field data format to "none" (supported by all field data types) using the update mapping API. Elasticsearch will then refuse to load field data on any new segment, but field data which has been loaded on the previous segments will remain available. So you need to clear the field data cache in order to reclaim memory (otherwise memory will be reclaimed slower, as segments get merged). Close elastic#4430 Close elastic#4431
.put(Tuple.tuple("string", "doc_values"), new DocValuesIndexFieldData.Builder()) | ||
.put(Tuple.tuple("string", DOC_VALUES_FORMAT), new DocValuesIndexFieldData.Builder()) | ||
.put(Tuple.tuple("string", NONE_FORMAT), new NoneIndexFieldData.Builder()) | ||
.put(Tuple.tuple("string", "no"), new DocValuesIndexFieldData.Builder()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is no
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
an typo :) will fix
I left some small comments, other than that LGMT - I am tempted to propose to default tokenized fields to this! |
I am not sure I like calling it |
I like the idea given that problems typically arise when loading by mistake field data on fields that are used for full-text search. However, I think it may be a problem for the out-of-the-box experience given that fields are tokenized by default? It's not the first time that I'm a bit torn between usability and production use when deciding on default values. Maybe we could have hints from the user on what he's doing with Elasticsearch in order to pick default values accordingly (just thinking out loud)...
To say the truth, I didn't like it either, I was just out of inspiration. |
field data type that prevents field data from being loaded into memory and | ||
will cause all requests that would need field data for this field to return | ||
an error. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since circuit breaker is going to happen, I think that this doc can be misleading, since the circuit breaker will kick in and not allow for the field to be loaded (and hopefully, in the future, blacklist a field using this feature, ha!, how things connect!). It is obviously still very much good to explicitly disable certain fields from loading to field data
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just pushed a new commit to reword this part of the documentation. Does it look better?
LGTM |
LGTM |
This commit changes field data configuration updates so that they are
immediately taken into account for loading new segments. The way it works
is that field data configuration is now cached separately from the field
data cache, meaning that it is now possible to clear the field data
configuration from IndexFieldDataService while the cache will stay around. On
the next time that Elasticsearch will reload field data configuration, it will
check if there is already a cache entry, and reuse it if it exists.
To disable field data loading, all that is required is to change the field
data format to "none" (supported by all field data types) using the update
mapping API. Elasticsearch will then refuse to load field data on any new
segment, but field data which has been loaded on the previous segments will
remain available. So you need to clear the field data cache in order to
reclaim memory (otherwise memory will be reclaimed slower, as segments get
merged).
Close #4430
Close #4431