GET: Add parameter to GET for checking if generated fields can be retrieved #6676
Comments
Here is what happens: For multi-fields, the parent field is returned instead of However, the FastVectorHighlighter relies on this functionality to highlight on multi-fields (see here), so this is not really a solution unless we want to prevent highlighting with the FastVectorHighlighter on multi-fields. The other option is to simply catch the NumberFormatException and handle it like here: be999b1042 |
@brwe what is the status of this? |
@s1monw Need to write more tests, did not get to it yet. Will continue Friday. |
cool ok but it's going to be ready for 1.3 right? |
depends on when the release is |
If a field is a multi-field and is requested with a GET request, then the value of the parent field is returned in case the document is retrieved from the transaction log. If the type is numeric but the value is a string, the numeric value will be parsed. This caused the NumberFormatException in case the field is of type `token_count`. closes elastic#6676
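The failure mode described in this commit message can be illustrated with a small Python model. This is only a sketch, not Elasticsearch code: `translog_source`, `parse_numeric_from_source` and `get_token_count` are hypothetical names, and Python raises `ValueError` where Java raises `NumberFormatException`.

```python
# Hypothetical model of a document as it sits in the transaction log:
# only the original source is available, no index-time generated fields.
translog_source = {"text": "some words to count"}


def parse_numeric_from_source(source, parent_field):
    """Mimic the failing path: the multi-field resolves to its parent's
    value, and that value is then parsed as a number."""
    raw = source[parent_field]  # the parent value is the text itself
    return int(raw)             # fails because the text is not a number


def get_token_count(source, parent_field):
    """Catch the parse error and treat the field as not retrievable."""
    try:
        return parse_numeric_from_source(source, parent_field)
    except ValueError:
        # Python's analogue of Java's NumberFormatException
        return None
```

Calling `get_token_count(translog_source, "text")` returns `None` instead of propagating the parse error, which corresponds to the "catch and handle" option mentioned earlier in the thread.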
Fields of type `token_count` are generated only when indexing. If a GET request accesses the transaction log (because there was no refresh between indexing and the GET request) and the `token_count` field is requested, then GET will try to parse the number from the String and the result is a NumberFormatException. Now, GET accepts a parameter `ignore_errors_on_generated_fields` which has the following effect:
- Throw exception with meaningful error message explaining the problem if set to false (default)
- Ignore the field if set to true
closes elastic#6676
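As a rough sketch of how a client might pass the new parameter, the following Python helper builds a GET URL. Only the parameter name `ignore_errors_on_generated_fields` comes from the commit message. The host, index, type, document id, the field name `text.word_count`, and the assumption that fields are requested through a comma-separated `fields` query parameter are all illustrative.

```python
from urllib.parse import urlencode


def build_get_url(base, index, doc_type, doc_id, fields, ignore_errors=False):
    """Build a GET URL that requests specific fields (hypothetical helper)."""
    params = {
        # Assumption: fields are passed as a comma-separated query parameter.
        "fields": ",".join(fields),
        # Parameter name taken from the commit message; false is the default.
        "ignore_errors_on_generated_fields": "true" if ignore_errors else "false",
    }
    # safe="," keeps the commas between field names readable in the URL.
    return f"{base}/{index}/{doc_type}/{doc_id}?{urlencode(params, safe=',')}"


url = build_get_url(
    "http://localhost:9200", "test", "doc", "1",
    ["text.word_count"], ignore_errors=True,
)
```

With `ignore_errors=True` the generated field would be left out of the response rather than causing an exception, according to the description above.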
A field of type
|
Following the discussion on pull request #6826 I checked all field mappers and tried to figure out what they should return upon GET. I will call a field "generated" if the content is only available after indexing. Here is what I think we should do: For some core field types ( There are currently four field types (detailed list below):
For 1-3 we simply have to implement To make the fields configurable we could add a parameter Pro: would be easy to do and would also allow different types in plugins to use the feature very easily. Con: This would allow users to set For fields that are not configurable, the parameter List of types and their category: There are core types, root types, geo and ip.
Core types
These should be configurable:
The following two should not be configurable because they are always generated:
This should not be configurable because it is never stored:
ip and geo
Should be configurable:
root types
Never generated and should not be configurable:
Always generated and should not be configurable:
The following should not be configurable, because they are never stored:
|
hmpf. while writing tests I figured there are actually more cases to consider. will update soon... |
There are two numeric fields that are currently generated ( These should only be returned with GET ( I am now unsure if we should make the core types configurable. By configurable, I actually meant adding a parameter to the type mapping such as
I'll make a pull request without that and then maybe we can discuss further. Just for completeness, below is a list of all ungenerated field types and how they behave with GET.
Fields with fixed behavior:
Never stored -> should never be returned via GET
Always stored -> should always be returned via GET
Stored or source enabled -> always return via GET, else never return
Stored (but independent of source) -> always return via GET, else never return
Fields that might be configurable
Special fields which can never be in the "fields" list returned by GET anyway
|
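The fixed-behavior categories in the comment above can be read as a small decision rule. The sketch below encodes only that rule in Python. The `Category` enum and `returned_by_get` are hypothetical names for illustration, and no concrete field types are assigned to the categories here.

```python
from enum import Enum


class Category(Enum):
    """Hypothetical labels for the fixed-behavior categories in the list."""
    NEVER_STORED = "never stored"
    ALWAYS_STORED = "always stored"
    STORED_OR_SOURCE = "stored or source enabled"
    STORED_ONLY = "stored, independent of source"


def returned_by_get(category, stored=False, source_enabled=False):
    """Return True if a field of this category should be returned via GET."""
    if category is Category.NEVER_STORED:
        return False
    if category is Category.ALWAYS_STORED:
        return True
    if category is Category.STORED_OR_SOURCE:
        return stored or source_enabled
    if category is Category.STORED_ONLY:
        return stored
    raise ValueError(f"unknown category: {category!r}")
```

For example, `returned_by_get(Category.STORED_OR_SOURCE, source_enabled=True)` is true even when the field itself is not stored, while the same call with `Category.STORED_ONLY` is false.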
Fields of type `token_count`, `murmur3`, `_all` and `_field_names` are generated only when indexing. If a GET request accesses the transaction log (because there was no refresh between indexing and the GET request), then these fields cannot be retrieved at all. Previously, the behavior was:
`_all, _field_names`: The field was silently ignored
`murmur3, token_count`: `NumberFormatException` because GET tried to parse the values from the source.
In addition, if these fields were not stored, the same behavior occurred if the fields were retrieved with GET after a `refresh()`, because here the source was also used to get the fields. Now, GET accepts a parameter `ignore_errors_on_generated_fields` which has the following effect:
- Throw exception with meaningful error message explaining the problem if set to false (default)
- Ignore the field if set to true
- Always ignore the field if it was not set to stored
This changes the behavior for `_all` and `_field_names`, as now an Exception is thrown if a user tries to GET them before a `refresh()`. closes elastic#6676
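The three rules in this commit message can be modelled in a few lines of Python. This is a sketch under stated assumptions, not the actual implementation: `select_fields`, `GeneratedFieldError`, the shape of `mappings` and the field names are hypothetical. Only the four generated types and the effect of `ignore_errors_on_generated_fields` come from the text above.

```python
class GeneratedFieldError(Exception):
    """Raised when a generated field is requested but cannot be retrieved."""


# Types named in the commit message as generated only when indexing.
GENERATED_TYPES = {"token_count", "murmur3", "_all", "_field_names"}


def select_fields(requested, mappings, from_translog, ignore_errors=False):
    """Return the requested field names that GET could actually return."""
    returned = []
    for name in requested:
        mapping = mappings[name]
        if mapping["type"] not in GENERATED_TYPES:
            returned.append(name)      # ordinary field: unaffected
            continue
        if not mapping.get("stored", False):
            continue                   # generated and not stored: always ignored
        if not from_translog:
            returned.append(name)      # stored and already refreshed
            continue
        if ignore_errors:
            continue                   # flag set to true: skip the field
        raise GeneratedFieldError(
            f"field [{name}] is generated at index time and cannot be "
            "retrieved from the transaction log"
        )
    return returned


# Hypothetical mapping used only for illustration.
mappings = {
    "text": {"type": "string", "stored": True},
    "text.word_count": {"type": "token_count", "stored": True},
    "text.hash": {"type": "murmur3", "stored": False},
}
```

With this model, requesting `text.word_count` from the transaction log raises unless `ignore_errors=True` is passed, while the unstored `text.hash` is dropped in every case.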
Fields of type `token_count`, `murmur3`, `_all` and `_field_names` are generated only when indexing. If a GET request accesses the transaction log (because there was no refresh between indexing and the GET request), then these fields cannot be retrieved at all. Previously, the behavior was:
`_all, _field_names`: The field was silently ignored
`murmur3, token_count`: `NumberFormatException` because GET tried to parse the values from the source.
In addition, if these fields were not stored, the same behavior occurred if the fields were retrieved with GET after a `refresh()`, because here the source was also used to get the fields. Now, GET accepts a parameter `ignore_errors_on_generated_fields` which has the following effect:
- Throw exception with meaningful error message explaining the problem if set to false (default)
- Ignore the field if set to true
- Always ignore the field if it was not set to stored
This changes the behavior for `_all` and `_field_names`, as now an Exception is thrown if a user tries to GET them before a `refresh()`. closes #6676 closes #6973
`token_count` and realtime get
I index a text field with type `token_count`. When indexing, this creates an additional field that holds the number of tokens in the text field. When a document is retrieved from the transaction log (because no flush happened yet), and I want to get the `token_count` of my text field, I would assume that the `token_count` field is simply not retrieved, because it does not exist yet. Instead I get a `NumberFormatException`.
Here are the steps to reproduce: