Enabling "_timestamp" can cause bulk API to fail entire request instead of single operation #4745
ghost assigned spinscale on Jan 16, 2014
spinscale added a commit that referenced this issue on Jan 27, 2014
If preparsing of the source is needed (because the mapping configuration extracts the routing/id value from the source) and the source is not valid JSON, the whole bulk request fails instead of a single BulkRequest. This commit ensures that a broken JSON request is not forwarded to the destination shard and that an appropriate BulkItemResponse, which includes a failure, is created instead. This also required changing the BulkItemResponse serialization, because a response can no longer be assumed to include an ID: the ID may not have been specified and may not have been extractable from the JSON. Closes #4745
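The idea described in this commit message can be modeled roughly as follows. This is a conceptual Python sketch, not the actual Elasticsearch Java code: the function name `preparse_bulk_items`, the `id` field lookup, and the shape of the failure entries are all hypothetical and only illustrate "one bad source produces one failed item, and that item may have no ID".

```python
import json


def preparse_bulk_items(raw_sources):
    """Hypothetical model of per-item preparsing in a bulk request.

    Returns (to_forward, failures). A source that cannot be parsed is
    recorded as a failure with id=None instead of aborting the whole
    request, and it is never forwarded to a shard.
    """
    to_forward = []
    failures = []
    for position, raw in enumerate(raw_sources):
        try:
            doc = json.loads(raw)
        except ValueError as exc:  # json.JSONDecodeError subclasses ValueError
            # The ID could not be extracted, so the failure carries no ID.
            failures.append({"position": position, "id": None, "error": str(exc)})
            continue
        if not isinstance(doc, dict):
            failures.append({"position": position, "id": None,
                             "error": "source is not a JSON object"})
            continue
        # Assumption: the mapping extracts the ID from an "id" field.
        to_forward.append({"position": position, "id": doc.get("id"), "doc": doc})
    return to_forward, failures


if __name__ == "__main__":
    ok, bad = preparse_bulk_items(['{"id": "1", "field": "a"}', '{"id": "2", "field": '])
    print(len(ok), "item(s) forwarded;", len(bad), "item(s) failed")
```

The point of the design is that the failure entry keeps the item's position in the request, so the client can match it to the operation it sent even when no ID is available.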
mute pushed a commit to mute/elasticsearch that referenced this issue on Jul 29, 2015
If a preparsing of the source is needed (due to mapping configuration, which extracts the routing/id value from the source) and the source is not valid JSON, then the whole bulk request is failed instead of a single BulkRequest. This commit ensures, that a broken JSON request is not forwarded to the destination shard and creates an appropriate BulkItemResponse, which includes a failure. This also implied changing the BulkItemResponse serialization, because one cannot be sure anymore, if a response includes an ID, in case it was not specified and could not be extracted from the JSON. Closes elastic#4745
As I understand it, the intention of the bulk API is that individual operations may fail, but a failure in an individual operation should generally not cause the failure of all operations in the request.
First, let's verify that this is generally how it works. I am testing here with a simple case of malformed JSON (though I originally saw the problem with a subtler JSON parsing issue: unexpected non-printable ASCII characters in the JSON data).
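The commands used for this test are not shown here. As a minimal sketch, a bulk body with one malformed and one valid document could be built like this; the index name `test`, the type name `doc`, the field names, and the `localhost:9200` URL are assumptions for illustration only.

```python
import json
import urllib.request


def build_bulk_body(index, doc_type, raw_sources):
    """Build a newline-delimited bulk body from raw (possibly invalid) sources."""
    lines = []
    for doc_id, raw in enumerate(raw_sources, start=1):
        action = {"index": {"_index": index, "_type": doc_type, "_id": str(doc_id)}}
        lines.append(json.dumps(action))
        lines.append(raw)  # sent as-is, so malformed JSON stays malformed
    # The bulk API expects a trailing newline after the last line.
    return "\n".join(lines) + "\n"


def post_bulk(body, url="http://localhost:9200/_bulk"):
    """Send the body to a node; the URL is an assumed local default."""
    request = urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # First source is missing its closing brace; second is valid.
    print(build_bulk_body("test", "doc", ['{"field": "value"', '{"field": "value"}']))
```

Note that `urlopen` raises `urllib.error.HTTPError` for a 400 response, so a whole-request failure and a per-item failure surface differently on the client side.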
This worked correctly: one item failed with an error, the other succeeded, and a subsequent search does indeed find one indexed item.
Now let's re-create that index and enable the magic "_timestamp" field this time:
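The exact index settings are not shown here. A minimal sketch of an index-creation body that enables "_timestamp" for one type, using the mapping syntax of Elasticsearch releases of that era, might look like this; the type name `doc` and the helper name are hypothetical.

```python
import json


def timestamp_enabled_index_body(doc_type):
    """Return an index-creation body that enables _timestamp for one type."""
    return {"mappings": {doc_type: {"_timestamp": {"enabled": True}}}}


if __name__ == "__main__":
    # This JSON would be sent as the body when creating the index.
    print(json.dumps(timestamp_enabled_index_body("doc"), indent=2))
```

The same bulk request as before is then sent against the re-created index.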
This time the entire request errors out and returns a 400 response code, and no items are successfully indexed.
Since the malformed JSON is limited to a single action in the bulk request, I would expect only that action to fail, regardless of whether the "_timestamp" magic field is enabled or not.
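The expected behavior can be expressed as a check on the bulk response: exactly the malformed action should be reported as failed. The helper below is a sketch that assumes each failed entry in `items` carries an `error` key; the sample response is hand-written for illustration and is not captured output.

```python
def failed_items(bulk_response):
    """Return (position, action, error) for every failed item in a bulk response."""
    failures = []
    for position, item in enumerate(bulk_response.get("items", [])):
        # Each item is a one-key dict such as {"index": {...}}.
        for action, result in item.items():
            if "error" in result:
                failures.append((position, action, result["error"]))
    return failures


if __name__ == "__main__":
    # Hypothetical response shape, for illustration only.
    sample = {
        "items": [
            {"index": {"_index": "test", "_type": "doc", "_id": "1",
                       "error": "failed to parse"}},
            {"index": {"_index": "test", "_type": "doc", "_id": "2"}},
        ]
    }
    print(failed_items(sample))
```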
Tested against latest ElasticSearch release: