
Enabling "_timestamp" can cause bulk API to fail entire request instead of single operation #4745

Closed
carljm opened this issue Jan 15, 2014 · 0 comments · Fixed by #4781

carljm commented Jan 15, 2014

As I understand it, the intention of the bulk API is that individual operations may fail, but a failure in an individual operation should generally not cause the failure of all operations in the request.

First let's verify that this is generally how it works. I am testing here with a simple case of malformed JSON (though I originally saw the problem with a subtler JSON-parsing issue of unexpected non-printable ASCII characters in JSON data).

$ curl -XPUT http://localhost:9200/testing/
{"ok":true,"acknowledged":true}

$ curl -XPUT http://localhost:9200/testing/person/_mapping -d '{"person": {"dynamic": "strict", "properties": {"last_modified": {"type": "date", "format": "dateOptionalTime"},"name": {"type": "string"}}}}'
{"ok":true,"acknowledged":true}

$ cat baddata.txt
{"index": {"_id": "1"}}
{"name": "Malformed}
{"index": {"_id": "2"}}
{"name": "Good"}

$ curl -XPOST http://localhost:9200/testing/person/_bulk --data-binary @baddata.txt
{"took":72,"items":[{"index":{"_index":"testing","_type":"person","_id":"1","error":"MapperParsingException[failed to parse [name]]; nested: JsonParseException[Unexpected end-of-input in VALUE_STRING\n at [Source: [B@27beb7ec; line: 1, column: 65]]; "}},{"index":{"_index":"testing","_type":"person","_id":"2","_version":1,"ok":true}}]}

This worked correctly: one item failed with an error, the other succeeded, and a subsequent search does indeed find one item indexed.
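The partial failure above is reported inside the response body, per item, so a client has to inspect each item to find out what failed. The following is a minimal sketch, assuming the 0.90-style response format shown in this report: a failed item carries an `"error"` key, and a response rejected as a whole (like the one further down) has no `"items"` array at all. The function name `check_bulk_response` is hypothetical, and the grep-based matching is a shortcut to keep the sketch dependency-free, not a substitute for a real JSON parser.

```shell
#!/bin/sh
# Hypothetical helper: classify a bulk API response (0.90-style JSON).
# Assumption: a request-level failure has no "items" array, and each failed
# item carries an "error" key. grep is a shortcut here, not a JSON parser.
check_bulk_response() {
  resp=$1
  case $resp in
    *'"items":'*) ;;
    *)
      echo "whole request failed"
      return 1
      ;;
  esac
  # grep -o prints each match on its own line; wc -l counts them.
  # grep exits 1 when nothing matches, so '|| true' keeps that from
  # being treated as an error.
  failed=$(printf '%s\n' "$resp" | { grep -o '"error":' || true; } | wc -l | tr -d ' ')
  echo "failed items: $failed"
}

# Shortened version of the first response above: one failed, one indexed.
check_bulk_response '{"took":72,"items":[{"index":{"_id":"1","error":"MapperParsingException[failed to parse [name]]"}},{"index":{"_id":"2","_version":1,"ok":true}}]}'
```

For the shortened response in the example call, this prints `failed items: 1`.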

Now let's re-create that index and enable the magic "_timestamp" field this time:

$ curl -XDELETE http://localhost:9200/testing/
{"ok":true,"acknowledged":true}

$ curl -XPUT http://localhost:9200/testing/
{"ok":true,"acknowledged":true}

$ curl -XPUT http://localhost:9200/testing/person/_mapping -d '{"person": {"_timestamp": {"enabled": true, "path": "last_modified"}, "dynamic": "strict", "properties": {"last_modified": {"type": "date", "format": "dateOptionalTime"},"name": {"type": "string"}}}}'
{"ok":true,"acknowledged":true}

$ curl -XPOST http://localhost:9200/testing/person/_bulk --data-binary @baddata.txt
{"error":"ElasticSearchParseException[failed to parse doc to extract routing/timestamp]; nested: JsonParseException[Unexpected end-of-input in VALUE_STRING\n at [Source: [B@68f55ff2; line: 1, column: 65]]; ","status":400}

This time the entire request errors out and returns a 400 response code, and no items are successfully indexed.

Since the malformed JSON is limited to a single action in the bulk request, I would expect only that action to fail, regardless of whether the "_timestamp" magic field is enabled or not.
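On an affected version, one possible client-side workaround is to check every line of the bulk body before sending it, so that a malformed document can be fixed or dropped instead of taking the whole request down. This is a minimal sketch, not part of Elasticsearch: `validate_bulk_lines` is a hypothetical helper, and it assumes `python3` is installed, using its standard `json.tool` module as the JSON validator. It checks action lines as well as source lines, so it makes no assumption about which actions carry a source. Python's parser is not guaranteed to agree with the server's JSON parser on every edge case.

```shell
#!/bin/sh
# Hypothetical pre-flight check: report every line of a bulk body that is
# not valid JSON. Assumes python3 is available; its json.tool module reads
# stdin and exits non-zero on invalid input.
validate_bulk_lines() {
  n=0
  status=0
  # '|| [ -n "$line" ]' also handles a final line without a trailing newline.
  while IFS= read -r line || [ -n "$line" ]; do
    n=$((n + 1))
    if ! printf '%s\n' "$line" | python3 -m json.tool >/dev/null 2>&1; then
      echo "line $n: not valid JSON"
      status=1
    fi
  done < "$1"
  return "$status"
}

# Recreate the bulk body from this report (one JSON object per line).
printf '%s\n' \
  '{"index": {"_id": "1"}}' \
  '{"name": "Malformed}' \
  '{"index": {"_id": "2"}}' \
  '{"name": "Good"}' > baddata.txt

validate_bulk_lines baddata.txt || echo "fix the lines above before sending"
```

Run against the report's `baddata.txt`, it flags line 2 and returns a non-zero status.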

Tested against the latest ElasticSearch release:

$ curl http://localhost:9200
{
  "ok" : true,
  "status" : 200,
  "name" : "Pip the Troll",
  "version" : {
    "number" : "0.90.10",
    "build_hash" : "0a5781f44876e8d1c30b6360628d59cb2a7a2bbb",
    "build_timestamp" : "2014-01-10T10:18:37Z",
    "build_snapshot" : false,
    "lucene_version" : "4.6"
  },
  "tagline" : "You Know, for Search"
}
ghost assigned spinscale Jan 16, 2014
spinscale added a commit that referenced this issue Jan 27, 2014
If the source has to be pre-parsed (because the mapping configuration
extracts the routing/id value from the source) and the source is not
valid JSON, the whole bulk request fails instead of only the affected
item.

This commit ensures that a broken JSON request is not forwarded to the
destination shard; instead, an appropriate BulkItemResponse that
includes a failure is created.

This also required changing the BulkItemResponse serialization, because
a response can no longer be assumed to include an ID: the ID may not
have been specified and could not be extracted from the JSON.

Closes #4745
mute pushed a commit to mute/elasticsearch that referenced this issue Jul 29, 2015