
Bulk operation can create duplicates on primary relocation #7729

Closed
wants to merge 6 commits

Conversation

kimchy
Member

@kimchy kimchy commented Sep 15, 2014

When executing a bulk request with create operations and auto-generated ids, if the bulk executes while the primary is relocating, and the relocation completes after N items from the bulk have already executed, the full shard bulk request is retried on the new primary. This can create duplicates because the request is not marked as potentially holding conflicts.

This change carries over the response for each item on the request level; if a conflict is detected on the primary shard and the response is already present (indicating that the request was executed once before), that response is used as the actual response for the bulk shard item.

On top of that, when a primary fails and is retried, the change now marks the request as potentially causing duplicates, so the actual implementation will do the extra lookup needed.

This change also fixes a bug in our exception handling on the replica: if a specific item fails with an exception we cannot ignore, we should actually fail the shard.
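The dedup idea described above can be sketched in a few lines. This is a hypothetical illustration, not the actual Elasticsearch classes: `BulkShardRequest`, `executeItem`, the slot map, and the `"created:"` response string are all stand-ins for the real bulk machinery.

```java
import java.util.HashMap;
import java.util.Map;

public class BulkRetrySketch {

    /** Per-item state carried on the request level across retries (hypothetical). */
    static class BulkShardRequest {
        boolean canHaveDuplicates = false;            // set when the request is retried
        final Map<Integer, String> itemResponses = new HashMap<>();
    }

    /** Executes one bulk item; on a retried request, reuses any recorded response. */
    static String executeItem(BulkShardRequest request, int slot, String docId) {
        if (request.canHaveDuplicates && request.itemResponses.containsKey(slot)) {
            // The item already ran on the old primary: return the recorded
            // response instead of indexing the document a second time.
            return request.itemResponses.get(slot);
        }
        String response = "created:" + docId;         // stand-in for the real index op
        request.itemResponses.put(slot, response);
        return response;
    }

    public static void main(String[] args) {
        BulkShardRequest request = new BulkShardRequest();
        String first = executeItem(request, 0, "doc-1");

        // Primary relocates mid-bulk: the whole shard request is retried.
        request.canHaveDuplicates = true;
        String retried = executeItem(request, 0, "doc-1");

        System.out.println(first.equals(retried)); // prints "true": no duplicate created
    }
}
```

The key property is that the retry path only consults the recorded responses when the request is flagged as potentially conflicting, so the common non-retry path pays nothing extra.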

this.primaryResponse = primaryResponse;
}

void setIgnoreOnReplica() {
Contributor
I think this one merits a little Javadoc/comment to indicate when it is used and what it means.
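Purely illustrative: one way the requested Javadoc could read. The wording and the surrounding class are assumptions, not the final committed code.

```java
public class BulkItemSketch {
    private boolean ignoreOnReplica;

    /**
     * Marks this bulk item to be skipped when the shard request executes on a
     * replica, e.g. because the item failed on the primary and was therefore
     * never applied there, so replaying it on the replica would be wrong.
     */
    void setIgnoreOnReplica() {
        this.ignoreOnReplica = true;
    }

    boolean ignoreOnReplica() {
        return ignoreOnReplica;
    }
}
```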

@bleskes
Contributor

bleskes commented Sep 15, 2014

Main change looks good to me. There is a typo (s/false/true/) in the BWC commit. I also think we need a BWC test that indexes while relocating. testRecoverFromPreviousVersion looks like a good candidate for a change.

@kimchy
Member Author

kimchy commented Sep 15, 2014

@bleskes I pushed the removal of the responses array; I think it's ready.

@bleskes
Contributor

bleskes commented Sep 15, 2014

LGTM. The last change makes it cleaner imho. I'll add the BWC test tomorrow.

@s1monw
Contributor

s1monw commented Sep 16, 2014

LGTM

@@ -81,6 +102,12 @@ public void readFrom(StreamInput in) throws IOException {
request = new UpdateRequest();
}
request.readFrom(in);
if (in.getVersion().onOrAfter(Version.V_1_4_0_Beta)) {
Member
Can we port this fix also to the 1.3 branch?

Member Author
let's get this into 1.4 and above for now, and see how it goes

Member
ok, we can always port it back later on.
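The `in.getVersion().onOrAfter(...)` check in the diff above is the standard backward-compatibility pattern: a new field is only written and read when the stream speaks the newer protocol version, and older streams fall back to a safe default. A minimal self-contained sketch, assuming hypothetical version constants and plain `DataInput/OutputStream` in place of Elasticsearch's `StreamInput`/`StreamOutput`:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class VersionGatedStreamSketch {

    // Stand-ins for the real Version constants (the numeric values are arbitrary).
    static final int V_1_3_0 = 1_030_099;
    static final int V_1_4_0_BETA = 1_040_001;

    static class Request {
        String payload;
        boolean canHaveDuplicates;

        void writeTo(DataOutputStream out, int streamVersion) throws IOException {
            out.writeUTF(payload);
            if (streamVersion >= V_1_4_0_BETA) {
                out.writeBoolean(canHaveDuplicates); // new field: newer streams only
            }
        }

        void readFrom(DataInputStream in, int streamVersion) throws IOException {
            payload = in.readUTF();
            if (streamVersion >= V_1_4_0_BETA) {
                canHaveDuplicates = in.readBoolean();
            } else {
                // A stream from an old node carries no flag; the safe default is
                // true (assume duplicates are possible), matching the
                // s/false/true/ fix mentioned in the review.
                canHaveDuplicates = true;
            }
        }
    }

    /** Serializes and deserializes the request at the given stream version. */
    static Request roundTrip(Request original, int streamVersion) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        original.writeTo(new DataOutputStream(bytes), streamVersion);
        Request decoded = new Request();
        decoded.readFrom(
            new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())),
            streamVersion);
        return decoded;
    }

    public static void main(String[] args) throws IOException {
        Request request = new Request();
        request.payload = "bulk-item";
        request.canHaveDuplicates = false;

        // New wire version: the flag round-trips exactly.
        System.out.println(roundTrip(request, V_1_4_0_BETA).canHaveDuplicates); // prints "false"

        // Old wire version: the flag is absent and falls back to the safe default.
        System.out.println(roundTrip(request, V_1_3_0).canHaveDuplicates); // prints "true"
    }
}
```

Both sides must gate on the same version check, otherwise the reader and writer disagree on the stream layout and deserialization breaks mid-stream.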

@martijnvg
Member

LGTM

@dakrone dakrone added the v1.5.0 label Sep 16, 2014
@kimchy kimchy closed this in 99f91f7 Sep 16, 2014
kimchy added a commit that referenced this pull request Sep 16, 2014
kimchy added a commit that referenced this pull request Sep 16, 2014
@kimchy kimchy deleted the bulk_primary_relocation branch September 16, 2014 13:33
bleskes added a commit to bleskes/elasticsearch that referenced this pull request Sep 17, 2014
bleskes added a commit that referenced this pull request Sep 22, 2014
bleskes added a commit that referenced this pull request Sep 22, 2014
bleskes added a commit that referenced this pull request Sep 22, 2014
kimchy added a commit that referenced this pull request Sep 22, 2014
@kimchy kimchy added the v1.3.3 label Sep 22, 2014
@kimchy
Member Author

kimchy commented Sep 22, 2014

pushed to 1.3.3 as well

kimchy added a commit that referenced this pull request Sep 22, 2014
kimchy added a commit that referenced this pull request Sep 22, 2014
kimchy added a commit that referenced this pull request Sep 22, 2014
@clintongormley clintongormley changed the title Bulk operation can create duplicates on primary relocation Bulk API: Bulk operation can create duplicates on primary relocation Sep 26, 2014
@jpountz jpountz removed the review label Oct 21, 2014
@s1monw s1monw mentioned this pull request Dec 9, 2014
@clintongormley clintongormley changed the title Bulk API: Bulk operation can create duplicates on primary relocation Bulk operation can create duplicates on primary relocation Jun 7, 2015
mute pushed a commit to mute/elasticsearch that referenced this pull request Jul 29, 2015
mute pushed a commit to mute/elasticsearch that referenced this pull request Jul 29, 2015
mute pushed a commit to mute/elasticsearch that referenced this pull request Jul 29, 2015
mute pushed a commit to mute/elasticsearch that referenced this pull request Jul 29, 2015
@lcawl lcawl added :Distributed/CRUD A catch all label for issues around indexing, updating and getting a doc by id. Not search. and removed :Bulk labels Feb 13, 2018
Labels
blocker >bug :Distributed/CRUD A catch all label for issues around indexing, updating and getting a doc by id. Not search. resiliency v1.3.3 v1.4.0.Beta1 v1.5.0 v2.0.0-beta1
8 participants