Failed to properly ack translog ops during wait on mapping changes #13535

Closed
wants to merge 1 commit into from

@bleskes bleskes commented Sep 12, 2015

During the second phase of recovery, replayed transaction log entries may need to wait on mapping changes that have not yet propagated to the target node. Currently we correctly replay the operation at a later stage, but we acknowledge the replay request before actually performing the work.

Example failure: http://build-us-00.elastic.co/job/es_feature_two_phase_pub/859/
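
To illustrate the ordering problem described above, here is a minimal, self-contained sketch of "ack only after the replay has actually happened". It is not the Elasticsearch code: `Channel`, `TargetShard`, `mappingArrived`, and `handle` are hypothetical stand-ins, and the use of `IllegalStateException` to signal a missing mapping is an assumption made purely for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;

public class AckAfterReplaySketch {

    /** Hypothetical stand-in for the channel used to acknowledge the source node. */
    static class Channel {
        final List<String> events;
        Channel(List<String> events) { this.events = events; }
        void sendResponse() { events.add("ack"); }
    }

    /** Hypothetical target shard that can only replay once the mapping is known. */
    static class TargetShard {
        final List<String> events;
        boolean mappingAvailable = false;
        final Queue<Runnable> waitingOnMapping = new ArrayDeque<>();

        TargetShard(List<String> events) { this.events = events; }

        /** Replays the batch, or throws if the required mapping has not arrived yet. */
        void performBatchRecovery(List<String> ops) {
            if (!mappingAvailable) {
                throw new IllegalStateException("mapping not yet available");
            }
            events.add("replayed " + ops.size());
        }

        /** Simulates the mapping change reaching this node, then runs deferred retries. */
        void mappingArrived() {
            mappingAvailable = true;
            Runnable retry;
            while ((retry = waitingOnMapping.poll()) != null) {
                retry.run();
            }
        }
    }

    /** Acknowledges only after the replay succeeded, whether immediate or deferred. */
    static void handle(TargetShard shard, List<String> ops, Channel channel) {
        try {
            shard.performBatchRecovery(ops);
            channel.sendResponse(); // ack strictly after the work is done
        } catch (IllegalStateException waitForMapping) {
            // Defer the whole handler and do NOT ack here; acking at this point
            // would be the premature acknowledgement this sketch is meant to avoid.
            shard.waitingOnMapping.add(() -> handle(shard, ops, channel));
        }
    }

    public static void main(String[] args) {
        List<String> events = new ArrayList<>();
        TargetShard shard = new TargetShard(events);
        handle(shard, Arrays.asList("index doc1", "index doc2"), new Channel(events));
        System.out.println(events); // prints [] because the replay is still waiting
        shard.mappingArrived();
        System.out.println(events); // prints [replayed 2, ack]
    }
}
```

The point of the sketch is only the ordering: the acknowledgement is tied to the successful replay, so a deferred retry carries the channel with it instead of answering up front.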

bleskes added the >bug, review, :Distributed/Recovery, v2.1.0, v2.0.0, and v5.0.0-alpha1 labels on Sep 12, 2015

bleskes commented Sep 12, 2015

@s1monw can you take a look?

@@ -304,6 +304,7 @@ public void messageReceived(final RecoveryTranslogOperationsRequest request, fin
assert recoveryStatus.indexShard().recoveryState() == recoveryStatus.state();
try {
recoveryStatus.indexShard().performBatchRecovery(request.operations());
channel.sendResponse(TransportResponse.Empty.INSTANCE);

Man, any chance we can unit test this?


I would love to have those, but it will be a bigger rewrite - too many incoming dependencies. Next time we re-write something in this class...


s1monw commented Sep 13, 2015

LGTM, left one comment.

@bleskes bleskes closed this in a758eec Sep 13, 2015
Labels: >bug, :Distributed/Recovery, v2.0.0-beta2
3 participants