Failed to properly ack translog ops during wait on mapping changes #13535
…changes During the second phase of recovery, replayed transaction log entries may need to wait on mapping changes that have not yet propagated to the target node. Currently we correctly replay the operation at a later stage, but we acknowledge the replay request before actually performing the work. Example failure: http://build-us-00.elastic.co/job/es_feature_two_phase_pub/859/
@s1monw can you take a look?
@@ -304,6 +304,7 @@ public void messageReceived(final RecoveryTranslogOperationsRequest request, fin
assert recoveryStatus.indexShard().recoveryState() == recoveryStatus.state();
try {
    recoveryStatus.indexShard().performBatchRecovery(request.operations());
    channel.sendResponse(TransportResponse.Empty.INSTANCE);
man, any chance we can unittest this?
I would love to have those, but it will be a bigger rewrite - too many incoming dependencies. Next time we re-write something in this class...
LGTM, left one comment
…changes During the second phase of recovery, replayed transaction log entries may need to wait on mapping changes that have not yet propagated to the target node. Currently we correctly replay the operation at a later stage, but we acknowledge the replay request before actually performing the work. Example failure: http://build-us-00.elastic.co/job/es_feature_two_phase_pub/859/ Closes #13535
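To make the ordering issue concrete, here is a minimal, self-contained sketch of the "acknowledge only after the work is done" pattern. It is not the Elasticsearch code: `Target`, `Channel`, `MappingNotReadyException`, `pendingRetries`, and `onMappingUpdate` are hypothetical stand-ins invented for illustration, and only the method names `messageReceived`, `performBatchRecovery`, and `sendResponse` echo the diff. The assumption is that a batch which hits a missing mapping is queued for a later retry, and that the acknowledgement is sent from the same code path that actually applies the operations.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

public class AckAfterReplay {

    /** Hypothetical stand-in for a transport channel; it only counts acknowledgements. */
    public static final class Channel {
        public int acks = 0;
        public void sendResponse() { acks++; }
    }

    /** Hypothetical signal that an operation needs a mapping that has not arrived yet. */
    public static final class MappingNotReadyException extends RuntimeException {}

    /** Hypothetical recovery target that applies operations once its mapping is available. */
    public static final class Target {
        public boolean mappingReady = false;
        public final List<String> applied = new ArrayList<>();
        public final Queue<Runnable> pendingRetries = new ArrayDeque<>();

        public void performBatchRecovery(List<String> ops) {
            if (!mappingReady) {
                throw new MappingNotReadyException();
            }
            applied.addAll(ops);
        }

        /** Simulates the mapping change reaching this node, then runs queued retries. */
        public void onMappingUpdate() {
            mappingReady = true;
            Runnable retry;
            while ((retry = pendingRetries.poll()) != null) {
                retry.run();
            }
        }
    }

    /** Acknowledges the request only on the path where the operations were applied. */
    public static void messageReceived(Target target, List<String> ops, Channel channel) {
        try {
            target.performBatchRecovery(ops);
            channel.sendResponse(); // reached only if the batch was applied
        } catch (MappingNotReadyException e) {
            // No acknowledgement here: the retry will send it once the work succeeds.
            target.pendingRetries.add(() -> messageReceived(target, ops, channel));
        }
    }

    public static void main(String[] args) {
        Target target = new Target();
        Channel channel = new Channel();

        messageReceived(target, List.of("index doc 1"), channel);
        System.out.println("acks before mapping update: " + channel.acks); // prints 0

        target.onMappingUpdate();
        System.out.println("acks after mapping update: " + channel.acks);  // prints 1
    }
}
```

Keeping the response inside the `try` block, next to the call that does the work, means a deferred batch cannot be acknowledged by accident: the sender only sees the ack after the retry has applied the operations.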