Resiliency: Perform write consistency check just before writing on the primary shard #7873
Conversation
@@ -506,6 +484,7 @@ public void handleException(TransportException exp) {
 void retry(@Nullable final Throwable failure) {
     if (observer.isTimedOut()) {
         // we running as a last attempt after a timeout has happened. don't retry
+        raiseTimeoutFailure(internalRequest.request().timeout(), failure);
Not sure, but this seems really crucial (and thus a good catch). Do we have a test that verifies that indexing timeouts work?
Oh, I see. This is protection against recursion during the last attempt we make on timeout, where the timeout flag is already set (isTimedOut doesn't actually check for timeouts). This pattern smells (and is probably my fault, if I didn't take it over from an existing pattern), but I'm worried this change is no good (and we need a comment explaining why).
Yes, without this change the WriteConsistencyLevelTests failed all the time.
I think this one-line change is okay and not bad at all. Because the write consistency check now happens in performOnPrimary(), doStart() will return true even if the check fails, so the following check in the onTimeout() method on the observer listener doesn't help when the write check fails:
if (doStart()) {
return;
}
raiseTimeoutFailure(timeout, failure);
If the write check fails we always invoke the retry(...) method, so I think it is fine to invoke raiseTimeoutFailure() there as well.
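The interaction described in the comments above can be sketched with a toy model. The names mirror the PR (doStart, retry, onTimeout, raiseTimeoutFailure), but the bodies are simplified stand-ins, not the real TransportShardReplicationOperationAction code: without the guard in retry(), the last attempt made from onTimeout() would recurse forever once the write check keeps failing.

```java
// Toy model of the retry/timeout guard discussed above (hypothetical bodies).
public class RetryGuardSketch {
    static boolean timedOut = false;
    static int attempts = 0;
    static boolean failedWithTimeout = false;

    // Stand-in for doStart(): simulates the write consistency check
    // failing on every attempt, which triggers retry().
    static boolean doStart() {
        attempts++;
        retry();
        return true; // returns true even though the check failed
    }

    static void retry() {
        if (timedOut) {
            // Guard: we are running as the last attempt after a timeout,
            // so raise the failure instead of scheduling another attempt.
            raiseTimeoutFailure();
            return;
        }
        // In the real code a cluster-state observer would re-invoke
        // doStart() later; here we jump straight to the timeout path.
        onTimeout();
    }

    static void onTimeout() {
        timedOut = true;
        doStart(); // last attempt; retry() inside it must not loop
    }

    static void raiseTimeoutFailure() {
        failedWithTimeout = true;
    }

    public static void main(String[] args) {
        doStart();
        System.out.println("attempts=" + attempts
                + " failedWithTimeout=" + failedWithTimeout);
    }
}
```

With the guard, the model terminates after the initial attempt plus one last attempt; removing the `if (timedOut)` check makes retry() and onTimeout() call each other until the stack overflows.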
@bleskes I updated the PR.
This LGTM @bleskes, your call.
 }
 blockException = checkRequestBlock(observer.observedState(), internalRequest);
 if (blockException != null) {
     if (blockException.retryable()) {
         logger.trace("cluster is blocked ({}), scheduling a retry", blockException.getMessage());
         retry(blockException);
-        return false;
+        return;
     } else {
         throw blockException;
Not related to the change, but I think this is bad - we don't always catch these exceptions - for example when we're in the onTimeout handler of the observer. I think we should change the observer listeners to have an onFailure - but that would be another change. In the meantime - can we make sure doStart doesn't throw exceptions by design but rather calls listener.onFailure?
scratch that - it's caught later on in the code.
The new retry logic is much cleaner. I left some minor comments. Otherwise LGTM.
@bleskes Thanks for the review. I've updated the PR.
Thx @martijnvg. Left one little comment about the change. I'm good with this, +1
Before this change the write consistency check was performed both on the node that receives the write request and on the node that holds the primary shard. This change removes the check on the node that receives the request, since it is redundant. It also moves the write consistency check on the node that holds the primary shard to a later moment, after forking the thread that performs the actual write on the primary shard. Closes elastic#7873
Force-pushed 0fd8232 to e1a8b02