-
Notifications
You must be signed in to change notification settings - Fork 13.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
KAFKA-7168: Treat connection close during SSL handshake as retriable #5371
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR, left a few comments.
@@ -260,12 +262,14 @@ public void handshake() throws IOException { | |||
maybeThrowSslAuthenticationException(); | |||
|
|||
// this exception could be due to a write. If there is data available to unwrap, | |||
// process the data so that any SSLExceptions are reported | |||
// process the data so that any SSL handshake exceptions are reported |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be good to elaborate on why we need to do this as it's not obvious by just reading the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added comments in a new method to process the exceptions.
handshakeFailure(e1, false); | ||
} catch (SSLException e1) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't we be rethrowing the IOException
in this case as per the comment "If we get here, this is not a handshake failure, throw the original IOException"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I assume the IOException
we get doesn't provide any clues?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, IOException
didn't give any clues here. We were falling through to propagate the original exception earlier, but I have added a method to process the exception now.
@ijuma Thanks for the review. Added a couple more tests and in the end I had to rely on the exception String to cover some of the cases. Can you take another look? |
// We want to handle a) as a non-retriable SslAuthenticationException and b) as a retriable IOException. | ||
// To do this we need to rely on the exception string. Since it is safer to throw a retriable exception | ||
// when we are not sure, we will treat only the first exception string as a handshake exception. | ||
private void maybeProcessHandshakeFailure(SSLException sslException, boolean flush, IOException ioException) throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be easier to understand if this handled all of the unwrap exceptions after the IOException? And then we could call this method processUnwrapExceptionAfterIOException
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ijuma Thanks for the review. At the moment, this method is used to process SSLException
in two places, regardless of whether the original exception was after IOException
or not. I thought it would be better to do the String check in a single method rather than separate out handling of IOException
. We need to handle both cases because SSLException
due to close_notify
may be processed before or after IOException
. I can use two methods if a single method is confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. So the name doesn't work, but in both cases we have two catches like:
} catch (SSLHandshakeException | SSLProtocolException | SSLPeerUnverifiedException | SSLKeyException e) {
...
} catch (SSLException e1) {
...
}
Maybe we can catch SSLException and then pass it to the shared method. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ijuma Yes, I was in two minds about that - whether to use instanceof
in one place or catch the exception in two places. I have updated to use the common method. Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR, LGTM. Just a question and a minor suggestion.
if (sslException instanceof SSLHandshakeException || sslException instanceof SSLProtocolException || | ||
sslException instanceof SSLPeerUnverifiedException || sslException instanceof SSLKeyException) { | ||
handshakeFailure(sslException, flush); | ||
} else if (sslException.getMessage().contains("Unrecognized SSL message")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be an additional ||
in the previous if
?
|
||
/** | ||
* Tests that if the remote end closes connection ungracefully during SSL handshake while writing data, | ||
* the disconnection is not treated as an authentication failure. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One question: how would the test fail if the disconnection was treated as an authentication failure?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assertTrue("Unexpected channel state " + state, state == ChannelState.State.AUTHENTICATE || state == ChannelState.State.READY)
All the tests check the state when the channel is disconnected. state
will be AUTHENTICATION_FAILED
if the disconnection is treated as an authentication failure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, makes sense.
@ijuma Thanks for the reviews, merging to trunk and 2.0. |
…5371) SSL `close_notify` from broker connection close was processed as a handshake failure in clients while unwrapping the message if a handshake is in progress. Updated to handle this as a retriable IOException rather than a non-retriable SslAuthenticationException to avoid authentication exceptions in clients during rolling restart of brokers. Reviewers: Ismael Juma <ismael@juma.me.uk>
SSL
close_notify
from broker connection close is processed as anSSLException
while unwrapping the final message when the I/O exception due to remote close is processed. This should be handled as a retriableIOException
rather than a non-retriableSslAuthenticationException
.Committer Checklist (excluded from commit message)