Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KAFKA-7168: Treat connection close during SSL handshake as retriable #5371

Merged
merged 4 commits into from
Jul 18, 2018

Conversation

rajinisivaram
Copy link
Contributor

SSL close_notify from broker connection close is processed as an SSLException while unwrapping the final message when the I/O exception due to remote close is processed. This should be handled as a retriable IOException rather than a non-retriable SslAuthenticationException.

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

Copy link
Contributor

@ijuma ijuma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR, left a few comments.

@@ -260,12 +262,14 @@ public void handshake() throws IOException {
maybeThrowSslAuthenticationException();

// this exception could be due to a write. If there is data available to unwrap,
// process the data so that any SSLExceptions are reported
// process the data so that any SSL handshake exceptions are reported
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to elaborate on why we need to do this as it's not obvious by just reading the code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added comments in a new method to process the exceptions.

handshakeFailure(e1, false);
} catch (SSLException e1) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we be rethrowing the IOException in this case as per the comment "If we get here, this is not a handshake failure, throw the original IOException"?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I assume the IOException we get doesn't provide any clues?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, IOException didn't give any clues here. We were falling through to propagate the original exception earlier, but I have added a method to process the exception now.

@rajinisivaram
Copy link
Contributor Author

rajinisivaram commented Jul 17, 2018

@ijuma Thanks for the review. Added a couple more tests and in the end I had to rely on the exception String to cover some of the cases. Can you take another look?

// We want to handle a) as a non-retriable SslAuthenticationException and b) as a retriable IOException.
// To do this we need to rely on the exception string. Since it is safer to throw a retriable exception
// when we are not sure, we will treat only the first exception string as a handshake exception.
private void maybeProcessHandshakeFailure(SSLException sslException, boolean flush, IOException ioException) throws IOException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be easier to understand if this handled all of the unwrap exceptions after the IOException? And then we could call this method processUnwrapExceptionAfterIOException.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ijuma Thanks for the review. At the moment, this method is used to process SSLException in two places, regardless of whether the original exception was after IOException or not. I thought it would be better to do the String check in a single method rather than separate out handling of IOException. We need to handle both cases because SSLException due to close_notify may be processed before or after IOException. I can use two methods if a single method is confusing.

Copy link
Contributor

@ijuma ijuma Jul 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. So the name doesn't work, but in both cases we have two catches like:

} catch (SSLHandshakeException | SSLProtocolException | SSLPeerUnverifiedException | SSLKeyException e) {
  ...
} catch (SSLException e1) {
 ...
}

Maybe we can catch SSLException and then pass it to the shared method. What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ijuma Yes, I was in two minds about that - whether to use instanceof in one place or catch the exception in two places. I have updated to use the common method. Thanks.

Copy link
Contributor

@ijuma ijuma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR, LGTM. Just a question and a minor suggestion.

if (sslException instanceof SSLHandshakeException || sslException instanceof SSLProtocolException ||
sslException instanceof SSLPeerUnverifiedException || sslException instanceof SSLKeyException) {
handshakeFailure(sslException, flush);
} else if (sslException.getMessage().contains("Unrecognized SSL message"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be an additional || in the previous if?


/**
* Tests that if the remote end closes connection ungracefully during SSL handshake while writing data,
* the disconnection is not treated as an authentication failure.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One question: how would the test fail if the disconnection was treated as an authentication failure?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assertTrue("Unexpected channel state " + state, state == ChannelState.State.AUTHENTICATE || state == ChannelState.State.READY)

All the tests check the state when the channel is disconnected. state will be AUTHENTICATION_FAILED if the disconnection is treated as an authentication failure.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, makes sense.

@rajinisivaram
Copy link
Contributor Author

@ijuma Thanks for the reviews, merging to trunk and 2.0.

@rajinisivaram rajinisivaram merged commit 0875dde into apache:trunk Jul 18, 2018
rajinisivaram added a commit that referenced this pull request Jul 18, 2018
…5371)

SSL `close_notify` from broker connection close was processed as a handshake failure in clients while unwrapping the message if a handshake is in progress. Updated to handle this as a retriable IOException rather than a non-retriable SslAuthenticationException to avoid authentication exceptions in clients during rolling restart of brokers.

Reviewers: Ismael Juma <ismael@juma.me.uk>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants