-
Notifications
You must be signed in to change notification settings - Fork 13.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
KAFKA-6916: Refresh metadata in admin client if broker connection fails #5050
KAFKA-6916: Refresh metadata in admin client if broker connection fails #5050
Conversation
@@ -123,7 +123,11 @@ public void handleCompletedMetadataResponse(RequestHeader requestHeader, long no | |||
|
|||
@Override | |||
public void requestUpdate() { | |||
// Do nothing | |||
AdminMetadataManager.this.requestUpdate(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that AdminMetadataManager.metadataFetchDelayMs()
always return 0 if state is UPDATE_REQUESTED
. This patch can make it more likely for AdminClient to ddos broker with a lot of MetadataRequest. Would it be safer to respect refreshBackoffMs
when the metadata update is explicitly requested?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like this has already been fixed in trunk.
// For calls which have not yet been sent, update the node to send to if | ||
// metadata has been refreshed. This is to handle controller change and | ||
// node disconnections. | ||
if (metadataManager.updater().lastMetadataUpdateMs() > lastIterationTimeMs) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be more intuitive to put this logic in AdminClientRunnable.makeMetadataCall.handleResponse()? This allows us to reorder calls in callsToSend
only when the MetadataResponse has no error. And we don't have to check timestamp to know whether the metadata has been updated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lindong28 Thanks for the review. Yes, makes sense. Updated.
b0bbea1
to
6ba1283
Compare
@@ -1138,7 +1138,18 @@ private Call makeMetadataCall(long now) { | |||
@Override | |||
public void handleResponse(AbstractResponse abstractResponse) { | |||
MetadataResponse response = (MetadataResponse) abstractResponse; | |||
metadataManager.update(response.cluster(), time.milliseconds()); | |||
long now = time.milliseconds(); | |||
metadataManager.update(response.cluster(), now); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good find. What happens if the metadata request itself is queued up to be sent to a node which is no longer online? Will it be stuck in callsToSend
until it times out? I am wondering if we should check NetworkClient.connectionFailed
after every poll() for all requests in callsToSend
and reenqueue them as we are doing here. This is what we do in ConsumerNetworkClient
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hachikuji Thanks for the review. I think metadata requests are handled differently because they use LeastLoadedNodeProvider
which assigns a node only if a ready node is available. If no ready node is available, then the call stays in pendingRequests
and gets moved to callsToSend
only when a node becomes ready after a subsequent poll. If a ready node is available, then it gets moved to callsToSend
and is removed from callsToSend
in the same iteration when the request is queued for send. If disconnection is processed in a subsequent poll, then the call is failed and retried using the retry path. We would never expect to see a metadata request in callsToSend
at this point. Does that make sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I'm not sure about that. The metadata request uses MetadataUpdateNodeIdProvider
, which just calls leastLoadedNode
directly. But the node chosen may not have an established connection, or am I missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hachikuji Sorry, that was my mistake. I thought that leastNodedNode
returned ready nodes, but that is not the case. Have updated the code.
* @param disconnectedOnly Reassign only calls to nodes that were disconnected | ||
* in the last poll | ||
*/ | ||
private void reassignUnsentCalls(long now, boolean disconnectedOnly) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that we can use java 8 lambdas, I wonder if we can do this with a Predicate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for the patch.
…ls (apache#5050) Refresh metadata if broker connection fails so that new calls are sent only to nodes that are alive and requests to controller are sent to the new controller if controller changes due to broker failure. Also reassign calls that could not be sent. Reviewers: Dong Lin <lindong28@gmail.com>, Jason Gustafson <jason@confluent.io>
Refresh metadata if broker connection fails so that new calls are sent only to nodes that are alive and requests to controller are sent to the new controller if controller changes due to broker failure. Also reassign calls that could not be sent.
Committer Checklist (excluded from commit message)