KAFKA-14154; Ensure AlterPartition not sent to stale controller #12499
handleResponse(request)
))
controllerOpt.foreach { activeController =>
  if (activeController.epoch >= request.minControllerEpoch) {
To confirm, this check is done on the broker side, right? I guess you allude to this in the PR description: a potentially more ideal solution would be for the controller to do the check server side, but that would require a version bump.
Yes, that's right.
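For illustration, the broker-side check discussed above can be sketched as follows. This is a minimal sketch, not the code from the PR: `ControllerInfo`, `QueuedRequest`, and `BrokerSideEpochCheck` are hypothetical names, and only the comparison `activeController.epoch >= request.minControllerEpoch` comes from the diff.

```scala
// Hypothetical, simplified types; these are NOT the actual Kafka classes.
final case class ControllerInfo(nodeId: Int, epoch: Int)
final case class QueuedRequest(name: String, minControllerEpoch: Int)

object BrokerSideEpochCheck {
  // Splits the pending requests into (sendable now, must keep waiting).
  // A request is sendable only if a controller is known and its epoch is
  // at least the minimum epoch the request requires.
  def split(
    controllerOpt: Option[ControllerInfo],
    pending: Seq[QueuedRequest]
  ): (Seq[QueuedRequest], Seq[QueuedRequest]) =
    controllerOpt match {
      case Some(active) =>
        pending.partition(r => active.epoch >= r.minControllerEpoch)
      case None =>
        // No known controller: nothing can be sent yet.
        (Seq.empty, pending)
    }
}
```

In this sketch, a request whose minimum epoch is not yet satisfied simply stays queued until a controller with a sufficiently large epoch is discovered; how long the real broker waits or retries is not specified here.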
info(s"Recorded new controller, from now on will use broker $controllerNode")
updateControllerAddress(controllerNode)
metadataUpdater.setNodes(Seq(controllerNode).asJava)
case Some(controllerNodeAndEpoch) =>
Is this where/how eventually the LeaderAndIsr from the new controller gets applied?
So we prevent sending the request if the epoch is lower? And is it the case, that there is always a controller with an epoch at least as large? Or in some cases would we need to wait/retry until such a controller exists?
@jolshan Yes, that is right. Ensuring some level of monotonicity seems like a good general change even outside the original bug. It is weird to allow the broker to send requests to a controller that it knows for sure is stale, and it makes the system harder to reason about. One thing I have been trying to think through is how this bug affects kraft. The kraft controller will also return
I am going to close this PR. On the one hand, it does not address the problem for KRaft; on the other, we have thought of a simpler fix for the zk controller, which I will open shortly.
It is possible currently for a leader to send an AlterPartition request to a stale controller which does not have the latest leader epoch discovered through a LeaderAndIsr request. In this case, the stale controller returns FENCED_LEADER_EPOCH, which causes the partition leader to get stuck. This is a change in behavior following #12032. Prior to that patch, the request would either be accepted (potentially incorrectly) if the LeaderAndIsr state matched that on the controller, or it would have returned NOT_CONTROLLER after the stale controller failed to apply the update to Zookeeper.

This patch fixes the problem by ensuring that AlterPartition is sent to a controller with an epoch which is at least as large as that of the controller which sent the LeaderAndIsr request. The way this is achieved is by tracking the controller epoch in BrokerToControllerChannelManager and ensuring that it is only updated monotonically regardless of the source. If we find a controller epoch through LeaderAndIsr which is larger than what we have in the MetadataCache, then the controller node is reset and we wait until we have discovered the controller node with a higher epoch. This ensures that the FENCED_LEADER_EPOCH error from the controller can be trusted.

A more elegant solution to this problem would probably be to include the controller epoch in the AlterPartition request, but this would require a version bump. Alternatively, we considered letting the controller return UNKNOWN_LEADER_EPOCH instead of FENCED_LEADER_EPOCH when the epoch is larger than what it has in its context. This too likely would require a version bump. Finally, we considered reverting #12032, which would restore the looser validation logic which allows the controller to accept AlterPartition requests with larger leader epochs. We rejected this option because we feel it can lead to correctness violations.
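The monotonic epoch tracking described in this PR description can be sketched roughly as below. This is an illustrative sketch only: `ControllerEpochTracker` and its methods are hypothetical names, not the API of BrokerToControllerChannelManager, and it assumes that a controller node reported at an epoch equal to the tracked one is acceptable, since the description only asks for an epoch "at least as large".

```scala
// Hypothetical sketch; names are illustrative and not Kafka's real API.
final class ControllerEpochTracker {
  // Highest controller epoch seen so far from any source.
  private var epoch: Int = -1
  // Controller node known for that epoch, if any.
  private var nodeId: Option[Int] = None

  // Epoch observed in a LeaderAndIsr request. If it is newer than what we
  // track, the known controller node is reset until it is rediscovered.
  def onLeaderAndIsrEpoch(observedEpoch: Int): Unit = synchronized {
    if (observedEpoch > epoch) {
      epoch = observedEpoch
      nodeId = None
    }
  }

  // Controller node and epoch read from the metadata cache. Updates with a
  // lower epoch are ignored so the tracked epoch never moves backwards.
  def onMetadataCacheController(id: Int, observedEpoch: Int): Unit = synchronized {
    if (observedEpoch > epoch || (observedEpoch == epoch && nodeId.isEmpty)) {
      epoch = observedEpoch
      nodeId = Some(id)
    }
  }

  // The controller to send to, as (nodeId, epoch), or None while waiting.
  def activeController: Option[(Int, Int)] = synchronized {
    nodeId.map(id => (id, epoch))
  }
}
```

While `activeController` returns `None`, a caller in this sketch would hold AlterPartition requests rather than send them to a controller it already knows to be stale.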