
KAFKA-14154; Ensure AlterPartition not sent to stale controller #12499

Closed
hachikuji wants to merge 4 commits

Conversation

@hachikuji hachikuji commented Aug 10, 2022

It is currently possible for a leader to send an AlterPartition request to a stale controller which does not have the latest leader epoch that the broker discovered through a LeaderAndIsr request. In this case, the stale controller returns FENCED_LEADER_EPOCH, which causes the partition leader to get stuck. This is a change in behavior following #12032. Prior to that patch, the request would either be accepted (potentially incorrectly) if the LeaderAndIsr state matched that on the controller, or the controller would have returned NOT_CONTROLLER after it failed to apply the update to ZooKeeper.

This patch fixes the problem by ensuring that AlterPartition is sent to a controller with an epoch which is at least as large as that of the controller which sent the LeaderAndIsr request. This is achieved by tracking the controller epoch in BrokerToControllerChannelManager and ensuring that it is only updated monotonically, regardless of the source. If we find a controller epoch through LeaderAndIsr which is larger than what we have in the MetadataCache, then the cached controller node is reset and we wait until we have discovered the controller node with the higher epoch. This ensures that the FENCED_LEADER_EPOCH error from the controller can be trusted.
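
As an illustration of the monotonic-update idea described above, here is a minimal sketch in Scala. The names (ControllerInfo, ControllerEpochTracker, maybeUpdate) are placeholders for this sketch, not the actual Kafka classes; the real logic lives in BrokerToControllerChannelManager.

  // Sketch only: the cached controller epoch never moves backwards, and learning a
  // newer epoch (e.g. via LeaderAndIsr) before the metadata cache catches up clears
  // the cached node, so a FENCED_LEADER_EPOCH from the cached controller can be trusted.
  case class ControllerInfo(node: Option[String], epoch: Int)

  class ControllerEpochTracker {
    private var current = ControllerInfo(None, epoch = 0)

    // Record a controller epoch reported by any source (metadata cache or LeaderAndIsr).
    def maybeUpdate(node: Option[String], epoch: Int): Unit = synchronized {
      if (epoch > current.epoch) {
        current = ControllerInfo(node, epoch)   // newer epoch always wins
      } else if (epoch == current.epoch && current.node.isEmpty && node.isDefined) {
        current = ControllerInfo(node, epoch)   // learn the node for an epoch we already trust
      }
      // Strictly older epochs are ignored, keeping the view monotonic.
    }

    def activeController: Option[String] = synchronized(current.node)
  }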

A more elegant solution to this problem would probably be to include the controller epoch in the AlterPartition request, but this would require a version bump. Alternatively, we considered letting the controller return UNKNOWN_LEADER_EPOCH instead of FENCED_LEADER_EPOCH when the epoch is larger than what it has in its context, but this too would likely require a version bump. Finally, we considered reverting #12032, which would restore the looser validation logic that allows the controller to accept AlterPartition requests with larger leader epochs. We rejected this option because we feel it can lead to correctness violations.

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

      handleResponse(request)
    ))
    controllerOpt.foreach { activeController =>
      // Broker-side check: only send once the known controller's epoch is at least
      // as large as the minimum epoch required by the request.
      if (activeController.epoch >= request.minControllerEpoch) {
Contributor
To confirm, this check is done on the broker side, right? I guess you sort of allude to this in the PR description: potentially a more ideal solution would be for the controller to do the check server-side, but that would require a version bump.

Author

Yes, that's right.
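
In other words, the broker-side gate amounts to something like the following sketch (PendingAlterPartition and maybeSend are illustrative placeholders, not the real channel-manager types):

  object AlterPartitionGate {
    case class PendingAlterPartition(minControllerEpoch: Int, send: () => Unit)

    // Defer the request until the known controller's epoch is at least the required minimum.
    def maybeSend(activeControllerEpoch: Option[Int], req: PendingAlterPartition): Boolean =
      activeControllerEpoch match {
        case Some(epoch) if epoch >= req.minControllerEpoch =>
          req.send()   // safe: this controller has seen at least our controller epoch
          true
        case _ =>
          false        // keep the request queued until a new enough controller is discovered
      }
  }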

info(s"Recorded new controller, from now on will use broker $controllerNode")
updateControllerAddress(controllerNode)
metadataUpdater.setNodes(Seq(controllerNode).asJava)
case Some(controllerNodeAndEpoch) =>
@andymg3 andymg3 (Contributor) commented Aug 10, 2022

Is this where/how eventually the LeaderAndIsr from the new controller gets applied?

@jolshan jolshan (Contributor) commented Aug 10, 2022

This patch fixes the problem by ensuring that AlterPartition is sent to a controller with an epoch which is at least as large as that of the controller which sent the LeaderAndIsr request. This ensures that the FENCED_LEADER_EPOCH error from the controller can be trusted.

So we prevent sending the request if the epoch is lower? And is it the case that there is always a controller with an epoch at least as large? Or in some cases would we need to wait/retry until such a controller exists?

@jsancio jsancio added the core (Kafka Broker), controller (Kafka Controller), and 3.3 labels Aug 10, 2022
@hachikuji (Author)
So we prevent sending the request if the epoch is lower? And is it the case that there is always a controller with an epoch at least as large? Or in some cases would we need to wait/retry until such a controller exists?

@jolshan Yes, that is right. Ensuring some level of monotonicity seems like a good general change even outside the original bug. It is weird to allow the broker to send requests to a controller that it knows for sure is stale, and it makes the system harder to reason about.

One thing I have been trying to think through is how this bug affects KRaft. The KRaft controller will also return FENCED_LEADER_EPOCH if the leader epoch is higher than what it has in its cache. But does KRaft give us a stronger guarantee to work with? I think it could, but at the moment we do not proactively reset the controller in BrokerToControllerChannelManager after we discover a new controller; we only reset it after we receive a NOT_CONTROLLER error in a request. So it seems to me that we could still hit the same problem with KRaft. As a matter of fact, I think this patch does not fix the KRaft problem because we do not propagate the controller epoch down to Partition so that it can be used in AlterPartition requests. cc @jsancio
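
(For context, the reactive reset described above amounts to roughly the following sketch; the names are placeholders, and only a NOT_CONTROLLER response causes the cached node to be forgotten.)

  object ControllerResetSketch {
    sealed trait ControllerResponse
    case object Ok extends ControllerResponse
    case object NotController extends ControllerResponse

    // The cached controller node is dropped only after a NOT_CONTROLLER response,
    // not proactively when a newer controller is discovered, which leaves the same
    // staleness window open.
    def handleResponse(resp: ControllerResponse, resetControllerNode: () => Unit): Unit =
      resp match {
        case NotController => resetControllerNode()
        case Ok            => () // nothing to reset
      }
  }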

@hachikuji (Author)

I am going to close this PR. On the one hand, it does not address the problem for KRaft; on the other, we have thought of a simpler fix for the zk controller, which I will open shortly.

@hachikuji hachikuji closed this Aug 11, 2022