New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
KAFKA-15036: Add a test case for controller failover #13777
Conversation
@showuon @KarboniteKream @hachikuji I think this bug is very serious since it will stop any leader failover from functioning normally, so please take a look as soon as possible. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Confirmed locally and on a deployed cluster that the error is no longer being reproduced
metadata/src/main/java/org/apache/kafka/controller/QuorumController.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the new test – testCreateClusterAndRestartControllerNode
checking for this issue?
metadata/src/main/java/org/apache/kafka/controller/QuorumController.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your comments @soarez @KarboniteKream , I have fixed them. @cmccabe do you have some suggestions for this change, you are familiar with the changes in QuorumController
and ControllerServer
.
metadata/src/main/java/org/apache/kafka/controller/QuorumController.java
Outdated
Show resolved
Hide resolved
@@ -96,6 +96,32 @@ class KRaftClusterTest { | |||
} | |||
} | |||
|
|||
@Test | |||
def testCreateClusterAndRestartControllerNode(): Unit = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@soarez Yes, we shutdown the active controller, then restart it to make it send ApiVersionRequest
to a standby controller.
this will be fixed in #13799 |
More detailed description of your change
We introduced a bug when updating
handleApiVersionRequest
but it's not detected by CI, the bug has been fixed and we should add a test case for controller failover.Summary of testing strategy (including rationale)
KRaftClusterTest.testCreateClusterAndRestartControllerNode()
Committer Checklist (excluded from commit message)