KAFKA-14498: reduce the startup nodes to avoid timeout error #13016
@dengziming, please take a look. Thanks.
- @ClusterTest(clusterType = Type.CO_KRAFT, brokers = 2, controllers = 3),
- @ClusterTest(clusterType = Type.KRAFT, brokers = 2, controllers = 3)
+ @ClusterTest(clusterType = Type.CO_KRAFT, brokers = 1, controllers = 1),
+ @ClusterTest(clusterType = Type.KRAFT, brokers = 1, controllers = 1),
The test name indicates it's testing replication, which doesn't happen if there's a single broker & controller, right?
Good point! Let me use 2 + 2 nodes instead
@ijuma, PR updated. I changed to 2 brokers + 2 controllers for
Hopefully this fixes the flaky tests.
LGTM, thanks.
Actually, do we support running KRaft with just 2 members in the controller quorum?
Yes, I don't see why we can't support that. With 2 members, the majority needed for a quorum is 2, so it just can't tolerate losing any node.
I guess you're saying we don't care about the ability to tolerate failures in this particular test. Fair enough.
Yes, you're right. Thanks for the comment.
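To make the quorum arithmetic in the exchange above concrete, here is a minimal sketch of the majority calculation for a voter set. `QuorumMath` and its methods are hypothetical names used only for illustration; they are not part of Kafka, and the sketch assumes the usual strict-majority rule (more than half of the voters).

```java
// Hypothetical helper, not part of Kafka: strict-majority arithmetic for a voter set.
final class QuorumMath {
    private QuorumMath() {}

    /** Smallest number of voters that forms a strict majority (assumed rule: more than half). */
    static int majority(int voters) {
        if (voters < 1) {
            throw new IllegalArgumentException("voters must be >= 1");
        }
        return voters / 2 + 1;
    }

    /** Number of voters that can be unavailable while a majority is still reachable. */
    static int toleratedFailures(int voters) {
        return voters - majority(voters);
    }

    public static void main(String[] args) {
        // Compare the controller counts discussed in this PR: 1, 2 and 3.
        for (int n = 1; n <= 3; n++) {
            System.out.printf("voters=%d majority=%d tolerated=%d%n",
                    n, majority(n), toleratedFailures(n));
        }
    }
}
```

Under this rule, 2 controllers need both members for a majority and tolerate no failures, the same as 1 controller, while 3 controllers tolerate one. That matches the point made above: fault tolerance is not what this test exercises.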
Failed tests are unrelated.
@ableegoldman, this helps make some tests reliable by decreasing the number of test nodes (low risk). Do you think this should be backported to 3.4?
@showuon yes, thanks, I'll backport it to 3.4
In MetadataQuorumCommandTest, we sometimes got the error:
java.util.concurrent.ExecutionException: java.lang.RuntimeException: Received a fatal error while waiting for the broker to catch up with the current cluster metadata.
This happens because we tried to bring up 3 brokers + 3 controllers at the same time, and the time allowed by the config initial.broker.registration.timeout.ms (default 1 min) is sometimes not enough for them to start up. The tests don't require that many nodes, so this change reduces the node count to make them reliable.
Reviewers: dengziming <dengziming1993@gmail.com>, Ismael Juma <ismael@juma.me.uk>