Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KAFKA-8967 Flaky test kafka.api.SaslSslAdminIntegrationTest.testCreat… #8137

Closed
wants to merge 1 commit into from

Conversation

chia7712
Copy link
Contributor

@chia7712 chia7712 commented Feb 19, 2020

The other brokers sync ACLs from zk notification so the sync may be slower than the Assert. The fix is to wait all brokers to sync the ACLs.

https://issues.apache.org/jira/browse/KAFKA-8967

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

@chia7712
Copy link
Contributor Author

org.apache.kafka.clients.admin.KafkaAdminClientTest.testDefaultApiTimeoutOverride
kafka.api.ConsumerBounceTest.testRollingBrokerRestartsWithSmallerMaxGroupSizeConfigDisruptsBigGroup

they are unrelated flaky

@chia7712
Copy link
Contributor Author

retest this please

2 similar comments
@chia7712
Copy link
Contributor Author

retest this please

@chia7712
Copy link
Contributor Author

retest this please

@chia7712
Copy link
Contributor Author

retest this please

@chia7712
Copy link
Contributor Author

looped SaslSslAdminIntegrationTest 50 times. pass

@mjsax could you take a look? thanks!

@ijuma
Copy link
Contributor

ijuma commented Mar 30, 2020

Not sure this approach is best since user applications don't do this.

@chia7712
Copy link
Contributor Author

Not sure this approach is best since user applications don't do this.

the another approach is to change the "target node" from "LeastLoadedNode" to "ControllerNode" for all methods of Admin so the later operations sent by same Admin benefit from the acl cache (in ControllerNode). However, the downside is a bunch of traffic will be on single node.

On the other side, the document of #createAcls does not include the acl asynchronization on zk. Personally, we should highlight this scenario for users. There is a good reference on Confluent website.

Note that ACLs are stored in ZooKeeper and they are propagated to the brokers asynchronously so there may be a delay before the change takes effect even after the command returns.

@mjsax mjsax added core tests Test fixes (including flaky tests) labels Mar 30, 2020
@mjsax
Copy link
Member

mjsax commented Mar 30, 2020

I am not familiar with the details of this test and won't be able to review. Maybe @rajinisivaram @cmccabe @hachikuji can help?

@@ -291,6 +291,9 @@ default CreateAclsResult createAcls(Collection<AclBinding> acls) {
* If you attempt to add an ACL that duplicates an existing ACL, no error will be raised, but
* no changes will be made.
* <p>
* Note that ACLs are stored in ZooKeeper and they are propagated to the brokers asynchronously so there may be a
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add document to highlight the behavior of syncing acls to all brokers.

@chia7712 chia7712 closed this Jul 25, 2020
@chia7712 chia7712 deleted the fix_8967 branch March 25, 2024 15:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core tests Test fixes (including flaky tests)
Projects
None yet
3 participants