
[Issue 11096][Tests] Attempt to fix flaky test ResourceGroupConfigListenerTest #11601

Conversation

lhotari
Member

@lhotari lhotari commented Aug 9, 2021

Fixes #11096

Motivation

ResourceGroupConfigListenerTest is very flaky. See #11096.

Modifications

Attempt to reduce flakiness by:

  • using unique resource group name for each test method
  • using unique tenant name for testResourceGroupAttachToNamespace test
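A minimal sketch of the per-test naming idea, assuming a hypothetical helper `UniqueNames.forTest` (it is not part of the Pulsar code base). The `prefix-method-suffix` format is also an assumption; the names actually used in this PR may differ.

```java
import java.util.UUID;

// Hypothetical helper, not part of the Pulsar test suite: derives a resource
// group or tenant name that is unique per test method and per run, so state
// left behind by one test method cannot collide with the next one.
final class UniqueNames {
    private UniqueNames() {}

    static String forTest(String prefix, String testMethodName) {
        // Short random suffix guards against reruns of the same method
        // against a shared metadata store.
        String suffix = UUID.randomUUID().toString().substring(0, 8);
        return prefix + "-" + testMethodName + "-" + suffix;
    }
}
```

In a TestNG test, such a helper could be called from a `@BeforeMethod` method that declares a `java.lang.reflect.Method` parameter (TestNG injects it), so each test method gets its own resource group and tenant name.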

@lhotari lhotari added this to the 2.9.0 milestone Aug 9, 2021
@lhotari lhotari self-assigned this Aug 9, 2021
@lhotari
Member Author

lhotari commented Aug 9, 2021

@bharanic-dev please review this PR

@lhotari lhotari force-pushed the lh-fix-flaky-ResourceGroupConfigListenerTest branch from f069201 to baf6ba8 on August 9, 2021 17:26
@bharanic-dev
Contributor

@lhotari thank you for making the changes. They look good to me in general and I think they are in the right direction. But I am not sure they account for the failure in the test case. The real issue is that, with the metadata-store implementation, there is a race condition that can cause ZooKeeper (ZK) watch events to be missed. I filed the following issue for it, which also explains the cases where the race condition can happen.

#11157

I discussed this with @merlimat. @merlimat has a PR to address the issue, but it is not yet merged.

#11198

Your changes will likely not fix the problem; they only change the timing so that the race condition is harder to trigger. I suggest we wait for the above PR to be merged before we merge your changes to the test case.
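To illustrate how a watch event can be missed at all, here is a toy model of one well-known pattern: reading a value and only afterwards registering a one-shot watch. This is an assumption for illustration only. `ToyStore`, `GetThenWatchDemo` and the `/rg/rg-1` path are hypothetical, and the specific race in the metadata-store implementation is the one described in #11157, not necessarily this one.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Toy model (not Pulsar or ZooKeeper code) of a store with one-shot watches:
// a watch fires once on the next write and must then be re-registered.
final class ToyStore {
    private final Map<String, String> data = new HashMap<>();
    private final Map<String, List<Consumer<String>>> watches = new HashMap<>();

    String get(String path) {
        return data.get(path);
    }

    void watch(String path, Consumer<String> onChange) {
        watches.computeIfAbsent(path, k -> new ArrayList<>()).add(onChange);
    }

    void put(String path, String value) {
        data.put(path, value);
        // One-shot semantics: remove the watchers before notifying them.
        List<Consumer<String>> fired = watches.remove(path);
        if (fired != null) {
            fired.forEach(w -> w.accept(path));
        }
    }
}

final class GetThenWatchDemo {
    static final String PATH = "/rg/rg-1"; // hypothetical path

    // Reads first, registers the watch afterwards. A write that lands in
    // between (simulated by concurrentWrite) notifies nobody, so the cached
    // value stays stale until some later write happens.
    static String getThenWatch(ToyStore store, Runnable concurrentWrite) {
        String[] cache = { store.get(PATH) };
        concurrentWrite.run(); // simulated race window
        store.watch(PATH, path -> cache[0] = store.get(path));
        return cache[0];
    }

    // Registers the watch first, then reads. A write in the same window either
    // fires the watch or is visible to the read, so it cannot be lost.
    // (Real code would also re-register the watch after it fires.)
    static String watchThenGet(ToyStore store, Runnable concurrentWrite) {
        String[] cache = new String[1];
        store.watch(PATH, path -> cache[0] = store.get(path));
        concurrentWrite.run(); // same window: the write now fires the watch
        cache[0] = store.get(PATH);
        return cache[0];
    }
}
```

With the same interleaving, `getThenWatch` returns `null` while `watchThenGet` returns the written value, which is the difference between a listener that misses the update and one that sees it.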

@lhotari
Member Author

lhotari commented Aug 9, 2021

@bharanic-dev Thanks for providing the details of the work that has been going on behind the scenes to fix the root cause.

@lhotari
Member Author

lhotari commented Aug 9, 2021

The changes don't remove the flakiness. The problem occurred in the most recent build:

Error:  Tests run: 24, Failures: 6, Errors: 0, Skipped: 18, Time elapsed: 32.989 s <<< FAILURE! - in org.apache.pulsar.broker.resourcegroup.ResourceGroupConfigListenerTest
Error:  testResourceGroupAttachToNamespace(org.apache.pulsar.broker.resourcegroup.ResourceGroupConfigListenerTest)  Time elapsed: 10.123 s  <<< FAILURE!
org.awaitility.core.ConditionTimeoutException: Assertion condition defined as a lambda expression in org.apache.pulsar.broker.resourcegroup.ResourceGroupConfigListenerTest expected object to not be null within 10 seconds.
	at org.awaitility.core.ConditionAwaiter.await(ConditionAwaiter.java:165)
	at org.awaitility.core.AssertionCondition.await(AssertionCondition.java:119)
	at org.awaitility.core.AssertionCondition.await(AssertionCondition.java:31)
	at org.awaitility.core.ConditionFactory.until(ConditionFactory.java:895)
	at org.awaitility.core.ConditionFactory.untilAsserted(ConditionFactory.java:679)
	at org.apache.pulsar.broker.resourcegroup.ResourceGroupConfigListenerTest.createResourceGroup(ResourceGroupConfigListenerTest.java:80)
	at org.apache.pulsar.broker.resourcegroup.ResourceGroupConfigListenerTest.testResourceGroupAttachToNamespace(ResourceGroupConfigListenerTest.java:135)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:132)
	at org.testng.internal.InvokeMethodRunnable.runOne(InvokeMethodRunnable.java:45)
	at org.testng.internal.InvokeMethodRunnable.call(InvokeMethodRunnable.java:73)
	at org.testng.internal.InvokeMethodRunnable.call(InvokeMethodRunnable.java:11)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.AssertionError: expected object to not be null
	at org.testng.Assert.fail(Assert.java:99)
	at org.testng.Assert.assertNotNull(Assert.java:942)
	at org.testng.Assert.assertNotNull(Assert.java:926)
	at org.apache.pulsar.broker.resourcegroup.ResourceGroupConfigListenerTest.lambda$createResourceGroup$0(ResourceGroupConfigListenerTest.java:83)
	at org.awaitility.core.AssertionCondition.lambda$new$0(AssertionCondition.java:53)
	at org.awaitility.core.ConditionAwaiter$ConditionPoller.call(ConditionAwaiter.java:222)
	at org.awaitility.core.ConditionAwaiter$ConditionPoller.call(ConditionAwaiter.java:209)
	... 4 more
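The failing assertion is an Awaitility `untilAsserted` check in `createResourceGroup` that waits up to 10 seconds for the resource group to become non-null. The plain-Java sketch below approximates those semantics to show why a missed watch event surfaces as a timeout: if the listener never learns about the new resource group, the probe keeps returning null until the deadline. It is not Awaitility's implementation, and `PollUntilNonNull` is a hypothetical name.

```java
import java.time.Duration;
import java.util.function.Supplier;

// Plain-Java approximation (an assumption, not Awaitility's code) of
// await().atMost(timeout).untilAsserted(() -> assertNotNull(probe.get())):
// poll until the probe returns non-null, otherwise fail after the timeout.
final class PollUntilNonNull {
    private PollUntilNonNull() {}

    static <T> T await(Supplier<T> probe, Duration timeout, Duration pollInterval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            T value = probe.get();
            if (value != null) {
                return value;
            }
            if (System.nanoTime() - deadline >= 0) {
                // Plays the role of the ConditionTimeoutException in the log.
                throw new IllegalStateException(
                        "expected object to not be null within " + timeout.toMillis() + " ms");
            }
            try {
                Thread.sleep(pollInterval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while polling", e);
            }
        }
    }
}
```

If the event is genuinely missed, raising the timeout only delays the same failure, which is consistent with the 10.123 s elapsed time reported for `testResourceGroupAttachToNamespace`.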

@merlimat
Contributor

@lhotari @bharanic-dev #11198 should be ready to go

@lhotari
Member Author

lhotari commented Aug 10, 2021

Closing this since #11198 should fix the flakiness issue.


Successfully merging this pull request may close these issues.

Flaky-test: ResourceGroupConfigListenerTest. testResourceGroupAttachToNamespace