Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Tests] Reduce flakiness of BacklogQuotaManagerTest #10572

Merged

Conversation

lhotari
Copy link
Member

@lhotari lhotari commented May 13, 2021

Motivation

BacklogQuotaManagerTest is flaky.

one example:

Error:  Tests run: 21, Failures: 1, Errors: 0, Skipped: 4, Time elapsed: 280.77 s <<< FAILURE! - in org.apache.pulsar.broker.service.BacklogQuotaManagerTest
Error:  testTriggerBacklogQuotaSizeWithReader(org.apache.pulsar.broker.service.BacklogQuotaManagerTest)  Time elapsed: 4.371 s  <<< FAILURE!
java.lang.AssertionError: expected [1] but found [0]
	at org.testng.Assert.fail(Assert.java:99)
	at org.testng.Assert.failNotEquals(Assert.java:1037)
	at org.testng.Assert.assertEqualsImpl(Assert.java:140)
	at org.testng.Assert.assertEquals(Assert.java:122)
	at org.testng.Assert.assertEquals(Assert.java:907)
	at org.testng.Assert.assertEquals(Assert.java:917)
	at org.apache.pulsar.broker.service.BacklogQuotaManagerTest.testTriggerBacklogQuotaSizeWithReader(BacklogQuotaManagerTest.java:255)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:132)
	at org.testng.internal.InvokeMethodRunnable.runOne(InvokeMethodRunnable.java:45)
	at org.testng.internal.InvokeMethodRunnable.call(InvokeMethodRunnable.java:73)
	at org.testng.internal.InvokeMethodRunnable.call(InvokeMethodRunnable.java:11)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

this happeneded in https://github.com/apache/pulsar/pull/10566/checks?check_run_id=2573662149#step:10:1928

Several test methods are flaky. Locally testConsumerBacklogEvictionWithAckTimeQuota was very flaky and that issue was also fixed.

Modifications

  • use Awaitility for checking PersistentTopicInternalStats in a few test methods
  • disable batching in all producers
  • cleanup test by using the TIME_TO_CHECK_BACKLOG_QUOTA constant
  • speed up test by reducing TIME_TO_CHECK_BACKLOG_QUOTA from 5 seconds to 3 seconds
  • fix race condition in testConsumerBacklogEvictionWithAckTimeQuota which was causing flakiness
    • resolve the race by pausing for 1 second after acknowledging 10 messages for the 1. consumer. This makes the ledgers to expire at different times so that the last ledger with 4 messages will expire later than others.

@lhotari lhotari added this to the 2.8.0 milestone May 13, 2021
@lhotari lhotari self-assigned this May 13, 2021
@lhotari lhotari requested a review from eolivelli May 13, 2021 10:12
Copy link
Contributor

@eolivelli eolivelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@eolivelli
Copy link
Contributor

it looks like that the test you wanted to fix failed on CI

https://github.com/apache/pulsar/pull/10572/checks?check_run_id=2575259667

image

- use Awaitility for checking PersistentTopicInternalStats
@lhotari lhotari force-pushed the lh-reduce-BacklogQuotaManagerTest-flakiness branch from f3aef92 to e7f14e2 Compare May 13, 2021 14:24
@lhotari
Copy link
Member Author

lhotari commented May 13, 2021

it looks like that the test you wanted to fix failed on CI

@eolivelli Thanks for pointing that out. I made some more changes in the PR to address flakiness. In particular, there was a challenge with the backlog size which seems to be off by one in some cases. I added some comments in the changes about it. PTAL

@eolivelli eolivelli merged commit 3831cd1 into apache:master May 13, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants