Skip to content

[fix][test] Fix flaky ResendRequestTest.testFailoverSingleAckedNormalTopic#25343

Merged
lhotari merged 2 commits intoapache:masterfrom
merlimat:fix/flaky-ResendRequestTest
Mar 18, 2026
Merged

[fix][test] Fix flaky ResendRequestTest.testFailoverSingleAckedNormalTopic#25343
lhotari merged 2 commits intoapache:masterfrom
merlimat:fix/flaky-ResendRequestTest

Conversation

@merlimat
Copy link
Copy Markdown
Contributor

@merlimat merlimat commented Mar 17, 2026

Flaky test failure

ResendRequestTest.testFailoverSingleAckedNormalTopic:281 expected [0] but found [10]

Summary

  • Fix flaky ResendRequestTest.testFailoverSingleAckedNormalTopic which fails because messages are produced before the failover dispatcher has settled the consumer assignment.
  • Added an Awaitility wait to ensure both consumers are registered and consumer-1 is confirmed as the active consumer before producing messages.

Documentation

  • doc-not-needed
    (Your PR doesn't need any doc update)

Matching PR in forked repository

No response

Tip

Add the labels ready-to-test and area/test to trigger the CI.

…Topic

The test assumes consumer-1 is the active failover consumer, but
messages are produced before the dispatcher has settled the consumer
assignment. Add an Awaitility wait for the dispatcher to have both
consumers registered and consumer-1 as the active consumer before
producing messages.
@github-actions github-actions bot added the doc-not-needed Your PR changes do not impact docs label Mar 17, 2026
@codecov-commenter
Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 72.78%. Comparing base (be87cd9) to head (a7a76ce).
⚠️ Report is 2 commits behind head on master.

Additional details and impacted files

Impacted file tree graph

@@              Coverage Diff              @@
##             master   #25343       +/-   ##
=============================================
+ Coverage     37.37%   72.78%   +35.40%     
- Complexity    13188    33882    +20694     
=============================================
  Files          1897     1954       +57     
  Lines        150557   154745     +4188     
  Branches      17215    17709      +494     
=============================================
+ Hits          56276   112635    +56359     
+ Misses        86612    33084    -53528     
- Partials       7669     9026     +1357     
Flag Coverage Δ
inttests 25.87% <ø> (+0.09%) ⬆️
systests 22.49% <ø> (+0.02%) ⬆️
unittests 73.76% <ø> (+39.62%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.
see 1416 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@lhotari lhotari merged commit 2aad1ce into apache:master Mar 18, 2026
101 of 105 checks passed
@merlimat merlimat deleted the fix/flaky-ResendRequestTest branch March 18, 2026 19:30
@lhotari
Copy link
Copy Markdown
Member

lhotari commented Mar 19, 2026

Remaining isssue:

  Error:  Tests run: 15, Failures: 1, Errors: 0, Skipped: 10, Time elapsed: 47.06 s <<< FAILURE! -- in org.apache.pulsar.broker.service.ResendRequestTest
  Error:  org.apache.pulsar.broker.service.ResendRequestTest.testFailoverSingleAckedNormalTopic -- Time elapsed: 10.05 s <<< FAILURE!
  org.awaitility.core.ConditionTimeoutException: Assertion condition expected [consumer-1] but found [consumer-2] within 10 seconds.
  	at org.awaitility.core.ConditionAwaiter.await(ConditionAwaiter.java:167)
  	at org.awaitility.core.AssertionCondition.await(AssertionCondition.java:119)
  	at org.awaitility.core.AssertionCondition.await(AssertionCondition.java:31)
  	at org.awaitility.core.ConditionFactory.until(ConditionFactory.java:985)
  	at org.awaitility.core.ConditionFactory.untilAsserted(ConditionFactory.java:769)
  	at org.apache.pulsar.broker.service.ResendRequestTest.testFailoverSingleAckedNormalTopic(ResendRequestTest.java:253)
  	at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
  	at java.base/java.lang.reflect.Method.invoke(Method.java:580)
  	at org.testng.internal.invokers.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:139)
  	at org.testng.internal.invokers.InvokeMethodRunnable.runOne(InvokeMethodRunnable.java:47)
  	at org.testng.internal.invokers.InvokeMethodRunnable.call(InvokeMethodRunnable.java:76)
  	at org.testng.internal.invokers.InvokeMethodRunnable.call(InvokeMethodRunnable.java:11)
  	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
  	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
  	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
  	at java.base/java.lang.Thread.run(Thread.java:1583)
  Caused by: java.lang.AssertionError: expected [consumer-1] but found [consumer-2]
  	at org.testng.Assert.fail(Assert.java:110)
  	at org.testng.Assert.failNotEquals(Assert.java:1577)
  	at org.testng.Assert.assertEqualsImpl(Assert.java:149)
  	at org.testng.Assert.assertEquals(Assert.java:131)
  	at org.testng.Assert.assertEquals(Assert.java:655)
  	at org.testng.Assert.assertEquals(Assert.java:665)
  	at org.apache.pulsar.broker.service.ResendRequestTest.lambda$testFailoverSingleAckedNormalTopic$0(ResendRequestTest.java:259)
  	at org.awaitility.core.AssertionCondition.lambda$new$0(AssertionCondition.java:53)
  	at org.awaitility.core.ConditionAwaiter$ConditionPoller.call(ConditionAwaiter.java:248)
  	at org.awaitility.core.ConditionAwaiter$ConditionPoller.call(ConditionAwaiter.java:235)
  	... 4 more

This happened after this PR was merged.
https://github.com/apache/pulsar/actions/runs/23275934973/job/67696607863?pr=25358#step:11:954

merlimat added a commit to merlimat/pulsar that referenced this pull request Mar 19, 2026
…Topic

Follow-up to apache#25343. The previous fix waited for consumer-1 to become
active before creating consumer-2, but the failover dispatcher
re-evaluates active consumer assignment when consumer-2 subscribes
(pickAndScheduleActiveConsumer), which can flip the active consumer.

Instead of assuming consumer-1 is always active, dynamically identify
which consumer is active after both are subscribed by querying the
dispatcher. Assign the active consumer to the "consumer1" variable
(used for receiving, acking, redelivery, close) and the standby to
"consumer2" (expected to receive 0 messages until consumer1 closes).

Verified 100/100 passes with invocationCount=100.
merlimat added a commit to merlimat/pulsar that referenced this pull request Mar 19, 2026
…Topic

Follow-up to apache#25343. The previous fix waited for consumer-1 to become
active before creating consumer-2, but the failover dispatcher
re-evaluates active consumer assignment when consumer-2 subscribes
(pickAndScheduleActiveConsumer), which can flip the active consumer.

Instead of assuming a specific consumer is active, use a
ConsumerEventListener with an AtomicReference to track the latest
active consumer. After both consumers are subscribed, read the
reference to identify which is active and which is standby, then
assign them to the test's consumer1 (active) and consumer2 (standby)
variables accordingly.

Verified 100/100 passes with invocationCount=100.
@lhotari lhotari added this to the 4.2.0 milestone Mar 24, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/test doc-not-needed Your PR changes do not impact docs ready-to-test

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants