[fix][test] Flaky SameAuthParamsLookupAutoClusterFailoverTest by lhotari · Pull Request #25566 · apache/pulsar

lhotari · 2026-04-22T11:16:32Z

Motivation

The SameAuthParamsLookupAutoClusterFailoverTest.testAutoClusterFailover test is flaky. A prior fix (#25388) addressed one root cause (stale Healthy state in findFailoverTo), but the test still times out intermittently at the "Test recover 2 --> 1" step, where Awaitility waits up to 60s for state to reach [Failed, Healthy, Healthy].

Root cause: with checkHealthyIntervalMs(300) and recoverThreshold(5), recovery of index 1 requires 5 consecutive successful probe cycles. Each cycle sequentially probes every index up to currentPulsarServiceIndex (so 3 probes per cycle once currentIndex=2), and each probe is bounded by a 3s deadline in SameAuthParamsLookupAutoClusterFailover.probeAvailable(). Under CI load, the accumulated wall time (cycles × probes per cycle × per-probe time) can approach or exceed the 60s Awaitility budget, especially if a transient probe failure of index 1 resets the PreRecover counter and forces another 5 cycles.
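The arithmetic behind that root cause can be sketched as a worst-case bound. This is a back-of-the-envelope illustration, not code from the Pulsar repository: the class and method names are hypothetical, and it assumes that every probe runs to its full 3s deadline, that the check interval is added once per cycle, and that each counter reset costs at most one full set of recoverThreshold cycles.

```java
// Hypothetical helper for illustration only; not part of Pulsar.
final class RecoveryBudget {

    /**
     * Upper bound, in milliseconds, on the time needed to accumulate
     * recoverThreshold successful cycles, assuming every probe takes its
     * full deadline and each reset discards up to recoverThreshold cycles.
     */
    static long worstCaseMs(int recoverThreshold, int probesPerCycle,
                            long probeDeadlineMs, long intervalMs, int resets) {
        // One cycle = all sequential probes plus the wait before the next cycle.
        long perCycleMs = probesPerCycle * probeDeadlineMs + intervalMs;
        // Each reset can force the counter to start over from zero.
        long cycles = (long) recoverThreshold * (resets + 1);
        return cycles * perCycleMs;
    }

    public static void main(String[] args) {
        long awaitBudgetMs = 60_000;
        // Values from the PR description: threshold 5, 3 probes, 3s deadline, 300ms interval.
        long noReset = worstCaseMs(5, 3, 3_000, 300, 0);
        long oneReset = worstCaseMs(5, 3, 3_000, 300, 1);
        System.out.println("no reset:  " + noReset + " ms, within budget: "
                + (noReset <= awaitBudgetMs));
        System.out.println("one reset: " + oneReset + " ms, within budget: "
                + (oneReset <= awaitBudgetMs));
    }
}
```

Under these assumptions a single uninterrupted recovery is bounded by 46.5s, which already approaches the 60s budget, and one reset pushes the bound to 93s.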

Modifications

Reduce checkHealthyIntervalMs from 300ms to 100ms in the test. This tightens the cycle cadence so that recovery completes sooner without weakening the threshold-based state machine: the full five-step PreRecover -> Healthy transition is still exercised.
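To illustrate why a shorter interval leaves the threshold logic untouched, here is a toy model of a consecutive-success counter. It is a simplified sketch, not Pulsar's implementation: the class, enum, and method names are hypothetical, only the Failed, PreRecover, and Healthy states named in this PR are modelled, and the counter is assumed to reset to zero on any failed cycle.

```java
// Toy model for illustration only; not Pulsar's actual state machine.
final class RecoverCounterSketch {
    enum State { FAILED, PRE_RECOVER, HEALTHY }

    private final int recoverThreshold;
    private int consecutiveSuccesses = 0;
    private State state = State.FAILED;

    RecoverCounterSketch(int recoverThreshold) {
        this.recoverThreshold = recoverThreshold;
    }

    /** Feeds the result of one probe cycle and returns the resulting state. */
    State onProbeCycle(boolean available) {
        if (state == State.HEALTHY) {
            return state; // the toy model stops once recovery is complete
        }
        if (!available) {
            consecutiveSuccesses = 0; // a failed cycle discards all progress
            state = State.FAILED;
        } else {
            consecutiveSuccesses++;
            state = consecutiveSuccesses >= recoverThreshold
                    ? State.HEALTHY : State.PRE_RECOVER;
        }
        return state;
    }

    /** Returns the 1-based cycle at which HEALTHY is reached, or -1 if never. */
    static int cyclesUntilHealthy(int recoverThreshold, boolean... probeResults) {
        RecoverCounterSketch sketch = new RecoverCounterSketch(recoverThreshold);
        for (int i = 0; i < probeResults.length; i++) {
            if (sketch.onProbeCycle(probeResults[i]) == State.HEALTHY) {
                return i + 1;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Five clean cycles versus a transient failure on the fifth cycle.
        System.out.println(cyclesUntilHealthy(5, true, true, true, true, true));
        System.out.println(cyclesUntilHealthy(5,
                true, true, true, true, false, true, true, true, true, true));
    }
}
```

The number of cycles needed depends only on the threshold and the probe results, not on the interval. In this model, if probes return quickly, five cycles spend about 0.5s waiting at a 100ms interval instead of about 1.5s at 300ms, and a reset costs correspondingly less.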

Verifying this change

Make sure that the change passes the CI checks.

This change is already covered by existing tests, such as SameAuthParamsLookupAutoClusterFailoverTest.testAutoClusterFailover. Verified locally with @Test(invocationCount = 5), giving 10 runs in total across both enabledTls values: 10 of 10 runs passed, each taking about 16.5s, whereas the flaky runs previously exceeded the 60s Awaitility budget.

Documentation

doc-not-needed

…LookupAutoClusterFailoverTest

Under CI load, the "Test recover 2 --> 1" step in testAutoClusterFailover could exceed its 60s Awaitility budget because each recovery cycle sequentially probes every index up to currentPulsarServiceIndex with a 3s per-probe deadline, and recoverThreshold=5 requires five successful cycles. Transient probe failures reset the PreRecover counter, compounding the delay. Tightening checkHealthyIntervalMs from 300ms to 100ms shortens the cycle cadence without weakening the threshold-based state machine: the full five-step PreRecover -> Healthy transition is still exercised.

(cherry picked from commit 1071cc9)

…#25566) (cherry picked from commit 1071cc9) (cherry picked from commit 68a2d04)

lhotari requested review from Technoboy-, dao-jun, merlimat and nodece April 22, 2026 11:16

lhotari added this to the 4.3.0 milestone Apr 22, 2026

lhotari added release/4.1.4 release/4.0.10 release/4.2.1 labels Apr 22, 2026

nodece approved these changes Apr 22, 2026

dao-jun approved these changes Apr 22, 2026

merlimat approved these changes Apr 22, 2026

merlimat merged commit 1071cc9 into apache:master Apr 22, 2026
80 of 82 checks passed

lhotari added a commit that referenced this pull request Apr 22, 2026

[fix][test] Flaky SameAuthParamsLookupAutoClusterFailoverTest (#25566)

8342786

(cherry picked from commit 1071cc9)

lhotari added a commit that referenced this pull request Apr 22, 2026

[fix][test] Flaky SameAuthParamsLookupAutoClusterFailoverTest (#25566)

68a2d04

(cherry picked from commit 1071cc9)

lhotari added a commit that referenced this pull request Apr 22, 2026

[fix][test] Flaky SameAuthParamsLookupAutoClusterFailoverTest (#25566)

145c466

(cherry picked from commit 1071cc9)

lhotari added cherry-picked/branch-4.0 cherry-picked/branch-4.1 cherry-picked/branch-4.2 labels Apr 22, 2026

srinath-ctds pushed a commit to datastax/pulsar that referenced this pull request Apr 23, 2026

[fix][test] Flaky SameAuthParamsLookupAutoClusterFailoverTest (apache…

b0a4328

…#25566) (cherry picked from commit 1071cc9) (cherry picked from commit 68a2d04)

merlimat merged 1 commit into apache:master from lhotari:lh-fix-flaky-SameAuthParamsLookupAutoClusterFailoverTest
