Skip to content

[fix][test] Fix flaky AdminApiOffloadTest.testOffload status race#25674

Merged
merlimat merged 1 commit into
apache:masterfrom
merlimat:mmerli/fix-flaky-admin-api-offload-test
May 5, 2026
Merged

[fix][test] Fix flaky AdminApiOffloadTest.testOffload status race#25674
merlimat merged 1 commit into
apache:masterfrom
merlimat:mmerli/fix-flaky-admin-api-offload-test

Conversation

@merlimat
Copy link
Copy Markdown
Contributor

@merlimat merlimat commented May 5, 2026

Motivation

AdminApiOffloadTest.testOffload (parameterized as testOffloadV1 and testOffloadV2) checks the offload status immediately after completing the offload promise exceptionally:

promise.completeExceptionally(new Exception("Some random failure"));

assertEquals(admin.topics().offloadStatus(topicName).getStatus(),
                    LongRunningProcessStatus.Status.ERROR);

The failure handler that updates the offload status to ERROR runs asynchronously after the promise completes. The synchronous assertion races with that handler and occasionally observes RUNNING, producing the flake expected [ERROR] but found [RUNNING].

Example failure: https://scans.gradle.com/s/nv4xn5dwytjvs/tests/task/:pulsar-broker:test/details/org.apache.pulsar.broker.admin.AdminApiOffloadTest/testOffloadV2/2/output

Modifications

Wrap the ERROR status assertion in Awaitility.untilAsserted(...) so the test waits for the failure handler to update the status, mirroring the success path immediately below it (which already used Awaitility).

Verifying this change

This change is already covered by existing tests:

  • AdminApiOffloadTest.testOffloadV1
  • AdminApiOffloadTest.testOffloadV2

Locally I ran both 3 times in a row with fresh Gradle daemons; each run took ~0.4–0.9s and all passed.

Does this pull request potentially affect one of the following parts:

  • Dependencies (add or upgrade a dependency)
  • The public API
  • The schema
  • The default values of configurations
  • The threading model
  • The binary protocol
  • The REST endpoints
  • The admin CLI options
  • The metrics
  • Anything that affects deployment

The post-failure status check ran synchronously right after
`promise.completeExceptionally(...)`, but the offload failure handler
runs asynchronously after the promise completes. This left a window
where `offloadStatus()` still reported `RUNNING` when the assertion
fired, producing the observed flake `expected [ERROR] but found [RUNNING]`.

Wrap the assertion in `Awaitility.untilAsserted(...)` to allow the
status to converge to ERROR, mirroring the success path below.
Copy link
Copy Markdown
Member

@lhotari lhotari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@merlimat merlimat merged commit f84bc38 into apache:master May 5, 2026
80 of 84 checks passed
poorbarcode pushed a commit to poorbarcode/pulsar that referenced this pull request May 6, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants