[fix][test] Fix flaky compaction disable test by controlling delete concurrency #25794

Open
void-ptr974 wants to merge 1 commit into apache:master from void-ptr974:fix-25775-compaction-flaky


@void-ptr974
Contributor

Fixes #25775

Motivation

CompactionConcurrencyTest.testDisableCompactionConcurrently is flaky because it assumes two __compaction subscription delete requests will overlap.

With the shared in-memory metadata store, the first delete can finish before the second reaches the broker. In that case both requests may complete successfully, so the test does not reliably verify the disablingCompaction guard.
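
The problematic ordering can be illustrated with a minimal stand-alone sketch. It is a hypothetical model, not the broker's code: the class `SequentialDeleteSketch` and its methods are invented for illustration, and only the flag name `disablingCompaction` comes from this PR. It shows that a guard of this kind rejects nothing when the second request starts after the first has already finished.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical model of a "one disable at a time" guard; not Pulsar's actual broker code. */
public class SequentialDeleteSketch {
    private final AtomicBoolean disablingCompaction = new AtomicBoolean(false);

    /** Returns true if the request got past the guard, false if the guard rejected it. */
    public boolean delete() {
        if (!disablingCompaction.compareAndSet(false, true)) {
            return false; // another disable is still in flight
        }
        try {
            // Assumption: no compaction is running, so there is nothing to wait for.
            return true;
        } finally {
            disablingCompaction.set(false); // guard is released before the next request arrives
        }
    }

    /** Issues two deletes back to back, the ordering that made the test flaky. */
    public static String runBackToBack() {
        SequentialDeleteSketch topic = new SequentialDeleteSketch();
        boolean first = topic.delete();
        boolean second = topic.delete(); // starts only after the first has finished
        return "first=" + first + ", second=" + second;
    }

    public static void main(String[] args) {
        System.out.println(runBackToBack()); // prints "first=true, second=true"
    }
}
```

In this model neither request is rejected, which matches the ordering described above: the guard is only observable while the first request is still inside the disable path.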

Modifications

Make the concurrency window deterministic.

The test now blocks currentCompaction before sending the first __compaction delete, waits until disablingCompaction=true, then sends the second delete and verifies it fails with PreconditionFailedException.

After that, it completes the blocked compaction future and verifies the first delete succeeds and disablingCompaction is reset.
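
The steps above can be sketched in isolation as follows. This is a simplified model under stated assumptions rather than the actual test: `DisableGuardSketch`, `setCurrentCompaction`, `deleteCompactionSubscription`, and `runScenario` are hypothetical names, an `IllegalStateException` stands in for the admin client's PreconditionFailedException, and a plain polling loop stands in for whatever wait utility the real test uses.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical stand-in for the topic's disable path; not Pulsar's actual code. */
public class DisableGuardSketch {
    private final AtomicBoolean disablingCompaction = new AtomicBoolean(false);
    private volatile CompletableFuture<Void> currentCompaction =
            CompletableFuture.completedFuture(null);

    public void setCurrentCompaction(CompletableFuture<Void> compaction) {
        this.currentCompaction = compaction;
    }

    public boolean isDisabling() {
        return disablingCompaction.get();
    }

    /** Fails fast if another disable is in flight; otherwise waits for the current compaction. */
    public CompletableFuture<Void> deleteCompactionSubscription() {
        if (!disablingCompaction.compareAndSet(false, true)) {
            CompletableFuture<Void> rejected = new CompletableFuture<>();
            rejected.completeExceptionally(
                    new IllegalStateException("compaction is already being disabled"));
            return rejected;
        }
        // The guard is reset before the returned future completes.
        return currentCompaction.whenComplete((v, e) -> disablingCompaction.set(false));
    }

    public static String runScenario() throws Exception {
        DisableGuardSketch topic = new DisableGuardSketch();

        // 1. Block the compaction so the first delete stays inside the disable path.
        CompletableFuture<Void> blockedCompaction = new CompletableFuture<>();
        topic.setCurrentCompaction(blockedCompaction);

        // 2. Start the first delete asynchronously and wait until the guard is set.
        CompletableFuture<Void> first = CompletableFuture
                .supplyAsync(topic::deleteCompactionSubscription)
                .thenCompose(f -> f);
        long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
        while (!topic.isDisabling()) {
            if (System.nanoTime() > deadline) {
                throw new IllegalStateException("guard was never set");
            }
            Thread.sleep(1);
        }

        // 3. The second delete now overlaps the first and must be rejected.
        boolean secondRejected = topic.deleteCompactionSubscription().isCompletedExceptionally();

        // 4. Unblock the compaction; the first delete finishes and the guard is reset.
        blockedCompaction.complete(null);
        first.get(5, TimeUnit.SECONDS);
        boolean guardReset = !topic.isDisabling();

        return "secondRejected=" + secondRejected + ", guardReset=" + guardReset;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runScenario()); // prints "secondRejected=true, guardReset=true"
    }
}
```

Because the second request is sent only after the guard is observed to be set, and the guard cannot be released until the blocked future is completed, the overlap no longer depends on thread scheduling.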

Verifying this change

  • Make sure that the change passes the CI checks.

This change updates an existing test and can be verified as follows:

./gradlew :pulsar-broker:test --tests org.apache.pulsar.broker.service.persistent.CompactionConcurrencyTest --no-daemon

Does this pull request potentially affect one of the following parts:

  • Dependencies (add or upgrade a dependency)
  • The public API
  • The schema
  • The default values of configurations
  • The threading model
  • The binary protocol
  • The REST endpoints
  • The admin CLI options
  • The metrics
  • Anything that affects deployment

CompactionConcurrencyTest.testDisableCompactionConcurrently verifies that only one __compaction subscription delete can run through the compaction disable path at a time. While one delete is waiting for the current compaction to finish and has set disablingCompaction=true, a second delete request should be rejected by the guard and returned to the admin client as PreconditionFailedException.

The test was flaky because it created two deleteSubscriptionAsync requests from separate threads and assumed they would overlap. With the shared in-memory metadata store, the first request can delete the subscription before the second request reaches the broker. In that ordering the second request no longer exercises the concurrent-disable path, and both requests can complete successfully, making the assertion unreliable.

Make the concurrency window deterministic by replacing the topic's currentCompaction future with an incomplete future before starting the first __compaction delete. The first delete is then held in the disable path after setting disablingCompaction=true. The test waits for that state, sends the second delete, and verifies that it fails with PulsarAdminException.PreconditionFailedException. It then completes the blocked compaction future and verifies that the first delete finishes successfully and disablingCompaction is reset.

The test now verifies the intended concurrent compaction-disable behavior without depending on thread scheduling or metadata-store timing.
Successfully merging this pull request may close these issues.

Flaky-test: CompactionConcurrencyTest.testDisableCompactionConcurrently
