[fix][test] Drop racy concurrent-delete assertion in CompactionConcurrencyTest#25827
[fix][test] Drop racy concurrent-delete assertion in CompactionConcurrencyTest#25827merlimat wants to merge 1 commit into
Conversation
…rencyTest
testDisableCompactionConcurrently fires two threads that both call
deleteSubscriptionAsync("__compaction") and asserts that at least one
fails. That assertion was flaky after the test was migrated to
SharedPulsarBaseTest in apache#25381.
The original test used MockZooKeeper.delay(...) to inject a delay into
the cursor delete operation, forcing the two calls to overlap inside
PersistentTopic#asyncDeleteCursorWithCleanCompactionLedger's synchronized
block, which made one of them fail with SubscriptionBusyException.
SharedPulsarBaseTest uses an in-memory metadata store with no such hook,
so the first delete can complete (and remove the subscription) before the
second thread even reaches the broker. The second call then finds
subscriptions.get("__compaction") == null and returns success as a no-op
— both succeed legally.
Drop the "at least one fails" assertion; the broker handles either
outcome correctly. Keep the deadlock check (both futures complete) and
add a verification that the compaction subscription ends up removed,
which is the actual invariant we care about.
Example failure:
https://scans.gradle.com/s/2odpjlsucvuly/tests/task/:pulsar-broker:test/details/org.apache.pulsar.broker.service.persistent.CompactionConcurrencyTest/testDisableCompactionConcurrently/2/output
void-ptr974
left a comment
There was a problem hiding this comment.
This makes sense to me. This is a simpler and more direct fix for the flaky assertion. Thanks for the update.
|
Closing as superseded by #25794, which was merged earlier and fixes the same flakiness in #25794 deterministically forces the concurrent-delete race by blocking the |
|
Thanks @merlimat, appreciate it! |
Motivation
CompactionConcurrencyTest.testDisableCompactionConcurrentlyfires two threads that both calldeleteSubscriptionAsync(\"__compaction\")on the same topic and asserts that at least one of them fails:That assertion has been flaky since the test was migrated from
ProducerConsumerBasetoSharedPulsarBaseTestin #25381.The original test relied on
MockZooKeeper.delay(...)to inject a delay into the cursor delete operation. This forced the two calls to overlap insidePersistentTopic#asyncDeleteCursorWithCleanCompactionLedger'ssynchronizedblock — the second one then hit thedisablingCompaction.compareAndSet(false, true) == falsebranch and failed withSubscriptionBusyException.SharedPulsarBaseTestuses an in-memory metadata store and there is no equivalent delay hook. The first delete can complete (and remove the subscription) before the second thread even reaches the broker. The second call then runs throughasyncDeleteCursorWithClearDelayedMessage, seessubscriptions.get(\"__compaction\") == null, and returns success as a no-op:Both futures complete successfully — perfectly valid behavior — and the assertion fails.
Example failure: https://scans.gradle.com/s/2odpjlsucvuly/tests/task/:pulsar-broker:test/details/org.apache.pulsar.broker.service.persistent.CompactionConcurrencyTest/testDisableCompactionConcurrently/2/output
Modifications
Drop the "at least one fails" assertion — the broker handles either outcome correctly (concurrent overlap →
SubscriptionBusyException; sequential → no-op).Keep the deadlock check (both futures complete in bounded time) and add a verification that the compaction subscription ends up removed, which is the actual invariant we care about.
Verifying this change
This change is already covered by
CompactionConcurrencyTest.testDisableCompactionConcurrently. Locally I ran 5 times with--rerun-tasks; all passed.Does this pull request potentially affect one of the following parts: