CAMEL-23437: Fix and re-enable flaky DisruptorReconfigureWithBlockingProducerTest #22989
The test had been disabled since 2019 due to multiple flakiness issues:

- Fragile wall-clock timing assertion (< 2000ms) with only 400ms margin over the theoretical 1600ms drain time, easily exceeded on slow CI.
- Race between producer and reconfiguration: the ConsumerEventHandler processes events asynchronously (calls processor.process with a no-op callback and returns immediately), so the ring buffer drains in microseconds. The producer could send all messages to the OLD disruptor before addRoutes() nullified the reference, causing mock:b to receive far fewer messages than expected.
- Non-volatile exception field across threads and tight 5-second timeouts.

Fix by using explicit CountDownLatch synchronization: the producer sends a first batch, signals, then waits for reconfiguration to complete before sending the second batch. This ensures the second batch deterministically goes through both consumers. Also use blockWhenFull=true (matching the test name), increase timeouts, and make the exception field volatile.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
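The two-phase handoff described above can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the code from the test: the class name, the latch names, and the in-memory log that stands in for the Camel routes are all hypothetical, and only the batch sizes (8 and 12) come from the PR description.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: a shared log stands in for the disruptor endpoints.
public class TwoPhaseHandoffSketch {

    static final int FIRST_BATCH = 8;   // batch sizes taken from the PR description
    static final int SECOND_BATCH = 12;

    public static List<String> run() {
        List<String> log = new CopyOnWriteArrayList<>();
        CountDownLatch firstBatchSent = new CountDownLatch(1);
        CountDownLatch reconfigured = new CountDownLatch(1);
        CountDownLatch producerDone = new CountDownLatch(1);

        Thread producer = new Thread(() -> {
            try {
                // Phase 1: send the first batch, then signal the main thread.
                for (int i = 0; i < FIRST_BATCH; i++) {
                    log.add("before:" + i);
                }
                firstBatchSent.countDown();

                // Phase 2: do not send anything until reconfiguration is done.
                if (!reconfigured.await(30, TimeUnit.SECONDS)) {
                    log.add("timeout");
                    return;
                }
                for (int i = 0; i < SECOND_BATCH; i++) {
                    log.add("after:" + i);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                // Always release the waiting thread, even on failure.
                producerDone.countDown();
            }
        });
        producer.start();

        try {
            if (!firstBatchSent.await(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("first batch was not sent in time");
            }
            // Stand-in for the reconfiguration step (addRoutes() in the real test).
            log.add("reconfigure");
            reconfigured.countDown();

            if (!producerDone.await(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("producer did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return log;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

Because each latch establishes a happens-before edge, every second-batch entry is ordered after the reconfiguration marker regardless of thread scheduling, which is the property the wall-clock assertion could not guarantee.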
🧪 CI tested the following changed modules:
All tested modules (8 modules)
apupier
left a comment
How many times have you launched it locally?
Were you able to reproduce the flakiness locally without the change?
gnodet
left a comment
Thanks for the review @apupier!
How many times have you launched it locally?
30 consecutive runs (10 + 20), all passed. The test completes consistently in under 1 second.
Were you able to reproduce the flakiness locally without the change?
Yes. With only @Disabled removed (keeping the original test logic), it failed 3 out of 5 runs: mock:b received only 6 messages instead of the expected minimum of 10, and the test timed out at 30 seconds. The root cause is that the ConsumerEventHandler processes events asynchronously (it calls processor.process() with a no-op callback and returns immediately), so the ring buffer drains in microseconds. The producer could send all 12 messages to the old disruptor before addRoutes() completed the reconfiguration.
Claude Code on behalf of Guillaume Nodet
CAMEL-23437
Summary
Fix and re-enable DisruptorReconfigureWithBlockingProducerTest, which has been @Disabled since 2019.

Root causes of flakiness
- Fragile wall-clock timing assertion (watch.taken() < 2000): only 400ms margin over the theoretical 1600ms drain time, easily exceeded on slow CI machines
- Race between producer and reconfiguration: ConsumerEventHandler processes events asynchronously (calls processor.process() with a no-op callback and returns immediately), so the ring buffer drains in microseconds. The producer could send all 12 messages to the OLD disruptor before addRoutes() nullified the reference, causing mock:b to receive far fewer messages than expected
- Non-volatile exception field across threads and tight 5-second timeouts

Changes
- Remove the @Disabled annotation to re-enable the test
- Use explicit CountDownLatch synchronization: producer sends a first batch of 8 messages, signals, then waits for reconfiguration to complete before sending the second batch of 12 messages. This ensures the second batch deterministically goes through both consumers
- Set blockWhenFull=true on the producer URI (matching the test's name)
- Remove the fragile StopWatch timing assertion
- Make the exception field volatile for proper cross-thread visibility
- Increase the MockEndpoint.assertIsSatisfied timeout
- Use finally for resultLatch

Verified stable with 30 consecutive passes locally and full disruptor test suite (102 tests).
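The volatile exception field and the finally handling of resultLatch listed under Changes follow a common pattern for reporting a worker thread's failure back to the test thread. The sketch below is an assumption about the general shape, not the code from the PR: the class and method names are hypothetical, and only the field names exception and resultLatch echo the description.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of capturing a worker failure for the test thread.
public class WorkerFailureCaptureSketch {

    // volatile: a write from the worker is visible to any thread that reads
    // the field, including on paths that did not wait on the latch.
    private volatile Exception exception;

    private final CountDownLatch resultLatch = new CountDownLatch(1);

    public Exception runAndCollect(Runnable work) {
        Thread worker = new Thread(() -> {
            try {
                work.run();
            } catch (Exception e) {
                exception = e;
            } finally {
                // Counting down in finally means the waiting thread is released
                // whether the work succeeded or threw.
                resultLatch.countDown();
            }
        });
        worker.start();

        try {
            if (!resultLatch.await(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("worker did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return exception;
    }

    public static void main(String[] args) {
        Exception failure = new WorkerFailureCaptureSketch()
                .runAndCollect(() -> { throw new IllegalStateException("boom"); });
        System.out.println("captured: " + failure);
    }
}
```

Without the finally block, a worker that throws would never count down, and the test would report a latch timeout instead of the underlying exception.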