CAMEL-23437: Fix and re-enable flaky DisruptorReconfigureWithBlockingProducerTest by gnodet · Pull Request #22989 · apache/camel

gnodet · 2026-05-06T08:24:41Z

CAMEL-23437

Summary

Fix and re-enable DisruptorReconfigureWithBlockingProducerTest, which has been @Disabled since 2019

Root causes of flakiness

Fragile wall-clock timing assertion (watch.taken() < 2000): only 400ms margin over the theoretical 1600ms drain time, easily exceeded on slow CI machines
Race between producer and reconfiguration: the ConsumerEventHandler processes events asynchronously (calls processor.process() with a no-op callback and returns immediately), so the ring buffer drains in microseconds. The producer could send all 12 messages to the OLD disruptor before addRoutes() nullified the reference, causing mock:b to receive far fewer messages than expected
Non-volatile exception field across threads and tight 5-second timeouts
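The second root cause can be illustrated without Camel. The following is a minimal sketch, assuming a hypothetical handler that schedules work on an executor and returns at once; `AsyncHandoffSketch`, `run`, and the latch-gated worker are illustrative names, not the actual `ConsumerEventHandler` code. It shows that all events can leave the buffer before any of them has been processed, which is why the producer could finish sending before the reconfiguration took effect.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class AsyncHandoffSketch {

    /** Returns {handedOff, completed} as observed right after the last event was handed off. */
    static int[] run(int events) {
        ExecutorService workers = Executors.newFixedThreadPool(2);
        CountDownLatch release = new CountDownLatch(1); // holds the "slow" processing back
        AtomicInteger handedOff = new AtomicInteger();
        AtomicInteger completed = new AtomicInteger();
        try {
            for (int i = 0; i < events; i++) {
                // Hypothetical handler: schedule the work and return at once,
                // without waiting for the processing to finish.
                workers.execute(() -> {
                    try {
                        release.await();
                        completed.incrementAndGet();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
                handedOff.incrementAndGet();
            }
            // Every event has left the "buffer", yet none has been processed.
            return new int[] { handedOff.get(), completed.get() };
        } finally {
            // Let the workers finish and shut the pool down cleanly.
            release.countDown();
            workers.shutdown();
            try {
                workers.awaitTermination(10, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    public static void main(String[] args) {
        int[] counts = run(12);
        System.out.println("handed off=" + counts[0] + ", completed=" + counts[1]);
    }
}
```

With 12 events the snapshot is 12 handed off and 0 completed, because the workers stay blocked on the latch until the snapshot has been taken.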

Changes

Remove @Disabled annotation to re-enable the test
Use explicit CountDownLatch synchronization: producer sends a first batch of 8 messages, signals, then waits for reconfiguration to complete before sending the second batch of 12 messages. This ensures the second batch deterministically goes through both consumers
Use blockWhenFull=true on the producer URI (matching the test's name)
Remove the fragile StopWatch timing assertion
Make exception field volatile for proper cross-thread visibility
Increase timeouts to generous values (10s/30s) and add explicit timeout to MockEndpoint.assertIsSatisfied
Add proper error handling with finally for resultLatch
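The synchronization described in the list above can be sketched in plain JDK code. This is an illustration under stated assumptions rather than the test from the PR: `TwoPhaseHandoff`, `awaitOrFail`, the `topology` reference, and the latch names other than `resultLatch` are hypothetical, and setting `topology` stands in for the real `addRoutes()` reconfiguration. The batch sizes (8 and 12) and the 10s/30s timeouts follow the description.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

public class TwoPhaseHandoff {

    // Written by the producer thread and read by the main thread, hence volatile.
    private static volatile Exception failure;

    // Hypothetical helper: wait with a generous timeout instead of asserting on elapsed time.
    private static void awaitOrFail(CountDownLatch latch, long seconds, String what) {
        try {
            if (!latch.await(seconds, TimeUnit.SECONDS)) {
                throw new IllegalStateException("Timed out waiting for " + what);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted waiting for " + what, e);
        }
    }

    /** Returns the topology label seen by each message, in send order. */
    static List<String> run() {
        failure = null;
        List<String> seen = new CopyOnWriteArrayList<>();
        AtomicReference<String> topology = new AtomicReference<>("old");
        CountDownLatch firstBatchSent = new CountDownLatch(1);
        CountDownLatch reconfigured = new CountDownLatch(1);
        CountDownLatch resultLatch = new CountDownLatch(1);

        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < 8; i++) {
                    seen.add(topology.get()); // first batch, before reconfiguration
                }
                firstBatchSent.countDown();
                awaitOrFail(reconfigured, 10, "reconfiguration");
                for (int i = 0; i < 12; i++) {
                    seen.add(topology.get()); // second batch, after reconfiguration
                }
            } catch (Exception e) {
                failure = e;
            } finally {
                resultLatch.countDown(); // always signal, even on failure
            }
        });
        producer.start();

        awaitOrFail(firstBatchSent, 10, "first batch");
        topology.set("new"); // stand-in for the reconfiguration step
        reconfigured.countDown();

        awaitOrFail(resultLatch, 30, "producer");
        if (failure != null) {
            throw new IllegalStateException(failure);
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Because the second batch starts only after `reconfigured` is counted down, all 12 of its messages observe the new topology regardless of machine speed, so no wall-clock assertion is needed.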

Verified stable with 30 consecutive passes locally and with the full disruptor test suite (102 tests).

The test was disabled since 2019 due to multiple flakiness issues:

- Fragile wall-clock timing assertion (< 2000ms) with only 400ms margin over the theoretical 1600ms drain time, easily exceeded on slow CI.
- Race between producer and reconfiguration: the ConsumerEventHandler processes events asynchronously (calls processor.process with a no-op callback and returns immediately), so the ring buffer drains in microseconds. The producer could send all messages to the OLD disruptor before addRoutes() nullified the reference, causing mock:b to receive far fewer messages than expected.
- Non-volatile exception field across threads and tight 5-second timeouts.

Fix by using explicit CountDownLatch synchronization: the producer sends a first batch, signals, then waits for reconfiguration to complete before sending the second batch. This ensures the second batch deterministically goes through both consumers. Also use blockWhenFull=true (matching the test name), increase timeouts, and make the exception field volatile.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

github-actions · 2026-05-06T08:25:36Z

🌟 Thank you for your contribution to the Apache Camel project! 🌟
🤖 CI automation will test this PR automatically.

🐫 Apache Camel Committers, please review the following items:

First-time contributors require MANUAL approval for the GitHub Actions to run
You can use the command /component-test (camel-)component-name1 (camel-)component-name2.. to request a test from the test bot although they are normally detected and executed by CI.
You can label PRs using skip-tests and test-dependents to fine-tune the checks executed by this PR.
Build and test logs are available in the summary page. Only Apache Camel committers have access to the summary.

⚠️ Be careful when sharing logs. Review their contents before sharing them publicly.

github-actions · 2026-05-06T08:43:18Z

🧪 CI tested the following changed modules:

components/camel-disruptor

All tested modules (8 modules)

Camel :: Disruptor
Camel :: JBang :: MCP
Camel :: JBang :: Plugin :: Route Parser
Camel :: JBang :: Plugin :: TUI
Camel :: JBang :: Plugin :: Validate
Camel :: Launcher :: Container
Camel :: YAML DSL :: Validator
Camel :: YAML DSL :: Validator Maven Plugin


apupier

How many times have you launched it locally?
Were you able to reproduce the flakiness locally without the change?

gnodet

Thanks for the review @apupier!

How many times have you launched it locally?

30 consecutive runs (10 + 20), all passed. The test completes consistently in under 1 second.

Were you able to reproduce the flakiness locally without the change?

Yes. With just removing @Disabled (keeping the original test logic), it failed 3 out of 5 runs: mock:b would receive only 6 messages instead of the expected minimum of 10, timing out at 30 seconds. The root cause is that the ConsumerEventHandler processes events asynchronously (calls processor.process() with a no-op callback and returns immediately), so the ring buffer drains in microseconds. The producer could send all 12 messages to the old disruptor before addRoutes() completed the reconfiguration.

Claude Code on behalf of Guillaume Nodet

github-actions Bot added the components label May 6, 2026

gnodet marked this pull request as ready for review May 6, 2026 08:46

gnodet requested review from davsclaus and oscerd May 6, 2026 08:46

apupier approved these changes May 6, 2026


gnodet commented May 6, 2026


davsclaus approved these changes May 6, 2026


gnodet changed the title ~~CAMEL-13629: Fix and re-enable flaky DisruptorReconfigureWithBlockingProducerTest~~ CAMEL-23437: Fix and re-enable flaky DisruptorReconfigureWithBlockingProducerTest May 6, 2026

gnodet merged commit 1af9a15 into apache:main May 6, 2026
6 checks passed

gnodet merged 1 commit into apache:main from gnodet:CAMEL-13629-fix-flaky-disruptor-reconfigure-test
