
CAMEL-23437: Fix and re-enable flaky DisruptorReconfigureWithBlockingProducerTest #22989

Merged
gnodet merged 1 commit into apache:main from gnodet:CAMEL-13629-fix-flaky-disruptor-reconfigure-test
May 6, 2026

Conversation

@gnodet
Contributor

@gnodet gnodet commented May 6, 2026

CAMEL-23437

Summary

  • Fix and re-enable DisruptorReconfigureWithBlockingProducerTest, which has been @Disabled since 2019

Root causes of flakiness

  1. Fragile wall-clock timing assertion (watch.taken() < 2000): only 400ms margin over the theoretical 1600ms drain time, easily exceeded on slow CI machines
  2. Race between producer and reconfiguration: the ConsumerEventHandler processes events asynchronously (calls processor.process() with a no-op callback and returns immediately), so the ring buffer drains in microseconds. The producer could send all 12 messages to the OLD disruptor before addRoutes() nullified the reference, causing mock:b to receive far fewer messages than expected
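  3. Non-volatile exception field shared across threads (a failure recorded on the producer thread was not guaranteed to be visible to the test thread), plus tight 5-second timeouts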
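
To illustrate the first root cause, here is a minimal sketch of the alternative to a wall-clock assertion: wait for a completion signal with a generous upper bound instead of asserting that elapsed time stays under a tight threshold. It uses only JDK classes, not Camel or Disruptor APIs, and the names `BoundedWaitSketch`, `completesWithin`, and `sleepQuietly` are hypothetical, not taken from the test in this PR.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of the PR: shows a bounded wait on a
// completion signal instead of an assertion on elapsed wall-clock time.
final class BoundedWaitSketch {

    /**
     * Runs the work on a separate thread and reports whether it finished
     * within the timeout. The timeout is only an upper bound: a slow machine
     * makes the call take longer, but does not make it fail.
     */
    public static boolean completesWithin(Runnable work, long timeout, TimeUnit unit)
            throws InterruptedException {
        CountDownLatch done = new CountDownLatch(1);
        Thread worker = new Thread(() -> {
            try {
                work.run();
            } finally {
                // Always signal, even if the work throws.
                done.countDown();
            }
        });
        worker.setDaemon(true); // do not keep the JVM alive after a timeout
        worker.start();
        return done.await(timeout, unit);
    }

    /** Sleeps for the given time, restoring the interrupt flag if interrupted. */
    public static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulated work standing in for draining the ring buffer.
        boolean ok = completesWithin(() -> sleepQuietly(50), 10, TimeUnit.SECONDS);
        System.out.println("completed: " + ok);
    }
}
```

The design point is that the timeout only guards against a hang; it says nothing about how fast the work must be, so variation in CI machine speed cannot turn a correct run into a failure.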

Changes

  • Remove @Disabled annotation to re-enable the test
  • Use explicit CountDownLatch synchronization: producer sends a first batch of 8 messages, signals, then waits for reconfiguration to complete before sending the second batch of 12 messages. This ensures the second batch deterministically goes through both consumers
  • Use blockWhenFull=true on the producer URI (matching the test's name)
  • Remove the fragile StopWatch timing assertion
  • Make exception field volatile for proper cross-thread visibility
  • Increase timeouts to generous values (10s/30s) and add explicit timeout to MockEndpoint.assertIsSatisfied
  • Add proper error handling with finally for resultLatch
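
The two-phase handoff described in the list above can be sketched roughly as follows. This is a toy model built only on JDK classes: the `activeConsumers` list stands in for the route topology, and every name here (`TwoPhaseHandoffSketch`, `send`, the latch names other than the idea of a result latch) is an assumption made for illustration, not code from the actual test.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Toy model of the handoff; no Camel or Disruptor APIs are used.
final class TwoPhaseHandoffSketch {

    // Hypothetical stand-in for the set of consumers attached to the endpoint.
    private static volatile List<String> activeConsumers = List.of("a");

    // volatile so a failure on the producer thread is visible to the caller.
    private static volatile Throwable producerFailure;

    public static List<String> run() throws InterruptedException {
        activeConsumers = List.of("a");
        producerFailure = null;

        List<String> delivered = new CopyOnWriteArrayList<>();
        CountDownLatch firstBatchSent = new CountDownLatch(1);
        CountDownLatch reconfigured = new CountDownLatch(1);
        CountDownLatch producerDone = new CountDownLatch(1);

        Thread producer = new Thread(() -> {
            try {
                send(delivered, 8);            // first batch, old topology
                firstBatchSent.countDown();    // signal the caller
                if (!reconfigured.await(10, TimeUnit.SECONDS)) {
                    throw new IllegalStateException("reconfiguration did not complete in time");
                }
                send(delivered, 12);           // second batch, new topology
            } catch (Throwable t) {
                producerFailure = t;
            } finally {
                producerDone.countDown();      // always release the caller
            }
        });
        producer.start();

        if (!firstBatchSent.await(10, TimeUnit.SECONDS)) {
            throw new AssertionError("first batch was not sent in time");
        }
        activeConsumers = List.of("a", "b");   // the "reconfiguration"
        reconfigured.countDown();

        if (!producerDone.await(30, TimeUnit.SECONDS)) {
            throw new AssertionError("producer did not finish in time");
        }
        if (producerFailure != null) {
            throw new AssertionError("producer failed", producerFailure);
        }
        return delivered;
    }

    // Delivers each message to every consumer that is currently active.
    private static void send(List<String> delivered, int count) {
        for (int i = 0; i < count; i++) {
            for (String consumer : activeConsumers) {
                delivered.add(consumer);
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> delivered = run();
        System.out.println("deliveries: " + delivered.size());
    }
}
```

In this model the second batch cannot start before the reconfiguration latch is released, so consumer "b" always sees exactly the 12 messages of the second batch and consumer "a" sees all 20, regardless of thread scheduling.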

Verified stable with 30 consecutive passes locally and full disruptor test suite (102 tests).

The test was disabled since 2019 due to multiple flakiness issues:

- Fragile wall-clock timing assertion (< 2000ms) with only 400ms margin
  over the theoretical 1600ms drain time, easily exceeded on slow CI.

- Race between producer and reconfiguration: the ConsumerEventHandler
  processes events asynchronously (calls processor.process with a no-op
  callback and returns immediately), so the ring buffer drains in
  microseconds. The producer could send all messages to the OLD
  disruptor before addRoutes() nullified the reference, causing mock:b
  to receive far fewer messages than expected.

- Non-volatile exception field across threads and tight 5-second
  timeouts.

Fix by using explicit CountDownLatch synchronization: the producer
sends a first batch, signals, then waits for reconfiguration to
complete before sending the second batch. This ensures the second
batch deterministically goes through both consumers. Also use
blockWhenFull=true (matching the test name), increase timeouts, and
make the exception field volatile.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions
Contributor

github-actions Bot commented May 6, 2026

🌟 Thank you for your contribution to the Apache Camel project! 🌟
🤖 CI automation will test this PR automatically.

🐫 Apache Camel Committers, please review the following items:

  • First-time contributors require MANUAL approval for the GitHub Actions to run
  • You can use the command /component-test (camel-)component-name1 (camel-)component-name2.. to request a test from the test bot although they are normally detected and executed by CI.
  • You can label PRs using skip-tests and test-dependents to fine-tune the checks executed by this PR.
  • Build and test logs are available in the summary page. Only Apache Camel committers have access to the summary.

⚠️ Be careful when sharing logs. Review their contents before sharing them publicly.

@github-actions
Contributor

github-actions Bot commented May 6, 2026

🧪 CI tested the following changed modules:

  • components/camel-disruptor
All tested modules (8 modules)
  • Camel :: Disruptor
  • Camel :: JBang :: MCP
  • Camel :: JBang :: Plugin :: Route Parser
  • Camel :: JBang :: Plugin :: TUI
  • Camel :: JBang :: Plugin :: Validate
  • Camel :: Launcher :: Container
  • Camel :: YAML DSL :: Validator
  • Camel :: YAML DSL :: Validator Maven Plugin


@gnodet gnodet marked this pull request as ready for review May 6, 2026 08:46
@gnodet gnodet requested review from davsclaus and oscerd May 6, 2026 08:46
Contributor

@apupier apupier left a comment

How many times have you launched it locally?
Were you able to reproduce the flakiness locally without the change?

Contributor Author

@gnodet gnodet left a comment

Thanks for the review @apupier!

How many times have you launched it locally?

30 consecutive runs (10 + 20), all passed. The test completes consistently in under 1 second.

Were you able to reproduce the flakiness locally without the change?

Yes. With only @Disabled removed (keeping the original test logic), it failed 3 out of 5 runs: mock:b received only 6 messages instead of the expected minimum of 10, and the test timed out at 30 seconds. The root cause is that the ConsumerEventHandler processes events asynchronously (it calls processor.process() with a no-op callback and returns immediately), so the ring buffer drains in microseconds. The producer could send all 12 messages to the old disruptor before addRoutes() completed the reconfiguration.

Claude Code on behalf of Guillaume Nodet

@gnodet gnodet changed the title from "CAMEL-13629: Fix and re-enable flaky DisruptorReconfigureWithBlockingProducerTest" to "CAMEL-23437: Fix and re-enable flaky DisruptorReconfigureWithBlockingProducerTest" May 6, 2026
@gnodet gnodet merged commit 1af9a15 into apache:main May 6, 2026
6 checks passed

3 participants