DistributedDoubleBarrier - Barrier bypass due to spurious wakeups or SyncConnected events
Present since Curator 4.1.0 (CURATOR-495).
The timed DistributedDoubleBarrier.enter(long, TimeUnit) can return true before the required number of participants have arrived. A single participant can pass through the barrier even when no other participant has entered. We discovered this while tracing flaky tests in an internal test suite.
The bug has two parts:
- internalEnter() wraps its body in do { ... } while (false), so the body executes exactly once. After wait() returns, the method exits without re-checking the participant count, even though Object.wait() is subject to spurious wakeups.
- The watcher fires on any ZooKeeper event, including SyncConnected session events. Since CURATOR-495 (4.1.0), the notification is dispatched asynchronously via runSafe(), so a session event can set hasBeenNotified to true while the caller is blocked in wait(). Because the actual member count is not re-checked after the wakeup, the participant proceeds as though the barrier were full.
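To make the two parts concrete, here is a minimal, self-contained sketch of the pattern. It is not the Curator source: the class BarrierWaitSketch and all of its members are hypothetical, an in-memory counter stands in for the ZooKeeper child count, and singlePassEnter() only approximates the "wait once, then trust the flag" behaviour described above. guardedEnter() shows one possible remedy (re-check the count in a loop with a deadline), offered as an assumption rather than as the actual fix.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical in-memory model of the wait logic; NOT the Curator implementation.
final class BarrierWaitSketch {
    private final int requiredMembers;
    private int memberCount;          // stands in for the ZooKeeper child count
    private boolean hasBeenNotified;  // set by ANY event, mirroring the report

    BarrierWaitSketch(int requiredMembers) {
        this.requiredMembers = requiredMembers;
    }

    // A session event (e.g. SyncConnected): wakes waiters, member count unchanged.
    synchronized void onSessionEvent() {
        hasBeenNotified = true;
        notifyAll();
    }

    // A real arrival: the member count changes and waiters are woken.
    synchronized void onMemberArrived() {
        memberCount++;
        hasBeenNotified = true;
        notifyAll();
    }

    // Flawed pattern: wait at most once, then trust the flag alone.
    // timeoutMs must be positive (wait(0) would block indefinitely).
    synchronized boolean singlePassEnter(long timeoutMs) throws InterruptedException {
        memberCount++;                 // the caller counts as one participant
        if (memberCount >= requiredMembers) {
            return true;
        }
        wait(timeoutMs);               // any notification (or a spurious wakeup) ends this
        return hasBeenNotified;        // member count is never re-checked
    }

    // Guarded pattern: loop until the count is satisfied or the deadline passes.
    synchronized boolean guardedEnter(long timeoutMs) throws InterruptedException {
        memberCount++;
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (memberCount < requiredMembers) {
            long remainingNanos = deadline - System.nanoTime();
            if (remainingNanos <= 0) {
                return false;          // timed out without enough participants
            }
            TimeUnit.NANOSECONDS.timedWait(this, remainingNanos);
        }
        return true;
    }

    // Test helper: run an event on a daemon thread after a delay.
    static void fireLater(Runnable event, long delayMs) {
        Thread t = new Thread(() -> {
            try {
                Thread.sleep(delayMs);
                event.run();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        t.setDaemon(true);
        t.start();
    }
}
```

With requiredMembers set to 2 and only a session event fired, singlePassEnter() in this model reports success for a lone participant, whereas guardedEnter() keeps waiting and times out; guardedEnter() succeeds once onMemberArrived() is called.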
The untimed enter() is not affected: it always takes the unconditional wait() path, which does not rely on hasBeenNotified.