NPE during replay #8830
A rough timeline of broker 0, partition 2, where the exception occurred:

- At 05:18:42
- At 05:18:54
- At 05:21:00
- At 05:21:18 Detected 'UNHEALTHY' components. The current health status of components:
- At 09:56:48
@romansmirnov and I investigated this further and we think that we have found the cause.
Thanks for the awesome report @oleschoenburg 🙇
8994: Fix notification of snapshot replication listeners about missed events r=oleschoenburg a=oleschoenburg

## Description

Whenever a raft partition transitions to a new role, we must reset `missedSnapshotReplicationEvents` so that registering new snapshot replication listeners does not trigger the listener for snapshot replication events that occurred in a different role. This guards against a known case where raft received a snapshot and transitioned to leader before the role change and snapshot replication listeners were registered. Once the registration of the snapshot replication listener went through, the listener was informed about the missed replication and caused the zeebe partition to transition to follower while the raft partition was leader.

Additionally, because we keep track of missed snapshot replication events to notify listeners on registration, the partition is not necessarily in the follower role at that point. This breaks the assumption of the listener, so we add a condition to only trigger listeners on registration if the raft partition is still a follower.

## Related issues

relates to #8830
closes #8978

Co-authored-by: Ole Schönburg <ole.schoenburg@gmail.com>
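The two guards described in the fix can be illustrated with a minimal sketch. Only the counter name `missedSnapshotReplicationEvents` comes from the description; the class, the `Role` enum, the listener interface, and all method names are hypothetical stand-ins, not the actual Zeebe or Atomix API. The sketch also assumes that an event counts as "missed" when no listener is registered at the time it fires.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the fix; not the real raft classes.
final class SnapshotReplicationNotifier {
  enum Role { FOLLOWER, CANDIDATE, LEADER }

  // Hypothetical listener interface, invented for this sketch.
  interface SnapshotReplicationListener {
    void onSnapshotReplicationCompleted(long term);
  }

  private final List<SnapshotReplicationListener> listeners = new ArrayList<>();
  private Role role = Role.FOLLOWER;
  private long term = 0;
  private int missedSnapshotReplicationEvents = 0;

  // Guard 1: a role transition discards events missed in the previous role.
  void onRoleChange(Role newRole, long newTerm) {
    role = newRole;
    term = newTerm;
    missedSnapshotReplicationEvents = 0;
  }

  // Assumption: an event is "missed" when nobody is registered yet.
  void onSnapshotReplicated() {
    if (listeners.isEmpty()) {
      missedSnapshotReplicationEvents++;
      return;
    }
    for (SnapshotReplicationListener listener : listeners) {
      listener.onSnapshotReplicationCompleted(term);
    }
  }

  // Guard 2: replay missed events on registration only while still follower.
  void addListener(SnapshotReplicationListener listener) {
    listeners.add(listener);
    if (missedSnapshotReplicationEvents > 0 && role == Role.FOLLOWER) {
      listener.onSnapshotReplicationCompleted(term);
    }
  }
}
```

In this model, a snapshot received as follower and followed by a transition to leader leaves nothing to replay, so a listener that registers late is not told about a replication that belongs to the earlier role.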
Describe the bug
I collected the data folders of all three brokers here: https://drive.google.com/drive/folders/1KZcyeFzxY3wy12ODMSRb-ovqptRuGzvQ
Unfortunately, I was a bit too late and was only able to collect the data after broker 0 partially recovered when it received a new snapshot.