Jonmv/zookeeper 4541 take 2 #2154
base: master
… down from Learner
zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/SendAckRequestProcessor.java
Will this fix be backported to the 3.8 train? We just hit this bug on one of our clusters. It's a shame that various fixes have been up for review for over 20 months; I would appreciate you guys pushing this over the finish line so this critical issue can be closed. Thanks!
One more thing, the resources (Threads/Processors...) created in |
This is a pretty small, targeted fix now. Is there anything controversial about it, or would it be possible to merge it and cut a release soon?
This PR fixes the shutdown errors that were added in #157, and also avoids a common NPE during ZK shutdown from a learner, when the leader shuts down (first commit).
Together with #2111 and #2152, this should cover all the fixes in #1925.
We've had the forked ZK from #1925 running embedded in hundreds, if not thousands, of ZK clusters, with rolling restarts most days, and we've had zero cases of inconsistent data since we patched. Before that, we saw one or a few cases per week.
(We still sometimes see ephemeral nodes remain after the leader is brutally taken down, i.e., with `Runtime.halt()`, but this looks different; it seems clearing out client sessions, and their ephemeral nodes, simply isn't done when death is too sudden.)
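
For readers unfamiliar with the kind of shutdown NPE mentioned in the description, here is a minimal sketch of the general pattern, not the actual change in this PR. It assumes a processor that holds a reference to the leader connection which another thread may null out during shutdown. The class `NullSafeAckSender` and its field and method names are hypothetical and are not ZooKeeper's API.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

/** Hypothetical stand-in for a learner-side processor; not a ZooKeeper class. */
final class NullSafeAckSender {
    // Assumption: shutdown() may clear this reference from another thread.
    private volatile OutputStream leaderStream;

    NullSafeAckSender(OutputStream leaderStream) {
        this.leaderStream = leaderStream;
    }

    /** Simulates the connection to the leader going away. */
    void shutdown() {
        leaderStream = null;
    }

    /**
     * Flushes pending data to the leader.
     * Returns false instead of throwing when the connection is already gone
     * or the flush fails.
     */
    boolean flush() {
        // Read the field once so a concurrent shutdown() cannot null it
        // between the check and the use.
        OutputStream out = leaderStream;
        if (out == null) {
            return false;
        }
        try {
            out.flush();
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        NullSafeAckSender sender = new NullSafeAckSender(new ByteArrayOutputStream());
        System.out.println(sender.flush());
        sender.shutdown();
        System.out.println(sender.flush());
    }
}
```

Copying the field into a local before the null check is the part that matters here: checking the field and then dereferencing it again would still race with a concurrent shutdown.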