Jonmv/zookeeper 4541 take 2 #2154

Open
wants to merge 6 commits into
base: master

@jonmv jonmv commented Apr 4, 2024

This PR fixes the shutdown errors that were added in #157, and also avoids a common NPE during ZK shutdown from a learner, when the leader shuts down (first commit).
Together with #2111 and #2152, this should cover all the fixes in #1925.

We've had the forked ZK from #1925 running embedded in hundreds, if not thousands, of ZK clusters, with rolling restarts most days, and we've had zero cases of inconsistent data since we patched, compared with one or a few cases per week before that.
(We still sometimes see ephemeral nodes remain after the leader is brutally taken down, i.e., with Runtime.halt(), but this looks different; it seems clearing out client sessions, and their ephemeral nodes, simply isn't done when death is too sudden.)

@AlphaCanisMajoris

LGTM.

> Together with #2111 and #2152, this should cover all the fixes in #1925.

Yes I believe so.

BTW, could you please recommit this PR? It wasn't built successfully.

@tsuna

tsuna commented Jun 13, 2024

Will this fix be backported to the 3.8 train? We just hit this bug on one of our clusters. It's a shame we've had various fixes up for review for over 20 months; I would appreciate you guys pushing this over the finish line so this critical issue can be closed. Thanks!

@changruill

One more thing: the resources (threads, processors, ...) created in startupWithServerState(State.INITIAL) won't be released in shutdown, because canShutdown does not include the condition state == State.INITIAL. This leak can occur at any point before ZooKeeperServer.state changes to State.RUNNING, which happens when the follower reads an UPTODATE packet.
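
To make the leak described above concrete, here is a minimal, hypothetical sketch in Java. It is not the actual ZooKeeperServer code. `SketchServer`, the `allowInitial` switch, and the `resourcesOpen` flag (standing in for threads and processors) are assumptions made for illustration, as are the `SHUTDOWN` and `ERROR` enum values. Only the names startupWithServerState, canShutdown, shutdown, State.INITIAL and State.RUNNING come from this discussion.

```java
// Simplified model of the guard described in the comment above.
// Hypothetical: this is NOT the real ZooKeeperServer implementation.
class ShutdownLeakSketch {

    // INITIAL and RUNNING are named in the thread; the other two are assumed.
    enum State { INITIAL, RUNNING, SHUTDOWN, ERROR }

    static class SketchServer {
        // Hypothetical switch so both guard variants can be compared.
        private final boolean allowInitial;
        private State state = State.INITIAL;
        // Stands in for the threads/processors created at startup.
        private boolean resourcesOpen = false;

        SketchServer(boolean allowInitial) {
            this.allowInitial = allowInitial;
        }

        // Allocates resources and records the requested state.
        void startupWithServerState(State newState) {
            resourcesOpen = true;
            state = newState;
        }

        // Guard deciding whether shutdown() is allowed to do any work.
        boolean canShutdown() {
            boolean runningOrError = state == State.RUNNING || state == State.ERROR;
            return allowInitial ? runningOrError || state == State.INITIAL : runningOrError;
        }

        // Releases resources only when the guard permits it.
        void shutdown() {
            if (!canShutdown()) {
                return; // early exit: resources created at startup stay open
            }
            resourcesOpen = false;
            state = State.SHUTDOWN;
        }

        boolean hasOpenResources() {
            return resourcesOpen;
        }
    }

    public static void main(String[] args) {
        SketchServer strict = new SketchServer(false);
        strict.startupWithServerState(State.INITIAL);
        strict.shutdown();
        System.out.println("guard without INITIAL, resources still open: " + strict.hasOpenResources());

        SketchServer relaxed = new SketchServer(true);
        relaxed.startupWithServerState(State.INITIAL);
        relaxed.shutdown();
        System.out.println("guard with INITIAL, resources still open: " + relaxed.hasOpenResources());
    }
}
```

In this sketch, a server that is shut down while still in State.INITIAL keeps its resources open unless the guard also accepts that state. Whether adding State.INITIAL to the real canShutdown is safe would need to be checked against the actual startup and shutdown paths.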

@tsuna

tsuna commented Jun 19, 2024

This is a pretty small, targeted fix now. Is there anything controversial about it, or would it be possible to merge it and cut a release soon?
The change needs to be rebased.

4 participants