ZOOKEEPER-4203: Leader swallows the ZooKeeperServer.State.ERROR from Leader.LearnerCnxAcceptor in some concurrency condition #1596
0c98d1d
eccda87
544fd98
0657332
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -189,9 +189,18 @@ public void dumpConf(PrintWriter pwriter) { | |
pwriter.print(self.getQuorumVerifier().toString()); | ||
} | ||
|
||
private final Object stateChangeMutex = new Object(); | ||
|
||
@Override | ||
protected void setState(State state) { | ||
this.state = state; | ||
synchronized (stateChangeMutex) { | ||
if (this.state == State.ERROR) { | ||
if (state == State.RUNNING || state == State.INITIAL) { | ||
return; | ||
We should add a relevant log in order to see if this case is happening. Creating a test that reproduces the problem and demonstrates that this fix actually resolves the problem would be better.

To add the test, do we need to consider using a fault injection tool? See ZOOKEEPER-3601 and #1135. I have provided a Byteman injection script in ZOOKEEPER-4203, so it can be somehow translated into a test.

100% agree with @eolivelli on adding a
For the test, I would suggest mocking and/or overriding some methods to explicitly generate the fault, if possible. I haven't tried doing so, but perhaps

Okay, I will add the log later.
Okay, I will give it a try.

Hi @functioner, I am very interested in this problem. I have written a test case, but I don't know if it can be used to test this problem. Maybe you can use it as a reference.

if you want to use
Using AtomicReference is better and easier to understand. |
||
} | ||
} | ||
this.state = state; | ||
} | ||
} | ||
|
||
@Override | ||
|
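For reference, the guard in the diff above (once the state is ERROR, ignore a later transition to RUNNING or INITIAL) can also be written with an AtomicReference, as suggested in the review. This is only a minimal sketch under stated assumptions: `StickyErrorState` and its nested `State` enum are hypothetical stand-ins, not the actual ZooKeeperServer classes, and the real field in the PR is a plain `state` field guarded by `stateChangeMutex`.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical holder used for illustration; not the actual ZooKeeperServer class. */
class StickyErrorState {
    /** Hypothetical stand-in for ZooKeeperServer.State, reduced to the values used here. */
    enum State { INITIAL, RUNNING, SHUTDOWN, ERROR }

    private final AtomicReference<State> state = new AtomicReference<>(State.INITIAL);

    /** Applies the transition unless it would overwrite ERROR with RUNNING or INITIAL. */
    void setState(State next) {
        // updateAndGet retries the function on contention, so check and write are atomic.
        state.updateAndGet(current ->
            current == State.ERROR && (next == State.RUNNING || next == State.INITIAL)
                ? current
                : next);
    }

    State getState() {
        return state.get();
    }

    public static void main(String[] args) {
        StickyErrorState s = new StickyErrorState();
        s.setState(State.ERROR);
        s.setState(State.RUNNING);          // ignored: ERROR is kept
        System.out.println(s.getState());   // prints ERROR
        s.setState(State.SHUTDOWN);         // still allowed
        System.out.println(s.getState());   // prints SHUTDOWN
    }
}
```

Compared with the mutex in the diff, this variant needs no lock object, but it only covers the state field itself; any extra work that must happen atomically with the transition would still need a lock.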
After a closer look I find that setState can be a single implementation in ZooKeeperServer. Just as is there. And you can apply this check as well as logging to verify as @eolivelli said.
I agree with @eolivelli that setState() has to be "careful," because ZooKeeperServerListenerImpl.notifyStopping calls it from "random" threads.

But @Tison's point still holds; it seems to me that this could be simplified by making setState() a synchronized method. Do you have a specific reason for using a separate object?

(The parent setState() does not need to be synchronized because the field is volatile.)
@ztzg The rationale for not making setState() a synchronized method is that if a thread is running another synchronized method of this ZooKeeperServer object for some time, it may block the setState() invoked by another thread, and this invocation may be critical.
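The trade-off described in this comment can be sketched as follows. The example is illustrative only: `DedicatedLockServer`, `longRunningWork`, and the helper method are hypothetical names, and the class is not the real ZooKeeperServer. It shows that a setState() guarded by a private lock object still completes while another thread holds the object's own monitor inside a long-running synchronized method.

```java
import java.util.concurrent.CountDownLatch;

/** Hypothetical server-like class; all names here are illustrative assumptions. */
class DedicatedLockServer {
    enum State { INITIAL, RUNNING, SHUTDOWN, ERROR }

    private final Object stateChangeMutex = new Object();
    private volatile State state = State.INITIAL;

    /** Simulates a long-running synchronized method holding the object's own monitor. */
    synchronized void longRunningWork(CountDownLatch started, CountDownLatch release)
            throws InterruptedException {
        started.countDown();
        release.await();
    }

    /** Guarded by a private lock, so it never waits for longRunningWork(). */
    void setState(State next) {
        synchronized (stateChangeMutex) {
            if (state == State.ERROR && (next == State.RUNNING || next == State.INITIAL)) {
                return;
            }
            state = next;
        }
    }

    State getState() {
        return state;
    }

    /** Returns true if setState() completed while another thread held the object's monitor. */
    static boolean setStateCompletesWhileMonitorHeld() {
        DedicatedLockServer server = new DedicatedLockServer();
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        Thread worker = new Thread(() -> {
            try {
                server.longRunningWork(started, release);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.start();
        try {
            started.await();               // the worker now holds the server's monitor
            server.setState(State.ERROR);  // would block here if setState() were synchronized
            boolean ok = server.getState() == State.ERROR;
            release.countDown();           // let the worker finish
            worker.join();
            return ok;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(setStateCompletesWhileMonitorHeld());
    }
}
```

If setState() were declared synchronized instead, the call in the helper would wait for the worker to leave longRunningWork(), which in this sketch never happens before the latch is released, so the critical ERROR transition would be delayed indefinitely.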
Actually, the single-node ZooKeeperServer can handle this issue with the handler:
zookeeper/zookeeper-server/src/main/java/org/apache/zookeeper/server/ZooKeeperServer.java
Lines 779 to 789 in 0c98d1d
because it is registered at the very beginning:
zookeeper/zookeeper-server/src/main/java/org/apache/zookeeper/server/ZooKeeperServerMain.java
Line 150 in 0c98d1d
This handling logic is:
zookeeper/zookeeper-server/src/main/java/org/apache/zookeeper/server/ZooKeeperServerShutdownHandler.java
Lines 42 to 46 in 0c98d1d
and then
zookeeper/zookeeper-server/src/main/java/org/apache/zookeeper/server/ZooKeeperServerMain.java
Lines 181 to 185 in 0c98d1d
However, in the case of QuorumZooKeeperServer, including Leader, Follower, etc., the aforementioned logic is not used.
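Since the linked code is not reproduced in this thread, here is a rough sketch of the kind of handler flow described above, assuming it follows the usual latch pattern: a handler registered up front counts down a latch when the server enters ERROR or SHUTDOWN, and the main thread waits on that latch before shutting down. All class and method names below are hypothetical and do not claim to match the ZooKeeper sources.

```java
import java.util.concurrent.CountDownLatch;

/** Illustrative sketch of a latch-based shutdown handler; names are assumptions. */
class ShutdownLatchSketch {
    enum State { INITIAL, RUNNING, SHUTDOWN, ERROR }

    /** Hypothetical handler: releases the latch when a terminal state is reported. */
    static class ShutdownHandler {
        private final CountDownLatch shutdownLatch;

        ShutdownHandler(CountDownLatch shutdownLatch) {
            this.shutdownLatch = shutdownLatch;
        }

        void handle(State state) {
            if (state == State.ERROR || state == State.SHUTDOWN) {
                shutdownLatch.countDown();
            }
        }
    }

    /** Hypothetical standalone server: every state change is forwarded to the handler. */
    static class Server {
        private volatile State state = State.INITIAL;
        private volatile ShutdownHandler handler;

        void registerShutdownHandler(ShutdownHandler handler) {
            this.handler = handler;
        }

        void setState(State next) {
            state = next;
            ShutdownHandler h = handler;
            if (h != null) {
                h.handle(next);
            }
        }
    }

    /** Returns true if RUNNING leaves the latch closed and ERROR releases it. */
    static boolean errorReleasesLatch() {
        CountDownLatch latch = new CountDownLatch(1);
        Server server = new Server();
        server.registerShutdownHandler(new ShutdownHandler(latch)); // registered first
        server.setState(State.RUNNING);
        boolean stillWaiting = latch.getCount() == 1;
        server.setState(State.ERROR);   // a main thread blocked in latch.await() would wake here
        return stillWaiting && latch.getCount() == 0;
    }

    public static void main(String[] args) {
        System.out.println(errorReleasesLatch());
    }
}
```

In this arrangement an ERROR state cannot be silently lost, because the latch stays released even if the state field is later overwritten. The quorum servers do not go through this path, which is why the PR guards the state field directly.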