ZOOKEEPER-2789: Reassign `ZXID` for solving 32bit overflow problem #262
Thinking about some abnormal situations, maybe 24 bit for |
Seems like all test cases passed, but some problems happened in [exec] /home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/build.xml:1298: The following error occurred while executing this line:
[exec] /home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/build.xml:1308: exec returned: 2
[exec]
[exec] Total time: 15 minutes 45 seconds
[exec] /bin/kill -9 16911
[exec] [exec] Zookeeper_operations::testAsyncWatcher1 : assertion : elapsed 1044
[exec] [exec] Zookeeper_operations::testAsyncGetOperation : elapsed 4 : OK
[exec] [exec] Zookeeper_operations::testOperationsAndDisconnectConcurrently1FAIL: zktest-mt
[exec] [exec] ==========================================
[exec] [exec] 1 of 2 tests failed
[exec] [exec] Please report to user@zookeeper.apache.org
[exec] [exec] ==========================================
[exec] [exec] make[1]: Leaving directory `/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/build/test/test-cppunit`
[exec] [exec] /bin/bash: line 5: 15116 Segmentation fault ZKROOT=/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/src/c/../.. CLASSPATH=$CLASSPATH:$CLOVER_HOME/lib/clover.jar ${dir}$tst
[exec] [exec] make[1]: *** [check-TESTS] Error 1
[exec] [exec] make: *** [check-am] Error 2
[exec]
[exec] Running contrib tests.
[exec] ======================================================================
[exec]
[exec] /home/jenkins/tools/ant/apache-ant-1.9.9/bin/ant -DZookeeperPatchProcess= -Dtest.junit.output.format=xml -Dtest.output=yes test-contrib
[exec] Buildfile: /home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/build.xml
[exec]
[exec] test-contrib:
[exec]
[exec] BUILD SUCCESSFUL
[exec] Total time: 0 seconds |
@@ -121,8 +121,8 @@ public JsonGenerator(LogIterator iter) {
} else if ((m = newElectionP.matcher(e.getEntry())).find()) {
Iterator<Integer> iterator = servers.iterator();
long zxid = Long.valueOf(m.group(2));
int count = (int)zxid;// & 0xFFFFFFFFL;
int epoch = (int)Long.rotateRight(zxid, 32);// >> 32;
long count = zxid & 0xffffffffffL;
nerdyyatrice
May 24, 2017
How can this be all over the code base instead of a function somewhere in a util file
asdf2014
May 24, 2017
Author
Member
Yeah, you are right!
int count = (int)zxid;// & 0xFFFFFFFFL;
int epoch = (int)Long.rotateRight(zxid, 32);// >> 32;
long count = zxid & 0xffffffffffL;
int epoch = (int)Long.rotateRight(zxid, 40);// >> 40;
nerdyyatrice
May 24, 2017
same, 40 shouldn't fly around in the code base like this
asdf2014
May 24, 2017
Author
Member
I have already unified all the code that processes `ZXID` to use `ZxidUtils`.
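The review point above (one helper instead of scattered masks and shifts) could look roughly like the sketch below. It assumes the 24-bit epoch / 40-bit counter split proposed in this PR; the class name `ZxidLayout` and its method names are hypothetical and are not ZooKeeper's actual helper.

```java
/** Hypothetical helper for illustration; not ZooKeeper's actual utility class. */
final class ZxidLayout {
    // Assumed layout from this PR: high 24 bits = epoch, low 40 bits = counter.
    static final int COUNTER_BITS = 40;
    static final long COUNTER_MASK = (1L << COUNTER_BITS) - 1; // 0xffffffffffL

    private ZxidLayout() {}

    /** Extracts the epoch with an unsigned shift, so no counter bits leak in. */
    static long epochOf(long zxid) {
        return zxid >>> COUNTER_BITS;
    }

    /** Extracts the counter by masking off the epoch bits. */
    static long counterOf(long zxid) {
        return zxid & COUNTER_MASK;
    }

    /** Packs an epoch and a counter back into a single 64-bit zxid. */
    static long makeZxid(long epoch, long counter) {
        return (epoch << COUNTER_BITS) | (counter & COUNTER_MASK);
    }

    public static void main(String[] args) {
        long zxid = makeZxid(7L, 123456789012L);
        System.out.println(epochOf(zxid) + " " + counterOf(zxid));
    }
}
```

One detail worth noting: `(int) Long.rotateRight(zxid, 40)` keeps the low 32 bits after the rotation, which hold the 24 epoch bits plus the lowest 8 counter bits, so a plain unsigned shift is used here instead.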
Due to this JVM bug, JDK7 cannot recognize [javac] /home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/src/contrib/loggraph/src/java/org/apache/zookeeper/graph/JsonGenerator.java:129: error: cannot find symbol
[javac] long epoch = getEpochFromZxid(zxid); |
1. Can changing the zxid from high32_low32 to high24_low40 avoid the concurrency problem you referred to? |
Hi, @maoling. Thanks for the discussion. My description may have been unclear, which probably caused the confusion.
|
Hi, @asdf2014. Thanks for your explanation! But I am still confused about question one:
It all depends on the zxid not being altered (no write after the zxid is first generated) in a multithreaded situation; otherwise the epoch and count derived from it are not idempotent. Should zxid be declared final? |
Hi, I think a 48-bit low part is better for high-throughput ZooKeeper clusters. |
Hi, @yunfan123. Thank you for your suggestion. As you said, that would guarantee a smooth upgrade. However, if the 16-bit |
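One way to picture the concern above is to read the zxid exactly once and derive both parts from that single snapshot. The sketch below is only an illustration under assumptions: the class `ZxidSnapshotDemo`, its field, and the 40-bit counter width are hypothetical, and `AtomicLong` is swapped in here to get atomic 64-bit reads and writes; it is not what the PR itself does.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical illustration; names and layout are assumptions. */
final class ZxidSnapshotDemo {
    static final int COUNTER_BITS = 40; // assumed 24/40 split
    static final long COUNTER_MASK = (1L << COUNTER_BITS) - 1;

    // AtomicLong reads and writes are atomic for the full 64 bits.
    private final AtomicLong lastZxid = new AtomicLong();

    void advanceTo(long zxid) {
        lastZxid.set(zxid);
    }

    /** Returns {epoch, counter}, both derived from one atomic read. */
    long[] epochAndCounter() {
        final long snapshot = lastZxid.get(); // single read, then pure arithmetic
        return new long[] { snapshot >>> COUNTER_BITS, snapshot & COUNTER_MASK };
    }

    public static void main(String[] args) {
        ZxidSnapshotDemo demo = new ZxidSnapshotDemo();
        demo.advanceTo((3L << COUNTER_BITS) | 42L);
        long[] parts = demo.epochAndCounter();
        System.out.println("epoch=" + parts[0] + " counter=" + parts[1]);
    }
}
```

Because both values come from the same local `snapshot`, a concurrent writer cannot make the epoch and counter disagree with each other.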
Hi, @asdf2014 |
@yunfan123 @asdf2014 I have seen this issue twice over a one-month period. Is there anything one can do to prevent this from happening? Maybe allowing leader restarts at "off-peak hours" weekly (yuck, I know)? It sounds like we can move forward with this if we move to 48 bits low, correct? Note version: |
@JarenGlover It's a good idea, but not the best solution. Still we can use the |
Are you seeing this behavior with ZOOKEEPER-1277 applied? If so, it's a bug in that change, because after that's applied the leader should shut down as we approach the rollover. It would be nice to address this by changing the zxid semantics, but I don't believe that's a great idea. Instead I would rather see us address any shortcoming in my original fix (1277). fwiw, what I have seen people do in this situation is to monitor the zxid, and when it gets close to the rollover (say within 10%) they have an automated script which restarts the leader, which forces a re-election. However, 1277 should be doing this for you. Given you are seeing this issue, perhaps you can help with resolving any bugs in 1277? Thanks! |
Hi, @phunt. Thank you for your comment. Yes, we are discussing this here because the ZOOKEEPER-1277 solution does not work very well. It causes too many leader restarts, and a restart can even take a few minutes, which some situations cannot tolerate. |
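The monitoring approach described above (restart the leader once the zxid is within about 10% of the rollover) could be checked with something like the following sketch. It assumes the existing 32-bit epoch / 32-bit counter layout; the class `RolloverMonitor` and the method `nearRollover` are hypothetical, and how the current zxid is obtained and how the restart is triggered are left out.

```java
/** Hypothetical monitoring check; not part of ZooKeeper. */
final class RolloverMonitor {
    // Existing layout: the low 32 bits of the zxid are the counter.
    static final int COUNTER_BITS = 32;
    static final long COUNTER_MAX = (1L << COUNTER_BITS) - 1;

    private RolloverMonitor() {}

    /**
     * Returns true when the remaining counter space is at most
     * `threshold` (for example 0.10) of the total counter range.
     */
    static boolean nearRollover(long zxid, double threshold) {
        long counter = zxid & COUNTER_MAX;
        long remaining = COUNTER_MAX - counter;
        return remaining <= (long) (COUNTER_MAX * threshold);
    }

    public static void main(String[] args) {
        long zxid = (5L << COUNTER_BITS) | 0xF0000000L; // epoch 5, counter almost full
        if (nearRollover(zxid, 0.10)) {
            System.out.println("zxid counter is close to rollover; schedule a leader restart");
        }
    }
}
```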
Ok, thanks for the update. fwiw restarting taking a few minutes is going to be an issue regardless, no? Any regular type issue, such as a temporary network outage, could cause the quorum to be lost and leader election triggered. |
Hi, @phunt . Indeed, the |
i think it would be much better to extend ZOOKEEPER-1277 to more transparently do the rollover without a full leader election. the main issue i have with shortening the epoch size is that once the epoch hits the maximum value the ensemble is stuck, nothing can proceed, so we really need to keep the epoch size big enough that we would never hit that condition. i don't think a 16-bit epoch satisfies that requirement. |
Hi, @breed. Thanks for your comment. You are right, we should keep the epoch large enough to avoid epoch overflow. That is why I proposed a 24-bit epoch as a better solution in my second comment. Even if a leader election happened once every hour, we would not hit epoch overflow for about 1915.2 years. |
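For reference, the 1915.2-year figure follows from dividing the number of 24-bit epochs by the number of elections per year, assuming one election every hour of a 365-day year:

$$\frac{2^{24}}{24 \times 365} = \frac{16\,777\,216}{8760} \approx 1915.2 \text{ years}$$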
@asdf2014
|
Hi, @maoling. Thank you for your comments. As you said, if we cannot carry the server version, it will be too difficult to maintain backward compatibility. The revision in Etcd exists to implement the MVCC feature, which seems to be equivalent to the ZooKeeper counter, not the entire ZXID. If we consider that design, then maybe we should use more than 64 bits, convert ZXID from |
If it is 1k/s ops, then the `ZXID` counter will be exhausted after only $2^{32} / (86400 \times 1000) \approx 49.7$ days. But if we reassign the `ZXID` into 16 bits for `epoch` and 48 bits for `counter`, then the problem will not occur until after $\min(2^{16} / 365,\ 2^{48} / (86400 \times 1000 \times 365)) \approx \min(179.6, 8925.5) = 179.6$ years (the epoch term assumes one leader election per day).

However, the `ZXID` is a `long`, and in the JVM a read or write of a `long` (the same goes for `double`) may be split into separate operations on the high 32 bits and the low 32 bits. Because the `ZXID` variable is not declared `volatile` and is not boxed into the corresponding reference type (`Long`/`Double`), these accesses are non-atomic. Thus, if the lower 16 bits of the upper 32 bits are merged with the low 32 bits of the `long` to form a 48-bit low part, there may be a concurrency problem.
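The lifetime estimates in the description can be reproduced with a few lines of arithmetic. The sketch below is only a calculator under the same assumptions as the text (1k ops/s, one leader election per day for the 16-bit case, 365-day years); the class `ZxidLifetime` and its method names are hypothetical.

```java
/** Hypothetical calculator for the estimates quoted in this thread. */
final class ZxidLifetime {
    static final double SECONDS_PER_DAY = 86_400.0;
    static final double DAYS_PER_YEAR = 365.0;

    private ZxidLifetime() {}

    /** Days until a counter of the given width is exhausted at a steady write rate. */
    static double counterDays(int counterBits, double opsPerSecond) {
        return Math.pow(2, counterBits) / (opsPerSecond * SECONDS_PER_DAY);
    }

    /** Years until an epoch of the given width is exhausted at a steady election rate. */
    static double epochYears(int epochBits, double electionsPerDay) {
        return Math.pow(2, epochBits) / (electionsPerDay * DAYS_PER_YEAR);
    }

    public static void main(String[] args) {
        System.out.printf("32-bit counter: %.1f days%n", counterDays(32, 1000));
        System.out.printf("48-bit counter: %.1f years%n", counterDays(48, 1000) / DAYS_PER_YEAR);
        System.out.printf("16-bit epoch, 1 election/day: %.1f years%n", epochYears(16, 1));
        System.out.printf("24-bit epoch, 24 elections/day: %.1f years%n", epochYears(24, 24));
    }
}
```

The usable lifetime of a layout is the smaller of its counter and epoch figures, which is why the 16/48 split is bounded by the epoch at roughly 179.6 years.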