Fix ZkHelixPropertyStore loses Zookeeper notification issue#924
jiajunwang merged 10 commits into apache:master
ZkHelixPropertyStore loses ZK notifications after the ZK session expires. The issue was caused by a bug in the shared ZkClient code path: the shared ZkClient did not call fireAllEvent when the ZK session expired. As a result, ZkHelixPropertyStore did not reinstall watches on the corresponding ZK paths, and it lost ZooKeeper notifications when changes happened.
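To make the failure mode concrete, here is a minimal, self-contained model of it. ZooKeeper watches are one-shot and belong to a session, so watches registered under an expired session are gone; a client has to prompt its listeners to re-read their paths (which re-installs the watches) once a new session is established. Everything below (`WatchRefireSketch`, `refireAllListeners`, the in-memory watch set) is hypothetical and is not Helix code; `refireAllListeners` only stands in for the role the description assigns to `fireAllEvent`.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Consumer;

// Hypothetical model of a ZooKeeper client wrapper; not the Helix ZkClient.
public class WatchRefireSketch {
  // Paths that currently have a one-shot watch registered on the server side.
  private final Set<String> watchedPaths = new HashSet<>();
  // One listener callback per subscribed path (kept simple for the sketch).
  private final Map<String, Consumer<String>> listeners = new HashMap<>();
  private final boolean refireOnNewSession;

  public WatchRefireSketch(boolean refireOnNewSession) {
    this.refireOnNewSession = refireOnNewSession;
  }

  // Reads a path and leaves a watch on it, like a read with watch=true.
  public void readAndWatch(String path) {
    watchedPaths.add(path);
  }

  public void subscribe(String path, Consumer<String> listener) {
    listeners.put(path, listener);
    readAndWatch(path);
  }

  // Session expiry: every watch that belonged to the old session is discarded.
  public void expireSession() {
    watchedPaths.clear();
  }

  // Called once a replacement session has been established.
  public void onNewSession() {
    if (refireOnNewSession) {
      refireAllListeners();
    }
    // Without the refire, no listener is told about the new session,
    // so the watches lost with the old session are never re-installed.
  }

  // Stand-in for the role of fireAllEvent: notify every listener so it
  // re-reads its path and thereby re-installs its watch.
  private void refireAllListeners() {
    listeners.forEach((path, listener) -> listener.accept(path));
  }

  public boolean isWatched(String path) {
    return watchedPaths.contains(path);
  }

  public static void main(String[] args) {
    for (boolean refire : new boolean[] {false, true}) {
      WatchRefireSketch client = new WatchRefireSketch(refire);
      // The listener re-reads (and re-watches) the path whenever it is notified.
      client.subscribe("/store/node", client::readAndWatch);
      client.expireSession();
      client.onNewSession();
      System.out.println("refire=" + refire + " watched=" + client.isWatched("/store/node"));
    }
  }
}
```

Running `main` prints `refire=false watched=false` followed by `refire=true watched=true`: only the variant that re-notifies its listeners after the new session keeps receiving changes.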
jiajunwang left a comment:
I see that the main issue is the additional public method for fetching the client. I don't like that either.
An alternative: since the property store is guaranteed to be given a shared ZkClient, you can call SharedZkClientFactory with the same parameters, and it will return another shared ZkClient instance. Because both instances use the same connection manager, expiring the newly created client will also expire the one inside the HelixPropertyStore once.
Can you give it a try? This way, no new method is required.
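The suggestion above can be modeled with a small sketch, assuming (as the comment states) that the factory hands out distinct client instances that share one connection manager per set of parameters. The names (`SharedConnectionSketch`, `Factory.build`, `ClientView.expireSession`) are hypothetical and are not the real `SharedZkClientFactory` API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of "many client instances, one shared connection"; not Helix code.
public class SharedConnectionSketch {

  // One physical connection/session shared by every client view built for it.
  static final class SharedConnection {
    private long sessionId = 1;
    private final List<ClientView> views = new ArrayList<>();

    void expire() {
      sessionId++; // the replacement session gets a new id
      for (ClientView view : views) {
        view.expirations++; // every view observes the same expiry
      }
    }
  }

  // What a caller holds: a thin view over the shared connection.
  static final class ClientView {
    private final SharedConnection connection;
    int expirations = 0;

    ClientView(SharedConnection connection) {
      this.connection = connection;
      connection.views.add(this);
    }

    long sessionId() {
      return connection.sessionId;
    }

    // Test-only hook: expiring through this view expires the shared connection.
    void expireSession() {
      connection.expire();
    }
  }

  // Same parameters -> same underlying connection, but a new view instance.
  static final class Factory {
    private final Map<String, SharedConnection> byAddress = new HashMap<>();

    ClientView build(String zkAddress) {
      SharedConnection connection =
          byAddress.computeIfAbsent(zkAddress, address -> new SharedConnection());
      return new ClientView(connection);
    }
  }

  public static void main(String[] args) {
    Factory factory = new Factory();
    ClientView storeClient = factory.build("localhost:2181"); // held by the property store
    ClientView testClient = factory.build("localhost:2181");  // obtained by the test
    testClient.expireSession();
    System.out.println(storeClient != testClient);                          // true
    System.out.println(storeClient.expirations);                            // 1
    System.out.println(storeClient.sessionId() == testClient.sessionId());  // true
  }
}
```

The design point is that the test never needs a handle on the store's own client: any instance backed by the same connection is enough to trigger the expiry.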
How do we get the ZkConnectionManager from the new SharedZkClient? It is not public. If we make zkConnectionManager protected, we would need to access it from within the zookeeper-api module, which means moving (rewriting) this test case there. But the test case uses code from HelixPropertyStore, which lives in helix-core. That would create a circular dependency.
Why do you need to get the ZkConnectionManager? The object you operate on is the second shared ZkClient that you get from the factory.
As discussed offline, ZkTestHelper.expireSession() only works with a ZkConnectionManager, not with a shared ZkClient. However, we can add an expireSharedZkClientSession() method to ZkTestHelper.
1/ Move the test from TestZkHelixPropertyStore to TestZkCacheAsyncOpSingleThread, so that it can use TestHelper.VerifyWithTimeout().
2/ Add an expireSession feature for SharedZkClient to ZkTestHelper, and remove all the public accessors that exposed the ZkClient for test purposes.
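A timeout-based check matters here because ZooKeeper delivers watch notifications asynchronously on the client's event thread, so a test cannot assert on them immediately after a write or a session expiry. A minimal polling helper might look like the sketch below; it assumes nothing about the actual signature of `TestHelper.VerifyWithTimeout()`, and `verifyWithTimeout` and its parameters are hypothetical.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

// Hypothetical polling helper; not the Helix TestHelper API.
public class VerifyWithTimeoutSketch {

  // Polls `condition` every `pollMs` until it holds (true) or `timeoutMs` elapses (false).
  static boolean verifyWithTimeout(BooleanSupplier condition, long timeoutMs, long pollMs) {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (!condition.getAsBoolean()) {
      if (System.currentTimeMillis() >= deadline) {
        return false;
      }
      try {
        Thread.sleep(pollMs);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    AtomicBoolean notified = new AtomicBoolean(false);
    // Simulates a watcher callback that fires asynchronously after about 100 ms.
    Thread watcher = new Thread(() -> {
      try {
        Thread.sleep(100);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return;
      }
      notified.set(true);
    });
    watcher.start();
    System.out.println(verifyWithTimeout(notified::get, 5000, 10)); // true
    System.out.println(verifyWithTimeout(() -> false, 50, 10));     // false
  }
}
```

Polling with a generous timeout keeps the test robust on slow machines while still failing deterministically when the notification never arrives.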
Use the more concise logic in ZkClient.
huizhilu left a comment:
@kaisun2000 A few places still need to be changed, but their threads were resolved without any changes. Can you take another look? Thanks.
@kaisun2000, please follow the merge steps to complete the merge process if you think this PR is good enough. @pkuwm, please take another look to make sure this looks good to you.
This PR is ready to be merged, approved by @jiajunwang.
Final message:
@kaisun2000 Just a reminder: please run the tests in the appropriate modules before merging the PR. Thanks.
I ran another round of mvn test with the same results: 4 flaky tests, each of which succeeds when run independently. This was also the case before this fix and the added test.
ZkHelixPropertyStore loses ZK notifications after the ZK session expires. The issue was caused by a bug in the shared ZkClient code path: the shared ZkClient did not call fireAllEvent when the ZK session expired. As a result, ZkHelixPropertyStore did not reinstall watches on the corresponding ZK paths, and it lost ZooKeeper notifications when changes happened.
Co-authored-by: Kai Sun <ksun@ksun-mn1.linkedin.biz>
Issues
Fixes #921
Description
Here are some details about my PR, including screenshots of any UI changes:
ZkHelixPropertyStore loses ZK notifications after the ZK session expires. The issue was caused by a bug in the shared ZkClient code path: the shared ZkClient did not call fireAllEvent when the ZK session expired. As a result, ZkHelixPropertyStore did not reinstall watches on the corresponding ZK paths, and it lost ZooKeeper notifications when changes happened.
Tests
testSessionExpirationWithSharedZkClient
ksun-mn1:helix-core ksun$ mvn test
Failed tests:
TestZkConnectionLost.testLostZkConnection » ThreadTimeout Method org.testng.in...
TestJobFailureDependence.testWorkflowFailureJobThreshold » ThreadTimeout Metho...
TestHelixAdminCli.testInstanceOperations:469 » ZkClient Failed to delete /Test...
Tests run: 1093, Failures: 3, Errors: 0, Skipped: 4
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 01:37 h
[INFO] Finished at: 2020-04-06T18:55:19-07:00
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.18.1:test (default-test) on project helix-core: There are test failures.
[ERROR]
[ERROR] Please refer to /Users/ksun/dev_branch_helix/helix/helix-core/target/surefire-reports for the individual test results.
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
Each of these test cases succeeds when run individually.
Commits
Code Quality
(helix-style-intellij.xml if IntelliJ IDE is used)