HDFS-16684. Exclude the current JournalNode #4786
The JournalNodeSyncer will include the local instance in syncing when using a bind host (e.g. 0.0.0.0). There is a mechanism that is supposed to exclude the local instance, but it doesn't recognize the meta-address as a local address.
Running with bind addresses set to 0.0.0.0, the JournalNodeSyncer will log attempts to sync with itself as part of the normal syncing rotation. For an HA configuration running 3 JournalNodes, the "other" list used by the JournalNodeSyncer will include 3 proxies.
Exclude bound local addresses, including the use of a wildcard address in the bound host configurations, while still allowing multiple instances on the same host.
Allow sync attempts with unresolved addresses, so that sync attempts can drive resolution as servers become available.
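For readers who want a concrete picture of the exclusion rule described above, here is a minimal sketch using only `java.net`. It is not the code from this PR: the class `LocalJournalNodeFilter` and its methods are hypothetical names chosen for illustration, and the sketch assumes that "same instance" can be approximated as "same port, on an address this host owns".

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not the JournalNodeSyncer code from this PR.
final class LocalJournalNodeFilter {

  // True if the IP is the wildcard, a loopback address, or bound to a local interface.
  static boolean ownedByThisHost(InetAddress ip) {
    if (ip.isAnyLocalAddress() || ip.isLoopbackAddress()) {
      return true;
    }
    try {
      return NetworkInterface.getByInetAddress(ip) != null;
    } catch (SocketException e) {
      return false; // interfaces could not be inspected: treat the address as remote
    }
  }

  // True if 'candidate' refers to the instance bound at 'bound'.
  static boolean isLocalInstance(InetSocketAddress candidate, InetSocketAddress bound) {
    // Unresolved addresses cannot be proven local, so they are kept;
    // a later sync attempt can retry resolution.
    if (candidate.isUnresolved() || bound.isUnresolved()) {
      return false;
    }
    // A different port is a different instance, even on the same host.
    if (candidate.getPort() != bound.getPort()) {
      return false;
    }
    InetAddress boundIp = bound.getAddress();
    if (boundIp.isAnyLocalAddress()) {
      // Wildcard bind (0.0.0.0 or ::): every address this host owns reaches us.
      return ownedByThisHost(candidate.getAddress());
    }
    return boundIp.equals(candidate.getAddress());
  }

  // Returns the configured addresses minus the local instance.
  static List<InetSocketAddress> others(List<InetSocketAddress> configured,
                                        InetSocketAddress bound) {
    List<InetSocketAddress> result = new ArrayList<>();
    for (InetSocketAddress addr : configured) {
      if (!isLocalInstance(addr, bound)) {
        result.add(addr);
      }
    }
    return result;
  }
}
```

With a bound address of `0.0.0.0:8485`, a configured entry `127.0.0.1:8485` is dropped, `127.0.0.1:8486` is kept as a second instance on the same host, and an entry created with `InetSocketAddress.createUnresolved` is kept so that later sync attempts can resolve it.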
💔 -1 overall
This message was automatically generated.
I've run all of the failed test classes locally without an issue. I've opened a ticket specifically for this problem.
Looking at failed tests...
Run 1: hadoop.hdfs.server.datanode.TestDataNodeRollingUpgrade, hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA
Run 2: hadoop.hdfs.server.datanode.TestDataNodeRollingUpgrade, hadoop.hdfs.server.namenode.TestNameNodeMXBean
Run 3: hadoop.hdfs.server.datanode.TestDataNodeRollingUpgrade, hadoop.hdfs.server.datanode.TestBPOfferService
The one common failure is hadoop.hdfs.server.datanode.TestDataNodeRollingUpgrade. Let me try another run...
I've been looking at all of these classes as part of HDFS-16740. All of these classes will run without issue when executed individually or serially. Each uses
In this last test run, TestDataNodeRollingUpgrade passes: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4786/6/testReport/org.apache.hadoop.hdfs.server.datanode/TestDataNodeRollingUpgrade/ I'm classing these test failures as flaky. Hopefully the likes of https://issues.apache.org/jira/browse/HDFS-16740 will help. In the meantime, let me merge this PR and close out the JIRA.
Backport from trunk.