HDFS-16684. Exclude the current JournalNode #4786

Merged
merged 1 commit into from Aug 28, 2022

snmvaughan
Contributor

Backport from trunk. The JournalNodeSyncer will include the local instance in syncing when using a bind host (e.g. 0.0.0.0). There is a mechanism that is supposed to exclude the local instance, but it doesn't recognize the meta-address as a local address.

Running with bind addresses set to 0.0.0.0, the JournalNodeSyncer will log attempts to sync with itself as part of the normal syncing rotation. For an HA configuration running 3 JournalNodes, the "other" list used by the JournalNodeSyncer will include 3 proxies.

Exclude bound local addresses, including the use of a wildcard address in the bound host configurations, while still allowing multiple instances on the same host.

Allow sync attempts with unresolved addresses, so that sync attempts can drive resolution as servers become available.
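The exclusion rule described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the patch: the class `LocalJournalNodeFilter` and its methods `isLocalAddress`, `isSelf`, and `others` are hypothetical names, and only standard `java.net` calls are used. A peer is treated as the current instance when it is resolved, uses the same port as the bound address, and (for a wildcard bind) maps to any address on this machine. Unresolved peers are kept so later sync attempts can retry resolution.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not the class used by the actual patch. */
final class LocalJournalNodeFilter {
  private LocalJournalNodeFilter() { }

  /** True if addr is a wildcard, loopback, or an address of a local network interface. */
  static boolean isLocalAddress(InetAddress addr) {
    if (addr.isAnyLocalAddress() || addr.isLoopbackAddress()) {
      return true;
    }
    try {
      return NetworkInterface.getByInetAddress(addr) != null;
    } catch (SocketException e) {
      return false; // cannot inspect interfaces; assume the address is remote
    }
  }

  /** True if candidate refers to the instance bound at the given address. */
  static boolean isSelf(InetSocketAddress bound, InetSocketAddress candidate) {
    if (candidate.isUnresolved()) {
      return false; // keep unresolved peers so sync attempts can drive resolution
    }
    if (bound.getPort() != candidate.getPort()) {
      return false; // a different port means another instance on the same host
    }
    InetAddress boundAddr = bound.getAddress();
    if (boundAddr == null) {
      return false; // bound address itself is unresolved; nothing to compare
    }
    if (boundAddr.isAnyLocalAddress()) {
      // Wildcard bind (e.g. 0.0.0.0): any local address on this port is this instance.
      return isLocalAddress(candidate.getAddress());
    }
    return boundAddr.equals(candidate.getAddress());
  }

  /** Returns the configured peers with the current instance removed. */
  static List<InetSocketAddress> others(InetSocketAddress bound,
                                        List<InetSocketAddress> configured) {
    List<InetSocketAddress> result = new ArrayList<>();
    for (InetSocketAddress addr : configured) {
      if (!isSelf(bound, addr)) {
        result.add(addr);
      }
    }
    return result;
  }
}
```

Comparing ports as well as addresses is what lets several JournalNodes share one host: with a bind address of 0.0.0.0 and an illustrative port of 8485, a peer at 127.0.0.1:8485 would be dropped while a peer at 127.0.0.1:8486 would stay in the list.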

  • Does the title of this PR start with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?
  • Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE, LICENSE-binary, NOTICE-binary files?

@hadoop-yetus

💔 -1 overall

| Vote | Subsystem | Runtime | Logfile | Comment |
|------|-----------|---------|---------|---------|
| +0 🆗 | reexec | 10m 35s | | Docker mode activated. |
| | | | | _ Prechecks _ |
| +1 💚 | dupname | 0m 0s | | No case conflicting files found. |
| +0 🆗 | codespell | 0m 1s | | codespell was not available. |
| +0 🆗 | detsecrets | 0m 1s | | detect-secrets was not available. |
| +1 💚 | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 💚 | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. |
| | | | | _ branch-3.3 Compile Tests _ |
| +1 💚 | mvninstall | 39m 46s | | branch-3.3 passed |
| +1 💚 | compile | 1m 31s | | branch-3.3 passed |
| +1 💚 | checkstyle | 1m 4s | | branch-3.3 passed |
| +1 💚 | mvnsite | 1m 38s | | branch-3.3 passed |
| +1 💚 | javadoc | 1m 48s | | branch-3.3 passed |
| +1 💚 | spotbugs | 3m 40s | | branch-3.3 passed |
| +1 💚 | shadedclient | 28m 51s | | branch has no errors when building and testing our client artifacts. |
| | | | | _ Patch Compile Tests _ |
| +1 💚 | mvninstall | 1m 25s | | the patch passed |
| +1 💚 | compile | 1m 17s | | the patch passed |
| +1 💚 | javac | 1m 17s | | the patch passed |
| +1 💚 | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 💚 | checkstyle | 0m 48s | | the patch passed |
| +1 💚 | mvnsite | 1m 24s | | the patch passed |
| +1 💚 | javadoc | 1m 25s | | the patch passed |
| +1 💚 | spotbugs | 3m 26s | | the patch passed |
| +1 💚 | shadedclient | 28m 8s | | patch has no errors when building and testing our client artifacts. |
| | | | | _ Other Tests _ |
| -1 ❌ | unit | 219m 30s | /patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt | hadoop-hdfs in the patch passed. |
| +1 💚 | asflicense | 1m 0s | | The patch does not generate ASF License warnings. |
| | | 344m 34s | | |

| Reason | Tests |
|--------|-------|
| Failed junit tests | hadoop.hdfs.server.datanode.TestDataNodeRollingUpgrade |
| | hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA |
| | hadoop.hdfs.server.datanode.TestBPOfferService |

| Subsystem | Report/Notes |
|-----------|--------------|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4786/1/artifact/out/Dockerfile |
| GITHUB PR | #4786 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
| uname | Linux 10685164b950 4.15.0-191-generic #202-Ubuntu SMP Thu Aug 4 01:49:29 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | branch-3.3 / f332c1e |
| Default Java | Private Build-1.8.0_342-8u342-b07-0ubuntu1~18.04-b07 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4786/1/testReport/ |
| Max. process+thread count | 2180 (vs. ulimit of 5500) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4786/1/console |
| versions | git=2.17.1 maven=3.6.0 spotbugs=4.2.2 |
| Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |

This message was automatically generated.

@hadoop-yetus

💔 -1 overall

| Vote | Subsystem | Runtime | Logfile | Comment |
|------|-----------|---------|---------|---------|
| +0 🆗 | reexec | 1m 4s | | Docker mode activated. |
| | | | | _ Prechecks _ |
| +1 💚 | dupname | 0m 1s | | No case conflicting files found. |
| +0 🆗 | codespell | 0m 0s | | codespell was not available. |
| +0 🆗 | detsecrets | 0m 0s | | detect-secrets was not available. |
| +1 💚 | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 💚 | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. |
| | | | | _ branch-3.3 Compile Tests _ |
| +1 💚 | mvninstall | 39m 37s | | branch-3.3 passed |
| +1 💚 | compile | 1m 29s | | branch-3.3 passed |
| +1 💚 | checkstyle | 1m 5s | | branch-3.3 passed |
| +1 💚 | mvnsite | 1m 37s | | branch-3.3 passed |
| +1 💚 | javadoc | 1m 44s | | branch-3.3 passed |
| +1 💚 | spotbugs | 3m 40s | | branch-3.3 passed |
| +1 💚 | shadedclient | 28m 46s | | branch has no errors when building and testing our client artifacts. |
| | | | | _ Patch Compile Tests _ |
| +1 💚 | mvninstall | 1m 22s | | the patch passed |
| +1 💚 | compile | 1m 14s | | the patch passed |
| +1 💚 | javac | 1m 14s | | the patch passed |
| +1 💚 | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 💚 | checkstyle | 0m 47s | | the patch passed |
| +1 💚 | mvnsite | 1m 21s | | the patch passed |
| +1 💚 | javadoc | 1m 23s | | the patch passed |
| +1 💚 | spotbugs | 3m 27s | | the patch passed |
| +1 💚 | shadedclient | 28m 6s | | patch has no errors when building and testing our client artifacts. |
| | | | | _ Other Tests _ |
| -1 ❌ | unit | 217m 17s | /patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt | hadoop-hdfs in the patch passed. |
| +1 💚 | asflicense | 1m 3s | | The patch does not generate ASF License warnings. |
| | | 332m 12s | | |

| Reason | Tests |
|--------|-------|
| Failed junit tests | hadoop.hdfs.server.datanode.TestDataNodeRollingUpgrade |
| | hadoop.hdfs.server.namenode.TestNameNodeMXBean |
| | hadoop.hdfs.TestDecommissionWithStriped |
| | hadoop.hdfs.server.namenode.TestFsck |
| | hadoop.hdfs.server.balancer.TestBalancerWithHANameNodes |

| Subsystem | Report/Notes |
|-----------|--------------|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4786/4/artifact/out/Dockerfile |
| GITHUB PR | #4786 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
| uname | Linux d85ed4270070 4.15.0-191-generic #202-Ubuntu SMP Thu Aug 4 01:49:29 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | branch-3.3 / f332c1e |
| Default Java | Private Build-1.8.0_342-8u342-b07-0ubuntu1~18.04-b07 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4786/4/testReport/ |
| Max. process+thread count | 2291 (vs. ulimit of 5500) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4786/4/console |
| versions | git=2.17.1 maven=3.6.0 spotbugs=4.2.2 |
| Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |

This message was automatically generated.

@snmvaughan
Contributor Author

I've run all of the failed test classes locally without an issue. I've opened a ticket specifically for this problem.
https://issues.apache.org/jira/browse/HDFS-16740

@hadoop-yetus

💔 -1 overall

| Vote | Subsystem | Runtime | Logfile | Comment |
|------|-----------|---------|---------|---------|
| +0 🆗 | reexec | 1m 8s | | Docker mode activated. |
| | | | | _ Prechecks _ |
| +1 💚 | dupname | 0m 1s | | No case conflicting files found. |
| +0 🆗 | codespell | 0m 0s | | codespell was not available. |
| +0 🆗 | detsecrets | 0m 0s | | detect-secrets was not available. |
| +1 💚 | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 💚 | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. |
| | | | | _ branch-3.3 Compile Tests _ |
| +1 💚 | mvninstall | 39m 44s | | branch-3.3 passed |
| +1 💚 | compile | 1m 31s | | branch-3.3 passed |
| +1 💚 | checkstyle | 1m 6s | | branch-3.3 passed |
| +1 💚 | mvnsite | 1m 40s | | branch-3.3 passed |
| +1 💚 | javadoc | 1m 44s | | branch-3.3 passed |
| +1 💚 | spotbugs | 3m 42s | | branch-3.3 passed |
| +1 💚 | shadedclient | 29m 3s | | branch has no errors when building and testing our client artifacts. |
| | | | | _ Patch Compile Tests _ |
| +1 💚 | mvninstall | 1m 22s | | the patch passed |
| +1 💚 | compile | 1m 16s | | the patch passed |
| +1 💚 | javac | 1m 16s | | the patch passed |
| +1 💚 | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 💚 | checkstyle | 0m 48s | | the patch passed |
| +1 💚 | mvnsite | 1m 23s | | the patch passed |
| +1 💚 | javadoc | 1m 25s | | the patch passed |
| +1 💚 | spotbugs | 3m 32s | | the patch passed |
| +1 💚 | shadedclient | 28m 42s | | patch has no errors when building and testing our client artifacts. |
| | | | | _ Other Tests _ |
| -1 ❌ | unit | 228m 29s | /patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt | hadoop-hdfs in the patch passed. |
| +1 💚 | asflicense | 0m 57s | | The patch does not generate ASF License warnings. |
| | | 344m 40s | | |

| Reason | Tests |
|--------|-------|
| Failed junit tests | hadoop.hdfs.server.datanode.TestDataNodeRollingUpgrade |
| | hadoop.hdfs.server.datanode.TestBPOfferService |
| | hadoop.hdfs.server.balancer.TestBalancer |
| | hadoop.hdfs.TestDecommission |
| | hadoop.hdfs.server.namenode.TestFsck |

| Subsystem | Report/Notes |
|-----------|--------------|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4786/5/artifact/out/Dockerfile |
| GITHUB PR | #4786 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
| uname | Linux bb0d70e0e75f 4.15.0-191-generic #202-Ubuntu SMP Thu Aug 4 01:49:29 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | branch-3.3 / f332c1e |
| Default Java | Private Build-1.8.0_342-8u342-b07-0ubuntu1~18.04-b07 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4786/5/testReport/ |
| Max. process+thread count | 2201 (vs. ulimit of 5500) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4786/5/console |
| versions | git=2.17.1 maven=3.6.0 spotbugs=4.2.2 |
| Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |

This message was automatically generated.

@saintstack
Contributor

Looking at failed tests.....

Run 1

  • hadoop.hdfs.server.datanode.TestDataNodeRollingUpgrade
  • hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA
  • hadoop.hdfs.server.datanode.TestBPOfferService

Run 2

  • hadoop.hdfs.server.datanode.TestDataNodeRollingUpgrade
  • hadoop.hdfs.server.namenode.TestNameNodeMXBean
  • hadoop.hdfs.TestDecommissionWithStriped
  • hadoop.hdfs.server.namenode.TestFsck
  • hadoop.hdfs.server.balancer.TestBalancerWithHANameNodes

Run 3

  • hadoop.hdfs.server.datanode.TestDataNodeRollingUpgrade
  • hadoop.hdfs.server.datanode.TestBPOfferService
  • hadoop.hdfs.server.balancer.TestBalancer
  • hadoop.hdfs.TestDecommission
  • hadoop.hdfs.server.namenode.TestFsck

.... the one common failure is hadoop.hdfs.server.datanode.TestDataNodeRollingUpgrade. Let me try another run...

@snmvaughan
Contributor Author

I've been looking at all of these classes as part of HDFS-16740. All of them run without issue when executed individually or serially. Each uses MiniDFSCluster for testing, but there is a lot of variation in how they handle the base directory and cleanup, and some try to restart the cluster for each test method instead of just allocating a new cluster.

@hadoop-yetus

💔 -1 overall

| Vote | Subsystem | Runtime | Logfile | Comment |
|------|-----------|---------|---------|---------|
| +0 🆗 | reexec | 1m 8s | | Docker mode activated. |
| | | | | _ Prechecks _ |
| +1 💚 | dupname | 0m 0s | | No case conflicting files found. |
| +0 🆗 | codespell | 0m 0s | | codespell was not available. |
| +0 🆗 | detsecrets | 0m 0s | | detect-secrets was not available. |
| +1 💚 | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 💚 | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. |
| | | | | _ branch-3.3 Compile Tests _ |
| +1 💚 | mvninstall | 39m 32s | | branch-3.3 passed |
| +1 💚 | compile | 1m 26s | | branch-3.3 passed |
| +1 💚 | checkstyle | 1m 4s | | branch-3.3 passed |
| +1 💚 | mvnsite | 1m 37s | | branch-3.3 passed |
| +1 💚 | javadoc | 1m 45s | | branch-3.3 passed |
| +1 💚 | spotbugs | 3m 41s | | branch-3.3 passed |
| +1 💚 | shadedclient | 28m 47s | | branch has no errors when building and testing our client artifacts. |
| | | | | _ Patch Compile Tests _ |
| +1 💚 | mvninstall | 1m 25s | | the patch passed |
| +1 💚 | compile | 1m 17s | | the patch passed |
| +1 💚 | javac | 1m 17s | | the patch passed |
| +1 💚 | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 💚 | checkstyle | 0m 46s | | the patch passed |
| +1 💚 | mvnsite | 1m 24s | | the patch passed |
| +1 💚 | javadoc | 1m 28s | | the patch passed |
| +1 💚 | spotbugs | 3m 28s | | the patch passed |
| +1 💚 | shadedclient | 28m 9s | | patch has no errors when building and testing our client artifacts. |
| | | | | _ Other Tests _ |
| -1 ❌ | unit | 226m 22s | /patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt | hadoop-hdfs in the patch passed. |
| +1 💚 | asflicense | 0m 57s | | The patch does not generate ASF License warnings. |
| | | 341m 24s | | |

| Reason | Tests |
|--------|-------|
| Failed junit tests | hadoop.hdfs.TestRollingUpgrade |
| | hadoop.hdfs.TestReconstructStripedFileWithValidator |
| | hadoop.hdfs.server.namenode.TestFsck |
| | hadoop.hdfs.server.balancer.TestBalancerWithHANameNodes |

| Subsystem | Report/Notes |
|-----------|--------------|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4786/6/artifact/out/Dockerfile |
| GITHUB PR | #4786 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
| uname | Linux 2ba5d80a8761 4.15.0-191-generic #202-Ubuntu SMP Thu Aug 4 01:49:29 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | branch-3.3 / f332c1e |
| Default Java | Private Build-1.8.0_342-8u342-b07-0ubuntu1~18.04-b07 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4786/6/testReport/ |
| Max. process+thread count | 2208 (vs. ulimit of 5500) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4786/6/console |
| versions | git=2.17.1 maven=3.6.0 spotbugs=4.2.2 |
| Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |

This message was automatically generated.

@saintstack
Contributor

In this last test run, TestDataNodeRollingUpgrade passes: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4786/6/testReport/org.apache.hadoop.hdfs.server.datanode/TestDataNodeRollingUpgrade/

I'm classing these test failures as flakies. Hopefully the likes of https://issues.apache.org/jira/browse/HDFS-16740 will help. Meantime, let me merge this PR and close out the JIRA.

@saintstack saintstack merged commit 833fc64 into apache:branch-3.3 Aug 28, 2022