HDFS-15654/15674: Fix TestBPOfferService#testMissBlocksWhenReregister #5089

Closed
wants to merge 4 commits into from

xinglin
Contributor

@xinglin xinglin commented Oct 28, 2022

Description of PR

Cherry-pick the following changes from trunk to branch-3.3 to fix TestBPOfferService#testMissBlocksWhenReregister:

HDFS-15654: minor conflict due to testCommandProcessingThreadExit being backported to branch-3.3 before HDFS-15654.
HDFS-15674: clean cherry-pick.
HDFS-16224: clean cherry-pick.

How was this patch tested?

Ran the test three times without error:

mvn test -Dtest="TestBPOfferService"
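
A minimal sketch of how the repeated runs could be scripted. `run_n_times` is a hypothetical helper, not part of the Hadoop build tooling, and the usage line assumes the command is issued from the hadoop-hdfs-project/hadoop-hdfs module directory.

```shell
# Hypothetical helper: run a command N times and stop at the first failure.
run_n_times() {
  rnt_total="$1"; shift
  rnt_i=1
  while [ "$rnt_i" -le "$rnt_total" ]; do
    echo "run ${rnt_i}/${rnt_total}: $*"
    # Execute the remaining arguments as the command under test.
    if ! "$@"; then
      echo "failed on run ${rnt_i}" >&2
      return 1
    fi
    rnt_i=$((rnt_i + 1))
  done
  echo "all ${rnt_total} runs passed"
}

# Usage (assumed working directory: hadoop-hdfs-project/hadoop-hdfs):
#   run_n_times 3 mvn test -Dtest=TestBPOfferService
```

Stopping at the first failure keeps the failing run's Surefire reports in place for inspection.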

@hadoop-yetus

💔 -1 overall

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:-------:|:--------|
| +0 🆗 | reexec | 10m 8s | | Docker mode activated. |
| | | | | _ Prechecks _ |
| +1 💚 | dupname | 0m 0s | | No case conflicting files found. |
| +0 🆗 | codespell | 0m 0s | | codespell was not available. |
| +0 🆗 | detsecrets | 0m 0s | | detect-secrets was not available. |
| +1 💚 | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 💚 | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. |
| | | | | _ branch-3.3 Compile Tests _ |
| +1 💚 | mvninstall | 39m 29s | | branch-3.3 passed |
| +1 💚 | compile | 1m 21s | | branch-3.3 passed |
| +1 💚 | checkstyle | 1m 0s | | branch-3.3 passed |
| +1 💚 | mvnsite | 1m 33s | | branch-3.3 passed |
| +1 💚 | javadoc | 1m 41s | | branch-3.3 passed |
| +1 💚 | spotbugs | 3m 39s | | branch-3.3 passed |
| +1 💚 | shadedclient | 28m 27s | | branch has no errors when building and testing our client artifacts. |
| | | | | _ Patch Compile Tests _ |
| +1 💚 | mvninstall | 1m 20s | | the patch passed |
| +1 💚 | compile | 1m 13s | | the patch passed |
| +1 💚 | javac | 1m 13s | | the patch passed |
| +1 💚 | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 💚 | checkstyle | 0m 45s | | the patch passed |
| +1 💚 | mvnsite | 1m 20s | | the patch passed |
| +1 💚 | javadoc | 1m 27s | | the patch passed |
| +1 💚 | spotbugs | 3m 28s | | the patch passed |
| +1 💚 | shadedclient | 27m 51s | | patch has no errors when building and testing our client artifacts. |
| | | | | _ Other Tests _ |
| -1 ❌ | unit | 218m 51s | /patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt | hadoop-hdfs in the patch passed. |
| +1 💚 | asflicense | 0m 55s | | The patch does not generate ASF License warnings. |
| | | 341m 42s | | |

| Reason | Tests |
|-------:|:------|
| Failed junit tests | hadoop.hdfs.server.balancer.TestBalancerWithHANameNodes |
| | hadoop.hdfs.TestDecommissionWithStripedBackoffMonitor |
| | hadoop.hdfs.TestRollingUpgrade |
| | hadoop.hdfs.TestDFSInotifyEventInputStreamKerberized |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5089/1/artifact/out/Dockerfile |
| GITHUB PR | #5089 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
| uname | Linux 635fa4db048e 4.15.0-191-generic #202-Ubuntu SMP Thu Aug 4 01:49:29 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | branch-3.3 / d5d5616 |
| Default Java | Private Build-1.8.0_342-8u342-b07-0ubuntu1~18.04-b07 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5089/1/testReport/ |
| Max. process+thread count | 2200 (vs. ulimit of 5500) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5089/1/console |
| versions | git=2.17.1 maven=3.6.0 spotbugs=4.2.2 |
| Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |

This message was automatically generated.

@xinglin
Contributor Author

xinglin commented Oct 29, 2022

This PR only changes TestBPOfferService; it does not touch any other files. The failed HDFS unit tests are probably due to flaky minihdfs.

@ashutoshcipher
Contributor

@xinglin - Can you trigger the build once again? Let's have a happy Yetus.

@xinglin
Contributor Author

xinglin commented Oct 29, 2022

Hi @ashutoshcipher,

@xinglin - Can you trigger the build once again? Let's have a happy Yetus.

Made an empty commit to trigger another build. Let's see whether we get a happy Yetus.
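
For reference, an empty commit of this kind can be created with git's `--allow-empty` flag. The sketch below is illustrative only: `empty_commit` is a hypothetical helper and the default commit message is made up, not the one used in this PR.

```shell
# Hypothetical helper: create a commit with no file changes so that pushing it
# makes the CI run again on the pull request.
empty_commit() {
  git commit --allow-empty -m "${1:-Empty commit to re-trigger CI}"
}

# Usage (remote and branch names depend on the local setup):
#   empty_commit "Trigger another Yetus build"
#   git push
```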

@hadoop-yetus

💔 -1 overall

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:-------:|:--------|
| +0 🆗 | reexec | 0m 49s | | Docker mode activated. |
| | | | | _ Prechecks _ |
| +1 💚 | dupname | 0m 0s | | No case conflicting files found. |
| +0 🆗 | codespell | 0m 0s | | codespell was not available. |
| +0 🆗 | detsecrets | 0m 0s | | detect-secrets was not available. |
| +1 💚 | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 💚 | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. |
| | | | | _ branch-3.3 Compile Tests _ |
| +1 💚 | mvninstall | 39m 38s | | branch-3.3 passed |
| +1 💚 | compile | 1m 22s | | branch-3.3 passed |
| +1 💚 | checkstyle | 1m 0s | | branch-3.3 passed |
| +1 💚 | mvnsite | 1m 30s | | branch-3.3 passed |
| +1 💚 | javadoc | 1m 45s | | branch-3.3 passed |
| +1 💚 | spotbugs | 3m 44s | | branch-3.3 passed |
| +1 💚 | shadedclient | 28m 37s | | branch has no errors when building and testing our client artifacts. |
| | | | | _ Patch Compile Tests _ |
| +1 💚 | mvninstall | 1m 22s | | the patch passed |
| +1 💚 | compile | 1m 14s | | the patch passed |
| +1 💚 | javac | 1m 14s | | the patch passed |
| +1 💚 | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 💚 | checkstyle | 0m 45s | | the patch passed |
| +1 💚 | mvnsite | 1m 17s | | the patch passed |
| +1 💚 | javadoc | 1m 25s | | the patch passed |
| +1 💚 | spotbugs | 3m 23s | | the patch passed |
| +1 💚 | shadedclient | 28m 21s | | patch has no errors when building and testing our client artifacts. |
| | | | | _ Other Tests _ |
| -1 ❌ | unit | 219m 14s | /patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt | hadoop-hdfs in the patch passed. |
| +1 💚 | asflicense | 0m 55s | | The patch does not generate ASF License warnings. |
| | | 333m 30s | | |

| Reason | Tests |
|-------:|:------|
| Failed junit tests | hadoop.hdfs.server.namenode.TestFsck |
| | hadoop.hdfs.server.mover.TestMover |
| | hadoop.hdfs.server.sps.TestExternalStoragePolicySatisfier |
| | hadoop.hdfs.server.namenode.TestNameNodeMXBean |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5089/2/artifact/out/Dockerfile |
| GITHUB PR | #5089 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
| uname | Linux d180c017e220 4.15.0-191-generic #202-Ubuntu SMP Thu Aug 4 01:49:29 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | branch-3.3 / 40e025b |
| Default Java | Private Build-1.8.0_342-8u342-b07-0ubuntu1~18.04-b07 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5089/2/testReport/ |
| Max. process+thread count | 2239 (vs. ulimit of 5500) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5089/2/console |
| versions | git=2.17.1 maven=3.6.0 spotbugs=4.2.2 |
| Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |

This message was automatically generated.

@xinglin
Contributor Author

xinglin commented Oct 30, 2022

It looks like four different unit tests failed the second time, with no overlap between the first and second builds. My guess is that the unit tests ran in a different order in each build, so a different set of tests failed each time.

@ashutoshcipher, I don't think these non-deterministic unit test failures are related to this PR, since we only modified TestBPOfferService.java. What do you think?
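
The no-overlap observation can be checked mechanically. The sketch below copies the failed-test lists from the first two Yetus reports into temporary files and prints any name that appears in both; `common_failures` is a hypothetical helper introduced only for this illustration.

```shell
# Hypothetical helper: print the lines (test names) that appear in both files.
common_failures() {
  # -F: fixed strings, -x: whole-line match, -f: read patterns from file $1.
  grep -Fxf "$1" "$2" || true
}

build1=$(mktemp)
build2=$(mktemp)

# Failed junit tests from the first build.
cat > "$build1" <<'EOF'
hadoop.hdfs.server.balancer.TestBalancerWithHANameNodes
hadoop.hdfs.TestDecommissionWithStripedBackoffMonitor
hadoop.hdfs.TestRollingUpgrade
hadoop.hdfs.TestDFSInotifyEventInputStreamKerberized
EOF

# Failed junit tests from the second build.
cat > "$build2" <<'EOF'
hadoop.hdfs.server.namenode.TestFsck
hadoop.hdfs.server.mover.TestMover
hadoop.hdfs.server.sps.TestExternalStoragePolicySatisfier
hadoop.hdfs.server.namenode.TestNameNodeMXBean
EOF

# Prints nothing for these two lists, since they share no test name.
common_failures "$build1" "$build2"
```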

@ashutoshcipher
Contributor

Thanks @xinglin. Can you verify the failed tests once by running them locally with the patch?

@xinglin
Contributor Author

xinglin commented Oct 31, 2022

[INFO] Running org.apache.hadoop.hdfs.server.balancer.TestBalancerWithHANameNodes
[INFO] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 53.157 s - in org.apache.hadoop.hdfs.server.balancer.TestBalancerWithHANameNodes
--
[INFO] Running org.apache.hadoop.hdfs.TestDecommissionWithStripedBackoffMonitor
[ERROR] Tests run: 13, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 150.891 s <<< FAILURE! - in org.apache.hadoop.hdfs.TestDecommissionWithStripedBackoffMonitor
--
[INFO] Running org.apache.hadoop.hdfs.TestRollingUpgrade
[INFO] Tests run: 15, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 194.055 s - in org.apache.hadoop.hdfs.TestRollingUpgrade
--
[INFO] Running org.apache.hadoop.hdfs.TestDFSInotifyEventInputStreamKerberized
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 16.377 s - in org.apache.hadoop.hdfs.TestDFSInotifyEventInputStreamKerberized
--
[INFO] Running org.apache.hadoop.hdfs.server.namenode.TestFsck
[INFO] Tests run: 33, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 193.754 s - in org.apache.hadoop.hdfs.server.namenode.TestFsck
--
[INFO] Running org.apache.hadoop.hdfs.server.mover.TestMover
[INFO] Tests run: 19, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 247.094 s - in org.apache.hadoop.hdfs.server.mover.TestMover
--
[INFO] Running org.apache.hadoop.hdfs.server.sps.TestExternalStoragePolicySatisfier
[INFO] Tests run: 29, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 470.211 s - in org.apache.hadoop.hdfs.server.sps.TestExternalStoragePolicySatisfier
--
[INFO] Running org.apache.hadoop.hdfs.server.namenode.TestNameNodeMXBean
[INFO] Tests run: 13, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 52.597 s - in org.apache.hadoop.hdfs.server.namenode.TestNameNodeMXBean
--
[INFO] Running org.apache.hadoop.hdfs.server.datanode.TestBPOfferService
[INFO] Tests run: 17, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 81.953 s - in org.apache.hadoop.hdfs.server.datanode.TestBPOfferService
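
The summary above has the shape of `grep -A1` output (each `Running` line followed by its result line, separated by `--`), so a run like this might be scripted roughly as follows. This is an assumption about the workflow, not the author's actual command; `dtest_arg` is a hypothetical helper and `failed-tests.txt` a hypothetical file holding the names from the Yetus reports.

```shell
# Hypothetical helper: read fully qualified test names (one per line) on stdin
# and print a comma-separated list of simple class names for Surefire's -Dtest.
dtest_arg() {
  # Strip everything up to the last dot, then join the lines with commas.
  sed 's/.*\.//' | paste -sd, -
}

# Usage (assumed working directory: hadoop-hdfs-project/hadoop-hdfs):
#   mvn test -Dtest="$(dtest_arg < failed-tests.txt)" 2>&1 \
#     | grep -A1 'Running org.apache.hadoop'
```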

TestDecommissionWithStripedBackoffMonitor still fails intermittently. It is also reported for trunk in HDFS-15840 (unresolved).

[ERROR] testDecommissionWithMissingBlock(org.apache.hadoop.hdfs.TestDecommissionWithStripedBackoffMonitor)  Time elapsed: 5.233 s  <<< FAILURE!
java.lang.AssertionError: expected:<10> but was:<11>
        at org.junit.Assert.fail(Assert.java:89)
        at org.junit.Assert.failNotEquals(Assert.java:835)
        at org.junit.Assert.assertEquals(Assert.java:647)
        at org.junit.Assert.assertEquals(Assert.java:633)
        at org.apache.hadoop.hdfs.TestDecommissionWithStriped.testDecommissionWithMissingBlock(TestDecommissionWithStriped.java:910)

TestDFSInotifyEventInputStreamKerberized also failed sometimes. It is reported for trunk in HDFS-16167 (unresolved).

TestExternalStoragePolicySatisfier sometimes failed due to a timeout, but some runs succeeded. If needed, we could increase the timeout for testMultipleFilesForSatisfyStoragePolicy, as was done in HDFS-16609 for testSPSWhenFileHasExcessRedundancyBlocks.

[ERROR] Tests run: 29, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 480.456 s <<< FAILURE! - in org.apache.hadoop.hdfs.server.sps.TestExternalStoragePolicySatisfier
[ERROR] testMultipleFilesForSatisfyStoragePolicy(org.apache.hadoop.hdfs.server.sps.TestExternalStoragePolicySatisfier)  Time elapsed: 41.857 s  <<< ERROR!
java.util.concurrent.TimeoutException:
Timed out waiting for condition.
Thread diagnostics:
Timestamp: 2022-10-30 04:39:45,299

@hadoop-yetus

💔 -1 overall

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:-------:|:--------|
| +0 🆗 | reexec | 0m 52s | | Docker mode activated. |
| | | | | _ Prechecks _ |
| +1 💚 | dupname | 0m 0s | | No case conflicting files found. |
| +0 🆗 | codespell | 0m 0s | | codespell was not available. |
| +0 🆗 | detsecrets | 0m 0s | | detect-secrets was not available. |
| +1 💚 | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 💚 | test4tests | 0m 0s | | The patch appears to include 2 new or modified test files. |
| | | | | _ branch-3.3 Compile Tests _ |
| +1 💚 | mvninstall | 40m 48s | | branch-3.3 passed |
| +1 💚 | compile | 1m 25s | | branch-3.3 passed |
| +1 💚 | checkstyle | 1m 5s | | branch-3.3 passed |
| +1 💚 | mvnsite | 1m 39s | | branch-3.3 passed |
| +1 💚 | javadoc | 1m 52s | | branch-3.3 passed |
| +1 💚 | spotbugs | 3m 44s | | branch-3.3 passed |
| +1 💚 | shadedclient | 28m 40s | | branch has no errors when building and testing our client artifacts. |
| | | | | _ Patch Compile Tests _ |
| +1 💚 | mvninstall | 1m 20s | | the patch passed |
| +1 💚 | compile | 1m 12s | | the patch passed |
| +1 💚 | javac | 1m 12s | | the patch passed |
| +1 💚 | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 💚 | checkstyle | 0m 44s | | the patch passed |
| +1 💚 | mvnsite | 1m 21s | | the patch passed |
| +1 💚 | javadoc | 1m 27s | | the patch passed |
| +1 💚 | spotbugs | 3m 23s | | the patch passed |
| +1 💚 | shadedclient | 28m 9s | | patch has no errors when building and testing our client artifacts. |
| | | | | _ Other Tests _ |
| -1 ❌ | unit | 215m 35s | /patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt | hadoop-hdfs in the patch passed. |
| +1 💚 | asflicense | 0m 53s | | The patch does not generate ASF License warnings. |
| | | 331m 7s | | |

| Reason | Tests |
|-------:|:------|
| Failed junit tests | hadoop.hdfs.TestRollingUpgrade |
| | hadoop.hdfs.server.datanode.TestDataNodeRollingUpgrade |
| | hadoop.hdfs.server.balancer.TestBalancer |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5089/3/artifact/out/Dockerfile |
| GITHUB PR | #5089 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
| uname | Linux 9730025dcbf8 4.15.0-191-generic #202-Ubuntu SMP Thu Aug 4 01:49:29 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | branch-3.3 / 2e4f45b |
| Default Java | Private Build-1.8.0_342-8u342-b07-0ubuntu1~18.04-b07 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5089/3/testReport/ |
| Max. process+thread count | 2290 (vs. ulimit of 5500) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5089/3/console |
| versions | git=2.17.1 maven=3.6.0 spotbugs=4.2.2 |
| Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |

This message was automatically generated.

@xinglin
Contributor Author

xinglin commented Oct 31, 2022

I ran these unit tests locally a few times. They sometimes failed, but some runs succeeded. I don't think we should block this PR on these non-deterministic, unrelated unit test failures.

[INFO] Running org.apache.hadoop.hdfs.TestRollingUpgrade
[INFO] Tests run: 15, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 181.336 s - in org.apache.hadoop.hdfs.TestRollingUpgrade
--
[INFO] Running org.apache.hadoop.hdfs.server.balancer.TestBalancer
[INFO] Tests run: 27, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 284.437 s - in org.apache.hadoop.hdfs.server.balancer.TestBalancer
--
[INFO] Running org.apache.hadoop.hdfs.server.datanode.TestDataNodeRollingUpgrade
[INFO] Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 134.764 s - in org.apache.hadoop.hdfs.server.datanode.TestDataNodeRollingUpgrade
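
One way to put a number on "sometimes failed" is to repeat a test command and count the passing runs. The sketch below is only an illustration: `count_passes` is a hypothetical helper, and the run count in the usage line is arbitrary.

```shell
# Hypothetical helper: run a command N times without stopping on failure and
# report how many runs exited successfully.
count_passes() {
  cp_total="$1"; shift
  cp_pass=0
  cp_i=0
  while [ "$cp_i" -lt "$cp_total" ]; do
    # Discard the command's output; only its exit status matters here.
    if "$@" >/dev/null 2>&1; then
      cp_pass=$((cp_pass + 1))
    fi
    cp_i=$((cp_i + 1))
  done
  echo "${cp_pass}/${cp_total} runs passed"
}

# Usage (assumed working directory: hadoop-hdfs-project/hadoop-hdfs):
#   count_passes 5 mvn test -Dtest=TestRollingUpgrade
```

A pass count that varies between identical invocations points to flakiness in the test or its environment rather than to the patch.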

@xinglin
Contributor Author

xinglin commented Jun 10, 2023

Closing this PR, as two of these commits have been backported to branch-3.3 by others.

Author:     Masatake Iwasaki <iwasakims@apache.org>
AuthorDate: Wed Nov 18 16:11:09 2020 +0900
Commit:     Ayush Saxena <ayushsaxena@apache.org>
CommitDate: Sun Feb 12 01:50:39 2023 +0530

    HDFS-15674. TestBPOfferService#testMissBlocksWhenReregister fails on trunk. (#2467)

commit c17734b7472f7c92f933f2c6a7ab7e7ef5f17dea
Author:     Ahmed Hussein <50450311+amahussein@users.noreply.github.com>
AuthorDate: Wed Oct 28 18:24:34 2020 -0500
Commit:     Ayush Saxena <ayushsaxena@apache.org>
CommitDate: Sun Feb 12 01:50:26 2023 +0530

    HDFS-15654. TestBPOfferService#testMissBlocksWhenReregister fails intermittently (#2419)

@xinglin xinglin closed this Jun 10, 2023