HBASE-25156 TestMasterFailover.testSimpleMasterFailover is flaky #2507

Merged: 1 commit merged into apache:master on Oct 8, 2020

ndimiduk (Member) commented Oct 7, 2020

Change the test to wait for evidence that the active master has seen that the backup master killed by the test has gone away. This is done before proceeding to validate that the dead backup is correctly omitted from the ClusterStatus report.
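
The wait-then-validate pattern described above can be sketched in plain Java. This is a minimal sketch under stated assumptions, not the code from this patch: `WaitSketch`, `waitFor`, and the simulated `backupMasterCount` are hypothetical names invented for illustration, and the real test would rely on HBase's own test utilities rather than this helper.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

/** Hypothetical helper (not part of HBase): poll a condition until it holds or a timeout elapses. */
final class WaitSketch {
  private WaitSketch() {}

  /** Returns true as soon as the condition holds, or false if the timeout elapses first. */
  static boolean waitFor(long timeoutMs, long intervalMs, BooleanSupplier condition) {
    final long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
    while (true) {
      if (condition.getAsBoolean()) {
        return true;
      }
      if (System.nanoTime() - deadline >= 0) {
        return false;
      }
      try {
        Thread.sleep(intervalMs);
      } catch (InterruptedException e) {
        // Preserve the interrupt flag and give up waiting.
        Thread.currentThread().interrupt();
        return false;
      }
    }
  }

  public static void main(String[] args) {
    // Simulated state: stands in for "how many backup masters the active master knows about".
    AtomicInteger backupMasterCount = new AtomicInteger(2);

    // Simulated asynchronous update: stands in for the active master noticing a dead backup.
    Thread updater = new Thread(() -> {
      try {
        Thread.sleep(50);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
      backupMasterCount.set(1);
    });
    updater.start();

    // Wait for evidence of the change first, and only then validate the dependent state.
    if (!waitFor(5_000, 10, () -> backupMasterCount.get() == 1)) {
      throw new AssertionError("Waiting timed out");
    }
    System.out.println("backup masters reported: " + backupMasterCount.get());
  }
}
```

Validating immediately after the kill would race with the asynchronous update; polling with a bounded timeout removes the race but can still fail if the update takes longer than the timeout.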

Also, minor fixup to several assertions, using assertEquals instead of assertTrue(...equals(...)) and correcting expected vs. actual ordering of assertion arguments.
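
To illustrate why this assertion cleanup matters, here is a small self-contained sketch. The helpers are hypothetical stand-ins, not JUnit itself, and the failure message is only modelled on the `expected:<...> but was:<...>` style that JUnit 4 uses.

```java
import java.util.Objects;

/** Hypothetical stand-ins for JUnit assertions, used only to compare failure messages. */
final class AssertSketch {
  private AssertSketch() {}

  /** Fails without any detail, like assertTrue(a.equals(b)) with no message argument. */
  static void assertTrue(boolean condition) {
    if (!condition) {
      throw new AssertionError();
    }
  }

  /** Fails with both values in the message; the argument order decides which is labelled "expected". */
  static void assertEquals(Object expected, Object actual) {
    if (!Objects.equals(expected, actual)) {
      throw new AssertionError("expected:<" + expected + "> but was:<" + actual + ">");
    }
  }

  /** Runs an assertion and returns its failure message, or "passed" if it did not fail. */
  static String failureMessage(Runnable assertion) {
    try {
      assertion.run();
      return "passed";
    } catch (AssertionError e) {
      return e.getMessage() == null ? "(no message)" : e.getMessage();
    }
  }

  public static void main(String[] args) {
    int expectedBackups = 1; // what the test expects
    int actualBackups = 2;   // what the (simulated) cluster reported

    // assertTrue(...equals(...)): the failure carries no values at all.
    System.out.println(failureMessage(
        () -> assertTrue(Integer.valueOf(expectedBackups).equals(actualBackups))));

    // assertEquals with arguments in the right order: the message is accurate.
    System.out.println(failureMessage(() -> assertEquals(expectedBackups, actualBackups)));

    // Swapped arguments: the assertion still fails, but the message labels the values wrongly.
    System.out.println(failureMessage(() -> assertEquals(actualBackups, expectedBackups)));
  }
}
```

Both forms fail on the same inputs; the difference is in how much the failure report tells whoever is debugging a flaky run.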

ndimiduk (Member, Author) commented Oct 8, 2020

With this patch, adhoc_run_tests.sh clears 20 iterations locally. Without it, the test makes it through only 3 to 7 iterations.

@Apache-HBase

🎊 +1 overall

| Vote | Subsystem | Runtime | Comment |
|:----:|----------:|--------:|:--------|
| +0 🆗 | reexec | 0m 37s | Docker mode activated. |
| | | | _ Prechecks _ |
| +1 💚 | dupname | 0m 0s | No case conflicting files found. |
| +1 💚 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 💚 | @author | 0m 0s | The patch does not contain any @author tags. |
| | | | _ master Compile Tests _ |
| +1 💚 | mvninstall | 3m 52s | master passed |
| +1 💚 | checkstyle | 1m 5s | master passed |
| +1 💚 | spotbugs | 2m 3s | master passed |
| | | | _ Patch Compile Tests _ |
| +1 💚 | mvninstall | 3m 25s | the patch passed |
| +1 💚 | checkstyle | 1m 5s | the patch passed |
| +1 💚 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 💚 | hadoopcheck | 17m 34s | Patch does not cause any errors with Hadoop 3.1.2 3.2.1 3.3.0. |
| +1 💚 | spotbugs | 2m 36s | the patch passed |
| | | | _ Other Tests _ |
| +1 💚 | asflicense | 0m 14s | The patch does not generate ASF License warnings. |
| | | 41m 16s | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | Client=19.03.13 Server=19.03.13 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2507/1/artifact/yetus-general-check/output/Dockerfile |
| GITHUB PR | #2507 |
| Optional Tests | dupname asflicense spotbugs hadoopcheck hbaseanti checkstyle |
| uname | Linux 921da3989eb5 4.15.0-60-generic #67-Ubuntu SMP Thu Aug 22 16:55:30 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/hbase-personality.sh |
| git revision | master / a8c49a6 |
| Max. process+thread count | 94 (vs. ulimit of 30000) |
| modules | C: hbase-server U: hbase-server |
| Console output | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2507/1/console |
| versions | git=2.17.1 maven=(cecedd343002696d0abb50b32b541b8a6ba2883f) spotbugs=3.1.12 |
| Powered by | Apache Yetus 0.11.1 https://yetus.apache.org |

This message was automatically generated.

@Apache-HBase

💔 -1 overall

| Vote | Subsystem | Runtime | Comment |
|:----:|----------:|--------:|:--------|
| +0 🆗 | reexec | 1m 11s | Docker mode activated. |
| -0 ⚠️ | yetus | 0m 3s | Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck |
| | | | _ Prechecks _ |
| | | | _ master Compile Tests _ |
| +1 💚 | mvninstall | 4m 17s | master passed |
| +1 💚 | compile | 0m 59s | master passed |
| +1 💚 | shadedjars | 7m 11s | branch has no errors when building our shaded downstream artifacts. |
| +1 💚 | javadoc | 0m 37s | master passed |
| | | | _ Patch Compile Tests _ |
| +1 💚 | mvninstall | 3m 52s | the patch passed |
| +1 💚 | compile | 0m 59s | the patch passed |
| +1 💚 | javac | 0m 59s | the patch passed |
| +1 💚 | shadedjars | 7m 11s | patch has no errors when building our shaded downstream artifacts. |
| +1 💚 | javadoc | 0m 38s | the patch passed |
| | | | _ Other Tests _ |
| -1 ❌ | unit | 207m 47s | hbase-server in the patch failed. |
| | | 236m 34s | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | Client=19.03.13 Server=19.03.13 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2507/1/artifact/yetus-jdk8-hadoop3-check/output/Dockerfile |
| GITHUB PR | #2507 |
| Optional Tests | javac javadoc unit shadedjars compile |
| uname | Linux 5fa752a6ad86 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/hbase-personality.sh |
| git revision | master / a8c49a6 |
| Default Java | 1.8.0_232 |
| unit | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2507/1/artifact/yetus-jdk8-hadoop3-check/output/patch-unit-hbase-server.txt |
| Test Results | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2507/1/testReport/ |
| Max. process+thread count | 3177 (vs. ulimit of 30000) |
| modules | C: hbase-server U: hbase-server |
| Console output | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2507/1/console |
| versions | git=2.17.1 maven=(cecedd343002696d0abb50b32b541b8a6ba2883f) |
| Powered by | Apache Yetus 0.11.1 https://yetus.apache.org |

This message was automatically generated.

@Apache-HBase

🎊 +1 overall

| Vote | Subsystem | Runtime | Comment |
|:----:|----------:|--------:|:--------|
| +0 🆗 | reexec | 3m 9s | Docker mode activated. |
| -0 ⚠️ | yetus | 0m 4s | Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck |
| | | | _ Prechecks _ |
| | | | _ master Compile Tests _ |
| +1 💚 | mvninstall | 6m 46s | master passed |
| +1 💚 | compile | 1m 50s | master passed |
| +1 💚 | shadedjars | 10m 23s | branch has no errors when building our shaded downstream artifacts. |
| -0 ⚠️ | javadoc | 0m 48s | hbase-server in master failed. |
| | | | _ Patch Compile Tests _ |
| +1 💚 | mvninstall | 4m 49s | the patch passed |
| +1 💚 | compile | 1m 39s | the patch passed |
| +1 💚 | javac | 1m 39s | the patch passed |
| +1 💚 | shadedjars | 8m 47s | patch has no errors when building our shaded downstream artifacts. |
| -0 ⚠️ | javadoc | 1m 5s | hbase-server in the patch failed. |
| | | | _ Other Tests _ |
| +1 💚 | unit | 204m 46s | hbase-server in the patch passed. |
| | | 246m 11s | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | Client=19.03.13 Server=19.03.13 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2507/1/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile |
| GITHUB PR | #2507 |
| Optional Tests | javac javadoc unit shadedjars compile |
| uname | Linux dbecb2e7a838 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/hbase-personality.sh |
| git revision | master / a8c49a6 |
| Default Java | 2020-01-14 |
| javadoc | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2507/1/artifact/yetus-jdk11-hadoop3-check/output/branch-javadoc-hbase-server.txt |
| javadoc | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2507/1/artifact/yetus-jdk11-hadoop3-check/output/patch-javadoc-hbase-server.txt |
| Test Results | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2507/1/testReport/ |
| Max. process+thread count | 3205 (vs. ulimit of 30000) |
| modules | C: hbase-server U: hbase-server |
| Console output | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2507/1/console |
| versions | git=2.17.1 maven=(cecedd343002696d0abb50b32b541b8a6ba2883f) |
| Powered by | Apache Yetus 0.11.1 https://yetus.apache.org |

This message was automatically generated.

ndimiduk (Member, Author) commented Oct 8, 2020

The failure in TestBucketCache is a known flaky test. Thanks for the review, @saintstack.

ndimiduk merged commit b82d8a5 into apache:master Oct 8, 2020
ndimiduk deleted the 25156-flaky-TestMasterFailover branch October 8, 2020 21:23
ndimiduk added a commit to ndimiduk/hbase that referenced this pull request Oct 8, 2020
…che#2507)

Change the test to wait for evidence that the active master has seen
that the backup master killed by the test has gone away. This is done
before proceeding to validate that the dead backup is correctly
omitted from the ClusterStatus report.

Also, minor fixup to several assertions, using `assertEquals` instead
of `assertTrue(...equals(...))` and correcting expected vs. actual
ordering of assertion arguments.

Signed-off-by: Michael Stack <stack@apache.org>
ndimiduk added a commit that referenced this pull request Oct 8, 2020
Change the test to wait for evidence that the active master has seen
that the backup master killed by the test has gone away. This is done
before proceeding to validate that the dead backup is correctly
omitted from the ClusterStatus report.

Also, minor fixup to several assertions, using `assertEquals` instead
of `assertTrue(...equals(...))` and correcting expected vs. actual
ordering of assertion arguments.

Signed-off-by: Michael Stack <stack@apache.org>
ndimiduk (Member, Author) commented Oct 9, 2020

I saw this test fail again at about the same place on Jenkins. The active master did not acknowledge the backup master's departure within the 30-second wait.

[ERROR] Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 59.294 s <<< FAILURE! - in org.apache.hadoop.hbase.master.TestMasterFailover
[ERROR] org.apache.hadoop.hbase.master.TestMasterFailover.testSimpleMasterFailover  Time elapsed: 35.418 s  <<< FAILURE!
java.lang.AssertionError: Waiting timed out after [30,000] msec
	at org.apache.hadoop.hbase.master.TestMasterFailover.testSimpleMasterFailover(TestMasterFailover.java:131)

clarax pushed a commit to clarax/hbase that referenced this pull request Nov 15, 2020
…che#2507)

Change the test to wait for evidence that the active master has seen
that the backup master killed by the test has gone away. This is done
before proceeding to validate that the dead backup is correctly
omitted from the ClusterStatus report.

Also, minor fixup to several assertions, using `assertEquals` instead
of `assertTrue(...equals(...))` and correcting expected vs. actual
ordering of assertion arguments.

Signed-off-by: Michael Stack <stack@apache.org>
wchevreuil pushed a commit to wchevreuil/hbase that referenced this pull request May 24, 2021
…che#2507)

Change the test to wait for evidence that the active master has seen
that the backup master killed by the test has gone away. This is done
before proceeding to validate that the dead backup is correctly
omitted from the ClusterStatus report.

Also, minor fixup to several assertions, using `assertEquals` instead
of `assertTrue(...equals(...))` and correcting expected vs. actual
ordering of assertion arguments.

Signed-off-by: Michael Stack <stack@apache.org>
(cherry picked from commit e760d9d)

Change-Id: I2816cf18fc169c6e8d7e6b5aaafee40e4682d6fc