
HBASE-23984 [Flakey Tests] TestMasterAbortAndRSGotKilled fails in teardown #1311

Merged
saintstack merged 1 commit into apache:branch-2 from saintstack:HBASE-23984 on Mar 20, 2020

Conversation

@saintstack
Contributor

HBASE-23984 [Flakey Tests] TestMasterAbortAndRSGotKilled fails in teardown

@saintstack
Contributor Author

Updated the changeset comment:



    HBASE-23984 [Flakey Tests] TestMasterAbortAndRSGotKilled fails in teardown

    hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
     Change parameter name and add javadoc to make it more clear what the
     param actually is.

    hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/handler/AssignRegionHandler.java
     Move postOpenDeployTasks so if it fails to talk to the Master -- which
     can happen on cluster shutdown -- then we will do cleanup of state;
     without this the RS can get stuck and won't go down.

    hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/handler/CloseRegionHandler.java
     Add handleException so CRH looks more like UnassignRegionHandler and
     AssignRegionHandler around exception handling. Add a bit of doc on
     why CRH exists.

    hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/handler/UnassignRegionHandler.java
     Right-shift most of the body of process so we can add a finally
     that cleans up rs.getRegionsInTransitionInRS on exception
     (otherwise outstanding entries can stop a RS from going down on
     cluster shutdown)
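
For illustration, a minimal sketch of the try/finally shape the last item describes, modeled with plain JDK types rather than HBase's classes (all names here are stand-ins):

    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;

    class UnassignProcessSketch {
      // Stand-in for rs.getRegionsInTransitionInRS(), keyed by encoded region name.
      private final ConcurrentMap<String, Boolean> regionsInTransition = new ConcurrentHashMap<>();

      void process(String encodedRegionName) {
        try {
          closeRegionAndReportToMaster(encodedRegionName); // can fail during cluster shutdown
        } finally {
          // Deregister even when the close or the Master report throws; a stale entry
          // here is what can keep the RegionServer from going down.
          regionsInTransition.remove(encodedRegionName, Boolean.FALSE);
        }
      }

      private void closeRegionAndReportToMaster(String encodedRegionName) {
        // elided: the actual close and report-to-Master work
      }
    }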

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 1m 25s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+1 💚 hbaseanti 0m 0s Patch does not have any anti-patterns.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
_ branch-2 Compile Tests _
+1 💚 mvninstall 6m 15s branch-2 passed
+1 💚 checkstyle 1m 17s branch-2 passed
+1 💚 spotbugs 2m 11s branch-2 passed
_ Patch Compile Tests _
+1 💚 mvninstall 5m 35s the patch passed
+1 💚 checkstyle 1m 18s hbase-server: The patch generated 0 new + 53 unchanged - 4 fixed = 53 total (was 57)
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
+1 💚 hadoopcheck 17m 44s Patch does not cause any errors with Hadoop 2.8.5 2.9.2 or 3.1.2.
+1 💚 spotbugs 2m 20s the patch passed
_ Other Tests _
+1 💚 asflicense 0m 13s The patch does not generate ASF License warnings.
45m 39s
Subsystem Report/Notes
Docker Client=19.03.8 Server=19.03.8 base: https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1311/1/artifact/yetus-general-check/output/Dockerfile
GITHUB PR #1311
Optional Tests dupname asflicense spotbugs hadoopcheck hbaseanti checkstyle
uname Linux 5f26a9ea0127 4.15.0-74-generic #84-Ubuntu SMP Thu Dec 19 08:06:28 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision branch-2 / ffb2359
Max. process+thread count 83 (vs. ulimit of 10000)
modules C: hbase-server U: hbase-server
Console output https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1311/1/console
versions git=2.17.1 maven=2018-06-17T18:33:14Z) spotbugs=3.1.12
Powered by Apache Yetus 0.11.1 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

💔 -1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 49s Docker mode activated.
-0 ⚠️ yetus 0m 7s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ branch-2 Compile Tests _
+1 💚 mvninstall 6m 36s branch-2 passed
+1 💚 compile 1m 11s branch-2 passed
-1 ❌ shadedjars 0m 10s branch has 7 errors when building our shaded downstream artifacts.
-0 ⚠️ javadoc 0m 42s hbase-server in branch-2 failed.
_ Patch Compile Tests _
+1 💚 mvninstall 6m 17s the patch passed
+1 💚 compile 1m 6s the patch passed
+1 💚 javac 1m 6s the patch passed
-1 ❌ shadedjars 0m 10s patch has 7 errors when building our shaded downstream artifacts.
-0 ⚠️ javadoc 0m 40s hbase-server in the patch failed.
_ Other Tests _
-0 ⚠️ unit 105m 39s hbase-server in the patch failed.
125m 49s
Subsystem Report/Notes
Docker Client=19.03.8 Server=19.03.8 base: https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1311/1/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile
GITHUB PR #1311
Optional Tests javac javadoc unit shadedjars compile
uname Linux eda6b4b0615f 4.15.0-60-generic #67-Ubuntu SMP Thu Aug 22 16:55:30 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision branch-2 / ffb2359
Default Java 2020-01-14
shadedjars https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1311/1/artifact/yetus-jdk11-hadoop3-check/output/branch-shadedjars.txt
javadoc https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1311/1/artifact/yetus-jdk11-hadoop3-check/output/branch-javadoc-hbase-server.txt
shadedjars https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1311/1/artifact/yetus-jdk11-hadoop3-check/output/patch-shadedjars.txt
javadoc https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1311/1/artifact/yetus-jdk11-hadoop3-check/output/patch-javadoc-hbase-server.txt
unit https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1311/1/artifact/yetus-jdk11-hadoop3-check/output/patch-unit-hbase-server.txt
Test Results https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1311/1/testReport/
Max. process+thread count 5729 (vs. ulimit of 10000)
modules C: hbase-server U: hbase-server
Console output https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1311/1/console
versions git=2.17.1 maven=2018-06-17T18:33:14Z)
Powered by Apache Yetus 0.11.1 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

💔 -1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 43s Docker mode activated.
-0 ⚠️ yetus 0m 7s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ branch-2 Compile Tests _
+1 💚 mvninstall 5m 50s branch-2 passed
+1 💚 compile 0m 59s branch-2 passed
+1 💚 shadedjars 4m 50s branch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 35s branch-2 passed
_ Patch Compile Tests _
+1 💚 mvninstall 5m 18s the patch passed
+1 💚 compile 1m 7s the patch passed
+1 💚 javac 1m 7s the patch passed
+1 💚 shadedjars 6m 55s patch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 59s the patch passed
_ Other Tests _
-1 ❌ unit 106m 15s hbase-server in the patch failed.
135m 54s
Subsystem Report/Notes
Docker Client=19.03.8 Server=19.03.8 base: https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1311/1/artifact/yetus-jdk8-hadoop2-check/output/Dockerfile
GITHUB PR #1311
Optional Tests javac javadoc unit shadedjars compile
uname Linux 1e3dc7b68916 4.15.0-60-generic #67-Ubuntu SMP Thu Aug 22 16:55:30 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision branch-2 / ffb2359
Default Java 1.8.0_232
unit https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1311/1/artifact/yetus-jdk8-hadoop2-check/output/patch-unit-hbase-server.txt
Test Results https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1311/1/testReport/
Max. process+thread count 5504 (vs. ulimit of 10000)
modules C: hbase-server U: hbase-server
Console output https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1311/1/console
versions git=2.17.1 maven=2018-06-17T18:33:14Z)
Powered by Apache Yetus 0.11.1 https://yetus.apache.org

This message was automatically generated.

// opening can not be interrupted by a close request any more.
region = HRegion.openHRegion(regionInfo, htd, rs.getWAL(regionInfo), rs.getConfiguration(),
rs, null);
rs.postOpenDeployTasks(new PostOpenDeployContext(region, openProcId, masterSystemTime));
Member

Yikes! Yeah, this seems better here. Good.

Contributor

No...

IIRC, the design here is that postOpenDeployTasks is the PONR (point of no return): if we arrive here, we can not revert, and the only way to address the exception is to abort the region server.

The fact is, if we haven't told the Master anything, it is fine for us to close the region and tell the Master about the failure; but once we have already called the Master with the success message, even if the RPC call fails we do not know whether the other side (the Master) has already received and processed the request, so the only option is to retry forever, and if that can not be done, the only option is to abort ourselves...

Contributor Author

bq. IIRC, the design here is that postOpenDeployTasks is the PONR (point of no return): if we arrive here, we can not revert, and the only way to address the exception is to abort the region server.

Ok. That helps. Let me add the above as a comment, ensure the above happens, and get my fix in.
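
For illustration, a minimal sketch of the retry-or-abort decision just described, with hypothetical method names standing in for the real HBase RPC and abort calls:

    import java.io.IOException;

    class OpenReportSketch {
      void reportOpenedToMaster(String regionName) {
        try {
          // The point of no return: the Master may receive and act on this report
          // even if the RPC appears to fail on our side.
          rpcReportRegionOpened(regionName);
        } catch (IOException e) {
          // Before this call we could still close the region and report the failure.
          // After it we cannot know whether the Master already processed the report,
          // so the only safe options are to retry forever or to abort this RegionServer.
          abort("Failed reporting open of " + regionName + " to Master", e);
        }
      }

      private void rpcReportRegionOpened(String regionName) throws IOException {
        // elided: the actual RPC to the Master
      }

      private void abort(String why, Throwable cause) {
        // elided: RegionServer abort
      }
    }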

* <p>Expects that the close has been registered in the hosting RegionServer before
* submitting this Handler; i.e. <code>rss.getRegionsInTransitionInRS().putIfAbsent(
* this.regionInfo.getEncodedNameAsBytes(), Boolean.FALSE);</code> has been called first.
* In here when done, we do the deregister.</p>
Member

helpful observation.
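
A sketch of the register-before-submit / deregister-when-done contract the javadoc above spells out, modeled with plain JDK types (stand-in names, not HBase's actual classes):

    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    class CloseSubmitSketch {
      // Stand-in for rss.getRegionsInTransitionInRS(), keyed by encoded region name.
      private final ConcurrentMap<String, Boolean> regionsInTransition = new ConcurrentHashMap<>();
      private final ExecutorService executor = Executors.newSingleThreadExecutor();

      void submitClose(String encodedRegionName) {
        // The submitter registers the close before handing off to the handler...
        regionsInTransition.putIfAbsent(encodedRegionName, Boolean.FALSE);
        executor.execute(() -> {
          try {
            // elided: close the region and report the transition to the Master
          } finally {
            // ...and the handler deregisters when it is done, success or failure.
            regionsInTransition.remove(encodedRegionName, Boolean.FALSE);
          }
        });
      }
    }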

}
}

@Override protected void handleException(Throwable t) {
Member

Maybe it's just because I'm new to the *Handler code, but it's not clear to me why one would handle exceptions locally vs. handle them from this handleException method. I guess it's all hooks for operating within the confines of a Runnable off on a thread pool somewhere.

Contributor Author

Yes, inconsistently used. Here trying to keep w/ the herd.
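
A sketch of the general shape under discussion, not the actual HBase EventHandler: the pool just runs run(), and the base class funnels whatever process() throws into the handleException hook.

    // A handler is just a Runnable on a pool; the base class turns anything thrown
    // out of process() into a handleException call.
    abstract class HandlerSketch implements Runnable {
      @Override
      public void run() {
        try {
          process();
        } catch (Throwable t) {
          handleException(t);
        }
      }

      protected abstract void process() throws Exception;

      protected void handleException(Throwable t) {
        // Default: log and carry on; subclasses override to clean up state or abort.
      }
    }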

// the master to split our logs in order to recover the data.
server.abort("Unrecoverable exception while closing region " +
regionInfo.getRegionNameAsString() + ", still finishing close", ioe);
throw new RuntimeException(ioe);
Member

After reading the above comment and seeing you discarded the throwing of this exception, I initially choked. But reading through the actual use of these Handlers in the ExecutorService instance hanging off of HRegionServer and HMaster, I can only conclude that the above throw was only wishful thinking. There's even a comment (emphasis mine):

Start up all services. If any of these threads gets an unhandled exception
then they just die with a logged message. This should be fine because
in general, we do not expect the master to get such unhandled exceptions
as OOMEs; it should be lightly loaded. See what HRegionServer does if
need to install an unexpected exception handler.

The author of the above comment speaks wistfully of what I can only assume is HRegionServer#uncaughtExceptionHandler. However, it doesn't appear that this is threaded down into the executor service, which means this line's throw statement is simply logged and ignored.

So yes, I think removing the throw is the right choice. It removes the false sense of handling this error condition correctly. It's really the abort that protects the content of the memstore.

Also, why is there not a named exception thrown by the memstore when it cannot flush? Seems like a useful point in that data structure's API.
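
A small standalone JDK demo of the general point (not HBase's executor setup): an exception thrown from a task handed to an ExecutorService via submit() is captured in the discarded Future and never reaches any uncaught-exception handler.

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    public class SwallowedThrowDemo {
      public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Runnable task = () -> {
          throw new RuntimeException("nobody sees this unless Future.get() is called");
        };
        pool.submit(task); // the returned Future is dropped, and the exception with it
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        System.out.println("pool finished; the RuntimeException was silently swallowed");
      }
    }

With execute() instead of submit(), the exception would at least reach the thread's uncaught-exception handler, which is presumably what the original comment was hoping for.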

rs.finishRegionProcedure(closeProcId);
LOG.info("Closed {}", regionName);
} finally {
rs.getRegionsInTransitionInRS().remove(encodedNameBytes, Boolean.FALSE);
Member

good.

@saintstack
Contributor Author

Thanks @ndimiduk. Will wait to see if this patch is good by @Apache9.

Contributor

@Apache9 Apache9 left a comment


Let's focus on just making the UT pass here, without changing other code.

I suggest we open a follow-on issue to discuss the abort behavior. To me, the operations in the abort method do not make sense. Maybe we just need to try our best to close the connection to zk to let the master know we are dead, and then just do a System.exit(1). For now we do lots of cleanup work and even want to flush all the regions? This is not an abort, I'd say; it is almost like a graceful shutdown...


// Cache the close region procedure id after report region transition succeed.
rs.finishRegionProcedure(closeProcId);
LOG.info("Closed {}", regionName);
} finally {
Contributor

So this is the actual fix here?

If you really want to do this to let the test pass, I suggest you add the removal in the handleException method, and add a FIXME or TODO comment to say that this is just for making the test pass and should be addressed later.

@saintstack
Contributor Author

bq. Let's focus on just making the UT pass here, without changing other code.

It is not just about the unit test.

bq. I suggest we open a follow-on issue to discuss the abort behavior.

You are welcome to. I'm currently just interested in landing a fix for cluster shutdown/RS aborts and concurrent assign/unassigns, which cause flakey test failures and hangs in the wild.

bq. To me, the operations in the abort method do not make sense. Maybe we just need to try our best to close the connection to zk to let the master know we are dead, and then just do a System.exit(1). For now we do lots of cleanup work and even want to flush all the regions? This is not an abort, I'd say; it is almost like a graceful shutdown...

For new issue.

@saintstack
Contributor Author

bq. If you really want to do this to let the test pass, I suggest you add the removal in the handleException method, and add a FIXME or TODO comment to say that this is just for making the test pass and should be addressed later.

I can move it to handleException, np. I will NOT note that it is a UT fix only. There is an obvious hole here that holds up shutdowns, and shutdowns are not UT-only.

These Handlers strike me as arbitrary as regards where stuff goes; no wonder there are holes.

Let me put up another patch w/ your suggestions.

@saintstack
Contributor Author

New push. Enjoys the benefit of @Apache9's feedback. The main change is restoring these Handlers to how they were (but w/ the PONR comment added) and then, in handleException, just removing the entry from the RS RIT map just before the call to abort. Gets me what I want and leaves the rest of the code as it was.
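
For illustration, a sketch of that shape with stand-in JDK types and hypothetical names, not the actual handler code:

    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;

    class AssignHandlerSketch {
      // Stand-in for rs.getRegionsInTransitionInRS().
      private final ConcurrentMap<String, Boolean> regionsInTransition = new ConcurrentHashMap<>();
      private final String encodedRegionName;

      AssignHandlerSketch(String encodedRegionName) {
        this.encodedRegionName = encodedRegionName;
      }

      protected void handleException(Throwable t) {
        // Drop the regions-in-transition entry first, so a failed open cannot leave
        // a stale entry behind that blocks RegionServer shutdown...
        regionsInTransition.remove(encodedRegionName);
        // ...then abort, per the PONR reasoning above.
        abort("Failed open of " + encodedRegionName, t);
      }

      private void abort(String why, Throwable cause) {
        // elided: RegionServer abort
      }
    }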

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 1m 56s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+1 💚 hbaseanti 0m 0s Patch does not have any anti-patterns.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
_ branch-2 Compile Tests _
+1 💚 mvninstall 7m 49s branch-2 passed
+1 💚 checkstyle 1m 26s branch-2 passed
+1 💚 spotbugs 2m 40s branch-2 passed
_ Patch Compile Tests _
+1 💚 mvninstall 6m 48s the patch passed
+1 💚 checkstyle 1m 27s hbase-server: The patch generated 0 new + 53 unchanged - 4 fixed = 53 total (was 57)
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
+1 💚 hadoopcheck 22m 19s Patch does not cause any errors with Hadoop 2.8.5 2.9.2 or 3.1.2.
+1 💚 spotbugs 2m 27s the patch passed
_ Other Tests _
+1 💚 asflicense 0m 13s The patch does not generate ASF License warnings.
56m 6s
Subsystem Report/Notes
Docker Client=19.03.8 Server=19.03.8 base: https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1311/2/artifact/yetus-general-check/output/Dockerfile
GITHUB PR #1311
Optional Tests dupname asflicense spotbugs hadoopcheck hbaseanti checkstyle
uname Linux c521f1c30def 4.15.0-74-generic #84-Ubuntu SMP Thu Dec 19 08:06:28 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision branch-2 / ffb2359
Max. process+thread count 83 (vs. ulimit of 10000)
modules C: hbase-server U: hbase-server
Console output https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1311/2/console
versions git=2.17.1 maven=2018-06-17T18:33:14Z) spotbugs=3.1.12
Powered by Apache Yetus 0.11.1 https://yetus.apache.org

This message was automatically generated.

Contributor

@Apache9 Apache9 left a comment


+1 for now.

Let's open another issue to address the shutdown issue.

// Done! Region is closed on this RS
this.rsServices.getRegionsInTransitionInRS().
remove(this.regionInfo.getEncodedNameAsBytes(), Boolean.FALSE);
LOG.debug("Closed " + region.getRegionInfo().getRegionNameAsString());
Contributor

LOG.debug("Closed {}", region.getRegionInfo().getRegionNameAsString());

@Apache-HBase

💔 -1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 37s Docker mode activated.
-0 ⚠️ yetus 0m 6s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ branch-2 Compile Tests _
+1 💚 mvninstall 6m 30s branch-2 passed
+1 💚 compile 1m 7s branch-2 passed
-1 ❌ shadedjars 0m 12s branch has 7 errors when building our shaded downstream artifacts.
-0 ⚠️ javadoc 0m 41s hbase-server in branch-2 failed.
_ Patch Compile Tests _
+1 💚 mvninstall 6m 11s the patch passed
+1 💚 compile 1m 6s the patch passed
+1 💚 javac 1m 6s the patch passed
-1 ❌ shadedjars 0m 11s patch has 7 errors when building our shaded downstream artifacts.
-0 ⚠️ javadoc 0m 37s hbase-server in the patch failed.
_ Other Tests _
-0 ⚠️ unit 68m 26s hbase-server in the patch failed.
87m 43s
Subsystem Report/Notes
Docker Client=19.03.8 Server=19.03.8 base: https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1311/2/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile
GITHUB PR #1311
Optional Tests javac javadoc unit shadedjars compile
uname Linux 18256abb643f 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision branch-2 / ffb2359
Default Java 2020-01-14
shadedjars https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1311/2/artifact/yetus-jdk11-hadoop3-check/output/branch-shadedjars.txt
javadoc https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1311/2/artifact/yetus-jdk11-hadoop3-check/output/branch-javadoc-hbase-server.txt
shadedjars https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1311/2/artifact/yetus-jdk11-hadoop3-check/output/patch-shadedjars.txt
javadoc https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1311/2/artifact/yetus-jdk11-hadoop3-check/output/patch-javadoc-hbase-server.txt
unit https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1311/2/artifact/yetus-jdk11-hadoop3-check/output/patch-unit-hbase-server.txt
Test Results https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1311/2/testReport/
Max. process+thread count 5611 (vs. ulimit of 10000)
modules C: hbase-server U: hbase-server
Console output https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1311/2/console
versions git=2.17.1 maven=2018-06-17T18:33:14Z)
Powered by Apache Yetus 0.11.1 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

💔 -1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 50s Docker mode activated.
-0 ⚠️ yetus 0m 7s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ branch-2 Compile Tests _
+1 💚 mvninstall 6m 6s branch-2 passed
+1 💚 compile 1m 1s branch-2 passed
+1 💚 shadedjars 4m 39s branch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 37s branch-2 passed
_ Patch Compile Tests _
+1 💚 mvninstall 5m 17s the patch passed
+1 💚 compile 0m 58s the patch passed
+1 💚 javac 0m 58s the patch passed
+1 💚 shadedjars 4m 27s patch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 34s the patch passed
_ Other Tests _
-1 ❌ unit 67m 34s hbase-server in the patch failed.
94m 32s
Subsystem Report/Notes
Docker Client=19.03.8 Server=19.03.8 base: https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1311/2/artifact/yetus-jdk8-hadoop2-check/output/Dockerfile
GITHUB PR #1311
Optional Tests javac javadoc unit shadedjars compile
uname Linux 5de009b95d02 4.15.0-60-generic #67-Ubuntu SMP Thu Aug 22 16:55:30 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision branch-2 / ffb2359
Default Java 1.8.0_232
unit https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1311/2/artifact/yetus-jdk8-hadoop2-check/output/patch-unit-hbase-server.txt
Test Results https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1311/2/testReport/
Max. process+thread count 5609 (vs. ulimit of 10000)
modules C: hbase-server U: hbase-server
Console output https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1311/2/console
versions git=2.17.1 maven=2018-06-17T18:33:14Z)
Powered by Apache Yetus 0.11.1 https://yetus.apache.org

This message was automatically generated.

HBASE-23984 [Flakey Tests] TestMasterAbortAndRSGotKilled fails in teardown

@saintstack
Contributor Author

One of the test failures -- a perversion around Region handling in TestRegionObserverInterface -- exposed an issue w/ the CloseRegionHandler refactor trying to make it look like the other handlers around regionsInTransitionInRS handling. Fixed (and fixed the issue @Apache9 noted above). Added region name logging to this journal stuff -- otherwise it's just opaque... that's just a log change.

@saintstack
Contributor Author

Thanks for the review @Apache9. I'd filed HBASE-24015 a few days ago because it seemed plain this issue had opened a can of worms -- and that was before you showed up. You want to go more radical than the scope of HBASE-24015, so I made HBASE-24026 for the shutdown redo. Thanks.

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 1m 41s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+1 💚 hbaseanti 0m 0s Patch does not have any anti-patterns.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
_ branch-2 Compile Tests _
+1 💚 mvninstall 6m 18s branch-2 passed
+1 💚 checkstyle 1m 21s branch-2 passed
+1 💚 spotbugs 2m 9s branch-2 passed
_ Patch Compile Tests _
+1 💚 mvninstall 5m 43s the patch passed
+1 💚 checkstyle 1m 23s hbase-server: The patch generated 0 new + 259 unchanged - 4 fixed = 259 total (was 263)
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
+1 💚 hadoopcheck 11m 54s Patch does not cause any errors with Hadoop 2.10.0 or 3.1.2.
+1 💚 spotbugs 2m 18s the patch passed
_ Other Tests _
+1 💚 asflicense 0m 14s The patch does not generate ASF License warnings.
40m 14s
Subsystem Report/Notes
Docker Client=19.03.8 Server=19.03.8 base: https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1311/3/artifact/yetus-general-check/output/Dockerfile
GITHUB PR #1311
Optional Tests dupname asflicense spotbugs hadoopcheck hbaseanti checkstyle
uname Linux 0053669810d7 4.15.0-74-generic #84-Ubuntu SMP Thu Dec 19 08:06:28 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision branch-2 / 8320f73
Max. process+thread count 83 (vs. ulimit of 10000)
modules C: hbase-server U: hbase-server
Console output https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1311/3/console
versions git=2.17.1 maven=2018-06-17T18:33:14Z) spotbugs=3.1.12
Powered by Apache Yetus 0.11.1 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

💔 -1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 1m 23s Docker mode activated.
-0 ⚠️ yetus 0m 8s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ branch-2 Compile Tests _
+1 💚 mvninstall 10m 43s branch-2 passed
+1 💚 compile 1m 51s branch-2 passed
-1 ❌ shadedjars 0m 13s branch has 7 errors when building our shaded downstream artifacts.
-0 ⚠️ javadoc 1m 2s hbase-server in branch-2 failed.
_ Patch Compile Tests _
+1 💚 mvninstall 9m 49s the patch passed
+1 💚 compile 1m 46s the patch passed
+1 💚 javac 1m 46s the patch passed
-1 ❌ shadedjars 0m 24s patch has 7 errors when building our shaded downstream artifacts.
-0 ⚠️ javadoc 1m 9s hbase-server in the patch failed.
_ Other Tests _
+1 💚 unit 82m 49s hbase-server in the patch passed.
113m 27s
Subsystem Report/Notes
Docker Client=19.03.8 Server=19.03.8 base: https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1311/3/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile
GITHUB PR #1311
Optional Tests javac javadoc unit shadedjars compile
uname Linux 9bf12a52404a 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision branch-2 / 8320f73
Default Java 2020-01-14
shadedjars https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1311/3/artifact/yetus-jdk11-hadoop3-check/output/branch-shadedjars.txt
javadoc https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1311/3/artifact/yetus-jdk11-hadoop3-check/output/branch-javadoc-hbase-server.txt
shadedjars https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1311/3/artifact/yetus-jdk11-hadoop3-check/output/patch-shadedjars.txt
javadoc https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1311/3/artifact/yetus-jdk11-hadoop3-check/output/patch-javadoc-hbase-server.txt
Test Results https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1311/3/testReport/
Max. process+thread count 5634 (vs. ulimit of 10000)
modules C: hbase-server U: hbase-server
Console output https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1311/3/console
versions git=2.17.1 maven=2018-06-17T18:33:14Z)
Powered by Apache Yetus 0.11.1 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

💔 -1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 1m 25s Docker mode activated.
-0 ⚠️ yetus 0m 6s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ branch-2 Compile Tests _
+1 💚 mvninstall 6m 8s branch-2 passed
+1 💚 compile 1m 1s branch-2 passed
+1 💚 shadedjars 4m 57s branch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 38s branch-2 passed
_ Patch Compile Tests _
+1 💚 mvninstall 5m 44s the patch passed
+1 💚 compile 1m 0s the patch passed
+1 💚 javac 1m 0s the patch passed
+1 💚 shadedjars 4m 58s patch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 35s the patch passed
_ Other Tests _
-1 ❌ unit 101m 0s hbase-server in the patch failed.
129m 6s
Subsystem Report/Notes
Docker Client=19.03.8 Server=19.03.8 base: https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1311/3/artifact/yetus-jdk8-hadoop2-check/output/Dockerfile
GITHUB PR #1311
Optional Tests javac javadoc unit shadedjars compile
uname Linux 3e382637ced1 4.15.0-74-generic #84-Ubuntu SMP Thu Dec 19 08:06:28 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision branch-2 / 8320f73
Default Java 1.8.0_232
unit https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1311/3/artifact/yetus-jdk8-hadoop2-check/output/patch-unit-hbase-server.txt
Test Results https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1311/3/testReport/
Max. process+thread count 4172 (vs. ulimit of 10000)
modules C: hbase-server U: hbase-server
Console output https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1311/3/console
versions git=2.17.1 maven=2018-06-17T18:33:14Z)
Powered by Apache Yetus 0.11.1 https://yetus.apache.org

This message was automatically generated.

@saintstack
Contributor Author

The test report shows no failures. The test output got dropped because of the error below. I've been running the tests locally and they seem fine. Will merge and keep an eye on it.

Post stage
[Pipeline] junit
[2020-03-20T21:24:40.217Z] Recording test results
[2020-03-20T21:24:43.747Z] Remote call on H1 failed
Error when executing always post condition:
java.io.IOException: Remote call on H1 failed
at hudson.remoting.Channel.call(Channel.java:963)
at hudson.FilePath.act(FilePath.java:1072)
at hudson.FilePath.act(FilePath.java:1061)
at hudson.tasks.junit.JUnitParser.parseResult(JUnitParser.java:114)
at hudson.tasks.junit.JUnitResultArchiver.parse(JUnitResultArchiver.java:137)
at hudson.tasks.junit.JUnitResultArchiver.parseAndAttach(JUnitResultArchiver.java:167)
at hudson.tasks.junit.pipeline.JUnitResultsStepExecution.run(JUnitResultsStepExecution.java:52)
at hudson.tasks.junit.pipeline.JUnitResultsStepExecution.run(JUnitResultsStepExecution.java:25)
at org.jenkinsci.plugins.workflow.steps.SynchronousNonBlockingStepExecution.lambda$start$0(SynchronousNonBlockingStepExecution.java:47)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NoClassDefFoundError: Could not initialize class jenkins.model.Jenkins
at hudson.ExtensionList.lookup(ExtensionList.java:433)
at hudson.tasks.junit.TestNameTransformer.all(TestNameTransformer.java:40)
at hudson.tasks.junit.TestNameTransformer.getTransformedName(TestNameTransformer.java:33)
at hudson.tasks.junit.CaseResult.getTransformedTestName(CaseResult.java:273)
at hudson.tasks.junit.SuiteResult.casesByName(SuiteResult.java:134)
at hudson.tasks.junit.SuiteResult.addCase(SuiteResult.java:297)
at hudson.tasks.junit.SuiteResult.(SuiteResult.java:270)
at hudson.tasks.junit.SuiteResult.parseSuite(SuiteResult.java:209)
at hudson.tasks.junit.SuiteResult.parse(SuiteResult.java:181)
at hudson.tasks.junit.TestResult.parse(TestResult.java:348)
at hudson.tasks.junit.TestResult.parsePossiblyEmpty(TestResult.java:281)
at hudson.tasks.junit.TestResult.parse(TestResult.java:206)
at hudson.tasks.junit.TestResult.parse(TestResult.java:178)
at hudson.tasks.junit.TestResult.(TestResult.java:143)
at hudson.tasks.junit.JUnitParser$ParseResultCallable.invoke(JUnitParser.java:146)
at hudson.tasks.junit.JUnitParser$ParseResultCallable.invoke(JUnitParser.java:118)
at hudson.FilePath$FileCallableWrapper.call(FilePath.java:3052)
at hudson.remoting.UserRequest.perform(UserRequest.java:212)
at hudson.remoting.UserRequest.perform(UserRequest.java:54)
at hudson.remoting.Request$2.run(Request.java:369)
at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72)
... 4 more
Suppressed: hudson.remoting.Channel$CallSiteStackTrace: Remote call to H1
at hudson.remoting.Channel.attachCallSiteStackTrace(Channel.java:1743)
at hudson.remoting.UserRequest$ExceptionResponse.retrieve(UserRequest.java:357)
at hudson.remoting.Channel.call(Channel.java:957)
at hudson.FilePath.act(FilePath.java:1072)
at hudson.FilePath.act(FilePath.java:1061)
at hudson.tasks.junit.JUnitParser.parseResult(JUnitParser.java:114)
at hudson.tasks.junit.JUnitResultArchiver.parse(JUnitResultArchiver.java:137)
at hudson.tasks.junit.JUnitResultArchiver.parseAndAttach(JUnitResultArchiver.java:167)
at hudson.tasks.junit.pipeline.JUnitResultsStepExecution.run(JUnitResultsStepExecution.java:52)
at hudson.tasks.junit.pipeline.JUnitResultsStepExecution.run(JUnitResultsStepExecution.java:25)
at org.jenkinsci.plugins.workflow.steps.SynchronousNonBlockingStepExecution.lambda$start$0(SynchronousNonBlockingStepExecution.java:47)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
... 4 more

saintstack merged commit 392bce0 into apache:branch-2 on Mar 20, 2020
asfgit pushed a commit that referenced this pull request Mar 20, 2020
HBASE-23984 [Flakey Tests] TestMasterAbortAndRSGotKilled fails in teardown (#1311)

Signed-off-by: Nick Dimiduk <ndimiduk@apache.org>
Signed-off-by: Duo Zhang <zhangduo@apache.org>
asfgit pushed a commit that referenced this pull request Mar 20, 2020
HBASE-23984 [Flakey Tests] TestMasterAbortAndRSGotKilled fails in teardown (#1311)
thangTang pushed a commit to thangTang/hbase that referenced this pull request Apr 16, 2020
HBASE-23984 [Flakey Tests] TestMasterAbortAndRSGotKilled fails in teardown (apache#1311)
thangTang pushed a commit to thangTang/hbase that referenced this pull request Apr 16, 2020
HBASE-23984 [Flakey Tests] TestMasterAbortAndRSGotKilled fails in teardown (apache#1311)
infraio pushed a commit to infraio/hbase that referenced this pull request Aug 17, 2020
HBASE-23984 [Flakey Tests] TestMasterAbortAndRSGotKilled fails in teardown (apache#1311)