Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HBASE-24885 STUCK RIT by hbck2 assigns #2989

Merged
merged 2 commits into from Mar 9, 2021

Conversation

symat
Copy link
Contributor

@symat symat commented Feb 25, 2021

This PR is about to backport HBASE-24885 to branch-2.2.
The backport jira ticket: HBASE-25606
I put the original commit message below.
The cherry-pick was based on the commit on branch-2.3. It was quite clean, only a few trivial conflicts to resolve.
(cherry picked from commit c7e31f7)


Adds region state check on hbck2 assigns/unassigns. Returns pid of -1
if in inappropriate state with logging explaination which suggests
passing override if operator wants to assign/unassign anyways. Here
is an example of what happens now if hbck2 tries an unassign and
Region already unassigned:

2020-08-19 11:22:06,926 INFO [RpcServer.default.FPBQ.Fifo.handler=1,queue=0,port=50086] assignment.AssignmentManager(820): Failed {ENCODED => d1112e553991e938b6852f87774c91ee, NAME => 'TestHbck,zzzzz,1597861310769.d1112e553991e938b6852f87774c91ee.', STARTKEY => 'zzzzz', ENDKEY => ''} unassign, override=false; set override to by-pass state checks.
org.apache.hadoop.hbase.client.DoNotRetryRegionException: Unexpected state for state=CLOSED, location=null, table=TestHbck, region=d1112e553991e938b6852f87774c91ee
at org.apache.hadoop.hbase.master.assignment.AssignmentManager.preTransitCheck(AssignmentManager.java:583)
at org.apache.hadoop.hbase.master.assignment.AssignmentManager.createOneUnassignProcedure(AssignmentManager.java:812)
at org.apache.hadoop.hbase.master.MasterRpcServices.unassigns(MasterRpcServices.java:2616)
at org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$HbckService$2.callBlockingMethod(MasterProtos.java)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:397)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:133)
at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:338)
at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:318)

Previous it would just create the unassign anyways. Now must pass override
to queue the procedure regardless. Safer.

hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterRpcServices.java
javadoc on assigns/unassigns. Minor refactor in assigns/unassigns to cater to
case where procedure may come back null (if override not set and fails state checks).

hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignmentManager.java
checkstyle cleanups.
Clarifying javadoc on how there is no state checking when bulk assigns creating/enabling
tables.

createOneAssignProcedure and createOneUnassignProcedure now handle exceptions which now
can be thrown if no override and region state is not appropriate.

Aggregation of createAssignProcedure and createUnassignProcedure instances adding in
region state check invoked if override is NOT set.

hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/RegionStateNode.java
Change to setProcedure so it returns passed proc as result instead of void

Signed-off-by: Duo Zhang zhangduo@apache.org

Adds region state check on hbck2 assigns/unassigns. Returns pid of -1
if in inappropriate state with logging explaination which suggests
passing override if operator wants to assign/unassign anyways. Here
is an example of what happens now if hbck2 tries an unassign and
Region already unassigned:

  2020-08-19 11:22:06,926 INFO  [RpcServer.default.FPBQ.Fifo.handler=1,queue=0,port=50086] assignment.AssignmentManager(820): Failed {ENCODED => d1112e553991e938b6852f87774c91ee, NAME => 'TestHbck,zzzzz,1597861310769.d1112e553991e938b6852f87774c91ee.', STARTKEY => 'zzzzz', ENDKEY => ''} unassign, override=false; set override to by-pass state checks.
  org.apache.hadoop.hbase.client.DoNotRetryRegionException: Unexpected state for state=CLOSED, location=null, table=TestHbck, region=d1112e553991e938b6852f87774c91ee
          at org.apache.hadoop.hbase.master.assignment.AssignmentManager.preTransitCheck(AssignmentManager.java:583)
          at org.apache.hadoop.hbase.master.assignment.AssignmentManager.createOneUnassignProcedure(AssignmentManager.java:812)
          at org.apache.hadoop.hbase.master.MasterRpcServices.unassigns(MasterRpcServices.java:2616)
          at org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$HbckService$2.callBlockingMethod(MasterProtos.java)
          at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:397)
          at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:133)
          at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:338)
          at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:318)

Previous it would just create the unassign anyways. Now must pass override
to queue the procedure regardless. Safer.

hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterRpcServices.java
 javadoc on assigns/unassigns. Minor refactor in assigns/unassigns to cater to
 case where procedure may come back null (if override not set and fails state checks).

hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignmentManager.java
 checkstyle cleanups.
 Clarifying javadoc on how there is no state checking when bulk assigns creating/enabling
 tables.

 createOneAssignProcedure and createOneUnassignProcedure now handle exceptions which now
 can be thrown if no override and region state is not appropriate.

 Aggregation of createAssignProcedure and createUnassignProcedure instances adding in
 region state check invoked if override is NOT set.

hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/RegionStateNode.java
 Change to setProcedure so it returns passed proc as result instead of void

Signed-off-by: Duo Zhang <zhangduo@apache.org>
(cherry picked from commit c7e31f7)
@symat
Copy link
Contributor Author

symat commented Feb 25, 2021

the original PR: #2283

Copy link
Contributor

@wchevreuil wchevreuil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I guess the slight change in behaviour not an issue to backport it to branch-2.2?

@Apache-HBase
Copy link

💔 -1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 1m 31s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+1 💚 hbaseanti 0m 0s Patch does not have any anti-patterns.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 1 new or modified test files.
_ branch-2.2 Compile Tests _
+1 💚 mvninstall 3m 43s branch-2.2 passed
+1 💚 compile 0m 58s branch-2.2 passed
+1 💚 checkstyle 1m 31s branch-2.2 passed
+1 💚 shadedjars 4m 29s branch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 44s branch-2.2 passed
+0 🆗 spotbugs 4m 18s Used deprecated FindBugs config; considering switching to SpotBugs.
+1 💚 findbugs 4m 16s branch-2.2 passed
_ Patch Compile Tests _
+1 💚 mvninstall 3m 28s the patch passed
+1 💚 compile 1m 1s the patch passed
+1 💚 javac 1m 1s the patch passed
-1 ❌ checkstyle 1m 32s hbase-server: The patch generated 1 new + 14 unchanged - 6 fixed = 15 total (was 20)
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
+1 💚 shadedjars 4m 26s patch has no errors when building our shaded downstream artifacts.
+1 💚 hadoopcheck 33m 14s Patch does not cause any errors with Hadoop 2.8.5 2.9.2 2.10.0 or 3.1.2 3.2.1.
+1 💚 javadoc 0m 44s the patch passed
+1 💚 findbugs 4m 24s the patch passed
_ Other Tests _
+1 💚 unit 133m 30s hbase-server in the patch passed.
+1 💚 asflicense 0m 31s The patch does not generate ASF License warnings.
209m 7s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2989/1/artifact/out/Dockerfile
GITHUB PR #2989
Optional Tests dupname asflicense javac javadoc unit spotbugs findbugs shadedjars hadoopcheck hbaseanti checkstyle compile
uname Linux fb65d5706c63 4.15.0-101-generic #102-Ubuntu SMP Mon May 11 10:07:26 UTC 2020 x86_64 GNU/Linux
Build tool maven
Personality /home/jenkins/jenkins-home/workspace/Base-PreCommit-GitHub-PR_PR-2989/out/precommit/personality/provided.sh
git revision branch-2.2 / a43fd04
Default Java Oracle Corporation-1.8.0_282-b08
checkstyle https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2989/1/artifact/out/diff-checkstyle-hbase-server.txt
Test Results https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2989/1/testReport/
Max. process+thread count 2555 (vs. ulimit of 12500)
modules C: hbase-server U: hbase-server
Console output https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2989/1/console
versions git=2.20.1 maven=3.6.3 findbugs=3.1.11
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@symat
Copy link
Contributor Author

symat commented Feb 26, 2021

@wchevreuil thanks for the quick review!

LGTM. I guess the slight change in behaviour not an issue to backport it to branch-2.2?

this change is only introducing extra checks on the master Rpc API called by hbck2. Even if there would be difference inthe assignment procedure between 2.2 and 2.3, these checks happen before the procedure would be even triggered. I tested the change on a 2.2 cluster, and after applying this change:

  • assign/unassign of regions both in OPEN or CLOSED state works as expected using the HBase shell
  • HBCK2 command 'assigns' on an OPEN region doesn't succeed and doesn't screw up anything in HBase master. Using the --override flag will bypass the check and will result stuck RIT and also OPENING state in the meta.
  • HBCK2 command 'assigns' on a CLOSED region works as expected and assigns the region
  • HBCK2 command 'unassigns' on an OPEN region works as expected and unassigns the region
  • HBCK2 command 'unassigns' on an CLOSED region doesn't succeed. You need to providee the --override flag to bypass the checks. But actually in this case it is harmless (AFAICS HBsae master state won't change after the unassign procedure on a CLOSED region)

p.s. I'll check the checkstyle problems soon

@Apache-HBase
Copy link

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 1m 12s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+1 💚 hbaseanti 0m 0s Patch does not have any anti-patterns.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 1 new or modified test files.
_ branch-2.2 Compile Tests _
+1 💚 mvninstall 3m 31s branch-2.2 passed
+1 💚 compile 0m 56s branch-2.2 passed
+1 💚 checkstyle 1m 29s branch-2.2 passed
+1 💚 shadedjars 4m 20s branch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 37s branch-2.2 passed
+0 🆗 spotbugs 3m 38s Used deprecated FindBugs config; considering switching to SpotBugs.
+1 💚 findbugs 3m 35s branch-2.2 passed
_ Patch Compile Tests _
+1 💚 mvninstall 3m 9s the patch passed
+1 💚 compile 0m 55s the patch passed
+1 💚 javac 0m 55s the patch passed
+1 💚 checkstyle 1m 29s hbase-server: The patch generated 0 new + 14 unchanged - 6 fixed = 14 total (was 20)
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
+1 💚 shadedjars 4m 16s patch has no errors when building our shaded downstream artifacts.
+1 💚 hadoopcheck 28m 4s Patch does not cause any errors with Hadoop 2.8.5 2.9.2 2.10.0 or 3.1.2 3.2.1.
+1 💚 javadoc 0m 35s the patch passed
+1 💚 findbugs 3m 40s the patch passed
_ Other Tests _
+1 💚 unit 117m 21s hbase-server in the patch passed.
+1 💚 asflicense 0m 27s The patch does not generate ASF License warnings.
182m 38s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2989/2/artifact/out/Dockerfile
GITHUB PR #2989
Optional Tests dupname asflicense javac javadoc unit spotbugs findbugs shadedjars hadoopcheck hbaseanti checkstyle compile
uname Linux 12c4991cfe64 4.15.0-126-generic #129-Ubuntu SMP Mon Nov 23 18:53:38 UTC 2020 x86_64 GNU/Linux
Build tool maven
Personality /home/jenkins/jenkins-agent/workspace/Base-PreCommit-GitHub-PR_PR-2989/out/precommit/personality/provided.sh
git revision branch-2.2 / b08c133
Default Java Oracle Corporation-1.8.0_282-b08
Test Results https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2989/2/testReport/
Max. process+thread count 2842 (vs. ulimit of 12500)
modules C: hbase-server U: hbase-server
Console output https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2989/2/console
versions git=2.20.1 maven=3.6.3 findbugs=3.1.11
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@wchevreuil wchevreuil merged commit ae2bbfe into apache:branch-2.2 Mar 9, 2021
wchevreuil pushed a commit to wchevreuil/hbase that referenced this pull request May 24, 2021
Signed-off-by: Wellington Chevreuil <wchevreuil@apache.org>
(cherry picked from commit ae2bbfe)
Change-Id: I8a920c021843fbb2d369ca7d1c112bf131c51cd3
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants