HBASE-25297 [HBCK2] Regenerate missing table descriptors by hbck2 #79

Merged
merged 2 commits into from
Dec 21, 2020

symat
Contributor

@symat symat commented Dec 15, 2020

We ran into a situation where many ServerCrashProcedures were blocked because their AssignProcedures were stuck on missing .tabledesc files:

java.io.IOException: Missing table descriptor for dd0676f57bdbff5d04ab735b7daf5e9b

In our case it was acceptable to get rid of these tables, so we used setRegionState to move all of their regions to the FAILED_OPEN state, then disabled and dropped the tables. But this took a lot of time, and we might not always have the option to drop these tables.

HBCK1 had a feature (fixTableOrphans) that regenerated table descriptors from the master's in-memory cache or, failing that, from the table directory structure on HDFS. In this patch I implemented the same logic for HBCK2. I created tests for the new feature and also tested it with a real HBase 2.6.1 cluster.
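To make the HDFS fallback more concrete, here is a minimal sketch of how column family names could be derived from a table directory when the master has no cached descriptor. It assumes the usual layout tableDir/regionDir/familyDir, uses java.nio.file as a stand-in for the Hadoop FileSystem API, and treats dot-prefixed entries (such as .tabledesc, .tmp or .regioninfo) and recovered.edits as non-family entries. FamilyDirScanner and collectFamilies are hypothetical names, not part of HBCK2.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Set;
import java.util.TreeSet;

/**
 * Hypothetical sketch (not HBCK2 code): collect candidate column family names
 * from an on-disk table directory laid out as tableDir/regionDir/familyDir.
 */
public final class FamilyDirScanner {

  private FamilyDirScanner() {
  }

  /** Assumption: dot-prefixed entries and recovered.edits are not families. */
  private static boolean isIgnored(Path p) {
    String name = p.getFileName().toString();
    return name.startsWith(".") || name.equals("recovered.edits");
  }

  /** Returns the sorted union of family directory names across all regions. */
  public static Set<String> collectFamilies(Path tableDir) throws IOException {
    if (!Files.isDirectory(tableDir)) {
      // Like the check in this PR: refuse to continue without a table folder.
      throw new IllegalStateException("Table folder does not exist: " + tableDir);
    }
    Set<String> families = new TreeSet<>();
    try (DirectoryStream<Path> regions = Files.newDirectoryStream(tableDir)) {
      for (Path region : regions) {
        // Skip plain files and hidden directories such as .tabledesc or .tmp.
        if (!Files.isDirectory(region) || isIgnored(region)) {
          continue;
        }
        try (DirectoryStream<Path> children = Files.newDirectoryStream(region)) {
          for (Path child : children) {
            // Each remaining sub-directory of a region is treated as a family.
            if (Files.isDirectory(child) && !isIgnored(child)) {
              families.add(child.getFileName().toString());
            }
          }
        }
      }
    }
    return families;
  }
}
```

A directory scan can only recover family names, so a descriptor built this way would presumably fall back to default table and family settings; the master's cached copy, when available, is the better source.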

@Apache-HBase

💔 -1 overall

| Vote | Subsystem | Runtime | Comment |
|:----:|----------:|--------:|:--------|
| +0 🆗 | reexec | 1m 13s | Docker mode activated. |
| | | | _ Prechecks _ |
| +1 💚 | dupname | 0m 0s | No case conflicting files found. |
| +0 🆗 | markdownlint | 0m 0s | markdownlint was not available. |
| +0 🆗 | spotbugs | 0m 0s | spotbugs executables are not available. |
| +1 💚 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 💚 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
| | | | _ master Compile Tests _ |
| +1 💚 | mvninstall | 1m 6s | master passed |
| +1 💚 | compile | 0m 9s | master passed |
| +1 💚 | checkstyle | 0m 7s | master passed |
| +1 💚 | javadoc | 0m 7s | master passed |
| | | | _ Patch Compile Tests _ |
| +1 💚 | mvninstall | 0m 11s | the patch passed |
| +1 💚 | compile | 0m 9s | the patch passed |
| +1 💚 | javac | 0m 9s | the patch passed |
| -1 ❌ | checkstyle | 0m 4s | hbase-hbck2: The patch generated 1 new + 2 unchanged - 0 fixed = 3 total (was 2) |
| -1 ❌ | whitespace | 0m 0s | The patch has 7 line(s) that end in whitespace. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply |
| +1 💚 | javadoc | 0m 6s | the patch passed |
| | | | _ Other Tests _ |
| -1 ❌ | unit | 6m 41s | hbase-hbck2 in the patch failed. |
| +1 💚 | asflicense | 0m 5s | The patch does not generate ASF License warnings. |
| | | 10m 7s | |

| Reason | Tests |
|-------:|:------|
| Failed junit tests | hbase.TestHBCK2 |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-Operator-Tools-PreCommit/job/PR-79/1/artifact/yetus-precommit-check/output/Dockerfile |
| GITHUB PR | #79 |
| Optional Tests | dupname asflicense markdownlint javac javadoc unit spotbugs findbugs checkstyle compile |
| uname | Linux afd8694830ae 5.4.0-1025-aws #25~18.04.1-Ubuntu SMP Fri Sep 11 12:03:04 UTC 2020 x86_64 GNU/Linux |
| Build tool | maven |
| git revision | master / 5cdc0e2 |
| Default Java | Oracle Corporation-1.8.0_275-b01 |
| checkstyle | https://ci-hadoop.apache.org/job/HBase/job/HBase-Operator-Tools-PreCommit/job/PR-79/1/artifact/yetus-precommit-check/output/diff-checkstyle-hbase-hbck2.txt |
| whitespace | https://ci-hadoop.apache.org/job/HBase/job/HBase-Operator-Tools-PreCommit/job/PR-79/1/artifact/yetus-precommit-check/output/whitespace-eol.txt |
| unit | https://ci-hadoop.apache.org/job/HBase/job/HBase-Operator-Tools-PreCommit/job/PR-79/1/artifact/yetus-precommit-check/output/patch-unit-hbase-hbck2.txt |
| Test Results | https://ci-hadoop.apache.org/job/HBase/job/HBase-Operator-Tools-PreCommit/job/PR-79/1/testReport/ |
| Max. process+thread count | 939 (vs. ulimit of 1000) |
| modules | C: hbase-hbck2 U: hbase-hbck2 |
| Console output | https://ci-hadoop.apache.org/job/HBase/job/HBase-Operator-Tools-PreCommit/job/PR-79/1/console |
| versions | git=2.20.1 |
| Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |

This message was automatically generated.

@@ -200,6 +200,26 @@ Command:
for how to generate new report.
SEE ALSO: reportMissingRegionsInMeta

generateMissingTableDescriptorFile <TABLENAME>
Trying to fix an orphan table by generating a missing table descriptor
file. This command will have no affect if the table folder is missing
Contributor

nit: "effect", not "affect"

Contributor Author

thanks

import org.apache.hadoop.hbase.client.TableDescriptor;
import org.apache.hadoop.hbase.client.TableDescriptorBuilder;
import org.apache.hadoop.hbase.util.FSTableDescriptors;
import org.apache.hadoop.hbase.util.FSUtils;
Contributor

FSTableDescriptors and FSUtils are IA.Private. Depending on such classes has frequently caused problems keeping operator-tools compiling, or even compatible at runtime. To solve that, we have been duplicating these utility classes in the operator-tools project. See HBCKFsUtils and HBCKMetaTableAccessor for reference.

Contributor Author

thanks, I'll duplicate these as well
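As an illustration of the duplication approach, the sketch below keeps a small private copy of a table-directory helper instead of calling an IA.Private HBase class. It assumes the HBase 2 on-disk layout rootDir/data/namespace/table, with unqualified table names living in the "default" namespace, and uses java.nio.file.Path as a stand-in for the Hadoop Path. HbckTableDirs is a hypothetical name; the real copies in this project are HBCKFsUtils and HBCKMetaTableAccessor.

```java
import java.nio.file.Path;

/**
 * Hypothetical example of the "duplicate instead of depend" pattern: a local
 * path helper so the tool does not call an IA.Private HBase utility class.
 */
public final class HbckTableDirs {
  // Assumed HBase 2 layout: <rootDir>/data/<namespace>/<qualifier>.
  private static final String BASE_NAMESPACE_DIR = "data";
  private static final String DEFAULT_NAMESPACE = "default";

  private HbckTableDirs() {
  }

  /** Resolves "ns:table" or "table" (default namespace) to its table dir. */
  public static Path getTableDir(Path rootDir, String tableName) {
    int sep = tableName.indexOf(':');
    String namespace = sep < 0 ? DEFAULT_NAMESPACE : tableName.substring(0, sep);
    String qualifier = sep < 0 ? tableName : tableName.substring(sep + 1);
    if (namespace.isEmpty() || qualifier.isEmpty()) {
      throw new IllegalArgumentException("Invalid table name: " + tableName);
    }
    return rootDir.resolve(BASE_NAMESPACE_DIR).resolve(namespace).resolve(qualifier);
  }
}
```

The trade-off is that such a copy has to be kept in sync by hand if the layout ever changes, in exchange for not breaking when HBase changes or removes a private class.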

try {
if (!fs.exists(tableDir)) {
throw new IllegalStateException("Exiting without changing anything. " +
"Table folder not exists: " + tableDir);
Contributor

nit: "Table folder does not exist"

Contributor Author

thanks

fstd.createTableDescriptor(tableDescriptorFromMaster.get(), false);
LOG.info("Table descriptor written successfully. Orphan table {} fixed.", tableName);
} else {
generateDefaultTableInfo(fstd, tableName);
Contributor

Do we need to refresh the master's cache with the new table descriptor? I quickly checked the master RPC interface and didn't find any suitable method; maybe that is something we could add in a follow-up Jira.

Contributor Author

This is a good idea, I'll create a follow-up Jira. I'll also mention in the usage text that a master restart might currently be required to force the cache to reinitialize.

This can definitely be improved further, but it might need some more investigation. During my manual tests I saw the table to reappear (shown by the list command) quickly after the missing tableinfo file got generated. So something must have been trying to open the table periodically. However, scans on the table failed until I did a rolling restart. (I didn't check the procedures before restarting the cluster; I guess something was still stuck in the RegionServer.)

Contributor

During my manual tests I saw the table to reappear (shown by the list command) quickly after the missing tableinfo file got generated.

Interesting. Have you tried disabling and re-enabling the table after re-creating the table info?
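The repair decision discussed in this thread can be summarized in a small sketch: refuse to act if the table folder does not exist, prefer the descriptor cached by the master, and otherwise fall back to a default descriptor built from the family directories found on disk. Every type and method below (DescriptorRepairPlanner, Descriptor, plan) is a hypothetical stand-in, not an HBase or HBCK2 API, and the sketch does not cover refreshing the master's cache, which was left to a follow-up Jira.

```java
import java.util.Collections;
import java.util.Optional;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Function;
import java.util.function.Predicate;

/** Hypothetical sketch of the repair decision; not HBase or HBCK2 code. */
public final class DescriptorRepairPlanner {

  /** Minimal stand-in for a table descriptor. */
  public static final class Descriptor {
    public final String table;
    public final Set<String> families;
    public final boolean fromMaster;

    public Descriptor(String table, Set<String> families, boolean fromMaster) {
      this.table = table;
      this.families = Collections.unmodifiableSet(new TreeSet<>(families));
      this.fromMaster = fromMaster;
    }
  }

  private DescriptorRepairPlanner() {
  }

  /**
   * Chooses which descriptor to write for an orphan table.
   *
   * @param masterCache    looks up the descriptor held in the master's cache
   * @param tableDirExists whether the table folder exists on the filesystem
   * @param familyDirs     family directory names found under the table folder
   */
  public static Descriptor plan(String table,
      Function<String, Optional<Descriptor>> masterCache,
      Predicate<String> tableDirExists,
      Function<String, Set<String>> familyDirs) {
    if (!tableDirExists.test(table)) {
      // Nothing to repair without the table folder, so change nothing.
      throw new IllegalStateException("Exiting without changing anything. "
          + "Table folder does not exist: " + table);
    }
    // Prefer the master's copy: it should still carry the original settings.
    Optional<Descriptor> cached = masterCache.apply(table);
    if (cached.isPresent()) {
      return cached.get();
    }
    // Fallback: a default descriptor with one family per directory found.
    return new Descriptor(table, familyDirs.apply(table), false);
  }
}
```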

@symat
Contributor Author

symat commented Dec 16, 2020

@wchevreuil thanks for the review and for the comments! I tried to address all the issues.

I had to copy quite a few InterfaceAudience.Private functions and classes in the end. (I wonder if we should move these to a different package...)

I also fixed all existing checkstyle issues in hbase-hbck2, so mvn checkstyle:check -pl hbase-hbck2 should pass now.

@Apache-HBase

💔 -1 overall

| Vote | Subsystem | Runtime | Comment |
|:----:|----------:|--------:|:--------|
| +0 🆗 | reexec | 1m 5s | Docker mode activated. |
| | | | _ Prechecks _ |
| +1 💚 | dupname | 0m 0s | No case conflicting files found. |
| +0 🆗 | markdownlint | 0m 0s | markdownlint was not available. |
| +0 🆗 | spotbugs | 0m 0s | spotbugs executables are not available. |
| +1 💚 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 💚 | test4tests | 0m 0s | The patch appears to include 5 new or modified test files. |
| | | | _ master Compile Tests _ |
| +1 💚 | mvninstall | 0m 58s | master passed |
| +1 💚 | compile | 0m 8s | master passed |
| +1 💚 | checkstyle | 0m 6s | master passed |
| +1 💚 | javadoc | 0m 7s | master passed |
| | | | _ Patch Compile Tests _ |
| +1 💚 | mvninstall | 0m 11s | the patch passed |
| +1 💚 | compile | 0m 8s | the patch passed |
| +1 💚 | javac | 0m 8s | the patch passed |
| +1 💚 | checkstyle | 0m 4s | hbase-hbck2: The patch generated 0 new + 0 unchanged - 10 fixed = 0 total (was 10) |
| +1 💚 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 💚 | javadoc | 0m 5s | the patch passed |
| | | | _ Other Tests _ |
| -1 ❌ | unit | 9m 58s | hbase-hbck2 in the patch failed. |
| +1 💚 | asflicense | 0m 6s | The patch does not generate ASF License warnings. |
| | | 13m 9s | |

| Reason | Tests |
|-------:|:------|
| Failed junit tests | hbase.TestHBCKMetaTableAccessor |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-Operator-Tools-PreCommit/job/PR-79/2/artifact/yetus-precommit-check/output/Dockerfile |
| GITHUB PR | #79 |
| Optional Tests | dupname asflicense markdownlint javac javadoc unit spotbugs findbugs checkstyle compile |
| uname | Linux d063b671cea7 5.4.0-1025-aws #25~18.04.1-Ubuntu SMP Fri Sep 11 12:03:04 UTC 2020 x86_64 GNU/Linux |
| Build tool | maven |
| git revision | master / 5cdc0e2 |
| Default Java | Oracle Corporation-1.8.0_275-b01 |
| unit | https://ci-hadoop.apache.org/job/HBase/job/HBase-Operator-Tools-PreCommit/job/PR-79/2/artifact/yetus-precommit-check/output/patch-unit-hbase-hbck2.txt |
| Test Results | https://ci-hadoop.apache.org/job/HBase/job/HBase-Operator-Tools-PreCommit/job/PR-79/2/testReport/ |
| Max. process+thread count | 583 (vs. ulimit of 1000) |
| modules | C: hbase-hbck2 U: hbase-hbck2 |
| Console output | https://ci-hadoop.apache.org/job/HBase/job/HBase-Operator-Tools-PreCommit/job/PR-79/2/console |
| versions | git=2.20.1 |
| Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |

This message was automatically generated.

@Apache-HBase

🎊 +1 overall

| Vote | Subsystem | Runtime | Comment |
|:----:|----------:|--------:|:--------|
| +0 🆗 | reexec | 1m 1s | Docker mode activated. |
| | | | _ Prechecks _ |
| +1 💚 | dupname | 0m 0s | No case conflicting files found. |
| +0 🆗 | markdownlint | 0m 0s | markdownlint was not available. |
| +0 🆗 | spotbugs | 0m 0s | spotbugs executables are not available. |
| +1 💚 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 💚 | test4tests | 0m 0s | The patch appears to include 5 new or modified test files. |
| | | | _ master Compile Tests _ |
| +1 💚 | mvninstall | 1m 23s | master passed |
| +1 💚 | compile | 0m 10s | master passed |
| +1 💚 | checkstyle | 0m 7s | master passed |
| +1 💚 | javadoc | 0m 8s | master passed |
| | | | _ Patch Compile Tests _ |
| +1 💚 | mvninstall | 0m 10s | the patch passed |
| +1 💚 | compile | 0m 10s | the patch passed |
| +1 💚 | javac | 0m 10s | the patch passed |
| +1 💚 | checkstyle | 0m 4s | hbase-hbck2: The patch generated 0 new + 0 unchanged - 10 fixed = 0 total (was 10) |
| +1 💚 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 💚 | javadoc | 0m 6s | the patch passed |
| | | | _ Other Tests _ |
| +1 💚 | unit | 5m 58s | hbase-hbck2 in the patch passed. |
| +1 💚 | asflicense | 0m 6s | The patch does not generate ASF License warnings. |
| | | 9m 40s | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-Operator-Tools-PreCommit/job/PR-79/3/artifact/yetus-precommit-check/output/Dockerfile |
| GITHUB PR | #79 |
| Optional Tests | dupname asflicense markdownlint javac javadoc unit spotbugs findbugs checkstyle compile |
| uname | Linux 7aba398bfddd 5.4.0-1025-aws #25~18.04.1-Ubuntu SMP Fri Sep 11 12:03:04 UTC 2020 x86_64 GNU/Linux |
| Build tool | maven |
| git revision | master / 5cdc0e2 |
| Default Java | Oracle Corporation-1.8.0_275-b01 |
| Test Results | https://ci-hadoop.apache.org/job/HBase/job/HBase-Operator-Tools-PreCommit/job/PR-79/3/testReport/ |
| Max. process+thread count | 929 (vs. ulimit of 1000) |
| modules | C: hbase-hbck2 U: hbase-hbck2 |
| Console output | https://ci-hadoop.apache.org/job/HBase/job/HBase-Operator-Tools-PreCommit/job/PR-79/3/console |
| versions | git=2.20.1 |
| Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |

This message was automatically generated.

Contributor

@wchevreuil wchevreuil left a comment

lgtm

@wchevreuil wchevreuil merged commit b461d58 into apache:master Dec 21, 2020