HBASE-27698 Migrate meta locations from zookeeper to master data may … #5167

Open · wants to merge 1 commit into branch-2

Conversation

@chrajeshbabu (Contributor)

…not always possible if we migrate from 1.x HBase

@chrajeshbabu requested a review from taklwu on April 9, 2023, 18:52
@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 1m 8s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+1 💚 hbaseanti 0m 0s Patch does not have any anti-patterns.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
_ branch-2 Compile Tests _
+1 💚 mvninstall 3m 39s branch-2 passed
+1 💚 compile 2m 25s branch-2 passed
+1 💚 checkstyle 0m 37s branch-2 passed
+1 💚 spotless 0m 43s branch has no errors when running spotless:check.
+1 💚 spotbugs 1m 31s branch-2 passed
_ Patch Compile Tests _
+1 💚 mvninstall 3m 20s the patch passed
+1 💚 compile 2m 23s the patch passed
+1 💚 javac 2m 23s the patch passed
+1 💚 checkstyle 0m 34s the patch passed
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
+1 💚 hadoopcheck 17m 45s Patch does not cause any errors with Hadoop 2.10.2, 3.2.4, or 3.3.4.
+1 💚 spotless 0m 42s patch has no errors when running spotless:check.
+1 💚 spotbugs 1m 35s the patch passed
_ Other Tests _
+1 💚 asflicense 0m 13s The patch does not generate ASF License warnings.
Total: 38m 26s
Subsystem Report/Notes
Docker ClientAPI=1.42 ServerAPI=1.42 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5167/1/artifact/yetus-general-check/output/Dockerfile
GITHUB PR #5167
Optional Tests dupname asflicense javac spotbugs hadoopcheck hbaseanti spotless checkstyle compile
uname Linux eed655c843a8 5.4.0-144-generic #161-Ubuntu SMP Fri Feb 3 14:49:04 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision branch-2 / a67a8f7
Default Java Eclipse Adoptium-11.0.17+8
Max. process+thread count 86 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5167/1/console
versions git=2.34.1 maven=3.8.6 spotbugs=4.7.3
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

💔 -1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 49s Docker mode activated.
-0 ⚠️ yetus 0m 6s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ branch-2 Compile Tests _
+1 💚 mvninstall 4m 52s branch-2 passed
+1 💚 compile 0m 49s branch-2 passed
+1 💚 shadedjars 5m 26s branch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 28s branch-2 passed
_ Patch Compile Tests _
+1 💚 mvninstall 4m 7s the patch passed
+1 💚 compile 0m 48s the patch passed
+1 💚 javac 0m 48s the patch passed
+1 💚 shadedjars 5m 10s patch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 25s the patch passed
_ Other Tests _
-1 ❌ unit 205m 19s hbase-server in the patch failed.
Total: 232m 18s
Subsystem Report/Notes
Docker ClientAPI=1.42 ServerAPI=1.42 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5167/1/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile
GITHUB PR #5167
Optional Tests javac javadoc unit shadedjars compile
uname Linux d013c5df2212 5.4.0-1097-aws #105~18.04.1-Ubuntu SMP Mon Feb 13 17:50:57 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision branch-2 / a67a8f7
Default Java Eclipse Adoptium-11.0.17+8
unit https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5167/1/artifact/yetus-jdk11-hadoop3-check/output/patch-unit-hbase-server.txt
Test Results https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5167/1/testReport/
Max. process+thread count 2750 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5167/1/console
versions git=2.34.1 maven=3.8.6
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

💔 -1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 56s Docker mode activated.
-0 ⚠️ yetus 0m 4s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ branch-2 Compile Tests _
+1 💚 mvninstall 2m 59s branch-2 passed
+1 💚 compile 0m 40s branch-2 passed
+1 💚 shadedjars 4m 10s branch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 25s branch-2 passed
_ Patch Compile Tests _
+1 💚 mvninstall 2m 42s the patch passed
+1 💚 compile 0m 41s the patch passed
+1 💚 javac 0m 41s the patch passed
+1 💚 shadedjars 4m 12s patch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 24s the patch passed
_ Other Tests _
-1 ❌ unit 218m 44s hbase-server in the patch failed.
Total: 240m 24s
Subsystem Report/Notes
Docker ClientAPI=1.42 ServerAPI=1.42 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5167/1/artifact/yetus-jdk8-hadoop2-check/output/Dockerfile
GITHUB PR #5167
Optional Tests javac javadoc unit shadedjars compile
uname Linux ec9a8276b47e 5.4.0-137-generic #154-Ubuntu SMP Thu Jan 5 17:03:22 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision branch-2 / a67a8f7
Default Java Temurin-1.8.0_352-b08
unit https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5167/1/artifact/yetus-jdk8-hadoop2-check/output/patch-unit-hbase-server.txt
Test Results https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5167/1/testReport/
Max. process+thread count 2192 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5167/1/console
versions git=2.34.1 maven=3.8.6
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@chrajeshbabu (Contributor, Author)

The test case failures are not related to this change. @taklwu could you please review the change? Thanks.

@Apache9 (Contributor) commented Apr 11, 2023

Why change from throwing an exception to returning false?

@chrajeshbabu (Contributor, Author) commented Apr 11, 2023

> Why change from throwing an exception to returning false?

@Apache9
During the express upgrade, as mentioned in the comment here, there is no way to detect the meta location in the cluster, because both the meta location znode and the meta WAL file get deleted when the region server is stopped gracefully. So the region state info for meta does not get added, which leads to the init meta procedure. During meta initialization we check whether a meta table directory is already present in the cluster; if it is present and proper, we throw an IOException saying that meta needs to be rebuilt or the meta znode should be created manually. Since this procedure is persisted, the master cannot proceed even after multiple restarts unless the manual steps of meta rebuild (or znode creation) and procedure store deletion are performed.

All these manual steps are error prone and not required: during a graceful shutdown of the cluster meta is not assigned to any server anyway, so allowing meta to be assigned to a random server during the init meta procedure is correct, and that is what happens with my patch. It returns false just to log that the meta table directory was not deleted.

Here I have described how the upgrade went smoothly with my patch.
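
For illustration, a minimal sketch of the behavior change described above (the helper name, signature, and wiring here are illustrative assumptions, not the actual patch):

import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.util.CommonFSUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public final class MetaDirCheckSketch {
  private static final Logger LOG = LoggerFactory.getLogger(MetaDirCheckSketch.class);

  // Before the change: an existing meta table directory caused an IOException, so the
  // persisted InitMetaProcedure kept failing until meta was rebuilt or the znode was
  // created manually. After the change: the directory is kept, the situation is logged,
  // and false is returned so the procedure can go on and assign meta to a server.
  static boolean checkAndLogExistingMetaDir(FileSystem fs, Path rootDir) throws IOException {
    Path metaDir = CommonFSUtils.getTableDir(rootDir, TableName.META_TABLE_NAME);
    if (!fs.exists(metaDir)) {
      return true; // fresh cluster, nothing to report
    }
    LOG.info("Meta table directory {} already exists; keeping it and letting meta be assigned", metaDir);
    return false;
  }
}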

@virajjasani (Contributor)

Thank you @chrajeshbabu. It seems the changes make sense from the upgrade viewpoint. On the other hand, if this rare scenario were to happen on a healthy 2.x cluster, does this change make any difference?

> All these manual steps are error prone and not required: during a graceful shutdown of the cluster meta is not assigned to any server anyway, so allowing meta to be assigned to a random server during the init meta procedure is correct, and that is what happens with my patch. It returns false just to log that the meta table directory was not deleted.

If this happens on a 2.x cluster, what would now be different? Only that meta would be assigned to a random server, correct?

@chrajeshbabu (Contributor, Author)

> Thank you @chrajeshbabu. It seems the changes make sense from the upgrade viewpoint. On the other hand, if this rare scenario were to happen on a healthy 2.x cluster, does this change make any difference?
>
> If this happens on a 2.x cluster, what would now be different? Only that meta would be assigned to a random server, correct?

I have tried two scenarios on a healthy 2.x cluster with this change:

  1. Removed the zookeeper data alone and restarted the cluster; everything came up properly.
  2. Removed both the zookeeper data and the master data so that we hit the same meta initialization procedure scenario. The meta region was assigned to a random server; one server's regions came up properly while the remaining servers became unknown (this is not related to this issue; I will check why it happens and work on another JIRA). I recovered those with the hbck2 scheduleRecoveries option and the cluster is normal.

Even when we throw the exception, we still need to delete the master data, because the failed init meta procedure does not let the master come up, and then we need to follow one of these steps:

  1. Create the meta server znode, which can also lead to the unknown servers issue (we should use hbck2 to recover).
  2. Delete the meta table data in HDFS and rebuild it completely, which is error prone; the exact state of meta may not be rebuilt in some cases because table states are missing in zookeeper, etc.

@virajjasani (Contributor)

Got it, thanks. This makes sense, +1 from my side.

@virajjasani (Contributor)

@Apache9 does this look good to you?

@chrajeshbabu (Contributor, Author)

@Apache9 I will commit it if it's fine with you. Could you please confirm?

@Apache9 (Contributor) commented Apr 19, 2023

I still need to check the code.

This is a very critical part; if we run InitMetaProcedure when meta already exists, it could cause a serious data loss problem...

We should try to prevent scheduling the InitMetaProcedure, instead of just letting it go and hoping it would work...

@Apache9 (Contributor) commented Apr 20, 2023

After checking the code, I think the correct way to fix this problem is to do the work in HMaster.tryMigrateMetaLocationsFromZooKeeper. We can add some more code in this method to check whether the meta directory is already there and, if it is and we can make sure that this is an upgrade from 1.x (by trying to read something on the filesystem? Is this possible?), then insert a record into the master region so we can skip scheduling the InitMetaProcedure.

@chrajeshbabu (Contributor, Author)

> to check whether the meta directory is already there and, if it is and we can make sure that this is an upgrade from 1.x (by trying to read something on the filesystem? Is this possible?)

We can check the column families of the meta table to detect whether it is from 1.x or a current version.
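
For illustration, a rough sketch of that check, assuming the 2.x-only "table" column family (HConstants.TABLE_FAMILY) is the marker, in line with the snippet posted later in this thread:

import java.io.IOException;
import org.apache.hadoop.hbase.HConstants;
import org.apache.hadoop.hbase.TableDescriptors;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.TableDescriptor;

// Returns true when the on-disk meta descriptor looks like it was written by 1.x,
// i.e. it lacks the "table" family that 2.x adds to hbase:meta.
static boolean looksLikeOneDotXMeta(TableDescriptors tableDescriptors) throws IOException {
  TableDescriptor meta = tableDescriptors.get(TableName.META_TABLE_NAME);
  return meta != null && meta.getColumnFamily(HConstants.TABLE_FAMILY) == null;
}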

@Apache9 (Contributor) commented Apr 20, 2023

And do we have read replica support on 1.x? Do we also need to insert the record for secondary replicas?

@chrajeshbabu (Contributor, Author)

> then insert a record into the master region so we can skip scheduling the InitMetaProcedure.

We can insert a record, but we may not know the state and location, which leads to InitMetaProcedure again.

@Apache9 (Contributor) commented Apr 20, 2023

> then insert a record into the master region so we can skip scheduling the InitMetaProcedure.

> We can insert a record, but we may not know the state and location, which leads to InitMetaProcedure again.

I checked the code; the condition for whether to schedule an InitMetaProcedure is

if (!this.assignmentManager.getRegionStates().hasTableRegionStates(TableName.META_TABLE_NAME)) {

So I think we only need a record in the master region for meta, and do not need to know its state and location?

And we will first try to migrate from zookeeper; for 1.x, if the cluster shut down gracefully, there is no znode for meta, so we can enter the logic described above. In this case, I think the state for meta should be CLOSED?

@chrajeshbabu (Contributor, Author)

> So I think we only need a record in the master region for meta, and do not need to know its state and location?
>
> And we will first try to migrate from zookeeper; for 1.x, if the cluster shut down gracefully, there is no znode for meta, so we can enter the logic described above. In this case, I think the state for meta should be CLOSED?

That's correct. Let me check and update the patch accordingly.

@virajjasani (Contributor)

> to check whether the meta directory is already there and, if it is and we can make sure that this is an upgrade from 1.x (by trying to read something on the filesystem? Is this possible?)
>
> We can check the column families of the meta table to detect whether it is from 1.x or a current version.

@chrajeshbabu but isn't the meta state unknown in your case of migration from 1.x to 2.x? Is the plan to read the meta CF directly from the filesystem as part of tryMigrateMetaLocationsFromZooKeeper?

@chrajeshbabu (Contributor, Author) commented May 8, 2023

@Apache9 I have tried your suggestion of creating a put entry for the meta table in the master region, which gets loaded into the region states and avoids scheduling the init meta procedure, but there is a problem with this approach.
Meta assignment is triggered in only two places: 1) during the server crash procedure, when the region server that went down was carrying meta, which is not the case during express upgrades; and 2) during meta initialization, once the fs layout creation step is done.
Adding just the meta entry to the master region without knowing the location reaches neither of these flows, so the meta region hangs in transition forever.

Here is the code I have tried.

} else {
  TableDescriptor metaTableDescriptor = tableDescriptors.get(TableName.META_TABLE_NAME);
  if (metaTableDescriptor != null && metaTableDescriptor.getColumnFamily(TABLE_FAMILY) == null) {
    MetaTableAccessor.addRegionInfo(put, RegionInfoBuilder.FIRST_META_REGIONINFO);
    put.add(CellBuilderFactory.create(CellBuilderType.SHALLOW_COPY).setRow(put.getRow())
      .setFamily(HConstants.CATALOG_FAMILY)
      .setQualifier(
        RegionStateStore.getStateColumn(RegionInfoBuilder.FIRST_META_REGIONINFO.getReplicaId()))
      .setTimestamp(put.getTimestamp())
      .setType(Cell.Type.Put)
      .setValue(Bytes.toBytes(RegionState.State.CLOSED.name()))
      .build());
    LOG.info(info.toString());
    masterRegion.update(r -> r.put(put));
  }
}

@Apache9 (Contributor) commented May 9, 2023

> @Apache9 I have tried your suggestion of creating a put entry for the meta table in the master region, which gets loaded into the region states and avoids scheduling the init meta procedure, but there is a problem with this approach. Meta assignment is triggered in only two places: 1) during the server crash procedure, when the region server that went down was carrying meta, which is not the case during express upgrades; and 2) during meta initialization, once the fs layout creation step is done. Adding just the meta entry to the master region without knowing the location reaches neither of these flows, so the meta region hangs in transition forever.

Better to post the full patch somewhere so I can check it. And maybe we need to manually schedule a TRSP to bring the meta region online in this case...
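
For reference, a minimal sketch of manually scheduling such a TRSP (assuming it runs inside the master with access to the procedure executor; this is not part of the posted patch):

import org.apache.hadoop.hbase.client.RegionInfoBuilder;
import org.apache.hadoop.hbase.master.assignment.TransitRegionStateProcedure;
import org.apache.hadoop.hbase.master.procedure.MasterProcedureEnv;
import org.apache.hadoop.hbase.procedure2.ProcedureExecutor;

// Submits an assign-type TransitRegionStateProcedure for the first meta replica.
// A null target server lets the assignment logic pick a region server.
static void scheduleMetaAssign(ProcedureExecutor<MasterProcedureEnv> procExec) {
  MasterProcedureEnv env = procExec.getEnvironment();
  TransitRegionStateProcedure assignMeta =
    TransitRegionStateProcedure.assign(env, RegionInfoBuilder.FIRST_META_REGIONINFO, null);
  procExec.submitProcedure(assignMeta);
}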
