HBASE-24273 HBCK's "Orphan Regions on FileSystem" reports regions wit… #1613
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -28,6 +28,7 @@ | |
|
||
import org.apache.hadoop.fs.FileSystem; | ||
import org.apache.hadoop.fs.Path; | ||
import org.apache.hadoop.hbase.MetaTableAccessor; | ||
import org.apache.hadoop.hbase.ScheduledChore; | ||
import org.apache.hadoop.hbase.ServerName; | ||
import org.apache.hadoop.hbase.client.RegionInfo; | ||
|
@@ -134,7 +135,7 @@ protected synchronized void chore() { | |
loadRegionsFromInMemoryState(); | ||
loadRegionsFromRSReport(); | ||
try { | ||
- loadRegionsFromFS(); | ||
+ loadRegionsFromFS(scanForMergedParentRegions()); | ||
} catch (IOException e) { | ||
LOG.warn("Failed to load the regions from filesystem", e); | ||
} | ||
|
@@ -187,6 +188,31 @@ private void saveCheckResultToSnapshot() { | |
} | ||
} | ||
|
||
/** | ||
* Scan hbase:meta to get set of merged parent regions, this is a very heavy scan. | ||
* | ||
* @return Return generated {@link HashSet} | ||
*/ | ||
private HashSet<String> scanForMergedParentRegions() throws IOException { | ||
HashSet<String> mergedParentRegions = new HashSet<>(); | ||
// Null tablename means scan all of meta. | ||
MetaTableAccessor.scanMetaForTableRegions(this.master.getConnection(), | ||
r -> { | ||
List<RegionInfo> mergeParents = MetaTableAccessor.getMergeRegions(r.rawCells()); | ||
if (mergeParents != null) { | ||
for (RegionInfo mergeRegion : mergeParents) { | ||
if (mergeRegion != null) { | ||
// This region is already being merged | ||
mergedParentRegions.add(mergeRegion.getEncodedName()); | ||
} | ||
} | ||
} | ||
return true; | ||
}, | ||
null); | ||
return mergedParentRegions; | ||
} | ||
|
||
Yeah, there may be no way around it, given that the scan for merged parents is so specialized. This looks good. |
||
private void loadRegionsFromInMemoryState() { | ||
List<RegionState> regionStates = | ||
master.getAssignmentManager().getRegionStates().getRegionStates(); | ||
|
@@ -256,7 +282,7 @@ private void loadRegionsFromRSReport() { | |
} | ||
} | ||
|
||
- private void loadRegionsFromFS() throws IOException { | ||
+ private void loadRegionsFromFS(final HashSet<String> mergedParentRegions) throws IOException { | ||
Path rootDir = master.getMasterFileSystem().getRootDir(); | ||
FileSystem fs = master.getMasterFileSystem().getFileSystem(); | ||
|
||
|
@@ -271,12 +297,12 @@ private void loadRegionsFromFS() throws IOException { | |
continue; | ||
} | ||
HbckRegionInfo hri = regionInfoMap.get(encodedRegionName); | ||
- if (hri == null) { | ||
+ // If it is not in in-memory database and not a merged region, | ||
+ // report it as an orphan region. | ||
+ if (hri == null && !mergedParentRegions.contains(encodedRegionName)) { | ||
If it's not in the in-memory database but it is in merged regions, that seems like a problem as well and should be reported?
Are you referring to the following case?
Oh, merge parents are NOT in in-memory state because they are not active. We just have this background cleaner task that does janitorial work, cleaning up hbase:meta and the filesystem.... |
||
orphanRegionsOnFS.put(encodedRegionName, regionDir); | ||
continue; | ||
} | ||
HbckRegionInfo.HdfsEntry hdfsEntry = new HbckRegionInfo.HdfsEntry(regionDir); | ||
hri.setHdfsEntry(hdfsEntry); | ||
} | ||
numRegions += regionDirs.size(); | ||
} | ||
|
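The orphan check in the hunk above can be sketched with plain Java collections. This is a minimal illustration, not the HBase code: `OrphanCheckSketch`, `findOrphans`, and the map and set parameters are hypothetical stand-ins for `regionInfoMap`, `mergedParentRegions`, and `orphanRegionsOnFS`. One thing it makes explicit: in the hunk as shown, a directory whose region is missing from the in-memory map but present in the merged-parent set appears to fall through to `hri.setHdfsEntry(...)` with `hri` still null, so the sketch skips such directories explicitly.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Illustrative model only; none of these names are HBase API.
final class OrphanCheckSketch {

  /**
   * @param regionDirsOnFs encoded region name -> region directory found on the filesystem
   * @param knownRegions   encoded names present in the in-memory state
   * @param mergedParents  encoded names of merged parents still referenced from meta
   * @return directories that are neither known nor merged parents
   */
  static Map<String, String> findOrphans(Map<String, String> regionDirsOnFs,
      Set<String> knownRegions, Set<String> mergedParents) {
    Map<String, String> orphans = new LinkedHashMap<>();
    for (Map.Entry<String, String> dir : regionDirsOnFs.entrySet()) {
      String encodedName = dir.getKey();
      if (knownRegions.contains(encodedName)) {
        // Tracked region: the real chore would attach the HDFS entry here.
        continue;
      }
      if (mergedParents.contains(encodedName)) {
        // Merged parent awaiting cleanup: not an orphan, and there is no
        // in-memory entry to update, so skip it rather than dereference null.
        continue;
      }
      orphans.put(encodedName, dir.getValue());
    }
    return orphans;
  }
}
```

Calling `findOrphans` with directories for three regions, one known and one a merged parent, reports only the third as an orphan.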
Would it make sense to make it a bit more generic and have a "loadRegionsFromMeta()" function (similar to loadRegionsFromInMemoryState/loadRegionsFromRSReport)? Then you'd have another source to compare against: hbase:meta. loadRegionsFromFS() would then check against that state to see whether a region is merged or not.
Yeah, that is the idea. scanForMergedParentRegions() loads regions from meta. Merge is a bit special: the parent regions are already deleted from meta, and the only bit left is the merge qualifiers in the child region.
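To illustrate the point about merge qualifiers, here is a minimal sketch that models each meta row as a map from qualifier to value and collects parent names from the child rows. It does not use the HBase client API: `MergedParentScanSketch`, `collectMergedParents`, and the `"merge"` qualifier prefix are assumptions made for this example.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative model only: each meta row is a map of qualifier -> value.
final class MergedParentScanSketch {

  // Assumption for this sketch: merge qualifiers share a common prefix.
  static final String MERGE_QUALIFIER_PREFIX = "merge";

  /** Collects the encoded names of merged parents referenced by any row. */
  static Set<String> collectMergedParents(List<Map<String, String>> metaRows) {
    Set<String> mergedParents = new HashSet<>();
    for (Map<String, String> row : metaRows) {
      for (Map.Entry<String, String> cell : row.entrySet()) {
        boolean isMergeQualifier = cell.getKey().startsWith(MERGE_QUALIFIER_PREFIX);
        // Parents have no row of their own; they only show up as values
        // under the child's merge qualifiers.
        if (isMergeQualifier && cell.getValue() != null) {
          mergedParents.add(cell.getValue());
        }
      }
    }
    return mergedParents;
  }
}
```

Rows without merge qualifiers contribute nothing, so the result contains only regions that some child row still references as a parent.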
Does InMemoryState not have the needed merge info in it? If not, maybe it should.
The CatalogJanitor is what manages when merge references are let go so this scan of meta is probably necessary.
To @timoha's point, are there other places in the hbck chore where we need a current picture of hbase:meta?
@saintstack Yeah, the in-memory state and meta rows for merged parents are let go at an early stage of MergeRegionsProcedures.
@timoha Per Stack's comments, the source of truth is the in-memory database (meta and the procedure store are ways to recover the in-memory database, since they are persistent).
At this moment, there is no other use of regions from meta in the hbck chore. The merged-parents info is a special case: it lives in columns of the child region's row, which are the only source of truth for merged parents. We can maintain an in-memory HashSet of merged parents if the meta scan proves too costly, which can be addressed later. If future requirements arise, scanForMergedParentRegions() can be modified to get more info from meta.
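The in-memory alternative mentioned above could look roughly like the sketch below. Everything in it is hypothetical: `MergedParentTrackerSketch` and its hook methods do not exist in HBase, and where a merge would record its parents or where the cleaner would drop them is an assumption of this example.

```java
import java.util.Collection;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical tracker; the hook names are invented for illustration.
final class MergedParentTrackerSketch {

  // Concurrent set so a chore thread can read while merges update it.
  private final Set<String> mergedParents = ConcurrentHashMap.newKeySet();

  /** Assumed hook: called once a merge has recorded its parents. */
  void onMergeCommitted(Collection<String> parentEncodedNames) {
    mergedParents.addAll(parentEncodedNames);
  }

  /** Assumed hook: called once the cleaner has removed the parents. */
  void onParentsCleaned(Collection<String> parentEncodedNames) {
    mergedParents.removeAll(parentEncodedNames);
  }

  boolean isMergedParent(String encodedName) {
    return mergedParents.contains(encodedName);
  }

  /** Immutable copy for a single chore run, so the view cannot change mid-check. */
  Set<String> snapshot() {
    return Set.copyOf(mergedParents);
  }
}
```

The trade-off is that such a set must be rebuilt from meta after a master restart, which is consistent with the comment that meta and the procedure store exist to recover in-memory state.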