-
Notifications
You must be signed in to change notification settings - Fork 3.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HBASE-24273 HBCK's "Orphan Regions on FileSystem" reports regions wit… #1613
Conversation
When deciding if a region is an orphan FS region, it also need to check if a region is a merged parent region in meta table. |
🎊 +1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
@@ -134,7 +135,7 @@ protected synchronized void chore() { | |||
loadRegionsFromInMemoryState(); | |||
loadRegionsFromRSReport(); | |||
try { | |||
loadRegionsFromFS(); | |||
loadRegionsFromFS(scanForMergedParentRegions()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would it make sense to make it a bit more generic and have "loadRegionsFromMeta()" function (similar to loadRegionsFromInMemoryState/loadRegionsFromRSReport) as then you'd have another source to compare against - hbase:meta. Then loadRegionsFromFS() would check against that state to see if a region is merged or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that is the idea. scanForMergedParentRegions() loads regions from meta. For merge, it is a bit special, the parent regions are deleted from meta already, the only bit left is the merge qualifers in the child region.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does InMemoryState not have the needed merge info in it? If not, maybe it should.
The CatalogJanitor is what manages when merge references are let go so this scan of meta is probably necessary.
To the @timoha point, are there other places in hbckchore where we need currrent picture of hbase:meta?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@saintstack Yeah the in-memory state/meta row for merged parents are let go at early state of MergeRegionsProcedures.
@timoha Per Stack's comments about the source of truth is the in-memory database (meta/procedure store are ways to recover in-memory databse since they are persistent).
At this moment, there is no other usage of regions from meta in hbck chore, the merged parents info is a special case, they are columns from the child region, the only source of truth for merged parents. We can maintain the in-memory hashset for merged parent if meta scan is too costly, which can be addressed later. In case of the future requirements, scanForMergedParentRegions() can be modified to get more info from meta.
if (hri == null) { | ||
// If it is not in in-memory database and not a merged region, | ||
// report it as an orphan region. | ||
if (hri == null && !mergedParentRegions.contains(encodedRegionName)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if it's not in in-memory database but it is in merged regions, that seems like a problem as well and should be reported?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you referring to the following case?
if (hri == null && mergedParentRegions.contains(encodedRegionName)) {}
This is valid for merged parent regions as of today.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, merge parents are NOT in in-memory state because they are not active. We just have this background cleaner task that is doing janitorial work on hbase:meta cleaning up meta and filesystem....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sort of on a tangental question: There seems to be 3 sources of regions: FS, hbase:meta and in-memory. All of them need to be in sync, but what is the source of truth? Is it possible that the region is not in hbase:meta but is in in-memory state regionInfoMap? Could that be a good thing to validate in hbck?
My understanding, the source of the truth is hbase:meta. When master switchover, the in-memory db is rebuilt from hbase:meta. For the case of "the region is not in hbase:meta but in in-memory state regionInfoMap", it is a bug which needs to be fixed. CatalogJanitor will report "holes" as well. |
Unittest failures are real, working on it, will update the patch. |
3e4d080
to
aceba6f
Compare
💔 -1 overall
This message was automatically generated. |
2 similar comments
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
aceba6f
to
2b53c06
Compare
Pushed fix for testing failures in TestHbckChore. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
Source of truth is Master in-memory. It has most up-to-date, comprehensive state. It effects change. Changes to state are orchestrated by the Master via the Procedures system. Procedures record intention and steps-taken in the Procedure store. When pertinent (changes that processes other-than master need to know of), Procedures publish state changes to hbase:meta (and for some state mostly internal to the cluster, to zk). On crash, Master re-loads from persistent stores -- hbase:meta and the Procedure store -- and reconstructs the before-crash state. (Note added AFTER review and I'm reminded of what this patch is about) My BS above is high-level picture of how things are supposed to work... answering Andrey's question. What is going on in here is a mis-Reporting by HBCK Report of resources that are for GC after references are let-go. |
@@ -134,7 +135,7 @@ protected synchronized void chore() { | |||
loadRegionsFromInMemoryState(); | |||
loadRegionsFromRSReport(); | |||
try { | |||
loadRegionsFromFS(); | |||
loadRegionsFromFS(scanForMergedParentRegions()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does InMemoryState not have the needed merge info in it? If not, maybe it should.
The CatalogJanitor is what manages when merge references are let go so this scan of meta is probably necessary.
To the @timoha point, are there other places in hbckchore where we need currrent picture of hbase:meta?
null); | ||
return mergedParentRegions; | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, may be no way around it given the scan for merged parents is so specialized.
This looks good.
if (hri == null) { | ||
// If it is not in in-memory database and not a merged region, | ||
// report it as an orphan region. | ||
if (hri == null && !mergedParentRegions.contains(encodedRegionName)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, merge parents are NOT in in-memory state because they are not active. We just have this background cleaner task that is doing janitorial work on hbase:meta cleaning up meta and filesystem....
for (RegionInfo ri: mergeParents) { | ||
regionStates.deleteRegion(ri); | ||
|
||
} | ||
regionStateStore.mergeRegions(child, mergeParents, serverName); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok. good. lets keep an eye on this one because sometimes fun, unexpected conditions when change order of operations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This happens during my test, which is doing a hbck report every 10 ms, yeah, will check the sequence to make sure it does not cause any unexpected conditions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dropped this change as the fix is only targeting to a corner case for a small time window (~10ms). regionMap is being referenced in more than 100 places, it is hard to figure out every possible case.
@@ -39,12 +39,17 @@ | |||
* probably not what you want. | |||
* @param conf configuration | |||
* @return ConnectionImplementation object for <code>conf</code> | |||
* @throws ZooKeeperConnectionException | |||
* @throws IOException |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Empty javadoc like this provokes complaint by checkstyle. Just remove it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do.
…h referenced HFiles
2b53c06
to
d895fa0
Compare
@saintstack and @timoha, patch is updated based on your comments, please review. |
🎊 +1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good by me. Will leave it open here a while in case feedback from @timoha
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
Thanks @saintstack. @timoha Morning, any objection/concern to merge the code? |
Good by me. |
I am going to merge this request. If @timoha has more comments, plan to address them with an addendum or revert, thanks for the reviews! |
…h referenced HFiles (apache#1613) Signed-off-by: stack <stack@apache.org>
…h referenced HFiles (apache#1613) Signed-off-by: stack <stack@apache.org>
…h referenced HFiles (apache#1613) Signed-off-by: stack <stack@apache.org>
…h referenced HFiles (apache#1613) Signed-off-by: stack <stack@apache.org>
…h referenced HFiles (apache#1613) Signed-off-by: stack <stack@apache.org>
…h referenced HFiles (apache#1613) (apache#1662) Signed-off-by: stack <stack@apache.org>
…h referenced HFiles (apache#1613) (apache#1682) Author: huaxiangsun Reason: Bug Ref: CDPD-15964 Signed-off-by: stack <stack@apache.org> Change-Id: Ibc25dd219e62972f7cb988f82ca453d7a5cba51f (cherry picked from commit 4387cdd)
…h referenced HFiles (apache#1613) (apache#1682) Signed-off-by: stack <stack@apache.org> (cherry picked from commit 61a5af0) Change-Id: Ibc25dd219e62972f7cb988f82ca453d7a5cba51f
…h referenced HFiles