-
Notifications
You must be signed in to change notification settings - Fork 3.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HBASE-24896 'Stuck' in static initialization creating RegionInfo inst… #2274
Conversation
Hard to reproduce so no test. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Patch lgtm, just one question on an alternate approach, if thats doable.
*/ | ||
// Removed because creation was creating a static deadlock, HBASE-24896 | ||
@Deprecated | ||
RegionInfo UNDEFINED = null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was wondering if we can move the following into RegionInfo to avoid this deprecation stuff. For that we will have to switch to RegionInfoBuilder c'tor because the following usage is a private c'tor. Curious if you considered that and didn't want to do it for some reason. The c'tor was specifically designed for this usecase, so I may be missing some context here.
public static final RegionInfo FIRST_META_REGIONINFO =
new MutableRegionInfo(1L, TableName.META_TABLE_NAME, RegionInfo.DEFAULT_REPLICA_ID);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review @bharathv
Say more. You suggesting move FIRST_META_REGIONINFO define into RegionInfo?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You suggesting move FIRST_META_REGIONINFO define into RegionInfo?
Exactly. I think thats one way to break the loop without this deprecation schedule. I don't know if it has other implications (especially if it breaks the semantics described in HBASE-17980).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd glanced in passing but there are ~100 refs to old location. Let me try it... make sure we are not just moving this problem. Would be better I agree if it worked and we could avoid this deprecation. @bharathv
3068af4
to
90f73f7
Compare
Your suggestion is better @bharathv .... more code change but looks safe to me. RegionInfoBuilder, the host for FIRST_META_REGIONINFO is @InterfaceAudience.Private so moving it is 'allowed'. Thought of adding back something like the below to RegionInfoBuilder in case any downstream references... public static final RegionInfo FIRST_META_REGIONINFO = RegionInfo.FIRST_META_REGIONINFO; ... but that might bring back the statics load deadlock in a new guise. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 pending QA bot.
but that might bring back the statics load deadlock in a new guise.
Ya. I see what you are saying. Recursive static initializations are tricky.
🎊 +1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this really solve the problem?
RegionInfoBuilder and RegionInfo deeply reference each other, even if you remove the FIRST_META_REGIONINFO, there are lots of methods which return value or parameters are RegionInfo, and the MutableRegionInfo is a sub class of RegionInfo, and also a ineer class of RegionInfoBuilder...
FWIW, if we really want to break the tie, since RegionInfo is just an interface, which means it is much 'cleaner', we should remove the dependency of RegionInfoBuilder from RegionInfo. That is, just move the UNDEFINED field to RegionInfoBuilder. I know this breaks the IA.Public rule, so maybe we just leave 2.3.x as is, since this is not easy to reproduce and does not cause any data loss. Let's make it clean on 2.4+.
And in general, I do not think we want users to make use of RegionInfo.UNDEFINED directly? It should be put to RegionInfoBuilder at the first place... |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
Right. This patch just reduces the window and chance of init race. If we see the jstacks (and see all the places that got changed in the patch), FIRST_META_REGIONINFO is touched in many init code paths and can potentially cause these races. The whole dependency system is still fragile and as you said, the proper way is to decouple them and I think that would be a much bigger refactor given the static fields are sprinkled all over the place and some of those places are IA.Public. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 after going through init paths correlation with FIRST_META_REGIONINFO, how much it's being referred to in all places and how static init can cause a possible deadlock during init. And I also agree to Duo's concerns of not having sufficient changes for a complete safety from other possibile deadlock ways (due to how much RegionInfo and RegionInfoBuilder are inter-twined), but this patch should be quite a relief till a reasonably good extent.
+1 |
Problem is hard to reproduce. Reasoning, this patch shoud reduce likelihood of our encountering this static loading lockup.
Yeah, breaks the IA.Public -- hence this patch which does not break IA.Public. Following the rules, we can't move UNDEFINED till 4.0 -- not 2.4. A server hung in this manner where all priority handlers are jammed up but heartbeating, etc. keeps 'working' broke the cluster; the operator ended up doing cluster restart... Disruptive. Would like to land a workaround/fix for 2.3+. Perhaps you prefer the first patch which just deprecated UNDEFINED setting it to null so as not to reference RegionInfoBuilder? Thanks |
Wondering if moving |
I could make a PR to do that or do that tooo.... Thanks @virajjasani |
4ce004f
to
092a3a8
Compare
Thanks for the help here. The last push implements @virajjasani 's suggestion. It moves the MutableRegionInfo out of RegionInfoBuilder to be standalone. This allows breaking of a static reference to RegionInfoBuilder from RegionInfo. A static Reference from RegionInfo to MutableRegionInfo remains but it seems clean; MRI has no interesting statics initializations for its part. This approach looks sufficient breaking the possible static load tangle. I could add this to first patch -- deprecating and nulling UNDEFINED -- and/or add the nice @bharathv suggestion. Either would nail it but this looks enough. |
Setting it to null does not make real difference with removing it? If users use it, their code will be in trouble... To me, I think this is a mistake, it should not be placed in RegionInfo, or at least should be marked as IA.Private? We have lots of IA.Private fields and methods in RegionInfo. So maybe, we deprecated the field on 2.3, and also marked it as IA.Private, to tell users you should not use it directly. This patch will be committed to all branches. And file another issue, to completely move this field from RegionInfo to RegionInfoBuilder, and this patch will be committed to branches other than 2.3. For me I prefer we commit this to 2.4+, but I'm also fine with committing to 3.x only, or even 4.x, since this problem is really not easy to reproduce. Thanks. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
And I think the opposite. RegionInfo seems natural home for static defines such as UNDEFINED and first Region of meta. Thanks for the other suggestions. Let me add IA.Private and deprecation. Let me add that here to this last patch. Shout if you -1 on this last approach. Otherwise, please remove changes requested. Thanks. |
…ance Untangle RegionInfo, RegionInfoBuilder, and MutableRegionInfo static initializations some. Move MutableRegionInfo from inner-class of RegionInfoBuilder to be (package private) standalone. Undo static initializing references from RI to RIB.
092a3a8
to
4578ac2
Compare
Added IA.Private and deprecation on UNDEFINED in RI per @Apache9 suggestion. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
Wondering if this is sufficient. Going by the jstacks, FIRST_META_REGIONINFO access triggers a MutableRegionInfo load and that triggers the parent interface RegionInfo load [1]. And any UNDEFINED access in RegionInfo triggers the exact opposite order. I was under the impression that we should break this tie. While un-nesting MutableRegionInfo is a welcome change, curious if that solves the problem. Thanks. [1] https://docs.oracle.com/javase/specs/jls/se8/html/jls-12.html#jls-12.4.2 |
🎊 +1 overall
This message was automatically generated. |
Thats for the helpful pointer @bharathv and your input. Given the above discussion and my not wanting to waste valuable review resource, I think I need to reproduce this issue so can say for sure whether an approach works; I speculate too much overusing the 'hard-to-reproduce' license. Let me add the IA.Private and the deprecation per @Apache9 suggests in a separate issue for now at least while I figure how to repro. the lockup. Thank you for reviews @bharathv , @virajjasani , and @Apache9 |
Closing this PR. Will open a new one against this JIRA when I have a better story. |
…ance
Patch deprecates and nulls RegionInfo#UNDEFINED (added by HBASE-22723)
so as to break possible static initialization deadlock. Adds a local
UNDEFINED to the only place where it is used, in CatalogJanitor doing
fixup.
Cleans up checkstyle complaints in RegionInfo.