-
Notifications
You must be signed in to change notification settings - Fork 3.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HBASE-28354 RegionSizeCalculator throws NPE when regions are in transition #5699
HBASE-28354 RegionSizeCalculator throws NPE when regions are in transition #5699
Conversation
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
hbase-mapreduce/src/main/java/org/apache/hadoop/hbase/mapreduce/RegionSizeCalculator.java
Outdated
Show resolved
Hide resolved
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks fine by me.
Do you think there's a case for returning Optional
values instead of 0-means-unknown? I don't see the calling context caring too much -- this is just a hint to the MapReduce scheduler.
I think the problem with that is it just introduces more code. I think there are 2-3 callers of this method and they all pass the result into an InputSplit which takes an int. So if we changed to optional we'd just end up having to update 3 places to do orElse(0) |
Per https://issues.apache.org/jira/browse/HBASE-28399 we might want to return something other than 0. Maybe even just 1, or maybe the average region size from the other regions. |
@frostruan what do you think? |
Thanks for the noticing @bbeaudreault I think we can introduce a constant UNKNOWN_SIZE with type of Size and value of 1 byte to distinguish between having no data and not knowing the specific size. Another important reason for greater than 0 is as I mentioned in HBASE-28399, some computing engines will automatically prune empty partitions, if 0 is returned for in-transition regions, data may be lost. |
agree. I think 1 byte is enough. The size of any region with data cannot be 1 byte. In addition, I think we'd better add some more comments to tell users and developers about the context. |
@frostruan, which part of the code would handle that byte? Also, should that be added to this PR or yours (#5700)? |
I think there are two problems here:
For the first problem, I think maybe we can return something indicates that we can not know the specific data size now, so in the previous discussion, I propose introducing a new constant UNKNOWN_SIZE with value of 1 byte. What do you think ? |
Also ping Duo~ @Apache9 This problem is related to my PR you reviewed yesterday, would you mind taking a look at this too ? |
I'm wondering what the byte constant would hold, other than
Alternatively, if we return the average size of all other regions might be good but it will push the idea of a region in transition too deep into the RegionSizeCalculator and other areas won't know about it, maybe they should? What if there are currently no regions available? What if there is one (or more) region in transition and the size map is empty? What if the size map only contains a region that's empty? I am not sure if these cases are realistic, I'm still new to the codebase. |
I prefer to use 1 byte to represent unknown region size for two reasons:
I'll try to answer your questions later, sorry have to catch the shuttle bus. Thanks. @aalhour |
I don't think regions in state transition and regions no longer exist are same thing. When we call regionLocator.getAllRegionLocations(), we will exlcude offlined split parent regions. You can see https://github.com/apache/hbase/blob/rel/3.0.0-beta-1/hbase-client/src/main/java/org/apache/hadoop/hbase/ClientMetaTableAccessor.java#L172 for details.
I don't think this is a big deal. As @ndimiduk mentioned, it's just a hint for MapReduce scheduler. Thanks. @aalhour |
So, what you mean is that we can return |
Cases where the region's size can be returned as
Not sure if we should do anything in the TableSplit.java class which is used by the above. |
🎊 +1 overall
This message was automatically generated. |
Thanks for your quick address. @aalhour Sorry that maybe I didn't express my thoughts clearly and caused you some confusion. In fact, we share some of the same views. About the constant UNKNOWN_SIZE, I think maybe it should be defined as following:
The byte size of UNKNOWN_SIZE must be greater than 0 so that this input split will not be filtered out because we will use region byte size as the input split length. :) |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
Hello @frostruan, thanks for clarifying, yeah that's in line with my previous thoughts, I was confused as to why we want to specify a byte and return it from a method that returns longs. Now it's clear. I have just pushed some changes, can you please take one final look and tell me if it's good to be merged? |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
I think this PR is good enough to solve the problem it intends to solve. +1 ! I think there is still some room for optimization here, but this is another problem. We can file another issue to follow up. Thanks. |
ff51051
to
b7dfc35
Compare
Awesome, thanks a lot @frostruan. I just pinged @ndimiduk to help me merge it. Out of curiosity, what remaining problems do you see that need attention? Can we write a quick ticket describing them? I'll see if there's appetite in the team (@HubSpot) to work on them. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
hbase-mapreduce/src/main/java/org/apache/hadoop/hbase/mapreduce/RegionSizeCalculator.java
Outdated
Show resolved
Hide resolved
Thanks a lot ! @aalhour I used to think that we did not fully consider the situation of region merge/split. Once a region is split or merged, it may cause some overlap or loss of the data we read. However, after thinking about it again, I think it seems unlikely. , so please ignore me. :) |
Thanks @ndimiduk, @frostruan and @bbeaudreault for the feedback. I have just reverted the changes back to "returning 0 for unknown regions". |
6834e50
to
cacc9f5
Compare
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
Thanks for the thoughtful reviews everyone! |
…ition (apache#5699) When a region is in transition, it may briefly have a null ServerName in meta. The RegionSizeCalculator calls RegionLocator.getAllRegionLocations() and does not handle the possibility that a RegionLocation.getServerName() could be null. The ServerName is eventually passed into an Admin call, which results in an NPE. This has come up in other contexts. For example, taking a look at getAllRegionLocations() impl, we have checks to ensure that we don't call null server names. We need to similarly handle the possibility of nulls in RegionSizeCalculator. Signed-off-by: Nick Dimiduk <ndimiduk@apache.org> Signed-off-by: Hui Ruan <huiruan@apache.org>
…ition (apache#5699) When a region is in transition, it may briefly have a null ServerName in meta. The RegionSizeCalculator calls RegionLocator.getAllRegionLocations() and does not handle the possibility that a RegionLocation.getServerName() could be null. The ServerName is eventually passed into an Admin call, which results in an NPE. This has come up in other contexts. For example, taking a look at getAllRegionLocations() impl, we have checks to ensure that we don't call null server names. We need to similarly handle the possibility of nulls in RegionSizeCalculator. Signed-off-by: Nick Dimiduk <ndimiduk@apache.org> Signed-off-by: Hui Ruan <huiruan@apache.org>
…ition (apache#5699) When a region is in transition, it may briefly have a null ServerName in meta. The RegionSizeCalculator calls RegionLocator.getAllRegionLocations() and does not handle the possibility that a RegionLocation.getServerName() could be null. The ServerName is eventually passed into an Admin call, which results in an NPE. This has come up in other contexts. For example, taking a look at getAllRegionLocations() impl, we have checks to ensure that we don't call null server names. We need to similarly handle the possibility of nulls in RegionSizeCalculator. Signed-off-by: Nick Dimiduk <ndimiduk@apache.org> Signed-off-by: Hui Ruan <huiruan@apache.org>
…ition (apache#5699) When a region is in transition, it may briefly have a null ServerName in meta. The RegionSizeCalculator calls RegionLocator.getAllRegionLocations() and does not handle the possibility that a RegionLocation.getServerName() could be null. The ServerName is eventually passed into an Admin call, which results in an NPE. This has come up in other contexts. For example, taking a look at getAllRegionLocations() impl, we have checks to ensure that we don't call null server names. We need to similarly handle the possibility of nulls in RegionSizeCalculator. Signed-off-by: Nick Dimiduk <ndimiduk@apache.org> Signed-off-by: Hui Ruan <huiruan@apache.org>
…ition (#5699) When a region is in transition, it may briefly have a null ServerName in meta. The RegionSizeCalculator calls RegionLocator.getAllRegionLocations() and does not handle the possibility that a RegionLocation.getServerName() could be null. The ServerName is eventually passed into an Admin call, which results in an NPE. This has come up in other contexts. For example, taking a look at getAllRegionLocations() impl, we have checks to ensure that we don't call null server names. We need to similarly handle the possibility of nulls in RegionSizeCalculator. Signed-off-by: Nick Dimiduk <ndimiduk@apache.org> Signed-off-by: Hui Ruan <huiruan@apache.org>
…ition (#5699) When a region is in transition, it may briefly have a null ServerName in meta. The RegionSizeCalculator calls RegionLocator.getAllRegionLocations() and does not handle the possibility that a RegionLocation.getServerName() could be null. The ServerName is eventually passed into an Admin call, which results in an NPE. This has come up in other contexts. For example, taking a look at getAllRegionLocations() impl, we have checks to ensure that we don't call null server names. We need to similarly handle the possibility of nulls in RegionSizeCalculator. Signed-off-by: Nick Dimiduk <ndimiduk@apache.org> Signed-off-by: Hui Ruan <huiruan@apache.org>
…gions are in transition (apache#5699) When a region is in transition, it may briefly have a null ServerName in meta. The RegionSizeCalculator calls RegionLocator.getAllRegionLocations() and does not handle the possibility that a RegionLocation.getServerName() could be null. The ServerName is eventually passed into an Admin call, which results in an NPE. This has come up in other contexts. For example, taking a look at getAllRegionLocations() impl, we have checks to ensure that we don't call null server names. We need to similarly handle the possibility of nulls in RegionSizeCalculator. Signed-off-by: Nick Dimiduk <ndimiduk@apache.org> Signed-off-by: Hui Ruan <huiruan@apache.org>
…ition (#5699) When a region is in transition, it may briefly have a null ServerName in meta. The RegionSizeCalculator calls RegionLocator.getAllRegionLocations() and does not handle the possibility that a RegionLocation.getServerName() could be null. The ServerName is eventually passed into an Admin call, which results in an NPE. This has come up in other contexts. For example, taking a look at getAllRegionLocations() impl, we have checks to ensure that we don't call null server names. We need to similarly handle the possibility of nulls in RegionSizeCalculator. Signed-off-by: Nick Dimiduk <ndimiduk@apache.org> Signed-off-by: Hui Ruan <huiruan@apache.org>
…ition (#5699) When a region is in transition, it may briefly have a null ServerName in meta. The RegionSizeCalculator calls RegionLocator.getAllRegionLocations() and does not handle the possibility that a RegionLocation.getServerName() could be null. The ServerName is eventually passed into an Admin call, which results in an NPE. This has come up in other contexts. For example, taking a look at getAllRegionLocations() impl, we have checks to ensure that we don't call null server names. We need to similarly handle the possibility of nulls in RegionSizeCalculator. Signed-off-by: Nick Dimiduk <ndimiduk@apache.org> Signed-off-by: Hui Ruan <huiruan@apache.org>
…gions are in transition (apache#5699) When a region is in transition, it may briefly have a null ServerName in meta. The RegionSizeCalculator calls RegionLocator.getAllRegionLocations() and does not handle the possibility that a RegionLocation.getServerName() could be null. The ServerName is eventually passed into an Admin call, which results in an NPE. This has come up in other contexts. For example, taking a look at getAllRegionLocations() impl, we have checks to ensure that we don't call null server names. We need to similarly handle the possibility of nulls in RegionSizeCalculator. Signed-off-by: Nick Dimiduk <ndimiduk@apache.org> Signed-off-by: Hui Ruan <huiruan@apache.org>
Fixes HBASE-28354.
This PR filters out the regions with
null
ServerName instances from the regions size calculations.