-
Notifications
You must be signed in to change notification settings - Fork 3.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HBASE-22285 A normalizer which merges small size regions with adjacen… #931
Conversation
💔 -1 overall
This message was automatically generated. |
mergeEnabled = masterRpcServices.isSplitOrMergeEnabled(null, | ||
RequestConverter.buildIsSplitOrMergeEnabledRequest(Admin.MasterSwitchType.MERGE)).getEnabled(); | ||
} catch (ServiceException se) { | ||
LOG.debug("Unable to determine whether merge is enabled", se); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LOG.error() ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can make it WARN but not error. Its a valid scenario for table to disable merge and still at cluster level have normalizer enabled.
hbase-server/src/main/java/org/apache/hadoop/hbase/master/normalizer/MergeNormalizer.java
Outdated
Show resolved
Hide resolved
* <li> get all regions of a given table | ||
* <li> get avg size S of each region (by total size of store files reported in RegionLoad) | ||
* <li> Otherwise, two region R1 and its smallest neighbor R2 are merged, | ||
* if R1 + R1 < S, and all such regions are returned to be merged |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: R1 + R2 < S
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for catching that
candidateIdx++; | ||
} | ||
} catch (Exception e) { | ||
e.printStackTrace(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: LOG.error("MergeNormalizer failed for region {} and {}", r1Name, r2Name, e);
"Table " + table + ", small region size: " + regionSize + " plus its neighbor size: " | ||
+ regionSize2 + ", less than the avg size " + avgRegionSize + ", merging them"); | ||
plans.add(new MergeNormalizationPlan(hri, hri2)); | ||
candidateIdx++; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should be candidateIdx = candidateIdx + 2
? e.g. if we have 3 regions: r1, r2, r3 and if we consider r1, r2 for merging, plans should have only one entry with Plan(r1, r2) right? but with candidateIdx++
, we might end up adding 2 plans: Plan(r1, r2) and Plan(r2, r3)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is candidateIdx++ outside if block and was kept outside purposefully.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, got it
hbase-server/src/main/java/org/apache/hadoop/hbase/master/normalizer/MergeNormalizer.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Additional to @virajjasani observations, had made some minor suggestions over the code. On Another thought, the main structure for computePlanForTable looks very similar to the one from SimpleRegionNormalizer. Would it be too difficult to refactor _SimpleRegionNormalizer so that both normalizer implementations could make most of code reuse?
} | ||
HRegionInfo hri2 = tableRegions.get(candidateIdx + 1); | ||
long regionSize2 = getRegionSize(hri2); | ||
if (regionSize >= 0 && regionSize2 >= 0 && regionSize + regionSize2 < avgRegionSize) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe worth logging (debug?) regions that didn't pass this check.
hbase-server/src/main/java/org/apache/hadoop/hbase/master/normalizer/MergeNormalizer.java
Outdated
Show resolved
Hide resolved
if (!(new Timestamp(hriTime.getTime() + TimeUnit.DAYS.toMillis(MIN_DURATION_FOR_MERGE))) | ||
.after(currentTime) | ||
|| !(new Timestamp( | ||
hri2Time.getTime() + TimeUnit.DAYS.toMillis(MIN_DURATION_FOR_MERGE))) | ||
.after(currentTime)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Few minor suggestions:
- Maybe worth mention this extra condition on the class javadoc comment.
- The checking logic seems duplicate. Place the check into a separate method:
!(new Timestamp(hriTime.getTime() + TimeUnit.DAYS.toMillis(MIN_DURATION_FOR_MERGE))) .after(currentTime)
- Make MIN_DURATION_FOR_MERGE configurable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed.
💔 -1 overall
This message was automatically generated. |
After giving it a thought i believe we should have two normalizers one which splits and second which merges and may be third SimpleRegionNormalizer which uses these two and together. |
Also i will make the MIN_DURATION_FOR_MERGE configurable. |
I'm not sure moving logic to an util class is a good design practice. Here I think we could probably do a strategy pattern: An abstract normalizer class implementing most of computePlanForTable logic, with only the parts related to picking a region to the region plan and what actions should be perfomed over the region plan (merge, or split, or both) being delegated to methods that would be implemented differently on each of the subclasses. |
💔 -1 overall
This message was automatically generated. |
private static final int MIN_DURATION_FOR_MERGE = 2; | ||
|
||
@Override | ||
public List<NormalizationPlan> computePlanForTable(TableName table) throws HBaseIOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we implement all common parts of computePlanForTable in the base parent class, then only merge or split normalizer specifc logic would need to go on the implementations?
if (isOldEnoughToMerge(hri) || isOldEnoughToMerge(hri2)) { | ||
LOG.info( | ||
"Table {}, small region size: {} plus its neighbor size: {}, less than the avg size " | ||
+ "{}, merging them", | ||
table, regionSize, regionSize2, avgRegionSize); | ||
plans.add(new MergeNormalizationPlan(hri, hri2)); | ||
candidateIdx++; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For example, this could go to a separate, abstract method computeNormalization in the base parent class, then MergeNormalizer would implement it as it is now, and SimpleRegionNormalizer would do it the way it does now.
…r for common functinality
💔 -1 overall
This message was automatically generated. |
RequestConverter.buildIsSplitOrMergeEnabledRequest(Admin.MasterSwitchType.MERGE)) | ||
.getEnabled(); | ||
} catch (ServiceException se) { | ||
LOG.debug("Unable to determine whether merge is enabled", se); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
want to convert this to error or warn log?
RequestConverter.buildIsSplitOrMergeEnabledRequest(Admin.MasterSwitchType.SPLIT)) | ||
.getEnabled(); | ||
} catch (ServiceException se) { | ||
LOG.debug("Unable to determine whether merge is enabled", se); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WARN it is. 😄
|
||
private static final Log LOG = LogFactory.getLog(SimpleRegionNormalizer.class); | ||
public class SimpleRegionNormalizer extends BaseNormalizer { | ||
private static final Logger LOG = LoggerFactory.getLogger(SimpleRegionNormalizer.class); | ||
private static final int MIN_REGION_COUNT = 3; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MergeNormalizer and SimpleRegionNormalizer both can use same constant MIN_REGION_COUNT
, may be good to move it to HConstant
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure about this yet. Let me think a bit. The point is that both normalizer can have different requirement for the MIN_REGION_COUNT. But then we can't have both normalizer enabled together.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, also this is mostly same constant. if we want to use configs for these values(which we don't want I believe), then we could use it separately but here, may be we can refer to same constant.
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
|
||
int candidateIdx = 0; | ||
while (candidateIdx < tableRegions.size()) { | ||
if (candidateIdx == tableRegions.size() - 1) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can get rid of this if condition with while( candidateIdx < tableRegions.size()-1 )
plans.add(new MergeNormalizationPlan(hri, hri2)); | ||
candidateIdx++; | ||
} else { | ||
LOG.debug("Skipping region {} of table {} with size {}", hri.getRegionId(), table, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of regionId, may be good to print region name with hri.getRegionNameAsString()
if (splitEnabled) { | ||
List<NormalizationPlan> splitNormalizationPlan = getSplitNormalizationPlan(table); | ||
if (splitNormalizationPlan != null) { | ||
plans = splitNormalizationPlan; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To keep it aligned with mergeNormalization, here also, it's good to add all: plans.addAll(splitNormalizationPlan)
} | ||
} | ||
if (mergeEnabled) { | ||
List<NormalizationPlan> normalizationPlans = getMergeNormalizationPlan(table); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: rename to mergeNormalizationPlans?
private boolean shouldNormalize(TableName table) { | ||
boolean normalize = false; | ||
if (table == null || table.isSystemTable()) { | ||
LOG.debug("Normalization of system table {} isn't allowed", table); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think table could be null here, and we can remove null check from this if condition. Even if null was possible, log message would not be appropriate: Normalization of system table null isn't allowed
💔 -1 overall
This message was automatically generated. |
…t regions