Backport "HBASE-24419 Normalizer merge plans should consider more than 2 region…" to branch-2 #2596
…s when possible

The core change here is to the loop in `SimpleRegionNormalizer#computeMergeNormalizationPlans`. It's a nested loop that walks the table's region chain once, looking for contiguous sequences of regions that meet the criteria for merge. The outer loop tracks the starting point of the next sequence; the inner loop looks for the end of that sequence. A single sequence becomes an instance of `MergeNormalizationPlan`.

Signed-off-by: Huaxiang Sun <huaxiangsun@apache.org>
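The loop structure described above can be illustrated with a simplified Java sketch. It is not the actual `SimpleRegionNormalizer` code: regions are reduced to their sizes in MB, and a "plan" is just a list of region indices. It assumes, for illustration only, that a region may join the current sequence while the sequence's combined size stays at or below the table's average region size. The class and method names are hypothetical, and the real method applies further eligibility checks that are omitted here.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified sketch of the nested-loop walk over a table's region chain.
// ASSUMPTION: a region may join the current sequence while the combined size stays
// at or below the average region size. The real SimpleRegionNormalizer applies
// further eligibility checks that are omitted here. Names are hypothetical.
public class MergePlanSketch {

  // Returns one list of contiguous region indices per merge plan.
  static List<List<Integer>> computeMergePlans(long[] regionSizesMb, long avgRegionSizeMb) {
    List<List<Integer>> plans = new ArrayList<>();
    int rangeStart = 0;
    // Outer loop: tracks the starting point of the next candidate sequence.
    while (rangeStart < regionSizesMb.length - 1) {
      List<Integer> members = new ArrayList<>();
      long sumMb = 0;
      int current = rangeStart;
      // Inner loop: seeds the sequence with its first region, then extends it
      // until the next region would push the total above the average.
      while (current < regionSizesMb.length
          && (members.isEmpty() || sumMb + regionSizesMb[current] <= avgRegionSizeMb)) {
        members.add(current);
        sumMb += regionSizesMb[current];
        current++;
      }
      // A sequence becomes a plan only if it contains at least two regions.
      if (members.size() > 1) {
        plans.add(members);
      }
      // Resume after the sequence, always advancing by at least one region.
      rangeStart = Math.max(current, rangeStart + 1);
    }
    return plans;
  }

  public static void main(String[] args) {
    // Sizes in MB; their average is 175 / 7 = 25 MB.
    long[] sizes = {10, 20, 30, 100, 5, 5, 5};
    System.out.println(computeMergePlans(sizes, 25)); // [[4, 5, 6]]
  }
}
```

Running `main` prints `[[4, 5, 6]]`: the three 5 MB regions at the end of the chain form a single plan covering more than two regions, while each of the larger regions ends up alone and is left untouched. Advancing `rangeStart` by at least one region on every pass guarantees that the walk terminates.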
🎊 +1 overall
This message was automatically generated.
```diff
- final double avgRegionSizeMb = ctx.getAverageRegionSizeMb();
+ final long avgRegionSizeMb = (long) ctx.getAverageRegionSizeMb();
  if (avgRegionSizeMb < mergeMinRegionSizeMb) {
```
I know this is an old PR, but can you explain why we skip merging if our average region size is low?
Let's say the minimum region size is 2 GB but my average region size is 750 MB. Then this would not do anything, even though we'd like to merge.
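The reviewer's scenario can be checked with a small sketch of the early-exit guard from the diff. It is a hypothetical stand-in, not normalizer code: `mergeConsidered` and its parameters are illustrative names, sizes are assumed to be in MB, and 2 GB is taken as 2048 MB.

```java
// Sketch of the early-exit guard from the diff, with the reviewer's numbers.
// ASSUMPTION: sizes are in MB and 2 GB = 2048 MB. The method name and parameters
// are hypothetical stand-ins for the normalizer's context and configuration.
public class MergeSizeGuardSketch {

  // Returns true when merge planning would proceed past the guard.
  static boolean mergeConsidered(double averageRegionSizeMb, long mergeMinRegionSizeMb) {
    // As in the diff, the average is truncated to whole MB before the comparison.
    final long avgRegionSizeMb = (long) averageRegionSizeMb;
    // An average below the minimum means no merge plans are computed at all.
    return avgRegionSizeMb >= mergeMinRegionSizeMb;
  }

  public static void main(String[] args) {
    System.out.println(mergeConsidered(750.0, 2048)); // false
    System.out.println(mergeConsidered(2500.0, 2048)); // true
  }
}
```

With a 750 MB average against a 2048 MB minimum the guard returns `false`, so no merge plans are produced for the table at all, which is the behavior the question is about.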
I'd have to look back through the original JIRA and PR, but I believe there was a concern that the normalizer would merge away the splits made at table creation, in the window between creating the table and the operator getting around to loading it with data. The "minimum table size" configuration was designed to prevent this. This behavior predated HBASE-24419; the functionality was preserved when this optimization was implemented.
I'm personally not a fan, and prefer the "minimum table age" configuration for handling this concern.
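A minimal sketch of what an age-based guard could look like, assuming the table's creation time is available. The class, method, and parameter names are hypothetical and do not correspond to an HBase API or configuration key.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical "minimum table age" guard: skip merging while a table is younger
// than a configured age, so regions pre-split at creation survive until data is
// loaded. ASSUMPTION: the creation time is known; these names are not HBase APIs.
public class MinTableAgeGuardSketch {

  static boolean oldEnoughToMerge(Instant tableCreated, Instant now, Duration minTableAge) {
    // Age is the time elapsed since creation; merging is allowed once it reaches the minimum.
    Duration age = Duration.between(tableCreated, now);
    return age.compareTo(minTableAge) >= 0;
  }

  public static void main(String[] args) {
    Instant created = Instant.parse("2021-01-01T00:00:00Z");
    Duration minAge = Duration.ofDays(3);
    System.out.println(oldEnoughToMerge(created, created.plus(Duration.ofDays(1)), minAge)); // false
    System.out.println(oldEnoughToMerge(created, created.plus(Duration.ofDays(5)), minAge)); // true
  }
}
```

Unlike the size guard, a check like this only delays merging: once the table is old enough, small pre-split regions become eligible even if the table's average region size never reaches a configured minimum.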
Thank you Nick for the prompt response. Much appreciated!
This is not really the place for it, but I've been working on a backport of the HBase 3 normalizer to an older version at https://github.com/opencore/hbase-normalizer, which includes a few other improvements. The work was triggered because the current Normalizer doesn't consider a minimum size when splitting (unless I'm completely mistaken). For various bad reasons we had tens of thousands of regions with an average size of about 7 MB. That means even a 14 MB region (twice the average) would be split, leading to even more regions, and so on. So we introduced customizable multipliers and a minimum size for splits.
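The split-side change described above can be sketched as a single predicate. This is a hypothetical illustration, not code from the linked repository or from HBase: the names are made up, and it assumes a region is split when it is strictly larger than `multiplier` times the average, with a multiplier of 2.0 standing in for the twice-the-average behavior.

```java
// Hypothetical split predicate with a configurable multiplier and minimum size.
// ASSUMPTION: a region splits when strictly larger than multiplier * average;
// a multiplier of 2.0 models the "twice the average" behavior.
public class SplitThresholdSketch {

  static boolean shouldSplit(long regionSizeMb, double avgRegionSizeMb,
      double multiplier, long minSplitSizeMb) {
    // Regions below the minimum size are never split, however small the average is.
    if (regionSizeMb < minSplitSizeMb) {
      return false;
    }
    return regionSizeMb > multiplier * avgRegionSizeMb;
  }

  public static void main(String[] args) {
    // Tiny regions: the average is about 7 MB, as in the comment above.
    System.out.println(shouldSplit(20, 7.0, 2.0, 0));    // true: 20 > 14
    System.out.println(shouldSplit(20, 7.0, 2.0, 1024)); // false: below 1 GB minimum
    System.out.println(shouldSplit(20, 7.0, 4.0, 0));    // false: 20 <= 28
  }
}
```

With a 7 MB average, a 20 MB region exceeds 2 × 7 MB and would be split; either a 1024 MB minimum split size or a multiplier of 4.0 keeps it intact, which stops the cascade of ever-smaller regions.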
Your context helps. I hope to have the time to contribute these changes back.
I haven't done a JIRA/git audit recently, but the normalizer on master and branch-2 should be identical. branch-2.4 has most of the big improvements that I'm aware of, including support for rate-limiting.
Be advised that its settings for merging are still very coarse. I've recently been using it on large tables containing large regions and find that the 2x-of-average threshold is not practical. It needs more tuning, and possibly new settings, in order to work well across a wide range of table topologies.
Depending on how old an HBase version you're targeting, be aware that there was a bug where it would always force-merge, which could cause errors in meta. I advise against running the normalizer on versions before 2.3.