@ndimiduk
Member

…s when possible

The core change here is to the loop in
`SimpleRegionNormalizer#computeMergeNormalizationPlans`. It's a nested
loop that walks the table's region chain once, looking for contiguous
sequences of regions that meet the criteria for merge. The outer loop
tracks the starting point of the next sequence, the inner loop looks
for the end of that sequence. A single sequence becomes an instance of
`MergeNormalizationPlan`.

Signed-off-by: Huaxiang Sun <huaxiangsun@apache.org>
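
A minimal sketch of the nested-loop shape described above, assuming a simplified model in which regions are just an array of sizes in MB. The class name, method name, and the merge criterion used here (the running total of a sequence stays at or below the average region size) are illustrative assumptions, not the actual `SimpleRegionNormalizer` code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model; not the HBase implementation.
final class MergeSequenceSketch {

  /**
   * Walks the region chain once and returns [start, end) index ranges,
   * each covering a contiguous run of at least two regions to merge.
   */
  static List<int[]> planMerges(long[] regionSizesMb, long avgRegionSizeMb) {
    List<int[]> plans = new ArrayList<>();
    int start = 0;
    // Outer loop: tracks the starting point of the next candidate sequence.
    while (start < regionSizesMb.length) {
      long sumMb = 0;
      int end = start;
      // Inner loop: looks for the end of the sequence. Assumed criterion:
      // keep extending while the running total stays within the average.
      while (end < regionSizesMb.length
          && sumMb + regionSizesMb[end] <= avgRegionSizeMb) {
        sumMb += regionSizesMb[end];
        end++;
      }
      // A sequence of two or more regions becomes one merge plan.
      if (end - start >= 2) {
        plans.add(new int[] {start, end});
      }
      // Always advance by at least one region so the walk terminates.
      start = Math.max(end, start + 1);
    }
    return plans;
  }
}
```

With sizes `{1, 1, 1, 10, 2, 2, 10}` and an average of 4, this sketch yields two plans: regions 0 through 2, and regions 4 through 5.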
@Apache-HBase

🎊 +1 overall

| Vote | Subsystem | Runtime | Comment |
|:----:|----------:|--------:|:--------|
| +0 🆗 | reexec | 2m 17s | Docker mode activated. |
| | | | _ Prechecks _ |
| +1 💚 | dupname | 0m 1s | No case conflicting files found. |
| +1 💚 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 💚 | @author | 0m 0s | The patch does not contain any @author tags. |
| | | | _ branch-2 Compile Tests _ |
| +0 🆗 | mvndep | 0m 16s | Maven dependency ordering for branch |
| +1 💚 | mvninstall | 4m 4s | branch-2 passed |
| +1 💚 | checkstyle | 1m 39s | branch-2 passed |
| +1 💚 | spotbugs | 3m 0s | branch-2 passed |
| | | | _ Patch Compile Tests _ |
| +0 🆗 | mvndep | 0m 15s | Maven dependency ordering for patch |
| +1 💚 | mvninstall | 3m 45s | the patch passed |
| +1 💚 | checkstyle | 1m 40s | the patch passed |
| +1 💚 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 💚 | hadoopcheck | 13m 56s | Patch does not cause any errors with Hadoop 3.1.2 3.2.1. |
| +1 💚 | spotbugs | 3m 38s | the patch passed |
| | | | _ Other Tests _ |
| +1 💚 | asflicense | 0m 25s | The patch does not generate ASF License warnings. |
| | | 44m 14s | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | Client=19.03.13 Server=19.03.13 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2596/1/artifact/yetus-general-check/output/Dockerfile |
| GITHUB PR | #2596 |
| Optional Tests | dupname asflicense spotbugs hadoopcheck hbaseanti checkstyle |
| uname | Linux b59c48d6f54b 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/hbase-personality.sh |
| git revision | branch-2 / 69f282e |
| Max. process+thread count | 94 (vs. ulimit of 12500) |
| modules | C: hbase-common hbase-server U: . |
| Console output | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2596/1/console |
| versions | git=2.17.1 maven=(cecedd343002696d0abb50b32b541b8a6ba2883f) spotbugs=3.1.12 |
| Powered by | Apache Yetus 0.11.1 https://yetus.apache.org |

This message was automatically generated.

@Apache-HBase

🎊 +1 overall

| Vote | Subsystem | Runtime | Comment |
|:----:|----------:|--------:|:--------|
| +0 🆗 | reexec | 4m 4s | Docker mode activated. |
| -0 ⚠️ | yetus | 0m 5s | Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck |
| | | | _ Prechecks _ |
| | | | _ branch-2 Compile Tests _ |
| +0 🆗 | mvndep | 0m 16s | Maven dependency ordering for branch |
| +1 💚 | mvninstall | 3m 33s | branch-2 passed |
| +1 💚 | compile | 1m 18s | branch-2 passed |
| +1 💚 | shadedjars | 5m 57s | branch has no errors when building our shaded downstream artifacts. |
| +1 💚 | javadoc | 0m 59s | branch-2 passed |
| | | | _ Patch Compile Tests _ |
| +0 🆗 | mvndep | 0m 16s | Maven dependency ordering for patch |
| +1 💚 | mvninstall | 3m 15s | the patch passed |
| +1 💚 | compile | 1m 20s | the patch passed |
| +1 💚 | javac | 1m 20s | the patch passed |
| +1 💚 | shadedjars | 5m 55s | patch has no errors when building our shaded downstream artifacts. |
| +1 💚 | javadoc | 0m 58s | the patch passed |
| | | | _ Other Tests _ |
| +1 💚 | unit | 1m 31s | hbase-common in the patch passed. |
| +1 💚 | unit | 138m 41s | hbase-server in the patch passed. |
| | | 170m 38s | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | Client=19.03.13 Server=19.03.13 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2596/1/artifact/yetus-jdk8-hadoop2-check/output/Dockerfile |
| GITHUB PR | #2596 |
| Optional Tests | javac javadoc unit shadedjars compile |
| uname | Linux c11fc3f9c86b 4.15.0-60-generic #67-Ubuntu SMP Thu Aug 22 16:55:30 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/hbase-personality.sh |
| git revision | branch-2 / 69f282e |
| Default Java | 1.8.0_232 |
| Test Results | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2596/1/testReport/ |
| Max. process+thread count | 4373 (vs. ulimit of 12500) |
| modules | C: hbase-common hbase-server U: . |
| Console output | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2596/1/console |
| versions | git=2.17.1 maven=(cecedd343002696d0abb50b32b541b8a6ba2883f) |
| Powered by | Apache Yetus 0.11.1 https://yetus.apache.org |

This message was automatically generated.

@Apache-HBase

🎊 +1 overall

| Vote | Subsystem | Runtime | Comment |
|:----:|----------:|--------:|:--------|
| +0 🆗 | reexec | 7m 15s | Docker mode activated. |
| -0 ⚠️ | yetus | 0m 6s | Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck |
| | | | _ Prechecks _ |
| | | | _ branch-2 Compile Tests _ |
| +0 🆗 | mvndep | 0m 14s | Maven dependency ordering for branch |
| +1 💚 | mvninstall | 4m 39s | branch-2 passed |
| +1 💚 | compile | 1m 36s | branch-2 passed |
| +1 💚 | shadedjars | 7m 22s | branch has no errors when building our shaded downstream artifacts. |
| +1 💚 | javadoc | 1m 7s | branch-2 passed |
| | | | _ Patch Compile Tests _ |
| +0 🆗 | mvndep | 0m 17s | Maven dependency ordering for patch |
| +1 💚 | mvninstall | 4m 28s | the patch passed |
| +1 💚 | compile | 1m 36s | the patch passed |
| +1 💚 | javac | 1m 36s | the patch passed |
| +1 💚 | shadedjars | 7m 22s | patch has no errors when building our shaded downstream artifacts. |
| +1 💚 | javadoc | 1m 4s | the patch passed |
| | | | _ Other Tests _ |
| +1 💚 | unit | 2m 12s | hbase-common in the patch passed. |
| +1 💚 | unit | 192m 32s | hbase-server in the patch passed. |
| | | 233m 58s | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | Client=19.03.13 Server=19.03.13 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2596/1/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile |
| GITHUB PR | #2596 |
| Optional Tests | javac javadoc unit shadedjars compile |
| uname | Linux aa61d5922da1 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/hbase-personality.sh |
| git revision | branch-2 / 69f282e |
| Default Java | 2020-01-14 |
| Test Results | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2596/1/testReport/ |
| Max. process+thread count | 3317 (vs. ulimit of 12500) |
| modules | C: hbase-common hbase-server U: . |
| Console output | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2596/1/console |
| versions | git=2.17.1 maven=(cecedd343002696d0abb50b32b541b8a6ba2883f) |
| Powered by | Apache Yetus 0.11.1 https://yetus.apache.org |

This message was automatically generated.

@ndimiduk merged commit b84e2f5 into apache:branch-2 Oct 30, 2020
@ndimiduk deleted the 24419-normalizer-multimerge-branch-2 branch October 30, 2020 17:43

-    final double avgRegionSizeMb = ctx.getAverageRegionSizeMb();
+    final long avgRegionSizeMb = (long) ctx.getAverageRegionSizeMb();
     if (avgRegionSizeMb < mergeMinRegionSizeMb) {
Member

I know this is an old PR, but can you explain why we skip merging if our average region size is low?

Let's say the minimum region size is 2GB but my average region size is 750MB. This check then skips merging entirely, even though we'd like to merge.
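
To make the scenario concrete, here is a minimal sketch of the guard quoted in the diff, assuming sizes are expressed in MB and that the guard's body skips merge planning. The class and method names are hypothetical; only the cast to `long` and the comparison come from the quoted lines.

```java
// Hypothetical wrapper around the condition quoted in the diff above.
final class MergeGuardSketch {

  /** Returns true when the guard would skip merge planning for the table. */
  static boolean skipsMerge(double averageRegionSizeMb, long mergeMinRegionSizeMb) {
    // Mirrors the quoted lines: the average is truncated to a long first.
    final long avgRegionSizeMb = (long) averageRegionSizeMb;
    return avgRegionSizeMb < mergeMinRegionSizeMb;
  }
}
```

Under these assumptions, `skipsMerge(750.0, 2048)` is `true`: a 750MB average falls below a 2GB (2048MB) minimum, so no merges would be planned for that table.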

Member Author

I'd have to look back through the original JIRA and PR, but I believe there was a concern that the normalizer would merge away the splits made on table creation, between the time when the table was created and when the operator got around to loading it with data. The "minimum table size" configuration was designed to prevent this. This behavior pre-dated HBASE-24419; the functionality was preserved when this optimization was implemented.

I personally am not a fan, and prefer the "minimum table age" configuration for handling of this concern.

Member

Thank you Nick for the prompt response. Much appreciated!

This is not really the place for it, but I've been working on a backport of the HBase 3 normalizer to an old version over at https://github.com/opencore/hbase-normalizer, which includes a few other improvements. The work was triggered because the current Normalizer doesn't consider a minimum size when splitting (unless I'm completely mistaken). For various bad reasons we had tens of thousands of regions with an average size of 7 MB or so, which means even a 14 MB region (twice the average) would be split, leading to even more regions, and so on. So we introduced customizable multipliers and a minimum size for splits.
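
A minimal sketch of what such a split check could look like, assuming a configurable multiplier on the average and a minimum region size below which no split is planned. All names here are hypothetical and are not taken from the linked repository or from HBase; the strict "greater than" comparison against the multiplied average is also an assumption.

```java
// Hypothetical split check; names and comparison operators are assumptions.
final class SplitThresholdSketch {

  /**
   * Plans a split only when the region is both above the absolute minimum
   * and larger than the configured multiple of the table's average size.
   */
  static boolean shouldSplit(long regionSizeMb, double avgRegionSizeMb,
      double splitMultiplier, long minSplitSizeMb) {
    return regionSizeMb >= minSplitSizeMb
        && regionSizeMb > splitMultiplier * avgRegionSizeMb;
  }
}
```

With a 7 MB average and a multiplier of 2, a 15 MB region qualifies for a split when the minimum is 0, but not once the minimum is raised to, say, 1024 MB.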

Your context helps. I hope to have the time to contribute these changes back.

Member Author

I haven't done a JIRA/git audit recently, but the normalizer on master and branch-2 should be identical. branch-2.4 has most of the big improvements that I'm aware of, including support for rate-limiting.

Be advised that its settings for merging are still very coarse. I've been using it recently on large tables containing large regions and find that the 2x-off-average threshold is not practical. It needs more tuning and possibly new settings in order to work well across a wide range of table topologies.

Depending on how old an HBase version you're targeting, there was a bug where it would always force-merge, which could cause errors in meta. I advise against running the normalizer on versions before 2.3.
