Backport "HBASE-24419 Normalizer merge plans should consider more than 2 region…" to branch-2 #2596

ndimiduk · 2020-10-29T20:45:06Z

…s when possible

The core change here is to the loop in
`SimpleRegionNormalizer#computeMergeNormalizationPlans`. It's a nested
loop that walks the table's region chain once, looking for contiguous
sequences of regions that meet the criteria for merge. The outer loop
tracks the starting point of the next sequence; the inner loop looks
for the end of that sequence. Each sequence becomes one instance of
`MergeNormalizationPlan`.

Signed-off-by: Huaxiang Sun <huaxiangsun@apache.org>
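
For readers unfamiliar with the normalizer, the nested loop described in the commit message can be sketched roughly as follows. This is a simplified illustration, not the code from the patch: `MergeRunSketch`, `findMergeRuns`, and the rule "extend a run while the summed size stays at or below the average" are assumptions made for the example, and the real method is likely to apply further per-region eligibility checks.

```java
import java.util.ArrayList;
import java.util.List;

final class MergeRunSketch {
  /**
   * Walks the region sizes once and returns {startInclusive, endExclusive}
   * index pairs, one per contiguous run of two or more regions.
   * Assumption: a run may grow while its summed size is <= the average.
   */
  static List<int[]> findMergeRuns(long[] regionSizesMb, long avgRegionSizeMb) {
    List<int[]> runs = new ArrayList<>();
    int start = 0;
    // Outer loop: tracks the starting point of the next candidate run.
    while (start < regionSizesMb.length) {
      long sum = regionSizesMb[start];
      int end = start + 1;
      // Inner loop: looks for the end of the run that begins at 'start'.
      while (end < regionSizesMb.length && sum + regionSizesMb[end] <= avgRegionSizeMb) {
        sum += regionSizesMb[end];
        end++;
      }
      // A run of a single region is not a merge; only keep longer runs.
      if (end - start > 1) {
        runs.add(new int[] {start, end});
      }
      // Resume where the inner loop stopped, so each region is visited once.
      start = end;
    }
    return runs;
  }
}
```

With sizes `{1, 1, 1, 10, 2, 2, 20}` MB and an average of 5 MB, this sketch yields two runs: indices 0 through 2 and indices 4 through 5. In the terms of the commit message, each run would correspond to one merge plan covering more than two regions where possible.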


Apache-HBase · 2020-10-29T21:30:07Z

🎊 +1 overall

Vote	Subsystem	Runtime	Comment
+0 🆗	reexec	2m 17s	Docker mode activated.
		_ Prechecks _
+1 💚	dupname	0m 1s	No case conflicting files found.
+1 💚	hbaseanti	0m 0s	Patch does not have any anti-patterns.
+1 💚	@author	0m 0s	The patch does not contain any @author tags.
		_ branch-2 Compile Tests _
+0 🆗	mvndep	0m 16s	Maven dependency ordering for branch
+1 💚	mvninstall	4m 4s	branch-2 passed
+1 💚	checkstyle	1m 39s	branch-2 passed
+1 💚	spotbugs	3m 0s	branch-2 passed
		_ Patch Compile Tests _
+0 🆗	mvndep	0m 15s	Maven dependency ordering for patch
+1 💚	mvninstall	3m 45s	the patch passed
+1 💚	checkstyle	1m 40s	the patch passed
+1 💚	whitespace	0m 0s	The patch has no whitespace issues.
+1 💚	hadoopcheck	13m 56s	Patch does not cause any errors with Hadoop 3.1.2 3.2.1.
+1 💚	spotbugs	3m 38s	the patch passed
		_ Other Tests _
+1 💚	asflicense	0m 25s	The patch does not generate ASF License warnings.
		44m 14s

Subsystem	Report/Notes
Docker	Client=19.03.13 Server=19.03.13 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2596/1/artifact/yetus-general-check/output/Dockerfile
GITHUB PR	#2596
Optional Tests	dupname asflicense spotbugs hadoopcheck hbaseanti checkstyle
uname	Linux b59c48d6f54b 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Build tool	maven
Personality	dev-support/hbase-personality.sh
git revision	branch-2 / `69f282e`
Max. process+thread count	94 (vs. ulimit of 12500)
modules	C: hbase-common hbase-server U: .
Console output	https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2596/1/console
versions	git=2.17.1 maven=(cecedd343002696d0abb50b32b541b8a6ba2883f) spotbugs=3.1.12
Powered by	Apache Yetus 0.11.1 https://yetus.apache.org

This message was automatically generated.

Apache-HBase · 2020-10-29T23:36:29Z

🎊 +1 overall

Vote	Subsystem	Runtime	Comment
+0 🆗	reexec	4m 4s	Docker mode activated.
-0 ⚠️	yetus	0m 5s	Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
		_ Prechecks _
		_ branch-2 Compile Tests _
+0 🆗	mvndep	0m 16s	Maven dependency ordering for branch
+1 💚	mvninstall	3m 33s	branch-2 passed
+1 💚	compile	1m 18s	branch-2 passed
+1 💚	shadedjars	5m 57s	branch has no errors when building our shaded downstream artifacts.
+1 💚	javadoc	0m 59s	branch-2 passed
		_ Patch Compile Tests _
+0 🆗	mvndep	0m 16s	Maven dependency ordering for patch
+1 💚	mvninstall	3m 15s	the patch passed
+1 💚	compile	1m 20s	the patch passed
+1 💚	javac	1m 20s	the patch passed
+1 💚	shadedjars	5m 55s	patch has no errors when building our shaded downstream artifacts.
+1 💚	javadoc	0m 58s	the patch passed
		_ Other Tests _
+1 💚	unit	1m 31s	hbase-common in the patch passed.
+1 💚	unit	138m 41s	hbase-server in the patch passed.
		170m 38s

Subsystem	Report/Notes
Docker	Client=19.03.13 Server=19.03.13 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2596/1/artifact/yetus-jdk8-hadoop2-check/output/Dockerfile
GITHUB PR	#2596
Optional Tests	javac javadoc unit shadedjars compile
uname	Linux c11fc3f9c86b 4.15.0-60-generic #67-Ubuntu SMP Thu Aug 22 16:55:30 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool	maven
Personality	dev-support/hbase-personality.sh
git revision	branch-2 / `69f282e`
Default Java	1.8.0_232
Test Results	https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2596/1/testReport/
Max. process+thread count	4373 (vs. ulimit of 12500)
modules	C: hbase-common hbase-server U: .
Console output	https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2596/1/console
versions	git=2.17.1 maven=(cecedd343002696d0abb50b32b541b8a6ba2883f)
Powered by	Apache Yetus 0.11.1 https://yetus.apache.org

This message was automatically generated.

Apache-HBase · 2020-10-30T00:39:46Z

🎊 +1 overall

Vote	Subsystem	Runtime	Comment
+0 🆗	reexec	7m 15s	Docker mode activated.
-0 ⚠️	yetus	0m 6s	Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
		_ Prechecks _
		_ branch-2 Compile Tests _
+0 🆗	mvndep	0m 14s	Maven dependency ordering for branch
+1 💚	mvninstall	4m 39s	branch-2 passed
+1 💚	compile	1m 36s	branch-2 passed
+1 💚	shadedjars	7m 22s	branch has no errors when building our shaded downstream artifacts.
+1 💚	javadoc	1m 7s	branch-2 passed
		_ Patch Compile Tests _
+0 🆗	mvndep	0m 17s	Maven dependency ordering for patch
+1 💚	mvninstall	4m 28s	the patch passed
+1 💚	compile	1m 36s	the patch passed
+1 💚	javac	1m 36s	the patch passed
+1 💚	shadedjars	7m 22s	patch has no errors when building our shaded downstream artifacts.
+1 💚	javadoc	1m 4s	the patch passed
		_ Other Tests _
+1 💚	unit	2m 12s	hbase-common in the patch passed.
+1 💚	unit	192m 32s	hbase-server in the patch passed.
		233m 58s

Subsystem	Report/Notes
Docker	Client=19.03.13 Server=19.03.13 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2596/1/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile
GITHUB PR	#2596
Optional Tests	javac javadoc unit shadedjars compile
uname	Linux aa61d5922da1 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Build tool	maven
Personality	dev-support/hbase-personality.sh
git revision	branch-2 / `69f282e`
Default Java	2020-01-14
Test Results	https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2596/1/testReport/
Max. process+thread count	3317 (vs. ulimit of 12500)
modules	C: hbase-common hbase-server U: .
Console output	https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2596/1/console
versions	git=2.17.1 maven=(cecedd343002696d0abb50b32b541b8a6ba2883f)
Powered by	Apache Yetus 0.11.1 https://yetus.apache.org

This message was automatically generated.

lfrancke · 2021-06-23T11:34:25Z

...e-server/src/main/java/org/apache/hadoop/hbase/master/normalizer/SimpleRegionNormalizer.java


-    final double avgRegionSizeMb = ctx.getAverageRegionSizeMb();
+    final long avgRegionSizeMb = (long) ctx.getAverageRegionSizeMb();
+    if (avgRegionSizeMb < mergeMinRegionSizeMb) {


I know this is an old PR, but can you explain why we skip merging if our average region size is low?

Let's say the min region size is 2GB but my average region size is 750MB. Then this would not do anything, even though we'd like to merge.

ndimiduk · 2021-06-23

I'd have to look back through the original JIRA and PR, but I believe there was a concern that normalizer would merge away the splits made on table creation, between the time when the table was created and when the operator got around to loading it with data. The "minimum table size" configuration was designed to prevent this. This behavior pre-dated HBASE-24419; the functionality was preserved when this optimization was implemented.

I personally am not a fan, and prefer the "minimum table age" configuration for handling of this concern.
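
The guard under discussion can be illustrated with a small sketch. The diff quoted in this thread shows only the condition, so the assumption here is that a true result means no merge plans are produced for the table. `MergeGuardSketch` and `skipsMerge` are hypothetical names, and 2GB is taken to mean 2048MB.

```java
final class MergeGuardSketch {
  /**
   * Mirrors the condition from the quoted diff: the average is truncated
   * to a long and compared against the configured minimum region size.
   * Assumption: returning true means merge planning is skipped entirely.
   */
  static boolean skipsMerge(double averageRegionSizeMb, long mergeMinRegionSizeMb) {
    final long avgRegionSizeMb = (long) averageRegionSizeMb;
    return avgRegionSizeMb < mergeMinRegionSizeMb;
  }
}
```

For the case raised in the question, `skipsMerge(750.0, 2048)` is true, so a table averaging 750MB per region would see no merges under a 2GB minimum.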

lfrancke · 2021-06-24

Thank you Nick for the prompt response. Much appreciated!

This is not really the place for it, but I've been working on a backport of the HBase 3 normalizer to an old version over here https://github.com/opencore/hbase-normalizer which includes a few other improvements. The work was triggered because the current Normalizer doesn't consider a minimum size when splitting (unless I'm completely mistaken). For various bad reasons we had tens of thousands of regions with an average size of 7 MB or so. That means even a 14 MB region (twice the average) would be split, leading to even more regions, and so on. So we introduced customizable multipliers and a minimum size for splits.

Your context helps. I hope to have the time to contribute these changes back.
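
To make the split problem concrete, here is a rough sketch of a split check with a configurable multiplier and a minimum size. It is not the code from the linked repository: `SplitCriterionSketch`, `shouldSplit`, and both parameters are hypothetical, and the strict "greater than" comparison against the scaled average is an assumption.

```java
final class SplitCriterionSketch {
  /**
   * Returns true when a region is large enough to split.
   * Assumptions: 'multiplier' scales the table's average region size, and
   * 'minSplitSizeMb' is a floor below which a region is never split.
   */
  static boolean shouldSplit(
      long regionSizeMb, double avgRegionSizeMb, double multiplier, long minSplitSizeMb) {
    boolean farAboveAverage = regionSizeMb > multiplier * avgRegionSizeMb;
    boolean aboveFloor = regionSizeMb >= minSplitSizeMb;
    return farAboveAverage && aboveFloor;
  }
}
```

With a 7 MB average and a multiplier of 2, a 15 MB region passes the check when the floor is 0 but is left alone once the floor is set to, say, 1024 MB.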

ndimiduk · 2021-06-24

I haven't done a JIRA/git audit recently, but the normalizer on master and branch-2 should be identical. branch-2.4 has most of the big improvements that I'm aware of, including support for rate-limiting.

Be advised that its settings for merging are still very coarse. I've been using it recently on large tables containing large regions and find that the 2x-off-average rule is not practical. It needs more tuning, and possibly new settings, in order to work well across a wide range of table topologies.

Depending on how old an HBase version you're targeting, there was a bug where it would always force-merge, which could cause errors in meta. I advise against running normalizer before 2.3.

ndimiduk merged commit b84e2f5 into apache:branch-2 Oct 30, 2020

ndimiduk deleted the 24419-normalizer-multimerge-branch-2 branch October 30, 2020 17:43

lfrancke reviewed Jun 23, 2021
