
HBASE-25126 Add load balance logic in hbase-client to distribute read… #2570

Closed

huaxiangsun wants to merge 1 commit into apache:HBASE-18070 from huaxiangsun:HBASE-18070.HBASE-25126

Conversation

@huaxiangsun
Contributor

… load over meta replica regions

It adds load balance support. With "hbase.meta.replicas.use" set to true, the client supports the meta replica feature.
The existing mode is called HighAvailable: the client sends the scan request to the primary meta replica region first,
and if the response is not back within a configured amount of time, it sends scan requests to all meta replica regions and
takes the first response. On top of the existing mode, a new mode, LoadBalance, is added. In this mode, the client first
chooses a meta replica region to send the scan request to. If the response is stale, it sends the scan request to the primary
meta region. In this mode, all meta replica regions serve scan requests.

Two new config knobs are added:

  1. hbase.meta.replicas.mode
    Valid values are "HighAvailable" and "LoadBalance".

  2. hbase.meta.replicas.mode.loadbalance.replica.chooser
    Implementation class for MetaReplicaChooser. Only org.apache.hadoop.hbase.client.MetaReplicaLoadBalanceReplicaSimpleChooser.class
    is supported for now, and it is the default as well.

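For illustration only, a minimal sketch of a client configuration that opts into the new mode, using the knob names from this description (the wrapper class is hypothetical, not part of the patch):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class MetaReplicaClientConfigExample {
  public static Configuration create() {
    Configuration conf = HBaseConfiguration.create();
    // Enable the client-side meta replica feature.
    conf.setBoolean("hbase.meta.replicas.use", true);
    // Select the new mode; "HighAvailable" keeps the existing behavior.
    conf.set("hbase.meta.replicas.mode", "LoadBalance");
    // Chooser implementation; only the simple chooser is supported for now.
    conf.set("hbase.meta.replicas.mode.loadbalance.replica.chooser",
      "org.apache.hadoop.hbase.client.MetaReplicaLoadBalanceReplicaSimpleChooser");
    return conf;
  }
}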
@Apache-HBase

💔 -1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 1m 8s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 1s No case conflicting files found.
+1 💚 hbaseanti 0m 0s Patch does not have any anti-patterns.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
_ HBASE-18070 Compile Tests _
+0 🆗 mvndep 0m 22s Maven dependency ordering for branch
+1 💚 mvninstall 4m 1s HBASE-18070 passed
+1 💚 checkstyle 2m 3s HBASE-18070 passed
+1 💚 spotbugs 3m 47s HBASE-18070 passed
_ Patch Compile Tests _
+0 🆗 mvndep 0m 12s Maven dependency ordering for patch
+1 💚 mvninstall 3m 45s the patch passed
-0 ⚠️ checkstyle 0m 27s hbase-client: The patch generated 3 new + 1 unchanged - 0 fixed = 4 total (was 1)
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
+1 💚 hadoopcheck 12m 24s Patch does not cause any errors with Hadoop 3.1.2 3.2.1.
-1 ❌ spotbugs 1m 12s hbase-client generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0)
_ Other Tests _
+1 💚 asflicense 0m 33s The patch does not generate ASF License warnings.
42m 52s
Reason Tests
FindBugs module:hbase-client
Should org.apache.hadoop.hbase.client.MetaReplicaLoadBalanceReplicaSimpleChooser$StaleTableCache be a static inner class? At MetaReplicaLoadBalanceReplicaSimpleChooser.java:[lines 111-112]
Subsystem Report/Notes
Docker Client=19.03.13 Server=19.03.13 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2570/1/artifact/yetus-general-check/output/Dockerfile
GITHUB PR #2570
Optional Tests dupname asflicense spotbugs hadoopcheck hbaseanti checkstyle
uname Linux b51aff078fd9 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision HBASE-18070 / 4e47554
checkstyle https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2570/1/artifact/yetus-general-check/output/diff-checkstyle-hbase-client.txt
spotbugs https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2570/1/artifact/yetus-general-check/output/new-spotbugs-hbase-client.html
Max. process+thread count 84 (vs. ulimit of 30000)
modules C: hbase-common hbase-client hbase-server U: .
Console output https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2570/1/console
versions git=2.17.1 maven=(cecedd343002696d0abb50b32b541b8a6ba2883f) spotbugs=3.1.12
Powered by Apache Yetus 0.11.1 https://yetus.apache.org

This message was automatically generated.
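For context on the spotbugs warning above: a nested class that never uses its enclosing instance should be declared static so it does not retain a hidden reference to the outer object. A minimal before/after sketch (the actual fields of StaleTableCache are not shown in this PR):

// Before: non-static inner class; each instance pins the enclosing chooser.
private final class StaleTableCache { /* fields elided */ }

// After: static nested class; no implicit reference to the outer instance.
private static final class StaleTableCache { /* fields elided */ }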

Contributor

@saintstack left a comment

Some high-level feedback @huaxiangsun

*/
enum MetaReplicaMode {
None,
HighAvailable,
Contributor

s/HighAvailable/HighlyAvailable/ or Failover or PrimaryThenReplicas.

s/stable/stale/?

Contributor

Does the enum have to be named 'MetaReplicaMode'? Can it be named ReplicaMode or ReadReplicaClientPolicy?

When it is named for meta only, it implies this policy only works for meta replicas? Is this so?

Contributor

So, here we 'choose' a policy but then the implementation is elsewhere. Could vary? You provide a 'simple' one here. What if I want to do an involved one? That would be a config?

Contributor Author

Yeah, META_REPLICAS_MODE_LOADBALANCE_REPLICA_CHOOSER(/SELECTOR) is supposed to be the config knob for switching to a different implementation.

If it is changed to ReplicaMode, it sounds like this is supported for user tables as well. But a ReplicaMode that only applies to some special system tables?

Contributor

I do not think this is suitable for user tables yet, at least for the LoadBalance mode, where we explicitly state that 'stale' data will impact our policy. For the meta table, whether the data is 'stale' can be detected on our own, but for user tables only users know how to determine whether the data is stale. That means the interface would need to take input from users on whether the previous data was stale and whether to go to the primary replica this time. This requires a good interface design, as it would be IA.Public, so I think keeping it meta-only and IA.Private is more suitable for now.

Contributor

But is it better to put this enum in a separate file?

Contributor Author

Yeah, will do.

@InterfaceAudience.Private
public interface MetaReplicaLoadBalanceReplicaChooser {

void updateCacheOnError(final HRegionLocation loc, final int fromMetaReplicaId);
Contributor

No need for the 'final' qualifiers in an interface, at least.

These methods need docs, since an interface is where expectations are described.

You don't pass cache on updateCacheOnError?

s/chooseReplicaToGo/chooseReplica/ ?
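A sketch of what the documented, 'final'-free interface could look like; the javadoc wording and the chooseReplica parameters are illustrative, not from the patch, and the method name follows the rename suggested above:

@InterfaceAudience.Private
public interface MetaReplicaLoadBalanceReplicaChooser {

  /**
   * Notify the chooser that a location returned by the given meta replica
   * turned out to be stale, so it can steer away from that replica for a while.
   */
  void updateCacheOnError(HRegionLocation loc, int fromMetaReplicaId);

  /**
   * @return the id of the meta replica region the next lookup should go to.
   */
  int chooseReplica(TableName tableName, byte[] row, RegionLocateType locateType);
}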

Contributor

Is this stuff only for meta? Can it be more generic than meta? Drop the 'meta' prefix?

Contributor Author

The cache here is the TableCache or StaleCache maintained in the ReplicaSelector; there is no need to pass it in.
Will take care of the rest of the comments.

Member

Calling it "meta" is fine at this point, the only other thing we could be balancing requests over is a user table, and there's separate mechanism for that functionality (I haven't looked into why that functionality cannot be reused here, but I presume it pertains to the different means of configuration... which is a bit of a shame).

If we need to use this to spread requests over replicas of other system tables, we can update the class name accordingly.

Contributor Author

Agreed. I am going to rename it "Catalog" instead of "Meta", to reflect the fact that it can be used for any location lookup service. As for user tables, if LoadBalance mode needs to be supported, the current interface already supports it; it is up to the application to use this mode.

private final AsyncConnectionImpl conn;

MetaReplicaLoadBalanceReplicaSimpleChooser(final AsyncConnectionImpl conn) {
staleCache = new ConcurrentHashMap<>();
Contributor

Declare the data member and assign it at the same time, rather than declaring above and assigning here? nit.
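The nit, illustrated; only the field name and ConcurrentHashMap come from the snippet above, and the key/value types are guesses:

// Declare and initialize together instead of assigning in the constructor.
private final ConcurrentMap<TableName, StaleTableCache> staleCache = new ConcurrentHashMap<>();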

Contributor Author

@huaxiangsun Oct 20, 2020

Will take care of it.


/** Conf key for enabling meta replication */
public static final String USE_META_REPLICAS = "hbase.meta.replicas.use";
public static final String META_REPLICAS_MODE = "hbase.meta.replicas.mode";
Contributor

Yeah, does it have to be meta-exclusive? Can this go somewhere in the client package rather than here in the global HConstants? It is a client-only config?

Member

What do you propose, @saintstack? hbase.catalogue.replicas.mode?

Contributor Author

"hbase.catalog.replicas.mode"? Or we want to include "meta" here so it can only enable meta table. In the future, let's say root table needs to be supported, so another config knob will be added. By this way, configs are separated for both for current "meta" and future system tables.

Contributor Author

Thought about it again: I think this config knob needs to be specific to meta for now. In the future, if it is going to be turned on for another table, a different knob needs to be provided, as basically they are for different purposes.

@Apache-HBase

💔 -1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 1m 4s Docker mode activated.
-0 ⚠️ yetus 0m 4s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ HBASE-18070 Compile Tests _
+0 🆗 mvndep 0m 23s Maven dependency ordering for branch
+1 💚 mvninstall 3m 40s HBASE-18070 passed
+1 💚 compile 1m 44s HBASE-18070 passed
+1 💚 shadedjars 6m 32s branch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 1m 22s HBASE-18070 passed
_ Patch Compile Tests _
+0 🆗 mvndep 0m 16s Maven dependency ordering for patch
+1 💚 mvninstall 3m 28s the patch passed
+1 💚 compile 1m 46s the patch passed
+1 💚 javac 1m 46s the patch passed
+1 💚 shadedjars 6m 39s patch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 1m 20s the patch passed
_ Other Tests _
+1 💚 unit 1m 24s hbase-common in the patch passed.
+1 💚 unit 1m 1s hbase-client in the patch passed.
-1 ❌ unit 149m 33s hbase-server in the patch failed.
183m 0s
Subsystem Report/Notes
Docker Client=19.03.13 Server=19.03.13 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2570/1/artifact/yetus-jdk8-hadoop3-check/output/Dockerfile
GITHUB PR #2570
Optional Tests javac javadoc unit shadedjars compile
uname Linux 12298d174364 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision HBASE-18070 / 4e47554
Default Java 1.8.0_232
unit https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2570/1/artifact/yetus-jdk8-hadoop3-check/output/patch-unit-hbase-server.txt
Test Results https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2570/1/testReport/
Max. process+thread count 4557 (vs. ulimit of 30000)
modules C: hbase-common hbase-client hbase-server U: .
Console output https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2570/1/console
versions git=2.17.1 maven=(cecedd343002696d0abb50b32b541b8a6ba2883f)
Powered by Apache Yetus 0.11.1 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

💔 -1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 44s Docker mode activated.
-0 ⚠️ yetus 0m 4s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ HBASE-18070 Compile Tests _
+0 🆗 mvndep 0m 26s Maven dependency ordering for branch
+1 💚 mvninstall 5m 12s HBASE-18070 passed
+1 💚 compile 2m 24s HBASE-18070 passed
+1 💚 shadedjars 8m 11s branch has no errors when building our shaded downstream artifacts.
-0 ⚠️ javadoc 0m 32s hbase-client in HBASE-18070 failed.
-0 ⚠️ javadoc 0m 19s hbase-common in HBASE-18070 failed.
-0 ⚠️ javadoc 0m 47s hbase-server in HBASE-18070 failed.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 17s Maven dependency ordering for patch
+1 💚 mvninstall 4m 52s the patch passed
+1 💚 compile 2m 25s the patch passed
+1 💚 javac 2m 25s the patch passed
+1 💚 shadedjars 8m 14s patch has no errors when building our shaded downstream artifacts.
-0 ⚠️ javadoc 0m 18s hbase-common in the patch failed.
-0 ⚠️ javadoc 0m 30s hbase-client in the patch failed.
-0 ⚠️ javadoc 0m 47s hbase-server in the patch failed.
_ Other Tests _
+1 💚 unit 1m 49s hbase-common in the patch passed.
+1 💚 unit 1m 17s hbase-client in the patch passed.
-1 ❌ unit 144m 52s hbase-server in the patch failed.
186m 44s
Subsystem Report/Notes
Docker Client=19.03.13 Server=19.03.13 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2570/1/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile
GITHUB PR #2570
Optional Tests javac javadoc unit shadedjars compile
uname Linux 2c97bc4d46aa 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision HBASE-18070 / 4e47554
Default Java 2020-01-14
javadoc https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2570/1/artifact/yetus-jdk11-hadoop3-check/output/branch-javadoc-hbase-client.txt
javadoc https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2570/1/artifact/yetus-jdk11-hadoop3-check/output/branch-javadoc-hbase-common.txt
javadoc https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2570/1/artifact/yetus-jdk11-hadoop3-check/output/branch-javadoc-hbase-server.txt
javadoc https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2570/1/artifact/yetus-jdk11-hadoop3-check/output/patch-javadoc-hbase-common.txt
javadoc https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2570/1/artifact/yetus-jdk11-hadoop3-check/output/patch-javadoc-hbase-client.txt
javadoc https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2570/1/artifact/yetus-jdk11-hadoop3-check/output/patch-javadoc-hbase-server.txt
unit https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2570/1/artifact/yetus-jdk11-hadoop3-check/output/patch-unit-hbase-server.txt
Test Results https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2570/1/testReport/
Max. process+thread count 4210 (vs. ulimit of 30000)
modules C: hbase-common hbase-client hbase-server U: .
Console output https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2570/1/console
versions git=2.17.1 maven=(cecedd343002696d0abb50b32b541b8a6ba2883f)
Powered by Apache Yetus 0.11.1 https://yetus.apache.org

This message was automatically generated.

@Apache9
Contributor

Apache9 commented Oct 21, 2020

On a high-level design note: we used to have the hbase.meta.replicas.use config to control whether to make use of meta replicas at the client side. Do we want to deprecate this config and let our new configs rule here? Just asking.

For me, I think the scope of hbase.meta.replicas.use is too wide, as it will impact all the client operations on meta (not sure if we have hacked all the places, but at least it is designed to). After reviewing the PR, I think our approach here is only for the locator-related logic, right? This is also what I expect. In general, I think there are 3 ways we access the meta table at the client side:

  1. Locator-related logic. This is the most critical path at the client side and also puts the most pressure on the meta table.
  2. Admin-related logic. We have delegated most of the operations through master, but there are still some places where we access meta directly. But admin operations are expected to be low-frequency, so I do not think we need to deal with them here.
  3. Users accessing the meta table directly on their own. This is controlled by user-written code, so I do not think we need to deal with it either; users should take care of it on their own.

I think only 1 is what we really care about here, so I suggest that we just narrow the scope of the newly introduced configs to be locator-only (maybe by adding a 'locator' keyword in the config name), and consider it before hbase.meta.replicas.use in the locator-related logic, as sketched below. So users do not need to set hbase.meta.replicas.use to true to enable this feature, just select the LB mode. If the new config is disabled (set to None, maybe), then we honor the old hbase.meta.replicas.use config.
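A minimal sketch of the precedence being described here; the "hbase.locator.meta.replicas.mode" key and the "None" sentinel are illustrative names, not committed config keys:

// Hypothetical helper for resolving the effective mode.
static String resolveMetaReplicaMode(org.apache.hadoop.conf.Configuration conf) {
  String mode = conf.get("hbase.locator.meta.replicas.mode", "None");
  if (!"None".equals(mode)) {
    // The new locator-scoped config wins: "LoadBalance" or "HighAvailable",
    // with no need to set hbase.meta.replicas.use at all.
    return mode;
  }
  // New config disabled: honor the legacy flag (the existing behavior).
  return conf.getBoolean("hbase.meta.replicas.use", false) ? "HighAvailable" : "None";
}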

In general, I think the abstraction and the trick here are both good. Setting the replica id directly on the Query is a straightforward way to achieve our goal here (see the sketch below), and the chooser or selector is also a good abstraction. The only concern is how to implement the 'fallback to primary' logic, as we need the actual exception type to be passed down from the rpc retrying caller; anyway, this can be done later.
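The "trick" referred to above, sketched as a fragment; Query.setReplicaId, Query.setConsistency, and Consistency.TIMELINE are real client APIs, while the chooser call and its arguments are placeholders:

// Route a meta scan to a specific replica by setting the replica id on the
// Query (Scan extends Query); timeline consistency permits secondary reads.
int replicaId = chooser.chooseReplicaToGo(tableName, row, locateType); // placeholder call
Scan scan = new Scan().withStartRow(row);
scan.setConsistency(Consistency.TIMELINE);
scan.setReplicaId(replicaId);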

And I suggest we just make this PR against the master branch; it is only a client-side change, and whether we have implemented meta region replication should not impact the code. Why I suggest this is that the code for master and branch-2 will be different, as on branch-2 the sync client has its own logic for region locating, which is not well constructed I believe (we expose a bunch of locating methods in the ClusterConnection interface and use them everywhere). So if we want to include this feature in 2.4.0, we'd better make this PR against master, and also backport it to branch-2 ASAP.

Thanks.

@huaxiangsun
Contributor Author

> On a high-level design note: we used to have the hbase.meta.replicas.use config to control whether to make use of meta replicas at the client side. Do we want to deprecate this config and let our new configs rule here? Just asking.

My original thought was that hbase.meta.replicas.use is an existing config, so it still needs to be honored. A new "mode" config is added to support LoadBalance (new) and HighlyAvailable (the existing mode). If we want to deprecate this config, then it makes sense.

> For me, I think the scope of hbase.meta.replicas.use is too wide, as it will impact all the client operations on meta (not sure if we have hacked all the places, but at least it is designed to). After reviewing the PR, I think our approach here is only for the locator-related logic, right? This is also what I expect. In general, I think there are 3 ways we access the meta table at the client side:
>
>   1. Locator-related logic. This is the most critical path at the client side and also puts the most pressure on the meta table.
>   2. Admin-related logic. We have delegated most of the operations through master, but there are still some places where we access meta directly. But admin operations are expected to be low-frequency, so I do not think we need to deal with them here.
>   3. Users accessing the meta table directly on their own. This is controlled by user-written code, so I do not think we need to deal with it either; users should take care of it on their own.

Agreed.

> I think only 1 is what we really care about here, so I suggest that we just narrow the scope of the newly introduced configs to be locator-only (maybe by adding a 'locator' keyword in the config name), and consider it before hbase.meta.replicas.use in the locator-related logic. So users do not need to set hbase.meta.replicas.use to true to enable this feature, just select the LB mode. If the new config is disabled (set to None, maybe), then we honor the old hbase.meta.replicas.use config.

Ok.

> In general, I think the abstraction and the trick here are both good. Setting the replica id directly on the Query is a straightforward way to achieve our goal here, and the chooser or selector is also a good abstraction. The only concern is how to implement the 'fallback to primary' logic, as we need the actual exception type to be passed down from the rpc retrying caller; anyway, this can be done later.

Do you think the "fallback to primary" logic needs to be passed down from the rpc retrying caller? Then it needs to be aware of this feature and maintain some state, which I was trying to avoid. Yeah, this part can be optimized later.

> And I suggest we just make this PR against the master branch; it is only a client-side change, and whether we have implemented meta region replication should not impact the code. Why I suggest this is that the code for master and branch-2 will be different, as on branch-2 the sync client has its own logic for region locating, which is not well constructed I believe (we expose a bunch of locating methods in the ClusterConnection interface and use them everywhere). So if we want to include this feature in 2.4.0, we'd better make this PR against master, and also backport it to branch-2 ASAP.

Yeah, the client-side change is quite independent. It will work with today's meta replicas; the only nit is that there will be more stale data.
I will put up a new patch against master. The implementation will be renamed to Catalog instead of Meta, as it can be applied to similar location lookup services. The config will remain the same, as here it applies to meta only.

Thanks for the feedback, Duo.

@huaxiangsun
Contributor Author

Forgot to mention: for the patch against master, the only nit is the test cases, as there is no meta replication support in master. Will check the tests and see how many can be ported.

Member

@ndimiduk left a comment

My comments are mostly on style and code structure, with only a couple of questions on the implementation details.

+ " but there is no meta replica configured");
this.metaReplicaChooser = null;
} else {
String metaReplicaChooserImpl = conn.getConfiguration().get(
Member

Please move this ChooserImpl instantiation into a factory method.
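A sketch of the requested factory method, assuming the config key from the PR description; Configuration.getClass and HBase's ReflectionUtils.newInstance are real APIs, while the method itself is hypothetical:

private static MetaReplicaLoadBalanceReplicaChooser createChooser(
    Configuration conf, AsyncConnectionImpl conn) {
  // Resolve the chooser implementation from config, defaulting to the simple one.
  Class<? extends MetaReplicaLoadBalanceReplicaChooser> clazz = conf.getClass(
    "hbase.meta.replicas.mode.loadbalance.replica.chooser",
    MetaReplicaLoadBalanceReplicaSimpleChooser.class,
    MetaReplicaLoadBalanceReplicaChooser.class);
  // Instantiate reflectively, passing the connection to the constructor.
  return ReflectionUtils.newInstance(clazz, conn);
}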

Contributor Author

Ok.

@Override
public void onError(Throwable error) {
// TODO: if it fails, with meta replica load balance, it may try with another meta
// replica. This improvement will be done later.
Member

Do you have a Jira ID for this TODO?

import org.junit.experimental.categories.Category;

@Category({ MediumTests.class, ClientTests.class })
public class TestAsyncNonMetaRegionLocatorWithMetaReplicaLoadBalance
Member

oh no, please not test inheritance :'(

Can this be done as a Parameterized test instead?

Contributor Author

Let me check it. I am trying to make sure that with a Parameterized test the minicluster is going to be recreated (I was there once, but forgot the details).
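A minimal JUnit 4 parameterization sketch (class name and test body hypothetical). Note the concern above: a @BeforeClass/@ClassRule minicluster is shared across all parameters, so it would have to be torn down and recreated explicitly per mode:

import org.junit.Test;
import org.junit.runner.RunWith;
import org.junit.runners.Parameterized;

@RunWith(Parameterized.class)
public class TestAsyncNonMetaRegionLocatorReplicaModes {

  @Parameterized.Parameters(name = "mode={0}")
  public static Object[] modes() {
    return new Object[] { "HighAvailable", "LoadBalance" };
  }

  @Parameterized.Parameter
  public String mode;

  @Test
  public void testLocate() {
    // Set "hbase.meta.replicas.mode" to 'mode', (re)start the minicluster,
    // then run the locator assertions.
  }
}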

*
* @see TestRegionReplicaReplicationEndpoint
*/
@Category({LargeTests.class})
Member

Not yours, but I think this can be added to the MasterTests category. Or RegionServerTests... I'm not even clear on which one this is at this point. Maybe we need a category for CatalogueTests.

Contributor Author

will add "RegionServerTests" tag.

return !containsScanner.advance() && matches >= 1 && count >= matches && count == cells.size();
}

private void doNGets(final Table table, final byte[][] keys) throws Exception {
Member

nit: for helper methods whose purpose is to make assertions, I like to prefix their names with assert. So, assertDoNGets, assertPrimaryNoChangeReplicaIncrease, &c.

Arrays.copyOfRange(HBaseTestingUtility.KEYS, 1, HBaseTestingUtility.KEYS.length))) {
verifyReplication(TableName.META_TABLE_NAME, NB_SERVERS, getMetaCells(table.getName()));
try (Table table = HTU
.createTable(TableName.valueOf(this.name.getMethodName() + "_0"), HConstants.CATALOG_FAMILY,
Member

FYI, there's a TableNameTestRule for generating the TableName from the running method. I guess it doesn't help where you're adding a suffix though...


// Wait until moveRegion cache timeout.
while (destRs.getMovedRegion(userRegion.getRegionInfo().getEncodedName()) != null) {
Thread.sleep(1000);
Member

Please use waitFor with a maximum time to wait.
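The suggested replacement, sketched with HBaseTestingUtility's bounded wait (the 60s timeout and 1s poll interval are arbitrary choices here; HTU is the test utility instance from the test):

// Fails the test if the moved-region cache entry does not expire in time,
// instead of spinning forever.
HTU.waitFor(60000, 1000, () ->
    destRs.getMovedRegion(userRegion.getRegionInfo().getEncodedName()) == null);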

@Apache9
Contributor

Apache9 commented Oct 22, 2020

Do you think "fallback to primary" logic needs to be passed down from rpc retrying caller? Then it needs to be aware of this feature and needs to maintain some state. Was trying to avoid it. Yeah, this part can be optimized later.

If you want to make a decision based on whether the previously returned location was stale, then you need to pass something down from the rpc retrying caller to the selector, as only in the rpc retrying caller can we get the exception we want: either by adding a parameter, as I proposed above, to not go to secondary replicas, or, as in your current solution, by adding a stale cache in the selector and passing down the exception and the related location.
My concern here is that the current 'simple' algorithm may not work well for some cases, as I mentioned above. I suggest that we remove this part of the code from this patch (just do random selection, without other considerations; see the sketch below) to get the patch in quickly, and then open another issue to discuss the suitable algorithm, or maybe multiple algorithms.
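What "just do random selection" could look like as a placeholder chooser, sketched as a method fragment; the method name follows the patch, while the parameters and the numOfMetaReplicas field are assumptions:

// Pick any meta replica uniformly at random, with no stale-tracking at all.
int chooseReplicaToGo(TableName tableName, byte[] row, RegionLocateType locateType) {
  return java.util.concurrent.ThreadLocalRandom.current().nextInt(numOfMetaReplicas);
}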

Thanks.

@huaxiangsun
Contributor Author

Do you think "fallback to primary" logic needs to be passed down from rpc retrying caller? Then it needs to be aware of this feature and needs to maintain some state. Was trying to avoid it. Yeah, this part can be optimized later.

If you want to make a decision based on whether the previous returned location is stale, then you need to pass something down from the rpc retrying caller to the selector, as only in rpc retrying caller we can get the exception we want. Either by adding a parameter as I proposed above to not go to secondary replicas, or as your current solution, adding a stale cache in the selector and you need to pass down the exception and the related location.
My concern here is that, the current 'simple' algorithm may not work well for some cases as my mentioned above. I suggest that we remove this part of code from the patch here(just do random selecting, without other consideration) to get this patch in quickly, and then open another issue to discuss the suitable algorithm, or maybe multiple algorithms.

Thanks, Duo.

I think I am still not 100% clear here. Let me try to explain my understanding of your thinking before moving forward.

  1. My current understanding of the rpc retry caller and AsyncRegionLocator:

If a location is stale, the rpc retry caller will detect it and pass the exception down to AsyncRegionLocator (AsyncNonMetaRegionLocator), and AsyncNonMetaRegionLocator updates/cleans up the table cache accordingly.

  2. In the next retry, if there is no table cache entry for the key, it will call locateInMeta() to fetch the location from meta.

In the above flow, the rpc retry caller layer is the one detecting the stale location; it is up to the AsyncRegionLocator layer to handle all the logic.

Similarly, the meta selector interface tries to do the same and builds the state it needs for its decisions internally.

As for a different algorithm, i.e. the case you described above, here is a concrete example: say there are 5 meta replicas; if the location from meta replica 1 is stale, the client may try another meta replica. We can have a different selector implementation,
AdvancedSelector, which can implement such an algorithm based on the state maintained in the selector, e.g. the staleCache entry has info indicating fromMetaReplicaId.

I thought the selector interface was designed for this purpose; maybe I am missing something here.

In the initial implementation, I kept fromMetaReplica in the tableCache entry. When the patch went out for review, I dropped this change, as the current simple algorithm really does not need this info. The selector interface keeps fromMetaReplicaId for future enhancement, so that when it is needed, at least the interface does not need to change.

I can drop fromMetaReplicaId from the current simple implementation, as it is not needed anyway.

Maybe you were referring to adding a flag to the locateInMeta() method? If the flag says primary only, it will bypass all this logic and go only to the primary region, giving the upper layer more control.

Please share more thoughts, thanks.

@huaxiangsun
Contributor Author

Will post a new patch based on master. The major change in the master branch is that the meta replica count is part of the meta table; test and selector initialization need to change accordingly.

@huaxiangsun closed this Nov 3, 2020
@huaxiangsun
Contributor Author

This request is replaced by a new request against the master branch; will do a backport once the master patch is reviewed and merged.
