
HBASE-25126 Add load balance logic in hbase-client to distribute read… #2570

Closed

huaxiangsun wants to merge 1 commit into apache:HBASE-18070 from huaxiangsun:HBASE-18070.HBASE-25126

Conversation

@huaxiangsun
Contributor

… load over meta replica regions

It adds load balance support. With "hbase.meta.replicas.use" set to true, the client supports the meta replica feature.
The existing mode is called HighAvailable: the client sends the scan request to the primary meta replica region first,
and if the response is not back within a configured amount of time, it sends scan requests to all meta replica regions and
takes the first response. On top of the existing mode, a new mode, LoadBalance, is added. In this mode, the client first
chooses a meta replica region to send the scan request to. If the response is stale, it sends the scan request to the primary
meta region. In this mode, all meta replica regions serve scan requests.

Two new config knobs are added:

  1. hbase.meta.replicas.mode
    Valid values are "HighAvailable" and "LoadBalance".

  2. hbase.meta.replicas.mode.loadbalance.replica.chooser
    Implementation class for MetaReplicaChooser. Only org.apache.hadoop.hbase.client.MetaReplicaLoadBalanceReplicaSimpleChooser.class
    is supported for now, and it is the default as well.

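For illustration only, a minimal sketch of a client configuration that opts into the new mode, using the knob names from this description (the wrapper class is hypothetical, not part of the patch):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class MetaReplicaClientConfigExample {
  public static Configuration create() {
    Configuration conf = HBaseConfiguration.create();
    // Enable the client-side meta replica feature.
    conf.setBoolean("hbase.meta.replicas.use", true);
    // Select the new mode; "HighAvailable" keeps the existing behavior.
    conf.set("hbase.meta.replicas.mode", "LoadBalance");
    // Chooser implementation; only the simple chooser is supported for now.
    conf.set("hbase.meta.replicas.mode.loadbalance.replica.chooser",
      "org.apache.hadoop.hbase.client.MetaReplicaLoadBalanceReplicaSimpleChooser");
    return conf;
  }
}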
@Apache-HBase

💔 -1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 1m 8s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 1s No case conflicting files found.
+1 💚 hbaseanti 0m 0s Patch does not have any anti-patterns.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
_ HBASE-18070 Compile Tests _
+0 🆗 mvndep 0m 22s Maven dependency ordering for branch
+1 💚 mvninstall 4m 1s HBASE-18070 passed
+1 💚 checkstyle 2m 3s HBASE-18070 passed
+1 💚 spotbugs 3m 47s HBASE-18070 passed
_ Patch Compile Tests _
+0 🆗 mvndep 0m 12s Maven dependency ordering for patch
+1 💚 mvninstall 3m 45s the patch passed
-0 ⚠️ checkstyle 0m 27s hbase-client: The patch generated 3 new + 1 unchanged - 0 fixed = 4 total (was 1)
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
+1 💚 hadoopcheck 12m 24s Patch does not cause any errors with Hadoop 3.1.2 3.2.1.
-1 ❌ spotbugs 1m 12s hbase-client generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0)
_ Other Tests _
+1 💚 asflicense 0m 33s The patch does not generate ASF License warnings.
42m 52s
Reason Tests
FindBugs module:hbase-client
Should org.apache.hadoop.hbase.client.MetaReplicaLoadBalanceReplicaSimpleChooser$StaleTableCache be a static inner class? At MetaReplicaLoadBalanceReplicaSimpleChooser.java:[lines 111-112]
Subsystem Report/Notes
Docker Client=19.03.13 Server=19.03.13 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2570/1/artifact/yetus-general-check/output/Dockerfile
GITHUB PR #2570
Optional Tests dupname asflicense spotbugs hadoopcheck hbaseanti checkstyle
uname Linux b51aff078fd9 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision HBASE-18070 / 4e47554
checkstyle https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2570/1/artifact/yetus-general-check/output/diff-checkstyle-hbase-client.txt
spotbugs https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2570/1/artifact/yetus-general-check/output/new-spotbugs-hbase-client.html
Max. process+thread count 84 (vs. ulimit of 30000)
modules C: hbase-common hbase-client hbase-server U: .
Console output https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2570/1/console
versions git=2.17.1 maven=(cecedd343002696d0abb50b32b541b8a6ba2883f) spotbugs=3.1.12
Powered by Apache Yetus 0.11.1 https://yetus.apache.org

This message was automatically generated.
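For context on the spotbugs warning above: a nested class that never uses its enclosing instance should be declared static so it does not retain a hidden reference to the outer object. A minimal before/after sketch (the actual fields of StaleTableCache are not shown in this PR):

// Before: non-static inner class; each instance pins the enclosing chooser.
private final class StaleTableCache { /* fields elided */ }

// After: static nested class; no implicit reference to the outer instance.
private static final class StaleTableCache { /* fields elided */ }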

Contributor

@saintstack left a comment

Some high-level feedback @huaxiangsun

*/
enum MetaReplicaMode {
None,
HighAvailable,
Contributor

s/HighAvailable/HighlyAvailable/ or Failover or PrimaryThenReplicas.

s/stable/stale/?

Contributor

Does the enum have to be named 'MetaReplicaMode'? Can it be named ReplicaMode or ReadReplicaClientPolicy?

When it is named for meta only, it implies this policy only works for meta replicas? Is this so?

Contributor

So, here we 'choose' a policy but then the implementation is elsewhere. Could vary? You provide a 'simple' one here. What if I want to do an involved one? That would be a config?

Contributor Author

Yeah, META_REPLICAS_MODE_LOADBALANCE_REPLICA_CHOOSER(/SELECTOR) is supposed to be the config knob for switching to a different implementation.

If it is changed to ReplicaMode, it sounds like this is supported for user tables as well. But a ReplicaMode that only applies to some special system tables?

Contributor

I do not think this is suitable for user tables yet, at least for the LoadBalance mode, where we explicitly state that 'stale' data will impact our policy. For the meta table, whether the data is 'stale' can be detected on our own, but for user tables only users know how to determine whether the data is stale. That means the interface would need to take input from users on whether the previous data was stale and whether to go to the primary replica this time. This requires a good interface design, as it would be IA.Public, so I think keeping it meta-only and IA.Private is more suitable for now.

Contributor

But is it better to put this enum in a separate file?

Contributor Author

Yeah, will do.

@InterfaceAudience.Private
public interface MetaReplicaLoadBalanceReplicaChooser {

void updateCacheOnError(final HRegionLocation loc, final int fromMetaReplicaId);
Contributor

No need for the 'final' qualifiers in an interface, at least.

These methods need docs, since an interface is where expectations are described.

You don't pass cache on updateCacheOnError?

s/chooseReplicaToGo/chooseReplica/ ?
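A sketch of what the documented, 'final'-free interface could look like; the javadoc wording and the chooseReplica parameters are illustrative, not from the patch, and the method name follows the rename suggested above:

@InterfaceAudience.Private
public interface MetaReplicaLoadBalanceReplicaChooser {

  /**
   * Notify the chooser that a location returned by the given meta replica
   * turned out to be stale, so it can steer away from that replica for a while.
   */
  void updateCacheOnError(HRegionLocation loc, int fromMetaReplicaId);

  /**
   * @return the id of the meta replica region the next lookup should go to.
   */
  int chooseReplica(TableName tableName, byte[] row, RegionLocateType locateType);
}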

Contributor

Is this stuff only for meta? Can it be more generic than meta? Drop the 'meta' prefix?

Contributor Author

The cache here is the TableCache or StaleCache maintained in the ReplicaSelector; there is no need to pass it in.
Will take care of the rest of the comments.

Member

Calling it "meta" is fine at this point, the only other thing we could be balancing requests over is a user table, and there's separate mechanism for that functionality (I haven't looked into why that functionality cannot be reused here, but I presume it pertains to the different means of configuration... which is a bit of a shame).

If we need to use this to spread requests over replicas of other system tables, we can update the class name accordingly.

Contributor Author

Agreed. I am going to rename it "Catalog" instead of "Meta", to reflect the fact that it can be used for any location lookup service. As for user tables, if LoadBalance mode needs to be supported, the current interface already supports it; it is up to the application to use this mode.

private final AsyncConnectionImpl conn;

MetaReplicaLoadBalanceReplicaSimpleChooser(final AsyncConnectionImpl conn) {
staleCache = new ConcurrentHashMap<>();
Contributor

Declare the data member and assign it at the same time, rather than declaring above and assigning here? nit.
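The nit, illustrated; only the field name and ConcurrentHashMap come from the snippet above, and the key/value types are guesses:

// Declare and initialize together instead of assigning in the constructor.
private final ConcurrentMap<TableName, StaleTableCache> staleCache = new ConcurrentHashMap<>();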

Contributor Author

@huaxiangsun Oct 20, 2020

Will take care of it.


/** Conf key for enabling meta replication */
public static final String USE_META_REPLICAS = "hbase.meta.replicas.use";
public static final String META_REPLICAS_MODE = "hbase.meta.replicas.mode";
Contributor

Yeah, does it have to be meta-exclusive? Can this go somewhere in the client package rather than here in the global HConstants? It is a client-only config?

Member

What do you propose, @saintstack? hbase.catalogue.replicas.mode?

Contributor Author

"hbase.catalog.replicas.mode"? Or we want to include "meta" here so it can only enable meta table. In the future, let's say root table needs to be supported, so another config knob will be added. By this way, configs are separated for both for current "meta" and future system tables.

Contributor Author

Thought about it again: I think this config knob needs to be specific to meta for now. In the future, if it is going to be turned on for another table, a different knob needs to be provided, as basically they are for different purposes.

@Apache-HBase

💔 -1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 1m 4s Docker mode activated.
-0 ⚠️ yetus 0m 4s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ HBASE-18070 Compile Tests _
+0 🆗 mvndep 0m 23s Maven dependency ordering for branch
+1 💚 mvninstall 3m 40s HBASE-18070 passed
+1 💚 compile 1m 44s HBASE-18070 passed
+1 💚 shadedjars 6m 32s branch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 1m 22s HBASE-18070 passed
_ Patch Compile Tests _
+0 🆗 mvndep 0m 16s Maven dependency ordering for patch
+1 💚 mvninstall 3m 28s the patch passed
+1 💚 compile 1m 46s the patch passed
+1 💚 javac 1m 46s the patch passed
+1 💚 shadedjars 6m 39s patch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 1m 20s the patch passed
_ Other Tests _
+1 💚 unit 1m 24s hbase-common in the patch passed.
+1 💚 unit 1m 1s hbase-client in the patch passed.
-1 ❌ unit 149m 33s hbase-server in the patch failed.
183m 0s
Subsystem Report/Notes
Docker Client=19.03.13 Server=19.03.13 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2570/1/artifact/yetus-jdk8-hadoop3-check/output/Dockerfile
GITHUB PR #2570
Optional Tests javac javadoc unit shadedjars compile
uname Linux 12298d174364 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision HBASE-18070 / 4e47554
Default Java 1.8.0_232
unit https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2570/1/artifact/yetus-jdk8-hadoop3-check/output/patch-unit-hbase-server.txt
Test Results https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2570/1/testReport/
Max. process+thread count 4557 (vs. ulimit of 30000)
modules C: hbase-common hbase-client hbase-server U: .
Console output https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2570/1/console
versions git=2.17.1 maven=(cecedd343002696d0abb50b32b541b8a6ba2883f)
Powered by Apache Yetus 0.11.1 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

💔 -1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 44s Docker mode activated.
-0 ⚠️ yetus 0m 4s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ HBASE-18070 Compile Tests _
+0 🆗 mvndep 0m 26s Maven dependency ordering for branch
+1 💚 mvninstall 5m 12s HBASE-18070 passed
+1 💚 compile 2m 24s HBASE-18070 passed
+1 💚 shadedjars 8m 11s branch has no errors when building our shaded downstream artifacts.
-0 ⚠️ javadoc 0m 32s hbase-client in HBASE-18070 failed.
-0 ⚠️ javadoc 0m 19s hbase-common in HBASE-18070 failed.
-0 ⚠️ javadoc 0m 47s hbase-server in HBASE-18070 failed.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 17s Maven dependency ordering for patch
+1 💚 mvninstall 4m 52s the patch passed
+1 💚 compile 2m 25s the patch passed
+1 💚 javac 2m 25s the patch passed
+1 💚 shadedjars 8m 14s patch has no errors when building our shaded downstream artifacts.
-0 ⚠️ javadoc 0m 18s hbase-common in the patch failed.
-0 ⚠️ javadoc 0m 30s hbase-client in the patch failed.
-0 ⚠️ javadoc 0m 47s hbase-server in the patch failed.
_ Other Tests _
+1 💚 unit 1m 49s hbase-common in the patch passed.
+1 💚 unit 1m 17s hbase-client in the patch passed.
-1 ❌ unit 144m 52s hbase-server in the patch failed.
186m 44s
Subsystem Report/Notes
Docker Client=19.03.13 Server=19.03.13 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2570/1/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile
GITHUB PR #2570
Optional Tests javac javadoc unit shadedjars compile
uname Linux 2c97bc4d46aa 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision HBASE-18070 / 4e47554
Default Java 2020-01-14
javadoc https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2570/1/artifact/yetus-jdk11-hadoop3-check/output/branch-javadoc-hbase-client.txt
javadoc https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2570/1/artifact/yetus-jdk11-hadoop3-check/output/branch-javadoc-hbase-common.txt
javadoc https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2570/1/artifact/yetus-jdk11-hadoop3-check/output/branch-javadoc-hbase-server.txt
javadoc https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2570/1/artifact/yetus-jdk11-hadoop3-check/output/patch-javadoc-hbase-common.txt
javadoc https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2570/1/artifact/yetus-jdk11-hadoop3-check/output/patch-javadoc-hbase-client.txt
javadoc https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2570/1/artifact/yetus-jdk11-hadoop3-check/output/patch-javadoc-hbase-server.txt
unit https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2570/1/artifact/yetus-jdk11-hadoop3-check/output/patch-unit-hbase-server.txt
Test Results https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2570/1/testReport/
Max. process+thread count 4210 (vs. ulimit of 30000)
modules C: hbase-common hbase-client hbase-server U: .
Console output https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2570/1/console
versions git=2.17.1 maven=(cecedd343002696d0abb50b32b541b8a6ba2883f)
Powered by Apache Yetus 0.11.1 https://yetus.apache.org

This message was automatically generated.

@Apache9
Contributor

Apache9 commented Oct 21, 2020

On a high-level design note: we used to have the hbase.meta.replicas.use config to control whether to make use of meta replicas at the client side. Do we want to deprecate this config and let our new configs rule here? Just asking.

For me, I think the scope of hbase.meta.replicas.use is too wide, as it will impact all the client operations on meta (not sure if we have hacked all the places, but at least it is designed to). After reviewing the PR, I think our approach here is only for the locator-related logic, right? This is also what I expect. In general, I think there are 3 ways we access the meta table at the client side:

  1. Locator-related logic. This is the most critical path at the client side and also puts the most pressure on the meta table.
  2. Admin-related logic. We have delegated most of the operations through master, but there are still some places where we access meta directly. But admin operations are expected to be low-frequency, so I do not think we need to deal with them here.
  3. Users accessing the meta table directly on their own. This is controlled by user-written code, so I do not think we need to deal with it either; users should take care of it on their own.

I think only 1 is what we really care about here, so I suggest that we just narrow the scope of the newly introduced configs to be locator-only (maybe by adding a 'locator' keyword in the config name), and consider it before hbase.meta.replicas.use in the locator-related logic, as sketched below. So users do not need to set hbase.meta.replicas.use to true to enable this feature, just select the LB mode. If the new config is disabled (set to None, maybe), then we honor the old hbase.meta.replicas.use config.
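A minimal sketch of the precedence being described here; the "hbase.locator.meta.replicas.mode" key and the "None" sentinel are illustrative names, not committed config keys:

// Hypothetical helper for resolving the effective mode.
static String resolveMetaReplicaMode(org.apache.hadoop.conf.Configuration conf) {
  String mode = conf.get("hbase.locator.meta.replicas.mode", "None");
  if (!"None".equals(mode)) {
    // The new locator-scoped config wins: "LoadBalance" or "HighAvailable",
    // with no need to set hbase.meta.replicas.use at all.
    return mode;
  }
  // New config disabled: honor the legacy flag (the existing behavior).
  return conf.getBoolean("hbase.meta.replicas.use", false) ? "HighAvailable" : "None";
}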

In general, I think the abstraction and the trick here are both good. Setting the replica id directly on the Query is a straightforward way to achieve our goal here (see the sketch below), and the chooser or selector is also a good abstraction. The only concern is how to implement the 'fallback to primary' logic, as we need the actual exception type to be passed down from the rpc retrying caller; anyway, this can be done later.
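The "trick" referred to above, sketched as a fragment; Query.setReplicaId, Query.setConsistency, and Consistency.TIMELINE are real client APIs, while the chooser call and its arguments are placeholders:

// Route a meta scan to a specific replica by setting the replica id on the
// Query (Scan extends Query); timeline consistency permits secondary reads.
int replicaId = chooser.chooseReplicaToGo(tableName, row, locateType); // placeholder call
Scan scan = new Scan().withStartRow(row);
scan.setConsistency(Consistency.TIMELINE);
scan.setReplicaId(replicaId);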

And I suggest we just make this PR against the master branch; it is only a client-side change, and whether we have implemented meta region replication should not impact the code. Why I suggest this is that the code for master and branch-2 will be different, as on branch-2 the sync client has its own logic for region locating, which is not well constructed I believe (we expose a bunch of locating methods in the ClusterConnection interface and use them everywhere). So if we want to include this feature in 2.4.0, we'd better make this PR against master, and also backport it to branch-2 ASAP.

Thanks.

@huaxiangsun
Contributor Author

> On a high-level design note: we used to have the hbase.meta.replicas.use config to control whether to make use of meta replicas at the client side. Do we want to deprecate this config and let our new configs rule here? Just asking.

My original thought was that hbase.meta.replicas.use is an existing config, so it still needs to be honored. A new "mode" config is added to support LoadBalance (new) and HighlyAvailable (the existing mode). If we want to deprecate this config, then it makes sense.

> For me, I think the scope of hbase.meta.replicas.use is too wide, as it will impact all the client operations on meta (not sure if we have hacked all the places, but at least it is designed to). After reviewing the PR, I think our approach here is only for the locator-related logic, right? This is also what I expect. In general, I think there are 3 ways we access the meta table at the client side:
>
>   1. Locator-related logic. This is the most critical path at the client side and also puts the most pressure on the meta table.
>   2. Admin-related logic. We have delegated most of the operations through master, but there are still some places where we access meta directly. But admin operations are expected to be low-frequency, so I do not think we need to deal with them here.
>   3. Users accessing the meta table directly on their own. This is controlled by user-written code, so I do not think we need to deal with it either; users should take care of it on their own.

Agreed.

> I think only 1 is what we really care about here, so I suggest that we just narrow the scope of the newly introduced configs to be locator-only (maybe by adding a 'locator' keyword in the config name), and consider it before hbase.meta.replicas.use in the locator-related logic. So users do not need to set hbase.meta.replicas.use to true to enable this feature, just select the LB mode. If the new config is disabled (set to None, maybe), then we honor the old hbase.meta.replicas.use config.

Ok.

> In general, I think the abstraction and the trick here are both good. Setting the replica id directly on the Query is a straightforward way to achieve our goal here, and the chooser or selector is also a good abstraction. The only concern is how to implement the 'fallback to primary' logic, as we need the actual exception type to be passed down from the rpc retrying caller; anyway, this can be done later.

Do you think the "fallback to primary" logic needs to be passed down from the rpc retrying caller? Then it needs to be aware of this feature and maintain some state, which I was trying to avoid. Yeah, this part can be optimized later.

> And I suggest we just make this PR against the master branch; it is only a client-side change, and whether we have implemented meta region replication should not impact the code. Why I suggest this is that the code for master and branch-2 will be different, as on branch-2 the sync client has its own logic for region locating, which is not well constructed I believe (we expose a bunch of locating methods in the ClusterConnection interface and use them everywhere). So if we want to include this feature in 2.4.0, we'd better make this PR against master, and also backport it to branch-2 ASAP.

Yeah, the client-side change is quite independent. It will work with today's meta replicas; the only nit is that there will be more stale data.
I will put up a new patch against master. The implementation will be renamed to Catalog instead of Meta, as it can be applied to similar location lookup services. The config will remain the same, as here it applies to meta only.

Thanks for the feedback, Duo.

@huaxiangsun
Contributor Author

Forgot to mention: for the patch against master, the only nit is the test cases, as there is no meta replication support in master. Will check the tests and see how many can be ported.

Member

@ndimiduk left a comment

My comments are mostly on style and code structure, with only a couple of questions on the implementation details.

+ " but there is no meta replica configured");
this.metaReplicaChooser = null;
} else {
String metaReplicaChooserImpl = conn.getConfiguration().get(
Member

Please move this ChooserImpl instantiation into a factory method.
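A sketch of the requested factory method, assuming the config key from the PR description; Configuration.getClass and HBase's ReflectionUtils.newInstance are real APIs, while the method itself is hypothetical:

private static MetaReplicaLoadBalanceReplicaChooser createChooser(
    Configuration conf, AsyncConnectionImpl conn) {
  // Resolve the chooser implementation from config, defaulting to the simple one.
  Class<? extends MetaReplicaLoadBalanceReplicaChooser> clazz = conf.getClass(
    "hbase.meta.replicas.mode.loadbalance.replica.chooser",
    MetaReplicaLoadBalanceReplicaSimpleChooser.class,
    MetaReplicaLoadBalanceReplicaChooser.class);
  // Instantiate reflectively, passing the connection to the constructor.
  return ReflectionUtils.newInstance(clazz, conn);
}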

Contributor Author

Ok.

@Override
public void onError(Throwable error) {
// TODO: if it fails, with meta replica load balance, it may try with another meta
// replica. This improvement will be done later.
Member

Do you have a Jira ID for this TODO?

import org.junit.experimental.categories.Category;

@Category({ MediumTests.class, ClientTests.class })
public class TestAsyncNonMetaRegionLocatorWithMetaReplicaLoadBalance
Member

oh no, please not test inheritance :'(

Can this be done as a Parameterized test instead?

Contributor Author

Let me check it. I am trying to make sure that with a Parameterized test the minicluster is going to be recreated (I was there once, but forgot the details).
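A minimal JUnit 4 parameterization sketch (class name and test body hypothetical). Note the concern above: a @BeforeClass/@ClassRule minicluster is shared across all parameters, so it would have to be torn down and recreated explicitly per mode:

import org.junit.Test;
import org.junit.runner.RunWith;
import org.junit.runners.Parameterized;

@RunWith(Parameterized.class)
public class TestAsyncNonMetaRegionLocatorReplicaModes {

  @Parameterized.Parameters(name = "mode={0}")
  public static Object[] modes() {
    return new Object[] { "HighAvailable", "LoadBalance" };
  }

  @Parameterized.Parameter
  public String mode;

  @Test
  public void testLocate() {
    // Set "hbase.meta.replicas.mode" to 'mode', (re)start the minicluster,
    // then run the locator assertions.
  }
}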

*
* @see TestRegionReplicaReplicationEndpoint
*/
@Category({LargeTests.class})
Member

Not yours, but I think this can be added to the MasterTests category. Or RegionServerTests... I'm not even clear on which one this is at this point. Maybe we need a category for CatalogueTests.

Contributor Author

will add "RegionServerTests" tag.

return !containsScanner.advance() && matches >= 1 && count >= matches && count == cells.size();
}

private void doNGets(final Table table, final byte[][] keys) throws Exception {
Member

nit: for helper methods whose purpose is to make assertions, I like to prefix their names with assert. So, assertDoNGets, assertPrimaryNoChangeReplicaIncrease, &c.

Arrays.copyOfRange(HBaseTestingUtility.KEYS, 1, HBaseTestingUtility.KEYS.length))) {
verifyReplication(TableName.META_TABLE_NAME, NB_SERVERS, getMetaCells(table.getName()));
try (Table table = HTU
.createTable(TableName.valueOf(this.name.getMethodName() + "_0"), HConstants.CATALOG_FAMILY,
Member

FYI, there's a TableNameTestRule for generating the TableName from the running method. I guess it doesn't help where you're adding a suffix though...


// Wait until moveRegion cache timeout.
while (destRs.getMovedRegion(userRegion.getRegionInfo().getEncodedName()) != null) {
Thread.sleep(1000);
Member

Please use waitFor with a maximum time to wait.
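The suggested replacement, sketched with HBaseTestingUtility's bounded wait (the 60s timeout and 1s poll interval are arbitrary choices here; HTU is the test utility instance from the test):

// Fails the test if the moved-region cache entry does not expire in time,
// instead of spinning forever.
HTU.waitFor(60000, 1000, () ->
    destRs.getMovedRegion(userRegion.getRegionInfo().getEncodedName()) == null);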

@Apache9
Contributor

Apache9 commented Oct 22, 2020

Do you think "fallback to primary" logic needs to be passed down from rpc retrying caller? Then it needs to be aware of this feature and needs to maintain some state. Was trying to avoid it. Yeah, this part can be optimized later.

If you want to make a decision based on whether the previously returned location was stale, then you need to pass something down from the rpc retrying caller to the selector, as only in the rpc retrying caller can we get the exception we want: either by adding a parameter, as I proposed above, to not go to secondary replicas, or, as in your current solution, by adding a stale cache in the selector and passing down the exception and the related location.
My concern here is that the current 'simple' algorithm may not work well for some cases, as I mentioned above. I suggest that we remove this part of the code from this patch (just do random selection, without other considerations; see the sketch below) to get the patch in quickly, and then open another issue to discuss the suitable algorithm, or maybe multiple algorithms.
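What "just do random selection" could look like as a placeholder chooser, sketched as a method fragment; the method name follows the patch, while the parameters and the numOfMetaReplicas field are assumptions:

// Pick any meta replica uniformly at random, with no stale-tracking at all.
int chooseReplicaToGo(TableName tableName, byte[] row, RegionLocateType locateType) {
  return java.util.concurrent.ThreadLocalRandom.current().nextInt(numOfMetaReplicas);
}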

Thanks.

@huaxiangsun
Contributor Author

Do you think "fallback to primary" logic needs to be passed down from rpc retrying caller? Then it needs to be aware of this feature and needs to maintain some state. Was trying to avoid it. Yeah, this part can be optimized later.

If you want to make a decision based on whether the previous returned location is stale, then you need to pass something down from the rpc retrying caller to the selector, as only in rpc retrying caller we can get the exception we want. Either by adding a parameter as I proposed above to not go to secondary replicas, or as your current solution, adding a stale cache in the selector and you need to pass down the exception and the related location.
My concern here is that, the current 'simple' algorithm may not work well for some cases as my mentioned above. I suggest that we remove this part of code from the patch here(just do random selecting, without other consideration) to get this patch in quickly, and then open another issue to discuss the suitable algorithm, or maybe multiple algorithms.

Thanks, Duo.

I think I am still not 100% clear here. Let me try to explain my understanding of your thinking before moving forward.

  1. My current understanding of the rpc retry caller and AsyncRegionLocator:

If a location is stale, the rpc retry caller will detect it and pass the exception down to AsyncRegionLocator (AsyncNonMetaRegionLocator), and AsyncNonMetaRegionLocator updates/cleans up the table cache accordingly.

  2. In the next retry, if there is no table cache entry for the key, it will call locateInMeta() to fetch the location from meta.

In the above flow, the rpc retry caller layer is the one detecting the stale location; it is up to the AsyncRegionLocator layer to handle all the logic.

Similarly, the meta selector interface tries to do the same and builds the state it needs for its decisions internally.

As for a different algorithm, i.e. the case you described above, here is a concrete example: say there are 5 meta replicas; if the location from meta replica 1 is stale, the client may try another meta replica. We can have a different selector implementation,
AdvancedSelector, which can implement such an algorithm based on the state maintained in the selector, e.g. the staleCache entry has info indicating fromMetaReplicaId.

I thought the selector interface was designed for this purpose; maybe I am missing something here.

In the initial implementation, I kept fromMetaReplica in the tableCache entry. When the patch went out for review, I dropped this change, as the current simple algorithm really does not need this info. The selector interface keeps fromMetaReplicaId for future enhancement, so that when it is needed, at least the interface does not need to change.

I can drop fromMetaReplicaId from the current simple implementation, as it is not needed anyway.

Maybe you were referring to adding a flag to the locateInMeta() method? If the flag says primary only, it will bypass all this logic and go only to the primary region, giving the upper layer more control.

Please share more thoughts, thanks.

@huaxiangsun
Contributor Author

Will post a new patch based on master. The major change in the master branch is that the meta replica count is part of the meta table; test and selector initialization need to change accordingly.

@huaxiangsun closed this Nov 3, 2020
@huaxiangsun
Contributor Author

This request is replaced by a new request against the master branch; will do a backport once the master patch is reviewed and merged.
