HBASE-25126 Add load balance logic in hbase-client to distribute read…#2570
huaxiangsun wants to merge 1 commit into apache:HBASE-18070
… load over meta replica regions

This change adds load-balance support. With "hbase.meta.replicas.use" set to true, the client supports the meta replica feature. The existing mode is called HighAvailable: the client sends the scan request to the primary meta replica region first and, if no response comes back within a configured amount of time, sends scan requests to all meta replica regions and takes the first response. On top of the existing mode, a new mode, LoadBalance, is added. In this mode, the client first chooses a meta replica region to send the scan request to. If the response is stale, it sends the scan request to the primary meta region. In this mode, all meta replica regions serve scan requests.

Two new config knobs are added:
1. hbase.meta.replicas.mode: valid values are "HighAvailable" and "LoadBalance".
2. hbase.meta.replicas.mode.loadbalance.replica.chooser: implementation class for MetaReplicaChooser. Only org.apache.hadoop.hbase.client.MetaReplicaLoadBalanceReplicaSimpleChooser.class is supported for now, and it is also the default.
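The HighAvailable flow described above (primary first, fan out after a timeout, first response wins) can be sketched with plain `CompletableFuture`s. This is only an illustration under stated assumptions, not the hbase-client code: `HedgedRead`, `scan`, and the `sendToReplica` callback are hypothetical names, replica id 0 is assumed to be the primary, and error handling is left out.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.IntFunction;

/** Hypothetical helper illustrating the HighAvailable read pattern; not part of hbase-client. */
final class HedgedRead {
  private HedgedRead() {}

  /**
   * Sends the request to replica 0 (assumed to be the primary). If no answer has arrived
   * within primaryTimeoutMs, sends the same request to the remaining replicas and completes
   * with whichever response arrives first. Failures are ignored in this sketch.
   */
  static <T> CompletableFuture<T> scan(IntFunction<CompletableFuture<T>> sendToReplica,
      int numReplicas, long primaryTimeoutMs) {
    CompletableFuture<T> result = new CompletableFuture<>();
    sendToReplica.apply(0).thenAccept(result::complete);
    // After the timeout, fan out to the secondaries unless an answer is already in.
    CompletableFuture.delayedExecutor(primaryTimeoutMs, TimeUnit.MILLISECONDS).execute(() -> {
      if (result.isDone()) {
        return;
      }
      for (int replicaId = 1; replicaId < numReplicas; replicaId++) {
        // complete() is a no-op once the future is done, so the first response wins.
        sendToReplica.apply(replicaId).thenAccept(result::complete);
      }
    });
    return result;
  }
}
```

In LoadBalance mode the first request would instead go to a replica picked by the chooser, with the primary used only when the answer turns out to be stale.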
💔 -1 overall. This message was automatically generated.
saintstack left a comment
Some high-level feedback @huaxiangsun
     */
    enum MetaReplicaMode {
      None,
      HighAvailable,
s/HighAvailable/HighlyAvailable/ or Failover or PrimaryThenReplicas.
s/stable/stale/?
Does the enum have to be named 'MetaReplicaMode'? Can it be named ReplicaMode or ReadReplicaClientPolicy?
When it is named for meta only, it implies this policy only works for meta replicas. Is this so?
So, here we 'choose' a policy but then the implementation is elsewhere. Could vary? You provide a 'simple' one here. What if I want to do an involved one? That would be a config?
Yeah, META_REPLICAS_MODE_LOADBALANCE_REPILCA_CHOOSER(/SELECTOR) is supposed to be the config change for a different implementation.
If it is changed to ReplicaMode, it sounds like this is supported for user tables as well. A ReplicaMode only applies to some special system tables?
I do not think this is suitable for user tables yet, at least not for the LoadBalance mode, where we explicitly say that 'stale' data affects our policy. For the meta table, I think we can detect on our own whether the data is stale, but for user tables only users know how to determine that, which means the interface would need a way for users to tell us that the previous data was stale and that they want to go only to the primary replica this time. This requires a good interface design, as it would be IA.Public, so I think keeping it meta-only and IA.Private is more suitable for now.
But better to put this enum in a separate file?
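As a sketch of what the enum could look like in its own file, the version below also maps the `hbase.meta.replicas.mode` string to a constant. The constants `None` and `HighAvailable` appear in the diff and `LoadBalance` in the description; the `fromConfigValue` helper and its fall-back-to-`None` policy are assumptions made for illustration, not something taken from the patch.

```java
/** Sketch of the replica mode enum living in its own file. */
enum MetaReplicaMode {
  None,
  HighAvailable,
  LoadBalance;

  /**
   * Hypothetical helper: maps the configured string to a mode, ignoring case and
   * surrounding whitespace. Null or unknown values fall back to None (assumed policy).
   */
  static MetaReplicaMode fromConfigValue(String value) {
    if (value == null) {
      return None;
    }
    for (MetaReplicaMode mode : values()) {
      if (mode.name().equalsIgnoreCase(value.trim())) {
        return mode;
      }
    }
    return None;
  }
}
```

Whether an unknown value should silently map to `None` or fail fast is a design choice this sketch does not settle.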
    @InterfaceAudience.Private
    public interface MetaReplicaLoadBalanceReplicaChooser {

      void updateCacheOnError(final HRegionLocation loc, final int fromMetaReplicaId);
No need of the 'final' qualifiers in an Interface at least.
Need doc on these methods because in an Interface describing expectations?
You don't pass cache on updateCacheOnError?
s/chooseReplicaToGo/chooseReplica/ ?
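To make the review points above concrete (no `final` on interface parameters, Javadoc stating expectations, `chooseReplica` as the method name), here is a simplified stand-in. The signatures are assumptions: a `String` row key replaces `HRegionLocation`, and the random pick plus a time-limited stale cache is only one plausible "simple" policy, not necessarily what the patch does.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ThreadLocalRandom;

/** Simplified stand-in for the chooser interface; signatures are assumptions, not the patch's. */
interface ReplicaChooser {
  /**
   * Picks the replica id to send the next lookup for rowKey to. Implementations are
   * expected to return 0 (the primary) when a secondary would likely serve stale data.
   */
  int chooseReplica(String rowKey);

  /**
   * Records that the location obtained for rowKey from fromReplicaId turned out to be
   * stale, so later calls to chooseReplica can avoid secondaries for that key.
   */
  void updateCacheOnError(String rowKey, int fromReplicaId);
}

/** Hypothetical simple policy: random replica unless the key was recently reported stale. */
final class SimpleReplicaChooser implements ReplicaChooser {
  private static final int PRIMARY_REPLICA_ID = 0;
  // Declared and assigned in one place; maps row key to the time it was marked stale.
  private final Map<String, Long> staleCache = new ConcurrentHashMap<>();
  private final int numReplicas;
  private final long staleTtlMs;

  SimpleReplicaChooser(int numReplicas, long staleTtlMs) {
    if (numReplicas < 1) {
      throw new IllegalArgumentException("numReplicas must be >= 1");
    }
    this.numReplicas = numReplicas;
    this.staleTtlMs = staleTtlMs;
  }

  @Override
  public int chooseReplica(String rowKey) {
    Long markedAt = staleCache.get(rowKey);
    if (markedAt != null) {
      if (System.currentTimeMillis() - markedAt < staleTtlMs) {
        return PRIMARY_REPLICA_ID;
      }
      // The entry expired; forget it and load balance again.
      staleCache.remove(rowKey, markedAt);
    }
    return ThreadLocalRandom.current().nextInt(numReplicas);
  }

  @Override
  public void updateCacheOnError(String rowKey, int fromReplicaId) {
    // Only a stale answer from a secondary is remembered; the primary is authoritative.
    if (fromReplicaId != PRIMARY_REPLICA_ID) {
      staleCache.put(rowKey, System.currentTimeMillis());
    }
  }
}
```

Because the stale cache is owned by the chooser, nothing needs to be passed into `updateCacheOnError` beyond the key and the replica the answer came from.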
Is this stuff only for meta? Can it be more generic than meta? Drop the 'meta' prefix?
Is the cache here TableCache or StaleCache maintained in ReplicaSelector? There is no need to pass it in.
Will take care of the rest of the comments.
Calling it "meta" is fine at this point, the only other thing we could be balancing requests over is a user table, and there's separate mechanism for that functionality (I haven't looked into why that functionality cannot be reused here, but I presume it pertains to the different means of configuration... which is a bit of a shame).
If we need to use this to spread requests over replicas of other system tables, we can update the class name accordingly.
Agreed. I am going to rename it "Catalog" instead of "Meta", to reflect the fact that it can be used for the location lookup service. As for user tables, if LoadBalance mode needs to be supported, the current interface already supports it; it is up to the application to support this mode.
    private final AsyncConnectionImpl conn;

    MetaReplicaLoadBalanceReplicaSimpleChooser(final AsyncConnectionImpl conn) {
      staleCache = new ConcurrentHashMap<>();
Declare the data member and assign at same time rather than declare above and assign here? nit.
Will take care of it.
    /** Conf key for enabling meta replication */
    public static final String USE_META_REPLICAS = "hbase.meta.replicas.use";
    public static final String META_REPLICAS_MODE = "hbase.meta.replicas.mode";
Yeah, does it have to be meta exclusive. Can this go somewhere in client package rather than here in the global HConstants? It is a client-only config?
What do you propose @saintstack ? hbase.catalogue.replicas.mode ?
"hbase.catalog.replicas.mode"? Or we may want to include "meta" here so that it only enables the meta table. In the future, if, say, the root table needs to be supported, another config knob will be added. That way, the configs for the current "meta" and for future system tables stay separate.
Thought about it again; I think this config knob needs to be specific to meta for now. In the future, if it is going to be turned on for another table, a different knob needs to be provided, as they basically serve different purposes.
On a high level design, we used to have a For me, I think the scope of
I think only 1 is what we really care about here, so I suggest that we just narrow the scope of the newly introduced configs to be locator only (maybe by adding a 'locator' keyword in the config name), and consider it first before

In general, I think the abstraction and trick here are both good. Setting the replica id directly on Query is a straightforward way to achieve our goal here, and the chooser or selector is also a good abstraction. The only concern is how to implement the 'fallback to primary' logic, as we need to pass the actual exception type down from the rpc retrying caller; anyway, this can be done later.

And I suggest we just make this PR against the master branch. It is only a client-side change, and whether we have implemented meta region replication should not impact the code. The reason I suggest this is that the code for master and branch-2 will be different: on branch-2, the sync client has its own logic for region locating, which I believe is not well constructed (we expose a bunch of locating methods in the ClusterConnection interface and use them everywhere). So if we want to include this feature in 2.4.0, we'd better make this PR against master, and also backport it to branch-2 ASAP. Thanks.
For me, original thought is that
Agreed.
Ok.
Do you think "fallback to primary" logic needs to be passed down from rpc retrying caller? Then it needs to be aware of this feature and needs to maintain some state. Was trying to avoid it. Yeah, this part can be optimized later.
Yeah, the client-side change is quite independent. It will work with today's meta replica; the only nit is that there will be more stale data. Thanks for the feedback, Duo.
Forgot to mention that for a patch against master, the only nit is the test cases, as there is no meta replication support in master. I will check the tests and see how many of them can be ported.
ndimiduk left a comment
My comments are mostly on style and code structure. Only a couple questions on the implementation details.
        + " but there is no meta replica configured");
      this.metaReplicaChooser = null;
    } else {
      String metaReplicaChooserImpl = conn.getConfiguration().get(
Please move this ChooserImpl instantiation into a factory method.
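One way to read the factory-method suggestion is sketched below: the reflective instantiation sits behind a single static method that takes the configured class name. All names here (`Chooser`, `PrimaryOnlyChooser`, `ChooserFactory.create`) are hypothetical, and the sketch assumes a no-argument constructor, whereas the chooser in the diff takes an `AsyncConnectionImpl`.

```java
import java.lang.reflect.Constructor;

/** Hypothetical chooser contract, reduced to one method for this sketch. */
interface Chooser {
  int chooseReplica();
}

/** Stand-in default used when no implementation class is configured. */
final class PrimaryOnlyChooser implements Chooser {
  @Override
  public int chooseReplica() {
    return 0;
  }
}

/** Hypothetical factory that hides the reflection from the caller. */
final class ChooserFactory {
  private ChooserFactory() {}

  /**
   * Instantiates the chooser named by implClassName through its no-arg constructor,
   * or returns the default when the name is null or empty.
   */
  static Chooser create(String implClassName) {
    if (implClassName == null || implClassName.isEmpty()) {
      return new PrimaryOnlyChooser();
    }
    try {
      Class<? extends Chooser> clazz = Class.forName(implClassName).asSubclass(Chooser.class);
      Constructor<? extends Chooser> ctor = clazz.getDeclaredConstructor();
      ctor.setAccessible(true);
      return ctor.newInstance();
    } catch (ReflectiveOperationException | ClassCastException e) {
      throw new IllegalArgumentException("Cannot instantiate chooser " + implClassName, e);
    }
  }
}
```

The caller then only reads the class name from the configuration and hands it to the factory, which keeps the locator constructor free of reflection details.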
    @Override
    public void onError(Throwable error) {
      // TODO: if it fails, with meta replica load balance, it may try with another meta
      // replica. This improvement will be done later.
Do you have a Jira ID for this TODO?
    import org.junit.experimental.categories.Category;

    @Category({ MediumTests.class, ClientTests.class })
    public class TestAsyncNonMetaRegionLocatorWithMetaReplicaLoadBalance
oh no, please not test inheritance :'(
Can this be done as a Parameterized test instead?
Let me check. I want to make sure that with a Parameterized test the minicluster is recreated (I looked into this once, but forgot the details).
     *
     * @see TestRegionReplicaReplicationEndpoint
     */
    @Category({LargeTests.class})
Not yours, but I think this can be added to the MasterTests category. Or RegionServerTests... I'm not even clear on which one this is at this point. Maybe we need a category for CatalogueTests.
will add "RegionServerTests" tag.
      return !containsScanner.advance() && matches >= 1 && count >= matches && count == cells.size();
    }

    private void doNGets(final Table table, final byte[][] keys) throws Exception {
nit: for helper methods whose purpose is to make assertions, I like to prefix their names with assert. So, assertDoNGets, assertPrimaryNoChangeReplicaIncrease, &c.
        Arrays.copyOfRange(HBaseTestingUtility.KEYS, 1, HBaseTestingUtility.KEYS.length))) {
      verifyReplication(TableName.META_TABLE_NAME, NB_SERVERS, getMetaCells(table.getName()));
    try (Table table = HTU
      .createTable(TableName.valueOf(this.name.getMethodName() + "_0"), HConstants.CATALOG_FAMILY,
FYI, there's a TableNameTestRule for generating the TableName from the running method. I guess it doesn't help where you're adding a suffix though...
    // Wait until moveRegion cache timeout.
    while (destRs.getMovedRegion(userRegion.getRegionInfo().getEncodedName()) != null) {
      Thread.sleep(1000);
please use waitFor with a maximum time to wait.
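The point of a bounded wait is that the test fails after a known time instead of hanging when the condition never becomes true. The plain-Java helper below is a stand-in for whatever `waitFor` utility the test code base provides; `BoundedWait` and its signature are hypothetical and only illustrate the pattern.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical stand-in for a test utility that waits with an upper bound. */
final class BoundedWait {
  private BoundedWait() {}

  /**
   * Polls condition every intervalMs until it returns true or timeoutMs has elapsed.
   *
   * @return true if the condition became true in time, false on timeout or interruption
   */
  static boolean waitFor(long timeoutMs, long intervalMs, BooleanSupplier condition) {
    long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
    while (!condition.getAsBoolean()) {
      long remainingMs = (deadline - System.nanoTime()) / 1_000_000L;
      if (remainingMs <= 0) {
        return false;
      }
      try {
        // Never sleep past the deadline, and always sleep at least 1 ms.
        Thread.sleep(Math.max(1L, Math.min(intervalMs, remainingMs)));
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return false;
      }
    }
    return true;
  }
}
```

In the loop from the diff, the condition would be the `getMovedRegion(...) == null` check, with the test asserting on the returned boolean so a timeout shows up as a failure.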
If you want to make a decision based on whether the previously returned location is stale, then you need to pass something down from the rpc retrying caller to the selector, as only in the rpc retrying caller can we get the exception we want. You can do this either by adding a parameter, as I proposed above, to not go to secondary replicas, or, as in your current solution, by adding a stale cache in the selector, in which case you need to pass down the exception and the related location. Thanks.
Thanks, Duo. I think I am still not 100% clear here. Let me try to explain my understanding of your suggestion before moving forward.
If a location is stale, the rpc retry caller detects it and passes the exception down to AsyncRegionLocator (AsyncNonMetaRegionLocator), which updates/cleans up the table cache accordingly.
In the above flow, the rpc retry caller layer is the one detecting the stale location, and it is up to the AsyncRegionLocator layer to handle all the logic. The meta selector interface tries to do something similar and builds the state it needs to make decisions inside itself. As for a different algorithm, i.e., the case you described above, here is a concrete example: say there are 5 meta replicas; if the location from meta replica 1 is stale, the client may try another meta replica. We can have a different implementation of the selector for that. I thought the selector interface was designed for this purpose; maybe I am missing something here.
In the initial implementation, I kept fromMetaReplica in the tableCache entry. When the patch went out for review, I dropped this change, as the current simple algorithm does not really need this info. The selector interface keeps fromMetaReplicaId for future enhancement, so that when it is needed, at least the interface does not need to change. I can drop fromMetaReplicaId in the current simple implementation, as it is not needed anyway.
Maybe you were referring to adding a flag to the locateInMeta() method? If the flag says to go with the primary only, it will bypass all this logic and go only to the primary region, which gives the upper layer more control. Please share more thoughts, thanks.
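The five-replica example above hints at an alternative selector that uses the replica id a stale location came from. A minimal sketch of such a policy follows, assuming round-robin selection and a per-key record of the offending replica; `AvoidStaleReplicaSelector`, `select`, and `onStaleLocation` are hypothetical names and none of this is taken from the patch.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical selector: round-robin over replicas, skipping the one last reported stale. */
final class AvoidStaleReplicaSelector {
  private final int numReplicas;
  private final AtomicInteger next = new AtomicInteger();
  // Row key -> id of the replica whose answer for that key was found to be stale.
  private final Map<String, Integer> staleReplicaByKey = new ConcurrentHashMap<>();

  AvoidStaleReplicaSelector(int numReplicas) {
    if (numReplicas < 2) {
      throw new IllegalArgumentException("need at least 2 replicas");
    }
    this.numReplicas = numReplicas;
  }

  /** Returns the next replica in round-robin order, never the one marked stale for rowKey. */
  int select(String rowKey) {
    int candidate = Math.floorMod(next.getAndIncrement(), numReplicas);
    Integer stale = staleReplicaByKey.get(rowKey);
    if (stale != null && candidate == stale) {
      candidate = (candidate + 1) % numReplicas;
    }
    return candidate;
  }

  /** Called by the layer that detects a stale location, with the replica it came from. */
  void onStaleLocation(String rowKey, int fromReplicaId) {
    staleReplicaByKey.put(rowKey, fromReplicaId);
  }
}
```

A policy like this is the kind of implementation for which keeping `fromMetaReplicaId` in the interface would pay off, since the simple fall-back-to-primary policy never needs to know which replica answered.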
Will post a new patch based on master. The major change in the master branch is that the number of meta replicas is part of the meta table, so the tests and the selector initialization need to be changed accordingly.
This request is replaced by a new request against the master branch; will do a backport once the master patch is reviewed and merged.