New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Backport: HBASE-25126 Add load balance logic in hbase-client to distribute read… #2640
Backport: HBASE-25126 Add load balance logic in hbase-client to distribute read… #2640
Conversation
Hi @Apache9 and @saintstack, please review the changes for branch-2. The scan over a specific replica region is not supported in branch-2, so I added ReverseScan over a specific replica region in this patch, this needs to be carefully reviewed. Newly added support is also added in ConnectionImplementation as Duo mentioned before, thanks. |
We do not support scan on a specific replica on branch-2? I think this should be a bug? We can set replica id on a Scan object? |
🎊 +1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The old scanner implementation plus replica is a mess...
Need more time to review. I used to think we just need to write code on top of the old code but you say the old code does not support scanning a specific replica, then it will be a pain...
hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java
Show resolved
Hide resolved
hbase-client/src/main/java/org/apache/hadoop/hbase/client/ReversedClientScanner.java
Outdated
Show resolved
Hide resolved
hbase-client/src/main/java/org/apache/hadoop/hbase/client/ScannerCallableWithReplicas.java
Outdated
Show resolved
Hide resolved
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java
Show resolved
Hide resolved
hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java
Show resolved
Hide resolved
hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java
Show resolved
Hide resolved
Seems that the patch has some issue, because related tests are failing, let me look into those failures. |
Actually, it is not mentioned before that scan over a specific replica region is supported. Right now, Reverse scan is supported, I need to create a followup jira to support this over normal scan. |
8da5fb0
to
f775727
Compare
Uploaded a new patch which fixes unitest errors. |
🎊 +1 overall
This message was automatically generated. |
For async client this is implemented. And for sync client on branch-2, we have timeline consistent read support for scan. IIRC, it is very complicated and spent me a lot of time to support the feature in async client. Let me read the code again. I could open a separated issue to support it on branch-2 to address the problem so you could rebase your patch later. |
Yeah, you are right. Both Async and Sync support timeline scan. There is no interface exposed to support meta scan over a replica region, maybe it is ok for now? |
No interface needed, just set the replicaId in the Scan object is enough? |
Checked the code, we do not support scan on a specific replica... I think this should be a bug... Let me open an issue to address it, should not be very hard but the code is a bit mess so... |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
|
a3f1c09
to
bf67659
Compare
Pushed addendum to remove setUseMetaReplicas() and updated unittests accordingly. |
@saintstack, I pushed the diff to remove setUseMetaReplicas(), please review, thanks. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
bf67659
to
7540dec
Compare
HBASE-18070.branch-2 needs to be rebased to include Duo's HBASE-27252. I rebased my branch and now the review messes up. |
e4e7386
to
77749cf
Compare
@Apache9 @saintstack I have rebased my branch to include HBASE-25272, please review. It shows lots of commits, but please look at the last two commits. Once Stack rebase HBASE-18070.branch-2, these previous commits will go away. `Huaxiang Sun Huaxiang Sun |
@saintstack Could you rebase HBASE-18070.branch-2 to latest branch-2? thanks |
77749cf
to
e6f65ab
Compare
@huaxiangsun done sir commit e6f65ab (HEAD -> HBASE-18070.branch-2.2, origin/HBASE-18070.branch-2) |
7540dec
to
93e7c72
Compare
🎊 +1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
I run TestMetaRegionLocationCache#testStandByMetaLocations and it passed locally. Probably this test is flakey. |
@Apache9 @saintstack I think your comments are addressed. Is this patch good to be merged? Thanks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Skimmed. Looks good to me.
@Apache9 Ping for +1. I think all your comments are addressed, thanks. |
I would assume that you had +1 here, but want to double check with you. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code is OK but seems a small UT failed so we do not run the full test. Let me retrigger thepre commit build to see if it is OK. Please wait a while, I think it could finish when you American get up in the morning.
🎊 +1 overall
This message was automatically generated. |
3f3bc2e
to
47ad207
Compare
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
Thanks! |
… load over meta replica regions It adds load balance support for meta lookup in AsyncTableRegionLocator. The existing meta replica mode is renamed as "HedgedRead", client sends scan request to the primary meta replica region first, if response is not back within a configured amount of time, it will send scan requests to all meta replica regions and take the first response. On top of the existing mode, a new mode "LoadBalance" is introduced. In this mode, client first choose a meta replica region to send scan request. If the response is stale, it may send the request to another meta replica region or the primary region. In this mode, meta scan requests are load balanced across all replica regions with the primary mode as the ultimate source of truth. Two new config knobs are added: 1. hbase.locator.meta.replicas.mode Valid options are "None", "HedgedRead" and "LoadBalance", they are case insensitive. The default mode is "None". 2. hbase.locator.meta.replicas.mode.loadbalance.selector The load balance alogrithm to select a meta replica to send the requests. Only org.apache.hadoop.hbase.client.CatalogReplicaLoadBalanceReplicaSimpleSelector.class is supported for now, which is the default as well. The algorithm works as follows: a. Clients select a randome meta replica region to send the requests if there is no entry for the location in the stale location cache. b. If the location from one meta replica region is stale, a stale entry will be created in the statle location cache for the region. c. Clients select the primary meta region if the location is in the stale location cache. d. The stale location cache entries time out in 3 seconds. If there is no "hbase.locator.meta.replicas.mode" configured, it will check the config knob "hbase.meta.replicas.use". If "hbase.meta.replicas.use" is configured, the mode will be set to "HedgedRead". For branch-2 support, it introduces support for ReversedScan over a specific non-default replica region, this is mainly for load balance meta scan among all its replica regions. Remove ConnectionImplementation#setUseMetaReplicas()
93e7c72
to
0a77dce
Compare
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
… load over meta replica regions
It adds load balance support for meta lookup in AsyncTableRegionLocator.
The existing meta replica mode is renamed as "HedgedRead", client sends scan request to the primary meta replica region first,
if response is not back within a configured amount of time, it will send scan requests to all meta replica regions and
take the first response. On top of the existing mode, a new mode "LoadBalance" is introduced. In this mode, client first
choose a meta replica region to send scan request. If the response is stale, it may send the request to another meta replica region or
the primary region. In this mode, meta scan requests are load balanced across all replica regions with the primary mode as
the ultimate source of truth.
Two new config knobs are added:
hbase.locator.meta.replicas.mode
Valid options are "None", "HedgedRead" and "LoadBalance", they are case insensitive. The default mode is "None".
hbase.locator.meta.replicas.mode.loadbalance.selector
The load balance alogrithm to select a meta replica to send the requests.
Only org.apache.hadoop.hbase.client.CatalogReplicaLoadBalanceReplicaSimpleSelector.class
is supported for now, which is the default as well. The algorithm works as follows:
a. Clients select a randome meta replica region to send the requests if there is no entry for the location in the stale
location cache.
b. If the location from one meta replica region is stale, a stale entry will be created in the statle location cache
for the region.
c. Clients select the primary meta region if the location is in the stale location cache.
d. The stale location cache entries time out in 3 seconds.
If there is no "hbase.locator.meta.replicas.mode" configured, it will check the config knob "hbase.meta.replicas.use".
If "hbase.meta.replicas.use" is configured, the mode will be set to "HedgedRead".
For branch-2 support, it introduces support for ReversedScan over a specific non-default replica region, this is mainly
for load balance meta scan among all its replica regions.