New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[fix](proxy) Fix getProxy frequently parse hostname to ip #28479
Conversation
clang-tidy review says "All clean, LGTM! 👍" |
run buildall |
(From new machine)TeamCity pipeline, clickbench performance test result: |
TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
|
run buildall |
clang-tidy review says "All clean, LGTM! 👍" |
TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
|
run buildall |
clang-tidy review says "All clean, LGTM! 👍" |
TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
|
run clickbench |
TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
|
(From new machine)TeamCity pipeline, clickbench performance test result: |
838929d
to
47d3411
Compare
clang-tidy review says "All clean, LGTM! 👍" |
In previously, when enabling FQDN, Doris will call dns resolver to get IP from hostname each time when 1) FE gets BE's grpc client. 2) BE gets other BE's brpc client. So when in high concurrency case, the dns resolver be overloaded and failed to resolve hostname. This PR mainly changes: 1. Add DNSCache for both FE and BE. The DNSCache will run on every FE and BE node. It has a cache, key is hostname and value is IP. Caller can get IP by hostname from this cache, and if hostname does not exist, it will try to resolve it and update the cache. In addition, DNSCache has a daemon thread to refresh the cache every 1 min, in case that the IP may be changed at anytime. There are other implements of this dns cache: 1. kaka11chen@36fed13 This is for BE side, but it does not handle the IP change case. 3. #28479 This is for FE side, but it can only work with Master FE. Other FE node will not be aware of the IP change. And there are a bunch of BackendServiceProxy, this PR only handle cache in one of them.
In previously, when enabling FQDN, Doris will call dns resolver to get IP from hostname each time when 1) FE gets BE's grpc client. 2) BE gets other BE's brpc client. So when in high concurrency case, the dns resolver be overloaded and failed to resolve hostname. This PR mainly changes: 1. Add DNSCache for both FE and BE. The DNSCache will run on every FE and BE node. It has a cache, key is hostname and value is IP. Caller can get IP by hostname from this cache, and if hostname does not exist, it will try to resolve it and update the cache. In addition, DNSCache has a daemon thread to refresh the cache every 1 min, in case that the IP may be changed at anytime. There are other implements of this dns cache: 1. kaka11chen@36fed13 This is for BE side, but it does not handle the IP change case. 3. apache#28479 This is for FE side, but it can only work with Master FE. Other FE node will not be aware of the IP change. And there are a bunch of BackendServiceProxy, this PR only handle cache in one of them.
In previously, when enabling FQDN, Doris will call dns resolver to get IP from hostname each time when 1) FE gets BE's grpc client. 2) BE gets other BE's brpc client. So when in high concurrency case, the dns resolver be overloaded and failed to resolve hostname. This PR mainly changes: 1. Add DNSCache for both FE and BE. The DNSCache will run on every FE and BE node. It has a cache, key is hostname and value is IP. Caller can get IP by hostname from this cache, and if hostname does not exist, it will try to resolve it and update the cache. In addition, DNSCache has a daemon thread to refresh the cache every 1 min, in case that the IP may be changed at anytime. There are other implements of this dns cache: 1. kaka11chen@36fed13 This is for BE side, but it does not handle the IP change case. 3. apache#28479 This is for FE side, but it can only work with Master FE. Other FE node will not be aware of the IP change. And there are a bunch of BackendServiceProxy, this PR only handle cache in one of them.
Proposed changes
Each RPC between Doris FE and BE will call getProxy. Each getProxy will parse hostname and check whether the IP has changed.
When the hostname resolution server is unstable, frequent parse hostname will often fail and cause performance problems.
Therefore, only check whether the backend IP changes during heartbeat, and if so, recreate the rpc client.
Further comments
If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...