HttpClient prefers IP (from DNS) from the same network as host #27734

Closed
Elufimov opened this issue Oct 25, 2018 · 16 comments
Labels
area-System.Net.Http bug os-linux Linux OS (any supported distro)

@Elufimov

Env:
A Consul cluster of 3 master nodes with DNS enabled. Four servers are registered under one name: 172.18.2.20, 172.18.2.21, 10.50.0.72, 10.50.0.73. The IP of the host server is 172.18.1.250. By default, Consul uses a TTL of 0 for DNS answers and shuffles the IPs.

dig searchcases.service.caravan.consul

; <<>> DiG 9.10.3-P4-Ubuntu <<>> searchcases.service.caravan.consul
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 23101
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 4, AUTHORITY: 0, ADDITIONAL: 5

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;searchcases.service.caravan.consul. IN	A

;; ANSWER SECTION:
searchcases.service.caravan.consul. 0 IN A	172.18.2.20
searchcases.service.caravan.consul. 0 IN A	10.50.0.73
searchcases.service.caravan.consul. 0 IN A	10.50.0.72
searchcases.service.caravan.consul. 0 IN A	172.18.2.21

;; ADDITIONAL SECTION:
searchcases.service.caravan.consul. 0 IN TXT	"consul-network-segment="
searchcases.service.caravan.consul. 0 IN TXT	"consul-network-segment="
searchcases.service.caravan.consul. 0 IN TXT	"consul-network-segment="
searchcases.service.caravan.consul. 0 IN TXT	"consul-network-segment="

;; Query time: 1 msec
;; SERVER: 172.16.0.79#53(172.16.0.79)
;; WHEN: Thu Oct 25 07:02:47 UTC 2018
;; MSG SIZE  rcvd: 271

dig searchcases.service.caravan.consul

; <<>> DiG 9.10.3-P4-Ubuntu <<>> searchcases.service.caravan.consul
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 35746
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 4, AUTHORITY: 0, ADDITIONAL: 5

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;searchcases.service.caravan.consul. IN	A

;; ANSWER SECTION:
searchcases.service.caravan.consul. 0 IN A	172.18.2.21
searchcases.service.caravan.consul. 0 IN A	10.50.0.73
searchcases.service.caravan.consul. 0 IN A	10.50.0.72
searchcases.service.caravan.consul. 0 IN A	172.18.2.20

;; ADDITIONAL SECTION:
searchcases.service.caravan.consul. 0 IN TXT	"consul-network-segment="
searchcases.service.caravan.consul. 0 IN TXT	"consul-network-segment="
searchcases.service.caravan.consul. 0 IN TXT	"consul-network-segment="
searchcases.service.caravan.consul. 0 IN TXT	"consul-network-segment="

;; Query time: 1 msec
;; SERVER: 172.16.0.79#53(172.16.0.79)
;; WHEN: Thu Oct 25 07:02:47 UTC 2018
;; MSG SIZE  rcvd: 271

As you can see, Consul returns the IPs in random order for each query.
The test code looks like this:

            var host = new HostBuilder()
                .ConfigureServices((context, collection) =>
                {
                    collection
                        .AddHttpClient("1")
                        .SetHandlerLifetime(TimeSpan.FromSeconds(5))
                        .ConfigurePrimaryHttpMessageHandler(builder =>
                            new SocketsHttpHandler
                            {
                                PooledConnectionLifetime = TimeSpan.FromSeconds(5)
                            }
                        );
                })
                .Build();
            var httpClient = host.Services.GetService<IHttpClientFactory>().CreateClient("1");
            while (true)
            {
                var response = httpClient.GetAsync(args[0]).GetAwaiter().GetResult();
            }

From the start, requests are split almost equally between 172.18.2.20 and 172.18.2.21, and nothing goes to 10.50.0.72 or 10.50.0.73. HttpClient sends requests to 10.50.0.72 and 10.50.0.73 only if 172.18.2.20 and 172.18.2.21 are removed from Consul. If we reintroduce even one of 172.18.2.20 or 172.18.2.21 in Consul, all requests go to 172.18.2.20 or 172.18.2.21.

dotnet --version
2.1.400
The application was published as self-contained for the ubuntu-x64 runtime and launched on Ubuntu 16.04.3 LTS (GNU/Linux 4.4.0-103-generic x86_64).

@karelz
Member

karelz commented Oct 25, 2018

@Elufimov AFAIK we call the DNS subsystem and don't apply any preferences. Can you check what the DNS subsystem returns in your case?

@karelz
Member

karelz commented Oct 25, 2018

cc @wfurt

@wfurt
Member

wfurt commented Oct 25, 2018

One factor is that we have a connection pool, so existing connections are reused. That may be one reason why you don't see the expected distribution. As an experiment, you can set the Connection: close header to force a new connection for each request.

@Elufimov
Author

Elufimov commented Oct 26, 2018

Updated code
using System;
using System.Diagnostics;
using System.Net;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;

namespace PravoRu.DataLake.DNS.Test
{
    class Program
    {

        static void Main(string[] args)
        {
            var uri = new Uri(args[0]);
            
            var host = new HostBuilder()
                .ConfigureServices((context, collection) =>
                {
                    collection
                        .AddHttpClient("1", client =>
                        {
                            client.DefaultRequestHeaders.Connection.Add("close");
                        })
                        .SetHandlerLifetime(TimeSpan.FromSeconds(5))
                        .ConfigurePrimaryHttpMessageHandler(builder =>
                            new SocketsHttpHandler
                            {
                                PooledConnectionLifetime = TimeSpan.FromSeconds(5)                             
                            }
                        );
                })
                .Build();

            var httpClient = host.Services.GetService<IHttpClientFactory>().CreateClient("1");

            var dnsResolvingTask = Task.Run(() =>
            {
                while (true)
                {
                    try
                    {
                        var ipAddresses = Dns.GetHostAddressesAsync(uri.DnsSafeHost).GetAwaiter().GetResult();
                        Console.WriteLine($"{DateTime.Now} For {args[0]} was resolved: {string.Join<IPAddress>(",", ipAddresses)}");
                    }
                    catch (Exception exception)
                    {
                        Console.WriteLine(exception.ToString());
                    }

                    Thread.Sleep(TimeSpan.FromSeconds(1));
                }
            });

            var httpRequestsTask = Task.Run(() =>
            {
                while (true)
                {
                    var sw = Stopwatch.StartNew();
                    // Dispose each response to release its resources.
                    using (var response = httpClient.GetAsync(uri).GetAwaiter().GetResult())
                    {
                        sw.Stop();
                        if (!response.IsSuccessStatusCode)
                        {
                            Console.WriteLine($"{DateTime.Now} Request was unsuccessful with return code: {response.StatusCode} and time: {sw.ElapsedMilliseconds} ms");
                        }
                    }
                }
            });

            Task.WaitAll(dnsResolvingTask, httpRequestsTask);
        }
    }
}

@karelz I checked Dns.GetHostAddressesAsync, and the result was:

DNS resolved in code
10/26/18 5:47:14 AM For http://searchcases.service.caravan.consul:8083/SearchCases/healthcheck.json was resolved: 172.18.2.21,172.18.2.20,10.50.0.73
10/26/18 5:47:15 AM For http://searchcases.service.caravan.consul:8083/SearchCases/healthcheck.json was resolved: 172.18.2.20,172.18.2.21,10.50.0.72
10/26/18 5:47:16 AM For http://searchcases.service.caravan.consul:8083/SearchCases/healthcheck.json was resolved: 172.18.2.21,172.18.2.20,10.50.0.73
10/26/18 5:47:17 AM For http://searchcases.service.caravan.consul:8083/SearchCases/healthcheck.json was resolved: 172.18.2.20,10.50.0.72,10.50.0.73
10/26/18 5:47:18 AM For http://searchcases.service.caravan.consul:8083/SearchCases/healthcheck.json was resolved: 172.18.2.20,10.50.0.73,10.50.0.72
10/26/18 5:47:19 AM For http://searchcases.service.caravan.consul:8083/SearchCases/healthcheck.json was resolved: 172.18.2.21,10.50.0.73,10.50.0.72
10/26/18 5:47:20 AM For http://searchcases.service.caravan.consul:8083/SearchCases/healthcheck.json was resolved: 172.18.2.20,10.50.0.72,10.50.0.73
10/26/18 5:47:21 AM For http://searchcases.service.caravan.consul:8083/SearchCases/healthcheck.json was resolved: 172.18.2.20,10.50.0.72,10.50.0.73
10/26/18 5:47:22 AM For http://searchcases.service.caravan.consul:8083/SearchCases/healthcheck.json was resolved: 172.18.2.21,10.50.0.72,10.50.0.73
10/26/18 5:47:23 AM For http://searchcases.service.caravan.consul:8083/SearchCases/healthcheck.json was resolved: 172.18.2.21,10.50.0.72,10.50.0.73
10/26/18 5:47:24 AM For http://searchcases.service.caravan.consul:8083/SearchCases/healthcheck.json was resolved: 172.18.2.21,10.50.0.72,10.50.0.73
10/26/18 5:47:25 AM For http://searchcases.service.caravan.consul:8083/SearchCases/healthcheck.json was resolved: 172.18.2.21,172.18.2.20,10.50.0.72
10/26/18 5:47:26 AM For http://searchcases.service.caravan.consul:8083/SearchCases/healthcheck.json was resolved: 172.18.2.21,172.18.2.20,10.50.0.72
10/26/18 5:47:27 AM For http://searchcases.service.caravan.consul:8083/SearchCases/healthcheck.json was resolved: 172.18.2.20,172.18.2.21,10.50.0.73
10/26/18 5:47:28 AM For http://searchcases.service.caravan.consul:8083/SearchCases/healthcheck.json was resolved: 172.18.2.20,10.50.0.73,10.50.0.72
10/26/18 5:47:29 AM For http://searchcases.service.caravan.consul:8083/SearchCases/healthcheck.json was resolved: 172.18.2.21,172.18.2.20,10.50.0.72
10/26/18 5:47:30 AM For http://searchcases.service.caravan.consul:8083/SearchCases/healthcheck.json was resolved: 172.18.2.20,10.50.0.73,10.50.0.72
10/26/18 5:47:31 AM For http://searchcases.service.caravan.consul:8083/SearchCases/healthcheck.json was resolved: 172.18.2.21,172.18.2.20,10.50.0.72
10/26/18 5:47:32 AM For http://searchcases.service.caravan.consul:8083/SearchCases/healthcheck.json was resolved: 172.18.2.21,10.50.0.72,10.50.0.73
10/26/18 5:47:33 AM For http://searchcases.service.caravan.consul:8083/SearchCases/healthcheck.json was resolved: 172.18.2.21,172.18.2.20,10.50.0.72
10/26/18 5:47:34 AM For http://searchcases.service.caravan.consul:8083/SearchCases/healthcheck.json was resolved: 172.18.2.21,10.50.0.73,10.50.0.72

As you can see, Dns.GetHostAddressesAsync returns the IPs sorted by subnet, and only three of them. I attached a small sample, but it was the same over the entire observation period.
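Why the 172.18.2.x addresses come first is not established in this thread. One hypothesis, not confirmed here, is the longest-matching-prefix rule of default destination address selection (RFC 6724, Rule 9), which some getaddrinfo implementations (glibc, for example) apply in some form when ordering results; whether Windows does the same for IPv4 is not shown here. The minimal sketch below only counts how many leading bits each candidate shares with the host address 172.18.1.250. The PrefixDemo class and CommonPrefixLength method are hypothetical names, and the sketch does not model the full RFC 6724 algorithm, subnet masks, or /etc/gai.conf.

```csharp
using System;
using System.Net;

// Hypothetical helper: counts the leading bits two IPv4 addresses share.
// This models only the prefix comparison behind RFC 6724 Rule 9,
// not the full destination-address-selection algorithm.
public static class PrefixDemo
{
    public static int CommonPrefixLength(IPAddress a, IPAddress b)
    {
        byte[] x = a.GetAddressBytes();
        byte[] y = b.GetAddressBytes();
        if (x.Length != 4 || y.Length != 4)
            throw new ArgumentException("IPv4 addresses only");

        int bits = 0;
        for (int i = 0; i < 4; i++)
        {
            int diff = x[i] ^ y[i];
            if (diff == 0)
            {
                bits += 8; // whole byte matches
                continue;
            }
            // Count matching leading bits inside the first differing byte.
            for (int mask = 0x80; (diff & mask) == 0; mask >>= 1)
                bits++;
            break;
        }
        return bits;
    }

    public static void Main()
    {
        var source = IPAddress.Parse("172.18.1.250"); // client host from the report
        string[] candidates = { "172.18.2.20", "172.18.2.21", "10.50.0.72", "10.50.0.73" };
        foreach (var candidate in candidates)
        {
            int bits = CommonPrefixLength(source, IPAddress.Parse(candidate));
            Console.WriteLine($"{candidate}: {bits} bits");
        }
    }
}
```

Running it prints 22 bits for both 172.18.2.x addresses and 0 bits for both 10.50.0.x addresses, which is consistent with (but does not prove) a prefix-based sort putting the 172.18.2.x addresses first. It does not explain why only three of the four addresses are returned.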

@wfurt setting Connection: close drastically decreases performance, from ~5000 to ~1700 requests per 10 s. The behavior did not change.

DNS resolved by dig
; <<>> DiG 9.10.3-P4-Ubuntu <<>> searchcases.service.caravan.consul
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 49032
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 4, AUTHORITY: 0, ADDITIONAL: 5

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;searchcases.service.caravan.consul. IN	A

;; ANSWER SECTION:
searchcases.service.caravan.consul. 0 IN A	172.18.2.21
searchcases.service.caravan.consul. 0 IN A	10.50.0.73
searchcases.service.caravan.consul. 0 IN A	10.50.0.72
searchcases.service.caravan.consul. 0 IN A	172.18.2.20

;; ADDITIONAL SECTION:
searchcases.service.caravan.consul. 0 IN TXT	"consul-network-segment="
searchcases.service.caravan.consul. 0 IN TXT	"consul-network-segment="
searchcases.service.caravan.consul. 0 IN TXT	"consul-network-segment="
searchcases.service.caravan.consul. 0 IN TXT	"consul-network-segment="

;; Query time: 1 msec
;; SERVER: 172.16.0.79#53(172.16.0.79)
;; WHEN: Fri Oct 26 05:57:49 UTC 2018
;; MSG SIZE  rcvd: 271

; <<>> DiG 9.10.3-P4-Ubuntu <<>> searchcases.service.caravan.consul
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 49494
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 4, AUTHORITY: 0, ADDITIONAL: 5

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;searchcases.service.caravan.consul. IN	A

;; ANSWER SECTION:
searchcases.service.caravan.consul. 0 IN A	10.50.0.72
searchcases.service.caravan.consul. 0 IN A	172.18.2.21
searchcases.service.caravan.consul. 0 IN A	10.50.0.73
searchcases.service.caravan.consul. 0 IN A	172.18.2.20

;; ADDITIONAL SECTION:
searchcases.service.caravan.consul. 0 IN TXT	"consul-network-segment="
searchcases.service.caravan.consul. 0 IN TXT	"consul-network-segment="
searchcases.service.caravan.consul. 0 IN TXT	"consul-network-segment="
searchcases.service.caravan.consul. 0 IN TXT	"consul-network-segment="

;; Query time: 1 msec
;; SERVER: 172.16.0.79#53(172.16.0.79)
;; WHEN: Fri Oct 26 05:58:06 UTC 2018
;; MSG SIZE  rcvd: 271

; <<>> DiG 9.10.3-P4-Ubuntu <<>> searchcases.service.caravan.consul
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 36354
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 4, AUTHORITY: 0, ADDITIONAL: 5

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;searchcases.service.caravan.consul. IN	A

;; ANSWER SECTION:
searchcases.service.caravan.consul. 0 IN A	10.50.0.72
searchcases.service.caravan.consul. 0 IN A	10.50.0.73
searchcases.service.caravan.consul. 0 IN A	172.18.2.20
searchcases.service.caravan.consul. 0 IN A	172.18.2.21

;; ADDITIONAL SECTION:
searchcases.service.caravan.consul. 0 IN TXT	"consul-network-segment="
searchcases.service.caravan.consul. 0 IN TXT	"consul-network-segment="
searchcases.service.caravan.consul. 0 IN TXT	"consul-network-segment="
searchcases.service.caravan.consul. 0 IN TXT	"consul-network-segment="

;; Query time: 1 msec
;; SERVER: 172.16.0.79#53(172.16.0.79)
;; WHEN: Fri Oct 26 05:58:07 UTC 2018
;; MSG SIZE  rcvd: 271 

I also want to note that everything works fine with the JVM, provided DNS caching is disabled, of course.

@Elufimov
Author

I just checked on Windows Server 2016, and everything was the same.

@karelz
Member

karelz commented Oct 26, 2018

@Elufimov what do you mean by "everything was the same" on Windows 2016? Did you experience the same problem?
By DNS subsystem, I meant the one in the OS, not the .NET APIs... I wonder whether it is the OS or .NET that is somehow sorting the addresses.

@Elufimov
Author

Elufimov commented Oct 26, 2018

@karelz Yes, I have the same problem on Windows 2016. You can find the dig output in my previous message. I can run something else if that is not what you need. But dig shows that Consul returns the IPs in random order, as it should. A simple ping also uses all the IPs in random order.

@karelz
Member

karelz commented Oct 26, 2018

It looks like we are returning addresses in the order given to us by the OS API:
https://github.com/dotnet/corefx/blob/a10890f4ffe0fadf090c922578ba0e606ebdd16c/src/System.Net.NameResolution/src/System/Net/NameResolutionPal.Windows.cs#L159
Which I believe ends up calling to:
https://github.com/dotnet/corefx/blob/a10890f4ffe0fadf090c922578ba0e606ebdd16c/src/Common/src/Interop/Windows/Winsock/Interop.GetAddrInfoW.cs

We need to debug into that to confirm. In the meantime, if you can confirm that GetAddrInfoW returns a different order on each call, that would be very helpful.

@Elufimov
Author

Elufimov commented Nov 1, 2018

I wrote a simple test in corefx\src\System.Net.NameResolution\tests\PalTests\System.Net.NameResolution.Pal.Tests.cs

Test code
        [Fact]
        public void Issue33037_TryGetAddrInfo()
        {
            Dictionary<string, int> results = new Dictionary<string, int>();
            List<int> iterations = Enumerable.Range(0, 10000).ToList();
            iterations.ForEach((i) =>
            {
                IPHostEntry hostEntry;
                int nativeErrorCode;
                SocketError error = NameResolutionPal.TryGetAddrInfo("searchcases.service.caravan.consul", out hostEntry, out nativeErrorCode);
                Assert.Equal(SocketError.Success, error);
                var ipsString = string.Join<IPAddress>(";", hostEntry.AddressList);
                var value = results.GetValueOrDefault(ipsString, 0);
                results[ipsString] = value + 1;
            });
            var lines = results.Select(entity => entity.Key + ": " + entity.Value.ToString());
            Console.WriteLine(string.Join(Environment.NewLine, lines));
        }

Test output:

172.18.2.21;10.50.0.72;10.50.0.73: 1217
172.18.2.21;172.18.2.20;10.50.0.73: 1264
172.18.2.20;10.50.0.72;10.50.0.73: 1233
172.18.2.21;10.50.0.73;10.50.0.72: 1275
172.18.2.20;172.18.2.21;10.50.0.72: 1231
172.18.2.21;172.18.2.20;10.50.0.72: 1266
172.18.2.20;172.18.2.21;10.50.0.73: 1243
172.18.2.20;10.50.0.73;10.50.0.72: 1271

I can also confirm that SafeFreeAddrInfo.GetAddrInfo returns 3 addresses.
There is no sorting code in NameResolutionPal.TryGetAddrInfo.

@stephentoub
Member

@Elufimov, I'm trying to follow along with the thread... you're suggesting that GetHostAddressesAsync is sorting the results returned by NameResolutionPal? Where?

@Elufimov
Author

Elufimov commented Nov 1, 2018

@stephentoub I am suggesting that SafeFreeAddrInfo.GetAddrInfo returns results sorted and not all of them. At least this is what I get in my environment.

@Elufimov
Author

Elufimov commented Nov 1, 2018

My case is that I use Consul as the DNS server, and I have a service with 4 IP addresses (172.18.2.20, 172.18.2.21, 10.50.0.72, 10.50.0.73). When I use a domain name managed by Consul in HttpClient, it does not provide round-robin load balancing as it should; it uses only addresses from the 172 subnet if they exist in the DNS response. I do not think anything is wrong with Consul, because dig and our JVM applications work fine.

@stephentoub
Member

stephentoub commented Nov 1, 2018

"I am suggesting that SafeFreeAddrInfo.GetAddrInfo returns results sorted and not all of them"

Ok. In that case it's a question about Windows behavior, rather than .NET Core. SafeFreeAddrInfo.GetAddrInfo just P/Invokes to Windows:

https://github.com/dotnet/corefx/blob/61f51e6b2b26271de205eb8a14236afef482971b/src/Common/src/Interop/Windows/Winsock/SafeFreeAddrInfo.cs#L23

https://github.com/dotnet/corefx/blob/a10890f4ffe0fadf090c922578ba0e606ebdd16c/src/Common/src/Interop/Windows/Winsock/Interop.GetAddrInfoW.cs#L14-L20

unless you're suggesting that GetAddrInfoW is the wrong method to be consuming?

@Elufimov
Author

Elufimov commented Nov 1, 2018

I see the same behavior on Windows and on Linux. I'd be glad to collect additional data and samples.

@karelz karelz changed the title HttpClient prefers ips (from dns) from the same network as host HttpClient prefers IP (from DNS) from the same network as host Oct 9, 2019
@karelz
Member

karelz commented Oct 9, 2019

Triage: We believe that this scenario will be enabled by implementing custom dialers (ConnectionCallback) http://github.com/dotnet/corefx/issues/35404 (#28721 after migration to runtime repo), where anyone can customize what they want to pick.

@karelz karelz closed this as completed Oct 9, 2019
@msftgits msftgits transferred this issue from dotnet/corefx Jan 31, 2020
@msftgits msftgits added this to the 5.0 milestone Jan 31, 2020
@scalablecory
Contributor

This has been resolved via the API added here in .NET 5: #41949
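As a rough illustration of the custom-dialer approach from the triage note, the sketch below uses the SocketsHttpHandler.ConnectCallback property (available since .NET 5) to resolve the host itself and connect to a randomly chosen address. Assumptions: the RandomAddressHandler class is a hypothetical name; it only chooses among what Dns.GetHostAddressesAsync returns, so if the OS resolver has already dropped or reordered entries, as reported above, the missing addresses are still unreachable and a separate DNS client would be needed; and because connections are pooled, the random choice happens only when a new connection is opened, which PooledConnectionLifetime makes periodic.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Net.Sockets;
using System.Security.Cryptography;

// Hypothetical helper (requires .NET 5+ for ConnectCallback and
// Socket.ConnectAsync(EndPoint, CancellationToken)).
public static class RandomAddressHandler
{
    public static SocketsHttpHandler Create()
    {
        return new SocketsHttpHandler
        {
            // Recycle pooled connections so new addresses get picked over time.
            PooledConnectionLifetime = TimeSpan.FromSeconds(5),
            ConnectCallback = async (context, cancellationToken) =>
            {
                DnsEndPoint endPoint = context.DnsEndPoint;

                // Uses the OS resolver; any ordering it applies is ignored below.
                IPAddress[] addresses = await Dns.GetHostAddressesAsync(endPoint.Host);
                if (addresses.Length == 0)
                    throw new HttpRequestException($"No addresses resolved for {endPoint.Host}");

                // Pick uniformly at random among the returned addresses.
                IPAddress chosen = addresses[RandomNumberGenerator.GetInt32(addresses.Length)];

                var socket = new Socket(chosen.AddressFamily, SocketType.Stream, ProtocolType.Tcp)
                {
                    NoDelay = true
                };
                try
                {
                    await socket.ConnectAsync(new IPEndPoint(chosen, endPoint.Port), cancellationToken);
                    // The stream owns the socket and closes it when disposed.
                    return new NetworkStream(socket, ownsSocket: true);
                }
                catch
                {
                    socket.Dispose();
                    throw;
                }
            }
        };
    }
}
```

In the original repro, it could replace the handler passed to ConfigurePrimaryHttpMessageHandler, e.g. .ConfigurePrimaryHttpMessageHandler(() => RandomAddressHandler.Create()). Picking per connection rather than per request keeps the pooling benefits that Connection: close gave up.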

@ghost ghost locked as resolved and limited conversation to collaborators Dec 15, 2020
6 participants