Slow HttpClient operation caused by DNS resolution failure on iOS #41451

Closed
rxwen opened this issue Apr 11, 2020 · 94 comments
Labels
area-core-library SDK core library issues (core, async, ...); use area-vm or area-web for platform specific libraries. library-_http library-io P1 A high priority bug; for example, a single project is unusable or has many test failures type-bug Incorrect behavior (everything from a crash to more subtle misbehavior)

Comments

@rxwen

rxwen commented Apr 11, 2020

This issue was originally discussed in the Flutter repository; I'm moving it here since it's more relevant.

Our environment is an iPhone 6S running iOS 13.3.1. (Some customers have reported the same problem on other devices, including iPhone 7 and iPhone X, on the latest iOS version as of now.)

For reference, I created a Flutter sample project with:

flutter create --androidx -t app sample

Then I added some code in main.dart to perform a simple HTTP GET request whenever the user taps the + button, and to display the time spent. (https://gist.github.com/rxwen/009c7ae4328ee799f71013f06e206b13#file-main-dart-L55)
Even this simple example exhibits the same behavior:

The request took about 5 seconds on Wi-Fi. (On the same device and the same Wi-Fi connection, Safari opens the same page instantly.)
The request took about 500 ms on 4G.

I ran the sample on Flutter v1.12.13+hotfix.8 and v1.9.1, in both debug and release mode. All gave similar results.

Further testing showed that it may be related to DNS. When I changed the URL to use the IP address instead of the domain name, the request was no longer slow.
And if I set the iPhone's DNS to manual mode and remove the IPv6 DNS server (which is configured automatically), the problem goes away too.

So here is the question: why does the IPv6 DNS server affect the Flutter app's DNS resolution while other apps are not affected? Asking our app's users to configure DNS manually when they hit this problem is not a feasible solution. There must be something in the Dart code base that handles DNS differently.

@rxwen
Author

rxwen commented Apr 12, 2020

The problem is with DNS resolution. I changed the sample to perform only a DNS lookup:
```dart
// Time a single DNS lookup.
final sw = Stopwatch()..start();
final ia = await InternetAddress.lookup("baidu.com");
sw.stop();
// Show the elapsed time in the sample app's counter.
_counter = sw.elapsed.inMilliseconds;
```
It takes about 5 seconds to run.

The automatic DNS configuration is:

[image: screenshot of the automatic DNS configuration]

If I remove the IPv6 DNS server manually, or add a custom DNS server (8.8.8.8), the resolution speed is fine.

A test with the dig command showed that the 192.168.1.1 server responds correctly and the IPv6 server does not.

@rxwen
Author

rxwen commented Apr 12, 2020

The router runs OpenWrt and has no upstream IPv6 connection.

It looks like when iOS has both IPv4 and IPv6 LAN connections, Flutter performs DNS resolution only over IPv6 (which fails because there is no upstream IPv6 connection) and ignores the IPv4 DNS server. Other applications still respect the IPv4 DNS server setting, so they do not fail.

@rxwen
Author

rxwen commented Apr 12, 2020

Capturing network traffic on the iPhone showed that whenever I asked the app to look up an IP address, it always started an AAAA lookup, which failed. The A lookup happened only once, succeeded, and its result should have been cached.

The InternetAddress.lookup method waited until the AAAA resolution eventually failed after 5 seconds, and only then returned. I thought it should be able to return the cached result sooner.

I'm also not sure that the IPv6 address is the root cause: one user of the app reported the slow behavior even though his iPhone had not obtained an IPv6 address at all.

If https://github.com/dart-lang/sdk/blob/master/runtime/bin/socket_base_linux.cc#L202 is the code used on iOS, it looks like lookup is implemented on top of the getaddrinfo function.

@vsmenon vsmenon added the area-vm Use area-vm for VM related issues, including code coverage, and the AOT and JIT backends. label Apr 13, 2020
@a-siva a-siva added library-_http area-core-library SDK core library issues (core, async, ...); use area-vm or area-web for platform specific libraries. library-io and removed area-vm Use area-vm for VM related issues, including code coverage, and the AOT and JIT backends. labels Apr 14, 2020
@a-siva
Contributor

a-siva commented Apr 14, 2020

/cc @zichangg

@zichangg
Contributor

From your description, this is likely related to our POSIX socket operations. In Dart, we handle iOS and macOS the same way, using read, write and other POSIX socket functions, which the iOS man page discourages.

@rxwen Can you verify that this happens only on iOS and not on macOS?

Some discussion: #41376

@rxwen
Author

rxwen commented Apr 14, 2020

I'm not that familiar with macOS app development. So a side question: do I need to follow this guide (https://flutter.dev/desktop) to try a macOS app?

We haven't received any reports of this issue on Android so far.

@zichangg
Contributor

You'll need to run the same code on macOS. I'm not that familiar with the Flutter setup either, but your link looks good to me.

If you have dart (which should be included in Flutter), you can just run dart path/to/your_program.dart on the Mac.

Android's implementation is quite similar, but that is not enough to draw a conclusion. The two platforms use different event handlers and differ slightly in the details, so one of them might contain a bug.

@rxwen
Author

rxwen commented Apr 14, 2020

Sure. I will try when I'm back in the environment where this reproduces and let you know.

My initial suspicion: could it be blocked here until the timeout?
If it's caused by the POSIX socket, does that mean the Flutter code is waiting for io_service's response on a socket without being able to return earlier?

@zichangg
Contributor

> My initial suspicion: could it be blocked here until the timeout?
> If it's caused by the POSIX socket, does that mean the Flutter code is waiting for io_service's response on a socket without being able to return earlier?

It is possible. The lookup is blocking, and the connection is only started after the addresses are resolved.
Check out the implementation here:

static Future<List<InternetAddress>> lookup(String host,

@rxwen
Author

rxwen commented Apr 14, 2020

Was lookup originally designed to block even when the underlying DNS resolution has already obtained an IP address for the domain name?
Could the lookup be async, so that the slow response simply updates the DNS cache when it eventually arrives?

@zichangg
Contributor

> Was lookup originally designed to block even when the underlying DNS resolution has already obtained an IP address for the domain name?

As you pointed out, we wrap the request info and pass it to getaddrinfo. The behavior is controlled by that system function; you should check the getaddrinfo man page.

> Could the lookup be async, so that the slow response simply updates the DNS cache when it eventually arrives?

Apologies for saying "blocking" in my previous comment. We do put the lookup into a Future so that it doesn't block. What I meant is that the subsequent connection steps are chained to this Future and run after it completes. If the lookup hangs, everything that follows is delayed as well.

@sortie
Contributor

sortie commented Apr 14, 2020

@rxwen Why is there an invalid IPv6 DNS server in your configuration? That seems like the problem. You're saying it's automatically configured? Maybe there's something wrong with your network and the addresses it's broadcasting? In this case, Dart is using the system getaddrinfo function which is outside of our control and is supposed to do something reasonable. In that sense, this is not a problem with Dart, and we can't do anything.

Although, as @zichangg mentions, there is another iOS networking API that we could use, and it might resolve addresses differently. It's still a problem that there's a broken server in the DNS configuration, but this other API may smooth over the problem with concurrent DNS lookup.

@rxwen
Author

rxwen commented Apr 14, 2020

@sortie I have an IPv6 DNS server because my router supports IPv6 but my Internet uplink doesn't, so IPv6 works only within the LAN. I also suspected that this is caused by my network configuration. But what I don't understand is why only the Flutter apps (our production app and the simple demo I mentioned before) are affected, while all other apps work fine.
This is also the only environment I have access to that reproduces the issue. As I mentioned earlier, some users of our app reported the same behavior (about a 5-second delay for each network access), and the information I gathered from them is:

  1. Other apps are not affected.
  2. Some of them helped check their DNS configuration, and no IPv6 DNS server is configured.

Since I can't investigate further in the users' environments, I have to use whatever I have gathered so far to find the cause, or at least an explanation for the observations above. Is it possible that Flutter apps don't work well with certain iOS network configurations?

I also tried a different network environment that is similar in that it has an IPv6 LAN but no external IPv6 address. I can confirm that an IPv6 DNS server is configured on the same iPhone there. But in that environment, the Flutter apps are not slow.

@mraleph
Member

mraleph commented Apr 14, 2020

FWIW, Apple's networking guidelines have a section, Avoid Resolving DNS Names Before Connecting to a Host, which rather aptly describes the problem we are hitting here.

Our HttpClient should probably perform two DNS resolutions in parallel (instead of doing a single getaddrinfo call and hoping to resolve both IPv6 and IPv4) and then use whichever result comes first (maybe with a small timeout to accommodate small delays).

@rxwen
Author

rxwen commented Apr 14, 2020

@mraleph I thought HttpClient might use InternetAddressType.any internally. That type is passed to getaddrinfo, which is thereby instructed to perform the two DNS resolutions in parallel.

@mraleph
Member

mraleph commented Apr 14, 2020

@rxwen yeah, but I suspect that things hang because getaddrinfo waits to get both IPv6 and IPv4 lookup results back. Could you try changing your sample test to do:

await InternetAddress.lookup("baidu.com", type: InternetAddressType.IPv4)

and see if it hangs just the same or returns result fast?

@rxwen
Author

rxwen commented Apr 14, 2020

I already tried this before. It returned immediately.
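
To make this comparison repeatable, the timing can be wrapped in a small helper. This is a minimal sketch, assuming the same baidu.com host used earlier in the thread; `timeMs` is a hypothetical helper name, not part of dart:io. On an affected network one would expect the `any` and `IPv6` lookups to take about as long as the AAAA timeout, while the `IPv4` lookup returns quickly.

```dart
import 'dart:io';

/// Hypothetical helper: runs [action] and returns how long it took,
/// in milliseconds. Errors are swallowed so that a failed lookup is
/// still timed.
Future<int> timeMs(Future<void> Function() action) async {
  final sw = Stopwatch()..start();
  try {
    await action();
  } catch (_) {
    // Only the elapsed time matters here, not the failure itself.
  }
  sw.stop();
  return sw.elapsedMilliseconds;
}

Future<void> main() async {
  const host = 'baidu.com'; // host used earlier in this thread
  for (final type in [
    InternetAddressType.any,
    InternetAddressType.IPv4,
    InternetAddressType.IPv6,
  ]) {
    final ms = await timeMs(() => InternetAddress.lookup(host, type: type));
    print('$type: $ms ms');
  }
}
```

Running this on both the iPhone and the Mac on the same Wi-Fi network gives numbers that can be compared directly.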

@rxwen
Author

rxwen commented Apr 14, 2020

@zichangg I ran the code below on a Mac:

import 'dart:io';

// Resolve the host and print the resulting addresses.
Future<void> main() async {
  print("hello world!");
  final ip = await InternetAddress.lookup("baidu.com");
  print("$ip");
}

It returned immediately.
But on an iPhone on the same Wi-Fi network, it's still very slow.
Does that give you any hint?

@rxwen
Author

rxwen commented Apr 15, 2020

I captured network packets in the other network environment, which also has an IPv6 LAN address. There, the AAAA DNS query gets a response quickly, so HttpClient isn't blocked.
This suggests that in the slow environment HttpClient is indeed blocked by the AAAA resolution timeout.

I'll try to get more information from our user in their environment and let you know.

@zichangg
Contributor

Looks like getaddrinfo is stuck.

There are probably no quick fixes except a shorter timeout.

Rewriting our iOS socket implementation on top of CFSocket and CFHost would probably work for this case. But as I mentioned in another issue, #41376, that approach also has some limitations when a VPN is enabled.

@mraleph
Member

mraleph commented Apr 15, 2020

@zichangg

> There are probably no quick fixes except a shorter timeout.

There are two ways to look at this problem:

  • The bug is in the InternetAddress.lookup with type: any. It should have a shorter timeout.

  • The bug is in the _NativeSocket.startConnect which issues InternetAddress.lookup with type: any. Remember that the original issue people are experiencing is that HttpClient is very slow to connect when IPv6 network is somehow misconfigured - meaning that they don't really care about InternetAddress.lookup which happens deep in the dart:io implementation - they just want to pass hostname and let dart:io handle the rest efficiently.

I suggest we go with the second interpretation. For that there is a rather straightforward fix: we should slightly rework how _NativeSocket.startConnect is implemented. Instead of issuing a single InternetAddress.lookup with type: any, it should issue two separate lookups and use the results of whichever completes first to start connecting. There is already logic inside that tries addresses one by one (if the host has multiple addresses). The logic with multiple DNS lookups fits rather well into that.
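
The idea of issuing two lookups and using whichever completes first could be sketched at the application level roughly as follows. This is only an illustration under assumptions, not the change made inside _NativeSocket.startConnect: `firstSuccessful` and `lookupRacing` are hypothetical names, and the sketch implements neither the small grace timeout mentioned earlier nor any preference between address families.

```dart
import 'dart:async';
import 'dart:io';

/// Hypothetical helper: completes with the first future in [futures]
/// that succeeds. Fails only if every future fails, reporting the
/// last error seen.
Future<T> firstSuccessful<T>(List<Future<T>> futures) {
  if (futures.isEmpty) {
    throw ArgumentError('futures must not be empty');
  }
  final completer = Completer<T>();
  var pending = futures.length;
  for (final future in futures) {
    future.then<void>((value) {
      // The first success wins; later results are ignored.
      if (!completer.isCompleted) completer.complete(value);
    }, onError: (Object error, StackTrace stack) {
      pending--;
      if (pending == 0 && !completer.isCompleted) {
        completer.completeError(error, stack);
      }
    });
  }
  return completer.future;
}

/// Hypothetical helper: resolves [host] over IPv4 and IPv6 concurrently
/// and returns the addresses from whichever lookup succeeds first.
Future<List<InternetAddress>> lookupRacing(String host) {
  return firstSuccessful([
    InternetAddress.lookup(host, type: InternetAddressType.IPv4),
    InternetAddress.lookup(host, type: InternetAddressType.IPv6),
  ]);
}

Future<void> main() async {
  final addresses = await lookupRacing('baidu.com');
  print(addresses);
}
```

With this shape, a stalled AAAA query no longer delays the result as long as the A query succeeds. The slower lookup still runs to completion in the background; its result is simply discarded.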

@rxwen
Author

rxwen commented Apr 15, 2020

From a user's perspective, the second suggestion sounds logical to me.
As users of HttpClient, we shouldn't have to worry about these details. Its interface is straightforward to us.
I guess that if the change is made in the underlying dart:io, other network code, such as a plain TCP connection, will benefit too.

@sortie
Contributor

sortie commented Apr 15, 2020

Would multiple DNS lookups actually solve the problem? We can ask for an IPv4 and an IPv6 address separately, yes, but that doesn't necessarily mean we use an IPv4 or IPv6 server respectively, unless iOS getaddrinfo specifically does that. For instance, I imagine asking for an IPv4 address could make iOS getaddrinfo ask its IPv6 server first and, once that times out, ask the IPv4 server and get a response.

Additionally, if we did two concurrent lookups, which response do we prefer? The one that came first? The IPv6 one? The one the OS believes to have the best route? We'll be bypassing / reimplementing the system policy here.

In my opinion, this is a libc deficiency where getaddrinfo doesn't cope efficiently with a broken IPv6 DNS server. We can potentially work around it, but we're not quite on the right level for this kind of policy, and it would be an imperfect workaround. We do want to have some sort of solution since this results in real user problems.

Since other iOS apps don't have this problem, it sounds like we're using the wrong API to do DNS lookups. If there's a higher-level API we can use that takes care of abstracting these policies away, that would be the best option for a clear solution.

@mraleph
Member

mraleph commented Apr 15, 2020

> Would multiple DNS lookups actually solve the problem?

Yes. I asked what happens if you specifically request an IPv4 address, and the answer was that it resolves quickly.

> Since other iOS apps don't have this problem, it sounds like we're using the wrong API to do DNS lookups.

My guess is that other apps follow the Apple guidelines: they don't resolve hostnames before connecting and let the library handle that. See the link in my earlier comment.

We can debate things back and forth, but I don't think that is a good use of our time when there is a rather small fix we can apply now that would unbreak people.

After unbreaking people we can spend time contemplating the perfect solution (e.g. a large-scale refactoring of dart:io to use non-POSIX APIs on iOS and Mac OS X).

@wanjm

wanjm commented Nov 30, 2020

any update?

@marikrg

marikrg commented Dec 9, 2020

Any update here? I'm experiencing the same issue on a S10+ device.
DNS resolution takes ~500ms on 4g and ~9000ms on wifi.

@renntbenrennt

it's almost Christmas!!!! Come on!!! Give us a Christmas miracle and approve this!!!! God!!! 😭😭😭

@mraleph
Member

mraleph commented Dec 21, 2020

I think this has fallen through the cracks, our apologies. Unfortunately I don't think anything is going to happen until after NY given the holiday season.

@mraleph mraleph added the P1 A high priority bug; for example, a single project is unusable or has many test failures label Dec 21, 2020
@mraleph mraleph added this to the January 2021 milestone Dec 21, 2020
@mraleph
Member

mraleph commented Dec 21, 2020

To make sure we are actually going to address it I am putting it into January 2021 milestone, marking as P1 and assigning to @aam (who has taken over dart:io from @zichangg)

@renntbenrennt

😭 oh no!!!! What should I do.... Our users are mostly on iOS... oh god.... I can feel our PM and my boss are going to kill me.... 😭😭😭😭😭😭 (I put a huge bet on Flutter... oh god😭)

And I have just finished reading through the code review for this: https://dart-review.googlesource.com/c/sdk/+/143883

And it seems the patch made by @zichangg was reverted in the end due to some issue...

And according to what you said, @zichangg is no longer in charge of this? @mraleph

But what about the working version of the patch that @rxwen tested, as he said above?

I'm literally typing these words with tears 😭😭😭 oh god....

@mraleph
Member

mraleph commented Dec 21, 2020

@SeasonLeee are your users actually hitting this problem? if you need a rapid fix then your options are:

  1. If your APIs are HTTP rather than HTTPS (though that is unlikely), you can perform DNS resolution yourself concurrently and bypass this issue by passing the resolved addresses into the HttpClient APIs. Unfortunately this will not work for HTTPS, because you can't specify the hostname/authority via HttpClient, which means you would have issues with certificate validation.
  2. Move away from HttpClient e.g. perform requests in Objective-C/Swift and send stuff back to Dart via message channels or FFI.
  3. Re-apply @zichangg's fix (might require rebasing), ignoring a known issue with it, and build your own Flutter engine (requires advanced knowledge to be able to build Flutter engine)
  4. Copy HttpClient sources out of core libraries (similar to what alt_http package has done) and tweak the source to perform DNS resolution in parallel.
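
Option 1 might look roughly like this for a plain-HTTP endpoint. This is a minimal sketch under assumptions: `withResolvedHost` and `getViaIPv4` are hypothetical helpers, the sketch resolves IPv4 only (which the thread reports as returning quickly) instead of resolving both families concurrently, and, as the list item says, it is not suitable for HTTPS.

```dart
import 'dart:convert';
import 'dart:io';

/// Hypothetical helper: returns [url] with its host replaced by the
/// literal IP [address], keeping scheme, port, path and query.
Uri withResolvedHost(Uri url, InternetAddress address) =>
    url.replace(host: address.address);

/// Hypothetical helper: fetches a plain-http [url] after resolving the
/// host over IPv4 only, so HttpClient never performs its own lookup.
Future<String> getViaIPv4(Uri url) async {
  final addresses =
      await InternetAddress.lookup(url.host, type: InternetAddressType.IPv4);
  final client = HttpClient();
  try {
    final request =
        await client.getUrl(withResolvedHost(url, addresses.first));
    // Send the original hostname in the Host header so that servers
    // using name-based virtual hosting still route the request.
    request.headers.host = url.host;
    final response = await request.close();
    return await response.transform(utf8.decoder).join();
  } finally {
    client.close();
  }
}
```

A call such as `await getViaIPv4(Uri.parse('http://example.com/'))` would then connect straight to the resolved address. Forcing IPv4 this way gives up IPv6 connectivity entirely, so it is a stopgap rather than a general fix.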

@renntbenrennt

@mraleph oh my god, thanks for your response😭

And as for who is hitting this problem: no, it's not my users but us developers! (We are doing pre-release testing...)

We plan to release the first version of our app in a couple of days, and now that almost every feature is done and we run the project on real devices, this issue (slow network requests) occurs! And we have tested it on several different iOS devices, ranging from newest to oldest...

And such slow network operation is just unbearable! How would you feel if you were the very first user of a new app and faced such a problem? Would you continue to use an app like this? Would you recommend others to download it? Our app is just going to be dead on arrival😭😭😭

Oh god, this is why I'm freaking out so much right now! 😭 (I am the one who persuaded our team to adopt Flutter instead of continuing with React Native....😭)

But really, thank you for your advice. I will try performing the network requests in native code and sending the responses back to Dart, that is, method 2 in your response, because this is the method for which I have an example as a guide, thanks to the sample code from @ivanryndyuk in #41451 (comment)
(because I don't really know anything about iOS development...😭 why else do you think we use Flutter in the first place....)

anyway, thank you for your response...

oh, by the way, I'm not sure if this is also a factor, but it seems most people on the internet reporting this slow network request issue on iOS are located in mainland China, and so am I.... I don't know if the GFW is messing with anything... but I think it might have a little bit to do with it? (And my team has just started to complain: "why doesn't this issue happen with React Native, blah blah blah") I'm literally smiling like this 😀 at them, but deep inside my heart I'm like this 🥲

anyway, thanks again and wish you an early happy holiday!

@daadu

daadu commented Feb 16, 2021

@mraleph For flutter users, any idea when this will land in "stable" or "beta" channels?

@mraleph
Member

mraleph commented Feb 16, 2021

master: contains a173599f4c1055306429f04289884153897f81d7
dev: contains a173599f4c1055306429f04289884153897f81d7
beta: contains a173599f4c1055306429f04289884153897f81d7
stable: does not contain a173599f4c1055306429f04289884153897f81d7

beta should already contain the fix, so the next stable release (which should happen relatively soon, though no concrete date) will contain the fix as well.

@daadu

daadu commented Feb 17, 2021

Is this fix also part of https://pub.dev/packages/http pub package?

@mraleph
Member

mraleph commented Feb 17, 2021

@daadu that package should just be calling dart:io primitives, so there should be no need to fix the package itself.

@fekitibi

Is this issue fixed or still open? Is it still only available on the beta channel?
I am also experiencing this issue on iOS devices.

@cyjaysong

It's the same on Android

@davictor24

Is the issue fixed? I still experience it, even on beta :(

@aam
Contributor

aam commented Jan 2, 2022

@davictor24 the fix for the problem was to perform concurrent resolution using IPv4 and IPv6, which landed almost a year ago.
If you are still experiencing network-related performance problems, please open a new tracking issue detailing, if possible, the steps to reproduce what you see.
