
HttpClient throws unexpected TaskCanceledException when HttpClient is reused #24392

Closed
shravan2x opened this issue Dec 8, 2017 · 58 comments
Labels
area-System.Net.Http bug os-linux Linux OS (any supported distro)
Milestone

Comments

@shravan2x
Copy link

This is the raw stack trace from the exception that occurs:

FATAL: Unhandled exception (A task was canceled.).
DEBUG:    at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at System.Runtime.CompilerServices.ConfiguredTaskAwaitable`1.ConfiguredTaskAwaiter.GetResult()
   at System.Net.Http.HttpClient.<FinishSendAsyncUnbuffered>d__59.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at QuarkHttp.QNewHttpClient.<SendNewAsync>d__5.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at QuarkHttp.QHttpClient.<SendAsync>d__1.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at DataApi.<GetDataAsync>d__4.MoveNext() in C:\...\DataApi.cs:line 84
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at DataPoller.<RefreshChangesAsync>d__18.MoveNext() in C:\...\DataPoller.cs:line 121
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at DataPoller.<RunPollCycle>d__17.MoveNext() in C:\...\DataPoller.cs:line 98
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
   at System.Threading.ThreadPoolWorkQueue.Dispatch()

The issue:
My problem originates with the HttpWebRequest API being deprecated and replaced by a significantly lower-performance implementation in .NET Core 2.0. I use a library (QuarkHttp) that is a thin wrapper around the HttpWebRequest API. When porting this wrapper library to .NET Core, I used a shared HttpClient instance to avoid the overhead of creating new connections each time (following @geoffkizer's comment /cc @davidsh). My code looks like this:

static QNewHttpClient()
{
    _defaultHttpClient = CreateHttpClient((false, null));
    _proxyHttpClients = new ConcurrentDictionary<(bool, IWebProxy), HttpClient>();
}

public static async Task<QHttpWrapNewResponse> SendNewAsync(this QHttpRequest request, bool fetchResponse = true, bool isAllowRedirects = false, int timeout = 100000, IWebProxy proxy = null)
{
    HttpClient httpClientToUse = _defaultHttpClient;
    if (isAllowRedirects || proxy != null)
        httpClientToUse = _proxyHttpClients.GetOrAdd((isAllowRedirects, proxy), CreateHttpClient);

    DateTime startTime = DateTime.Now;

    HttpRequestMessage requestMessage = CreateHttpRequestMessage(request);
    HttpCompletionOption completionOption = (fetchResponse ? HttpCompletionOption.ResponseContentRead : HttpCompletionOption.ResponseHeadersRead);
    HttpResponseMessage responseMessage;

    try
    {
        responseMessage = await httpClientToUse.SendAsync(requestMessage, completionOption, (new CancellationTokenSource(timeout)).Token).ConfigureAwait(false);
    }
    catch (Exception ex) when (ex is TaskCanceledException || ex is OperationCanceledException)
    {
        Log.Error($"{ex.GetType()} thrown (Time taken: {(DateTime.Now - startTime).TotalSeconds}s).");
        Log.Debug($"URL: {request.Url}\tMethod: {request.Method}");
        throw;
    }

    return new QHttpWrapNewResponse(responseMessage, fetchResponse);
}

private static HttpClient CreateHttpClient((bool allowRedirects, IWebProxy proxy) properties)
{
    HttpClientHandler clientHandler = new HttpClientHandler
    {
        AllowAutoRedirect = properties.allowRedirects,
        AutomaticDecompression = DecompressionMethods.Deflate | DecompressionMethods.GZip,
        UseCookies = false
    };

    if (properties.proxy != null)
    {
        if (!clientHandler.SupportsProxy)
            throw new Exception("Proxy not supported.");

        clientHandler.UseProxy = true;
        clientHandler.Proxy = properties.proxy;
    }

    return new HttpClient(clientHandler)
    {
        Timeout = TimeSpan.FromMilliseconds(Int32.MaxValue)
    };
}

private static HttpRequestMessage CreateHttpRequestMessage(QHttpRequest request)
{
    HttpRequestMessage requestMessage = new HttpRequestMessage(new HttpMethod(request.Method.Name()), request.Url);
    requestMessage.Version = request.Version;
    if (request.Referrer != null)
        requestMessage.Headers.TryAddWithoutValidation("referer", request.Referrer);

    foreach (string curHeader in request.Headers)
    {
        string normalizedHeader = curHeader.ToLower();
        if (normalizedHeader != "content-type")
        {
            //requestMessage.Headers.Add(curHeader, request.Headers[curHeader]);
            if (!requestMessage.Headers.TryAddWithoutValidation(curHeader, request.Headers[curHeader]))
                throw new Exception("Header invalid.");
        }
    }

    if (request.Payload != null)
    {
        requestMessage.Content = request.Payload;
        requestMessage.Content.Headers.ContentType = MediaTypeHeaderValue.Parse(request.ContentType);
    }

    return requestMessage;
}

The exception occurs on this line

responseMessage = await httpClientToUse.SendAsync(requestMessage, completionOption, (new CancellationTokenSource(timeout)).Token).ConfigureAwait(false);

In the exception handler, I measure the time taken myself to ensure that I wasn't missing something obvious. This is the output of that logging:

[2017/12/08 16:37:43.742 QuarkHttp.QNewHttpClient] ERROR: System.Threading.Tasks.TaskCanceledException thrown (Time taken: 100.0041412s).
[2017/12/08 16:37:43.742 QuarkHttp.QNewHttpClient] DEBUG: URL: https://some.site/url	Method: Get

One point to note is that this issue occurs across at least 5 different domains that I know of, though I have never seen it anywhere in the past. Everything works as expected on .NET Framework 4.5.x, 4.6.x, and 4.7.x, and on Mono 4.x.x/5.x.x. On .NET Core, however, the issue occurs very often (many times an hour in my application).

This is why I think this is a framework issue rather than my own error:
In the SendNewAsync method, the line HttpClient httpClientToUse = _defaultHttpClient; reuses the default HttpClient. When this line is changed to HttpClient httpClientToUse = CreateHttpClient((false, null)); (the same call used to initialize the default HttpClient in the static constructor), a new HttpClient is used for every request. This makes things slower, but the issue disappears. It only occurs when the same HttpClient is reused.

I have no code that modifies the HttpClient in any way after it is initially created. In my specific application, the proxy and allowRedirects options are never changed, so the HttpClient that's used wasn't taken from _proxyHttpClients either.

I'm not sure how to debug this issue further. I could definitely test things if anyone has ideas.

My .NET Core version is 2.0.2, and it runs on Ubuntu 16.04.
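For what it's worth, the pattern I use to tell a client-side timeout apart from an explicit cancellation looks roughly like this (a sketch only; the helper name and the exception mapping are mine, not part of QuarkHttp):

```csharp
using System;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

static class TimeoutExample
{
    // Sketch: wrap SendAsync with a per-request CancellationTokenSource so the
    // CTS is disposed afterwards, and surface a timeout as TimeoutException
    // rather than the ambiguous TaskCanceledException.
    public static async Task<HttpResponseMessage> SendWithTimeoutAsync(
        HttpClient client, HttpRequestMessage request, TimeSpan timeout)
    {
        using (var cts = new CancellationTokenSource(timeout))
        {
            try
            {
                return await client.SendAsync(request, cts.Token).ConfigureAwait(false);
            }
            catch (OperationCanceledException) when (cts.IsCancellationRequested)
            {
                throw new TimeoutException($"Request timed out after {timeout.TotalSeconds}s.");
            }
        }
    }
}
```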

@shravan2x
Copy link
Author

shravan2x commented Dec 8, 2017

As an update, the issue appears to occur mostly on requests that have fetchResponse set to false, i.e. when the HttpCompletionOption.ResponseHeadersRead option is used.

There have been a few exceptions to this, though.

@Priya91
Copy link
Contributor

Priya91 commented Dec 8, 2017

@shravan2x Can you provide a small repro app for the issue, with the exact combination of options for which it repros? Going by your description, the proxy code is not required for the repro, right? It would be great if only the relevant code were included.

@shravan2x
Copy link
Author

shravan2x commented Dec 8, 2017

@Priya91 I'm trying to get one up, but haven't had success yet. Making the same request over and over (with HttpCompletionOption.ResponseHeadersRead) doesn't lead to the error even after 2000 requests. I believe I need a combination of factors to get this to work, but I'm really not sure which.

I've currently pushed a version of the app that replaces all common occurrences of HttpCompletionOption.ResponseHeadersRead with HttpCompletionOption.ResponseContentRead. If the issue does not occur in a few hours, I can confirm that this is one aspect of the problem.

I'll post back when I have more information.

@shravan2x
Copy link
Author

shravan2x commented Dec 9, 2017

I was finally able to reproduce it!

It has nothing to do with the HttpCompletionOption. It appears to happen because of this:

  1. Send a request to xyz.com
  2. Wait some time (like 10m)
  3. Send another request to xyz.com
  4. An exception is thrown because the HttpClient tries to reuse the old connection, which fails since the connection was already terminated by some timeout.

It only happens on Linux.

The correct behavior would be to detect closed connections and create new ones. I remember @davidsh mentioning that .NET Core on Linux uses a curl wrapper, while Windows uses something else. That might explain the differing behavior.

This explains why it didn't occur when I used a new HttpClient every time - no attempt was made to reuse old connections.

Reproduction:

HttpClient hc = new HttpClient();

while (true)
{
    Console.Write(DateTime.Now + " ...");
    await hc.GetAsync("https://api.steampowered.com/", (new CancellationTokenSource(TimeSpan.FromSeconds(30))).Token);
    Console.WriteLine("done");
    Console.ReadKey();
}

Run this, wait 5-6 minutes, then enter a key to send another request.

Result on Windows:

12/9/2017 2:12:06 AM ...done
12/9/2017 2:12:07 AM ...done
12/9/2017 2:12:08 AM ...done
12/9/2017 2:18:36 AM ...done

Result on Ubuntu 16.04:

12/9/17 8:12:02 AM ...done

12/9/17 8:12:10 AM ...done

12/9/17 8:12:10 AM ...done

12/9/17 8:18:34 AM ...
Unhandled Exception: System.Threading.Tasks.TaskCanceledException: A task was canceled.
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at System.Runtime.CompilerServices.ConfiguredTaskAwaitable`1.ConfiguredTaskAwaiter.GetResult()
   at System.Net.Http.HttpClient.<FinishSendAsyncBuffered>d__58.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at TestLabCore.Experiments.Http.HttpClientTimeoutExperiment.<Play>d__4.MoveNext() in C:\...\Data.cs:line 28
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at TestLabCore.Program.<RunExperiment>d__1`1.MoveNext() in C:\...\Data.cs:line 25
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at TestLabCore.Program.<Main>(String[] args)
Aborted (core dumped)

The exception occurs after 30s, as expected from the CancellationToken we passed in.

P.S. I made a separate issue for the extra newlines in the Ubuntu output at https://github.com/dotnet/corefx/issues/25809.

@jo-ninja
Copy link
Contributor

jo-ninja commented Dec 9, 2017

I am also having the same problem, but only after I deploy to Azure (Azure Container Service, Ubuntu 16.04).
In my case, I am using Docker for Windows and cannot reproduce it in my local environment, so it only happens in Azure.
I thought that by using Docker I should be able to reproduce it locally as well ("It works on my machine!")... but that is not the case.
It is probably also worth mentioning that I was initially using .NET Core 1.1.4 and then updated to the latest .NET Core 2.0.3.

Run this, wait 5-6 minutes, then enter a key to send another request.

In my case, waiting 4 minutes is enough to reproduce it. (TCP/IP TIME_WAIT of 240 seconds?)

So, my current workaround (until the bug is fixed) is to remember the last invocation time and dispose and renew the HttpClient instance after 4 minutes as necessary; the same instance is still reused within 4 minutes of the last invocation.
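A minimal sketch of that workaround (the 4-minute window and the names are just what I use; note that disposing an instance that still has in-flight requests is unsafe, so real code needs more care):

```csharp
using System;
using System.Net.Http;

// Sketch of the dispose-and-renew workaround: recreate the HttpClient if it
// has been idle longer than a fixed window. This trades some connection reuse
// for avoiding stale pooled connections.
class RenewingClientHolder
{
    private static readonly TimeSpan IdleWindow = TimeSpan.FromMinutes(4);
    private readonly object _gate = new object();
    private HttpClient _client = new HttpClient();
    private DateTime _lastUse = DateTime.UtcNow;

    public HttpClient Get()
    {
        lock (_gate)
        {
            DateTime now = DateTime.UtcNow;
            if (now - _lastUse > IdleWindow)
            {
                _client.Dispose();          // old instance may hold stale connections
                _client = new HttpClient(); // fresh handler, fresh connection pool
            }
            _lastUse = now;
            return _client;
        }
    }
}
```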

@shravan2x
Copy link
Author

@jo-ninja I'm not sure a fixed 4-minute window would always work, since (apart from .NET Core's own timeout) different servers may have different connection timeouts.

@shravan2x shravan2x changed the title HttpClient throws unexpected TaskCanceledException when the HttpClient is reused HttpClient throws unexpected TaskCanceledException when HttpClient is reused Dec 9, 2017
@Priya91
Copy link
Contributor

Priya91 commented Dec 11, 2017

@shravan2x Thanks for coming up with a more helpful repro. It sounds like a bug; we will investigate.

@shravan2x
Copy link
Author

@karelz @Priya91 This issue is fairly serious, since it prevents reuse of HttpClients (there is no guarantee when a connection will be lost, and a full 100s delay occurs when it is). Could a fix be prioritized?

@geoffkizer
Copy link
Contributor

This does seem like a pretty serious issue.

@Priya91
Copy link
Contributor

Priya91 commented Dec 14, 2017

@shravan2x I tried this repro with the latest changes from master on Ubuntu 17.04 with curl 7.52.1, and was not able to repro it. Can you try with the latest master bits?

12/14/17 11:13:00 PM ...12/14/17 11:13:00 PM ...Before enqueue: 0
  Before enqueue: 0
  After enqueue: 1
  After enqueue: 1
  Multiagent new worker being created.
  Multiagent new worker being created.
  done
  done

  12/14/17 11:13:39 PM ...Before enqueue: 0
  After enqueue: 1
  Multiagent new worker being created.
  done

  12/14/17 11:28:43 PM ...Before enqueue: 0
  After enqueue: 1
  Multiagent new worker being created.
  done

@shravan2x
Copy link
Author

@Priya91 Does .NET Core use the version of curl I can find with curl --version?

If not, how would I check this?

@tai-yi
Copy link

tai-yi commented Dec 15, 2017

@shravan2x
curl -V

I got the same issue.
I'm using a Docker container based on microsoft/aspnetcore:2.
The curl version in the container is 7.52.1.

@shravan2x
Copy link
Author

I'm on 7.47.0.

@Priya91 If you can't reproduce it, it might be something that was fixed on master since at least 2.0.2. However, I can't compile and test at the moment.

@tai-yi
Copy link

tai-yi commented Dec 15, 2017

I changed my code so it does not always create a new HttpClient, and it now works well.

@karelz
Copy link
Member

karelz commented Dec 15, 2017

@Priya91 can you check your test on 2.0? It would be good to have confirmation that we are able to reproduce it on 2.0, so that the fact we can't reproduce it on master (2.1) means it is fixed.

@tai-yi I don't follow - the code snippet above does not create new HttpClient instances. Just one.

@Priya91
Copy link
Contributor

Priya91 commented Dec 15, 2017

@shravan2x I was not able to repro this on 2.0 either:

lasekar@ubuntu1704:~/httpclienthang$ dotnet --version
2.0.2

lasekar@ubuntu1704:~/httpclienthang$ dotnet run
12/15/17 11:37:44 PM ...done

12/15/17 11:37:47 PM ...done

12/15/17 11:52:45 PM ...done

12/15/17 11:54:36 PM ...done

12/15/17 11:54:38 PM ...done

@Priya91
Copy link
Contributor

Priya91 commented Dec 15, 2017

@shravan2x Can you share more information about your runtime environment? Which Linux distro version are you on? Is it a Docker container? Can you try your code with an updated version of curl?

@shravan2x
Copy link
Author

@Priya91

Runtime Environment: .NET Core 2.0.2 (A regular installation, not preview or anything)
Linux: Ubuntu 16.04.3 LTS
Is Container: No

I can't try an updated curl until next Wednesday or so.

@Priya91
Copy link
Contributor

Priya91 commented Dec 18, 2017

I was not able to repro on a vanilla Ubuntu 16.04 VM with .NET Core 2.0.2:

lasekar@ubuntu1604:~/httpclientperf$ cat /etc/os-release
NAME="Ubuntu"
VERSION="16.04.3 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04.3 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
VERSION_CODENAME=xenial
UBUNTU_CODENAME=xenial

lasekar@ubuntu1604:~/httpclientperf$ dotnet --version
2.0.2

lasekar@ubuntu1604:~/httpclientperf$ dotnet run
12/18/17 8:30:39 PM ...done

12/18/17 8:30:41 PM ...done

12/18/17 9:00:52 PM ...done

12/18/17 9:00:57 PM ...done

@shravan2x Are you able to reproduce on other Ubuntu 16.04 VMs? It looks like your machine could be in a corrupted state, or some other issue is interfering with your requests.

@shravan2x
Copy link
Author

@Priya91 I've tested on 3 VMs so far, and all of them have the issue. It might have something to do with the fact that I upgraded most of them from the 1.0.0 preview versions. I'll try on a vanilla VM and post back.

@danielmarbach
Copy link
Contributor

I was able to reproduce a similar issue with the AWS SDK, which uses HttpClient underneath, on Ubuntu 16.04. Details can be found in aws/aws-sdk-net#796

@karelz
Copy link
Member

karelz commented Dec 19, 2017

@danielmarbach why do you think it is the same problem? There is no concurrency in this case.
Were you able to reproduce the same problem with exactly the same code as posted here?

@danielmarbach
Copy link
Contributor

I was not able to reproduce it on the same machine with the code provided here, not even when taking concurrency into account. But looking at the stack trace, the exception is also raised from FinishSendAsyncUnbuffered, so I was thinking they might be related.

@Priya91
Copy link
Contributor

Priya91 commented Dec 19, 2017

When you look at the stack trace the exception is also raised from FinishSendAsyncUnbuffered

Almost all exceptions will come from there; that's the method that awaits the execution of the task that sends the request.

@shravan2x
Copy link
Author

@Priya91 @karelz I'm not able to reproduce this on a vanilla VM. The issue still occurs on my existing machines. I will try to update to 2.0.4 and report back.

@karelz
Copy link
Member

karelz commented Jan 2, 2018

I don't think 2.0.4 will make any difference for your case.
It seems to be a configuration/environmental problem. Until you can identify what exactly it is, or someone else is able to reproduce the problem, there is sadly nothing we can do on our side :(

Closing for now. Feel free to reopen if there is an actionable repro or more people hitting the problem.

@karelz karelz closed this as completed Jan 2, 2018
@shravan2x
Copy link
Author

@karelz One thing is that I have a side-by-side installation of .NET Core 1.1.4 on the VMs with the issue. I'll try removing it to see if that makes a difference.

@karelz
Copy link
Member

karelz commented Jan 2, 2018

@shravan2x there may be even more machine-wide configuration differences. Removing things one by one is usually a good strategy for understanding how to build a new VM from scratch that reproduces it. If you succeed, just post more details and let us know, or reopen the issue.

@karelz
Copy link
Member

karelz commented Jan 16, 2018

It is weird/suspicious that step 2 (the wait) is needed to reproduce it.
If you have steps that lead to a repro on any new VM and you can share them, we could take a look.

@1iveowl
Copy link

1iveowl commented Feb 1, 2018

Any update or work-around on this?

I'm experiencing this issue with the .NET Core runtime v2.0.4 running on a Raspberry Pi with Debian 9 (Stretch).

Message: Exception: System.Threading.Tasks.TaskCanceledException: A task was canceled.
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.Net.Http.HttpClient.d__58.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)

@shravan2x
Copy link
Author

shravan2x commented Feb 1, 2018

@1iveowl I've been trying to find a workable repro for months with no luck.

I just switched to Mono for now. It seems to work correctly.

@karelz Could we leave the issue open to let people know that it hasn't been fixed?

@karelz
Copy link
Member

karelz commented Feb 1, 2018

@1iveowl if you have a repro, or something we can reproduce here and investigate, we would be happy to look at it, as I mentioned above.
The call stack is sadly not sufficient, and @shravan2x's repro steps were insufficient for us to reproduce locally and investigate further.

@1iveowl
Copy link

1iveowl commented Feb 1, 2018

@karelz It's a tough nut. I tried to create a dumbed-down repro, but I can only provoke the error when I connect to a web service running on a particular server in my local environment. The server is not exposed to the internet.

@karelz
Copy link
Member

karelz commented Feb 1, 2018

Is it possible to replicate the server setup somewhere on the internet (e.g. Azure)? We could run a small repro against the existing server.
The other option is to investigate it on your side: catch it under a debugger, instrument the CoreFX code to add additional logging, etc.

@danielmarbach
Copy link
Contributor

Would it be so far off to try my AWS repro? It might still lead to some insights.

@wfurt
Copy link
Member

wfurt commented Feb 1, 2018

It should not matter where it is hosted as long as we can reproduce the problem. It is primarily about the client anyway.

@karelz
Copy link
Member

karelz commented Feb 1, 2018

@danielmarbach AFAIK your repro used the AWS SDK - that is something I would like to remove from the repro (to remove the chance that it is some weirdness introduced by the SDK).
Whether the server is hosted in AWS or Azure does not matter, as long as the main app to debug is a local repro (with "minimal" dependencies).

@cnblogs-dudu
Copy link
Contributor

cnblogs-dudu commented Feb 27, 2018

We encountered the same issue on Windows Server 2008 R2.

System.Threading.Tasks.TaskCanceledException: A task was canceled.
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at System.Runtime.CompilerServices.ConfiguredTaskAwaitable`1.ConfiguredTaskAwaiter.GetResult()
   at System.Net.Http.HttpClient.<FinishSendAsyncBuffered>d__58.MoveNext()

@lukazh
Copy link

lukazh commented Mar 14, 2018

I reproduced the issue on an Ubuntu 16.04 x64 VM with runtime 2.0.6 and SDK 2.1.101.

The issue came up with the default completionOption parameter of the HttpClient.GetAsync method, which is HttpCompletionOption.ResponseContentRead. When I changed it to HttpCompletionOption.ResponseHeadersRead, everything went fine.

It seems to have something to do with the call to LoadIntoBufferAsync on the response's content in the HttpClient.FinishSendAsyncBuffered method, but I have not figured out how. The direct cause of the timeout should be that Interop.Http.MultiInfoRead in the CurlHandler.WorkerBodyLoop method always returns false when called 4-5 minutes after the last request (once LoadIntoBufferAsync has been called at least once).

@karelz
Copy link
Member

karelz commented Mar 14, 2018

@lukazh do you have a simplified repro you can share?

@lukazh
Copy link

lukazh commented Mar 14, 2018

@karelz FYI. No big difference from what shravan2x provided.

static HttpClient hc = new HttpClient();

static void Main(string[] args)
{
    while (true)
    {
        Get();
        Console.ReadKey();
    }
}

static async void Get()
{
    try
    {
        Console.Write(DateTime.Now + " ... ");
        var r = await hc.GetAsync("https://api.steampowered.com/", /*HttpCompletionOption.ResponseHeadersRead,*/ 
            (new CancellationTokenSource(TimeSpan.FromSeconds(30))).Token);
        Console.WriteLine("done");
    }
    catch(Exception e)
    {
        Console.WriteLine(e.Message);
        Console.WriteLine(e.StackTrace);
    }
}

I will try it with 2.0.3 later to check if the issue comes up again.

@karelz
Copy link
Member

karelz commented Mar 14, 2018

@lukazh we were not able to reproduce the original problem in house or on vanilla VM (see https://github.com/dotnet/corefx/issues/25800#issuecomment-352556764 and https://github.com/dotnet/corefx/issues/25800#issuecomment-354729309).
If you have a small repro that reproduces on vanilla install, please let us know. We can try to reproduce it again.

@shravan2x
Copy link
Author

@karelz If this is too difficult to reproduce on a vanilla VM, perhaps we could try to reproduce it in a container, freeze it, and upload it?

@karelz
Copy link
Member

karelz commented Mar 14, 2018

@shravan2x that is a good idea. It would still be good to minimize the steps that are installed/executed in the container prior to the repro.
I am very curious what @lukazh's repro steps look like on the Ubuntu VM.

@lukazh
Copy link

lukazh commented Mar 15, 2018

I ran some more tests on the same VM with different URLs, and the exception never came up. Even the URL I used for the repro doesn't reliably trigger the issue.

I captured the requests sent to https://api.steampowered.com/. The first request established a connection with the server, and the subsequent ones reused it. After waiting a few minutes, when the issue is likely to occur, another request was sent that also tried to reuse the connection. However, because of the time that had passed, the connection's status had somehow already changed: the server just hung for a while (a few seconds, unstable) after sending an ACK, and then sent a RST, leading to a new connection. If the timeout of the task's CancellationToken was shorter than this idle period, the task would be canceled and a TaskCanceledException would be thrown. So I guess the cause of this issue is that the request is genuinely incomplete within the given time for some reason (network conditions, OS environment, etc.) rather than HttpClient itself. @shravan2x maybe you can make the timeout a little longer to see if the server eventually sends a response.

BTW, the reason it runs well with completionOption set to HttpCompletionOption.ResponseHeadersRead is that each request establishes a new connection of its own, which is also weird. @karelz do you know something about this?

@shravan2x
Copy link
Author

@lukazh My timeout was 100s for an endpoint that usually completes <1s. I'm fairly confident waiting longer wouldn't change much.

@sepehr1014
Copy link

Any updates on this issue?

@dshevani
Copy link

dshevani commented May 17, 2018

Any updates here? Our API service (ASP.NET API) runs on Azure Classic App Service, and out of every 100,000 requests about 200 fail with an Internal Server Error and the same exception. Please assist.

TaskCanceledException: A task was canceled.. ExceptionDetails : System.Threading.Tasks.TaskCanceledException: A task was canceled.
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.Runtime.CompilerServices.TaskAwaiter`1.GetResult()
at System.Web.Http.Controllers.ApiControllerActionInvoker.<InvokeActionAsyncCore>d__0.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.Runtime.CompilerServices.TaskAwaiter`1.GetResult()
at System.Web.Http.Controllers.ActionFilterResult.<ExecuteAsync>d__2.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.Runtime.CompilerServices.TaskAwaiter`1.GetResult()
at System.Web.Http.Filters.AuthorizationFilterAttribute.<ExecuteAuthorizationFilterAsyncCore>d__2.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.Runtime.CompilerServices.TaskAwaiter`1.GetResult()
at System.Web.Http.Controllers.ExceptionFilterResult.<ExecuteAsync>d__0.MoveNext()

[EDIT] Format call stack by @karelz

@karelz
Copy link
Member

karelz commented May 17, 2018

@dshevani your callstack seems to be from .NET Framework (System.Web.* usage), not from .NET Core. Moreover it does not seem to use HttpClient at all, so it is likely not related to this issue. For .NET Framework help, you can use community channels (like StackOverflow), or if you have a repro / specific bug report, then VS Developer Community.

@sepehr1014 in general, we are still blocked on getting anything actionable here - repro, or deeper analysis/debugging from a repro.

@booradlus
Copy link

The team I work with has been observing this issue on a host of .NET Core services we have: all of them have hit the 100s timeout on various calls, with no correlation to call size and minimal correlation to load. We also experienced the frustration of low reproducibility and the inability to get the issue to occur in our dev environments. I suspect part of that stemmed from developing on Windows boxes, while our services run in containers on Linux.

Luckily, we have seen the issue resolve when we built against .NET Core 2.1 RC. I haven't had the time to completely isolate the issue, but since it was present when we built against 2.1 Preview 1, we believe it is related to the fact that since Preview 2 the default is to use SocketsHttpHandler instead of the native handler (which uses libcurl). After almost 3 days running so far, we haven't seen any timeouts, so we're pretty confident it's resolved.

@karelz
Copy link
Member

karelz commented May 17, 2018

@booradlus that's great news, thanks for confirmation that 2.1 RC helps!

@filipw
Copy link
Contributor

filipw commented May 24, 2018

@karelz I had the exact same issue on Debian Stretch Docker images. Roughly 1-3 calls in 1,000 from a static HttpClient would randomly fail with the exact same task-canceled/timeout exception, with the same stack trace (ending with FinishSendAsyncBuffered).

The HTTP call would never leave the box, so the target service never even recorded an incoming request. It was happening fairly consistently, but was impossible to reproduce on demand and appeared to follow no apparent pattern.

The problem disappeared after upgrading to the .NET Core 2.1 RC runtime, so it might have been some issue with the curl handler.

@wfurt
Copy link
Member

wfurt commented May 24, 2018

Do you have a standalone simple repro, @filipw? Something I can just run to get a repro?
As for the handler: you can force it back and forth via an environment variable.
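For reference, on 2.1 the selection can also be made in code via an AppContext switch; the switch name below is what I believe 2.1 honors, so verify against your runtime before relying on it:

```csharp
using System;
using System.Net.Http;

class HandlerToggle
{
    static void Main()
    {
        // Opt back into the older libcurl-based handler on .NET Core 2.1.
        // This must run before any HttpClient/HttpClientHandler is created.
        // The equivalent environment variable is believed to be
        // DOTNET_SYSTEM_NET_HTTP_USESOCKETSHTTPHANDLER=0.
        AppContext.SetSwitch("System.Net.Http.UseSocketsHttpHandler", false);

        var client = new HttpClient();
        Console.WriteLine("HttpClient created with the legacy handler requested.");
    }
}
```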

@karelz
Copy link
Member

karelz commented May 24, 2018

Thanks @filipw for letting us know it no longer reproduces for you on .NET Core 2.1 RC!

The root cause might have been in libcurl (a specific version), or in how we interact with it in CurlHandler. We introduced SocketsHttpHandler (the default in 2.1) to help us address exactly this class of problems and make behavior uniform on all platforms. It's good to see it paying off.

@msftgits msftgits transferred this issue from dotnet/corefx Jan 31, 2020
@msftgits msftgits added this to the 2.1.0 milestone Jan 31, 2020
@msftbot msftbot bot locked as resolved and limited conversation to collaborators Dec 19, 2020