SocketException - An existing connection was forcibly closed by the remote host. HttpClientFactory HttpClient #52267

Closed
punitsinghi opened this issue May 4, 2021 · 40 comments
Labels
area-System.Net.Http enhancement Product code improvement that does NOT require public API changes/additions

punitsinghi commented May 4, 2021

We have an Azure Function / API that calls an on-premises API, and we intermittently see a SocketException with the error message below:
An error occurred while sending the request. Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host.. An existing connection was forcibly closed by the remote host.

Our Azure Function / API targets .NET Core 3.1 and uses HttpClientFactory to create the HttpClient we use to call our on-premises API. We previously wrapped the client returned by CreateClient in a using block; we have removed that, but we don't think it will avoid the SocketException. A Wireshark capture shows the server sending a reset packet when this error happens, which means the server is closing the connection while HttpClient reuses that same connection for new requests. If a new HTTP request and the TCP reset are in flight across the network at the same time, it manifests as a socket error; essentially a race condition. We are implementing retry logic, hoping the next attempt succeeds on a freshly created connection. Is this a known issue with HttpClientFactory? We don't see it on the .NET Framework side when we use a static HttpClient; our on-premises APIs use a static HttpClient to call other on-premises APIs and don't hit this. We would also like to know whether there is anything we can change on our Azure infrastructure to reduce such issues.

var httpClient = _httpClientFactory.CreateClient();
var result = await httpClient.SendAsync(httpRequestMessage, httpCompletionOption, 
    cancellationToken).ConfigureAwait(false);
return result;

AppInsights also shows the ResultCode as Faulted, which means the client did not receive an HTTP response from the server.

Stack trace:

System.Net.Http.HttpRequestException:
   at System.Net.Http.HttpConnection+<SendAsyncCore>d__53.MoveNext (System.Net.Http, Version=4.2.2.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a)
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.ValidateEnd (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.ConfiguredTaskAwaitable`1+ConfiguredTaskAwaiter.GetResult (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Net.Http.HttpConnectionPool+<SendWithNtConnectionAuthAsync>d__48.MoveNext (System.Net.Http, Version=4.2.2.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a)
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.ValidateEnd (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.ConfiguredTaskAwaitable`1+ConfiguredTaskAwaiter.GetResult (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Net.Http.HttpConnectionPool+<SendWithRetryAsync>d__47.MoveNext (System.Net.Http, Version=4.2.2.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a)
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.ValidateEnd (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.ConfiguredTaskAwaitable`1+ConfiguredTaskAwaiter.GetResult (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Net.Http.RedirectHandler+<SendAsync>d__4.MoveNext (System.Net.Http, Version=4.2.2.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a)
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.ValidateEnd (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.ConfiguredTaskAwaitable`1+ConfiguredTaskAwaiter.GetResult (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Net.Http.DiagnosticsHandler+<SendAsync>d__2.MoveNext (System.Net.Http, Version=4.2.2.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a)
Inner exception System.IO.IOException handled at System.Net.Http.HttpConnection+<SendAsyncCore>d__53.MoveNext:
   at System.Net.Sockets.Socket+AwaitableSocketAsyncEventArgs.ThrowException (System.Net.Sockets, Version=4.2.2.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a)
   at System.Net.Sockets.Socket+AwaitableSocketAsyncEventArgs.GetResult (System.Net.Sockets, Version=4.2.2.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a)
   at System.Threading.Tasks.ValueTask`1.get_Result (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.ConfiguredValueTaskAwaitable`1+ConfiguredValueTaskAwaiter.GetResult (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Net.Security.SslStream+<<FillBufferAsync>g__InternalFillBufferAsync|215_0>d`1.MoveNext (System.Net.Security, Version=4.1.2.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a)
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.ValidateEnd (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Threading.Tasks.ValueTask`1.get_Result (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.ConfiguredValueTaskAwaitable`1+ConfiguredValueTaskAwaiter.GetResult (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Net.Security.SslStream+<ReadAsyncInternal>d__214`1.MoveNext (System.Net.Security, Version=4.1.2.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a)
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.ValidateEnd (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Threading.Tasks.ValueTask`1.get_Result (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.ConfiguredValueTaskAwaitable`1+ConfiguredValueTaskAwaiter.GetResult (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Net.Http.HttpConnection+<FillAsync>d__87.MoveNext (System.Net.Http, Version=4.2.2.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a)
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.ValidateEnd (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.ConfiguredTaskAwaitable+ConfiguredTaskAwaiter.GetResult (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Net.Http.HttpConnection+<ReadNextResponseHeaderLineAsync>d__84.MoveNext (System.Net.Http, Version=4.2.2.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a)
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.ValidateEnd (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Threading.Tasks.ValueTask`1.get_Result (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.ConfiguredValueTaskAwaitable`1+ConfiguredValueTaskAwaiter.GetResult (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Net.Http.HttpConnection+<SendAsyncCore>d__53.MoveNext (System.Net.Http, Version=4.2.2.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a)
Inner exception System.Net.Sockets.SocketException handled at System.Net.Sockets.Socket+AwaitableSocketAsyncEventArgs.ThrowException:
@dotnet-issue-labeler dotnet-issue-labeler bot added area-System.Net.Http untriaged New issue has not been triaged by the area owner labels May 4, 2021
ghost commented May 4, 2021

Tagging subscribers to this area: @dotnet/ncl
See info in area-owners.md if you want to be subscribed.


antonfirsov (Member) commented

@punitsinghi any chance you can share those packet captures here or privately?

punitsinghi (Author) commented May 5, 2021

@punitsinghi any chance you can share those packet captures here or privately?

Without the IP addresses, here is how it looks. Will this help? With the IP addresses, I will have to check with our Infra team.
[screenshot of the Wireshark packet capture]

punitsinghi (Author) commented May 5, 2021

@karelz - Can you please let me know what additional details are needed.

karelz (Member) commented May 6, 2021

@punitsinghi I am not sure the above screenshot is sufficient to confirm your suspicion. I'll let @antonfirsov check it out and say more.
If there is truly a race condition and the connection is closed by the server while we are sending a request, I don't think anything can be done; surfacing the exception is the best we can do. In that case, I would expect the difference between .NET Framework and .NET Core to be just timing, not necessarily a fault in HttpClient or HttpClientFactory.

antonfirsov (Member) commented

For me this screenshot is not very helpful; it would be great to see the whole communication sequence with timestamps to allow a detailed diagnosis.

If there is truly a race condition and the connection is closed by the server while we are sending a request, I don't think anything can be done; surfacing the exception is the best we can do.

If we can confirm that this race condition exists, and we want to support concurrent HTTP requests to a server that may (for any reason) terminate connections, we may try to fix this by detecting SocketError.ConnectionReset and retrying the request on a new connection. Not something I would do for 6.0, though.
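
As a rough caller-side sketch of the same idea (not handler internals; BuildRequest() is a hypothetical factory, needed because an HttpRequestMessage cannot be sent twice):

// Assumes: using System; using System.Net.Http; using System.Net.Sockets;
// using System.Threading; using System.Threading.Tasks;
async Task<HttpResponseMessage> SendWithResetRetryAsync(HttpClient httpClient, CancellationToken ct)
{
    try
    {
        return await httpClient.SendAsync(BuildRequest(), ct);
    }
    catch (HttpRequestException ex) when (IsConnectionReset(ex))
    {
        // The pooled connection was reset mid-flight; the retry is served by a
        // brand-new connection. Only safe if the request is idempotent.
        return await httpClient.SendAsync(BuildRequest(), ct);
    }
}

static bool IsConnectionReset(Exception ex)
{
    // Walk the inner-exception chain (HttpRequestException -> IOException -> SocketException).
    for (; ex != null; ex = ex.InnerException)
        if (ex is SocketException se && se.SocketErrorCode == SocketError.ConnectionReset)
            return true;
    return false;
}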

@punitsinghi if I were you, I would try to get some insight into the server, if possible, to understand why it is closing the connections.

karelz (Member) commented May 6, 2021

Triage: Well-behaved servers should send "Connection: close" in the response headers. The server should never reset the connection (unless the client takes too long to disconnect).
If the above suspicion is true, then it points towards a misbehaving server ... we need packet capture traces to confirm that.

punitsinghi (Author) commented May 6, 2021

@karelz @antonfirsov Thanks for the input. Sure, let me see if I can pass on the screenshot privately with all the details.

We have the idle timeout on both client and server set to 120 seconds. My understanding is that HttpClientFactory uses PooledConnectionIdleTimeout, which is 2 minutes, while on our IIS it is also 120 seconds. Do you think that in this scenario the client can take too long to disconnect? https://docs.microsoft.com/en-us/dotnet/api/system.net.http.socketshttphandler.pooledconnectionidletimeout?view=netcore-3.1

Also, is it normal for Azure to on-premises API calls to see such connection closes due to network delay?

jhudsoncedaron commented

@karelz : Having had to write a custom HTTP client, I say there is something you can do. If you get the "reset" error while uploading a request (before you start waiting for the response), connect anew and try again. Don't loop, though; only use this logic on a connection you got back from the pool.

karelz (Member) commented May 7, 2021

@punitsinghi

Do you think that in this scenario the client can take too long to disconnect?

By taking too long to disconnect I meant the reaction to "Connection: close". If both server and client have a 2-minute timeout for closing connections, then it is possible that the server will send the rude closure. The question is how likely it is that the client is trying to reuse the connection at exactly that moment. That should be pretty rare IMO.

Also, is it normal for Azure to on-premises API calls to see such connection closes due to network delay?

In that case users should get used to reacting to failures like these with retry policies, etc.

@jhudsoncedaron do you mean to close the connection if we haven't finished sending the request yet? That also means we would have to buffer the request body, which we don't do today AFAIK.
It would complicate things a bit. Not sure if it is worth it ... thoughts @geoffkizer @scalablecory @ManickaP?

ManickaP (Member) commented May 7, 2021

We already have a mechanism to know whether we can retry or not (the _canRetry field), and we do take advantage of it (if sending the request body hadn't started when the connection was closed).

Also note that whether the request can be automatically retried depends on whether the method is idempotent: https://tools.ietf.org/html/rfc7230#section-6.3.1.

I'd need more details on the exception and the request to say whether we could even retry it, because your case might not be automatically retryable.

Edit: My thoughts are that this is probably not worth it. AFAIK only PUT would be a retryable request with a body (I know other methods can have a body as well, but they usually don't, and I'm generalizing here). We would need to introduce request body buffering and keep the body in memory until we get the response. If we entertain this idea, it would definitely have to be opt-in, since it can have a detrimental effect on performance. IMHO this is a lot of work for a small gain. If retrying the request in all exceptional cases is your goal, you can always do that in your own code on top of HttpClient.

jhudsoncedaron commented May 7, 2021

I actually mean (pseudo-code):

try {
    await connection.WriteAsync(firstPacketData);
} catch (Exception ex) when ((ex is SocketException || ex is IOException) && _pooledConnection) {
    // Only retry once, and only on a connection that came back from the pool.
    connection = GetConnection();
    _pooledConnection = false;
    await connection.WriteAsync(firstPacketData);
}

I ran some experiments in .NET Core 3.0 to try to convince another vendor that network instability is a real thing and that they can't assume I got the HttpResponse just because they sent it, and I found the recovery logic wasn't quite this good.

punitsinghi (Author) commented

@jhudsoncedaron Can you please share your full custom HttpClient implementation?

scalablecory (Contributor) commented

A Wireshark capture shows the server sending a reset packet when this error happens, which means the server is closing the connection while HttpClient reuses that same connection for new requests.

We already retry requests when we can verify that they haven't been processed, including when a reset happens on an idle connection between requests.

So this is likely a reset mid-request, where we can't verify that the server hasn't processed the request. In that case the only safe thing to do is to throw, and that isn't something we'll be able to change. A mid-request connection reset is generally not correct behavior; a server would do this if it sees a protocol error or a malicious client, or maybe if the server side threw an exception. Consider checking server/firewall logs to understand the root cause here.

Since you are already using HttpClientFactory, consider using Polly to retry automatically if you know your request is safe for the server to process twice.
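
For example (a sketch using the Microsoft.Extensions.Http.Polly package; the client name and backoff values are illustrative):

// Assumes: using System; using Microsoft.Extensions.DependencyInjection; using Polly;
services.AddHttpClient("onprem")
    .AddTransientHttpErrorPolicy(policy =>
        // Retries on HttpRequestException, 5xx, and 408 responses.
        // Only do this for requests the server can safely process twice.
        policy.WaitAndRetryAsync(3, attempt => TimeSpan.FromMilliseconds(200 * attempt)));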

Is this a known issue with HttpClientFactory? We don't see it on the .NET Framework side when we use a static HttpClient.

Can you grab a wireshark capture of identical requests on both .NET Framework and .NET Core so we can see what the difference is between the two platforms?

punitsinghi (Author) commented May 11, 2021

Thanks, everyone, for the input. @karelz / @scalablecory / @antonfirsov - I can share the Wireshark capture privately. Can you please let us know how I can do that?

@scalablecory - Thanks for the details. So on a reset due to idle timeout, it should have retried (up to three times, looking at the code)? We have implemented Polly and currently retry on HttpRequestException, as some of our legacy APIs are idempotent and some are not. Does HttpRequestException guarantee that the server has not processed the request, especially on .NET Core, where a timeout throws TaskCanceledException? We are trying to determine whether it is safe to retry in such cases.

jhudsoncedaron commented

@punitsinghi : No. I was able to determine from the stack trace whether or not it was resumable, but refused to write that code.

karelz (Member) commented May 11, 2021

@punitsinghi our emails are linked from our GitHub profiles. If you want a more official channel, you would have to go through official Microsoft support.

@punitsinghi Does HttpRequestException guarantee that the server has not processed the request, especially on .NET Core, where a timeout throws TaskCanceledException? We are trying to determine whether it is safe to retry in such cases.

No, we cannot provide any guarantees. As soon as we send any part of the request out, we can't tell whether the server received it, processed it, or took any action ... that seems to be your scenario per your description.
I don't think we provide any insight into whether even part of the request made it out of the machine, or whether the request was cancelled while waiting for an available connection.

punitsinghi (Author) commented May 11, 2021

Thanks @karelz, we have started a discussion through MS support as well. One of the engineers from MS support recommended increasing the server idle timeout, since the IIS idle timeout is 120 seconds (the default) and PooledConnectionIdleTimeout is also 120 seconds, so with network delay between Azure and on-premises we can run into this issue. Based on @scalablecory's reply, though, I think this may not help, as HttpClient would already have retried on a connection reset due to idle timeout, and we still see the issue happening. If you don't mind reviewing the Wireshark logs once and letting us know your feedback, we can relay it to MS support. We are trying to find the root cause, which we haven't found yet, and your help would be greatly appreciated.

https://docs.microsoft.com/en-us/dotnet/api/system.net.http.socketshttphandler.pooledconnectionidletimeout?view=netcore-3.1

@scalablecory - Talked to our network engineer, and according to him a reset mid-request can happen on idle timeout as well.

karelz (Member) commented May 12, 2021

If you don't mind reviewing the Wireshark logs once and letting us know your feedback, we can relay it to MS support.

Please work with your MS support contact to upload the Wireshark traces to Microsoft. They can take a first look and they can contact us internally if needed. Thanks!

punitsinghi (Author) commented

OK, thanks @karelz.

geoffkizer (Contributor) commented

BTW, the server should try to avoid sending RST if possible in these sorts of cases. From the RFC:

If a server performs an immediate close of a TCP connection, there is
   a significant risk that the client will not be able to read the last
   HTTP response.  If the server receives additional data from the
   client on a fully closed connection, such as another request that was
   sent by the client before receiving the server's response, the
   server's TCP stack will send a reset packet to the client;
   unfortunately, the reset packet might erase the client's
   unacknowledged input buffers before they can be read and interpreted
   by the client's HTTP parser.

   To avoid the TCP reset problem, servers typically close a connection
   in stages.  First, the server performs a half-close by closing only
   the write side of the read/write connection.  The server then
   continues to read from the connection until it receives a
   corresponding close by the client, or until the server is reasonably
   certain that its own TCP stack has received the client's
   acknowledgement of the packet(s) containing the server's last
   response.  Finally, the server fully closes the connection.

https://www.rfc-editor.org/rfc/rfc7230#section-6.6
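
In raw socket terms, the staged close described above looks roughly like this (a sketch only; a real server would also bound the drain loop with a timeout):

// Assumes: using System.Net.Sockets;
socket.Shutdown(SocketShutdown.Send);   // half-close: send FIN, keep the read side open
var buffer = new byte[4096];
while (socket.Receive(buffer) > 0)
{
    // Drain and discard whatever the client still had in flight,
    // until the client closes its half of the connection.
}
socket.Close();                         // now a full close cannot clobber the last response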

geoffkizer (Contributor) commented

One of the engineers from MS support recommended increasing the server idle timeout, since the IIS idle timeout is 120 seconds (the default) and PooledConnectionIdleTimeout is also 120 seconds, so with network delay between Azure and on-premises we can run into this issue. Based on @scalablecory's reply, though, I think this may not help, as HttpClient would already have retried on a connection reset due to idle timeout, and we still see the issue happening.

I think it's worth trying to adjust both of these values (server/client idle connection timeout). In particular, try making the client connection timeout significantly less than the server timeout, e.g. 60 vs. 120 seconds.

It's always better for the client to close an idle connection, because when the server does it there's the timing issue you are seeing here.

geoffkizer (Contributor) commented

No, we cannot provide any guarantees. As soon as we send any part of the request out, we can't tell whether the server received it, processed it, or took any action ... that seems to be your scenario per your description.

There is a 408 status code that could be used in scenarios like these, see here: https://www.rfc-editor.org/rfc/rfc7231.html#section-6.5.7

However, servers don't seem to implement this for idle connection close (as seen here in particular), and we don't support it in HttpClient. Perhaps we should consider that, though it would only help if the server participates as well.

halter73 (Member) commented May 12, 2021

This looks like you're hitting IIS's connectionTimeout, which defaults to 2 minutes. That's the same default as SocketsHttpHandler.PooledConnectionIdleTimeout, which is what makes this race possible. Basically, right at the 2-minute mark, HttpClient might still consider the connection active exactly when IIS decides to close it.

This can be fixed with retry logic, but to avoid this error altogether you should be able to set the client's PooledConnectionIdleTimeout to less than the server's connectionTimeout, as sketched below.
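
Something along these lines (a sketch; the client name and the 60-second value are illustrative, the point is client timeout < server timeout):

// Assumes: using System; using System.Net.Http; using Microsoft.Extensions.DependencyInjection;
services.AddHttpClient("onprem")
    .ConfigurePrimaryHttpMessageHandler(() => new SocketsHttpHandler
    {
        // Drop idle pooled connections well before IIS's 120 s connectionTimeout,
        // so the client, not the server, is the one closing idle connections.
        PooledConnectionIdleTimeout = TimeSpan.FromSeconds(60)
    });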

FWIW, Kestrel also uses 2 minutes as its KeepAliveTimeout. I think it might make sense either to increase our servers' default timeouts or to reduce our clients' default timeout; I'm not sure which would be easier. Does anyone else agree or disagree?

This is from my email about what I'm assuming is the same issue, but it seems like this would be better discussed in public.

BTW, the server should try to avoid sending RST if possible in these sorts of cases. From the RFC:

Kestrel used to try to do this, but it was more trouble than it was worth. Many clients would wait entirely too long to close their half of the connection, and I cannot think of a single scenario where Kestrel would close the connection while the client was still trying to send data that wasn't really supposed to be abortive. HTTP clients don't typically pipeline, and Kestrel sets the appropriate Connection: close headers in the rare cases where it does decide to close the connection. HTTP/2 similarly has ways to gracefully close the connection without relying on TCP semantics.

In this scenario, it wouldn't help anyway. Even if the connection was only half closed, the client still would not get a response to the new request.

@dotnet/ncl @dotnet/http I really think we need to consider changing server and/or client idle connection timeouts. With HTTP/1.1 it can cause problems like this when they're exactly the same. The server timeout should be larger by default.

geoffkizer (Contributor) commented

I cannot think of a single scenario where Kestrel would close the connection while the client was still trying to send data that wasn't really supposed to be abortive.

I think the only interesting case is the race between the server deciding a connection is idle and closing it, and the client at the same time trying to send a new request on the connection.

If it's a GET request, or a POST with 100-continue, etc., then a FIN will trigger us to retry the request, whereas a RST generally will not. But if it's a POST without 100-continue, we will start sending the request body anyway and won't retry because of that.
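
(For reference, opting a POST into 100-continue is a one-line request change; a sketch with an illustrative URL and body:)

// Assumes: using System.Net.Http;
var request = new HttpRequestMessage(HttpMethod.Post, "https://onprem.example/api")
{
    Content = new StringContent("{}")
};
// Wait for the server's interim "100 Continue" before sending the body, so a
// dead connection is detected while the request is still safely retryable.
request.Headers.ExpectContinue = true;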

What might help, as I mention above, is using the 408 response code, as that would allow us to retry in all cases.

halter73 (Member) commented

What might help, as I mention above, is using the 408 response code, as that would allow us to retry in all cases.

The server cannot respond with a 408 in this case; it has already closed its half of the connection and cannot send more data. Theoretically, it could keep ACKing the request to avoid the RST, but what's the point when the last data it flushed was at least 2 minutes ago?

geoffkizer (Contributor) commented

@dotnet/ncl @dotnet/http I really think we need to consider changing server and/or client idle connection timeouts. With HTTP/1.1 it can cause problems like this when they're exactly the same. The server timeout should be larger by default.

Yes, I completely agree. What is the Kestrel default here? SocketsHttpHandler's is 120 seconds.

geoffkizer (Contributor) commented

The server cannot respond with a 408 in this case; it has already closed its half of the connection and cannot send more data.

The idea would be to send a 408 response instead of closing an idle connection. It's not in response to a request; rather it's sent proactively in case the client is racing to use the connection at the same time.

That way, if the client receives a 408 response, it knows the server is shutting down the connection and didn't process the request at all, and the request can be retried.

geoffkizer (Contributor) commented

It's basically GOAWAY for HTTP/1.1.

jhudsoncedaron commented

If you send an unsolicited 408 on an idle connection, you might block.

geoffkizer (Contributor) commented

Block what?

jhudsoncedaron commented

The send() system call that sends the 408 might never finish.

halter73 (Member) commented

What is the Kestrel default here?

They're all 120 seconds; see KestrelServerLimits.KeepAliveTimeout. I'll update my earlier comment to include the other links from the email.
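
In the meantime, raising it per app is a one-liner (a sketch; the 4-minute value is arbitrary, the point is server timeout > client timeout):

// Assumes: using System; using Microsoft.AspNetCore.Hosting;
webBuilder.ConfigureKestrel(options =>
{
    // Keep idle connections longer than SocketsHttpHandler's 120 s
    // PooledConnectionIdleTimeout, so the client closes first.
    options.Limits.KeepAliveTimeout = TimeSpan.FromMinutes(4);
});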

The idea would be to send a 408 response instead of closing an idle connection. It's not in response to a request; rather it's sent proactively in case the client is racing to use the connection at the same time.

So the suggestion is to do this every time we close an HTTP/1.1 connection due to a keep-alive timeout? Theoretically, I guess the client should ignore this unsolicited response. Does any server actually do this, though? Seems risky.

geoffkizer (Contributor) commented

Does any server actually do this, though? Seems risky.

I don't know. I'm not aware of any.

geoffkizer (Contributor) commented

Many clients would wait entirely too long to close their half of the connection

FWIW, SocketsHttpHandler is definitely guilty of this in some cases. But we could make it better if there were a good reason to.

punitsinghi (Author) commented

Thanks. We are now increasing our IIS idle timeout to 150 seconds to see if that works in PROD.

punitsinghi (Author) commented

We set the IIS Connection Time-out to 240 seconds, but we still see a similar error. It looks like the issue could be due to something else.

@karelz karelz added this to the 6.0.0 milestone May 25, 2021
@karelz karelz added enhancement Product code improvement that does NOT require public API changes/additions and removed needs more info untriaged New issue has not been triaged by the area owner labels May 25, 2021
@karelz karelz assigned geoffkizer and unassigned geoffkizer May 25, 2021
@karelz karelz removed this from the 6.0.0 milestone May 25, 2021
@karelz karelz added the untriaged New issue has not been triaged by the area owner label May 25, 2021
karelz (Member) commented Jun 3, 2021

@punitsinghi do you have any update here?

punitsinghi (Author) commented

@karelz - Thanks for the follow-up. The issue was due to an F5 server; we applied a patch from F5 last week and no longer see the error. Thanks, everyone, for the input and feedback.

karelz (Member) commented Jun 8, 2021

Closing as external based on the last update.

@karelz karelz closed this as completed Jun 8, 2021
@karelz karelz added this to the 6.0.0 milestone Jun 8, 2021
@karelz karelz removed the untriaged New issue has not been triaged by the area owner label Jun 8, 2021
@ghost ghost locked as resolved and limited conversation to collaborators Jul 8, 2021