
HttpClient throws an exception or deadlocks when server closes the connection #71549

Open
MageFroh opened this issue Jul 1, 2022 · 5 comments

MageFroh commented Jul 1, 2022

Description

Hi there,

I've just noticed some surprising behaviour in HttpClient when a server rejects a large request by closing the connection early.

This happens, for example, with an ASP.NET Core server when the request body is greater than 30 MB: the server appears to send a 400 response early and immediately close the connection before the whole request body has been transmitted.

Reproduction Steps

Clone https://github.com/MageFroh/HttpClientTimeout
Start the WebApplication
Execute the console app

Or, in words:
Use HttpClient to POST a request body of more than 30MB to an ASP.NET Core Web application.
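A minimal sketch of such a client (the endpoint URL, form field name and payload size are placeholder assumptions, not values from the reproducer repo; assumes .NET 6+ top-level statements with implicit usings):

    using var client = new HttpClient();
    using var form = new MultipartFormDataContent
    {
        // ~40 MB payload, above ASP.NET Core's 30,000,000-byte default form limit
        { new ByteArrayContent(new byte[40_000_000]), "file", "large.bin" }
    };
    var response = await client.PostAsync("https://localhost:5001/upload", form);
    Console.WriteLine($"Status code: {response.StatusCode}");
    Console.WriteLine(await response.Content.ReadAsStringAsync());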

Expected behavior

The Web app console shows that a 400 was returned to the client.

The console app shows:

Status code: BadRequest
{"type":"https://tools.ietf.org/html/rfc7231#section-6.5.1","title":"One or more validation errors occurred.","status":400,"traceId":"00-eaecec73d4e64df8784aacfcdb405a74-ab12efe65e517633-00","errors":{"":["Failed to read the request form. Request body too large. The max request body size is 30000000 byte
s."]}}

Actual behavior

The Web app side always shows the expected behaviour.

The console app behaviour is not deterministic.

When the WebApp is started natively on Windows, the console app reports an exception from PostAsync every time:

Unhandled exception. System.Net.Http.HttpRequestException: Error while copying content to a stream.
 ---> System.IO.IOException: Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host..
 ---> System.Net.Sockets.SocketException (10054): An existing connection was forcibly closed by the remote host.

When the WebApp is started in Docker on Windows:

  • The first run of the console app seems to deadlock something in HttpClient, which eventually reports the request timeout after 100 seconds:
    Unhandled exception. System.Threading.Tasks.TaskCanceledException: The request was canceled due to the configured HttpClient.Timeout of 100 seconds elapsing.
     ---> System.TimeoutException: The operation was canceled.
     ---> System.Threading.Tasks.TaskCanceledException: The operation was canceled.
     ---> System.Net.Http.HttpRequestException: Error while copying content to a stream.
     ---> System.IO.IOException: Unable to write data to the transport connection: The I/O operation has been aborted because of either a thread exit or an application request..
     ---> System.Net.Sockets.SocketException (995): The I/O operation has been aborted because of either a thread exit or an application request.
    
  • Subsequent runs result in the expected behaviour.

Regression?

Not entirely sure - I've noticed some closed issues in this area but they might not be exactly the same.

Known Workarounds

Set client.Timeout to a TimeSpan smaller than 100 seconds.
This might not be acceptable for some apps that need to deal with poor network conditions though.
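For example (the 10-second value here is purely illustrative):

    using var client = new HttpClient
    {
        // Fail fast instead of waiting out the default 100-second timeout.
        Timeout = TimeSpan.FromSeconds(10)
    };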

Configuration

No response

Other information

I don't mind much seeing an HttpRequestException instead of a 400 response on the client side: we lose some useful info in the response body, but that's not a big deal.
But sometimes the client blocks for 100 seconds by default! This is much more annoying...

The reproducer repo I've sent seems to hang only for the first request. But I've seen other Web apps that cause the hang much more often - maybe this is timing related?

Our system uses HTTPS, so the reproducer uses HTTPS.
After further checks, it looks like there is no difference in behaviour between HTTP and HTTPS.

Edit: Forgot to mention that Swagger UI in a browser such as Firefox reports the 400 and its body properly.

@ghost added the untriaged label on Jul 1, 2022
@ghost

ghost commented Jul 1, 2022

Tagging subscribers to this area: @dotnet/ncl
See info in area-owners.md if you want to be subscribed.


@wfurt
Member

wfurt commented Jul 1, 2022

I think this will depend on the underlying TCP behaviour. When the server closes its side, the TCP connection will stay in a half-open state for a while. Eventually it may be closed, the peer will send an RST, and the pending write will fail.
However, in the Docker example the underlying error seems to be wrapped in a TimeoutException.

To get more predictable results, I would suggest setting the Expect: 100-continue header.

@MageFroh
Author

MageFroh commented Jul 4, 2022

Thanks for the tip: I've added the following and now observe the server response properly in all cases:

client.DefaultRequestHeaders.Expect.Add(new NameValueWithParametersHeaderValue("100-continue"));

Is this an actual fix, or just a workaround until the investigation into the lock-up is complete?
It's a bit surprising to have the client wait out the whole request timeout when the connection was closed after just a few milliseconds. It would be good to be confident this will not happen in other cases we have not encountered yet.
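For reference, assuming the standard HttpRequestHeaders API, the ExpectContinue convenience property should be an equivalent, shorter way to opt in:

    // Should be equivalent to adding the Expect: 100-continue header explicitly.
    client.DefaultRequestHeaders.ExpectContinue = true;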

@wfurt
Member

wfurt commented Jul 7, 2022

There is no easy answer. The 100-continue gives the server a nice option to reject a large payload upfront. Without it, this will IMHO remain prone to race conditions and to how the network error bubbles up.

@wfurt
Member

wfurt commented Jul 14, 2022

Triage: we should look at whether we can surface a better error - e.g. the error response, even if we failed to write the body.

@wfurt removed the untriaged label and added this to the Future milestone on Jul 14, 2022