TCP RST was sent with large response body #459

Closed
keimoon opened this Issue Nov 2, 2016 · 16 comments

Projects

Done in Bug Hunting

8 participants

@keimoon
keimoon commented Nov 2, 2016

Akka HTTP version: 3.0.0-RC1

How to reproduce:

  • Use small MTU for network interface (for example 1500). You can set MTU for loopback interface with sudo ifconfig lo0 mtu 1500
  • Response with large body content (should be larger than 128KiB) using complete and ToResponseMarshallable
  • Make sure client send header Connection: close.

Expected result:

Client should receive normally

Actual result:

Client got Recv failure: Connection reset by peer

This causes problems with some proxy like nginx or kong because they send Connection: close to upstream by default.

@ktoso
Member
ktoso commented Nov 2, 2016

Thanks for reporting. I'll mark as bug though I'm not sure about it.

@jrudolph
Member
jrudolph commented Nov 2, 2016

It could be like this: when the server sees Connection: close on a request it fully closes the connection directly after having sent out the last bit of the response which might close the socket too soon?

@keimoon which client did you use? Could you try to capture a network trace using tcpdump and attach it here? Thanks ;)

@jrudolph jrudolph added this to the backlog milestone Nov 2, 2016
@keimoon
keimoon commented Nov 2, 2016

I used curl. Here is network trace that you can load into wireshark

akka.pcap.tar.gz

screen shot 2016-11-02 at 7 34 33 pm

@keimoon
keimoon commented Nov 2, 2016

Also I can only reproduce the bug with HttpEntity.Strict

@jrudolph
Member
jrudolph commented Nov 2, 2016

Great, thanks @keimoon. That will help a lot.

@jrudolph
Member
jrudolph commented Nov 2, 2016

I had a quick try, this issue can be reproduced using your instructions.

The problem is that when the HTTP server decides to close the connection, it does so by completely killing the HTTP stack which will complete the write-side of the TCP connection stream and cancel the read-side. If cancel arrives first (which seems to be the case), the Tcp stream implementation will abort the connection (= socket.close with soLinger(0)). The result is that the OS just doesn't know about the connection any more and if any data was not sent yet this data is lost and any further packets will be responded with RST.

We've seen issues like that for a long time (IIRC also with spray). I guess the solution could be just to keep cancellation from reaching the TCP connection. (I don't think that this creates leaks, hopefully leaks would be prevented by the idle-timeout on the connection but it might make sense to think a bit about it.)

@keimoon
keimoon commented Nov 2, 2016

Great. Currently we are implementing an work around that will use HttpEntity.Chunked if the response body is large.

@mariszin
mariszin commented Dec 5, 2016

We migrated our app to akka-http 10.0.0, and received the same error - when using anything, that sends header "Connection: close" (nginx, curl), the request will fail on Strict entities, that are larger than 128K.
Error in curl:
curl: (56) Recv failure: Connection reset by peer

@guntiso
guntiso commented Dec 5, 2016

Chunked does not help - changed response entity to HttpEntity.Chunked(mediaType, Source(Seq(ByteString(prettyJsonString)))), and now it works with curl, but still fails with nginx. Is there any reliable workaround?

@daandi
daandi commented Dec 9, 2016

Same here :( Would really appreciate a way to solve this.

@ktoso
Member
ktoso commented Dec 12, 2016

So if I understand correctly, we'd need a "CancellationBarrier" that can be "opened" by external signal, which we could use to resolve the race (making it always "cancel" after the completion went "all the way"). Reasonable @jrudolph ?

@jrudolph
Member

We might not need to send cancellation to the tcp stream at all but I haven't checked if that would work.

@jrudolph
Member

I reproduced it with this code jrudolph@a929a03 and curl:

curl -vvv -H "connection:close" http://localhost:9001/big > /dev/null
@gisql
gisql commented Dec 30, 2016

Workaround with nginx here: akka/akka#19542

@jrudolph jrudolph added 3 - in progress and removed 1 - triaged labels Jan 3, 2017
@jrudolph jrudolph added a commit to jrudolph/akka-http that referenced this issue Jan 3, 2017
@jrudolph jrudolph =htp #459 prevent "Connection closed by peer" errors during connectio…
…n closure

A race between completion and cancellation towards the TCP stream endpoints
might lead to the connection being canceled (= RST frames sent) when an HTTP
connection is regularly closed (e.g. when the client sets `Connection: close`
in the request).

We now prevent cancellation from reaching the TCP stream at all and rely
on closing the connection from the write side.

Fixes #459.
6b63a53
@jrudolph jrudolph added a commit to jrudolph/akka-http that referenced this issue Jan 3, 2017
@jrudolph jrudolph =htp #459 prevent "Connection closed by peer" errors during connectio…
…n closure

A race between completion and cancellation towards the TCP stream endpoints
might lead to the connection being canceled (= RST frames sent) when an HTTP
connection is regularly closed (e.g. when the client sets `Connection: close`
in the request).

We now prevent cancellation from reaching the TCP stream at all and rely
on closing the connection from the write side.

Fixes #459.
63c6794
@jrudolph jrudolph added a commit to jrudolph/akka-http that referenced this issue Jan 3, 2017
@jrudolph jrudolph =htp #459 prevent "Connection closed by peer" errors during connectio…
…n closure

A race between completion and cancellation towards the TCP stream endpoints
might lead to the connection being canceled (= RST frames sent) when an HTTP
connection is regularly closed (e.g. when the client sets `Connection: close`
in the request).

We now prevent cancellation from reaching the TCP stream at all and rely
on closing the connection from the write side.

Fixes #459.
b321cc0
@jrudolph jrudolph added a commit to jrudolph/akka-http that referenced this issue Jan 3, 2017
@jrudolph jrudolph =htp #459 prevent "Connection closed by peer" errors during connectio…
…n closure

A race between completion and cancellation towards the TCP stream endpoints
might lead to the connection being canceled (= RST frames sent) when an HTTP
connection is regularly closed (e.g. when the client sets `Connection: close`
in the request).

We now prevent cancellation from reaching the TCP stream at all and rely
on closing the connection from the write side.

Fixes #459.
0e1779d
@jrudolph jrudolph added a commit to jrudolph/akka-http that referenced this issue Jan 4, 2017
@jrudolph jrudolph =htp #459 prevent "Connection closed by peer" errors during connectio…
…n closure

A race between completion and cancellation towards the TCP stream endpoints
might lead to the connection being canceled (= RST frames sent) when an HTTP
connection is regularly closed (e.g. when the client sets `Connection: close`
in the request).

We now prevent cancellation from reaching the TCP stream at all and rely
on closing the connection from the write side.

Fixes #459.
e281031
@ktoso ktoso closed this in #708 Jan 4, 2017
@jonas
Contributor
jonas commented Jan 10, 2017

Tested a locally published version of Akka HTTP and I no longer see connection resets.

@ktoso
Member
ktoso commented Jan 10, 2017

Cool, thanks for testing!

@jrudolph jrudolph added a commit to jrudolph/akka-http that referenced this issue Feb 8, 2017
@jrudolph jrudolph =htc #851 fix leaking AbsorbCancellation
Under circumstances that I could not reproduce so far,
the AbsorbCancellation stage introduced in #459 may keep the graph running
if data was already buffered but never read. The reason is that the stage
so far requires completion to reach the stage after cancellation was
absorbed which might be blocked by incoming elements that were never
pulled.

The solution is two-fold. After cancellation:

 1) pull in and ignore all incoming elements to eventually fetch completion
 2) complete the stage after a configurable time

In the HTTP use case, we delay the cancellation by no more than the time specified
in the new linger-timeout setting.
8180fe6
@jrudolph jrudolph added a commit to jrudolph/akka-http that referenced this issue Feb 8, 2017
@jrudolph jrudolph =htc #851 fix leaking AbsorbCancellation
Under circumstances that I could not reproduce so far,
the AbsorbCancellation stage introduced in #459 may keep the graph running
if data was already buffered but never read. The reason is that the stage
so far requires completion to reach the stage after cancellation was
absorbed which might be blocked by incoming elements that were never
pulled.

The solution is two-fold. After cancellation:

 1) pull in and ignore all incoming elements to eventually fetch completion
 2) complete the stage after a configurable time

In the HTTP use case, we delay the cancellation by no more than the time specified
in the new linger-timeout setting.
eca65c2
@jrudolph jrudolph added a commit to jrudolph/akka-http that referenced this issue Feb 9, 2017
@jrudolph jrudolph =htc #851 fix leaking AbsorbCancellation
Under circumstances that I could not reproduce so far,
the AbsorbCancellation stage introduced in #459 may keep the graph running
if data was already buffered but never read. The reason is that the stage
so far requires completion to reach the stage after cancellation was
absorbed which might be blocked by incoming elements that were never
pulled.

The solution is two-fold. After cancellation:

 1) pull in and ignore all incoming elements to eventually fetch completion
 2) complete the stage after a configurable time

In the HTTP use case, we delay the cancellation by no more than the time specified
in the new linger-timeout setting.
2361612
@jrudolph jrudolph added a commit to jrudolph/akka-http that referenced this issue Feb 9, 2017
@jrudolph jrudolph =htc #851 fix leaking AbsorbCancellation
Under circumstances that I could not reproduce so far,
the AbsorbCancellation stage introduced in #459 may keep the graph running
if data was already buffered but never read. The reason is that the stage
so far requires completion to reach the stage after cancellation was
absorbed which might be blocked by incoming elements that were never
pulled.

The solution is two-fold. After cancellation:

 1) pull in and ignore all incoming elements to eventually fetch completion
 2) complete the stage after a configurable time

In the HTTP use case, we delay the cancellation by no more than the time specified
in the new linger-timeout setting.
f33e163
@jrudolph jrudolph added a commit to jrudolph/akka-http that referenced this issue Feb 16, 2017
@jrudolph jrudolph =htc #851 fix leaking AbsorbCancellation
Under circumstances that I could not reproduce so far,
the AbsorbCancellation stage introduced in #459 may keep the graph running
if data was already buffered but never read. The reason is that the stage
so far requires completion to reach the stage after cancellation was
absorbed which might be blocked by incoming elements that were never
pulled.

The solution is two-fold. After cancellation:

 1) pull in and ignore all incoming elements to eventually fetch completion
 2) complete the stage after a configurable time

In the HTTP use case, we delay the cancellation by no more than the time specified
in the new linger-timeout setting.
62a6046
@jrudolph jrudolph added a commit to jrudolph/akka-http that referenced this issue Feb 16, 2017
@jrudolph jrudolph =htc #851 fix leaking AbsorbCancellation
Under circumstances that I could not reproduce so far,
the AbsorbCancellation stage introduced in #459 may keep the graph running
if data was already buffered but never read. The reason is that the stage
so far requires completion to reach the stage after cancellation was
absorbed which might be blocked by incoming elements that were never
pulled.

The solution is two-fold. After cancellation:

 1) pull in and ignore all incoming elements to eventually fetch completion
 2) complete the stage after a configurable time

In the HTTP use case, we delay the cancellation by no more than the time specified
in the new linger-timeout setting.
dfa7dd9
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment