"Connection reset by peer" on large response body,last-chunk is not sent #2791

Closed

Infernalzero opened this issue Aug 9, 2018 · 14 comments

@Infernalzero

Hi,
Recently I encountered a strange "Connection reset by peer" issue.
My server side is using Jetty (9.4.9.v20180320), and the client side is using RxNetty 0.4.19 with Netty 4.1.5.
When Jetty returns a large response body, the client side throws an exception with the message "Connection reset by peer".
The client side is using connection pools provided by RxNetty.
I captured the packets and found that Jetty does not send the last-chunk, so Netty keeps reading for more data and ends up receiving [RST, ACK].
(screenshot of the packet capture attached)

Part of the response:
HTTP/1.1 200 OK
Content-Type: application/json;charset=utf-8
Transfer-Encoding: chunked
Server: Jetty(9.4.9.v20180320)

479A
[{"type":0,"state":1,"created":1527750203000....................
......
.....
...."ssnFront":"http://qhyxpic.oss.kujiale.com/2016/12/30/SEDFNA

I did not find the last-chunk in the response.
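
For reference, a correctly terminated chunked response ends with a zero-length last-chunk, i.e. the bytes "0" CRLF CRLF, which Netty's HTTP decoder surfaces as a LastHttpContent message. A hypothetical diagnostic handler like the sketch below (not part of the original client; the class name and log messages are assumptions) could be added after the HTTP codec in the client pipeline to log whether that last-chunk ever arrives before the reset:

import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInboundHandlerAdapter;
import io.netty.handler.codec.http.HttpContent;
import io.netty.handler.codec.http.HttpResponse;
import io.netty.handler.codec.http.LastHttpContent;

// Hypothetical diagnostic handler: logs every decoded HTTP object, so a missing
// LastHttpContent before the exception confirms the last-chunk never arrived.
public class LastChunkLogger extends ChannelInboundHandlerAdapter {
    @Override
    public void channelRead(ChannelHandlerContext ctx, Object msg) {
        if (msg instanceof HttpResponse) {
            System.out.println("response headers received");
        }
        if (msg instanceof LastHttpContent) {
            System.out.println("last-chunk received, response complete");
        } else if (msg instanceof HttpContent) {
            System.out.println("content chunk: " + ((HttpContent) msg).content().readableBytes() + " bytes");
        }
        ctx.fireChannelRead(msg); // pass the message through unchanged
    }
}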

The Netty debug log matches the tcpdump result; there is no last-chunk:
2018-08-09 10:17:02,986 DEBUG [io.netty.handler.logging.LoggingHandler] - [id: 0x99a28050, L:/10.4.3.173:45926 - R:/10.111.6.225:80] RECEIVED: 320B
+-------------------------------------------------+
| 0 1 2 3 4 5 6 7 8 9 a b c d e f |
+--------+-------------------------------------------------+----------------+
|00000000| 2f 54 33 42 59 46 44 57 4f 52 55 58 54 51 46 44 |/T3BYFDWORUXTQFD|
|00000010| 49 4c 51 4d 36 59 4e 41 38 2e 6a 70 67 22 2c 22 |ILQM6YNA8.jpg","|
|00000020| 63 6f 76 65 72 44 65 73 63 22 3a 22 22 2c 22 6e |coverDesc":"","n|
|00000030| 61 6d 65 22 3a 22 e7 8e 8b e8 8a ac e8 8a b3 22 |ame":"........."|
|00000040| 7d 2c 7b 22 74 79 70 65 22 3a 30 2c 22 73 74 61 |},{"type":0,"sta|
|00000050| 74 65 22 3a 31 2c 22 63 72 65 61 74 65 64 22 3a |te":1,"created":|
|00000060| 31 35 32 31 34 34 37 37 35 32 30 30 30 2c 22 65 |1521447752000,"e|
|00000070| 78 70 65 72 74 49 64 22 3a 31 39 31 37 38 33 39 |xpertId":1917839|
|00000080| 34 2c 22 75 73 65 72 49 64 22 3a 39 31 37 38 33 |4,"userId":91783|
|00000090| 39 34 2c 22 64 69 73 74 72 69 63 74 22 3a 31 34 |94,"district":14|
|000000a0| 35 38 2c 22 61 70 70 6c 79 43 6f 75 6e 74 22 3a |58,"applyCount":|
|000000b0| 32 2c 22 70 61 73 73 54 69 6d 65 22 3a 31 35 32 |2,"passTime":152|
|000000c0| 36 39 37 34 38 38 37 30 30 30 2c 22 76 69 65 77 |6974887000,"view|
|000000d0| 43 6f 75 6e 74 22 3a 37 32 2c 22 6c 61 73 74 6d |Count":72,"lastm|
|000000e0| 6f 64 69 66 69 65 64 22 3a 31 35 33 33 35 34 34 |odified":1533544|
|000000f0| 33 39 34 30 30 30 2c 22 64 65 6c 65 74 65 64 22 |394000,"deleted"|
|00000100| 3a 30 2c 22 61 76 61 74 61 72 22 3a 22 68 74 74 |:0,"avatar":"htt|
|00000110| 70 3a 2f 2f 71 68 79 78 70 69 63 2e 6f 73 73 2e |p://qhyxpic.oss.|
|00000120| 6b 75 6a 69 61 6c 65 2e 63 6f 6d 2f 61 76 61 74 |kujiale.com/avat|
|00000130| 61 72 73 2f 4c 4b 58 58 47 47 49 4b 51 51 4f 42 |ars/LKXXGGIKQQOB|
+--------+-------------------------------------------------+----------------+
2018-08-09 10:17:02,987 DEBUG [io.netty.handler.logging.LoggingHandler] - [id: 0x99a28050, L:/10.4.3.173:45926 - R:/10.111.6.225:80] EXCEPTION: java.io.IOException: Connection reset by peer
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method) ~[?:1.8.0_172]
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39) ~[?:1.8.0_172]
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) ~[?:1.8.0_172]
at sun.nio.ch.IOUtil.read(IOUtil.java:192) ~[?:1.8.0_172]
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380) ~[?:1.8.0_172]
at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:288) ~[netty-buffer-4.1.5.Final.jar:4.1.5.Final]
at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1100) ~[netty-buffer-4.1.5.Final.jar:4.1.5.Final]
at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:367) ~[netty-transport-4.1.5.Final.jar:4.1.5.Final]
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:118) [netty-transport-4.1.5.Final.jar:4.1.5.Final]
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:610) [netty-transport-4.1.5.Final.jar:4.1.5.Final]
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:551) [netty-transport-4.1.5.Final.jar:4.1.5.Final]
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:465) [netty-transport-4.1.5.Final.jar:4.1.5.Final]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:437) [netty-transport-4.1.5.Final.jar:4.1.5.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:873) [netty-common-4.1.5.Final.jar:4.1.5.Final]
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144) [netty-common-4.1.5.Final.jar:4.1.5.Final]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_172]

On the server side, I haven't set any special configs for Jetty such as soLingerTime.
I tried enabling the Jetty debug log, but everything looks right; I can't find any log showing that Jetty closed the connection.

What confused me even more is that when I sent the same request with curl, everything was fine.
(screenshot of the successful curl request attached)

This issue does not appear with smaller response bodies or when using h2c.

Could this be a bug in Jetty?

@joakime
Contributor

joakime commented Aug 9, 2018

When troubleshooting these kinds of issues, it's best if you use the latest version of Jetty.
Can you test with 9.4.11.v20180605 or 9.4.12.RC1?

Does the same request work without error with a web browser?
The fact that it works with curl is good.

Note that soLinger is 100% incompatible with Java NIO, client and server. Don't use it, don't configure it.

@Infernalzero
Author

Because I didn't see any bugfix or change related to this issue in the changelogs of later versions, I haven't tried the latest version yet; I'll try it later.
A web browser can't directly send such a request, because the API is for internal use and only backend services can invoke it.
soLinger is not set on either the client side or the server side.

@gregw
Contributor

gregw commented Aug 10, 2018

How is the large content being generated?
Static content? A writer? An output stream? Lots of little writes? One big write? Async? Blocking? Any code you can share?

Note also that if Jetty closed the connection, a FIN would probably be sent rather than a RST.

Is this on Windows? Do you have a software firewall enabled?

@Infernalzero
Author

The large content is generated by Spring MVC's MappingJackson2HttpMessageConverter.
The server-side code is just a normal Spring MVC controller.
This all runs on Linux (CentOS 7.2); the application runs in Docker.
I'm sure that no firewall is enabled: SELinux, iptables, and firewalld are all disabled.

@Infernalzero
Author

I upgraded Jetty to 9.4.12.RC1 and Netty to 4.1.28, but the issue still exists.

@sbordet
Contributor

sbordet commented Aug 13, 2018

I find it strange that curl works; there must be a difference in how the request is sent, or the RxNetty client has a problem.

Can you pack and attach here a reproducible test case that we can try?

@Infernalzero
Author

I can't easily reproduce it; that's what annoys me.
"Connection reset by peer" sometimes occurs even when the response body is not that large, but when I restart Jetty it disappears.
I have now switched the container to Tomcat in one of my services, and it uses BIO. In that case, "Connection reset by peer" never appears under any condition.

I also tried to reproduce it with a unit test case, but failed to do so.
Here is a sample of the client and server code. It's still a little different from the production environment.

Add ribbon as a Maven dependency; the version is the same as the one used by Spring Cloud:

<dependency>
    <groupId>com.netflix.ribbon</groupId>
    <artifactId>ribbon</artifactId>
    <version>2.2.4</version>
</dependency>

Client side:
final Server server = new Server("10.x.x.x", 80);
server.setAlive(true);
final BaseLoadBalancer baseLoadBalancer = new BaseLoadBalancer();
baseLoadBalancer.addServer(server);
final LoadBalancingHttpClient<ByteBuf, ByteBuf> loadBalancingHttpClient = RibbonTransport
        .newHttpClient(baseLoadBalancer);
final HttpClientRequest<ByteBuf> httpClientRequest = HttpClientRequest.createPost("/connection/reset");
final Observable<HttpClientResponse<ByteBuf>> httpResponseObservable = loadBalancingHttpClient
        .submit(httpClientRequest);
final String result = httpResponseObservable
        .flatMap(new Func1<HttpClientResponse<ByteBuf>, Observable<ByteBuf>>() {
            @Override
            public Observable<ByteBuf> call(final HttpClientResponse<ByteBuf> t1) {
                return t1.getContent();
            }
        }).map(new Func1<ByteBuf, String>() {
            @Override
            public String call(final ByteBuf t1) {
                return t1.toString(StandardCharsets.UTF_8);
            }
        }).toBlocking().last();
System.out.println(result);
// don't exit right away

Server side:

@RestController
public class TestController {

    @PostMapping("/connection/reset")
    public Object testConnectionReset() {
        final List<User> result = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            final User user = new User();
            user.setId(123456789 + i);
            user.setName("aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa");
            user.setAvatar(
                    "https://www.qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqwwwwwwwwwweeeeeeeee.com/avatars/843634632453253259.jpg");
            user.setCover(
                    "https://www.qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqwwwwwwwwwweeeeeeeee.com/avatars/843634632453253259.jpg");
            user.setAddress(
                    "ccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc");
            result.add(user);
        }
        return result;
    }
}
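
The User class referenced above is not included in the report. A minimal POJO along these lines (fields and types inferred from the setters used, so treat it as an assumption) is enough to make the sample compile and let Jackson serialize it:

// Hypothetical User POJO, reconstructed only from the setters called in the controller above.
public class User {
    private long id;
    private String name;
    private String avatar;
    private String cover;
    private String address;

    public long getId() { return id; }
    public void setId(final long id) { this.id = id; }
    public String getName() { return name; }
    public void setName(final String name) { this.name = name; }
    public String getAvatar() { return avatar; }
    public void setAvatar(final String avatar) { this.avatar = avatar; }
    public String getCover() { return cover; }
    public void setCover(final String cover) { this.cover = cover; }
    public String getAddress() { return address; }
    public void setAddress(final String address) { this.address = address; }
}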

@gregw
Contributor

gregw commented Aug 15, 2018 via email

@usulkies

Hello, we're also experiencing this sort of problem.
We upgraded Dropwizard from 1.2.2 to 1.3.5, which upgraded Jetty from 9.4.7 to 9.4.11.
Our service runs on AWS EBS behind an nginx proxy.
nginx now occasionally returns a "Bad Gateway" (502) status. These are example lines from the nginx error log:
2018/08/15 06:51:30 [error] 7807#0: *6074 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 10.30.x.x, server: , request: "POST /some_heavy_post_request HTTP/1.1", upstream: "http://127.0.0.1:8080/some_heavy_post_request", host: "some_hostname"
2018/08/15 06:55:36 [error] 7807#0: *21658 writev() failed (104: Connection reset by peer) while sending request to upstream, client: 10.30.x.x, server: , request: "POST /some_heavy_post_request HTTP/1.1", upstream: "http://127.0.0.1:8080/some_heavy_post_request", host: "some_hostname"

@Infernalzero
Author

I also tried adding the same headers to curl in different requests, and I even set --tcp-nodelay for curl (considering that Netty enables this option by default), but curl always succeeds, with no RST.
What's more, I also tried modifying tcp_rmem and tcp_wmem via sysctl, but none of it made any difference.

Here is the cap file; search for tcp.stream eq 46 to see the reset connection:

https://drive.google.com/open?id=1irKdtBe5p1UB2ygSacpsYjXj-MSh6ymp

@Infernalzero
Author

nginx now occasionally returns a "Bad Gateway" (502) status. These are example lines from the nginx error log:

I also suffer from this kind of problem.
I mitigated it by setting the keepalive number to an appropriate value based on access statistics.
http://nginx.org/en/docs/http/ngx_http_upstream_module.html#keepalive
It was a much bigger value before; after I adjusted it, fewer 502 responses were seen.
So I guess it may have something to do with Jetty's idle timeout.

@joakime
Contributor

joakime commented Aug 15, 2018

If you use nginx keepalive (to limit the number of active keepalive connections between nginx and Jetty), then you MUST also use the configuration (on nginx) to force HTTP/1.1 and strip client request Connection headers (which, btw, will break WebSocket upgrade!):

proxy_http_version 1.1;
proxy_set_header Connection "";

If you have an idle timeout on Jetty, then you MUST set the idle keepalive_timeout on nginx to be smaller than what Jetty is configured for.

keepalive_timeout 55s;

@Infernalzero
Author

If keepalive_timeout is set larger than Jetty's idle timeout,
then the error reported by nginx should look like this, right?
"2018/08/16 13:48:39 [error] 21453#0: *421301801 upstream prematurely closed connection while reading response header from upstream"
Not "Connection reset by peer"; in that case Jetty should close the connection gracefully and no RST should be sent, as expected.

But even with keepalive_timeout set smaller than Jetty's idleTimeout, I can still occasionally find errors like
"upstream prematurely closed connection while reading response header from upstream" and "recv() failed (104: Connection reset by peer) while reading response header from upstream" every day.

My nginx.conf
keepalive_timeout 65s;

start.ini
jetty.http.idleTimeout=120000

@panghy

panghy commented Sep 6, 2018

We were able to narrow this down to essentially our use of the GZIPDecodingInterceptor from RESTEasy. After switching to the GZipDecoder from Dropwizard (the Jersey one should work as well), the 104 from nginx disappeared.
