
Client side response time is slower than actual when client side is in TCP delayed ACK mode #1744

Closed
nobodyiam opened this issue May 5, 2018 · 2 comments


@nobodyiam
Contributor

nobodyiam commented May 5, 2018

1. Issue Description

Dubbo's Netty 3 server implementation does not enable the TCP_NODELAY option, which prevents the server from responding promptly when the client is in delayed ACK mode and the response is smaller than the MSS.

However, the Netty 4 server implementation does enable this option:

bootstrap.group(bossGroup, workerGroup)
        .channel(NioServerSocketChannel.class)
        .childOption(ChannelOption.TCP_NODELAY, Boolean.TRUE)   // <-- enabled here
        .childOption(ChannelOption.SO_REUSEADDR, Boolean.TRUE)
        .childOption(ChannelOption.ALLOCATOR, PooledByteBufAllocator.DEFAULT)

Considering that Netty 4 enables this option by default (see the Netty issues "Enable TCP_NODELAY and SCTP_NODELAY by default" and "Consider turning on TCP_NODELAY by default"), Dubbo's Netty 3 server implementation should also enable it by default.

2. Solution

Simply set this option when constructing the ServerBootstrap:

bootstrap.setOption("child.tcpNoDelay", true);
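
For context, here is a fuller sketch of a Netty 3 server bootstrap with the option applied. The surrounding construction is illustrative (in Dubbo the change would presumably land where the Netty 3 transport builds its ServerBootstrap, e.g. NettyServer.doOpen()); only the setOption call is the actual fix:

import java.util.concurrent.Executors;
import org.jboss.netty.bootstrap.ServerBootstrap;
import org.jboss.netty.channel.socket.nio.NioServerSocketChannelFactory;

public class Netty3NoDelaySketch {
    public static ServerBootstrap newBootstrap() {
        // Illustrative Netty 3 bootstrap; only the "child.tcpNoDelay" line is the fix.
        ServerBootstrap bootstrap = new ServerBootstrap(
                new NioServerSocketChannelFactory(
                        Executors.newCachedThreadPool(),    // boss threads
                        Executors.newCachedThreadPool()));  // worker threads
        // "child.*" options apply to each accepted connection's channel, which is
        // where Nagle's algorithm would otherwise hold back small responses.
        bootstrap.setOption("child.tcpNoDelay", true);
        return bootstrap;
    }
}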

3. Issue Demonstration

Here is an example capture, with 10.5.160.181 as the client and 10.5.169.180 as the server.

3.1 Case with normal ack

The demo server-side logic takes 10-11 ms, so normally the client-side response time is around 12 ms.

Frame 9, highlighted below, is a normal request, whose request id is 0x02.

[screenshot: capture of frame 9, the request with id 0x02]

Frame 11, highlighted below, is the response, whose response id is 0x02.
[screenshot: capture of frame 11, the response with id 0x02]

The actual response time is 12 ms.

We can also see from the screenshots above that both the client and the server send their ACKs promptly.

3.2 Case with delayed ack

Now let's take a look at what happens when the client is in delayed ACK mode.

In the screenshots below there is no standalone ACK packet, which means the client is in delayed ACK mode: the ACK is returned along with a data packet.

Frame 239, highlighted below, is a request whose request id is 0x7f.
We can see that packet 240 came back immediately after 239 was sent to the server; since the server logic takes 10-11 ms, this packet cannot be the response.

[screenshot: capture of frames 239 and 240]

As we can see, frame 240's response id is 0x7e, i.e. the response to the previous request.

[screenshot: frame 240, response id 0x7e]

Then frame 241 was sent, whose request id is 0x80.

[screenshot: frame 241, request id 0x80]

After frame 241 was sent, the response to request 0x7f was returned (frame 242).

[screenshot: frame 242, the response to request 0x7f]

Request 0x7f was sent at 14:37:53.267, so we know the response was ready on the server side at around 14:37:53.277. However, since the server did not enable TCP_NODELAY, Nagle's algorithm prevents the response from being sent until the ACK for the previous small packet arrives, and with the client delaying its ACKs, that can take up to the delayed-ACK timeout (40 ms).

So the response was held on the server side and only went out when the client's next request, carrying the piggybacked ACK, arrived at 14:37:53.292730.
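
(As a side note, TCP_NODELAY is the standard socket option that disables Nagle's algorithm; at the plain JDK level the equivalent of Netty's child.tcpNoDelay is a single call. The host and port below are illustrative, not taken from the capture.)

import java.net.Socket;

public class NoDelayExample {
    public static void main(String[] args) throws Exception {
        try (Socket socket = new Socket("10.5.169.180", 20880)) { // illustrative endpoint
            // Disable Nagle's algorithm: small writes are flushed immediately
            // instead of waiting for the previous small packet to be ACKed.
            socket.setTcpNoDelay(true);
        }
    }
}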

In this case, the client-side response time (25 ms) is much higher than it should be (12 ms).

Even worse, in this situation the response time is determined by the interval at which the client sends requests.

We ran some experiments, and the results show that in this situation, if the client sends requests at 50 QPS, the client-side response time is 20 ms, because the request-sending interval is 20 ms (1000/50). At 40 QPS the response time is 25 ms; at 30 QPS it is 33 ms, and so on.

The response time keeps increasing as QPS decreases, until QPS reaches 25: once the sending interval reaches 40 ms (1000/25), TCP drops out of delayed ACK mode.
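
As a sanity check on those numbers, here is a back-of-envelope model of the behaviour described above (a sketch, not Dubbo code; it assumes the response is held until the next request piggybacks the ACK, and that intervals of 40 ms or more fall back to the normal path):

public class DelayedAckModel {
    static final long NORMAL_RT_MS = 12;          // ~10-11 ms server logic + network
    static final long DELAYED_ACK_TIMEOUT_MS = 40;

    // Expected client-side response time when the server lacks TCP_NODELAY.
    static long expectedRtMillis(int qps) {
        long sendIntervalMs = 1000L / qps;
        if (sendIntervalMs >= DELAYED_ACK_TIMEOUT_MS) {
            return NORMAL_RT_MS;                  // TCP leaves delayed ACK mode
        }
        // The response is held until the next request carries the piggybacked ACK.
        return Math.max(NORMAL_RT_MS, sendIntervalMs);
    }

    public static void main(String[] args) {
        for (int qps : new int[]{50, 40, 30, 25}) {
            System.out.printf("QPS %d -> ~%d ms%n", qps, expectedRtMillis(qps));
        }
        // Prints ~20, 25 and 33 ms for QPS 50/40/30, and ~12 ms at QPS 25,
        // matching the experiments described above.
    }
}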

4. Conclusion

Based on the example above, we can see that not enabling the TCP_NODELAY option causes dramatic performance degradation in such cases, so I hope this fix can be applied so that we can always expect stable performance.

BTW, I could submit a PR if necessary.

@nobodyiam
Contributor Author

As #1746 has been merged, this issue can be closed.

@356082462
Contributor

Perfect!
