
Transport closing when client is idle #124

Closed
dennwc opened this issue Jun 24, 2019 · 7 comments
@dennwc
Member

dennwc commented Jun 24, 2019

Currently, the client opens a connection when it is created and keeps it open even while idle. The connection can then be terminated if it stays idle for too long.

However, there are cases when the gRPC library cannot determine the connection state precisely and may fail the first request after a long idle period with a "transport is closing" error.

The solution is to use grpc.WithKeepaliveParams to keep the connection healthy while idle. We can either hard-code it or allow passing additional DialOptions when creating a client.
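
For illustration, a minimal client-side sketch using the standard grpc-go keepalive package could look like this (the endpoint and durations are placeholders, not values chosen for go-client):

package main

import (
	"log"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/keepalive"
)

func main() {
	// Endpoint and intervals are illustrative only; real values would need tuning.
	conn, err := grpc.Dial("localhost:9432",
		grpc.WithInsecure(),
		grpc.WithKeepaliveParams(keepalive.ClientParameters{
			Time:                2 * time.Minute,  // ping the server after this much idle time
			Timeout:             20 * time.Second, // close the connection if a ping is not acked in time
			PermitWithoutStream: true,             // keep pinging even when no RPCs are active
		}),
	)
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()
}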

@dennwc dennwc added the bug label Jun 24, 2019
@dennwc dennwc self-assigned this Jun 24, 2019
@creachadair
Contributor

It sounds to me as if there may be a bug in how we are using gRPC. Once we have a gRPC stub open, it should transparently re-connect as needed (even if the underlying connection actually went away due to a timeout) provided the resolver can still find a valid network address for the stub to connect to.

It seems like a good idea to let idle connections close, since otherwise they are tying up a descriptor, but there may be some combination of options we can set to ensure the stub will correctly reconnect on a timeout.

@dennwc
Member Author

dennwc commented Jun 24, 2019

I was also thinking that we are using it in the wrong way, but I can't find anything relevant in the code.

Also, here is the log (after idling):

2019/06/24 18:23:23 State of bblfish client is READY - Filename App.java
2019/06/24 18:23:47 Babelfish can't parse the file App.java rpc error: code = Unavailable desc = transport is closing
2019/06/24 18:23:47 State of bblfish client is CONNECTING - Filename Cache.java
2019/06/24 18:23:47 State of bblfish client is READY - Filename command/Command.java
2019/06/24 18:23:47 State of bblfish client is READY - Filename command/CommandFactory.java

So gRPC thinks the channel is alive (as seen by the READY status) and tries to send the parse request. Then, for some reason, it returns "transport is closing" to the client instead of reconnecting the underlying TCP connection. The next parse request is sent while the channel is in the CONNECTING state and eventually succeeds once the connection is re-established. So it may be a bug in gRPC after all.

In any case, allowing callers to set additional dial options sounds useful, so I'll PR it shortly.
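
As a rough sketch, the passthrough could look like this (the NewClient constructor and Client struct here are hypothetical and simplified, not the actual go-client API):

package client

import "google.golang.org/grpc"

// Client wraps the gRPC connection (simplified for illustration).
type Client struct {
	conn *grpc.ClientConn
}

// NewClient applies the default dial options and then any extra options the
// caller passes, e.g. grpc.WithKeepaliveParams.
func NewClient(endpoint string, extra ...grpc.DialOption) (*Client, error) {
	opts := []grpc.DialOption{grpc.WithInsecure()}
	opts = append(opts, extra...)

	conn, err := grpc.Dial(endpoint, opts...)
	if err != nil {
		return nil, err
	}
	return &Client{conn: conn}, nil
}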

@creachadair
Contributor

I wonder if maybe gRPC only reconnects if there is a different address it can use, i.e., another endpoint in a load-balancing channel, rather than the original?

dennwc added a commit to dennwc/go-client that referenced this issue Jun 24, 2019
@dennwc
Member Author

dennwc commented Jun 24, 2019

It would make sense if it tried to connect, failed, and then returned an error because there is only one endpoint, but that's not the case, as far as I know.

Related: grpc/grpc-go#2443

dennwc added a commit to dennwc/go-client that referenced this issue Jun 24, 2019
@bzz
Contributor

bzz commented Jun 26, 2019

Related: grpc/grpc-go#2443

Great find, @dennwc!

So does that mean we should also set grpc.KeepaliveEnforcementPolicy with PermitWithoutStream=true on the server, so that keepalive pings from idle clients are accepted?

@dennwc
Member Author

dennwc commented Jun 26, 2019

@bzz Good point! We probably should.
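
For context, server-side keepalive enforcement in grpc-go looks roughly like this (port and MinTime are illustrative, not taken from bblfshd's actual configuration):

package main

import (
	"log"
	"net"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/keepalive"
)

func main() {
	lis, err := net.Listen("tcp", ":9432") // port is illustrative
	if err != nil {
		log.Fatal(err)
	}

	srv := grpc.NewServer(
		grpc.KeepaliveEnforcementPolicy(keepalive.EnforcementPolicy{
			MinTime:             1 * time.Minute, // reject clients that ping more often than this
			PermitWithoutStream: true,            // accept pings even when the client has no active RPCs
		}),
	)
	// ... register services here ...
	if err := srv.Serve(lis); err != nil {
		log.Fatal(err)
	}
}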

dennwc added a commit that referenced this issue Jun 26, 2019
@dennwc dennwc removed their assignment Jul 8, 2019
@kuba-- kuba-- self-assigned this Jul 9, 2019
@bzz
Contributor

bzz commented Jul 11, 2019

Most probably this is good to close as soon as #299 and #127 are merged.

@bzz bzz closed this as completed Jul 11, 2019