Connection to the gRPC server gets stuck #79

Closed
avlazarov opened this issue Aug 23, 2019 · 8 comments

@avlazarov

avlazarov commented Aug 23, 2019

AnyCable-Go version: 0.6.3
AnyCable gem version: 0.6.3 (same anycable-rails version)
gRPC gem version: 1.20.0
nginx version: 1.17.3

What did you do?

  1. Use nginx + grpc module setup. Gist link
  2. Run two instances of the gRPC server via bundle exec anycable --rpc_host 0.0.0.0:50052 and bundle exec anycable --rpc_host 0.0.0.0:50051
  3. Run an instance of anycable-go via anycable-go --headers=origin,cookie --debug=true --rpc_host=localhost:50050
  4. Subscribe to a channel. Nothing fancy here, using the JS ActionCable.subscribe.
  5. Perform actions on the subscription periodically every 10 seconds in JS – subscription.perform 'do_stuff'.
  6. Stop both gRPC anycable instances without de-registering any from nginx.
  7. On the next do_stuff action, the anycable-go server receives error 502 from nginx since both gRPC servers are gone.
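
For anyone trying to reproduce this, the client side of steps 4 and 5 can be sketched roughly as follows. This is only a sketch, not the reporter's actual code: the helper name startPeriodicPerform and the channel name "ExampleChannel" are hypothetical, and consumer is assumed to be the object returned by ActionCable.createConsumer(). The timer functions are injectable only so the helper can be exercised without a real connection.

```javascript
// Hypothetical helper (not part of ActionCable): subscribe to a channel and
// perform an action on it at a fixed interval, as in steps 4 and 5.
// `consumer` is assumed to look like the result of ActionCable.createConsumer().
// `schedule` and `cancel` default to the global timers.
function startPeriodicPerform(consumer, channel, action, intervalMs = 10000,
                              schedule = setInterval, cancel = clearInterval) {
  const subscription = consumer.subscriptions.create(channel, {
    received(data) {
      console.log("received", data);
    },
    rejected() {
      console.warn("subscription rejected");
    },
  });

  // Fire the action every `intervalMs` milliseconds (10 seconds in the report).
  const timer = schedule(() => subscription.perform(action), intervalMs);

  return { subscription, stop: () => cancel(timer) };
}

// Usage in a browser with ActionCable loaded (channel name is an assumption):
//   const handle = startPeriodicPerform(
//     ActionCable.createConsumer(), "ExampleChannel", "do_stuff");
//   ...
//   handle.stop();
```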

What did you expect to happen?

The anycable-go server to raise an error similar to the one raised when no connection to the gRPC server is available (Perform error: rpc error: code = Unavailable desc = all SubConns are in TransientFailure,) and to retry communicating with the gRPC server on the next do_stuff action.

What actually happened?

After step 7, no attempts to send requests to the gRPC server are made (nothing is logged by the anycable-go server and nothing appears in the nginx access log), even if the gRPC servers are started up again. Meanwhile, the client keeps getting successful ping messages and can still receive broadcasts through the WebSocket.

If another client subscribes to the same channel, they'll either 1) get an error forcing them to reconnect when the gRPC servers are all down or 2) successfully subscribe and perform actions when the gRPC servers are up. The first client, however, will still remain "stuck".

Bottom line is that performing actions on a subscription after getting error 502 blocks all new actions from being performed by the anycable-go server for a particular client/subscription.

Could you please give some directions on how to deal with this scenario? One possibility is to 'ack' for actions on the client side and reconnect altogether, but it adds some complexity.
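
The client-side 'ack' idea mentioned above could look roughly like the sketch below. Everything in it is an assumption rather than an existing AnyCable or ActionCable feature: the createAckTracker helper and the ack_id field are hypothetical, and the server-side channel would have to echo ack_id back to the client for this to work. What "reconnect" means is left to the caller; one option might be reopening the consumer's connection.

```javascript
// Hypothetical ack tracker: every performed action carries an `ack_id`, and
// if the server does not echo it back within `timeoutMs`, the client assumes
// it is stuck and reconnects. Timer functions are injectable for testing.
function createAckTracker({
  perform,                 // e.g. (action, data) => subscription.perform(action, data)
  reconnect,               // called once when an ack times out
  timeoutMs = 5000,
  setTimer = setTimeout,
  clearTimer = clearTimeout,
}) {
  const pending = new Map(); // ack_id -> timer handle
  let nextId = 1;

  return {
    performWithAck(action, data = {}) {
      const ackId = nextId++;
      const timer = setTimer(() => {
        // One missing ack is treated as a dead connection: drop all
        // outstanding acks so a single reconnect is triggered.
        for (const t of pending.values()) clearTimer(t);
        pending.clear();
        reconnect();
      }, timeoutMs);
      pending.set(ackId, timer);
      perform(action, { ...data, ack_id: ackId });
      return ackId;
    },

    // Call this from the subscription's `received` callback.
    // Returns true if the message acknowledged a pending action.
    handleReceived(message) {
      if (!message || !pending.has(message.ack_id)) return false;
      clearTimer(pending.get(message.ack_id));
      pending.delete(message.ack_id);
      return true;
    },

    pendingCount() {
      return pending.size;
    },
  };
}
```

The cost is the complexity already noted: every channel action needs a matching acknowledgement on the server, and the timeout has to be longer than the slowest legitimate action.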

@sponomarev
Member

Hey @avlazarov! Have you tried playing with the grpc_read_timeout configuration? What happens when the default timeout, 60s, passes?

@sponomarev
Member

I assume that AnyCable-Go is not aware that your RPC servers went down, because nginx still keeps the connection open due to the grpc_read_timeout and grpc_send_timeout directives.

@palkan
Member

palkan commented Aug 23, 2019

I assume that AnyCable-Go is not aware that your RPC servers went down

As I understood, other clients (new connections) work fine, i.e., gRPC connectivity is restored.

The problem is that the first one, the one that "caught" the broken connection, is getting stuck:

Bottom line is that performing actions on a subscription after getting error 502 blocks all new actions from being performed by the anycable-go server for a particular client/subscription.

@avlazarov Right?

And that's strange: if other clients can successfully perform an action, the first one should be able to do so as well on the next attempt, since they use the same gRPC pool.

@avlazarov
Author

avlazarov commented Aug 23, 2019

@palkan Yes, the odd part is that even when the next client makes a series of successful actions, the first one remains stuck. If, instead of triggering the 502 error, I shut down nginx completely (causing refused connections), anycable-go keeps attempting the operations and prints errors; once nginx is back, the gRPC servers correctly receive the actions and the client is no longer stuck.

@palkan
Member

palkan commented Aug 24, 2019

I'll try to reproduce it locally and come back when I find something.

@bibendi
Member

bibendi commented Nov 11, 2019

I've tried to reproduce it in that simple chat application, but unfortunately (or fortunately) couldn't reproduce the problem.

@palkan
Member

palkan commented Nov 11, 2019

@avlazarov Please take a look at @bibendi's PR above. We couldn't reproduce the problem. Are we missing something?

@avlazarov
Author

@palkan Sorry, I can't reproduce it after upgrading from Ubuntu 16.04 to 18.04. It might have been something related to that specific version of Nginx for Ubuntu, or I might have misconfigured something else in Nginx that I have not noticed.

@palkan palkan removed the question label Nov 22, 2019
@palkan palkan closed this as completed Nov 22, 2019