No automatic reconnects #14

Closed
stefanfritsch opened this issue Apr 12, 2021 · 1 comment

Comments

@stefanfritsch

Hi,

I always have to reconnect manually if the connection has been idle for more than a minute or so.

> amqp_publish(conn, message.raw, exchange = "run.function", routing_key = "#")
Error in amqp_publish(conn, message.raw, exchange = "run.function", routing_key = "#") : 
  Failed to publish message. Disconnected from server

I then have to reconnect and try again:

> amqp_reconnect(conn)
> amqp_publish(conn, message.raw, exchange = "run.function", routing_key = "#")
>

That's not a problem per se but ?amqp_reconnect says:

When possible, we automatically recover from connection errors, so manual reconnection is not usually necessary.

So before I write tryCatch() wrappers I wanted to ask if there's something I'm doing wrong. I followed the Basic Usage example just with a remote server. Is this related to the timeout= parameter in amqp_connect? Do I have to set some other parameter to enable automatic reconnects?

Thank you.

Best regards,
Stefan

@atheriel
Owner

If you have a publish-only workload you will always encounter this issue when you publish infrequently. The underlying reason is that we need to send heartbeats to the RabbitMQ server every 30s in order for the connection to remain active, and those heartbeats aren't sent unless you call publish() (or anything else that interacts with the connection).

As an aside: originally this package did not use heartbeats, but we discovered that this causes extremely brittle network connections that can crash R or make it time out for periods of 15 minutes, so they are now enabled with no option to turn them off. It's the lesser of two evils.

The fundamental problem here is that R (and the underlying librabbitmq) is single-threaded, and so we can't really do stuff "in the background" like send heartbeats.

I have a few suggestions, based on our experience:

  • If you are publishing and consuming on the same connection, ensure you call amqp_consume() more often than the heartbeat timeout. Of course, if you are only publishing this is not very helpful.

  • If you are publishing infrequently, consider also publishing a separate "hello, world" message to a dedicated exchange every 10-30s. This is actually a pretty common pattern for monitoring the health of an application anyway.

  • If you are publishing very infrequently, just create a new connection, publish, and then close the connection.
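The last two suggestions could be sketched roughly as follows. This is a minimal sketch, not part of the package: `publish_once()` and `make_keepalive()` are hypothetical helper names, the exchange and routing key are whatever your setup uses, and the defaults assume the package's `amqp_connect()`, `amqp_publish()` and `amqp_disconnect()` functions. Because R is single-threaded, the keepalive function does nothing on its own; your application has to call it from its own loop.

```r
# Sketch only: helper names are hypothetical. The default arguments assume
# longears::amqp_connect(), amqp_publish() and amqp_disconnect().

# Very infrequent publishing: open a connection, publish, always close.
publish_once <- function(body, exchange, routing_key, ...,
                         connect = longears::amqp_connect,
                         publish = longears::amqp_publish,
                         disconnect = longears::amqp_disconnect) {
  conn <- connect(...)                    # `...` is passed on to the connect function
  on.exit(disconnect(conn), add = TRUE)   # close even if publishing fails
  publish(conn, body, exchange = exchange, routing_key = routing_key)
  invisible(TRUE)
}

# Infrequent publishing: returns a function to call from the application's
# loop. It publishes a small message only once `interval` seconds have
# passed since the last one, and returns TRUE when it did publish.
make_keepalive <- function(conn, exchange, routing_key = "", interval = 10,
                           publish = longears::amqp_publish,
                           now = function() as.numeric(Sys.time())) {
  last <- now()
  function() {
    if (now() - last < interval) {
      return(invisible(FALSE))
    }
    publish(conn, "hello, world", exchange = exchange, routing_key = routing_key)
    last <<- now()
    invisible(TRUE)
  }
}
```

With a live connection this might be used as `beat <- make_keepalive(conn, exchange = "health")` (the exchange name here is made up), followed by `beat()` on each pass through the application's loop. The keepalive only helps if that loop runs more often than the heartbeat timeout.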

Unfortunately there is no way to indicate whether we got disconnected due to missed heartbeats, which is why the error message won't tell you. And we do try to recover from errors, but this is not actually an error -- it is intentional behaviour.
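For the `tryCatch()` wrapper mentioned in the question, one possible shape is shown below. It is a sketch under assumptions: `publish_with_reconnect()` is a hypothetical name, and the wrapper recognises the disconnect only by matching the "Disconnected from server" text from the error quoted above, which is not a documented interface and may change. Any other error is re-raised unchanged, and the publish is retried exactly once.

```r
# Sketch only: a hypothetical wrapper, not part of the package.
# Defaults assume longears::amqp_publish() and longears::amqp_reconnect().
publish_with_reconnect <- function(conn, body, exchange, routing_key,
                                   publish = longears::amqp_publish,
                                   reconnect = longears::amqp_reconnect) {
  tryCatch(
    publish(conn, body, exchange = exchange, routing_key = routing_key),
    error = function(e) {
      # Assumption: the disconnect is identified by its message text alone.
      disconnected <- grepl("Disconnected from server", conditionMessage(e),
                            fixed = TRUE)
      if (!disconnected) {
        stop(e)  # not a disconnect: re-raise the original error
      }
      reconnect(conn)  # same call as the manual amqp_reconnect(conn) above
      publish(conn, body, exchange = exchange, routing_key = routing_key)
    }
  )
  invisible(conn)
}
```

The call from the question would then become `publish_with_reconnect(conn, message.raw, exchange = "run.function", routing_key = "#")`. If the retry itself fails, that error propagates to the caller.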
