Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Acknowledgements #2

Closed
dselivanov opened this issue Feb 2, 2020 · 16 comments
Closed

Acknowledgements #2

dselivanov opened this issue Feb 2, 2020 · 16 comments

Comments

@dselivanov
Copy link
Contributor

Hi @atheriel. Thanks for your great work on rabbitMQ client.

I see that client sends acknowledgement before callback is executed and regardless the callback status. Shouldn't we send it only if callback haven't thrown an error? Otherwise it becomes a situation that message is not properly consumed and nobody notice that. So essentially message is lost for a system.

@atheriel
Copy link
Owner

atheriel commented Feb 3, 2020

This is intentional, yes. In fact, an earlier version would only acknowledge the message if the callback returned TRUE (and n'ack it if FALSE), but I moved away from this for a few reasons:

  1. My current understanding of the AMQP spec and RabbitMQ docs is basically that clients should acknowledge messages if they are successfully delivered to the application in almost all cases. So I consider it correct behaviour to ack messages as they are received.

  2. longears is not a true low-level library. Certain things (channels and acknowledgements, among others) are generally hidden from users and handled in what I believe to be best-practice ways. For example, automatically ack-ing messages is how EasyNetQ (which influenced many of my best-practice design decisions) behaves.

  3. Catching R-level errors in C code is a pain in the ass to implement, and would be even more annoying in the background thread case, where the message IDs for acknowledgements would need to be correctly guarded by the mutex.

However, I'm open to feedback here. What kind of pattern are you trying to implement where you need to fail to acknowledge messages?

ps. Complete implementations for manually ack-ing and nack-ing messages are actually still in the source code, they're just not exported.

@dselivanov
Copy link
Contributor Author

What kind of pattern are you trying to implement where you need to fail to acknowledge messages?

When data consistency is a must I would prefer to be on the safe side and acknowledge only after function finished whatever it supposed to do. For example message queues commonly used as buffer to organize mini-batching and insertion to a database. So if error happens in a callback then messages are not inserted to a database. But at the same time they are consumed from a queue. This essentially means that messages are lost...

@atheriel
Copy link
Owner

atheriel commented Feb 6, 2020

I suppose that in cases like that I would handle errors at the application level instead of pushing them back to the message broker, but I can see why you'd want guaranteed correct delivery. What's the behaviour that you would expect if the callback fails? Negative acknowledgement + requeue, or just no acknowledgement at all?

@dselivanov
Copy link
Contributor Author

What's the behaviour that you would expect if the callback fails? Negative acknowledgement + requeue, or just no acknowledgement at all?

I would say safest option by default might be nack and requeue, but ideally this should be configurable ( as in ptyhon pika client for example).

P.S. I started to learn rabbitmq not so long time ago. What I discovered last week that in order to requeue you need to pass requeue=true to the nack. For me this is quite surprising, I would expect it to be the only action with nack.

@aleksarias
Copy link

Have you guys used the amqp_ack and amqp_nack functions?
I tried using it in one of my programs and it causes a server disconnect.

@atheriel
Copy link
Owner

@aleksarias Yes, because (n-)acknowledging a message ID that has already been ack'd is a channel error under AMQP. Those functions aren't exported, it shouldn't be surprising they don't work...

@hampos
Copy link

hampos commented Oct 18, 2020

@atheriel
I also think the best approach would be to nack and requeue in case of callback failure.
Maybe an option that stops auto acking and gives the ability to ack/nack-requeue to the consumer?

@atheriel
Copy link
Owner

@hampos I'm still not personally convinced of a real use for nacks (since they can't guarantee data safety), and would caution against adopting an architecture that relied on them. Can you give an example of what you're trying to achieve?

@dselivanov
Copy link
Contributor Author

dselivanov commented Oct 19, 2020

since they can't guarantee data safety

Could you clarify on this point?

Usual flow for stream processing is following:

  • got message
  • handle it
    • acknowledge at the end after success
    • once ack received broker remove message from queue
  • if error during handling
    • either send nack and re-queue
    • or client catch exception in user-code and re-queue

Also client usually counts number of msg retries and in case it exceed some pre-configured number then client sends message to dead letter queue.

@aleksarias
Copy link

aleksarias commented Oct 19, 2020 via email

@atheriel
Copy link
Owner

  • if error during handling
    • either send nack and re-queue
    • or client catch exception in user-code and re-queue

Can you explain this a bit more?

@atheriel
Copy link
Owner

@dselivanov On data safety, RabbitMQ cannot guarantee "exactly once" message delivery, so you must either take extreme care with messages that are redelivered or admit that some messages will be lost.

Currently longears does the latter: if your code crashes after a message is acknowledged but before it is "successfully" processed, you can lose data. I think this is what users will expect, and it's easy to reason about.

I think this basically comes down to the question, what should an application do when it receives a message where the redelivered property is TRUE?

At the moment, you can be sure that such a message was delivered to a consumer but never passed to any R code, so it's safe to run your callback with side effects (as any real code will have). In fact you can ignore the flag entirely.

But if consumers can nack messages after processing, you must consider (and determine) which of the following happened:

  1. The message was delivered but no processing ever happened, perhaps due to a crash while processing an earlier message. It is therefore safe to process the message as normal.

  2. The message was partially processed but the application crashed before it could acknowledge the message. In this case, some or all of the side effects might have taken place, and your consumer will need to talk to all of the related systems to determine what happened.

  3. The message was nack'd by another consumer, so processing is safe but will almost certainly fail for the same reason.

You can overcome (1) and (2) by writing your message processing code so that it is idempotent. But longears has absolutely no way to ensure users do this until the discoverer the related pain for themselves, and I think "this code might run twice on the same message" would surprise new RabbitMQ users especially.

For (3), you'd need the current consumer to be sufficiently different from that earlier consumer in order to process it successfully, otherwise you're wasting its time. This is not usually the case, so more likely it will be rejected again and again (until, as in your example, it gets dead-lettered).

I don't really know how folks deal with working this out in practice. Instead, we usually handle errors by logging them and sometimes by forwarding the offending messages to an error exchange (which is equivalent to manual dead-lettering), where they will be picked up by a dedicated error service (and, more than likely, logged).

Worth reading is this StackOverflow thread; there are several good answers.

@dselivanov
Copy link
Contributor Author

dselivanov commented Oct 20, 2020

Can you explain this a bit more?

Your next comment explains main cases. Essentially there are 2 edge cases:

  • handler function logic works fine, but message can't be processed due to some external errors (for example connection to some DB is temporary not available). We would like to give another chance to handle this message and requeue.
  • handler throws exception due to some corner case (developer mistake) during msg processing. We can requeue this msg up to MAX_ATTEMPTS and then put letter to dead letter queue.

Overall requeue increase the chance to eventually handle message and makes system much more robust and reduce the chance to loose data.

On data safety, RabbitMQ cannot guarantee "exactly once" message delivery

You can overcome (1) and (2) by writing your message processing code so that it is idempotent. But longears has absolutely no way to ensure users do this until the discoverer the related pain for themselves, and I think "this code might run twice on the same message" would surprise new RabbitMQ users especially.

That is the whole point of this thread and comments from me and others. As a user of RabbitMQ client I need to be aware that message can be delivered several times and I need to implement handling logic accordingly (side effects ideally should be idempotent). Otherwise we are getting "fire and forget" style message queue which RabbitMQ is not (primarily).

@SyncM1972
Copy link

+1
Hi @atheriel. Thanks a lot for your great rmq client!
We also have use case in which a manual acknowledge / nack signal is required.
We use rmq to enqueue jobs which are then processed on worker servers. We have to be sure that each and every job is processed. If, e.g., the worker server crashes, or a worker process is terminated, any unfinished job has to be processed by another worker. Therefore, it would be great if it would be possible to manually acknowledge / nack the job once it was finished.
Atm, we are using a wrapper. which calls the R code used to perform the job, and, if the job succeeded, acknowledges it.
It would be so nice if we could do this using your longears package.

atheriel added a commit that referenced this issue Nov 11, 2020
The existing behaviour is basically the same as always having
no_ack=TRUE, in that messages are unequivocally acknowledged. However,
it is sometimes desirable to distinguish processing which fails from
that which succeeds, so callbacks which throw an error will now nack
messages instead -- without requeing them.

Since nack without requeue is only *semantically* (not operationally)
different than acknowledgment, existing workflows are unlikely to be
affected.

Documentation has been updated to reflect this behaviour.

Part of #2.
@atheriel
Copy link
Owner

#10 implements what I believe is enough machinery to accomplish what is discussed in this thread. Here are a few examples:

# 1. Messages will be acknowledged *after* the callback runs.
amqp_consume(conn, queue, function(msg) {
  message("got msg")
})

# 2. Errors will nack (instead of ack) but not requeue the message.
amqp_consume(conn, queue, function(msg) {
  stop("can't handle this")
})

# 3. Errors will instead nack and requeue the message.
amqp_consume(conn, queue, function(msg) {
  stop("can't handle this, maybe someone else can")
}, requeue_on_error = TRUE)

# 4. Nack behaviour can be manually controlled by signalling with amqp_nack().
amqp_consume(conn, queue, function(msg) {
  if (msg$redelivered) {
    message("can't handle this either, dead-letter it")
    amqp_nack(requeue = FALSE) # Works like stop().
  } else {
    stop("can't handle this, maybe someone else can")
  }
}, requeue_on_error = TRUE)

@aleksarias
Copy link

aleksarias commented Nov 11, 2020 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants