Acknowledgements #2

dselivanov · 2020-02-02T07:41:26Z

Hi @atheriel. Thanks for your great work on rabbitMQ client.

I see that client sends acknowledgement before callback is executed and regardless the callback status. Shouldn't we send it only if callback haven't thrown an error? Otherwise it becomes a situation that message is not properly consumed and nobody notice that. So essentially message is lost for a system.

atheriel · 2020-02-03T01:34:28Z

This is intentional, yes. In fact, an earlier version would only acknowledge the message if the callback returned TRUE (and n'ack it if FALSE), but I moved away from this for a few reasons:

My current understanding of the AMQP spec and RabbitMQ docs is basically that clients should acknowledge messages if they are successfully delivered to the application in almost all cases. So I consider it correct behaviour to ack messages as they are received.
longears is not a true low-level library. Certain things (channels and acknowledgements, among others) are generally hidden from users and handled in what I believe to be best-practice ways. For example, automatically ack-ing messages is how EasyNetQ (which influenced many of my best-practice design decisions) behaves.
Catching R-level errors in C code is a pain in the ass to implement, and would be even more annoying in the background thread case, where the message IDs for acknowledgements would need to be correctly guarded by the mutex.

However, I'm open to feedback here. What kind of pattern are you trying to implement where you need to fail to acknowledge messages?

ps. Complete implementations for manually ack-ing and nack-ing messages are actually still in the source code, they're just not exported.

dselivanov · 2020-02-04T15:45:52Z

What kind of pattern are you trying to implement where you need to fail to acknowledge messages?

When data consistency is a must I would prefer to be on the safe side and acknowledge only after function finished whatever it supposed to do. For example message queues commonly used as buffer to organize mini-batching and insertion to a database. So if error happens in a callback then messages are not inserted to a database. But at the same time they are consumed from a queue. This essentially means that messages are lost...

atheriel · 2020-02-06T18:15:54Z

I suppose that in cases like that I would handle errors at the application level instead of pushing them back to the message broker, but I can see why you'd want guaranteed correct delivery. What's the behaviour that you would expect if the callback fails? Negative acknowledgement + requeue, or just no acknowledgement at all?

dselivanov · 2020-02-07T05:42:27Z

What's the behaviour that you would expect if the callback fails? Negative acknowledgement + requeue, or just no acknowledgement at all?

I would say safest option by default might be nack and requeue, but ideally this should be configurable ( as in ptyhon pika client for example).

P.S. I started to learn rabbitmq not so long time ago. What I discovered last week that in order to requeue you need to pass requeue=true to the nack. For me this is quite surprising, I would expect it to be the only action with nack.

aleksarias · 2020-02-12T02:51:21Z

Have you guys used the amqp_ack and amqp_nack functions?
I tried using it in one of my programs and it causes a server disconnect.

atheriel · 2020-02-12T03:47:23Z

@aleksarias Yes, because (n-)acknowledging a message ID that has already been ack'd is a channel error under AMQP. Those functions aren't exported, it shouldn't be surprising they don't work...

hampos · 2020-10-18T09:21:28Z

@atheriel
I also think the best approach would be to nack and requeue in case of callback failure.
Maybe an option that stops auto acking and gives the ability to ack/nack-requeue to the consumer?

atheriel · 2020-10-19T18:32:22Z

@hampos I'm still not personally convinced of a real use for nacks (since they can't guarantee data safety), and would caution against adopting an architecture that relied on them. Can you give an example of what you're trying to achieve?

dselivanov · 2020-10-19T18:47:41Z

since they can't guarantee data safety

Could you clarify on this point?

Usual flow for stream processing is following:

got message
handle it
- acknowledge at the end after success
- once ack received broker remove message from queue
if error during handling
- either send nack and re-queue
- or client catch exception in user-code and re-queue

Also client usually counts number of msg retries and in case it exceed some pre-configured number then client sends message to dead letter queue.

aleksarias · 2020-10-19T18:57:44Z

That’s the processing flow I’m used to seeing when working with rabbitmq. It’s what makes rabbitmq so powerful.

On Mon, Oct 19, 2020 at 2:47 PM Dmitriy Selivanov ***@***.***> wrote: since they can't guarantee data safety Could you clarify on this point? Usual flow for stream processing is following: - got message - handle it - acknowledge at the end after success - once ack received broker remove message from queue - if error during handling - either send nack and re-queue - or client catch exception in user-code and re-queue Also client usually counts number of msg retries and in case it exceed some pre-configured number then client sends message to dead letter queue. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#2 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AB7TYRACF7HKATMAIXSZKB3SLSCV5ANCNFSM4KOYCC5A> .

--

…

-- Alex L. Arias Data Engineer Materials Science & Engineering, B.S. Cell: +1 305 965 0326 aleksarias@gmail.com

atheriel · 2020-10-19T19:09:45Z

if error during handling

either send nack and re-queue

or client catch exception in user-code and re-queue

Can you explain this a bit more?

atheriel · 2020-10-19T20:55:25Z

@dselivanov On data safety, RabbitMQ cannot guarantee "exactly once" message delivery, so you must either take extreme care with messages that are redelivered or admit that some messages will be lost.

Currently longears does the latter: if your code crashes after a message is acknowledged but before it is "successfully" processed, you can lose data. I think this is what users will expect, and it's easy to reason about.

I think this basically comes down to the question, what should an application do when it receives a message where the redelivered property is TRUE?

At the moment, you can be sure that such a message was delivered to a consumer but never passed to any R code, so it's safe to run your callback with side effects (as any real code will have). In fact you can ignore the flag entirely.

But if consumers can nack messages after processing, you must consider (and determine) which of the following happened:

The message was delivered but no processing ever happened, perhaps due to a crash while processing an earlier message. It is therefore safe to process the message as normal.
The message was partially processed but the application crashed before it could acknowledge the message. In this case, some or all of the side effects might have taken place, and your consumer will need to talk to all of the related systems to determine what happened.
The message was nack'd by another consumer, so processing is safe but will almost certainly fail for the same reason.

You can overcome (1) and (2) by writing your message processing code so that it is idempotent. But longears has absolutely no way to ensure users do this until the discoverer the related pain for themselves, and I think "this code might run twice on the same message" would surprise new RabbitMQ users especially.

For (3), you'd need the current consumer to be sufficiently different from that earlier consumer in order to process it successfully, otherwise you're wasting its time. This is not usually the case, so more likely it will be rejected again and again (until, as in your example, it gets dead-lettered).

I don't really know how folks deal with working this out in practice. Instead, we usually handle errors by logging them and sometimes by forwarding the offending messages to an error exchange (which is equivalent to manual dead-lettering), where they will be picked up by a dedicated error service (and, more than likely, logged).

Worth reading is this StackOverflow thread; there are several good answers.

dselivanov · 2020-10-20T08:21:58Z

Can you explain this a bit more?

Your next comment explains main cases. Essentially there are 2 edge cases:

handler function logic works fine, but message can't be processed due to some external errors (for example connection to some DB is temporary not available). We would like to give another chance to handle this message and requeue.
handler throws exception due to some corner case (developer mistake) during msg processing. We can requeue this msg up to MAX_ATTEMPTS and then put letter to dead letter queue.

Overall requeue increase the chance to eventually handle message and makes system much more robust and reduce the chance to loose data.

On data safety, RabbitMQ cannot guarantee "exactly once" message delivery

You can overcome (1) and (2) by writing your message processing code so that it is idempotent. But longears has absolutely no way to ensure users do this until the discoverer the related pain for themselves, and I think "this code might run twice on the same message" would surprise new RabbitMQ users especially.

That is the whole point of this thread and comments from me and others. As a user of RabbitMQ client I need to be aware that message can be delivered several times and I need to implement handling logic accordingly (side effects ideally should be idempotent). Otherwise we are getting "fire and forget" style message queue which RabbitMQ is not (primarily).

SyncM1972 · 2020-11-10T12:07:17Z

+1
Hi @atheriel. Thanks a lot for your great rmq client!
We also have use case in which a manual acknowledge / nack signal is required.
We use rmq to enqueue jobs which are then processed on worker servers. We have to be sure that each and every job is processed. If, e.g., the worker server crashes, or a worker process is terminated, any unfinished job has to be processed by another worker. Therefore, it would be great if it would be possible to manually acknowledge / nack the job once it was finished.
Atm, we are using a wrapper. which calls the R code used to perform the job, and, if the job succeeded, acknowledges it.
It would be so nice if we could do this using your longears package.

The existing behaviour is basically the same as always having no_ack=TRUE, in that messages are unequivocally acknowledged. However, it is sometimes desirable to distinguish processing which fails from that which succeeds, so callbacks which throw an error will now nack messages instead -- without requeing them. Since nack without requeue is only *semantically* (not operationally) different than acknowledgment, existing workflows are unlikely to be affected. Documentation has been updated to reflect this behaviour. Part of #2.

atheriel · 2020-11-11T01:45:49Z

#10 implements what I believe is enough machinery to accomplish what is discussed in this thread. Here are a few examples:

# 1. Messages will be acknowledged *after* the callback runs.
amqp_consume(conn, queue, function(msg) {
  message("got msg")
})

# 2. Errors will nack (instead of ack) but not requeue the message.
amqp_consume(conn, queue, function(msg) {
  stop("can't handle this")
})

# 3. Errors will instead nack and requeue the message.
amqp_consume(conn, queue, function(msg) {
  stop("can't handle this, maybe someone else can")
}, requeue_on_error = TRUE)

# 4. Nack behaviour can be manually controlled by signalling with amqp_nack().
amqp_consume(conn, queue, function(msg) {
  if (msg$redelivered) {
    message("can't handle this either, dead-letter it")
    amqp_nack(requeue = FALSE) # Works like stop().
  } else {
    stop("can't handle this, maybe someone else can")
  }
}, requeue_on_error = TRUE)

aleksarias · 2020-11-11T02:22:11Z

😍

On Tue, Nov 10, 2020 at 8:46 PM Aaron Jacobs ***@***.***> wrote: #10 <#10> implements what I believe is enough machinery to accomplish what is discussed in this thread. Here are a few examples: # 1. Messages will be acknowledged *after* the callback runs. amqp_consume(conn, queue, function(msg) { message("got msg") }) # 2. Errors will nack (instead of ack) but not requeue the message. amqp_consume(conn, queue, function(msg) { stop("can't handle this") }) # 3. Errors will instead nack and requeue the message. amqp_consume(conn, queue, function(msg) { stop("can't handle this, maybe someone else can") }, requeue_on_error = TRUE) # 4. Nack behaviour can be manually controlled by signalling with amqp_nack(). amqp_consume(conn, queue, function(msg) { if (msg$redelivered) { message("can't handle this either, dead-letter it") amqp_nack(requeue = FALSE) # Works like stop(). } else { stop("can't handle this, maybe someone else can") } }, requeue_on_error = TRUE) — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#2 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AB7TYRHI6PSBWD3ZC2TW4YDSPHUFTANCNFSM4KOYCC5A> .

--

…

-- Alex L. Arias Data Engineer Materials Science & Engineering, B.S. Cell: +1 305 965 0326 aleksarias@gmail.com

atheriel added the discussion label Feb 7, 2020

atheriel mentioned this issue Nov 11, 2020

Ack only successful messages and expose ways to control nacks for users that want them #10

Merged

atheriel closed this as completed in fd4240a Mar 11, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Acknowledgements #2

Acknowledgements #2

dselivanov commented Feb 2, 2020

atheriel commented Feb 3, 2020

dselivanov commented Feb 4, 2020

atheriel commented Feb 6, 2020

dselivanov commented Feb 7, 2020

aleksarias commented Feb 12, 2020

atheriel commented Feb 12, 2020

hampos commented Oct 18, 2020

atheriel commented Oct 19, 2020

dselivanov commented Oct 19, 2020 •

edited

Loading

aleksarias commented Oct 19, 2020 via email

atheriel commented Oct 19, 2020

atheriel commented Oct 19, 2020

dselivanov commented Oct 20, 2020 •

edited

Loading

SyncM1972 commented Nov 10, 2020

atheriel commented Nov 11, 2020

aleksarias commented Nov 11, 2020 via email

Acknowledgements #2

Acknowledgements #2

Comments

dselivanov commented Feb 2, 2020

atheriel commented Feb 3, 2020

dselivanov commented Feb 4, 2020

atheriel commented Feb 6, 2020

dselivanov commented Feb 7, 2020

aleksarias commented Feb 12, 2020

atheriel commented Feb 12, 2020

hampos commented Oct 18, 2020

atheriel commented Oct 19, 2020

dselivanov commented Oct 19, 2020 • edited Loading

aleksarias commented Oct 19, 2020 via email

atheriel commented Oct 19, 2020

atheriel commented Oct 19, 2020

dselivanov commented Oct 20, 2020 • edited Loading

SyncM1972 commented Nov 10, 2020

atheriel commented Nov 11, 2020

aleksarias commented Nov 11, 2020 via email

dselivanov commented Oct 19, 2020 •

edited

Loading

dselivanov commented Oct 20, 2020 •

edited

Loading