This is a tricky situation, and we might not be able to improve it fully. It happens when we receive a message close to the ASQ message size limit of 64 KB. When we receive such a large message and send it to the audit queue, or to the error queue due to an exception, we enrich it with additional headers. Once that happens, the message exceeds the body size limit and is retried indefinitely. Because there is no delay, and the message cannot be handled by the delayed-delivery infrastructure either, it ends up churning away significant endpoint resources and might even cause the endpoint to stall completely.
We already have solutions like the data bus, but the problem is that once such a message is stuck, recovery is painful: you have to manually receive messages from the queue until you reach the poison message, delete it with the corresponding pop receipt, and re-enqueue all the other messages.
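To illustrate the failure mode, here is a small Python sketch. The 64 KB limit is the documented Azure Storage Queue maximum; the envelope below is a simplified stand-in for the real NServiceBus wrapper (which, with any Base64 encoding the SDK applies, only adds more overhead), and the enrichment header names are illustrative:

```python
import json

ASQ_MAX_MESSAGE_BYTES = 64 * 1024  # 65,536-byte limit per Azure Storage Queue message

def envelope_size(body: str, headers: dict) -> int:
    # Simplified stand-in for the NServiceBus message envelope; the real
    # wrapper and transport encoding only add further overhead.
    return len(json.dumps({"Headers": headers, "Body": body}).encode("utf-8"))

headers = {"NServiceBus.MessageId": "a" * 36}
body = "x" * 65_000  # fits under the limit with only the original headers

assert envelope_size(body, headers) <= ASQ_MAX_MESSAGE_BYTES

# Failure-path enrichment adds headers (names illustrative of the real ones):
enriched = {
    **headers,
    "NServiceBus.ExceptionInfo.StackTrace": "at Handler.Handle(...)\n" * 30,
    "NServiceBus.ExceptionInfo.ExceptionType": "System.InvalidOperationException",
    "NServiceBus.FailedQ": "my-endpoint",
}
# The enriched envelope no longer fits, so the send to the error queue fails
# and the receive is retried without delay:
print(envelope_size(body, enriched) > ASQ_MAX_MESSAGE_BYTES)
```

The point is that the enrichment happens after the sender has already validated the original message against the limit, so the overflow only surfaces on the failure path.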
Expected behavior
The message is sent to the correct queue without being indefinitely retried.
Actual behavior
As the message is decorated with headers, thereby pushing it over the message size limit, it gets retried indefinitely. This consumes significant resources.
Versions
12.0.0, 11.0.0, 10.0.4
Steps to reproduce
Send a very large message (very close to the ASQ message size limit)
Throw an exception so that the message is decorated with additional headers and sent to the error queue
Notice that the message does not reach the error queue but enters a loop of indefinite retries.
Relevant log output
None
Additional information
Describe the suggested solution
Azure Storage Queue messages have a size limit of 64 KB. When a message very close to this limit is received and sent to the audit queue or error queue, it is enriched with more headers. This pushes the message over the Azure Storage Queue size limit, and the message ends up being retried indefinitely. The suggested solution: when the attempt to send such a message to the error queue fails, unwrap the message and rewrap it with minimal headers, FailedQ and ExceptionType. The FailedQ header is required by ServiceControl; without it, the message ends up as a failed error import. The ExceptionType header is highly desirable because it enables grouping; the failure-group view by exception type is the default failed-message view in ServicePulse, which is the common way users interact with failed messages. If this attempt also fails, the message is sent to the error queue without any headers.
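The suggested fallback chain can be sketched as follows. This is Python pseudocode against an in-memory list standing in for the error queue; the real implementation would live in the transport, and the size check is the same simplified envelope accounting as above. The header names follow the NServiceBus conventions mentioned in the description:

```python
import json

MAX_MESSAGE_BYTES = 64 * 1024

def fits(headers: dict, body: str) -> bool:
    # Simplified envelope-size check; the real wrapper adds more overhead.
    return len(json.dumps({"Headers": headers, "Body": body}).encode("utf-8")) <= MAX_MESSAGE_BYTES

def send_to_error_queue(error_queue: list, headers: dict, body: str,
                        exception_type: str, failed_queue: str) -> str:
    """Try progressively smaller header sets so a near-limit message still
    reaches the error queue instead of being retried forever."""
    full = {**headers,
            "NServiceBus.FailedQ": failed_queue,
            "NServiceBus.ExceptionInfo.ExceptionType": exception_type}
    if fits(full, body):
        error_queue.append((full, body))
        return "full"
    # Fallback 1: unwrap and rewrap with only the two essential headers.
    # FailedQ is required by ServiceControl; ExceptionType enables grouping.
    minimal = {"NServiceBus.FailedQ": failed_queue,
               "NServiceBus.ExceptionInfo.ExceptionType": exception_type}
    if fits(minimal, body):
        error_queue.append((minimal, body))
        return "minimal"
    # Fallback 2: last resort, send with no headers at all.
    error_queue.append(({}, body))
    return "bare"

errors = []
mode = send_to_error_queue(errors,
                           {"NServiceBus.EnclosedMessageTypes": "T" * 2000},
                           "x" * 65_300,
                           "System.InvalidOperationException",
                           "my-endpoint")
print(mode)  # the near-limit body forces the minimal-header fallback
```

The key design point is that each fallback is strictly smaller than the previous one, so the chain always terminates with a message that can be sent.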
Describe alternatives you've considered
Some sort of oversized-message handler like we once had in ASB?
Native data bus support inside the transport, similar to what SQS does with S3
Log a warning/error and at least use the client-side native delivery to move the message with backoff, giving users awareness of the problem
Eventually back up the message somehow and ack it?
Allow specifying a DLQ for the endpoint and move the message there (my favorite)
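Until one of the above lands, the manual remediation mentioned in the description can at least be scripted. The sketch below simulates it against an in-memory deque; with a real queue you would use the SDK's receive, delete-by-pop-receipt, and send operations instead, and the poison predicate shown here (size-based) is just one example:

```python
from collections import deque

def drain_poison(queue: deque, is_poison) -> list:
    """Receive every message, drop the poison ones, and re-enqueue the
    rest in their original order. Returns the removed poison messages."""
    removed = []
    kept = []
    while queue:
        msg = queue.popleft()      # receive (makes the message invisible)
        if is_poison(msg):
            removed.append(msg)    # delete it via its pop receipt
        else:
            kept.append(msg)
    queue.extend(kept)             # re-enqueue the healthy messages
    return removed

q = deque(["ok-1", "x" * 70_000, "ok-2"])
poison = drain_poison(q, lambda m: len(m) > 64 * 1024)
print(list(q))  # ['ok-1', 'ok-2']
```

Note that on a real queue, re-enqueuing assigns new message IDs and resets dequeue counts, which is part of why this workaround is painful and a built-in solution is preferable.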
soujay changed the title from "Improve handling of messages that are close to the max body limit" to "Prevent indefinite retries when messages fail that are close to the max body limit" on Jun 6, 2023
soujay changed the title from "Prevent indefinite retries when messages fail that are close to the max body limit" to "Failed messages that are close to the max body limit are retried endlessly" on Jun 7, 2023