This is a tricky situation, and we might not be able to improve it fully. It happens when we receive a message close to the ASQ message size limit of 64 KB. When we receive such a large message and send it to the audit queue, or to the error queue due to an exception, we enrich it with additional headers. Once that happens, the message exceeds the body size limit and is retried indefinitely. Because there is no delay, and the message cannot be handled by the delayed-delivery infrastructure either, it ends up churning away significant endpoint resources and might even cause the endpoint to stall completely.
We already have solutions like the data bus, but the problem is that once such a message is stuck, recovery is painful: you have to manually receive messages from the queue until you reach the poison message, delete it with the corresponding pop receipt, and re-enqueue all the other messages.
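To illustrate the failure mode, here is a small Python sketch. The 64 KB limit is the documented Azure Storage Queue maximum; the envelope below is a simplified stand-in for the real NServiceBus wrapper (which, with any Base64 encoding the SDK applies, only adds more overhead), and the enrichment header names are illustrative:

```python
import json

ASQ_MAX_MESSAGE_BYTES = 64 * 1024  # 65,536-byte limit per Azure Storage Queue message

def envelope_size(body: str, headers: dict) -> int:
    # Simplified stand-in for the NServiceBus message envelope; the real
    # wrapper and transport encoding only add further overhead.
    return len(json.dumps({"Headers": headers, "Body": body}).encode("utf-8"))

headers = {"NServiceBus.MessageId": "a" * 36}
body = "x" * 65_000  # fits under the limit with only the original headers

assert envelope_size(body, headers) <= ASQ_MAX_MESSAGE_BYTES

# Failure-path enrichment adds headers (names illustrative of the real ones):
enriched = {
    **headers,
    "NServiceBus.ExceptionInfo.StackTrace": "at Handler.Handle(...)\n" * 30,
    "NServiceBus.ExceptionInfo.ExceptionType": "System.InvalidOperationException",
    "NServiceBus.FailedQ": "my-endpoint",
}
# The enriched envelope no longer fits, so the send to the error queue fails
# and the receive is retried without delay:
print(envelope_size(body, enriched) > ASQ_MAX_MESSAGE_BYTES)
```

The point is that the enrichment happens after the sender has already validated the original message against the limit, so the overflow only surfaces on the failure path.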
Expected behavior
The message is sent to the correct queue without being indefinitely retried.
Actual behavior
As the message is decorated with headers, thereby pushing it over the message size limit, it gets retried indefinitely. This consumes significant resources.
Versions
12.0.0, 11.0.0, 10.0.4
Steps to reproduce
Send a very large message (very close to the ASQ message size limit)
Throw an exception so that the message is decorated with additional headers and sent to the error queue
Notice that the message does not reach the error queue but enters a loop of indefinite retries.
Relevant log output
None
Additional information
Describe the suggested solution
Azure Storage Queue messages have a size limit of 64 KB. When a message very close to this limit is received and sent to the audit queue or error queue, it is enriched with more headers. This pushes the message over the Azure Storage Queue size limit, and the message ends up being retried indefinitely. The suggested solution: when the attempt to send such a message to the error queue fails, unwrap the message and rewrap it with minimal headers, FailedQ and ExceptionType. The FailedQ header is required by ServiceControl; without it, the message ends up as a failed error import. The ExceptionType header is highly desirable because it enables grouping; the failure-group view by exception type is the default failed-message view in ServicePulse, which is the common way users interact with failed messages. If this attempt also fails, the message is sent to the error queue without any headers.
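The suggested fallback chain can be sketched as follows. This is Python pseudocode against an in-memory list standing in for the error queue; the real implementation would live in the transport, and the size check is the same simplified envelope accounting as above. The header names follow the NServiceBus conventions mentioned in the description:

```python
import json

MAX_MESSAGE_BYTES = 64 * 1024

def fits(headers: dict, body: str) -> bool:
    # Simplified envelope-size check; the real wrapper adds more overhead.
    return len(json.dumps({"Headers": headers, "Body": body}).encode("utf-8")) <= MAX_MESSAGE_BYTES

def send_to_error_queue(error_queue: list, headers: dict, body: str,
                        exception_type: str, failed_queue: str) -> str:
    """Try progressively smaller header sets so a near-limit message still
    reaches the error queue instead of being retried forever."""
    full = {**headers,
            "NServiceBus.FailedQ": failed_queue,
            "NServiceBus.ExceptionInfo.ExceptionType": exception_type}
    if fits(full, body):
        error_queue.append((full, body))
        return "full"
    # Fallback 1: unwrap and rewrap with only the two essential headers.
    # FailedQ is required by ServiceControl; ExceptionType enables grouping.
    minimal = {"NServiceBus.FailedQ": failed_queue,
               "NServiceBus.ExceptionInfo.ExceptionType": exception_type}
    if fits(minimal, body):
        error_queue.append((minimal, body))
        return "minimal"
    # Fallback 2: last resort, send with no headers at all.
    error_queue.append(({}, body))
    return "bare"

errors = []
mode = send_to_error_queue(errors,
                           {"NServiceBus.EnclosedMessageTypes": "T" * 2000},
                           "x" * 65_300,
                           "System.InvalidOperationException",
                           "my-endpoint")
print(mode)  # the near-limit body forces the minimal-header fallback
```

The key design point is that each fallback is strictly smaller than the previous one, so the chain always terminates with a message that can be sent.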
Describe alternatives you've considered
Some sort of oversized-message handler like we once had in ASB?
Native data bus support inside the transport, similar to what SQS does with S3
Log a warning/error and at least use the client-side native delivery to move the message with backoff, giving users awareness of the problem
Eventually back up the message somehow and ack it?
Allow specifying a DLQ for the endpoint and move the message there (my favorite)
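Until one of the above lands, the manual remediation mentioned in the description can at least be scripted. The sketch below simulates it against an in-memory deque; with a real queue you would use the SDK's receive, delete-by-pop-receipt, and send operations instead, and the poison predicate shown here (size-based) is just one example:

```python
from collections import deque

def drain_poison(queue: deque, is_poison) -> list:
    """Receive every message, drop the poison ones, and re-enqueue the
    rest in their original order. Returns the removed poison messages."""
    removed = []
    kept = []
    while queue:
        msg = queue.popleft()      # receive (makes the message invisible)
        if is_poison(msg):
            removed.append(msg)    # delete it via its pop receipt
        else:
            kept.append(msg)
    queue.extend(kept)             # re-enqueue the healthy messages
    return removed

q = deque(["ok-1", "x" * 70_000, "ok-2"])
poison = drain_poison(q, lambda m: len(m) > 64 * 1024)
print(list(q))  # ['ok-1', 'ok-2']
```

Note that on a real queue, re-enqueuing assigns new message IDs and resets dequeue counts, which is part of why this workaround is painful and a built-in solution is preferable.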
soujay changed the title from "Improve handling of messages that are close to the max body limit" to "Prevent indefinite retries when messages fail that are close to the max body limit" on Jun 6, 2023
soujay changed the title from "Prevent indefinite retries when messages fail that are close to the max body limit" to "Failed messages that are close to the max body limit are retried endlessly" on Jun 7, 2023