[*ERS] AMQP Prefetch, Acknowledgement Deadlock, Reconnect #4160

Integration-IT · 2023-10-21T10:22:36Z

This is related to #4146 (basic.ack exception).
The last patch is working fine.

Some new behaviors have appeared.

*amqp_json_map has no maximum prefetch count to limit the number of unacknowledged messages on a channel (or connection) when consuming.
- This is normally done by setting a "prefetch count" value with the basic.qos method.
After a while, messages pile up in the Unacked state, similar to a deadlock, and I can no longer see any connections from ERS.
If a consumer does not ack its delivery for more than the timeout value (30 minutes by default), its channel will be closed with a PRECONDITION_FAILED channel exception.
Once the channel is closed, the ERS module never reconnects to subscribe and consume new ready messages.
In the end I have to restart the engine manually.
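
To make the prefetch request concrete, here is a minimal sketch of applying a prefetch count before consuming. It is an illustration only, not the ERS code: `qosSetter`, `applyPrefetch`, and `fakeChannel` are hypothetical names. The `Qos` method has the same shape as `(*amqp.Channel).Qos` in github.com/rabbitmq/amqp091-go, so a real channel from that library should satisfy the interface; the fake is only there so the sketch runs without a broker.

```go
package main

import (
	"errors"
	"fmt"
)

// qosSetter is the subset of an AMQP channel this sketch needs (hypothetical
// interface). The method mirrors (*amqp.Channel).Qos from amqp091-go.
type qosSetter interface {
	Qos(prefetchCount, prefetchSize int, global bool) error
}

// applyPrefetch issues basic.qos so the broker stops delivering once
// `prefetch` messages are unacknowledged. A prefetch of 0 means "no limit".
func applyPrefetch(ch qosSetter, prefetch int) error {
	if prefetch < 0 {
		return errors.New("prefetch count must not be negative")
	}
	// prefetchSize 0: no byte limit. global false: in RabbitMQ the limit
	// then applies to each consumer rather than to the whole channel.
	return ch.Qos(prefetch, 0, false)
}

// fakeChannel records the last Qos call; it stands in for a real channel.
type fakeChannel struct {
	count, size int
	global      bool
}

func (f *fakeChannel) Qos(prefetchCount, prefetchSize int, global bool) error {
	f.count, f.size, f.global = prefetchCount, prefetchSize, global
	return nil
}

func main() {
	ch := &fakeChannel{}
	if err := applyPrefetch(ch, 200); err != nil {
		fmt.Println("qos failed:", err)
		return
	}
	fmt.Println("prefetch count requested:", ch.count)
}
```

With a limit in place, a stalled consumer can hold at most that many Unacked messages instead of draining the whole queue into its buffer.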

About the last patch:

The reader is now closed immediately if the message delivery
channel closes. Therefore, it prevents an endless loop by avoiding
continuous consumption from empty or closed channels.

Since the queue fills up again after a while, could the module reconnect automatically, with a timer option, to consume new ready messages?

Implemented functionality that handles reconnecting to the AMQP server and reinitializing the AMQP channel in case of errors and timeouts. This is handled by a goroutine created in the client constructor (it also handles the initial connect/init).

Reconnects and reinits use a Fibonacci backoff strategy; the number of attempts and the maximum waiting interval can be adjusted with the 'reconnects' and 'max_reconnect_interval' config options.

Messages that fail processing are now dropped instead of being requeued, preventing infinite processing loops. However, this means that those messages are lost. Handling failed messages will need to be addressed separately.

'concurrent_requests' now sets the prefetch count. Setting the prefetch count through the Qos function replaces the old approach that used channels. The default value is 1024 which, according to the RabbitMQ docs, 'runs into the law of diminishing returns'. The recommended value is between 100 and 300. Source: https://www.rabbitmq.com/confirms.html#channel-qos-prefetch-throughput

Fix test compilation errors caused by these changes.

References cgrates#4160


ionutboangiu · 2023-11-09T10:30:24Z

Hello,

The feedback was much appreciated. The commit referenced throughout this issue added reconnect support for the AMQP event reader, among other things detailed in the commit message. You can now update to the latest master and test it. I hope it addresses all your concerns. Let me know if you have any other suggestions.

Thanks,
Ionuț

Integration-IT · 2023-12-17T19:49:11Z

Thanks @ionutboangiu,
No issue after the update.

danbogos assigned ionutboangiu Oct 26, 2023

Integration-IT closed this as completed Dec 17, 2023
