[*ERS] AMQP Prefetch, Acknowledgement Deadlock, Reconnect #4160

Closed
Integration-IT opened this issue Oct 21, 2023 · 2 comments

@Integration-IT

Hi @ionutboangiu ,

This is related to #4146 (basic.ack exception).
The last patch is working fine.

Some new behaviors seem to have appeared.

  • *amqp_json_map has no maximum prefetch count to limit the number of unacknowledged messages on a channel (or connection) when consuming.
    • This limit is set by passing a "prefetch count" value to the basic.qos method.
  • After a while, I see unacked messages stuck in a pending state that resembles a deadlock, and I can't see any connections from ERS.
  • If a consumer does not ack a delivery for longer than the timeout value (30 minutes by default), its channel is closed with a PRECONDITION_FAILED channel exception.
  • Once the channel is closed, the ERS module never reconnects to subscribe and consume newly ready messages.
  • In the end, I have to restart the engine manually.

About the last patch:

The reader is now closed immediately if the message delivery
channel closes. Therefore, it prevents an endless loop by avoiding
continuous consumption from empty or closed channels.

Since the queue fills up again after a while, could the module reconnect automatically (with a timer option) to consume newly ready messages?

ionutboangiu added a commit to ionutboangiu/cgrates that referenced this issue Nov 6, 2023
Implemented functionality that handles reconnecting to the amqp server
and reinitializing the amqp channel in case of errors and timeouts. This
is handled by a goroutine created in the client constructor (it also
handles the initial connect/init).

Reconnects and reinits will use a fibonacci backoff strategy, and the
attempt amount and max waiting interval can be adjusted by the
'reconnects' and 'max_reconnect_interval' config options.

Messages that fail processing are now dropped instead of being requeued,
preventing infinite processing loops. However, this means that the
messages are lost. Handling failed messages will need to be addressed
separately.

'concurrent_requests' will now set the prefetch count. Setting the
prefetch count using the Qos function was able to replace our old
approach that was using channels. Default value is 1024 which,
according to the RabbitMQ docs, 'runs into the law of diminishing
returns'. The recommended value is between 100 and 300. Source:
https://www.rabbitmq.com/confirms.html#channel-qos-prefetch-throughput
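
To illustrate how a concurrency setting can map onto the prefetch count, here is a hedged sketch rather than the actual ERS code. `qosSetter`, `applyPrefetch`, and `fakeChannel` are hypothetical names; the `Qos` signature follows `Channel.Qos` in github.com/rabbitmq/amqp091-go, and treating negative values as "no limit" is an assumption of this example.

```go
package main

import "fmt"

// qosSetter is the single channel method this sketch relies on.
type qosSetter interface {
	Qos(prefetchCount, prefetchSize int, global bool) error
}

// applyPrefetch limits unacknowledged deliveries to concurrentRequests.
func applyPrefetch(ch qosSetter, concurrentRequests int) error {
	if concurrentRequests < 0 {
		// Assumption: negative means unlimited. In AMQP 0-9-1 a
		// prefetch count of 0 means "no limit".
		concurrentRequests = 0
	}
	// prefetchSize 0 sets no byte limit; global=false applies the
	// limit per consumer under RabbitMQ semantics.
	if err := ch.Qos(concurrentRequests, 0, false); err != nil {
		return fmt.Errorf("setting prefetch count: %w", err)
	}
	return nil
}

// fakeChannel records the arguments of the last Qos call.
type fakeChannel struct {
	count, size int
	global      bool
}

func (f *fakeChannel) Qos(count, size int, global bool) error {
	f.count, f.size, f.global = count, size, global
	return nil
}

func main() {
	ch := &fakeChannel{}
	if err := applyPrefetch(ch, 200); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("prefetch count=%d size=%d global=%t\n", ch.count, ch.size, ch.global)
}
```

Because the broker stops delivering once the prefetch count of unacked messages is reached, the limit bounds in-flight work without a separate semaphore built from Go channels.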

Fix test compilation errors caused by these changes.

References cgrates#4160
ionutboangiu added a commit to ionutboangiu/cgrates that referenced this issue Nov 6, 2023
ionutboangiu added a commit to ionutboangiu/cgrates that referenced this issue Nov 8, 2023
danbogos pushed a commit that referenced this issue Nov 8, 2023
@ionutboangiu
Collaborator

Hello,

The feedback was much appreciated. The commit referenced throughout this issue added reconnect support for the AMQP event reader among other things detailed in the commit message. You can now update to latest master and test it. I hope it addressed all your concerns. Let me know if you have any other suggestions.

Thanks,
Ionuț

@Integration-IT
Author

Thanks @ionutboangiu,
No issues after the update.
