Change the messenger queue layout #461
The new queue system seems to work fine ("ready" in the RabbitMQ graph is indeed misleading and might need to be documented correctly in your Mbin docs). We also need to make a special release out of this Mbin release, since it's not a simple post-upgrade release. More than only checking "Admin guide changes" in our release table...
This comment was marked as off-topic.
About the queue size: yes, it is annoying that the retried messages with their delay count as "ready messages". I set the delays as follows: 0 -> 10 min -> 1h 40min -> 16h 40min -> ~1 week -> 1 week. So a message that is constantly returning a timeout, for example, will stay in the "ready messages" for about 2 weeks and one day. I do think that this is reasonable. It does not hurt to have the messages stay there and wait a long time to be retried. As for the huge publish rates: those come from a circulating message that is constantly dispatching itself again. I'd be surprised if your publish rate has gone down after reverting back to main, as I do not see a reason why it should...
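The delay progression above can be sketched in a few lines. This is a minimal sketch, assuming the numbers stated in this thread (10 min base delay, multiplier 10, 5 retries, delay capped at 1 week); it is not Mbin's actual implementation:

```python
# Sketch of the retry-delay progression described above.
# Assumptions: base delay 10 min, multiplier 10, 5 retries, 1-week cap.
WEEK_MINUTES = 7 * 24 * 60  # 10080

def retry_delays(base=10, multiplier=10, retries=5, cap=WEEK_MINUTES):
    """Return the per-retry delays in minutes, applying the cap."""
    delays, delay = [], base
    for _ in range(retries):
        delays.append(min(delay, cap))
        delay *= multiplier
    return delays

delays = retry_delays()
print(delays)                    # [10, 100, 1000, 10000, 10080]
print(sum(delays) / (24 * 60))   # ~14.7 days: "about 2 weeks and one day"
```

The last computed delay (100000 min) exceeds the cap, which is why the sequence ends with two roughly one-week waits.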
I agree, as long as it makes sense for errors which may get resolved over time. But errors that are fatal, where you know a retry will not help, should not enter this queue IMO. Is there a cronjob way to clean up the `dead` queue?
Fatal errors can now really use the `UnrecoverableMessageException` and will end up in the `dead` queue.
Yes. It is in the SQL table, and there is `DELETE FROM messenger_messages WHERE queue_name='dead';`
I'm OK with the changes as long as you document them. Extend, for example, https://github.com/MbinOrg/mbin/blob/main/FAQ.md#how-to-clean-up-all-failed-messages with your SQL query. But I was actually hoping for some kind of automated clean-up without running this query manually. Also try to explain in the admin / Docker / FAQ guide what the expected behavior now is (with retries and a higher number of messages 'ready'), so we can point admins to this kind of docs. TL;DR: Explain your changes to server admins, so we do not get overwhelmed with support. Please :)
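The automated clean-up being asked for could be done with a plain cron job running that query. A sketch, assuming a PostgreSQL database named `mbin` reachable via `psql` (database name, user, and connection details are placeholders, not Mbin's actual setup):

```
# Hypothetical crontab entry: purge the dead queue every Sunday at 04:00.
# Adjust user, database name, and connection details for your own setup.
0 4 * * 0  psql -U mbin -d mbin -c "DELETE FROM messenger_messages WHERE queue_name='dead';"
```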
Agree with @melroy89's comments, and we need to put big RED letters about updating the supervisor config on the next release. Otherwise, looks good! |
a04ab7c
LGTM. But we need to add a special notice on the release page... This is a change for server admins.
Have you updated the supervisor config and restarted all the processes? It looks to me like your messengers are constantly crashing. Supervisor should not be at the top in htop 😄
It turns out the problem was my config being set to /mbin/ while my setup still uses /kbin.
This change should improve the message handling. Messages are being retried in case of a temporary error, the `failed` queue is being worked on, and there is a new `dead` queue that stores dead messages and can probably be cleaned on a regular basis.

The normal message handler will try 5 times before sending a message to the `failed` queue. It starts with a 10 min delay and multiplies it by 10 each time the message fails: 0 -> 10 min -> 1h 40min -> 16h 40min -> ~1 week -> 1 week. The delay is capped at 1 week.
The `failed` queue keeps the old delay of 30 minutes and the multiplier of 2. We may want to change that; the numbers are arbitrary, I just copied them from the old settings. If a message fails in the `failed` queue, it goes to the `dead` queue.
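The retry behavior described above maps onto Symfony Messenger's `retry_strategy` transport options. A sketch of what such a config could look like, purely for illustration (transport names, DSNs, and the per-transport `failure_transport` wiring here are assumptions, not Mbin's actual configuration):

```yaml
# Hypothetical messenger.yaml sketch matching the description above.
framework:
    messenger:
        failure_transport: failed
        transports:
            async:
                dsn: '%env(MESSENGER_TRANSPORT_DSN)%'
                retry_strategy:
                    max_retries: 5
                    delay: 600000         # first retry after 10 min (ms)
                    multiplier: 10        # 10 min -> 1h40 -> 16h40 -> ~1 week
                    max_delay: 604800000  # cap at 1 week (ms)
            failed:
                dsn: 'doctrine://default?queue_name=failed'
                failure_transport: dead   # failed failures land in "dead"
                retry_strategy:
                    delay: 1800000        # the old 30 min delay (ms)
                    multiplier: 2
            dead:
                dsn: 'doctrine://default?queue_name=dead'
```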
The messenger supervisor config changed, so there is just one worker that handles all the queues with descending priority. That means the normal `async` queue is worked on first, then `async_ap`, and then the `failed` queue. The Docker Compose setup is changed accordingly.
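The single-worker layout described above corresponds to passing all receivers to one `messenger:consume` invocation, which Symfony consumes in the order given. A hypothetical supervisor sketch (paths, user, and process count are placeholders, not the actual shipped config):

```ini
; Hypothetical supervisor program entry.
; Receivers are consumed in priority order: async, then async_ap, then failed.
[program:messenger]
command=php /var/www/mbin/bin/console messenger:consume async async_ap failed --time-limit=3600
user=www-data
numprocs=2
autostart=true
autorestart=true
process_name=%(program_name)s_%(process_num)02d
```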