-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
perf(clean): use ZRANGEBYSCORE to improve performance #2363
Conversation
Thanks for the PR. |
@manast I think the PR #2326 only sped up the cases where the cleared job queue was a list. We're clearing the "completed" queue, which isn't a list, so that optimization doesn't apply.
This change allows those scripts to exit more quickly by taking advantage of the fact that the jobs are ordered by creation time. If we encounter one job that's after Let me know if you want me to add more comments in the code! Happy to do that too. |
When you use limit in the sets case this is the method used:
So the range will be limited to the "limit" parameter you send to the command, there should be no issue as long as this limit is small enough, maybe between the order of magnitudes 10k to 100k. If this is not the case then, could you provide some code that shows the contrary it please? |
Got it. Here's the specific example that I dealt with. I debugged this by manually running the redis-cli --ldb --eval node_modules/bull/lib/commands/cleanJobsInSet-3.lua bull:sendEmail:completed bull:sendEmail:priority bull:sendEmail:limiter , bull:sendEmail: 1653036781370 true 100 completed To test it without the fix, just change the "true" to "false" in that command. This is equivalent to the Javascript code: queue.clean(7 * 24 * 60 * 60 * 1000, 'completed', 100); With even this very small limit, here's what happens when all of the jobs are after
I verified this is the behavior by adding This behavior does happen even with a larger limit and slow down the system, although it's somewhat faster since the batches are larger. I would consider removing the new parameter and just making |
Ok, so to sumarize, the problem is when you want to delete completed/failed jobs having a grace period that happens to have a huge amount of jobs in that grace period.
|
Here the ZRANGE reference: |
Got it. @manast - are you sure you'd like to use ZRANGE (or ZRANGEBYSCORE, for backwards compatibility)? I can push a new PR. It's slightly non-backwards compatible, as the score is the creation time, and the current limiting is done by any timestamp (finishedOn and processedOn take precedence). Also, just to be clear: this PR is for the old |
The score for completed/failed jobs should be the same as finishedOn. For delayed jobs it represents the time left to be delayed, I do not think it makes any big difference in which order the delayed jobs are deleted. |
@manast I changed this to use the suggested solution. Take a look! |
@manast - if you have some time, can you take a look and see if this is ready to merge? We'd love to switch off our fork and be able to just install the normal package again. I implemented all your recommended changes and added much better tests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
## [4.8.5](v4.8.4...v4.8.5) (2022-07-27) ### Performance Improvements * **clean:** use ZRANGEBYSCORE to improve performance ([#2363](#2363)) ([3331188](3331188))
🎉 This PR is included in version 4.8.5 🎉 The release is available on: Your semantic-release bot 📦🚀 |
Motivation
When running
clean
on large queues, it can take a long time when tons of entries are within the grace period. The cleanJobsInSet Lua script never reaches the limit and endlessly iterates through items. Since redis scripts are atomic, this can lock up the rest of the server.Fix
For sorted sets, the timestamp is just the score. As such, if there's a limit, we can just immediately get all the relevant ones using
ZRANGEBYSCORE
in one iteration that also filters out the timestamps within the grace period.Testing
Added automated tests