Revoked tasks storage & communication improvements #6075
Comments
Note: I have opened a separate issue regarding what appears to be a bug in the revoke store (`LimitedSet`). It's a bug in terms of the documentation (the original PR has comments which suggest the data logic was implemented in such a way as to continue to be performant). I'm unsure if the performance issues are still relevant in py3 (or if other choices exist that we could switch to).
Would be glad if you can push it forward.
Small update. I've added a set of custom control commands to my project (including purging & clearing) for revoked info on workers. I don't know that I've run it against the 500k default, but I have certainly run it against a set somewhere on the higher range. It seems fast! I was also looking into leveraging `sortedcollections`, but still, both were quite fast for add/update/purge. I purged all values, and ran another 500k onto already-filled data. Those operations were fast as well. I may run some more benchmarks. So, mostly what I'm saying right now is that although I trust the implementer(s) on the current implementation, there may be room to improve.

Edit: On a 5M data-set add, it took 9s for ours, and 17s for my new implementation.
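The kind of add/purge benchmark described above can be sketched roughly like this. This is a hypothetical stand-in, not Celery's actual store (`celery.utils.collections.LimitedSet`), and the class name `SimpleRevokedStore`, the `maxlen` default, and the timings are all illustrative assumptions:

```python
import time
from collections import OrderedDict


class SimpleRevokedStore:
    """Minimal stand-in for a bounded revoked-task store.

    Hypothetical sketch; Celery's real structure is
    celery.utils.collections.LimitedSet, which also tracks
    per-entry timestamps and supports lazy purging.
    """

    def __init__(self, maxlen=500_000):
        self.maxlen = maxlen
        self._data = OrderedDict()  # task_id -> insertion time

    def add(self, task_id):
        # Re-adding moves the entry to the "newest" end.
        self._data.pop(task_id, None)
        self._data[task_id] = time.monotonic()
        # Evict oldest entries once over the cap (non-lazy purge).
        while len(self._data) > self.maxlen:
            self._data.popitem(last=False)

    def __len__(self):
        return len(self._data)

    def __contains__(self, task_id):
        return task_id in self._data


# Rough timing of 1M adds into a store capped at the 500k default:
store = SimpleRevokedStore()
start = time.monotonic()
for i in range(1_000_000):
    store.add(f"task-{i}")
elapsed = time.monotonic() - start
print(f"{len(store)} entries retained, {elapsed:.2f}s for 1M adds")
```

Only the newest 500k IDs survive the eviction, which is the behavior a non-lazy purge would enforce continuously.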
I would say you should keep using and improving it. Later, optional Cythonization can be applied if needed.
@auvipy thank you for reading through so far! I do concur re: adding non-lazy purging options and/or Cythonization/other speed-ups; I was hoping that we could simply improve the speed even way above 500k, to be able to do non-lazy purging, to get around some of the storage & communication issues.

But, knowing that most projects probably won't see/use my GitHub gist for purge_revoked, we still have a process & tech issue (especially for Redis, but probably for RabbitMQ, too), with all of this data flying through the broker on worker(s) startup. For instance, we could introduce a plugin for an alternative "persistent" revoked backend (not using the current one). Right now that feels to me like a much neater & more scalable approach, just in terms of reducing the issues around startup storage & communication.

I am interested in the leader election/framework, but I don't really know much about it. I'd like to learn more. I have looked through some of the docs but haven't necessarily found what I'm looking for -- if you happen to know/wish to elaborate on the leader system, please do/send over doc/code links. I do think it could be really cool/useful to leverage it, if it's an appropriate solution.
I have some sparse knowledge of them but will review and come back with a compiled collection. I need some time, though, as I have a backlog to clear in Celery-related projects. In this time of global crisis, I can dedicate way more time to Celery than before, and this will be true for the next few months, I guess. But you shouldn't stop your trial-and-error R&D on these topics.
You can check this issue to have some insight: #6067
Pinging this because it would be amazing to have a redis option for statedb. Been googling for a bit and found this issue. |
we need champions for the effort |
Checklist
- I have checked the issues list for similar or identical enhancements to an existing feature.
- I have checked the pull requests list for existing proposed enhancements.
- I have checked the commit log to find out if the same enhancement was already implemented in the master branch.
- (If there are none, check this box anyway).
Related Issues and Possible Duplicates
Related Issues
Possible Duplicates
Brief Summary
Part of this is a question about the current implementation around revoked task storage and communication, and then depending on the discussion, possible improvements (including documentation).
This conversation relates mostly to the Redis broker (for brevity & consistency, the implementation discussion leans on it), but I imagine the same issue can be seen on other brokers.
When a worker starts up, it asks (via `mingle`) all of the workers in the system for their revoked task lists. It also sends its own in that message. It then receives the revoked task lists from all workers that were able to respond, and merges those lists one by one into its own.

For a fairly healthy Celery cluster, this means that each worker sends its revoked tasks to all other workers (if persistent mode is disabled there will be no revoked tasks yet), and it also then receives what will likely be 99% identical revoked task lists from all other workers in the cluster (other than the ones that are just coming up, like this worker).
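Conceptually, the startup exchange above boils down to an all-to-all merge. This is a simplified sketch, not Celery's actual `Mingle` code; a plain set union stands in for the `LimitedSet` merge, and the function and variable names are hypothetical:

```python
def mingle_startup(my_revoked, peer_replies):
    """Merge revoked-task IDs received from every peer into our own set.

    peer_replies: mapping of worker hostname -> set of revoked task IDs,
    i.e. what each peer sent back in its mingle reply.
    """
    for hostname, their_revoked in peer_replies.items():
        # In a healthy cluster these sets are ~99% identical,
        # so most of this merging is redundant work.
        my_revoked |= their_revoked
    return my_revoked


mine = {"t1", "t2"}
replies = {
    "worker1@host": {"t1", "t2", "t3"},
    "worker2@host": {"t2", "t3"},
}
merged = mingle_startup(mine, replies)
# merged now contains t1, t2, t3
```

The cost is one near-complete copy of the revoked list per peer, all moving through the broker at roughly the same moment.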
If, for instance, you have the default 500k revoked-tasks list fully saturated (or really, just any large number), that is a lot of messages and data flying through the system at about the same time. In Redis' case, depending on the number of existing workers in the cluster (whether they're being killed or not), the number of new workers coming up, and how many revoked tasks are generally in each [existing and fully started-up] worker, if the Redis instance isn't "large" (under several GB), we risk a Redis OOM error (my company has seen this many times, and now I'm working through extensive debugging, tests, and improvements). This can cripple a Celery cluster, and any other process that relies on Celery or uses the same broker.
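A back-of-the-envelope calculation makes the scale of the problem concrete. All figures below except the 500k default are assumed cluster parameters, chosen only for illustration:

```python
# Back-of-the-envelope broker load at startup (assumed figures).
revoked_per_worker = 500_000       # default max revoked-list size
task_id_bytes = 36                 # e.g. a UUID4 string
workers = 50                       # hypothetical cluster size
new_workers = 10                   # workers starting at the same time

one_list = revoked_per_worker * task_id_bytes        # bytes per full list

# Each new worker broadcasts its own list AND receives one reply per peer:
in_flight = new_workers * (one_list + workers * one_list)

print(f"one list  ~ {one_list / 1e6:.0f} MB")
print(f"in flight ~ {in_flight / 1e9:.1f} GB")
```

With these assumptions a single saturated list is about 18 MB of task-ID payload alone (before serialization overhead), and a rolling restart of 10 workers in a 50-worker cluster pushes on the order of 9 GB through the broker, which is more than enough to OOM a modestly sized Redis instance.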
So, part of this is a question in implementation, and the other part is, assuming we wish to make alternative implementations, the discussion of those.
Design
Architectural Considerations
None
I don't think persistent mode should be discussed here (at least in its current standard implementation); I think (and can verify, of course) that it still does the hello/revoked & clock-sync messaging at startup (it may have revoked IDs in this case).
If the general assumption is that a healthy cluster will have each worker sharing the same or very similar revoked task IDs (and this may be considered a non-starter), then as above, we're doing a lot of work for very little (if any) gain.
Celery has leader election. Could we instead use the leader to be the process (or let it delegate a random worker on the fly) that responds back with its revoked task IDs?
We would still have each worker hold its local copy of the revoked task IDs, but the transmission/communication via `mingle` will shrink in complexity. That's assuming at least 2 things:
Another approach for storing these IDs (and their times), would be to allow having it support different backends (e.g. redis, dynamodb, elasticsearch, etc), and so the IDs are stored there, and TTL/expiration (and probably also max len) could be supported the same as ever -- and better than celery currently does it.
Having it stored elsewhere would be generally much better in terms of data storage, and also would remove the need to keep the data locally. That said, data locality is not necessarily a bad thing, and so I could see on worker startup syncing against the backend, and when revokes go through, also send them to the backend. This should get rid of (I think?) the startup thundering herd issue.
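A pluggable backend along these lines could look roughly like the sketch below. The interface is entirely hypothetical (`RevokedBackend` is not part of Celery); the in-memory implementation mimics what, say, a Redis sorted set with the expiry timestamp as the score could provide, with TTL and max-length enforced at the store rather than in each worker:

```python
import time
from abc import ABC, abstractmethod


class RevokedBackend(ABC):
    """Hypothetical interface for an external revoked-task store."""

    @abstractmethod
    def add(self, task_id: str) -> None: ...

    @abstractmethod
    def is_revoked(self, task_id: str) -> bool: ...

    @abstractmethod
    def all_ids(self) -> set: ...


class InMemoryRevokedBackend(RevokedBackend):
    """Stand-in implementation with TTL + max-length, mirroring what a
    Redis sorted set (score = expiry timestamp) could provide."""

    def __init__(self, expires=10800.0, maxlen=500_000):
        self.expires = expires
        self.maxlen = maxlen
        self._entries = {}  # task_id -> expiry timestamp

    def _purge(self):
        now = time.monotonic()
        for t in [t for t, exp in self._entries.items() if exp <= now]:
            del self._entries[t]
        # Enforce maxlen by dropping the soonest-to-expire entries.
        while len(self._entries) > self.maxlen:
            del self._entries[min(self._entries, key=self._entries.get)]

    def add(self, task_id):
        self._entries[task_id] = time.monotonic() + self.expires
        self._purge()

    def is_revoked(self, task_id):
        self._purge()
        return task_id in self._entries

    def all_ids(self):
        self._purge()
        return set(self._entries)


# On startup a worker would sync once from the backend instead of
# exchanging full lists with every peer:
backend = InMemoryRevokedBackend(maxlen=3)
for t in ("t1", "t2", "t3", "t4"):
    backend.add(t)
assert not backend.is_revoked("t1")  # evicted by maxlen
assert backend.is_revoked("t4")
```

The design point is that revokes become writes to one shared store, so N workers starting at once cost N reads of one list instead of N² list exchanges through the broker.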
Yes, it's true projects should try to write their code and tasks in such a way that revokes aren't happening too often. That is a valiant goal -- reducing their bugs, changing their retry policies, etc. -- but the tooling should also try to improve itself if it can. This feels nearly like the tool is broken (it's not), but it has that side effect. Deleting pidbox queues is not a good policy either -- when workers die/stop listening to the broker because it responds with OOM, you then have orphaned pidbox [reply] queues just...sitting there. That's more of a non-AMQP issue, but it's a huge use case.
I may try to reduce some of the initial issue creation -- there's a lot of info here; but so far I haven't seen any "simple" solution to this, and there's a lot of context at play here.
Proposed Behavior
Proposed UI/UX
Diagrams
N/A
Alternatives
None