Discord update_all_groups/update_all_nicknames running out of retries #1064
Rate limiting in celery is a total PITA. I've looked into it before and the internet says there's no really good way. The nature of group updates prevents auth from knowing to queue and process them sequentially. The only way I could see this being more appropriately handled is if we routed all discord-related tasks to a single queue and rate-limited a single worker pulling from it. That would be a big documentation change but not too much code work. I just don't trust users to read the docs.
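For illustration, a minimal sketch of that single-queue idea, assuming hypothetical queue and module names (celery's `task_routes` glob routing is the real mechanism):

```python
# celeryconfig sketch: route every Discord-related task to one dedicated queue.
# The module path below is a placeholder for wherever the Discord service
# module actually registers its tasks.
task_routes = {
    'services.modules.discord.tasks.*': {'queue': 'discord'},
}

# A single single-threaded worker would then consume that queue, e.g.:
#   celery -A myauth worker -Q discord --concurrency=1
# so per-worker rate limits become effectively global for Discord tasks.
```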
What about running the updates as a single task? Or is that a similarly awkward change?
It's the "as a single task" part which is problematic. Whenever someone's group changes on auth it triggers a service group update. These service group updates don't apply to all users, only the user who had a group change. As auth can't predict when groups will change, it can't bundle these requests together to perform them as a single task. Unfortunately for large organizations the way we currently handle Discord is definitely not optimal.

A proposed solution has been to write our own "worker" of sorts based on an async discord client package that already handles rate limiting properly, and implement our own messaging broker to send the requests across. That's a lot of work but would probably fix it. I'd poked around at it a few months ago but work stalled when the XML API deadline got close.

I'm wondering if a different approach would be even better: instead of writing a web API to send requests across, why not create a celery "worker" that pulls from a normal queue (so I can skip this whole messaging broker problem) and knows to stall processing of specific types of tasks when it hits a rate limit? I'll have to do some more reading but I think that should be possible, and as I type this out I'm beginning to think that's the better answer.
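As a rough sketch of what "stall processing of specific types of tasks" could look like, assuming a cache shared by all workers (e.g. Redis behind Django's cache framework) and hypothetical `DiscordRateLimited`/`perform_discord_request` names:

```python
import time

from celery import shared_task
from django.core.cache import cache  # assumes a cache shared by all workers

RATELIMIT_KEY = 'discord_retry_after'  # hypothetical cache key


class DiscordRateLimited(Exception):
    """Hypothetical error from the HTTP layer carrying Discord's Retry-After."""

    def __init__(self, retry_after):
        self.retry_after = retry_after


@shared_task(bind=True, max_retries=None)
def discord_api_call(self, *args, **kwargs):
    # If an earlier call hit the rate limit, stall every Discord task until
    # the recorded cooldown expires instead of burning through retries.
    resume_at = cache.get(RATELIMIT_KEY) or 0
    delay = resume_at - time.time()
    if delay > 0:
        raise self.retry(countdown=delay)
    try:
        perform_discord_request(*args, **kwargs)  # placeholder HTTP call
    except DiscordRateLimited as exc:
        # Record the cooldown so other tasks stall instead of retrying blindly.
        cache.set(RATELIMIT_KEY, time.time() + exc.retry_after, exc.retry_after)
        raise self.retry(countdown=exc.retry_after)
```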
Why not switch to a task worker that can handle rate limiting properly instead of hacky work arounds? |
Celery can (almost) handle it properly through bootsteps. This sample code almost does what we want by scaling rate limits by number of workers. Unfortunately Discord has dynamic rate limits so this specific example wouldn't work for us, but the idea is there.
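For comparison, here is what the "divide a global budget by worker count" idea looks like without a custom bootstep, using celery's broadcast control API (`control.ping` and `control.rate_limit` are real; the task name and budget are placeholders). It shares the same weakness: a static budget can't follow Discord's dynamic limits.

```python
from celery import current_app


def scale_discord_rate_limit(global_rate_per_min=30):
    # Task.rate_limit is per worker, so dividing a global budget by the
    # number of live workers approximates a global limit.
    replies = current_app.control.ping(timeout=1.0) or []
    n_workers = max(len(replies), 1)
    per_worker = max(global_rate_per_min // n_workers, 1)
    current_app.control.rate_limit(
        'discord_api_call',  # hypothetical task name
        '{}/m'.format(per_worker),
    )
```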
Maybe I am missing something, but wouldn't this celery task option work if it was exposed at the task level? http://docs.celeryproject.org/en/latest/userguide/tasks.html

Task.rate_limit
If this is None no rate limit is in effect. If it is an integer or float, it is interpreted as "tasks per second". The rate limits can be specified in seconds, minutes or hours by appending "/s", "/m" or "/h" to the value. Tasks will be evenly distributed over the specified time frame. Example: "100/m" (hundred tasks a minute). This will enforce a minimum delay of 600ms between starting two tasks on the same worker instance.
Default is the task_default_rate_limit setting: if not specified means rate limiting for tasks is disabled by default.
Note that this is a per worker instance rate limit, and not a global rate limit. To enforce a global rate limit (e.g., for an API with a maximum number of requests per second), you must restrict to a given queue.
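For what it's worth, the final note in that quote is the catch: `rate_limit` is enforced per worker, so it only approximates a global Discord limit if exactly one worker consumes the queue. A sketch of the combination, with illustrative names and numbers:

```python
from celery import shared_task


# Only meaningful if a single worker consumes the 'discord' queue; with N
# workers the effective global rate becomes N x 100/m.
@shared_task(rate_limit='100/m', queue='discord')
def update_discord_nickname(user_id):
    ...  # call the Discord API for this one user
```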
I've noticed that, if you have a large number of Discord members, there are a lot of failures while this task is running (which strangely do not appear in the admin notifications).
This seems to happen because each nickname/group update is raised as its own task and they run in parallel. When the rate limit is reached, the tasks all receive the same backoff time and therefore all retry at the same time, which hits the rate limit again. This quickly causes the maximum number of retries to be exhausted.
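Given that diagnosis, one low-effort mitigation would be to jitter the retry countdown so the tasks stop retrying in lockstep; a sketch with hypothetical helper names (`DiscordRateLimited` carrying Discord's Retry-After):

```python
import random

from celery import shared_task


@shared_task(bind=True, max_retries=10)
def update_nickname(self, user_id):
    try:
        set_discord_nickname(user_id)  # placeholder for the real API call
    except DiscordRateLimited as exc:  # hypothetical rate-limit exception
        # Add random jitter so parallel tasks don't all wake up in the same
        # instant and trip the rate limit again.
        countdown = exc.retry_after + random.uniform(0, 30)
        raise self.retry(exc=exc, countdown=countdown)
```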
I don't know a huge amount about celery but this should be fixable by somehow getting these updates to run sequentially.
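A sketch of the sequential idea using celery's `chain` primitive (the primitive is real, the task names are placeholders), at the cost of one very long chain message for big member lists:

```python
from celery import chain


def queue_all_nickname_updates(user_ids):
    # Link the per-user updates so they run one after another, keeping a
    # single Discord request in flight at a time.
    chain(*(update_nickname.si(uid) for uid in user_ids)).delay()
```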