Pseudo-random jitter can cause deadlock #4316
Comments
* Add options for exponential backoff with task autoretry
* Add test for exponential backoff
* closer to a fixed test
* Move autoretry backoff functionality inside run wrapper
* Add a test for jitter
* Correct for semantics of `random.randrange()`: `random.randrange()` treats the argument it receives as just *outside* the bound of possible return values. For example, if you call `random.randrange(2)`, you might get 0 or 1, but you'll never get 2. Since we want to allow the `retry_jitter` parameter to occasionally apply no jitter at all, we need to add one to the value we pass to `randrange()`, so that there's a chance that we receive that original value back. (See the sketch below.)
* Put side_effect on patch lines
* Fix flake8
* Add `celery.utils.time.get_exponential_backoff_interval`
* Use exponential backoff calculation from utils in task
* Update docs around retry_jitter
* Remove unnecessary random.choice patching
* Update task auto-retry documentation
* PEP8: remove unused import
* PEP8: remove trailing whitespace
* PEP8: Fix E123 warning
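As a quick illustration of the `random.randrange()` point above (a minimal sketch, not Celery code; the `countdown` value is just an example):

```python
import random

# random.randrange(n) returns an integer in [0, n-1]; the upper bound is exclusive.
# Passing countdown + 1 therefore makes `countdown` itself a possible result,
# i.e. retry_jitter can occasionally apply no jitter at all.
countdown = 32
jittered = random.randrange(countdown + 1)  # any of 0, 1, ..., 32, including 32
```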
As far as I understand from the Python random module documentation, the majority of systems will support `os.urandom()`, so `random.seed()` will use OS-provided randomness rather than the system time.
I agree, this is a super duper edge case ;) I haven't run into it myself yet either, but if you run billions of tasks it might just happen.
Will reopen if the issue has a real effect.
Checklist
* I have included the output of `celery -A proj report` in the issue (if you are not able to do this, then at least specify the Celery version affected).
* I have verified that the issue exists against the `master` branch of Celery.

Versions affected: None (not yet released)
On master: Yes, added in 0d5b840
Steps to reproduce
`celery/celery/utils/time.py`, lines 390 to 404 in d59518f
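For reference, the helper referenced above looks roughly like this (a sketch reconstructed from the commit list above, not the exact snippet at those lines; argument names are assumptions):

```python
import random

def get_exponential_backoff_interval(factor, retries, maximum, full_jitter=False):
    # Exponential backoff: factor * 2 ** retries seconds.
    countdown = factor * (2 ** retries)
    if full_jitter:
        # "Full jitter": pick a uniform value in [0, countdown]; the + 1 makes
        # the upper bound inclusive (see the randrange note above).
        countdown = random.randrange(countdown + 1)
    # Never exceed the configured maximum, never go below zero.
    return max(0, min(maximum, countdown))
```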
Imagine you have two workers that pull the same task from a queue at the same time and both fail, due to a lock for example. When the task is retried with exponential backoff, the `jitter` is supposed to spread the retries apart so the two tasks are not executed at the same time again, which is what prevents the deadlock. But `random.seed` defaults to the current system time if no better randomness source is provided. This means that if two workers fail within the same millisecond, they can still end up deadlocked: the seed is the same, so the jitter is the same.
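A minimal sketch of the failure mode described above (two independent generators seeded with the same timestamp; the timestamp and countdown values are just examples):

```python
import random

# Two workers that fall back to seeding from the same system time
# produce exactly the same "random" jitter.
timestamp = 1510000000
worker_a = random.Random(timestamp)
worker_b = random.Random(timestamp)

countdown = 32
print(worker_a.randrange(countdown + 1) == worker_b.randrange(countdown + 1))  # True
```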
Expected behavior

In Python 3.6 we have the `secrets` module, which would prevent this. In Python 2 we have to find an alternative. If we have a unique identifier for a worker, we could mix it into the epoch to form a new, unique seed.
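One way this could look (a hedged sketch, not Celery's implementation; `random.SystemRandom` draws from `os.urandom()` and works on both Python 2 and 3, and the fallback seed that mixes in a worker-unique identifier is purely illustrative):

```python
import os
import random
import socket
import time

try:
    # Preferred: an OS-backed generator that ignores seeding entirely.
    # os.urandom() raises NotImplementedError on the rare platforms
    # without an OS randomness source, so force a first draw here.
    _rng = random.SystemRandom()
    _rng.random()
except NotImplementedError:
    # Fallback: mix a worker-unique identifier into the seed so two workers
    # that start retrying in the same millisecond still diverge.
    _rng = random.Random('{0}-{1}-{2}'.format(
        socket.gethostname(), os.getpid(), time.time()))

countdown = 32
jitter = _rng.randrange(countdown + 1)  # per-worker jitter in [0, countdown]
```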
Actual behavior
Two or more tasks could be deadlocked because of the pseudo-random jitter.