Successful tasks are not acked if acks_late=True on warm shutdown. #3802
Comments

I wonder if this is related to #3796.

We're running into this issue as well. All long-running tasks that we kill gracefully mid-execution get requeued, even if they complete successfully. Has anyone made progress on this?

I can confirm this is happening with Celery 4.0.2 (Redis backend).

FWIW, I just reproduced this problem with RabbitMQ and no Redis backend.

I'm using RabbitMQ and I also notice that warm shutdowns requeue tasks with acks_late=True. However, it seems to me that warm shutdown is actually killing the task rather than letting it finish, which is probably why it is queued again. My question is: shouldn't warm shutdown ALWAYS let the task finish, even with acks_late=True, and ack the task once it completes, rather than killing it mid-progress? I couldn't find any information about warm shutdown behaviour with acks_late=True anywhere; can anyone shed some light here?

After more testing, doing a warm shutdown without killing the actual workers, I get the task-succeeded log and yet the task is never acked, and it is queued again once Celery is restarted. This is quite bad. So far, the only workaround for "critical" tasks that must always run while avoiding duplicates as much as possible is to store whatever task id the job was queued with, clear the task id once the job is done, and always check for a non-empty task id when the task starts. Are there any other workarounds? Is this something that might be fixed soon?
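The workaround described above can be sketched as a small idempotency guard. This is a minimal sketch under stated assumptions: a plain dict stands in for a shared store such as Redis, and the helper names (enqueue_job, run_job) are hypothetical, not part of Celery.

```python
# Minimal sketch of the idempotency-guard workaround described above.
# A plain dict stands in for a shared store such as Redis; in a real
# deployment the store must be shared across all workers.

seen_task_ids = {}  # job_key -> task id currently queued for that job

def enqueue_job(job_key, task_id):
    """Record the task id when the job is queued (hypothetical helper)."""
    seen_task_ids[job_key] = task_id

def run_job(job_key, task_id, work):
    """Run the job only if this task id is still the one on record.

    A redelivered copy of an already-completed task finds the key
    cleared and is skipped instead of re-running the work.
    """
    if seen_task_ids.get(job_key) != task_id:
        return None  # duplicate delivery: the job already ran, skip it
    result = work()
    del seen_task_ids[job_key]  # clear the id once the job is done
    return result
```

On a redelivery after a warm-shutdown requeue, the second run finds the id already cleared and skips the work. In a real deployment the check-and-set should also be atomic (e.g. Redis SETNX) to close the race between concurrent deliveries.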

Similar issue here, although our configuration is to ack early (the default). What seems to have happened is this: a task was prefetched by the worker, the worker received a TERM, the task started executing just afterwards and finished successfully, the worker restarted, and when it came back it received the same task again. Here's a trace of our logs to illustrate:

Can anyone verify the issue on the latest master branch using the latest dependencies?

I'm able to reproduce with latest master (f3ef0df) and RabbitMQ. Here's the output; I ran the same kind of test. (Also hi there @alexmic! 👋)

@sww I just did a tricky workaround that fixes this problem; please check https://gist.github.com/lovemyliwu/af5112de25b594205a76c3bfd00b9340

I battled this for a few days, googling like crazy, tripping through Python code, and thinking I must have screwed something up. Finally, I found the magic spell that seems to have patched this issue on my end: I downgraded Celery to 4.1.1. Note: patch != solution. Hope this helps someone else.

So this is a regression.

I'm getting this problem with Celery 4.2.0.

Might be fixed by celery/kombu#926.

This is still reproducible on Celery 4.4.0rc3. The conversation doesn't state which commit closed it, and celery/kombu#926 doesn't seem related.

Thanks for your report, David.

@davidt99 Do you have a result backend configured? If yes, what does the task state in the result store say? Just enquiring to check the behaviour for one of my use-cases.

The celery output: We have

Not able to reproduce this issue on celery version

I can reproduce this on 4.4.2, but only with a RabbitMQ broker. With a Redis broker it works as expected.

Can reproduce with Celery 5.0.4 as well:

Is this issue resolved in Celery 5.3.1?
Checklist

- I have included the output of celery -A proj report in the issue. (If you are not able to do this, then at least specify the Celery version affected.)
- I have verified that the issue exists against the master branch of Celery.

Steps to reproduce

Tasks that finish and succeed after the warm shutdown signal are not acked if acks_late=True. To reproduce, define a long-running task in app.py, start a worker, trigger the task with

$ python -c "from app import foo; foo.apply_async()"

and warm-shutdown the worker while the task runs. (Note the second-to-last line of the worker output, even after the log says the task succeeded.)
Expected behavior
The task is acked before exit.
Actual behavior
The task is not acked and is restored to the queue. The next time the worker starts, the same task is executed again.