Task ID started twice in rapid succession #176

Closed
fredboyle opened this Issue Aug 27, 2012 · 19 comments

@fredboyle

fredboyle commented Aug 27, 2012

A scheduled (ETA) task was started twice in rapid succession, which caused some issues in our app. I didn't think the server allowed a task ID to be run more than once unless it was a retry situation.

How is this possible and how do I prevent it?

Version and log info:

django-celery==3.0.6, using Redis (redis==2.4.13)

[2012-08-26 22:30:00,255: WARNING/PoolWorker-1] Starting task 06b83735-474c-4a2a-a0c6-935fec602683 (2012-08-27 02:30:00.243931+00:00)
[2012-08-26 22:30:00,258: WARNING/PoolWorker-4] Starting task 06b83735-474c-4a2a-a0c6-935fec602683 (2012-08-27 02:30:00.247161+00:00)

@ask

Member

ask commented Aug 28, 2012

Does it also say that it received the task twice? (log line 'Received task from broker' when using the info loglevel).

I have never had reports of a worker starting a task twice after receiving one message.
It is possible for a message to be received twice, as 'exactly once delivery' is hard to accomplish in a scalable way, e.g. amqp only guarantees this for limited cases (e.g. when clustering is not used). Exactly once delivery requires a centralized deduplication filter.

It could theoretically also be a race condition in the request, so that the second task's request was overwritten by the first one (picking up the task_id of another task).

Another possibility is the new ack emulation for Redis, which introduces a visibility timeout: if a worker has not acked a message within 1 hour (the default, but configurable), the message is restored and sent to another worker. This is a very new addition, so there could be bugs present. You did say that you used long ETAs, which is currently not very compatible with the visibility timeout:
See http://docs.celeryproject.org/en/latest/whatsnew-3.0.html#redis-ack-emulation-improvements
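
For reference, the visibility timeout is configured through the broker transport options and needs to exceed the longest ETA/countdown in use. A minimal sketch, assuming django-celery-style settings in settings.py; the values are illustrative:

    # settings.py (sketch): raise the Redis visibility timeout above the longest
    # ETA/countdown used, otherwise unacked ETA messages are re-delivered to
    # another worker before the original worker executes them.
    BROKER_URL = 'redis://localhost:6379/0'
    BROKER_TRANSPORT_OPTIONS = {
        'visibility_timeout': 48 * 3600,  # seconds (48 hours here)
    }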

@ask

Member

ask commented Aug 28, 2012

Btw, see the note in the above link. Using very large values for the visibility timeout should be fine, as you can always manually restore messages after a power failure or abrupt termination. Just make sure that you always terminate the worker with the TERM signal.

@fredboyle


fredboyle commented Aug 28, 2012

Did find the "Got task from broker" for that task ID twice prior to the task being run. However the tasks were received within seconds of each other.

Didn't think a single task ID could be received more than once or even triggered more than once, guess I was wrong. I've since added code to prevent the task from proceeding if it is indeed run more than once so at least it won't make a mess in the app.

I am using this as more of a background scheduler with follow-up tasks. So a scheduled first task sets off a process, so perhaps there's a better way to accomplish this. For now it seems that my added checks in code should do the trick in case this occurs again.

Thanks for the clarification and link.
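
A guard along the lines described above might look something like the sketch below; the ScheduledJob model and its fields are hypothetical, and, as the next comment shows, a plain read-then-write check like this is not race-safe when duplicates start almost simultaneously.

    # Hypothetical guard: skip the task body if the job was already started.
    # ScheduledJob and its fields are made-up names for illustration only.
    from myapp.models import ScheduledJob

    def claim_task(task_id):
        job = ScheduledJob.objects.get(task_id=task_id)
        if job.started:
            return False  # another execution already picked this task up
        job.started = True
        job.save()
        return True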

@fredboyle


fredboyle commented Aug 29, 2012

Seems my code fix didn't quite do the trick. The duplicate tasks are being run in faster succession than my check can validate against the DB, so it's making a mess of the process.

Is there a way to ensure a task only ever runs once? Or some sort of maintenance process that could be run to remove duplicate tasks?

@zgoda


zgoda commented May 20, 2013

I was bitten by this just recently. Tasks are started multiple times within fractions of a second. These tasks are all called with apply_async() without an ETA (in other words, immediately), and the visibility timeout is the default - but that shouldn't affect immediate tasks, should it?

software -> celery:3.0.6 (Chiastic Slide) kombu:2.5.10 py:2.7.3 billiard:2.7.3.28 redis:2.7.2.1

@ask

Member

ask commented May 21, 2013

@zgoda Received multiple times, or just started multiple times? Are you using any other settings (late ack?) Are you able to reproduce?

It should take exceptional circumstances for something like this to happen, and possibly it is a bug.
Theoretically a deduplication filter can be used to ignore already-processed tasks. Though there's no existing solution for this, it should be pretty simple to make one.
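
Such a deduplication filter could be sketched as an atomic Redis check-and-set marker keyed on the task id; the key prefix, TTL, and connection details below are illustrative, and this assumes a redis-py version that supports the nx/ex arguments to set().

    # Sketch of a deduplication filter: atomically mark a task id as "seen" and
    # let callers skip the task body if the marker already existed.
    import redis

    rclient = redis.StrictRedis(host='localhost', port=6379, db=1)

    def already_seen(task_id, ttl=24 * 3600):
        # set(..., nx=True) writes the key only if it does not exist yet, so
        # exactly one caller gets False back for a given task id.
        return not rclient.set('task-seen:%s' % task_id, '1', nx=True, ex=ttl)

Inside the task body you would call already_seen() with the current request id (task.request.id) and return early when it reports a duplicate.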

@zgoda


zgoda commented May 22, 2013

Received once, started multiple times. Ack setup default (not late).

@leonsas


leonsas commented Feb 4, 2014

I just encountered the same problem, with only one celery worker and the Redis broker running on the same machine. Tasks with the same UUID were received with an ETA of about 7 hours and were executed multiple times.

@leonsas


leonsas commented Feb 4, 2014

Thanks.

This is problematic. Simple DB checks when running the tasks fail as well, given that celery is executing the four duplicates in parallel faster than my code can reach the DB.

My tasks have varying ETAs, determined at runtime and usually between 1 and 48 hours. I could go the safe way and set the visibility timeout to 50 hours, but I don't quite understand the side effects of visibility timeouts; what would be the trade-off of setting such a high timeout?

That said, is there a recommended best practice for solving this issue? It'd be nice to have a per-task visibility timeout.

@keras


keras commented Jun 11, 2015

Happened here too with Celery 3.1.17 and Django 1.7.4.

I have tried using different brokers, RabbitMQ and Redis, without any difference. It happens around 90% of the times I run a_task.delay(). The task returns the correct result, but if it has any side effects, like appending a line to a log file, that log entry is written anywhere from 1 to [--concurrency] times.

The only thing I have been able to do about this issue is manually acquire a lock in Redis for that task and ignore the extra executions.

Any ideas?
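
The manual-lock approach described above can be sketched with Django's cache; the lock key and timeout below are illustrative, and it relies on cache.add being atomic (true for memcached and Redis cache backends, but not for the local-memory backend).

    # Sketch of a per-task lock: only the caller that wins cache.add runs the work.
    from django.core.cache import cache

    LOCK_EXPIRE = 10 * 60  # seconds; the lock expires on its own as a safety net

    def run_once(lock_id, work):
        # cache.add returns False when the key already exists.
        if not cache.add(lock_id, 'locked', LOCK_EXPIRE):
            return None  # a duplicate is already running; skip it
        try:
            return work()
        finally:
            cache.delete(lock_id)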

@tokheim


tokheim commented Sep 25, 2015

I ran into a similar issue with RabbitMQ as the broker. My problem was that there were two queues bound to the celery exchange with the same routing key, so each task was pushed to two queues that were both being read by workers. Unbinding the queues from the exchange and fixing my celery config did the trick. Not sure if this is what you guys are experiencing, but it might be worth checking out.
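
One way to pin the bindings down so that a routing key maps to exactly one queue is to declare the queues explicitly (rabbitmqctl list_bindings shows what is actually bound). A sketch with Celery 3.x-style settings; all names below are illustrative.

    # Sketch: declare a single explicit queue/exchange/routing-key triple so a
    # task message cannot end up in more than one worker queue.
    from kombu import Exchange, Queue

    CELERY_DEFAULT_QUEUE = 'default'
    CELERY_DEFAULT_EXCHANGE = 'default'
    CELERY_DEFAULT_ROUTING_KEY = 'default'
    CELERY_QUEUES = (
        Queue('default', Exchange('default', type='direct'), routing_key='default'),
    )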

@auvipy

Member

auvipy commented Jun 3, 2016

@tokheim were you able to fix the issue? If so, a proposed change is welcome as a PR.

@AndreiPashkin


AndreiPashkin commented Aug 9, 2016

@fredboyle, @zgoda, @leonsas, @keras, @tokheim,
do you use Flower in your setups?

Here @mihneagiurgea says that in his case Flower caused this strange behavior:
https://groups.google.com/d/msg/celery-users/W0Qf09ahjas/FgVWqrefkAcJ

@zgoda


zgoda commented Aug 9, 2016

@AndrewPashkin no.

@leonsas


leonsas commented Aug 9, 2016

Thanks @AndrewPashkin!
I do use Flower, but at this point I don't think I've seen this problem in a while. I think the Redis broker got much better in the past two years.

@auvipy

Member

auvipy commented Aug 9, 2016

Closing the issue for now; will reopen if any further problems arise regarding this issue.

@auvipy auvipy closed this Aug 9, 2016

@dmitryTsatsarin


dmitryTsatsarin commented Nov 18, 2016

I have the same problem (a task with one UUID which launched several times):
https://gist.githubusercontent.com/dmitryTsatsarin/a5cbf1d99df50564385e8f57a82617a9/raw/c65bada7d155e771cdbe309a6b4c5a9eacfcf78a/gistfile1.txt
The task was created in one place, like this:

OrderDelayedPayment().apply_async(args=[company_order.pk], countdown=settings.PAYMENT_DELAY)

I think the problem is in "countdown". I plan to rewrite the code to use a celery PeriodicTask, which I think should help.
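
A sketch of the PeriodicTask rewrite mentioned above, assuming the (since-deprecated) celery.task.PeriodicTask base class; the class name and the polling query are hypothetical.

    # Sketch: poll for due delayed payments on a schedule instead of scheduling
    # one countdown task per order, so a duplicated message delivery matters less
    # as long as the polling logic itself is idempotent.
    from datetime import timedelta
    from celery.task import PeriodicTask

    class ProcessDelayedPayments(PeriodicTask):
        run_every = timedelta(minutes=5)  # illustrative polling interval

        def run(self):
            # Hypothetical: find orders whose payment delay has elapsed but whose
            # payment has not been processed yet, and handle each one exactly once.
            ...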

@dmitryTsatsarin


dmitryTsatsarin commented Nov 21, 2016

@auvipy Please reopen the ticket.
