Fixed #2785 -- Changed multi stopwait to wait on pidfiles instead of process. #2789

jdufresne · 2015-08-26T23:22:11Z

Sending signals to kill a process has a race condition where a pid may
be recycled by a new, unrelated process.

Instead, now use watchdog to watch the directory containing the
pidfile. Hold a lock until the pidfile is deleted. Once all pidfiles
are deleted, all celery processes have finished.

Introduces a dependecy on watchdog.

Fixes #2785

All feedback welcome.

jdufresne · 2015-08-27T14:01:21Z

Check the build: https://travis-ci.org/celery/celery/jobs/77433748#L907

Thanks, I'll take a look.

Could we make watchdog optional, so it will only be required if celery multi is used?

Sure thing, makes sense. Should it fall back to the previous implementation if watchdog does not exist?

jdufresne · 2015-08-27T15:09:08Z

I don't think it should since it introduces an inconsistent state.

I'm not sure I'm 100% what you're asking for.

So if watchdog is unavailable, does that mean the stopwait command is also unavailable from multi?

Or do you mean, continue to always import watchdog from multi.py. So it is always required when running the multi command, but no other part of celery?

Sending signals to kill a process has a race condition where a pid may be recycled by a new, unrelated process. Instead, now use watchdog to watch the directory containing the pidfile. Hold a lock until the pidfile is deleted. Once all pidfiles are deleted, all celery processes have finished. Introduces a dependecy on watchdog. Fixes #2785

ask · 2015-09-04T20:22:42Z

I would prefer to not use watchdog at all, as if we add more dependencies there will be a large mob with pitchforks screaming at us how complicated celery is because installing it logs so much, or something similar. ;)

More seriously though, watchdog seems to rely on threading which we cannot use as one of the goals with multi is the ability to use it as a library. E.g. cyme uses it to start and stop processes, and you cannot use threads in combination with fork(). Every time I have used threads, it ends up being a major pain, so we are trying to be completely free of them.

It's also a C library, which is something we would like to avoid depending on by default.

We also already have code to watch files in celery/worker/autoreload, perhaps that could be generalized enough to be reused here?

ask · 2015-12-15T11:29:40Z

We should probably not cache the value of the pid for long, and instead reread the pidfile just before we kill the process, possibly that can be combined with checking /proc contents for args or something on Linux for extra safety.

Programs have used these mechanisms without filesystem event support for decades, so it must be possible to handle in some way compatible with old IRIX systems or whatever.

If using watchdog to watch the pid we will still have the race condition where the process is killed remotely, not giving it a chance to remove the pidfile.

ask · 2015-12-15T11:31:33Z

it should be enough to check for parent pid come to think of it

jdufresne closed this Nov 29, 2015

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Fixed #2785 -- Changed multi stopwait to wait on pidfiles instead of process. #2789

Fixed #2785 -- Changed multi stopwait to wait on pidfiles instead of process. #2789

Uh oh!

jdufresne commented Aug 26, 2015

Uh oh!

jdufresne commented Aug 27, 2015

Uh oh!

jdufresne commented Aug 27, 2015

Uh oh!

ask commented Sep 4, 2015

Uh oh!

ask commented Dec 15, 2015

Uh oh!

ask commented Dec 15, 2015

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Fixed #2785 -- Changed multi stopwait to wait on pidfiles instead of process. #2789

Fixed #2785 -- Changed multi stopwait to wait on pidfiles instead of process. #2789

Uh oh!

Conversation

jdufresne commented Aug 26, 2015

Uh oh!

jdufresne commented Aug 27, 2015

Uh oh!

jdufresne commented Aug 27, 2015

Uh oh!

ask commented Sep 4, 2015

Uh oh!

ask commented Dec 15, 2015

Uh oh!

ask commented Dec 15, 2015

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants