Skip to content

Conversation

@jdufresne
Copy link
Contributor

Sending signals to kill a process has a race condition where a pid may
be recycled by a new, unrelated process.

Instead, now use watchdog to watch the directory containing the
pidfile. Hold a lock until the pidfile is deleted. Once all pidfiles
are deleted, all celery processes have finished.

Introduces a dependecy on watchdog.

Fixes #2785


All feedback welcome.

@jdufresne
Copy link
Contributor Author

Check the build: https://travis-ci.org/celery/celery/jobs/77433748#L907

Thanks, I'll take a look.

Could we make watchdog optional, so it will only be required if celery multi is used?

Sure thing, makes sense. Should it fall back to the previous implementation if watchdog does not exist?

@jdufresne
Copy link
Contributor Author

I don't think it should since it introduces an inconsistent state.

I'm not sure I'm 100% what you're asking for.

So if watchdog is unavailable, does that mean the stopwait command is also unavailable from multi?

Or do you mean, continue to always import watchdog from multi.py. So it is always required when running the multi command, but no other part of celery?

Sending signals to kill a process has a race condition where a pid may
be recycled by a new, unrelated process.

Instead, now use watchdog to watch the directory containing the
pidfile. Hold a lock until the pidfile is deleted. Once all pidfiles
are deleted, all celery processes have finished.

Introduces a dependecy on watchdog.

Fixes #2785
@ask
Copy link
Contributor

ask commented Sep 4, 2015

I would prefer to not use watchdog at all, as if we add more dependencies there will be a large mob with pitchforks screaming at us how complicated celery is because installing it logs so much, or something similar. ;)

More seriously though, watchdog seems to rely on threading which we cannot use as one of the goals with multi is the ability to use it as a library. E.g. cyme uses it to start and stop processes, and you cannot use threads in combination with fork(). Every time I have used threads, it ends up being a major pain, so we are trying to be completely free of them.

It's also a C library, which is something we would like to avoid depending on by default.

We also already have code to watch files in celery/worker/autoreload, perhaps that could be generalized enough to be reused here?

@jdufresne jdufresne closed this Nov 29, 2015
@ask
Copy link
Contributor

ask commented Dec 15, 2015

We should probably not cache the value of the pid for long, and instead reread the pidfile just before we kill the process, possibly that can be combined with checking /proc contents for args or something on Linux for extra safety.

Programs have used these mechanisms without filesystem event support for decades, so it must be possible to handle in some way compatible with old IRIX systems or whatever.

If using watchdog to watch the pid we will still have the race condition where the process is killed remotely, not giving it a chance to remove the pidfile.

@ask
Copy link
Contributor

ask commented Dec 15, 2015

it should be enough to check for parent pid come to think of it

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants