(revised caption) When doing async without poll, and with timeout of 0, don't babysit the process, just execute directly #4778

Closed
willthames opened this issue Nov 2, 2013 · 8 comments
Labels
feature This issue/PR relates to a feature request.

Comments

@willthames
Contributor

See https://groups.google.com/forum/?fromgroups=#!topic/ansible-project/bMuOs5lLg_8 for more background.

Using command or shell to start a synchronous task that kicks off a job in the background under nohup does not succeed (for some reason, Popen.communicate() never returns).
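A likely explanation, offered as an assumption rather than a diagnosis of the reported case: the backgrounded job inherits the stdout/stderr pipes that the caller is reading, and `communicate()` waits for EOF on those pipes, which only arrives once every process holding a write end has exited. `nohup` does not help here, because it only redirects output when stdout is a terminal. A minimal Python sketch that reproduces the effect with `sh` and `sleep`:

```python
import subprocess
import time


def run_and_time(script: str) -> float:
    """Run a shell snippet with captured output; return how long communicate() took."""
    start = time.monotonic()
    proc = subprocess.Popen(
        ["sh", "-c", script],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        stdin=subprocess.DEVNULL,
    )
    proc.communicate()  # returns only after EOF on both stdout and stderr
    return time.monotonic() - start


# The background job keeps the inherited pipes open, so communicate() blocks
# until `sleep` exits, even though `sh` itself returns immediately.
blocked = run_and_time("nohup sleep 2 &")

# Redirecting the job's output away from the pipes lets communicate() return
# as soon as `sh` exits; the sleep keeps running on its own.
detached = run_and_time("nohup sleep 2 >/dev/null 2>&1 &")

print(f"inherited pipes: {blocked:.1f}s, redirected: {detached:.1f}s")
```

Redirecting the background job's output, or detaching it completely, is what lets the synchronous call return.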

It is not possible to start a long-running task without giving it an incredibly large timeout. Meanwhile, three supervisory processes sit on top of the task itself, checking every five seconds whether the timeout has expired.
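For readers unfamiliar with this kind of supervision, here is a rough Python sketch of a babysitter loop: start the task in its own process group, wake up at a fixed interval, and kill the whole group once the deadline has passed. It illustrates the idea only and is not Ansible's async_wrapper source; the function `babysit` and its parameters are hypothetical.

```python
import os
import signal
import subprocess
import time


def babysit(cmd, timeout, check_interval=5.0):
    """Run cmd, checking every check_interval seconds; kill its group after timeout.

    Returns the task's exit status, or None if the watchdog had to kill it.
    """
    # start_new_session=True makes the task leader of a new session and process
    # group (pgid == pid), so os.killpg() below hits the task and its children,
    # not the watchdog itself.
    proc = subprocess.Popen(cmd, start_new_session=True)
    deadline = time.monotonic() + timeout
    while True:
        status = proc.poll()
        if status is not None:
            return status
        if time.monotonic() >= deadline:
            os.killpg(proc.pid, signal.SIGKILL)  # cannot be trapped or ignored
            proc.wait()
            return None
        time.sleep(check_interval)
```

With this design a task can outlive its deadline by up to one check interval, and the SIGKILL reaches every process still in the task's group. For example, `babysit(["sleep", "10"], timeout=1, check_interval=0.2)` kills the sleep after roughly one second.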

Because async_wrapper kills the entire process group with SIGKILL (and bash seems to offer no reasonable mechanism for a process to remove itself from its process group, unlike detaching from its parent), it is nearly impossible to protect the long-running task from being killed when the timeout expires, even if that is what you want. A SIGHUP would have a similar effect but could be guarded against with nohup, and SIGTERM and the like could be guarded against with trap.
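The asymmetry between signals can be checked directly. In this Python sketch (an illustration, not Ansible code), a child shell ignores SIGHUP with `trap '' HUP`, which is effectively what `nohup` arranges, and traps SIGTERM, so both signals leave it running, while SIGKILL always terminates it. The helper name `survives` is hypothetical.

```python
import signal
import subprocess
import time

# The child ignores SIGHUP, handles SIGTERM, and loops until it is killed.
CHILD = "trap '' HUP; trap 'echo got TERM >&2' TERM; while :; do sleep 0.1; done"


def survives(sig, settle=0.5):
    """Send sig to a child that guards against HUP/TERM; True if it is still running."""
    proc = subprocess.Popen(["sh", "-c", CHILD], stderr=subprocess.DEVNULL)
    time.sleep(0.3)  # give the shell time to install its traps
    proc.send_signal(sig)
    time.sleep(settle)
    alive = proc.poll() is None
    proc.kill()  # clean up with SIGKILL (a no-op if the child already exited)
    proc.wait()
    return alive


for sig in (signal.SIGHUP, signal.SIGTERM, signal.SIGKILL):
    print(sig.name, "survived" if survives(sig) else "killed")
```

This is why the choice of signal in the watchdog decides whether a task can protect itself at all.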

I quite like the asynchronous approach, and I believe that changing the signal sent, either by default or through an override, would be a reasonable solution. The hard-coded five-second check might also warrant examination, but I'm not too concerned about it as long as I can set a timeout of 5 seconds and still protect the process from being killed when that timeout expires.

@mpdehaan
Contributor

mpdehaan commented Nov 2, 2013

Original caption was "There is no way to kick off a task that runs forever"

OK, so this report is not entirely correct: there is a way.

"async" with a poll interval of 0 does fire and forget a task, there's no need for the nohup. You could set a timeout that was longer than the death of the universe, and it would work.

As I understand it, the problem above came up while trying to get around the babysitter process that still runs around the task. The babysitter provides "cleaner process output" (not a critical feature, but nice to have), but it is also needed to enforce a maximum lifetime for the program. So what if the maximum lifetime were 0, meaning the process should run forever, without having to pass in a crazy large value?

What was discussed in the thread was that it would be nice to just exec the task, with no helper process, when you have given up on polling it. This should only happen if the poll interval is 0 and the lifetime is 0 -- both 0.

There should be no need to change the kill handling -- just don't spawn the babysitter task in the async_wrapper if the user has waived the idea of polling the task, and let it go down a completely different path.
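What that separate path could look like, as a rough POSIX-only Python sketch (not Ansible's async_wrapper; the name `daemonize_and_exec` is hypothetical): the classic double fork plus `setsid()` detaches the task from the caller's session and process group, and the final `exec` replaces the grandchild with the task itself, so no watcher process is left behind.

```python
import os


def daemonize_and_exec(argv):
    """Detach argv from the caller via double fork + setsid, then exec it.

    In the caller, returns the raw wait status of the short-lived first child
    (0 on success). The task itself never returns here.
    """
    pid = os.fork()
    if pid > 0:
        # Caller: reap the first child, which exits almost immediately.
        _, status = os.waitpid(pid, 0)
        return status
    try:
        # First child: become leader of a new session and process group,
        # leaving the caller's process group behind.
        os.setsid()
        if os.fork() > 0:
            os._exit(0)  # first child exits; the grandchild is reparented (typically to init)
        # Grandchild: not a session leader, so it cannot reacquire a controlling tty.
        os.chdir("/")
        devnull = os.open(os.devnull, os.O_RDWR)
        for fd in (0, 1, 2):
            os.dup2(devnull, fd)  # drop inherited pipes so nobody waits on our output
        if devnull > 2:
            os.close(devnull)
        os.execvp(argv[0], argv)  # become the task: no babysitter remains
    finally:
        os._exit(127)  # reached only if something above failed
```

For example, `daemonize_and_exec(["sleep", "60"])` returns 0 right away while the sleep keeps running in its own session.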

@mpdehaan
Contributor

mpdehaan commented Nov 2, 2013

Note: I updated the comment above to make clearer what I'm proposing when both the poll interval and the lifetime are 0.

The current way to run an infinite task is a poll interval of 0 -- fire and forget -- and a lifetime of VERY LARGE NUMBER.

The request is basically to let 0 mean "infinite timeout", and, if the poll interval is also 0, to just execute directly without the watcher process -- only the necessary levels of daemonization.
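Read as a decision table, the request might look like the following hypothetical Python sketch. `AsyncPlan` and `plan_async` are illustrative names, not Ansible APIs, and treating a timeout of 0 with a non-zero poll interval as "watched, but with no time limit" is an assumption extrapolated from the "0 means infinite" part of the request.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AsyncPlan:
    use_watcher: bool             # spawn the babysitter process?
    max_lifetime: Optional[int]   # seconds, or None for "no limit"


def plan_async(poll: int, timeout: int) -> AsyncPlan:
    """Map a poll interval and lifetime (both in seconds) to an execution plan."""
    if poll < 0 or timeout < 0:
        raise ValueError("poll and timeout must be non-negative")
    # Proposed meaning of 0: no lifetime limit at all.
    max_lifetime = None if timeout == 0 else timeout
    if poll == 0 and timeout == 0:
        # Nobody will poll and nothing must be enforced: daemonize and exec directly.
        return AsyncPlan(use_watcher=False, max_lifetime=None)
    # Otherwise keep a watcher, which enforces the lifetime when one is set.
    return AsyncPlan(use_watcher=True, max_lifetime=max_lifetime)
```

Only the "both 0" row drops the watcher; every other combination keeps today's supervised path.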

@willthames
Contributor Author

Sounds like a sensible approach - and you're right about the original title, it was incorrect.

I'll see if I can come up with a way of implementing this in the way you're proposing.

@mpdehaan
Contributor

mpdehaan commented Nov 2, 2013

Outstanding, thank you!

@romabysen

Shouldn't a process that is supposed to run forever really be some kind of service though? I just have a hard time seeing this as a common use-case that isn't better solved outside of ansible.

@willthames
Copy link
Contributor Author

I suppose that would be a reasonable alternative (handle the daemonization in the application rather than through ansible). Certainly my investigations of code fixes haven't come up with a means of doing this in any nice way.

@bcoca
Copy link
Member

bcoca commented Nov 5, 2013

Daemonizing over ssh is not normally a good idea. You can use many things (runit, uwsgi, monit, start-stop-daemon, daemontools, supervisorctl, init/upstart/systemd scripts, etc.) to do this correctly and then call that from ssh.

@willthames
Copy link
Contributor Author

I don't really disagree - I just wonder if the documentation could be improved (I think it's the term fire-and-forget in the async documentation that led me down the wrong path).

I haven't done much documentation for Ansible, so I guess now's my opportunity to contribute: basically, outline some of these mechanisms. (In the end I solved my issue by running the task synchronously under setsid, so there's no supervision, but I can live with that.)
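Running under setsid works because a process-group kill only reaches members of that group. This Python sketch (an illustration of the kernel mechanism, not of Ansible internals; `group_kill_demo` is a hypothetical name) creates a stand-in task as leader of its own process group, a helper in the same group (which is what a child of the task would inherit by default), and a helper started in a new session, the Python equivalent of the setsid command. It then kills the task's group the way a watchdog would.

```python
import os
import signal
import subprocess


def group_kill_demo():
    """SIGKILL one process group and report which of three sleeps were hit."""
    # Stand-in for the supervised task: leader of a new process group
    # (same session as us, so other processes can still join the group).
    leader = subprocess.Popen(["sleep", "30"], preexec_fn=os.setpgrp)
    # A helper that stays in the task's process group.
    member = subprocess.Popen(
        ["sleep", "30"], preexec_fn=lambda: os.setpgid(0, leader.pid)
    )
    # A helper started in a new session, and therefore a new process group.
    escaped = subprocess.Popen(["sleep", "30"], start_new_session=True)

    os.killpg(leader.pid, signal.SIGKILL)  # the group-wide kill a watchdog sends
    leader.wait()
    member.wait()
    result = {
        "leader": leader.returncode,          # negative signal number when killed
        "member": member.returncode,
        "escaped_alive": escaped.poll() is None,
    }
    escaped.kill()  # tidy up the survivor
    escaped.wait()
    return result


print(group_kill_demo())
```

The leader and the in-group helper both die with -9, while the setsid-style helper is untouched, which matches the behaviour described in this thread.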

ansibot added the feature label (This issue/PR relates to a feature request.) and removed the feature_idea label on Mar 2, 2018.
ansible locked and limited conversation to collaborators on Apr 24, 2019.