Task children processes are killed #33410

Closed
Vladimir-csp opened this issue Nov 30, 2017 · 13 comments

Comments

@Vladimir-csp

commented Nov 30, 2017

ISSUE TYPE
  • Bug Report
COMPONENT NAME

core

ANSIBLE VERSION
ansible 2.4.0.0
CONFIGURATION
OS / ENVIRONMENT

FreeBSD 11 server and clients (can be reproduced with service, command and shell modules).
Linux hosts (can be reproduced with command and shell modules)

SUMMARY

Subprocesses of tasks are silently killed after the task finishes. This has several consequences:
The service module fails silently (e.g. ntpd on FreeBSD) with symptoms exactly as described in #17476.

A shell or command module task that runs a forking script has all of its forks killed (adding nohup to the command is a workaround).
These symptoms appeared after upgrading from Ansible 2.3 to 2.4.

STEPS TO REPRODUCE

Run this on FreeBSD 11:

- name: start ntpd service
  service:
    name: ntpd
    state: started

EXPECTED RESULTS

Service is started and running

ACTUAL RESULTS

The task reports "ok", but the processes started by the "service ntpd start" command on the client host are killed right after start, so no ntpd is running. See #17476 for details; the symptoms are the same.

@Vladimir-csp

Author

commented Dec 1, 2017

This also happens with Linux server and client hosts.
A simple script to check:

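#!/bin/sh

# Append a timestamp to /tmp/testoutput once per second for 10 minutes.
timedump(){
	: > /tmp/testoutput	# truncate any previous output
	CYCLE=0
	while [ "$CYCLE" -lt 600 ]
	do
		date +%T >> /tmp/testoutput
		sleep 1
		CYCLE=$(( CYCLE + 1 ))
	done
}
# Run the loop in the background and return immediately.
timedump &
exit 0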

Save on target host as /tmp/timedump, then:

ansible target-host -m command -a /tmp/timedump

Only two timestamps are written to /tmp/testoutput.

Another update: subshells survive when the task is run on localhost.

@Vladimir-csp

Author

commented Dec 8, 2017

Upgrading to 2.4.2 fixed this issue, but since a similar problem occurred in an earlier version, could anyone provide some details about what led to the problem in 1.9.2 and 2.4.0 and its resolution in 2.1.1 and 2.4.2?

@ansibot ansibot added bug and removed bug_report labels Mar 1, 2018
@hellojukay

commented Mar 15, 2018

I have the same problem with Ansible 2.4.2.0.

ansible version

[deploy@baochai ~]$ ansible --version
ansible 2.4.2.0
  config file = /etc/ansible/ansible.cfg
  configured module search path = [u'/home/deploy/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python2.6/site-packages/ansible
  executable location = /usr/bin/ansible
  python version = 2.6.6 (r266:84292, Jan 22 2014, 09:42:36) [GCC 4.4.7 20120313 (Red Hat 4.4.7-4)]

ansible config:

[defaults]
inventory=./hosts
host_key_checking = True
roles_path=./
deprecation_warnings=False
library=/deployment/apps/ansible-depoy/current/library
timeout=30
[ssh_connection]
ssh_args = "-o ControlMaster=no"


gathering = smart
fact_caching = jsonfile
fact_caching_connection = /appdata/ansible-deploy/facts_cache
fact_caching_timeout = 86400

host version:

[deploy@baochai ~]$ cat /proc/version
Linux version 2.6.32-431.el6.x86_64 (mockbuild@c6b8.bsys.dev.centos.org) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-4) (GCC) ) #1 SMP Fri Nov 22 03:15:09 UTC 2013
[deploy@baochai ~]$

description:
I use Ansible to start my Go server.

/usr/bin/ansible-playbook /deployment/apps/ansible-deploy/current/deploy_go.yml --extra-vars ' hosts="daiyu" remote_user="deploy" svn_username="op" svn_password="didapopo" repo="svn://code.didapinche.com/didapinche/deployment/go/dm_ride_relationship" date="2018-03-15" script="/deployment/apps/ansible-deploy/current/script" time="20180315111029" appname="dm_ride_relationship" buildid="25276" bin="dm_ride_relationship" sleep="" configpath="/home/deploy/.git/20180315111030/dm_ride_relationship/devtest/default/" instance="1" version="r54451" conf_to="conf" params="" path="dm_ride_relationship"' 

This is the actual start command:

ok: [daiyu] => {
    "restart": {
        "changed": true, 
        "cmd": "(env app.name=dm_ride_relationship nohup /deployment/apps/dm_ride_relationship/releases/20180315111029/dm_ride_relationship  >> /applogs/dm_ride_relationship/dm_ride_relationship-1.out.2018-03-15 2>&1 & echo $! > /deployment/apps/dm_ride_relationship/instances/1/pid.txt)", 
        "delta": "0:00:00.113956", 
        "end": "2018-03-15 11:10:52.066362", 
        "failed": false, 
        "rc": 0, 
        "start": "2018-03-15 11:10:51.952406", 
        "stderr": "", 
        "stderr_lines": [], 
        "stdout": "", 
        "stdout_lines": []
    }
}

I found that the process is always killed after printing some logs. There is no panic and no error; it is simply killed. I use the command

env app.name=dm_ride_relationship nohup /deployment/apps/dm_ride_relationship/releases/20180315111029/dm_ride_relationship  >> /applogs/dm_ride_relationship/dm_ride_relationship-1.out.2018-03-15 2>&1 & echo $! > /deployment/apps/dm_ride_relationship/instances/1/pid.txt

to start the server manually, and everything is OK. But when Ansible starts it, this bug occurs.

@BenJaziaSadok

commented Apr 12, 2018

I hit the same issue as @hellojukay and solved it by using Ansible async tasks: http://docs.ansible.com/ansible/latest/user_guide/playbooks_async.html
I'm still looking for an explanation of the issue.
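
For reference, a rough sketch of the async workaround as an ad-hoc command, reusing the /tmp/timedump script posted earlier in this thread. The `-B` (background timeout in seconds) and `-P` (poll interval, where 0 means fire and forget) options belong to the `ansible` command line; `run_async`, `DRY_RUN`, the host name and the 700-second timeout are hypothetical and only serve the illustration.

```shell
#!/bin/sh
# Hypothetical helper: run a command on a host as an async ad-hoc task.
#   $1 = inventory host or pattern, $2 = async timeout in seconds,
#   remaining arguments = the command to run on the target.
run_async() {
    host="$1"
    timeout="$2"
    shift 2
    # With DRY_RUN set to a non-empty value, print the command line
    # instead of executing it.
    ${DRY_RUN:+echo} ansible "$host" -m command -a "$*" -B "$timeout" -P 0
}

# Example (assumed host name and timeout):
#   run_async target-host 700 /tmp/timedump
```

Whether this is enough for a given service depends on how that service daemonizes, so treat it as a starting point rather than a fix.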

@hellojukay

commented Apr 12, 2018

@BenJaziaSadok I found the answer: the process received signal 2 (kill -2, i.e. SIGINT).
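
One way to check which signal arrives is to start the server through a small logging wrapper. The sketch below is an assumption-laden illustration, not something from Ansible: `log_signal`, `SIGLOG` and the log path are hypothetical names, and the wrapper only sees signals delivered to the wrapper shell itself (for example, ones sent to the whole process group). SIGKILL cannot be trapped at all.

```shell
#!/bin/sh
# Hypothetical diagnostic wrapper (not part of Ansible): log which signal
# reaches this shell while the wrapped command runs.
SIGLOG="${SIGLOG:-/tmp/signal.log}"   # placeholder log path

log_signal() {
    # $1 is the signal name without the SIG prefix, e.g. HUP or INT.
    echo "$(date +%T) received SIG$1" >> "$SIGLOG"
}

trap 'log_signal HUP' HUP
trap 'log_signal INT' INT
trap 'log_signal TERM' TERM

# Run the wrapped command, if one was given. A pending trap runs once the
# foreground command returns.
if [ "$#" -gt 0 ]; then
    "$@"
fi
```

Saved as, say, /tmp/sigwrap (a made-up path), it would be used as `/tmp/sigwrap /path/to/server`, and the log can be read after the process dies.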

@pcapdevila

commented Aug 20, 2018

This has kicked in again in 2.6.3 for both the service and shell modules. I haven't had this issue since 2.5 that I recall.

@motorahead

commented Sep 24, 2018

Same issue here on 2.6.3-1.el7 when trying to start Oracle WebLogic servers. The server starts successfully and Ansible exits the task, then the server is killed.

Any updates on this?

@jborean93

Contributor

commented Sep 25, 2018

Is This A Bug?

Hi!

Thanks very much for your submission to Ansible. It sincerely means a lot to us.

We're not sure this is a bug, and we don't mean for this to be confrontational. Let's explain what we're thinking:

  • Each task is run over SSH and is tied to the terminal created for that session
  • Once the parent process is finished, the terminal is closed, which means any other processes associated with that terminal are also killed
  • This is standard practice for SSH sessions and the only way to avoid this is to have the process detach itself from the terminal
  • You can use async or a nohup command to achieve this if you want
  • But, you are probably better off running this as a proper service which gives you a lot more benefits than what you can get with async or a nohup command.
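
The detaching step described in the list above can be sketched roughly as follows. This is an illustration under stated assumptions, not something Ansible does for you: `start_detached` is a hypothetical helper, the log path is a placeholder, and `setsid` is assumed to be installed (it ships with util-linux on Linux and may be missing elsewhere, in which case the sketch falls back to plain `nohup`).

```shell
#!/bin/sh
# Hypothetical helper: start a command so that it is not tied to the
# terminal or session that launched it.
#   $1 = log file, remaining arguments = the command to run.
start_detached() {
    log="$1"
    shift
    if command -v setsid >/dev/null 2>&1; then
        # New session: the command has no controlling terminal, so a
        # hangup on the original terminal is not delivered to it.
        setsid "$@" </dev/null >>"$log" 2>&1 &
    else
        # Fallback: stay in the same session but ignore SIGHUP.
        nohup "$@" </dev/null >>"$log" 2>&1 &
    fi
}

# Example (paths are placeholders):
#   start_detached /tmp/timedump.log /tmp/timedump
```

Redirecting stdin, stdout and stderr matters as much as the new session, because a child that keeps the session's pipes open can still be affected when they close.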

As such, we're going to close this ticket. However, we're open to being corrected, should you wish to discuss. You can stop by one of our two mailing lists to talk about this and we might be persuaded otherwise.

Comments on closed tickets aren't something we monitor, so if you do disagree with this, a mailing list thread is probably appropriate.

Thank you once again for this and your interest in Ansible!

@jborean93 jborean93 closed this Sep 25, 2018
@motorahead

commented Sep 25, 2018

I understand the concept you describe. I tested the startup of the process outside of Ansible using 'ssh -l user hostname "startcommand"' and it did not kill the process upon completing/exiting the terminal. Ansible seems to behave differently?

@UnitedMarsupials

commented Oct 3, 2018

@jborean93, I run the process thus:
nohup $exec < /dev/null >> /var/tmp/mylog.txt 2>&1 &
it still gets the HUP...

@RabidCicada

commented Oct 26, 2018

To make it work for me I use nohup <cmd> & disown. Hopefully this helps someone else.

@RabidCicada

commented Oct 26, 2018

@jborean93 I think this was closed in error. Can you explain why the service module fails as in the original post? I understand your justification for other people's add-on complaints, but the service module itself should not fail. This matches your desired and suggested use case of making it a proper service.

@crc32a

commented Jul 1, 2019

In my case I have Ansible run a bash script inside a container. When I log in manually and run the script, the server starts up and is adopted by init (PID 1), and then I can exit the SSH session. The trouble is that when this is done by Ansible (which uses SSH, I believe), the services are also adopted by init, but as soon as Ansible exits, the processes die or are killed.

I'm at a loss as to how Ansible behaves differently from me logging into the container (running sshd) and running the start script.

root@glassfish:/# ps -ef --forest
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 08:23 ? 00:00:00 /bin/sh -c /sbin/my_init
root 6 1 0 08:23 ? 00:00:00 /usr/bin/python3 -u /sbin/my_init
root 17 6 0 08:23 ? 00:00:00 _ /usr/bin/runsvdir -P /etc/service
root 18 17 0 08:23 ? 00:00:00 _ runsv sshd
root 19 18 0 08:23 ? 00:00:00 _ /usr/sbin/sshd -D
root 293 19 0 08:25 ? 00:00:00 _ sshd: root@pts/1
root 545 293 0 08:25 pts/1 00:00:00 | _ -bash
root 926 19 0 08:26 ? 00:00:00 _ sshd: root@pts/0
root 928 926 0 08:26 pts/0 00:00:00 _ -bash
root 939 928 0 08:26 pts/0 00:00:00 _ ps -ef --forest
root 11 1 0 08:23 ? 00:00:00 /usr/sbin/syslog-ng --pidfile /var/run/syslog-ng.pid -F --no-caps
root 80 1 12 08:25 ? 00:00:12 /opt/jdk1.8.0_152/bin/java -Xms64M -Xmx1G -Djava.util.logging.config.file=logging.properties -Djava.security.auth.login.con
root 123 1 99 08:25 ? 00:02:36 /opt/jdk1.8.0_152/bin/java -cp /opt/glassfish5/glassfish/modules/glassfish.jar -agentlib:jdwp=transport=dt_socket,server=y,

The servers in question are the Java processes (PIDs 80 and 123), which are ActiveMQ and GlassFish. I will try the nohup suggestion.

@ansible ansible locked and limited conversation to collaborators Jul 22, 2019