Executing runner in the background doesn't work properly #99

Closed
Ladas opened this issue Jul 26, 2018 · 8 comments

Comments

Ladas commented Jul 26, 2018

Hello,

I am trying:
https://ansible-runner.readthedocs.io/en/latest/standalone.html#executing-runner-in-the-background

I have just a simple testing role:

---
- name: simulate long running op (120 sec), wait for up to 400 sec, poll every 5 sec
  command: /bin/sleep 120
  async: 400
  poll: 5

I run this as ansible-runner start /tmp/ansible-runner20180726-13642-1i9ck95 --json -i result --hosts localhost ...

I can see the Python processes running this, but:

  1. ansible-runner is-alive /tmp/ansible-runner20180726-13642-1i9ck95 -i result prints nothing; it returns an empty string.
  2. ansible-runner stop /tmp/ansible-runner20180726-13642-1i9ck95 -i result doesn't stop the work; I still see the processes running.
  3. The status is unclear. The run fails for a reason I can't find (the last event is running this play, then nothing), and the results are inconsistent: the status file contains failed, but the rc file contains None.
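
For anyone trying to reproduce the status/rc mismatch, here is a minimal Python sketch that reads both files and flags the inconsistent combination. It assumes the artifacts live in artifacts/<ident>/ under the private data directory; the function names are hypothetical helpers, not part of ansible-runner.

```python
from pathlib import Path


def read_artifact(artifact_dir, name):
    """Return the stripped contents of an artifact file, or None if it is missing."""
    path = Path(artifact_dir) / name
    return path.read_text().strip() if path.is_file() else None


def summarize_run(private_data_dir, ident):
    """Collect status and rc for one run and flag the mismatch reported here."""
    # Assumed layout: <private_data_dir>/artifacts/<ident>/{status,rc}
    artifact_dir = Path(private_data_dir) / "artifacts" / ident
    status = read_artifact(artifact_dir, "status")
    rc = read_artifact(artifact_dir, "rc")
    # A failed run would normally carry a numeric return code; the literal
    # string "None" (or a missing rc file) is the inconsistent case.
    has_numeric_rc = rc is not None and rc.lstrip("-").isdigit()
    inconsistent = status == "failed" and not has_numeric_rc
    return {"status": status, "rc": rc, "inconsistent": inconsistent}
```

Calling summarize_run("/tmp/ansible-runner20180726-13642-1i9ck95", "result") on the run above should report the status as failed, the rc as the string None, and the pair as inconsistent.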

Ladas (Author) commented Jul 26, 2018

@matburt ping ^

djzager (Contributor) commented Jul 26, 2018

I'm actually seeing a similar scenario, but when executing runner in the foreground. In my example:

$ docker run --net=host -v ~/.kube:/opt/apb/.kube:z -it --rm -u $UID --entrypoint /bin/bash docker.io/automationbroker/automation-broker-apb:sprint151
# This is related to our work to have APB run as non-root user
bash-4.2$ echo "${USER_NAME:-apb}:x:$(id -u):0:${USER_NAME:-apb} user:${HOME}:/sbin/nologin" >> /etc/passwd

# Since APBs normally had their playbooks in /opt/apb/actions, this moves them where runner can find them
bash-4.2$ mv /opt/apb/actions /opt/apb/project

# Do it
bash-4.2$ ansible-runner run --ident test --playbook test.yml /opt/apb

PLAY [automation-broker-apb test playbook] *************************************

...Excess Logs redacted...

FAILED - RETRYING: Wait for ClusterServiceBroker to become ready (52 retries left).Result was: {
    "attempts": 9,
    "changed": false,
    "msg": "Broker ready status: [u'False']",
    "retries": 61
}
FAILED - RETRYING: Wait for ClusterServiceBroker to become ready (51 retries left).Result was: {
    "attempts": 10,
    "changed": false,
    "msg": "Broker ready status: [u'False']",
    "retries": 61
}
FAILED - RETRYING: Wait for ClusterServiceBroker to become ready (50 retries left).Result was: {
    "attempts": 11,
    "changed": false,
    "msg": "Broker ready status: [u'False']",
    "retries": 61
}

Similar to what @Ladas was saying:

bash-4.2$ cat /opt/apb/artifacts/test/rc
None

bash-4.2$ cat /opt/apb/artifacts/test/status
failed

Not totally sure why it was moved into the failed state, why it stopped running in the middle, or why the return code is None.

matburt (Member) commented Jul 26, 2018

I bet you both are running into some pretty dumb defaults on my part:

https://github.com/ansible/ansible-runner/blob/1.0.5/ansible_runner/runner_config.py#L178-L179

I need to disable those if they aren't provided. I also need to make it clearer when a run is getting killed for this specific reason.

In the meantime, you can work around it by overriding these defaults. I'll get this fixed up by tomorrow and spin a 1.0.6.
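
The comment above does not say how to override the defaults. One possible approach, sketched here in Python, is to write explicit timeouts into the env/settings file of the private data directory before starting the run. The key names idle_timeout and job_timeout, the file location, and the helper name are assumptions based on the runner documentation, so check them against the version you have installed.

```python
from pathlib import Path


def write_runner_settings(private_data_dir, idle_timeout, job_timeout):
    """Write env/settings with explicit timeouts (key names are an assumption)."""
    env_dir = Path(private_data_dir) / "env"
    env_dir.mkdir(parents=True, exist_ok=True)
    settings_path = env_dir / "settings"
    # Plain "key: value" lines are valid YAML, so no YAML library is needed.
    settings_path.write_text(
        "---\n"
        f"idle_timeout: {int(idle_timeout)}\n"
        f"job_timeout: {int(job_timeout)}\n"
    )
    return settings_path
```

For example, write_runner_settings("/opt/apb", 600, 3600) would create /opt/apb/env/settings; the values 600 and 3600 are illustrative only and should exceed the longest quiet period and total runtime you expect.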

Ladas (Author) commented Jul 27, 2018

@matburt cool, thank you. 👍 Any idea why the is-alive and stop methods are not working for me?

Ladas (Author) commented Jul 27, 2018

@matburt Hm, OK, so I ran another test:

It seems that is-alive gives return code 0 while the job is running and 1 once it's done. Is that the expected behavior?

But stop always gives return code 1, with no error (even if I use -vvvvv or --debug), and the process keeps running. After the current play finishes, though, it seems to stop (the output file is blank and rc is missing). Is it waiting until the current play finishes?
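
Since is-alive prints nothing, a script has to look at the exit code instead. Below is a rough Python sketch of that, using the same command line as earlier in this thread. The mapping of 0 and 1 is only what was observed in this test, not a documented contract, and both function names are hypothetical.

```python
import subprocess


def interpret_is_alive(returncode):
    """Map the exit code to a state, following the behaviour observed above."""
    # Observed in this thread: 0 while the job runs, 1 once it is done.
    if returncode == 0:
        return "running"
    if returncode == 1:
        return "not running"
    return "unknown"


def is_alive(private_data_dir, ident):
    """Run `ansible-runner is-alive` and interpret its exit code."""
    result = subprocess.run(
        ["ansible-runner", "is-alive", private_data_dir, "-i", ident],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    return interpret_is_alive(result.returncode)
```

With this, is_alive("/tmp/ansible-runner20180726-13642-1i9ck95", "result") gives a readable state instead of a blank string, provided ansible-runner is on the PATH.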

matburt (Member) commented Jul 27, 2018

@Ladas I'm going to use this issue to fix the idle/job timeout issue. Can you open a new one to track the is-alive/stop/async issue you described here?

Ladas (Author) commented Jul 27, 2018

@matburt will do

matburt (Member) commented Jul 27, 2018

Timeout handling is fixed here: abe1806

matburt closed this as completed Jul 27, 2018