Play still fails for "UNREACHABLE" hosts even with "ignore_errors: true" set on tasks #18075
I don't think I've modified any relevant config properties, but here's my ansible.cfg: https://github.com/HHSIDEAlab/bluebutton-data-pipeline/blob/master/bluebutton-data-pipeline-benchmarks/src/test/ansible/ansible.cfg
OS / ENVIRONMENT
Management host is Ubuntu 14.04, and the systems being managed are RHEL 7.0.
I've got the following play, which is still failing if the host is unreachable. The playbook it's in is a "teardown" script that I need to keep (trying to) march on, no matter what, to ensure that my AWS resources are always removed (so they aren't wasting money):
When the "
STEPS TO REPRODUCE
Run that above play or one similar against a host in your inventory that is turned off or otherwise unreachable.
I expect the playbook run to continue on to the next task.
It went boom too early.
USE CASE / JUSTIFICATION
I've got two playbooks I use here as part of a benchmark setup. The first playbook sets up a bunch of systems in EC2 and starts my application running on them, waiting for it to finish the processing that I'm trying to benchmark. The second playbook scrapes the log files off of those systems, then terminates the hosts, to keep from spending unnecessary money.
This generally works okay, except that in one benchmark run just now I had an iteration fail in the first playbook because EC2 didn't respond that the instance was ready within the expected amount of time. Turns out, it did eventually finish setting up, but whatever, EC2 is just unexpectedly slow sometimes. But because of this error, when my second playbook ran to try and grab results (which wouldn't have been there: no big deal) and then remove the EC2 instances, it went boom, and AWS continued to burn money.
For a use case like this, I really need to trust that that teardown playbook will at least attempt to run every single task in it.
You can see my whole benchmarking project here: bluebutton-data-pipeline-benchmarks/src/test/ansible.
Would it be possible to try to reproduce this with a 2.2 release candidate (https://github.com/ansible/ansible/releases/tag/v188.8.131.52-0.2.rc2 is current rc)? There have been some recent fixes to partial failure handling.
But I think 'ignore_errors' only comes into play if a task was able to connect, run the module, and got back a 'failed' result. In this case, it looks like the task fails to connect to the host, so ignore_errors won't apply.
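A rough sketch of that distinction, assuming a made-up host group and placeholder commands (none of this is taken from the reporter's playbook):

```yaml
# Hypothetical play; the group name and commands are illustrative only.
- hosts: benchmark_hosts
  gather_facts: false
  tasks:
    # The module runs on the host and returns a non-zero exit code.
    # That is a "failed" result, so ignore_errors lets the play continue.
    - name: Command that runs but fails
      command: /bin/false
      ignore_errors: true

    # If the connection to the host cannot be established, the result is
    # "unreachable" rather than "failed", so ignore_errors does not apply.
    - name: Command on a host that may be powered off
      command: /bin/true
      ignore_errors: true
```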
Thanks very much for your submission to Ansible. It sincerely means a lot to us.
We believe the ticket you have filed is being somewhat misunderstood, as one thing works a little differently than stated.
As @alikins states above, the
If you need to clear unreachable errors you can use a
In the future, this might be a topic better suited for the user list, which you can also post to if you'd like some more help with the above.
Thank you once again for this and your interest in Ansible!
@bcoca, some thoughts:
Anywho, I love Ansible as it's by far the most reliable CM tool I've used (out of Puppet, Chef, and Terraform), so thanks very kindly for your work on it.
With a fresh pull of the devel branch of ansible, the "meta" module's "clear_host_errors" is not restoring hosts back into play.
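For context, the kind of usage being described looks roughly like the sketch below. The group name and task bodies are hypothetical placeholders, not the actual playbook from this report:

```yaml
# Hypothetical play; the group name and task bodies are illustrative only.
- hosts: benchmark_hosts
  gather_facts: false
  tasks:
    - name: Try to collect logs (the host may be unreachable)
      command: /bin/true

    # Intended to put hosts that were marked failed or unreachable
    # back into the play for the tasks that follow.
    - meta: clear_host_errors

    # Delegated to the control machine, so it does not need a
    # working connection to the remote host.
    - name: Teardown step run from the control machine
      command: /bin/true
      delegate_to: localhost
```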
I agree this could be clearer. I have a use case where a new host has port 22 available before ssh is ready (awaiting cloud-init) and it would be great if the following worked.
I am running into a similar problem with multiple plays in a playbook. The meta: clear_host_errors task is not honored: once an upstream play throws the host UNREACHABLE error, all remaining plays are skipped.
Using ansible 184.108.40.206
I have the same problem as @kcd83. Maybe I'm doing it wrong, but I use Ansible to check application health. I have fact checking off and I use
The ability to skip unreachable nodes would be very useful in clustered application deployments. At times a node in the cluster may be down, and if some tasks are to be performed on the running nodes, the play currently aborts if any one node is unreachable (when tasks are executed serially, one node at a time: for example, applying security patches and rebooting servers one at a time so that the cluster remains operational). One would like to keep track of the nodes that were unreachable in the last run and be able to sync them when they come back online. At a minimum, a play should have an option to skip a node if it is not reachable.
We found ourselves in need of this functionality:
Folks, I'm having a hard time understanding what the problem is.
I want to perform initial configuration on hosts that I dynamically add to a cluster. After this config is applied,
This is how you overcome that:
See this blogpost
I have no relation to the author of the blogpost btw =).