Option to control if entire play should be aborted if batch of hosts failed #40271

Open. Wants to merge 2 commits into base: devel


agenosov commented May 16, 2018

SUMMARY

When using the serial strategy, if an entire batch of hosts fails (because the hosts are unreachable or because of a processing error), Ansible aborts all remaining plays.

While that makes sense in some cases, there are also situations where this behaviour is entirely unexpected.

I won't go into every possible case; here is just one example.
Imagine you manage a system distributed across several hosts (the usual case for all of us). The main playbook imports other playbooks that explicitly set 'serial: 1' in order to process hosts one by one before going further. It is vital that the main play completes even though some hosts were marked as failed.

We have introduced an option that controls whether the remaining plays may continue when the current batch of hosts has failed.
Using this option is the playbook designer's responsibility: you should understand what you are doing and be sure that this behaviour is acceptable in your specific case.

ISSUE TYPE
  • Feature Pull Request
COMPONENT NAME

Playbook executor

ANSIBLE VERSION
ansible-playbook 2.5.2
  config file = /home/andrey/.ansible.cfg
  configured module search path = [u'/home/andrey/projects/devops/ansible/lib']
  ansible python module location = /usr/lib/python2.7/dist-packages/ansible
  executable location = bin/ansible-playbook
  python version = 2.7.12 (default, Dec  4 2017, 14:50:18) [GCC 5.4.0 20160609]
ADDITIONAL INFORMATION

Below is a case where the main playbook imports another playbook that talks to several hosts using the serial strategy with a batch size of 1. If one of the target hosts is unreachable, the main playbook does not complete.
The code is below.

  1. Main play:
    - import_playbook: test_unavailable_node_different_strategy.yml
    - hosts: 127.0.0.1
      connection: local
      tasks:
        - debug: msg="Playbook finished"

  2. Included play which targets several hosts using serial=1:
    - name: Serial processing of hosts
      hosts: all
      serial: 1
      tasks:
        - name: Create temporary file
          tempfile:
            state: file
          register: tmp_path
        - name: Remove the file created on previous step
          file:
            path: "{{tmp_path.path}}"
            state: absent

The tasks in the main playbook are never reached: the entire run is interrupted.

Setting the option break_play_on_batch_failed: false, which tells Ansible not to interrupt the entire run, fixes this situation.

With two hosts, one of which is up and one of which is down, launch the main play:

andrey@aagenosov:~/projects/devops/ansible_notes$ ansible-playbook -i '10.16.42.59,10.16.43.57' ./main.yml --ask-pass -u user
SSH password: 

PLAY [Serial processing of hosts] ************************************************************************************************************************************************

TASK [Create temporary file] *****************************************************************************************************************************************************
changed: [10.16.42.59]

TASK [Remove the file created on previous step] **********************************************************************************************************************************
changed: [10.16.42.59]

PLAY [Serial processing of hosts] ************************************************************************************************************************************************

TASK [Create temporary file] *****************************************************************************************************************************************************
fatal: [10.16.43.57]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: ssh: connect to host 10.16.43.57 port 22: No route to host\r\n", "unreachable": true}

PLAY RECAP ***********************************************************************************************************************************************************************
10.16.42.59                : ok=4    changed=2    unreachable=0    failed=0   
10.16.43.57                : ok=0    changed=0    unreachable=1    failed=0
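
For readers who want to reason about the behaviour without running Ansible, the difference the option makes can be modelled in a few lines of Python. This is only a simplified sketch under stated assumptions: it is not the code from this PR or from Ansible's playbook executor. The function `run_plays`, the `host_ok` callback, and the dictionary layout of a play are all hypothetical, and the model ignores details such as failed hosts being dropped from later plays. Only the option name `break_play_on_batch_failed` and the host addresses come from the description above.

```python
from typing import Callable, Dict, List


def run_plays(
    plays: List[Dict],
    host_ok: Callable[[str], bool],
    break_play_on_batch_failed: bool = True,
) -> List[str]:
    """Toy model of serial batching; returns a log of what happened.

    Each play is a dict with "name", "hosts" and an optional "serial"
    batch size (hypothetical layout, not Ansible's internal structure).
    """
    log: List[str] = []
    for play in plays:
        hosts = play["hosts"]
        # Without "serial", all hosts of the play form a single batch.
        size = play.get("serial") or len(hosts) or 1
        for start in range(0, len(hosts), size):
            batch = hosts[start:start + size]
            failed = [h for h in batch if not host_ok(h)]
            for h in batch:
                status = "FAILED" if h in failed else "ok"
                log.append(f"{play['name']}: {h} {status}")
            # Behaviour described in the summary: a fully failed batch
            # stops everything that would have run afterwards.
            if len(failed) == len(batch) and break_play_on_batch_failed:
                log.append("run aborted: entire batch failed")
                return log
    return log


# Scenario from the description: one host is up, one is unreachable.
PLAYS = [
    {
        "name": "Serial processing of hosts",
        "hosts": ["10.16.42.59", "10.16.43.57"],
        "serial": 1,
    },
    {"name": "Main play", "hosts": ["127.0.0.1"]},
]


def reachable(host: str) -> bool:
    return host != "10.16.43.57"


if __name__ == "__main__":
    for flag in (True, False):
        print(f"break_play_on_batch_failed={flag}")
        for line in run_plays(PLAYS, reachable, break_play_on_batch_failed=flag):
            print("  " + line)
```

In this model, with the flag left at `True` the "Main play" entry never runs, because the batch containing only 10.16.43.57 fails completely. With the flag set to `False` the loop carries on and the final play is reached.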

Andrey Agenosov added some commits May 8, 2018


ansibot commented Aug 5, 2018
