Option to control if entire play should be aborted if batch of hosts failed #40271

wants to merge 2 commits into
base: devel


@agenosov agenosov commented May 16, 2018


When a play uses the serial strategy and an entire batch of hosts fails (because the hosts are unreachable or because of a processing error), Ansible interrupts all remaining plays.

While this makes sense in some cases, there are also situations where such behaviour is entirely unexpected.

I won't go into the details of every possible case; here is just one example.
Imagine you manage a system distributed across several hosts (the usual case for all of us). The main playbook includes other playbooks that explicitly set 'serial: 1' so that hosts are processed one by one before going further. It is absolutely vital to complete the main play even though some hosts were marked as failed.

We have introduced an option that controls whether it is acceptable to continue with the remaining plays when the current batch of hosts has failed.
Using this option is the responsibility of the playbook designer: you should understand what you are doing and be sure that this behaviour is acceptable in your specific case.

  • Feature Pull Request

Playbook executor

ansible-playbook 2.5.2
  config file = /home/andrey/.ansible.cfg
  configured module search path = [u'/home/andrey/projects/devops/ansible/lib']
  ansible python module location = /usr/lib/python2.7/dist-packages/ansible
  executable location = bin/ansible-playbook
  python version = 2.7.12 (default, Dec  4 2017, 14:50:18) [GCC 5.4.0 20160609]

The example below shows a main playbook that imports another playbook, which communicates with several hosts using the serial strategy with a batch size of 1. If one of the target hosts is unreachable, the main playbook is not completed.
The code is below.

  1. Main play:
    - import_playbook: test_unavailable_node_different_strategy.yml
    - hosts:
      connection: local
      tasks:
        - debug: msg="Playbook finished"

  2. Included play which targets several hosts using serial=1:
    - name: Serial processing of hosts
      hosts: all
      serial: 1
      tasks:
        - name: Create temporary file
          tempfile:
            state: file
          register: tmp_path
        - name: Remove the file created on previous step
          file:
            path: "{{tmp_path.path}}"
            state: absent

The tasks in the main playbook are never reached: the entire run is interrupted.

Adding the option break_play_on_batch_failed: false, which tells Ansible not to interrupt the entire play, fixes this situation.
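The intended control flow can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the code from this PR or from Ansible's playbook executor: the `simulate` function, the dictionary layout of a play, and the `reachable` set are hypothetical names invented for the illustration, and a host counts as failed simply when it is not reachable.

```python
from typing import Dict, List, Set, Tuple


def simulate(plays: List[Dict], reachable: Set[str]) -> List[Tuple[str, str]]:
    """Return the (play name, host) pairs that would actually run.

    Hypothetical model: a play is a dict with "name", "hosts", an optional
    "serial" batch size, and an optional "break_play_on_batch_failed" flag
    (assumed to default to True, i.e. the current abort behaviour).
    """
    executed: List[Tuple[str, str]] = []
    for play in plays:
        hosts = play["hosts"]
        # Without "serial", all hosts of the play form a single batch.
        size = play.get("serial") or max(1, len(hosts))
        for start in range(0, len(hosts), size):
            batch = hosts[start:start + size]
            ok_hosts = [h for h in batch if h in reachable]
            executed.extend((play["name"], h) for h in ok_hosts)
            # An entire batch failed: abort everything unless told otherwise.
            if not ok_hosts and play.get("break_play_on_batch_failed", True):
                return executed
    return executed


reachable = {"host_up", "localhost"}
serial_play = {"name": "Serial processing of hosts",
               "hosts": ["host_up", "host_down"], "serial": 1}
main_play = {"name": "Main", "hosts": ["localhost"]}

# Default behaviour: the failed batch stops the run before "Main" starts.
print(simulate([serial_play, main_play], reachable))

# With the option disabled, the run continues to the "Main" play.
tolerant = dict(serial_play, break_play_on_batch_failed=False)
print(simulate([tolerant, main_play], reachable))
```

In this model the flag belongs to the play whose batch failed, which matches the description of the option as something the playbook designer sets deliberately for a specific play.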

With two hosts, one of which is up and one of which is down, launch the main play:

andrey@aagenosov:~/projects/devops/ansible_notes$ ansible-playbook -i ',' ./main.yml --ask-pass -u user
SSH password: 

PLAY [Serial processing of hosts] ************************************************************************************************************************************************

TASK [Create temporary file] *****************************************************************************************************************************************************
changed: []

TASK [Remove the file created on previous step] **********************************************************************************************************************************
changed: []

PLAY [Serial processing of hosts] ************************************************************************************************************************************************

TASK [Create temporary file] *****************************************************************************************************************************************************
fatal: []: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: ssh: connect to host port 22: No route to host\r\n", "unreachable": true}

PLAY RECAP ***********************************************************************************************************************************************************************                : ok=4    changed=2    unreachable=0    failed=0                : ok=0    changed=0    unreachable=1    failed=0

@ansibot ansibot commented Aug 5, 2018

@samccann samccann commented Jun 5, 2019

@agenosov Can you please rebase this PR so we can review? Rebasing details are documented at:


Andrey Agenosov added 2 commits May 8, 2018
@agenosov agenosov force-pushed the agenosov:option_break_play_on_batch_failed branch to 2ccaf46 Jul 3, 2019

@agenosov agenosov commented Jul 3, 2019

@samccann, done, sorry for the delay.

@bcoca bcoca commented Jul 3, 2019

I would suggest creating a strategy plugin instead of a new keyword

@samccann samccann commented Feb 11, 2020

@bcoca - based on your prior comment, should this PR be closed?

5 participants