
support for "serial" on an individual task #12170

Closed
djudd opened this issue Aug 31, 2015 · 65 comments
Closed

support for "serial" on an individual task #12170

djudd opened this issue Aug 31, 2015 · 65 comments
Labels
affects_2.1 This issue/PR affects Ansible v2.1 feature This issue/PR relates to a feature request.

Comments

@djudd

djudd commented Aug 31, 2015

(Feature Idea)

The natural way to handle configuration or other updates that require a rolling restart would be to perform the updates in parallel, then notify a handler that performs the restart serially. But this is not possible today, so you need either a manual rolling restart or ugly hacks. See https://groups.google.com/forum/#!topic/ansible-project/rBcWzXjt-Xc
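
For illustration, a minimal sketch of what is being asked for. The serial line on the handler is hypothetical syntax (today serial is only valid at play level), and the template and service names are made up:

    - hosts: appservers
      tasks:
        - name: Update configuration (runs on all hosts in parallel)
          template:
            src: app.conf.j2
            dest: /etc/app/app.conf
          notify: restart app
      handlers:
        - name: restart app
          # hypothetical: "serial" is not a valid task/handler keyword today
          serial: 1
          service:
            name: app
            state: restarted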

@JensRantil
Contributor

Looks like there's a bounty for this here: https://www.bountysource.com/issues/26342862-support-for-serial-on-an-individual-task

@jgrmnprz

+1

@ehorne

ehorne commented Oct 15, 2015

+1; however, serial should not fail the entire play for the remaining hosts when all hosts in the current batch fail. I often have to do rolling restarts, one server at a time, across 50+ servers. It sucks when the play fails on server 3 because server 3 hit some strange unexpected condition that caused its restart to fail. Setting max_fail_percentage high enough should force Ansible to continue the play for the remaining hosts.
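
For reference, serial and max_fail_percentage do exist today as play-level keywords; the complaint above is that they cannot be scoped to a single task. A sketch of the play-level form, with a made-up service name (a batch aborts the play only when the failure percentage *exceeds* max_fail_percentage, so a value of 100 effectively lets the play continue past individual host failures):

    - hosts: webservers
      serial: 1                  # one host per batch
      max_fail_percentage: 100   # 100 is never exceeded, so a failed host doesn't abort the play
      tasks:
        - name: Rolling restart, one host at a time
          service:
            name: app            # made-up service name
            state: restarted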

@jimi-c jimi-c removed the P4 label Dec 7, 2015
@StephaneBunel

+1

@minhdanh

👍

@folex

folex commented Apr 25, 2016

+1

@raittes

raittes commented Jun 8, 2016

+1

@loechel

loechel commented Jul 12, 2016

+1

That's a good idea, but it should apply not only to individual tasks but also to blocks, including all the dependent options like max_fail_percentage and run_once.

The update-reboot example explains it easily (see the sketch after the list):

  • Task: Update Server (all parallel)
  • optional Task: Check if server needs restart
  • Block: (serial)
    • Restart Server
    • Wait for Server to come back
  • Additional Steps (all in parallel)
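
A sketch of what that could look like. The serial keyword on the block is hypothetical syntax (blocks do not support serial today), and the reboot-check path and modules are just one way to fill in the steps:

    - hosts: all
      tasks:
        - name: Update server (all hosts in parallel)
          apt:
            upgrade: dist
        - name: Check if server needs restart
          stat:
            path: /var/run/reboot-required
          register: reboot_required
        - name: Restart one host at a time
          # hypothetical: "serial" is not valid on a block today
          serial: 1
          when: reboot_required.stat.exists
          block:
            - name: Restart server
              reboot:
            - name: Wait for server to come back
              wait_for_connection:
        - name: Additional steps (all hosts in parallel)
          debug:
            msg: done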

@leseb

leseb commented Jul 29, 2016

+1, rolling restarts are useful for distributed systems whose components are all chained together. With Ceph, we don't want to restart all the storage daemons at the same time just because the configuration file changed.

@akulakhan

+1

@ansibot ansibot added the affects_2.1 This issue/PR affects Ansible v2.1 label Sep 8, 2016
@alvaroaleman
Contributor

workaround:

- name: service restart
  # serial: 1 would be the proper solution here, but that can only be set at play level
  # upstream issue: https://github.com/ansible/ansible/issues/12170
  # run_once makes a single host drive the loop; delegate_to then runs the
  # command against each host in the play in turn, one at a time
  command: "/bin/service restart"
  run_once: true
  delegate_to: "{{ item }}"
  with_items: "{{ play_hosts }}"

@leseb

leseb commented Sep 22, 2016

@alvaroaleman thanks for your suggestion; however, it seems to lead to this bug: #15103
At least it did for me. I applied your workaround like so: with_items: "[{% for h in groups[mon_group_name] %}'{{ h }}'{% if not loop.last %} ,{% endif %}{% endfor %}]".

Am I missing something?

@joshovi

joshovi commented Dec 21, 2017

+1

@hryamzik
Contributor

hryamzik commented Jan 17, 2018

If the task fails on one of these "pseudo-serial" hosts, the task gets executed on the other hosts instead of failing immediately. No matter what we tried, we could not stop the playbook immediately after the failed host.

@kami8607 I've faced the same issue with failures, as rolling updates and restarts require the whole playbook to fail on any error. Solved with any_errors_fatal: true.

I can also confirm that this solution works with include_tasks; however, check mode is executed in parallel.

- name: install and configure alive servers
  include_tasks: "install_configure.yml"
  with_items: "{{ healthy_servers }}"
  when: "hostvars[host_item].inventory_hostname == inventory_hostname"
  loop_control:
    loop_var: host_item

@hryamzik
Contributor

There's also a working solution in this thread.

@christiang830

There is a workaround that works for the most part, but no real solution.

@jonhatalla

There is not a working solution in this thread. While it may work for some, it's not a solution.
register doesn't work properly (I'll have a gist to back this up later), and my guess is it's not the only feature that won't behave correctly.

@hryamzik
Contributor

hryamzik commented Feb 5, 2018

@jonhatalla I don't have any issues with register, can you share a gist or a repo that doesn't work?

@guillemsola

guillemsola commented Feb 7, 2018

I would like to throttle only an artifacts download task (due to some constraints) but execute the rest of the tasks in parallel.

I have come up with a proposal after reading the comments, but it still is not working as desired for the download case. Note that 2 is the maximum number of concurrent task executions desired.

    - name: Download at ratio three at most
      win_get_url:
        url: http://ipv4.download.thinkbroadband.com/100MB.zip
        dest: c:/ansible/100MB.zip
        force: yes
      with_sequence: start=0 end={{ (( play_hosts | length ) / 2 ) | round (0, 'floor') | int }}
      when: "(( ansible_play_batch.index(inventory_hostname) % (( play_hosts | length ) / 2 )) | round) == (item | int)"

While the when should match each host on only one iteration, I can still see all the servers performing the download at the same time.

Another way of testing it is with a debug message and a delay between iterations. That way it is clear that only two are executed on each iteration.

    - debug:
        msg: "Item {{ item }} with modulus {{ (( ansible_play_batch.index(inventory_hostname) % (( play_hosts | length ) / 2 )) | round) }}"
      with_sequence: start=0 end={{ (( play_hosts | length ) / 2 ) | round (0, 'floor') | int }}
      loop_control:
        pause: 2
      when: "(( ansible_play_batch.index(inventory_hostname) % (( play_hosts | length ) / 2 )) | round) == (item | int)"

I discovered this issue thread thanks to this SO question

Any idea why the download doesn't seem to work as the debug message does?

@ansibot ansibot added feature This issue/PR relates to a feature request. and removed feature_idea labels Mar 2, 2018
@zwindler
Contributor

zwindler commented Mar 6, 2018

"At this point I don't see any use case not covered by any of the workarounds above"

As previous commenters said, I also see no workaround for handlers restarting services in a cluster where you don't want to restart all the nodes at the same time. So there is at least one use case for which there doesn't seem to be a solution... This renders handlers totally useless in that case, as handlers exist precisely to restart services only WHEN needed.

And all the other workarounds (handling concurrent writes to a local hosts file, for example) do work, but they are so ugly...

Finally, I concur: closing an issue because it's too big a problem to solve is a bit depressing...

@hryamzik
Contributor

@zwindler you can use tasks instead of handlers. I actually do rolling restarts with API checks, implemented with include_tasks, and it works as expected. You can even try include_tasks directly in handlers, but I've no idea whether that works or not.

@zwindler
Contributor

Not really sure I understand what you suggest.

  • Do you mean you use include_tasks to restart the services, and only do so with a when: clause to check whether or not a restart has to occur on this node?
  • How can you ensure that only one node has its services restarted at a time (that's the whole issue here)? Do you mean you can add serial with include_tasks?

@hryamzik
Contributor

- name: install and configure alive servers
  include_tasks: "install_configure.yml"
  with_items: "{{ healthy_servers }}"
  when: "hostvars[host_item].inventory_hostname == inventory_hostname"
  loop_control:
    loop_var: host_item

In this case serial=1 is simulated for all the tasks inside install_configure.yml; a sketch of what such a file could look like follows.
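
This is a hypothetical sketch only (the package name, service name, and health check are made up; the real file is not shown in the thread). Because the include above is gated by the when:, each loop pass runs these tasks on exactly one host while the linear strategy keeps the other hosts in lockstep, which is what simulates serial: 1:

    # install_configure.yml (hypothetical contents)
    - name: Install new package version
      apt:
        name: myapp
        state: latest

    - name: Restart service
      service:
        name: myapp
        state: restarted

    - name: Wait until the API reports healthy
      uri:
        url: "http://{{ inventory_hostname }}:8080/health"
      register: health
      until: health.status == 200
      retries: 30
      delay: 10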

@erpadmin

How has healthy_servers been defined? How is it used in the workaround? I don't see it being referenced. I want a serial task to apply to all hosts that the playbook is being run against.

@hryamzik
Contributor

@erpadmin why don't you use play_hosts in this case?

@pablofuentesbodion

Hi all,
we have hit a similar issue: we cannot pass the batch size as an argument and use it in the playbook:
serial: ${serial_mode}
It fails with:
ValueError: invalid literal for int() with base 10: 'serial_mode'

It seems to point to this bug, but we would like to clarify:

  • is there any workaround for this issue?
  • what is the official version with this fix?

Thanks for your help, and please keep us posted.

Best regards, Pablo.

@crossan007
Contributor

crossan007 commented Sep 6, 2018

It seems to me that the run_once + loop + delegate_to approach (limited to serial=1 behavior) does not work on an include_tasks statement when the inventory has two "inventory hosts" that share the same value for ansible_host.

Given two inventory hosts with the same ansible_host, the approach does run twice; however, both iterations run against the same host.

@alexhexabeam

There is a major problem with most of the proposed workarounds: CPU and memory usage, as well as massive deployment slowdowns. The method of checking that inventory_hostname == item in a with_items loop is O(n^2), which, combined with a large number of hosts, can balloon memory and CPU load greatly.

With 200 hosts, I've seen Ansible use 20 GB of RAM and a load average of 70 just to serialize an include_tasks block. That particular task took several minutes just to decide which hosts to include.

@dagwieers
Contributor

Anyone is invited to test #42528 for their use cases, and add a 👍 to the PR if you approve.

@ansible ansible locked and limited conversation to collaborators Apr 25, 2019