
support for "serial" on an individual task #12170

Closed
djudd opened this issue Aug 31, 2015 · 65 comments
Closed

support for "serial" on an individual task #12170

djudd opened this issue Aug 31, 2015 · 65 comments
Labels
affects_2.1 This issue/PR affects Ansible v2.1 feature This issue/PR relates to a feature request.

Comments

@djudd

djudd commented Aug 31, 2015

(Feature Idea)

The natural way to handle configuration or other updates that require a rolling restart would be to perform the updates in parallel, then notify a handler that performs the restart serially. But this is not possible today, so you need either a manual rolling restart or ugly hacks. See https://groups.google.com/forum/#!topic/ansible-project/rBcWzXjt-Xc
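
For illustration, a minimal sketch of what is being asked for. The serial line on the handler is hypothetical syntax (today serial is only valid at play level), and the template and service names are made up:

    - hosts: appservers
      tasks:
        - name: Update configuration (runs on all hosts in parallel)
          template:
            src: app.conf.j2
            dest: /etc/app/app.conf
          notify: restart app
      handlers:
        - name: restart app
          # hypothetical: "serial" is not a valid task/handler keyword today
          serial: 1
          service:
            name: app
            state: restarted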

@JensRantil
Contributor

Looks like there's a bounty for this here: https://www.bountysource.com/issues/26342862-support-for-serial-on-an-individual-task

@jgrmnprz

+1

@ehorne

ehorne commented Oct 15, 2015

+1; however, serial should not fail the entire play for the remaining hosts when all hosts in the current batch fail. I often have to do rolling restarts, one server at a time, across 50+ servers. It sucks when the play fails on server 3 because server 3 hit some strange unexpected condition that caused its restart to fail. Setting max_fail_percentage high enough should force Ansible to continue the play for the remaining hosts.
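
For reference, serial and max_fail_percentage do exist today as play-level keywords; the complaint above is that they cannot be scoped to a single task. A sketch of the play-level form, with a made-up service name (a batch aborts the play only when the failure percentage *exceeds* max_fail_percentage, so a value of 100 effectively lets the play continue past individual host failures):

    - hosts: webservers
      serial: 1                  # one host per batch
      max_fail_percentage: 100   # 100 is never exceeded, so a failed host doesn't abort the play
      tasks:
        - name: Rolling restart, one host at a time
          service:
            name: app            # made-up service name
            state: restarted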

@jimi-c jimi-c removed the P4 label Dec 7, 2015
@StephaneBunel

+1

@minhdanh

👍

@folex

folex commented Apr 25, 2016

+1

@raittes

raittes commented Jun 8, 2016

+1

@loechel

loechel commented Jul 12, 2016

+1

That's a good idea, but it should apply not only to individual tasks but also to blocks, including all the dependent options like max_fail_percentage and run_once.

The update-reboot example explains it easily (see the sketch after the list):

  • Task: Update Server (all parallel)
  • optional Task: Check if server needs restart
  • Block: (serial)
    • Restart Server
    • Wait for Server to come back
  • Additional Steps (all in parallel)
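
A sketch of what that could look like. The serial keyword on the block is hypothetical syntax (blocks do not support serial today), and the reboot-check path and modules are just one way to fill in the steps:

    - hosts: all
      tasks:
        - name: Update server (all hosts in parallel)
          apt:
            upgrade: dist
        - name: Check if server needs restart
          stat:
            path: /var/run/reboot-required
          register: reboot_required
        - name: Restart one host at a time
          # hypothetical: "serial" is not valid on a block today
          serial: 1
          when: reboot_required.stat.exists
          block:
            - name: Restart server
              reboot:
            - name: Wait for server to come back
              wait_for_connection:
        - name: Additional steps (all hosts in parallel)
          debug:
            msg: done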

@leseb

leseb commented Jul 29, 2016

+1, rolling restarts are useful for distributed systems whose components are all chained together. With Ceph, we don't want to restart all the storage daemons at the same time just because the configuration file changed.

@akulakhan

+1

@ansibot ansibot added the affects_2.1 This issue/PR affects Ansible v2.1 label Sep 8, 2016
@alvaroaleman
Contributor

workaround:

- name: service restart
  # serial: 1 would be the proper solution here, but that can only be set at play level
  # upstream issue: https://github.com/ansible/ansible/issues/12170
  # run_once makes a single host drive the loop; delegate_to then runs the
  # command against each host in the play in turn, one at a time
  command: "/bin/service restart"
  run_once: true
  delegate_to: "{{ item }}"
  with_items: "{{ play_hosts }}"

@leseb

leseb commented Sep 22, 2016

@alvaroaleman thanks for your suggestion; however, it seems to lead to this bug: #15103
At least it did for me. I applied your workaround like so: with_items: "[{% for h in groups[mon_group_name] %}'{{ h }}'{% if not loop.last %} ,{% endif %}{% endfor %}]".

Am I missing something?

@joshovi

joshovi commented Dec 21, 2017

+1

@hryamzik
Contributor

hryamzik commented Jan 17, 2018

If the task fails on one of these "pseudo-serial" hosts, the task gets executed on the other hosts instead of failing immediately. No matter what we tried, we could not stop the playbook immediately after the failed host.

@kami8607 I've faced the same issue with failures, as rolling updates and restarts require the whole playbook to fail on any error. Solved with any_errors_fatal: true.

I can also confirm that this solution works with include_tasks; however, check mode is executed in parallel.

- name: install and configure alive servers
  include_tasks: "install_configure.yml"
  with_items: "{{ healthy_servers }}"
  when: "hostvars[host_item].inventory_hostname == inventory_hostname"
  loop_control:
    loop_var: host_item

@hryamzik
Contributor

There's also a working solution in this thread.

@christiang830

There is a workaround that works for the most part, but no real solution.

@jonhatalla

There is not a working solution in this thread. While it may work for some, it's not a solution.
register doesn't work properly (I'll have a gist to back this up later), and my guess is it's not the only feature that won't behave correctly.

@hryamzik
Contributor

hryamzik commented Feb 5, 2018

@jonhatalla I don't have any issues with register, can you share a gist or a repo that doesn't work?

@guillemsola

guillemsola commented Feb 7, 2018

I would like to throttle only an artifacts download task (due to some constraints) but execute the rest of the tasks in parallel.

I have come up with a proposal after reading the comments, but it still is not working as desired for the download case. Note that 2 is the maximum number of concurrent task executions desired.

    - name: Download at ratio three at most
      win_get_url:
        url: http://ipv4.download.thinkbroadband.com/100MB.zip
        dest: c:/ansible/100MB.zip
        force: yes
      with_sequence: start=0 end={{ (( play_hosts | length ) / 2 ) | round (0, 'floor') | int }}
      when: "(( ansible_play_batch.index(inventory_hostname) % (( play_hosts | length ) / 2 )) | round) == (item | int)"

While the when should match each host on only one iteration, I can still see all the servers performing the download at the same time.

Another way of testing it is with a debug message and a delay between iterations. That way it is clear that only two are executed on each iteration.

    - debug:
        msg: "Item {{ item }} with modulus {{ (( ansible_play_batch.index(inventory_hostname) % (( play_hosts | length ) / 2 )) | round) }}"
      with_sequence: start=0 end={{ (( play_hosts | length ) / 2 ) | round (0, 'floor') | int }}
      loop_control:
        pause: 2
      when: "(( ansible_play_batch.index(inventory_hostname) % (( play_hosts | length ) / 2 )) | round) == (item | int)"

I discovered this issue thread thanks to this SO question

Any idea why the download doesn't seem to work as the debug message does?

@ansibot ansibot added feature This issue/PR relates to a feature request. and removed feature_idea labels Mar 2, 2018
@zwindler
Contributor

zwindler commented Mar 6, 2018

"At this point I don't see any use case not covered by any of the workarounds above"

As previous commenters said, I also see no workaround for handlers restarting services in a cluster where you don't want to restart all the nodes at the same time. So there is at least one use case for which there doesn't seem to be a solution... This renders handlers totally useless in that case, as handlers exist precisely to restart services only WHEN needed.

And all the other workarounds (handling concurrent writes to a local hosts file, for example) do work, but they are so ugly...

Finally, I concur: closing an issue because it's too big a problem to solve is a bit depressing...

@hryamzik
Contributor

@zwindler you can use tasks instead of handlers. I actually do rolling restarts with API checks, implemented with include_tasks, and it works as expected. You can even try include_tasks directly in handlers, but I've no idea whether that works or not.

@zwindler
Contributor

Not really sure I understand what you suggest.

  • Do you mean you use include_tasks to restart the services, and only do so with a when: clause to check whether or not a restart has to occur on this node?
  • How can you ensure that only one node has its services restarted at a time (that's the whole issue here)? Do you mean you can add serial with include_tasks?

@hryamzik
Contributor

- name: install and configure alive servers
  include_tasks: "install_configure.yml"
  with_items: "{{ healthy_servers }}"
  when: "hostvars[host_item].inventory_hostname == inventory_hostname"
  loop_control:
    loop_var: host_item

In this case serial=1 is simulated for all the tasks inside install_configure.yml; a sketch of what such a file could look like follows.
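
This is a hypothetical sketch only (the package name, service name, and health check are made up; the real file is not shown in the thread). Because the include above is gated by the when:, each loop pass runs these tasks on exactly one host while the linear strategy keeps the other hosts in lockstep, which is what simulates serial: 1:

    # install_configure.yml (hypothetical contents)
    - name: Install new package version
      apt:
        name: myapp
        state: latest

    - name: Restart service
      service:
        name: myapp
        state: restarted

    - name: Wait until the API reports healthy
      uri:
        url: "http://{{ inventory_hostname }}:8080/health"
      register: health
      until: health.status == 200
      retries: 30
      delay: 10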

@erpadmin

How has healthy_servers been defined? How is it used in the workaround? I don't see it being referenced. I want a serial task to apply to all hosts that the playbook is being run against.

@hryamzik
Contributor

@erpadmin why don't you use play_hosts in this case?

@pablofuentesbodion

Hi all,
we have hit a similar issue: we cannot pass the batch size as an argument and use it in the playbook:
serial: ${serial_mode}
It fails with:
ValueError: invalid literal for int() with base 10: 'serial_mode'

It seems to point to this bug, but we would like to clarify:

  • is there any workaround for this issue?
  • what is the official version with this fix?

Thanks for your help, and please keep us posted.

Best regards, Pablo.

@crossan007
Contributor

crossan007 commented Sep 6, 2018

It seems to me that the run_once + loop + delegate_to approach (limited to serial=1 behavior) does not work on an include_tasks statement when the inventory has two "inventory hosts" that share the same value for ansible_host.

Given two inventory hosts with the same ansible_host, the approach does run twice; however, both iterations run against the same host.

@alexhexabeam

There is a major problem with most of the proposed workarounds: CPU and memory usage, as well as massive deployment slowdowns. The method of checking that inventory_hostname == item in a with_items loop is O(n^2), which, combined with a large number of hosts, can balloon memory and CPU load greatly.

With 200 hosts, I've seen Ansible use 20 GB of RAM and a load average of 70 just to serialize an include_tasks block. That particular task took several minutes just to decide which hosts to include.

@dagwieers
Contributor

Anyone is invited to test #42528 for their use cases, and add a 👍 to the PR if you approve.

@ansible ansible locked and limited conversation to collaborators Apr 25, 2019