support for "serial" on an individual task #12170

Closed
djudd opened this Issue Aug 31, 2015 · 63 comments

djudd commented Aug 31, 2015

(Feature Idea)

The natural way to handle configuration or other updates that require a rolling restart would be to perform the updates in parallel, then notify a handler which performs the restart serially. But this is not currently possible, so it requires either a manual rolling restart or ugly hacks. See https://groups.google.com/forum/#!topic/ansible-project/rBcWzXjt-Xc
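
For reference, the closest fully supported pattern today (the "manual rolling restart" mentioned above) is to split the work into two plays: a parallel play that pushes the config and records whether a restart is needed, followed by a serial: 1 play that restarts. A minimal sketch; the group, file paths, and service name are placeholders:

    ---
    - hosts: webservers
      tasks:
        - name: Push new configuration (all hosts in parallel)
          template:
            src: app.conf.j2
            dest: /etc/app/app.conf
          register: app_conf

    - hosts: webservers
      serial: 1   # restart one host at a time
      tasks:
        - name: Restart the service only where the config actually changed
          service:
            name: app
            state: restarted
          when: app_conf.changed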

srgvg commented Aug 31, 2015

👍

neutrinus commented Sep 11, 2015

+1
This would be great to have for our ansible-ceph deployment scripts!

ccciudatu commented Sep 12, 2015

👍

JensRantil commented Oct 1, 2015

pratikdhandharia commented Oct 7, 2015

+1

jgrmnprz commented Oct 13, 2015

+1

ehorne commented Oct 15, 2015

+1; however, serial must not fail the entire play for the remaining hosts when all hosts in the current batch fail. I often have to do rolling restarts, one server at a time across 50+ servers. It sucks when the play fails on server 3 because server 3 hit some strange, unexpected condition that caused its restart to fail. Setting max_fail_percentage high enough should force Ansible to continue the play for the remaining hosts.
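
The keyword referred to here is max_fail_percentage, which already works together with serial at the play level. A sketch follows; whether a threshold of 100 really lets the play continue past a fully failed batch can depend on the Ansible version, so treat that as an assumption to verify:

    - hosts: appservers
      serial: 1
      max_fail_percentage: 100   # intent: don't abort remaining batches when one host's restart fails
      tasks:
        - name: Rolling restart
          service:
            name: app
            state: restarted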

herdani commented Oct 20, 2015

+1

jimi-c removed the P4 label Dec 7, 2015

yeroc commented Jan 22, 2016

+1!

AliakseiKorneu commented Mar 3, 2016

+1

StephaneBunel commented Mar 18, 2016

+1

minhdanh commented Apr 14, 2016

👍

folex commented Apr 25, 2016

+1

raittes commented Jun 8, 2016

+1

loechel commented Jul 12, 2016

+1

That's a good idea, but it should apply not only to individual tasks but also to blocks, together with all related options like max_fail_percentage and run_once.

The update-reboot example illustrates this nicely (see the sketch after the list):

  • Task: Update Server (all parallel)
  • optional Task: Check if server needs restart
  • Block: (serial)
    • Restart Server
    • Wait for Server to come back
  • Additional Steps (all in parallel)
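
Expressed with what Ansible supports today, that outline roughly maps to a parallel update play followed by a serial: 1 play for the restart. A sketch; the apt/reboot modules and the reboot-required check are placeholder choices that assume a Debian-like host and a reasonably recent Ansible:

    ---
    - hosts: all
      tasks:
        - name: Update server (all hosts in parallel)
          apt:
            upgrade: dist
            update_cache: yes

        - name: Check if server needs restart
          stat:
            path: /var/run/reboot-required
          register: reboot_required

    - hosts: all
      serial: 1            # the "Block: (serial)" part
      tasks:
        - name: Restart server and wait for it to come back
          reboot:          # the reboot module waits for the host to return before continuing
          when: reboot_required.stat.exists

    # any additional steps would go in a third play, again run on all hosts in parallel
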
stefiienko commented Jul 19, 2016

+1

jeffrey4l commented Jul 25, 2016

+1 for this.

bozzo commented Jul 27, 2016

+1

leseb commented Jul 29, 2016

+1, rolling restarts are useful for distributed systems whose nodes are chained together. For Ceph, we don't want to restart all the storage daemons at the same time just because the configuration file changed.

AncientLeGrey commented Aug 1, 2016

+1

akulakhan commented Aug 5, 2016

+1

leseb referenced this issue Aug 9, 2016: Improve handlers #691 (Closed)

tklicki commented Sep 6, 2016

+1

ansibot added the affects_2.1 label Sep 8, 2016

alvaroaleman commented Sep 22, 2016

workaround:

- name: service restart
  # serial: 1 would be the proper solution here, but that can only be set on play level
  # upstream issue: https://github.com/ansible/ansible/issues/12170
  run_once: true
  with_items: "[{% for h in play_hosts  %}'{{ h }}'{% if not loop.last %} ,{% endif %}{% endfor %}]"
  delegate_to: "{{ item }}"
  command: "/bin/service restart"
leseb commented Sep 22, 2016

@alvaroaleman thanks for your suggestion; however, it seems to lead to this bug: #15103
At least for me, I applied your workaround like so: with_items: "[{% for h in groups[mon_group_name] %}'{{ h }}'{% if not loop.last %} ,{% endif %}{% endfor %}]".

Am I missing something?

dpedu2 commented Nov 21, 2017

+1

Roles are effectively USELESS at large scale without this.

sangrealest commented Dec 1, 2017

+1 really good idea

JoelFeiner commented Dec 6, 2017

Also in support of this. We use roles and in order to use any of the workarounds, we would have to extract just the serialized tasks into the play or a task file included from it, thus breaking role encapsulation.

shellshock1953 commented Dec 8, 2017

+1

joshovi commented Dec 21, 2017

+1

hryamzik commented Jan 17, 2018

> If the task fails on one of these "pseudo-serial" hosts, the task still gets executed on the other hosts instead of failing immediately. No matter what we tried, we could not stop the playbook directly after the failed host.

@kami8607 I've faced the same issue with failures, as rolling updates and restarts require the whole playbook to fail on any error. Solved it with any_errors_fatal: true.

I can also confirm that this solution works with include_tasks; however, check mode is still executed in parallel.

- name: install and configure alive servers
  include_tasks: "install_configure.yml"
  with_items: "{{ healthy_servers }}"
  when: "hostvars[host_item].inventory_hostname == inventory_hostname"
  loop_control:
      loop_var: host_item
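
Putting that together at the play level, a sketch (the hosts pattern is a placeholder; healthy_servers and install_configure.yml are from the example above):

    - hosts: appservers
      any_errors_fatal: true        # abort the whole play as soon as any host fails
      tasks:
        - name: install and configure alive servers
          include_tasks: "install_configure.yml"
          with_items: "{{ healthy_servers }}"
          when: "hostvars[host_item].inventory_hostname == inventory_hostname"
          loop_control:
              loop_var: host_item
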
cubranic commented Jan 29, 2018

If you look at comments getting 👎 , it's because they're pointless, as their entire content is saying "+1".

hryamzik commented Jan 29, 2018

There's also a working solution in this thread.

kami8607 commented Jan 29, 2018

There is a workaround that works for the most part, but no real solution.

jonhatalla commented Feb 2, 2018

There is not a working solution in this thread. While it may work for some, it's not a solution.
register doesn't work properly (I will have a gist to back this up later); my guess is it's not the only feature that won't behave correctly.

hryamzik commented Feb 5, 2018

@jonhatalla I don't have any issues with register, can you share a gist or a repo that doesn't work?

guillemsola commented Feb 7, 2018

I would like to limit only an artifacts download task (due to some constraints) but execute the rest of the tasks in parallel.

I have come up with a proposal after reading the comments, but it still does not work as desired for the download case. Note that 2 is the maximum number of concurrent task executions desired.

    - name: Download at ratio three at most
      win_get_url:
        url: http://ipv4.download.thinkbroadband.com/100MB.zip
        dest: c:/ansible/100MB.zip
        force: yes
      with_sequence: start=0 end={{ (( play_hosts | length ) / 2 ) | round (0, 'floor') | int }}
      when: "(( ansible_play_batch.index(inventory_hostname) % (( play_hosts | length ) / 2 )) | round) == (item | int)"

While the when should only match on each iteration for certain hosts, I can still see all the servers performing the download at the same time.

Another way of testing it is with a debug message and a delay between iterations. That way it is clear that only two are executed at each iteration.

    - debug:
        msg: "Item {{ item }} with modulus {{ (( ansible_play_batch.index(inventory_hostname) % (( play_hosts | length ) / 2 )) | round) }}"
      with_sequence: start=0 end={{ (( play_hosts | length ) / 2 ) | round (0, 'floor') | int }}
      loop_control:
        pause: 2
      when: "(( ansible_play_batch.index(inventory_hostname) % (( play_hosts | length ) / 2 )) | round) == (item | int)"

I discovered this issue thread thanks to this SO question

Any idea why the download doesn't seem to work as the debug message does?
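
For readers on Ansible 2.9 or later, the task-level throttle keyword covers exactly this "limit one task's concurrency" case; a sketch of the download task above (the version requirement is from memory, so verify it against the docs):

    - name: Download at most two at a time
      win_get_url:
        url: http://ipv4.download.thinkbroadband.com/100MB.zip
        dest: c:/ansible/100MB.zip
        force: yes
      throttle: 2   # at most two hosts run this task at any moment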

ansibot added feature and removed feature_idea labels Mar 2, 2018

zwindler commented Mar 6, 2018

> At this point I don't see any use case not covered by any of the workarounds above

As previous commenters said, I see no workaround for handlers restarting services in a cluster where you don't want to restart all nodes at the same time. So there is at least one use case where there doesn't seem to be a solution... This renders handlers totally useless in that case, since handlers exist precisely to restart the services only WHEN it is needed.

And all the other workarounds (handling concurrent writes on a local hosts file, for example) do work, but they are so ugly...

Finally, I concur: closing an issue because it's too big a problem to solve is a bit depressing...

hryamzik commented Mar 12, 2018

@zwindler you can use tasks instead of handlers. I actually do rolling restarts with API checks, implemented with include_tasks, and it works as expected. You could even try include_tasks directly in handlers, but I've no idea whether that works or not.

zwindler commented Mar 12, 2018

Not really sure I understand what you suggest.

  • Do you mean you use include_tasks to restart the services, and only do so with a when: clause to check whether or not a restart has to occur on this node?
  • How can you ensure that only one node has its services restarted at a time (that's the whole issue here)? Do you mean you can add serial with include_tasks?
hryamzik commented Mar 12, 2018

- name: install and configure alive servers
  include_tasks: "install_configure.yml"
  with_items: "{{ healthy_servers }}"
  when: "hostvars[host_item].inventory_hostname == inventory_hostname"
  loop_control:
      loop_var: host_item

in this case serial=1 is simulated for all the tasks inside install_configure.yml.

erpadmin commented Jun 18, 2018

How has healthy_servers been defined? How is it used in the workaround? I don't see it being referenced. I want a serial task to apply to all hosts that the playbook is being run against.

hryamzik commented Jun 18, 2018

@erpadmin why don't you use play_hosts in this case?
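
That is, the earlier loop with play_hosts substituted for healthy_servers, so the pseudo-serial include covers every host the play runs against; a sketch:

    - name: install and configure all play hosts, one at a time
      include_tasks: "install_configure.yml"
      with_items: "{{ play_hosts }}"
      when: "hostvars[host_item].inventory_hostname == inventory_hostname"
      loop_control:
          loop_var: host_item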

pablofuentesbodion commented Aug 1, 2018

Hi all,
we have detected a similar issue: we cannot pass the argument and use it in the playbook as
serial: ${serial_mode}
because it fails with:
ValueError: invalid literal for int() with base 10: 'serial_mode'

It seems to point to this bug, but we would like to clarify:

  • is there any workaround for this issue?
  • what is the official version with this fix?

Thanks for your help, and please keep us posted.

Best regards, Pablo.
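
As an aside, ${serial_mode} is pre-1.0 variable syntax, which is why the literal string ends up in the int() conversion. On Ansible 2.x the play keyword is normally templated with Jinja instead; a sketch (assuming serial_mode is passed as an extra var, and noting that templating serial has had version-specific quirks worth verifying):

    - hosts: appservers
      serial: "{{ serial_mode | default(1) }}"   # e.g. ansible-playbook site.yml -e serial_mode=2
      tasks:
        - ping: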

pablofuentesbodion commented Aug 1, 2018

crossan007 commented Sep 6, 2018

It seems to me that the "run_once + loop + delegate_to" approach (limited to serial=1 behavior) does not work on an include_tasks statement when the inventory has two inventory hosts that share the same value for ansible_host.

Given two inventory hosts with the same ansible_host, the approach does run twice; however, both iterations are against the same host.
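
A hypothetical inventory of the shape described (two inventory hosts sharing one ansible_host):

    # hosts.ini (illustrative)
    node1 ansible_host=10.0.0.5
    node2 ansible_host=10.0.0.5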

alexhexabeam commented Sep 21, 2018

There is a major problem with most of the proposed workarounds: CPU and memory usage, as well as massive deployment slowdowns. The method of checking that inventory_hostname == item in a with_items loop is O(n^2), which, combined with a large number of hosts, can balloon memory and CPU load greatly.

With 200 hosts, I've seen Ansible use 20 GB of RAM and a load average of 70 just to serialize an include_tasks block. That particular task took several minutes just to decide which hosts to include.
