Added ignore unreachable option to the serial variable Feature #37309 #37587

Closed
wants to merge 2 commits into from
3 changes: 2 additions & 1 deletion docs/docsite/rst/reference_appendices/glossary.rst
@@ -398,7 +398,8 @@ when a term comes up on the mailing list.
         default is to address the batch size all at once, so this is something
         that you must opt-in to. OS configuration (such as making sure config
         files are correct) does not typically have to use the rolling update
-        model, but can do so if desired.
+        model, but can do so if desired. This option also lets you ignore
+        unreachable nodes in the group of machines.

     Serial
         .. seealso::
17 changes: 17 additions & 0 deletions docs/docsite/rst/reference_appendices/test_strategies.rst
@@ -216,6 +216,23 @@ the pool.
In the event of a problem, fix the few servers that fail using Ansible's automatically generated
retry file to repeat the deploy on just those servers.

You can also ignore unreachable node to go ahead with your job, suppose for example you have 6 nodes and you
want to dived it in groups of 2, an example of serialization option should be::
Member

Grammar, typos here. Suggested fix:

You can also ignore unreachable nodes to go ahead with your job. For example, if you have six (6) nodes and you
want to divide it into groups of two (2), an example of the serialization option should be::

Author

@zopar zopar Apr 13, 2018

Hi @jimi-c, what do you think about this modification?
If this is OK with you, I will make the same change in the other files. If needed, I can open a new, clean pull request and we can close this one.

You can also ignore unreachable nodes to go ahead with your job. For example, if you have six (6) nodes and you want to divide it into groups of two (2), an example of the serialization option should be::

---

- hosts: webservers
  serial: [[2, 1], 2, 2]

This is also equivalent to::

---

- hosts: webservers
  serial: [[2, 1], [2,0], [2,0]]

We have three (3) groups of nodes, and every group contains two (2) nodes.
The first group is represented by the list [2, 1] and has a value of 1 (True).
This boolean value answers the question "Do you want to ignore unreachable machines?".
The second and third groups are both represented by [2, 0], so the implicit answer to that question is 0 (False).
If you do not explicitly set 1 (True), the default value is 0 (False).
With this configuration, the Ansible run behaves as follows:
If one or both nodes in the first group are unreachable, Ansible does not stop and continues with the
second group.
If one node in the second group is unreachable, Ansible does not stop and continues with the third group;
this is the default behavior, including in older versions of Ansible.
If both nodes in the second group are unreachable, Ansible stops the job.


---

- hosts: webservers
  serial: [[2, 1], 2, 2]


In this case, the first group represented by the list [2, 1] has a value of 1 (True) regarding the question
"Do you want to ignroe unreachable?". For others groups the implicit answer to the question is 0 (False).
Member

More typos here.

Member

@dharmabumstead review more wording/style here.

Author

@jimi-c Thank you. Let me know what you would like me to modify; if necessary, I will open a new pull request for the merge.

At this point if one or both machines in the first group of 2 are unreachable, ansible go ahead with the
second group and does not stop the job. If in the second group one machine is unreachable, ansible does not
stop (this is the default behavoir also in old versions of ansible). If in the second group both machines
are unreachable, ansible stop the job.
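
The stopping rule in the paragraph above can be sketched in Python. This is a minimal illustration rather than the code in this pull request: `batch_stops_play` and its parameters are hypothetical names, and the sketch assumes the play stops only when every host in a batch counts as failed.

```python
def batch_stops_play(batch_size, failed, unreachable, ignore_unreachable):
    """Return True when every host in the batch counts as failed.

    Hypothetical helper: when ignore_unreachable is set, unreachable
    hosts are left out of the count, as the paragraph above describes.
    """
    counted = failed if ignore_unreachable else failed + unreachable
    return counted == batch_size


# serial: [[2, 1], 2, 2] on six hosts
# First group, both hosts unreachable, ignore flag 1: the play continues.
first_group_stops = batch_stops_play(2, failed=0, unreachable=2, ignore_unreachable=1)
# Second group, one host unreachable, ignore flag 0: the play continues.
second_group_one_down = batch_stops_play(2, failed=0, unreachable=1, ignore_unreachable=0)
# Second group, both hosts unreachable, ignore flag 0: the play stops.
second_group_all_down = batch_stops_play(2, failed=0, unreachable=2, ignore_unreachable=0)
```

Only the last case returns True, which matches the behavior described for a group whose ignore flag is 0.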


Achieving Continuous Deployment
```````````````````````````````

17 changes: 17 additions & 0 deletions docs/docsite/rst/user_guide/guide_rolling_upgrade.rst
@@ -237,6 +237,23 @@ Here is the next part of the update play::
- The ``serial`` keyword forces the play to be executed in 'batches'. Each batch counts as a full play with a subselection of hosts.
This has some consequences on play behavior. For example, if all hosts in a batch fail, the play fails, which in turn fails the entire run. You should consider this when combining with ``max_fail_percentage``.

To prevent unreachable hosts from being counted as failed and stopping the play, you can ignore unreachable nodes by using a list of lists. For example, suppose you have 6 nodes and you
want to divide them into groups of 2. The ``serial`` option could then look like this::

---

- hosts: webservers
  serial: [[2, 1], 2, 2]


In this case, the first group, represented by the list [2, 1], has a value of 1 (True) for the question
"Do you want to ignore unreachable hosts?". For the other groups, the implicit answer is 0 (False).
If one or both machines in the first group of 2 are unreachable, Ansible continues with the
second group and does not stop the play. If one machine in the second group is unreachable, Ansible does not
stop (this is also the default behavior in older versions of Ansible). If both machines in the second group
are unreachable, Ansible stops the play. The only valid values are 0 and 1 (False and True); any other value is
ignored and 0 is used.

The ``pre_tasks`` keyword just lets you list tasks to run before the roles are called. This will make more sense in a minute. If you look at the names of these tasks, you can see that we are disabling Nagios alerts and then removing the webserver that we are currently updating from the HAProxy load balancing pool.

The ``delegate_to`` and ``loop`` arguments, used together, cause Ansible to loop over each monitoring server and load balancer, and perform that operation (delegate that operation) on the monitoring or load balancing server, "on behalf" of the webserver. In programming terms, the outer loop is the list of web servers, and the inner loop is the list of monitoring servers.
27 changes: 27 additions & 0 deletions docs/docsite/rst/user_guide/playbooks_delegation.rst
@@ -111,6 +111,32 @@ You can also mix and match the values::
.. note::
No matter how small the percentage, the number of hosts per pass will always be 1 or greater.

You can also ignore unreachable nodes and continue with your job. For example, suppose you have 6 nodes and you
want to divide them into groups of 2. The ``serial`` option could then look like this::

---

- hosts: webservers
  serial: [[2, 1], 2, 2]


In this case, the first group, represented by the list [2, 1], has a value of 1 (True) for the question
"Do you want to ignore unreachable hosts?". For the other groups, the implicit answer is 0 (False).
If one or both machines in the first group of 2 are unreachable, Ansible continues with the
second group and does not stop the play. If one machine in the second group is unreachable, Ansible does not
stop (this is also the default behavior in older versions of Ansible). If both machines in the second group
are unreachable, Ansible stops the play.

.. note::
The only valid values are 0 and 1 (False and True); any other value is ignored and 0 is used. Example::

---

- hosts: webservers
  serial: [[2, 1], [2, 0], 2, [4, False], [3, True], [2, "False"]]


The last value is treated as [2, 0] because "False" here is a string, not a boolean.

Author

I would like to change the note to:

.. note::
Valid values are only 0 (False) and 1 (True); all other values are ignored and 0 is used. Example::


- hosts: webservers
  serial: [[2, 1], [2, 2], [2, "one"], [2, 0], 2, [4, False], [3, True], [2, "False"]]

The second list, [2, 2], is not valid, and Ansible replaces it with [2, 0].
The third list, [2, "one"], is not valid, and Ansible replaces it with [2, 0].
In the fifth position there is a single value (2), which implicitly represents [2, 0].
The last list, [2, "False"], is not valid because "False" here is a string, not a boolean; Ansible replaces it with [2, 0].
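
The validation rule in this note can be sketched as a small Python function. It is an illustration under stated assumptions, not the pull request's code: `normalize_serial` is a hypothetical name, and the length check for one-element lists is an addition that the note does not mention.

```python
def normalize_serial(serial):
    """Split serial entries into batch sizes and 0/1 ignore flags."""
    sizes, flags = [], []
    for entry in serial:
        if isinstance(entry, list):
            sizes.append(entry[0])
            # Assumption: a list without a second element means "do not ignore".
            flag = entry[1] if len(entry) > 1 else 0
        else:
            # A bare value such as 2 is shorthand for [2, 0].
            sizes.append(entry)
            flag = 0
        # Only a value equal to 1 (including True, which equals 1 in Python)
        # enables the option; "False", "one" and 2 all fall back to 0.
        flags.append(1 if flag == 1 else 0)
    return sizes, flags


sizes, flags = normalize_serial(
    [[2, 1], [2, 2], [2, "one"], [2, 0], 2, [4, False], [3, True], [2, "False"]]
)
```

With the example list from the note, only the first entry and `[3, True]` end up with the flag set.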

.. _maximum_failure_percentage:

@@ -132,6 +158,7 @@ In the above example, if more than 3 of the 10 servers in the group were to fail

The percentage set must be exceeded, not equaled. For example, if serial were set to 4 and you wanted the task to abort
when 2 of the systems failed, the percentage should be set at 49 rather than 50.
Unreachable machines are never counted as failed when you use ``max_fail_percentage``.
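
The "exceeded, not equaled" rule can be checked with a few lines of Python. This is a sketch of the comparison as the paragraph describes it, assuming a strict greater-than test on the percentage of failed hosts; `exceeds_max_fail_percentage` is a hypothetical name, not an Ansible function.

```python
def exceeds_max_fail_percentage(failed_hosts, batch_size, max_fail_percentage):
    """Return True when the share of failed hosts is strictly above the limit."""
    failed_percentage = failed_hosts * 100.0 / batch_size
    return failed_percentage > max_fail_percentage


# serial is 4 and 2 hosts failed, so 50 percent of the batch failed.
aborts_at_49 = exceeds_max_fail_percentage(2, 4, 49)  # 50 > 49
aborts_at_50 = exceeds_max_fail_percentage(2, 4, 50)  # 50 > 50 does not hold
```

Under this assumption a limit of 49 aborts the play when 2 of 4 hosts fail, while a limit of 50 does not.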

.. _delegation:

31 changes: 22 additions & 9 deletions lib/ansible/executor/playbook_executor.py
@@ -158,11 +158,11 @@ def run(self):

                         break_play = False
                         # we are actually running plays
-                        batches = self._get_serialized_batches(play)
+                        batches, ignores = self._get_serialized_batches(play)
                         if len(batches) == 0:
                             self._tqm.send_callback('v2_playbook_on_play_start', play)
                             self._tqm.send_callback('v2_playbook_on_no_hosts_matched')
-                        for batch in batches:
+                        for batch, ignore in zip(batches, ignores):
                             # restrict the inventory to the hosts in the serialized batch
                             self._inventory.restrict_to_hosts(batch)
                             # and run it...
@@ -176,9 +176,13 @@ def run(self):
                             # check the number of failures here, to see if they're above the maximum
                             # failure percentage allowed, or if any errors are fatal. If either of those
                             # conditions are met, we break out, otherwise we only break out if the entire
-                            # batch failed
-                            failed_hosts_count = len(self._tqm._failed_hosts) + len(self._tqm._unreachable_hosts) - \
-                                (previously_failed + previously_unreachable)
+                            # batch failed. If ignore value is 1 we do not count unreachable hosts as failed.
+                            # We have an ignore value for every hosts group.
+                            if ignore == 1:
+                                failed_hosts_count = len(self._tqm._failed_hosts) - previously_failed
+                            else:
+                                failed_hosts_count = len(self._tqm._failed_hosts) + len(self._tqm._unreachable_hosts) - \
+                                    (previously_failed + previously_unreachable)

                             if len(batch) == failed_hosts_count:
                                 break_play = True
@@ -249,22 +253,29 @@ def run(self):

     def _get_serialized_batches(self, play):
         '''
-        Returns a list of hosts, subdivided into batches based on
-        the serial size specified in the play.
+        Returns a list of hosts subdivided into batches based on the serial size specified in the play,
+        and a list of 0 and 1 values indicating whether to ignore unreachable hosts in each batch.
         '''

         # make sure we have a unique list of hosts
         all_hosts = self._inventory.get_hosts(play.hosts, order=play.order)
         all_hosts_len = len(all_hosts)

+        # Extract serial batch list
+        serial_batch_list = [i[0] if isinstance(i, list) else i for i in play.serial]
+
+        # ignore_unreachable_list contains 0/1 values: if 0, unreachable hosts are counted as failed;
+        # otherwise they are not. Any value other than 0 or 1 is treated as 0.
+        ignore_unreachable_list = [i[1] if isinstance(i, list) and i[1] == 1 else 0 for i in play.serial]
+
         # the serial value can be listed as a scalar or a list of
         # scalars, so we make sure it's a list here
-        serial_batch_list = play.serial
         if len(serial_batch_list) == 0:
             serial_batch_list = [-1]

         cur_item = 0
         serialized_batches = []
+        ignore_unreachable = []

         while len(all_hosts) > 0:
             # get the serial value from current item in the list
@@ -275,6 +286,7 @@ def _get_serialized_batches(self, play):
             # to the current serial item size
             if serial <= 0:
                 serialized_batches.append(all_hosts)
+                ignore_unreachable.append(0)
                 break
             else:
                 play_hosts = []
@@ -283,6 +295,7 @@ def _get_serialized_batches(self, play):
                         play_hosts.append(all_hosts.pop(0))

                 serialized_batches.append(play_hosts)
+                ignore_unreachable.append(ignore_unreachable_list[cur_item])

             # increment the current batch list item number, and if we've hit
             # the end keep using the last element until we've consumed all of
@@ -291,7 +304,7 @@ def _get_serialized_batches(self, play):
             if cur_item > len(serial_batch_list) - 1:
                 cur_item = len(serial_batch_list) - 1

-        return serialized_batches
+        return serialized_batches, ignore_unreachable

     def _generate_retry_inventory(self, retry_path, replay_hosts):
         '''
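
For readers who want to try the batching logic outside Ansible, the following standalone sketch mirrors the shape of the patched `_get_serialized_batches`. It is a simplification under stated assumptions: `split_into_batches` is a hypothetical name, hosts are plain strings, batch sizes are integers only (percentage values are not handled), and a one-element list is treated as "do not ignore".

```python
def split_into_batches(hosts, serial):
    """Return (batches, ignore_flags) for a list of hosts and a serial spec."""
    remaining = list(hosts)
    # An empty serial spec means "all hosts in one batch".
    sizes = [e[0] if isinstance(e, list) else e for e in serial] or [-1]
    flags = [
        1 if isinstance(e, list) and len(e) > 1 and e[1] == 1 else 0
        for e in serial
    ] or [0]

    batches, ignores = [], []
    cur = 0
    while remaining:
        size = sizes[cur]
        if size <= 0:
            # Invalid or missing size: take every remaining host at once.
            batches.append(remaining)
            ignores.append(0)
            break
        batches.append(remaining[:size])
        ignores.append(flags[cur])
        remaining = remaining[size:]
        # Reuse the last serial entry once the list is exhausted.
        cur = min(cur + 1, len(sizes) - 1)
    return batches, ignores


hosts = ["web1", "web2", "web3", "web4", "web5", "web6"]
batches, ignores = split_into_batches(hosts, [[2, 1], 2, 2])
```

Pairing the two returned lists with `zip`, as the patched `run` method does, gives each batch its own ignore flag.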