Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix include loading for handler runs #69459

Closed
wants to merge 1 commit into from

Conversation

mwhahaha
Copy link
Contributor

SUMMARY

When using the free (or host_pinned) strategy if your playbook includes
a tasks, roles and handlers, an error can occur when handlers are run
and there are included files to process. In the free.py strategy[0], when
included files are run there is an additional check to see if the
included file is a role and to process that differently than if it's an
included file.

[0]

try:
if included_file._is_role:
new_ir = self._copy_included_file(included_file)
new_blocks, handler_blocks = new_ir.get_block_list(
play=iterator._play,
variable_manager=self._variable_manager,
loader=self._loader,
)
else:
new_blocks = self._load_included_file(included_file, iterator=iterator)

Resolves #69457

ISSUE TYPE
  • Bugfix Pull Request
COMPONENT NAME
ADDITIONAL INFORMATION

When using the free (or host_pinned) strategy if your playbook includes
a tasks, roles and handlers, an error can occur when handlers are run
and there are included files to process. In the free.py strategy[0], when
included files are run there is an additional check to see if the
included file is a role and to process that differently than if it's an
included file.

[0] https://github.com/ansible/ansible/blob/c8704573396e7480b3e1b33b2ddda2b6325d0d80/lib/ansible/plugins/strategy/free.py#L244-L254

Resolves ansible#69457
@ansibot ansibot added affects_2.10 This issue/PR affects Ansible v2.10 bug This issue/PR relates to a bug. core_review In order to be merged, this PR must follow the core review workflow. needs_triage Needs a first human triage before being processed. new_contributor This PR is the first contribution by a new community member. support:core This issue/PR relates to code supported by the Ansible Engineering Team. labels May 12, 2020
@EmilienM
Copy link
Contributor

👍 with the proposed solution. Thanks for fixing that bug Alex!

@sivel
Copy link
Member

sivel commented May 12, 2020

At a minimum, this is going to need integration tests to validate what you are fixing, as well as a changelog.

Without a reproducer, I'm not sure I fully understand what is being fixed here. Roles cannot be used as handlers.

@mwhahaha
Copy link
Contributor Author

This is not about roles being used as handlers but rather output of IncludedFile.process_include_results can be roles.

@sivel
Copy link
Member

sivel commented May 12, 2020

This is not about roles being used as handlers but rather output of IncludedFile.process_include_results can be roles.

I'm still not sure I understand, but I'll await integration tests that include a reproducer.

@mwhahaha
Copy link
Contributor Author

I'll try and figure out a reproducer but it seems to be a race condition based on when/how this is called when using the free strategy.

@mwhahaha
Copy link
Contributor Author

Right now I have a very complex set of playbooks that hit it at a scale of 13 nodes but not 4, however it is hit consistently on at least 2-3 nodes when deploying my playbooks using an environment of size 13

@ansibot
Copy link
Contributor

ansibot commented May 12, 2020

The test ansible-test sanity --test pep8 [explain] failed with 1 error:

lib/ansible/plugins/strategy/__init__.py:1014:33: E126: continuation line over-indented for hanging indent

click here for bot help

@ansibot ansibot added needs_revision This PR fails CI tests or a maintainer has requested a review/revision of the PR. and removed core_review In order to be merged, this PR must follow the core review workflow. labels May 12, 2020
@sivel
Copy link
Member

sivel commented May 12, 2020

I may have an idea what is happening, and if correct, would mean that this fix is insufficient.

Both _wait_on_handler_results and _wait_on_pending_results use _process_pending_results.

So I am guessing that at least 1 host has made it to processing handlers, and other hosts are still executing normal tasks.

As such, _wait_on_handler_results is consuming include results that are meant for _wait_on_pending_results.

As such, while this would fix that situation, it would use _do_handler_run to execute those tasks, rather than run. As such, you wouldn't get true free task execution once this starts happening. This could cause all manner of inconsistencies for callback plugins, and the results of the playbook execution would not be deterministic.

We've been talking about adding a new ITERATING_HANDLERS state to PlayIterator, and to remove _do_handler_run, and inject notified handlers into the PlayIterator. Which is of course a lot more work, but then uses the strategies run method, for executing handlers as well as regular tasks, since _do_handler_run is effectively it's own strategy, that mostly adheres to linear concepts.

@mwhahaha
Copy link
Contributor Author

Yea sounds like what's happening because with free, the handler would get hit on a different host first while others are doing other tasks

@mwhahaha
Copy link
Contributor Author

I think this is triggered because we use meta: flush_handlers in some roles.

@mwhahaha
Copy link
Contributor Author

I wonder if this can be addressed by adding logic to the _execute_meta to only run handlers on thte target host instead of doing so for all systems. Currently it looks like run_handlers doesn't take a host argument so it'll try and run actions everywhere. Other meta actions like clear_facts, clear_host_errors, end_play and end_host take target_host into consideration when executing.

@mwhahaha
Copy link
Contributor Author

Yea this doesn't address the problem. I was sort of able to work around it by treating the flush_handlers task like a regular task when advancing the hosts when running under free by wedging something in around here: https://github.com/ansible/ansible/blob/devel/lib/ansible/plugins/strategy/free.py#L192-L194

But I end up with issues in _process_pending_results because the task doesn't get found in the self._queued_task_cache. I'm continuing to try and figure something out that isn't "remove all the handlers".

@mwhahaha
Copy link
Contributor Author

I commented over in the original issue, but I have a basic reproducer available @ https://github.com/mwhahaha/ansible-69457

It reproduces with a decent number of nodes like 13

@mattclay mattclay added the ci_verified Changes made in this PR are causing tests to fail. label May 13, 2020
@mwhahaha
Copy link
Contributor Author

#69498 resolves this

@mwhahaha mwhahaha closed this May 13, 2020
@mkrizek mkrizek removed the needs_triage Needs a first human triage before being processed. label May 14, 2020
@ansible ansible locked and limited conversation to collaborators Jun 10, 2020
@mwhahaha mwhahaha deleted the include-fix-69457 branch June 22, 2020 14:02
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
affects_2.10 This issue/PR affects Ansible v2.10 bug This issue/PR relates to a bug. ci_verified Changes made in this PR are causing tests to fail. needs_revision This PR fails CI tests or a maintainer has requested a review/revision of the PR. new_contributor This PR is the first contribution by a new community member. support:core This issue/PR relates to code supported by the Ansible Engineering Team.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Using the free strategy with a mix of tasks and roles with handlers can lead to failure
7 participants