Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Handle disappearing hosts #57129

Open
wants to merge 4 commits into
base: devel
Choose a base branch
from

Conversation

@bcoca
Copy link
Member

@bcoca bcoca commented May 29, 2019

fixes #55669

ISSUE TYPE
  • Bugfix Pull Request
COMPONENT NAME

core

@nitzmahone
Copy link
Member

@nitzmahone nitzmahone commented May 30, 2019

I built a reproducer for this with an inventory plugin (intending to turn it into a test for this PR), but as-is doesn't fix the issue (piqued my interest as I've often wondered how we'd handle disappearing hosts)...

@bcoca
Copy link
Member Author

@bcoca bcoca commented May 30, 2019

@nitzmahone what is the reproducer? original issue was mostly a confluence of race conditions and i was having a hard time making it happen.

@nitzmahone
Copy link
Member

@nitzmahone nitzmahone commented May 30, 2019

vanish.py

from ansible.plugins.inventory import BaseInventoryPlugin

runcount = 0


class InventoryModule(BaseInventoryPlugin):

    NAME = 'vanish'

    def verify_file(self, path):
        if not path.endswith('.vanish.yml'):
            return False
        return super(InventoryModule, self).verify_file(path)

    def parse(self, inventory, loader, path, cache=True):
        global runcount
        runcount += 1
        group = 'vanishgroup'
        inventory.add_group(group)
        inventory.add_host('h1', group=group)
        print('runcount is %d' % runcount)
        if runcount <= 1:
            inventory.add_host('disappearing_host', group=group)
        inventory.set_variable(group, 'ansible_connection', 'local')

disappearing_host.yml

- hosts: vanishgroup
  gather_facts: no
  tasks:
  - shell: echo yo
    notify: dude
  - meta: refresh_inventory
  - ping:
  handlers:
  - name: dude
    debug: msg=howdy

disappear.vanish.yml

plugin: vanish

ansible-playbook -i disappear.vanish.yml disappearing_host.yml

reproduces it 100% for me

@nitzmahone
Copy link
Member

@nitzmahone nitzmahone commented May 30, 2019

I suspect there are probably a number of other potential hazards for this, but handler exec is a big one. It seems to work as expected without a handler and using the default callback.

@bcoca
Copy link
Member Author

@bcoca bcoca commented May 31, 2019

this looks like diff case than the one i was handling: free strat + one of disappearing hosts being slow to return previous result when inventory (run_once) was refreshed.

so we probably have multiple cases that cause errors when a host disappears mid play.

@nitzmahone
Copy link
Member

@nitzmahone nitzmahone commented May 31, 2019

Yeah, I thought about strategy: free too, though I didn't see anything mentioning that in the original issue... Reproducing that case under free should be pretty easy with this- just do a shell: sleep {{ bla }} and have the disappearing host's value of bla be longer than the others.

@nitzmahone nitzmahone closed this May 31, 2019
@nitzmahone nitzmahone reopened this May 31, 2019
@nitzmahone
Copy link
Member

@nitzmahone nitzmahone commented May 31, 2019

I'm also wondering if we should leave "removed" hosts in place and just have an active flag on them that play_iterator et al would just ignore when asking for the next host... Would probably solve a lot of these kinds of problems, since other things like task completion and stats that need to look up hosts would still find them. Otherwise this is probably going to be a long series of race condition whack-a-mole...

@bcoca
Copy link
Member Author

@bcoca bcoca commented May 31, 2019

discussed a lot in irc with original author so ticket does not have good reflection of full issue, still unsure that is the case since i could not easily reproduce.

as for ignoring the host, that is kind of what this does, sets state to 'nothing to do' and puts in removed_keys() so it is skipped even if iterator has it in preset objects

need to check why handlers still run, but also need to discuss --force-handlers in this case as well as meta: clear_host_errors

@bcoca bcoca changed the title Handle disappearing hosts [WIP] Handle disappearing hosts Jun 22, 2019
@ansibot ansibot added the WIP label Jun 22, 2019
@ansibot

This comment has been hidden.

@ansibot
Copy link
Contributor

@ansibot ansibot commented Feb 21, 2020

The test ansible-test sanity --test pylint [explain] failed with 1 error:

lib/ansible/executor/task_queue_manager.py:158:80: undefined-variable: Undefined variable 'AnsibleCollectionRef'

click here for bot help

@KFoxder
Copy link

@KFoxder KFoxder commented Apr 11, 2021

@bcoca Is there any workaround for this issue. Our playbook fails consistently now that we are on Ansible 2.9.19...

@bcoca
Copy link
Member Author

@bcoca bcoca commented Apr 12, 2021

@KFoxder not really, the iterator was not designed for hosts to disappear mid run.

I'm still not sure how to fix this since we don't want to add hosts to run retroactively, it has many weird consequences.

my current thought is to invalidate the run for the hosts that were removed with a new status to avoid it being cleared by meta
and any other way of reintroducing the hosts into the play

I have not figured out how to deal with the magic vars as they should both reflect the hosts the play started out targeting but also avoid trying to use/connect to removed hosts.

jimi-c
jimi-c approved these changes Apr 13, 2021
Co-authored-by: David Shrewsbury <Shrews@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

8 participants