Handle disappearing hosts #57129

Open: wants to merge 3 commits into base: devel
@bcoca
Member

commented May 29, 2019

fixes #55669

ISSUE TYPE
  • Bugfix Pull Request
COMPONENT NAME

core

@nitzmahone

Member

commented May 30, 2019

I built a reproducer for this with an inventory plugin (intending to turn it into a test for this PR), but the PR as-is doesn't fix the issue. (It piqued my interest, as I've often wondered how we'd handle disappearing hosts.)

@bcoca

Member Author

commented May 30, 2019

@nitzmahone what is the reproducer? The original issue was mostly a confluence of race conditions, and I was having a hard time making it happen.

@nitzmahone

Member

commented May 30, 2019

vanish.py

from ansible.plugins.inventory import BaseInventoryPlugin

runcount = 0


class InventoryModule(BaseInventoryPlugin):

    NAME = 'vanish'

    def verify_file(self, path):
        if not path.endswith('.vanish.yml'):
            return False
        return super(InventoryModule, self).verify_file(path)

    def parse(self, inventory, loader, path, cache=True):
        global runcount
        runcount += 1
        group = 'vanishgroup'
        inventory.add_group(group)
        inventory.add_host('h1', group=group)
        print('runcount is %d' % runcount)
        if runcount <= 1:
            inventory.add_host('disappearing_host', group=group)
        inventory.set_variable(group, 'ansible_connection', 'local')

disappearing_host.yml

- hosts: vanishgroup
  gather_facts: no
  tasks:
  - shell: echo yo
    notify: dude
  - meta: refresh_inventory
  - ping:
  handlers:
  - name: dude
    debug: msg=howdy

disappear.vanish.yml

plugin: vanish

ansible-playbook -i disappear.vanish.yml disappearing_host.yml

reproduces it 100% for me

@nitzmahone

Member

commented May 30, 2019

I suspect there are probably a number of other potential hazards for this, but handler exec is a big one. It seems to work as expected without a handler and using the default callback.

@bcoca

Member Author

commented May 31, 2019

This looks like a different case from the one I was handling: the free strategy, plus one of the disappearing hosts being slow to return its previous result when the inventory was refreshed (run_once).

So we probably have multiple cases that cause errors when a host disappears mid-play.

@nitzmahone

Member

commented May 31, 2019

Yeah, I thought about strategy: free too, though I didn't see anything mentioning that in the original issue... Reproducing that case under free should be pretty easy with this: just run shell: sleep {{ bla }} and give the disappearing host a larger value of bla than the others.

@nitzmahone closed this May 31, 2019

@nitzmahone reopened this May 31, 2019

@nitzmahone

Member

commented May 31, 2019

I'm also wondering if we should leave "removed" hosts in place and just have an active flag on them that play_iterator et al would just ignore when asking for the next host... Would probably solve a lot of these kinds of problems, since other things like task completion and stats that need to look up hosts would still find them. Otherwise this is probably going to be a long series of race condition whack-a-mole...
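
A minimal sketch of the "leave removed hosts in place with an active flag" idea, in plain Python. This is a toy model, not Ansible code: `Host`, `Inventory`, `refresh`, `schedulable`, and `lookup` are all hypothetical names made up for illustration, and the real inventory and play iterator are considerably more involved.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, Set


@dataclass
class Host:
    # Hypothetical stand-in; not ansible.inventory.host.Host.
    name: str
    active: bool = True


@dataclass
class Inventory:
    # Hypothetical stand-in for the inventory's host table.
    hosts: Dict[str, Host] = field(default_factory=dict)

    def refresh(self, current_names: Set[str]) -> None:
        """Apply a refreshed host list without deleting vanished hosts."""
        for name in sorted(current_names):
            host = self.hosts.setdefault(name, Host(name))
            host.active = True
        for name, host in self.hosts.items():
            if name not in current_names:
                # Soft-remove: keep the object, just stop scheduling it.
                host.active = False

    def schedulable(self) -> Iterator[Host]:
        """Hosts an iterator would still hand new tasks to."""
        return (h for h in self.hosts.values() if h.active)

    def lookup(self, name: str) -> Host:
        """Late results and stats can still resolve an inactive host."""
        return self.hosts[name]


inv = Inventory()
inv.refresh({"h1", "disappearing_host"})  # first inventory run
inv.refresh({"h1"})                       # refresh no longer returns the host

scheduled = [h.name for h in inv.schedulable()]
late = inv.lookup("disappearing_host")
print(scheduled, late.active)
```

The point of the design is that scheduling and lookup are answered by different questions: only active hosts get new work, while anything that arrives late for a vanished host (a task result, a stats update) still finds an object to attach to instead of failing the lookup.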

@bcoca

Member Author

commented May 31, 2019

I discussed this a lot on IRC with the original author, so the ticket does not reflect the full issue well. I'm still unsure that is the case, since I could not easily reproduce it.

As for ignoring the host, that is more or less what this does: it sets the host's state to 'nothing to do' and puts it in removed_keys(), so it is skipped even if the iterator has it in preset objects.

I need to check why handlers still run, but we also need to discuss --force-handlers in this case, as well as meta: clear_host_errors.

@ansibot added the stale_ci label Jun 8, 2019
