Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ansible not running interpreter discovery on host after run_once and gather_facts disabled #77219

Open
1 task done
ratschance opened this issue Mar 6, 2022 · 7 comments
Open
1 task done
Labels
affects_2.12 bug This issue/PR relates to a bug. module This issue/PR relates to a module. P3 Priority 3 - Approved, No Time Limitation support:core This issue/PR relates to code supported by the Ansible Engineering Team. verified This issue has been verified/reproduced by maintainer

Comments

@ratschance
Copy link

Summary

When I run a playbook with run_once: True and gather_facts: False ansible runs interpreter discovery on only the host that the task runs on. If a later play uses the same group with gather_facts: True, ansible will not re-run interpreter discovery for the other hosts in the group, which can cause failures if the other hosts have different interpreters.

Issue Type

Bug Report

Component Name

ansible.legacy.setup

Ansible Version

$ ansible --version
ansible [core 2.12.3]
  config file = /etc/ansible/ansible.cfg
  configured module search path = ['/home/ratschanc/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /home/ratschanc/venvs/ansible/lib/python3.8/site-packages/ansible
  ansible collection location = /home/ratschanc/.ansible/collections:/usr/share/ansible/collections
  executable location = /usr/bin/ansible
  python version = 3.8.10 (default, Nov 26 2021, 20:14:08) [GCC 9.3.0]
  jinja version = 3.0.3
  libyaml = True

Configuration

$ ansible-config dump --only-changed
DEFAULT_MANAGED_STR(/etc/ansible/ansible.cfg) = WARNING: Managed by Ansible. Any changes will be overwritten

OS / Environment

Host: Ubuntu Focal

Target 1 (Trusty w/ Python2) Dockerfile:

FROM ubuntu:trusty

RUN apt-get update && apt-get install -y openssh-server python

RUN mkdir -p /root/.ssh
COPY id_ed25519.pub /root/.ssh/authorized_keys

RUN service ssh start

EXPOSE 22

CMD ["/usr/sbin/sshd", "-D"]

Target 2 (Focal w/ Python3) Dockerfile:

FROM ubuntu:focal

RUN apt-get update && apt-get install -y openssh-server python3

RUN mkdir -p /root/.ssh
COPY id_ed25519.pub /root/.ssh/authorized_keys

RUN service ssh start

EXPOSE 22

CMD ["/usr/sbin/sshd", "-D"]

Steps to Reproduce

Inventory:

[servers]
172.17.0.3 ansible_user=root # Trusty py2
172.17.0.2 ansible_user=root # Focal py3

Playbook:

---
- name: Run a task on one
  hosts: servers
  run_once: true
  gather_facts: false
  tasks:
    - name: Touch a file on one
      file:
        path: /root/file1
        owner: root
        group: root
        mode: '0644'
        state: touch

- name: Run a task on both
  hosts: servers
  gather_facts: true
  tasks:
    - name: Touch a file on both
      file:
        path: /root/file2
        owner: root
        group: root
        mode: '0644'
        state: touch

Command:

ansible-playbook -i inventory playbook.yml

Expected Results

I expected ansible to run interpreter discovery on the second host when gathering facts, but instead it failed after trying to directly use /usr/bin/python on a host that did not have it.

Actual Results

ansible-playbook [core 2.12.3]
  config file = /etc/ansible/ansible.cfg
  configured module search path = ['/home/ratschanc/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /home/ratschanc/venvs/ansible/lib/python3.8/site-packages/ansible
  ansible collection location = /home/ratschanc/.ansible/collections:/usr/share/ansible/collections
  executable location = /usr/bin/ansible-playbook
  python version = 3.8.10 (default, Nov 26 2021, 20:14:08) [GCC 9.3.0]
  jinja version = 3.0.3
  libyaml = True
Using /etc/ansible/ansible.cfg as config file
Skipping callback 'default', as we already have a stdout callback.
Skipping callback 'minimal', as we already have a stdout callback.
Skipping callback 'oneline', as we already have a stdout callback.

PLAYBOOK: playbook.yml *********************************************************
2 plays in playbook.yml

PLAY [Run a task on one] *******************************************************
META: ran handlers

TASK [Touch a file on one] *****************************************************
task path: /home/ratschanc/projects/ansible-testing/playbook.yml:7
changed: [172.17.0.3] => {"ansible_facts": {"discovered_interpreter_python": "/usr/bin/python"}, "changed": true, "dest": "/root/file1", "gid": 0, "group": "root", "mode": "0644", "owner": "root", "size": 0, "state": "file", "uid": 0}
META: ran handlers
META: ran handlers

PLAY [Run a task on both] ******************************************************

TASK [Gathering Facts] *********************************************************
task path: /home/ratschanc/projects/ansible-testing/playbook.yml:15
fatal: [172.17.0.2]: FAILED! => {"ansible_facts": {}, "changed": false, "failed_modules": {"ansible.legacy.setup": {"failed": true, "module_stderr": "Shared connection to 172.17.0.2 closed.\r\n", "module_stdout": "/bin/sh: 1: /usr/bin/python: not found\r\n", "msg": "The module failed to execute correctly, you probably need to set the interpreter.\nSee stdout/stderr for the exact error", "rc": 127}}, "msg": "The following modules failed to execute: ansible.legacy.setup\n"}
ok: [172.17.0.3]
META: ran handlers

TASK [Touch a file on both] ****************************************************
task path: /home/ratschanc/projects/ansible-testing/playbook.yml:19
changed: [172.17.0.3] => {"changed": true, "dest": "/root/file2", "gid": 0, "group": "root", "mode": "0644", "owner": "root", "size": 0, "state": "file", "uid": 0}
META: ran handlers
META: ran handlers

PLAY RECAP *********************************************************************
172.17.0.2                 : ok=0    changed=0    unreachable=0    failed=1    skipped=0    rescued=0    ignored=0   
172.17.0.3                 : ok=3    changed=2    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0

Code of Conduct

  • I agree to follow the Ansible Code of Conduct
@ansibot
Copy link
Contributor

ansibot commented Mar 6, 2022

Files identified in the description:

If these files are incorrect, please update the component name section of the description or use the !component bot command.

click here for bot help

@ansibot ansibot added affects_2.12 bug This issue/PR relates to a bug. module This issue/PR relates to a module. needs_triage Needs a first human triage before being processed. support:core This issue/PR relates to code supported by the Ansible Engineering Team. labels Mar 6, 2022
@sivel
Copy link
Member

sivel commented Mar 7, 2022

This probably also affects delegate_to + delegate_facts: true. We've already handled the standalone delegate_to scenario.

@mattclay mattclay added needs_verified This issue needs to be verified/reproduced by maintainer P3 Priority 3 - Approved, No Time Limitation and removed needs_triage Needs a first human triage before being processed. labels Mar 10, 2022
@s-hertel
Copy link
Contributor

I verified this, but I'm not entirely sure it's a bug. When run_once is true and a task returns ansible_facts, the facts are copied to all the play hosts that are not unreachable (here: https://github.com/ansible/ansible/blob/stable-2.12/lib/ansible/plugins/strategy/__init__.py#L699-L709). Interpreter discovery does not occur specially for gather_facts. If gather_facts is enabled for the first play it would work because the gather_facts setup task overrides the run_once keyword to False. Running gather_facts doesn't help for the second play, because at that point the interpreter fact has already been set by the run_once file task in the first play (the facts from the server that runs are copied to all hosts in the play). You could make the example work the way you want by doing one of these:

  • targeting only the hosts which should get copied facts for the run_once play
  • adding a - meta: clear_facts task before the 2nd play
  • adding any initial task that connects to the remote (does not need to be gather_facts) in the first play that has run_once: False, just to set the interpreter fact correctly for all hosts.

This doesn't appear to affect delegate_to + delegate_facts: true because the facts are only copied to the delegated hosts (rather than all reachable hosts): https://github.com/ansible/ansible/blob/stable-2.12/lib/ansible/plugins/strategy/__init__.py#L683-L684.

We could maybe make an exception for copying facts that look like interpreter discovery if run_once is set on any task, but that seems like it might be a breaking change for some users? @nitzmahone, what do you think of that?

Click to expand diff
diff --git a/lib/ansible/plugins/strategy/__init__.py b/lib/ansible/plugins/strategy/__init__.py
index 9c9846a997..cc4255440c 100644
--- a/lib/ansible/plugins/strategy/__init__.py
+++ b/lib/ansible/plugins/strategy/__init__.py
@@ -743,7 +743,20 @@ class StrategyBase:
                                     self._variable_manager.set_host_variable(target_host, var_name, var_value)
                         else:
                             cacheable = result_item.pop('_ansible_facts_cacheable', False)
+
+                            # If the original task has been run with run_once, only set interpreter discovery vars on the original host
+                            interpreter_discovery_vars = {}
+                            if original_task.run_once:
+                                for var_name in dict(result_item['ansible_facts']):
+                                    if var_name.startswith('discovered_interpreter_'):
+                                        interpreter_discovery_vars[var_name] = result_item['ansible_facts'].pop(var_name)
+
                             for target_host in host_list:
+                                facts = result_item['ansible_facts'].copy()
+
+                                if target_host == original_host.name:
+                                    facts.update(interpreter_discovery_vars)
+
                                 # so set_fact is a misnomer but 'cacheable = true' was meant to create an 'actual fact'
                                 # to avoid issues with precedence and confusion with set_fact normal operation,
                                 # we set BOTH fact and nonpersistent_facts (aka hostvar)
@@ -751,9 +764,9 @@ class StrategyBase:
                                 # but for playbook setting it the 'higher' precedence is kept
                                 is_set_fact = original_task.action in C._ACTION_SET_FACT
                                 if not is_set_fact or cacheable:
-                                    self._variable_manager.set_host_facts(target_host, result_item['ansible_facts'].copy())
+                                    self._variable_manager.set_host_facts(target_host, facts)
                                 if is_set_fact:
-                                    self._variable_manager.set_nonpersistent_facts(target_host, result_item['ansible_facts'].copy())
+                                    self._variable_manager.set_nonpersistent_facts(target_host, facts)
 
                     if 'ansible_stats' in result_item and 'data' in result_item['ansible_stats'] and result_item['ansible_stats']['data']:

@s-hertel s-hertel added verified This issue has been verified/reproduced by maintainer and removed needs_verified This issue needs to be verified/reproduced by maintainer labels Mar 15, 2022
@bcoca
Copy link
Member

bcoca commented Mar 16, 2022

I'm partial to look at this as a bug, since interpreter discovery is something we 'hijack' into facts, we should avoid polluting other hosts when run_once is used

@bcoca
Copy link
Member

bcoca commented Mar 16, 2022

@s-hertel since this is something i can see us using (abusing) more in the future, we might want to start with a constant for 'non run once facts' to exclude from updating all hosts.

@nitzmahone
Copy link
Member

we might want to start with a constant for 'non run once facts' to exclude from updating all hosts

Yeah, that seems like the right thing to do here- we should absolutely not be propagating a clearly host-specific fact like that in the run_once case... Makes me wonder if we've got it backwards there, like defaulting to any ansible_facts output should only hit the host it ran on, then have $mechanism_tbd to cause the propagation for things that need it, but that's another horse that left the barn a long time ago.

@bcoca
Copy link
Member

bcoca commented Mar 16, 2022

Well, initially facts were just facts, so it made sense ... till they became nonfacts ... set_facts/include_vars ... and then 'discovery' ...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
affects_2.12 bug This issue/PR relates to a bug. module This issue/PR relates to a module. P3 Priority 3 - Approved, No Time Limitation support:core This issue/PR relates to code supported by the Ansible Engineering Team. verified This issue has been verified/reproduced by maintainer
Projects
None yet
Development

No branches or pull requests

7 participants