
Inconsistent variable dereferences depending on number of forks #4239

Closed
fgtham opened this issue Sep 25, 2013 · 13 comments
Labels: bug (This issue/PR relates to a bug) · P1 (Priority 1 - Immediate Attention Required; Release Immediately After Fixed)

Comments


fgtham commented Sep 25, 2013

I have recently upgraded to ansible 1.3.2 from the release/1.3.2 branch. This upgrade broke my playbooks.

git bisect identified commit 576962d as the first bad commit. This commit changes the number of forks in relation to the number of hosts. As it turns out, this commit did not introduce the bug I encountered, but merely unmasked it.

Unfortunately, the most reduced example I managed to create is not really minimal. At the end of this comment there is a script that creates all the files shown here, so that you can easily reproduce the problem. Note that you will need to set hash_behaviour = merge in your ansible.cfg (see the sketch after the file listings for why):

# hosts:
[vm_servers]
127.0.0.2
[vm_guests]
127.0.0.3
127.0.0.4

# host_vars/127.0.0.3:
name: srv
vm:
  host_system: 127.0.0.2

# host_vars/127.0.0.4:
name: srx
vm:
  host_system: 127.0.0.2

# site.yml:
- include: role.vm.yml

# role.vm.yml:
- hosts: vm_guests
  vars_files:
  - "roles/vm/vars/{{ name }}.yml"
  roles:
  - vm

# roles/vm/vars/srv.yml
vm:
  foo: bar

# roles/vm/vars/srx.yml
vm:
  foo: baz

# roles/vm/tasks/main.yml
- name: delegation pass
  action: shell hostname -f
  delegate_to: "{{ vm.host_system }}"

- name: template upload
  action: template src=test.j2 dest=/tmp/test.txt

- name: delegation fail
  action: shell hostname -f
  delegate_to: "{{ vm.host_system }}"

# roles/vm/templates/test.j2
{%- for host in groups['vm_guests'] -%}
  {{ hostvars[host]['name'] }}
{%- endfor %}
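
For clarity: the hash_behaviour = merge setting matters because the vm dict defined in host_vars and the vm dict loaded via vars_files collide on the same top-level key. With merge they are combined recursively; with the default replace, the later one simply wins. A minimal sketch of that distinction in plain Python (not Ansible's actual implementation):

# Plain-Python sketch of recursive hash merging (not Ansible's code).
def merge_hash(a, b):
    """Recursively merge dict b into a copy of dict a; b wins on conflicts."""
    result = dict(a)
    for key, value in b.items():
        if key in result and isinstance(result[key], dict) and isinstance(value, dict):
            result[key] = merge_hash(result[key], value)
        else:
            result[key] = value
    return result

host_vars = {"vm": {"host_system": "127.0.0.2"}}   # from host_vars/127.0.0.3
vars_file = {"vm": {"foo": "bar"}}                 # from roles/vm/vars/srv.yml

print(merge_hash(host_vars, vars_file))
# merge:   {'vm': {'host_system': '127.0.0.2', 'foo': 'bar'}}
# replace: would leave only {'vm': {'foo': 'bar'}}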

With ansible 1.3.2, the playbook works as expected when called with:

$ ansible-playbook site.yml -i hosts

If I limit the number of forks to exactly 1, execution aborts with an error:

$ ansible-playbook site.yml -i hosts -vvvv --forks 1
...
TASK: [delegation pass]
*******************************************************
<127.0.0.2> ESTABLISH CONNECTION FOR USER: root
...
TASK: [template upload]
*******************************************************
<127.0.0.3> ESTABLISH CONNECTION FOR USER: root
...
TASK: [delegation fail]
*******************************************************
<{{vm.host_system}}> ESTABLISH CONNECTION FOR USER: root

As can be seen in the last line, ansible cannot dereference vm.host_system any longer.

The same behaviour occurs with the following invocation, as it implicitly sets the number of forks to 1:

$ ansible-playbook -i hosts --limit 127.0.0.3

With versions prior to commit 576962d, these invocations result in the playbook being executed successfully:

$ ansible-playbook -i hosts
$ ansible-playbook -i hosts --limit 127.0.0.3

However, these invocations abort with errors, as vm.host_system cannot be dereferenced any longer:

$ ansible-playbook -i hosts --forks 1
$ ansible-playbook -i hosts --limit 127.0.0.3 --forks 1

In other words: if the number of forks is greater than 1, variable dereferencing works, whereas it does not if the number of forks is equal to one. As commit 576962d caps the number of forks when it is greater than the number of hosts, it unmasked the problem.

I would expect identical behaviour no matter how many processes are being forked.
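
For context, the capping introduced by 576962d amounts to roughly the following (a paraphrase of the behaviour, not the actual commit):

# Paraphrase of the fork capping from 576962d (not the real diff):
# never spawn more worker processes than there are hosts in the play.
requested_forks = 5                          # ansible's default
hosts = ["127.0.0.3", "127.0.0.4"]

forks = min(requested_forks, len(hosts))     # two hosts -> two forks

With --limit 127.0.0.3 only one host remains, so forks drops to 1, which is exactly the code path that exposes the bug.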

In the file role.vm.yml, if I change the filename of the vars_file to be included to something fixed like vars_files: [ roles/vm/vars/srv.yml ], the playbook will execute successfully every single time.

If I remove the template task, the error is gone, too.

My python-fu is not strong enough to figure out what's going on.

PS: Here's the script I mentioned to create the files in order to reproduce the issue:

#! /bin/bash
mkdir -p ansible_test/{host_vars,roles/vm/{tasks,templates,vars}}
cd ansible_test

cat << EOF >hosts
[vm_servers]
127.0.0.2
[vm_guests]
127.0.0.3
127.0.0.4
EOF

cat << EOF >host_vars/127.0.0.3
name: srv
vm:
  host_system: 127.0.0.2
EOF

cat << EOF >host_vars/127.0.0.4
name: srx
vm:
  host_system: 127.0.0.2
EOF

cat << EOF >site.yml
- include: role.vm.yml
EOF

cat << EOF >role.vm.yml
- hosts: vm_guests
  vars_files:
  - "roles/vm/vars/{{ name }}.yml"
  roles:
  - vm
EOF

cat << EOF >roles/vm/vars/srv.yml
vm:
  foo: bar
EOF

cat << EOF >roles/vm/vars/srx.yml
vm:
  foo: baz
EOF

cat << EOF >roles/vm/tasks/main.yml
- name: delegation pass
  action: shell hostname -f
  delegate_to: "{{ vm.host_system }}"

- name: template upload
  action: template src=test.j2 dest=/tmp/test.txt

- name: delegation fail
  action: shell hostname -f
  delegate_to: "{{ vm.host_system }}"
EOF

cat << EOF >roles/vm/templates/test.j2
{%- for host in groups['vm_guests'] -%}
  {{ hostvars[host]['name'] }}
{%- endfor %}
EOF
@mpdehaan (Contributor)

As discussed on the mailing list, you should not mix host variables into your role directories, as Ansible will try to load them from other paths.

However, I've noticed that we haven't implemented the feature I was thinking of yet, so my original explanation seems to be wrong: #2958

This is probably a case of something not making a copy of a value somewhere.


fgtham commented Sep 25, 2013

I just moved the variable files out of the role path and updated role.vm.yml accordingly, but this doesn't change the results. Variable dereference still only works with a number of forks greater than 1.


fgtham commented Sep 26, 2013

OK, so I reverted commit 576962d in order to vary the number of forks. Additionally, I added some debug prints to _executor_internal() to dump the current values of all the variables used to construct the inject dictionary. self.default_vars was always empty and self.module_vars was identical throughout the runs, so I removed them from the output again.
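
The prints were along these lines (a hypothetical reconstruction, not the actual diff):

# Hypothetical debug prints inside Runner._executor_internal(), dumping
# the dicts that are combined into the inject dictionary.
import json
print("self.default_vars:", json.dumps(self.default_vars, sort_keys=True))
print("host_variables:", json.dumps(host_variables, sort_keys=True))
print("self.setup_cache[host]:", json.dumps(self.setup_cache[host], sort_keys=True))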

I ran the test with --forks 2 (pass) and --forks 1 (fail):

$ ansible-playbook site.yml -i hosts --limit 127.0.0.3 -vvvv --forks 2
TASK: [delegation pass] ******************************************************* 
self.default_vars: {}
host_variables: {
                "group_names": ["vm_guests"],
                "inventory_hostname": "127.0.0.3",
                "inventory_hostname_short": "127",
                "name": "srv", 
                "vm": {"host_system": "127.0.0.2"}
                }
self.setup_cache[host]: {"vm": {"foo": "bar"}}
<127.0.0.2> ESTABLISH CONNECTION FOR USER: root
...
TASK: [delegation fail] ******************************************************* 
self.default_vars: {}
host_variables: {
                "group_names": ["vm_guests"],
                "inventory_hostname": "127.0.0.3",
                "inventory_hostname_short": "127",
                "name": "srv",
                "vm": {"host_system": "127.0.0.2"}
                }
self.setup_cache[host]: {
                "vm": {"foo": "bar"}
                }
<127.0.0.2> ESTABLISH CONNECTION FOR USER: root
$ ansible-playbook site.yml -i hosts --limit 127.0.0.3 -vvvv --forks 1
...
TASK: [delegation pass] ******************************************************* 
self.default_vars: {}
host_variables: {
                "group_names": ["vm_guests"],
                "inventory_hostname": "127.0.0.3",
                "inventory_hostname_short": "127",
                "name": "srv", 
                "vm": {"host_system": "127.0.0.2"}
                }
self.setup_cache[host]: {
                "vm": {"foo": "bar"}
                }
<127.0.0.2> ESTABLISH CONNECTION FOR USER: root
...
TASK: [delegation fail] ******************************************************* 
self.default_vars: {}
host_variables: {
                "group_names": ["vm_guests"],
                "inventory_hostname": "127.0.0.3",
                "inventory_hostname_short": "127",
                "name": "srv",
                "vm": {"foo": "bar"}
                }
self.setup_cache[host]: {
                "vm": {"foo": "bar"}
                }
<{{vm.host_system}}> ESTABLISH CONNECTION FOR USER: root

As can be seen in the last host_variables dump, the contents of the vm dict are messed up: host_system is gone and only the vars_files data ({"foo": "bar"}) remains. It looks like the dict has been overwritten rather than merged.
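
One mechanism consistent with these dumps (and with the earlier guess that something is "not doing a copy of a value somewhere") is the cached vars_files data replacing the host's vm entry wholesale instead of being merged into a copy. A contrived sketch, with dict values mirroring the dumps above:

# Contrived sketch of the suspected clobbering (hypothetical).
host_variables = {"vm": {"host_system": "127.0.0.2"}}
setup_cache = {"vm": {"foo": "bar"}}

# Expected after loading vars/srv.yml: a recursive merge, i.e.
#   {"vm": {"host_system": "127.0.0.2", "foo": "bar"}}
# Observed at forks == 1: wholesale replacement.
host_variables["vm"] = setup_cache["vm"]
print(host_variables)   # {'vm': {'foo': 'bar'}} -- host_system is gone

With forks greater than 1 each task runs in a freshly forked worker that receives a serialized copy of the variables, so the clobbered state never leaks into the next task; that would explain why only forks == 1 misbehaves.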

ghost assigned jctanner Oct 15, 2013
@jctanner (Contributor)

@fgtham I cannot reproduce this with 1.3.2 or the latest devel, which might be me overlooking something or not understanding your description. If you could attach or comment an updated reproducer script, I'll take another stab at it.


fgtham commented Oct 23, 2013

I just upgraded to v1.3.3. The issue remains. Here's an updated test case. Run it, change into ansible_test/, and execute ./test_runner.sh. This script executes the provided playbook twice:

  • first run: ansible-playbook site.yml -i hosts
  • second run: ansible-playbook site.yml -i hosts --forks 1

The first run passes, the second run fails.

#! /bin/bash
mkdir -p ansible_test/{host_vars,vars,roles/vm/{tasks,templates}}
cd ansible_test

cat << EOF >hosts
[vm_servers]
127.0.0.2
[vm_guests]
127.0.0.3
127.0.0.4
EOF

cat << EOF >ansible.cfg
[defaults]
remote_user = root
hash_behaviour = merge
EOF

cat << EOF >host_vars/127.0.0.3
name: srv
vm:
  host_system: 127.0.0.2
EOF

cat << EOF >host_vars/127.0.0.4
name: srx
vm:
  host_system: 127.0.0.2
EOF

cat << EOF >site.yml
- include: role.vm.yml
EOF

cat << EOF >role.vm.yml
- hosts: vm_guests
  gather_facts: False
  vars_files:
  - "vars/{{ name }}.yml"
  roles:
  - vm
EOF

cat << EOF >vars/srv.yml
vm:
  foo: bar
EOF

cat << EOF >vars/srx.yml
vm:
  foo: baz
EOF

cat << EOF >roles/vm/tasks/main.yml
- name: delegation pass
  action: shell hostname -f
  delegate_to: "{{ vm.host_system }}"

- name: template upload
  action: template src=test.j2 dest=/tmp/test.txt

- name: delegation fail
  action: shell hostname -f
  delegate_to: "{{ vm.host_system }}"
EOF

cat << EOF >roles/vm/templates/test.j2
{%- for host in groups['vm_guests'] -%}
  {{ hostvars[host]['name'] }}
{%- endfor %}
EOF

cat << EOF >test_runner.sh
#! /bin/bash
prompt='Press <return> to continue '

cmd='ansible-playbook site.yml -i hosts -vvvv'
echo "*** This test will pass."
echo "*** command: '\$cmd'"
read -p "\$prompt"
\$cmd

cmd="\$cmd --forks 1"
echo -e "\n\n*** This test will fail. It runs the same command as above, except for an additional '--forks 1'."
echo "*** command: '\$cmd'"
read -p "\$prompt"
\$cmd
EOF

chmod 0755 test_runner.sh

cat << EOH
Make sure you have password-less access to root@127.0.0.3 and root@127.0.0.4.
Then change into ansible_test and execute test_runner.sh.
EOH

@jctanner (Contributor)

@fgtham Is this the behavior you expected to see? ...

jtanner@u1304:~/issues/4239-variable-derefernce-forks/code2/ansible_test$ ansible-playbook -i hosts site.yml 
PLAY [vm_guests] ************************************************************** 
127.0.0.3: importing /home/jtanner/issues/4239-variable-derefernce-forks/code2/ansible_test/vars/srv.yml
127.0.0.4: importing /home/jtanner/issues/4239-variable-derefernce-forks/code2/ansible_test/vars/srx.yml
TASK: [vm | delegation pass] ************************************************** 
changed: [127.0.0.4]
changed: [127.0.0.3]
TASK: [vm | template upload] ************************************************** 
ok: [127.0.0.3]
ok: [127.0.0.4]
TASK: [vm | delegation fail] ************************************************** 
changed: [127.0.0.4]
changed: [127.0.0.3]
PLAY RECAP ******************************************************************** 
127.0.0.3                  : ok=3    changed=2    unreachable=0    failed=0   
127.0.0.4                  : ok=3    changed=2    unreachable=0    failed=0   
jtanner@u1304:~/issues/4239-variable-derefernce-forks/code2/ansible_test$ ansible-playbook -i hosts --forks 1 site.yml 
PLAY [vm_guests] ************************************************************** 
127.0.0.3: importing /home/jtanner/issues/4239-variable-derefernce-forks/code2/ansible_test/vars/srv.yml
127.0.0.4: importing /home/jtanner/issues/4239-variable-derefernce-forks/code2/ansible_test/vars/srx.yml
TASK: [vm | delegation pass] ************************************************** 
changed: [127.0.0.3]
changed: [127.0.0.4]
TASK: [vm | template upload] ************************************************** 
ok: [127.0.0.3]
ok: [127.0.0.4]
TASK: [vm | delegation fail] ************************************************** 
fatal: [127.0.0.3] => SSH encountered an unknown error during the connection. We recommend you re-run the command using -vvvv, which will enable SSH debugging output to help diagnose the issue
fatal: [127.0.0.4] => SSH encountered an unknown error during the connection. We recommend you re-run the command using -vvvv, which will enable SSH debugging output to help diagnose the issue
FATAL: all hosts have already failed -- aborting
PLAY RECAP ******************************************************************** 
           to retry, use: --limit @/home/jtanner/site.retry
127.0.0.3                  : ok=2    changed=1    unreachable=1    failed=0   
127.0.0.4                  : ok=2    changed=1    unreachable=1    failed=0 


fgtham commented Oct 23, 2013

Yes, this is the bug I encountered. The playbook's result only depends on the number of parallel ansible processes, which doesn't make sense.

If you increase logging with -vvvv, you will see that in the last task (the one that fails), ansible can no longer dereference vm.host_system and instead tries to establish an ssh connection to a host literally named '{{vm.host_system}}'.

@jctanner (Contributor)

I was able to see more of what is going on via the debug module ...

PLAY [vm_guests] ************************************************************** 
127.0.0.3: importing /home/jtanner/issues/4239-variable-derefernce-forks/code2/ansible_test/vars/srv.yml
127.0.0.4: importing /home/jtanner/issues/4239-variable-derefernce-forks/code2/ansible_test/vars/srx.yml
TASK: [vm | debug var=vm] ***************************************************** 
ok: [127.0.0.3] => {
    "item": "", 
    "vm": {
        "foo": "bar", 
        "host_system": "127.0.0.2"
    }
}
ok: [127.0.0.4] => {
    "item": "", 
    "vm": {
        "foo": "baz", 
        "host_system": "127.0.0.2"
    }
}
TASK: [vm | delegation pass] ************************************************** 
changed: [127.0.0.3]
changed: [127.0.0.4]
TASK: [vm | template upload] ************************************************** 
ok: [127.0.0.3]
ok: [127.0.0.4]
TASK: [vm | debug var=vm] ***************************************************** 
ok: [127.0.0.3] => {
    "item": "", 
    "vm": {
        "foo": "bar"
    }
}
ok: [127.0.0.4] => {
    "item": "", 
    "vm": {
        "foo": "baz"
    }
}
TASK: [vm | delegation fail] ************************************************** 
fatal: [127.0.0.3] => SSH encountered an unknown error during the connection. We recommend you re-run the command using -vvvv, which will enable SSH debugging output to help diagnose the issue
fatal: [127.0.0.4] => SSH encountered an unknown error during the connection. We recommend you re-run the command using -vvvv, which will enable SSH debugging output to help diagnose the issue
FATAL: all hosts have already failed -- aborting

So after the template module call, "vm" loses its "host_system" key. If I comment out the template task in main.yml, the error disappears.

I'll keep digging.

@jctanner (Contributor)

The first problem I found is that this syntax won't work ...

  vars_files:
  - "vars/{{ name }}.yml"

The templating code doesn't have vars populated when it goes to replace the line ...

vm | template src=test.j2 dest=/tmp/{{vm.host_system}}-test.txt :::: ERROR 'vm' is undefined
{'_original_file': '/home/jtanner/issues/4239-variable-derefernce-forks/code-fix/ansible_test/roles/vm/tasks/main.yml',
 'always_run': False,
 'changed_when': None,
 'delegate_to': None,
 'failed_when': None,
 'ignore_errors': False,
 'inventory_dir': '/home/jtanner/issues/4239-variable-derefernce-forks/code-fix/ansible_test',
 'inventory_file': 'hosts',
 'item': '',
 'register': None}

#4616 details this as a feature request.

I know the playbook layout is a reduced example to show your use case, but what is the intent behind loading vars_files that way? Can you elaborate a bit more?


fgtham commented Oct 23, 2013

I use ansible to create and manage a number of virtual machines on multiple servers.

The configuration parameters that are identical for most of the VMs have been placed under vars/vm_default.yml. If these settings are not appropriate for a specific VM, an alternate config can be provided in a file vars/vm_{{ hostname }}.yml. Basically it's the conditional imports example from the documentation.

Here's the VM guest role definition:

- hosts: vm_guests
  gather_facts: no
  vars_files:
  - [ "vars/vm_{{ hostname }}.yml", "vars/vm_default.yml" ]
  roles:
  - vm_guest

The hostname variable is set in another config file under inventory/host_vars/.

Edit: what I don't get is this: if "the templating code doesn't have vars populated when it goes to replace the line", why does it appear to work when the number of forks is greater than one?

@jctanner (Contributor)

#4239 (comment) is resolved.

@jctanner (Contributor)

Preliminary patch:

diff --git a/lib/ansible/playbook/play.py b/lib/ansible/playbook/play.py
index fc5b499..9f4f2a7 100644
--- a/lib/ansible/playbook/play.py
+++ b/lib/ansible/playbook/play.py
@@ -756,6 +756,7 @@ class Play(object):
                             if self._has_vars_in(filename2) and not self._has_vars_in(filename3):
                                 # this filename has variables in it that were fact specific
                                 # so it needs to be loaded into the per host SETUP_CACHE
+                                data = utils.combine_vars(inject, data)
                                 self.playbook.SETUP_CACHE[host].update(data)
                                 self.playbook.callbacks.on_import_for_host(host, filename4)
                         elif not self._has_vars_in(filename4):
@@ -788,6 +789,7 @@ class Play(object):
                     if host is not None and self._has_vars_in(filename2) and not self._has_vars_in(filename3):
                         # running a host specific pass and has host specific variables
                         # load into setup cache
+                        new_vars = utils.combine_vars(inject, new_vars)
                         self.playbook.SETUP_CACHE[host] = utils.combine_vars(
                             self.playbook.SETUP_CACHE[host], new_vars)
                         self.playbook.callbacks.on_import_for_host(host, filename4)

mpdehaan added P2 and removed priority labels Mar 19, 2014
@jctanner (Contributor)

@fgtham I know it took a long time, but I think this is finally fixed. If you are still able, try your test playbook against devel. It works for me when running your test scripts without any modification.

ansibot added bug (This issue/PR relates to a bug) and removed bug_report labels Mar 6, 2018
ansible locked and limited conversation to collaborators Apr 24, 2019