race condition with fact cache and smart fact gathering causes undefined variable errors. #14456
Comments
I figured out what's happening here, but not how to fix it. There is a flaw in the smart fact gathering design. At the start of a multi-play playbook run such as the typical site.yml, previously cached facts that have not yet expired are not regathered. If they then expire later during the playbook run, no attempt is made to refresh them, and the undefined variable error occurs.
From the documentation: "The value ‘smart’ means each new host that has no facts discovered will be scanned, but if the same host is addressed in multiple plays it will not be contacted again in the playbook run."
It doesn't really matter what value is used for fact_caching_timeout; the potential for this condition still exists.
Evidence that this happened: running 2.0.1.0 now, I turned fact caching back on to test, and the issue reappeared. One of the hosts that just failed did so with an undefined ansible_system fact.
Looking in the cache file directory, the failed server's fact file hadn't been updated by the most recent execution, which launched at 12:00, because the file was not quite old enough yet: it was 2 hours 46 minutes old, and our timeout is set to 3 hours.
But the site run failed at 12:33, because by then the fact file was over 3 hours (10,800 seconds) old. The only behavior I can't explain is why we never saw this problem in 1.9.4 but see it frequently starting with 2.0.
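The timeline above can be checked with a few lines of arithmetic. This is only a sketch of the numbers reported in this comment: the calendar date is arbitrary (only the times of day come from the report), and `is_fresh` is a hypothetical helper, not a function from Ansible.

```python
from datetime import datetime, timedelta

# fact_caching_timeout from the report: 10800 seconds (3 hours)
TIMEOUT = timedelta(seconds=10800)

# The date is arbitrary; only the times of day come from the report.
run_start = datetime(2016, 1, 1, 12, 0)
failure_time = datetime(2016, 1, 1, 12, 33)
age_at_start = timedelta(hours=2, minutes=46)

written_at = run_start - age_at_start   # when the cache file was last written
expires_at = written_at + TIMEOUT       # when it crosses the timeout


def is_fresh(now):
    """Hypothetical freshness check: True while the file is younger than TIMEOUT."""
    return now - written_at < TIMEOUT


print("expires at", expires_at.strftime("%H:%M"))
print("fresh at run start:", is_fresh(run_start))
print("fresh at failure:", is_fresh(failure_time))
```

The file passes the freshness check at 12:00, so smart gathering skips the host, and then crosses the timeout at 12:14, well before the failure at 12:33.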
Changed issue title to more accurately reflect the issue, now that it's been identified.
fixes #14456, now it won't expire keys in middle of a play when they were 'valid' at 'gather time'.
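To illustrate the idea described in that commit message, here is a toy model, not the code from the commit and not Ansible's cache plugin API. `ToyFactCache`, `check_at_gather`, `pin_validated`, and the host name `web1` are all hypothetical names invented for this sketch. With `pin_validated=False` the cache expires entries on every read (the behavior reported in this issue); with `pin_validated=True`, an entry that was valid at gather time stays readable for the rest of the run.

```python
class ToyFactCache:
    """Toy stand-in for a fact cache with a timeout; not Ansible's actual class."""

    def __init__(self, timeout, pin_validated=False):
        self.timeout = timeout              # seconds
        self.pin_validated = pin_validated  # keep entries that were fresh at gather time
        self._store = {}                    # host -> (written_at, facts)
        self._validated = set()             # hosts found fresh at gather time

    def put(self, host, facts, now):
        self._store[host] = (now, facts)

    def is_fresh(self, host, now):
        if host not in self._store:
            return False
        written_at, _ = self._store[host]
        return now - written_at < self.timeout

    def check_at_gather(self, host, now):
        """Smart gathering: the host is skipped when its cached facts are still fresh."""
        fresh = self.is_fresh(host, now)
        if fresh and self.pin_validated:
            self._validated.add(host)
        return fresh

    def get(self, host, now):
        """Read facts later in the run; None models an undefined fact."""
        if host in self._validated or self.is_fresh(host, now):
            return self._store[host][1]
        return None


# Replay the reported timeline in seconds: written at 0, run starts at
# 2h46m (9960 s), task reads the fact at 3h19m (11940 s).
for pin in (False, True):
    cache = ToyFactCache(timeout=10800, pin_validated=pin)
    cache.put("web1", {"ansible_system": "Linux"}, now=0)
    skipped = cache.check_at_gather("web1", now=9960)
    print(pin, skipped, cache.get("web1", now=11940))
```

In both cases the host is skipped at gather time; only the pinned variant still returns the facts at the later read.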
Note: We are using file (json) caching. For anyone having this problem, until the commit makes it into the 2.1 release, a suggested workaround is to switch gathering from smart to explicit, and explicitly call setup once at the start of your playbook. I'm not entirely clear on how redis or other caching systems work, but I believe they will see the same issue, and the fix committed above only covers json caching behavior.
Generating my /etc/hosts file in ansible2 fails, but only when the playbook has been running for a few minutes before it generates the /etc/hosts file. When running the task directly with the corresponding tag, /etc/hosts is generated without any issue.
fatal: [host]: FAILED! => {"changed": false, "failed": true, "msg": "AnsibleUndefinedVariable: 'dict object' has no attribute 'ansible_ssh_host'"}
ansible (2.1.0.0)
Issue Type: Bug Report
Ansible Version: 2.0.0.2
Ansible Configuration:
No changes, fact caching worked in 1.9.4 but is experiencing problems in 2.0.0.2
Fact caching settings in ansible.cfg
fact_caching = jsonfile
fact_caching_connection = ~/.ansiblecachedir
fact_caching_timeout = 10800
gathering = smart
Environment:
Control server is RedHat 6, target hosts are a collection of rhel5/6 and win2008/2012
Summary:
Fact caching worked in 1.9.4, but is having issues in 2.0.0.2.
Site runs fail at random places and random hosts with undefined variables that are really facts, and that were defined and used earlier during the site run.
Steps To Reproduce:
During a long site.yml run (ours takes almost two hours), we gather facts at the top and cache them; the cache timeout is set to three hours.
Every two-hour site run fails at random places and on random hosts with undefined variables that are really facts, and that were defined and used without error earlier during the site run.
Disable fact caching and the errors go away.
Limit the site run to a subset of hosts or tags, which makes runs much faster, and the problem doesn't occur.
Expected Results:
Fact variables are not forgotten
Actual Results:
During longer runs, hosts fail with undefined variables that are really facts, that were defined earlier in the run, and that shouldn't have expired.