Traceback in random location near SSH timeout with many target hosts #77325
Labels
affects_2.11
bug
This issue/PR relates to a bug.
needs_verified
This issue needs to be verified/reproduced by maintainer
P3
Priority 3 - Approved, No Time Limitation
support:core
This issue/PR relates to code supported by the Ansible Engineering Team.
traceback
This issue/PR includes a traceback.
Summary
A traceback appears at a seemingly random point in the run, often immediately or shortly after some hosts became unreachable due to SSH timeout. It can be reproduced consistently with about 20 managed machines using a site.yml that works reliably with a small number of hosts.
Checking available RAM at 1-second resolution shows a minimum of 600 MB available. So unless ansible-playbook uses all of that in less than 1 second, there should be enough RAM. There are no log messages anywhere about the oom-killer or other memory issues.
In some runs the crash happens after just a few tasks: only fact gathering and simple checks with fail and assert, plus debug msg tasks, were executed before the crash. That led me to the test case below, which is a simplified version of the beginning of site.yml.
The test case below causes a crash sometimes but not always. Minimum free RAM during this run is about 1 GB.
It seems important that some hosts are unreachable over SSH (in the test case: 3 timeouts, 1 invalid host key).
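For anyone trying to reproduce the memory observation, the following is a minimal sketch of how available RAM could be sampled at 1-second resolution on the control node. It is not necessarily the method used for the numbers above. It assumes a Linux kernel that exposes MemAvailable in /proc/meminfo, and the function names parse_mem_available_kb and watch_min_available are made up for this illustration.

```python
import time
from pathlib import Path


def parse_mem_available_kb(meminfo_text: str) -> int:
    """Return the MemAvailable value (in kB) from /proc/meminfo-style text."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            # Line format: "MemAvailable:    1234567 kB"
            return int(line.split()[1])
    raise ValueError("MemAvailable not found")


def watch_min_available(samples: int, interval: float = 1.0,
                        path: str = "/proc/meminfo") -> int:
    """Read MemAvailable `samples` times, `interval` seconds apart.

    Returns the lowest value seen, in kB. The default path assumes Linux.
    """
    lowest = None
    for _ in range(samples):
        current = parse_mem_available_kb(Path(path).read_text())
        lowest = current if lowest is None else min(lowest, current)
        time.sleep(interval)
    return lowest
```

Calling watch_min_available(600) in a second terminal while ansible-playbook runs would cover roughly ten minutes. A spike shorter than the sampling interval would still be missed, which is the caveat already noted above.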
Issue Type
Bug Report
Component Name
ansible
Ansible Version
Configuration
OS / Environment
EL7
strategy=free
fact caching enabled
netbox inventory plugin
Steps to Reproduce
Include the extra variable xxx with a value > 100, for example: ansible-playbook -e 'xxx=900' [...]
Expected Results
Expected the playbook to run the same with 5 or 20 managed hosts, regardless of whether some of the hosts could not be reached due to SSH timeout.
Actual Results
Code of Conduct