Playbooks running longer than 4 hours are terminated unexpectedly #11594
Comments
@FedorRub Have you seen #11451? The workaround in #10366 (comment) worked for us.
Hi @aussielunix, thanks for the comment. As suggested in those threads, I have already increased the kubelet log max size, but it has no effect in my case. For the test job the log file is about 165K and is not rotated. From what I have read, my issue looks similar to a possible issue with idle_timeout for ansible runner: ansible/awx-ee#80
You can now configure the idle timeout via the settings page: #10906
We have the same issue. Neither disabling log rotation in kubelet nor changing the timeout configuration helped.
We also have the same issue. Modifying the idle timeout did not take effect, and job containers are being killed after exactly 4 hours.
@spireob @skbki Confirmed. The pod dies after 4 hours. Log rotation is disabled.
Please re-open the issue.
A fresh, very basic install of the newest AWX (19.5.1), with nothing custom, running an AWX "sleep" task shows the same 4-hour issue.
We have also deployed a fresh installation and the problem still occurs. Please re-open the issue.
Also checked with "Default Job Idle Timeout" set to 18000 seconds (5 hours) in the GUI under Settings -> Jobs.
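As a quick sanity check on the numbers in this comment, here is a minimal Python sketch. The helper names `seconds_to_hours` and `timeout_covers_kill_point` are hypothetical and not part of AWX. It only confirms that 18000 seconds is 5 hours and therefore longer than the 4-hour point at which jobs are reported to die, which suggests that this setting alone is not what ends the job.

```python
# Hypothetical helpers for checking the arithmetic; not an AWX API.
FOUR_HOURS_S = 4 * 60 * 60  # kill point reported in this thread, in seconds


def seconds_to_hours(seconds: int) -> float:
    """Convert a duration in seconds to hours."""
    return seconds / 3600


def timeout_covers_kill_point(idle_timeout_s: int,
                              kill_after_s: int = FOUR_HOURS_S) -> bool:
    """Return True when the configured timeout is longer than the kill point."""
    return idle_timeout_s > kill_after_s


if __name__ == "__main__":
    print(seconds_to_hours(18000))           # 5.0
    print(timeout_covers_kill_point(18000))  # True
```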
Has there been any progress on this issue? Is there any possibility of reopening it?
Fixed in ansible/receptor#683
Summary
Playbooks running longer than 4 hours are terminated unexpectedly. The jobs finish in an error state in the GUI. This matters to us because we have some long-running playbooks (Windows server patching, backups, etc.).
Please help me understand whether there is a timeout inside AWX that terminates the container, and whether this limit can be adjusted.
A similar issue is reported in the awx-operator repo: ansible/awx-operator#622
AWX version
19.5.0
Installation method
kubernetes
Modifications
yes
Ansible version
core 2.11.7.post0
Operating system
centos (awx-ee)
Web browser
Chrome
Steps to reproduce
The issue can be reproduced by running the following playbook
Expected results
Playbook completes successfully
Actual results
Container running the job is terminated after running for 4 hours
Additional information
automation jobs container exited with the following error
awx-task container logs
job std out
We use awx-ee:0.6.0 with slight additions (Galaxy collections, Python packages, etc.)
job details