awx.conf.settings Database settings are not available #12683
Comments
@klauserber how long does your system stay up before getting into this bad state? Do you see evidence that AWX is connecting to the DB at all during this time?
@fosterseth We have seen this several times after the system was up for a few days. We have only test workloads on the system at the moment, about 50-100 jobs per day.
Same happening here:
AWX version 21.2.0. Our workload is 5,000-15,000 jobs per day, and it happens to us once after about 2 months.
I observe the same behavior with AWX
Same here, AWX 21.7.0, Postgres 14.4 with Zalando Operator. And I also cannot reproduce it. Simply killing the leader in the postgres cluster sometimes produces the error message (
Hi @klauserber, this trace can be part of "normal operation" when connections expire. We have done some work around these error messages in later versions of AWX (from 21.4.0). As you stated, this issue is not reproducible in 21.7.0. We will close this issue for now, but please feel free to reopen if need be.
I still observe the same problem with AWX 21.10.2. Hopefully #13505 and the corresponding PR will provide a solution and get some traction.
Please confirm the following
Bug Summary
Hello,
there is another obstacle in the way of going live with our AWX system. I hope somebody can help us.
Sometimes the whole system gets into a state where no jobs can be started. Every new job stays in the state 'pending' and nothing happens; no new automation-job pods are started.
A closer look at the logs of the xxx-task container shows the following:
A restart of the AWX main pod brings the system back to a working state.
We use a separately deployed database (with the Zalando Postgres operator). The DB shows no errors and has enough resources.
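For readers who want to notice this state before users do, the symptom described above (jobs sitting in 'pending' indefinitely) can be checked from outside AWX. The following is only a minimal sketch, not anything AWX ships: the function name `find_stuck_pending_jobs`, the 10-minute threshold, and the sample data are hypothetical, and it assumes job records are available as dicts with `id`, `status`, and an ISO 8601 `created` timestamp (for example, as exported from the job list).

```python
from datetime import datetime, timedelta, timezone


def find_stuck_pending_jobs(jobs, now, max_pending=timedelta(minutes=10)):
    """Return the jobs that have been 'pending' for longer than max_pending.

    `jobs` is a list of dicts with 'id', 'status' and 'created' keys.
    This record shape and the default threshold are assumptions made
    for illustration only.
    """
    stuck = []
    for job in jobs:
        if job["status"] != "pending":
            continue
        # Accept a trailing 'Z' as UTC so fromisoformat() can parse it.
        created = datetime.fromisoformat(job["created"].replace("Z", "+00:00"))
        if now - created > max_pending:
            stuck.append(job)
    return stuck


if __name__ == "__main__":
    # Made-up sample data, not taken from a real system.
    now = datetime(2022, 1, 1, 12, 0, tzinfo=timezone.utc)
    sample = [
        {"id": 1, "status": "pending", "created": "2022-01-01T11:30:00Z"},
        {"id": 2, "status": "pending", "created": "2022-01-01T11:58:00Z"},
        {"id": 3, "status": "successful", "created": "2022-01-01T09:00:00Z"},
    ]
    for job in find_stuck_pending_jobs(sample, now):
        print("job", job["id"], "has been pending too long")
```

A check like this only reports the symptom; it does not explain why the task container stopped dispatching jobs.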
AWX version
21.4.0
Select the relevant components
Installation method
kubernetes
Modifications
yes
Ansible version
21.10.11
Operating system
Kubernetes 23.6 on Ubuntu 20.04
Web browser
Chrome
Steps to reproduce
This is a sporadic problem, we don't know how to reproduce it.
Expected results
A working system without downtime.
Actual results
The system is not working after a while and must be restarted.
Additional information
We have an extended execution environment image with some additional binary dependencies (terraform, kubectl, helm ...), built like the original awx-ee.