I understand that AWX is open source software provided for free and that I might not receive a timely response.
I am NOT reporting a (potential) security vulnerability. (These should be emailed to security@ansible.com instead.)
Bug Summary
After a DB outage of unspecified length, wsrelay does not correctly re-establish its database connection, so websockets stop working.
AWX version
24.0.0
Select the relevant components
UI
UI (tech preview)
API
Docs
Collection
CLI
Other
Installation method
kubernetes
Modifications
no
Ansible version
No response
Operating system
No response
Web browser
No response
Steps to reproduce
Deploy AWX on Kubernetes
Scale down awx-operator
Scale down the postgres statefulset for about 60 seconds
Scale the postgres statefulset back up
Expected results
Websockets (e.g. live job log updates) keep working in the UI
Actual results
Websockets stop working
Additional information
```
future: <Task finished name='Task-9' coro=<WebSocketRelayManager.on_ws_heartbeat() done, defined at /var/lib/awx/venv/awx/lib64/python3.11/site-packages/awx/main/wsrelay.py:221> exception=OperationalError('consuming input failed: server closed the connection unexpectedly\n\tThis probably means the server terminated abnormally\n\tbefore or while processing the request.')>
Traceback (most recent call last):
  File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/awx/main/wsrelay.py", line 223, in on_ws_heartbeat
    async for notif in conn.notifies():
  File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/psycopg/connection_async.py", line 315, in notifies
    raise ex.with_traceback(None)
psycopg.OperationalError: consuming input failed: server closed the connection unexpectedly
	This probably means the server terminated abnormally
	before or while processing the request.
```
We see that the recent change adding TCP keepalive does cause the pg listen connection to terminate correctly once the database has been down for more than 25 seconds (yay).
After some digging with @jbradberry, we found that the task created by `event_loop.create_task(self.on_ws_heartbeat(async_conn))` terminates, but the main loop outside it keeps running (since it doesn't access the database).
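The failure mode can be reproduced in isolation. The following is a minimal sketch, not the actual wsrelay code: `listener` and `main_loop` are hypothetical stand-ins for `on_ws_heartbeat` and the outer loop. It illustrates that an exception inside a task started with `create_task` ends only that task, while the loop that spawned it carries on.

```python
import asyncio


async def listener():
    # Hypothetical stand-in for on_ws_heartbeat(): fails as soon as the
    # database connection drops.
    raise ConnectionError("server closed the connection unexpectedly")


async def main_loop(iterations=3):
    # Start the listener in the background, as create_task() does.
    task = asyncio.get_running_loop().create_task(listener())
    ticks = 0
    for _ in range(iterations):
        # The outer loop never touches the database, so it never sees the error.
        await asyncio.sleep(0.01)
        ticks += 1
    # The task is finished and holds the exception; nothing restarted it.
    return ticks, task.done(), type(task.exception()).__name__
```

Calling `asyncio.run(main_loop())` shows the outer loop completing all of its iterations even though the background task died on the first one.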
We determined that we need to re-establish the DB connection and restart the `on_ws_heartbeat` task after the DB connection has been lost.
The end goal: the wsrelay process should not terminate even when there is a DB connection problem; it should keep retrying to establish a connection forever.
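One possible shape for that retry behaviour is sketched below. This is only an illustration under assumptions, not a proposed patch: `run_with_reconnect`, `connect`, `handler`, `stop_event`, and the delay parameters are hypothetical names, where `connect` would open a fresh DB connection and `handler` would play the role of `on_ws_heartbeat`.

```python
import asyncio
import logging

logger = logging.getLogger(__name__)


async def run_with_reconnect(connect, handler, stop_event,
                             retry_delay=1.0, max_delay=30.0):
    """Keep handler(conn) running, reconnecting whenever it fails.

    All names here are hypothetical: `connect` is an async callable that
    returns a new connection, `handler` stands in for on_ws_heartbeat.
    """
    delay = retry_delay
    while not stop_event.is_set():
        try:
            conn = await connect()
            delay = retry_delay  # connected again, so reset the backoff
            await handler(conn)
        except asyncio.CancelledError:
            raise  # let shutdown/cancellation propagate
        except Exception as exc:
            logger.warning("listener failed (%s); retrying in %.2fs", exc, delay)
        if stop_event.is_set():
            break
        # Wait before reconnecting; back off while failures continue.
        # A real version would also close the old connection here.
        await asyncio.sleep(delay)
        delay = min(delay * 2, max_delay)
```

The supervisor itself never raises on a DB error, so the process stays alive and keeps retrying until `stop_event` is set; the capped exponential backoff is a design choice to avoid hammering a database that is still coming back up.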