awx.conf.settings Database settings are not available #12683

klauserber · 2022-08-18T10:31:27Z

Please confirm the following

I agree to follow this project's code of conduct.
I have checked the current issues for duplicates.
I understand that AWX is open source software provided for free and that I might not receive a timely response.

Bug Summary

Hello,

there is another obstacle in the way of going live with our AWX system. I hope somebody can help us.

Sometimes the whole system gets into a state where no jobs can be started. Every new job stays in the 'pending' state and nothing happens; no new automation-job pods are started.

A closer look at the logs of the xxx-task container shows the following:

2022-08-05 07:43:51,408 ERROR    [-] awx.conf.settings Database settings are not available, using defaults.
Traceback (most recent call last):
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/backends/base/base.py", line 237, in _cursor
    return self._prepare_cursor(self.create_cursor(name))
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/utils/asyncio.py", line 33, in inner
    return func(*args, **kwargs)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/backends/postgresql/base.py", line 236, in create_cursor
    cursor = self.connection.cursor()
psycopg2.InterfaceError: connection already closed

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/awx/conf/settings.py", line 80, in _ctit_db_wrapper
    yield
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/awx/conf/settings.py", line 408, in _get_local_with_cache
    return self._get_local(name)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/awx/conf/settings.py", line 327, in _get_local
    self._preload_cache()
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/awx/conf/settings.py", line 296, in _preload_cache
    for setting in Setting.objects.filter(key__in=settings_to_cache.keys(), user__isnull=True).order_by('pk'):
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/models/query.py", line 280, in __iter__
    self._fetch_all()
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/models/query.py", line 1324, in _fetch_all
    self._result_cache = list(self._iterable_class(self))
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/models/query.py", line 51, in __iter__
    results = compiler.execute_sql(chunked_fetch=self.chunked_fetch, chunk_size=self.chunk_size)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/models/sql/compiler.py", line 1173, in execute_sql
    cursor = self.connection.cursor()
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/utils/asyncio.py", line 33, in inner
    return func(*args, **kwargs)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/backends/base/base.py", line 259, in cursor
    return self._cursor()
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/backends/base/base.py", line 237, in _cursor
    return self._prepare_cursor(self.create_cursor(name))
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/utils.py", line 90, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/backends/base/base.py", line 237, in _cursor
    return self._prepare_cursor(self.create_cursor(name))
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/utils/asyncio.py", line 33, in inner
    return func(*args, **kwargs)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/backends/postgresql/base.py", line 236, in create_cursor
    cursor = self.connection.cursor()
django.db.utils.InterfaceError: connection already closed

A restart of the AWX main pod brings the system back into a working state.

We use a separately deployed database (with the Zalando Postgres operator). The DB shows no errors and has enough resources.

AWX version

21.4.0

Select the relevant components

Installation method

kubernetes

Modifications

yes

Ansible version

21.10.11

Operating system

Kubernetes 23.6 on Ubuntu 20.04

Web browser

Chrome

Steps to reproduce

This is a sporadic problem, we don't know how to reproduce it.

Expected results

A working system without downtime.

Actual results

The system is not working after a while and must be restarted.

Additional information

We have an extended execution environment image with some additional binary dependencies (terraform, kubectl, helm ...), built like the original awx-ee.


fosterseth · 2022-08-24T17:47:15Z

@klauserber how long does your system stay up before getting into this bad state? Do you see evidence that awx is connecting to db at all during this time?
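One way to answer the question above is to count how often the error shows up in the task container log per day. The following is a minimal sketch, assuming the log line layout seen in the excerpts in this thread (timestamp, level, `[-]`, logger name, message); the function name `count_db_errors_per_day` is hypothetical and not part of AWX.

```python
import re
from collections import Counter

# Assumed line layout, taken from the excerpts in this thread, e.g.
# "2022-08-05 07:43:51,408 ERROR    [-] awx.conf.settings Database settings ..."
LINE_RE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}),\d{3}\s+"
    r"(?P<level>[A-Z]+)\s+\[(?P<ctx>[^\]]*)\]\s+(?P<logger>\S+)\s+(?P<msg>.*)$"
)
MARKER = "Database settings are not available"


def count_db_errors_per_day(lines):
    """Return a Counter mapping 'YYYY-MM-DD' to the number of marker errors."""
    per_day = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        # Traceback lines do not match the layout and are skipped.
        if match and match["level"] == "ERROR" and MARKER in match["msg"]:
            per_day[match["date"]] += 1
    return per_day


sample = [
    "2022-09-17 16:48:49,680 DEBUG    [-] awx.main.commands.run_callback_receiver 25 is alive",
    "2022-09-17 16:48:49,682 ERROR    [-] awx.conf.settings Database settings are not available, using defaults.",
    "Traceback (most recent call last):",
]
print(dict(count_db_errors_per_day(sample)))
```

Feeding the function the full container log instead of `sample` would show whether the error is an occasional blip or repeats continuously once the system is stuck.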

klauserber · 2022-08-29T05:26:16Z

@fosterseth We have seen this several times after the system had been up for a few days. We only have test workloads on the system at the moment, about 50-100 jobs per day.

erz4 · 2022-09-18T06:32:41Z

The same is happening here:

2022-09-17 16:48:49,680 DEBUG    [-] awx.main.commands.run_callback_receiver 25 is alive
2022-09-17 16:48:49,680 DEBUG    [-] awx.main.commands.run_callback_receiver 25 is alive
2022-09-17 16:48:49,682 ERROR    [-] awx.conf.settings Database settings are not available, using defaults.
Traceback (most recent call last):
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/backends/base/base.py", line 237, in _cursor
    return self._prepare_cursor(self.create_cursor(name))
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/utils/asyncio.py", line 33, in inner
    return func(*args, **kwargs)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/backends/postgresql/base.py", line 236, in create_cursor
    cursor = self.connection.cursor()
psycopg2.InterfaceError: connection already closed

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/awx/conf/settings.py", line 80, in _ctit_db_wrapper
    yield
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/awx/conf/settings.py", line 408, in _get_local_with_cache
    return self._get_local(name)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/awx/conf/settings.py", line 327, in _get_local
    self._preload_cache()
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/awx/conf/settings.py", line 296, in _preload_cache
    for setting in Setting.objects.filter(key__in=settings_to_cache.keys(), user__isnull=True).order_by('pk'):
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/models/query.py", line 280, in __iter__
    self._fetch_all()
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/models/query.py", line 1324, in _fetch_all
    self._result_cache = list(self._iterable_class(self))
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/models/query.py", line 51, in __iter__
    results = compiler.execute_sql(chunked_fetch=self.chunked_fetch, chunk_size=self.chunk_size)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/models/sql/compiler.py", line 1173, in execute_sql
    cursor = self.connection.cursor()
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/utils/asyncio.py", line 33, in inner
    return func(*args, **kwargs)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/backends/base/base.py", line 259, in cursor
    return self._cursor()
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/backends/base/base.py", line 237, in _cursor
    return self._prepare_cursor(self.create_cursor(name))
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/utils.py", line 90, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/backends/base/base.py", line 237, in _cursor
    return self._prepare_cursor(self.create_cursor(name))
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/utils/asyncio.py", line 33, in inner
    return func(*args, **kwargs)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/backends/postgresql/base.py", line 236, in create_cursor
    cursor = self.connection.cursor()
django.db.utils.InterfaceError: connection already closed

AWX version 21.2.0
k8s deployment via the awx-operator
external DB of type AWS RDS with postgres (12.X)

Our workload is 5,000-15,000 jobs per day, and this happens to us roughly once every 2 months.

stanislav-zaprudskiy · 2022-11-24T14:47:02Z

I observe the same behavior with AWX 21.7.0 using an external PostgreSQL. In my case it happens when there is an interruption in PostgreSQL availability, but not every interruption results in such a failure. PostgreSQL is operated by https://github.com/zalando/postgres-operator. A simple patronictl failover is handled well by AWX, while a more complicated rolling restart of all DB servers during deployment (or during managed Kubernetes node restarts) ends up with AWX stuck in the state "awx.conf.settings Database settings are not available, using defaults." with the error "connection already closed". The problem appeared after the following upgrades:

PostgreSQL 14.0 -> 14.4 (spilo-14:2.1-p3 -> spilo-14:2.1-p6), postgres-operator v1.7.1 -> v1.8.2
AWX v21.0.0 -> v21.7.0
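The stuck state described above is what one would expect if a process keeps reusing a connection handle that the server has already closed. Below is a minimal sketch of the general reconnect-and-retry idea, using only the standard library. `FakeConnection`, `ConnectionClosedError`, and `ReconnectingClient` are hypothetical stand-ins invented for illustration; this is not AWX, Django, or psycopg2 code, and it makes no claim about how AWX handles this internally.

```python
import time


class ConnectionClosedError(Exception):
    """Stand-in for psycopg2.InterfaceError('connection already closed')."""


class FakeConnection:
    """Hypothetical connection that can be closed from the server side."""

    def __init__(self):
        self.closed = False

    def query(self, sql):
        if self.closed:
            raise ConnectionClosedError("connection already closed")
        return f"result of {sql}"


class ReconnectingClient:
    """Retries a query on a fresh connection when the cached one is closed."""

    def __init__(self, connect, retries=3, base_delay=0.01):
        self._connect = connect
        self._retries = retries
        self._base_delay = base_delay
        self._conn = None

    def query(self, sql):
        for attempt in range(self._retries + 1):
            try:
                if self._conn is None:
                    self._conn = self._connect()
                return self._conn.query(sql)
            except ConnectionClosedError:
                # Drop the stale handle so the next attempt reconnects.
                self._conn = None
                if attempt == self._retries:
                    raise
                # Exponential backoff between attempts.
                time.sleep(self._base_delay * 2 ** attempt)


client = ReconnectingClient(FakeConnection)
print(client.query("SELECT 1"))
client._conn.closed = True       # simulate the database closing the connection
print(client.query("SELECT 1"))  # succeeds after a transparent reconnect
```

The key step is discarding the cached handle on failure. Without it, every later call hits the same closed connection, which matches the symptom that only a pod restart helps.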

klauserber · 2023-01-04T16:22:34Z

Same here, AWX 21.7.0, Postgres 14.4 with Zalando Operator. And I also cannot reproduce it. Simply killing the leader in the postgres cluster sometimes produces the error message (awx.conf.settings Database settings are not available ...), but the system comes back to normal functionality.

akus062381 · 2023-01-17T15:25:00Z

Hi @klauserber, this trace can be part of "normal operation" when connections expire. We have done some work around these error messages in later versions of AWX (from 21.4.0). As you stated, this issue is not reproducible in 21.7.0. We will close this issue for now, but please feel free to reopen if need be.
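For readers wondering why this message can be harmless: the tracebacks in this thread pass through a context manager (`_ctit_db_wrapper` in `awx/conf/settings.py`), and the log text says defaults are used when the database cannot be reached. The sketch below illustrates that general log-and-fall-back pattern. It is not the AWX implementation; `DatabaseUnavailable`, `db_errors_logged`, `get_setting`, and the `EXAMPLE_TIMEOUT` setting are all hypothetical names.

```python
import logging
from contextlib import contextmanager

logger = logging.getLogger("settings_sketch")  # hypothetical logger name


class DatabaseUnavailable(Exception):
    """Stand-in for django.db.utils.InterfaceError and similar errors."""


@contextmanager
def db_errors_logged():
    """Log a database failure and suppress it so the caller can fall back."""
    try:
        yield
    except DatabaseUnavailable:
        logger.exception("Database settings are not available, using defaults.")


DEFAULTS = {"EXAMPLE_TIMEOUT": 0}  # hypothetical setting name and default


def get_setting(name, fetch_from_db):
    """Return the database value if it can be read, else the default."""
    value = DEFAULTS[name]
    with db_errors_logged():
        value = fetch_from_db(name)
    return value


def broken_fetch(name):
    """Simulates a lookup over a connection that is already closed."""
    raise DatabaseUnavailable("connection already closed")


print(get_setting("EXAMPLE_TIMEOUT", lambda name: 30))  # value from the "database"
print(get_setting("EXAMPLE_TIMEOUT", broken_fetch))     # falls back to the default
```

With a pattern like this, a single expired connection produces one logged traceback and the caller carries on. It does not by itself repair the connection, so if the handle stays closed the message would repeat on every lookup.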

stanislav-zaprudskiy · 2023-02-22T16:46:30Z

I still observe the same problem with AWX 21.10.2. Hopefully #13505 and the corresponding PR will turn out to be a solution and get some traction.

github-actions bot added needs_triage type:bug labels Aug 18, 2022

akus062381 closed this as completed Jan 17, 2023

tanganellilore mentioned this issue Feb 1, 2023

PostgreSQL reconnection in case of DB failover or idle or unstable connection #13505

Closed
