awx.conf.settings Database settings are not available #12683

Closed
4 of 9 tasks
klauserber opened this issue Aug 18, 2022 · 7 comments

Comments

@klauserber

Please confirm the following

  • I agree to follow this project's code of conduct.
  • I have checked the current issues for duplicates.
  • I understand that AWX is open source software provided for free and that I might not receive a timely response.

Bug Summary

Hello,

There is another obstacle in the way of going live with our AWX system. I hope somebody can help us.

Sometimes the whole system gets into a state where no jobs can be started. Every new job stays in the state 'pending' and nothing happens; no new automation-job pods are started.

A closer look at the logs of the xxx-task container shows the following:

2022-08-05 07:43:51,408 ERROR    [-] awx.conf.settings Database settings are not available, using defaults.
Traceback (most recent call last):
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/backends/base/base.py", line 237, in _cursor
    return self._prepare_cursor(self.create_cursor(name))
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/utils/asyncio.py", line 33, in inner
    return func(*args, **kwargs)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/backends/postgresql/base.py", line 236, in create_cursor
    cursor = self.connection.cursor()
psycopg2.InterfaceError: connection already closed

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/awx/conf/settings.py", line 80, in _ctit_db_wrapper
    yield
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/awx/conf/settings.py", line 408, in _get_local_with_cache
    return self._get_local(name)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/awx/conf/settings.py", line 327, in _get_local
    self._preload_cache()
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/awx/conf/settings.py", line 296, in _preload_cache
    for setting in Setting.objects.filter(key__in=settings_to_cache.keys(), user__isnull=True).order_by('pk'):
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/models/query.py", line 280, in __iter__
    self._fetch_all()
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/models/query.py", line 1324, in _fetch_all
    self._result_cache = list(self._iterable_class(self))
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/models/query.py", line 51, in __iter__
    results = compiler.execute_sql(chunked_fetch=self.chunked_fetch, chunk_size=self.chunk_size)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/models/sql/compiler.py", line 1173, in execute_sql
    cursor = self.connection.cursor()
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/utils/asyncio.py", line 33, in inner
    return func(*args, **kwargs)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/backends/base/base.py", line 259, in cursor
    return self._cursor()
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/backends/base/base.py", line 237, in _cursor
    return self._prepare_cursor(self.create_cursor(name))
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/utils.py", line 90, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/backends/base/base.py", line 237, in _cursor
    return self._prepare_cursor(self.create_cursor(name))
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/utils/asyncio.py", line 33, in inner
    return func(*args, **kwargs)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/backends/postgresql/base.py", line 236, in create_cursor
    cursor = self.connection.cursor()
django.db.utils.InterfaceError: connection already closed
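
For readers unfamiliar with chained tracebacks: the "direct cause" line separating the two tracebacks above is what Python prints when an exception is re-raised with `raise ... from ...`. The following is a minimal sketch of that mechanism only. The two exception classes and the functions are hypothetical stand-ins for the psycopg2 and Django layers, not their actual code.

```python
class DriverInterfaceError(Exception):
    """Stand-in for psycopg2.InterfaceError (hypothetical, for illustration)."""


class WrappedInterfaceError(Exception):
    """Stand-in for django.db.utils.InterfaceError (hypothetical)."""


def create_cursor(connection_open: bool) -> str:
    # The driver refuses to hand out a cursor on a closed connection.
    if not connection_open:
        raise DriverInterfaceError("connection already closed")
    return "cursor"


def cursor(connection_open: bool) -> str:
    # The wrapping layer re-raises with `from`, which sets __cause__ and
    # makes Python print the "direct cause" line between the tracebacks.
    try:
        return create_cursor(connection_open)
    except DriverInterfaceError as exc:
        raise WrappedInterfaceError(str(exc)) from exc


def root_cause(exc: BaseException) -> BaseException:
    # Follow the __cause__ chain down to the original exception.
    while exc.__cause__ is not None:
        exc = exc.__cause__
    return exc
```

Read this way, both tracebacks describe a single failure: the driver reported a closed connection, and the outer layer passed the same message on under its own exception type.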

A restart of the AWX main pod brings the system back into a working state.

We use a separately deployed database (with the Zalando Postgres operator). The DB shows no errors and has enough resources.

AWX version

21.4.0

Select the relevant components

  • UI
  • API
  • Docs
  • Collection
  • CLI
  • Other

Installation method

kubernetes

Modifications

yes

Ansible version

21.10.11

Operating system

Kubernetes 23.6 on Ubuntu 20.04

Web browser

Chrome

Steps to reproduce

This is a sporadic problem, we don't know how to reproduce it.

Expected results

A working system without downtime.

Actual results

The system is not working after a while and must be restarted.

Additional information

We have an extended execution environment image with some additional binary dependencies (terraform, kubectl, helm ...), built like the original awx-ee.

@fosterseth
Member

@klauserber how long does your system stay up before getting into this bad state? Do you see evidence that awx is connecting to db at all during this time?

@klauserber
Author

@fosterseth We have seen this several times after the system was up for a few days. We only have test workloads on the system at the moment, about 50-100 jobs per day.

@erz4

erz4 commented Sep 18, 2022

The same is happening here:

2022-09-17 16:48:49,680 DEBUG    [-] awx.main.commands.run_callback_receiver 25 is alive
2022-09-17 16:48:49,680 DEBUG    [-] awx.main.commands.run_callback_receiver 25 is alive
2022-09-17 16:48:49,682 ERROR    [-] awx.conf.settings Database settings are not available, using defaults.
Traceback (most recent call last):
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/backends/base/base.py", line 237, in _cursor
    return self._prepare_cursor(self.create_cursor(name))
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/utils/asyncio.py", line 33, in inner
    return func(*args, **kwargs)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/backends/postgresql/base.py", line 236, in create_cursor
    cursor = self.connection.cursor()
psycopg2.InterfaceError: connection already closed

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/awx/conf/settings.py", line 80, in _ctit_db_wrapper
    yield
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/awx/conf/settings.py", line 408, in _get_local_with_cache
    return self._get_local(name)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/awx/conf/settings.py", line 327, in _get_local
    self._preload_cache()
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/awx/conf/settings.py", line 296, in _preload_cache
    for setting in Setting.objects.filter(key__in=settings_to_cache.keys(), user__isnull=True).order_by('pk'):
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/models/query.py", line 280, in __iter__
    self._fetch_all()
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/models/query.py", line 1324, in _fetch_all
    self._result_cache = list(self._iterable_class(self))
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/models/query.py", line 51, in __iter__
    results = compiler.execute_sql(chunked_fetch=self.chunked_fetch, chunk_size=self.chunk_size)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/models/sql/compiler.py", line 1173, in execute_sql
    cursor = self.connection.cursor()
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/utils/asyncio.py", line 33, in inner
    return func(*args, **kwargs)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/backends/base/base.py", line 259, in cursor
    return self._cursor()
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/backends/base/base.py", line 237, in _cursor
    return self._prepare_cursor(self.create_cursor(name))
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/utils.py", line 90, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/backends/base/base.py", line 237, in _cursor
    return self._prepare_cursor(self.create_cursor(name))
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/utils/asyncio.py", line 33, in inner
    return func(*args, **kwargs)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/backends/postgresql/base.py", line 236, in create_cursor
    cursor = self.connection.cursor()
django.db.utils.InterfaceError: connection already closed

AWX version 21.2.0
k8s deployment via the awx-operator
external DB of type AWS RDS with postgres (12.X)

Our workload is 5,000-15,000 jobs per day, and this happens to us roughly once every 2 months.

@stanislav-zaprudskiy
Contributor

I observe the same behavior with AWX 21.7.0 using an external PostgreSQL. In my case it happens when there is an interruption in PostgreSQL availability, but not every interruption results in such a failure. PostgreSQL is operated by https://github.com/zalando/postgres-operator. A simple patronictl failover is handled well by AWX, while a more complicated rolling restart of all DB servers during a deployment (or during managed Kubernetes node restarts) ends with AWX stuck in the "awx.conf.settings Database settings are not available, using defaults." state, with the error "connection already closed". The problem appeared after the following upgrades:

  • PostgreSQL 14.0 -> 14.4 (spilo-14:2.1-p3 -> spilo-14:2.1-p6), postgres-operator v1.7.1 -> v1.8.2
  • AWX v21.0.0 -> v21.7.0

@klauserber
Author

klauserber commented Jan 4, 2023

Same here, AWX 21.7.0, Postgres 14.4 with Zalando Operator. And I also cannot reproduce it. Simply killing the leader in the postgres cluster sometimes produces the error message (awx.conf.settings Database settings are not available ...), but the system comes back to normal functionality.
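
Since the same message shows up both in the harmless case and in the stuck case, one rough way to tell them apart is to check whether the error keeps repeating. The sketch below scans task container log lines in the format quoted in this issue and flags a burst of errors. The function names, the 10-minute window and the threshold of 5 are assumptions chosen for illustration, not values taken from AWX.

```python
from datetime import datetime, timedelta

MARKER = "awx.conf.settings Database settings are not available"


def error_times(lines):
    """Return timestamps of log lines that contain the settings error."""
    times = []
    for line in lines:
        if MARKER not in line:
            continue
        # Timestamps look like "2022-08-05 07:43:51,408" (first 23 chars).
        times.append(datetime.strptime(line[:23], "%Y-%m-%d %H:%M:%S,%f"))
    return times


def looks_stuck(lines, window=timedelta(minutes=10), threshold=5):
    """True if at least `threshold` errors fall within any single `window`."""
    times = sorted(error_times(lines))
    for i in range(len(times) - threshold + 1):
        if times[i + threshold - 1] - times[i] <= window:
            return True
    return False
```

A single occurrence after a leader kill would not trip this check, while a process that logs the error on every settings lookup would. Whether that is a reliable signal for the stuck state described here has not been confirmed in this thread.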

@akus062381
Member

Hi @klauserber, this trace can be part of "normal operation" when connections expire. We have done some work around these error messages in later versions of AWX (from 21.4.0). As you stated, this issue is not reproducible in 21.7.0. We will close this issue for now, but please feel free to reopen if need be.
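
To illustrate what recovering from an expired connection can look like in a long-running process, here is a minimal sketch using stand-in classes. `FakeConnection`, `Worker` and this `InterfaceError` are hypothetical and only mimic the "connection already closed" behavior; this is not how AWX or Django implement it. In a real Django process, the framework's own connection handling (for example `django.db.close_old_connections()`) would be the place to start.

```python
class InterfaceError(Exception):
    """Stand-in for the driver's 'connection already closed' error."""


class FakeConnection:
    """Hypothetical connection object; not a psycopg2 or Django class."""

    def __init__(self):
        self.closed = False

    def query(self, sql):
        if self.closed:
            raise InterfaceError("connection already closed")
        return f"result of {sql}"


class Worker:
    """Long-running process that holds one connection between queries."""

    def __init__(self, connect):
        self._connect = connect  # factory that opens a new connection
        self.conn = connect()

    def query(self, sql, retries=1):
        for attempt in range(retries + 1):
            try:
                return self.conn.query(sql)
            except InterfaceError:
                if attempt == retries:
                    raise  # give up; let the caller see the error
                # Drop the dead handle and open a fresh connection.
                self.conn = self._connect()
```

With `retries=1`, a query against a closed connection succeeds on the second attempt using a new connection. A process that never replaces the dead handle keeps failing until it is restarted, which matches the symptom reported above.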

@stanislav-zaprudskiy
Contributor

I still observe the same problem with AWX 21.10.2. Hopefully #13505 and the corresponding PR will turn out to be a solution and get some traction.
