Error on create_partition #14563

Closed
5 of 11 tasks
remipcomaite opened this issue Oct 11, 2023 · 17 comments

Comments

@remipcomaite

remipcomaite commented Oct 11, 2023

Please confirm the following

  • I agree to follow this project's code of conduct.
  • I have checked the current issues for duplicates.
  • I understand that AWX is open source software provided for free and that I might not receive a timely response.
  • I am NOT reporting a (potential) security vulnerability. (These should be emailed to security@ansible.com instead.)

Bug Summary

Hello everyone,

Since updating AWX to 23.3.0, I have this error each time a partition is created on the main_jobevent table:

Traceback (most recent call last):
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/backends/utils.py", line 87, in _execute
    return self.cursor.execute(sql)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/psycopg/cursor.py", line 723, in execute
    raise ex.with_traceback(None)
psycopg.errors.UniqueViolation: Duplicate key value violates unique constraint "pg_type_typname_nsp_index"
DETAIL: Key "(typname, typnamespace)=(main_jobevent_20231011_14, 2200)" already exists.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/awx/main/tasks/jobs.py", line 491, in run
    self.pre_run_hook(self.instance, private_data_dir)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/awx/main/tasks/jobs.py", line 1058, in pre_run_hook
    super(RunJob, self).pre_run_hook(job, private_data_dir)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/awx/main/tasks/jobs.py", line 425, in pre_run_hook
    create_partition(instance.event_class._meta.db_table, start=instance.created)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/awx/main/utils/common.py", line 1175, in create_partition
    cursor.execute(
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/backends/utils.py", line 67, in execute
    return self._execute_with_wrappers(
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/backends/utils.py", line 80, in _execute_with_wrappers
    return executor(sql, params, many, context)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/backends/utils.py", line 89, in _execute
    return self.cursor.execute(sql, params)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/utils.py", line 91, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/backends/utils.py", line 87, in _execute
    return self.cursor.execute(sql)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/psycopg/cursor.py", line 723, in execute
    raise ex.with_traceback(None)
django.db.utils.IntegrityError: Duplicate key value violates unique constraint "pg_type_typname_nsp_index"
DETAIL: Key "(typname, typnamespace)=(main_jobevent_20231011_14, 2200)" already exists.

AWX version

23.3.0

Select the relevant components

  • UI
  • UI (tech preview)
  • API
  • Docs
  • Collection
  • CLI
  • Other

Installation method

kubernetes

Modifications

no

Ansible version

No response

Operating system

No response

Web browser

No response

Steps to reproduce

Every hour, an automated task fails.

Expected results

No error should occur when a partition is created on the main_jobevent table.

Actual results

The task failed with this error.

Additional information

No response

@fosterseth
Member

fosterseth commented Oct 11, 2023

Are the jobs running to completion still or erroring out?

can you kubectl exec into task pod (task container) and run
awx-manage dbshell

then you can run \d main_jobevent_20231011_14;

just to verify that this table exists in the database

a new partition is created for each hour of the day (in your example, 14th hour)

if you wait an hour or two, do you get the same error, but with a different table (e.g. main_jobevent_20231011_15)?

overall, is this error coming up each time you run a job or just periodically?

Also, you can query pg_stat_activity to see whether any long-running queries are blocking an important db operation.
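
To make the hourly naming above concrete, here is a minimal sketch of how a partition name and its one-hour window can be derived from a job's creation time. The function name `hourly_partition` is hypothetical and this is not AWX's actual `create_partition` code; the UTC truncation is an assumption based on the `+00:00` bounds that appear in the PostgreSQL log later in this thread.

```python
from datetime import datetime, timedelta, timezone


def hourly_partition(table: str, start: datetime):
    """Return (partition_name, window_start, window_end) for the hour containing `start`.

    Hypothetical helper for illustration only; `start` is expected to be
    timezone-aware (a naive datetime would be interpreted as local time).
    """
    # Truncate to the top of the hour in UTC (assumption, see lead-in).
    window_start = start.astimezone(timezone.utc).replace(minute=0, second=0, microsecond=0)
    window_end = window_start + timedelta(hours=1)
    # Suffix is YYYYMMDD_HH, matching names such as main_jobevent_20231011_14.
    name = f"{table}_{window_start.strftime('%Y%m%d_%H')}"
    return name, window_start, window_end


name, lo, hi = hourly_partition(
    "main_jobevent", datetime(2023, 10, 11, 14, 27, tzinfo=timezone.utc)
)
print(name)  # main_jobevent_20231011_14
```

Every job created within the same hour maps to the same partition name, so two jobs that start close together right after the top of the hour can both try to create it.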

@remipcomaite
Author

I have a scheduled task that runs every 2 minutes.
As for the database, I use an external PostgreSQL cluster.
I confirm that the partition does exist.
My task failed at 2:00 p.m., just as it had at 1:00 p.m.

@AlanCoding
Member

so maybe this is just a miss in this criteria

awx/awx/main/utils/common.py

Lines 1181 to 1182 in 447ac77

if 'already exists' in str(e):
    logger.info(f'Caught known error due to partition creation race: {e}')

My questions are (1) does this really happen every hour when creating a new partition and (2) can we get more context around these logs? We often have logger.exception logs, and these will give a traceback, but they also have an associated message that is helpful to know what bit of code is logging the traceback, not just what bit of code had the error.

Or, it could be that str(e) doesn't serialize the content we expect it to, and we need to use repr or something else.
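
One way to avoid depending on the message text at all is to look at the SQLSTATE code instead, which does not change with the server's message language. The sketch below is an illustration only, not the change made in awx#14910 or #15000. `is_partition_race`, `FakeDriverError` and `FakeIntegrityError` are hypothetical names, and the stand-in classes exist so the example runs without a database. It assumes the driver exception exposes a `sqlstate` attribute (psycopg 3 errors do) and that Django chains the driver error as `__cause__`, as the tracebacks in this thread show.

```python
# SQLSTATE codes from the PostgreSQL error-code appendix.
DUPLICATE_TABLE = "42P07"   # duplicate_table
UNIQUE_VIOLATION = "23505"  # unique_violation (the pg_type index error above)


def is_partition_race(exc: BaseException) -> bool:
    """True if exc, or any exception it was raised from, carries one of the codes above."""
    seen = set()
    while exc is not None and id(exc) not in seen:
        seen.add(id(exc))  # guard against cycles in the exception chain
        if getattr(exc, "sqlstate", None) in (DUPLICATE_TABLE, UNIQUE_VIOLATION):
            return True
        exc = exc.__cause__ or exc.__context__
    return False


class FakeDriverError(Exception):
    """Stand-in for a driver error; only the sqlstate attribute matters here."""

    def __init__(self, message, sqlstate):
        super().__init__(message)
        self.sqlstate = sqlstate


class FakeIntegrityError(Exception):
    """Stand-in for the wrapping django.db.utils.IntegrityError."""


def raise_wrapped(message, sqlstate):
    # Mimic Django re-raising the driver error with `raise ... from ...`.
    try:
        raise FakeDriverError(message, sqlstate)
    except FakeDriverError as driver_exc:
        raise FakeIntegrityError(message) from driver_exc


try:
    raise_wrapped("<localized message without the English phrase>", UNIQUE_VIOLATION)
except FakeIntegrityError as e:
    print("already exists" in str(e))  # False: the substring check misses it
    print(is_partition_race(e))        # True: the SQLSTATE check still matches
```

Treating every unique violation as benign would be too broad in general; a check like this only makes sense scoped to the partition-creation statement itself.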

@gundalow
Contributor

Are you running on a non-English system?

Workaround
This should be fixed in the AWX code. For now, the workaround is to set the lc_messages property in /var/lib/psql/data/postgresql.conf to en_US.UTF-8.

Full Fix
A fuller fix is being put together in "English string comparisons" (awx#14910).

@kakawait

kakawait commented Mar 13, 2024

Are you running on a non-English system?

Workaround: should be fixed in AWX code, workaround for now is to set lc_messages property in /var/lib/psql/data/postgresql.conf to en_US.UTF-8

Full Fix: a fuller fix is being put together in English string comparisons awx#14910

@gundalow I see the exact opposite: everything was working fine, but since the upgrade to 24.0.0, which includes #14910, I get the same error in the job output, and it prevents the job from launching (see #14563 (comment); apparently the error also existed in 23.8.1 but was silent and did not break workflows or jobs).

Traceback (most recent call last):
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/backends/utils.py", line 87, in _execute
    return self.cursor.execute(sql)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/psycopg/cursor.py", line 723, in execute
    raise ex.with_traceback(None)
psycopg.errors.UniqueViolation: duplicate key value violates unique constraint "pg_type_typname_nsp_index"
DETAIL:  Key (typname, typnamespace)=(main_jobevent_20240313_12, 2200) already exists.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/awx_devel/awx/main/tasks/jobs.py", line 499, in run
    self.pre_run_hook(self.instance, private_data_dir)
  File "/awx_devel/awx/main/tasks/jobs.py", line 1066, in pre_run_hook
    super(RunJob, self).pre_run_hook(job, private_data_dir)
  File "/awx_devel/awx/main/tasks/jobs.py", line 427, in pre_run_hook
    create_partition(instance.event_class._meta.db_table, start=instance.created)
  File "/awx_devel/awx/main/utils/common.py", line 1154, in create_partition
    cursor.execute(
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/backends/utils.py", line 102, in execute
    return super().execute(sql, params)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/backends/utils.py", line 67, in execute
    return self._execute_with_wrappers(
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/backends/utils.py", line 80, in _execute_with_wrappers
    return executor(sql, params, many, context)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/backends/utils.py", line 89, in _execute
    return self.cursor.execute(sql, params)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/utils.py", line 91, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/backends/utils.py", line 87, in _execute
    return self.cursor.execute(sql)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/psycopg/cursor.py", line 723, in execute
    raise ex.with_traceback(None)
django.db.utils.IntegrityError: duplicate key value violates unique constraint "pg_type_typname_nsp_index"
DETAIL:  Key (typname, typnamespace)=(main_jobevent_20240313_12, 2200) already exists.

Do you think the workaround is still relevant with the full fix?


PS: I'm still on PG12. I'll try to upgrade to PG15 to see


Update

Even with PostgreSQL 15 and the environment variable LC_MESSAGES: en_US.UTF-8, I still get the problem.

It seems to affect only workflow templates, not normal job templates.

@AlanCoding
Member

We should use this as motivation to convert our tests to run with a postgres database. There are many things we could do to test that sort of error with a postgres test database.

@kakawait

kakawait commented Mar 14, 2024

I don't know whether this helps, but here is my latest update:

  1. 24.0.0 + postgresql 12
  2. 24.0.0 + postgresql 15
  3. 24.0.0 + postgresql 15 + env variable inside postgresql container LC_MESSAGES=en_US.UTF-8
  4. (rollback after 24.0.0 without restoring backup) 23.8.1* + postgresql 12 + env variable inside postgresql container LC_MESSAGES=en_US.UTF-8

However, on 4. I saw the same error in the log:

postgres  | 2024-03-14 06:27:14.057 UTC [114] LOG:  duration: 1920.585 ms  statement: CREATE TABLE main_jobevent_20240314_06 (LIKE main_jobevent INCLUDING DEFAULTS INCLUDING CONSTRAINTS); ALTER TABLE main_jobevent ATTACH PARTITION main_jobevent_20240314_06 FOR VALUES FROM ('2024-03-14 06:00:00+00:00') TO ('2024-03-14 07:00:00+00:00');
postgres  | 2024-03-14 06:27:14.090 UTC [113] ERROR:  duplicate key value violates unique constraint "pg_type_typname_nsp_index"
postgres  | 2024-03-14 06:27:14.090 UTC [113] DETAIL:  Key (typname, typnamespace)=(main_jobevent_20240314_06, 2200) already exists.
postgres  | 2024-03-14 06:27:14.090 UTC [113] STATEMENT:  CREATE TABLE main_jobevent_20240314_06 (LIKE main_jobevent INCLUDING DEFAULTS INCLUDING CONSTRAINTS); ALTER TABLE main_jobevent ATTACH PARTITION main_jobevent_20240314_06 FOR VALUES FROM ('2024-03-14 06:00:00+00:00') TO ('2024-03-14 07:00:00+00:00');

but it does not affect runtime, and all my workflow jobs started correctly.

*: I was on 23.8.1 before the upgrade, so it was safer for me to roll back to a known working version. I didn't test 23.9.0.

@per-lind

per-lind commented Mar 19, 2024

Got a workflow with 2 parallel tasks, one started fine the other errored out with

Traceback (most recent call last):
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/backends/utils.py", line 87, in _execute
    return self.cursor.execute(sql)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/psycopg/cursor.py", line 723, in execute
    raise ex.with_traceback(None)
psycopg.errors.UniqueViolation: duplicate key value violates unique constraint "pg_type_typname_nsp_index"
DETAIL:  Key (typname, typnamespace)=(main_jobevent_20240319_01, 2200) already exists.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/awx/main/tasks/jobs.py", line 499, in run
    self.pre_run_hook(self.instance, private_data_dir)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/awx/main/tasks/jobs.py", line 1066, in pre_run_hook
    super(RunJob, self).pre_run_hook(job, private_data_dir)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/awx/main/tasks/jobs.py", line 427, in pre_run_hook
    create_partition(instance.event_class._meta.db_table, start=instance.created)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/awx/main/utils/common.py", line 1154, in create_partition
    cursor.execute(
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/backends/utils.py", line 67, in execute
    return self._execute_with_wrappers(
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/backends/utils.py", line 80, in _execute_with_wrappers
    return executor(sql, params, many, context)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/backends/utils.py", line 89, in _execute
    return self.cursor.execute(sql, params)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/utils.py", line 91, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/backends/utils.py", line 87, in _execute
    return self.cursor.execute(sql)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/psycopg/cursor.py", line 723, in execute
    raise ex.with_traceback(None)
django.db.utils.IntegrityError: duplicate key value violates unique constraint "pg_type_typname_nsp_index"
DETAIL:  Key (typname, typnamespace)=(main_jobevent_20240319_01, 2200) already exists.

Have not seen this before version 24.0.0

@Nenodema
Contributor

Got a workflow with 2 parallel tasks, one started fine the other errored out with


Have not seen this before version 24.0.0

Same here, exactly the same issue since the upgrade to 24.0.0. It occurs consistently with a workflow that has two parallel tasks.

@adpavlov

+1 after 24.0.0 upgrade, parallel inventory sync failing with duplicate key

@kakawait

Should I create a separate issue?

@Nenodema
Contributor

Should I create a separate issue?

I think changing the subject will be sufficient?

@AlekseiSaff

same issue after 24.0.0

@donnieelmore

Having this issue after upgrading to 24.0.0.

@Klaas-
Contributor

Klaas- commented Mar 27, 2024

Possible fix in 24.1.0 - #15000

@adpavlov

Seems to be fixed in 24.1.0

@dmzoneill
Member

Closing, please reopen if difficulty persists. Thanks all
