Error on create_partition #14563

remipcomaite · 2023-10-11T14:38:06Z

Please confirm the following

I agree to follow this project's code of conduct.
I have checked the current issues for duplicates.
I understand that AWX is open source software provided for free and that I might not receive a timely response.
I am NOT reporting a (potential) security vulnerability. (These should be emailed to security@ansible.com instead.)

Bug Summary

Hello everyone,

Since updating AWX to 23.3.0, I get this error every time a partition is created on the main_jobevent table:

Traceback (most recent call last):
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/backends/utils.py", line 87, in _execute
    return self.cursor.execute(sql)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/psycopg/cursor.py", line 723, in execute
    raise ex.with_traceback(None)
psycopg.errors.UniqueViolation: Duplicate key value violates unique constraint "pg_type_typname_nsp_index"
DETAIL: Key "(typname, typnamespace)=(main_jobevent_20231011_14, 2200)" already exists.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/awx/main/tasks/jobs.py", line 491, in run
    self.pre_run_hook(self.instance, private_data_dir)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/awx/main/tasks/jobs.py", line 1058, in pre_run_hook
    super(RunJob, self).pre_run_hook(job, private_data_dir)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/awx/main/tasks/jobs.py", line 425, in pre_run_hook
    create_partition(instance.event_class._meta.db_table, start=instance.created)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/awx/main/utils/common.py", line 1175, in create_partition
    cursor.execute(
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/backends/utils.py", line 67, in execute
    return self._execute_with_wrappers(
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/backends/utils.py", line 80, in _execute_with_wrappers
    return executor(sql, params, many, context)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/backends/utils.py", line 89, in _execute
    return self.cursor.execute(sql, params)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/utils.py", line 91, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/backends/utils.py", line 87, in _execute
    return self.cursor.execute(sql)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/psycopg/cursor.py", line 723, in execute
    raise ex.with_traceback(None)
django.db.utils.IntegrityError: Duplicate key value violates unique constraint "pg_type_typname_nsp_index"
DETAIL: Key "(typname, typnamespace)=(main_jobevent_20231011_14, 2200)" already exists.

AWX version

23.3.0

Select the relevant components

Installation method

kubernetes

Modifications

no

Ansible version

No response

Operating system

No response

Web browser

No response

Steps to reproduce

Every hour, an automated task fails.

Expected results

This error should no longer occur when a partition is created on the main_jobevent table.

Actual results

The task failed with this error.

Additional information

No response


fosterseth · 2023-10-11T15:19:28Z

Are the jobs still running to completion, or are they erroring out?

Can you kubectl exec into the task pod (task container) and run
awx-manage dbshell

Then run \d main_jobevent_20231011_14;

just to verify that this table exists in the database.

A new partition is created for each hour of the day (in your example, the 14th hour).

If you wait an hour or two, do you get the same error, but with a different table (e.g. main_jobevent_20231011_15)?
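The hourly naming scheme described above can be sketched as follows. `partition_for` is a hypothetical helper, not AWX's `create_partition`; it assumes partitions are keyed on the UTC hour (consistent with the `+00:00` bounds in the PostgreSQL log quoted later in this thread) and reproduces the DDL shape shown in that log.

```python
from datetime import datetime, timedelta, timezone


def partition_for(parent: str, ts: datetime) -> tuple[str, str]:
    """Return (partition_name, ddl) for the hourly partition containing ts.

    Hypothetical helper; ts must be timezone-aware (astimezone() would
    interpret a naive datetime as local time).
    """
    # Truncate to the top of the UTC hour, e.g. 14:38 UTC -> 14:00 UTC.
    start = ts.astimezone(timezone.utc).replace(minute=0, second=0, microsecond=0)
    end = start + timedelta(hours=1)
    name = f"{parent}_{start:%Y%m%d_%H}"
    ddl = (
        f"CREATE TABLE {name} (LIKE {parent} INCLUDING DEFAULTS INCLUDING CONSTRAINTS); "
        f"ALTER TABLE {parent} ATTACH PARTITION {name} "
        f"FOR VALUES FROM ('{start.isoformat(sep=' ')}') TO ('{end.isoformat(sep=' ')}');"
    )
    return name, ddl
```

For example, a job created at 2023-10-11 14:38 UTC maps to `main_jobevent_20231011_14`, the partition named in the original report. Every job created within the same hour computes the same name, which is why jobs launched at the same moment can both try to create it.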

Overall, does this error come up every time you run a job, or just periodically?

You can also query pg_stat_activity to see whether any long-running queries are blocking an important database operation.
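The pg_stat_activity check can also be scripted. The helper below, `long_running_queries`, is a hypothetical sketch for any DB-API cursor that uses `%s` placeholders (for example a psycopg 3 cursor or a Django connection cursor); the 60-second default threshold is an arbitrary assumption.

```python
def long_running_queries(cursor, min_seconds: float = 60.0):
    """Return (pid, state, duration, query) rows for other non-idle sessions
    whose current query has been running longer than min_seconds.

    Hypothetical helper; `cursor` is any DB-API cursor with %s placeholders.
    """
    sql = (
        "SELECT pid, state, now() - query_start AS duration, query "
        "FROM pg_stat_activity "
        "WHERE state <> 'idle' "
        "AND pid <> pg_backend_pid() "  # skip this monitoring session itself
        "AND now() - query_start > make_interval(secs => %s) "
        "ORDER BY duration DESC"
    )
    # Pass a float so the parameter matches make_interval's double precision secs.
    cursor.execute(sql, [float(min_seconds)])
    return cursor.fetchall()
```

Running the same SELECT directly in `awx-manage dbshell` gives the same information without any Python.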

remipcomaite · 2023-10-11T15:36:10Z

I have a scheduled task that runs every 2 minutes.
As for the database, I use an external PostgreSQL cluster.
I can confirm that the partition does exist.
My task failed at 2:00 p.m., just as it failed at 1:00 p.m.

AlanCoding · 2023-10-19T17:35:00Z

So maybe this criterion is simply not matching:

awx/awx/main/utils/common.py, lines 1181 to 1182 in 447ac77:

    if 'already exists' in str(e):
        logger.info(f'Caught known error due to partition creation race: {e}')

My questions are: (1) does this really happen every hour when a new partition is created, and (2) can we get more context around these logs? We often use logger.exception, which emits a traceback together with an associated message; that message tells us which bit of code logged the traceback, not just which bit of code raised the error.

Alternatively, str(e) may not serialize the content we expect, and we may need to use repr() or something else.
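A locale-independent alternative to matching `'already exists'` in the message text is to inspect the SQLSTATE code, which does not change with the server's language. psycopg 3 exposes it as the `sqlstate` attribute of its exceptions, and the tracebacks above show Django re-raising `psycopg.errors.UniqueViolation` as `django.db.utils.IntegrityError` with the original chained as its cause. The sketch below is an assumption about one possible approach, not AWX's actual fix; `sqlstate_of` and `is_partition_race` are hypothetical names.

```python
# SQLSTATE codes from PostgreSQL's error-code appendix.
UNIQUE_VIOLATION = "23505"  # raised for the pg_type_typname_nsp_index collision
DUPLICATE_TABLE = "42P07"   # "relation ... already exists"

PARTITION_RACE_CODES = {UNIQUE_VIOLATION, DUPLICATE_TABLE}


def sqlstate_of(exc):
    """Walk the exception chain and return the first SQLSTATE found, or None."""
    seen = set()
    while exc is not None and id(exc) not in seen:
        seen.add(id(exc))
        code = getattr(exc, "sqlstate", None)
        if code:
            return code
        # Django's wrapper sets __cause__ to the original psycopg error.
        exc = exc.__cause__ or exc.__context__
    return None


def is_partition_race(exc) -> bool:
    """True if exc looks like a concurrent CREATE TABLE for the same partition.

    Hypothetical helper: it relies only on SQLSTATE codes, so it behaves the
    same whatever language lc_messages produces. Note that 23505 is generic;
    it is only a safe signal when the failing statement is the partition DDL.
    """
    return sqlstate_of(exc) in PARTITION_RACE_CODES
```

In a handler this would replace the string test: catch `IntegrityError` and re-raise unless `is_partition_race(e)` is true. After any error PostgreSQL aborts the current transaction, so the DDL should run in its own transaction or savepoint. A stricter variant could also compare the psycopg error's `diag.constraint_name` with `pg_type_typname_nsp_index`.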

gundalow · 2024-02-27T09:35:32Z

Are you running on a non-English system?

Workaround
This should be fixed in the AWX code. For now, the workaround is to set the lc_messages property in /var/lib/psql/data/postgresql.conf to en_US.UTF-8.
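Since the workaround only helps if the setting actually takes effect, it may be worth confirming what the server reports. `SHOW lc_messages` returns the effective value for the session, whether it came from postgresql.conf or the server's environment. The helpers below are a hypothetical sketch for a DB-API cursor, and the English-detection rule is a heuristic assumption.

```python
def server_messages_locale(cursor) -> str:
    """Return the effective lc_messages setting for this session."""
    cursor.execute("SHOW lc_messages")
    (value,) = cursor.fetchone()
    return value


def messages_are_english(cursor) -> bool:
    """Heuristic: C/POSIX and en_* locales yield untranslated (English) messages.

    An empty value means the server inherited the locale from its
    environment; that cannot be classified here, so it returns False.
    """
    value = server_messages_locale(cursor)
    base = value.split(".")[0]  # drop an encoding suffix such as ".UTF-8"
    return base in ("C", "POSIX") or base.lower().startswith("en")
```

Interactively, running `SHOW lc_messages;` in `awx-manage dbshell` gives the same answer.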

Full Fix
A fuller fix has been put together in awx#14910 (English string comparisons).

kakawait · 2024-03-13T13:50:36Z


@gundalow I have the exact opposite experience: everything was working fine, but since the upgrade to 24.0.0, which includes #14910, I get the same error in the job output, and it prevents the job from launching (see #14563 (comment); apparently the error also existed in 23.8.1, but it was silent and did not break workflows or jobs).

Traceback (most recent call last):
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/backends/utils.py", line 87, in _execute
    return self.cursor.execute(sql)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/psycopg/cursor.py", line 723, in execute
    raise ex.with_traceback(None)
psycopg.errors.UniqueViolation: duplicate key value violates unique constraint "pg_type_typname_nsp_index"
DETAIL:  Key (typname, typnamespace)=(main_jobevent_20240313_12, 2200) already exists.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/awx_devel/awx/main/tasks/jobs.py", line 499, in run
    self.pre_run_hook(self.instance, private_data_dir)
  File "/awx_devel/awx/main/tasks/jobs.py", line 1066, in pre_run_hook
    super(RunJob, self).pre_run_hook(job, private_data_dir)
  File "/awx_devel/awx/main/tasks/jobs.py", line 427, in pre_run_hook
    create_partition(instance.event_class._meta.db_table, start=instance.created)
  File "/awx_devel/awx/main/utils/common.py", line 1154, in create_partition
    cursor.execute(
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/backends/utils.py", line 102, in execute
    return super().execute(sql, params)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/backends/utils.py", line 67, in execute
    return self._execute_with_wrappers(
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/backends/utils.py", line 80, in _execute_with_wrappers
    return executor(sql, params, many, context)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/backends/utils.py", line 89, in _execute
    return self.cursor.execute(sql, params)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/utils.py", line 91, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/backends/utils.py", line 87, in _execute
    return self.cursor.execute(sql)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/psycopg/cursor.py", line 723, in execute
    raise ex.with_traceback(None)
django.db.utils.IntegrityError: duplicate key value violates unique constraint "pg_type_typname_nsp_index"
DETAIL:  Key (typname, typnamespace)=(main_jobevent_20240313_12, 2200) already exists.

Do you think the workaround is still relevant with the full fix?

PS: I'm still on PG12. I'll try upgrading to PG15 to see.

Update

Even with PostgreSQL 15 and the environment variable LC_MESSAGES=en_US.UTF-8, I still get the problem.

It seems to affect only workflow templates, not normal job templates.

AlanCoding · 2024-03-13T14:05:52Z

We should use this as motivation to convert our tests to run with a postgres database. There are many things we could do to test that sort of error with a postgres test database.

kakawait · 2024-03-14T06:34:28Z

I don't know whether it helps, but here is my latest update:

1. 24.0.0 + PostgreSQL 12 ❌
2. 24.0.0 + PostgreSQL 15 ❌
3. 24.0.0 + PostgreSQL 15 + environment variable LC_MESSAGES=en_US.UTF-8 inside the PostgreSQL container ❌
4. 23.8.1* (rolled back from 24.0.0 without restoring a backup) + PostgreSQL 12 + environment variable LC_MESSAGES=en_US.UTF-8 inside the PostgreSQL container ✅

However, with setup 4 I saw the same error in the log:

postgres  | 2024-03-14 06:27:14.057 UTC [114] LOG:  duration: 1920.585 ms  statement: CREATE TABLE main_jobevent_20240314_06 (LIKE main_jobevent INCLUDING DEFAULTS INCLUDING CONSTRAINTS); ALTER TABLE main_jobevent ATTACH PARTITION main_jobevent_20240314_06 FOR VALUES FROM ('2024-03-14 06:00:00+00:00') TO ('2024-03-14 07:00:00+00:00');
postgres  | 2024-03-14 06:27:14.090 UTC [113] ERROR:  duplicate key value violates unique constraint "pg_type_typname_nsp_index"
postgres  | 2024-03-14 06:27:14.090 UTC [113] DETAIL:  Key (typname, typnamespace)=(main_jobevent_20240314_06, 2200) already exists.
postgres  | 2024-03-14 06:27:14.090 UTC [113] STATEMENT:  CREATE TABLE main_jobevent_20240314_06 (LIKE main_jobevent INCLUDING DEFAULTS INCLUDING CONSTRAINTS); ALTER TABLE main_jobevent ATTACH PARTITION main_jobevent_20240314_06 FOR VALUES FROM ('2024-03-14 06:00:00+00:00') TO ('2024-03-14 07:00:00+00:00');

but it does not affect runtime, and all my workflow jobs started correctly.

*: I was on 23.8.1 before the upgrade, so it was safer for me to roll back to a working version. I didn't test 23.9.0.
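The log above shows two backends (PIDs 113 and 114) running the identical CREATE TABLE for main_jobevent_20240314_06 concurrently; only one can succeed, which is the "create partition race" later named in the title of #15000. One common way to serialize such DDL across sessions is a transaction-scoped advisory lock keyed on the partition name, followed by an existence check under the lock. The sketch below is an assumption for illustration, not the change merged in #15000; `advisory_key` and `create_partition_locked` are hypothetical names, and it expects a DB-API cursor inside an open transaction.

```python
import zlib


def advisory_key(name: str) -> int:
    """Deterministic lock key for a partition name.

    zlib.crc32 gives the same value in every process and host, unlike
    Python's built-in hash(), which is randomized per interpreter.
    """
    return zlib.crc32(name.encode("utf-8"))


def create_partition_locked(cursor, name: str, ddl: str) -> bool:
    """Run `ddl` to create partition `name` unless it already exists.

    Hypothetical helper. Must run inside an open transaction, because
    pg_advisory_xact_lock is released automatically at COMMIT/ROLLBACK.
    Returns True if this call ran the DDL, False if another session had
    already created the partition.
    """
    # Wait until no other session holds the lock for this name.
    # (Two names hashing to the same key only cause extra waiting.)
    cursor.execute("SELECT pg_advisory_xact_lock(%s::bigint)", [advisory_key(name)])
    # Re-check under the lock; to_regclass() returns NULL for a missing table.
    cursor.execute("SELECT to_regclass(%s) IS NOT NULL", [name])
    (exists,) = cursor.fetchone()
    if exists:
        return False
    cursor.execute(ddl)
    return True
```

In Django this would typically be called inside `transaction.atomic()` with a cursor from `connection.cursor()`; in autocommit mode the lock would be released as soon as the SELECT finished and would protect nothing.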

per-lind · 2024-03-19T07:02:08Z

I have a workflow with 2 parallel tasks; one started fine, and the other errored out with:

Traceback (most recent call last):
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/backends/utils.py", line 87, in _execute
    return self.cursor.execute(sql)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/psycopg/cursor.py", line 723, in execute
    raise ex.with_traceback(None)
psycopg.errors.UniqueViolation: duplicate key value violates unique constraint "pg_type_typname_nsp_index"
DETAIL:  Key (typname, typnamespace)=(main_jobevent_20240319_01, 2200) already exists.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/awx/main/tasks/jobs.py", line 499, in run
    self.pre_run_hook(self.instance, private_data_dir)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/awx/main/tasks/jobs.py", line 1066, in pre_run_hook
    super(RunJob, self).pre_run_hook(job, private_data_dir)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/awx/main/tasks/jobs.py", line 427, in pre_run_hook
    create_partition(instance.event_class._meta.db_table, start=instance.created)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/awx/main/utils/common.py", line 1154, in create_partition
    cursor.execute(
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/backends/utils.py", line 67, in execute
    return self._execute_with_wrappers(
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/backends/utils.py", line 80, in _execute_with_wrappers
    return executor(sql, params, many, context)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/backends/utils.py", line 89, in _execute
    return self.cursor.execute(sql, params)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/utils.py", line 91, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/backends/utils.py", line 87, in _execute
    return self.cursor.execute(sql)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/psycopg/cursor.py", line 723, in execute
    raise ex.with_traceback(None)
django.db.utils.IntegrityError: duplicate key value violates unique constraint "pg_type_typname_nsp_index"
DETAIL:  Key (typname, typnamespace)=(main_jobevent_20240319_01, 2200) already exists.

I had not seen this before version 24.0.0.

Nenodema · 2024-03-20T07:44:06Z


Same here: exactly the same issue since the upgrade to 24.0.0. The issue occurs consistently with a workflow that has two parallel tasks.

adpavlov · 2024-03-20T08:05:14Z

+1 after the 24.0.0 upgrade: parallel inventory syncs are failing with the duplicate key error.

kakawait · 2024-03-20T08:07:13Z

Should we create a separate issue?

Nenodema · 2024-03-20T08:08:47Z


I think changing the subject should be sufficient.

AlekseiSaff · 2024-03-24T07:55:04Z

Same issue after 24.0.0.

donnieelmore · 2024-03-25T18:07:01Z

Having this issue after upgrading to 24.0.0.

Klaas- · 2024-03-27T10:06:57Z

Possible fix in 24.1.0 - #15000

adpavlov · 2024-03-27T11:36:14Z

It seems to be fixed in 24.1.0.

dmzoneill · 2024-03-27T12:14:18Z

Closing, please reopen if difficulty persists. Thanks all

github-actions bot added needs_triage type:bug community labels Oct 11, 2023

Nenodema mentioned this issue Mar 23, 2024: duplicate key value violates unique constraint "pg_type_typname_nsp_index" #15016 (Closed)

Klaas- mentioned this issue Mar 27, 2024: Fix failing bulk launch job due to create partition race #15000 (Merged)

dmzoneill closed this as completed Mar 27, 2024
