Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

psycopg2.errors.UniqueViolation: duplicate key value error #297

Closed
banesullivan opened this issue Mar 9, 2021 · 9 comments · Fixed by #311
Closed

psycopg2.errors.UniqueViolation: duplicate key value error #297

banesullivan opened this issue Mar 9, 2021 · 9 comments · Fixed by #311
Assignees
Labels
bug Something isn't working
Milestone

Comments

@banesullivan
Copy link
Contributor

banesullivan commented Mar 9, 2021

When running the demo data commands that I have, if the celery worker is set up to run in the background, an integretiy error for duplicate keys happens on the image_entry.save() call here:

The error:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/django/db/backends/utils.py", line 84, in _execute
    return self.cursor.execute(sql, params)
psycopg2.errors.UniqueViolation: duplicate key value violates unique constraint "geodata_imageentry_image_file_id_key"
DETAIL:  Key (image_file_id)=(14) already exists.

This is making me think that when we create a new ImageEntry in the tasks, there is some sort of race condition between jobs for the same ImageFile... which shouldn't happen? I'm not really sure what is going on here.

Steps to reproduce

  1. Clear the database volume
  2. Apply migrations: docker-compose run --rm django ./manage.py migrate
  3. In one session, launch the celery worker: docker-compose up celery and wait until ready
  4. In another session, run the Landsat demo data command: docker-compose run --rm django ./manage.py landsat_data -c 3
  5. Observe the error

Error Message

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/django/db/backends/utils.py", line 84, in _execute
    return self.cursor.execute(sql, params)
psycopg2.errors.UniqueViolation: duplicate key value violates unique constraint "geodata_imageentry_image_file_id_key"
DETAIL:  Key (image_file_id)=(14) already exists.


The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "./manage.py", line 28, in <module>
    main()
  File "./manage.py", line 24, in main
    execute_from_command_line(sys.argv)
  File "/usr/local/lib/python3.8/site-packages/django/core/management/__init__.py", line 419, in execute_from_command_line
    utility.execute()
  File "/usr/local/lib/python3.8/site-packages/django/core/management/__init__.py", line 413, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/usr/local/lib/python3.8/site-packages/django/core/management/base.py", line 354, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/usr/local/lib/python3.8/site-packages/django/core/management/base.py", line 398, in execute
    output = self.handle(*args, **options)
  File "/opt/django-project/rgd/geodata/management/commands/landsat_data.py", line 49, in handle
    helper.load_raster_files(_get_landsat_urls(count))
  File "/opt/django-project/rgd/geodata/management/commands/_data_helper.py", line 80, in load_raster_files
    imentries = load_image_files(
  File "/opt/django-project/rgd/geodata/management/commands/_data_helper.py", line 56, in load_image_files
    result = load_image_files(imfile)
  File "/opt/django-project/rgd/geodata/management/commands/_data_helper.py", line 60, in load_image_files
    read_image_file(entry)
  File "/opt/django-project/rgd/geodata/models/imagery/etl.py", line 129, in read_image_file
    _read_image_to_entry(image_entry, file_path)
  File "/opt/django-project/rgd/geodata/models/imagery/etl.py", line 69, in _read_image_to_entry
    image_entry.save()
  File "/opt/django-project/rgd/geodata/models/common.py", line 51, in save
    super(ModifiableEntry, self).save(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/django/db/models/base.py", line 726, in save
    self.save_base(using=using, force_insert=force_insert,
  File "/usr/local/lib/python3.8/site-packages/django/db/models/base.py", line 763, in save_base
    updated = self._save_table(
  File "/usr/local/lib/python3.8/site-packages/django/db/models/base.py", line 868, in _save_table
    results = self._do_insert(cls._base_manager, using, fields, returning_fields, raw)
  File "/usr/local/lib/python3.8/site-packages/django/db/models/base.py", line 906, in _do_insert
    return manager._insert(
  File "/usr/local/lib/python3.8/site-packages/django/db/models/manager.py", line 85, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/django/db/models/query.py", line 1268, in _insert
    return query.get_compiler(using=using).execute_sql(returning_fields)
  File "/usr/local/lib/python3.8/site-packages/django/db/models/sql/compiler.py", line 1410, in execute_sql
    cursor.execute(sql, params)
  File "/usr/local/lib/python3.8/site-packages/django/db/backends/utils.py", line 98, in execute
    return super().execute(sql, params)
  File "/usr/local/lib/python3.8/site-packages/django/db/backends/utils.py", line 66, in execute
    return self._execute_with_wrappers(sql, params, many=False, executor=self._execute)
  File "/usr/local/lib/python3.8/site-packages/django/db/backends/utils.py", line 75, in _execute_with_wrappers
    return executor(sql, params, many, context)
  File "/usr/local/lib/python3.8/site-packages/django/db/backends/utils.py", line 84, in _execute
    return self.cursor.execute(sql, params)
  File "/usr/local/lib/python3.8/site-packages/django/db/utils.py", line 90, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "/usr/local/lib/python3.8/site-packages/django/db/backends/utils.py", line 84, in _execute
    return self.cursor.execute(sql, params)
django.db.utils.IntegrityError: duplicate key value violates unique constraint "geodata_imageentry_image_file_id_key"
DETAIL:  Key (image_file_id)=(14) already exists.
@banesullivan banesullivan added the bug Something isn't working label Mar 9, 2021
@banesullivan
Copy link
Contributor Author

If we do not run the celery worker in the background but wait until the demo data command is complete then this error does not occur. I can't put my finger out what exactly is going wrong here yet.

@zachmullen
Copy link
Contributor

I'm assigning myself on this one.

@zachmullen zachmullen self-assigned this Mar 10, 2021
@zachmullen
Copy link
Contributor

This is a result of attempting to create multiple ImageEntry with the same ImageFile foreign key, which is unique by virtue of being a OneToOneField.

I don't understand the data model enough to determine the appropriate fix. @banesullivan is this actually a one to one correspondence, or can there be multiple ImageEntry for the same ImageFile?

@banesullivan
Copy link
Contributor Author

banesullivan commented Mar 10, 2021

is this actually a one-to-one correspondence, or can there be multiple ImageEntry for the same ImageFile?

This is a on-to-one correspondence in practice. An ImageEntry is only ever created and accosiated with an ImageFile in read_image_file here:

image_entry, created = get_or_create_no_commit(
ImageEntry, defaults=dict(name=ife.file.name), image_file=ife
)

This should only ever be called from task_read_image_file which is the result of an ImageFile being saved and it kicking off that task.

This makes me think my helper, get_or_create_no_commit, could have an issue:

def get_or_create_no_commit(model, defaults=None, **kwargs):
try:
return model.objects.get(**kwargs), False
except model.DoesNotExist:
if not defaults:
defaults = {}
defaults.update(kwargs)
return model(**defaults), True

Or one ImageFile is launching multiple tasks both attempting to create a new ImageEntry at the same time.

@banesullivan
Copy link
Contributor Author

Perhaps it is time we do some sort of task auditing to make sure the ETL routines are all only launching one task per model

@zachmullen
Copy link
Contributor

Yes. I think we'll want to re-engineer a lot of this to make task launching more explicit.

@banesullivan
Copy link
Contributor Author

banesullivan commented Mar 10, 2021

Uncovered what's going wrong here. It is a trivial typo fix.

I added a signal wrapper skip_signal here

def skip_signal():
"""Skip the signal on an instance-basis."""
def _skip_signal(signal_func):
@wraps(signal_func)
def _decorator(sender, instance, **kwargs):
if hasattr(instance, 'skip_signal'):
return None
return signal_func(sender, instance, **kwargs)
return _decorator
return _skip_signal

that checks if a model instance has an skip_signal attribute and if so (regardless of value) it will skip sending the task signal. Someone... 🤦🏻‍♂️ ... used skip_task instead of skip_signal here:

if skip_task:
entry.skip_task = True

which means two tasks are being launched for the ImageFile model.

You can see the save() call here launching the task:

and calling the ETL routine explicitly here:

@banesullivan
Copy link
Contributor Author

We should come up with a safer way to use this skip_signal wrapper such that a typo like this doesn't happen again. Arbitrarily setting an attribute on an instance like this isn't a great approach

@banesullivan
Copy link
Contributor Author

Turns out #299 didn't entirely fix this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants