Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Failure while executing task: OperationalError: deadlock detected #5279

Closed
2 tasks done
Kingdutch opened this issue Jan 27, 2021 · 5 comments
Closed
2 tasks done

Failure while executing task: OperationalError: deadlock detected #5279

Kingdutch opened this issue Jan 27, 2021 · 5 comments
Assignees
Labels
bug Something is broken.
Milestone

Comments

@Kingdutch
Copy link
Contributor

Kingdutch commented Jan 27, 2021

Describe the issue

We had a 4.3.1 instance that occasionally crashed on PO import causing the redis queue to go down with it and be lost. This problem has resolved itself after upgrading to 4.4.2. To update all our components we had to re-import PO files for quite a lot of components using weblate loadpo --all --lang <code>. This produced about ~270 celery tasks in the queue. We kept an eye on the log to see whether our earlier problem was resolved. In the log we saw the following deadlock error multiple times.

The error does not actually cause any problems we can find within Weblate, but since it's what looks like a fatal error that was not yet reported on GitHub I wanted to report it for your information.

I already tried

Describe the steps you tried to solve the problem yourself.

  • I've read and searched the docs and did not find the answer there.
    If you didn’t try already, try to search there what you wrote above.
  • Looked in the GitHub issue queue for related open issues.

To Reproduce the issue

Steps to reproduce the behavior:

  1. Run weblate loadpo --all --lang <langcode> form multiple languages
  2. See the error in

Expected behavior

No errors in the logs

Exception traceback

celery-main stderr | [2021-01-27 10:56:50,426: ERROR/1848] Failure while executing task: OperationalError: deadlock detected
celery-main stderr | DETAIL:  Process 9532 waits for ShareLock on transaction 2185973; blocked by process 13265.
celery-main stderr | Process 13265 waits for ShareLock on transaction 2183699; blocked by process 9532.
celery-main stderr | HINT:  See server log for query details.
celery-main stderr | CONTEXT:  while locking tuple (7881,1) in relation "trans_unit"
celery-main stderr | SQL statement "SELECT 1 FROM ONLY "public"."trans_unit" x WHERE "id" OPERATOR(pg_catalog.=) $1 FOR KEY SHARE OF x"
celery-main stderr |
celery-main stderr | [2021-01-27 10:56:50,426: ERROR/ForkPoolWorker-5] Failure while executing task: OperationalError: deadlock detected
celery-main stderr | DETAIL:  Process 9532 waits for ShareLock on transaction 2185973; blocked by process 13265.
celery-main stderr | Process 13265 waits for ShareLock on transaction 2183699; blocked by process 9532.
celery-main stderr | HINT:  See server log for query details.
celery-main stderr | CONTEXT:  while locking tuple (7881,1) in relation "trans_unit"
celery-main stderr | SQL statement "SELECT 1 FROM ONLY "public"."trans_unit" x WHERE "id" OPERATOR(pg_catalog.=) $1 FOR KEY SHARE OF x"
celery-main stderr |
celery-main stderr | [2021-01-27 10:56:50,426: ERROR/1848] Failure while executing task: OperationalError: {'signal': <Signal: task_failure providing_args={'kwargs', 'traceback', 'exception', 'einfo', 'args', 'task_id'}>, 'sender': <@task: weblate.trans.tasks.perform_load of weblate at 0x7ff99ba17860>, 'task_id': '81b245f9-cb65-4c25-ae7d-b9206a07c307', 'args': [213], 'kwargs': {'force': False, 'langs': ['bg']}, 'traceback': <traceback object at 0x7ff98bd7ff08>, 'einfo': <ExceptionInfo: OperationalError('deadlock detected\nDETAIL:  Process 9532 waits for ShareLock on transaction 2185973; blocked by process 13265.\nProcess 13265 waits for ShareLock on transaction 2183699; blocked by process 9532.\nHINT:  See server log for query details.\nCONTEXT:  while locking tuple (7881,1) in relation "trans_unit"\nSQL statement "SELECT 1 FROM ONLY "public"."trans_unit" x WHERE "id" OPERATOR(pg_catalog.=) $1 FOR KEY SHARE OF x"\n')>}
celery-main stderr | [2021-01-27 10:56:50,426: ERROR/ForkPoolWorker-5] Failure while executing task: OperationalError: {'signal': <Signal: task_failure providing_args={'kwargs', 'traceback', 'exception', 'einfo', 'args', 'task_id'}>, 'sender': <@task: weblate.trans.tasks.perform_load of weblate at 0x7ff99ba17860>, 'task_id': '81b245f9-cb65-4c25-ae7d-b9206a07c307', 'args': [213], 'kwargs': {'force': False, 'langs': ['bg']}, 'traceback': <traceback object at 0x7ff98bd7ff08>, 'einfo': <ExceptionInfo: OperationalError('deadlock detected\nDETAIL:  Process 9532 waits for ShareLock on transaction 2185973; blocked by process 13265.\nProcess 13265 waits for ShareLock on transaction 2183699; blocked by process 9532.\nHINT:  See server log for query details.\nCONTEXT:  while locking tuple (7881,1) in relation "trans_unit"\nSQL statement "SELECT 1 FROM ONLY "public"."trans_unit" x WHERE "id" OPERATOR(pg_catalog.=) $1 FOR KEY SHARE OF x"\n')>}
celery-main stderr | [2021-01-27 10:56:50,426: ERROR/1848] Failure while executing task
celery-main stderr | Traceback (most recent call last):
celery-main stderr |   File "/usr/local/lib/python3.7/dist-packages/django/db/backends/base/base.py", line 242, in _commit
celery-main stderr |     return self.connection.commit()
celery-main stderr | psycopg2.errors.DeadlockDetected: deadlock detected
celery-main stderr | DETAIL:  Process 9532 waits for ShareLock on transaction 2185973; blocked by process 13265.
celery-main stderr | Process 13265 waits for ShareLock on transaction 2183699; blocked by process 9532.
celery-main stderr | HINT:  See server log for query details.
celery-main stderr | CONTEXT:  while locking tuple (7881,1) in relation "trans_unit"
celery-main stderr | SQL statement "SELECT 1 FROM ONLY "public"."trans_unit" x WHERE "id" OPERATOR(pg_catalog.=) $1 FOR KEY SHARE OF x"
celery-main stderr |
celery-main stderr |
celery-main stderr | The above exception was the direct cause of the following exception:
celery-main stderr |
celery-main stderr | Traceback (most recent call last):
celery-main stderr |   File "/usr/local/lib/python3.7/dist-packages/celery/app/trace.py", line 405, in trace_task
celery-main stderr |     R = retval = fun(*args, **kwargs)
celery-main stderr |   File "/usr/local/lib/python3.7/dist-packages/celery/app/trace.py", line 697, in __protected_call__
celery-main stderr |     return self.run(*args, **kwargs)
celery-main stderr |   File "/usr/local/lib/python3.7/dist-packages/celery/app/autoretry.py", line 34, in run
celery-main stderr |     return task._orig_run(*args, **kwargs)
celery-main stderr |   File "/usr/local/lib/python3.7/dist-packages/weblate/trans/tasks.py", line 86, in perform_load
celery-main stderr |     force=force, langs=langs, changed_template=changed_template, from_link=from_link
celery-main stderr |   File "/usr/local/lib/python3.7/dist-packages/weblate/trans/models/component.py", line 1751, in create_translations
celery-main stderr |     force, langs, request, changed_template, from_link
celery-main stderr |   File "/usr/local/lib/python3.7/dist-packages/weblate/trans/models/component.py", line 1875, in _create_translations
celery-main stderr |     self.progress_step()
celery-main stderr |   File "/usr/local/lib/python3.7/dist-packages/django/db/transaction.py", line 232, in __exit__
celery-main stderr |     connection.commit()
celery-main stderr |   File "/usr/local/lib/python3.7/dist-packages/django/utils/asyncio.py", line 26, in inner
celery-main stderr |     return func(*args, **kwargs)
celery-main stderr |   File "/usr/local/lib/python3.7/dist-packages/django/db/backends/base/base.py", line 266, in commit
celery-main stderr |     self._commit()
celery-main stderr |   File "/usr/local/lib/python3.7/dist-packages/django/db/backends/base/base.py", line 242, in _commit
celery-main stderr |     return self.connection.commit()
celery-main stderr |   File "/usr/local/lib/python3.7/dist-packages/django/db/utils.py", line 90, in __exit__
celery-main stderr |     raise dj_exc_value.with_traceback(traceback) from exc_value
celery-main stderr |   File "/usr/local/lib/python3.7/dist-packages/django/db/backends/base/base.py", line 242, in _commit
celery-main stderr |     return self.connection.commit()
celery-main stderr | django.db.utils.OperationalError: deadlock detected
celery-main stderr | DETAIL:  Process 9532 waits for ShareLock on transaction 2185973; blocked by process 13265.
celery-main stderr | Process 13265 waits for ShareLock on transaction 2183699; blocked by process 9532.
celery-main stderr | HINT:  See server log for query details.
celery-main stderr | CONTEXT:  while locking tuple (7881,1) in relation "trans_unit"
celery-main stderr | SQL statement "SELECT 1 FROM ONLY "public"."trans_unit" x WHERE "id" OPERATOR(pg_catalog.=) $1 FOR KEY SHARE OF x"
celery-main stderr |
celery-main stderr | [2021-01-27 10:56:50,426: ERROR/ForkPoolWorker-5] Failure while executing task
celery-main stderr | Traceback (most recent call last):
celery-main stderr |   File "/usr/local/lib/python3.7/dist-packages/django/db/backends/base/base.py", line 242, in _commit
celery-main stderr |     return self.connection.commit()
celery-main stderr | psycopg2.errors.DeadlockDetected: deadlock detected
celery-main stderr | DETAIL:  Process 9532 waits for ShareLock on transaction 2185973; blocked by process 13265.
celery-main stderr | Process 13265 waits for ShareLock on transaction 2183699; blocked by process 9532.
celery-main stderr | HINT:  See server log for query details.
celery-main stderr | CONTEXT:  while locking tuple (7881,1) in relation "trans_unit"
celery-main stderr | SQL statement "SELECT 1 FROM ONLY "public"."trans_unit" x WHERE "id" OPERATOR(pg_catalog.=) $1 FOR KEY SHARE OF x"
celery-main stderr |
celery-main stderr |
celery-main stderr | The above exception was the direct cause of the following exception:
celery-main stderr |
celery-main stderr | Traceback (most recent call last):
celery-main stderr |   File "/usr/local/lib/python3.7/dist-packages/celery/app/trace.py", line 405, in trace_task
celery-main stderr |     R = retval = fun(*args, **kwargs)
celery-main stderr |   File "/usr/local/lib/python3.7/dist-packages/celery/app/trace.py", line 697, in __protected_call__
celery-main stderr |     return self.run(*args, **kwargs)
celery-main stderr |   File "/usr/local/lib/python3.7/dist-packages/celery/app/autoretry.py", line 34, in run
celery-main stderr |     return task._orig_run(*args, **kwargs)
celery-main stderr |   File "/usr/local/lib/python3.7/dist-packages/weblate/trans/tasks.py", line 86, in perform_load
celery-main stderr |     force=force, langs=langs, changed_template=changed_template, from_link=from_link
celery-main stderr |   File "/usr/local/lib/python3.7/dist-packages/weblate/trans/models/component.py", line 1751, in create_translations
celery-main stderr |     force, langs, request, changed_template, from_link
celery-main stderr |   File "/usr/local/lib/python3.7/dist-packages/weblate/trans/models/component.py", line 1875, in _create_translations
celery-main stderr |     self.progress_step()
celery-main stderr |   File "/usr/local/lib/python3.7/dist-packages/django/db/transaction.py", line 232, in __exit__
celery-main stderr |     connection.commit()
celery-main stderr |   File "/usr/local/lib/python3.7/dist-packages/django/utils/asyncio.py", line 26, in inner
celery-main stderr |     return func(*args, **kwargs)
celery-main stderr |   File "/usr/local/lib/python3.7/dist-packages/django/db/backends/base/base.py", line 266, in commit
celery-main stderr |     self._commit()
celery-main stderr |   File "/usr/local/lib/python3.7/dist-packages/django/db/backends/base/base.py", line 242, in _commit
celery-main stderr |     return self.connection.commit()
celery-main stderr |   File "/usr/local/lib/python3.7/dist-packages/django/db/utils.py", line 90, in __exit__
celery-main stderr |     raise dj_exc_value.with_traceback(traceback) from exc_value
celery-main stderr |   File "/usr/local/lib/python3.7/dist-packages/django/db/backends/base/base.py", line 242, in _commit
celery-main stderr |     return self.connection.commit()
celery-main stderr | django.db.utils.OperationalError: deadlock detected
celery-main stderr | DETAIL:  Process 9532 waits for ShareLock on transaction 2185973; blocked by process 13265.
celery-main stderr | Process 13265 waits for ShareLock on transaction 2183699; blocked by process 9532.
celery-main stderr | HINT:  See server log for query details.
celery-main stderr | CONTEXT:  while locking tuple (7881,1) in relation "trans_unit"
celery-main stderr | SQL statement "SELECT 1 FROM ONLY "public"."trans_unit" x WHERE "id" OPERATOR(pg_catalog.=) $1 FOR KEY SHARE OF x"
celery-main stderr |

Other instances of this error showed different process numbers, forkpoolworker number and tuple (x,y) values but were otherwise identical.

Server configuration and status

Weblate installation: Docker

 * Weblate: 4.4.2
 * Django: 3.1.5
 * siphashc: 2.1
 * Whoosh: 2.7.4
 * translate-toolkit: 3.3.0
 * lxml: 4.6.2
 * Pillow: 8.1.0
 * bleach: 3.2.1
 * python-dateutil: 2.8.1
 * social-auth-core: 4.0.3
 * social-auth-app-django: 4.0.0
 * django-crispy-forms: 1.10.0
 * oauthlib: 3.1.0
 * django-compressor: 2.4
 * djangorestframework: 3.12.2
 * django-filter: 2.4.0
 * django-appconf: 1.0.4
 * user-agents: 2.2.0
 * filelock: 3.0.12
 * setuptools: 40.8.0
 * jellyfish: 0.8.2
 * openpyxl: 3.0.5
 * celery: 5.0.5
 * kombu: 5.0.2
 * translation-finder: 2.7
 * weblate-language-data: 2020.13
 * html2text: 2020.1.16
 * pycairo: 1.16.2
 * pygobject: 3.30.4
 * diff-match-patch: 20200713
 * requests: 2.25.1
 * django-redis: 4.12.1
 * hiredis: 1.1.0
 * sentry_sdk: 0.19.5
 * Cython: 0.29.21
 * misaka: 2.1.1
 * GitPython: 3.1.12
 * borgbackup: 1.1.15
 * pyparsing: 2.4.7
 * Python: 3.7.3
 * Git: 2.20.1
 * psycopg2: 2.8.6
 * psycopg2-binary: 2.8.6
 * phply: 1.2.5
 * chardet: 4.0.0
 * ruamel.yaml: 0.16.12
 * tesserocr: 2.5.1
 * akismet: 1.1
 * boto3: 1.16.54
 * zeep: 4.0.0
 * aeidon: 1.9
 * iniparse: 0.5
 * mysqlclient: 2.0.3
 * Mercurial: 5.6.1
 * git-svn: 2.20.1
 * git-review: 1.28.0
 * Redis server: 4.0.14
 * PostgreSQL server: 11.10
 * Database backends: django.db.backends.postgresql
 * Cache backends: default:RedisCache, avatar:FileBasedCache
 * Email setup: django.core.mail.backends.smtp.EmailBackend: smtp.sendgrid.net
 * OS encoding: filesystem=utf-8, default=utf-8
 * Celery: redis://cache:6379/1, redis://cache:6379/1, regular
 * Platform: Linux 4.14.158-129.185.amzn2.x86_64 (x86_64)

Weblate deploy checks

System check identified some issues:

INFOS:
?: (weblate.I021) Error collection is not set up, it is highly recommended for production use
	HINT: https://docs.weblate.org/en/weblate-4.4.2/admin/install.html#collecting-errors
?: (weblate.I028) Backups are not configured, it is highly recommended for production use
	HINT: https://docs.weblate.org/en/weblate-4.4.2/admin/backup.html

System check identified 2 issues (1 silenced).
@nijel
Copy link
Member

nijel commented Jan 27, 2021

I've already seen this with one customer, but unfortunately, we didn't find the cause there yet...

This can probably sometimes happen when multiple updates are running in parallel, and your description seems to match that. I still wonder why we've never seen this on our production servers...

The proper long-term solution for this is to avoid holding key lock for updates, but that will be possible after Django 3.2 is released, see #5227

@nijel nijel added the bug Something is broken. label Jan 27, 2021
@Kingdutch
Copy link
Contributor Author

Glad to hear it's a known issue :) I'm not entirely sure what it's locking on. Could it have something to do with components that share a repository (maybe they also share other things)? We have quite a few of those.

@nijel
Copy link
Member

nijel commented Jan 27, 2021

No, it's lock on the units table. Weblate holds it to avoid concurrent updates of the units, but in current Django implementation it doesn't hold only row level lock, but also primary key lock. And that blocks insertion into that table for that time.

@nijel nijel added this to the 4.6 milestone Apr 13, 2021
@nijel
Copy link
Member

nijel commented Apr 13, 2021

Should be addressed in Weblate 4.6 with Django 3.2.

@nijel nijel closed this as completed Apr 13, 2021
@nijel nijel self-assigned this Apr 13, 2021
@github-actions
Copy link

Thank you for your report; the issue you have reported has just been fixed.

  • In case you see a problem with the fix, please comment on this issue.
  • In case you see a similar problem, please open a separate issue.
  • If you are happy with the outcome, don’t hesitate to support Weblate by making a donation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something is broken.
Projects
None yet
Development

No branches or pull requests

2 participants