Prevent deadlock when importing database #8648

Merged
merged 2 commits into from Dec 8, 2021


@lmossman lmossman commented Dec 8, 2021

What

A deadlock was intermittently happening when importing an archive into a database. This manifested as an error like:

airbyte-db          | 2021-12-08 21:19:58.229 UTC [72] DETAIL:  Process 72 waits for AccessExclusiveLock on relation 16705 (jobs) of database 16385; blocked by process 119.
airbyte-db          | 	Process 119 waits for AccessShareLock on relation 16715 (attempts) of database 16385; blocked by process 72.
airbyte-db          | 	Process 72: truncate table public.JOBS restart identity
airbyte-db          | 	Process 119: SELECT
airbyte-db          | 	jobs.id AS job_id,
airbyte-db          | 	jobs.config_type AS config_type,
airbyte-db          | 	jobs.scope AS scope,
airbyte-db          | 	jobs.config AS config,
airbyte-db          | 	jobs.status AS job_status,
airbyte-db          | 	jobs.started_at AS job_started_at,
airbyte-db          | 	jobs.created_at AS job_created_at,
airbyte-db          | 	jobs.updated_at AS job_updated_at,
airbyte-db          | 	attempts.attempt_number AS attempt_number,
airbyte-db          | 	attempts.log_path AS log_path,
airbyte-db          | 	attempts.output AS attempt_output,
airbyte-db          | 	attempts.status AS attempt_status,
airbyte-db          | 	attempts.created_at AS attempt_created_at,
airbyte-db          | 	attempts.updated_at AS attempt_updated_at,
airbyte-db          | 	attempts.ended_at AS attempt_ended_at
airbyte-db          | 	FROM jobs LEFT OUTER JOIN attempts ON jobs.id = attempts.job_id WHERE CAST(jobs.status AS VARCHAR) = 'pending' AND jobs.scope NOT IN ( SELECT scope FROM jobs WHERE status = 'running' OR status = 'incomplete' ) ORDER BY jobs.created_at ASC LIMIT 1

This happened for the following reasons:

  • The JobSubmitter runs in a separate process from the archive import logic. The JobSubmitter occasionally queries the jobs database for the next job to process, which produces the SELECT query that appears in the error above.
  • If this JobSubmitter query runs after the importDatabase transaction has begun processing the attempts table (and has therefore obtained a lock on attempts) but before it has begun processing the jobs table (and has therefore not yet obtained a lock on jobs), the JobSubmitter obtains a lock on the jobs table and then tries to obtain a lock on the attempts table in order to join it with the jobs table. This results in a deadlock once the importDatabase logic moves on to the jobs table.
  • While this has been an intermittent issue for a while (see this relevant issue), the recent change to batch up the import database insertion query made this scenario more likely: batching slows down the import process and thus widens the window of time during which the above situation can occur.
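
To make the circular wait concrete, here is a minimal Python sketch that models the two processes from the log as a wait-for graph and looks for a cycle. It is only an illustration: the `find_cycle` helper and the one-blocker-per-process simplification are assumptions made for this example, not Airbyte or Postgres code.

```python
def find_cycle(waits_for):
    """Return the processes forming a circular wait, or None.

    `waits_for` maps a process id to the single process id that blocks it.
    (Simplifying assumption: each process waits on at most one other.)
    """
    for start in waits_for:
        path = [start]
        seen = {start}
        current = waits_for.get(start)
        while current is not None:
            if current == start:
                return path  # walked back to where we started: deadlock
            if current in seen:
                break  # a cycle exists, but it does not include `start`
            path.append(current)
            seen.add(current)
            current = waits_for.get(current)
    return None


# Taken from the log above:
#   process 72 (truncate jobs) is blocked by process 119
#   process 119 (SELECT joining attempts) is blocked by process 72
waits_for = {72: 119, 119: 72}
print(find_cycle(waits_for))  # [72, 119]
```

Neither process can proceed because each holds the lock the other needs, which is the situation the database reports as a deadlock.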

How

This PR prevents the deadlock by making the importDatabase logic explicitly obtain a lock on all tables that it is going to import into at the beginning of the transaction, thus making the window between obtaining locks on the different tables minuscule.
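
As a rough illustration of the idea (not the code merged in this PR), the sketch below builds a single Postgres `LOCK TABLE` statement covering every table to be imported, so that it can be issued as the first statement of the import transaction. The `build_lock_statement` helper, the table list, and the choice of `ACCESS EXCLUSIVE` mode are assumptions for the example; the lock mode is taken from the `AccessExclusiveLock` that the truncate in the log requests.

```python
def build_lock_statement(tables, mode="ACCESS EXCLUSIVE"):
    """Build one LOCK TABLE statement for all tables in the import.

    Hypothetical helper for illustration only. Table names are quoted as
    identifiers, with embedded double quotes doubled.
    """
    if not tables:
        raise ValueError("at least one table is required")
    quoted = ", ".join('"' + name.replace('"', '""') + '"' for name in tables)
    return f"LOCK TABLE {quoted} IN {mode} MODE"


# Table names taken from the log above; a real import would list every
# table in the archive.
statement = build_lock_statement(["attempts", "jobs"])
print(statement)  # LOCK TABLE "attempts", "jobs" IN ACCESS EXCLUSIVE MODE
```

Postgres only allows `LOCK TABLE` inside a transaction block, and the locks are held until that transaction ends, so the statement would need to run in the same transaction as the truncate and insert steps.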

@github-actions github-actions bot added area/platform issues related to the platform area/scheduler labels Dec 8, 2021

@cgardens cgardens left a comment

Thank you for the very clear summary of the problem and why it was more prevalent now.

@lmossman lmossman merged commit 973c043 into master Dec 8, 2021
@lmossman lmossman deleted the lmossman/fix-deadlock branch December 8, 2021 23:26
schlattk pushed a commit to schlattk/airbyte that referenced this pull request Jan 4, 2022
* lock all tables at beginning of transaction to avoid deadlocks

* fix lock statement format