Advisory locks can deadlock and sleep forever #137

@jmanian

I experienced this working with Postgres. If you don't use timeout_seconds, it's possible for two processes to deadlock against each other and get stuck in a sleep-retry loop forever.

You can reproduce this by opening two Rails consoles and running these two transactions:

# In console 1
ApplicationRecord.transaction do
  ApplicationRecord.with_advisory_lock("lock_test_a", transaction: true)
  pp "sleeping"
  sleep(3)
  pp "waking up"
  ApplicationRecord.with_advisory_lock("lock_test_b", transaction: true)
  "success"
end

# In console 2
ApplicationRecord.transaction do
  ApplicationRecord.with_advisory_lock("lock_test_b", transaction: true)
  pp "sleeping"
  sleep(3)
  pp "waking up"
  ApplicationRecord.with_advisory_lock("lock_test_a", transaction: true)
  "success"
end

The only difference between them is that the lock strings are swapped, which is what creates the deadlock. The sleep(3) is there just so you have time to start the second command before the deadlock forms.


I have a few thoughts on ways to prevent this:

Add a global setting for timeout_seconds

Now that I know about this issue, I plan to use with_advisory_lock! with timeout_seconds everywhere. But it would be nice to be able to set timeout_seconds globally.
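For now my workaround is to pass timeout_seconds on every call, roughly like the sketch below. The timeout_seconds option and with_advisory_lock! (which raises if the lock can't be acquired in time) already exist; the global default at the end is hypothetical and is the kind of setting I'm asking for:

# Per-call workaround with the existing API: give up after 5 seconds instead of
# sleeping and retrying forever.
ApplicationRecord.transaction do
  ApplicationRecord.with_advisory_lock!("lock_test_a", timeout_seconds: 5, transaction: true)
  # ... work that needs the lock; with transaction: true it's released at commit/rollback ...
end

# Hypothetical global default (does not exist in the gem today), something like:
# WithAdvisoryLock.default_timeout_seconds = 5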

Add an option to not use try commands

Normally Postgres detects a deadlock and kills one of the queries (raising PG::TRDeadlockDetected), allowing the other to proceed. For example, it does this with row-level locks, which you can see by running these two transactions at the same time in two consoles:

# Console 1
User.transaction do
  User.lock.find(1)
  pp "sleeping"
  sleep(3)
  pp "waking up"
  User.lock.find(2)
  "success"
end

# Console 2
User.transaction do
  User.lock.find(2)
  pp "sleeping"
  sleep(3)
  pp "waking up"
  User.lock.find(1)
  "success"
end

This doesn't happen when you use with_advisory_lock, because the try commands return immediately when they can't acquire the lock, so no true deadlock ever exists inside Postgres. Instead, the deadlock is created by the retry-and-sleep logic within with_advisory_lock in conjunction with the locks.
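To illustrate, here's a rough sketch of that retry-and-sleep pattern (not the gem's actual code): each session only ever calls the non-blocking try function, so Postgres never sees a blocked waiter and its deadlock detector has nothing to detect.

# Rough sketch of the retry loop, not the gem's exact implementation.
# With the lock order swapped between two consoles, both sessions spin
# here forever, because neither ever blocks inside Postgres.
acquired = false
until acquired
  acquired = ApplicationRecord.connection.select_value("SELECT pg_try_advisory_xact_lock(101, 0)")
  sleep(0.01) unless acquired
end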

But if you instead use the non-try versions of the locking functions, which wait for the lock to become available rather than returning immediately, then Postgres will detect the deadlock and kill one of the queries, just as it does with the row-level locks above. You can test this with:

# Console 1
ApplicationRecord.transaction do
  ApplicationRecord.connection.execute("SELECT pg_advisory_xact_lock(101,0)")
  pp "sleeping"
  sleep(3)
  pp "waking up"
  ApplicationRecord.connection.execute("SELECT pg_advisory_xact_lock(102,0)")
  "success"
end

# Console 2
ApplicationRecord.transaction do
  ApplicationRecord.connection.execute("SELECT pg_advisory_xact_lock(102,0)")
  pp "sleeping"
  sleep(3)
  pp "waking up"
  ApplicationRecord.connection.execute("SELECT pg_advisory_xact_lock(101,0)")
  "success"
end

I assume the idea behind using the try functions was to avoid tying up the Postgres connection while waiting for the lock to become available, but this deadlock issue is a drawback of that approach.
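So another option could be something like the sketch below, where a hypothetical flag (I'm calling it wait: here; nothing like it exists in the gem today) makes the gem use the blocking lock functions so Postgres itself can detect and break the deadlock:

# Hypothetical API sketch: the :wait option does not exist in the gem today.
# With a blocking acquire, Postgres raises PG::TRDeadlockDetected in one of
# the two sessions and the other proceeds, instead of both sleeping forever.
ApplicationRecord.transaction do
  ApplicationRecord.with_advisory_lock("lock_test_a", transaction: true, wait: true)
  ApplicationRecord.with_advisory_lock("lock_test_b", transaction: true, wait: true)
end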
