
Question: dealing with old locks #30

Closed
BrunoDelor opened this issue Jul 5, 2022 · 2 comments


BrunoDelor commented Jul 5, 2022

Hello,

I have an issue dealing with obsolete locks. My situation is simple:

Context

I have a long-running process that uses pglock to restrict access to an API endpoint until it is done. The service has 3 instances, which is why I need a distributed lock. The lock is only used to reject requests to schedule some work.

Problem

Now, the instance that was running the long process fails and crashes. The lock is still in the database.
If I attempt to acquire the lock, it fails because the rvn doesn't match.
If I attempt to release it, it works but I see two issues with that.

  1. I expected the system to automatically detect and internally deal with obsolete locks, so that trying to acquire an obsolete lock doesn't result in an error.
  2. I can't even clean it up manually by releasing the lock, because the error covers two cases: locked and obsolete.

I'm starting to think about using the data field to store things like the heartbeat frequency and the current date, with a goroutine in my long-running work updating the data periodically, but it feels like that should be happening internally.

So the question is the following: how do I deal with obsolete keys? I can't just try to acquire with fail-on-lock, because that would not help distinguish between a legitimate lock and an obsolete one. And I can't just blindly release the lock after the error, because it could be legitimate.

Last thoughts

If these behaviors do not exist yet and would be welcome as enhancements to the library, I might give it a try in a PR. Otherwise, if there is no existing solution, I suppose I will have to write a wrapper.

Thanks for your attention,

Edit: pressed send early by mistake: finishing the message
Edit 2: done

Collaborator

ucirello commented Jul 5, 2022

@BrunoDelor -- thanks for opening this issue.

Your line of reasoning makes me believe that you might need to take a harder look at how this library works and, more specifically, at what is meant by the last block of the README.md.

The lock client never stores absolute times in PostgreSQL. The way locks are expired is that a call to tryAcquire reads in the current lock, checks the record version number (RVN) of the lock, and starts a timer. If the lock still has the same record version number after the lease duration has passed, the client will determine that the lock is stale and expire it.

What you seem to be calling obsolete locks are termed, in this library, stale locks: locks whose owner is gone and is no longer sending heartbeats. And that is by design. The stale lock must be left behind so that the time skew logic works.

If you have configured everything correctly, as soon as the current leader stops working, the next follower will attempt to grab the lock, wait until the lease time runs out, and then effectively grab the lock.

Answering your question directly:

So the question is the following: how do I deal with obsolete keys? I can't just try to acquire with fail-on-lock, because that would not help distinguish between a legitimate lock and an obsolete one. And I can't just blindly release the lock after the error, because it could be legitimate.

Assuming you are using a lock-based leader election pattern, you need to make sure the followers are always attempting to grab the lock from the leader. You can use loops (in which the lock grabbing succeeds upon leader loss), or you might want PostgreSQL-based notifications to speed this process up (https://www.postgresql.org/docs/current/sql-notify.html).

With all that said, you might be missing a feature common in other lock servers: the recapture of the lock by the current owner after missed heartbeats. This feature has not been implemented yet because I couldn't quite design a DX that wouldn't let people misuse it.

ucirello closed this as completed Jul 5, 2022
Collaborator

ucirello commented Jul 5, 2022

To make it less confusing, make sure you're not using:
https://github.com/cirello-io/pglock/blob/v1.9.0/lock.go#L107-L112
