Question: dealing with old locks #30
@BrunoDelor -- thanks for opening this issue. Your line of reasoning makes me believe you might need to take a harder look at how this library works, and more specifically at what is meant by the last block of the README.md.
What you seem to be calling obsolete locks are termed, in this library, stale locks: locks whose owner is gone and is no longer sending heartbeats. That is by design. The stale lock must be left behind so that the time-skew logic works. If you have configured everything correctly, as soon as the current leader stops working, the next follower will attempt to grab the lock, wait until the lease time runs out, and effectively grab it. Answering your question directly:
Assuming you are using a lock leader-election pattern, you need to make sure the followers are always attempting to grab the lock from the leader. You can use loops (in which grabbing the lock succeeds upon leader loss), or you might want PostgreSQL-based notifications to speed this process up (https://www.postgresql.org/docs/current/sql-notify.html). With all that said, you might be missing a feature common in other lock servers: the recapture of the lock by the current owner after missed heartbeats. This feature has not been implemented yet because I couldn't quite design a DX that wouldn't let people misuse it.
To make it less confusing to you, make sure you're not using:
Hello,
I have an issue with dealing with obsolete locks. My situation is simple:
Context
I have a long process that uses pglock to restrict access to some API endpoint until it's done. The service has 3 instances, which is why I needed a distributed lock. That lock is only used to reject requests to schedule some work.
Problem
Now, the service that was running the long process fails and crashes. The lock is still in the database.
If I attempt to acquire the lock, it fails because the rvn doesn't match.
If I attempt to release it, it works, but I see two issues with that.
I'm starting to think about using the data field to store things like the heartbeat frequency plus the current date, and having a goroutine in my long work update the data periodically, but it feels like that should be happening internally.
So the question is the following: how do I deal with obsolete locks? I can't just try to acquire with fail-on-lock, because that would not help distinguish between a legit lock and an obsolete one. And I can't just blindly release that lock after the error, because it could be legit.
Last thoughts
If these behaviors do not exist yet and would be a welcome enhancement to the library, I might give it a try in a PR. Otherwise, if there is no existing solution, I suppose I will have to write a wrapper.
Thanks for your attention,
Edit: pressed send early by mistake: finishing the message
Edit 2: done