-
-
Notifications
You must be signed in to change notification settings - Fork 293
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Simpler Storage API #5
Comments
This is a good question, but I am still convinced the two interfaces are jointly required for a workable Storage implementation. It's not the Storage methods that need to be made safe for concurrent use -- they already need to be, of course -- it's that the ACME operations like the obtaining of a certificate need to be synchronized. This means not only one operation at a time, but one in total. TryLock is the easiest way I could think of that allows other processes to wait while another finishes obtaining the certificate it needs, then simply loading that certificate without obtaining it again. This idempotency is crucial for running correctly in a cluster, and it's why other libraries such as autocert don't do well sharing storage imementations even though it has a Cache interface. Look closer at CertMagic's code and you'll see that it doesn't always wait -- it only waits if the lock is already held, and then it does some other flow when the waiting is over, whereas with a regular mutex you just wait and then continue, but here we have to alter the program flow. For database backed storage you can use a transaction to "obtain" a lock or, if already obtained, to know that and simply return a waiter instead. I hope that makes sense. They used to be separate but I really could not justify totally separating the two because it was not safe to use one but not the other. FileStorage does need to handle crashes better, and it's on my list to enable expiring the locks, but it's also safe to assume that in a clean shutdown scenario, Unlock() will be called, cleaning up the resources. (Also a WIP, I haven't hit either of these edge cases in the years I've been using it though.) |
Commit fe72205 handles stale locks in FileStorage, and adds a method to the Let me know what you think! |
Going by the 2 uses of To me this the same as a Without the need to be asynchronous the implementation can just use a e.g. Renew using
If I understand the code correctly, it currently does:
Again, it's entirely possible that I just don't understand the code but it seems to me that if 2 processes happen to try to renew at the same time but don't block each other, i.e. p1 calls Unlock before p2 calls TryLock then p2 will go on to try and renew as well.
How do you implement said waiter?
To be clear: I originally thought an external lock might not be needed and it could be simpler to remove
Did you not hit not hit them? or are you just not aware that you hit them? :p |
I don't think this works. If the process crashed, there'd be no chance to clean up the locks at shutdown, otherwise it could just unlock them. It's the same problem you have with pid files, disk-based locks, (non-abstract) domain sockets, etc.: you have to take care to clean up at program start. |
@DisposaBoy Thanks for your comments. Also thanks for GoSublime! 😊
This is probably a better pattern. I'll push a branch soon that changes this, and I would like you to try it if that's okay with you!
For FileStorage, we do a simple poll every so often.
I understand now -- I bet we can remove both.
It does work shutdowns that can cleanup (e.g. SIGTERM), but crashes have to be handled by ignoring/removing stale locks, which that commit also does. |
@DisposaBoy Please see and review PR #7 when you have a chance. Hopefully soon. :) I'd like to get this pinned down. |
No problem 😄
That's a common solution, but it's the one I was afraid of because you run the risk of polling too quickly, thus overloading the system or not quickly enough, thus introducing more latency than necessary.
Reading the code: stale is defined as being 2 hours old which means it only works if you try the lock after 2 hours. If you try before this time, it will block forever (or until whatever lock timeout might be in place). The code below demonstrates the issue for me. I ran it with ctrl.,ctrl+r which is roughly equivalent to
|
It's later here, so I'll add it to my list of things to do tomorrow. |
What better solution would you propose? Thanks for your reproduce case, I'd be interested what you think of the new design in #7 after you wake up 👍 |
Also:
New in #7, the Wait() will only block until the lock is stale or released, whichever comes first. No more arbitrary timeout. |
I don't know of a better solution than to simply delegate the locking to someone else who know how to do it better, where possible i.e. by using *Flock(), db transactions, mutexes, etc. |
* Replace TryLock and Wait with Lock, and check for idempotency (issue #5) * Fix logic of lock waiter creation in FileStorage (+ improve client log) * Return from Wait() if lock file becomes stale * Remove racy deletion of empty lock folder * move all (FileStorage) methods to (*FileStorage) so assignments to fields like fileStorageNameLocks aren't lost * rework lock acquisition * Create lockDir just before lock file creation to reduce the chance that another process calls Unlock() and removes lockDir while we were waiting, preventing us from creating the lock file. * Use the same strategy that Wait() uses to avoid depending on internal state. * fix unlock of unlocked mutex * Move fileStorageNameLocksMu into FileStorage struct * implement new lockfile removal strategy and simplify the lock acquisition loop. * readme: Add link to full examples * Rework file lock obtaining and waiting logic * Remove not-useful optimization to simplify file-locking logic
Woohoo! Thanks for your critiques! At some point, can we get your test program added to the repo as a test case? |
I'll see if I can convert it tomorrow or Friday. |
It's probably too late to change the API now, so this is more of a question about why the API is designed the way it is than anything.
What would you like to have changed?
Remove
Locker
from theStorage
interface.Why is this feature a useful, necessary, and/or important addition to this project?
TryLock
is hard to implement correctly and the implementation doesn't seem to make use of it, because it just waits anyway.What alternatives are there, or what are you doing in the meantime to work around the lack of this feature?
The ideal interface would be
Storage
on its own withoutLocker
, then all methods would be required to be safe for concurrent use.Apart from being more idiomatic, it makes things a lot simpler because the actual storage implementation has a better idea about how to do locking than CertMagic.
The main motivation was to implement a
Storage
in SQLite (or other databases) but I was blocked by the requirement thatTryLock
being non-blocking.Instead of implementing the equivalent of
mu.Lock()
, it seemed I'd have to introduce a whole bunch of state that will probably be buggy for unknown reasons. EvenFileStorage.TryLock
makes me nervous because at a quick glance, it doesn't seem to handle cleanup so if the process crashes, everything is probably borked.The text was updated successfully, but these errors were encountered: