Data loss due to non-atomic writes #440
Comments
python-atomicwrites handles this use case, although it might be overkill.
@untitaker I've JUST RIGHT NOW found your project too. What timing.
(Or you've just seen my last comment about this in #372, magic's not always magic.)
I've used atomicwrites for now. It works, but I'd be happy to have real-life tests.
Thanks! I'm not sure how to test this kind of problem, but I'll certainly let you know if there are any other reports of data loss after this update.
This issue is not solved. It's still possible that deleted events reappear in clients and that created events or changes to existing events disappear. Only broken (partially written) items can't appear.
To be clear:
It is problematic because a "200 OK" could've been returned shortly before the system crashed and reverted the changes. Unfortunately I don't have access to a PC at the moment.
On 18 July 2016 00:30:20 CEST, Kenton Varda <notifications@github.com> wrote:
python-atomicwrites 1.1.0 solves this issue.
Oh right.
`storage.py` contains many clauses that update files by overwriting them, like:

Unfortunately, if a crash or machine failure occurs between the first line and the second line, the result is that the item will be completely lost, since `open()` erases all existing data and the data isn't replaced until the write.

Note that on a typical Linux system, there may be a full 30 seconds between the call to `write()` and the data actually being flushed to disk. Therefore, after this code, there is a 30-second window in which a power failure will cause data loss.

This is a real problem for people running Radicale on https://oasis.sandstorm.io: In such large distributed systems, "power failures" -- or events that look a lot like power failures, such as losing the network connection to the storage volume -- are common. We've now had two independent reports of Radicale losing people's calendars on Oasis.

To fix this problem, Radicale should always write data to a temporary file first, and then `os.rename()` the temporary over the original. Technically it should also do `os.fsync(f.fileno())` immediately before the rename (although on Linux / ext4 systems this is not strictly necessary).
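
For illustration, here is a minimal sketch of the write-to-temporary-then-rename approach described above. It is not Radicale's actual code: the helper name `write_atomically` is hypothetical, and it uses `os.replace()` (the Python 3.3+ call that behaves like `os.rename()` on POSIX but also overwrites an existing target on Windows) in place of the `os.rename()` mentioned in the issue. The final directory `fsync` is an additional step, assumed here to be desirable so that the rename itself survives a crash; it is skipped on platforms where a directory cannot be opened this way.

```python
import os
import tempfile


def write_atomically(path, data):
    """Replace the file at `path` with `data` (bytes) via a temp file.

    Hypothetical helper for illustration only.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file in the same directory so the final rename
    # stays on one filesystem (a cross-device rename is not atomic).
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            # Push the new contents to disk before the rename makes
            # them visible under the real name.
            os.fsync(f.fileno())
        # Atomically swap the new file into place; readers see either
        # the complete old file or the complete new one.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temp file if anything failed before the swap.
        try:
            os.unlink(tmp_path)
        except OSError:
            pass
        raise
    # Assumption: also flush the directory entry so the rename is durable.
    try:
        dir_fd = os.open(directory, os.O_RDONLY)
    except OSError:
        return  # e.g. Windows, where directories cannot be opened like this
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

A call such as `write_atomically("item.ics", b"BEGIN:VCALENDAR...")` would then leave either the previous contents or the full new contents on disk, never an empty or partially written file. Note that `tempfile.mkstemp()` creates the file with owner-only permissions, so a real implementation might need to copy the original file's mode.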