Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

File locking and/or atomic writes of attributes (and blocks) #63

Open
clbarnes opened this issue Jun 21, 2018 · 14 comments
Open

File locking and/or atomic writes of attributes (and blocks) #63

clbarnes opened this issue Jun 21, 2018 · 14 comments

Comments

@clbarnes
Copy link
Contributor

I was thinking about how to deal with the fact that when attributes are changed, the file is opened, read, closed, truncated + opened, and then written to. This may lead to a race condition if two processes are trying to write to the same attributes file. Unfortunately, python stdlib support for file locking is pretty terrible, from the looks of things.

I also had a quick grep of the C++ side of things and was wondering if there was any file locking when writing blocks. I'm going to start using z5py with a multiprocessing pipeline soon and it would be convenient if I didn't have to manage block access myself.

@constantinpape
Copy link
Owner

constantinpape commented Jun 21, 2018

For now, there isn't any file locking for blocks, so access from multiple threads or processes must
act on independent chunks.

I had a look into this a while ago for the c++ side. There are two basic locking mechanisms on POSIX systems:

I did not move forward on this is as I am not sure how portable these solutions are (though supporting locking on POSIX first would be ok).
Also note that NFS might pose an issue, though it should work with lockf.
See also, https://stackoverflow.com/questions/22409780/flock-vs-lockf-on-linux.

@constantinpape
Copy link
Owner

lockf only adds 3 lines of code, so it would be easy to give this a try.
I will implement this in a PR after we have merged the relase1.0 PR.

@constantinpape
Copy link
Owner

Regarding the issue with the AttributeManager:
I haven't looked into file locking in python, but another option would be to
use the C++ implementation of reading / writing attributes from python (and then using lockf in C++).

Getting the values from python to C++ might be a bit brittle / inconvenient though.
That's why I reimplemented the functionality in python in the first place.

But iIf the file locking for chunks works out I could give it another look.

@alimanfoo
Copy link

alimanfoo commented Jun 21, 2018 via email

@clbarnes
Copy link
Contributor Author

clbarnes commented Jun 21, 2018

The solutions on stackoverflow for python are generally "use a separate lock file", which sounds kind of like solving a race condition with another race condition. It'd be much more portable though - as we tend to keep N5 files on remote servers, there is the possibility of different OSes trying to access and host them at the same time. Probably best not to go that route if the reference implementation doesn't, though.

Windows seems to have its own file locking semantics, and in rust there's a library which acts as a wrapper around both that and flock. I think there are a few of those in python, in various states of disrepair.

I was just thinking about pushing all the attribute stuff through the C++ layer. The "stupid but it just might work" solution to passing JSON objects between python and C++ would be to just pass the string - it is a serialisation protocol after all ;) The important thing would be to get the C++ layer to hold the file lock for as long as it takes to send the data to python, receive data back from python, and write it out.

Testing these race conditions sounds like it would be a real pain, too.

@constantinpape
Copy link
Owner

@alimanfoo: Thanks for the pointer. Do you happen to know if this works for shared filesystems?

@clbarnes: Yes, just sending the json string would work :).

Let's see if lockf does a proper job for the chunks and then worry about the attributes.

@constantinpape
Copy link
Owner

FIrst attempt at file locks for writing chunks (so far only implemented for n5) in #65 .

I have no idea if this actually works, so we will need some proper test cases.
Even if it does, this implementation has the downsides that it is not portable and that
it only locks access for processes, not for threads.

@alimanfoo
Copy link

I haven't tested fasteners on shared file systems. My guess is it will work on nfsv4, otherwise no idea.

Also FWIW in Zarr for multithreaded parallelism we use thread locks instead of file locks. But the synchronisation mechanism is pluggable, in the sense that you can use any type of synchronisation with any type of storage, or no synchronisation at all if you know you will always align write operations with chunk boundaries.

@clbarnes
Copy link
Contributor Author

I guess we can test whether the locks are respected by creating a block, locking it from python, and then trying to write to it in the normal way.

We could test race conditions by having one big block, spawning X python processes, and have them wait for a multiprocessing.Event before all trying to write to different parts of the block. Then you can check that all of the writes are respected and that no errors were thrown.

I could take a crack at this today.

@clbarnes
Copy link
Contributor Author

clbarnes commented Jun 22, 2018

Here's my branch with the tests; I'm not sure if this testing strategy will work so I've based it on master for now and will rebase on the locking implementation if they (correctly) fail.

https://github.com/clbarnes/z5/blob/race-tests/src/python/test/test_locking.py

This depends on the assumptions that

  1. Exclusive lockf is not shared to between child or sibling processes
  2. Errors are handled reasonably by processes spawned by multiprocessing
  3. Everything has the same semantics for non-blocking locks
  4. Travis environments are multi-threaded

@clbarnes
Copy link
Contributor Author

Hm, tests are hanging a lot, always a threat with multiprocessing.

@clbarnes
Copy link
Contributor Author

Got the failures I wanted! I'll just check these refactors still work and then will rebase and PR.

@clbarnes clbarnes changed the title File locking and/or atomic writes of attributes (and blocked) File locking and/or atomic writes of attributes (and blocks) Nov 27, 2018
@constantinpape
Copy link
Owner

Putting this here for reference:
http://0pointer.de/blog/projects/locking.html

@clbarnes
Copy link
Contributor Author

It sounds like the only reliable method would be lock files, which feels pretty gross because you'd need to poll the file system, which can be pretty slow over a network.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants