acquiring/releasing lock: Resource deadlock avoided #2207
FWIW, the No clue about why this happens, though-- maybe a cycle in the builders/targets? :/
Had this too, running the same nix-build in two different shells. Maybe the order in which the dependencies are built is not deterministic? (It is just a wild guess, but that could explain a deadlock, if a lock is acquired for every store path to be built.)
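The lock-ordering guess above can be illustrated outside of Nix. What follows is only a sketch under assumptions: the two lock files are hypothetical stand-ins for per-store-path locks, the function name is made up, and it relies on the kernel's deadlock detection for blocking POSIX locks (`F_SETLKW`, which Python exposes through `fcntl.lockf`). Two processes take the same two locks in opposite orders, and the kernel refuses one of the blocking requests with `EDEADLK`.

```python
import errno
import fcntl
import os
import signal
import tempfile
import time


def opposite_order_errno():
    """Lock two files in opposite orders from two processes.

    Returns the errno reported to whichever process the kernel refused,
    or None if neither blocking request failed.
    """
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical lock files standing in for two store-path locks.
        fd_a = os.open(os.path.join(tmp, "a.lock"), os.O_RDWR | os.O_CREAT, 0o600)
        fd_b = os.open(os.path.join(tmp, "b.lock"), os.O_RDWR | os.O_CREAT, 0o600)
        ready_r, ready_w = os.pipe()

        fcntl.lockf(fd_a, fcntl.LOCK_EX)      # parent holds A before forking
        pid = os.fork()
        if pid == 0:
            # Child: takes B first, then waits for A (the opposite order).
            status = 0
            try:
                signal.alarm(10)              # safety net: never hang forever
                fcntl.lockf(fd_b, fcntl.LOCK_EX)
                os.write(ready_w, b"x")
                fcntl.lockf(fd_a, fcntl.LOCK_EX)
            except OSError as exc:
                status = 1 if exc.errno == errno.EDEADLK else 2
            os._exit(status)                  # exiting drops the child's locks

        seen = None
        signal.alarm(10)                      # same safety net for the parent
        os.read(ready_r, 1)                   # child now holds B
        time.sleep(0.2)                       # give the child time to block on A
        try:
            fcntl.lockf(fd_b, fcntl.LOCK_EX)  # blocking request closes the cycle
        except OSError as exc:
            seen = exc.errno
        fcntl.lockf(fd_a, fcntl.LOCK_UN)      # let the child finish
        _, wait_status = os.waitpid(pid, 0)
        signal.alarm(0)
        if os.WIFEXITED(wait_status) and os.WEXITSTATUS(wait_status) == 1:
            seen = errno.EDEADLK              # the child was the one refused
        for fd in (fd_a, fd_b, ready_r, ready_w):
            os.close(fd)
        return seen


result = opposite_order_errno()
print(result == errno.EDEADLK)
```

Whether Nix actually takes its locks in an inconsistent order is not established in this thread; the sketch only shows that such an ordering would be enough to produce the error.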
Just hit me too, I was running multiple integration tests in parallel and some of them were using nix-shell with a specific package. One of the tests failed with:
and the next attempt on a fresh VM worked.
@edolstra I've seen this too at various random times. Any idea what might cause it? From eyeballing the man page, it seems like there's a slight chance we're checking for the wrong return value. The code that @dtzWill points out checks In a few places I've seen some suggestion that the
More fun facts from the manpage: "The deadlock-detection algorithm employed by the kernel when dealing with F_SETLKW requests can yield ... false positives (EDEADLK errors when there is no deadlock). ... In addition, the kernel may falsely indicate a deadlock when two or more processes created using the clone(2) CLONE_FILES flag place locks that appear (to the kernel) to conflict." Note that threads are created using CLONE_FILES. BTW edolstra@58d1980 gets rid of POSIX file locks. It might fix this problem as a side-effect.
Nice, so it looks like what we really need is somebody to try to write a reproducer (perhaps a script that uses
For what it's worth, we've been able to hit this message pretty reliably in our builds with 32 concurrent build agents attempting to use nix. Our installation is multi-user but all the processes are trying to write to a common /nix directory which must use some locking mechanism.
@joshenders can you test with @edolstra's patch?
http://0pointer.de/blog/projects/locking.html explains how POSIX locks are not even thread safe, so I'm not entirely convinced that edolstra@effa4be prevents much.
I don't follow, since the patch gets rid of POSIX locks.
What I mean is, I'm not entirely convinced that we need a schema bump - I'd really like to backport this to the maintenance branch. I'll try to come up with a way to reproduce this easily and we can try different Nix versions without nix-daemon.
@edolstra any objections to cherry-picking edolstra@58d1980 to master?
Yes, we can't cherry-pick it because it's a schema change (it also requires edolstra@effa4be).
Also, there is no evidence that edolstra@58d1980 actually fixes this issue.
I can reliably reproduce this issue in our environment and so I might be able to test 58d1980. I’ve worked around it temporarily by preventing processes from calling nix concurrently.
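The thread does not say how the concurrent calls were prevented. One possible way to serialize them, purely as a sketch: every caller takes an exclusive `flock` on a shared lock file before running its command. The lock path and the helper names `exclusive` and `run_serialized` are assumptions made for illustration, not anything taken from the reporter's setup.

```python
import contextlib
import fcntl
import os
import subprocess
import tempfile

# Hypothetical lock file location; any path shared by the callers would do.
LOCK_PATH = os.path.join(tempfile.gettempdir(), "serialize-nix.lock")


@contextlib.contextmanager
def exclusive(path=LOCK_PATH):
    """Hold an exclusive flock on `path` for the duration of the block."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)   # blocks until no other caller holds it
        yield fd
    finally:
        os.close(fd)                     # closing the descriptor releases the lock


def run_serialized(argv):
    """Run one command at a time across all processes using this helper."""
    with exclusive():
        return subprocess.run(argv).returncode
```

A build agent would then call something like `run_serialized(["nix-build", ...])` with its own arguments. This trades parallelism for reliability, so it is a stopgap rather than a fix.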
Just an update: planning on testing @edolstra's diff early next week. Should I be able to cherry-pick this commit cleanly on the 2.2.1 tag?
@edolstra A checkout of 2.2.1 with effa4be and 58d1980 cherry-picked from your repo isn't building cleanly. A build of 2.2.1 without effa4be and 58d1980 is building and testing cleanly. Are there other dependent commits I'm missing? Should I be building directly from your repo? I'm invoking the build scripts with:
@edolstra @joshenders I also experience problems similar to what is reported here, and I also tried 58d1980, on top of the
The compilation part goes through, but the
POSIX file locks are essentially incompatible with multithreading. BSD locks have much saner semantics. We need this now that there can be multiple concurrent LocalStore::buildPaths() invocations.
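The difference in semantics can be seen from a single process. A minimal sketch, assuming Linux-like behaviour of `fcntl(2)` record locks (Python's `fcntl.lockf`) and BSD `flock(2)` locks (Python's `fcntl.flock`); the helper name is hypothetical and none of this is Nix's own code. POSIX locks belong to the process, so a second descriptor in the same process is never refused, which is why they cannot arbitrate between threads. BSD locks belong to the open file description, so the second descriptor is refused.

```python
import fcntl
import os
import tempfile


def second_lock_conflicts(lock_fn):
    """Open one file twice in this process and try to lock it through both.

    Returns True if the second, non-blocking exclusive request is refused.
    """
    with tempfile.NamedTemporaryFile() as tmp:
        fd1 = os.open(tmp.name, os.O_RDWR)
        fd2 = os.open(tmp.name, os.O_RDWR)
        try:
            lock_fn(fd1, fcntl.LOCK_EX)
            try:
                lock_fn(fd2, fcntl.LOCK_EX | fcntl.LOCK_NB)
            except OSError:
                return True      # the lock held via fd1 blocked fd2
            return False         # the kernel saw no conflict
        finally:
            os.close(fd2)
            os.close(fd1)


posix_conflicts = second_lock_conflicts(fcntl.lockf)  # POSIX record locks
bsd_conflicts = second_lock_conflicts(fcntl.flock)    # BSD flock locks
print(posix_conflicts, bsd_conflicts)
```

On Linux this should report no conflict for the POSIX case and a conflict for the BSD case. Two threads that each open the same lock file behave like `fd1` and `fd2` here, so only the BSD variant makes the second one wait.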
Will close this. Please reopen if anybody sees this issue on Nix >= 2.3.
Nix 2.0.2 as invoked via `nixops`.
After upgrading the host machine running nixops to 18.03, I got for the first time, and seem to nondeterministically get, a failure for some stuff I'm building:
The problem goes away after running the build a couple of times, because eventually the build succeeds and then it's in the nix store.
`acquiring/releasing lock: Resource deadlock avoided` is apparently a string Google has never seen. I'm not quite sure what component is emitting it; `Resource deadlock avoided` seems to be some system error message.
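A quick way to check where the second half of the message comes from, as a sketch: ask the C library for the text of `EDEADLK`, the errno that a blocking `fcntl()` lock request returns when the kernel believes granting it would deadlock. The exact wording depends on the libc (glibc uses the string from the error above), and the `acquiring/releasing lock:` prefix presumably comes from the calling program.

```python
import errno
import os

# Text the C library associates with EDEADLK.
message = os.strerror(errno.EDEADLK)
print(f"EDEADLK -> {message}")
```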