Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix mv in different filesystems #6280

Merged
merged 11 commits into from Aug 8, 2022

Conversation

thufschmitt
Copy link
Member

Hopefully fixes #6262 (but I couldn’t manage to reproduce it, so I’m not sure).

This moves things around a bit (well, d4fc2b5 mostly), but the gist of the PR is that calls to rename are replaced by a new moveFile function which tries the renaming and fallbacks to copying if the filesystems differ (as suggested in containers/podman#13507 (comment) ).

I’d love to add some hydra tests for this (or more generally ensuring that the docker image works fine both in docker and podman), but that’ll take a bit more time, so I’ll leave this for later.

@thufschmitt
Copy link
Member Author

Oh right, always that OSX filesystem link issue. I’ll try to remember how I fixed it last time 😒

@edolstra
Copy link
Member

Note that we shouldn't just replace rename() with a "move-or-copy", since there is a general assumption that renames are atomic. E.g. profile upgrades are only atomic because rename() is.

I think generally we should just avoid moving across filesystems. Where are we doing that?

@thufschmitt
Copy link
Member Author

I think generally we should just avoid moving across filesystems. Where are we doing that?

The root issue is #6262 in which Nix fails on podman because of some silly overlayfs behavior (https://www.kernel.org/doc/html/latest/filesystems/overlayfs.html?highlight=overlayfs#renaming-directories). More generally, I think this kind of behavior might happen in unexpected places so that’s probably a good idea to have it to prevent Nix from crashing in such a case

Note that we shouldn't just replace rename() with a "move-or-copy", since there is a general assumption that renames are atomic.

Yes you’re right, the fix should be more subtle − both in terms of only using the function in the right places and having a more atomic behavior (maybe copy to a temp location that’s guaranteed to be on the same fs, and then move from it or something like that).

@Ericson2314
Copy link
Member

I suppose we can a renameNonatomic that alone has this workaround? Agreeing with @thufschmitt that the fix could be a bit more clever, though it doesn't give us full atomicity. Using a name like this signals to the reader of the code what can go wrong, and caller-site comments can explain the "we don't need atomicity here, and we would like to not break with overlayfs here" reasoning.

I think the "bad news in the name (renameNonatomic), good news in the comment (overlayfs story)" is nicely conservative, by giving the bad parts a bit more emphasis than the silver lining. That conservatism in what is emphasized to future readers of the code makes me feel much better about this new complexity.

@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/tweag-nix-dev-update-27/18433/1

@thufschmitt thufschmitt force-pushed the fix-mv-in-different-filesystems branch from 00a6dab to b97a323 Compare April 13, 2022 12:55
@MagicRB
Copy link
Contributor

MagicRB commented May 11, 2022

I may have encountered this, I'm not sure but I'm getting

error: linking '/nix/store/pmv5v9q4x2655qms3cavq73l91g6lhkb-init.drv.chroot/nix/store/07y6c51dam7zliaznnnzvxsc6i7jws0q-2' to '/nix/store/07y6c51dam7zliaznnnzvxsc6i7jws0q-2': Invalid cross-device link

/nix/store is zfs and i assume that the chroot is tmpfs or something

interestingly it works on my laptop but not server, the Nix version is the same and both have /nix and / as zfs

@thufschmitt
Copy link
Member Author

@MagicRB That seems to be slightly different (your error is about linking, not moving), but strinkingly similar. What’s the exact context in which you encountered this? (in particular: which kind of command did you rur? and was it in a docker/podman container like in the original issue? )

/nix/store is zfs and i assume that the chroot is tmpfs or something

The chroot should just be a subdir of /nix/store (its root is created here, and the /nix/store below it a few lines below).

It looks like the failure comes from here (which is called while building the content of the sandbox). The code already falls back to copying for some error codes, but not for EXDEV.

@MagicRB
Copy link
Contributor

MagicRB commented May 12, 2022

Nope this is on NixOS. I can get you the exact command with the rev and everything later. But it was nix build git+https://gitea.redalder.org/RedAdler/systems#nixngSystems.hydra.config.system.build.toplevel i think

EDIT: I dont remember the rev, but i think that i used the latest so it should be 7011cf1eb3396fabd68fe48e1b47a9bac4d4f2d0

@thufschmitt thufschmitt force-pushed the fix-mv-in-different-filesystems branch from b97a323 to f43517d Compare May 24, 2022 11:50
@thufschmitt thufschmitt requested a review from edolstra May 24, 2022 11:52
@thufschmitt
Copy link
Member Author

@edolstra I've fixed (I think) the OSX failure. Does that look good?

@thufschmitt thufschmitt force-pushed the fix-mv-in-different-filesystems branch 2 times, most recently from e075f9c to 0b28d7d Compare May 24, 2022 12:40
@thufschmitt thufschmitt marked this pull request as ready for review June 1, 2022 14:45
@@ -282,15 +282,6 @@ AC_CHECK_FUNCS([setresuid setreuid lchown])
AC_CHECK_FUNCS([strsignal posix_fallocate sysconf])


# This is needed if bzip2 is a static library, and the Nix libraries
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@edolstra I've brutally commented that out because it was breaking the libc++fs link (dunno why, but I ended-up with a duplicate symbol error). Do you think it's still needed?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No idea, I'm fine with removing it, we'll see what breaks :-)

@edolstra
Copy link
Member

TBH I'm reluctant to start using std::filesystem, because it puts another layer between us and POSIX, while we often need to know exactly what's going on. E.g. how does fs::copy() deal with ACLs? Does fs::rename() have the same atomicity guarantees? And it introduces another path type with its own semantics...

@thufschmitt
Copy link
Member Author

TBH I'm reluctant to start using std::filesystem, because it puts another layer between us and POSIX, while we often need to know exactly what's going on

That's a good point (I was actually tripped over that while implementing this PR because although std::filesystem::copy can recursively copy directories, it will at least update the mtime in the copied version, so I had to roll ou my own implementation).
That being said, I think that despite its quirks, it's nicer than directly hitting the underlying C APIs for two reasons:

  1. We need some kind of layer. Having all these inlined c_str() and error code checks is both painful to read and error-prone, and if we must have something on top of that, then I'd rather trust the one from the stl rather than a custom one

  2. More importantly, it's a layer that doesn't just sit on top of POSIX, but also on top of OS-specific APIs (be it Unix, Windows, Linux or OSX), and it gives us a unified interface for all of these (through which we can break when needed like I did in this PR), which is very valuable – esp. if we consider ever having a windows support, which I think would be incredibly great

I'm not married to it though (and it wouldn't fundamentally change this PR), so if you're strongly against it I can rewrite it to use the good ol' Unix APIs (although that would be a non-trivial amount of work, so I'd rather not if possible 🙃 )

@thufschmitt
Copy link
Member Author

@edolstra friendly ping on

I'm not married to it though (and it wouldn't fundamentally change this PR), so if you're strongly against it I can rewrite it to use the good ol' Unix APIs (although that would be a non-trivial amount of work, so I'd rather not if possible upside_down_face )

Any final word on the matter? :)

@thufschmitt thufschmitt reopened this Jun 29, 2022
@edolstra edolstra added this to the nix-2.10 milestone Jul 1, 2022
@edolstra edolstra modified the milestones: nix-2.10, nix-2.11 Jul 11, 2022
thufschmitt and others added 5 commits August 3, 2022 10:27
Directly takes some c++ strings, and gently throws an exception on error
(rather than having to inline this logic everywhere)
In `nix::rename`, if the call to `rename` fails with `EXDEV` (failure
because the source and the destination are in a different filesystems)
switch to copying and removing the source.

To avoid having to re-implement the copy manually, I switched the
function to use the c++17 `filesystem` library (which has a `copy`
function that should do what we want).

Fix NixOS#6262
The recursive copy from the stl doesn’t exactly do what we need because
1. It doesn’t delete things as we go
2. It doesn’t keep the mtime, which change the nars

So re-implement it ourselves. A bit dull, but that way we have what we want
Required by the old clang version
`move` tends to have this `mv` connotation of “I will copy it for you if
needs be”
In most places the fallback to copying isn’t needed and can actually be
bad, so we’d rather not transparently fallback
Rather than directly copying the source to its dest, copy it first to a
temporary location, and eventually move that temporary.
That way, the move is at least atomic from the point-of-view of the destination
That flag breaks `-lc++fs` (introducing a duplicate symbol for some
reason). Besides, it was apparently needed for bzip2, but we're not using bzip2
anymore.
@thufschmitt thufschmitt force-pushed the fix-mv-in-different-filesystems branch from 61d4c5c to d1cda07 Compare August 3, 2022 08:33
Doesn't seem needed on a recent-enough clang anymore (and even seems to
break stuff)
@thufschmitt
Copy link
Member Author

@edolstra I'm merging this since the filesystem use is small and well-contained.
If it's too much of an issue, feel free to revert or ask me to rewrite the filesystem parts

@thufschmitt thufschmitt merged commit 73fde9e into NixOS:master Aug 8, 2022
@thufschmitt thufschmitt deleted the fix-mv-in-different-filesystems branch August 8, 2022 14:48
Minion3665 pushed a commit to Minion3665/nix that referenced this pull request Feb 23, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

nix fails due to rename returning EXDEV in podman (non-root overlayfs)
5 participants