OverlayFS broken on NixOS Kernel 4.19 and 4.20 #54509

bachp · 2019-01-23T19:36:26Z

Issue description

When using overlayfs with NixOS and Kernel 4.19 it is not possible to overwrite a file that already exists in the lower directory with new content. Instead if attempted the file appears empty. (See how to reproduce for details).

It seems like the revert of an upstream commit in 4.19 #52942 breaks something fundamentally in overlayfs.

EDIT: Reverting #52942 fixes the behavior. But I'm unable to run the tests as the VM kernel pancis :(

This also affects docker if used with one of the overlay drivers.

Steps to reproduce

Via the test in #54508

or

Manually:

mkdir -p /tmp/mnt/upper /tmp/mnt/lower /tmp/mnt/work /tmp/mnt/merged

# Create an existing file in the lower directory
echo 'Existing' > /tmp/mnt/lower/existing.txt

# Mount overlayfs
mount -t overlay overlay -o lowerdir=/tmp/mnt/lower,upperdir=/tmp/mnt/upper,workdir=/tmp/mnt/work /tmp/mnt/merged
 
# Prints "Existing" - OK
cat /tmp/mnt/merged/existing.txt

# Write some new content
"echo 'New' > /tmp/mnt/merged/existing.txt",

# Prints "" (empty) but it should be New - FAIL
cat /tmp/mnt/merged/existing.txt

Technical details

Please run nix-shell -p nix-info --run "nix-info -m" and paste the
results.

 - system: `"x86_64-linux"`
 - host os: `Linux 4.19.16, NixOS, 19.03.git.5d42db2 (Koi)`
 - multi-user?: `yes`
 - sandbox: `yes`
 - version: `nix-env (Nix) 2.2`
 - channels(root): `"nixos-17.03pre95306.a24728f"`
 - channels(pascal): `"nixos-19.03pre167251.20343f0ab47"`
 - nixpkgs: `/nix/var/nix/profiles/per-user/root/channels/nixos`

The text was updated successfully, but these errors were encountered:

bachp · 2019-01-23T22:24:54Z

/cc @aszlig and @samueldr as you probably know the most about this.

samueldr · 2019-01-23T23:38:27Z

Hmm, this is a blocker for 19.03, can't have either side of the current situation while keeping the latest LTS kernel.

Thanks for the (wip) test, I bet it'll be useful to check for the same kind of regression.

cc @lheckemann to keep in mind.

@aszlig was the issue we were working around reported upstream? I guess it hasn't been fixed since the patch (a partial revert) still applies.

bachp · 2019-01-26T21:11:15Z

I'm trying to debug the issue but I'm confused right now.

I tried to run the tests without the patches in #52942 but I then get the error:

switch_root: can't execute '/nix/store/i9rxcghqk9glylabsdl5wmnwv7w853wj-nixos-system-machine-19.03.git.4fb8bc8/init': Operation not permitted

I'm not sure this is actually the issue #52942 is working around?

I was first suspecting it has something to do with the interaction between 9p and overlayfs but then I made a curious observation that when I run a VM produced via:

nix-build nixos/ -A vm -I nixpkgs=/root/nixpkgs

It runs without issue using the same overlayfs over 9p setup.

Any ideas what is different between running a test VM and my second directly built VM?

samueldr · 2019-01-26T21:50:35Z

IIRC, yes, Operation not permitted on switching root is the issue the patch is used to work around.

bachp · 2019-01-26T22:11:11Z

Ok but I still don't understand why it works in the non test case VM.

samueldr · 2019-01-27T03:35:10Z

Ok but I still don't understand why it works in the non test case VM.

It does for me, from your commit, reverted the patch. (Also added a workaround for the gdk_pixbuf mismatch between 18.09 and unstable #54278.)

QEMU_KERNEL_PARAMS='console=ttyS0' result/bin/run-DUFFMAN-vm
View → serial0

<<< NixOS Stage 1 >>>

loading module virtio_balloon...
loading module virtio_console...
loading module virtio_rng...
loading module dm_mod...
running udev...
kbd_mode: KDSKBMODE: Inappropriate ioctl for device
starting device mapper and LVM...
checking /dev/vda...
fsck (busybox 1.29.3)
[fsck.ext4 (1) -- /mnt-root/] fsck.ext4 -a /dev/vda
/dev/vda: recovering journal
/dev/vda: clean, 11/32768 files, 6353/131072 blocks
mounting /dev/vda on /...
mounting store on /nix/.ro-store...
mounting tmpfs on /nix/.rw-store...
mounting shared on /tmp/shared...
mounting xchg on /tmp/xchg...
mounting overlay filesystem on /nix/store...
switch_root: can't execute '/nix/store/9h3bsjav4cxlb6xn7a4zjwvb1r71sc3m-nixos-system-DUFFMAN-19.03.git.b2669a2/init': Operation not permitted
[    1.170907] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000100
[    1.170907] 
[    1.172524] CPU: 0 PID: 1 Comm: switch_root Not tainted 4.19.17 #1-NixOS
[    1.173684] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org 04/01/2014
[    1.175602] Call Trace:
[    1.176171]  dump_stack+0x5c/0x7b
[    1.176901]  panic+0xe4/0x242
[    1.177541]  do_exit+0xaed/0xaf0
[    1.178209]  ? handle_mm_fault+0xdc/0x210
[    1.179004]  do_group_exit+0x3a/0xa0
[    1.180014]  __x64_sys_exit_group+0x14/0x20
[    1.180832]  do_syscall_64+0x4e/0x100
[    1.181589]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[    1.182502] RIP: 0033:0x7f1357beb646
[    1.183222] Code: Bad RIP value.
[    1.183887] RSP: 002b:00007ffdda70f9d8 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
[    1.185284] RAX: ffffffffffffffda RBX: 00007f1357ed45a0 RCX: 00007f1357beb646
[    1.186477] RDX: 0000000000000001 RSI: 000000000000003c RDI: 0000000000000001
[    1.187727] RBP: 0000000000000001 R08: 00000000000000e7 R09: ffffffffffffff80
[    1.188917] R10: 0000000000000073 R11: 0000000000000246 R12: 00007f1357ed45a0
[    1.190161] R13: 0000000000000001 R14: 00007f1357edd488 R15: 0000000000000000
[    1.191539] Kernel Offset: 0x4200000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[    1.193362] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000100
[    1.193362]  ]---

boot.kernelPackages = pkgs.linuxPackages_4_19; in my configuration.nix, maybe you're using an older LTS pinned by its version number?

peti · 2019-01-29T16:15:03Z

For what it's worth, I can confirm the issue. Pretty much all my docker containers are broken after the recent channel update to 19.03pre166987.bc41317e243 a.k.a. nixos-unstable.

samueldr · 2019-01-29T16:27:12Z

Hmm, looks like we're between a rock and a hard place, something changed in the kernel breaking our tests (not sure if it's userland that broke) and the revert to make the tests pass breaks overlayfs.

The simple solution, and what probably will happen for 19.03 unless something is figured out, is to revert back to the previous LTS as default, but this is not an ideal solution.

ElvishJerricco · 2019-01-29T21:50:13Z

Should we revert #52942? Maybe there's a better way to fix the test failures.

aszlig · 2019-01-29T21:53:51Z

The report has been posted to unionfs-linux and should show up in mailinglist archives soon, I'll post a link as soon as that's the case.

peti · 2019-01-31T11:43:48Z

Should we revert #52942?

Yes, please. As much as it sucks, a situation where docker is completely broken on NixOS is not really tenable.

bachp · 2019-01-31T12:01:01Z

I think reverting #52942 and got back to 4.14 as the default would be the most reasonable thing to do until the issue is resolved.

peti · 2019-02-02T12:24:04Z

Whatever we do, we should do it soon, because the current situation is really bad and IMHO we shouldn't have kept master in that broken state as long as we did already. Waiting another couple of days before we do something doesn't feel like a good idea.

ElvishJerricco · 2019-02-02T14:02:11Z

I agree with the suggestion to revert #52942 and go back to 4.14. Then we can see about alternative solutions to the test failures with 4.19.

aszlig · 2019-02-02T17:14:09Z

Hm, it seems my mail (Message-ID: <20190129194114.GA7785@dnyarri>) never reached the list for some reason, even though the MTA has accepted it:

DD42F980DD95A: to=<linux-unionfs@vger.kernel.org>, relay=vger.kernel.org[209.132.180.67]:25, delay=687, delays=621/0.01/0.63/65, dsn=2.7.0, status=sent (250 2.7.0 nothing apparently wrong in the message. BF:<H 0>; S1728330AbfA2Tww)

(This was on Jan 29 20:52:53 CET)

@szmi: Is the linux-unionfs mailing list moderated?

edit: Just sent it again without attachments, let's hope this was the culprit.

This reverts commit de86af4. (Manual revert due to conflicts.) See NixOS#54509 The patch is causing overlayfs to misbehave.

This reverts commit b861ebb. The current issues (See NixOS#54509 and NixOS#48828) are causing headaches to users of the unstable branches.

aszlig · 2019-02-02T17:53:53Z

Okay, worked without attachments: https://www.spinics.net/lists/linux-unionfs/msg06627.html

NeQuissimus · 2019-02-06T19:55:35Z

There are a bunch of overlay fixes in 4.20.7, can you try the latest kernels?

samueldr · 2019-02-06T21:37:42Z

Same fixes with 4.19 (which is desired), looking this evening.

samueldr · 2019-02-07T02:12:50Z

machine# Probing EDD (edd=off to disable)... ok
machine# c[    0.000000] Linux version 4.19.20 (nixbld@localhost) (gcc version 7.4.0 (GCC)) #1-NixOS SMP Wed Feb 6 16:30:16 UTC 2019
machine# [    0.000000] Command line: loglevel=7 console=ttyS0 panic=1 boot.panic_on_fail init=/nix/store/xsimkh58w0k6jmn6jfxf5arsk9xkdhk0-nixos-system-machine
-19.03.git.47aad6e/init regInfo=/nix/store/a6vn3r6600p5z216iwbjh81vxkhbz3cw-closure-info/registration console=ttyS0
...
machine# mounting overlay filesystem on /nix/store...
machine# switch_root: can't execute '/nix/store/xsimkh58w0k6jmn6jfxf5arsk9xkdhk0-nixos-system-machine-19.03.git.47aad6e/init': Operation not permitted
machine# [    1.508713] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000100
machine# [    1.508713]
machine# [    1.510084] CPU: 0 PID: 1 Comm: switch_root Not tainted 4.19.20 #1-NixOS
machine# [    1.511062] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org 04/01/2014
machine# [    1.512691] Call Trace:
machine# [    1.513078]  dump_stack+0x5c/0x7b
machine# [    1.513568]  panic+0xe4/0x242
machine# [    1.514020]  do_exit+0xb52/0xb60
machine# [    1.514499]  ? handle_mm_fault+0xdc/0x210
machine# [    1.515097]  do_group_exit+0x3a/0xa0
machine# [    1.515622]  __x64_sys_exit_group+0x14/0x20
machine# [    1.516248]  do_syscall_64+0x4e/0x100
machine# [    1.516782]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
machine# [    1.517526] RIP: 0033:0x7f5827cd9646
machine# [    1.518048] Code: Bad RIP value.
machine# [    1.518549] RSP: 002b:00007ffc3718aa98 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
machine# [    1.519685] RAX: ffffffffffffffda RBX: 00007f5827fc25a0 RCX: 00007f5827cd9646
machine# [    1.520726] RDX: 0000000000000001 RSI: 000000000000003c RDI: 0000000000000001
machine# [    1.521771] RBP: 0000000000000001 R08: 00000000000000e7 R09: ffffffffffffff80
machine# [    1.522807] R10: 0000000000000073 R11: 0000000000000246 R12: 00007f5827fc25a0
machine# [    1.523845] R13: 0000000000000001 R14: 00007f5827fcb488 R15: 0000000000000000
machine# [    1.525021] Kernel Offset: 0xbe00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
machine# [    1.526585] Rebooting in 1 seconds..

:(

NeQuissimus · 2019-02-07T13:43:47Z

WTF

samueldr · 2019-02-07T14:53:04Z

In case you didn't realise, this is the same initial issue which caused us to add a patch reverting part of the changes to make the tests pass. This is the subject of the thread. So it looks like there were other changes in passing, unrelated to our problem.

bachp · 2019-02-07T18:16:37Z

Is somebody able to create the setup as it is in during the startup of the nixos test VM in a simplified version. One that would allow to have access to the shell right befor the switch_root get's executed?

tbenst · 2019-02-24T22:04:35Z

For a new nixos-install of 18.09 I’m seeing something similar, with the main difference that except after switch_root it says “No such file or directory”...is this the same issue?

samueldr · 2019-02-24T22:39:29Z

@tbenst this particular issue shouldn't affect 18.09, and what you are experiencing shouldn't be this issue considering (1) the faulty revert patch was reverted (2) the current issue affects the testing infra. (To be fair, your comment could be about the testing infra, it's unspecified if it's your system boot or a VM.)

Though, I'm not downplaying whatever issue you're facing! In fact, let's try and figure out what's going on. Ideally, open an issue with the following information (and mention me and this issue in the body).

Do previous generations boot fine?
This was after a fresh update of 18.09, right?
Which kernel is your system configured to use? (nix eval --raw '(import <nixos/nixos> {}).options.boot.kernelPackages.value.kernel')
What's the output of nix-info?

tbenst · 2019-02-26T16:01:03Z

Thanks, @samueldr for the very kind comment! I figured out the issue thankfully--as soon as I enabled systemd-boot everything worked.

aszlig · 2019-03-14T12:49:01Z

Just to keep everyone outside of #nixos-dev up to date, we do have a fix for this. Here is the upstream post and patch: https://www.spinics.net/lists/linux-unionfs/msg06733.html

@bachp

In Linux 4.19 there has been a major rework of the overlayfs implementation and it now opens files in lowerdir with O_NOATIME, which in turn caused issues in our VM tests because the process owner of QEMU doesn't match the file owner of the lowerdir. The crux here is that 9p propagates the O_NOATIME flag to the host and the guest kernel has no way of verifying whether that flag will lead to any problems beforehand. There is ongoing work to possibly fix this in the kernel, but it will take a while until there is a working patch and consensus. So in order to bring our default kernel back to 4.19 and of course make it possible to run newer kernels in VM tests, I'm merging a small QEMU patch as an interim solution, which we can drop once we have a working fix in the next round of stable kernels. Now we already had Linux 4.19 set as the default kernel, but that was subsequently reverted in 048c36c because the patch we have used was the revert of the commit I bisected a while ago. This patch broke overlayfs in other ways, so I'm also merging in a VM test by @bachp, which only tests whether overlayfs is working, just to be on the safe side that something like this won't happen in the future. Even though this change could be considered a moderate mass-rebuild at least for GNU/Linux, I'm merging this to master, mainly to give us some time to get it into the current 19.03 release branch (and subsequent testing window) once we got no new breaking builds from Hydra. Cc: @samueldr, @lheckemann Fixes: #54509 Fixes: #48828 Merges: #57641 Merges: #54508

@bachp

In Linux 4.19 there has been a major rework of the overlayfs implementation and it now opens files in lowerdir with O_NOATIME, which in turn caused issues in our VM tests because the process owner of QEMU doesn't match the file owner of the lowerdir. The crux here is that 9p propagates the O_NOATIME flag to the host and the guest kernel has no way of verifying whether that flag will lead to any problems beforehand. There is ongoing work to possibly fix this in the kernel, but it will take a while until there is a working patch and consensus. So in order to bring our default kernel back to 4.19 and of course make it possible to run newer kernels in VM tests, I'm merging a small QEMU patch as an interim solution, which we can drop once we have a working fix in the next round of stable kernels. Now we already had Linux 4.19 set as the default kernel, but that was subsequently reverted in 048c36c because the patch we have used was the revert of the commit I bisected a while ago. This patch broke overlayfs in other ways, so I'm also merging in a VM test by @bachp, which only tests whether overlayfs is working, just to be on the safe side that something like this won't happen in the future. Even though this change could be considered a moderate mass-rebuild at least for GNU/Linux, I'm merging this to master, mainly to give us some time to get it into the current 19.03 release branch (and subsequent testing window) once we got no new breaking builds from Hydra. Cc: @samueldr, @lheckemann Fixes: #54509 Fixes: #48828 Merges: #57641 Merges: #54508 (cherry picked from commit 12efcc2)

QEMU's local 9pfs server passes through O_NOATIME from the client. If the QEMU process doesn't have permissions to use O_NOATIME (namely, it does not own the file nor have the CAP_FOWNER capability), the open will fail. This causes issues when from the client's point of view, it believes it has permissions to use O_NOATIME (e.g., a process running as root in the virtual machine). Additionally, overlayfs on Linux opens files on the lower layer using O_NOATIME, so in this case a 9pfs mount can't be used as a lower layer for overlayfs (cf. https://github.com/osandov/drgn/blob/dabfe1971951701da13863dbe6d8a1d172ad9650/vmtest/onoatimehack.c and NixOS/nixpkgs#54509). Luckily, O_NOATIME is effectively a hint, and is often ignored by, e.g., network filesystems. open(2) notes that O_NOATIME "may not be effective on all filesystems. One example is NFS, where the server maintains the access time." This means that we can honor it when possible but fall back to ignoring it. Acked-by: Christian Schoenebeck <qemu_oss@crudebyte.com> Signed-off-by: Omar Sandoval <osandov@fb.com> Message-Id: <e9bee604e8df528584693a4ec474ded6295ce8ad.1587149256.git.osandov@fb.com> Signed-off-by: Greg Kurz <groug@kaod.org>

QEMU's local 9pfs server passes through O_NOATIME from the client. If the QEMU process doesn't have permissions to use O_NOATIME (namely, it does not own the file nor have the CAP_FOWNER capability), the open will fail. This causes issues when from the client's point of view, it believes it has permissions to use O_NOATIME (e.g., a process running as root in the virtual machine). Additionally, overlayfs on Linux opens files on the lower layer using O_NOATIME, so in this case a 9pfs mount can't be used as a lower layer for overlayfs (cf. https://github.com/osandov/drgn/blob/dabfe1971951701da13863dbe6d8a1d172ad9650/vmtest/onoatimehack.c and NixOS/nixpkgs#54509). Luckily, O_NOATIME is effectively a hint, and is often ignored by, e.g., network filesystems. open(2) notes that O_NOATIME "may not be effective on all filesystems. One example is NFS, where the server maintains the access time." This means that we can honor it when possible but fall back to ignoring it. Acked-by: Christian Schoenebeck <qemu_oss@crudebyte.com> Signed-off-by: Omar Sandoval <osandov@fb.com> Message-Id: <e9bee604e8df528584693a4ec474ded6295ce8ad.1587149256.git.osandov@fb.com> Signed-off-by: Greg Kurz <groug@kaod.org> (cherry picked from commit a5804fc) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>

samueldr added this to the 19.03 milestone Jan 23, 2019

samueldr added the 1.severity: blocker label Jan 29, 2019

vcunat mentioned this issue Jan 29, 2019

Linux kernel 4.19 #48828

Closed

peti added the 6.topic: nixos label Jan 31, 2019

samueldr added a commit to samueldr/nixpkgs that referenced this issue Feb 2, 2019

Revert "linuxPackages_4_{19,20}: works around bug with overlayfs."

196af4b

This reverts commit de86af4. (Manual revert due to conflicts.) See NixOS#54509 The patch is causing overlayfs to misbehave.

samueldr added a commit to samueldr/nixpkgs that referenced this issue Feb 2, 2019

Revert "linuxPackages: 4.14 -> 4.19"

048c36c

This reverts commit b861ebb. The current issues (See NixOS#54509 and NixOS#48828) are causing headaches to users of the unstable branches.

samueldr mentioned this issue Feb 2, 2019

Reverts kernel bump + patch #55092

Merged

danielfullmer mentioned this issue Feb 14, 2019

linux_testing_bcachefs,bcachefs-tools: 20190209 #55499

Merged

10 tasks

samueldr mentioned this issue Mar 13, 2019

Zero Hydra Failures: 19.03 edition #56826

Closed

xeji mentioned this issue Mar 13, 2019

linux_latest-libre: fix build #57454

Merged

10 tasks

aszlig mentioned this issue Mar 14, 2019

linuxPackages: 4.14 -> 4.19 #57641

Merged

aszlig closed this as completed in 4c1ddb3 Mar 18, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

OverlayFS broken on NixOS Kernel 4.19 and 4.20 #54509

OverlayFS broken on NixOS Kernel 4.19 and 4.20 #54509

bachp commented Jan 23, 2019 •

edited

Loading

bachp commented Jan 23, 2019

samueldr commented Jan 23, 2019

bachp commented Jan 26, 2019

samueldr commented Jan 26, 2019 •

edited

Loading

bachp commented Jan 26, 2019

samueldr commented Jan 27, 2019 •

edited

Loading

peti commented Jan 29, 2019 •

edited

Loading

samueldr commented Jan 29, 2019

ElvishJerricco commented Jan 29, 2019

aszlig commented Jan 29, 2019

peti commented Jan 31, 2019

bachp commented Jan 31, 2019

peti commented Feb 2, 2019

ElvishJerricco commented Feb 2, 2019

aszlig commented Feb 2, 2019 •

edited

Loading

aszlig commented Feb 2, 2019

NeQuissimus commented Feb 6, 2019

samueldr commented Feb 6, 2019

samueldr commented Feb 7, 2019

NeQuissimus commented Feb 7, 2019

samueldr commented Feb 7, 2019

bachp commented Feb 7, 2019 •

edited

Loading

tbenst commented Feb 24, 2019

samueldr commented Feb 24, 2019 •

edited

Loading

tbenst commented Feb 26, 2019

aszlig commented Mar 14, 2019

OverlayFS broken on NixOS Kernel 4.19 and 4.20 #54509

OverlayFS broken on NixOS Kernel 4.19 and 4.20 #54509

Comments

bachp commented Jan 23, 2019 • edited Loading

Issue description

Steps to reproduce

Technical details

bachp commented Jan 23, 2019

samueldr commented Jan 23, 2019

bachp commented Jan 26, 2019

samueldr commented Jan 26, 2019 • edited Loading

bachp commented Jan 26, 2019

samueldr commented Jan 27, 2019 • edited Loading

peti commented Jan 29, 2019 • edited Loading

samueldr commented Jan 29, 2019

ElvishJerricco commented Jan 29, 2019

aszlig commented Jan 29, 2019

peti commented Jan 31, 2019

bachp commented Jan 31, 2019

peti commented Feb 2, 2019

ElvishJerricco commented Feb 2, 2019

aszlig commented Feb 2, 2019 • edited Loading

aszlig commented Feb 2, 2019

NeQuissimus commented Feb 6, 2019

samueldr commented Feb 6, 2019

samueldr commented Feb 7, 2019

NeQuissimus commented Feb 7, 2019

samueldr commented Feb 7, 2019

bachp commented Feb 7, 2019 • edited Loading

tbenst commented Feb 24, 2019

samueldr commented Feb 24, 2019 • edited Loading

tbenst commented Feb 26, 2019

aszlig commented Mar 14, 2019

bachp commented Jan 23, 2019 •

edited

Loading

samueldr commented Jan 26, 2019 •

edited

Loading

samueldr commented Jan 27, 2019 •

edited

Loading

peti commented Jan 29, 2019 •

edited

Loading

aszlig commented Feb 2, 2019 •

edited

Loading

bachp commented Feb 7, 2019 •

edited

Loading

samueldr commented Feb 24, 2019 •

edited

Loading