Failed to mount sysroot on reboot for nodes with a 'large' disk #2485
Container Linux Version
When rebooting a node with a "large" disk it should be able to mount sysroot.
A node with a "large" disk fails to mount sysroot when it's rebooted:
We first observed this issue when a machine with a 3.91TB disk failed to update from 1745.7.0 to 1800.4.0. Version 1745.7.0 was still able to mount the filesystem, while 1800.4.0 gave the error described above.
It looks like a regression was introduced in kernel 4.14.55 with the ext4 changes (https://lwn.net/Articles/759535/), and from what we could gather this may be the responsible patch: https://patchwork.ozlabs.org/patch/950668/
All of our machines with smaller disks (<500GB) still boot and reboot correctly.
We are affected by this too. Our platform is bare metal with a 4TB disk.
fwiw, I did a fresh installation after doing
I tried with 1800.5.0 too, but the same issue persists.
I've cherry-picked the upcoming ext4 fixes (including the commit you linked) onto the current stable and produced a test image here: http://builds.developer.core-os.net/boards/amd64-usr/1800.5.0%2Bjenkins2-build-1800%2Blocal-1683/coreos_production_image.bin.bz2
Can you confirm that resolves the issue?
Even with the test image, I'm still able to reproduce this issue.
I took both a 1800.5.0 image (https://stable.release.core-os.net/amd64-usr/1800.5.0/coreos_production_image.bin.bz2) and the test image from @dm0- above and ran each through the following scenario.
qemu-img convert -p -O qcow2 coreos_production_image.bin coreos_production_image.qcow2
qemu-img resize coreos_production_image.qcow2 4T
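For anyone who wants to repeat the two commands above at more than one disk size, here is a rough sketch. It is not the exact procedure used in this thread: the size list, the `coreos_<size>.qcow2` output names, and the `SRC`, `SIZES`, and `DRY_RUN` variables are all assumptions made for illustration. It also assumes the `.bin.bz2` download has already been decompressed. By default it only prints the `qemu-img` commands so they can be reviewed first.

```shell
#!/bin/sh
# Sketch: prepare one resized qcow2 image per target size from the raw image.
# All variable names and output file names here are hypothetical.
set -eu

SRC="${SRC:-coreos_production_image.bin}"   # decompressed raw image (assumed path)
SIZES="${SIZES:-1T 2T 3T 4T}"               # target virtual disk sizes (assumed list)
DRY_RUN="${DRY_RUN:-1}"                     # 1 = print commands, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Convert the raw image to qcow2, then grow the virtual disk to each size.
make_images() {
    for size in $SIZES; do
        out="coreos_${size}.qcow2"
        run qemu-img convert -p -O qcow2 "$SRC" "$out"
        run qemu-img resize "$out" "$size"
    done
}

make_images
```

Running it with `DRY_RUN=0` executes the commands. Each resulting image can then be booted and rebooted to see at which size the sysroot mount starts failing.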
Did a couple more tests with different disk sizes (1TB -> 2TB -> 3TB). The 1TB and 2TB cases worked fine.
The 3TB drive case failed. I also noticed that it displayed an additional log line during the resizing:
Just to make sure, was this commit included in the test image? torvalds/linux@44de022
torvalds/linux@44de022 is not currently in the stable queue for kernels older than 4.17 because of a trivial patch conflict (see e.g. 4.14). I've reproduced the issue on 1800.5.0, and confirmed that the combination of torvalds/linux@44de022 and the other ext4 changes queued for 4.14 fixes the problem.