Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

current raid/mdadm support does not allow enough time for slow devices to come fully online #97

Closed
harmonicoscillator opened this issue Aug 21, 2021 · 16 comments

Comments

@harmonicoscillator
Copy link

I'm using booster 0.6-1 from the archlinux repositories. My root is a 2-disk RAID1 on relatively slow HDDs. I am not able to use booster to boot because I believe that booster is not waiting for the RAID1 to fully finish assembly before attempting to mount. My system does alternatively work with mkinitcpio and the systemd + mdadm_udev hooks.

When attempting to boot, I get an "unable to read superblock" error. If I wait for the mount_timeout to elapse I can manually mount the RAID device, which is already present, without error.

An abbreviated dmesg log is here (I can attach full log if requested). Notice that the "unable to read superblock" error occurs moments before the raid device comes fully online:

[    0.000000] Command line: initrd=\amd-ucode.img initrd=\booster-linux-lts.img root=/dev/md/root rootfstype=ext4 rw
[    0.000000] Kernel command line: initrd=\amd-ucode.img initrd=\booster-linux-lts.img root=/dev/md/root rootfstype=ext4 rw
[    0.130695] smpboot: CPU0: AMD Ryzen 3 2200G with Radeon Vega Graphics (family: 0x17, model: 0x11, stepping: 0x0)
[    6.369652] md_mod: module verification failed: signature and/or required key missing - tainting kernel
[    6.462668] EXT4-fs (md127): unable to read superblock
[    6.462755] booster: mount(/dev/md/root): input/output error
[    6.505324] md/raid1:md127: active with 2 out of 2 mirrors
[    6.543912] md127: detected capacity change from 0 to 999546290176
[  128.412919] EXT4-fs (md127): mounted filesystem with ordered data mode. Opts: (null)

(I'm also very surprised by the tainting kernel line but that seems unrelated)

My /etc/booster.yaml is

universal: false
compression: zstd
mount_timeout: 1m
strip: true
# I removed fsck and fsck.ext4 in the course of debugging
extra_files: busybox
enable_mdraid: true

I'm using systemd-boot and my boot entry is:

title Arch Linux
linux /vmlinuz-linux-lts
initrd /amd-ucode.img
initrd /booster-linux-lts.img
options root=/dev/md/root rootfstype=ext4 rw

My /etc/mdadm.conf is just

DEVICE partitions
ARRAY /dev/md/root metadata=1.2 name=<hostname>:root UUID=<UUID>
@anatol
Copy link
Owner

anatol commented Aug 22, 2021

Thank you for the feedback.

[    6.462668] EXT4-fs (md127): unable to read superblock
[    6.462755] booster: mount(/dev/md/root): input/output error
[    6.505324] md/raid1:md127: active with 2 out of 2 mirrors
[    6.543912] md127: detected capacity change from 0 to 999546290176

It is an interesting behavior. I would expect that mdadm --incremental leaves raid in a working condition. i.e. once mdadm completes and udev event is fired then one can use the block device. But as you pointed out it might not be the case.

Could you please enable booster debugging with booster.debug kernel param and then post your output? It will help to understand the mdadm behavior better.

BTW here are the udev rules from Arch for the same even:

ACTION=="add|change", IMPORT{program}="/usr/bin/mdadm --incremental --export $devnode --offroot $env{DEVLINKS}"
ACTION=="add|change", ENV{MD_STARTED}=="*unsafe*", ENV{MD_FOREIGN}=="no", ENV{SYSTEMD_WANTS}+="mdadm-last-resort@$env{MD_DEVICE}.timer"

It uses --incremental as well (the same as booster). It also uses --offroot flag that is not part of manpage, interesting.... And then it can call mdadm-last-resort which is labeled as "Timer to wait for more drives before activating degraded array %I.". I will need to read more about it to understand how mdadm handles it.

But other than that I do not see any delays or extra sync functionality. I wonder if booster is just "too fast" and starts mounting the block device right away, while udev has a lot of delays internally and hence we do not see this problem there.

[    6.369652] md_mod: module verification failed: signature and/or required key missing - tainting kernel

As of this warning - could you please file a ticket and we proceed its discussion separately.

@anatol
Copy link
Owner

anatol commented Aug 22, 2021

Also could you please share configuration for your mdraid array so I'll try to reproduce the issue it at my side?

@anatol
Copy link
Owner

anatol commented Aug 22, 2021

[    6.462668] EXT4-fs (md127): unable to read superblock
[    6.462755] booster: mount(/dev/md/root): input/output error
[    6.505324] md/raid1:md127: active with 2 out of 2 mirrors
[    6.543912] md127: detected capacity change from 0 to 999546290176

Returning back to this log. It looks like booster gets a udev event for /dev/md/root and assumes the device is ready. While it is not. Only 38 ms later kernel reports the array is assembled. We need to figure out under what conditions such "early" udev events are reported. Having booster.debug logs might help to shed light on it.

Also locally I added a RAID1 integration test with 5 partitions and it works fine. The array is fully assembled first and then kernel sends a udev event that is handled by booster correctly.

@anatol
Copy link
Owner

anatol commented Aug 27, 2021

@richard-cms had you chance to get the debug logs from booster? It will help to understand and fix the problem you see.

@harmonicoscillator
Copy link
Author

harmonicoscillator commented Aug 27, 2021

Sorry for the delay @anatol I've had some medical issues pop up. I will try to get you the logs with booster.debug set this weekend.

Maybe a dumb question - what is the easiest way for me to redirect the stdout, stderr output when booster.debug is set to a file? IIRC it's not part of dmesg but maybe I'm misremembering.

@anatol
Copy link
Owner

anatol commented Aug 27, 2021

I hope everything is OK with you.

Booster debug is copied to dmesg as well. Unless kmesg throttling is enabled when printk.devkmsg is set.

Here is an example from my dmesg:

[    4.683958] booster: modalias usb:v1A7Cp0191d0100dc00dsc00dp00ic03isc01ip02in00 matched module usbhid
[    4.685232] booster: modalias usb:v1A7Cp0191d0100dc00dsc00dp00ic03isc01ip02in00 matched module usbhid
[    4.686472] booster: modalias usb:v1A7Cp0191d0100dc00dsc00dp00ic03isc01ip02in00 matched module usbhid
[    4.687679] booster: modalias usb:v1A7Cp0191d0100dc00dsc00dp00ic03isc01ip02in00 matched module usbhid
[    4.694955] booster: udev event {Header:add@/devices/pci0000:00/0000:00:14.0/usb1/1-5/1-5.4/1-5.4.4/1-5.4.4.4/1-5.4.4.4.1/1-5.4.4.4.1:1.0/0003:1A7C:0191.0006 Action:add Devpath:/devices/pci0000:00/0000:00:14.0/usb1/1-5/1-5.4/1-5.4.4/1-5.4.4.4/1-5.4.4.4.1/1-5.4.4.4.1:1.0/0003:1A7C:0191.0006 Subsystem:hid Seqnum:1524 Vars:map[ACTION:add DEVPATH:/devices/pci0000:00/0000:00:14.0/usb1/1-5/1-5.4/1-5.4.4/1-5.4.4.4/1-5.4.4.4.1/1-5.4.4.4.1:1.0/0003:1A7C:0191.0006 HID_ID:0003:00001A7C:00000191 HID_NAME:Kingsis Peripherals Evoluent VerticalMouse 4 HID_PHYS:usb-0000:00:14.0-5.4.4.4.1/input0 HID_UNIQ: MODALIAS:hid:b0003g0001v00001A7Cp00000191 SEQNUM:1524 SUBSYSTEM:hid]}
[    4.696144] booster: no match found for alias hid:b0003g0001v00001A7Cp00000191
[    4.697834] usb 1-6.5.1: new high-speed USB device number 13 using xhci_hcd

@harmonicoscillator
Copy link
Author

Okay I have some good news and bad news: I'm no longer able to reproduce the issue, but I've changed my boot process dramatically so I might just be hiding the issue now.

My disk setup is now:
sdb1 and sda2 are in a RAID1, and a new encrypted LUKS2 filesystem is on top of the RAID1. The LUKS2 has 3 keyslots: one passphrase, 1 TPM key from systemd-cryptenroll and lastly a TANG key from clevis pinging my local router, 192.168.1.1.

I am able to boot fully, including automated decryption of the LUKS2 root, using the following /etc/booster.yaml:

network:
  interfaces: enp4s0,e0:d5:5e:69:1d:ce
  dhcp: on
universal: false
compression: zstd
mount_timeout: 1m
strip: true
extra_files: busybox,fsck,fsck.ext4
enable_mdraid: true

I also have enabled secureboot, and the embedded kernel commandline in the signed efistub is now:
rd.luks.name=0fce166f-6e83-4422-ac01-21ba33db9e0e=cryptroot root=/dev/mapper/cryptroot booster.debug rw

My /etc/mdadm.conf remains the same as before:

DEVICE partitions
ARRAY /dev/md/root metadata=1.2 name=ryzen3:root UUID=45934a26:f39b54c4:7da24a3f:62c2c57e

I will attach the full boot log, with booster.debug set, momentarily.

@harmonicoscillator
Copy link
Author

@harmonicoscillator
Copy link
Author

For my own education, I can't actually tell how booster is decrypting my LUKS2 partition according to those logs - I see that it attempts to ping my TANG router but gets a no network message, I never see any network come-up messages, I also don't see any mention of the TPM keys, and I definitely didn't put in my password, so I'm wondering what happened :).

On the direct issue of the RAID1 previously not coming up in time, I wonder if setting the kernel commandline arguments to wait for the cryptdevice might mask that problem by making it explicitly wait for the appearance of that UUID to attempt mounting.

@anatol
Copy link
Owner

anatol commented Aug 27, 2021

I'm no longer able to reproduce the issue

It sounds like a timing issue. But to debug the issue I need the logs from the failed use-case.

Here are your log entries related to mdraid assembly:

[   15.585588] booster: blkinfo for /dev/sdb1: type=mdraid UUID=45934a26-f39b-54c4-7da2-4a3f62c2c57e LABEL=
[   15.645600] booster: trying to assemble mdraid array 45934a26-f39b-54c4-7da2-4a3f62c2c57e
[   15.730231] md/raid1:md127: active with 2 out of 2 mirrors
[   15.750164] booster: udev event {Header:add@/devices/pci0000:00/0000:00:08.1/0000:08:00.4/usb5/5-1/5-1.2/wakeup/wakeup33 Action:add Devpath:/devices/pci0000:00/0000:00:08.1/0000:08:00.4/usb5/5-1/5-1.2/wakeup/wakeup33 Subsystem:wakeup Seqnum:1451 Vars:map[ACTION:add DEVPATH:/devices/pci0000:00/0000:00:08.1/0000:08:00.4/usb5/5-1/5-1.2/wakeup/wakeup33 SEQNUM:1451 SUBSYSTEM:wakeup]}
[   15.764771] md127: detected capacity change from 0 to 999546290176
[   15.888529] booster: udev event {Header:add@/devices/virtual/bdi/9:127 Action:add Devpath:/devices/virtual/bdi/9:127 Subsystem:bdi Seqnum:1453 Vars:map[ACTION:add DEVPATH:/devices/virtual/bdi/9:127 SEQNUM:1453 SUBSYSTEM:bdi]}
[   15.922909] booster: udev event {Header:add@/devices/virtual/block/md127 Action:add Devpath:/devices/virtual/block/md127 Subsystem:block Seqnum:1454 Vars:map[ACTION:add DEVNAME:md127 DEVPATH:/devices/virtual/block/md127 DEVTYPE:disk MAJOR:9 MINOR:127 SEQNUM:1454 SUBSYSTEM:block]}
[   15.934051] booster: found a new device md127
[   15.951651] booster: blkinfo for /dev/md127: type=luks UUID=0fce166f-6e83-4422-ac01-21ba33db9e0e LABEL=
[   15.962775] booster: loading module dm_mod
[   15.973463] booster: loading module trusted
[   15.984123] booster: loading module cbc
[   15.988261] Key type trusted registered
[   15.990311] device-mapper: uevent: version 1.0.3

It looks fine.

Was you able to boot successfully without* debug enabled as well?

1 TPM key from systemd-cryptenroll

BTW you can use TPM with clevis/booster as well.

I see that it attempts to ping my TANG router but gets a no network message

Booster has a retry logic for a tang service availability. The connect() might fail because a network is not available or DHCP is not completed yet. So booster prints the error message, sleeps for a while and then tries to connect again. Eventually it is able to reach a tang service and unlocks the block device.

@harmonicoscillator
Copy link
Author

Was you able to boot successfully without* debug enabled as well?

No! I confirmed a few times that I could not boot unless the booster.debug flag was present.

There is no visible error that shows up now, just the timeout waiting for root. The dmesg log shows that /dev/md127 does eventually get assembled but no actions are taken after it is.

@anatol
Copy link
Owner

anatol commented Aug 28, 2021

No! I confirmed a few times that I could not boot unless the booster.debug flag was present.

It is interesting... It also gives an opportunity to understand what is going on there.

I just pushed a commit to wip branch to print mdadm output to stdout. If you have a chance could you please compile booster from wip branch and run it without booster.debug. There should be extra information at the console. Please capture it and post it here.

@harmonicoscillator
Copy link
Author

I rebuilt booster init from the wip branch and the following additional lines are printed during boot (debug.booster is not set). Boot is not successful, it hangs after these messages. These are transcribed by hand from screen, sorry if there are minor typos.

&{Header:add@/devices/virtual/block/md127 Action:add Devpath:/devices/virtual/block/md127 Subsystem:block Seqnum:1399 Vars:map[ACTION:add DEVNAME:md127 DEVPATH:/devices/virtual/block/md127 DEVTYPE:disk MAJOR:9 MINOR:127 SEQNUM:1299 SUBSYSTEM:block]}

mdadm : /dev/sda2 attached to /dev/md/root, not enough to start safely.
&{Header:change@/devices/virtual/block/md127 Action:change Devpath:/devices/virtual/block/md127 Subsystem:block Seqnum:1399 Vars:map[ACTION:change DEVNAME:md127 DEVPATH:/devices/virtual/block/md127 DEVTYPE:disk MAJOR:9 MINOR:127 SEQNUM:1299 SUBSYSTEM:block]}

mdadm:  /dev/sdb1 attached to /dev/md/root, which has been started.

anatol added a commit that referenced this issue Aug 28, 2021
Some of the dmraid device configurations might exist in non-safe
(non-working) mode. An example is an array is constructed incrementally from
multiple devices. The first device creates an array that is repored via
udev, but this array is not usable, it needs more block devices added to
start serving I/O.

Instead of starting to use an mdarray at "add" udev device we have to
check whether 'mdadm --incremental' actually starts the array. Use 'mdadm
--export' to get status of the array and mount it only when MD_STARTED
property is set (i.e. array is live).

Fixes #97
@anatol
Copy link
Owner

anatol commented Aug 28, 2021

Thank you for the logs. It looks like there is a short period of time when the raid1 array exists but not fully assembled yet. Instead of relying on udev "add" event booster should use mdadm --export and check whether mdadm was able to start the array.

Here is a proposed fix for the problem that has been pushed to wip branch. @richard-cms please give it a try and let me know it fixes your problem (both with debug enabled and disabled).

@harmonicoscillator
Copy link
Author

I have rebuilt based on 3fd9444 (HEAD of wip as of 2021-08-29 EST afternoon) and I now successfully boot both with and without booster.debug set. I rebooted a few times, including 'cold boots', to see if there were any common race conditions and did not run in to any.

Attached is the dmesg log for a successful boot with booster.debug set. Thanks very much @anatol !
2021-08-29-booster_debug_dmesg.txt

@anatol
Copy link
Owner

anatol commented Aug 29, 2021

It is great to hear that it fixed this issue! Thank you very much for discovering and debugging this problem @richard-cms!

I am going to test the code at my machines and if everything is ok i'll push it to master branch soon.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants