With dracut 100, nvme kernel drivers are not included in hostonly mode #130

Closed
dalto8 opened this issue Apr 6, 2024 · 14 comments · Fixed by #133

@dalto8
Contributor

dalto8 commented Apr 6, 2024

Describe the bug
When building the initrd with dracut --hostonly --no-hostonly-cmdline, the nvme kernel drivers are not included even though nvme devices are installed and in use on the system. After downgrading to 059, the drivers are included again.

Distribution used
Arch Linux

Dracut version
100

Init system
systemd

Please let me know if I can provide any further information.

dalto8 added the bug label Apr 6, 2024
@LaszloGombos
Contributor

LaszloGombos commented Apr 6, 2024

@dalto8 would have been great to try v60 but I see it is no longer available as the upgrade is from 059 --> 100.

@dalto8 can you please share the log (dracut -v ...)

You can also just temporarily add the drivers and confirm that the workaround works - https://wiki.archlinux.org/title/Dracut

CC @freswa

@dalto8
Contributor Author

dalto8 commented Apr 6, 2024

Here is the log with dracut -v https://pastebin.mozilla.org/vbT0oBGO

I suspect it will work if I add the driver manually since it is added when it isn't in hostonly mode but I will test it.

If needed, I can build and test v60.

EDIT: Using add_drivers did include the nvme drivers.
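
For anyone reproducing this, the workaround and the check can be sketched roughly as follows. The drop-in file name and the helper `initrd_has_nvme` are assumptions made for illustration only; `add_drivers` is the dracut.conf option mentioned above, and `lsinitrd` is the listing tool shipped with dracut. The helper only greps a file listing for nvme module files, so it reads the listing from standard input.

```shell
# Workaround drop-in (the file name is an assumption), e.g.
# /etc/dracut.conf.d/nvme.conf containing the line:
#   add_drivers+=" nvme "

# Hypothetical helper, not part of dracut: succeeds if a file listing
# read from stdin contains an nvme kernel module (nvme.ko, nvme-core.ko,
# optionally with a compression suffix such as .zst).
initrd_has_nvme() {
    grep -Eq '/nvme[^/]*\.ko'
}
```

Typical use would be `lsinitrd <image> | initrd_has_nvme && echo "nvme included"`, run once with and once without the drop-in.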

@LaszloGombos
Contributor

LaszloGombos commented Apr 6, 2024

I see no change around the nvme driver (file changed but not the line) - https://github.com/dracut-ng/dracut-ng/blob/main/modules.d/90kernel-modules/module-setup.sh#L30

Can you confirm that the kernel-modules dracut module itself is included?

@dalto8
Contributor Author

dalto8 commented Apr 6, 2024

This is in the logs:

dracut[I]: *** Including module: kernel-modules ***

Is there something else I should check?

@dalto8
Contributor Author

dalto8 commented Apr 7, 2024

I see no change around the nvme driver (file changed but not the line) - https://github.com/dracut-ng/dracut-ng/blob/main/modules.d/90kernel-modules/module-setup.sh#L30

It isn't even reaching that line. It isn't entering this if block: https://github.com/dracut-ng/dracut-ng/blob/main/modules.d/90kernel-modules/module-setup.sh#L107

@dalto8
Contributor Author

dalto8 commented Apr 7, 2024

I built the Arch packages for 60-1 and 60-2. (The latter removes some patches.)

Both those versions include the nvme driver. Only when upgrading to 100 does it stop being included.

@freswa

freswa commented Apr 7, 2024

@dalto8 Could you post your dracut config and the command you run dracut with please?

@LaszloGombos
Contributor

Perhaps caused by this #36 ?

@dalto8
Contributor Author

dalto8 commented Apr 7, 2024

@dalto8 Could you post your dracut config and the command you run dracut with please?

The command is dracut --hostonly --no-hostonly-cmdline initrd 6.8.2-arch2-1

The config file is unchanged but I do have a drop in that has this in it.

omit_dracutmodules+=" network cifs nfs nbd brltty i18n "
#hostonly_mode=strict
compress="zstd"

I commented out hostonly_mode=strict as my first troubleshooting step.

Perhaps caused by this #36 ?

Let me revert that and test.

@dalto8
Contributor Author

dalto8 commented Apr 7, 2024

Perhaps caused by this #36 ?

Yes. That seems to be it. Specifically, the change to line 1670.

If I revert that change, it works again.

@aafeijoo-suse
Contributor

Yes. That seems to be it. Specifically, the change to line 1670.

If find_block_device returns with an error, the following code is borked. E.g., when /etc is mounted as overlay, it's adding the empty string '' as host device, breaking functions that are expecting a real device, like get_maj_min.
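
To illustrate the failure mode described here, the following is a minimal sketch of a guard that keeps an empty or non-block-device name out of a device list. It is not dracut's code: `add_host_device` and `host_devs` are hypothetical names, and the only real facility used is the shell's `-b` test for a block device node.

```shell
host_devs=""

# Hypothetical helper (not part of dracut): record a device only if the
# argument is non-empty and refers to a block device node.
add_host_device() {
    if [ -z "$1" ]; then
        echo "skipping empty device name" >&2
        return 1
    fi
    if [ ! -b "$1" ]; then
        echo "skipping $1: not a block device" >&2
        return 1
    fi
    host_devs="$host_devs $1"
}
```

With a guard like this, a failed lookup that yields '' never reaches code that expects a real device.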

So, as a first action it would be great to check what's the output of dracut --debug --hostonly --no-hostonly-cmdline initrd 6.8.2-arch2-1 with the change added again.

@dalto8
Contributor Author

dalto8 commented Apr 11, 2024

If find_block_device returns with an error, the following code is borked. E.g., when /etc is mounted as overlay, it's adding the empty string '' as host device, breaking functions that are expecting a real device, like get_maj_min.

I suspect it may be hard to identify all the cases that could cause this condition to occur but what about scenarios in which / is not mounted directly on a block device? In this case, the underlying device still needs to be added to the list.

The reason it fails in my case is probably that the code has cases to handle finding the underlying block devices for zfs datasets and btrfs filesystems. However, if the mountpoint is mounted on something other than a block device, that continue causes those blocks of code to be skipped, so the underlying devices are never identified.

Instead of continue there, it should probably skip only the sections of code related to the $_dev it finds, rather than the rest of the loop in its entirety.
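
The difference between the two approaches can be sketched with a small stand-alone loop. This is only an illustration under stated assumptions, not the dracut source: the `fake_*` functions, the mount points, and the device names are all hypothetical stand-ins for the block-device lookup and the zfs/btrfs handling.

```shell
# Hypothetical stand-in for the block-device lookup: succeeds only for
# mount points that sit directly on a block device.
fake_find_block_device() {
    case "$1" in
        /boot) echo "/dev/nvme0n1p1" ;;
        *) return 1 ;;
    esac
}

# Hypothetical stand-in for the filesystem-specific handling later in the
# loop body, which finds the devices underneath a mount point.
fake_fs_member_devices() {
    case "$1" in
        /) echo "/dev/nvme0n1p2" ;;
    esac
}

# Variant A: a failed lookup skips the whole iteration, including the
# filesystem-specific handling.
collect_with_continue() (
    devs=""
    for mp in "$@"; do
        dev=$(fake_find_block_device "$mp") || continue
        devs="$devs $dev"
        members=$(fake_fs_member_devices "$mp")
        if [ -n "$members" ]; then devs="$devs $members"; fi
    done
    printf '%s\n' "${devs# }"
)

# Variant B: a failed lookup skips only the step that uses its result;
# the filesystem-specific handling still runs.
collect_with_guard() (
    devs=""
    for mp in "$@"; do
        if dev=$(fake_find_block_device "$mp"); then
            devs="$devs $dev"
        fi
        members=$(fake_fs_member_devices "$mp")
        if [ -n "$members" ]; then devs="$devs $members"; fi
    done
    printf '%s\n' "${devs# }"
)
```

Called with `/ /boot`, variant A reports only the device behind /boot, while variant B also reports the device underneath /.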

@aafeijoo-suse
Contributor

A debug log would help us understand what is happening on your system so that we can add a proper fix. As I said, the revert also breaks other systems. You do not need to break your system: you can just modify the line, run dracut --debug --hostonly --no-hostonly-cmdline test.img 2>&1 &> dracut.log, and attach the dracut.log file.

@dalto8
Contributor Author

dalto8 commented Apr 11, 2024

Yes, I already reviewed the logs and analyzed the relevant code to determine the issue. That is where my description above came from.

My mountpoints are not mounted on block devices. However, the prior fix skipped the entire remainder of the loop, not just adding that specific device.

I created #155 that works for my situation and hopefully resolves the issue you were facing as well.
