Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

multipath fixes #1664

Merged
merged 4 commits into from Feb 2, 2022
Merged

multipath fixes #1664

merged 4 commits into from Feb 2, 2022

Conversation

mwilck
Copy link
Contributor

@mwilck mwilck commented Dec 7, 2021

Changes

Fix some issues in multipathd.service, and eliminate the differences
wrt the unit file shipped with multipath-tools upstream. The ultimate
goal is to be able drop the service unit file from dracut and use the
one from upstream. I have sent a patch series to the dm-devel mailing
list today ("[PATCH 0/4] multipathd: service unit fixes").

Also, despite the recent changes in upstream multipath-tools,
we still get warnings about systemd-udev-settle.service, this time
for multipathd-configure.service. It would be nice to get rid of them,
too. This patch set avoids installing that unit file if mpathconf isn't
available.

@bmarzins, you're invited to push a fix for multipathd-configure.service
to this branch. I expect that simply dropping the dependencies on
systemd-udevd-settle.service will suffice. I won't attempt to make
these fixes myself, as I'm not familiar enough with mpathconf-based
multipath setup.

Checklist

  • I have tested it locally
  • I have reviewed and updated any documentation if relevant
  • I am providing new code and test(s) for it

The dependency of multipathd on "udev settle" has recently been removed in
c9689b6 ("multipathd: Remove dependency on systemd-udev-settle.service").

But this dependency has never been necessary in the initramfs environment. It
was only required after switching root, because multipathd would potentially
tear down valid multipath maps after switching from initrd to root FS. This can
happen because dm devices "survive" the root FS switch in the udev data
base (they have the "db_persist" flag set), whereas their component devices
(SCSI etc) do not. But this can only happen after initrd-udevadm-cleanup-db.service
has been run, which happens after initrd processing.

The only dependency that's really needed is that on
systemd-udevd-kernel.socket, because multipathd depends on uevents for
devices being delivered via systemd-udevd.
In the long run, it's desirable to be able to drop dracut's copy of
multipathd.service and use the upstream one from multipath-tools instead.
This patch makes a step in that direction.

With these changes, the only remaining difference is the support for
rd.multipath=0 and rd_NO_MULTIPATH, which must obviously be ignored in the
upstream unit.

The modifications in this patch are minor and will have no effect in the
initramfs.
@github-actions github-actions bot added modules Issue tracker for all modules multipath Issues related to the multipath module labels Dec 7, 2021
@mwilck
Copy link
Contributor Author

mwilck commented Dec 7, 2021

@bmarzins should be added as a reviewer here.

@mwilck
Copy link
Contributor Author

mwilck commented Dec 7, 2021

fixed shfmt issues.

@johannbg
Copy link
Collaborator

johannbg commented Dec 7, 2021

Given that you are dropping the cmd section I have to ask are there some upstream changes in places so there is no longer a requirement to load the modules before udev is started?

@mwilck
Copy link
Contributor Author

mwilck commented Dec 8, 2021

Given that you are dropping the cmd section I have to ask are there some upstream changes in places so there is no longer a requirement to load the modules before udev is started?

Hold on, I made a mistake. dm_multipath must still be loaded (strange that I saw no problems in testing). But for the device handlers, the kernel has an autoload mechanism.

@johannbg
Copy link
Collaborator

johannbg commented Dec 8, 2021

Given that you are dropping the cmd section I have to ask are there some upstream changes in places so there is no longer a requirement to load the modules before udev is started?

Hold on, I made a mistake. dm_multipath must still be loaded (strange that I saw no problems in testing). But for the device handlers, the kernel has an autoload mechanism.

You will need to revisit those changes again since it's failing the shell check.
If your intent is to remove the device handler kernel modules from the for loop ( which will re-introduce I/O errors being printed to the system log, which in turn will flood RH and Suse's issue trackers with complaints about exactly that, which I'm pretty sure no one wants ) you might just as well drop the for loop being used in the process since having a for loop for a single kernel module serves no purpose.

My suggestion is simply leave the cmd section there as it is...

@bmarzins
Copy link
Contributor

@bmarzins, you're invited to push a fix for multipathd-configure.service to this branch. I expect that simply dropping the dependencies on systemd-udevd-settle.service will suffice. I won't attempt to make these fixes myself, as I'm not familiar enough with mpathconf-based multipath setup.

I'm pretty sure that bmarzins/dracut@edaa7e8 is correct, but I'd like to defer to @darkmuggle, since he wrote the original unit file, and I've had issues trying to tinker around with multipathed Fedora CoreOS.

@mwilck mwilck force-pushed the multipath-fixes branch 2 times, most recently from 443ba8c to bb77408 Compare December 13, 2021 23:23
@mwilck
Copy link
Contributor Author

mwilck commented Dec 13, 2021

You will need to revisit those changes again since it's failing the shell check.

Done.

If your intent is to remove the device handler kernel modules from the for loop ( which will re-introduce I/O errors being printed to the system log, which in turn will flood RH and Suse's issue trackers with complaints about exactly that, which I'm pretty sure no one wants )

I'm curious where you take this insight from. The code for preloading the handlers was added with 856f826, 7 years ago. I've read throught the bug referenced in the commit; it was actually not about IO errors flooding the logs, but about SCSI device handlers missing. The distro that proposed the change (SUSE) was running kernel 3.12.28. At that time, the autoload mechanism of the kernel for SCSI device handlers was incomplete, and apparently didn't work reliably. In kernel 4.3 (2015), the kernel's autoload logic for SCSI device handlers has been rewritten; it attaches handlers as soon as SCSI devices are detected, and uses a built-in table to select the right handler. This works nicely, and avoids loading modules that aren't necessary. (Update 2022-01-24: this assertion was false).

My suggestion is simply leave the cmd section there as it is...

Fair enough, thank you. I think it's time to remove some of the accumulated historic cruft here. As explained above, the problem for which this code was a workaround has been solved for good, 6 years ago.

@bmarzins, @hreinecke: please speak up if you see issues that I've overlooked.

Update 2022-01-24: I actually overlooked something, see below.

(FTR: as far as the kernel is concerned, we could rely on autoload for dm-multipath as well. But multipath and multipathd don't deal with it well, because they want to check the dm-multipath version. So for the time being, we must keep force-loading dm-multipath).

@johannbg
Copy link
Collaborator

My suggestion is simply leave the cmd section there as it is...

Fair enough, thank you. I think it's time to remove some of the accumulated historic cruft here. As explained above, the problem for which this code was a workaround has been solved for good, 6 years ago.

Then there is no reason to keep it around anymore

@johannbg
Copy link
Collaborator

@mwilck given that I fail to see the practicality of this I have to ask what are you hoping to gain from moving the
ConditionKernelCommandLine=!rd.multipath=0 ConditionKernelCommandLine=!rd_NO_MULTIPATH
section in the multipathd.service to an snippet for that same service which then adds the section back to the unit you just removed it from?

@mwilck
Copy link
Contributor Author

mwilck commented Dec 15, 2021

@mwilck given that I fail to see the practicality of this I have to ask what are you hoping to gain from moving the ConditionKernelCommandLine=!rd.multipath=0 ConditionKernelCommandLine=!rd_NO_MULTIPATH section in the multipathd.service to an snippet for that same service which then adds the section back to the unit you just removed it from?

The idea is to ship exactly the same unit file that multipath-tools itself is shipping, with the long-term goal to be able to drop the file from dracut and use the upstream unit. That would make it fare easier to apply changes in the future, and release the burden of maintaining it from the dracut mainainers ;-)

I have just pushed the respective multipath-tools changes to the patch queue for the forthcoming multipath-tools upstream release. With that change and this PR, the multipathd.service files would be identical. The multipath-tools shipped file can't use ConditionKernelCommandLine=!rd.multipath=0, because rd.FOO parameters are meant to be respected only in the initramfs.

@johannbg
Copy link
Collaborator

johannbg commented Dec 16, 2021

@mwilck it does not change anything using the upstream unit if we then have to ship+maintain an override snippet for that unit. It literally solves nothing from maintainance perspective ( since we still have to maintain a file ) , if anything it complicates it.

Anyways when that happens ( the main unit gets dropped ) it's probably best that we turn the existing multpath module into a meta module ( which we needed to do anyways ) and introduce two new modules, one systemd based and another one for legacy/alternative init systems.

@mwilck
Copy link
Contributor Author

mwilck commented Dec 16, 2021

@johannbg, I don't follow. The rd. stuff is dracut-specific and thus must be shipped with dracut. The snippet is self-contained. The only case in which it would cease working is when we renamed the unit file, in which case all else would fail as well. Your maintenance effort for this snippet would be basically zero. With the snippet in place and the main unit file removed, dracut could simply install the upstream unit file and be sure systemd does the right thing.

I don't know what a "meta module" is. I observed that there are very few modules for which dracut ships its own unit files (AFAICS the only one besides 90multipath is 06rngd). 90multipath is an exception, and I'd like to get rid of that exception status. I expected you'd like the prospect, too.

Even if it makes no difference to you, I'd appreciate if you could accept the fact that it does make a difference for the multipath-tools maintainers.

If the snippet really isn't the way to go, would you rather accept a generator that would remove /etc/systemd/system/sysinit.target.wants/multipathd.service in the initramfs if rd.multipath=0 was set? Or do you have other suggestions?

@mwilck
Copy link
Contributor Author

mwilck commented Dec 21, 2021

So how can we get on the same boat here?

@johannbg
Copy link
Collaborator

johannbg commented Dec 21, 2021

@mwilck generators are always the best course of action for upstreams in general since they will always work on systemd based system regardless of the initramfs generator being used, be it dracut or something else so that's a solution that upstreams should be working towards.

Expecting that systemd does the right thing well that is not always the case, earlier this year they convinced the dbus-broker maintainers that dbus did not belong in the initramfs, upstream dbus-broker made changes based on that assumption and that broke any dbus depent module ( like blutetooth, nm etc ) in the initramfs and upstream dbus-broker reverted that once that misunderstanding got cleared up.

And just recently systemd upstream added an "ConditionNeedsUpdate=" to it's systemd-sysuser upstream unit file, that broke systemd-resolved in the initramfs since it's upstream unit file for systemd-resolved depend on systemd-sysuser and we are currently overriding that questionable decision so fourth and so on.

Which brings us to the second/third best option is to ship our own or include existing upstream unit file(s) via githook ( which we would then ship ) that pulls that from upstream repository when a new dracut is released as opposed to include them from the installed system due to downstream being on different release cycles, different releases of core/baseOS components like shipping newer dracut with old multipath release etc. in otherwords we and other upstreams cant rely on downstream shipping the correct release, files etc.

The reason why the above is not currently being done is lack of time to implement it ( the hooks and or shipping the units ) something that I'm planning on doing with #1483

Which brings us to the meta modules.

We have to start splitting modules into two modules, systemd based and none systemd based ones since upstreams are increasingly becoming systemd only or are support multiple init systems via configuration knob and the best way to deal with that is to split the modules into two ( systemd based and non-systemd based ) and use a meta module ( like 09dbus ) to pick which one of those two modules to use.

If multipath upstream has decided to be systemd only then we can just drop the "legacy" code in the existing module but if it is not systemd only then we need to split the module into two other modules ( since we want to achive clean seperation ) and turn the existing module into a meta module that picks either of those two new modules depending on the init system in use.

Least options for modules are snippets and shell shebangs and the inclution of downstream (read as distro shipped ) units.

@mwilck
Copy link
Contributor Author

mwilck commented Dec 21, 2021

@mwilck generators are always the best course of action for upstreams in general since they will always work on systemd based system regardless of the initramfs generator being used, be it dracut or something else so that's a solution that upstreams should be working towards.

But here, the generator would be dracut-specific. We (as upstream multipath) aren't concerned with rd.multipath=0. I'm not sure if other initramfs generators support this option, and I don't see it as my responsibility to find out.

It's impossible to model this logic ("don't start this service if if multipath=off, or if rd.multipath=0 and we are running in the initramfs") with multiple Condition statements in a single systemd unit file. That's why I still favor the drop-in.

The generator would be possible but far less elegant. The generator couldn't (as usual) create files under /run because that would not only affect the initramfs but also the booted system, which is not intended. So it would need to delete files under /etc/systemd in the initramfs. I don't like that much and it might even fail if the initramfs is mounted read-only (not sure if that's possible).

Expecting that systemd does the right thing well […]

Interesting information. I'm sorry to hear about the troubles you had. But I'm not sure why it would matter here. My point is that if the multipathd.service file worked, the drop-in would do what it's supposed to (cause it to be skipped if rd.multipath=0 was set).

Which brings us to the second/third best option is to ship our own or include existing upstream unit file(s) via githook ( which we would then ship ) […]

Fine with me, tell me what we as upstream have to do to make this possible. You just have to be aware that you'd need to patch your file with the code to support rd.multipath=0.

Which brings us to the meta modules.

If multipath upstream has decided to be systemd only then we can just drop the "legacy" code in the existing module but if it is not systemd only then we need to split the module into two other modules ( since we want to achive clean seperation ) and turn the existing module into a meta module that picks either of those two new modules depending on the init system in use.

So we haven't "decided" anything explicitly. But due to the fact that the two de-facto maintainers of multipath-tools work for distributions that use systemd exclusively, little attention has been paid to non-systemd setups during the last 5+ years. TBH I have no idea how well this works, and I wouldn't be suprised if it doesn't. Multipath setup during boot is notoriously hard to get right, and we've spent quite a bit of work to make it work reliably — with systemd and udev. multipath-tools upstream ships neither a SysV init script nor configuration files for upstart or the like. There have been no complaints about that for years.

OTOH, while I'm uncertain if the old scripts still work, they might as well do, I just don't know, so ditching them doesn't seem right, either.

Least options for modules are snippets and shell shebangs and the inclution of downstream (read as distro shipped ) units.

Ok, I thought you wanted that. If you don't, fine with me. In this case we can just leave the initramfs-specific lines in the file for now. Once you implement the git-hook, we could start the discussion over.

@johannbg
Copy link
Collaborator

@mwilck unfortunetly @haraldh did not comment why disable multipath detection was needed in the first places when he added it in 2014, a need which might be obsoleted at this point since upstream does not need it ( and perhaps never needed it ).

Maybe we can simply drop the ability to disable mpath and be without that mess altogether @haraldh ping @bmarzins do you have any clue why this was needed in the first place ( I suspect Anaconda )?

@mwilck
Copy link
Contributor Author

mwilck commented Dec 21, 2021

@johannbg, disabling multipath may well be useful. But disabling it in the initramfs and not in the booted root is highly dangerous. I see no use case for disabling multipath only in initramfs that couldn't also be handled by disabling multipath globally.

These dependencies are redundant and will be dropped in upstream
multipath-tools, too.
@mwilck
Copy link
Contributor Author

mwilck commented Dec 21, 2021

I've re-pushed without patch moving the dracut-specific directives into the drop-in, hoping that the set is acceptable in this form. This means that the upstream service file and dracut's still differ by these two lines.

If you want the drop-in back, notify me.

@mwilck
Copy link
Contributor Author

mwilck commented Dec 21, 2021

Fixed another issue found by @aafeijoo-suse in openSUSE#118 (if multipathd-configure.service isn't installed it shouldn't be enabled, either).

@hreinecke
Copy link
Contributor

You will need to revisit those changes again since it's failing the shell check.

Done.

If your intent is to remove the device handler kernel modules from the for loop ( which will re-introduce I/O errors being printed to the system log, which in turn will flood RH and Suse's issue trackers with complaints about exactly that, which I'm pretty sure no one wants )

I'm curious where you take this insight from. The code for preloading the handlers was added with 856f826, 7 years ago. I've read throught the bug referenced in the commit; it was actually not about IO errors flooding the logs, but about SCSI device handlers missing. The distro that proposed the change (SUSE) was running kernel 3.12.28. At that time, the autoload mechanism of the kernel for SCSI device handlers was incomplete, and apparently didn't work reliably. In kernel 4.3 (2015), the kernel's autoload logic for SCSI device handlers has been rewritten; it attaches handlers as soon as SCSI devices are detected, and uses a built-in table to select the right handler. This works nicely, and avoids loading modules that aren't necessary.

It works nicely, but out of necessity only after the device has been probed (as the probing mechanism needs the contents of the inquiry data). Which means that during probing no handlers are attached, and we will be seeing I/O errors eg on standby paths.

@mwilck
Copy link
Contributor Author

mwilck commented Dec 22, 2021

It works nicely, but out of necessity only after the device has been probed (as the probing mechanism needs the contents of the inquiry data). Which means that during probing no handlers are attached, and we will be seeing I/O errors eg on standby paths.

I've contacted @hreinecke directly to clarify this, as I am unaware of this sort if issue with recent kernels and multipath-tools versions. Of course I could be overlooking something. @bmarzins, what's your take?

@mwilck
Copy link
Contributor Author

mwilck commented Jan 19, 2022

I've contacted @hreinecke directly to clarify this, as I am unaware of this sort if issue with recent kernels and multipath-tools versions. Of course I could be overlooking something. @bmarzins, what's your take?

After extensive discussions with @hreinecke and testing in various environments, @hreinecke agreed that this patch set (in particular, not early-loading SCSI device handler modules any more) is correct.

@hreinecke, please confirm if you have time.

done
if grep -m 1 -q dm_multipath /proc/modules; then
printf 'rd.driver.pre=%s ' dm_multipath
fi
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder is this is required at all; I'd have thought that multipathd should be able to load 'dm-multipath' on its own. But until then I'm fine with this patch.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right - we've got an ExecStartPre=-/sbin/modprobe dm-multipath in the unit file, which should be sufficient.

We can't start multipathd without loading dm-multipath beforehand, because multipathd makes libdm calls to figure out the version of the multipath target, which will fail if the module isn't loaded. While implenting that functionality in multipathd itself would be simple, it's still easier to keep the ExecStartPre line.

@johannbg, if it's ok for you I'll change that.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Modprobe should never be called from a systemd unit so that ExecStartPre= should be removed not kept or used.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've never heard this before. Can you provide any backing /reasoning for this statement? What's wrong with calling modprobe in ExecStartPre?

Copy link
Collaborator

@johannbg johannbg Jan 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting I thought that was common knowledge by now since it's been like a decade ago since I went throught this whole dialog when handling sysv to systemd migration ( along with the whole remove modules etc ) anyways modules should be auto-loaded using MODALIAS and or similar constructs in the kernel modules + udev or if people really needed to load a module statically, they should do so via /etc/modules-load.d/ but loading module should never be done in type unit files in systemd.

There should never be any need to load a module statically these days unless for debugging/testing purposes, if there is any need beside that something is broken.

Copy link
Contributor

@hreinecke hreinecke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch.

Not all distributions utilize and ship the mpathconf utilitiy.
Avoid error messages and systemd complaints about
multipathd-configure.service in this case.
auto-merge was automatically disabled January 24, 2022 07:28

Head branch was pushed to by a user without write access

@mwilck
Copy link
Contributor Author

mwilck commented Jan 24, 2022

TL;DR: We still need to force-load the SCSI device handler modules.

Thank you for not having merged this yet. I just pushed a new version of this PR without the broken commit 1ba61906 ("fix(multipath): don't force-load SCSI device handlers").

The auto-detection for SCSI device handlers, which I talked about further up in this conversation, has been reverted before the release of kernel 4.3, with kernel commit d6a32b98099f ("scsi_dh: don't try to load a device handler during async probing"). Unfortunately I'd overlooked that, despite staring at the code intensively. I apologize to everyone, in particular @johannbg, for dismissing their arguments.

If we rely on autoloading in a multipath setup, the handlers are not actually loaded during SCSI probing, but shortly after, when multipathd is started. Consequently,

  1. without multipath, no SCSI device handlers will be loaded;
  2. with multipath, the handlers are effective as usual, except that the sysfs attribute /sys/class/block/sd$X/device/access_state is invisible, because visibility is determined before the handler is loaded — which is fatal e.g. for using the multipathd sysfs prioritizer. Apparently there is no way to make these attributes visible after the fact; not even deleting and re-scanning a device changes the situation.

I believe 2.) is a kernel bug for which I'll create a patch, but for now we need to keep the existing workaround to load the device handlers early.

Note that the SCSI device handlers are not only useful for multipath scenarios. They help detecting and handling certain error conditions correctly (and thus missing them might actually cause the above-mentioned I/O errors). Therefore it would make sense to always load these drivers (at least scsi_dh_alua, which covers basically all modern hardware) before SCSI probing, not only if multipath is active.

@johannbg, wrt your comment above:

modules should be auto-loaded using MODALIAS and or similar constructs in the kernel modules + udev or if people really needed to load a module statically, they should do so via /etc/modules-load.d/ but loading module should never be done in type unit files in systemd.

That's true. Unfortunately there is no suitable modalias for this functionality. modules-load.d works, and multipathd is actually using it, but only in the "real rootfs". As far as the dracut multipath module is concerned, rd.driver.pre= has been used, which is basically equivalent. Would modules-load.d be preferred from your PoV? Wrt the unit files, I am actually not sure why we do this. It was probably meant as additional means to make really sure the device handlers are loaded — while they're nice to have in general, they are crucial if multipath is enabled.

There should never be any need to load a module statically these days unless for debugging/testing purposes, if there is any need beside that something is broken.

What's broken here is that the kernel's autoload mechanism fails with asynchronous SCSI probing (see the commit message for kernel commit 1ba61906).

The correct solution (given that the device handlers are generally userful for SCSI, but more and more systems don't have SCSI hardware any more) would be something like this in modprobe.d:

softdep scsi_mod post: scsi_dh_alua scsi_dh_rdac scsi_dh_emc

But for the time being, I'm just reverting my broken commit.

mwilck added a commit to mwilck/dracut that referenced this pull request Jan 24, 2022
This reverts commit 466d45b.

This commit was broken. It is still necessary to load the SCSI
device handlers early. While there may be better solutions in the long
run, just revert this commit for now.

See dracutdevs#1664 (comment)
for an in-depth explanation.
@johannbg
Copy link
Collaborator

johannbg commented Feb 2, 2022

Yes I prefer modules-load.d snippet but I´ll merge this as is and you can make a seperated PR for that when you find the time.

@johannbg johannbg merged commit 4318533 into dracutdevs:master Feb 2, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
modules Issue tracker for all modules multipath Issues related to the multipath module
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants