Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

98dracut-systemd: don't wait for root device if remote cryptsetup active #931

Merged
merged 2 commits into from Oct 12, 2020

Conversation

jlebon
Copy link
Contributor

@jlebon jlebon commented Sep 23, 2020

This is a plain and simple hack around dependency issues between dracut
and systemd.

When using Tang-pinned LUKS root devices, we want to rely on
systemd-cryptsetup@.service to unlock it. However, that service only
runs After=remote-fs-pre.target, while dracut-initqueue.service has
Before=remote-fs-pre.target (which makes sense because we don't want
to attempt networked root devices before networking is up).

However, the rootfs-generator here wants to make sure that the root
device exists before exiting the initqueue via an initqueue/finished
"devexists" hook. This will never work though because by design
systemd-cryptsetup@.service, which unlocks the root device, won't run
until after we exit.

So we have a dependency cycle:

initqueue -> devexists hook -> root device ->
    systemd-cryptsetup@.service -> remote-fs-pre.target -> initqueue

There's no clean way to break this. The root issue is that there's no
way right now to split sequencing of systemd services across the
initqueue/online and initqueue/finished events because it's all bundled
in a single service. (The deeper root issue of course is that we have
two init systems. :) ).

Here we do a tactical fix: if there's a systemd-cryptsetup@.service
instance, let's assume it's for the root device and skip waiting for it
to show up if it depends on remote-fs-pre.target.

@centos-ci
Copy link
Collaborator

Can one of the admins verify this patch?

@jlebon
Copy link
Contributor Author

jlebon commented Sep 23, 2020

Consider this an RFC; I'm likely missing some backstory here. I looked a bit at the git logs for both this file and systemd's fstab-generator but didn't find anything illuminating re. why we need both. I could go for a less invasive patch, though judging from the lack of recent activity on this file, it seems like it's not actually relied upon anymore nowadays? (And if so, in what use cases?)

@github-actions github-actions bot added dracut-systemd Issues related to the dracut-systemd module modules Issue tracker for all modules labels Sep 23, 2020
@jlebon
Copy link
Contributor Author

jlebon commented Sep 23, 2020

/cc @sergio-correia

@jlebon
Copy link
Contributor Author

jlebon commented Sep 28, 2020

/cc @poettering who likely has better context on this.

@haraldh
Copy link
Collaborator

haraldh commented Oct 5, 2020

Hmm, the rootfs-generator.sh was introduced to:

  1. set the device timeouts
  2. let the device be wanted by the initrd.target
  3. if there is no root= on the kernel command line generate a sysroot.mount according to /etc/cmdline.d/*.conf
  4. order the fsck after dracut-pre-mount.service

@jlebon
Copy link
Contributor Author

jlebon commented Oct 6, 2020

Hmm, the rootfs-generator.sh was introduced to:
1. set the device timeouts
2. let the device be wanted by the initrd.target
3. if there is no root= on the kernel command line generate a sysroot.mount according to /etc/cmdline.d/*.conf
4. order the fsck after dracut-pre-mount.service

Ahh OK, the root= in /etc/cmdline/*.conf case makes sense.

The case I'm interested in however is the one where root= is indeed in /proc/cmdline. In that case, systemd-fstab-generator will take care of generating the mount unit, getting it into the transaction, and generating an fsck run. I think the only thing that the rootfs-generator provides in this case is simply ensuring that the root device is available when we finish the initqueue. Which... I think we can get the same effect by adding After=initrd-root-device.target in the next service in line, i.e. dracut-pre-mount.service.

IOW, WDYT of this patch:

diff --git a/modules.d/98dracut-systemd/dracut-pre-mount.service b/modules.d/98dracut-systemd/dracut-pre-mount.service
index 18c9730c..6a0cdb0b 100644
--- a/modules.d/98dracut-systemd/dracut-pre-mount.service
+++ b/modules.d/98dracut-systemd/dracut-pre-mount.service
@@ -7,7 +7,7 @@ Description=dracut pre-mount hook
 Documentation=man:dracut-pre-mount.service(8)
 DefaultDependencies=no
 Before=initrd-root-fs.target sysroot.mount systemd-fsck-root.service
-After=dracut-initqueue.service cryptsetup.target
+After=dracut-initqueue.service cryptsetup.target initrd-root-device.target
 ConditionPathExists=/usr/lib/initrd-release
 ConditionDirectoryNotEmpty=|/lib/dracut/hooks/pre-mount
 ConditionKernelCommandLine=|rd.break=pre-mount
diff --git a/modules.d/98dracut-systemd/rootfs-generator.sh b/modules.d/98dracut-systemd/rootfs-generator.sh
index 4ae693bb..42f56fe1 100755
--- a/modules.d/98dracut-systemd/rootfs-generator.sh
+++ b/modules.d/98dracut-systemd/rootfs-generator.sh
@@ -111,10 +111,11 @@ esac

 GENERATOR_DIR="$1"

-if [ "$rootok" = "1"  ]; then
+# Only handle root= if it's in /etc/cmdline.d. Otherwise, systemd handles it.
+if [ "$rootok" = "1"  ] && ! strstr "$(cat /proc/cmdline)" 'root='; then
    generator_wait_for_dev "${root#block:}" "$RDRETRY"
    generator_fsck_after_pre_mount "${root#block:}"
-   strstr "$(cat /proc/cmdline)" 'root=' || generator_mount_rootfs "${root#block:}" "$(getarg rootfstype=)" "$(getarg rootflags=)"
+   generator_mount_rootfs "${root#block:}" "$(getarg rootfstype=)" "$(getarg rootflags=)"
 fi

 exit 0

?

That way, we make sure systemd and dracut don't step on each other's toes. :)

@haraldh
Copy link
Collaborator

haraldh commented Oct 7, 2020

Looks like a plan. Let's do it and see what fails :-/

@jlebon jlebon changed the title 98dracut-systemd: delete rootfs-generator.sh 98dracut-systemd: reduce scope of rootfs-generator Oct 8, 2020
@jlebon
Copy link
Contributor Author

jlebon commented Oct 8, 2020

OK updated! Sanity-checked also that this fixes my use case.

@haraldh
Copy link
Collaborator

haraldh commented Oct 8, 2020

As you see in the test suite, there are a lot of setups, which are unhappy about the change. As mentioned, the systemd device timeout has to be corrected, so that fallback jobs (like assembling a degraded raid) can be executed.

@jlebon jlebon force-pushed the pr/tang branch 2 times, most recently from 77c9d8a to 43c6f08 Compare October 9, 2020 22:19
This is a plain and simple hack around dependency issues between dracut
and systemd.

When using Tang-pinned LUKS root devices, we want to rely on
`systemd-cryptsetup@.service` to unlock it. However, that service only
runs `After=remote-fs-pre.target`, while `dracut-initqueue.service` has
`Before=remote-fs-pre.target` (which makes sense because we don't want
to attempt networked root devices before networking is up).

However, the rootfs-generator here wants to make sure that the root
device exists *before* exiting the initqueue via an initqueue/finished
"devexists" hook. This will never work though because by design
`systemd-cryptsetup@.service`, which unlocks the root device, won't run
until after we exit.

So we have a dependency cycle:

    initqueue -> devexists hook -> root device ->
        systemd-cryptsetup@.service -> remote-fs-pre.target -> initqueue

There's no clean way to break this. The root issue is that there's no
way right now to split sequencing of systemd services across the
initqueue/online and initqueue/finished events because it's all bundled
in a single service. (The deeper root issue of course is that we have
two init systems. :) ).

Here we do a tactical fix: if there's a `systemd-cryptsetup@.service`
instance, let's assume it's for the root device and skip waiting for it
to show up if it depends on `remote-fs-pre.target`.
@jlebon jlebon changed the title 98dracut-systemd: reduce scope of rootfs-generator 98dracut-systemd: don't wait for root device if remote cryptsetup active Oct 10, 2020
@jlebon
Copy link
Contributor Author

jlebon commented Oct 10, 2020

OK, switched strategy on this now for a more tactical fix. And went a bit more in depth in the commit message.

@haraldh haraldh merged commit 512c51d into dracutdevs:master Oct 12, 2020
@haraldh
Copy link
Collaborator

haraldh commented Oct 12, 2020

@jlebon Thanks! Let's give it a try.

jlebon added a commit to jlebon/coreos-assembler that referenced this pull request Nov 17, 2020
- Drop the TPM test since it's already covered by the external host test
  which are inherited from FCOS
- Move to new way of defining LUKS devices
- Dedupe a bunch of things

The test is currently limited to RHCOS but once we have the requisite
patches in dracut and systemd in FCOS, we should be able to target it
too:

dracutdevs/dracut#931
dracutdevs/dracut#968
systemd/systemd#17467
jlebon added a commit to jlebon/coreos-assembler that referenced this pull request Nov 18, 2020
- Drop the TPM test since it's already covered by the external host test
  which are inherited from FCOS
- Move to new way of defining LUKS devices
- Dedupe a bunch of things

The test is currently limited to RHCOS but once we have the requisite
patches in dracut and systemd in FCOS, we should be able to target it
too:

dracutdevs/dracut#931
dracutdevs/dracut#968
systemd/systemd#17467
openshift-merge-robot pushed a commit to coreos/coreos-assembler that referenced this pull request Nov 18, 2020
- Drop the TPM test since it's already covered by the external host test
  which are inherited from FCOS
- Move to new way of defining LUKS devices
- Dedupe a bunch of things

The test is currently limited to RHCOS but once we have the requisite
patches in dracut and systemd in FCOS, we should be able to target it
too:

dracutdevs/dracut#931
dracutdevs/dracut#968
systemd/systemd#17467
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dracut-systemd Issues related to the dracut-systemd module modules Issue tracker for all modules
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants