Support for reconfiguring the root storage #94

ajeddeloh · 2018-12-12T21:20:06Z

FCOS will have a static, server-side generated initramfs. This means the initramfs needs to know how to find the root filesystem to mount it, including if that means starting RAID devices. However, we do NOT want to start other raid devices (unless Ignition is starting them as part of the creation process) since users might be writing out configuration to things like /etc/mdadm.conf that they want to use.

On CL we require that RAIDed devices be partitions with a special GPT partition type GUID and have a udev rule to start only those.

For FCOS I can see two options:

Do something similar to what CL did. This has the limitation of being unable to use whole disks for raid and requires partition tables where they would not otherwise be needed.
Have some sort of file that gets written out to /boot which is a "rootmap". This would describe the "path to root" including what services like mdadm need to be started. Ignition could generate this if we include a way of flagging partitions/RAIDs as "containing root" or it could be user written. If it is user written we should unmount root after Ignition files/umount and then let whatever tooling reads it mount the root itself to ensure the first boot is not "special" in any way.
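To make the "rootmap" idea a little more concrete, here is a minimal sketch of what such a file and its reader could look like. Everything in it is an assumption for illustration: the JSON layout, the `layers` key, the layer type names, and the `plan_root_mount` helper are hypothetical and not defined anywhere in this thread. The sketch only turns the map into an ordered plan; it does not start or mount anything.

```python
import json

# Hypothetical rootmap document: layers are listed in the order they must be
# brought up, ending with the filesystem that holds root.
EXAMPLE_ROOTMAP = """
{
  "layers": [
    {"type": "raid", "name": "md-root", "devices": ["/dev/sda4", "/dev/sdb4"]},
    {"type": "filesystem", "device": "/dev/md/md-root", "format": "xfs"}
  ]
}
"""


def plan_root_mount(rootmap_text):
    """Translate a rootmap document into an ordered list of steps."""
    rootmap = json.loads(rootmap_text)
    steps = []
    for layer in rootmap["layers"]:
        if layer["type"] == "raid":
            # Only the arrays named here would be started; other arrays are
            # left alone for the real root to handle.
            steps.append(("assemble-raid", layer["name"], tuple(layer["devices"])))
        elif layer["type"] == "filesystem":
            steps.append(("mount", layer["device"], "/sysroot"))
        else:
            raise ValueError(f"unknown layer type: {layer['type']}")
    return steps


if __name__ == "__main__":
    for step in plan_root_mount(EXAMPLE_ROOTMAP):
        print(step)
```

Because the plan lists only the devices on the path to root, a reader like this would never touch unrelated RAID devices, which is the property asked for at the top of this issue.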


dustymabe · 2019-05-08T18:48:54Z

We discussed this briefly in the meeting today - There are two things I'd like (personal opinion) to do as part of this:

develop a list of cases we'd like to support
- this will allow us to prove out theories about solutions here
prefer not building something new
- of course if we need to we need to, but would like to have strong justification for doing so

Andrew also mentioned that he'd like the solution to be generic enough that if we decide to support new use cases in the future our current solution doesn't prohibit us from doing so. If we build something new, this is reasonable.

cgwalters · 2019-07-29T14:33:35Z

Lots of discussion around partitioning in #18

Let's repurpose this issue as a tracker for support for reconfiguring the root filesystem in any form; RAID is a subset of that.

Bigger picture, this gets a lot more complex with the combination of FCOS:(ostree, ignition) as opposed to ContainerLinux:(dual partition /usr, ignition), because with CL replacing the rootfs is nearly trivial, but for ostree it is entangled with the OS (which adds flexibility, but does make this problem more complex).

There are two sub-paths to this problem: One where we're not destroying the existing rootfs (e.g. we're just adding a new partition that we want to make into the rootfs), and one where we're intending to replace it.

I think if we're trying to replace the rootfs...we get into an interesting problem domain because we need to pull the ostree repo into RAM temporarily. In most cases...hmm, that should be fine really. But, if we want to optimize things, I could imagine that coreos-installer accepts Ignition, and if a replacement of the root filesystem is requested, we do that at install time as opposed to boot time. We'd then pass the remaining Ignition as /boot/ignition.ign.

And we need support for running ostree in the initramfs - shouldn't be too hard.

The flow would be something like:

ignition-disks.service
ostree-redeploy-rootfs.service
ignition-files.service

ajeddeloh · 2019-07-29T18:43:54Z

Hadn't considered pulling it into RAM; I like that idea. I was thinking about just pulling over network, but RAM has some obvious speed/bandwidth cost advantages. There's some tricky bits:

Unless we always want to pull the ostree into ram, we need to detect whether we're going to blow away root.
We can't do it if ostree size > available ram, where available ram will be slightly smaller than apparent because we need to hold it in ram while ignition disks runs. We could optionally fall back to using network in this case.
Before ignition disks runs, we don't have a rendered config, which means we can't know if we're going to blow away root. This could be solved by splitting config rendering into its own stage.
ignition-disks starts before all devices are ready. We can't look at device fields in the config since those could be using those symlinks.
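The "ostree size > available ram" check could be sketched roughly as below. This is only an illustration: it parses the `MemAvailable` field from `/proc/meminfo` text, and the `headroom_kib` margin (memory kept free for ignition disks and the rest of the initramfs) is a made-up parameter, not a measured value.

```python
def mem_available_kib(meminfo_text):
    """Return the MemAvailable value (in KiB) from /proc/meminfo contents."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            return int(line.split()[1])
    raise ValueError("MemAvailable not found")


def can_cache_in_ram(repo_size_kib, meminfo_text, headroom_kib=512 * 1024):
    """True if the repo plus a safety margin fits in available memory.

    headroom_kib is a hypothetical margin; a real implementation would need
    to pick it based on what else runs while the repo is held in RAM.
    """
    return repo_size_kib + headroom_kib <= mem_available_kib(meminfo_text)


if __name__ == "__main__":
    with open("/proc/meminfo") as f:
        meminfo = f.read()
    # 1536 MiB is an illustrative repo size, not a measurement.
    print(can_cache_in_ram(1536 * 1024, meminfo))
```

If the check fails, the fallback mentioned above (pulling over the network) would apply instead.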

> I could imagine that coreos-installer accepts Ignition, and if a replacement of the root filesystem is requested, we do that at install time as opposed to boot time

This doesn't help on clouds. I'd rather have a unified way. In some ways it would be easier to just do all of disks then, but that's got its own challenges and deserves its own ticket.

Proposal:

Split ignition disks into render and disks
Detection could work a couple ways:
- Require that the partition containing root has partlabel=root. Cache to ram if we find any partition entries with "label": "root". We can't (reasonably) wait on udev for the device since if it's on raid or similar it won't become available until after ignition-disks. This generates false positives for cases like resizing root to be bigger.
- Require that the filesystem containing root has label=root. Cache to ram if we find any filesystem entries with "label": "root". I can't think of a reason to include a rootfs definition matching the default anyway.
ignition-[u]mount needs to learn to ignore filesystems with "path": "/" since the initramfs owns mounting those
Still do not support things like split /usr or /etc. This means ostree-redeploy-rootfs needs to happen before ignition mount.

Sequence would look like:

ignition-render
detect-root-change (needs to understand ignition spec but probably shouldn't be part of the ignition project)
ignition disks
rootfs mounted
ostree-redeploy-rootfs
ignition mount
ignition files
ignition umount
switch root
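A minimal sketch of the detect-root-change step, using the filesystem-label variant from the proposal: it assumes the rendered config is spec 3.x JSON with entries under `storage.filesystems`, and treats any entry with `"label": "root"` as a request for a new rootfs. The function name is hypothetical.

```python
import json


def replaces_root(rendered_config_text):
    """True if any filesystem entry in the rendered config has label "root"."""
    config = json.loads(rendered_config_text)
    filesystems = config.get("storage", {}).get("filesystems", [])
    return any(fs.get("label") == "root" for fs in filesystems)


if __name__ == "__main__":
    example = json.dumps({
        "ignition": {"version": "3.0.0"},
        "storage": {"filesystems": [
            {"device": "/dev/md/root", "format": "xfs", "label": "root"},
        ]},
    })
    print(replaces_root(example))
```

This only looks at what is being created, never at which device is being destroyed, so it does not need to wait on udev or resolve any device symlinks.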

This still leaves open the question of "how do we find/mount root on subsequent boots".

An unrelated question is: do we support blowing away /boot? My gut says no. On x86_64 you can blow away /efi when running on bios (may need to mask the mount unit for it) or BIOS-BOOT when running on EFI. I don't think we can get rid of boot though. It's also partition 1 so it shouldn't get in the way of anything and it's not huge. We probably could find a way to allow moving it but I really don't think it's worth the complexity that would entail.

cgwalters · 2019-07-29T18:50:23Z

> We can't do it if ostree size > available ram, where available ram will be slightly smaller than apparent because we need to hold it in ram while ignition disks runs

Yeah I had the same hesitation but then I realized two things. First, the current FCOS is 1.5G. I think if you're doing something like RAID we can presume you have a "serious" system and have at least say 8G of RAM.

Second, we could enable zram temporarily.

ajeddeloh · 2019-07-29T19:01:20Z

I realized I initially missed the bit about "what if we're just moving the rootfs and the old one will still be accessible". Do you think it's worth the complexity to implement that? If so I think we ought to do this in two phases, handling the generic case first then treating that as an optimization.

ajeddeloh · 2019-07-29T19:06:37Z

> Second, we could enable zram temporarily.

Neat! I think we ought to handle that second and only enable it when we don't have enough ram to do it without.

cgwalters · 2019-07-29T20:41:30Z

> We can't (reasonably) wait on udev for the device since if it's on raid or similar it won't become available until after ignition-disks. This generates false positives for cases like resizing root to be bigger.

I'm confused...doesn't this stuff imply Ignition running twice? We know the initial rootfs we're booting from isn't on RAID because our default disk image doesn't do that.

ajeddeloh · 2019-07-29T21:40:15Z

> I'm confused...doesn't this stuff imply Ignition running twice? We know the initial rootfs we're booting from isn't on RAID because our default disk image doesn't do that.

I'm not sure why it would imply running Ignition twice? Unless you're referring to getting the config, in which case that's why I'm suggesting we split that into its own stage. I'm saying it's nontrivial to know with 100% certainty that the root is being destroyed since the block device has aliases that we would need to wait for. I suppose we could, but that makes the live pxe case harder since there isn't a device and some aliases aren't known before boot.

Consider if FCOS is installed with /dev/sda4 being root. That can be referenced by /dev/sda4, /dev/disk/by-label/root and /dev/disk/by-partlabel/root (and some others that are system dependent). I'm saying we can't detect if a user is blowing away the partition because:

We don't know what disk we're installed on (maybe we should add a udev rule for that? create something like /dev/disk/installed. Seems useful in general and could improve config portability), so if they're wiping the filesystem with device: sda4 we don't know if that is actually the rootfs or not since we might be installed on sdb instead.
Those symlinks may not exist yet so if the user is using them we don't know where they point. Things like /dev/disk/by-path/* are going to be dependent on the hardware so we really don't know if they're pointing to the rootfs. We can't wait for them either because while /dev/disk/by-label/root pointing to the rootfs is obvious, something like /dev/disk/by-path/<something> is not. We don't have a way to differentiate those from other symlinks that could appear as a result of ignition-disks. We don't know which ones are safe to wait on and which are not.

Because of that we should be detecting that we're creating a new rootfs instead of deleting the old one. We can either do that with partitions looking for a partlabel (which can have false positives when just resizing) or with the filesystem label (which can have false positives if they create an entry that matches the on-disk FS (see https://github.com/coreos/ignition/blob/master/doc/operator-notes.md#filesystem-reuse-semantics), though I'm not sure why you'd do that).
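The aliasing problem can be illustrated with a small sketch. It uses ordinary files and symlinks in a temporary directory as stand-ins for device nodes (the names `sda4`, `by-label-root`, and `by-path-example` are placeholders): once a symlink exists, resolving it shows it points at the same node, but a symlink that has not been created yet tells us nothing.

```python
import os
import tempfile


def same_device(path_a, path_b):
    """True if both paths currently resolve to the same node."""
    return os.path.realpath(path_a) == os.path.realpath(path_b)


def demo():
    with tempfile.TemporaryDirectory() as dev:
        sda4 = os.path.join(dev, "sda4")
        open(sda4, "w").close()  # stand-in for the root partition node

        by_label = os.path.join(dev, "by-label-root")
        os.symlink(sda4, by_label)  # alias that udev has already created

        # Alias that has not appeared yet (e.g. a by-path link).
        by_path = os.path.join(dev, "by-path-example")

        return same_device(by_label, sda4), same_device(by_path, sda4)


if __name__ == "__main__":
    print(demo())
```

The second comparison comes back false even if that alias would eventually point at the rootfs, which is why matching on what is being created is more reliable than matching on the device being wiped.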

ajeddeloh · 2019-07-31T22:26:31Z

We discussed this at the FCOS IRC meeting today, mostly about how to handle the second boot. There are essentially five options:

1. Do what Container Linux does and require partitions containing complex devices (e.g. LUKS volumes, parts of RAID, etc.) use GPT type GUIDs to specify what they are. The initramfs can then carry udev rules to start these devices.
- Devices MUST be GPT partitioned to be used (can't do RAID on the device itself, only partitions)
- Can't carry extra configuration for things like mdadm, encryption
2. Do what mutable OSes do and regenerate the initramfs to include configuration to start complex devices.
- Dracut's default modules can be sloppy (do things like start extra devices we don't need to, see rd.auto)
- Generating new initramfses means creating new bootloader entries
- Doesn't require carrying a lot of new code in the initramfs
- Open question: would we generate this in the initramfs or in the real root?
- Makes debugging harder since we no longer know exactly what's in the initrd
3. Write a config file to /boot that maps out how to find the root device.
- Requires a new tool to read that config and mount/start/decrypt things (systemd generator would be nice but we'd need to mount /boot which makes it harder and would slow down boot if unused).
- Config file would be easy to extend, can handle arbitrary complexity
- Isn't portable to distros that don't have a separate boot (though complex root is harder in general in that case)
4. Write kernel command line arguments that specify how to find the root device.
- Requires a new tool to read that config and mount/start/decrypt things. Could be a systemd generator
- kargs aren't great if the user is doing something complex (e.g. LUKS on RAID on RAID).
- kcmdline max length is 4k
- kargs are transparent, matches with "traditional" ways of specifying root devices.
- kargs are simple to read/implement
5. Do 3, but use an extra initrd instead of writing a file to /boot.
- All the same points as 3, except:
- can write a systemd generator
- creating an initrd for that is harder especially if the user writes it by hand (see below)
- would need a default "noop" initrd that can be replaced so grub doesn't throw a fit when it can't find it. Not sure if ostree changes would be needed for that.
- initrd can just be a file ignition writes to /boot

In cases 2-5 there's also the question of "how do we specify what is needed to start the root disk?" Case 1 it's implied that you specify it with type guids. We could:

Try to infer from the ignition config.
- would require users use well known symlinks to devices (e.g. /dev/md/name instead of /dev/mdXXX)
- would be hard to debug if we inferred wrongly
- not very explicit what's going on
Carry extra fields in FCCL for "contains_root" (not married to that name) which FCCT (case 3, 4, 5) or dracut (case 2) uses to write out the config/kargs or create the new initramfs.
- more explicit, but not too much extra work for the end user
- Doesn't leave much room for things like specifying options for encryption, mdadm, etc
Require the user manually create the config (3), kargs (4), or config initrd (5) and use Ignition to write them.
- Nice since it's explicit. We're not doing anything the user doesn't expect
- Bad since a lot of what they write would be largely redundant.

ajeddeloh · 2019-08-20T20:55:59Z

Bit of data: looks like the kcmdline size is 2048 on all the arches we care about except s390x where it is 896.
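For the kargs option, those limits could be checked with something like the sketch below. The limit values are the ones quoted in this comment; the function name, the karg strings in the example, and the choice to treat the limit as a buffer size that includes a terminating NUL are assumptions.

```python
# Limits as quoted above: 2048 on most arches, 896 on s390x.
CMDLINE_LIMITS = {"default": 2048, "s390x": 896}


def fits_cmdline(kargs, arch):
    """True if the joined kernel arguments fit within the arch's limit.

    Assumes the limit is a buffer size including the trailing NUL, so the
    encoded command line must be strictly shorter than the limit.
    """
    limit = CMDLINE_LIMITS.get(arch, CMDLINE_LIMITS["default"])
    cmdline = " ".join(kargs)
    return len(cmdline.encode()) < limit


if __name__ == "__main__":
    # Placeholder arguments standing in for a deeply nested root device.
    kargs = ["root=/dev/md/root"] + [f"rd.md.uuid={i:032x}" for i in range(40)]
    for arch in ("x86_64", "s390x"):
        print(arch, fits_cmdline(kargs, arch))
```

A check like this would need to account for the arguments already on the command line, not only the ones describing the root device.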

cmurf · 2019-09-11T06:56:20Z

Carrying over from #18 since it's closed.

Re: dd and GPT: 512n and 512e can share one GPT, 4Kn needs a different GPT. I don't see how they can be merged without invalidating Partition Entry Array CRC32 which is computed on the content between the header and First Useable LBA. And First Useable LBA on 512 byte sector devices must be 34 or higher.
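The "34 or higher" figure can be reproduced with a little arithmetic, sketched below. It assumes the usual primary GPT layout: protective MBR at LBA 0, GPT header at LBA 1, and at least 16,384 bytes reserved for the partition entry array starting at LBA 2. The function names are just for illustration.

```python
MIN_ENTRY_ARRAY_BYTES = 16384  # 128 entries of 128 bytes each


def min_first_usable_lba(sector_size):
    """Lowest possible First Usable LBA for a given logical sector size."""
    # Integer ceiling division: sectors needed to hold the entry array.
    entry_sectors = -(-MIN_ENTRY_ARRAY_BYTES // sector_size)
    return 2 + entry_sectors  # LBA 0 = protective MBR, LBA 1 = GPT header


def header_byte_offset(sector_size):
    """Byte offset of the primary GPT header, which sits at LBA 1."""
    return sector_size


if __name__ == "__main__":
    for size in (512, 4096):
        print(size, min_first_usable_lba(size), header_byte_offset(size))
```

With 512-byte sectors this gives 34; with 4096-byte sectors it gives 6, and the header itself moves from byte 512 to byte 4096. That shift in on-disk positions is why one raw image cannot carry a GPT valid for both 512-byte and 4Kn devices.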

Re: mkfs.xfs uses a 4096 byte sector size on 512e and 4Kn, but uses 512 bytes on 512n.

Those two above combined suggest three raw images, unless something gives.

Also, when dd'd to a different sized drive, the GPT becomes invalid, so that should probably get fixed as a first step.

Each RAID layout should get its own mkfs.xfs for setting stripe width/unit

On 4Kn drives, minimum EFI system partition size is 256MiB if it's FAT32, which the UEFI spec does suggest it should be, sorta. But mkdosfs only starts using FAT32 above something near 550MiB, unless -F 32 is used. fedora-coreos-30.20190905.0-metal.raw has a FAT16 ESP. Obviously it works, and I haven't ever heard of FAT16 not working on anything.

Re: ZRAM: it dynamically allocates RAM, so setting up swap on ZRAM at 1:1 to RAM has inconsequential overhead on a system that doesn't need it, but you could do a test similar to Anaconda's implementation and only start it for low memory devices. I'd like to see this upstream version working and rebase the myriad implementations on it.
https://github.com/systemd/zram-generator

cgwalters · 2019-09-25T14:10:36Z

I'm working on the fundamentals of this in coreos/ignition-dracut#107

First MVP is to support this fcct:

# Example of redeploying the rootfs as ext4
variant: fcos
version: 1.0.0
storage:
  filesystems:
    - device: /dev/disk/by-partlabel/root
      format: ext4
      wipe_filesystem: true
      label: root

Tracker for issues:

Get jlebon's SELinux/initramfs kernel patch into Fedora kernel
Don't hardcode rootflags=defaults,prjquota coreos-assembler#781
Preserve SELinux labels (probably involves writing the code in rpm-ostree)
Documentation of what's supported and what isn't

cgwalters · 2019-09-25T14:11:55Z

To reiterate on some discussion in that PR, this is to help enable openshift/enhancements#15 without requiring the root filesystem to come in a LUKS container by default (and relatedly, ensure that when a user chooses LUKS it can be encrypted from the start with a TPM2-sealed key, etc.)

darkmuggle · 2019-09-25T16:05:13Z

Sorry, I didn't mean to close. I have NO idea why "close and comment" is next to the comment button.

All this does is put the immutable bit on the target directory. The intention is to replace this bit to start: https://github.com/coreos/coreos-assembler/blob/8b205bfbb971707382ace76bbb39e46ed3fc560d/src/create_disk.sh#L229

However, the real goal here is to add code in this file to handle redeploying the rootfs for Fedora CoreOS which combines OSTree+Ignition: coreos/fedora-coreos-tracker#94

Basically doing this in proper Rust is going to be a lot nicer than shell script in dracut modules. Among other details, coreutils `mv` doesn't seem to do the right thing for SELinux labels when policy isn't loaded.

Currently, the SELinux LSM prevents one from setting the `security.selinux` xattr on an inode without a policy first being loaded. However, this restriction is problematic: it makes it impossible to have newly created files with the correct label before actually loading the policy.

This is relevant in distributions like Fedora, where the policy is loaded by systemd shortly after pivoting out of the initrd. In such instances, all files created prior to pivoting will be unlabeled. One then has to relabel them after pivoting, an operation which inherently races with other processes trying to access those same files.

Going further, there are use cases for creating the entire root filesystem on first boot from the initrd (e.g. Container Linux supports this today[1], and we'd like to support it in Fedora CoreOS as well[2]). One can imagine doing this in two ways: at the block device level (e.g. laying down a disk image), or at the filesystem level. In the former, labeling can simply be part of the image. But even in the latter scenario, one still really wants to be able to set the right labels when populating the new filesystem.

This patch enables this by changing behaviour in the following two ways:

1. allow `setxattr` if we're not initialized
2. don't try to set the in-core inode SID if we're not initialized; instead leave it as `LABEL_INVALID` so that revalidation may be attempted at a later time

Note the first hunk of this patch is mostly the same as a previously discussed one[3], though it was part of a larger series which wasn't accepted.

[1] https://coreos.com/os/docs/latest/root-filesystem-placement.html
[2] coreos/fedora-coreos-tracker#94
[3] https://www.spinics.net/lists/linux-initramfs/msg04593.html

Co-developed-by: Victor Kamensky <kamensky@cisco.com>
Signed-off-by: Victor Kamensky <kamensky@cisco.com>
Signed-off-by: Jonathan Lebon <jlebon@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Juhyung Park <qkrwngud825@gmail.com>

This patch does for `getxattr` what commit 3e3e24b42043 ("selinux: allow labeling before policy is loaded") did for `setxattr`; it allows querying the current SELinux label on disk before the policy is loaded.

One of the motivations described in that commit message also drives this patch: for Fedora CoreOS (and eventually RHEL CoreOS), we want to be able to move the root filesystem for example, from xfs to ext4 on RAID, on first boot, at initrd time.[1]

Because such an operation works at the filesystem level, we need to be able to read the SELinux labels first from the original root, and apply them to the files of the new root. The previous commit enabled the second part of this process; this commit enables the first part.

[1] coreos/fedora-coreos-tracker#94

Acked-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Jonathan Lebon <jlebon@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Juhyung Park <qkrwngud825@gmail.com>
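From userspace, the read-then-apply step described in these two commit messages amounts to a `getxattr` on the old file followed by a `setxattr` on the new one. The sketch below shows that shape with Python's `os.getxattr` and `os.setxattr` (Linux only). The helper name and the injectable `getxattr`/`setxattr` parameters are illustrative assumptions, added so the logic can be exercised without a real SELinux-labeled filesystem; this is not the code used by the actual rootfs redeploy work.

```python
import os

SELINUX_XATTR = "security.selinux"


def copy_selinux_label(src, dst, getxattr=None, setxattr=None):
    """Copy the raw SELinux label xattr from src to dst and return it.

    getxattr/setxattr default to the os module functions; they are
    parameters only so the function can be tried out with stand-ins.
    """
    getxattr = getxattr or os.getxattr
    setxattr = setxattr or os.setxattr
    # Operate on the path itself rather than on a symlink's target.
    label = getxattr(src, SELINUX_XATTR, follow_symlinks=False)
    setxattr(dst, SELINUX_XATTR, label, follow_symlinks=False)
    return label
```

On a kernel without the two patches, the real calls would be expected to fail before a policy is loaded, which is the restriction the commit messages describe.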

Currently, the SELinux LSM prevents one from setting the `security.selinux` xattr on an inode without a policy first being loaded. However, this restriction is problematic: it makes it impossible to have newly created files with the correct label before actually loading the policy. This is relevant in distributions like Fedora, where the policy is loaded by systemd shortly after pivoting out of the initrd. In such instances, all files created prior to pivoting will be unlabeled. One then has to relabel them after pivoting, an operation which inherently races with other processes trying to access those same files. Going further, there are use cases for creating the entire root filesystem on first boot from the initrd (e.g. Container Linux supports this today[1], and we'd like to support it in Fedora CoreOS as well[2]). One can imagine doing this in two ways: at the block device level (e.g. laying down a disk image), or at the filesystem level. In the former, labeling can simply be part of the image. But even in the latter scenario, one still really wants to be able to set the right labels when populating the new filesystem. This patch enables this by changing behaviour in the following two ways: 1. allow `setxattr` if we're not initialized 2. don't try to set the in-core inode SID if we're not initialized; instead leave it as `LABEL_INVALID` so that revalidation may be attempted at a later time Note the first hunk of this patch is mostly the same as a previously discussed one[3], though it was part of a larger series which wasn't accepted. [1] https://coreos.com/os/docs/latest/root-filesystem-placement.html [2] coreos/fedora-coreos-tracker#94 [3] https://www.spinics.net/lists/linux-initramfs/msg04593.html Co-developed-by: Victor Kamensky <kamensky@cisco.com> Signed-off-by: Victor Kamensky <kamensky@cisco.com> Signed-off-by: Jonathan Lebon <jlebon@redhat.com> Signed-off-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Juhyung Park <qkrwngud825@gmail.com>

This patch does for `getxattr` what commit 3e3e24b42043 ("selinux: allow labeling before policy is loaded") did for `setxattr`; it allows querying the current SELinux label on disk before the policy is loaded. One of the motivations described in that commit message also drives this patch: for Fedora CoreOS (and eventually RHEL CoreOS), we want to be able to move the root filesystem for example, from xfs to ext4 on RAID, on first boot, at initrd time.[1] Because such an operation works at the filesystem level, we need to be able to read the SELinux labels first from the original root, and apply them to the files of the new root. The previous commit enabled the second part of this process; this commit enables the first part. [1] coreos/fedora-coreos-tracker#94 Acked-by: Stephen Smalley <stephen.smalley.work@gmail.com> Signed-off-by: Jonathan Lebon <jlebon@redhat.com> Signed-off-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Juhyung Park <qkrwngud825@gmail.com>

Currently, the SELinux LSM prevents one from setting the `security.selinux` xattr on an inode without a policy first being loaded. However, this restriction is problematic: it makes it impossible to have newly created files with the correct label before actually loading the policy. This is relevant in distributions like Fedora, where the policy is loaded by systemd shortly after pivoting out of the initrd. In such instances, all files created prior to pivoting will be unlabeled. One then has to relabel them after pivoting, an operation which inherently races with other processes trying to access those same files. Going further, there are use cases for creating the entire root filesystem on first boot from the initrd (e.g. Container Linux supports this today[1], and we'd like to support it in Fedora CoreOS as well[2]). One can imagine doing this in two ways: at the block device level (e.g. laying down a disk image), or at the filesystem level. In the former, labeling can simply be part of the image. But even in the latter scenario, one still really wants to be able to set the right labels when populating the new filesystem. This patch enables this by changing behaviour in the following two ways: 1. allow `setxattr` if we're not initialized 2. don't try to set the in-core inode SID if we're not initialized; instead leave it as `LABEL_INVALID` so that revalidation may be attempted at a later time Note the first hunk of this patch is mostly the same as a previously discussed one[3], though it was part of a larger series which wasn't accepted. [1] https://coreos.com/os/docs/latest/root-filesystem-placement.html [2] coreos/fedora-coreos-tracker#94 [3] https://www.spinics.net/lists/linux-initramfs/msg04593.html Co-developed-by: Victor Kamensky <kamensky@cisco.com> Signed-off-by: Victor Kamensky <kamensky@cisco.com> Signed-off-by: Jonathan Lebon <jlebon@redhat.com> Signed-off-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Juhyung Park <qkrwngud825@gmail.com>

This patch does for `getxattr` what commit 3e3e24b42043 ("selinux: allow labeling before policy is loaded") did for `setxattr`; it allows querying the current SELinux label on disk before the policy is loaded. One of the motivations described in that commit message also drives this patch: for Fedora CoreOS (and eventually RHEL CoreOS), we want to be able to move the root filesystem for example, from xfs to ext4 on RAID, on first boot, at initrd time.[1] Because such an operation works at the filesystem level, we need to be able to read the SELinux labels first from the original root, and apply them to the files of the new root. The previous commit enabled the second part of this process; this commit enables the first part. [1] coreos/fedora-coreos-tracker#94 Acked-by: Stephen Smalley <stephen.smalley.work@gmail.com> Signed-off-by: Jonathan Lebon <jlebon@redhat.com> Signed-off-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Juhyung Park <qkrwngud825@gmail.com>

This patch does for `getxattr` what commit 3e3e24b42043 ("selinux: allow labeling before policy is loaded") did for `setxattr`; it allows querying the current SELinux label on disk before the policy is loaded. One of the motivations described in that commit message also drives this patch: for Fedora CoreOS (and eventually RHEL CoreOS), we want to be able to move the root filesystem for example, from xfs to ext4 on RAID, on first boot, at initrd time.[1] Because such an operation works at the filesystem level, we need to be able to read the SELinux labels first from the original root, and apply them to the files of the new root. The previous commit enabled the second part of this process; this commit enables the first part. [1] coreos/fedora-coreos-tracker#94 Acked-by: Stephen Smalley <stephen.smalley.work@gmail.com> Signed-off-by: Jonathan Lebon <jlebon@redhat.com> Signed-off-by: Paul Moore <paul@paul-moore.com>

[ Upstream commit 3e3e24b42043eceb97ed834102c2d094dfd7aaa6 ] Currently, the SELinux LSM prevents one from setting the `security.selinux` xattr on an inode without a policy first being loaded. However, this restriction is problematic: it makes it impossible to have newly created files with the correct label before actually loading the policy. This is relevant in distributions like Fedora, where the policy is loaded by systemd shortly after pivoting out of the initrd. In such instances, all files created prior to pivoting will be unlabeled. One then has to relabel them after pivoting, an operation which inherently races with other processes trying to access those same files. Going further, there are use cases for creating the entire root filesystem on first boot from the initrd (e.g. Container Linux supports this today[1], and we'd like to support it in Fedora CoreOS as well[2]). One can imagine doing this in two ways: at the block device level (e.g. laying down a disk image), or at the filesystem level. In the former, labeling can simply be part of the image. But even in the latter scenario, one still really wants to be able to set the right labels when populating the new filesystem. This patch enables this by changing behaviour in the following two ways: 1. allow `setxattr` if we're not initialized 2. don't try to set the in-core inode SID if we're not initialized; instead leave it as `LABEL_INVALID` so that revalidation may be attempted at a later time Note the first hunk of this patch is mostly the same as a previously discussed one[3], though it was part of a larger series which wasn't accepted. 
[1] https://coreos.com/os/docs/latest/root-filesystem-placement.html [2] coreos/fedora-coreos-tracker#94 [3] https://www.spinics.net/lists/linux-initramfs/msg04593.html Co-developed-by: Victor Kamensky <kamensky@cisco.com> Signed-off-by: Victor Kamensky <kamensky@cisco.com> Signed-off-by: Jonathan Lebon <jlebon@redhat.com> Signed-off-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Sasha Levin <sashal@kernel.org>

This patch does for `getxattr` what commit 3e3e24b42043 ("selinux: allow labeling before policy is loaded") did for `setxattr`; it allows querying the current SELinux label on disk before the policy is loaded. One of the motivations described in that commit message also drives this patch: for Fedora CoreOS (and eventually RHEL CoreOS), we want to be able to move the root filesystem for example, from xfs to ext4 on RAID, on first boot, at initrd time.[1] Because such an operation works at the filesystem level, we need to be able to read the SELinux labels first from the original root, and apply them to the files of the new root. The previous commit enabled the second part of this process; this commit enables the first part. [1] coreos/fedora-coreos-tracker#94 Acked-by: Stephen Smalley <stephen.smalley.work@gmail.com> Signed-off-by: Jonathan Lebon <jlebon@redhat.com> Signed-off-by: Paul Moore <paul@paul-moore.com>

This patch does for `getxattr` what commit 3e3e24b42043 ("selinux: allow labeling before policy is loaded") did for `setxattr`; it allows querying the current SELinux label on disk before the policy is loaded. One of the motivations described in that commit message also drives this patch: for Fedora CoreOS (and eventually RHEL CoreOS), we want to be able to move the root filesystem for example, from xfs to ext4 on RAID, on first boot, at initrd time.[1] Because such an operation works at the filesystem level, we need to be able to read the SELinux labels first from the original root, and apply them to the files of the new root. The previous commit enabled the second part of this process; this commit enables the first part. [1] coreos/fedora-coreos-tracker#94 Acked-by: Stephen Smalley <stephen.smalley.work@gmail.com> Signed-off-by: Jonathan Lebon <jlebon@redhat.com> Signed-off-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Reinazhard <reinazhard@gmail.com>

Currently, the SELinux LSM prevents one from setting the `security.selinux` xattr on an inode without a policy first being loaded. However, this restriction is problematic: it makes it impossible to have newly created files with the correct label before actually loading the policy. This is relevant in distributions like Fedora, where the policy is loaded by systemd shortly after pivoting out of the initrd. In such instances, all files created prior to pivoting will be unlabeled. One then has to relabel them after pivoting, an operation which inherently races with other processes trying to access those same files. Going further, there are use cases for creating the entire root filesystem on first boot from the initrd (e.g. Container Linux supports this today[1], and we'd like to support it in Fedora CoreOS as well[2]). One can imagine doing this in two ways: at the block device level (e.g. laying down a disk image), or at the filesystem level. In the former, labeling can simply be part of the image. But even in the latter scenario, one still really wants to be able to set the right labels when populating the new filesystem. This patch enables this by changing behaviour in the following two ways: 1. allow `setxattr` if we're not initialized 2. don't try to set the in-core inode SID if we're not initialized; instead leave it as `LABEL_INVALID` so that revalidation may be attempted at a later time Note the first hunk of this patch is mostly the same as a previously discussed one[3], though it was part of a larger series which wasn't accepted. [1] https://coreos.com/os/docs/latest/root-filesystem-placement.html [2] coreos/fedora-coreos-tracker#94 [3] https://www.spinics.net/lists/linux-initramfs/msg04593.html Co-developed-by: Victor Kamensky <kamensky@cisco.com> Signed-off-by: Victor Kamensky <kamensky@cisco.com> Signed-off-by: Jonathan Lebon <jlebon@redhat.com> Signed-off-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Juhyung Park <qkrwngud825@gmail.com>

This patch does for `getxattr` what commit 3e3e24b42043 ("selinux: allow labeling before policy is loaded") did for `setxattr`; it allows querying the current SELinux label on disk before the policy is loaded.

One of the motivations described in that commit message also drives this patch: for Fedora CoreOS (and eventually RHEL CoreOS), we want to be able to move the root filesystem for example, from xfs to ext4 on RAID, on first boot, at initrd time.[1]

Because such an operation works at the filesystem level, we need to be able to read the SELinux labels first from the original root, and apply them to the files of the new root. The previous commit enabled the second part of this process; this commit enables the first part.

[1] coreos/fedora-coreos-tracker#94

Acked-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Jonathan Lebon <jlebon@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Juhyung Park <qkrwngud825@gmail.com>

[ Upstream commit 3e3e24b42043eceb97ed834102c2d094dfd7aaa6 ]

Currently, the SELinux LSM prevents one from setting the `security.selinux` xattr on an inode without a policy first being loaded. However, this restriction is problematic: it makes it impossible to have newly created files with the correct label before actually loading the policy.

This is relevant in distributions like Fedora, where the policy is loaded by systemd shortly after pivoting out of the initrd. In such instances, all files created prior to pivoting will be unlabeled. One then has to relabel them after pivoting, an operation which inherently races with other processes trying to access those same files.

Going further, there are use cases for creating the entire root filesystem on first boot from the initrd (e.g. Container Linux supports this today[1], and we'd like to support it in Fedora CoreOS as well[2]). One can imagine doing this in two ways: at the block device level (e.g. laying down a disk image), or at the filesystem level. In the former, labeling can simply be part of the image. But even in the latter scenario, one still really wants to be able to set the right labels when populating the new filesystem.

This patch enables this by changing behaviour in the following two ways:

1. allow `setxattr` if we're not initialized
2. don't try to set the in-core inode SID if we're not initialized; instead leave it as `LABEL_INVALID` so that revalidation may be attempted at a later time

Note the first hunk of this patch is mostly the same as a previously discussed one[3], though it was part of a larger series which wasn't accepted.

[1] https://coreos.com/os/docs/latest/root-filesystem-placement.html
[2] coreos/fedora-coreos-tracker#94
[3] https://www.spinics.net/lists/linux-initramfs/msg04593.html

Co-developed-by: Victor Kamensky <kamensky@cisco.com>
Signed-off-by: Victor Kamensky <kamensky@cisco.com>
Signed-off-by: Jonathan Lebon <jlebon@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

This patch does for `getxattr` what commit 3e3e24b42043 ("selinux: allow labeling before policy is loaded") did for `setxattr`; it allows querying the current SELinux label on disk before the policy is loaded.

One of the motivations described in that commit message also drives this patch: for Fedora CoreOS (and eventually RHEL CoreOS), we want to be able to move the root filesystem for example, from xfs to ext4 on RAID, on first boot, at initrd time.[1]

Because such an operation works at the filesystem level, we need to be able to read the SELinux labels first from the original root, and apply them to the files of the new root. The previous commit enabled the second part of this process; this commit enables the first part.

[1] coreos/fedora-coreos-tracker#94

Acked-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Jonathan Lebon <jlebon@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Reinazhard <reinazhard@gmail.com>
Signed-off-by: meydiwahendra <meydiwahendra@gmail.com>

ajeddeloh added the meeting topics for meetings label May 1, 2019

ajeddeloh removed the meeting topics for meetings label May 15, 2019

bgilbert mentioned this issue Jul 3, 2019

Provide options to create a RAID on ROOT disk coreos/coreos-installer#48

Closed

cgwalters changed the title ~~Root on raid and other complex devices.~~ Support for reconfiguring the root storage Jul 29, 2019

cgwalters mentioned this issue Jul 29, 2019

Partition Layout #18

Closed

ajeddeloh added meeting topics for meetings and removed meeting topics for meetings labels Jul 30, 2019

lucab added the kind/design label Aug 27, 2019

spaced mentioned this issue Sep 17, 2019

Configure composite device #275

Closed

cgwalters self-assigned this Sep 25, 2019

darkmuggle closed this as completed Sep 25, 2019

darkmuggle reopened this Sep 25, 2019

cgwalters mentioned this issue Sep 25, 2019

Add hidden coreos-rootfs seal command coreos/rpm-ostree#1911

Merged

ajeddeloh mentioned this issue Sep 25, 2019

Don't hardcode rootflags=defaults,prjquota coreos/coreos-assembler#781

Closed
